Wav2Vec 2.0 is a self-supervised learning framework for speech recognition. With this framework the model is trained in two phases. First, it begins in self-supervised mode using unlabeled data and tries to achieve the best possible speech representation. Then it uses supervised fine-tuning, during which labeled data teaches the model to predict particular words or phonemes. We've used Wav2Vec and find its approach quite powerful for building automatic speech recognition models for regional languages with limited availability of labeled data.