Since we last talked about BERT (Bidirectional Encoder Representations from Transformers) in the Radar, our teams have successfully used it in a few natural language processing (NLP) projects. In one of our engagements, we observed significant improvements when we switched from the default BERT tokenizer to a domain-trained WordPiece tokenizer for queries containing nouns such as brand names or dimensions. Although several newer transformer models have since emerged, BERT is well understood, has good documentation and a vibrant community, and we continue to find it effective in an enterprise NLP context.
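As a rough illustration of what "domain-trained" means here, the sketch below trains a WordPiece tokenizer on in-domain text using the Hugging Face tokenizers library; the corpus file, vocabulary size and example query are illustrative assumptions rather than details from our engagement.

```python
# Minimal sketch: train a domain-specific WordPiece tokenizer
# with the Hugging Face "tokenizers" library.
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer(lowercase=True)

# "domain_corpus.txt" is a hypothetical file of in-domain text
# (e.g., product descriptions with brand names and dimensions).
tokenizer.train(
    files=["domain_corpus.txt"],
    vocab_size=30_000,
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)

# Write out vocab.txt so the tokenizer can be reused with a BERT model.
tokenizer.save_model(".", "domain-wordpiece")

# A domain vocabulary tends to keep brand names and dimensions as single
# tokens instead of splitting them into many generic sub-word pieces.
print(tokenizer.encode("AcmePro 55-inch 4K monitor").tokens)
```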
BERT stands for Bidirectional Encoder Representations from Transformers; it's a method of pretraining language representations that was published by researchers at Google in October 2018. BERT has significantly altered the natural language processing (NLP) landscape by obtaining state-of-the-art results on a wide array of NLP tasks. Based on the Transformer architecture, it learns from both the left and right context of a token during training. Google has also released pretrained general-purpose BERT models trained on a large corpus of unlabeled text, including Wikipedia. Developers can fine-tune these pretrained models on their task-specific data and achieve strong results. We talked about transfer learning for NLP in our April 2019 edition of the Radar; BERT and its successors continue to make transfer learning for NLP an exciting field, significantly reducing the effort required for users dealing with text classification.
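To make the fine-tuning workflow concrete, here is a minimal sketch of a single training step for text classification using a pretrained BERT model from the Hugging Face transformers library; the example texts, labels and hyperparameters are assumptions for illustration, not a definitive recipe.

```python
# Minimal sketch: fine-tune a pretrained BERT model for binary
# text classification with the Hugging Face "transformers" library.
import torch
from torch.optim import AdamW
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical task-specific examples with binary labels.
texts = ["great product, fast delivery", "arrived broken, very disappointed"]
labels = torch.tensor([1, 0])

# Tokenize the batch into input IDs and attention masks.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()

# Passing labels makes the model return a classification loss.
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```

In practice this loop would run over a task-specific dataset for a few epochs; the point is that only a small classification head plus the pretrained weights need adjusting, which is where the reduction in effort comes from.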