Fairseq is a sequence-to-sequence modelling toolkit by Facebook AI Research that allows researchers and developers to train custom models for translation, summarization, language modeling and other NLP tasks. For users of PyTorch, this is a good choice. It provides reference implementations of various sequence-to-sequence models; supports distributed training across multiple GPUs and machines; is very extensible; and has a bunch of pretrained models, including RoBERTa which is an optimization on top of BERT.