Installation & requirements
We use fairseq for developing our machine translation system. fairseq (Ott et al., 2019) is a sequence modelling toolkit that allows researchers and developers to train custom models for translation, among other tasks.
You need to have the following installed:
PyTorch version >= 1.5.0
Python version >= 3.6
For training new models, you’ll also need an NVIDIA GPU and NCCL
To install fairseq and develop locally:
git clone https://github.com/pytorch/fairseq cd fairseq pip install --editable ./ # on MacOS: # CFLAGS="-stdlib=libc++" pip install --editable ./ # to install the latest stable release (0.10.x) # pip install fairseq
For faster training install NVIDIA’s apex library:
git clone https://github.com/NVIDIA/apex cd apex pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \ --global-option="--deprecated_fused_adam" --global-option="--xentropy" \ --global-option="--fast_multihead_attn" ./
We also need to download mosesdecoder and subword_nmt from Github:
git clone https://github.com/moses-smt/mosesdecoder git clone https://github.com/rsennrich/subword-nmt
For large datasets install PyArrow: pip install pyarrow
If you use Docker make sure to increase the shared memory size either with –ipc=host or –shm-size as command line options to nvidia-docker run .
The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.