Installation & requirements

We use fairseq for developing our machine translation system. fairseq (Ott et al., 2019) is a sequence modelling toolkit that allows researchers and developers to train custom models for translation, among other tasks.

You need to have the following installed:

PyTorch version >= 1.5.0
Python version >= 3.6
For training new models, you’ll also need an NVIDIA GPU and NCCL
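
Before installing, it can help to confirm that the prerequisites above are met. The following is a minimal sketch, not part of fairseq: the helper name `version_ge` is hypothetical, and it assumes a `python3` executable on the PATH and a `sort` that supports `-V` (GNU coreutils).

```shell
# version_ge A B: succeeds when dotted version A >= B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Python >= 3.6
py_version="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)"
if [ -n "$py_version" ] && version_ge "$py_version" "3.6"; then
  echo "Python $py_version: OK"
else
  echo "Python >= 3.6 not found (got: ${py_version:-none})"
fi

# PyTorch >= 1.5.0 (a local build suffix such as +cu101 is stripped first)
torch_version="$(python3 -c 'import torch; print(torch.__version__.split("+")[0])' 2>/dev/null || true)"
if [ -n "$torch_version" ] && version_ge "$torch_version" "1.5.0"; then
  echo "PyTorch $torch_version: OK"
else
  echo "PyTorch >= 1.5.0 not found (got: ${torch_version:-none})"
fi

# An NVIDIA GPU is only required for training new models.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name --format=csv,noheader \
    || echo "nvidia-smi is installed but could not query a GPU"
else
  echo "nvidia-smi not found; training new models needs an NVIDIA GPU"
fi
```

The script only reports what it finds and does not install or change anything.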

To install fairseq and develop locally:

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# on macOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./

# to install the latest stable release (0.10.x)
# pip install fairseq
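
After either installation route, a quick import check shows whether fairseq is visible to the current Python environment. This is a small sketch that assumes `python3` is the interpreter fairseq was installed into; the function name `check_fairseq` is hypothetical.

```shell
# Print the installed fairseq version, or a hint if the import fails.
check_fairseq() {
  python3 -c 'import fairseq; print("fairseq", fairseq.__version__)' 2>/dev/null \
    || echo "fairseq is not importable in this Python environment"
}

check_fairseq
```

If the import fails inside a virtual environment, the usual cause is that `pip` and `python3` point to different environments.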

For faster training, install NVIDIA’s apex library:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

We also need to download mosesdecoder and subword-nmt from GitHub:

git clone https://github.com/moses-smt/mosesdecoder
git clone https://github.com/rsennrich/subword-nmt

For large datasets, install PyArrow: pip install pyarrow
If you use Docker, make sure to increase the shared memory size with either --ipc=host or --shm-size as command-line options to nvidia-docker run.
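
The two Docker options can be sketched as follows. The image name `my-fairseq-image`, the `8g` size, and the helper `docker_cmd` are assumptions made for illustration. The helper only prints each command as a dry run, so nothing is launched until you copy a line into your shell.

```shell
# Hypothetical image name; replace it with an image that has fairseq installed.
IMAGE="my-fairseq-image"

# docker_cmd FLAG: print the nvidia-docker invocation that uses the given
# shared-memory flag (dry run only, nothing is executed).
docker_cmd() {
  echo "nvidia-docker run --rm -it $1 $IMAGE"
}

# Option 1: share the host's IPC namespace, including its shared memory.
docker_cmd --ipc=host

# Option 2: keep the container isolated but give it a larger /dev/shm.
# The 8g value is an assumption; choose a size that suits your data loaders.
docker_cmd --shm-size=8g
```

With `--ipc=host` the container is less isolated from the host, whereas `--shm-size` keeps the default isolation and only raises the limit.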

The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.