Installation & requirements

We use fairseq for developing our machine translation system. fairseq (Ott et al., 2019) is a sequence modelling toolkit that allows researchers and developers to train custom models for translation, among other tasks.

You need to have the following installed:

  • PyTorch version >= 1.5.0

  • Python version >= 3.6

  • For training new models, you’ll also need an NVIDIA GPU and NCCL

  • To install fairseq and develop locally:

    git clone https://github.com/pytorch/fairseq
    cd fairseq
    pip install --editable ./
    
    # on macOS:
    # CFLAGS="-stdlib=libc++" pip install --editable ./
    
    # to install the latest stable release (0.10.x)
    # pip install fairseq
    
  • For faster training, install NVIDIA’s apex library:

    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
      --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
      --global-option="--fast_multihead_attn" ./
    
  • We also need to download mosesdecoder and subword-nmt from GitHub:

    git clone https://github.com/moses-smt/mosesdecoder
    git clone https://github.com/rsennrich/subword-nmt
    
  • For large datasets, install PyArrow: pip install pyarrow

  • If you use Docker, make sure to increase the shared memory size, either with --ipc=host or --shm-size as command-line options to nvidia-docker run.

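With the steps above done, a quick sanity check can confirm the environment before you start training. This is an illustrative snippet, not something fairseq ships; it only assumes the Python, PyTorch, and fairseq requirements listed above.

```python
import sys

# fairseq requires Python >= 3.6 (see the requirements list above)
assert sys.version_info >= (3, 6), "fairseq needs Python >= 3.6"

# PyTorch and fairseq are only importable after the install steps above;
# guard the imports so this check degrades gracefully.
for pkg in ("torch", "fairseq"):
    try:
        mod = __import__(pkg)
        print(pkg, getattr(mod, "__version__", "unknown"))
    except ImportError:
        print(pkg, "not installed yet")
```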
The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.
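The subword-nmt repository cloned above implements byte-pair encoding (BPE) for subword segmentation. As a rough sketch of the algorithm it implements (Sennrich et al., 2016) — the toy code below is illustrative only, not the library's actual implementation:

```python
from collections import Counter

def merge_pair(symbols, pair):
    # Replace every adjacent occurrence of `pair` with its concatenation.
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(word_freqs, num_merges):
    # Toy sketch of BPE learning: repeatedly merge the most frequent
    # adjacent symbol pair. Each word ends in a word-boundary marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges

def apply_bpe(word, merges):
    # Segment a new word by replaying the learned merges in order.
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)
print(merges)                        # → [('e', 's'), ('es', 't'), ('est', '</w>')]
print(apply_bpe("widest", merges))   # → ['w', 'i', 'd', 'est</w>']
```

In practice you would run subword-nmt's learn-bpe and apply-bpe commands on your tokenized training data rather than this sketch.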