Installation & requirements =========================== We use fairseq for developing our machine translation system. fairseq (Ott et al., 2019) is a sequence modelling toolkit that allows researchers and developers to train custom models for translation, among other tasks. You need to have the following installed: * PyTorch version >= 1.5.0 * Python version >= 3.6 * For training new models, you'll also need an NVIDIA GPU and `NCCL `_ * To install fairseq and develop locally: .. code-block:: bash git clone https://github.com/pytorch/fairseq cd fairseq pip install --editable ./ # on MacOS: # CFLAGS="-stdlib=libc++" pip install --editable ./ # to install the latest stable release (0.10.x) # pip install fairseq * For faster training install NVIDIA's `apex `_ library: .. code-block:: bash git clone https://github.com/NVIDIA/apex cd apex pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \ --global-option="--deprecated_fused_adam" --global-option="--xentropy" \ --global-option="--fast_multihead_attn" ./ * We also need to download mosesdecoder and subword_nmt from Github: .. code-block:: bash git clone https://github.com/moses-smt/mosesdecoder git clone https://github.com/rsennrich/subword-nmt * For large datasets install PyArrow: pip install pyarrow * If you use Docker make sure to increase the shared memory size either with --ipc=host or --shm-size as command line options to nvidia-docker run . The `full documentation `_ contains instructions for getting started, training new models and extending fairseq with new model types and tasks.