Installation & requirements
===========================

We use fairseq for developing our machine translation system. fairseq (Ott et al., 2019) is a sequence modelling toolkit that allows researchers and developers to train custom models for translation, among other tasks.

The requirements and installation steps are as follows:

* PyTorch version >= 1.5.0
* Python version >= 3.6
* For training new models, you'll also need an NVIDIA GPU and `NCCL <https://github.com/NVIDIA/nccl>`_
* To install fairseq and develop locally:

   .. code-block:: bash

      git clone https://github.com/pytorch/fairseq
      cd fairseq
      pip install --editable ./

      # on macOS:
      # CFLAGS="-stdlib=libc++" pip install --editable ./

      # to install the latest stable release (0.10.x)
      # pip install fairseq

* For faster training, install NVIDIA's `apex <https://github.com/NVIDIA/apex>`_ library:

   .. code-block:: bash

      git clone https://github.com/NVIDIA/apex
      cd apex
      pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
        --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
        --global-option="--fast_multihead_attn" ./

* We also need to clone mosesdecoder and subword-nmt from GitHub:

  .. code-block:: bash

     git clone https://github.com/moses-smt/mosesdecoder
     git clone https://github.com/rsennrich/subword-nmt


* For large datasets, install PyArrow: ``pip install pyarrow``
* If you use Docker, make sure to increase the shared memory size by passing either ``--ipc=host`` or ``--shm-size`` as a command-line option to ``nvidia-docker run``.
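
Before installing, it can help to confirm that the minimum versions listed above are met. The following is a minimal sketch and not part of fairseq: the helper names ``version_ge`` and ``report`` are hypothetical, and the comparison assumes a ``sort`` that supports ``-V`` (version sort), as GNU coreutils does. Local build suffixes such as ``+cu101`` are stripped from the PyTorch version before comparing; pre-release version strings may not compare reliably.

.. code-block:: bash

   #!/usr/bin/env bash
   # Rough check of the minimum versions listed above (illustrative only).

   # version_ge A B: succeed when version A >= version B.
   # Assumes `sort -V` is available; the smaller version sorts first.
   version_ge() {
       [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
   }

   # report NAME FOUND MINIMUM: print one status line for a requirement.
   report() {
       local name="$1" found="$2" minimum="$3"
       if [ -z "$found" ]; then
           echo "$name: not found (need >= $minimum)"
       elif version_ge "$found" "$minimum"; then
           echo "$name: $found (ok, need >= $minimum)"
       else
           echo "$name: $found (too old, need >= $minimum)"
       fi
   }

   # Ask the interpreter for its own version; stays empty if python3 is missing.
   py="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)"
   report "Python" "$py" "3.6"

   # torch.__version__ may carry a local suffix such as "+cu101"; drop it.
   torch="$(python3 -c 'import torch; print(torch.__version__)' 2>/dev/null || true)"
   report "PyTorch" "${torch%%+*}" "1.5.0"

Saved under a name of your choice and run with ``bash``, the script prints one status line per requirement. It only reports what it finds and does not install or modify anything.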

The `full documentation <https://fairseq.readthedocs.io/>`_ contains instructions for getting started, training new models and extending fairseq with new model types and tasks.