Installation & requirements
We use NLLB models for training and inference, with Hugging Face Transformers as the primary framework.
You need to have the following installed:
Python version >= 3.10
PyTorch with CUDA support (recommended for training)
For training new models, you’ll also need an NVIDIA GPU and NCCL
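The requirements above can be checked programmatically. The following is a minimal sketch, not part of the project itself: the function name `check_environment` and the report keys are hypothetical. It relies only on `sys.version_info`, `torch.cuda.is_available()`, and `torch.distributed.is_nccl_available()`, and it reports PyTorch as missing if it has not been installed yet.

```python
import sys


def check_environment(min_python=(3, 10)):
    """Return a dict summarising whether the stated requirements are met."""
    report = {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info >= min_python,
        "torch_installed": False,
        "cuda_available": False,
        "nccl_available": False,
    }
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        # PyTorch is not installed yet; leave the GPU-related flags as False.
        return report

    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    # NCCL can only be reported when torch.distributed is compiled in.
    report["nccl_available"] = bool(dist.is_available() and dist.is_nccl_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

For inference only, `cuda_available` and `nccl_available` may be `False`; the source lists the NVIDIA GPU and NCCL as requirements for training new models.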
Install core NLLB dependencies:
pip install torch transformers datasets evaluate sentencepiece sacrebleu accelerate
The NLLB checkpoints used in this project include:
facebook/nllb-200-distilled-600M
facebook/nllb-200-1.3B (optional larger variant)
Optional: install fairseq (used in this project for preprocessing/binarization compatibility):
pip install fairseq
Optional preprocessing tools:
git clone https://github.com/moses-smt/mosesdecoder
git clone https://github.com/rsennrich/subword-nmt
Also recommended:
For large datasets:
pip install pyarrow
If you use Docker, increase shared memory size (--ipc=host or --shm-size) for stable training.
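To see whether the shared memory setting took effect inside a container, one option is to inspect the size of `/dev/shm`. This is a hedged sketch: the helper name `shared_memory_mb` is hypothetical, and it assumes a Linux container where shared memory is mounted at `/dev/shm`. Docker's default of 64 MB is often too small when PyTorch data loader workers exchange batches through shared memory.

```python
import os
import shutil


def shared_memory_mb(path="/dev/shm"):
    """Return the total size of the shared-memory mount in MiB, or None if absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / (1024 * 1024)


if __name__ == "__main__":
    size = shared_memory_mb()
    if size is None:
        print("No /dev/shm mount found (possibly a non-Linux host)")
    else:
        print(f"/dev/shm size: {size:.0f} MiB")
```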
Quick verification:
python -c "from transformers import AutoTokenizer; t=AutoTokenizer.from_pretrained('facebook/nllb-200-distilled-600M'); print('NLLB tokenizer OK')"
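The one-liner above only confirms that the tokenizer loads. A fuller smoke test could also load the model and translate one sentence. The sketch below is an illustration rather than the project's own script: the function name `translate_sample` and the default language codes `eng_Latn` and `fra_Latn` are assumptions chosen for the example, and the first call downloads the checkpoint weights.

```python
CHECKPOINT = "facebook/nllb-200-distilled-600M"


def translate_sample(text, src_lang="eng_Latn", tgt_lang="fra_Latn", checkpoint=CHECKPOINT):
    """Translate one sentence with an NLLB checkpoint and return the decoded string."""
    # Imported lazily so that defining the function does not require the libraries.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    inputs = tokenizer(text, return_tensors="pt")
    # NLLB selects the output language through the first generated token.
    target_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    output_ids = model.generate(**inputs, forced_bos_token_id=target_id, max_new_tokens=64)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `translate_sample("Hello, world!")` from a Python session should return a French string if the installation is working; this runs on CPU by default, so expect it to be slow for the larger checkpoint.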