Looking for Expert NLP/ML Engineer for Language Translation Model Training (Indic Languages)

Project Description:
I am looking to hire an experienced NLP/ML engineer to train high-quality machine translation models for Indic languages. The goal is to develop single language-pair models, such as:
● English → Telugu
● English → Hindi
(and additional language pairs, if needed)

You may choose the most suitable model architecture based on your expertise (e.g., mBART, mT5, NLLB fine-tuning, Transformer variants, etc.), as long as the final models deliver strong translation quality.

Dataset:
● You can use the AI4Bharat datasets, including:
    ● Samanantar
    ● BPCC
    ● Other open Indic parallel corpora

Scope of Work:
The freelancer will be responsible for:

1. Data Handling
● Cleaning, filtering, and preprocessing datasets
● Sentence alignment (if needed)
● Tokenization and vocabulary preparation (SentencePiece/BPE/etc.)

2. Model Training
● Selecting an appropriate model architecture
● Training single language-pair translation models
● Implementing best practices for training efficiency (FP16, gradient accumulation, etc.)
● Hyperparameter tuning
● Checkpoint management and monitoring

3. Evaluation
● Compute BLEU, SacreBLEU, and other relevant metrics
● Provide side-by-side qualitative translation samples
● Benchmarking against baseline models

4. Delivery
● Final trained model weights
● Inference scripts (Python) for quick testing
● Instructions for running and continuing training
● Documentation of preprocessing and training pipeline
● Optional: Dockerfile or virtual environment setup

Requirements:
The ideal candidate should have:
● Strong experience in NLP, Transformers, and neural MT models
● Prior work with Indic languages (big plus)
● Experience with training libraries such as PyTorch, Hugging Face Transformers, Fairseq, OpenNMT, or similar
● Ability to handle large-scale training and dataset preprocessing
● Familiarity with SentencePiece, tokenization strategies, and MT evaluation metrics
● Ability to deliver clean, well-documented code

Additional Notes:
● Compute resources can be discussed (I can provide compute, or you can use yours).
● More language pairs may be added later as separate follow-up projects.
● Quality of translation is the highest priority.
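
Illustrative example (Data Handling):
To give a rough idea of the kind of cleaning and filtering I have in mind, here is a minimal sketch in plain Python. The function name `filter_parallel` and the thresholds (`max_len`, `max_ratio`) are placeholders I made up for illustration, not requirements; a real pipeline for Samanantar or BPCC would likely add language identification, script checks, and more careful deduplication.

```python
import unicodedata


def filter_parallel(pairs, max_len=200, max_ratio=3.0):
    """Clean (source, target) sentence pairs.

    Drops pairs that are empty, longer than max_len whitespace tokens,
    badly mismatched in length, or exact duplicates after normalization.
    All thresholds are illustrative defaults, not tuned values.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize Unicode to NFC and collapse runs of whitespace.
        src = " ".join(unicodedata.normalize("NFC", src).split())
        tgt = " ".join(unicodedata.normalize("NFC", tgt).split())

        n_src, n_tgt = len(src.split()), len(tgt.split())

        # Drop empty or overlong sentences.
        if n_src == 0 or n_tgt == 0:
            continue
        if n_src > max_len or n_tgt > max_len:
            continue

        # Drop pairs whose token counts differ by more than max_ratio.
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue

        # Drop exact duplicates of an already kept pair.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    sample = [
        ("Hello   world", "నమస్తే ప్రపంచం"),
        ("Hello world", "నమస్తే ప్రపంచం"),   # duplicate after normalization
        ("", "ఖాళీ"),                        # empty source
        ("a b c d e f g h", "x"),            # length ratio too large
    ]
    for src, tgt in filter_parallel(sample):
        print(src, "|||", tgt)
```

Whitespace token counts are only a crude proxy for sentence length in Indic scripts, so feel free to propose a better heuristic (for example, counting SentencePiece tokens instead).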