Our paper "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" has been accepted to EMNLP 2022.

TL;DR: We introduce SMaLL-100, a distilled version of M2M-100 (12B), a massively multilingual machine translation model covering 100 languages.
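For context, here is a minimal sketch of how a model from this family is used for translation with the Hugging Face transformers library. The checkpoint name (facebook/m2m100_418M), language codes, and example sentence below are illustrative stand-ins rather than the SMaLL-100 release itself, which is based on the same M2M-100 architecture.

```python
# Illustrative sketch: translating with the public M2M-100 (418M) checkpoint
# via Hugging Face transformers. Checkpoint name, language codes, and the
# example sentence are placeholders, not the SMaLL-100 release itself.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "facebook/m2m100_418M"  # public baseline used as a stand-in
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

tokenizer.src_lang = "am"  # Amharic, a low-resource source language
inputs = tokenizer("ሰላም ለዓለም", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("en"),  # force English output
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```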

Link to the paper: https://arxiv.org/abs/2210.11621


Neural Machine Translation (NMT) systems are usually trained on datasets consisting of millions of parallel sentences, and therefore perform poorly on low-resource languages, i.e., languages without a large amount of training data. Over the past few years, several approaches have been proposed to improve translation quality for low-resource languages, e.g., Multilingual Neural Machine Translation (MNMT) models, back-translation, and unsupervised machine translation.

Massively MNMT models are particularly interesting for low-resource languages, as these languages benefit the most from knowledge transfer from related languages. However, the curse of multilinguality hurts performance on high-resource languages, so previous work increased model size to maintain translation quality for both high- and low-resource languages. This makes massively MNMT models challenging to use in real-world, resource-constrained environments. To overcome this problem, we propose SMaLL-100, a Shallow Multilingual Machine Translation Model for Low-Resource Languages covering 100 languages, a distilled alternative to M2M-100 (12B), the most recent and largest multilingual NMT model available.
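The exact distillation setup is described in the paper; as a generic illustration, the sketch below shows word-level knowledge distillation, in which a small student is trained to match a frozen teacher's per-token output distribution, mixed with the usual cross-entropy on the reference translation. The function, hyperparameters, and tensor shapes are assumptions made for this example, not the paper's actual recipe.

```python
# Generic word-level knowledge distillation loss for seq2seq MT (an
# illustration of the technique, not necessarily the paper's exact recipe).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=1.0, pad_id=1):
    """student_logits, teacher_logits: (batch, seq, vocab); labels: (batch, seq)."""
    mask = (labels != pad_id).float()
    t = temperature

    # KL divergence between the (frozen) teacher's and the student's
    # per-token distributions, averaged over non-padding positions.
    kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits.detach() / t, dim=-1),
        reduction="none",
    ).sum(-1)
    kl = (kl * mask).sum() / mask.sum() * (t ** 2)

    # Standard cross-entropy against the reference translation.
    ce = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=pad_id,
    )
    return alpha * kl + (1.0 - alpha) * ce
```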

In this paper, we focus on very-low- and low-resource language pairs, as there is no reasonably sized universal model that achieves acceptable performance across a large number of low-resource languages. We do so by training SMaLL-100 on a perfectly balanced dataset. While this leads to lower performance on high-resource languages, we claim that this loss is easily recovered through further fine-tuning. We evaluate SMaLL-100 on several low-resource benchmarks, e.g., FLORES-101, Tatoeba, and TICO-19.
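"Perfectly balanced" here means that every language pair is sampled with equal probability during training, regardless of how much parallel data it has, in contrast to sampling proportionally to corpus size. A minimal sketch of such uniform sampling (the corpora and sizes below are made up for illustration):

```python
import random

# Hypothetical parallel corpora whose sizes differ by orders of magnitude.
corpora = {
    ("en", "fr"): [f"en-fr pair {i}" for i in range(100_000)],  # high-resource
    ("en", "sw"): [f"en-sw pair {i}" for i in range(2_000)],    # low-resource
    ("am", "en"): [f"am-en pair {i}" for i in range(500)],      # very low-resource
}

def sample_batch(corpora, batch_size):
    """Perfectly balanced sampling: every language pair is equally likely,
    independent of corpus size (proportional sampling would instead weight
    each pair by len(corpus))."""
    pairs = list(corpora)
    batch = []
    for _ in range(batch_size):
        pair = random.choice(pairs)  # uniform over language pairs
        batch.append((pair, random.choice(corpora[pair])))
    return batch

print(sample_batch(corpora, batch_size=8))
```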

Our contributions are as follows:

  1. We propose SMaLL-100, a shallow multilingual NMT model, focusing on low-resource language pairs.
  2. We evaluate SMaLL-100 on several low-resource NMT benchmarks.
  3. We show that our model significantly outperforms previous multilingual models of comparable size while being faster at inference. It also achieves results comparable to the M2M-100 (1.2B) model while being 4.3x faster at inference and 3.6x smaller.
  4. While SMaLL-100 reaches 87.2% of the 12B teacher model's performance, we show that this gap can be closed with a few fine-tuning steps, for both low- and high-resource languages (see the sketch after this list).
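As an illustration of what such a recovery step might look like in practice, here is a hedged sketch of a few seq2seq fine-tuning steps on a single language pair with the transformers API. The checkpoint name, toy parallel data, and hyperparameters are placeholders, not the paper's actual setup.

```python
# Hedged sketch: a few seq2seq fine-tuning steps on one language pair
# (checkpoint, data, and hyperparameters are illustrative placeholders).
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "facebook/m2m100_418M"  # stand-in checkpoint
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

tokenizer.src_lang = "fr"
tokenizer.tgt_lang = "en"
pairs = [("Bonjour le monde.", "Hello world.")]  # toy parallel data

model.train()
for src, tgt in pairs:  # in practice, a modest number of real steps
    batch = tokenizer(src, text_target=tgt, return_tensors="pt")
    loss = model(**batch).loss  # cross-entropy on the reference translation
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```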
Christophe Gravier
Professor of Computer Science, Head of Télécom Saint-Etienne

Scientific Coordinator of the Diké Project