Publications

BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
Quality at a glance: An audit of web-crawled multilingual datasets
Quantity vs. Quality of Monolingual Source Data in Automatic Text Translation: Can It Be Too Little If It Is Too Good?
The African Stopwords Project: Curating Stopwords for African Languages
Extending General Sentiment Lexicon to Specific Domains in (Semi-)Automatic Manner
MasakhaNER: Named Entity Recognition for African Languages
Abstract: We take a step towards addressing the under-representation of the African continent in NLP research by bringing together …
A Survey on Machine Learning Techniques in Movie Revenue Prediction