Shamsuddeen Hassan Muhammad

Imperial College London. Google DeepMind Academic Fellow. Co-founder of HausaNLP. Founder of Arewa Data Science Academy.

I am an Advanced Research Fellow and a Google DeepMind Academic Fellow at Imperial College London. I received my PhD from the University of Porto, Portugal, under the supervision of Professor Pavel Brazdil and Professor Alipio Jorge. Prior to that, I earned an MS in Computer Science from the University of Manchester, UK, and a BSc in Computer Science from Bayero University, Kano, Nigeria. I also serve as a faculty member at the Faculty of Computing, Bayero University, Kano, Nigeria.

I am deeply passionate about diversity and inclusion. To further this cause, I co-founded the HausaNLP research group, which aims to advance research and development in the Hausa language, one of the most widely spoken languages in Africa. I also founded the Arewa Data Science Academy, which aims to democratize data science and AI education by providing free data science and machine learning training to underserved students in Nigeria.

selected publications

  1. AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
    Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, and 24 more authors
    2025
  2. AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
    Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ayele, and 24 more authors
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
  3. IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
    David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, and 23 more authors
    ArXiv, Dec 2024
  4. Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages
    Edward Bayes, Israel Abebe Azime, Jesujoba Oluwadara Alabi, and 11 more authors
    ArXiv, Dec 2024
  5. AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages
    Jiayi Wang, David Ifeoluwa Adelani, and others
    ArXiv, Dec 2023
  6. Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
    Isaac Caswell, Julia Kreutzer, Lisa Wang, and 49 more authors
    Transactions of the Association for Computational Linguistics, Dec 2021
  7. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
    Wilhelmina Onyothi Nekoto, Vukosi Marivate, Tshinondiwa Matsila, and 45 more authors
    ArXiv, Dec 2020
  8. HausaHate: An Expert Annotated Corpus for Hausa Hate Speech Detection
    Francielle Alves Vargas, Samuel Guimarães, Shamsuddeen Hassan Muhammad, and 6 more authors
    Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), Dec 2024
  9. An Overview of Sentiment Analysis Approaches
    Shamsuddeen Hassan Muhammad
    Dec 2019
  10. BibleTTS: A Large, High-Fidelity, Multilingual, and Uniquely African Speech Corpus
    Josh Meyer, David Ifeoluwa Adelani, Edresson Casanova, and 17 more authors
    ArXiv, Dec 2022
  11. Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
    Idris Abdulmumin, Michael Beukman, Jesujoba Oluwadara Alabi, and 8 more authors
    ArXiv, Dec 2022
  12. Mitigating Translationese in Low-resource Languages: The Storyboard Approach
    Garry Kuwanto, Eno-Abasi Urua, Priscilla Amuok, and 21 more authors
    ArXiv, Dec 2024
  13. MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages
    Cheikh M. Bamba Dione, David Ifeoluwa Adelani, Peter Nabende, and 41 more authors
    In Annual Meeting of the Association for Computational Linguistics, Dec 2023
  14. AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages
    Tosin P. Adewumi, Mofetoluwa Adeyemi, and others
    2023 International Joint Conference on Neural Networks (IJCNN), Dec 2023
  15. Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon
    Shamsuddeen Hassan Muhammad, Pavel Brazdil, and Alípio Mário Jorge
    Advances in Information Retrieval, Dec 2020
  16. HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
    Shantipriya Parida, Idris Abdulmumin, Shamsuddeen Hassan Muhammad, and 7 more authors
    In Annual Meeting of the Association for Computational Linguistics, Dec 2023
  17. MasakhaNER: Named Entity Recognition for African Languages
    David Ifeoluwa Adelani, Jade Z. Abbott, Graham Neubig, and 58 more authors
    Transactions of the Association for Computational Linguistics, Dec 2021
  18. SemEval Task 1: Semantic Textual Relatedness for African and Asian Languages
    Nedjma Djouhra Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, and 14 more authors
    ArXiv, Dec 2024
  19. AFRIDOC-MT: Document-level MT Corpus for African Languages
    Jesujoba Oluwadara Alabi, Israel Abebe Azime, Miaoran Zhang, and 13 more authors
    Dec 2025
  20. A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
    David Ifeoluwa Adelani, Jesujoba Oluwadara Alabi, Angela Fan, and 42 more authors
    In North American Chapter of the Association for Computational Linguistics, Dec 2022
  21. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
    Teven Le Scao, Angela Fan, Christopher Akiki, and 20 more authors
    ArXiv, Dec 2022