Data Science Nigeria

Academic Research

Our academic research endeavours epitomise the highest standards of scholarly inquiry in Data Science and Artificial Intelligence. Through rigorous methodologies and peer-reviewed publications, we relentlessly push the boundaries of knowledge, uncovering novel insights and driving advancements at the cutting edge of our field.

Some of Our Award-winning Academic Research Papers

Geo-Semantics Analysis Of Environmental Disasters In Nigeria Using National Print Media Data For Disaster Management

This research paper was submitted to the International Conference on Learning Representations – ICLR 2025

Untapping African Music With AI: Investigating Music Source Separation Models For Low-Resource African Languages

This research paper was submitted to the International Conference on Learning Representations – ICLR 2025

AI In Yorùbá STEM Education For Early Childhood Learning: An Evaluation On Translation Quality And Context

Evaluating Speech To Text For Children With Speech Impairment

This research paper was submitted to the International Conference on Learning Representations – ICLR 2025

Bridging The Language Gap: Evaluating Machine Translation For Animal Health In Low-Resource Settings

This research paper was submitted to the International Conference on Learning Representations – ICLR 2025

How Effective Are AI Models In Translating English Scientific Texts To Nigerian Pidgin: A Low-Resource Language?

This research paper was submitted to the International Conference on Learning Representations – ICLR 2025

Neural Machine Translators (NMTs) as Efficient Forward and Backward Arabic Transliterators

This research paper was submitted to the 2024 Conference on Neural Information Processing Systems – NeurIPS 2024

The Influence Of Geo-Location Mentions In The Projection Of A Music

This research paper was submitted to the International Conference on Learning Representations – ICLR 2024

Geo-Semantic Analysis Of Medical Research Trends In Nigeria

This research paper was submitted to the International Conference on Learning Representations – ICLR 2024

Leveraging Geo-NLP For Enhanced Antiretroviral Drug Distribution In Nigeria: Insights From Social Media And News Data

This research paper was submitted to the International Conference on Learning Representations – ICLR 2024

Prediction Of The Ability And Motivation To Adopt Reproductive Health Behavioural Change Using Anonymized Customer Center Audio Data

This research paper was submitted to the International Conference on Learning Representations – ICLR 2024

Geo-parsing and Geo-Visualization of Road Traffic Crash Incident Locations from Print Media for Emergency Response and Planning

This research paper was submitted to the International Conference on Learning Representations – ICLR 2024 and Deep Learning Indaba 2024

TangaleNLP: Building Po Tangle To English Parallel Corpora And Machine Translation Of The Tangle (Tangale) Language

This research paper was submitted to the International Conference on Learning Representations – ICLR 2024

Enhancing Transformer Models For Igbo Language Processing: A Critical Comparative Study

This research paper was submitted to the International Conference on Learning Representations – ICLR 2024

AfriHG: News Headline Generation For African Languages

This paper introduces AfriHG, a news headline generation dataset created by combining the XLSum and MasakhaNEWS datasets, focusing on 16 languages widely spoken in Africa. We experimented with two seq2seq models (mT5-base and AfriTeVa V2) and the Aya-101 LLM. This research paper was submitted to the International Conference on Learning Representations – ICLR 2024

Contextual Evaluation Of LLMs’ Performance On Primary Education Science Learning Contents In The Yoruba Language

This research critically assesses the ability of LLMs, including ChatGPT-3.5, Gemini, and PaLM 2, to comprehend and generate contextually relevant science education content in Yoruba. This research paper was submitted to the International Conference on Learning Representations – ICLR 2024

Comparative Study Of LLMs For Personal Financial Decision In Low Resource Language

This study investigates the accuracy of LLMs in responding to basic day-to-day financial questions in both English and Yoruba. The results show that ChatGPT-4.0 outperformed ChatGPT-3.5 and Bard (LaMDA) in all three phases. This research paper was submitted to the International Conference on Learning Representations – ICLR 2024

Using Advanced Analytics Of Social Media Content To Understand Rape In Nigeria

Rape is a worldwide epidemic, regarded as an offence as grave as murder, and has been largely characterized by a culture of impunity. The definition of rape is inconsistent across governmental health organizations, law enforcement, health providers, and the legal profession, and it has varied historically and culturally.

AI Class Monitor

This work explores how Machine Learning-powered cameras in a classroom can help teachers to recognize and assess their pupils’ engagement levels and classroom behavioural patterns with the goal of intervening when students need motivation and assistance.

AFRIGAN: African Fashion Style Generator using Generative Adversarial Networks (GANs)

Afrocentric fashion images suitable for machine learning tasks are not well represented in open datasets. In this work we present AFRIGAN, a generative adversarial model for contemporary African fashion images.

Semantic Enrichment of Nigerian Pidgin English for Contextual Sentiment Classification

Nigerian Pidgin, an adaptation of English, has evolved over the years through multilingual code switching, code mixing and linguistic adaptation. While Pidgin preserves many words from standard English, both in spelling and pronunciation, the fundamental meaning of many of these words has changed significantly.

A Comparative Analysis of Semi-Supervised & Self-Supervised Classification for Labeling Tweets about Police Brutality

Social media has proven to be influential in social justice advocacy. In 2020, thousands of Nigerians and allies used the EndSARS hashtag to protest police brutality in Nigeria. 

A Practical Framework for Building & Managing Digital Crowdsourcing Communities

Digital communities for data crowdsourcing are here to stay. This work presents our tested framework for activation, maintenance, and long-term engagement of digital crowdsourcing communities for data collection.

A spaCy model for Medical Named Entity Recognition (NER) in Nigerian Social Media

Africa is known for its rich and diverse culture. Africans are flamboyant and fashion is an expression of our shared heritage as a people. Traditional African fashion is rich, colourful and vibrant, and allows for adaptable customization.

AFRIFASHION1600: A Contemporary African Fashion Dataset for Computer Vision

Fashion is everywhere. It is one of the main ways that humans present themselves to others. It signals what individuals want to communicate about their sexuality, wealth, professionalism, subcultural and political allegiances. African fashion is as diverse and dynamic as the continent and the people who live there.

AFRIFASHION40000: A GAN-generated dataset for African Fashion

We present AFRIFASHION40000, an openly available dataset of African fashion images generated using Generative Adversarial Networks (GANs). This work explores the practical application of Artificial Intelligence to contemporary African art for new designs, image data synthesis, and representation in computer vision tasks.

AFRIRAZER - A Deep Learning Model to Remove Background & Skin from Traditional African Fashion Images

AFRIRAZER is a deep learning model for background and skin removal in African fashion images. While image segmentation using deep learning for fashion is not a new concept, traditional African fashion images are not well represented in common fashion training datasets.

AI-powered understanding of Family Planning Behavioral Change using the Fogg Model

In this work, we share our methodology and findings from applying machine-learning-based named entity recognition (NER) to identify behavioural patterns in transcribed family planning client call centre data in Nigeria, based on the Fogg model.

Artificial Intelligence for Pharmacovigilance in Nigerian Social Media

Under-reporting of suspected adverse drug reactions (ADRs) to public health facilities is one of the challenges facing pharmacovigilance in Nigeria. The advent of social media has opened up a new wave of communication in which people can openly discuss a range of topics and can choose to use pseudonyms.

COVNLP: A Multisource COVID-19 Dataset for Natural Language Processing

We propose COVNLP, a novel dataset for Natural Language Processing tasks. The openly available dataset consists of 3,199 de-identified peer-to-peer messages shared by volunteers across channels such as WhatsApp, SMS and social media during the COVID-19 pandemic in Nigeria.

EDUSTT: In-Domain Speech Recognition for Nigerian Accented Educational Content in English

English automatic speech recognition (ASR) systems are typically trained on standard speech, so they may struggle to perform well on accented and domain-specific speech.

Federated Grassroots Community Model for Catalyzing Artificial Intelligence for Common Good

We propose a methodology for developing Artificial Intelligence solutions that adopts a grassroots federated community model, involving a rare collaboration between problem solvers, end-users, and other stakeholders in identifying problems and in conceptualizing and developing Artificial Intelligence solutions.

NaijaNER: Comprehensive Named Entity Recognition for 5 Nigerian Languages

We present our findings on Named Entity Recognition for 5 Nigerian Languages (Nigerian English, Nigerian Pidgin English, Igbo, Yoruba and Hausa). These languages are considered low-resourced, and very little openly available Natural Language Processing work has been done in most of them.

Problems and Solutions Documentation Template for Machine Learning Competitions

Democratizing access to Artificial Intelligence and truly utilizing it for the common good requires multi-stakeholder AI competitions that focus on real and prevalent problems with the potential for large-scale impact, and that promote and ensure the explainability, reproducibility, contextualization and incremental enhancement of solutions.

Real-Time Crowdsourcing of Health Data in Low-income Countries

Malaria is one of the leading causes of high morbidity and mortality in Nigeria despite various policy interventions aimed at addressing it directly. A national malaria policy agenda exists on the use of artemisinin-based antimalarials as the first-line drug of choice for treatment.