Data Science Nigeria

AI Commons Health & Wellbeing Hackathon Solutions

Overview

The AI Commons Project is a proof of concept for a new methodology for developing Artificial Intelligence solutions that allows anyone, anywhere to benefit from the possibilities that AI can provide. The project aims to improve the accessibility, reproducibility, contextualization and enhancement of Artificial Intelligence solutions globally, and especially in emerging markets.

The project aims to demonstrate how a global community of AI experts can learn and co-create mutually beneficial solutions, with the opportunity for cross-country incremental enhancement.

NaLie

Statement of Purpose

Introduction

Problem Definition

A study revealed that all mobile subscribers in Nigeria receive spam SMS, at an average of 2.45 spam messages per day. In recent times, the proliferation of fraud and fake information has made it challenging to identify trustworthy messages and information. Fraudsters exploit this channel as a major vehicle for fraud, which increases the need for a clear view of the reliability of online content.

Because all mobile subscribers in Nigeria are affected, everyone in Nigeria with a mobile phone is susceptible to fraud.

Individuals and organizations often send broadcasts asking people to disregard certain types of messages because they are scams, and to refrain from forwarding unverified messages.

Solution

NaLie is a solution that provides a real-time validation system for text messages. It uses CrowdML and NLP for the detection and verification of text-based financial fraud and fake messages. It was first released in 2019.
Poster presentation:
 Here

The output of the solution is the class to which the text message belongs. The text classes are Fake BVN, Investment Scam, 419 Scam, Fake job and Good Text. The solution validates input text using two main methods: a database method (sender ID, profile and author) and a feature-based method (message content and linguistic features).
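The two validation criteria described above can be sketched roughly as follows. This is only an illustration, not the NaLie implementation: the sender list, the phrase table, the `Verdict` type and the `validate` function are all hypothetical, and simple keyword matching stands in for the trained classifier that the feature-based method would actually use.

```python
from dataclasses import dataclass

# Hypothetical data, for illustration only (not taken from NaLie).
KNOWN_FRAUD_SENDERS = {"BVN-UPDATE", "+2340000000000"}
SUSPICIOUS_PHRASES = {
    "Fake BVN": ("update your bvn", "bvn details"),
    "Fake job": ("you have been shortlisted", "job offer"),
}


@dataclass
class Verdict:
    label: str            # one of the five text classes
    sender_flagged: bool  # result of the database method


def validate(sender_id: str, text: str) -> Verdict:
    """Combine a sender lookup with a content check."""
    # Database method: look the sender up in a list of known bad senders.
    sender_flagged = sender_id in KNOWN_FRAUD_SENDERS

    # Feature-based method: keyword matching as a stand-in for a model.
    lowered = text.lower()
    for label, phrases in SUSPICIOUS_PHRASES.items():
        if any(phrase in lowered for phrase in phrases):
            return Verdict(label, sender_flagged)
    return Verdict("Good Text", sender_flagged)


if __name__ == "__main__":
    print(validate("MyBank", "Click the link to update your BVN details"))
```

Keeping the two checks separate means a message from a flagged sender can still be reported even when its wording looks harmless.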

Mobile subscribers.

Technical expertise required to build the solution includes programming skills, Natural Language Processing, and software/ML engineering.

Usage

The aim of the solution is to proactively detect and prevent text-based financial fraud and fake messages. For instance, a mobile subscriber receives a text message asking her to click on a link to update her Bank Verification Number (BVN) details because of a system update supposedly going on at her bank. As soon as the message arrives, a NaLie notification pops up to warn the user that the message is fraudulent.

Anyone who owns a mobile phone.

The solution receives text as input from the user and returns a response/notification to the user’s screen.

The solution can be made to read the user’s incoming texts automatically and return an appropriate notification.

Dataset

The dataset comprises fake/fraudulent messages in the Nigerian financial and labour sectors. It contains a variety of fake messages, received via text and online, relating to bank alerts and job alerts.

The dataset was created mainly for this project, but it can be extended and used for problems of similar scope.

The dataset was created by the research team, which includes the four solution implementers listed above.

Composition

Training set: 22867 instances. Testing set: 5880 instances. Both counts include NaN values.

An instance comprises the text column and one column for each of the five label classes. That is: Text, Fake BVN, Investment Scam, 419 Scam, Fake job and Good Text. A sample of the text column is “Dear Customer, We are running a compulsory security enrollment of all ATM cards issued by banks in Nigeria. CBN as the apex body will block all cards not enrolled within 24hrs of receiving this notification. Visit link: http://217.71.50.11/~update to secure your card now.” A text can belong to only one class, i.e. for every instance, exactly one label column can be true. If Fake BVN is 1 (true), then all other classes will be 0 (false).
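The one-class-per-instance rule can be checked mechanically. The sketch below assumes each instance is available as a dictionary keyed by column name. The exact column spellings and the `label_of` helper are assumptions made for illustration, and the sample row is invented.

```python
# Assumed column names for the five label classes.
CLASSES = ["Fake BVN", "Investment Scam", "419 Scam", "Fake job", "Good Text"]


def label_of(row: dict) -> str:
    """Return the single class whose column is 1 for this instance.

    Raises ValueError if zero or several label columns are set,
    since each text may belong to exactly one class.
    """
    active = [name for name in CLASSES if row.get(name) == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one true label, got {active}")
    return active[0]


if __name__ == "__main__":
    # Invented example row, not taken from the dataset.
    row = {
        "Text": "Visit the link to secure your card now.",
        "Fake BVN": 1,
        "Investment Scam": 0,
        "419 Scam": 0,
        "Fake job": 0,
        "Good Text": 0,
    }
    print(label_of(row))
```

Rows with NaN in a label column fail this check as well, which makes it a convenient filter before training.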

Yes, the label feature has five classes for the classification task, namely Fake BVN, Investment Scam, 419 Scam, Fake job and Good Text.

Collection Process

Maintenance

Model

Model Details

Result

Result Details

OneVsRestClassifier(estimator=LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
                                             importance_type='split', learning_rate=0.1, max_depth=-1,
                                             min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
                                             n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
                                             random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
                                             subsample=1.0, subsample_for_bin=200000, subsample_freq=0),
                    n_jobs=None)

Training runs = 100, Evaluation runs = 1

Accuracy = (TP + TN)/(TP + TN + FP + FN)
where: TP = True positive; FP = False positive; TN = True negative; FN = False negative
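As a worked illustration of the accuracy formula, the sketch below computes it from the four confusion-matrix counts. The `accuracy` function name and the counts in the example are made up for demonstration and are not results from this project.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


if __name__ == "__main__":
    # Illustrative counts only: 90 of 100 predictions are correct.
    print(accuracy(tp=40, tn=50, fp=5, fn=5))
```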