DEV Community

Cover image for Deal Scoring- Part 1: Data Pre-processing in MySQL
Ochwada Linda
Ochwada Linda

Posted on

Deal Scoring- Part 1: Data Pre-processing in MySQL

In my last blog, I wrote about AI & Sales; today, I will start the process of sales data analytics. I will be building a logistic regression model to assign a score to each of the prospects that a company can use to target potential deals. A higher score would mean that the deal is hot, i.e. is most likely to convert, whereas a lower score would mean that the deal is cold and will mostly not get converted.

When closing deals, a salesperson with a rich pipeline of potential clients must decide where to focus their time to close a particular deal. Often, salespeople make decisions based on their intuition and incomplete information. Using AI, data scientists compile historical information about a client, company, social media postings and the salesperson’s customer interaction history (e.g. emails sent, calls made, text sent etc. ); and rank the opportunities or leads in the pipeline according to their chances (probability) of closing successfully. One tool built on this methodology is Dealcode GmbH.

The data obtained was in several CSV files; I used MySQL to inspect and preprocess the data.

-- Creating NEW Table
DROP TABLE IF EXISTS crm_activity;

CREATE TABLE crm_activity (
    id INT PRIMARY KEY,
    crmId INT,
    crmUserId INT,
    crmDealId INT,
    crmLeadId INT,
    crmDuration TIME NULL,
    crmType VARCHAR(255),
    crmDueTime TIME NULL,
    crmAddTime VARCHAR(255),
    crmUpdateTime VARCHAR(255),
    crmDueDate DATE,
    crmSubject VARCHAR(255),
    crmDone INT,
    isDeleted INT,
    createDate VARCHAR(255),
    companyId INT
);

LOAD DATA LOCAL INFILE 'path/crm_activity.csv' IGNORE 
INTO TABLE crm_activity
FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n';

SELECT * FROM crm_activity
LIMIT 100;
Enter fullscreen mode Exit fullscreen mode

Data type at a field like createDate is presented as character data that is varying (VARCHAR). Convert the data type to a timestamp using str_to_date, and insert the converted string-to-date in a new column. The string date-time column can then be deleted by using DROP.

ALTER TABLE crm_activity ADD (
    crmAddTime_ts TIMESTAMP,
    crmUpdateTime_ts TIMESTAMP,
    createDate_ts TIMESTAMP
);

UPDATE crm_activity SET  crmAddTime_ts = str_to_date( crmAddTime, '"%Y-%m-%d %H:%i:%s"');
UPDATE crm_activity SET crmUpdateTime_ts = str_to_date( crmUpdateTime, '"%Y-%m-%d %H:%i:%s"');
UPDATE crm_activity SET  createDate_ts = str_to_date(  createDate, '"%Y-%m-%d %H:%i:%s"');

DELETE FROM crm_activity WHERE id=0;

COMMIT;
Enter fullscreen mode Exit fullscreen mode

MySQL automatically generates a database schema view from the created (now existing) database (From the interface --> Database --> Reverse Engineer --> Follow the instructions).

Database Schema Generated
Database Schema

Database schemas are essential because they help developers visualize how a database should be structured and give a clear point of reference about what tables and fields a project contains.

Data Processing

Fetching data from MySQL into Anaconda (python), requires mysql.connector package, installed it by pip install mysql-connector-python

mydb = mysql.connector.connect(
  host="localhost",
  user="myuser",
  password="yourpassword",
  database="name_of_the_database"
)

# Loading and visualizing in Anaconda
crm_deal = mydb.cursor()
crm_deal.execute("SELECT * FROM crm_deal ")
crm_deal_results = crm_deal.fetchall()
for leads in crm_deal_results:
  print(leads[:10])
Enter fullscreen mode Exit fullscreen mode

Continue reading - Deal Scoring - Part 2: Data processing with Python

Top comments (0)