DEV Community

Judy
Judy

Posted on

Update the Latest Data in a CSV File to the Database #eg66

Problem description & analysis

Below is CSV file emp.csv:

EMPID,FIRSTNAME,LASTNAME,EMAIL,UPDATEDATE

123,John,Smith,john.smith01@email.com,01/01/2020

234,Bruce,Waye,bruce.wayne@wayneenterprises.com,02/02/2020

123,John,Smith,john.smith02@email.com,02/15/2020

345,Clark,Kent,clark.kent@dailyplanet.com,02/16/2020

123,John,Smith,john.smith03@email.com,02/20/2020

In the above file, UPADATEDATE field is ordered and EMPID field contains duplicates.

We are trying to divide records in the CSV file into two parts:

  1. Write the newest employee records, which are those with the latest UPDATEDATE values after the file is grouped by EMPID, to EMP table;

  2. Write the rest of the records, which are the difference, to EMP_HIS table according to the original order.

Solution

Write the following script p1.dfx in esProc:

Image description
Explanation:

A1  Import data from the CSV file, during which the first row is read as column headers and UPDATEDATE is parsed into date type (while the original date format is MM/dd/yyyy).

A2  Group A1’s table by EMPID and get the last member from each group (the record with the latest UPDATEDATE value).

A3  Get the rest of the records, which is the difference of A1 and A2.

A4  Connect to the database.

A5  Update the latest employee record to the database by EMPID or insert it to EMP table.

A6  Update the historical employee records by EMPID and UPDATEDATE to the database, or insert them to EMP_HIS table.

A7  Close database connection.

Find how to integrate the script code with a Java program in How to Call an SPL Script in Java.

SPL open source address

Download

Top comments (0)