I was recently loading a lot of files into PostgreSQL, and I needed to do it fast and without using many resources. The requirements I had were the following:
- Low footprint, not reading all lines of a file at once
- Easy to load data
- Write data efficiently into Postgres using the "COPY FROM" statement
With these requirements, I came up with the idea of using streams in Node.js, because:
- It's efficient
- It's simple
Let's say I have the following file:
first_name,last_name,country
Karie,Zorn,Ukraine
Thebault,Flicker,France
Danya,Atcock,Canada
Lorne,Westmorland,Russia
Page,Greeve,Canada
Rene,Riccardini,Serbia
Adair,Mullin,Philippines
Della,Gumb,China
Charlie,Swadlinge,Cambodia
Celka,Karlowicz,Canada
And the following table in PostgreSQL:
CREATE EXTENSION "uuid-ossp";
CREATE TABLE PEOPLE(
id uuid default uuid_generate_v4(),
first_name varchar(30) NOT NULL,
last_name varchar(30) NOT NULL,
country varchar(30) NOT NULL
);
As we're going to use curl inside the PostgreSQL COPY statement to fetch the file, we're going to extend the PostgreSQL Docker image with the following:
FROM postgres:10.4
RUN apt-get update && apt-get install -y curl
I named this file postgresql.Dockerfile. Once you have that file, we're going to get everything running with the following commands:
docker build -f postgresql.Dockerfile -t postgres-with-curl .
docker run --restart always --name postgres --network host -d postgres-with-curl
docker run --network host -e MINIO_ACCESS_KEY=user -e MINIO_SECRET_KEY=password --restart always --name minio -d minio/minio:RELEASE.2018-06-22T23-48-46Z server /data
So let's do it. First, we need to create the connections for PostgreSQL and Minio:
const pg = require("pg")
const minio = require("minio")
const minioClient = new minio.Client({
accessKey: "user",
secretKey: "password",
secure: false,
endPoint: "localhost",
port: 9000,
region: "us-east-1",
})
const pgPool = new pg.Pool({
user: "postgres",
password: "postgres",
host: "localhost",
port: 5432,
database: "postgres",
})
Then we're going to do basically three things:
- Read the file
- Upload to minio
- Execute SQL in PostgreSQL instance
const fs = require("fs")
const path = require("path")
const TABLE_NAME = "public.people"
async function main() {
const input = fs.createReadStream(path.join(__dirname, "people.csv"), {
encoding: "utf-8",
})
const bucketName = "tmp"
const objectName = "new.csv"
await minioClient.putObject(bucketName, objectName, input)
const csvUrl = await minioClient.presignedGetObject(
bucketName,
objectName,
10000, // duration in seconds of the url
)
const copyDataSql = `
COPY ${TABLE_NAME} (FIRST_NAME, LAST_NAME, COUNTRY)
FROM PROGRAM 'curl "${csvUrl}"'
WITH (FORMAT CSV , HEADER, QUOTE '"', DELIMITER ',' );`
const client = await pgPool.connect()
await client.query(copyDataSql)
client.release()
await pgPool.end()
}
main().catch((err) => {
console.error(err)
process.exit(1)
})
If we execute this code with Node.js, we should see the rows in the database with the following query:
SELECT * FROM PUBLIC.PEOPLE;
The script can fail for the following reasons:
- Minio is not up
- Postgresql is not up
- Postgresql table has not been created
- Minio bucket "tmp" has not been created
This technique can be useful when we have no control over the disk of the Postgres server and we need to perform a batch operation.
Thanks for reading.