Inserting Data from a Large CSV in Laravel Over 10x Faster

Typically, when inserting data from a CSV, you might read the file one line at a time and do batch inserts. Something like this:

$file = fopen(storage_path('app/test.csv'), 'r');

$i = 0;
$batch = [];
$batchSize = 1000;
while ($line = fgetcsv($file)) {
    // Skip the header row
    if ($i === 0) {
        $i++;

        continue;
    }

    $batch[] = [
        'column' => $line[1],
        // etc...
    ];

    $i++;

    if (($i % $batchSize) === 0) {
        Model::insert($batch);
        $batch = [];
    }
}

// Insert the remaining batch
if (! empty($batch)) {
    Model::insert($batch);
}

fclose($file);

This approach is great if you're dealing with a CSV that has hundreds or even thousands of rows. But what if the CSV has millions of rows? Enter MySQL's LOAD DATA statement.

The LOAD DATA statement is MySQL's lightning-fast, built-in way of loading a CSV into a database table. With it, you can specify a custom field separator or line ending, transform the data before it's saved to your table, and much more!
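
To give you a sense of the raw SQL, here's a minimal sketch run through Laravel's DB facade. The table and column names here are illustrative, and note that LOCAL INFILE has to be enabled on both the server (the local_infile variable) and the client (PDO::MYSQL_ATTR_LOCAL_INFILE in your connection options):

use Illuminate\Support\Facades\DB;

$path = storage_path('app/test.csv');

// Load the CSV straight into the table, skipping the header row.
// The @dob user variable captures the raw field so SET can transform
// it before it's written to the date_of_birth column.
DB::unprepared("
    LOAD DATA LOCAL INFILE '{$path}'
    INTO TABLE people
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    IGNORE 1 LINES
    (first_name, last_name, email, @dob)
    SET date_of_birth = STR_TO_DATE(@dob, '%c/%d/%Y')
");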

So how does it compare to the approach above? Let's see!

Benchmarking

Setup

I have a people table with first_name, last_name, email, phone, street_address, city, state, postal_code, country and date_of_birth columns. My 1.5 GB CSV has 10 million rows with matching columns: ID, First Name, Last Name, Email, Phone, Street Address, City, State, Postal Code, Country and Date of Birth. The Date of Birth field is in MM/DD/YYYY format, so we have to convert it to a DB-friendly format.
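
For reference, a migration for that table might look like this (a sketch; the column types and the lack of timestamps are my assumptions):

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('people', function (Blueprint $table) {
            $table->id();
            $table->string('first_name');
            $table->string('last_name');
            $table->string('email');
            $table->string('phone');
            $table->string('street_address');
            $table->string('city');
            $table->string('state');
            $table->string('postal_code');
            $table->string('country');
            $table->date('date_of_birth');
        });
    }
};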

Computer: M1 Pro MacBook Pro
PHP: 8.3
Laravel: 11
MySQL/MariaDB: MariaDB 11.2.2

Batched Inserts

This will look very similar to the approach above. The code:

use App\Models\Person;
use Carbon\Carbon;
use Illuminate\Support\Facades\DB;

// Disable unique checks and foreign key checks to speed up inserts
DB::statement('SET @@session.unique_checks = 0');
DB::statement('SET @@session.foreign_key_checks = 0');

$file = fopen(storage_path('app/test.csv'), 'r');

$i = 0;
$batch = [];
$batchSize = 1000;
while ($line = fgetcsv($file)) {
    // Skip the header row
    if ($i === 0) {
        $i++;

        continue;
    }

    $batch[] = [
        'first_name' => $line[1],
        'last_name' => $line[2],
        'email' => $line[3],
        'phone' => $line[4],
        'street_address' => $line[5],
        'city' => $line[6],
        'state' => $line[7],
        'postal_code' => $line[8],
        'country' => $line[9],
        // Convert MM/DD/YYYY to the YYYY-MM-DD format MySQL expects
        'date_of_birth' => Carbon::parse($line[10])->format('Y-m-d'),
    ];

    $i++;

    if (($i % $batchSize) === 0) {
        Person::insert($batch);
        $batch = [];
    }
}

// Insert the remaining batch
if (! empty($batch)) {
    Person::insert($batch);
}

fclose($file);
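One note: those session flags stay off for the life of the connection, so if the connection is reused afterwards (by a queue worker, for example), you may want to turn the checks back on:

// Re-enable the checks once the import is done
DB::statement('SET @@session.unique_checks = 1');
DB::statement('SET @@session.foreign_key_checks = 1');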

This takes an average of 6 minutes.

LOAD DATA Statement

To write the statement, I'm using a package I wrote: isaacdew/load-data.

The code:

LoadData::from(storage_path('app/test.csv'))
    ->to(Person::class)
    ->fieldsTerminatedBy(',')
    ->fieldsEnclosedBy('"', true)
    ->useFileHeaderForColumns()
    ->onlyLoadColumns([
        'first_name',
        'last_name',
        'email',
        'phone',
        'street_address',
        'city',
        'state',
        'postal_code',
        'country',
        'date_of_birth',
    ])
    ->setColumn('date_of_birth', "STR_TO_DATE(@date_of_birth, '%c/%d/%Y')")
    ->load();

This takes an average of 0.53 minutes (about 32 seconds), more than 11x faster than the batched inserts.

To run the benchmarks yourself, see the repository I set up.

Conclusion

MySQL's LOAD DATA statement is incredibly fast and flexible. If you need to load large CSVs into your database, I highly recommend it. To make writing your load statements in Laravel easy, you can install my package with Composer: composer require isaacdew/load-data
