For a couple of projects I needed to read and write a couple of very large Excel and CSV files. I didn't find a good package that does this so I decided to create one myself. Under the hood it uses generators, so memory usage will remain low, even when working with large files.
In this blogpost I'd like to walk you through spatie/simple-excel.
Using simple-excel
As the name implies using simple-excel is... simple :-)
Imagine you have a CSV with this content.
email,first_name
john@example.com,john
jane@example.com,jane
You can read it like this:
// $rows is an instance of Illuminate\Support\LazyCollection
$rows = SimpleExcelReader::open($pathToCsv)->getRows();
$rows->each(function(array $rowProperties) {
// in the first pass $rowProperties will contain
// ['email' => 'john@example', 'first_name' => 'john']
});
getRows
will return an instance of Illuminate\Support\LazyCollection
. This class is part of the Laravel framework. Behind the scenes generators are used, so memory usage will be low, even for large files.
You'll find a list of methods you can use on a LazyCollection
in the Laravel documentation.
Here's a quick, silly example where we only want to process rows that have a first_name
that contains more than 5 characters.
SimpleExcelReader::open($pathToCsv)->getRows()
->filter(function(array $rowProperties) {
return strlen($rowProperties['first_name']) > 5
})
->each(function(array $rowProperties) {
// processing rows
});
Reading an Excel file is identical to reading a CSV file. Just make sure that the path given to the create
method of SimpleExcelReader
ends with xlsx
or xls
.
Let's turn our attention to writing files. Here's how you can do that:
$writer = SimpleExcelWriter::create($pathToCsv)
->addRow([
'first_name' => 'John',
'last_name' => 'Doe',
])
->addRow([
'first_name' => 'Jane',
'last_name' => 'Doe',
]);
The file at pathToCsv
will contain:
first_name,last_name
John,Doe
Jane,Doe
Again, if you want create an excel file, just use xls
or xlsx
as the extension.
How does it work under the hood
Reading very large files will still only use a tiny bit of memory. Let's take a look at how the package accomplishes that. If you take a look at the requirements of the package, you'll see that box/spout
is listed as a dependency.
Let's work with this CSV:
email,first_name
john@example.com,john
This is how you can use Spout directly.
use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;
$reader = ReaderEntityFactory::createReaderFromFile('/path/to/file.csv');
$filePath = '/path/to/file.ext';
$reader = ReaderEntityFactory::createReaderFromFile($filePath);
$reader->open($filePath);
$sheet = $this->reader->getSheetIterator()->current();
foreach ($sheet->getRowIterator() as $row) {
$arrayWithValuesOfRow = $this->getValueFromRow($row)
}
$reader->close();
It's not too much code, but it's not very intuitive (I have a very low tolerance for code that is not easy to use).
In the first pass the $arrayWithValuesOfRow
will be filled with ['email','first_name']
. The second one with ['john@example.com','john']
. You'll have to write some boring code to combine the header names with the values.
Our simple-excel package will do all of that for you. Here is the code needed:
SimpleExcelReader::open($pathToCsv)
->getRows();
->each(function(array $rowProperties) {
// in the first pass $rowProperties will contain
// ['email' => 'john@example', 'first_name' => 'john']
});
The getRows
method will return an instance of Illuminate\Support\LazyCollection
. This is a class that was recently introduced in Laravel. In short, this class allows you to wrap a generator so you can use most of the Collection
API on it.
Here's the code that wraps Spout's generator in a LazyCollection.
``
php
// inside the getRows function
return LazyCollection::make(function () {
while ($this->rowIterator->valid()) {
$row = $this->rowIterator->current();
yield $this->getValueFromRow($row);
$this->rowIterator->next();
}
});
`
Alternatives
If you just have to working with CSV files and don't mind processing a header row yourself, you might now even need a package. Take a look at the native fgetcsv
function.
If you need something framework agnostic, that can handle both CSVs and Excel files, you could also opt to use Spout directly.
In the Laravel ecosystem, Laravel Excel is a popular choice. It's a well written package with lots of powerful options. You can write files to disks you configured in Laravel and use importables and exportables to describe your files.
Patrick, the author of the laravel-excel, did an awesome job creating it, but I think his use cases are a bit different from mine. I like to read files residing at a given path (and not having to use a configured disk). For simple imports and exports the importables and exportables feel to heavy for me. I just want to write the data directly and be done with it. There also seem to be some performance/memory issues when handling very large files.
Still, if you like it's API and don't have to work with large files, laravel-excel is a great alternative to our simple-excel package.
Closing thoughts
The simple-excel package has some more interesting options that were not mentioned in this blogpost such as disabling automatic header rows, using styles, manually using a reader/writer object, using alternative delimiters, ... To learn more about these options, head over to the documentation on GitHub.
Be sure to also check out this list of open source packages my team and I have created previously.
Top comments (0)