As a developer, you're probably no stranger to the challenges associated with large datasets. From optimizing your code to managing database queries, there's a lot to consider when working with massive amounts of data. But have you ever stopped to consider the impact of your choice of data structure?
Enter Doctrine Collections. These powerful data structures are a staple of object-oriented PHP development, offering a convenient way to manage complex data relationships. However, as any seasoned developer would tell you, working with Doctrine Collections can also be a double-edged sword.
On the one hand, these collections offer a powerful and flexible way to organize and manipulate your data. On the other hand, they can quickly become a bottleneck when working with large datasets, slowing down your application and hindering your ability to make progress.
In this article, we'll explore the challenges and opportunities presented by Doctrine Collections, and provide you with a number of practical tips and strategies for managing large datasets. Whether you're a beginner or an experienced developer, you'll come away with a deeper understanding of how to optimize your code and get the most out of your data structures.
Filtering
Doctrine Collections have many implemented methods, including “map”, “matching”, and “filter”. In this article, we’ll be focusing specifically on the latter method, used to filter a collection of objects based on a given Closure. This can help you to simplify your code and make it more readable by allowing you to express the filtering criteria in a concise and declarative manner. The “filter” method is a good choice in cases where you want to display only a subset of the objects in a collection based on some criteria, such as those meeting specific conditions. For instance, you might want to filter a collection of customer orders that matches a specific order state.
How does filtering work under the hood?
When you call the “filter” method on a collection, Doctrine creates a new instance of the collection class and iterates over the original collection. An entity is added to the new collection only when the filter function for it returns “true”.
protected function createFrom(array $elements)
{
return new static($elements);
}
The “filter” method uses PHP's built-in “array_filter” function to apply the “filter” method to each entity in the collection. The ”array_filter” function takes an array as its input and returns a new array that contains only the elements that satisfy the “filter” method.
With Doctrine's collection classes, the input to “array_filter” is an array of entities. The “filter” itself, on the other hand, is a callable that takes an entity as its input and returns a boolean value, indicating whether the entity should be included in the filtered collection.
public function filter(Closure $p)
{
return $this->createFrom(array_filter($this->elements, $p, ARRAY_FILTER_USE_BOTH));
}
Once the filtered collection has been created, Doctrine returns it as the result of the “filter” method call, leaving the original collection unchanged.
Problem
To identify the disadvantages of using the “filter” method, let’s choose a specific example, such as the ones provided below on Symfony with Doctrine. We have already installed and prepared:
- Symfony 6.2 with Doctrine ORM 2.14
- PHP 8.2
- MySQL 8 (InnoDB Engine)
Let’s begin with a situation. For the following example, I’ll be using Order and Customer entities from the Sylius e-commerce framework:
- Order Entity
- Customer Entity
To test the “filter()” method, I’ve prepared 100,000 rows of orders and 100,000 rows of customers as the data input. In the following case, we’ll be trying to find all the specific customer orders with the status “Fulfilled”. Presently, the customer has 11,653 orders – 8,658 with the “Fulfilled” status, and 2,995 with other statuses. This test is executed 10 times in order to calculate average times and check memory consumption. All examples are compared with the logically identical QueryBuilder method to see the clear difference between use cases.
For the following test, I’ve prepared a method using above mentioned “filter”:
public function getFulfilledOrdersWithFilter(): Collection
{
return $this->orders->filter(static fn(Order $order) => $order->getState() === 'fulfilled');
}
From this data table, we can clearly see that the “filter” method works slower than Doctrine QueryBuilder by 38.5% on average. Furthermore, memory consumption is significantly (27.2%) higher than with the QueryBuilder. Why is that? As I’ve mentioned in the previous section, Doctrine Collections loads all the order objects that belong to the customer and then filters only those that meet the given criteria. This consumes significantly more memory and takes longer to complete.
Solution
How can the code be refactored to avoid using the “filter” Collection method? With the code provided, we actually have 2 options:
- Doctrine Query Builder: As the above example shows, Doctrine QueryBuilder is the most efficient in terms of time and memory. This method fetches rows and matches them with the provided criteria. After fetching, subsequent PHP-level operations have less data to work with, resulting in a faster and less memory-intensive process.
- Doctrine Criteria: “Matching” is another useful method of Doctrine Collections. We can use this method along with the “Criteria” object. Let’s see how we might refactor the code using Doctrine Criteria and the “matching” method. Since both are still going to use Doctrine collections, our aim will be to find a way to make this query faster. For this test, I’ve prepared a method using a Criteria object:
public function getFulfilledOrdersWithCriteria(): Collection
{
$expressionBuilder = Criteria::expr();
$criteria = Criteria::create();
$criteria->where($expressionBuilder->eq('state', 'fulfilled'));
return $this->orders->matching($criteria);
}
From the data above, we can see that using “Criteria” takes about as long as QueryBuilder, and consumes almost as much memory. Given that both the “filter” and the “matching” methods are implemented in the same manner, why do they differ in terms of duration and memory consumption? The answer is that, if the collection has not yet been loaded from the database, Criteria performs at the SQL level to optimize access to large collections. Meanwhile, the “filter” method iterates over the data for already selected rows from the Database. This explains the virtually identical duration and memory requirements of the Criteria and QueryBuilder processes.
Summary
Doctrine Collections and QueryBuilders serve different purposes and have different performance characteristics.
The “filter” method on a Doctrine Collection is used to filter the elements of the collection based on a given condition. This method works on the collection in memory and requires that all the collection elements be loaded into memory before filtering. This can be an expensive operation if the collection is large, as it consumes a lot of memory.
On the other hand, the QueryBuilder is used to create SQL queries that operate at the database level. The QueryBuilder constructs a SQL query that is sent to the database, and only the matching rows are returned. This approach is much more efficient when dealing with large datasets, as it only retrieves the relevant data from the database, rather than requiring all of it to be loaded into memory.
Moreover, the ”matching” method is faster than “filter” when you have a large collection and a complex filtering condition. This is because the matching method will create a SQL query that is optimized for database filtering. The resulting SQL query will fetch only the data that matches the filtering condition from the database, instead of fetching all the data and then filtering it in PHP – which is what the “filter” method does.
In conclusion, if you are dealing with large datasets, using QueryBuilder is generally faster and more memory-efficient than using the “filter” method on a Doctrine Collection. However, if you are dealing with a small dataset, the performance difference may be negligible, and you can choose based on personal preference or convenience.
Copyright NFQ Technologies 2024
Top comments (0)