DEV Community

loading...
Cover image for Reduce time complexity by indexing your arrays

Reduce time complexity by indexing your arrays

Khalyomede
Fullstack developer @ Carlili
・5 min read

Indexing in a database

Indexing, maybe that is a term you already heard when building or using a database.

Put indexes, else your request will take too much time.

Alright, that works, my primary key is set. My requests are faster when I join my primary key with another table column. But why?

Publisher

id    name        age
=====================
1     John        38
2     Elizabeth   24
3     Mike        17    
Enter fullscreen mode Exit fullscreen mode

We have publishers, and they post some articles on a web app.

Article

id    title                                createdAt    publisherId
===================================================================
1     Unit tests for your next library     2019-01-26   2
2     WebP: when performance matter        2019-02-25   2
3     Indexes: why would I need it         2019-03-09   3      
Enter fullscreen mode Exit fullscreen mode

So you will join them to get the posts for each publishers:

SELECT
  publisher.name,
  article.title,
  article.createdAt
FROM
  publisher
JOIN
  article on article.publisherId = publisher.id
Enter fullscreen mode Exit fullscreen mode

With the expected result:

name        title                            createdAt
Elizabeth   Unit tests for your next library 2019-01-26
ELizabeth   WebP: when performance matter    2019-02-25
Mike        Indexes: why would I need it     2019-03-09
Enter fullscreen mode Exit fullscreen mode

Up to this, nothing wrong. Now, imagine we have 50 publishers and 1000 articles.

In order for the database to figure out to which publisher belongs to which articles, it takes your article publisherId, and then, for each of these, will loop for each publisher until it founds the publisher that matches the id, and pulls out the desired data.

article.publisherId    publisher.id
===================================
59                     1 ? No
59                     2 ? No
59                     3 ? No
...
59                     59 ? Yes -> pulls out related data

32                     1 ? No
32                     2 ? No
32                     3 ? No
...            
32                     32 ? Yes -> pulls out related data


11                     1 ? No
11                     2 ? No
11                     3 ? No
...
11                     11 ? Yes -> pulls out related data
...
Enter fullscreen mode Exit fullscreen mode

In the worst case, let us imagine only the publisher #50 posts the 1000 articles. This means for each posts, n posts, we will loop for each publishers, n publishers, to find the relation. Which is a complexity of n x n. We imagine the request takes 0.5 milliseconds (500 µs) for each round.

(0.5 ms * 1000 articles) * (0.5 ms * 50 publishers)
500 ms * 25 ms
12 500 ms
12.5 sec

Current query complexity: O(n²)

To fix this, database engines supports primary keys indexing to help the engine figure out quickly to which records belongs which identifier. So instead of storing your records in the form of an array, the database engine will index your rows, in this manner (schematized using JSON notation, not how the database truely stores the data):

Publisher

{
  1: {
    "name:": "John",
    "age": 38
  },
  2: {
    "name": "Elizabeth",
    "age": 24
  },
  3: {
    "name": "Mike",
    "age": 17
  }
}
Enter fullscreen mode Exit fullscreen mode

Which make the database able to "attack" directly the indexed id instead of looping through each records until he finds the correct key!

So if you want the publisher name with id 50, you can do it in 1 round:

const name = Publisher[50].name;
Enter fullscreen mode Exit fullscreen mode

Back with our 1000 articles made by 50 publishers, we can now directly get the publisher (no need to loop for each of them), so only 1000 rounds. Our database still takes 0.5 milliseconds per round (500 µs).

(0.5 ms * 1000 articles) * (1 ms * 1 publisher)
500 ms * 1 ms
500 ms
0.5 sec

Current query complexity: O(n)

We have gone from 12 sec to nearly a half second. Not that bad!

Indexing an array

Indexing arrays is the exact same process. Imagine we are pulling routes (urls) and their associated content, like in a CMS.

use App\Article;

$articles = Article::all();

print_r($articles);
Enter fullscreen mode Exit fullscreen mode

Resulting in the console:

[
  [
    id" => 1, 
    "url" => "/advantages-of-indexing-arrays", 
    "content" => "..."
  ],
  [
    id" => 2, 
    "url" => "/webp-is-the-new-fast", 
    "content" => 
    "..."
  ],
  [
    id" => 3, 
    "url" => "/vue-js-async-components", 
    "content" => "..."
  ],
  ...
]

Enter fullscreen mode Exit fullscreen mode

Alright, now the user is requesting the route "/7-tips-to-be-more-productive-on-laravel-5", and your database contains 1000 articles.

use App\Article;
use App\Request;
use App\Response;

class ArticleController {
  // Shows the desired article content to the client
  public function show(Request $request): Response {
    $articles = Article::all();
    $url = $request->url();

    foreach ($articles as $article) {
      if ($article->url === $url) {
        // found it, pull content and display the view

        break;
      }
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

In that configuration, your controller will loop for each articles, and when he finds the correct one, displays the content. In the worst scenario, 1000 articles, the user request the last article, 1000 loop rounds.

With 1 ms per round, this gives us:

1000 articles * 1 ms
1000 ms
1 sec

Current controller complexity: O(n)

O(n), which means n articles (we do not know how many article we have in advance).

1 sec to send the result to the client browser, potentially more time for the browser to render the page. We can do better by indexing this array.

use App\Article;
use App\Request;
use App\Response;

class ArticleController {
  // Shows the desired article content to the client
  public function show(Request $request): Response {
    $articles = $this->articles();
    $url = $request->url();

    if (isset($articles[$url])) {
      // found it in one try!
    }
  }

  // The indexing happens here
  protected function articles(): array {
    $articles = Article::all();
    $indexedArticles = [];

    foreach ($articles as $article) {
      $indexArticles[$article->url] = $article;
    }

    return $indexArticles;
  }
}
Enter fullscreen mode Exit fullscreen mode

Now we can directly "attack" the array, and get the article data. isset() returns true if the key is in the array, else returns false. It also directly target the key, so it does not loop the entire array.

1 route * 1 ms
1 ms

Current controller complexity: O(1)

O(1), because no matter how many articles we have in database, it will be either found or not instantly.

Cautions

This example is partially correct, because I do not take into account the complexity of fetching articles (maybe my Article class performs additional process, for each rows, so potentially more seconds spent by the back-end).

Also, you definitively not want to index your array at each server requests. You probably will cache it using Redis or in a file (in a form of a JSON for example). Because in the end, the indexing is a process that requires to loop through your array anyway.

Conclusion

I hope this article gave you more insights around indexes and when using them will make our requests quicker.

This technique is very useful when your first loop needs to loop again and find an element (like a database would need to make a join).

If you need to join on several criteria (e.g. column value), this is the same process.

{
  "GET": {
    "/": { "controller": "HomeController@index" },
    "/newsletter": { "controller": "NewsletterController@index" }
  },
  "POST" => {
    "/newsletter": { "controller": "NewsletterController@store" }
  }
}
Enter fullscreen mode Exit fullscreen mode

You can attack this object by many criteria (in this case, 2 criterias, the url and the protocol):

const routes = []; // filled by the array above
const protocol = "POST";
const url = "/newsletter";

const controller = routes[protocol][url].controller;
Enter fullscreen mode Exit fullscreen mode

Happy optimizations!

Photo by rawpixel.com from Pexels

Discussion (0)

Forem Open with the Forem app