CacheLane Logo

Chapter 06

The Need for a Trie

Until now, we've been using a straightforward object to store our routes. While this is easy to understand, it's not the most efficient way to store routes, especially when we have a large number of them or when we introduce dynamic routing capabilities like /users/:id. It's a simple and readable approach but lacks efficiency and the capability for dynamic routing. As we aim to build a robust, scalable, and high-performance backend framework, it is crucial to optimize our routing logic.

As long as you don't need dynamic parameters, or query parameters, you'd be good enough with a javascript object (like we do now), or a Map. But a backend framework that doesn't supports dynamic parameters, or query parsing is as good as a social media site without an ability to add friends.

In this chapter, we'll explore a new data-structure that you may not have heard of before - Trie. We'll also look at how we can utilize it to enhance our router's performance.

For example, imagine we have the following four routes:

GET /api/v1/accounts/friend
GET /api/v1/accounts/stats
GET /api/v1/accounts/upload
GET /api/v1/accounts/blocked_users
POST /api/v1/accounts/friend
POST /api/v1/accounts/stats
POST /api/v1/accounts/upload
POST /api/v1/accounts/blocked_users

Our current implementation will have them stored as separate keys in the object:

{
    "GET /api/v1/accounts/friend": function handle_friend() { ... },
    "GET /api/v1/accounts/stats": function handle_stats() { ... },
    "GET /api/v1/accounts/upload": function handle_upload() { ... },
    "GET /api/v1/accounts/blocked_users": function handle_blocked_users() { ... },
    "POST /api/v1/accounts/friend": function handle_friend() { ... },
    "POST /api/v1/accounts/stats": function handle_stats() { ... },
    "POST /api/v1/accounts/upload": function handle_upload() { ... },
    "POST /api/v1/accounts/blocked_users": function handle_blocked_users() { ... }
}

That is not efficient. For most of the applications this is nothing to worry about, but there's a better way. Also with this approach it becomes impossible to extend our router with the other functionalities we mentioned - dynamic routes, queries etc. There's a way to do some regex sorcery to achieve it, but that method will be way way slower. You don't need to sacrifice performance in order to support more features.

A better way to store the routes could be the following:

{
    "/api": {
        "/v1": {
            "/accounts": {
                "friend": function handle_friend() { ... },
                "stats": function handle_stats() { ... },
                "upload": function handle_upload() { ... },
                "blocked_users": function handle_blocked_users() { ... }
            }
        }
    }
}

This is an easy way to think of how a Trie stores the paths.

What is a Trie anyway?

A Trie, which is also known as a prefix tree, is a specialized tree structure used for storing a mapping between keys and values, where the keys are generally strings. This structure is organized in such a way that all the child nodes that stem from a single parent node have a shared initial sequence of characters, or a "common prefix." So the position of a node in the Trie dictates what key it corresponds to, rather than storing the key explicitly in the node itself.

Imagine we have the following routes:

'GET /users'
'GET /users/id'
'POST /users'

With our current implementation, the routes object would look like:

{
    "GET /users": handler,
    "GET /users/id": handler,
    "POST /users": handler
}

But, with a Trie, it will look like the following:

    [root]
      |
     GET
      |
    users
     / \
   POST GET
          \
          id

Every node, including root will be an object that contain some necessary information with it.

  1. handler: The function to be executed when the route represented by the path to this node is accessed. Not all nodes will have handlers, only the nodes that correspond to complete routes.

  2. path: The current route segment in string, for example - /users or /id

  3. param and paramName: If the current path is /:id and the client makes a request at /xyz, the param will be xyz and the paramName will be id.

  4. children: Any children nodes. (We'll get more deep into this in the upcoming chapters)

Enough with the theory. In the next chapter, we'll dive into our very first exercise for this book: implementing a Trie.

Previous
Improving the Router API