Data driven Frontend development using RNN and Markov Chains

Hatem Hassan 👨‍💻☕️💻🌺😎 ・ Originally published at hatem-hassan.com ・ 6 min read

Originally posted on my personal blog

The dark ages of the Web

Throughout my career as a frontend engineer, I've worked with many libraries, packages, and dependencies. I admit that when I used jQuery for the first time almost 10 years ago, I never really thought about what was happening behind that innocent <script> tag. I was so amazed by how easy it was to call $('.cool-logo').slideUp() that I didn't even consider how it worked behind the scenes.

  <script src="https://code.jquery.com/jquery.min.js"></script>
  <script src="/assets/slideshow.js"></script>
  <script src="/assets/form-validations.js"></script>
  <script src="/assets/moment.js"></script> <!-- We need those `a year ago` strings, don't we? -->
  <script src="/assets/thatAnimationThingWeUseInOnePage.js"></script>

-- Part of a very cool website <head> tag.

Not only is this approach hard to manage (if one library depends on another, we will probably get the order wrong), it can also make your website grow rapidly in size without you even noticing. You don't know how many of these libraries import lodash, or which version they import. Are these libraries up to date and secure? Are there duplicates? 🤷🏽‍♂️

We need a change

The web has evolved a lot since then. We don't only have libraries now; we have frameworks, like Angular. We've seen lots of changes and innovation in the way we build web applications. JavaScript bundling is a major component of any frontend framework in 2019. Basically, a static file bundler puts your JavaScript files (and assets), along with all of their dependencies, together in one or more files. Two of the most popular bundlers are Browserify and Webpack.

Webpack homepage

Webpack

Webpack is widely adopted, in part because it is the bundler Angular CLI uses to build production assets. For static websites, it finds and eliminates all those random <script> tags scattered through the HTML of your project and includes a single JavaScript file (or a few) instead.

Getting started with Webpack configuration can involve a steep learning curve, but it is nothing compared to managing dependencies manually. The basic concept a beginner needs to understand is that you give Webpack an entry file, which it reads to recursively follow all the imports and requires and figure out every dependency in the project. It then builds a dependency tree, which is useful for various reasons, one of which is removing duplicate libraries. Finally, it compiles (and possibly compresses) everything into one or more bundles.
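To make the "entry file, then follow the imports" idea concrete, here is a minimal sketch of that traversal. It is not how Webpack is implemented: the `project` map, the file names, and `resolveBundleOrder` are all made up for illustration, and a real bundler would discover the edges by parsing each file.

```typescript
// Hypothetical in-memory "project": file name -> files it imports.
// A real bundler finds these edges by parsing import/require statements.
type ModuleGraph = Record<string, string[]>;

const project: ModuleGraph = {
  "main.js": ["slideshow.js", "form-validations.js"],
  "slideshow.js": ["jquery.js"],
  "form-validations.js": ["jquery.js", "moment.js"],
  "jquery.js": [],
  "moment.js": [],
};

// Walk the graph depth-first from the entry file, visit every module once,
// and emit each module after the modules it depends on.
function resolveBundleOrder(graph: ModuleGraph, entry: string): string[] {
  const visited = new Set<string>();
  const order: string[] = [];

  function visit(file: string): void {
    if (visited.has(file)) return; // already included, so no duplicates
    visited.add(file);
    for (const dependency of graph[file] ?? []) visit(dependency);
    order.push(file);
  }

  visit(entry);
  return order;
}

const bundleOrder = resolveBundleOrder(project, "main.js");
// jquery.js is imported twice in the project but appears once in bundleOrder.
```

Because every module is visited only once, jquery.js ends up in the bundle a single time even though two files import it, which is the "removing duplicate libraries" benefit mentioned above.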

It doesn't stop here!

Bundle optimization is another hot topic right now. Frameworks like Angular are trying to optimize how these bundles are built and split. For example, if you use one big graphing library on some pages of a dashboard web app, it doesn't really make sense to load this library on every page right away. What would you do with it on the /login page? Always remember that our goal is ultimately to decrease page load time.

That's one reason we divide our Angular app into modules. Angular and Webpack create a separate bundle for the pages contained in each module and load it on demand. Assuming the module structure maps to the path structure, this can be called route-level code splitting.
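Here is a rough sketch of what "load it on demand" means at the route level. This is not Angular's router: `lazyRoutes`, `fakeBundle`, `navigate`, and the bundle names are hypothetical, and the loaders only simulate what a dynamic import() would do in a real app.

```typescript
// A lazy route maps a path prefix to a function that loads its bundle on demand.
// In a real app the loader would be a dynamic import() of the module's code.
type BundleLoader = () => Promise<string>;

const loadedBundles: string[] = []; // records which bundles were actually fetched

// Hypothetical helper that simulates fetching a bundle over the network.
function fakeBundle(name: string): BundleLoader {
  return () => {
    loadedBundles.push(name);
    return Promise.resolve(name);
  };
}

const lazyRoutes: Record<string, BundleLoader> = {
  "/login": fakeBundle("auth.bundle.js"),
  "/dashboard": fakeBundle("dashboard.bundle.js"),
  "/dashboard/report": fakeBundle("charts.bundle.js"),
};

const bundleCache = new Map<string, Promise<string>>();

// Pick the longest matching route prefix and load its bundle at most once.
function navigate(path: string): Promise<string> {
  const match = Object.keys(lazyRoutes)
    .filter((prefix) => path === prefix || path.startsWith(prefix + "/"))
    .sort((a, b) => b.length - a.length)[0];
  if (match === undefined) {
    return Promise.reject(new Error("No route for " + path));
  }
  let pending = bundleCache.get(match);
  if (!pending) {
    pending = lazyRoutes[match]();
    bundleCache.set(match, pending);
  }
  return pending;
}
```

With this setup, visiting /login only pulls in auth.bundle.js; the heavy charts bundle is not requested until someone opens a report page, and it is requested only once.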

There is a lot in play when it comes to bundle optimization. Lazy loading, eager loading, and preloading are all strategies used to optimize bundling and decrease page load time. You can read more about it in this sweet article.

Machine Learning comes to the rescue!

Googlers from the Angular team started a very cool project called Guess.js to tackle the bundling issues in Angular as well as static sites.

Guess.js

Google's Guess.js optimizes code bundling and prefetching by using a TensorFlow.js RNN machine learning model (or a Markov chain) to learn navigation patterns. These patterns are used to predict a user's next transition, that is, the page (or pages) they are likely to visit next. Why? To prefetch those pages and provide instant transitions in your application. Cool, right?

WTH is RNN?

Recurrent Neural Network

An RNN is a recurrent neural network: a network that uses its internal memory to process a sequence of inputs. In this case, think about a web navigation sequence:

User 1: /login => /dashboard => /dashboard/report/1 => /dashboard/report/2 => /logout
User 2: /login => /account  => /dashboard/add/user => /dashboard/add/user/success/ => /logout
User 3: ...

An RNN learns the common patterns in such sequences; then, given a sequence of inputs, it can predict the next item in the sequence.

But why RNN?

The output of an RNN unit depends not only on the current input but also on the previous hidden state, which carries past information. This means the network learns from its past to come up with a better target (prediction).

Note: I'm not sure exactly how RNNs are implemented inside Guess.js; this is just a description of how RNNs are relevant to predicting navigation patterns.
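To see how the hidden state carries the past along, here is a minimal sketch of a single plain RNN step, h_t = tanh(Wx * x_t + Wh * h_(t-1) + b), applied to one-hot encoded pages. Everything in it is an assumption made for illustration: the page list, the tiny 2-unit hidden state, and the weights (which are arbitrary numbers, not trained). It says nothing about what Guess.js does internally.

```typescript
type Vector = number[];
type Matrix = number[][];

// Multiply a matrix by a vector.
function matVec(m: Matrix, v: Vector): Vector {
  return m.map((row) => row.reduce((sum, weight, i) => sum + weight * v[i], 0));
}

// One step of a plain RNN cell: h_t = tanh(Wx * x_t + Wh * h_prev + b).
function rnnStep(x: Vector, hPrev: Vector, Wx: Matrix, Wh: Matrix, b: Vector): Vector {
  const fromInput = matVec(Wx, x);
  const fromMemory = matVec(Wh, hPrev);
  return fromInput.map((value, i) => Math.tanh(value + fromMemory[i] + b[i]));
}

// Hypothetical vocabulary of pages, one-hot encoded.
const pages = ["/login", "/dashboard", "/logout"];
const oneHot = (page: string): Vector => pages.map((p) => (p === page ? 1 : 0));

// Arbitrary, untrained weights, chosen only to make the example run.
const Wx: Matrix = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]];
const Wh: Matrix = [[0.7, -0.2], [0.1, 0.9]];
const b: Vector = [0, 0];

// Feed a navigation sequence through the cell and return the final hidden state.
function encode(sequence: string[]): Vector {
  let h: Vector = [0, 0];
  for (const page of sequence) h = rnnStep(oneHot(page), h, Wx, Wh, b);
  return h;
}

const viaLogin = encode(["/login", "/dashboard"]);
const direct = encode(["/dashboard"]);
// Same last page, different history, so the two hidden states differ.
```

In a full model, the final hidden state would be passed through an output layer that scores every page as a candidate for the next visit; training adjusts the weights so those scores match real navigation data.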

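There are limitations of RNNs, though, most notably vanishing and exploding gradients, which make it hard for a plain RNN to learn from long sequences.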

The two variants of RNN that tackle these limitations are LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit).

The most obvious difference between the two is that a GRU has an output and a hidden state, while an LSTM has an output, a hidden state, and a cell state. Check the list at the end of this post for more details about RNNs.

So in summary, an RNN contains internal state that is updated every time we feed it new input. When it predicts the next item in a sequence, it uses its knowledge of the past through the hidden state.

What about Markov Chains?

Once again, this is a general description of Markov chains, not the Guess.js implementation.

A Markov chain, named after the mathematician Andrey Markov, is a probabilistic model that simulates the flow from one "state" to another. In other words, in a space of multiple events/states, the model can tell us how likely it is that we "hop" from event A to event B, from B => C, from B => C => A, and so on.

Markov Chains Example

In our case, a Markov model would give us, for instance, the probability of a certain user jumping from the /product page to the /checkout page. So if there is a high probability of the user "transitioning" from this product page to the checkout page, Guess.js can start loading that Stripe payment JS bundle in the background.
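Here is a minimal sketch of how such transition probabilities could be estimated by simply counting page-to-page hops. The sessions are invented for illustration, and `countTransitions`, `transitionProbability`, and `mostLikelyNext` are hypothetical helper names, not part of Guess.js.

```typescript
type TransitionCounts = Map<string, Map<string, number>>;

// Made-up navigation sessions, for illustration only.
const sessions: string[][] = [
  ["/home", "/product", "/checkout"],
  ["/home", "/product", "/checkout"],
  ["/home", "/product", "/home"],
  ["/home", "/about"],
];

// Count how often each page is followed directly by each other page.
function countTransitions(data: string[][]): TransitionCounts {
  const counts: TransitionCounts = new Map();
  for (const session of data) {
    for (let i = 0; i + 1 < session.length; i++) {
      const from = session[i];
      const to = session[i + 1];
      const row = counts.get(from) ?? new Map<string, number>();
      row.set(to, (row.get(to) ?? 0) + 1);
      counts.set(from, row);
    }
  }
  return counts;
}

// Estimated P(next = to | current = from): this hop's count over all hops out of `from`.
function transitionProbability(counts: TransitionCounts, from: string, to: string): number {
  const row = counts.get(from);
  if (!row) return 0; // we never saw anyone leave this page
  let total = 0;
  row.forEach((n) => { total += n; });
  return (row.get(to) ?? 0) / total;
}

// The most frequently observed next page, or undefined if `from` was never left.
function mostLikelyNext(counts: TransitionCounts, from: string): string | undefined {
  const row = counts.get(from);
  if (!row) return undefined;
  const candidates: Array<[string, number]> = [];
  row.forEach((n, page) => { candidates.push([page, n]); });
  candidates.sort((a, b) => b[1] - a[1]);
  return candidates.length > 0 ? candidates[0][0] : undefined;
}

const counts = countTransitions(sessions);
const productToCheckout = transitionProbability(counts, "/product", "/checkout");
const nextAfterProduct = mostLikelyNext(counts, "/product");
```

In this toy data, two of the three visits to /product continue to /checkout, so that hop gets a probability of 2/3 and /checkout is the best prefetch candidate from the product page.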

Data-driven predictions

So how do we get those sequences? Where is the data?

The brilliant thing about Guess.js is that it grabs its data from Google Analytics to better train the RNN model and perform data-driven, route-level JavaScript parsing and code-splitting optimizations. That's how it predicts which JavaScript bundle it should load next, and when.

Here comes the cool part. Google Analytics has been widely used by many websites for years and it provides exactly the kind of data we need to feed Guess.js.
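As a rough sketch of the "which bundle, and when" decision, the snippet below turns page-to-page pageview counts into a list of bundles worth prefetching. All of it is assumed for illustration: the `NavigationRow` shape is only loosely modeled on an analytics report, the numbers are invented, the route-to-bundle map is hypothetical, and the thresholds are example values rather than the Guess.js configuration.

```typescript
// Hypothetical row shape: how many times users went from one page to another.
interface NavigationRow {
  previousPagePath: string;
  pagePath: string;
  pageviews: number;
}

// Invented numbers, for illustration only.
const report: NavigationRow[] = [
  { previousPagePath: "/product", pagePath: "/checkout", pageviews: 65 },
  { previousPagePath: "/product", pagePath: "/reviews", pageviews: 25 },
  { previousPagePath: "/product", pagePath: "/home", pageviews: 10 },
];

// Hypothetical route -> bundle mapping; a build tool would normally derive this.
const routeBundles: Record<string, string> = {
  "/checkout": "checkout.bundle.js",
  "/reviews": "reviews.bundle.js",
  "/home": "home.bundle.js",
};

// Example thresholds: the slower the connection, the pickier we are.
const thresholds: Record<string, number> = {
  "4g": 0.15,
  "3g": 0.3,
  "2g": 0.45,
  "slow-2g": 0.6,
};

function bundlesToPrefetch(rows: NavigationRow[], currentPath: string, connection: string): string[] {
  const outgoing = rows.filter((row) => row.previousPagePath === currentPath);
  const total = outgoing.reduce((sum, row) => sum + row.pageviews, 0);
  if (total === 0) return []; // no data for this page, so prefetch nothing
  // Unknown connection type: only prefetch a page that is certain to come next.
  const minProbability = thresholds[connection] ?? 1;
  return outgoing
    .filter((row) => row.pageviews / total >= minProbability)
    .sort((a, b) => b.pageviews - a.pageviews)
    .map((row) => routeBundles[row.pagePath])
    .filter((bundle): bundle is string => bundle !== undefined);
}

const onFastConnection = bundlesToPrefetch(report, "/product", "4g");
const onSlowConnection = bundlesToPrefetch(report, "/product", "slow-2g");
```

On a fast connection both the checkout and reviews bundles clear the bar, while on a very slow one only the checkout bundle does. In a browser, the connection type could come from something like the Network Information API's effectiveType, where it is available.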

This combination of Google Analytics + Guess.js automagically figures out the best way to bundle your application and optimize its resources for better performance.

It goes even deeper than that: it can predict the next piece of content (article, product, video) a user is likely to want to view and adjust or filter the user experience to account for this. It can also predict the types of widgets an individual user is likely to interact with most and use this data to tailor a more custom experience.

I honestly think this is a breakthrough in Machine Learning empowering Customer Experience and Web Performance. It is basically instant page transitions.
