Scheduling Derivations in Reactivity

Ryan Carniato for This is Learning

Most developers think about Reactivity as an event system. You have some state. You update that state and things derived from it re-evaluate. Ultimately that change is reflected as a side effect.



let name = state("John");                          // source state
const upperName = memo(() => name.toUpperCase());  // derived value

effect(() => console.log(upperName));              // side effect



We will be using pseudocode throughout so as not to cater to the syntax of any specific library or framework.

But this is an oversimplification. As we learned in the previous article, there are multiple ways a change can propagate through the system: "Push", "Pull", or even "Push-Pull".

While we tend to keep a simpler "Push" model in our heads as we talk about Reactivity, almost no modern framework uses a purely "Push" system. It is incapable of providing the guarantees we've come to expect.

Once you leave purely "Push" events, scheduling becomes a necessary part of the solution. If work isn't going to happen immediately it will need to happen later. What gets scheduled and when it runs has consequences.


Immediate vs Lazy vs Scheduled


Upon the creation of something reactive, we have 3 choices for when we evaluate it.

First, we could just run it immediately. For an effect that creates other effects, we may want to execute depth-first rather than breadth-first, evaluating the tree in one pass. This isn't that uncommon when rendering.

We might want to lazily defer evaluating it until we know the value will be read. Maybe we have a derived value that is never going to be read. Maybe it calculates something expensive that is only used if some other state in the UI changes. So why evaluate it if we won't be using it right away, or ever?

Finally, we might want to schedule the node to run later. We want to make sure all the intermediates are sorted before running it. Maybe it is an effect, so it isn't read itself. You can only lazily evaluate nodes that can be read. Instead, we add it to a queue to execute later.

Upon an update, we have similar options. Outside of "Push" we don't run things immediately, but we can similarly choose whether to schedule the node or rely on it being read to trigger evaluation.

At first glance, it might appear obvious that we should lazily defer what we can and schedule what we need to. Otherwise, we could schedule unnecessary work. Derived state is a prime candidate for lazy evaluations because it must be read to be used. But are there any other considerations when determining what to schedule?
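To make the distinction concrete, here is a minimal sketch of the two update strategies in plain JavaScript. The names (lazyMemo, scheduledEffect, notify) are illustrative assumptions, not any library's API:

const queue = [];
let flushScheduled = false;

function schedule(node) {
  queue.push(node);
  if (!flushScheduled) {
    flushScheduled = true;
    queueMicrotask(() => {
      flushScheduled = false;
      for (const n of queue.splice(0)) n.run(); // flush everything queued
    });
  }
}

// Lazy: an update only marks the node dirty; evaluation waits for a read
function lazyMemo(fn) {
  let value, dirty = true;
  return {
    notify() { dirty = true; },
    read() {
      if (dirty) { value = fn(); dirty = false; } // evaluate on demand
      return value;
    },
  };
}

// Scheduled: an update queues the node to run in the next flush,
// whether or not anything reads it
function scheduledEffect(fn) {
  const node = { run: fn, notify() { schedule(node); } };
  schedule(node); // effects also run once on creation
  return node;
}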


Reactive Ownership


It is useful to understand another benefit of lazy evaluation beyond reducing the risk of unnecessary work: lazy derivations can be automatically garbage collected.

In reactive systems like Signals that follow the observer pattern, there have historically been concerns around memory leaks. That is because the implementation usually links subscribers and dependencies in both directions. When a reactive expression runs, it subscribes to the source signals and adds them to its dependencies. The reason for both directions is that signals need to notify their dependent nodes upon update, and those impacted nodes need to reset their dependencies for all the nodes they access. In this way, dependencies are dynamic with each execution.
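A rough sketch of that two-way linking (illustrative names, not a real implementation):

class Signal {
  constructor(value) {
    this.value = value;
    this.observers = new Set(); // forward links: who to notify on update
  }
}

class Computation {
  constructor(fn) {
    this.fn = fn;
    this.sources = new Set(); // back links: what we read on the last run
  }
  rerun() {
    // dependencies are dynamic: unlink from everything before re-tracking
    for (const signal of this.sources) signal.observers.delete(this);
    this.sources.clear();
    this.fn(); // reads during this call re-populate both sides
  }
}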

But it also means that losing the reference to one of these nodes is insufficient for garbage collection. If you have a Signal and an Effect, just because you no longer have a use for the Effect, the Signal will still hold a reference to the Effect, and the Effect to the Signal. If both are no longer referenced they may be able to be disposed of, but it is not uncommon for state to outlive its side effects.

Generally, effects require manual disposal. However, derived state could release itself if no one reads from it. If something were to read it in the future it could re-run at that time and build its dependencies, in the same way that when first created it doesn't need to run until read.

Scheduling derived state instead means that the nodes and dependencies are always created eagerly, regardless of whether the value is read. In such a system we don't know at the time of scheduling whether a derived value will be read, so it gets evaluated and its dependencies created regardless. That makes it much more challenging to dispose of it automatically.

Creating UIs with systems that require manual disposal is cumbersome. Most external state libraries are concerned only with state and derived state and leave effects to the render library. So it has been beneficial that neither requires explicit disposal.

But what if there is no rendering library?

This is why S.js pioneered the Reactive Ownership model that has become a staple in Fine-Grained Renderers like SolidJS. If manually disposable nodes are created under a parent reactive context, then when the parent re-executes, we dispose of those child nodes along with its dependencies.

This is a secondary graph to the reactive dependency graph, but it links our Effects and other scheduled nodes together so all disposal can be automated. This is also the mechanism that powers things like the Context API and enables the grouping of boundaries for Errors or Suspense. It is a tree not unlike a VDOM, but it contains fewer nodes. Its nodes are decided by dynamic decisions (conditionals) rather than the number of elements and Components.
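A rough sketch of the ownership mechanism (hypothetical names; real implementations differ):

let Owner = null; // the currently running reactive context

function createEffect(fn) {
  const node = { fn, children: [], cleanups: [] };
  if (Owner) Owner.children.push(node); // register under the parent context
  runWithOwner(node);
  return node;
}

function runWithOwner(node) {
  dispose(node); // re-running disposes anything created during the last run
  const prev = Owner;
  Owner = node;
  try { node.fn(); } finally { Owner = prev; }
}

function dispose(node) {
  for (const child of node.children) dispose(child); // recurse down the tree
  for (const cleanup of node.cleanups) cleanup();
  node.children = [];
  node.cleanups = [];
}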

Still, in either case, scheduling dictates what can live comfortably inside and outside of this tree, given its impact on how nodes can be disposed of.


A Phased Approach


Should code run at a predictable time? With reactivity, we have the means to model all sorts of systems and aren't limited to the normal sense of time and progression. One line doesn't need to run after the other. But developers are only human, and when things occur can be of consequence.

You can't take back side effects. Once you are committed to displaying something you have to show it all or it is inconsistent. If something errors you need to block out everything related. It's why there are concepts like Error Boundaries and Suspense. And it is why we tend to schedule when things run with purpose.


React's Three Phases

React has popularized a model with 3 phases of execution.

  1. Pure - User Code executes (components, calculations)
  2. Render - VDOM is diffed and DOM is patched
  3. Post-Render - User Effects execute

I am using this naming because React uses the term "render" in a way that is inconsistent with how other frameworks work. I use "render" to mean updating the DOM, not running component code.

As a developer, all your code is executed during the Pure phase except the effects, which are executed Post-Render.


That includes dependency arrays. React's model is aware of all dependencies for the updates before it runs any internal or external side effects. This ability to bail out of an update cycle until ready to commit is what powers things like concurrency. Some code can always throw a Promise without impacting what is currently on the screen.
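For example (standard React APIs), the dependency array is built while the component function runs, during the Pure phase, even though the effect body runs Post-Render:

import { useEffect } from "react";

function Title({ name }) {
  // the dependency array [name] is created during the Pure phase,
  // so React knows what this effect depends on before anything commits
  useEffect(() => {
    document.title = name; // the side effect itself runs Post-Render
  }, [name]);
  return <h1>{name}</h1>;
}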

This model works well in React's "Pull" reactivity where Components are re-run repeatedly. Every time they run you can expect the same behavior, as the code executes as a whole to completion.


Phases with Granular Rendering

With "Push-Pull" one can also use a system like above, but then you wouldn't fully leverage its ability to "Push" more granular updates out. However, there are other ways to accomplish similar phased execution.

But first, we should recognize that, left alone, lazily-evaluated derived values will execute when the earliest type of effect that reads them runs. If you were to introduce a renderEffect that runs before user-defined effects, that is when the corresponding derived values would run.


Changing where a reactive expression or derived value is read can change whether it runs before or after render. Incidentally pulling it into a new phase via a dependency could change the behavior of otherwise unrelated code.
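To illustrate with the article's pseudocode (renderEffect as a hypothetical primitive that runs before user effects), the same memo evaluates at a different time depending on who reads it first:

const fullName = memo(() => firstName + " " + lastName);

// read only by a user effect: fullName evaluates Post-Render
effect(() => console.log(fullName));

// add a renderEffect that also reads it, and fullName now evaluates
// during the render phase instead, changing when that work happens
renderEffect(() => el.textContent = fullName);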

When I first created SolidJS 8 years ago I wasn't too concerned with this lazy behavior. We scheduled all computed nodes, both derived values and Effects. While it was true extra work could happen, a lot of state in Components is hierarchical, so if things are unused they tend to be unmounted. But scheduling meant we could get a subtly different behavior from the above: all our Pure calculations happen before our Effects.

But there is one difference: getFirstLetter runs during Post-Render. Any dependency that occurs for the first time during an effect that isn't scheduled happens too late to be discovered before any effects run. Since our Async primitives are also scheduled nodes this has very little consequence, but it is a small, understandable discrepancy.
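Reconstructing that case in pseudocode (getFirstLetter's definition is an assumption based on its name):

// getFirstLetter is only ever read inside this effect, so the first time
// it can be discovered as a dependency and evaluated is when the effect
// itself runs, i.e. during Post-Render
const getFirstLetter = memo(() => upperName[0]);

effect(() => console.log(getFirstLetter));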

Solid, like React, has 3 defined phases. This is probably why Solid is the only Signals-based framework to support Concurrent rendering. You may be aware that, unlike Solid, almost all newer Signals libraries lazily derive state. And we've been looking at doing the same in the next major version.

But giving up the benefits of the Phased approach isn't an acceptable tradeoff. So let's explore an alternative.


Rethinking Dependencies

Well, what works for "Pull" works for "Push-Pull".


Probably the last thing anyone wants to see is the return of "Dependency Arrays". But if effects were split between the pure tracking part and the effectful part, all user code except the effect itself could happen during the Pure phase before any rendering.

Similar to above:

  1. Pure - Run all tracking contexts: front half of renderEffects and effects, reading (and maybe evaluating) all derived values.
  2. Render - Run the back half of renderEffects
  3. Post-Render - Run the back half of effects

This still differs from dependency arrays in that components don't re-run, and dependencies can be dynamic, with different dependencies read on every run. No Hook rules. But if one wants to have lazy derived values and still ensure the phases are followed to enable consistent scheduling, this is how you could do it.
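One hypothetical shape for such a split (pseudocode, not a finalized API): each effect takes a pure, tracked function whose result is handed to an untracked, effectful function.

effect(
  () => upperName,              // pure half: tracked, runs during the Pure phase
  (value) => console.log(value) // effect half: untracked, runs Post-Render
);

renderEffect(
  () => fullName,                    // pure half: Pure phase
  (value) => el.textContent = value  // effect half: Render phase DOM write
);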


Deriving Async

The other reason to think about scheduling is Async. Most reactive systems are synchronous. Async works outside of the system. You create an effect and it updates your state when it is ready.



let userId = state(1);
let user = state();

effect(() => {
  // the async work happens outside the reactive system; we manually
  // write the result back into state when it resolves
  fetchUser(userId).then(value => user = value);
});



But just as with synchronization in the synchronous world, we lose the information that user depends on userId. If we could represent asynchronous updates as a derivation, then we could know exactly what depends on them.



let userId = state(1);
const user = asyncMemo(() => fetchUser(userId));



And this doesn't just apply to direct dependencies but anything downstream:



let userId = state(1);
const user = asyncMemo(() => fetchUser(userId));

const upperName = memo(() => user.firstName.toUpperCase());



upperName depends on user, which depends on userId, and so it could possibly be async.

This is useful information if you want to implement systems like Suspense. We need to be able to trigger Suspense when userId is updated, so we need to know that it is a dependency of an async operation. Also, it is better to suspend closest to where the data is ultimately used rather than where the first node derives from it. We want to suspend when reading upperName, not where upperName is defined. You want to be free to fetch your data higher in the tree to use in more places below, rather than block rendering of the whole tree below that point.
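In pseudocode (renderEffect again standing in for wherever the value is ultimately rendered):

let userId = state(1);

// defined high in the tree: creating these suspends nothing
const user = asyncMemo(() => fetchUser(userId));
const upperName = memo(() => user.firstName.toUpperCase());

// only this read suspends, so only this part of the UI is blocked
// while the fetch is in flight; siblings can render freely
renderEffect(() => el.textContent = upperName);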


Should Async Be Lazy or Scheduled?



let userId = state(1);
const user = asyncMemo(() => fetchUser(userId));

const upperName = memo(() => user.firstName.toUpperCase());



What happens if fetchUser hasn't resolved by the time upperName evaluates?

user is undefined initially. You might expect a "Cannot read properties of undefined (reading 'firstName')" error.

We can solve this. You could provide default values, but not everything wants to have a default value, and with deeply nested data you might have to mock more than you desire.

You could null-check everywhere. This is fine, but it does mean a lot of existence checks dispersed around your app. It often leads you to check higher up in the tree than desired to avoid making additional checks.

Or you can throw a special error and re-run it when the value resolves. React has pioneered the approach of throwing Promises in this scenario. It's nice as you don't need to null check or provide a default value and you can trust that everything will be there when it finally commits.
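A minimal sketch of how such a read could work internally (illustrative, not any framework's actual code):

function read(asyncNode) {
  if (asyncNode.state === "pending") throw asyncNode.promise; // suspend; re-run on resolve
  if (asyncNode.state === "errored") throw asyncNode.error;   // surface to an Error Boundary
  return asyncNode.value;                                     // resolved: safe to use
}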

But an old problem resurfaces:



const A = asyncState(() => fetchA(depA));
const B = asyncState(() => fetchB(depB));

const C = memo(() => A + B); // reads both async values



If you go with throwing or some other type of conditional short-circuiting, and derived values are lazy, you, my friend, have accidentally created a waterfall. When we read C it will begin by evaluating A. A can start fetching, but the read will throw as it hasn't resolved. B won't be read until C re-runs after A has resolved. Only at that point will it start fetching B.

However, if scheduled, A and B will start fetching regardless of whether C is read. This means even if A throws, B may be finished fetching by the time A resolves, as everything is fetched in parallel.

In general, Async values probably should be scheduled. While I could see it being powerful to lazily resolve Async, using the path through the code to determine what gets fetched, it doesn't take much to cause performance issues. Waterfalls are very easy to create in systems that use throwing to manage unresolved async, so using scheduling and our knowledge of the reactive graph is one way to avoid them.
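A sketch of that policy (an assumption about one way to implement it, not any library's actual behavior): on update, async derivations are queued eagerly while pure sync derivations stay lazy:

function write(signal, value) {
  signal.value = value;
  for (const node of signal.observers) {
    if (node.isAsync) schedule(node); // A and B start fetching now, in parallel
    else node.dirty = true;           // sync memos like C stay lazy until read
  }
}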


Conclusion


I hope through this exploration you can see that scheduling plays a big part in Reactive systems. And that "Push-Pull" is a "Pull" system built inside a "Push" one. Lazily Derived State has many consequences that you don't find in systems that schedule everything or ones that are purely "Pull". Even when trying to optimize for laziness there are still several things that should be scheduled.

However, if carefully constructed "Push-Pull" is incredibly powerful in that it adds another dimension to typical "Pull" Reactivity. One can get all the consistency and predictability benefits while being able to apply them more granularly.

This is still an open area of research. Along with work towards Solid 2.0, I am thinking about this more because of progress on TC39's Signals Proposal and the wider community asking that scheduling be built into the browser and DOM APIs. There is still a lot we don't understand or agree upon here, so approaching this prematurely could be disastrous.

Next time we will look deeper into the nature of Asynchronous reactivity. Beyond scheduling, Async poses an interesting challenge to what it means to be reactive.

Top comments (6)

Mike Pearson

Beyond scheduling, Async poses an interesting challenge to what it means to be reactive.

I tried implementing 28+ RxJS operators on top of React hooks and realized that the compositional API of operators is actually necessary because it is the only way to keep downstream behavior totally separate from upstream behavior. Otherwise there were weird data structures that got mixed into other behaviors. Like with concatMap you have a queue that you add to when a dependency changes, then you remove from the queue when the reactive operation finishes. I can't remember the details, but I found that sometimes other operations had to become aware of these kinds of queues unless there were higher-order operators that could transform a reactive pipeline into an entirely new one - basically like RxJS. I might not be making sense.

As far as the meaning of reactivity, I believe it's 100% about code structure. People get confused sometimes because the function inside effects "reacts" to its dependencies - but what it does with those values is imperative, not reactive. To me, effects are necessary because we haven't found a way to express all values/side-effects reactively yet, and maybe someday 99% of use cases can have reactive implementations just like HTML-like markup so we almost never need to write effect functions ourselves.

Ryan Carniato

I think we are saying similar things in terms of async transformations. There is a reason RxJS has operators. While some amount of change could just be applied as an imperative-looking transformation, a "map" or "reduce" operation, combinations of these, especially any dependent on time, are way more difficult to model. I don't think it is the job of Signals to worry about this sort of thing outside of making sure the state can be handled in a predictable manner.

The reason for effects is for things outside of the system. JSX/templating hides the effects we apply to the DOM. Unfortunately there isn't always a great way to get things back in. So effects are often used to read the DOM for cases where there aren't events, or when synchronizing with external systems. I'm not sure the latter ever goes away, and the former might be hard to handle every case generically. If you apply a reactive change that updates the layout of the DOM you might need to read the DOM and write to state to react to it.

Mike Pearson

Sometimes these complex requirements sneak in later. I'm looking for syntax that can smoothly adapt to higher complexity. I personally wouldn't want to build async behavior on top of signals without a plan to handle more complex async behavior reactively if necessary, but that can also mean interop with RxJS, which SolidJS and Angular mostly have.

Idk, I feel like I could figure out how to do most side-effects reactively. localStorage is easy. I recently figured out route navigations I think. Sending data to the server is interesting because most SPAs aren't responsible for what happens on the server - they just send requests and wait for a response. But there should be a way to do everything declaratively imo.

Yeah actually an example of this is CycleJS. It has this concept of sinks and that's where all side-effects are handled. And I don't think prop drilling is necessary for this though. André Staltz really cared about pure functions when he designed it, but I don't care that much. As long as the cause of each effect could be found in a single place with "click to definition", that's reactive/declarative.

Mike Pearson

I hope there's something simpler than RxJS for handling async stuff, but it's common in these situations to also want to add a debounce, or a throttle, or a buffer, etc... The automatic cancelation/resetting that observables have is also required for full reactivity so imperative cleanup and re-initialization isn't required.

Ryan Carniato

My interest in async is still integration. What I mean is that I'm concerned with synchronization and less with data flow. So nothing I'm looking at replaces RxJS for transformation. I care that if there are observables or promises in your code your UI updates as expected. So if you wanted to model debounce or throttle as a series of promises that's cool, although I suspect observables might be better.

It's also possible what you want isn't actually throttling or debouncing. I remember a Jay Phelps talk where he showed that for search the best experience wasn't to throttle the input but to manage the handling of the data coming back. As you typed in new characters it always fired the request immediately, but it was selective about when to show the results. That's what this solution helps with. It automatically removes race conditions.

Mike Pearson

Yeah the default behavior of TanStack Query has been very good for most situations.

My opinions have been formed by having to implement a few complex real-time features at multiple jobs. Firebase fired an event for every existing message in a thread and I had to use buffer in that case. At another place there was a dashboard editor with real-time values everywhere, and RxJS operators made it manageable. TanStack Query's approach definitely wouldn't have worked in these cases. And since I have to know RxJS for stuff like that, I like it as a general solution in case advanced behavior becomes necessary.

My experience is also biased, at least after the first job, because I got good at RxJS to the point of enjoying it immensely, so I started to look for that in future jobs. This effect is probably why web developers talk over each other so much. Because the filtering also happens in the other direction, and pretty much for everything we can label and consciously identify and look for.