Reactive Web Dashboards with Shiny Transcripts
Chapter: Reactivity
Lecture: Reactive calc
0:00
So far when we've been using reactivity, we've been using what I'd call shallow
0:05
reactive graphs, where each of the inputs is directly consumed by an output.
0:09
So here in this application we're looking at, we have these two inputs and
0:13
each of them is consumed by the rendering function of one of these outputs. This is great, but it doesn't give us a very nuanced way of building our
0:22
application. And what we actually want is something with more depth. We want to be
0:25
able to store intermediate values and have the application re-execute intelligently based on those values. And there's something called reactive
0:33
calculations which does just that. So let's take an example here to sort of build up our intuition about what we want. So this is a second tab of this
0:42
application. This would be like if we had deployed our model and we are taking a
0:47
sample of the data from production, kind of a query to see like, okay how is our
0:50
model doing in production. And we have three inputs here. We have the same
0:55
account input that we had before, but we also have this dates input which lets
0:59
us select a time frame from the database. And we have a sample size that we're
1:04
able to change. You know, I want to sample 20,000 records or
1:08
25,000 records from the database. And what I want here is a few things. I want this sample to be stable. So for a given dates and sample size, I want it to be
1:20
stable, and I also don't want it to run that often, right. This might be an expensive database query or something where, you know, I don't want to have it
1:27
rerun. So I want to like pull all the account information within this
1:33
sample and have that same sample be used by both of these plots. And then I want to
1:38
be able to do these filters in memory. So I want to pull the sample for all of the accounts into memory, into the memory of my application. And I
1:46
want to be able to change the account without retaking the sample, without querying the database again. And lastly, whenever I make a change here
1:55
from sample size to sample size, I want this to be a fresh sample each time. So I
2:00
want the sample to be retaken whenever I change the dates or the sample size, but not when I change the account. So let's just write this down, what I
2:08
actually want. So, first, I want to query the database for a sample between dates. Second,
2:12
I want to filter that sample by the account name in memory. And third,
2:17
I want to use that same data for both of the plotting functions. Okay, so that's basically what I want the application to do. But I'm really
2:24
demanding, I have some more things. I'd like to cache the results of one and two.
2:28
What caching means is that, in certain circumstances, I want to keep a copy of the data in memory somewhere and return the cached value
2:39
instead of recalculating it. So when I change the account name, I want to return
2:43
the cached value of the sample instead of taking a new sample each time. But I also want to invalidate the cache whenever the upstream inputs change.
2:53
So when the sample size gets changed or the dates get changed, I want to
2:57
invalidate the cached sample result. And lastly, I don't want to do any thinking or work. So this is kind of the demanding thing that I think Shiny
3:05
does that the other application frameworks don't do so well: these last three. Giving me this sort of caching and invalidation without actually
3:14
requiring me, the software engineer, to do much work or thinking about how that caching should happen is where Shiny really shines.
3:23
So reactive calculations are the tool that we use to do this. They're defined
3:27
with a new decorator, the reactive calc decorator (@reactive.calc in Shiny for Python), and they cache their values so it's cheap to call them repeatedly. And we get perfect cache
3:35
invalidation through reactivity. Since it takes place within that reactive graph, we know exactly when we should discard the cached value: whenever
3:44
its upstream elements change. And when that cache value is discarded, we know that we need to notify the downstream elements to cause them to
3:53
recalculate. This is the code that you would use to do this. So we have a reactive calc decorator on sample data. We have
4:03
another reactive calc decorator on filtered data. The first one takes the sample from the database. The second one filters that sample data in
4:10
memory. And then finally it's sent to the plotting function, the plotting renderer, at the end of the day. Reactive calcs are defined like
4:19
functions. They are functions. And they're called like functions. So we define sample data as a function. And then, to get its value,
4:27
we call it inside filtered data, just like you would with an input. And for the purposes of adding these to the graph, I'm going to use hexagons for
4:35
reactive calculations. And they'll be blue when the value is retrieved from the cache and orange when it's recalculating. Taking another look at our
4:43
application, we have three inputs: account, dates, and sample size. And we have two outputs, the scores plot and the API
4:53
response time plot. And I have these two hidden reactive calculations, the sample that stores the data that's pulled from the database, and the
5:05
filtered data which stores the in-memory data which has been filtered by account. Again, when Shiny starts up, it doesn't know the relationship
5:12
between any of these things, although I've kind of drawn them so you can guess what the relationship is going to be. We first start with calculating
5:19
the model scores. So we calculate the model scores and we go and get the filtered reactive calculation. We try to calculate that reactive
5:27
calculation. We discover we need this other one. We need the account information and we also need the sample. Go and calculate sample and we
5:34
need to get those inputs, dates and sample size. So right there, in the same way we auto-detected the graph for the simpler
5:42
application, we've auto-detected the graph for this more complicated application, one with more depth. When we calculate API response though, since
5:51
filtered hasn't changed, like none of filtered's inputs have changed, it's still a valid reactive calculation, so we don't recalculate it. We retrieve the
6:01
value from cache. This is always going to happen in this case because API response is going to calculate after model scores, so it's always going to
6:08
get its value from cache. When the account changes, so if I go up here and I change the account, but I don't change the sample size, I'm changing this
6:19
back and forth, what happens? Well, we do the same algorithm that we did before. So we have the account, we follow the arrows down, and we
6:27
invalidate its immediate descendants. But whenever we invalidate the descendant, we're going to also invalidate its descendants. So we're
6:35
going to invalidate: account changes, filtered invalidates, the two plotting functions invalidate, and then once that's done, we recalculate. So model
6:43
scores calculates, then we go get filtered, go get the account, and we get the sample. But since sample's precedents, its
6:53
dependencies, didn't change, we can get the value from cache. So we're not taking a new sample every time the account changes. And then API response
7:00
gets the filtered reactive calc from cache. When sample size changes, the whole graph invalidates. So sample
7:10
invalidates, filtered invalidates, the two plots invalidate, and we're back to the beginning. Calculate model scores, get the reactive calc, get its
7:18
dependencies, and then we get the other inputs, and API response fires, and we get the filtered value. So right there we have this reactive graph with
7:30
more depth. We're able to store these intermediary values as reactive calculations and reuse them in multiple places. And because all of those
7:39
calculations exist in this reactive graph, we can use that graph to do cache invalidation. We know exactly when those need to be
7:47
recalculated, and we know what should happen when they recalculate. So a calculation recalculates whenever its upstream inputs change, and when it recalculates,
7:57
it notifies its downstream elements, the things that depend on it, that they also need
8:01
to recalculate. We did this without the user really necessarily even needing to know that this is happening. But what this lets us do is build up
8:11
applications with a lot more complexity while maintaining the performance of
8:15
those applications. So you, the engineer, don't need to necessarily think about
8:20
how the application is re-rendering all the time. All you need to do is make sure that whenever you have repeated calculations, you're storing them in
8:27
reactive calculations and using them in downstream elements.
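The walkthrough above can be traced with a small toy model in plain Python. This is only a sketch of the idea, not how Shiny is implemented: the Input and Calc classes are invented for illustration, the outputs are modeled as calcs that we re-render by hand, and the data is fake.

```python
_stack = []  # calcs currently executing, used to auto-detect dependencies


class Node:
    def __init__(self):
        self.dependents = set()

    def _register(self):
        # Whoever is executing right now depends on this node.
        if _stack:
            self.dependents.add(_stack[-1])

    def _invalidate_dependents(self):
        dependents, self.dependents = self.dependents, set()
        for d in dependents:
            d.invalidate()


class Input(Node):
    def __init__(self, value):
        super().__init__()
        self._value = value

    def __call__(self):
        self._register()
        return self._value

    def set(self, value):
        self._value = value
        self._invalidate_dependents()


class Calc(Node):
    def __init__(self, fn):
        super().__init__()
        self.fn = fn
        self.valid = False
        self.value = None
        self.runs = 0  # how many times the body actually executed

    def __call__(self):
        self._register()
        if not self.valid:  # recalculate only when the cache was discarded
            _stack.append(self)
            try:
                self.value = self.fn()
                self.runs += 1
            finally:
                _stack.pop()
            self.valid = True
        return self.value

    def invalidate(self):
        if self.valid:
            self.valid = False
            self._invalidate_dependents()  # descendants invalidate too


account = Input("Acme")
dates = Input(("2024-01-01", "2024-01-31"))
sample_size = Input(3)


@Calc
def sample():
    dates()  # reading the input is what creates the dependency
    return [{"account": a} for _ in range(sample_size()) for a in ("Acme", "Globex")]


@Calc
def filtered():
    return [r for r in sample() if r["account"] == account()]


@Calc
def scores_plot():
    return f"scores: {len(filtered())} rows"


@Calc
def response_plot():
    return f"response times: {len(filtered())} rows"


def render_all():
    return scores_plot(), response_plot()


render_all()             # startup: everything calculates once
account.set("Globex")    # invalidates filtered and both plots, not sample
render_all()
sample_size.set(5)       # invalidates the whole graph
render_all()
print(sample.runs, filtered.runs, scores_plot.runs, response_plot.runs)  # 2 3 3 3
```

The run counts show the behavior described in the lecture: the sample is retaken only when the sample size changes, and the second plot always gets the filtered data from cache.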