Cache Invalidation is Hard

Felipe, Fri Sep 17 2021 • 5 min read

My first blog post ever

Ever since I came across Blitz.js, I've been thinking about rewriting Contatempo using it. I've also been thinking about creating a blog for a long time (I guess it was my 2018 New Year's resolution). Now that I'm learning Blitz, I figured it would be great motivation to document my learnings and thoughts on a blog. So, hopefully, this will be the first of many posts to come.

Context

For this very first post, I'll write about my adventures facing one of the two hard things in computer science: cache invalidation. Or so I thought (spoiler alert: that wasn't my problem, at least not the root cause). I'm sure the problem I faced isn't as hard nowadays as it was back when Phil Karlton coined the phrase, but it still took me 3 days (after spending more than a week not even thinking about it) to find a solution.

So, one of the first things I'm working on in the rewrite is the feature that shows the total time elapsed on the records (a record has a begin date and an end date), including the time elapsed for the ongoing record. The total elapsed time is therefore composed of two parts. The first is the elapsed time of the completed records, for which I created a getTotalTime query. The second is the elapsed time from the begin date of the ongoing record until now, for which I used the getRecords query, which returns the list of records to be displayed, and then picked the ongoing record out of that list. The total is the sum of the two parts. (And now I realize that maybe getTotalTime was not the best name, as it is not actually the total total time.)
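To make the two parts concrete, here is a minimal sketch of that calculation. It is not the actual Contatempo code: the TimeRecord shape and the helper names (finishedTimeMs, ongoingTimeMs, totalElapsedMs) are hypothetical, and finishedTimeMs only stands in for what a query like getTotalTime would compute on the server.

```typescript
// Hypothetical record shape; the field names are assumptions.
interface TimeRecord {
  begin: Date;
  end: Date | null; // null while the record is still ongoing
}

// Part 1: stand-in for the getTotalTime result, i.e. the sum (in ms)
// of the durations of all completed records.
function finishedTimeMs(records: TimeRecord[]): number {
  return records.reduce(
    (sum, r) => (r.end ? sum + (r.end.getTime() - r.begin.getTime()) : sum),
    0
  );
}

// Part 2: time elapsed on the ongoing record, picked out of the list
// that a query like getRecords would return.
function ongoingTimeMs(records: TimeRecord[], now: Date): number {
  const ongoing = records.find((r) => r.end === null);
  return ongoing ? now.getTime() - ongoing.begin.getTime() : 0;
}

// The displayed total is the sum of the two parts.
function totalElapsedMs(
  finishedMs: number,
  records: TimeRecord[],
  now: Date
): number {
  return finishedMs + ongoingTimeMs(records, now);
}
```

Note that the total depends on two inputs that come from two different queries, which is what the rest of this post is about.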

The problem

For now, the creation (or the start) and the finishing (or the end) of a record are both triggered by a single Start/Stop button. When it is first clicked (displaying the "Start" label) a new record is created, and when it is clicked again (now displaying the "Stop" label) that same record is updated with an end date. When either of those actions is performed, I want to use Blitz's invalidateQuery on both the getRecords and getTotalTime queries, so both the list of records and the total elapsed time are updated. This worked, but as I went testing I noticed that sometimes (actually it was every time) the total elapsed time blinked right after a record was stopped! A console.log showed that it was actually being briefly rendered with an unexpected value. After several hours I found an explanation for this unexpected value (or should I say values): sometimes the total elapsed time would include the time of the just-ended record twice, showing a higher value than expected, and sometimes it would not include the time of the just-ended record at all, showing a lower value than expected. But why?

Why it occurred

After another several hours, on the next day, I finally realized what was happening. When the elapsed time showed a higher value, it was because the cache for the getTotalTime query was invalidated first, triggering a refetch of that query with the just-finished record accounted for, while the cache for getRecords hadn't been invalidated yet and still held that record as ongoing, so the same record's time was counted twice. Similarly, when the elapsed time showed a lower value, it was because the cache for the getRecords query was invalidated first, triggering a refetch in which the just-finished record was no longer ongoing. That made the second part of the total elapsed time (the part that counts the elapsed time of the ongoing record) zero, while the cached getTotalTime result still did not include the just-finished record.
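The two orderings can be reproduced with a small simulation. This is only a sketch with made-up numbers (times are plain minutes, and the Cache shape and render function are hypothetical), not how Blitz or react-query store their data.

```typescript
// Times are plain minutes to keep the arithmetic easy to follow.
interface TimeRecord {
  begin: number;
  end: number | null; // null while ongoing
}

// Hypothetical view of the two cached query results.
interface Cache {
  totalTime: number; // cached getTotalTime result (finished records only)
  records: TimeRecord[]; // cached getRecords result
}

// What the component would display for a given cache state.
function render(cache: Cache, now: number): number {
  const ongoing = cache.records.find((r) => r.end === null);
  return cache.totalTime + (ongoing ? now - ongoing.begin : 0);
}

const now = 75;

// Cache right before "Stop" is handled: one 30-minute finished record,
// plus a record started at minute 60 that is still ongoing.
const stale: Cache = {
  totalTime: 30,
  records: [
    { begin: 0, end: 30 },
    { begin: 60, end: null },
  ],
};

// What both queries return once refetched after the stop at minute 75.
const fresh: Cache = {
  totalTime: 45,
  records: [
    { begin: 0, end: 30 },
    { begin: 60, end: 75 },
  ],
};

const beforeStop = render(stale, now); // 30 + 15 = 45
// getTotalTime refetched first: the stopped record is counted twice.
const totalTimeFirst = render({ totalTime: fresh.totalTime, records: stale.records }, now); // 45 + 15 = 60
// getRecords refetched first: the stopped record is not counted at all.
const recordsFirst = render({ totalTime: stale.totalTime, records: fresh.records }, now); // 30 + 0 = 30
const settled = render(fresh, now); // 45 + 0 = 45

console.log({ beforeStop, totalTimeFirst, recordsFirst, settled });
```

The expected value is 45 both before and after the stop; the 60 and the 30 only exist while one query result is fresh and the other is still stale.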

The cause

The problem, then, was that I needed a way to invalidate both caches at the same time so this brief inconsistent state would disappear. I went through a lot of the documentation of both Blitz and react-query (which is what Blitz uses under the hood) and, with the help of the Blitz community on Discord, I came across queryClient.invalidateQueries. With that I could invalidate both queries in a single call. I replaced both invalidateQuery calls with the single queryClient.invalidateQueries and did a test. And then, just like that, the problem was still happening! Why??

The real cause

A day later I had a new theory: it doesn't matter if I invalidate both queries at the same time! Maybe React was already doing it for me even when I was calling invalidateQuery for each query. The problem really was that, after both queries were invalidated, both were refetched in the background, independently of one another, and each refetch would trigger a new render. This new theory made me realize that the real problem was that I had a piece of information (the total elapsed time) that depended on two separate, independent queries. And it even made me spot a bug that could have taken me a while to notice: the getRecords query was paginated, so it might not always return the ongoing record, which would result in the total elapsed time being calculated wrong.

The solution

So, after some consideration I came to the conclusion that, instead of trying to change the behaviour of the framework and library at play, the best way to fix this problem was to rethink my queries and make them return all the information needed for the purpose they were created for. So what I finally did was change the getTotalTime query to also return the ongoing record, so that the total elapsed time needs only this one query to be calculated: a simple solution that made much more sense.
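A minimal sketch of the idea, assuming a hypothetical result shape (the actual fields returned by my query may differ): because the finished time and the ongoing record arrive in the same response, every render sees one consistent snapshot.

```typescript
// Hypothetical result shape for the reworked query; names are assumptions.
interface OngoingRecord {
  begin: Date;
}

interface FinishedTimeAndOngoingRecord {
  finishedMs: number; // sum of the durations of all finished records, in ms
  ongoing: OngoingRecord | null; // null when no record is running
}

// The total now depends on a single query result, so there is no
// intermediate state where one half is fresh and the other is stale.
function totalElapsedMs(
  result: FinishedTimeAndOngoingRecord,
  now: Date
): number {
  const ongoingMs = result.ongoing
    ? now.getTime() - result.ongoing.begin.getTime()
    : 0;
  return result.finishedMs + ongoingMs;
}
```

It also sidesteps the pagination bug, since the ongoing record no longer has to be found on whatever page of getRecords happens to be loaded.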

Oh, I also renamed getTotalTime to getFinishedRecordsTimeAndOngoingRecord. Yes, naming things really is hard 😅.