Obsidian 4.0.0 Technical Brief

Server-side cache invalidation added to Deno’s first GraphQL caching solution

Yogi Paturu
JavaScript in Plain English


Introduction

Obsidian is Deno’s first GraphQL caching solution, providing client-side caching for React components using a browser cache and server-side caching for Oak Routers using a Redis cache.

Performant Caching in Mind

Since GraphQL responses can be arbitrarily large and nested, there is potential for these API calls to be expensive. Introducing a GraphQL caching strategy to the system’s architecture provides a cheaper alternative to making a GraphQL call: a cache check.

Request Time = Cache Check × (Hit Rate) + Service Call Time × (1 − Hit Rate) + Cache Write × (1 − Hit Rate)

When there is a cache hit, the request time is equivalent to the cache check time.

When there is a cache miss, the request time is equivalent to the GraphQL call time and cache write time.
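Plugging hypothetical numbers into the formula makes the payoff concrete (the latencies below are illustrative assumptions, not measured Obsidian figures):

```typescript
// Illustrative latencies (assumptions, not measurements).
const cacheCheckMs = 2; // Redis read
const serviceCallMs = 120; // GraphQL service call
const cacheWriteMs = 3; // Redis write
const hitRate = 0.8;

// Expected request time per the formula above: a hit costs only the cache
// check; a miss costs the service call plus the cache write.
const expectedMs =
  cacheCheckMs * hitRate + (serviceCallMs + cacheWriteMs) * (1 - hitRate);

console.log(expectedMs.toFixed(1)); // expected time per request, in ms
```

With these numbers, an 80% hit rate turns a 120 ms service call into roughly a 26 ms expected request time, which is why maximizing the hit rate matters so much.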

In order to have a performant cache, Obsidian’s priorities are to:
1) Minimize making expensive GraphQL service calls
2) Maximize the cache hit rate

Caching Trade-offs

Managing both cache space and cache consistency is the most challenging aspect of caching, as prioritizing one over the other has trade-offs. Obsidian 4.0.0 introduces a robust cache invalidation strategy and a complementary cache eviction strategy for server-side caching.

Cache invalidation

Suppose a query for all sci-fi movies in a database is made, is not in cache, and is added to cache. A subsequent mutation (i.e., create, update, or delete) to one of those sci-fi movies makes the entire cached response stale.

Mutations introduce a challenge: if the cache is static, not only is it impossible to know which entries are stale, but with the right sequence of mutations the entire cache can become stale.

Cache eviction

When the cache runs out of memory, a decision must be made about what to evict to make space. The two strategies explored were the Least Recently Used (LRU) and Least Frequently Used (LFU) policies. While both are trivial to enable with built-in Redis options and Obsidian Router is unopinionated about which strategy the user chooses, a default had to be selected.

Obsidian 4.0.0: Implementation and Rationale

There are only two hard things in Computer Science: cache invalidation and naming things — Phil Karlton

The priorities for cache invalidation and eviction were to first minimize making expensive service calls and to second maximize the hit rate. Engineering decisions were made through the lens of these priorities.

Cache invalidation

Rather than storing static GraphQL responses, Obsidian transforms them into references before caching. On a read, the cached value is detransformed and then returned. If the cache write and cache read are made more expensive, what is the compensatory trade-off?

This allows users to avoid making expensive GraphQL calls when mutations occur. If there is a mutation to only one element of a stored response, there is no need to invalidate the entire response. Instead, only that one element is invalidated. Since responses are transformed into references, the detransformation process returns the accurate response.

In order to make the GraphQL response caching dynamic, a normalized cache was implemented. GraphQL responses are normalized and each component is added into the cache with a reference name. These references replace corresponding values in the GraphQL response before caching.

Mutation queries and their responses are never cached, but they are always run. A query’s GraphQL Abstract Syntax Tree is checked to identify whether it is a mutation. If the mutation’s response is found in the cache, Obsidian deduces that it is a delete mutation and deletes that reference from the cache. In all other cases (create or update mutations), the reference is written to the cache.
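As a simplified sketch of the first step (a production implementation would parse the query into a GraphQL AST, e.g. with graphql-js, and inspect the operation definitions; the helper below only checks the leading operation keyword):

```typescript
// Simplified sketch: checks only the leading operation keyword rather than
// walking a parsed AST. Anonymous shorthand queries start with "{" and are
// read operations, so they correctly fall through to false here.
function isMutation(query: string): boolean {
  return query.trim().startsWith("mutation");
}
```

This is enough to route cache-skipping behavior for mutations while letting read queries proceed to the cache-check path.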

Cache eviction

LFU is a great candidate for most eviction strategies: it keeps the pieces of cache with a higher likelihood of a hit and evicts those with a higher likelihood of a miss. Because LFU optimizes for hit rate and is easy to enable in Redis, it was the starting assumption. Obsidian 4.0.0 nevertheless defaults to LRU because it better complements the new cache invalidation strategy.

Since references will be read more frequently than the cached GraphQL responses themselves, an LFU policy would tend to evict the responses. LRU is the default to prevent that, as the responses are accessed less frequently than their reference components. While the user has the ability to change this default setting, it is strongly encouraged that it remains set to LRU.
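Both policies are exposed through standard Redis configuration; something like the following switches between them (shown for illustration only; the commands require a running Redis instance):

```shell
# Cap the cache size, then choose an eviction policy.
redis-cli CONFIG SET maxmemory 100mb

# Obsidian's default: evict the least recently used keys.
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# The LFU alternative discussed above:
# redis-cli CONFIG SET maxmemory-policy allkeys-lfu
```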

Restructured server-side caching logic

Figure 1: Overall server-side cache logic flow diagram

There are five scenarios:
1) Query is cached, and no references were evicted or deleted
2) Query is cached, and at least one reference was evicted or deleted
3) Query is not cached, and it’s a read query
4) Query is not cached, and it’s a delete mutation
5) Query is not cached, and it’s a create or update mutation

The best-case scenario is one in which the query is cached and no references were evicted. This saves the user from making a GraphQL call and replaces the operation with a cheap cache read. All other scenarios require a GraphQL call.

Figure 2: Redesigned function flow of server-side caching logic which incorporates cache invalidation

The server-side caching logic was redesigned to incorporate cache invalidation. Initially, the design involved two independent paths: one for a cache hit and one for a cache miss. After further consideration, it became evident that an evicted or deleted reference would require the query to be run. Since the alternative would be to return an incomplete response, this trade-off of making the GraphQL call was worthwhile.

In the case of a cache hit where all references are found in the cache, the detransformation process dynamically rebuilds the expected GraphQL response from cached references.

In the case of a cache hit and at least one reference is not found in the cache (i.e., it was evicted or deleted), the GraphQL service call is made, its response is attached to Obsidian’s response body, and the reference values are rewritten to cache.

In the event of a cache miss, the GraphQL service call is made and its response is attached to Obsidian’s response body. Subsequent operations are performed for caching: the GraphQL response is normalized, and separate operations are performed for a read query, a delete mutation, and all else.

In the case of a read query, the GraphQL response is transformed into an object of references and cached with its query string as the Redis hash. Furthermore, each reference object is cached with its reference string as the Redis hash.
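A minimal sketch of this write path, using an in-memory `Map` as a stand-in for the Redis client (the function and variable names here are hypothetical, not Obsidian's actual API):

```typescript
type Entity = Record<string, unknown>;

// In-memory stand-in for Redis, to keep the sketch self-contained.
const cache = new Map<string, string>();

// Cache a read query: the transformed response is stored under the query
// string, and each normalized entity under its reference string.
function cacheReadQuery(
  queryString: string,
  transformedResponse: unknown,
  entities: Record<string, Entity>,
): void {
  cache.set(queryString, JSON.stringify(transformedResponse));
  for (const [ref, entity] of Object.entries(entities)) {
    cache.set(ref, JSON.stringify(entity));
  }
}
```

Because the entities are keyed independently of the query, a later mutation can overwrite or delete a single reference without touching the cached query entry.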

In the case of a delete mutation, the references are extracted from the GraphQL response and invalidated by deleting them from the cache.

In all other cases, whether a create mutation or an update mutation, each reference object is cached with its reference string as the Redis hash. If the reference already exists, it is overwritten with the updated value.

Normalization, Transformation, and Detransformation

To illustrate, the following GraphQL response will go through the core functionalities to prepare for caching.

Figure 3: GraphQL response to be processed

The backbone of the new server-side caching logic is the recursive normalization algorithm. An arbitrarily nested GraphQL response is flattened into an object of unique reference string keys and their corresponding objects in linear time.

These reference strings are built from unique identifiers, which default to the array of “id” and “__typename”. An object is considered hashable if it contains all of the unique identifiers as properties. A hashable object’s fields are cut and added to the normalized object up to the point of any nesting, and nesting is traversed through recursive calls to the function.

Figure 4: Normalized GraphQL response with references and corresponding values
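A hedged sketch of such a recursive normalization follows. The `~<id>~<typename>` reference format matches the `~7~Movie` example discussed later in this brief; everything else (names, structure) is illustrative rather than Obsidian's actual implementation:

```typescript
type Entity = Record<string, unknown>;

// Default unique identifiers; an object is "hashable" if it has all of them.
const UNIQUE_IDS = ["id", "__typename"];

function isHashable(obj: Entity): boolean {
  return UNIQUE_IDS.every((key) => key in obj);
}

function makeRef(obj: Entity): string {
  return `~${obj["id"]}~${obj["__typename"]}`;
}

// Flatten an arbitrarily nested response into { refString: entity } in one
// linear pass; nested hashable objects are replaced by their reference.
function normalize(
  data: unknown,
  table: Record<string, Entity> = {},
): Record<string, Entity> {
  if (Array.isArray(data)) {
    for (const item of data) normalize(item, table);
  } else if (data !== null && typeof data === "object") {
    const obj = data as Entity;
    const flat: Entity = {};
    for (const [key, value] of Object.entries(obj)) {
      if (Array.isArray(value)) {
        normalize(value, table);
        flat[key] = value.map((v) =>
          v !== null && typeof v === "object" && isHashable(v as Entity)
            ? makeRef(v as Entity)
            : v
        );
      } else if (value !== null && typeof value === "object") {
        normalize(value, table);
        flat[key] = isHashable(value as Entity)
          ? makeRef(value as Entity)
          : value;
      } else {
        flat[key] = value; // scalar field: keep as-is
      }
    }
    if (isHashable(obj)) table[makeRef(obj)] = flat;
  }
  return table;
}
```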

In the transformation process, objects are replaced with references. Nesting is traversed through recursive calls. Each object is checked if it is hashable or not in the same way it is determined in the normalization algorithm. If hashable, it is replaced with its reference. Detransformation undoes this process.

Figure 5: Transformed dynamic GraphQL response object
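Detransformation can then be sketched as the inverse lookup (again hypothetical code; it assumes reference strings are the only cached string values that start with `~`):

```typescript
type CachedEntity = Record<string, unknown>;

// Rebuild a response by recursively expanding "~<id>~<typename>" references
// from the cache table. Returning undefined signals a missing (evicted or
// deleted) reference, which forces a fresh GraphQL call.
function detransform(
  value: unknown,
  table: Record<string, CachedEntity>,
): unknown {
  if (typeof value === "string" && value.startsWith("~")) {
    const entity = table[value];
    return entity === undefined ? undefined : detransform(entity, table);
  }
  if (Array.isArray(value)) {
    return value.map((v) => detransform(v, table));
  }
  if (value !== null && typeof value === "object") {
    const out: CachedEntity = {};
    for (const [k, v] of Object.entries(value as CachedEntity)) {
      out[k] = detransform(v, table);
    }
    return out;
  }
  return value;
}
```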

Conditional Cache Invalidation Techniques

GraphQL query strings are converted into Abstract Syntax Trees to check if the operation is of mutation type. If so, the mutation is called and the GraphQL response is normalized. The kind of mutation is inferred by reading references from Redis.

If Redis returns a null value, a create mutation is inferred. The new reference is written to cache.

If the Redis value and the GraphQL response are deeply equal to each other, a delete mutation is inferred. The reference is deleted from the cache.

If the equality is false, an update mutation is inferred. The reference is overwritten with the updated value.
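The three inferences can be sketched together (a hypothetical helper, not Obsidian's actual code; deep equality is approximated here with `JSON.stringify`, which is key-order-sensitive, for brevity):

```typescript
type MutationKind = "create" | "delete" | "update";

// Infer the mutation kind from what Redis returned for the reference:
// null -> create; deep-equal -> delete; otherwise -> update.
function inferMutationKind(
  cached: Record<string, unknown> | null,
  responseEntity: Record<string, unknown>,
): MutationKind {
  if (cached === null) return "create";
  if (JSON.stringify(cached) === JSON.stringify(responseEntity)) {
    return "delete";
  }
  return "update";
}
```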

Test-Driven Development

To ensure reliability, a test-driven development approach was taken. Each of the major pieces of the functionality described above (normalization, transformation, detransformation, and cache invalidation) had to pass predefined tests in order to be merged and released.

Considerations for Future Features and Improvements

Horizontally Scaling the Cache by Sharding

The rationale for choosing the LRU eviction strategy was to prevent valuable GraphQL responses from being evicted. Having a sharded cache in which one shard holds GraphQL responses and another holds references would be a major cache eviction improvement. This would not only allow each cache to have its own eviction strategy but would prevent the two categories of cache from interfering with each other. In fact, an LFU policy per shard would optimize the hit rate for that shard while making comparisons between more comparable pieces of information.

Invalidate GraphQL Responses in Cache: Avoid Under-responding After Add Mutations

There is currently no way to invalidate cached GraphQL responses; only references are invalidated. If a read query is followed by an add mutation, it is possible for the same read query to respond from cache with incomplete data, because the original query’s transformed cached value will not include the newly added reference. Although a GraphQL call is avoided, Obsidian will under-respond.

Invalidate GraphQL Responses in Cache: Avoid Over-responding

Although over-responding is not as problematic as under-responding, it is worth noting. If a query is cached and a subsequent query requests fewer fields than the first, the transformation process tricks Obsidian into thinking the two responses should be the same. This is because Obsidian has no way to differentiate a “~7~Movie” reference that includes a “Release year” property from one that does not. The cached version is always the one with more fields.

While invalidating responses can resolve this problem, other methods should first be explored before invalidating expensive GraphQL responses from the cache. It may be better for the user simply to be aware of this nuance.

Parallelize Obsidian’s Response and Writing to Cache

GraphQL’s response is attached to Obsidian’s response when it is received. Caching is performed afterward. Because they are not dependent on each other, these paths can be parallelized. Doing so would introduce performance benefits.

Selectively Query for Evicted or Deleted References

If any references are missing in Redis due to eviction or deletion, the whole GraphQL call is made. Instead, a less expensive call for just the missing reference would also introduce performance benefits.

If more than one key is missing from Redis, grouping the queries that return the missing data and sending them as one query would reduce the number of network requests made to the GraphQL service.
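As a starting point for this future feature, identifying which references are missing is straightforward (sketch with a `Map` standing in for Redis):

```typescript
// Collect the reference strings whose entities are no longer in cache, so a
// narrower follow-up query could fetch only those entities instead of
// re-running the whole GraphQL call.
function missingRefs(refs: string[], cache: Map<string, string>): string[] {
  return refs.filter((ref) => !cache.has(ref));
}
```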

Automated End-to-end Testing

While end-to-end tests were performed manually, automating them would improve Obsidian’s internal development experience. At a minimum, one test for each of the five scenarios would be beneficial, as the development team needs these tests to ensure Obsidian behaves as expected.

Additional tests where different combinations of queries and mutations are made would also improve future lead times.

Getting Started

Try out Obsidian’s exciting new features with our easy-to-use demo. Read our documentation to start incorporating Obsidian into your own app. Here is our companion launch article.

Obsidian is an open-source product produced under the tech accelerator Open Source Lab. We welcome contributions and feedback via GitHub.

Other articles on Obsidian:

Obsidian 1.0
Obsidian 2.0
Obsidian 3.0
Obsidian 3.1
Obsidian 3.2
Obsidian 4.0
Obsidian 4.0 Technical Brief

Co-authored by Team Obsidian:
Sardor Akhmedov | GitHub
Michael Chin | GitHub | LinkedIn
Dana Flury | GitHub
Yogi Paturu | GitHub | LinkedIn

