How we improved third-party availability and latency with Nginx in Kubernetes

Introducing a gateway that caches your third-party API calls can significantly improve their performance and stability. Discover how we configured Nginx in a Kubernetes environment for this purpose.

Grégoire Deveaux
Back Market Blog

--

Nginx as a gateway to cache third-party API accesses

Third-party dependencies

Tech companies rely more and more on third-party services to handle parts of their application stack. Relying on external services is a way to scale faster and let internal developers concentrate on what’s specific to your business. But having part of your software out of your control can lead to availability and latency degradation.

At Back Market, we have externalized part of our product catalog to a third party: Akeneo. My team needed to ensure that this catalog data could be accessed from inside our Kubernetes clusters (spread across multiple continents) reliably and quickly, independently of our third-party servers' availability and response time.

We introduced a gateway proxy that caches all internal usages of this third-party API. We started with a simple yet powerful implementation for this new service: Nginx.

Nginx as a gateway to cache third-party API accesses in a multi-cluster environment

Results

Using a gateway that caches third-party responses works well for our use case. After a few days of populating the cache, only 1% of read requests from internal services had to wait for the third party to get their response. This is the distribution of response cache statuses for requests from services using the gateway over one week:

HIT: valid response found in the cache → cache used
STALE: expired response found in the cache → cache used, calls third-party in background
UPDATING: expired response found in the cache (already updating in the background) → cache used
MISS: no response found in the cache → calls third-party synchronously
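
These statuses are exposed by Nginx in the $upstream_cache_status variable. As a minimal sketch (not necessarily our exact logging setup), their distribution can be observed by adding that variable to the access log format:

# Expose the cache status (HIT, STALE, UPDATING, MISS, ...) in the access logs
log_format cache_status '$remote_addr [$time_local] "$request" '
                        '$status $upstream_cache_status $request_time';
access_log /dev/stdout cache_status;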

Even during a 12-hour outage of the third party, the gateway was able to return a cached response for 96% of the requests, resulting in almost no impact on end users.

Going through the internal gateway returns responses far faster than calling the third-party API directly (the third party is located in Europe, so calls from the US take even longer):

P90 of request duration in ms for a cached path (log scale, 1e3 is 1 second)

Let’s see how we configured and deployed Nginx for this usage!

Nginx cache configuration

Below, we provide a simplified extract of the Nginx configuration (see doc, guide and a more comprehensive example configuration):

proxy_cache_path ... max_size=1g inactive=1w;
proxy_ignore_headers Cache-Control Expires Set-Cookie;
proxy_cache_valid 1m;
proxy_cache_use_stale error timeout updating
                      http_500 http_502 http_503 http_504;
proxy_cache_background_update on;

The goal here is to minimize the dependency on the third-party for read requests (i.e. HTTP GET).

If the response is not in the cache, we have to wait for the third party to respond, which is exactly what we want to avoid as much as possible. This happens for a given URL if it has never been requested, if its cached response was purged after a week without access (inactive=1w), or if it was evicted as least recently used once the global cache size limit of 1 GB (max_size=1g) was reached.

If the response is in the cache, the cached response is always returned directly to the client (proxy_cache_background_update on), even if it is older than one minute. If the cached response is older than one minute (proxy_cache_valid 1m), a background call to the third party is triggered to refresh the cache. This means we may serve stale cache content, but eventual consistency is acceptable in our context.

This means we can consider the cache to have a TTL of 1 minute (plus the duration of a background refresh) as long as the third party is up and the URLs are frequently used. This is more than enough for product data, which usually doesn't change every day.

Provided the global cache size stays under the 1 GB limit, it also means we can serve cached responses up to a week old if our third party is unreachable or returning errors for a week (hopefully that will be enough!). Note that a cached response can also be lost if its URL is not requested at all for an entire week.
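
For context, here is a sketch of how these directives fit together in a complete proxy configuration; the cache zone name, paths, and upstream host below are illustrative assumptions, not our actual values:

# Sketch only: zone name, paths, and upstream are illustrative.
# proxy_cache_path lives in the http context (e.g. a conf.d file).
proxy_cache_path /var/cache/nginx/thirdparty
                 keys_zone=thirdparty:10m
                 max_size=1g inactive=1w;

server {
    listen 8080;

    location / {
        proxy_pass https://thirdparty.example.com;
        proxy_cache thirdparty;
        proxy_cache_valid 1m;
        proxy_cache_use_stale error timeout updating
                              http_500 http_502 http_503 http_504;
        proxy_cache_background_update on;
        proxy_ignore_headers Cache-Control Expires Set-Cookie;
    }
}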

To further minimize the load on the third-party, parallel background refreshes for the same URL are disabled:

proxy_cache_lock on;

The third-party API may return self-referencing absolute links in its responses (e.g. pagination links). These URLs must be rewritten to ensure these links point to the gateway:

sub_filter 'https:\/\/$proxy_host' '$scheme://$http_host';
sub_filter_last_modified on;
sub_filter_once off;
sub_filter_types application/json;

Rewriting URLs comes at the cost of disabling gzipped responses, because sub_filter does not support them (see ticket):

# Required because sub_filter is incompatible with gzip responses:
proxy_set_header Accept-Encoding "";

Going back to the configuration outlined at the beginning of this section, you will note we've enabled proxy_cache_background_update. As its name suggests, that flag enables background updates of the cache, which seems like a good idea. Unfortunately, it has a limitation.

When a client request triggers a background cache update (due to a STALE cache status), a response from the cache is returned without waiting for the background update to complete (thanks to proxy_cache_use_stale updating). But the following requests on the same client connection are put on hold by Nginx and have to wait for the background update response before being handled (see ticket)! The following line ensures we get one client connection per request, so that all requests can receive responses from the stale cache and never end up waiting for a background update to finish:

# Required to ensure no request waits for background cache updates:
keepalive_timeout 0;

The downside is that clients need to create a new connection for each request. In our case, this cost is far lower than having some clients randomly wait for a cache refresh, and the behavior is far more predictable.

Kubernetes deployment

The above Nginx configuration is packaged with the Nginx unprivileged Docker image and deployed like any other web application to our Kubernetes clusters. The hard-coded values seen in the Nginx configuration extract are actually injected through the Nginx Docker image's environment variable substitution (for more on this, see "Using environment variables in nginx configuration" in the Nginx Docker image documentation).
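
For example, the image renders *.template files from /etc/nginx/templates with envsubst when the container starts. A minimal sketch, where THIRDPARTY_HOST is a hypothetical environment variable name, not necessarily one we use:

# /etc/nginx/templates/gateway.conf.template
# Rendered to /etc/nginx/conf.d/gateway.conf at container startup.
server {
    listen 8080;

    location / {
        # ${THIRDPARTY_HOST} is a hypothetical variable substituted by envsubst
        proxy_pass https://${THIRDPARTY_HOST};
    }
}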

The gateway is accessed via a Kubernetes Service and is deployed in each cluster as a Deployment replicated across multiple Pods, which are volatile by nature.

This is problematic for the Nginx cache, which relies on the local filesystem for persistence!

Persistent cache with volatile pods

As you can see in the configuration extract above, we use a very long cache retention with a much shorter cache validity to guarantee fresh data as long as the third-party is available while being able to keep serving older data if the third-party is down or is returning errors.

We needed to make sure we didn't lose our cache and start from scratch on every Kubernetes pod start (due to rolling updates, automatic scaling, or re-assignment of pods to other nodes). That's why we introduced persistent storage for our cache, shared between all Nginx instances.

This is done by synchronizing the pod's local cache directory with an S3 bucket. Two containers are deployed on each pod alongside the Nginx container. All containers share a common emptyDir volume mounted at /mnt/cache. Both additional containers use the AWS CLI Docker image and rely on our internal Vault for AWS credentials.

Persistent cache for volatile Nginx pods in a Kubernetes environment

An init container is started on each pod before Nginx starts; it is responsible for fetching the saved cache data from the S3 bucket into local storage:

aws s3 sync s3://thirdparty-gateway-cache /mnt/cache/complete

A sidecar container lives alongside the Nginx container. It is responsible for saving cache data from the local storage to the S3 bucket (simplified extract):

while true
do
sleep 600
aws s3 sync /mnt/cache/complete s3://thirdparty-gateway-cache
done
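
Putting the pieces together, the pod template looks roughly like the following sketch. Names, images, and replica counts are illustrative assumptions, and the Vault-based injection of AWS credentials is omitted for brevity:

# Simplified, illustrative sketch of the gateway Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thirdparty-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: thirdparty-gateway
  template:
    metadata:
      labels:
        app: thirdparty-gateway
    spec:
      volumes:
        # Local cache shared by all containers of the pod
        - name: cache
          emptyDir: {}
      initContainers:
        # Restores the cache from S3 before Nginx starts
        - name: cache-restore
          image: amazon/aws-cli
          command: ["aws", "s3", "sync", "s3://thirdparty-gateway-cache", "/mnt/cache/complete"]
          volumeMounts:
            - name: cache
              mountPath: /mnt/cache
      containers:
        - name: nginx
          image: nginxinc/nginx-unprivileged
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: cache
              mountPath: /mnt/cache
        # Periodically saves the cache back to S3
        - name: cache-backup
          image: amazon/aws-cli
          command: ["/bin/sh", "-c", "while true; do sleep 600; aws s3 sync /mnt/cache/complete s3://thirdparty-gateway-cache; done"]
          volumeMounts:
            - name: cache
              mountPath: /mnt/cache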

To avoid uploading partially written cache entries to our bucket, Nginx has a neat use_temp_path option:

proxy_cache_path /mnt/cache/complete ... use_temp_path=on;
proxy_temp_path /mnt/cache/tmp;

By default, aws s3 sync never deletes anything from the bucket, so we also had to configure a bucket lifecycle policy to prevent it from growing indefinitely (see doc):

<LifecycleConfiguration>
  <Rule>
    <ID>delete-old-entries</ID>
    <Status>Enabled</Status>
    <Expiration>
      <Days>8</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
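
For reference, the same rule can be applied with the AWS CLI, which expects the JSON form of the configuration. This is a sketch (bucket name as above), not necessarily how we apply it ourselves:

aws s3api put-bucket-lifecycle-configuration \
  --bucket thirdparty-gateway-cache \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "delete-old-entries",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Expiration": {"Days": 8}
      }
    ]
  }'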

Limitations

This solution is a good fit when eventual consistency is fine and when your traffic is read-intensive. It adds no value for rarely hit endpoints, nor for write requests (POST, DELETE, …).

Due to the pure proxy approach, this won't allow you to introduce any abstraction or customization on top of your third party. As a matter of fact, our team will introduce a new service for this purpose, which will become the main client of this gateway.

Unless some kind of client service authentication (via service mesh headers, for example) is used as part of the cache key, cached results are shared between all client services. This is good for performance but could be problematic if your internal services require different levels of access to your third-party data. In our case, this is not a problem: product data is fairly public, and the "authentication sharing" induced by the cache only affects read requests.
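
If you do need that kind of isolation, one option (a hypothetical sketch, not something we use) is to make a client-identifying header part of the cache key:

# X-Client-Id is a hypothetical header set by the client service or the service mesh;
# the rest of the key reproduces Nginx's default cache key.
proxy_cache_key "$scheme$proxy_host$request_uri$http_x_client_id";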

On the security side, you also need to take into account that anybody gaining access to the bucket will be able to read, and potentially tamper with, the responses served by the gateway. So don't forget to make your bucket private and ensure only the required permissions are granted.

A centralized persistent cache storage, as described above, leads to an eventually shared cache (that is, the cache is shared through the S3 bucket and replicated to every pod on the next rollout). That being said, it's not how Nginx recommends implementing a highly available shared cache. We may try to implement the recommended primary/secondary architecture at some point, but it's not trivial in a Kubernetes environment and would probably require creating a second service.

Conclusion

As we've seen, this solution provides big performance and stability improvements for quite a small effort, building on solid building blocks you may already be using in your infrastructure (Nginx, Kubernetes, and AWS S3).

Please let me know what you think of this design, and don’t hesitate to ask if you want more details on a specific part.

But wait, how do you test such a service? Well, that’s for another story 😃

If you want to join our Bureau of Technology or any other Back Market department, take a look here, we’re hiring! 🦄

--

Coding since the last millennium, I’m currently working as a backend engineer at Back Market