Edge-side image transforms: shaving 200ms off every avatar

How we replaced a Sharp worker pool with edge-rendered crops, and the three production bugs we hit on the way.

Maya Okonkwo · 8 min read

For most of Avtrz’s life, an avatar request that missed the cache went to a pool of workers running Sharp. The worker pulled the original photo, resized and cropped it, and handed it back. It worked. It was also a fleet of machines we had to run, scale, and patch so that a request could wait in a queue.

This spring we moved the transform to the edge. The crop now happens in the same edge function that serves the response, next to the user, with no worker pool behind it. It cut about 200ms off every cache miss. It also cost us three production bugs, which are the interesting part.

Why move it at all

A worker pool is latency you pay for twice: the network hop to the pool, and the time the request spends queued behind other requests when traffic spikes. Warm-up jobs spike traffic by design, so the queue was worst exactly when customers cared most.

Edge runtimes can now decode, resize, and re-encode an image inline. So the transform moved to where the request already was:

TypeScript

```typescript
// Edge handler: fetch the original, crop it inline, return the result.
// fetchOriginal and transform are helpers that are not shown here.
export default async function avatar(req: Request): Promise<Response> {
  const src = await fetchOriginal(req);
  const out = await transform(src, { size: 64, crop: "face" });
  return new Response(out, {
    headers: { "Content-Type": "image/webp" },
  });
}
```

The transform runs inside the edge response, not a worker hop away.

Moving work closer to the user is easy. Noticing the assumptions that moved with it is the hard part.

The three bugs

One: memory limits, not CPU limits

The worker pool had room to decode a large source image into memory. The edge runtime is tighter. A handful of unusually large originals decoded fine in staging and then hit the memory ceiling under real traffic. The fix was to cap the source dimensions before decoding and reject anything implausible early.
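One way to cap dimensions before decoding is to read them from the file header, which costs a few bytes rather than a full bitmap. The sketch below does this for PNG only, using the fixed layout of the IHDR chunk. The function names and both limits are hypothetical; the post does not state the real thresholds or which formats the originals use.

```typescript
// Hypothetical limits, chosen only for illustration.
const MAX_SIDE = 8192;
const MAX_PIXELS = 40_000_000;

interface Dimensions {
  width: number;
  height: number;
}

// Reads width and height from a PNG's IHDR chunk without decoding pixels.
// Returns null when the bytes do not start with a valid PNG header.
function readPngDimensions(bytes: Uint8Array): Dimensions | null {
  if (bytes.byteLength < 24) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Bytes 0..7: the PNG signature, compared as two big-endian words.
  if (view.getUint32(0) !== 0x89504e47 || view.getUint32(4) !== 0x0d0a1a0a) {
    return null;
  }
  // Bytes 12..15: the chunk type, which must be "IHDR" for the first chunk.
  if (view.getUint32(12) !== 0x49484452) return null;
  // Bytes 16..23: width and height as big-endian 32-bit integers.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// True when the image is small enough to be worth decoding.
function isWithinLimits(d: Dimensions): boolean {
  return (
    d.width > 0 &&
    d.height > 0 &&
    d.width <= MAX_SIDE &&
    d.height <= MAX_SIDE &&
    d.width * d.height <= MAX_PIXELS
  );
}
```

A handler would call `readPngDimensions` on the first bytes of the original and return an error response when `isWithinLimits` is false, so the decoder never sees the oversized input. The pixel-count check matters as much as the per-side check, because decoded memory grows with width times height.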

Two: the cache key lost a field

The worker keyed its output cache on size and output format. The first edge version keyed only on size, so a request for webp could be served a cached png. It looked fine until a client sent an Accept header we had not seen in testing. We added the format back to the cache key and set a Vary header on the response so shared caches make the same distinction.
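A minimal sketch of a format-aware key follows. The key layout, the `sourceId` field, and the function names are assumptions made for illustration, and the negotiation is deliberately simplified: it ignores q-values and knows only webp and png.

```typescript
type Format = "webp" | "png";

// Picks an output format from the Accept header value.
// Simplified: any listed image/webp wins, q-values are ignored.
function negotiateFormat(accept: string | null): Format {
  if (accept === null) return "png";
  const wantsWebp = accept
    .split(",")
    .some((part) => /^\s*image\/webp\s*(;|$)/i.test(part));
  return wantsWebp ? "webp" : "png";
}

// Builds a cache key that includes every input that changes the bytes.
// Leaving out `format` is the bug described above.
function cacheKey(sourceId: string, size: number, format: Format): string {
  return `avatar:${sourceId}:${size}:${format}`;
}

// Response headers that tell downstream caches the body depends on Accept.
function responseHeaders(format: Format): Record<string, string> {
  return {
    "Content-Type": `image/${format}`,
    Vary: "Accept",
  };
}
```

With this shape, two requests for the same avatar and size but different negotiated formats map to different keys, so a cached png cannot answer a webp request.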

Three: cold starts in low-traffic regions

Popular regions stayed warm and fast. A handful of low-traffic regions cold-started the transform code on the first request and looked slower than the old worker pool there. We now keep the function warm with a low-rate health ping per region. Boring, and it worked.
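The warm-up ping can be as small as the sketch below. It assumes each region is reachable at its own hostname and exposes a `/healthz` path, neither of which comes from the post, and on an anycast platform the ping would instead have to originate inside each region. The `ping` function is injected so the sketch does not depend on any particular runtime's `fetch`.

```typescript
// A function that requests a URL and reports whether it succeeded.
type Ping = (url: string) => Promise<{ ok: boolean }>;

// Builds one health-check URL per region.
// The hostname scheme and the /healthz path are hypothetical.
function warmupUrls(regions: readonly string[], host: string): string[] {
  return regions.map((region) => `https://${region}.${host}/healthz`);
}

// Pings each URL in turn and returns the ones that failed.
// Sequential on purpose: the goal is a low, steady rate, not speed.
async function pingAll(urls: readonly string[], ping: Ping): Promise<string[]> {
  const failed: string[] = [];
  for (const url of urls) {
    try {
      const res = await ping(url);
      if (!res.ok) failed.push(url);
    } catch {
      // A network error counts as a failed ping for this region.
      failed.push(url);
    }
  }
  return failed;
}
```

A scheduler would call `pingAll(warmupUrls(regions, host), (url) => fetch(url))` on a fixed interval and alert on any returned URLs. The interval has to be shorter than the platform's idle eviction time, which varies by provider.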

Where it landed

−200ms per cache miss
0 worker machines
3 bugs, all in this post

The win was real and so were the bugs. Every one of them came from an assumption that was true in the worker pool and silently false at the edge: more memory, a forgiving cache key, always-warm code. Moving work closer to the user is easy. The job is finding the assumptions that did not make the trip.

Written by Maya Okonkwo
Works on storage and the edge at Avtrz. Believes most infrastructure problems are really cost problems wearing a hat.