Incident anecdote: The one with the weird A/B test

October 20, 2021 — 2 minutes to read — Tags: incidents, site reliability

I appreciate reading stories of how complex software systems fail, and the hard-earned lessons that make them more resilient.

Here is another of my personal favorite incident anecdotes. It sticks out because it helped broaden my thinking about appropriate uses for A/B tests.

Key lesson

A/B tests can be useful to verify that a software system works correctly—they’re not just for testing the user experience.

Setting the scene

My team had spent about 8 months rewriting our e-commerce frontend from dynamic, backend-rendered HTML to a statically rendered frontend (called SSG, for static site generation). The main goals of the project were to make our site more scalable (by reducing the need for backend rendering and DB queries) and to reduce latency.

We began QA’ing the new SSG version of Glossier behind a feature flag with fancy Cloudflare routing config.

In order to quantify the revenue impact of the project, leadership requested we do an A/B test on the conversion rate.

The team and I were initially reluctant, since an A/B test for this particular infra migration required one-off support in our Cloudflare Workers. We hadn’t planned to A/B test SSG because it wasn’t an optional feature — we needed SSG for our Black Friday traffic.
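To illustrate what that kind of one-off routing support involves, here is a minimal Python sketch of deterministic traffic splitting. It is not the Worker code from this project: the function name `assign_variant`, the experiment label `ssg-rollout`, and the 50/50 split are all assumptions made for the example. The idea is that hashing a stable visitor ID keeps each visitor in the same arm on every request.

```python
import hashlib


def assign_variant(visitor_id: str,
                   experiment: str = "ssg-rollout",  # hypothetical label
                   treatment_share: float = 0.5) -> str:
    """Deterministically map a visitor to 'control' or 'treatment'."""
    # Hash the experiment name together with the visitor ID so that
    # different experiments split the same visitors independently.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).digest()
    # Reduce the first 8 bytes to a bucket in the range 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "treatment" if bucket < treatment_share * 10_000 else "control"


if __name__ == "__main__":
    for vid in ("visitor-1", "visitor-2", "visitor-3"):
        print(vid, assign_variant(vid))
```

In an edge router, the visitor ID would typically come from a cookie, and the returned arm would decide which origin serves the request.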

But it was fair to ask us to back up our aspirational claims with data. And boy were we surprised when the early A/B results showed SSG had a worse conversion rate than our slow, dynamically generated control.

We dug in and realized that almost no customers from the UK converted in our SSG treatment. That helped us pinpoint a typo in our localization code (en-UK instead of en-GB). The typo caused customers with a UK IP address to default to the US store. Confused, they’d bounce rather than change their locale in a footer widget.
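The kind of breakdown that surfaces a problem like this can be sketched in a few lines of Python. The session data below is made up purely for illustration (it is not from the real test), and `conversion_by_segment` is a hypothetical helper. The point is that an aggregate conversion rate can hide a segment where the treatment converts almost nobody.

```python
from collections import defaultdict


def conversion_by_segment(sessions):
    """Return {(variant, country): conversion_rate}.

    `sessions` is an iterable of (variant, country, converted) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [conversions, sessions]
    for variant, country, converted in sessions:
        cell = totals[(variant, country)]
        cell[0] += int(converted)
        cell[1] += 1
    return {key: conv / n for key, (conv, n) in totals.items()}


# Made-up numbers, for illustration only.
sessions = (
    [("control", "US", True)] * 3 + [("control", "US", False)] * 97
    + [("control", "GB", True)] * 3 + [("control", "GB", False)] * 97
    + [("treatment", "US", True)] * 4 + [("treatment", "US", False)] * 96
    + [("treatment", "GB", False)] * 100
)

rates = conversion_by_segment(sessions)

if __name__ == "__main__":
    for (variant, country), rate in sorted(rates.items()):
        print(f"{variant:9} {country}: {rate:.1%}")
```

With this toy data, the treatment looks slightly better in the US and drops to zero in GB, which is the sort of signal that points at a locale-specific bug rather than a general regression.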

Note that we’d certainly tested many international locales, but we’d tested them by manually changing our locale (which worked) rather than through the geo-IP lookup that’s the default for most users.
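A check that exercises the geo-IP default path might look like the sketch below. Everything in it is an assumption for illustration: the supported-locale set, the country-to-locale table, and the silent fallback to `en-US` are guesses at how such a lookup could behave, not the actual localization code. What it shows is that validating every mapped locale against the supported set catches an `en-UK` typo without any manual clicking.

```python
# Hypothetical set of locales the store supports.
SUPPORTED_LOCALES = {"en-US", "en-GB", "en-CA", "fr-FR"}
DEFAULT_LOCALE = "en-US"


def resolve_locale(country_code, country_to_locale):
    """Pick a locale from a geo-IP country code, falling back to the default."""
    candidate = country_to_locale.get(country_code, DEFAULT_LOCALE)
    # An unknown locale string silently falls back to the default store.
    return candidate if candidate in SUPPORTED_LOCALES else DEFAULT_LOCALE


def invalid_mappings(country_to_locale):
    """Return every country whose mapped locale is not supported."""
    return {country: locale
            for country, locale in country_to_locale.items()
            if locale not in SUPPORTED_LOCALES}


# Hypothetical mapping containing the typo, and the corrected version.
buggy = {"US": "en-US", "GB": "en-UK", "CA": "en-CA", "FR": "fr-FR"}
fixed = {**buggy, "GB": "en-GB"}

if __name__ == "__main__":
    print(resolve_locale("GB", buggy))  # falls back to the US store
    print(invalid_mappings(buggy))      # flags the GB entry
    print(resolve_locale("GB", fixed))
```

Running `invalid_mappings` as a unit test over the whole table covers the default path for every country, which manual locale switching does not.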

We fixed the typo, re-ran the A/B test, and sighed with relief at a modest lift in the conversion rate.

The A/B test was useful for QA! It would have been more difficult and costly to find that typo had we launched without an A/B test.

Hi, I'm Aaron Suggs. 😀👋