What a Fields Medal means to a sysadmin

arXiv under load due to Perelman's Fields Medal

Tuesday morning Grigory "Grisha" Perelman was awarded the Fields Medal for his work on (proof of) the Poincaré Conjecture, one of the million-dollar Millennium Prize Problems. The NY Times, CNN, the BBC and other web sites point out that he published his work solely on the Internet, but they didn't mention that Perelman's papers are only available from arXiv.org, the web site I maintain.

I found out that Perelman was awarded the Fields Medal when I checked my email ~8:45am Tuesday morning. I had an automated warning message that the main arXiv.org server was unresponsive.
Now, if the server were a pet cat, this is pretty much the equivalent of waking up and stepping in a hairball on the way to the bathroom. I can't even open an SSH session to better diagnose the problem.

At this point I'm assuming some script went haywire on the server, or there's some sort of malicious attack. I reboot the server (sadly ending over 200 days of uptime). Normally, I would call/email several supervisors to get someone more knowledgeable involved. But my boss Simeon is on vacation ice climbing(!!) in British Columbia. Paul Ginsparg, the arXiv patriarch, is unreachable by phone or email at the moment. And Thorsten Schwander, a sysadmin/developer at LANL and arXiv consultant who knows more about Linux than anyone I know, won't wake up for two more hours. I realize that if arXiv is going to work at all this morning, it will be because I make it work.

When the server finishes booting, the load average immediately spikes due to web traffic.

I turn off the web server program so that the rest of the server is responsive enough for me to work. Looking at the web server logs, about a bajillion requests are coming in for everything related to Perelman. Since last weekend, I was expecting demand for his papers, but I didn't realize that arXiv was the only place to get them. Moreover, Perelman's accomplishment has the reclusive genius/Cinderella narrative, so the MSM covers the story far more than they would a typical math prize.
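For the curious, a log check like that can be sketched in a few lines of Python. This is only an illustration, not what was actually run that morning: it assumes the access log uses the common "GET /path HTTP/1.x" request format, and the function name top_paths is made up for the example.

```python
import re
from collections import Counter

# Assumption: each log line contains a quoted request such as
# "GET /some/path HTTP/1.1" (the usual access-log layout).
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+)')


def top_paths(log_lines, n=5):
    """Count requests per path and return the n most requested paths."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:  # skip lines that do not look like requests
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

Feeding it the lines of an access log (for example `top_paths(open(path))`, where `path` is wherever the log lives) gives a quick ranking of what everyone is hammering on.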

The technical problem was that all arXiv pages are dynamically generated b/c of legacy issues. The server simply couldn't generate Perelman's pages fast enough to keep up with demand. My solution was to grab a copy of all of Perelman's papers off an arXiv.org mirror, and use some redirects to send traffic from the dynamic pages to the very fast static pages.
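The redirect idea can be sketched roughly as below. This is a minimal illustration in Python, not the actual configuration (which was presumably done with web server redirect rules). Everything named here is an assumption for the example: the /abs/ and /pdf/ URL layout, the /static prefix, the HOT_PAPERS placeholder identifiers, and the redirect_target function.

```python
import re

# Placeholder identifiers standing in for the hot papers (not the real ones).
HOT_PAPERS = {"math/0000001", "math/0000002"}

# Assumed URL layout: /abs/<id> and /pdf/<id> are the dynamic pages.
DYNAMIC = re.compile(r"^/(abs|pdf)/([a-z-]+/\d{7})$")

# Assumed location of the pre-generated static copies.
STATIC_ROOT = "/static"


def redirect_target(path):
    """Return the static URL for a hot paper, or None to serve dynamically."""
    match = DYNAMIC.match(path)
    if match is None:
        return None  # not an abstract or PDF page
    kind, paper_id = match.groups()
    if paper_id not in HOT_PAPERS:
        return None  # ordinary paper: let the dynamic code handle it
    extension = "html" if kind == "abs" else "pdf"
    return f"{STATIC_ROOT}/{kind}/{paper_id}.{extension}"
```

The point of the design is that only the handful of overloaded pages get special treatment: a request for one of them costs the server a file read instead of a page generation, and everything else keeps working exactly as before.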

I turn on the web server, and things hum along smoothly. People can download e-prints as much as they want. I put on my copyediting hat and post a notice on the front page (pictured above).

Total time for me to diagnose and solve the problem: 7 minutes.

In those 7 minutes, I may have done more service to the scientific community than my entire two years at arXiv. I feel that I have atoned for slacking off in Complex Analysis class.
