Oops! This site used over 200 gigabytes of bandwidth in less than 24 hours
Late at night on December 1st, I got a ping on my company Slack account. I checked, worried that something might be going on at work: my on-call shift starts on December 4th, so it could have been something I’d need to track for handoff.
It was something along the lines of “@Alyx is on the front page of Hacker News!”. I went over to check, and sure enough, when I looked at around 11:30 PM PST, there was my post about the Mikrotik CCR2004-1G-2XS-PCIe (“Review: A Dive into Mikrotik’s Weird SmartNIC”). Presumably, someone had seen my Mastodon post from earlier in the evening about having bought two more, and then reposted my year-and-a-half-old article about it.
There’s a phenomenon known by many names, the most common of which are “the Slashdot effect” and “the Reddit hug of death”. When someone posts a smaller site to a site like Reddit and the post gets popular and makes its way up the ranks, traffic to the smaller site climbs. Most small sites aren’t equipped to handle this kind of surge, so they often starve for compute or network resources and go down.
I hadn’t yet seen this first hand, so I was very curious to see how it would play out. Up until about a month and a half ago, I was running this blog on three separate servers spread across Seattle, Oakland and Dallas, but for ease of publishing I switched to Cloudflare Pages. When I push a commit to my GitHub repository, Cloudflare sees it and automatically deploys it in less than a minute.
I’m sure I would’ve survived even if I hadn’t been using Cloudflare Pages, or even if I didn’t have any CDN caching at all. This blog, as you’re seeing it now, is entirely static HTML. There’s no server-side dynamic content, and no databases. Just a static HTML page. One of my other projects handles far more dynamic traffic on the same server that used to run this blog, and it barely causes a bump in system resource usage. An influx of static page traffic from being on the front page of Hacker News is nothing.
Enough bragging about my servers. What’s the damage?
In the first hour, nearly 5,000 unique users loaded various pages on the site, making over 17,000 requests and using a total of 26 gigabytes of bandwidth. That’s about 1.5 MB/request, which checks out because a lot of pages on the site are fairly heavy on images.
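The per-request figure is easy to sanity-check. Here is a minimal sketch in Python, assuming decimal units (1 GB = 1000 MB) and the rounded numbers quoted above; the function name is purely illustrative:

```python
def mb_per_request(total_gb: float, requests: int) -> float:
    """Average transfer per request in MB, assuming 1 GB = 1000 MB."""
    return total_gb * 1000 / requests


# First-hour figures from the post: 26 GB over roughly 17,000 requests.
first_hour = mb_per_request(26, 17_000)
print(f"{first_hour:.2f} MB/request")  # prints "1.53 MB/request"
```

Since the request count was "over 17,000", the true average is probably slightly lower than this.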
Traffic peaked around an hour in, then slowly fell off until, between 14 and 16 hours in, it dropped sharply from ~50% to ~15% of peak load. Overall, during the 24 hours after making the front page of Hacker News, the site got 147,000 requests and used 210 GB of bandwidth from ~29,000 unique visitors. The cache hit rate was quite high, but it ultimately didn’t matter: with the site hosted on Cloudflare Pages, all the cache misses hit Cloudflare, too.
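To put the 24-hour totals in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 GB = 1000 MB = 8000 Mbit) and treats the rounded totals above as exact; the function and key names are made up for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_averages(total_gb: float, requests: int, visitors: int) -> dict:
    """Derive rough averages from 24-hour totals (decimal units assumed)."""
    return {
        "mb_per_request": total_gb * 1000 / requests,
        "requests_per_visitor": requests / visitors,
        # Bandwidth averaged over the whole day, in megabits per second.
        "avg_mbit_per_s": total_gb * 8000 / SECONDS_PER_DAY,
    }


stats = daily_averages(total_gb=210, requests=147_000, visitors=29_000)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the averages come out to roughly 1.4 MB per request, about 5 requests per visitor, and around 19 Mbit/s sustained over the day. The average hides the peak, though: the first hour alone (26 GB in 3,600 seconds) works out to roughly 58 Mbit/s.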