- Nothing to report. I seem to be working at a pace of one or two hours a fortnight lately.
The main thing to report from this week is that we carried out some load testing of our staging environment. We hit our origin servers with a little over 6 times the current load (which is also the peak): about 7500 concurrent nginx connections, versus the 1200 we normally see (there is more traffic than this to GOV.UK, but it’s mostly handled by our CDN). Our test data consisted of 7993 distinct paths, taken from our logs, so they represented actual usage.
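As a rough illustration of how a set of distinct paths could be pulled out of access logs, here is a minimal sketch. It assumes log lines in nginx's default "combined" format and keeps only GET and HEAD requests; the function name and the regular expression are my own for illustration, not the tooling we actually used.

```python
import re

# Assumption: lines follow nginx's "combined" log format, where the request
# appears in quotes, e.g. "GET /some/path?x=1 HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def distinct_paths(log_lines):
    """Return the distinct request paths, in the order they were first seen."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)
```

Feeding the resulting list to a load-testing tool means the test exercises the same mix of pages that real users request, rather than a handful of hand-picked URLs.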
The good news is that our static content held up pretty well. The bad news is that our dynamic content didn’t. It’s particularly important that we handle dynamic content well, as our CDN doesn’t cache that. The machines themselves didn’t seem to be falling over, certainly not enough to explain the large number of timeouts I was seeing, so I began to wonder how many concurrent connections our apps support. This is distinct from the number of concurrent connections our origin servers support, as they reverse-proxy requests to various apps (or a cache).
It turns out that the answer is not that many! We use a Ruby HTTP server called unicorn, which uses a fixed pool of worker processes to handle concurrent connections, with each worker serving one request at a time. Our finder-frontend app is only set up to have 6 unicorn workers, so requests could easily start to pile up, and eventually time out. Ideally this number would be increased, but that would put additional load on our elasticsearch server, which is used by more things than just finder-frontend, so care needs to be taken. As a partial solution, 5-minute caching was switched on for finder-frontend.
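To see why a small fixed pool makes requests pile up, here is a minimal simulation sketch. Only the pool size of 6 comes from our setup; the arrival rate, service time, and timeout are made-up numbers, and the model is deliberately crude: requests arrive at a steady rate, each occupies one worker for a fixed time, and a request that would wait longer than the timeout is dropped without ever reaching a worker.

```python
import heapq


def simulate(workers, arrivals_per_sec, service_secs, timeout_secs, duration_secs):
    """Return (timed_out, total) for a fixed pool serving steady traffic."""
    free_at = [0.0] * workers  # min-heap: the time at which each worker is next free
    timed_out = 0
    total = int(arrivals_per_sec * duration_secs)
    for i in range(total):
        arrival = i / arrivals_per_sec
        earliest = heapq.heappop(free_at)
        start = max(arrival, earliest)
        if start - arrival > timeout_secs:
            # Assumption: the client gives up, so the worker is not occupied.
            timed_out += 1
            heapq.heappush(free_at, earliest)
        else:
            heapq.heappush(free_at, start + service_secs)
    return timed_out, total


# Hypothetical numbers: 6 workers at 0.5s per request can serve 6 / 0.5 = 12 requests/s.
below_capacity = simulate(6, 10, 0.5, 5, 60)
above_capacity = simulate(6, 20, 0.5, 5, 60)
```

The point is that the pool has a hard ceiling of workers divided by service time. Below that ceiling nothing queues; above it the backlog grows steadily until requests start timing out, even though the machine itself may look perfectly healthy.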
There’s some follow-up work planned for the next sprint to look into some actual application errors the testing revealed, and also to do a survey of how many concurrent connections all of our apps support and decide if it’s enough.
I started doing Advent of Code in Haskell, putting my solutions on GitHub. They’re fun little challenges to work through: the inputs and outputs are well specified, there’s no need to worry about error handling, and there’s often a standard algorithmic problem at the heart of each one.
I got a couple of new boardgames, though I’ve not had a chance to play them yet:
Having become fed up with my non-stick-pan-which-isn’t, I decided to go all out and get a fancy cast iron pan with a pyrex lid. It arrived on Wednesday, with the lid broken in two and no visible damage to the packaging. I’m replacing it with a pan that has a cast iron lid as well.