[Here’s a bit of background about our previous cache setup. Skip ahead to “Using Varnish To Cache WhatClinic.com” if you want to jump straight into the Varnish section.]
Our main website is built on Microsoft’s IIS, and for several years we had been using its built-in page- and component-level caching to serve HTML pages. This built-in caching is easy to set up and quite flexible, but it is very memory hungry.
The memory issue isn’t much of a problem on small static websites with only a couple of hundred pages. Unfortunately, WhatClinic.com is a dynamic site with potentially millions of individual pages to serve. Typically only 12% of our pages were served from the cache, and sometimes as few as 6%. It was almost pointless running the cache at all.
The biggest problem for us is the breadth of the website. On a typical day we have 30,000 unique visitors, but they land on 23,000 distinct URLs. Over the course of a month this balloons to 145,000 distinct landing pages. Worse still, they look at over half a million distinct pages on the site.
In an attempt to improve the performance of the existing IIS cache, we tried writing the page cache to disk. Under test conditions with relatively small numbers of pages this worked well, but getting even 50% of the pages from one month’s visits into the cache meant writing 250,000 pages to disk. In the end the NT file system on our servers started grinding to a halt, not because of request volume but purely because of the number of individual files involved.
Using Varnish To Cache WhatClinic.com
We came up with some ways around the NT file system problem but decided in the end it would be better to move the cache off the main box altogether. At the same time we decided to look at Varnish as a solution, with a view to hosting it on AWS.
On the upside, Varnish is lightweight and powerful, but it also introduced a number of new problems for us to overcome:
1. Varnish Caches Cookies
We use a cookie to store all kinds of information about a new visitor, including their country of origin, so that we can display clinics’ prices in the visitor’s local currency. To stop Varnish serving every visitor a cached page carrying one person’s cookie, we moved our cookie drop into a JavaScript call instead of setting it in the page itself. No big deal, but something to be aware of.
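To make the cookie problem concrete, here is a toy Python model (not Varnish itself) of a cache keyed only by URL. The backend functions, the /visitor-cookie.js script and the example URL are hypothetical. When the page response drops the cookie, the first visitor’s Set-Cookie header is stored with the page and replayed to everyone after them; when the cookie is set by a separate, uncached JavaScript call, the cached page is the same for every visitor.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    body: str
    headers: dict = field(default_factory=dict)


class NaiveCache:
    """Toy cache keyed only by URL: whatever the first visitor receives is replayed to everyone."""

    def __init__(self, backend):
        self.backend = backend  # hypothetical callable: (url, visitor_country) -> Response
        self.store = {}

    def get(self, url, visitor_country):
        if url not in self.store:
            self.store[url] = self.backend(url, visitor_country)
        return self.store[url]


def page_with_server_cookie(url, visitor_country):
    # Old approach: the page response itself drops the cookie,
    # so the cookie ends up inside the cached object.
    return Response("<html>clinic list</html>",
                    {"Set-Cookie": f"country={visitor_country}"})


def page_with_js_cookie(url, visitor_country):
    # New approach: the cached HTML is identical for everyone; a script
    # (hypothetical /visitor-cookie.js, never cached) sets the cookie per visitor.
    return Response('<html>clinic list<script src="/visitor-cookie.js"></script></html>')


old = NaiveCache(page_with_server_cookie)
old.get("/dentists/dublin", "IE")
print(old.get("/dentists/dublin", "GB").headers)  # {'Set-Cookie': 'country=IE'}: wrong visitor's cookie

new = NaiveCache(page_with_js_cookie)
new.get("/dentists/dublin", "IE")
print(new.get("/dentists/dublin", "GB").headers)  # {}: no cookie baked into the cached page
```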
2. All Requests Go Through The Varnish Box
To determine a visitor’s location we look at their IP address, but since all requests were going through the Varnish server, our own server only ever saw one IP address. We changed the code to pass the visitor’s original IP address along so that we could pick it off.
Problem solved, except that our default access logs now don’t record each visitor’s real IP address. We use Google Analytics and our own logs for the bulk of our reporting, so this isn’t a big deal, but at some point we may have to write our own access logs with the forwarded IP address, if only for the peace of mind that when something does go wrong we can track it in the raw log data.
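A minimal sketch of picking the visitor’s address back off on the application side, assuming the cache forwards it in the conventional X-Forwarded-For header (the header we actually used isn’t stated here) and that 10.0.0.5 stands in for the Varnish box’s address. The header is only trusted when the request really came from the cache, since any client can send its own X-Forwarded-For.

```python
import ipaddress

# Hypothetical address of the Varnish box; only requests arriving from it
# are allowed to tell us the visitor's real IP.
TRUSTED_PROXIES = {ipaddress.ip_address("10.0.0.5")}


def client_ip(peer_addr, headers):
    """Return the visitor's IP, assuming the cache appends it to X-Forwarded-For."""
    peer = ipaddress.ip_address(peer_addr)
    forwarded = headers.get("X-Forwarded-For", "")
    if peer not in TRUSTED_PROXIES or not forwarded:
        return str(peer)  # direct hit, or no header: the socket address is the client
    # The right-most entry was appended by our own proxy, so it is the one we trust;
    # anything to its left could have been supplied by the client itself.
    candidate = forwarded.split(",")[-1].strip()
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return str(peer)  # malformed header: fall back rather than trust garbage
```

For example, `client_ip("10.0.0.5", {"X-Forwarded-For": "203.0.113.7"})` returns `"203.0.113.7"`, while the same header arriving directly from 198.51.100.9 is ignored and `"198.51.100.9"` is returned.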
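If we do end up writing our own access logs, one option is to keep the standard Common Log Format and simply substitute the forwarded address for the cache’s. The function below is a hypothetical sketch; the default HTTP/1.1 protocol string and the example values are assumptions.

```python
from datetime import datetime, timezone


def access_log_line(client_ip, method, path, status, size, when=None, protocol="HTTP/1.1"):
    """Format one Common Log Format line using the visitor's real IP rather than the cache's."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")  # e.g. 01/Mar/2024:09:30:00 +0000
    # "-" fills the identd and authuser fields, which we don't use.
    return f'{client_ip} - - [{stamp}] "{method} {path} {protocol}" {status} {size}'


print(access_log_line("203.0.113.7", "GET", "/dentists/dublin", 200, 5120,
                      when=datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)))
# 203.0.113.7 - - [01/Mar/2024:09:30:00 +0000] "GET /dentists/dublin HTTP/1.1" 200 5120
```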
3. Altering Our Landing Pages
Depending on whether you have just landed on WhatClinic.com, or are browsing subsequent pages, we alter the layout of the page. The layout differences are quite extensive even though the data is all the same, so it isn’t efficient to make the changes on the client side. We need to cache two different versions of the same page.
The solution involved getting Varnish to pass along the referring URL and using a flag along the lines of isReferringDomainWhatClinic.com, together with the requested URL itself, as the cache key. In the end this was pretty easy to do, but it doubled the number of pages in the cache. Since improving the speed of our landing pages was our main goal, the trade-off is worth it to us.
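The key logic can be sketched in a few lines. This is Python for illustration rather than the cache configuration itself (in Varnish the equivalent decision would live in VCL); treating www. and other subdomains as internal, and labelling the two variants “landing” and “browse”, are assumptions.

```python
from urllib.parse import urlsplit

SITE_HOST = "whatclinic.com"


def is_internal_referrer(referer):
    """True if the Referer header points at our own site (including subdomains such as www)."""
    if not referer:
        return False  # direct visits count as landings
    host = (urlsplit(referer).hostname or "").lower()
    return host == SITE_HOST or host.endswith("." + SITE_HOST)


def cache_key(url, referer):
    # Two variants per URL: the landing-page layout and the browsing layout.
    variant = "browse" if is_internal_referrer(referer) else "landing"
    return f"{variant}:{url}"
```

A visitor arriving from a search engine gets `cache_key("/dentists/dublin", "https://www.google.com/search?q=dentist")`, i.e. `"landing:/dentists/dublin"`; the same URL reached from another WhatClinic.com page maps to `"browse:/dentists/dublin"`, which is why the number of cached pages doubles.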
4. Time To Live
As we said in the intro, we have a very broad site. Our pages also change quite infrequently, so we wanted the longest possible time to live for cached pages, in the order of several months. However, some pages do change, and a change to any one customer’s data may ripple across the hundreds of pages their clinic might appear on.
The solution was to set our time to live to several months, and then remove pages from the cache only when they had been updated. Having implemented a means to remove the pages from our cache, we then had to determine when a change to a clinic’s data had occurred and which pages were affected by the change, so we knew which pages to remove from the cache and update.
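A toy model of the “long TTL plus explicit purge” policy, assuming a 90-day TTL as a stand-in for “several months”. Varnish provides its own purge and ban mechanisms for this; the Python class below only illustrates the idea, with an injectable clock so that expiry can be tested.

```python
import time


class TTLCache:
    """Toy page cache: entries live for a long TTL unless explicitly purged."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, body)

    def get(self, key):
        hit = self.entries.get(key)
        if hit is None:
            return None
        expires_at, body = hit
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: treat as a miss
            return None
        return body

    def put(self, key, body):
        self.entries[key] = (self.clock() + self.ttl, body)

    def purge(self, key):
        # Called when the underlying data changes, long before the TTL runs out.
        return self.entries.pop(key, None) is not None


NINETY_DAYS = 90 * 24 * 3600  # "several months"; the exact figure is an assumption
```

In normal operation pages are evicted by `purge` when their data changes; the TTL is only a backstop for anything the change detection misses.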
Working out exactly which pages were affected turned out to be a little problematic, but we solved it eventually and we’re reasonably happy that we’ve covered all the cases. We also coded a big red “Remove All This Clinic’s Data From The Cache” button for use in emergencies.
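One way to answer “which pages does this clinic appear on?” is a reverse index built up as pages are rendered. The sketch below is hypothetical (the class, the clinic IDs and the purge callback are illustrations, not our production code), and `purge_clinic` plays the role of the emergency button. Because each URL is cached in two variants (landing and browsing), a real purge callback would need to clear both.

```python
from collections import defaultdict


class ClinicPageIndex:
    """Hypothetical reverse index from clinic ID to every cached URL that shows that clinic."""

    def __init__(self):
        self.pages_by_clinic = defaultdict(set)

    def record(self, url, clinic_ids):
        # Called when a page is rendered: remember which clinics appeared on it.
        for clinic_id in clinic_ids:
            self.pages_by_clinic[clinic_id].add(url)

    def pages_for(self, clinic_id):
        return set(self.pages_by_clinic.get(clinic_id, ()))

    def purge_clinic(self, clinic_id, purge):
        """The 'big red button': purge every page the clinic appears on."""
        urls = self.pages_by_clinic.pop(clinic_id, set())
        for url in sorted(urls):
            purge(url)
        return urls


index = ClinicPageIndex()
index.record("/dentists/dublin", [101, 202])
index.record("/dentists/cork", [101])
purged = []
index.purge_clinic(101, purged.append)  # purged is now ['/dentists/cork', '/dentists/dublin']
```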
The Results
Overall, it has been a big win. After about three weeks of operation we have a page hit rate of around 65%, a huge improvement on the 12% we used to get. Cached pages are now returned in the order of 100-200 ms instead of 2000-5000 ms, and the load on our server has dropped dramatically, which also improves performance for pages that are never going to be in the cache.
Of course, having improved the efficiency of generating the page HTML, we are now looking at the speed of all our own JavaScript, our external calls to analytics, our social media buttons and other external client-side calls.
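As a rough illustration of what the hit rate alone is worth, assume every hit takes about 150 ms and every miss about 3500 ms (the midpoints of the ranges above) and ignore the lower server load, which in practice also speeds up misses. With hit rate $h$, the expected time per page is then:

```latex
\begin{aligned}
\mathbb{E}[t] &= h\,t_{\text{hit}} + (1-h)\,t_{\text{miss}} \\
h = 0.12:\quad \mathbb{E}[t] &= 0.12 \times 150 + 0.88 \times 3500 = 18 + 3080 = 3098\ \text{ms} \\
h = 0.65:\quad \mathbb{E}[t] &= 0.65 \times 150 + 0.35 \times 3500 = 97.5 + 1225 = 1322.5\ \text{ms}
\end{aligned}
```

Under these assumptions the average page time falls by more than half, and the remaining cost is dominated by the misses.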
Performance improvements never end, do they?










