HTTP’s Best-Kept Secret: Caching
Ryan Tomayko (Heroku)
- Sinatra maintainer.
- Rack core team.
- Creator and maintainer of Rack::Cache.
- This is NOT Rails caching; it is HTTP caching.
- HTTP caching headers in requests: Cache-Control, If-Modified-Since, If-None-Match
- and in responses: Cache-Control, Last-Modified, ETag, Vary
- These headers are defined in RFC 2616; the talk doesn't go into the spec very deeply.
Types of Cache
Client Cache
- Built into browsers and other kinds of clients.
- 1:1 relationship between cache and client: the cache serves only one client (private cache).
- Bandwidth saved: can’t beat it, since a hit on a fresh cached response needs no network request at all.
Shared Proxy Cache
- Set up by an organization (e.g. a company or an ISP).
- 1:many relationship between cache and clients: it serves more than one client (shared cache).
- Sits closer to the client than to the server, so it still saves a lot of bandwidth.
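Gateway Cache
- a.k.a. Reverse Proxy Cache
- Situated at the origin site, in front of the back-end.
- 1:everyone relationship between cache and clients.
- Saves the least bandwidth, since every response still travels the full distance to the client.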
Why Cache?
- The answer has changed over time.
- In Nov 1990 there was 1 guy on the web – Tim Berners-Lee.
- In Feb 1996 the web population was 20 million. State-of-the-art connectivity was a 28.8 kbps modem; at that speed, loading today's http://yahoo.com (~350 KB) would take 2 minutes 48 seconds. Bandwidth was the biggest issue. RFC 1945 (HTTP/1.0) included the Expires and Last-Modified headers.
- In June 1999 RFC 2616 (HTTP/1.1) was released, addressing the caching problems of 1996.
- Today we cache so we can scale: keep as much work off your back-ends as possible by pushing it up the stack.
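HTTP/1.1 Defines Two Caching Models
Expiration
- The back-end sets Cache-Control: public, max-age=60
- The response gets cached in both the gateway cache and the browser cache.
- public says the response may be stored by shared caches and served to many clients.
- max-age=60 says the cached copy stays fresh for 60 seconds.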
```ruby
# Rails: sets Cache-Control: public, max-age=60
def show
  expires_in 60.seconds, :public => true
  # ... do the work and render as usual
end
```
Or set the header directly, e.g. in Sinatra:

```ruby
headers['Cache-Control'] = 'public, max-age=60'
```
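To make the expiration model concrete, here is a minimal Ruby sketch of the freshness check a cache performs, assuming each stored response records the time it was cached. CachedResponse, max_age and fresh? are hypothetical names, and the parsing only looks at max-age, ignoring public/private and other directives.

```ruby
# Hypothetical stored entry: the response plus the time it was cached.
CachedResponse = Struct.new(:body, :headers, :stored_at)

# Extract max-age (in seconds) from a Cache-Control value, or nil if absent.
def max_age(cache_control)
  match = cache_control.to_s.match(/(?:\A|,)\s*max-age\s*=\s*(\d+)/i)
  match && match[1].to_i
end

# A stored response is fresh while its age is below max-age.
def fresh?(response, now)
  ttl = max_age(response.headers['Cache-Control'])
  return false if ttl.nil?
  (now - response.stored_at) < ttl
end

stored = Time.at(0)
resp = CachedResponse.new('<html>...</html>',
                          { 'Cache-Control' => 'public, max-age=60' },
                          stored)
puts fresh?(resp, stored + 30)  # true: serve straight from the cache
puts fresh?(resp, stored + 61)  # false: must go back to the back-end
```

Passing the clock in as now keeps the check deterministic; a real cache would also take the Date and Age headers into account when computing a response's age.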
Validation (Conditional GET)
- The back-end adds an ETag or Last-Modified header, e.g. ETag: abcdef012345
- Last-Modified is largely redundant when an ETag is present; it is mostly there for HTTP/1.0 clients.
- On the second request, the gateway cache finds the page in its cache and sends a conditional GET to the back-end: GET /foo, Host: foo.com, If-None-Match: abcdef012345
- If the back-end returns 304 Not Modified, the gateway cache serves its cached copy.
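The back-end side of a conditional GET can be sketched in a few lines of plain Ruby. This is an illustration, not Rails or Sinatra code: handle_get is a hypothetical helper that returns a Rack-style [status, headers, body] triple, derives the ETag by MD5-hashing the rendered body, and compares it against a single If-None-Match value (no lists or * handling).

```ruby
require 'digest/md5'

# Hypothetical handler: answer 304 with an empty body when the client's
# If-None-Match still matches the current ETag, otherwise a full 200.
def handle_get(request_headers, body)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  if request_headers['If-None-Match'] == etag
    [304, { 'ETag' => etag }, '']
  else
    [200, { 'ETag' => etag }, body]
  end
end

page = '<h1>foo</h1>'
status, headers, = handle_get({}, page)
puts status  # first request: 200 with the full body
status, = handle_get({ 'If-None-Match' => headers['ETag'] }, page)
puts status  # revalidation: 304, nothing to send but headers
```

Hashing the rendered body saves bandwidth but not rendering work; deriving the ETag from cheaper data, such as a model's id and timestamp, lets the back-end skip rendering entirely on a 304.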
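```ruby
# Rails: fresh_when sets the validators and responds 304 Not Modified if the request is fresh
def show
  @foo = Foo.find(params[:id])
  fresh_when :etag => @foo, :last_modified => @foo.updated_at.utc
end
```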
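```ruby
# Rails: stale? returns false (after responding 304) when the request is fresh
def show
  @foo = Foo.find(params[:id])
  modified = @foo.updated_at.utc
  if stale?(:etag => @foo, :last_modified => modified)
    respond_to do |format|
      # ...
    end
  end
end
```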
```ruby
# Sinatra: etag sets the ETag header and halts with 304 if If-None-Match matches
get '/foo' do
  @foo = Foo.find(params[:id])
  etag @foo.etag
  erb :foo
end
```
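Last-Modified validation works the same way, with dates in place of opaque tags. This sketch uses a hypothetical not_modified? helper and assumes the application knows when the resource last changed; it relies on Time#httpdate and Time.httpdate from Ruby's standard time library to format and parse HTTP dates, which only have one-second resolution.

```ruby
require 'time'

# Hypothetical check: true when the client's copy (If-Modified-Since)
# is at least as new as the resource. Compares whole seconds only.
def not_modified?(request_headers, last_modified)
  since = request_headers['If-Modified-Since']
  return false if since.nil?
  begin
    last_modified.to_i <= Time.httpdate(since).to_i
  rescue ArgumentError
    false # unparseable date: treat it as a normal GET
  end
end

updated_at = Time.utc(2009, 5, 5, 12, 0, 0)
header = { 'Last-Modified' => updated_at.httpdate }
puts header['Last-Modified'] # prints Tue, 05 May 2009 12:00:00 GMT
puts not_modified?({ 'If-Modified-Since' => header['Last-Modified'] }, updated_at)        # true
puts not_modified?({ 'If-Modified-Since' => header['Last-Modified'] }, updated_at + 3600) # false
```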
Combine Expiration & Validation
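- The back-end sets Cache-Control: public, max-age=60 and ETag: abcdef012345
- Within 60 seconds, Cache-Control takes precedence: the cache serves its copy without contacting the back-end.
- After 60 seconds, the cache revalidates with the back-end using the ETag (If-None-Match).
- The back-end can then send back 304 Not Modified with a new Cache-Control: public, max-age=60, making the cached copy fresh for another 60 seconds.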
- Never Generate the Same Response Twice
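The interplay of expiration and validation can be traced with a toy gateway cache. Everything here is a hypothetical sketch: a single URL, a back-end modeled as a Ruby lambda that honors If-None-Match, and an explicit clock so each step is reproducible. It is not how Rack::Cache is implemented.

```ruby
require 'digest/md5'

# Toy gateway cache for one URL: serves fresh entries, revalidates stale ones.
class ToyGatewayCache
  attr_reader :backend_calls

  def initialize(backend, max_age)
    @backend = backend # lambda: request headers -> [status, headers, body]
    @max_age = max_age
    @entry = nil       # { body:, etag:, stored_at: }
    @backend_calls = 0
  end

  def get(now)
    if @entry && now - @entry[:stored_at] < @max_age
      return [:hit, @entry[:body]] # fresh: the back-end is not contacted
    end
    headers = @entry ? { 'If-None-Match' => @entry[:etag] } : {}
    @backend_calls += 1
    status, resp_headers, body = @backend.call(headers)
    if status == 304
      @entry[:stored_at] = now # revalidated: restart the max-age window
      [:revalidated, @entry[:body]]
    else
      @entry = { body: body, etag: resp_headers['ETag'], stored_at: now }
      [:miss, body]
    end
  end
end

page = '<h1>foo</h1>'
backend = lambda do |req|
  etag = %("#{Digest::MD5.hexdigest(page)}")
  return [304, { 'ETag' => etag }, ''] if req['If-None-Match'] == etag
  [200, { 'ETag' => etag }, page]
end

cache = ToyGatewayCache.new(backend, 60)
t0 = Time.at(0)
p cache.get(t0).first       # :miss, the back-end sends the page
p cache.get(t0 + 30).first  # :hit, still within max-age
p cache.get(t0 + 90).first  # :revalidated, the back-end answers 304
p cache.get(t0 + 120).first # :hit, the window restarted at t0 + 90
p cache.backend_calls       # 2
```

Only two of the four requests reach the back-end, and only one of those sends the page body; the revalidation costs a 304 with an empty body.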
Recommended: Rack::Cache
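```shell
gem install rack-cache
```

```ruby
# Inside the Rails configuration block; assumes rack-cache is loaded (require 'rack/cache')
config.middleware.use Rack::Cache,
  :verbose          => true,
  :metastore        => "file:/var/cache/rack/meta",
  :entitystore      => "file:/var/cache/rack/body",
  :allow_reload     => false,
  :allow_revalidate => false
```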
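The client, not just the server, can influence what the cache does via Cache-Control. A browser refresh can send Cache-Control: no-cache, which tells the gateway cache it may not serve its stored copy without going back to the back-end. This is bad: anyone hitting reload can pound your back-end. :allow_reload => false makes Rack::Cache ignore no-cache in requests, and :allow_revalidate => false does the same for max-age=0 requests.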
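The two Rack::Cache options decide whether a client's request directives may override a fresh cache entry. The following is a hypothetical helper modeling that decision for an entry that is otherwise fresh; it is not Rack::Cache's source, and it only recognizes the exact no-cache and max-age=0 directives.

```ruby
# Hypothetical model of what :allow_reload / :allow_revalidate control.
def cache_action(request_cache_control, allow_reload:, allow_revalidate:)
  directives = request_cache_control.to_s.downcase.split(',').map(&:strip)
  if directives.include?('no-cache') && allow_reload
    :fetch_from_backend      # client forced a full reload
  elsif directives.include?('max-age=0') && allow_revalidate
    :revalidate_with_backend # client forced a conditional GET
  else
    :serve_fresh_copy        # ignore the client's directive; protect the back-end
  end
end

puts cache_action('no-cache', allow_reload: false, allow_revalidate: false)  # serve_fresh_copy
puts cache_action('no-cache', allow_reload: true, allow_revalidate: false)   # fetch_from_backend
puts cache_action('max-age=0', allow_reload: false, allow_revalidate: true)  # revalidate_with_backend
```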
- High-performance caches: Squid, Varnish (Heroku uses Varnish).
- Interesting discussion about ESI (Edge Side Includes) at the end.
- By default, Rails uses the model's class name, id and updated_at timestamp to build an MD5 hash for the ETag.
- Seed that key with your release version; otherwise the ETag will not change when you deploy new code or templates, and caches will keep serving pages rendered by the old release. Rails now has a mechanism to handle this.
- The Rails 2.3 branch also has a new touch mechanism, which bumps a record's updated_at (and therefore its ETag).
- Browser caching behavior varies significantly when using SSL.
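The ETag points above can be combined in a short sketch: hash a key built from the class name, id and updated_at, seeded with a release identifier. APP_RELEASE, model_etag and the Post struct are hypothetical stand-ins, and the key format is an assumption rather than the one Rails actually builds.

```ruby
require 'digest/md5'

# Hypothetical model: only the fields the ETag depends on.
Post = Struct.new(:id, :updated_at)

APP_RELEASE = 'v42' # hypothetical release seed, e.g. a version string or git SHA

# Key = release / class name / id / updated_at, then MD5 for an opaque ETag.
def model_etag(record, release = APP_RELEASE)
  key = [release, record.class.name.downcase, record.id,
         record.updated_at.utc.strftime('%Y%m%d%H%M%S')].join('/')
  %("#{Digest::MD5.hexdigest(key)}")
end

post = Post.new(1, Time.utc(2009, 5, 5, 12, 0, 0))
puts model_etag(post, 'v41') == model_etag(post, 'v42') # false: a deploy changes every ETag
touched = Post.new(1, post.updated_at + 1)              # what touch does: bump updated_at
puts model_etag(touched) == model_etag(post)            # false
```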