
Introduction to Memcached

There are three things that an application needs to survive in today’s demanding world: scale, security, and performance.

It is for reasons of both scale and performance that memcached has become such a popular solution in modern application architectures.

It aids scalability by offloading database requests, which frees database capacity for the queries that memcached cannot answer.

It improves performance by answering queries very quickly, so results reach the user sooner.

Memcached is a free and open source, high-performance, distributed memory object caching system. It is generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

It’s in-memory, which makes it fast. Disk I/O is one of the most latency-incurring actions on any given system, so eliminating the need to go to disk to seek out data – which is pretty much a requirement in a database system – is critical to improving performance.

And it’s based on key-value pairs, the basis for NoSQL databases, which have arisen in response to the need for faster data access than traditional relational databases (like MySQL, Microsoft SQL Server, and Oracle) can offer, given that those are far more complex under the covers.

Using MySQL with Memcached

Memcached is a simple, highly scalable key-based cache that stores data and objects wherever dedicated or spare RAM is available for quick access by applications, without going through layers of parsing or disk I/O.

Benefits of using memcached include:

Because all information is stored in RAM, the access speed is faster than loading the information each time from disk.

Because the value portion of the key-value pair does not have any data type restrictions, you can cache data such as complex structures, documents, images, or a mixture of such things.

If you use the in-memory cache to hold transient information, or as a read-only cache for information also stored in a database, the failure of any memcached server is not critical. For persistent data, you can fall back to an alternative lookup method using database queries, and reload the data into RAM on a different server.

The typical usage environment is to modify your application so that information is read from the cache provided by memcached. If the information is not in memcached, then the data is loaded from the MySQL database and written into the cache so that future requests for the same object benefit from the cached data.
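
The read path described above can be sketched as follows. This is a minimal illustration, not a production implementation: a plain dict stands in for the memcached client, sqlite3 stands in for MySQL so the example runs on its own, and the names get_user, cache, db, and the users table are hypothetical.

```python
import json
import sqlite3

# Assumption: a plain dict stands in for a memcached client.
# A real client would make similar get/set calls over the network.
cache = {}

# Assumption: sqlite3 stands in for the MySQL database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'fred')")
db.commit()


def get_user(user_id):
    """Return a user record, reading from the cache first."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database query

    # Cache miss: load the record from the database...
    row = db.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    user = {"id": row[0], "name": row[1]}

    # ...and write it into the cache so future requests benefit.
    cache[key] = json.dumps(user)
    return user
```

The first call to get_user(1) queries the database and fills the cache; later calls for the same user are answered from the cache alone.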

[Figure: cache-memcached, showing clients and memcached servers]

In the example structure, any of the clients can contact one of the memcached servers to request a given key. Each client is configured to talk to all of the servers shown in the illustration.

Within the client, when the request is made to store the information, the key used to reference the data is hashed and this hash is then used to select one of the memcached servers. The selection of the memcached server takes place on the client before the server is contacted, keeping the process lightweight.

The same algorithm is used again when a client requests the same key. The same key generates the same hash, and the same memcached server is selected as the source for the data. Using this method, the cached data is spread among all of the memcached servers, and the cached information is accessible from any client.
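
As a rough sketch of this client-side selection, the example below hashes the key and takes the result modulo the number of servers. The server addresses and the select_server function are hypothetical, and MD5 with modulo is only one possible scheme; real client libraries often use consistent hashing instead, so that adding or removing a server remaps fewer keys.

```python
import hashlib

# Hypothetical server list. Every client must use the same list in the
# same order, or the same key would map to different servers.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def select_server(key, servers=SERVERS):
    """Pick a server by hashing the key, modulo the number of servers."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]
```

Because the choice depends only on the key and the server list, any client computes the same server for the same key without contacting a server first.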

The result is a distributed, memory-based cache that can return information, particularly complex data and structures, much faster than natively reading the information from the database.

The data held within a traditional memcached server is never stored on disk (only in RAM, which means there is no persistence of data), and the RAM cache is always populated from the backing store (a MySQL database). If a memcached server fails, the data can always be recovered from the MySQL database.

Memcached use cases

Memcached is, most of the time, used as a caching layer.

When you need to store or update a document or a record in a database, you also do it in a memcached server.

When you need to retrieve a document or a record, you first ask a memcached server, and if the data is missing or out of date, you read it from the database.

The point is that a memcached server is usually much faster than a database like MySQL. However, writing data to two different data stores raises important questions:

  • How to invalidate a set of cached objects?
  • How to guarantee atomicity and eventual consistency?
  • How to go distributed?

Invalidation of Cached Objects

Always include a version number in the name of your keys.

Don't use profile:fred as a memcached key.

Rather use profile:fred:v1 or v1:profile:fred or profile:v1:fred or anything else that includes a version number.

If the profile gets updated, just bump the version number and let memcached or a background job automatically expire previous versions.
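
A minimal sketch of versioned keys follows. A plain dict stands in for the memcached client, and the function names and the profile:<name>:version pointer key are hypothetical choices for illustration. With a real memcached server, stale versions would be left to expire through their time to live rather than stay in the store.

```python
# Assumption: a plain dict stands in for a memcached client.
cache = {}


def current_version(name):
    """Look up the current version number for a profile, starting at 1."""
    return cache.setdefault(f"profile:{name}:version", 1)


def profile_key(name):
    """Build the versioned key, for example profile:fred:v1."""
    return f"profile:{name}:v{current_version(name)}"


def read_profile(name):
    """Read the profile stored under the current version, if any."""
    return cache.get(profile_key(name))


def update_profile(name, data):
    """Bump the version, then store the new data under the new key."""
    cache[f"profile:{name}:version"] = current_version(name) + 1
    cache[profile_key(name)] = data
```

Readers never see a half-updated entry under an old key: after an update they simply start asking for the new key, and the old one is no longer referenced.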

Atomicity and Consistency

Versioning implies that you have to retrieve the current version first, through another memcached query. This is perfectly fine and well worth it.

Guaranteeing atomicity and eventual consistency might be important, especially for objects with a long time to live.

Some memcached variants offer on-disk persistence, and keeping all objects forever is tempting. In such a scenario, memcached becomes your primary store, and SQL servers are mainly there for more complex, possibly offline queries.

But if the memcached content and the database content are not (even eventually) in sync, this will lead to a lot of inconsistencies, and it can be a mess to recover from such a situation. And it will happen if the way you write to both data stores is not partition tolerant.

One way to address this is to use event sourcing. Instead of having your application server directly write to both data stores, have it update the cache and queue a description of the event to a reliable message queue.

Redis can perfectly fit the bill, and the content of a message can be as simple as "object [object id] has been updated". Asynchronous workers read every message, fetch the current object from the cache, and commit the change to SQL servers.
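
The flow described above might look like the sketch below. It rests on several assumptions: a dict stands in for the memcached server, Python's queue.Queue stands in for a reliable message queue, sqlite3 stands in for the SQL servers, and save_object, drain_events, and the objects table are hypothetical names. A real deployment would run the worker as a separate process against a durable queue.

```python
import json
import queue
import sqlite3

cache = {}                        # stand-in for the memcached server
events = queue.Queue()            # stand-in for a reliable message queue
db = sqlite3.connect(":memory:")  # stand-in for the SQL servers
db.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, body TEXT)")


def save_object(object_id, obj):
    """Application side: update the cache, then queue the event."""
    cache[object_id] = json.dumps(obj)
    events.put({"event": "updated", "object_id": object_id})


def drain_events():
    """Worker side: commit the current cached object for each event.

    Returns the number of events processed.
    """
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        body = cache.get(event["object_id"])
        if body is not None:
            db.execute(
                "INSERT OR REPLACE INTO objects (id, body) VALUES (?, ?)",
                (event["object_id"], body),
            )
            db.commit()
        processed += 1
```

Because the message only names the object, the worker always commits the latest cached state, so replaying or reordering events for the same object still converges on the same row.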

How to go distributed?

Back in the day, your only options were to use Moxi, the memcached proxy, or to reinvent a sharding layer at the client level.

Nowadays, using Couchbase[1] or Kumofs[2] makes it a snap. Everything is handled automatically at the server level, including rebalancing. And if only for the web-based supervision interface, opting for Couchbase is well worth it.

A memcached server can also be useful outside a caching layer context.

An example is a real-time chat. You want online people to be able to communicate with each other, so just use memcached to store conversations for pairs of users. There is no need for persistent storage if you don't need to keep track of past conversations.
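
One way to picture the chat example is the sketch below, where a dict stands in for the memcached server and all function names and the chat:<user>:<user> key layout are hypothetical. Note the assumption baked into post_message: it reads, modifies, and writes back the conversation, which is not atomic against a real memcached server; there you would likely rely on the protocol's append or CAS commands.

```python
# Assumption: a plain dict stands in for a memcached server.
cache = {}


def conversation_key(user_a, user_b):
    """Build one key per pair of users, regardless of who is named first."""
    first, second = sorted((user_a, user_b))
    return f"chat:{first}:{second}"


def post_message(sender, recipient, text):
    """Add a message to the conversation shared by the two users."""
    key = conversation_key(sender, recipient)
    messages = cache.get(key, [])
    cache[key] = messages + [(sender, text)]


def read_conversation(user_a, user_b):
    """Return the messages exchanged so far, oldest first."""
    return cache.get(conversation_key(user_a, user_b), [])
```

Sorting the two user names means alice writing to bob and bob writing to alice land in the same conversation.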

A limitation of memcached servers is that they don't provide any complex data structures (although they can be built with some scripting on Kumofs).

[1] Couchbase Server is an open-source, distributed, multi-model, document-oriented NoSQL database that provides client protocol compatibility with memcached.

[2] Kumofs is a simple and fast distributed key-value store. You can use a memcached client library to set, get, CAS or delete values from/into kumofs.

Varnish and Memcache Model

Memcached is a key-value store, more or less a rather simple database. It doesn’t persist data and only stores it in memory. It also doesn’t really care if it throws data out. The natural use for memcached is to cache things internally in your application or between your application and your database. Memcached uses its own specific protocol to store and fetch content.

Varnish[3], on the other hand, stores rendered pages. It talks HTTP, so it will typically talk directly to an HTTP client and deliver a page from its cache whenever that page is stored there, which is commonly called a cache hit.

When an object (any kind of content, e.g. an image or a page) is not stored in the cache, we have what is commonly known as a cache miss. In that case Varnish fetches the content from the web server, delivers a copy to the user, and retains it in cache to serve future requests.

These are two pretty different pieces of software. The end goal of both pieces of software is the same, though, and most sites would likely use both technologies in order to speed up delivery.

Such sites deploy Varnish to speed up delivery of its cache hits. On a cache miss, the application server might have access to some data in memcached, which will be available to the application faster than the database can deliver it.

The performance characteristics are pretty different.

Varnish will start delivering a cache hit in a matter of microseconds whereas a PHP page that gets rendered content from Memcached will likely spend somewhere around 15-30 milliseconds doing so.

The reason Varnish can do it faster is that Varnish has its content in local memory, whereas the PHP script needs to get on the network and fetch the content over a TCP connection. In addition, you’ll have the overhead costs of the interpreter. It’s not that Varnish is better; it simply has a much easier job to do, and it is faster because of it.

[3] Varnish is an HTTP accelerator designed for content-heavy dynamic web sites.

PHP Memcached vs PHP memcache

There are two PHP memcache extensions available from the PHP Extension Community Library (PECL): PHP memcached and PHP memcache.

These two PHP extensions are not identical.

PHP memcache is older and very stable, but has a few limitations. The PHP memcache module talks to the daemon directly, while the PHP memcached module uses the libmemcached client library and also adds some features.

Conclusion

Memcached was originally developed by Brad Fitzpatrick for LiveJournal in 2003.

Memcached is simple yet powerful. Its simple design promotes quick deployment and ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.


Reference:
MySQL 8.0 Reference Manual@dev.mysql.com
Frank Denis@quora.com
Lori MacVittie@devcentral.f5.com
wiki@varnish-software.com

