Pentagull's migration to Redis for high availability session state
Session state is an integral part of the ESB platform - we use it to keep track of logged-in users, transactions that are in progress and a whole host of other bits and pieces that don't make sense to send to the client or store in the database. Because of this, the infrastructure that provides our session state needs to be highly available and resilient to both planned and unplanned events.
ASP.NET gives us several choices for storing session state out of the box.
In-process session state is the simplest option, and stores the session data in the same process as the application. This makes it very fast, but also very volatile: every time our application domain is restarted we lose all our session data. There's also no way to make sessions work across multiple application processes in a web garden or web farm.
Next up is State Server. This is an out-of-process way of storing session data. It runs as a Windows service and is completely independent of IIS. That means we can restart app domains or application pools and not lose our sessions, and multiple processes in a web garden can share the session state. However, we're still tied to a single app server, and if that server is restarted our sessions are all gone.
Another option is to use SQL Server. This might seem like a great choice - we already put a lot of time and effort into making our databases highly available, so why not leverage that same infrastructure for session data? The simple answer is performance. Session data is all about key/value pairs, and SQL Server is more suited to relational data. To that end we'd rather conserve our SQL Server resources for their intended tasks and leave the management of session data to something more suited to the job.
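To make the "key/value pairs" point concrete, here is a minimal sketch of the access pattern session state needs: save a blob under a session ID, load it back, and let it expire after a period of inactivity. It is a toy in-memory model for illustration only, not how ASP.NET or any particular provider stores sessions; the `SessionStore` class, its method names and the injected `clock` parameter are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SessionStore:
    """Toy key/value session store with sliding expiry (illustrative only)."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        # session_id -> (expiry time, session data)
        self._items: Dict[str, Tuple[float, dict]] = {}

    def save(self, session_id: str, data: dict) -> None:
        """Store a copy of the session data and (re)start its timeout."""
        self._items[session_id] = (self._clock() + self._timeout, dict(data))

    def load(self, session_id: str) -> Optional[dict]:
        """Return the session data, or None if it is missing or has expired."""
        entry = self._items.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            del self._items[session_id]
            return None
        # Sliding expiry: every successful read renews the timeout.
        self._items[session_id] = (self._clock() + self._timeout, data)
        return dict(data)

    def active_sessions(self) -> int:
        """Count sessions that have not yet expired."""
        now = self._clock()
        return sum(1 for expires_at, _ in self._items.values() if expires_at > now)
```

Everything here is a lookup by a single key with a time-to-live attached; there are no joins, no queries across sessions and no schema, which is why a dedicated key/value store is a more natural fit for this workload than a relational database.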
For a long time, we used a single centralised server running the ASP.NET state service. This worked very well in that all the app servers in our fleet could share session data, so we didn't have to worry about server affinity. The glaringly obvious downside to this approach was that the state server became a single point of failure.
Introducing Redis
Redis is an open-source, in-memory data store that is well suited to the storage of session data. It has been around for many years in the Linux world. It supports replication for high availability and can scale horizontally through the use of clustering and partitioning. Thanks to Microsoft's ASP.NET session state provider for Redis, it is now a drop-in replacement for State Server. This made it a very compelling proposition for us, as our development team wouldn't need to make any changes to our software.
We are fortunate to have a great technical team that spans both Windows and Linux skillsets, so mixing both technologies to provide the optimal solution was well within their capabilities. Our Redis infrastructure consists of two replicated nodes in an active/passive configuration, monitored by three Redis Sentinels. The job of the sentinels is to monitor the health and connectivity of the Redis nodes and to handle automatic failover and election of a new primary server in the event of a failure. At least two of the three sentinels must agree to form a quorum. This helps reduce the likelihood of a split-brain scenario, in which both nodes believe they are the primary. Our nodes and sentinels are distributed across multiple data centres, giving us both high availability and disaster recovery.
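The quorum rule can be sketched in a few lines. This is a simplified model of the decision, not Redis Sentinel's actual implementation (the real Sentinel protocol also elects a leader sentinel, backed by a majority, to carry out the failover); the function name and the sentinel names are hypothetical.

```python
from typing import Dict


def should_fail_over(primary_seen_down: Dict[str, bool], quorum: int = 2) -> bool:
    """Decide whether enough sentinels agree that the primary is down.

    primary_seen_down maps a sentinel's name to True when that sentinel
    cannot reach the primary. Failover is only triggered when at least
    `quorum` sentinels agree.
    """
    down_votes = sum(1 for is_down in primary_seen_down.values() if is_down)
    return down_votes >= quorum


# One sentinel loses its own network link: it alone sees the primary as
# down, so no failover happens and we avoid ending up with two primaries.
isolated = {"sentinel-a": True, "sentinel-b": False, "sentinel-c": False}

# The primary really has failed: two or more sentinels agree.
real_outage = {"sentinel-a": True, "sentinel-b": True, "sentinel-c": False}

print(should_fail_over(isolated))     # False
print(should_fail_over(real_outage))  # True
```

With three sentinels and a quorum of two, a single sentinel that is cut off from the network cannot promote the passive node on its own, which is what keeps an isolated failure from turning into a split-brain.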
On the client, the failover happens with no more than a few seconds' delay, with no noticeable impact on the application or the user. It's also a lot easier to monitor what's going on inside Redis - for example, we know exactly how many active sessions are in use across each installation and can better manage our capacity. With State Server this level of detail was simply not available.
Our move to Redis has given us a highly available and scalable solution for session state management that we expect will serve our needs for years to come.