China Railway Corporation (CRC) is the national railway operator in China and manages all commuter rail and freight transport. With 5,700 train stations, it serves the largest population in the world. Its website books 4.5 million tickets per day from 20 million daily users. Holiday travel periods create peaks of 15,000 tickets sold per minute, 1.4 billion page views per day and 40,000 visits per second. With so many people relying on the website for travel, it must be continuously available. Demand has far exceeded expectations, and growth of as much as 50% per year is forecast as mobile phone access is added.
Inability to Scale Traditional RDBMS
In 2011, millions of people began to use the China Railways website on a daily basis and the system began experiencing problems with performance. In addition, holiday travel periods like the Chinese National Holiday and New Year created a severe burden. Peak use at these times caused regular outages, poor performance, booking errors, payment failures and issues with ticket confirmations.
At the time, the system ran on 72 UNIX boxes with a traditional RDBMS server. Dr. Jiansheng Zhu, Vice Director of the China Academy of Railway Sciences, stepped in to address the issues as the project executive for new system development. His team performed a root cause analysis and identified two fundamental bottlenecks in the old system. The relational database was overloaded and could handle neither the scale of incoming requests nor the level of reliability required to meet the SLAs. In addition, the computational power of the UNIX servers was inadequate for the capacity requirements.
First, the team began to evaluate traditional mainframes and found the same inherent architectural bottlenecks as the current UNIX/RDBMS system. According to the Vice Director, “Traditional RDBMS and mainframe computing models just do not scale like a system built to run in memory across multiple nodes. Our website was proof of this and trying to scale our legacy database was going to become very expensive.”
The Vice Director and his team began to look at In-Memory Data Grids (IMDGs) with proven track records in financial services, airlines and web commerce – environments with extreme transaction processing needs and unpredictable scale. They selected International Integrated Systems, Inc. (IISI) to help evaluate solutions, and Dr. YC Liu led the IISI team out of Nanjing, China. IISI had a strong track record in migrating companies and government organizations from legacy systems to new cloud application architectures and IMDGs. IISI recommended Pivotal GemFire because it met the performance, scale and availability requirements and could run on low-cost compute infrastructure.
From Proof of Concept to Production with Pivotal GemFire
Facing significant problems from legacy UNIX servers and RDBMS, the organization recognized the need to overcome architecture hurdles while gaining continuous availability, scale and extreme performance. The IISI team created a proof of concept and demonstrated several advantages with Pivotal GemFire: ticket calculation performance improved by 50 to 100 times; latency was low and query response times stayed consistently fast as load increased; and the system showed excellent, near-linear scalability, high availability and elasticity, with the ability to add server capacity on demand.
The Vice Director oversaw the process from POC to production and shared, “First, Pivotal GemFire offered proof in a realistic test environment and pilot as the existing site experienced many issues for Spring Festival in January of 2012. At that time, we began using GemFire to optimize and improve ticket availability inquiries. For the October 2012 National holiday we saw improvements in this part of the architecture while other issues remained. So, we started a second GemFire project to improve order processing. For the January 2013 Spring Festival, we saw order processing improve and outstanding issues were caused by specialized plugin browsers that are now blocked. As seen in the most recent National Holiday for 2013, the system is operating with solid performance and uptime. Now, we have a reliable, economically sound production system that supports record volumes and has room to grow.”
Today, the system runs on ten primary x86 servers with over two terabytes of memory, plus ten backup servers. This setup has replaced the 72 UNIX boxes and traditional RDBMS with a more efficient, cost-effective approach.
High Performance and Continuous Uptime
Pivotal GemFire stores data in-memory and manages partitioning and distributed transactions across China Railway’s servers. GemFire easily handles thousands of transactions per second and, while it can act as a cache to mission-critical databases and mainframes, China Railway Corporation used it to completely replace their traditional RDBMS.
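To make the idea of in-memory partitioning with backup copies concrete, here is a minimal Python sketch. It is a toy model, not GemFire's API or its actual partitioning algorithm: the class name `PartitionedStore`, the node names, the bucket count and the ticket-style key are all hypothetical, and placing the backup on the "next" node is a simplifying assumption.

```python
import zlib


class PartitionedStore:
    """Toy in-memory store: each key is held by a primary node and one backup node.

    Illustrative only; this is not how GemFire is implemented.
    """

    def __init__(self, node_names, num_buckets=113):
        self.order = list(node_names)                  # fixed node order (needs >= 2 nodes)
        self.nodes = {name: {} for name in self.order}  # node -> local in-memory dict
        self.alive = set(self.order)
        self.num_buckets = num_buckets

    def owners(self, key):
        # Hash the key to a bucket, then map the bucket to a primary node
        # and use the next node in the list as its backup (an assumption).
        bucket = zlib.crc32(key.encode("utf-8")) % self.num_buckets
        i = bucket % len(self.order)
        return self.order[i], self.order[(i + 1) % len(self.order)]

    def put(self, key, value):
        # Write to every live owner so a copy survives a single node failure.
        for node in self.owners(key):
            if node in self.alive:
                self.nodes[node][key] = value

    def get(self, key):
        # Read from the primary if it is alive, otherwise fall back to the backup.
        for node in self.owners(key):
            if node in self.alive and key in self.nodes[node]:
                return self.nodes[node][key]
        return None

    def fail(self, node):
        # Simulate a node outage.
        self.alive.discard(node)
```

For example, after `store = PartitionedStore(["node-1", "node-2", "node-3"])` and `store.put("G101|2013-10-01", 42)`, calling `store.fail(...)` on the key's primary node still leaves `store.get("G101|2013-10-01")` able to return the value from the backup copy.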
On-Demand Scale for Data
To accommodate the forecast growth, Pivotal GemFire allows member nodes to be added to the system as needed and can scale from ten to thousands of commodity computers with near-linear scaling. While there is a lot of flexibility in designing data distribution, GemFire automatically redistributes data to new nodes as they are brought online. Importantly, applications do not need to be aware of the data distribution strategy, and code does not have to be updated when nodes are added.
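The redistribution behavior described here can be sketched with a small Python model. This is an illustration under stated assumptions, not GemFire's rebalancing logic: the class `ElasticGrid`, the fixed bucket count of 120 and the "take from the fullest node" policy are all hypothetical. The point it shows is that `put` and `get` keep the same signature before and after a node joins.

```python
import zlib


class ElasticGrid:
    """Toy data grid: keys hash to a fixed set of buckets, and buckets move between nodes."""

    def __init__(self, node_names, num_buckets=120):
        names = list(node_names)
        self.num_buckets = num_buckets
        self.stores = {name: {} for name in names}  # node -> {bucket_id: {key: value}}
        self.owner = {}                              # bucket_id -> node name
        for b in range(num_buckets):
            node = names[b % len(names)]             # initial round-robin assignment
            self.owner[b] = node
            self.stores[node][b] = {}

    def _bucket(self, key):
        return zlib.crc32(key.encode("utf-8")) % self.num_buckets

    def put(self, key, value):
        b = self._bucket(key)
        self.stores[self.owner[b]][b][key] = value

    def get(self, key):
        b = self._bucket(key)
        return self.stores[self.owner[b]][b].get(key)

    def add_node(self, name):
        """Bring a node online and move whole buckets to it from the fullest nodes."""
        self.stores[name] = {}
        target = self.num_buckets // len(self.stores)  # fair share for the new node
        for _ in range(target):
            donor = max(self.stores, key=lambda n: len(self.stores[n]))
            b, data = self.stores[donor].popitem()     # detach one bucket with its data
            self.stores[name][b] = data
            self.owner[b] = name
        return target

    def bucket_counts(self):
        return {n: len(s) for n, s in self.stores.items()}
```

Because callers only ever use `put` and `get`, adding a node with `add_node` changes where buckets live but not the calling code, which is the property the paragraph above attributes to GemFire.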
Increased Developer Productivity
The team realized several benefits from developing with Pivotal GemFire. It offers access through C++, C#, Java and REST via a familiar hash map type of interface. In addition, Spring Data GemFire accelerates development by providing repositories, declarative namespace support, annotations, continuous query support and project templates.
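As a rough illustration of what a hash map type of interface combined with continuous queries can feel like to a developer, the following Python sketch wraps a dictionary and notifies registered callbacks when a written entry matches a predicate. It is not GemFire's or Spring Data's API: the names `Region`, `register_query` and the seat-count example are hypothetical and chosen only for the illustration.

```python
from collections.abc import MutableMapping


class Region(MutableMapping):
    """Toy key-value region with dict-style access and simple continuous queries."""

    def __init__(self):
        self._data = {}
        self._queries = []  # list of (predicate, callback) pairs

    def register_query(self, predicate, callback):
        # The callback fires on every later write whose (key, value) matches the predicate.
        self._queries.append((predicate, callback))

    def __setitem__(self, key, value):
        self._data[key] = value
        for predicate, callback in self._queries:
            if predicate(key, value):
                callback(key, value)

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Usage: watch for trains that are nearly sold out (hypothetical data).
seats_left = Region()
low_stock = []
seats_left.register_query(lambda train, seats: seats < 10,
                          lambda train, seats: low_stock.append(train))
seats_left["G101"] = 120
seats_left["D305"] = 4
```

After the two writes, `low_stock` holds only `"D305"`, while both entries remain readable with ordinary dictionary syntax such as `seats_left["G101"]`.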
In conjunction with IISI, Pivotal GemFire has provided China Railway Corporation with a way to maintain high levels of performance, scale to meet SLAs and deliver cost-effective solutions while keeping developers productive. In the future, China Railway Corporation hopes to keep more than a single month of ticket data in GemFire, using Hadoop within the architecture to store as much as 10 to 20 times more information.