Scaling online sales for the largest railway in the world

High performance and continuous uptime

On-demand scale for data

Increased developer productivity

More cost-effective operations


China Railway Corporation is the national railway operator in China and manages all commuter rail and freight transport. With 5,700 train stations, it serves the largest population in the world. Its website books 4–5 million tickets per day for 20 million daily users. Holiday travel periods create peaks of 15,000 tickets sold per minute, 1.4 billion page views per day, and 40,000 visits per second. With so many people relying on the website for travel, it must be continuously available. Demand has far exceeded expectations, and forecasts show as much as 50 percent growth per year as mobile phone access is added.
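To put the holiday figures on a common per-second footing, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the variable names are hypothetical, the daily page count is averaged evenly over 24 hours (real traffic is burstier), and the 50 percent figure is the upper end of the growth estimate quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the text for holiday travel periods.
peak_tickets_per_minute = 15_000
pages_per_day = 1_400_000_000
annual_growth = 0.50  # upper end of the growth estimate in the text

# Convert both figures to a per-second rate.
peak_tickets_per_second = peak_tickets_per_minute / 60
avg_pages_per_second = pages_per_day / SECONDS_PER_DAY


def projected_multiplier(years, growth=annual_growth):
    """Compound growth factor after the given number of years."""
    return (1 + growth) ** years


print(f"peak ticket sales: {peak_tickets_per_second:.0f} per second")
print(f"average page load: {avg_pages_per_second:,.0f} pages per second")
print(f"load after 3 years of growth: {projected_multiplier(3):.2f}x")
```

At the quoted growth rate, load more than triples within three years, which is why headroom for scale mattered as much as fixing the immediate outages.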


Inability to scale traditional relational database management system

In 2011, millions of people began to use the China Railway website on a daily basis, and the system began experiencing problems with performance. In addition, holiday travel periods, such as the Chinese National Holiday and New Year, created a severe burden. Peak use at these times caused regular outages, poor performance, booking errors, payment failures, and issues with ticket confirmations.

At the time, the system ran on 72 UNIX boxes with a traditional relational database management system (RDBMS) server. Dr. Jiansheng Zhu, vice director of the China Academy of Railway Sciences, stepped in to address the issues as the project executive for new system development. His team performed a root cause analysis and identified two fundamental bottlenecks in the old system:

  1. The relational database was overloaded and could handle neither the scale of incoming requests nor the level of reliability required to meet the SLAs.
  2. The computational power of the UNIX servers was inadequate for the capacity requirements.

First, the team began to evaluate traditional mainframes and found the same inherent architectural bottlenecks as the current UNIX/RDBMS system. According to the vice director, “Traditional RDBMS and mainframe computing models just do not scale like a system built to run in memory across multiple nodes. Our website was proof of this, and trying to scale our legacy database was going to become very expensive.”

The vice director and his team began to look at in-memory data grids (IMDGs) with proven track records in financial services, airlines, and web commerce: environments with extreme transaction processing needs and unpredictable scale. They selected International Integrated Systems, Inc. (IISI) to help evaluate solutions, and Dr. YC Liu to lead the IISI team out of Nanjing, China. IISI had a strong track record in migrating companies and government organizations from legacy systems to new cloud application architectures and IMDGs. They recommended VMware Tanzu GemFire because it met the performance, scale, and availability requirements and could run on low-cost compute infrastructure.


From proof of concept to production with Tanzu GemFire

Facing significant problems from legacy UNIX servers and RDBMS, the organization recognized the need to overcome architecture hurdles while gaining continuous availability, scale, and extreme performance. The IISI team created a proof of concept (POC) and demonstrated several advantages with GemFire:

  • Insights into critical business questions
  • Training for key analysts
  • Conversion of existing models

The vice director oversaw the process from POC to production and shared, “First, [Tanzu] GemFire offered proof in a realistic test environment and pilot as the existing site experienced many issues for Spring Festival in January of 2012. At that time, we began using [Tanzu] GemFire to optimize and improve ticket availability inquiries. For the October 2012 National Holiday, we saw improvements in this part of the architecture while other issues remained. So, we started a second [Tanzu] GemFire project to improve order processing. For the January 2013 Spring Festival, we saw order processing improve, and outstanding issues were caused by specialized plug-in browsers that are now blocked. As seen in the most recent National Holiday for 2013, the system is operating with solid performance and uptime. Now, we have a reliable, economically sound production system that supports record volumes and has room to grow.”

Today, the system runs on 10 primary x86 servers with more than 2 terabytes of memory, plus 10 backup servers. This setup has replaced the 72 UNIX boxes and traditional RDBMS with a more efficient, cost-effective approach.


High performance and continuous uptime

GemFire stores data in memory and manages partitioning and distributed transactions across China Railway’s servers. It easily handles thousands of transactions per second. While GemFire can act as a cache in front of mission-critical databases and mainframes, China Railway Corporation used it to completely replace its traditional RDBMS.
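The general idea of partitioning in-memory data across servers while keeping a second copy for availability can be illustrated with a minimal Python sketch. This is a toy model written for this article, not GemFire's implementation or API: the class name, the hash-based placement rule, and the sample train key are all assumptions made for illustration.

```python
import zlib


class PartitionedStore:
    """Toy in-memory store: every key lives on a primary and a backup node."""

    def __init__(self, node_names):
        if len(node_names) < 2:
            raise ValueError("need at least two nodes to hold a backup copy")
        self.node_names = list(node_names)
        # One plain dict per node stands in for that node's memory.
        self.nodes = {name: {} for name in self.node_names}

    def owners(self, key):
        """Return the (primary, backup) node names for a key."""
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % len(self.node_names)
        primary = self.node_names[index]
        backup = self.node_names[(index + 1) % len(self.node_names)]
        return primary, backup

    def put(self, key, value):
        # Write to both copies so a single node failure loses no data.
        for owner in self.owners(key):
            self.nodes[owner][key] = value

    def get(self, key, failed=()):
        # Read from the primary; fall back to the backup if it is down.
        for owner in self.owners(key):
            if owner not in failed:
                return self.nodes[owner][key]
        raise RuntimeError("both copies unavailable")


store = PartitionedStore(["node-1", "node-2", "node-3"])
store.put("G101:2013-10-01", {"seats_left": 42})  # hypothetical train key
primary, backup = store.owners("G101:2013-10-01")
# The read still succeeds when the primary node is marked as failed.
print(store.get("G101:2013-10-01", failed={primary}))
```

Because each node holds only a slice of the keys, reads and writes spread across all servers, and the backup copy is what lets a read succeed while one node is unavailable.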

“The system is operating with solid performance and uptime. Now, we have a reliable, economically sound production system that supports record volumes and has room to grow.”
Dr. Jiansheng Zhu, Vice Director, China Academy of Railway Sciences

On-demand scale for data

To accommodate the forecast growth, GemFire allows member nodes to be added to the system as needed and can scale from 10 to thousands of commodity computers with near-linear scaling. While there is a lot of flexibility in designing data distribution, GemFire automatically redistributes data to new nodes as they are brought online. Importantly, applications do not need to be aware of the data distribution strategy, and code does not have to be updated when nodes are added.
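One common way to get this behavior is to hash keys into a fixed number of buckets and assign buckets, rather than individual keys, to nodes. The Python sketch below illustrates that idea under stated assumptions: the `Grid` class, the bucket count of 12, and the simple "move from busiest to idlest" rule are hypothetical choices for illustration, not GemFire's actual rebalancing algorithm.

```python
import zlib

NUM_BUCKETS = 12  # hypothetical; kept small so the example is easy to follow


def bucket_for(key):
    """Map a key to a bucket with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


class Grid:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.data = [{} for _ in range(NUM_BUCKETS)]  # entries per bucket
        # Initial round-robin assignment of buckets to nodes.
        self.owner = [self.nodes[b % len(self.nodes)] for b in range(NUM_BUCKETS)]

    # Application-facing calls never mention nodes.
    def put(self, key, value):
        self.data[bucket_for(key)][key] = value

    def get(self, key):
        return self.data[bucket_for(key)][key]

    def load(self):
        """Number of buckets currently owned by each node."""
        return {n: self.owner.count(n) for n in self.nodes}

    def add_node(self, name):
        """Add a node and move buckets to it until load is even.

        Returns the number of buckets moved. In a real grid each move would
        copy the bucket's entries over the network; here only ownership changes.
        """
        self.nodes.append(name)
        moved = 0
        while True:
            load = self.load()
            busiest = max(load, key=load.get)
            idlest = min(load, key=load.get)
            if load[busiest] - load[idlest] <= 1:
                return moved
            self.owner[self.owner.index(busiest)] = idlest
            moved += 1


grid = Grid(["node-1", "node-2", "node-3"])
grid.put("order-1001", "confirmed")  # hypothetical order key
moved = grid.add_node("node-4")
print(moved, grid.load(), grid.get("order-1001"))
```

Note that `put` and `get` are identical before and after `add_node`: the caller addresses data by key only, which mirrors the point that application code does not change when nodes are added. Only a fraction of the buckets move, so adding capacity does not require reshuffling the whole data set.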


Increased developer productivity

The team realized several benefits from developing with GemFire. GemFire offers access through C++, C#, Java, and REST via a familiar hash-map-style interface. In addition, Spring Data accelerates GemFire development by providing repositories, declarative namespace support, annotations, continuous query support, and project templates.
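To show what a hash-map-style interface combined with continuous queries feels like to a developer, here is a small Python sketch. It is an analogy only: `ToyRegion` and `register_query` are hypothetical names invented for this example and are not part of the GemFire or Spring Data APIs, and the train numbers are made up.

```python
from collections.abc import MutableMapping


class ToyRegion(MutableMapping):
    """Dict-like store that notifies registered queries on every write."""

    def __init__(self):
        self._entries = {}
        self._queries = []  # list of (predicate, callback) pairs

    def register_query(self, predicate, callback):
        """Call callback(key, value) whenever a written entry matches."""
        self._queries.append((predicate, callback))

    def __getitem__(self, key):
        return self._entries[key]

    def __setitem__(self, key, value):
        self._entries[key] = value
        # Evaluate each standing query against the new entry.
        for predicate, callback in self._queries:
            if predicate(key, value):
                callback(key, value)

    def __delitem__(self, key):
        del self._entries[key]

    def __iter__(self):
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)


trains = ToyRegion()
sold_out = []
# Standing query: tell me as soon as any train runs out of seats.
trains.register_query(
    lambda key, value: value["seats_left"] == 0,
    lambda key, value: sold_out.append(key),
)
trains["G101"] = {"seats_left": 5}  # ordinary map-style write
trains["G102"] = {"seats_left": 0}  # matches the standing query
print(sold_out, trains.get("G999"))
```

The application reads and writes with ordinary map operations, while the standing query pushes matching changes to the caller instead of requiring it to poll.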


Conclusion

In conjunction with IISI, GemFire has provided China Railway Corporation with a way to maintain high levels of performance, scale to meet SLAs, and deliver cost-effective solutions while keeping developers productive. In the future, China Railway hopes to store more than a single month of ticket data in GemFire, using Hadoop within the architecture to hold as much as 10–20 times more information.