Scaling with RabbitMQ @ SoundCloud

July 18, 2013 Stacey Schneider

If you aren’t familiar with SoundCloud, it is one of the fastest-growing sites in the US, with 10.2 million unique visitors in March and traffic up 26% from the prior month. It reaches over 200M people worldwide.

It is also one of the coolest social networks for sharing music and sound. Many of today’s most popular musicians, producers, and DJs release music here to a global audience, and that audience collectively uploads 12 hours of music every minute, covering electronic, classical, jazz, blues, comedy, storytelling, and more.

This past June, Sebastian Ohm, Technical Lead at SoundCloud, gave a talk on their use of RabbitMQ at the Erlang User Conference in Stockholm. His talk and this article cover the functionality, messaging architecture, and lessons learned.

SoundCloud’s Functionality

SoundCloud is a social platform. Instead of uploading and commenting on pictures, SoundCloud users upload and comment on audio via a waveform image, as shown below. The content on SoundCloud is driven entirely by end users and third parties. It can be accessed via the SoundCloud website, embedded in other websites such as Facebook or blogs, and heard via mobile apps on platforms such as Android or iOS.

One place where RabbitMQ does its work is the upload path. When a user uploads an audio file to SoundCloud, RabbitMQ is used to asynchronously process the file, build the waveform image, and notify followers of the new sound.

The Messaging Architecture—Transcoding and Activity Updates

SoundCloud stores media in Amazon S3, and the worker pool is in EC2. A message-based architecture was chosen a few years back to coordinate these separate storage and processing clouds. After reviewing STOMP and other protocols, the engineering team settled on AMQP with RabbitMQ. The team wanted producers and consumers to be entirely decoupled so that pools of resources could be scaled independently.

The application was developed with Ruby on Rails. When a new media file is uploaded, the Ruby code creates a record in MySQL and publishes a message to the media exchange containing a unique ID for the media. Both the Ruby app and RabbitMQ run in the SoundCloud data center in Amsterdam, while the consuming end of the queue, a transcoding service, runs in EC2. The consumers receive the unique ID, transcode the media, and publish another message to the media exchange with some metadata and a unique ID for the files on S3, which are available via URI. A Rails app receives these messages and pushes some of the information into the database.
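The round trip described above can be sketched in a few lines. This is only an illustration, not SoundCloud's code: it is written in Python rather than Ruby, two in-memory queues stand in for the broker, a dict stands in for MySQL, and every name in it (`publish_upload`, `transcode_one`, `store_result`, the `s3://example-bucket/...` URI, the `duration_s` field) is a hypothetical placeholder.

```python
import json
import queue
import uuid

# Hypothetical in-memory stand-ins for the queues bound to the media exchange.
uploads = queue.Queue()   # upload notifications, consumed by the transcoders
results = queue.Queue()   # transcode results, consumed by the web-side app


def publish_upload(db):
    """Web app side: create the record, then publish only its unique ID."""
    media_id = str(uuid.uuid4())
    db[media_id] = {"status": "uploaded"}
    uploads.put(json.dumps({"media_id": media_id}))
    return media_id


def transcode_one():
    """Transcoder side: consume one ID, 'transcode', publish metadata back."""
    message = json.loads(uploads.get())
    media_id = message["media_id"]
    result = {
        "media_id": media_id,
        "uri": f"s3://example-bucket/{media_id}.mp3",  # hypothetical location
        "duration_s": 0,                               # placeholder metadata
    }
    results.put(json.dumps(result))


def store_result(db):
    """Web app side: consume one result and push part of it into the database."""
    message = json.loads(results.get())
    db[message["media_id"]].update(status="transcoded", uri=message["uri"])


db = {}  # a dict stands in for the MySQL table
mid = publish_upload(db)
transcode_one()
store_result(db)
print(db[mid]["status"])
```

The point of the shape is that the message carries only an ID and a little metadata, so the producer and the consumers never need to know about each other, only about the exchange.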

This approach addressed one of their first scale challenges: scaling uploads. Now they can add resources to the pool of transcoders quickly and automatically whenever traffic spikes. RabbitMQ distributes the workload across all transcoders so it is processed in parallel, and the system can recover from a backlog of tens of thousands of uploads within a few hours.

They also use a separate RabbitMQ broker to update the dashboard. The dashboard shows users the most recent activities or updates from the musicians and other users that they follow. Scale is not a problem until a user like Skrillex, who has about one million followers, uploads 10 tracks at once. In that case, the system would have to synchronously write to Cassandra 10 million times (10 tracks for each of one million followers). Instead, the engineering team added a broadcast within their application’s domain and used RabbitMQ for staged, asynchronous processing in three steps:
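A rough sketch of why adding transcoders helps: when several consumers pull from the same queue, each message goes to exactly one of them, so the backlog drains faster as the pool grows. The example below uses Python threads and an in-memory queue as stand-ins for EC2 workers and a RabbitMQ queue. The function name `run_pool` and the worker names are hypothetical.

```python
import queue
import threading


def run_pool(job_ids, num_workers):
    """Drain a shared queue with competing consumers; return (job, worker) pairs."""
    jobs = queue.Queue()
    for job_id in job_ids:
        jobs.put(job_id)

    processed = []
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                job_id = jobs.get_nowait()  # each job is handed to one worker only
            except queue.Empty:
                return                      # backlog drained, worker exits
            with lock:
                processed.append((job_id, name))

    threads = [
        threading.Thread(target=worker, args=(f"transcoder-{i}",))
        for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed


processed = run_pool(range(1000), num_workers=8)
print(len(processed))
```

Changing `num_workers` does not require any change on the publishing side, which is the decoupling the team was after.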

Fan-out determines where activities should propagate
Personalization captures the relationship between users and filters an index entry
Serialization persists the information in Cassandra for end user display or API access
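The three stages can be sketched as small functions connected by queues. This is a minimal illustration under stated assumptions, not SoundCloud's implementation: in-memory queues stand in for RabbitMQ, a dict stands in for Cassandra, and the follower table, the `muted` filter rule, and all function names are hypothetical.

```python
import queue


def fan_out(activity, followers_of):
    """Stage 1: decide which users an activity should propagate to."""
    for follower in followers_of.get(activity["actor"], []):
        yield {"recipient": follower, **activity}


def personalize(entry, muted):
    """Stage 2: apply the recipient/actor relationship and filter the entry."""
    if (entry["recipient"], entry["actor"]) in muted:  # hypothetical filter rule
        return None
    return entry


def serialize(entry, store):
    """Stage 3: persist the index entry (a dict stands in for Cassandra)."""
    store.setdefault(entry["recipient"], []).append(entry["track"])


fanout_q, personalize_q, serialize_q = queue.Queue(), queue.Queue(), queue.Queue()
followers_of = {"artist": ["ann", "bob", "cy"]}  # hypothetical follower table
muted = {("bob", "artist")}                      # hypothetical relationship data
store = {}

# One small message per uploaded track enters the pipeline.
for n in range(10):
    fanout_q.put({"actor": "artist", "track": f"track-{n}"})

# Each stage drains its own queue and feeds the next one.
while not fanout_q.empty():
    for entry in fan_out(fanout_q.get(), followers_of):
        personalize_q.put(entry)
while not personalize_q.empty():
    entry = personalize(personalize_q.get(), muted)
    if entry is not None:
        serialize_q.put(entry)
while not serialize_q.empty():
    serialize(serialize_q.get(), store)

writes = sum(len(tracks) for tracks in store.values())
print(writes)
```

Here 10 tracks and 3 followers, one of them filtered out, produce 20 writes. At the scale in the article the same multiplication gives 10 × 1,000,000 = 10,000,000 writes, which is why each stage sits behind its own queue instead of running inside the upload request.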

Key Lessons Learned

With their current approach, the team has been running about 20,000 to 30,000 persistent messages per second (as shown in the graph below). During his talk, Sebastian was kind enough to share the honest challenges they faced and some key lessons learned:

While things have not gone perfectly, Sebastian believes Erlang and RabbitMQ have delivered great performance with no operational issues, even though the team had no prior Erlang knowledge
Separating production, test, and dev environments is important and reduces headaches and errors
Don’t put every type of processing on one queue or one broker; separate workloads with different usage profiles so they can scale independently
Use clustering: a load balancer in front lets the team publish once, and workers can then subscribe to all
AMQP heartbeats worked more smoothly than one TCP connection per broker

Find similar information on RabbitMQ:

View Sebastian’s 40-minute talk with code examples and additional depth
Read other Pivotal POV blog articles on RabbitMQ with overviews of talks from other conferences and case studies
Read over 50 blog articles on RabbitMQ from VMware’s vFabric blog
Read more about Pivotal One—and where RabbitMQ fits into our application fabric


