How to commit to high performance at scale
Introduction: The key challenges of load-testing Calls vs. Chat
The recent launch of Sendbird Calls includes a commitment to high uptime and low latency server infrastructure. As an engineering team, before we could make this commitment, we had to validate that we could fulfill the uptime and latency requirements at scale.
The process for load testing Sendbird Calls was initially patterned after the load tests performed previously on the Sendbird Chat infrastructure. To load test our chat infrastructure, we created Node.js processes across EC2 instances and had them send and receive messages and call APIs using the Sendbird Chat JS SDK. There were, however, two major differences when load testing Sendbird Calls:
- Controlling the runtime environment for Sendbird Calls is more complex than it is for Chat because Calls depends on WebRTC
- Generating load for Sendbird Calls is far more resource-intensive than it is for Sendbird Chat, for three reasons:
- Computing resources: We must spin up more instances in order to generate the required load
- Packet size: Both inbound and outbound packets are much bigger for Calls than they are for Chat
- Environment: WebRTC requires running instances of the Chrome browser, each of which carries a substantial memory footprint.
Given these differences between load-testing Chat and Calls, load-testing Calls presented our team with two main challenges:
- Spawning enough instances to generate the required load while optimizing for cost at the same time
- Running headless browsers capable of handling WebRTC reliably
Running a headless browser
We selected Selenium WebDriver to handle the clients within headless instances of Chrome. Selenium WebDriver’s interfaces enable testers to send commands that execute within an environment that emulates the specified browser (Chrome in this case).
We created two Sendbird Calls SDK users and connected them in a call for 10 minutes. The following code briefly demonstrates how we implemented Selenium WebDriver in the load test: https://github.com/sendbird/calls-loadtesting-blog/blob/master/loadtest.js
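Headless Chrome needs a handful of command-line switches to run WebRTC calls without a display, a GPU, or real media hardware. The flag names below are real Chrome switches; the helper function wrapping them is an illustrative sketch, not the exact code from our repository.

```javascript
// Illustrative: the Chrome switches typically needed to run WebRTC
// clients headlessly (the helper itself is a sketch, not our repo code).
function headlessWebRtcFlags() {
  return [
    '--headless',                         // run Chrome without a visible UI
    '--no-sandbox',                       // required inside most container environments
    '--disable-gpu',                      // no GPU is available on the EC2 instances
    '--use-fake-ui-for-media-stream',     // auto-accept the mic/camera permission prompt
    '--use-fake-device-for-media-stream', // synthesize audio/video instead of real hardware
  ];
}
```

With the `selenium-webdriver` package, these flags would be applied roughly as `new chrome.Options().addArguments(...headlessWebRtcFlags())` before building the driver.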
We ran into two issues with the headless clients:
- Chrome shut down unexpectedly far more frequently than it should have. When this occurred, we had no choice but to reinitialize the client.
- We identified a limiting factor in Selenium ChromeDriver. Despite the EC2 instance having ample spare CPU, ChromeDriver appears to cap the number of Chrome processes it can manage; beyond that cap, performance drops significantly. After testing a range of values, we found that 15 was the largest number of processes ChromeDriver could handle stably, so we manually capped the process count at 15.
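The 15-process cap above can be enforced with a simple planning step: given the number of clients a single instance must run, fill each ChromeDriver up to the cap and start as many drivers as needed. This helper is a hypothetical sketch of that arithmetic, not code from our repository.

```javascript
// Illustrative: cap each ChromeDriver at the 15 Chrome sessions we found
// it could manage stably, and compute how many drivers to start.
const MAX_SESSIONS_PER_DRIVER = 15;

function planDrivers(totalClients, maxPerDriver = MAX_SESSIONS_PER_DRIVER) {
  const driverCount = Math.ceil(totalClients / maxPerDriver);
  const plan = [];
  for (let i = 0; i < driverCount; i++) {
    // Fill each driver up to the cap; the last one takes the remainder.
    const remaining = totalClients - i * maxPerDriver;
    plan.push(Math.min(maxPerDriver, remaining));
  }
  return plan; // e.g. 40 clients -> one driver with 15, one with 15, one with 10
}
```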
Managing instances via Kubernetes
We used Kubernetes to orchestrate client instances. Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management, maintained by the Cloud Native Computing Foundation. Kubernetes uses the concepts of “pods” and “nodes” to deploy distributed systems. Given that the Sendbird infrastructure runs on AWS, we used Amazon Elastic Container Registry (ECR) and Amazon Elastic Kubernetes Service (EKS) for container management.
We configured our Kubernetes setup to allocate one pod per node: https://github.com/sendbird/calls-loadtesting-blog/blob/master/loadtest_deployment.yaml
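One common way to express a one-pod-per-node constraint is pod anti-affinity keyed on the node hostname. The fragment below is a hedged sketch of that approach (names, replica counts, and resource numbers are placeholders, not our exact manifest):

```yaml
# Illustrative sketch, not our production manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: calls-loadtest
spec:
  replicas: 20
  selector:
    matchLabels:
      app: calls-loadtest
  template:
    metadata:
      labels:
        app: calls-loadtest
    spec:
      affinity:
        podAntiAffinity:
          # Refuse to schedule two load-test pods on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: calls-loadtest
              topologyKey: kubernetes.io/hostname
      containers:
        - name: loadtest
          image: calls-loadtest:latest   # placeholder image name
          resources:
            requests:
              cpu: "4"       # leave CPU headroom so WebRTC quality holds up
              memory: 8Gi
```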
When a client instance runs low on resources, WebRTC automatically reduces the call quality in response. Because the logic that governs pod allocation to any given node is a black box, we manually limited the number of pods per node in order to maintain a sufficient margin of CPU resources on each node. In the future, the recently released PodTopologySpread feature may be helpful in managing these kinds of issues.
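For reference, the PodTopologySpread approach would look roughly like the fragment below. The field names follow the Kubernetes API; the values are illustrative, since we did not use this feature in the test described here.

```yaml
# Illustrative pod-spec fragment: spread load-test pods evenly across nodes.
spec:
  topologySpreadConstraints:
    - maxSkew: 1                      # node pod counts may differ by at most 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: calls-loadtest
```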
In the end, we successfully ran 10,000 simultaneous calls between 20,000 clients using Selenium WebDriver and a properly configured Kubernetes setup. We recommend this process to anyone needing headless agents to run Sendbird Calls. After successfully load-testing 10,000 calls, we continued to tune our infrastructure to meet the traffic requirements of several new customers. When they released their in-app call features to all users, we were confident that our systems would support the Calls load created by our customers’ users.