How does Sendbird secure AWS?

Jan 24, 2025
Laxman Eppalagudem
Product Security

Like many tech companies, Sendbird has used Amazon Web Services (AWS) as its primary infrastructure provider from day one. But while AWS is an easy-to-use cloud provider, additional work is needed to make it secure-by-default.

In this blog, we’ll explore how Sendbird matured its AWS security posture, empowering developers to deploy in a way that’s both user-friendly and secure.

The history of AWS security at Sendbird

Sendbird’s AWS setup predates the Security team, so it was configured for quick engineering workflows that maintained a base level of security and compliance. As the Security team was built and staffed appropriately, we assumed the responsibility of comprehensively securing the cloud.

The following section provides a high-level overview of Sendbird’s initial understanding of the security landscape (as a growth-stage startup) and its limitations. In the final section, we discuss how our understanding matured, and the steps we took in collaboration with our friends in Engineering to make it better.

IAM users with MFA

In the beginning, our engineers accessed AWS through identity and access management (IAM) user logins with multi-factor authentication (MFA) enabled. Individuals used a username and password to access AWS accounts. Users also had inline policies that made it difficult to know who had access to what in the AWS accounts. We also had root users accessing our accounts for management tasks (surprise!). Overall, this setup was not ideal for auditing IAM users and their AWS API activity via user keys.

CSPM

We initially used a cloud security posture management (CSPM) tool that essentially ingested our AWS CloudTrail logs and alerted us to security misconfigurations.

IAM bot users

We used IAM users and credentials for cross-account service-to-service communications. However, the credentials were used in multiple services, which made it difficult to differentiate expected user behavior from anomalies. This was a major area of improvement we wanted to invest in, since if these user credentials were compromised, the threat exposure from an adversary would have been significant.

SSH access, EC2 hosts, and database access

To debug applications and resolve customer queries, engineers needed secure shell (SSH) access to Amazon Elastic Compute Cloud (EC2) hosts, as well as access to databases. The Engineering team built a custom solution for obtaining manager approval on database access requests.

Terraform via local IAM users

Our infrastructure was (and still is) managed via Terraform. We used GitHub for source control management (SCM) and Atlantis as the CI/CD pipeline for managing changes to our infrastructure. In every account onboarded to Atlantis, we created AWS roles that Atlantis used to apply changes. However, for a variety of reasons, some accounts hadn’t been onboarded to Atlantis, so changes to these accounts were handled by engineers from their local machines using IAM user credentials.

Breakglass IAM roles for incidents

We set up breakglass roles with admin privileges in every account for when engineers’ IAM users lacked the permissions needed to debug urgent customer-facing issues. We had an alerting mechanism for when someone assumed a breakglass role in an account, but no approval process governing its use.
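
As a rough illustration of the kind of alerting described above, the sketch below scans CloudTrail-style records for `AssumeRole` calls that target a breakglass role. The record fields follow the CloudTrail record format, but the `breakglass` marker in the role name, the sample ARNs, and the function name are assumptions made for illustration, not our actual detection.

```python
from typing import Iterable

# Assumption: breakglass roles can be recognised by a substring in the role ARN.
BREAKGLASS_MARKER = "breakglass"


def find_breakglass_assumptions(events: Iterable[dict]) -> list[dict]:
    """Return a small alert record for every AssumeRole call on a breakglass role."""
    alerts = []
    for event in events:
        # Only STS AssumeRole calls are relevant here.
        if event.get("eventSource") != "sts.amazonaws.com":
            continue
        if event.get("eventName") != "AssumeRole":
            continue
        role_arn = (event.get("requestParameters") or {}).get("roleArn", "")
        if BREAKGLASS_MARKER in role_arn.lower():
            alerts.append({
                "who": (event.get("userIdentity") or {}).get("arn", "unknown"),
                "role": role_arn,
                "when": event.get("eventTime"),
            })
    return alerts


# Hypothetical sample records, trimmed to the fields used above.
sample_events = [
    {
        "eventSource": "sts.amazonaws.com",
        "eventName": "AssumeRole",
        "eventTime": "2024-01-01T10:00:00Z",
        "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"},
        "requestParameters": {"roleArn": "arn:aws:iam::222222222222:role/breakglass-admin"},
    },
    {
        "eventSource": "sts.amazonaws.com",
        "eventName": "AssumeRole",
        "eventTime": "2024-01-01T11:00:00Z",
        "userIdentity": {"arn": "arn:aws:iam::111111111111:user/bob"},
        "requestParameters": {"roleArn": "arn:aws:iam::222222222222:role/read-only"},
    },
]

breakglass_alerts = find_breakglass_assumptions(sample_events)
for alert in breakglass_alerts:
    print(f"{alert['when']}: {alert['who']} assumed {alert['role']}")
```

A detection like this only reports usage after the fact. It says nothing about whether the use was justified, which is why alerting alone was not enough.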

AWS GovCloud

We had a presence in the AWS gov region for one customer, and all issues listed above were relevant to the gov accounts as well.

11 steps for a truly secure future

Once we’d figured out the lay of the land, and were adequately funded and staffed, the Security team set out to create a truly robust security posture for Sendbird. This involved working toward a cloud infrastructure that was secure-by-default, with multiple layers of security, logging, and alerting in place — all while keeping it easy to use for Engineering teams.

Here are the steps we took to improve Sendbird’s AWS security posture:

1. Using a holistic cloud security tool

We opted to replace our CSPM-only vendor with one that could help improve the holistic security posture of our cloud infrastructure, including posture management, container security, and runtime security with Kubernetes. We chose Orca. The main reason for this choice was Orca’s agentless snapshot scanning, which makes onboarding and management easy and keeps our infrastructure folks happy.

2. Using Okta SSO for all users via AWS IAM Identity Center

Since AWS had introduced AWS IAM Identity Center to manage AWS organizations and SSO setup, we set up single sign-on (SSO) for all our AWS accounts through Okta, our identity provider, and assigned permission sets to each team’s Okta group. This simplified team management and made user onboarding and offboarding automatic, because our Okta groups were synced to our HR platform.

We started using Terraform for managing all user-assigned policies via permission sets and customer managed policies. This removed the majority of inline policies, ultimately making it easier to identify permissions assigned to a team.

3. Removing IAM users for engineers

Once we’d assigned SSO permission sets to teams through Okta groups and confirmed permission parity between the old IAM users and the new SSO roles, we deleted the IAM users. We did this carefully, with multiple rounds of communication with Engineering to ensure their access wasn’t broken once the IAM users were deleted. We started by announcing the deprecation of IAM users and gave teams a deadline to test the new SSO roles. This way, they could tell us about any missing permissions so we could add them to the roles.

After this buffer period, during which IAM users and SSO roles existed in parallel, and once we were confident that teams’ SSO roles had the majority of the permissions they needed on a daily basis, we deleted all IAM users used by humans. This process of communication and iterative change ensured that our engineers were aware of the change and had time to adopt the new access workflows.
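
To make the idea of a permission parity check concrete, here is a minimal sketch that compares the actions an old IAM user relied on with the action patterns granted to a new SSO role. It is deliberately simplified: it looks only at action names with `*` and `?` wildcards and ignores resources, conditions, and explicit denies. The function names and sample permissions are hypothetical.

```python
from fnmatch import fnmatchcase


def is_covered(action: str, granted_patterns: set[str]) -> bool:
    """True if any granted pattern (wildcards allowed) matches the action."""
    # IAM action names are case-insensitive, so compare in lower case.
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in granted_patterns)


def missing_permissions(old_user_actions: set[str], sso_role_patterns: set[str]) -> list[str]:
    """Actions the old IAM user relied on that the new SSO role does not grant."""
    return sorted(a for a in old_user_actions if not is_covered(a, sso_role_patterns))


# Hypothetical permissions for one engineer and their team's new SSO role.
old_iam_user = {
    "s3:GetObject",
    "s3:ListBucket",
    "ec2:DescribeInstances",
    "logs:GetLogEvents",
}
new_sso_role = {"s3:*", "ec2:Describe*"}

gaps = missing_permissions(old_iam_user, new_sso_role)
print(gaps)
```

In this sample, the only gap reported is `logs:GetLogEvents`, which is the kind of missing permission teams could flag during the testing window.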

4. Setting up EC2 SSH and database access for engineers

Our engineers needed to access EC2 instances and databases in a secure, scalable, and auditable way. We needed a vendor for this and found Teleport to be a viable solution. With Teleport, users create an access request for an EC2 instance or database, which a manager then reviews for approval. Without going into the setup process at length, the result was a secure way for engineers to request access to individual hosts or databases.

5. Creating Kubernetes access for engineers

In late 2024, our engineering team decided to move all services to Kubernetes for better cost optimization. As mentioned above, we were using Teleport for EC2 SSH access. But with the adoption of K8s, our EC2 footprint would be significantly reduced, and the SSH access workflow we built would become moot.

We needed to replicate the EC2 access request workflow for K8s clusters. Fortunately, Teleport supports K8s cluster access, and this was a major reason we opted to go with them following a proof-of-concept. Once we’d onboarded all our K8s clusters to Teleport, we set up access so that teams could request access to certain namespaces in a cluster. If the request is approved by a manager, the engineer gets access to only the requested namespaces for a set amount of time.
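
The request-and-approve flow described above can be modeled in a few lines. The sketch below is not Teleport’s API or configuration. It is a hypothetical, self-contained model of the behavior we wanted: access exists only after a manager approves, only for the requested namespaces, and only until the time window closes. All names and durations are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AccessRequest:
    engineer: str
    cluster: str
    namespaces: frozenset
    duration: timedelta
    approved_by: Optional[str] = None
    approved_at: Optional[datetime] = None

    def approve(self, manager: str, now: datetime) -> None:
        """Record the manager's approval; the access window starts now."""
        self.approved_by = manager
        self.approved_at = now

    def allows(self, namespace: str, now: datetime) -> bool:
        """Access requires approval, a requested namespace, and an open window."""
        if self.approved_at is None:
            return False
        if namespace not in self.namespaces:
            return False
        return self.approved_at <= now < self.approved_at + self.duration


start = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)
request = AccessRequest("alice", "prod-cluster", frozenset({"payments"}), timedelta(hours=4))

before_approval = request.allows("payments", start)
request.approve("manager-bob", start)
within_window = request.allows("payments", start + timedelta(hours=1))
other_namespace = request.allows("billing", start + timedelta(hours=1))
after_expiry = request.allows("payments", start + timedelta(hours=5))

print(before_approval, within_window, other_namespace, after_expiry)
```

Only the second check succeeds: the first happens before approval, the third targets a namespace that was never requested, and the fourth falls outside the four-hour window.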

6. Managing Terraform via Atlantis

We onboarded all missing AWS accounts to our Atlantis pipeline by creating the appropriate IAM roles and configurations, ensuring that all Terraform changes happened only via this pipeline. This ensured that all infrastructure changes got a peer review, as we mandated a GitHub pull request review before changes could be merged.

7. Using breakglass role approvals via Okta OIG

Since all engineers had default access to the overly permissive rescue roles in AWS accounts, they tended to use them often, and in ways that deviated from the roles’ purpose (e.g., debugging daily issues). To address this, we brought in Okta Identity Governance (OIG) and introduced an approval workflow for when users wanted access to a breakglass admin role in an account.

The request requires manager approval, after which the user is assigned the role for a fixed period of time. Once this time elapses, access to the account role is revoked and the user has to raise another request to use the role again. We introduced this workflow in a similar fashion to the IAM user deletion: we worked closely with the engineering team, communicating often to test the new workflow and get feedback. This way, by the time we removed the legacy default assignment of the breakglass role, engineers were already used to the new workflow.

8. Setting up SIEM x CloudTrail x Orca detections

Using our AWS CloudTrail logs and Orca alerts, the Detection & Response team set up detections for our cloud infrastructure in our security information and event management (SIEM) tool. The tool enriches data from multiple sources to correlate activity with users, which increases the quality of the detections that keep our cloud secure from unauthorized access.

9. Using Okta trust-based login for AWS

At Sendbird, we’ve set up Okta device trust to help us gate which devices can access which Okta applications. Since we made AWS login Okta-based, it’s listed as an Okta app. Through Okta device trust, we granted access to the AWS app to only company-managed devices. This effectively ensures that only Sendbird-owned, up-to-date, and secured systems can access AWS.

10. Adding service control policies (SCPs)

We didn’t want anyone to be able to make changes at the AWS account level, so we adopted service control policies to restrict critical actions and resources at that level. SCPs let us fully close off critical actions to users, even if they have access to breakglass roles during application-level incidents.
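
For readers unfamiliar with SCPs, the sketch below builds an example policy document and checks actions against it. The denied actions are illustrative choices, not our actual policy, and the `is_denied` helper is a hypothetical simplification that matches action names only, ignoring resources and conditions.

```python
import json
from fnmatch import fnmatchcase

# Illustrative SCP: deny a few account-level actions for every principal.
example_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ProtectAccountLevelSettings",
            "Effect": "Deny",
            "Action": [
                "organizations:LeaveOrganization",
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
            ],
            "Resource": "*",
        }
    ],
}


def is_denied(action: str, policy: dict) -> bool:
    """True if any Deny statement in the policy matches the action name."""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Deny":
            continue
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any(fnmatchcase(action.lower(), pattern.lower()) for pattern in actions):
            return True
    return False


print(json.dumps(example_scp, indent=2))
print(is_denied("cloudtrail:StopLogging", example_scp))
print(is_denied("ec2:DescribeInstances", example_scp))
```

Because an explicit deny in an SCP overrides any allow granted by an identity-based policy in a member account, even a role with admin privileges cannot perform the listed actions.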

11. Cleaning up AWS GovCloud

All the issues mentioned in the first section were relevant to our gov accounts, so we replicated all the above controls into our AWS GovCloud accounts as well.

AWS security at Sendbird: What’s next?

After all the improvements made to Sendbird’s AWS infrastructure through the years, our security posture is quite robust and comprehensive. However, given the evolving nature of the security landscape, there’s always a new challenge to tackle. Here are a couple of things the Security team at Sendbird is proactively working on:

Cleaning up bot IAM users

We have a few IAM users whose secret keys are used in application services, and we want to remove them and replace them with IAM roles. We’ve initiated this effort to early success, but because it requires application-level changes, we’re moving cautiously. After all, we don’t want to disrupt our customers’ experience with our products as we implement stronger security measures.
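
As a sketch of what replacing static keys with roles can look like in application code, the function below asks AWS STS for short-lived credentials on a role and tags the session with the calling service’s name. The `sts_client` argument is expected to behave like `boto3.client("sts")`. A stand-in client is used here so the example runs without AWS access, and the role ARN and service name are hypothetical.

```python
def assume_service_role(sts_client, role_arn: str, service_name: str) -> dict:
    """Exchange the caller's identity for short-lived credentials on a role."""
    return sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=service_name,  # shows up in the assumed-role ARN
        DurationSeconds=900,           # 15 minutes, the shortest lifetime STS accepts
    )


class FakeStsClient:
    """Stand-in that mimics the shape of the STS AssumeRole response."""

    def assume_role(self, RoleArn, RoleSessionName, DurationSeconds):
        account = RoleArn.split(":")[4]
        role_name = RoleArn.split("/")[-1]
        return {
            "Credentials": {
                "AccessKeyId": "ASIAEXAMPLE",
                "SecretAccessKey": "example-secret",
                "SessionToken": "example-token",
                "Expiration": "2025-01-01T00:15:00Z",
            },
            "AssumedRoleUser": {
                "AssumedRoleId": f"AROAEXAMPLE:{RoleSessionName}",
                "Arn": f"arn:aws:sts::{account}:assumed-role/{role_name}/{RoleSessionName}",
            },
        }


response = assume_service_role(
    FakeStsClient(),
    role_arn="arn:aws:iam::222222222222:role/cross-account-reader",
    service_name="notifications-service",
)
print(response["AssumedRoleUser"]["Arn"])
```

Because each service passes its own session name, activity in CloudTrail can be attributed to a specific service rather than to one shared set of keys, and the credentials expire on their own.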

Shift left for container, Kubernetes, and IaC

With a recent internal push for Kubernetes and container adoption, we plan on ramping up our security program around containers, Kubernetes, and infrastructure as code (IaC). We plan to establish scanning and reporting capabilities that surface security issues to engineers in their GitHub CI/CD pipelines. This way, issues can be addressed well before the applications are deployed to our infrastructure.

To learn more about security at Sendbird, you can check out these related resources: