Guide

Interaction API buyer’s guide: chat, voice, video

What is an interactions API?

Twenty-first century digital interactions include large- or small-scale group chat, 1:1 messaging, and voice and video calls, all sent using internet protocol (IP). An interactions API is a suite of API products that gives your business a simple way to build fully featured, customized chat or voice and video calls into your brand’s application, and to delegate the management of the server traffic this service generates to a third party.

The interactions API provides a RESTful interface that gives your product and development teams access to all the functionality required to build chat, voice, and video, so they can customize a communication experience using simple HTTP requests and responses. Software development kits (SDKs) package the API to help you implement the client application quickly without having to start from scratch. In the end, it enables your end users to chat and make 1:1 voice and video calls from your application.

The benefits of an interactions API

Forward-thinking businesses of all types benefit from an interactions API: marketplaces, on-demand services, digital health, live-streaming apps, social media, online communities, and gaming.

Depending on how a business deploys chat, voice, and video, it can benefit in many ways:

    Benefit: increases in
  • Gross merchandise value (GMV)
  • Gross transaction value (GTV)
  • Sales conversion
  • User engagement in app
  • User retention in app
  • Resolution of customer inquiries
  • Immersive user experience

    Benefit: decreases in
  • Booking cancellations
  • Abandoned carts
  • Time-to-resolve customer inquiries

Why an Interactions API?

Digital interactions became only more crucial in 2020, when social distancing and quarantine underscored how important it is for people to connect online. Over 5 billion people chat worldwide through a messaging app. Over 90% of users agree that video calls improve connectedness.

Consumers are already accustomed to modern chat, voice, and video experiences and take them for granted as the standard. They want not only to send and receive messages but also to send previewable images, emojis, and GIFs; they want crisp voice, clear video, and no interruptions; they expect different kinds of moderation features; and the list goes on. Without an interactions API, it would be a serious challenge for a business not focused on chat, voice, or video to create a product that could compete with interaction experiences that 80% of adults use every day.

Why a chat API is necessary to deliver high-quality, customizable chat

Take a chat’s read receipts as an example. It’s a must-have feature for chat. This seemingly simple feature requires that a server keep every user’s read status for every conversation forever. And this read status must synchronize to other users’ devices when they’re online. And it must be updated for every message from each member of the conversation. All told, the chat system needs to be tested to handle 10-100x more event traffic for every message sent because of this one feature.
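To get a feel for where a multiplier like this can come from, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model that is not taken from any specific provider: every member of a channel except the sender eventually reads the message, each read produces one status update, and each update is synchronized to every other member. The function names are illustrative only.

```python
def read_receipt_updates(members: int) -> int:
    """Read-status updates triggered by ONE message in a channel.

    Assumed model: every member except the sender eventually reads the
    message, and each read produces one update the server must store.
    """
    if members < 2:
        return 0
    return members - 1


def sync_deliveries(members: int) -> int:
    """Deliveries needed if each read update is synced to all other members."""
    readers = read_receipt_updates(members)
    # Each reader's update goes to everyone in the channel except that reader.
    return readers * (members - 1)


# Print channel size, read updates, and sync deliveries for two channel sizes.
for size in (11, 101):
    print(size, read_receipt_updates(size), sync_deliveries(size))
```

Under this model, one message in an 11-member channel triggers 10 read-status updates and one in a 101-member channel triggers 100, before counting the cost of synchronizing each update to the other members' devices.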

Or consider building a chat system that reliably delivers messages during poor connectivity, when some recipients’ devices are temporarily offline, or during spikes in server traffic, when your active users jump from 500 to 5,000 or more, all sending messages, images, and reactions in multiple channels at once. Combined with the other features layered over a simple message, like the read receipt, delivering a smooth end-user experience quickly becomes untenable.

Why a voice and video API delivers an immersive call experience so quickly

In-app voice and video calling also presents a number of challenges for developers. It requires a specialized networked infrastructure not only to connect calls, but also to process and relay the audio and video streams between thousands of concurrent call participants. Quality poses a challenge too. This interaction must approximate the standard mobile phone calling experience as much as possible: incoming call notifications (even when the app is not open), caller ID, mute, and the ability to select a microphone, speaker, and camera.

Outside of the call experience, the developer faces other challenges like managing a user’s call history, enabling redial, and providing a layer of analytics based on individual and aggregate usage. Delivering this level of functionality requires an enormous attention to detail, extensive testing, and continual operational monitoring, all of which consume valuable human resources from a team for whom in-app voice and video calling is a valuable — but ancillary — feature.

Interactions API emerges as the solution for quality, scale, speed, and flexibility

Beyond meeting high standards for chat, voice, and video, it is challenging to build a server architecture that scales with each new feature and with your user growth, and that keeps up with client version changes and security vulnerabilities. Add to that compliance, performance optimization, maintenance, and synchronizing data across iOS, Android, and JavaScript platforms.

With an interactions API, businesses can now create the highest quality chat, voice, and video experience and scale quickly without having to worry about server maintenance, performance, or security. The flexibility of an API allows development teams to customize the service’s functions and features to meet their needs or even use the interactions API as a platform to build the entire communication journey across a user’s lifecycle. SDKs help with client implementation so flexibility is balanced with a speedy time to market — for example, you can make your first in-app voice call in as little as 15 minutes.

As a result of all these technical challenges, the interactions API emerged as the solution to reliably create the highest quality chat, voice, and video experiences at scale with minimum time to market and maximum flexibility.

Implementing chat, voice and video

The project scope for implementing chat, voice, and video will vary according to your definition of a minimum viable product (MVP). To implement chat or voice and video with all the features necessary for basic functionality, expect a timeline as short as a few hours or a few days, depending on your developers’ experience and bandwidth.

Project scope: implementing chat using an API

For most customers, an MVP equates to implementing chat in their application with necessary features like creating users and channels, displaying channels, and sending and receiving messages. This could take anywhere from a few hours to a few days, depending on developer experience and bandwidth, especially given good documentation and fully functional sample apps for each platform. For other customers, an MVP might include every function required to ship.

Whatever goals you set for an MVP, the beauty of an API is that it can get you to market quickly so you can provide a proof of concept or demonstrate initial business outcomes to justify a bigger investment in your roadmap.

The following gives you a range of timelines for different project scopes.


Project scope: implementing voice and video using an API

For customers using a voice and video API, an MVP tends to mean basic functionality like making a call, receiving a call, ending a call, caller ID and logs, and using contact lists, all across multiple platforms. Sendbird Calls allows you to make your first test call in 15 minutes, and you can develop a high-quality, fully functional MVP for in-app voice and video in half the time it takes to build from scratch, or less. Since the goal of voice and video is simpler than chat, you can integrate it into your app’s UX with few modifications.


Without an API and managed infrastructure, however, the MVP is the smaller effort. The greatest effort will be dedicated to load testing for large numbers of concurrent callers. With a managed infrastructure, you can avoid this effort altogether.

A voice and video API with managed infrastructure not only accelerates MVP development but also scales to your needs right away without additional effort.

Interactions API features

Beyond the MVP, a chat or voice and video API provides other features that can complement your application’s business logic or help you reach specific business outcomes, improve user experience, or develop more insight into your users’ experience.

Chat API features

    Messaging Features

  • Typing indicators
  • Read receipts
  • Invitations
  • Chat history
  • Video and image thumbnails
  • Send and receive structured media
  • Emojis and reactions
  • GIFs
  • URL previews
  • Rich text editing
  • Push notifications

    Moderation Features

  • User-to-user blocking
  • Smart throttling
  • Auto-moderation
  • Image moderation
  • Regex and profanity filtering

    Data and security

  • Advanced analytics
  • Encryption in transit and at rest

    Integration

  • Link API services, such as maps or payments APIs, directly in the message

Voice and video API features

    Call features

  • Make a call
  • Receive a call
  • End a call
  • Caller ID
  • Contact list
  • Display a user’s call log in the application
  • Push notifications for incoming calls
  • Multi-device support
  • High quality and performance across platforms: iOS, Android, React Native, JavaScript
  • Link voice and video calls to a specific Sendbird chat channel

    Data and security

  • Call metadata for deeper insights: topic, time, context
  • Manage and view call logs in a detailed view, including metadata
  • Enable test-only token-less user authentication
  • Encrypt call metadata transfer
  • Implement standard user login credential policies

How does it fit with your technology?

The simplicity of an interactions API allows you to use it flexibly with your code. An interactions API can be boiled down to four different HTTP methods — POST, PUT, DELETE, GET — working in tandem with a server infrastructure. An interactions API also defines the structure of the responses in JSON or XML returned from these HTTP methods. These responses generalize data sent from the servers so they can be as flexible as possible. Using these responses, you can customize how you make use of the interactions API.
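As an illustration of how the four HTTP methods map onto a resource, here is a minimal sketch using an in-memory stand-in for a provider's servers. Everything in it is hypothetical: the `/users` path, the field names, and the response shape are assumptions made for this example and are not taken from any real interactions API.

```python
import json
from typing import Optional

# In-memory stand-in for the provider's servers (hypothetical, for illustration).
_users = {}


def handle(method: str, path: str, body: Optional[dict] = None) -> str:
    """Route one request to the users resource and return a JSON string."""
    body = body or {}
    parts = path.strip("/").split("/")
    if parts[0] != "users":
        return json.dumps({"status": 404, "error": "not found"})
    user_id = parts[1] if len(parts) > 1 else None

    if method == "POST" and user_id is None:       # create a user
        if "user_id" not in body:
            return json.dumps({"status": 400, "error": "user_id required"})
        user = {"user_id": body["user_id"], "nickname": body.get("nickname", "")}
        _users[user["user_id"]] = user
        return json.dumps({"status": 200, "user": user})
    if method == "GET" and user_id in _users:      # read a user
        return json.dumps({"status": 200, "user": _users[user_id]})
    if method == "PUT" and user_id in _users:      # update a user
        _users[user_id].update(body)
        return json.dumps({"status": 200, "user": _users[user_id]})
    if method == "DELETE" and user_id in _users:   # delete a user
        del _users[user_id]
        return json.dumps({"status": 200})
    return json.dumps({"status": 404, "error": "not found"})


created = json.loads(handle("POST", "/users", {"user_id": "u1", "nickname": "Ann"}))
print(created["user"]["nickname"])
```

Because every response is plain JSON, your application decides what to do with it: render it, store it, or pass it on to another service.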

Although each feature uses these simple HTTP methods, the functional implementation of chat, voice, and video is significantly more complex. For example, if you create channel handlers that notify your servers when a user edits a message or if you retry sending messages or making a call during poor connectivity, each has a complex implementation built on the backbone of those four HTTP methods and their responses.
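Retrying a send during poor connectivity is one example of that hidden complexity. The following is a minimal sketch of one common approach, exponential backoff. It is not how any particular SDK implements retries; the function name, parameters, and defaults are assumptions chosen for illustration.

```python
import time
from typing import Callable


def send_with_retry(send: Callable[[], bool], max_attempts: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it succeeds; return the number of attempts used.

    Waits base_delay, then 2x, then 4x, and so on between failed attempts.
    Raises ConnectionError once max_attempts have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if send():
            return attempt
        if attempt < max_attempts:
            # Double the wait after each failure to avoid hammering the server.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

The `sleep` parameter is injectable so the logic can be tested without real waiting. A production implementation would typically also add random jitter and persist unsent messages across app restarts.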

A chat or voice and video SDK manages the complex implementations for you, while the API creates flexible chat functionality based on HTTP methods. An API communicates with the backend servers, telling them to create new users, generate thumbnails for images sent in a message, or handle reconnection when the connection is lost. The SDK, in turn, “wraps” every API call so you don’t have to make the calls yourself, and then returns the response. The chat and voice and video SDKs sit in parallel with all your other SDKs, so you can use many at once.

For example, instead of sending a message and continually polling our server to get the message’s status or handling when a user sends a new message while the other is sending, the SDK will automatically handle each case and notify your app when the event occurs.

When the SDK notifies your app of an event, you simply decide how to display it to your users. SDKs remove the difficulty of chat, voice, and video, allowing you to decide how your users will experience each interaction.
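The event-driven pattern described above can be sketched with a toy dispatcher. This is not any real SDK's interface: the `ChatEvents` class, the `on` and `emit` methods, and the `message_received` event name are all hypothetical and exist only to show the shape of the idea.

```python
from collections import defaultdict
from typing import Callable


class ChatEvents:
    """Toy stand-in for an SDK's event dispatcher (hypothetical API)."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a handler that the app wants called for this event."""
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict) -> int:
        """Called by the SDK internals; returns how many handlers were notified."""
        handlers = self._handlers[event]
        for handler in handlers:
            handler(payload)
        return len(handlers)


events = ChatEvents()
rendered = []  # stands in for the app's message list UI

# The app only decides how to display the event; it never polls.
events.on("message_received",
          lambda m: rendered.append(f"{m['sender']}: {m['text']}"))

# In a real SDK this call would be triggered by data arriving from the server.
events.emit("message_received", {"sender": "ann", "text": "hi"})
print(rendered)
```

The app code stays small because it contains only display logic; connection handling, ordering, and retries live behind the dispatcher.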

Services for implementing

There is a range of services and tools that allow you to implement chat, voice, and video in your product, and they vary according to how much you need to build to achieve the interactions experience you want. These range from app development and data streaming platforms to SaaS (Software as a Service) tools without any customization. The former are designed for general mobile application development. They tend to offer the most basic functionality and require significant feature building and optimization. The latter, on the other hand, offer little to no flexibility.

In the middle, the API occupies the sweet spot for flexibility and fast development because it provides all the features required for interactions in a RESTful API and client-side SDK, so you can quickly launch and customize your interaction experience. By specializing in chat, voice, and video the API service gives you the technology to customize chat, voice, and video calls in your app according to your specifications.

    These are the three main categories of tools on the market for interactions platforms:

  • App development and data streaming platforms
    • Typically a managed service that provides a broad platform for building mobile applications and syncing data in real time.
  • APIs
    • An API and managed infrastructure to customize chat, voice, and video in your app, typically including an SDK to launch quickly.
  • Software as a Service (SaaS)
    • An out-of-the-box solution suited to departments like marketing and sales that require little to no technical or UI customization.


Spotlight on voice and video APIs

APIs have emerged as the preferred service for in-app voice and video integration. It’s helpful to organize the landscape according to two variables: price and level of effort.

Lowest level of effort and medium price

The Sendbird Calls API minimizes the level of effort to implement in-app voice and video. With its investment in a streamlined developer experience and thorough documentation, the Sendbird Calls API allows you to develop your app with the least amount of effort. Its pricing sits squarely in the middle and appeals to businesses that want to integrate high-quality voice and video with minimal effort or modification to their UX.

High level of effort and high price

These services are the long-established telephony companies or communications APIs, like Vonage or Twilio. They are priced at a premium and require a high level of effort, but offer their longer-established brands and high customization.

Medium effort and low price

Agora.io, Sinch, and Plivo sit in this tier, offering a low price and moderate effort to implement. Plivo only offers voice over telephony networks. Sinch offers a number of communication APIs. Agora.io’s original products are voice and video, along with related products like video or audio broadcasting.

How are chat, voice & video priced?

Pricing for Chat

The primary driver for price in chat is the number of monthly active users (MAU). This variable changes the most over time as you grow in-app chat.

    Other factors include

  • Premium features
  • Data storage
  • Different support levels

SaaS vendors typically charge per seat per month, charge more for additional features, and set limits on API calls. This software is typically used for marketing and sales and follows well-worn pricing models for SaaS.

Pricing for voice and video

Companies price voice & video on a per minute basis. The following variations on price also exist:

  • Voice is cheaper than video by nearly half or more
  • Companies also occasionally vary the price of voice calls, depending on whether you’re receiving a call or making one (making a call is the more expensive of the two)
  • The price of video sometimes depends on the quality of video, if options are available
  • HIPAA compliance often increases the price of both voice and video
  • As volume increases, companies offer cheaper rates per minute for both voice and video
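To estimate a monthly bill under per-minute pricing with volume discounts, you can sketch the arithmetic as below. This assumes graduated tiers, where the cheaper rate applies only to the minutes above each threshold; some vendors may instead apply the lower rate to all minutes, so check your contract. The tier limits and rates shown are placeholders invented for this example, not real prices from any vendor.

```python
from typing import Sequence, Tuple


def monthly_call_cost(minutes: float,
                      tiers: Sequence[Tuple[float, float]]) -> float:
    """Cost of `minutes` under graduated per-minute pricing.

    tiers: (upper_limit_in_minutes, rate_per_minute) pairs sorted by limit.
    Each rate applies only to the minutes that fall inside its band.
    Use float("inf") as the last limit to cover all remaining minutes.
    """
    cost, lower = 0.0, 0.0
    for upper, rate in tiers:
        if minutes <= lower:
            break
        cost += (min(minutes, upper) - lower) * rate
        lower = upper
    return cost


# PLACEHOLDER tiers: first 10,000 minutes at one rate, the rest at a lower one.
EXAMPLE_TIERS = [(10_000, 0.004), (float("inf"), 0.003)]
print(monthly_call_cost(15_000, EXAMPLE_TIERS))
```

Running the same function with separate tier tables for voice and video lets you compare scenarios, for example how the bill changes if a share of calls moves from video to voice.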

Executing a successful project rollout

Planning your implementation, benchmarking and analytics

The most fundamental goal is to understand how chat or voice & video integrates into your main product. In other words, how do you link chat or voice & video to your business, product and UX goals? You may have a clear idea of the greatest areas of friction in your user experience or you may only have a hunch. Either way, thinking through your implementation goes a long way. We always recommend complementing your own planning by consulting with your interactions API provider about industry and use-case best practices.

Every customer of a chat or voice & video API wants their users to talk and, to some extent, greater user engagement is the most fundamental result of implementing messaging. At what points in your user lifecycle or user journey will more engagement either counteract churn or drive the user to continue the journey? Here’s a basic but proven example.

An on-demand ride-hailing app sees a lot of cancelled transactions 2 minutes after a user books a transaction. By giving users the opportunity to communicate about the transaction before cancelling, the app accomplishes two things:

  1. Create insight into what users talk about before cancelling—e.g. is the driver unable to find the rider or does the rider need another 2 minutes?
  2. Dramatically decrease booking cancellations by enabling users to communicate about and solve their own issues.

Whatever your goals, set a benchmark and KPI for the issue before implementing chat and voice & video in your app. For example, how many users cancel:

  • 0 – 60 seconds after transaction
  • 60+ seconds after transaction
  • After an SMS
  • For specific reasons like the driver cannot reach the rider or other reasons

Once you implement chat, the first goal is to achieve user engagement. This can be measured by growth in messages sent, MAU, messages per user, or messages per transaction, among others. Then, measure against your benchmarks using different types of analytics for different use cases.
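As a rough sketch of how these engagement figures relate to one another, the function below derives them from a month of message records. It assumes each record is a `(user_id, transaction_id)` pair and counts a user as active if they sent at least one message; your provider may define MAU differently, for example by connections rather than messages sent.

```python
def engagement_metrics(messages: list) -> dict:
    """Summarize one month of messages given (user_id, transaction_id) pairs."""
    users = {user for user, _ in messages}
    transactions = {txn for _, txn in messages}
    total = len(messages)
    return {
        "messages_sent": total,
        # Assumption: "active" means the user sent at least one message.
        "active_users": len(users),
        "messages_per_user": total / len(users) if users else 0.0,
        "messages_per_transaction":
            total / len(transactions) if transactions else 0.0,
    }


month = [("ann", "t1"), ("ann", "t1"), ("bo", "t1"), ("bo", "t2")]
print(engagement_metrics(month))
```

Tracking these values month over month shows whether engagement is growing, which is the precondition for measuring movement against your benchmarks.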

For voice & video, the experience is more personal and immersive, but the goals are similar: achieve high engagement and NPS along your user’s communication journey, leading to increased conversion on your specific business, product, or UX goal. You can measure growth in connected calls, minutes per call, time-to-resolution, and NPS, among other KPIs.

Chat moderation

Every chat will need some form of moderation to protect its users. It’s important to plan ahead: what kind of moderation is appropriate for your user community?

Do your users share a lot of content? You may want to implement community moderators, user-to-user blocks, and other user-driven moderation.

Do your users mainly chat during transactions? You may need automatic profanity filters, regular expression filters, spam filters, or automatic image moderation.
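To make the idea of profanity and regular expression filtering concrete, here is a minimal sketch. It is an illustration only, not how any provider's moderation works: the `build_filter` helper, the choice to mask blocked words with asterisks, and the choice to reject a message outright on a pattern match are all assumptions.

```python
import re
from typing import Callable, Iterable, Tuple


def build_filter(blocked_words: Iterable[str],
                 patterns: Iterable[str] = ()) -> Callable[[str], Tuple[bool, str]]:
    """Return a moderate(text) function that yields (allowed, cleaned_text)."""
    words = [re.escape(w) for w in blocked_words]
    # Match blocked words as whole words, ignoring case.
    word_re = (re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
               if words else None)
    pattern_res = [re.compile(p) for p in patterns]

    def moderate(text: str) -> Tuple[bool, str]:
        # A regex hit rejects the whole message.
        if any(p.search(text) for p in pattern_res):
            return False, ""
        if word_re is None:
            return True, text
        # A blocked word is masked with asterisks of the same length.
        return True, word_re.sub(lambda m: "*" * len(m.group()), text)

    return moderate


# Example policy: mask one word, and reject messages containing a phone number.
moderate = build_filter(["darn"], [r"\b\d{3}-\d{3}-\d{4}\b"])
print(moderate("Well DARN it"))
print(moderate("call me at 555-123-4567"))
```

A word list and a handful of patterns catch only the simplest cases, which is why image moderation and spam detection are usually left to the provider's managed features.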

    Other Technical Considerations
  • The point at which your app connects to a chat API backend is extremely important because it relates to your MAU. Does the chat API provider have a Connection Manager in its SDK? If not, you’ll need to build one and determine optimal connection logic.
  • What type of authentication fits your security style? Typically, access tokens and session tokens are both available for authentication.
  • Decide when to implement basic functions like starting or archiving a channel. Be sure that your chat API provider can advise you on crucial implementation steps like this.
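If you do need to build connection logic yourself, one possible policy is to connect only while a chat screen is visible. The sketch below shows that idea under stated assumptions: the `ConnectionManager` class here is hypothetical and is not any provider's SDK component, and whether this policy actually lowers your MAU depends on how your provider counts active users.

```python
from typing import Callable


class ConnectionManager:
    """Connect lazily: stay connected only while a chat screen is visible."""

    def __init__(self, connect: Callable[[], None],
                 disconnect: Callable[[], None]) -> None:
        self._connect = connect
        self._disconnect = disconnect
        self._visible_screens = 0
        self.connected = False

    def chat_screen_opened(self) -> None:
        self._visible_screens += 1
        if not self.connected:
            self._connect()
            self.connected = True

    def chat_screen_closed(self) -> None:
        # Never go below zero, even if close is reported twice.
        self._visible_screens = max(0, self._visible_screens - 1)
        if self._visible_screens == 0 and self.connected:
            self._disconnect()
            self.connected = False


log = []
manager = ConnectionManager(lambda: log.append("connect"),
                            lambda: log.append("disconnect"))
manager.chat_screen_opened()
manager.chat_screen_opened()   # second screen reuses the open connection
manager.chat_screen_closed()
manager.chat_screen_closed()   # last screen closed, so disconnect
print(log)
```

The trade-off is that users who never open chat never connect, while push notifications have to cover message delivery for anyone who is disconnected.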