Nick Parsons

Posted on May 8, 2020 • Originally published at getstream.io

Encrypted Chat on iOS (Swift)

#ios #swift #chat #encrypted

In this tutorial, we'll build encrypted chat on iOS using Swift. We'll combine Stream Chat and Virgil Security. Both Stream Chat and Virgil make it easy to create a solution with high security with all the features you expect. These two services allow developers to integrate chat that is zero-knowledge. The application embeds Virgil Security's E3Kit with Stream Chat's Swift components.

Note: All source code for this application is available on GitHub.

What is end-to-end encryption?

End-to-end encryption means that two people can have trade messages via the internet without anyone else being able to read them, even if the transmission or storage is compromised. To do this, the app encrypts the message before it leaves a user's device, and only the intended recipient can decrypt the message.

Virgil Security is a vendor that allows us to create end-to-end encryption via public/private key technology. Virgil provides a platform that allows us to create, store, and offer robust end-to-end encryption securely.

During this tutorial, we will create a Stream Chat app that uses Virgil's encryption to prevent anyone except the intended parties from reading messages. No one in your company, nor any cloud provider you use, can read these messages. Even if a malicious person gained access to the database containing them, all they would see is encrypted text, called ciphertext.

Building an Encrypted Chat Application

To build this application, we'll mostly rely on a few libraries from Stream Chat and Virgil (please check out the dependencies in the source to see what versions). Our final product will encrypt text on the device before sending a message. Decryption and verification will both happen in the receiver's device. Stream's Chat API will only see ciphertext, ensuring our user's data is never seen by anyone else, including you.

To accomplish this, the app performs the following process:

A user authenticates with your backend.
The iOS app requests a Stream auth token and API key from the backend. The Swift app creates a Stream Chat Client for that user.
The mobile app requests a Virgil auth token from the backend and registers with Virgil, which generates their private and public key. The app stores the private key locally, and the public key in Virgil Cloud.
Once the user decides who they want to chat with, the app creates and joins a Stream Chat Channel.
The app asks Virgil's API, via E3Kit, for the receiver's public key.
The user types a message and encrypts it with the receiver's public key using E3Kit, then sends it to Stream. After that, Stream Chat relays the message to the receiver. Stream only receives ciphertext, meaning they will never see the original message.
When the message is received, the app decrypts and verifies it is using E3Kit and passes it along to Stream Chat's iOS components for display.

While this looks complicated, Stream and Virgil do most of the work for us. We'll use Stream's out-of-the-box UI components to render the chat UI and Virgil to do all of the cryptography and key management. We simply combine these services.

The code is split between the iOS frontend contained in the ios folder, and the Express (Node.js) backend is found in the backend folder. See the README.md in each folder to see installing and running instructions. If you'd like to follow along with running code, make sure you get both the backend and ios running before continuing.

Let's walk through and look at the critical code needed for each step.

Prerequisites

Basic knowledge of iOS (Swift) and Node.js is expected. This code is intended to run locally on your machine.

You will need an account with Stream and Virgil. Once you've created your accounts, you can place your credentials in backend/.env if you'd like to run the code. You can use backend/.env.example as a reference for the required credentials. Please see the README in the backend directory for more information.

Step 0. Set up the Backend

For our Swift app to securely interact with Stream and Virgil, the backend provides four endpoints:

POST /v1/authenticate: This endpoint generates an auth token that allows the iOS application to communicate with the other endpoints. To keep things simple, this endpoint enables the client to be any user. The frontend tells the backend who it wants to authenticate as. In your application, this should be replaced with real authentication appropriate for your app.
POST /v1/stream-credentials: This returns the data required for the iOS app to establish a session with Stream. In order return this info we need to tell Stream this user exists and ask them to create a valid auth token:

The response payload has this shape:

apiKey is the Stream account identifier for your Stream instance. It is needed to identify what account your frontend is trying to connect with.
token JWT token to authorize the frontend with Stream.
user: This object contains the data that the frontend needs to connect and render the user's view.
- POST /v1/virgil-credentials: This returns the authentication token used to connect the frontend to Virgil. We use the Virgil Crypto SDK to generate a valid auth token for us:

In this case, the frontend only needs the auth token.

GET /v1/users: Endpoint for returning all users, which exists just to get a list of people to chat with. Please refer to the source if you're curious. Please note the backend uses in-memory storage, so if you restart, it will forget all of the users.

Step 1. User Authenticates With Backend

The first step is to authenticate a user and get our Stream and Virgil credentials. To keep thing simple, we have an insecure form that allows you to log in as anyone:

This is a simple form that takes any arbitrary name, effectively allowing us to log in as anyone (please use an appropriate authentication method for your application). First, let's add to Main.storyboard. We add a "Login View Controller" scene that's backed by a custom controller LoginViewController (to be defined). This controller is embedded in a Navigation Controller. Your storyboard should look something like this:

The form is a simple Stack View with a username field and a submit button. Let's look at our custom LoginViewController:

The usernameField is bound to the storyboard's Username Field, and the login method is bound to the Login button. When a user clicks login, we check if there's a username and if so, we log in via Account.shared.login. Account is a shared object that will store our credentials for future backend interactions. Once the user logs in, we initiate the UsersSegue, which boots our next scene. We'll see how this is done in a second, but first, let's see how the Account object logs in.

Here's how we define Account:

First, we set up our shared object that will store our login state in the authToken and userId properties. Note, apiRoot is the IP address where our backend is running. Please follow the instructions in the backend to run it. Our login method uses Alamofire (AF) to make a post request to our backend with the user to log in. Upon success, we store the authToken and userId, and then we call setupStream.

The method setupStream initializes our Stream Chat client. Here's the implementation:

We call to the backend with our credentials from login. We get back a token, which is a Stream Chat token. This token allows our mobile application to communicate directly with Stream without going through our backend. We also get an apiKey, which identifies the Stream account we're using. We need this data to initialize our Stream Client instance and set the user. Last, we initialize Virgil via setupVirgil:

This method requests our Virgil credentials and configures a VirgilClient class we defined, which wraps Virgil's E3Kit library. Once that's done, we call the completion to indicate success and allow the application to move on.

Let's see what the beginning of our VirgilClient implementation looks like:

VirgilClient is a singleton that is set up via configure. We take the identity (which is our username) and token, and then we generate a tokenCallback. The EThree client uses this callback to get a new JWT token. In our case, we've kept it simple by just returning the same token, but in a real application, you'd likely want to replace this with the rest call to the backend.

We use this token callback and identity to initialize eThree. We then use this instance to register the user.

Now we're set up to start chatting!

Step 2: Listing Users

Next, we'll create a view to list users. In the Main.storyboard we add a UITableView and back it by our custom UsersViewController:

And here's the first few lines of UsersViewController:

First, we fetch the users when the view loads. We do this via the Account instance configured during login. This action simply hits the /v1/users endpoint. Refer to the source if you're curious. We store the users in a users property and reload the table view.

Let's see how we configure the table cells:

To render the table correctly we indicate the number of rows via users.count and set the text of the cell to the user at that index. With this list of users, we can set up a click action on a user's row to start a private 1-on-1 chat with them.

Step 3: Starting an Encrypted Chat Channel

First, add a new blank view to Main.storyboard backed by a new custom class EncryptedChatViewController (we'll see its implementation in a minute):

Add a segue between from the user's table row to show the new controller:

With that setup, we can hook into the segue via the UITableViewController's prepare method:

Before transitioning to the new view, we need to set the view controller up. We generate a unique channel id using the user ids. We initialize a ChannelPresenter from the Stream library, set the type to messaging, and restrict the users. We grab the view controller from the segue and back it with the ChannelPresenter. This will tell the controller which channel to use. We also tell it what users are communicating.

Let's see how the controller creates this view:

Luckily, Stream comes with excellent UI components out of the box. We'll simply inherit from ChatViewController to do the hard work of displaying a chat. The presenter we set up tells the Stream UI component how to render. All we need to do now is hook into the message lifecycle to encrypt and decrypt messages on the fly.

Step 4: Sending an Encrypted Message

Now we're ready to send our first encrypted message. Since we're using Stream's built-in UI, all we need to do is hook into the message sending cycle. We'll do this via the messagePreparationCallback on the ChannelPresenter:

Upon loading the view, we set up the callback, which is a hook provided by Stream, to make any modifications we'd like to the message before sending it over the wire. Since we want to encrypt the message fully, we grab it and modify the text with the VirgilClient object.

For this to work, we need to look up the public key of the other user. Since this action requires a call to the Virgil API, it's asynchronous. We don't want to do this during message preparation, so we do it ahead of time via VirgilClient.shared.prepareUser(otherUser!). Let's see how that method is implemented:

This is relatively simple with Virgil's E3Kit. We find the user's Card which stores all of the information we need to encrypt and decrypt messages. We'll use this Card in the encrypt method during message preparation:

Once again, this is made easy by Virgil. We simply pass the text and the correct user card to authEncrypt, and we're done! Our message is now ciphertext ready to go over the wire. Stream's library will take care of the rest.

Step #5: Decrypting a Message

Since we're using the ChatViewController to render the view, we only need to hook into the cell rendering. We need to decrypt the message text and pass it along. We override the messageCell method:

We make a copy of the message and set the message's text to the decrypted value. In this case, we need to know if it's our message or theirs. First, we'll look at how to decrypt ours:

Since the message was encrypted initially on the device, we have everything we need. We simply ask Virgil E3Kit to decrypt it via authDecrypt. Decrypting the other user's messages is a bit more work:

In this case, we use the same method, but we pass the user card that we retrieved earlier, which verifies the message's authenticity. Now we can see our full chat: