This is a short overview of some of the qualitative considerations you may want to take into account when designing and building multiparty computation protocols on secure enclaves.
Input privacy, or multiparty computation, is a general term for software that combines sensitive inputs from multiple sources to produce an output without revealing any party's sensitive data to the others. As an example, imagine two companies, Bank A and Insurance Company B, who wish to collaborate on a new product for their customers. An obvious first question is: who exactly are our shared customers, in terms of their interests and demographics? Needless to say, neither the bank nor the insurance company would be willing to hand over its customer list to the other, so their two options are to find a trusted third party who can broker the transaction, or to perform some sort of multiparty computation.
Traditionally, the term multiparty computation referred specifically to purely cryptographic protocols. In particular, homomorphic encryption, which has received a lot of attention from cryptographers since Craig Gentry’s 2009 thesis on Fully Homomorphic Encryption, is often used as a primitive in many multiparty computation protocols. Nevertheless, purely cryptographic protocols have their pros and cons. They are ultimately the most secure approach if they are designed and implemented correctly. However, there are currently no normative standards available (only informative ISO documents and guidelines from community groups), and to the best of our knowledge at the time of writing, only one company has managed to obtain FIPS certification, which is a slow and expensive process. We have no doubt the users of this technology will ultimately have to foot the bill for that process, which may be prohibitive unless the technology becomes ubiquitous.
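To give a feel for the homomorphic primitive mentioned above, here is a toy Paillier-style additively homomorphic scheme. This is an illustration only: the primes are tiny and there is no hardening whatsoever, so real systems should use a vetted cryptographic library. The key property shown is that multiplying two ciphertexts yields an encryption of the sum of the plaintexts.

```python
# Toy Paillier-style additively homomorphic encryption.
# Illustration only: tiny demo primes, NOT secure.
import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p=1789, q=1867):  # small demo primes, chosen for illustration
    n = p * q
    lam = lcm(p - 1, q - 1)
    return n, lam

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    # With generator g = n + 1, g^m = 1 + m*n (mod n^2)
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, lam, c):
    n2 = n * n
    x = pow(c, lam, n2)
    # L(x) = (x - 1) / n; with g = n + 1, L(g^lam mod n^2) = lam mod n,
    # so we multiply by the inverse of lam modulo n.
    return ((x - 1) // n) * pow(lam, -1, n) % n

n, lam = keygen()
a, b = 3, 4
c_sum = (encrypt(n, a) * encrypt(n, b)) % (n * n)  # multiply ciphertexts...
assert decrypt(n, lam, c_sum) == a + b             # ...to add plaintexts
```

This additive property is what lets parties combine encrypted inputs without ever seeing each other's plaintexts; fully homomorphic schemes extend this to arbitrary computation.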
Another approach to multiparty computation is to use a trusted execution environment, specifically a secure enclave. There are broadly two types of secure enclave: hardware-based enclaves such as Intel SGX, and software-defined enclaves such as AWS Nitro Enclaves. The former bases its security model on hardware protections against physical tampering, while the latter leverages the same virtualization technology used for multi-tenant cloud environments (the isolation between EC2 virtual machines, for example).
Nevertheless, from the developer's perspective these two technologies offer broadly the same functionality. They run your code in an isolated environment with very limited I/O. Importantly, they attest to the code running inside the enclave via a cryptographic hash of the program. This attestation, in turn, is used to “prove” to a key management service that the enclave is safe to share decryption keys with. This means two or more parties can each encrypt their data separately and send it into an enclave. Only inside the enclave is the data decrypted, where a pre-agreed program performs some computation on it. Equally, results can be encrypted before being sent back out of the enclave.
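The attestation-gated key release described above can be sketched in a few lines. This is a simplified model with hypothetical names: real flows involve signed attestation documents produced by the enclave hardware or hypervisor and verified by the key management service, not a bare hash comparison.

```python
# Sketch of attestation-gated key release (simplified; hypothetical names).
import hashlib

# The parties agree on an enclave program ahead of time and allowlist
# its measurement (a cryptographic hash of the code) with the KMS.
agreed_image = b"print(sum(inputs))"  # stand-in for the enclave image
TRUSTED_MEASUREMENTS = {hashlib.sha256(agreed_image).hexdigest()}

def measure(enclave_image: bytes) -> str:
    # The platform computes a hash of the code loaded into the enclave.
    return hashlib.sha256(enclave_image).hexdigest()

def release_key(measurement: str, key: bytes) -> bytes:
    # The key management service only shares the decryption key with
    # enclaves whose attested measurement matches the allowlist.
    if measurement not in TRUSTED_MEASUREMENTS:
        raise PermissionError("attestation failed: unrecognized code")
    return key

# The agreed program receives the key; tampered code does not.
assert release_key(measure(agreed_image), b"demo-key") == b"demo-key"
try:
    release_key(measure(b"tampered code"), b"demo-key")
except PermissionError:
    pass  # key release refused
```

The crucial design point is that the decision to release keys depends only on what code is running, not on who operates the machine.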
Many of us are familiar with writing code. Whether your language of choice is C, Python, Rust, or PHP (to name a few) the paradigm itself is pretty much the same at a high level. You declare variables, functions, and maybe classes and describe how they interact.
Unsurprisingly, building an enclave application is simply writing code. However, it differs in how that code is perceived from each party's viewpoint. Cryptographers use the term ideal functionality to describe this. Essentially, it boils down to asking how, in an ideal world, the system would work.
We can break this down further and ask ourselves:
- Who are the parties involved in the computation?
- What inputs do each of them provide and what outputs do they receive?
- What order do inputs and outputs get sent to the trusted execution environment?
- What calculation does the trusted execution environment perform?
- What code, parameters, or arguments does the trusted execution environment promise to all parties?
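Answering the questions above amounts to writing down a small specification. Here is a hypothetical sketch for the bank/insurer example from earlier (the names and structure are illustrative, not a real framework):

```python
# Hypothetical ideal-functionality spec for the shared-customer example.
# Parties:  Bank A and Insurer B.
# Inputs:   each party's set of customer IDs, sent encrypted to the enclave.
# Output:   only the size of the intersection, returned to both parties.
# Promise:  the enclave runs exactly this function -- and nothing else --
#           on the decrypted inputs.
def ideal_functionality(bank_customers: set, insurer_customers: set) -> int:
    return len(bank_customers & insurer_customers)

shared = ideal_functionality({"ann", "bo", "cy"}, {"bo", "cy", "di"})
assert shared == 2  # both parties learn a count, never the other's list
```

Notice that even this tiny spec forces the design questions into the open: the output here is a count rather than the intersection itself, which is a deliberate choice about what each party is allowed to learn.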
The first important step in building any enclave application is to make sure you thoroughly understand the above questions and their answers, and ultimately that the ideal functionality is fit for purpose. Real implementations only ever approximate an ideal functionality, and one needs to be extremely pedantic about exactly what each party learns about the others' secret inputs and outputs. This is especially true when sensitive information is used more than once and privacy loss compounds. Theoretical frameworks such as Universal Composability (UC) and Abstract Cryptography (AC) can be leveraged to keep track of any leakage of sensitive information.
Let’s take a very simple example. Assume we have three parties, namely Alice, Bob, and Charlie, with secret inputs A, B, and C respectively. Further, let us assume these are all integer values between 0 and 10, and that a priori the best guess each could make about the others' values is purely random (a uniform distribution over 0–10). Let’s assume our enclave returns A+B+C to Alice, while Bob and Charlie receive no output.
What did Alice learn? Well, she learned only a single number, the sum of all three sensitive inputs. So she doesn’t know exactly what Bob or Charlie’s values are. But that doesn’t mean she learned nothing about their inputs. If Alice’s value was 5 and the total she received was 12, then she knows B+C must equal 7, which changes her guessing probability over Bob and Charlie’s values (neither can be over 7 in this example).
While in this trivial example Alice’s guessing probability over Bob and Charlie’s inputs was simply truncated to a smaller range of possible values, in real-world applications her distribution of uncertainty typically changes in more complex ways. These information leaks can compound in unintuitive ways, and Alice’s best guess at Bob and Charlie’s inputs can quickly become very accurate.
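The update to Alice's beliefs in the example above can be computed by brute force. Given A = 5 and output A+B+C = 12, Alice knows B+C = 7, so she can enumerate every consistent pair:

```python
# Brute-force Alice's posterior over Bob's input:
# A = 5 and output 12 imply B + C = 7, with B, C a priori uniform on 0..10.
from collections import Counter

consistent = [(b, c) for b in range(11) for c in range(11) if b + c == 7]
posterior = Counter(b for b, _ in consistent)

for b in sorted(posterior):
    print(f"P(B={b}) = {posterior[b]}/{len(consistent)}")
```

Before the protocol Alice assigned probability 1/11 to each possible value of B; afterwards the values 8–10 are impossible and each of 0–7 has probability 1/8. A single output has measurably sharpened her knowledge of both other parties' inputs.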
Ultimately all parties in a multiparty computation have to accept the security definition involved and this will very much depend on their circumstances.
In the next post, we will discuss security considerations in implementations on AWS Nitro.