Encryption at rest in Egnyte Object Store (EOS)
We are starting a new series of blog posts from Amrit Jassal, co-founder and CTO of Egnyte, that will focus on our own technology innovation and other tech industry trends. We are constantly developing new technology, as well as looking for ways to achieve economies of scale to help serve our growing customer base. Our goal is to ultimately open source some of these projects, so please watch this page and feel free to contact us with your own thoughts and suggestions.
Thoughts on data security.
One of the challenges we face every day in implementing a persistent store is ensuring the security of your data. To address this, we need to protect data against:
- Corruption in transit (malicious or otherwise)
Encryption at the transport layer (e.g., HTTPS) can guard against spoofing, and checksums computed during transit can guard against corruption.
- Random events (cosmic rays?) while on the media (bit rot)
Bit rot can be addressed through a periodic process of checking data integrity at rest.
- Media theft
Traditionally media theft is addressed by encrypting data at rest – the topic of this note.
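The bit-rot check described above boils down to recording a checksum when data is written and periodically recomputing it during a scrub pass. Here is a minimal Java sketch of that idea; the `IntegrityCheck` class and its method names are illustrative, not part of EOS:

```java
import java.security.MessageDigest;

public class IntegrityCheck {
    // Hex-encode a byte array.
    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Compute a SHA-256 checksum for a stored block.
    public static String checksum(byte[] data) {
        try {
            return toHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 ships with every JRE
        }
    }

    // A periodic scrubber recomputes the checksum and compares it
    // against the value recorded when the block was written.
    public static boolean verify(byte[] data, String recorded) {
        return checksum(data).equals(recorded);
    }
}
```

A real scrubber would stream blocks from disk and quarantine anything whose checksum no longer matches.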
To implement encryption at rest, we use the Advanced Encryption Standard (AES), the symmetric-key algorithm of choice. AES performs well on a wide variety of hardware, from 8-bit smart cards to high-performance computers, and it gives us a number of optimization choices. In our implementation, we first had to decide on the key length, 128 and 256 bits being the most common. Reviewing US government policy, we noted that TOP SECRET classification requires 192 bits or more, while 128 bits is sufficient for SECRET. Though in practice 128 bits would suffice (Seagate, for example, uses 128-bit encryption at rest; see (1)), we looked at the market, listened to what our customers were telling us, and decided to support AES-256 in our object store for the added level of security.
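For concreteness, AES-256 is available out of the box through the standard Java Cryptography Extension (JCE). The sketch below is illustrative only: the class name and the CBC/PKCS5 mode are our assumptions for the example, not a description of the EOS implementation, and older JREs required the unlimited-strength policy files before 256-bit keys would work:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.security.SecureRandom;

public class Aes256Sketch {
    // Generate a random 256-bit AES key. On older JREs this key size
    // required installing the JCE unlimited-strength policy files.
    public static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256, new SecureRandom());
            return kg.generateKey();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plaintext);
    }

    public static byte[] decrypt(SecretKey key, byte[] iv, byte[] ciphertext) {
        return run(Cipher.DECRYPT_MODE, key, iv, ciphertext);
    }

    private static byte[] run(int mode, SecretKey key, byte[] iv, byte[] data) {
        try {
            // CBC with PKCS5 padding; each object should get a fresh IV.
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(mode, key, new IvParameterSpec(iv));
            return c.doFinal(data);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```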
In our experience, adding inline encryption adds non-trivial overhead to overall latency (though Amazon claims that for their encryption support, “We have not seen any significant impact on performance when using SSE.” (!) (2)). To address that latency, we considered the following options:
- Implementing the solution in a lower-level language for performance reasons.
- Offloading encryption to dedicated hardware, which would require development on a platform that allows us to easily address custom hardware.
- Leveraging the AES instruction set of modern CPUs, which was the most interesting option since these CPUs are already available to us and require no special hardware installation.
For the first option, we looked at the C language, which would allow our object store to leverage hardware of multiple types; our storage nodes expose persistence primitives over standard HTTP. With this approach, we would implement the feature in an HTTP server such as Apache or Nginx, two stacks we already use, either by writing input/output filters or by modifying a module such as mod_dav (3). For options one and two, we would use a standard provider such as OpenSSL, which has great support for leveraging multiple “engines” (4).
For the third option, we decided that Intel “Westmere” CPUs would be a good fit, as they offer six new instructions that target AES optimizations (5). Four of the six handle encryption and decryption, while the other two handle key expansion. Because these instructions avoid lookup tables, they have the added benefit of protecting against data-dependent timing attacks.
With these strategies in mind, and before embarking on a (for us) new development platform, we wanted to dig deeper into leveraging our current JVM-based solution set. One option is a JNI-based approach, but that carries a performance drawback: since we encrypt inline, we would pay the JNI boundary latency penalty as blocks move between the JVM and JNI layers. Another approach is to find a provider that specifically leverages the AES instruction set, which ties the solution to a specific architecture; since Intel is our vendor of choice, this doesn’t present much of a restriction. Our friends at Intel pointed us to their JDK extension benchmarks (6). Our internal benchmarks actually give us better numbers (by about 40%) than those laid out in the reference, so this is the solution currently deployed in our object store.
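To get a feel for the instruction set's effect from inside the JVM, one can time bulk AES encryption through the standard JCE provider while toggling the HotSpot flags -XX:+UseAES and -XX:+UseAESIntrinsics (enabled by default on AES-NI capable CPUs in later JDKs). This micro-benchmark sketch is illustrative, not Egnyte's actual benchmark:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesThroughput {
    // Encrypt `megabytes` MiB of zeros and return MB/s. Run the JVM once
    // with -XX:+UseAESIntrinsics and once with -XX:-UseAESIntrinsics
    // (HotSpot flags) to see what the hardware instructions buy you.
    public static double encryptMBps(int megabytes) {
        try {
            byte[] key = new byte[32];  // 256-bit all-zero key: benchmark only
            byte[] iv = new byte[16];
            Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new IvParameterSpec(iv));
            byte[] buf = new byte[1 << 20];  // 1 MiB buffer
            long start = System.nanoTime();
            for (int i = 0; i < megabytes; i++) {
                c.update(buf);  // stream the buffer through the cipher
            }
            double secs = (System.nanoTime() - start) / 1e9;
            return megabytes / secs;
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A serious measurement would also warm up the JIT and pin buffer sizes to match real object sizes; this sketch only shows the shape of the comparison.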
We’re pleased with our implementation. What did this approach give us?
- It lets us leverage our existing skill set in JVM technologies.
- It requires no custom hardware deployment in a middle layer (adding such hardware to storage nodes would not be cost-effective).
- It allows greater scale, since scale is tied to the number of storage nodes rather than application nodes.
- It satisfies our requirement of supporting AES-256 as the encryption solution of choice.
Side note: encryption keys within Egnyte are per customer to avoid cross-contamination.

(1) http://www.seagate.com/staticfiles/docs/pdf/whitepaper/tp596_128-bit_versus_256_bit.pdf
(2) http://aws.typepad.com/aws/2011/10/new-amazon-s3-server-side-encryption.html
(3) http://httpd.apache.org/docs/2.0/mod/mod_dav.html
(4) http://www.openssl.org/docs/crypto/engine.html
(5) http://software.intel.com/file/24917
(6) http://software.intel.com/en-us/articles/intel-aes-ni-performance-testing-on-linuxjava-stack/#enable-intel-eas-ni-in-oracle-jvm