At Egnyte we use multiple databases — SQL as well as NoSQL — for different use cases. SQL databases can range from single-user to multi-user, while different NoSQL databases are used for different durability requirements. Internally, one of the more interesting choices we’ve made has been to use Cassandra for our event store.
Multiple features within Egnyte rely on the availability of a robust event store. These features are powered by various services, which have their own ordering and durability semantics, including:
- Synchronization with on-premise devices, etc.
We love the write-friendly characteristics of Cassandra that allow us to commit directly to the cluster inside a larger transactional context.
But as part of a periodic review of our technology choices, we uncovered a serious set of operational issues with our Cassandra clusters. To achieve the desired performance levels we had to perform a series of tuning exercises. Some of these included:
- Switching to RAID0 to handle high IOPS on cluster nodes, a safe move due to our high replication factor (3).
- Adjusting JVM parameters to trigger GC of the tenured generation at a lower occupancy fraction, to avoid paying for large garbage collections.
- Doubling the amount of RAM to create more headroom for Cassandra and OS caches.
- Favoring the row cache over the key caches in our configuration, because of our heavy use of range queries.
- Removing old SSTables, after ensuring that the correct set of data is replicated in the cluster.
Additionally, some of the older Cassandra clients lacked key functionality such as connection pooling, which we bolted on. While these issues have been fixed in newer versions of Cassandra, the lack of backward compatibility means upgrades will be tricky. Also, the lack of transactional semantics between our metadata and event stores complicates our on-premise client implementations.
Net-net, we found that Cassandra is operationally intensive (for us), offers a tricky upgrade path, and potentially complicates our client implementations. All these issues, however, can be addressed by moving away from Cassandra to a more mainstream database because:
- We have significantly more in-house expertise in scaling relational databases, both vertically and horizontally.
- Upgrades to mainstream relational databases are smoother thanks to a larger user base and incremental improvements.
- We can provide transactional semantics to our clients, thus simplifying client implementations.
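To make the last point concrete, here is a minimal sketch of what transactional semantics between a metadata store and an event store can look like once both live in one relational database. It uses Python's built-in `sqlite3` module purely for illustration. The table names (`file_metadata`, `events`), their columns, and the `open_store` and `record_change` helpers are hypothetical and do not reflect our actual schema or code.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a connection and create the hypothetical metadata and event tables."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_metadata ("
        " path TEXT PRIMARY KEY, version INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " path TEXT NOT NULL, version INTEGER NOT NULL, action TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record_change(conn, path, version, action):
    """Write the metadata row and its event in a single transaction.

    Using the connection as a context manager commits if both statements
    succeed and rolls back if either raises, so a reader never observes
    the metadata update without its matching event.
    """
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO file_metadata (path, version) VALUES (?, ?)",
            (path, version),
        )
        conn.execute(
            "INSERT INTO events (path, version, action) VALUES (?, ?, ?)",
            (path, version, action),
        )
```

In this sketch, if the event insert fails (for example, because `action` is missing), the metadata write in the same call is rolled back too, which is the guarantee a synchronizing client would otherwise have to reconstruct on its own.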
Egnyte customers will not see a functional change from this, but they should see smoother synchronization, for one. This whole conversation keeps bringing us back to a question we think about regularly: to SQL or to NoSQL?