If you choose Apache Cassandra as your NoSQL database, keep the following characteristics in mind when you design and implement your data model.
Apache Cassandra is not fully compliant with the ACID properties. Even so, when you design the application and the database model, you should consider each of these properties and how Cassandra handles it.
See the following topics:
Isolation and Consistency
Apache Cassandra does not offer the transaction isolation that relational databases do, so you must design for the fact that concurrent operations are not isolated from one another. It has no locking feature comparable to that of relational databases, so the approach must be different. Consider the following scenario and how to handle it:
Many events arrive at your system simultaneously, and you want to process them in parallel for performance. However, every event reads and writes the same data stored in Cassandra. Each one has to read the record and then write it back, but with no isolation guarantee, parallel workers can process outdated data and leave the record inconsistent. The best way to handle this scenario is to fetch all the data you need, process it in memory, and then write the result in a batch or serially, which avoids the chance of conflict.
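The read, process in memory, then write approach can be sketched as follows. This is only a minimal illustration, not driver code: the Event class and the counter-style data are hypothetical, and a plain Map stands in for the Cassandra table. It also assumes that a single worker handles all events for a given key.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InMemoryAggregation {

    // Hypothetical event: an amount to add to the value stored under `key`.
    static final class Event {
        final String key;
        final long delta;

        Event(String key, long delta) {
            this.key = key;
            this.delta = delta;
        }
    }

    // Fold all events into one net change per key, entirely in memory.
    static Map<String, Long> netChanges(List<Event> events) {
        Map<String, Long> changes = new LinkedHashMap<>();
        for (Event e : events) {
            changes.merge(e.key, e.delta, Long::sum);
        }
        return changes;
    }

    // Read each affected record once, apply its net change, write it once.
    // `store` is a plain Map standing in for a Cassandra table (assumption).
    static void apply(Map<String, Long> store, List<Event> events) {
        for (Map.Entry<String, Long> change : netChanges(events).entrySet()) {
            long current = store.getOrDefault(change.getKey(), 0L); // single read
            store.put(change.getKey(), current + change.getValue()); // single write
        }
    }

    public static void main(String[] args) {
        Map<String, Long> store = new HashMap<>();
        store.put("a", 10L);
        apply(store, List.of(new Event("a", 5), new Event("b", 2), new Event("a", -3)));
        System.out.println(store);
    }
}
```

Because every key is read once and written once, no two writes in the same run are based on the same stale read. With a real cluster, the final loop would issue the reads and writes through the driver instead of the Map.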
Atomicity
One of the strongest features of the relational approach is atomicity: within a transactional context, when an exception occurs, all database updates are undone. This is the famous rule of commit or rollback. Cassandra, however, does not provide transactions tied to an application method, so implementing these scenarios becomes complex. See the following situations and how to handle them:
# 1 – An application method updates multiple tables in Cassandra, and all of the updates must be applied atomically. In this scenario, use the batch feature that the API provides and place the writes to all tables in the same batch.
# 2 – An application method includes other steps that can fail (e.g. a REST call), after which the work already done should be rolled back. In this scenario, create compensation routines: an exception block that undoes the update, or a retry routine for the step that was not completed.
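The compensation routine from situation 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the method name updateThenCall is hypothetical, a plain Map stands in for the Cassandra table, and a Runnable stands in for the REST call.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class CompensatingWrite {

    // Writes `newValue` under `key`, then runs `externalCall`.
    // If the external call throws, the registered compensations run in
    // reverse order and restore the previous state. Returns true on success.
    // `table` is a plain Map standing in for a Cassandra table (assumption).
    static boolean updateThenCall(Map<String, String> table, String key,
                                  String newValue, Runnable externalCall) {
        Deque<Runnable> compensations = new ArrayDeque<>();
        try {
            boolean existed = table.containsKey(key);
            String previous = table.get(key);
            table.put(key, newValue); // stands in for the Cassandra write
            // Register how to undo the write that has just succeeded.
            compensations.push(() -> {
                if (existed) {
                    table.put(key, previous);
                } else {
                    table.remove(key);
                }
            });
            externalCall.run(); // e.g. a REST call that may fail
            return true;
        } catch (RuntimeException e) {
            while (!compensations.isEmpty()) {
                compensations.pop().run(); // undo, most recent step first
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> orders = new HashMap<>();
        orders.put("order-1", "NEW");
        boolean ok = updateThenCall(orders, "order-1", "PAID",
                () -> { throw new IllegalStateException("REST call failed"); });
        System.out.println(ok + " " + orders);
    }
}
```

Note that a compensation is itself a new write, not a true rollback, so other readers may briefly see the intermediate value before it is undone.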
Write Performance
One of the strongest features of Cassandra is its optimization of writes and reads. However, depending on the write method used in the client, you can do better than a serialized approach or even batches. See the following scenario and how to handle it:
You need to write a few million records to Cassandra, and the scenario can be implemented in different ways:
# 1 – Write the records serially, that is, one by one. This approach is not the most performant, because the application must wait for the confirmation of each execution.
# 2 – Write the records in batches, that is, sending a block of records at a time. This approach can increase network latency when the blocks are very large.
There is a third approach that seems quite interesting, especially when no transactional context is required: use the asynchronous, parallel execution that the API provides. In the Java API, the driver returns Futures (based on Guava), and in our tests this approach performed much better than the two above. However, be careful: asynchronous writes can put a very large number of requests in flight at once, and their completion is handled on threads, so your application must be set up to cope with that volume and should limit how many requests are pending.
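The asynchronous approach, with a cap on pending requests, can be sketched as follows. This is a minimal illustration that uses only the JDK: the writeAsync function is a hypothetical stand-in for an asynchronous driver call (as far as we know, Session.executeAsync in the 3.x Java driver returns a Guava-based ResultSetFuture, which would need to be adapted), and maxInFlight is an assumed tuning parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class BoundedAsyncWriter {

    // Submits every record through `writeAsync` without waiting for each
    // confirmation, but never lets more than `maxInFlight` writes be pending.
    // Returns the number of writes that failed.
    static <T> int writeAll(List<T> records,
                            Function<T, CompletableFuture<Void>> writeAsync,
                            int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger failures = new AtomicInteger();
        for (T record : records) {
            permits.acquireUninterruptibly(); // blocks while the cap is reached
            CompletableFuture<Void> write;
            try {
                write = writeAsync.apply(record);
            } catch (RuntimeException e) {
                write = CompletableFuture.failedFuture(e);
            }
            write.whenComplete((ignored, error) -> {
                if (error != null) {
                    failures.incrementAndGet();
                }
                permits.release(); // frees a slot for the next record
            });
        }
        // Taking back every permit means no write is still pending.
        permits.acquireUninterruptibly(maxInFlight);
        return failures.get();
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            records.add(i);
        }
        // Stand-in for an asynchronous driver call (assumption): the task
        // completes on a background thread and does nothing.
        int failed = writeAll(records, r -> CompletableFuture.runAsync(() -> { }), 64);
        System.out.println("failed writes: " + failed);
    }
}
```

The semaphore keeps the benefit of not waiting for each confirmation while preventing millions of requests from piling up at once. A suitable value for maxInFlight depends on the cluster and the client, so it should be found by testing.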
The design and implementation of your application should take a totally different approach from the one used with a relational, transactional database. Keep in mind that all of these characteristics deserve careful consideration in the design, since most of us have always used the relational approach.