This blog post proposes to answer some of the questions asked during my Webinar “From RDBMS to Cassandra without a Hitch”.
The recording of this webinar should be online soon. The slides can be downloaded from here
Now, let’s go for the Q&A:
Q&A
Q from Paul Moore: Does the business require reporting on the contacts and how was that problem solved after moving to Cassandra?
A: Not really. Right now the business is focussing on the number of users, which is a key metrics to measure the application popularity and service growth. That’s the main reason why I did not give any precise figure about the current Libon users count. This is a sensitive information that should not be disclosed
Q from Abbas Hosseini: Do you recommend keeping both Cas… and MySQL?
A: It depends on your needs. The Lightweight Transaction feature guarantees linearizable writes/updates to one partition (one physical rows) only. If your business requires transactions that spans over multiple partitions (updating user information depending on value from another table for example) and you cannot found a trade-off with Cassandra data modeling, then you should probably keep your MySQL for those use cases
Q from Avinash Reddy: Can any body me tell to how to load the data using SSTable loader ?
A: It’s slightly off-topic but I think you can find good documentation here. There is also a blog post about the new implementation of SSTable Loader in Cassandra 2.1 here.
Q from Jega A: You said you have 7 Cassandra tables. How many RDBMS tables does this correspond to?
A: Approximately a dozen originally
Q from Eugene Valchkou: Why did you code Achilles instead of using Datastax Mapper ?
A: Good question. First because Achilles exists for more than 2 years, before the Datastax Object Mapper is released. Second because Achilles proposes more advanced features, like Dirty Checking, Insert Strategies etc. the official object mapper does not provide. The objective of the Datastax object mapper is to give people a simple skeleton code upon which you can build your own advanced features
Q from Tad Kershner: Why not pre-load Cassandra with the Oracle data prior to launch?
A: If by “pre-loading” your mean just importing raw data using SSTable Loader or something similar, we already thought about it some times. The problem is that the insertion path requires a lot of functional check and business processing so that just dumping data from Oracle to Cassandra is not enough. Not even considering the fact that we also need to manage live production data coming in continuously. The double-run strategy has been designed to cope with such requirement
Q from Thomas Sebastian: How do you handle scenarios where a transaction rollback is required? If not handled, what is the alternative?
A: There is nothing such as transaction rollback with Cassandra because Cassandra does not support at all transactions. We took it into account when designing the data model. It means that all the contacts data model requires no transaction. To guarantee eventual atomicity when updating multiple de-normalised tables, we use the CQL logged batch feature. And if a write fails for some reason, the retry strategy from the Java driver will do the job by sending the write to another replica. Since writes are idempotent in Cassandra, we eliminate a bunch of data integrity issues.
Q from Jega A: You said in your presentation “update paths are very error prone“. What do you mean by this?
A: Since we de-normalised a lot, every update of a mutable field on a contact results in updating 7 tables, more or less. This can be error-prone because the updates require reading data first (read-before-write), identifying the right partition key for each de-normalised table and writing the update. Without a good code coverage and without strong TDD discipline, it would be extremely error-prone.
Q from I Am: How did you manage hotspots when RDBMS keys are not UUIDs?
A: While migrating data from RDBMS to Cassandra, we regenerate all contacts partition keys using UUIDs. Thanks to Cassandra Murmur3Partitioner, the random distribution is guaranteed to be uniform.
Q from Kenneth Rawlings: When using materialized views, what strategy do you use for batching CAS and non-CAS inserts?
A: We simply do not use CAS inserts or updates for the contacts. Apart from during the data migration, there should be no concurrent update on a contact from some user address book. Indeed right now the Libon application does not support multi-device feature so there is indeed no risk of concurrent update so no need for CAS. It may change in the future though.
Q from Mourad Hamoud: How to paginate results (like MySQL with limit/offset keywords)?
A: We use the server-side paging feature. “SELECT * FROM ….” and then setting an appropriate fetchSize to this statement. On the returned ResulSet we ask for an iterator. Quite simple indeed. In fact, in the application all this paging mechanism with the Java Driver is abstracted by Achilles via its Slice Query API
Q from Tony Kyle: Where can we get performance benchmarks for Cassandra when transforming large data sets?
A: You need to measure and benchmark yourself. There is nothing such as an universal benchmark because the results depend on many factors: the hardware, the data model, the Cassandra tuning, the access pattern … I would advise using Gatling and the CQL plugin
Q from Asparuh Polyovski: I would like to ask what utilities do you use for migration from Oracle to Cassandra?
A: No dedicated tool, we wrote our own migration batch in plain Java
Q from Tony Kyle: Would love to see demo on mapper utility next…
A: I coded a scalable chat application using Achilles called KillrChat, you can have a look at it and see how the Achilles works
Q from Rohan D’Costa: Also how do i create a data model what rules do i follow ?
A: There are a bunch of resources on Planet Cassandra that help you with Data Modeling. For example the data modeling serie presented by Patrick McFadin (first, second, and third slides). There is also a free virtual training to get started with Apache Cassandra.
To finish, if you need to remember one thing, just remember this: “Model your table with a partition key on the entity that is likely to scale in your application. If your application will scale in term of users count, put the user_id as partition key, etc.”
Q from Ratnam Tatavarty: Have you encountered scenarios that require counters on contact ids (of a specific type or something similar)?
A: No, we did not use Cassandra counters for the contacts migration, there is no business requirement for counting contacts
Q from Scott Abernethy: Can you describe some of those strategies that allow you to avoid transaction?
A: Same answer as for Thomas Sebastian, we design the data model so that it is idempotent with regard to writes/updates. An update on a single contact in one user address book is an isolated event. We do not need to synchronize with any external state.
For the contact matching feature, where Libon scans an user address book to check whether new subscribers to Libon are present, we did not need ACID transaction either. The contact matching is perform on contact update so if we miss a contact due to concurrent operation this round, the match will be performed the next time Libon scans the address book.
Q from Mitch Gitman: When you do updates, do you literally update the existing rows or do you persist the updates as new rows? The former appears to be fighting against the way Cassandra wants to be used
A: For updates, we use Achilles Direct Update API to just update the appropriate columns, we do not rewrite the whole partition.
Q from Asparuh Polyovski: Also could you tell us how long it will take (approx.) to migrate 1 000 000 000 rows from 1 table
A: It’s not possible to give a figure, even draft figure, since performance results depend on many factors as mentioned earlier
Q from Jega A: How many Cassandra nodes do you have in production? Also, do you use OpsCenter for managing/monitoring your nodes?
A: Right now 10 nodes. Yes we use OpsCenter (the Datastax Enterprise version) to manage the cluster and perform administration tasks like repair, backup …
Q from Yogesh Habib: While inserting records, there was usage of long+timestap(-1 week), it was a calculation. How would it impact if composite key like is used Key(long, timestamp of -1 week)
A: -1 week is a random number, as I said during the webinar, it could be -1 day or -1 hour. The main idea is just to use some timestamp in the past. The timestamp should not be too close to current timestamp to avoid issues in case of time drift between servers, even though we set up an NTP deamon on each. The timestamp is set on each Cassandra column and interfere with Cassandra conflict resolution mechanism, it is not at all related to partition key or clustering column of data or timeuuid type. Please refer to the slides 55 to 62 of this presentation for more details
Q from Shashi Kangayam: How does Achilles compare to Astyanax ?
A: Achilles is using the Datastax Java Driver underneath, so purely CQL. Astyanax is still pulling Thrift even though I heard that they migrate progressively to CQL. Also, Astyanax has been designed first to meet Netflix requirements whereas Achilles was designed for all Java developer in the Cassandra community. Many Achilles features have been created while I was working on Libon project and facing real world project requirements
Q from Sameera Withanage: A step beyond migrating from Oracle. Once in Cassandra, could you briefly discuss options available to support continuous refactoring affecting data model keeping zero downtime in mind?
A: During the lifetime of a project, due to changing requirement, your data model is likely to evolve to meet new requirements. There are 2 kinds of schema change:
Non-structuring changes
: those changes do not impact your primary key. Most of the time it involves adding a new attribute/column to your table. In this case, Cassandra offers a very straightforward operation: ALTER TABLE. For example ALTER TABLE users ADD hobby text.
Please note that altering a table in Cassandra has no impact on existing data on disk. Unlike a relational database, Cassandra won’t need to scan all the data files to perform the change. Altering a table structure only affects the meta data table
Structuring changes
: those changes impact the primary key of your table. Most of the time, the only practical migration strategy is the double-run, e.g. writing to the old and new table (with new structure), copying data from old to new table on live production, and then stop writing to old table and read only from new table
Q from Thomas Sebastian: Have you faced a scenario where changes to RDBMS table was being captured with a change data capture product and you had to implement the same in cassandra ?
A: No, in the Oracle database, we do not use any CDC (Change Data Capture) or triggers because of the code complexity and those solutions are not easily testable. Please note that Cassandra does provides triggers feature but it is still at alpha state and not really recommended because you’re hacking with the write path and potentially kill the performance.
Q from Biswarup Deb: What approach would you use to check all the records were moved from RDBMS or flat files?
A: On the contacts table in Oracle, we added a new column “contact_uuid“. This column is null by default and every time we write the same contact in Oracle and Cassandra, we also fill in the value for this column. So that after the migration, we can scan all the table to look for any contact having contact_uuid nul, e.g. contacts that have been missed by the migration batch.
Q from Ratnam Tatavarty: Did you configure Oracle DB with any storage RAID/SAN/NAS (basically storage magic) for performance and did they come in way of provisioning Cassandra storage?
A: For Oracle, we use a RACK system. For Cassandra, it is a best practice NOT to use shared storage so we provision each Cassandra server with their own local storage (7200 RPM/SSDs)
Q from Scott Abernethy: How does you client side UID generator enforce uniqueness if multiple instances of the application is running?
A: There are multiple versions of UUID, according to the specs. Version 1 is called TimeUUID and provides a way to generate an unique UUID based on the MAC address of the current server, the timestamp of course, but also a clock sequence. By combining all those 3 factors, we can guarantee UUID uniqueness even if we try to generate twice using the same machine (same MAC address) at the same milli-second (clock sequence guarantee fine-grained randomness)
Q from Thomas Sebastian: What is meant by atomic at partition level? Will it lock the updates happening in multiple columns in a row.?
A: According to the official documentation, writes are atomic at the partition level. While inserting data, those stay in memory inside Memtable before being flushed to disk later. Atomicity can be achieved using the appropriate data structures in memory (check the class AtomicBTreeColumns for example in Cassandra source code). Please note that there is no cluster-wide locking because a partition is managed only by one node (+its replicas). On one Cassandra node, partition-level mutations are guaranteed to be atomic. Between different replicas, to guarantee a consistent view of your update, you should choose on the appropriate consistency level
Q from I Am: the primary keys in RDBMS are typically serially increasing numbers and so does cassandra automatically take care of inserting records in all nodes uniformly without hotspots due to such non-UUID partition keys ?
A: What ever is your partition key, it will be hashed using the Murmur3 hash so that the complete partition will be distributed uniformly across all nodes in the cluster.
Now, to be precise, the uniform distribution is only guaranteed for a high number of occurrence. For example if your cluster has 10 nodes, the “users” table has “user_id” as partition key but your insert only 10-100 users, there is no strong guarantee that each node will have 1/10 of the user’s data. It’s all about statistics.
> The easiest example is rolling a dice. If you roll a dice 10 times, there is no guarantee that the distribution of each number is uniform. If you roll it 1000000 times, it would likely be uniform.
Thus, the data model is key. Having a poorly designed data model won’t definitely not help with scaling out, and in this case even the most distributive hash function can’t help.
Q from Shailesh Bhaskaran: Do u have resource like the MySQL to Cassandra for IBM Z/OS (mainframe) DB2 to cassandra ?
A: Yes, there are some interesting links on Planet Cassandra:
Q from Mourad Hamoud: Do you plan to migrate the user accounts to Cassandra?
A: In short term, no. But if we want to rationalise the infrastructure, it can make sense, especially with Lightweight Transaction feature that cover some basic strongly consistent requirements
Q from Scott Abernethy: If a delete fails on one or more nodes and succeeds on others, is there a chance the delete will be lost?
A: This is a classical question with regard to tombstones management. Indeed every delete operation results in the creation of a tombstone column internally, which acts as a deletion marker. Now if the delete succeeds on some replicas but not on others, the repair mechanisms (consistent read, read repair, manual repair) will ensure that the data on all replicas converge eventually.
Now, Cassandra can not keep the tombstones for ever so after a while, they will be removed when compaction is triggered. If the tombstone is removed BEFORE the repair has a chance to kick-in, then you may see deleted columns come back to life. Thus this is a good practice to set the parameter gc_grace_second greater than the scheduled repair period.
Q from Asparuh Polyovski: Some countries regulations imposes liability of the RDBMS DBA administrator to delete personal data for person who is no more customer. How we can realize this (update/delete) in Cassandra?
A: Good question. For most de-normalised tables, the partition key is the user_id so in this case deleting the user’s contacts results just in deleting the whole partition. We also keep track all contacts for a given user in a dedicated table to be able to list all user’s contacts. The contacts cardinality is bounded (on average ≈300 per user, 2000 per user for the biggest address book). We can also use this table to delete data in some tables where the partition key is a composite of user_id and contact_uuid
Q from Shailesh Bhaskaran: Can SAN be used with Cassandra or does it have to be always local SSD’s ?
A: SAN (Storage Area Network) is clearly a bad practice with Cassandra. It is strongly recommended to use local disks. RPM 7200 at least and if you can afford, SSD.
Q from Vipin Kalra: So I just heard about that we shouldn’t be using network storage and instead use local storage. But by doing this are we not susceptible to potentially SPOF ?
A: Local disk can be a SPOF if most of the traffic is re-routed to a single Cassandra node. But in this case, it means that your data model is wrong because the data is not enough distributed. Using a badly chosen partition key can be the root cause.
And that’s it. Should you have any other question, do not hesitate to ping me at duy_hai(dot)doan(at)datastax(dot)com