Cassandra is a NoSQL database. A NoSQL store accepts many kinds of data and does not require one rigid format up front. Big data developers should learn these distinguishing features.
Data is stored in a column-oriented format. Each column holds its own value, and each group of columns is identified by a row key.
Data is replicated across the nodes of the cluster. The nodes are arranged like a ring, so a Cassandra cluster is also called a ring.
- Running Cassandra on a single node is not useful, because no other node is available to hold a replica.
- In a multi-node environment, the data on each node is replicated to other nodes.
- When the original node is down, a replica answers the request through peer-to-peer communication.
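The failover behaviour in the list above can be sketched in a few lines of Python. This is a simplified in-memory model, not Cassandra's actual read path: the `Node` class, the node names, and the sample row are all hypothetical, and real clusters also involve a coordinator and consistency levels that are left out here.

```python
# Hypothetical in-memory model: each node keeps a copy of the rows it replicates.
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True      # whether the node is currently reachable
        self.rows = {}      # row key -> stored value


def write(replicas, row_key, value):
    """Store the same row on every replica node."""
    for node in replicas:
        node.rows[row_key] = value


def read(replicas, row_key):
    """Return (node name, value) from the first replica that is up."""
    for node in replicas:
        if node.up and row_key in node.rows:
            return node.name, node.rows[row_key]
    raise RuntimeError("no live replica holds this row")


ring = [Node("node-a"), Node("node-b"), Node("node-c")]
write(ring, "patient-17", {"ward": "B2"})

ring[0].up = False  # the original node goes down
served_by, row = read(ring, "patient-17")
print(served_by, row)  # node-b {'ward': 'B2'}
```

With a single node in `ring`, marking that node as down would leave `read` with nothing to fall back on, which is why a single-node setup gains nothing from replication.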
As you can see in the image below, there are 3 column families: Hospital, Kitchen, and Sports. In the same way, Cassandra stores each type of data in its own column family.
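One way to picture this layout is as nested dictionaries: a column family maps row keys to rows, and each row maps column names to values. The sketch below is only an illustration of that shape. The family names come from the text, while the row keys, column names, values, and the `get_column` helper are made up for the example.

```python
# Hypothetical sample data: column family -> row key -> {column name: value}
column_families = {
    "Hospital": {"row-1": {"name": "City Clinic", "beds": 120}},
    "Kitchen": {"row-1": {"item": "kettle", "price": 25}},
    "Sports": {"row-1": {"game": "cricket", "players": 11}},
}


def get_column(family, row_key, column):
    """Look up one column value by column family, row key, and column name."""
    return column_families[family][row_key][column]


print(get_column("Sports", "row-1", "players"))  # 11
```

Rows in different families do not need to share column names, which matches the idea that each type of data lives in its own column family.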
The keyspace is the outermost container in Cassandra; all of your data is stored in a keyspace. A single keyspace per cluster is normally enough. To support more applications, you can also create multiple keyspaces per cluster.
- Suppose the replication factor is 3. That means each row has 3 replicas.
- Replicas are placed according to the strategy you specify when creating the keyspace.
- Each node is assigned a key range. When a user request arrives, the node whose range contains the requested key, or one of its replicas, responds. You define the replica placement strategy while creating the keyspace.
- Each row belongs to one column family. You can create multiple column families per keyspace.
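To make key ranges and the replication factor more concrete, here is a rough Python sketch. It is loosely modelled on the clockwise placement that Cassandra's SimpleStrategy uses, but it is not Cassandra's implementation: the `token` and `replicas_for` functions, the small ring size, the MD5-based hash, and the node names and tokens are all assumptions chosen for illustration.

```python
import hashlib
from bisect import bisect_left


def token(key, ring_size=2**16):
    """Map a row key to a position on the ring (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % ring_size


def replicas_for(key, node_tokens, replication_factor=3):
    """Pick the node whose key range contains the key, then walk clockwise.

    node_tokens maps a node name to its token, which is treated as the
    upper end of that node's key range.
    """
    ordered = sorted(node_tokens.items(), key=lambda item: item[1])
    tokens = [t for _, t in ordered]
    # First node whose token is >= the key's token; wrap around past the end.
    start = bisect_left(tokens, token(key)) % len(ordered)
    count = min(replication_factor, len(ordered))
    return [ordered[(start + i) % len(ordered)][0] for i in range(count)]


# Hypothetical four-node ring with evenly spaced tokens.
node_tokens = {"node-a": 0, "node-b": 16384, "node-c": 32768, "node-d": 49152}
print(replicas_for("user-42", node_tokens))
```

With a replication factor of 3 on four nodes, every key ends up on three consecutive nodes of the ring, so any one of them can answer if the others are unavailable.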