NoSQL: Inception of a new SQL?

Whether you call it Not-only-SQL or Non-SQL doesn't matter. What matters is that the industry giants are using it (because NoSQL offers horizontal scalability), and if you want to get into these companies, you've got to know NoSQL (say Yes-to-NoSQL). NoSQL can be scaled to match the needs of modern applications. It differs from traditional SQL in the data models it supports and in how the data is stored. We can store graphs (e.g. Neo4j), key-value pairs (e.g. Oracle NoSQL), wide columns (e.g. Cassandra, where rows in the same table need not share the same set of columns, unlike the fixed-schema rows of traditional SQL), and documents (e.g. MongoDB). In short, we can store anything from structured to semi-structured or unstructured data, and we do not have to organize it in a tabular format.
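To make the four data models a little more concrete, here is a minimal sketch that expresses one made-up user record in each shape using plain Python structures. No real database is involved; the record, the key names, and the `followers_of` helper are all hypothetical and only meant to show how the shapes differ.

```python
import json

# Key-value: an opaque value looked up by a single key (here a JSON string).
key_value_store = {"user:42": '{"name": "Asha", "city": "Pune"}'}

# Document: a nested, self-describing record; fields may vary per document.
document = {
    "_id": 42,
    "name": "Asha",
    "city": "Pune",
    "skills": ["python", "sql"],
}

# Wide column: row key -> column family -> columns.
# Different rows may carry different columns.
wide_column = {
    "user:42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2020-01-01"},
    }
}

# Graph: nodes plus labelled edges between them.
nodes = {42: {"name": "Asha"}, 7: {"name": "Ravi"}}
edges = [(42, "FOLLOWS", 7)]


def followers_of(node_id):
    """Return the names of nodes with a FOLLOWS edge pointing at node_id."""
    return [
        nodes[src]["name"]
        for src, label, dst in edges
        if label == "FOLLOWS" and dst == node_id
    ]


# The key-value store only hands back the raw value; the caller parses it.
city = json.loads(key_value_store["user:42"])["city"]
```

Notice that none of the four shapes forces the data into rows and columns with a fixed schema, which is the point the paragraph above is making.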

Industry giants like LinkedIn, Facebook, and Google run data servers across the globe. The most important common characteristic of the data on these servers is that it is distributed (spread across servers in different parts of the world). It is difficult for a traditional SQL database to capture and process data across such distributed servers, because ad-hoc joins across the network make data processing slow. This is where the need for a distributed NoSQL database kicks in. Now you know why the term NoSQL is so widely used these days.

Every coin has a flip side, and NoSQL has a downside as well. The horizontal scalability (the ability to add new servers easily) that NoSQL offers is gained by compromising on some, but not all, of the ACID properties: Atomicity, Consistency, Isolation, and Durability. The compromise depends on factors such as the type of application and the type of data stored.

A distributed system can have three major characteristics: Consistency (C), Availability (A), and Partition tolerance (P). Ideally a system would have all three, but the CAP theorem says that in practice we can guarantee only two of the three. In a distributed system, P is mandatory (the data is spread across a network, and the network can fail and split the nodes into groups that cannot talk to each other). So, when a partition happens, we have to choose between C and A. Consistency means that every read receives the most recent write (or an error), and Availability means that every request receives a non-error response, though not necessarily the latest data. Many NoSQL databases use the AP model, where they compromise on consistency. They rely on a mechanism called eventual consistency, where data changes are propagated to all the nodes eventually but not immediately. In the meantime, a read may return previously stored (stale) data rather than the latest data.
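The stale-read behaviour of eventual consistency can be sketched with a toy in-memory simulation. This is only an illustration under simplifying assumptions: the `Replica` and `EventuallyConsistentStore` classes, the replica names "eu" and "us", and the explicit `propagate()` step are all hypothetical, and real databases replicate in the background with far more machinery (conflict resolution, retries, and so on).

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_names):
        self.replicas = {name: Replica(name) for name in replica_names}
        self.pending = []  # queued updates: (target replica, key, value)

    def write(self, replica_name, key, value):
        # Apply the write locally, then queue it for every other replica.
        self.replicas[replica_name].data[key] = value
        for name in self.replicas:
            if name != replica_name:
                self.pending.append((name, key, value))

    def read(self, replica_name, key):
        # Always answers (availability), possibly with an old value.
        return self.replicas[replica_name].data.get(key)

    def propagate(self):
        # Stand-in for background replication: deliver queued updates in order.
        for name, key, value in self.pending:
            self.replicas[name].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["eu", "us"])
store.write("eu", "status", "v1")
store.propagate()                     # both replicas now hold "v1"

store.write("eu", "status", "v2")     # only "eu" has "v2" so far
stale = store.read("us", "status")    # "us" still answers with "v1"
store.propagate()
fresh = store.read("us", "status")    # after propagation "us" returns "v2"
```

The read on "us" never fails, which is the availability side of the trade-off; the price is that between the write and the propagation it returns the older value.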

NoSQL databases differ from each other in the types of data they are designed to store. They are primarily used for applications that need a flexible and scalable architecture. For simple lookups, NoSQL often provides lightweight and fast querying. Different NoSQL databases have different query syntax, depending on the data model they use. NoSQL is preferred in applications that do not rely much on ACID properties.


