Introduction to NoSQL & Document Data Store

This article provides a high-level overview of NoSQL databases and the data store types associated with them. One section is dedicated to a brief summary of document-oriented NoSQL databases. I provide example data that illustrates how document NoSQL databases store data, and I outline the most significant differences between relational SQL databases and document-oriented NoSQL databases.

 

What Is a NoSQL Database?

The term NoSQL was originally coined by Carlo Strozzi (creator of Strozzi NoSQL) in 1998 and later popularized by Johan Oskarsson (Last.fm) who organized an event to discuss non-relational databases in early 2009.

A typical first reaction to hearing the term ‘NoSQL database’ is that it is a database that does not use any SQL-like queries. That would be inaccurate, so let’s start at the beginning and briefly clarify the history behind the name. The quickest way I have found to explain the “NoSQL” moniker is to spell out the words behind the abbreviation, which, at least for me, immediately resolved the confusion:

Not Only SQL

As we can see, this highlights the fact that NoSQL databases may support SQL-like formats and queries, but they are not restricted to them. NoSQL encompasses a much broader diversity of ways to store data, and it is this range of storage formats that eventually resulted in the various categories of NoSQL databases.

To keep it short, the main reason NoSQL databases emerged in the early years of the twenty-first century and quickly gained popularity was pressure from the growing Big Data and Big Data analytics sector. We could conclude that the creation of NoSQL databases was a response to market demands. Big Data companies needed more efficient ways of working with large volumes of data, as well as the ability to work effectively with unstructured data and data with loose consistency requirements. These tasks were not easily accomplished with traditional relational SQL databases. The data store types commonly used in today’s NoSQL databases can process such workloads much faster than those employed in a traditional RDBMS.

 

NoSQL Databases and Data Store Types

Because NoSQL databases can be built on top of nearly any current or future data format, and because many of the data store types overlap, categorizing NoSQL databases is not an easy task.

That said, we do recognize the most common categories of data store types: Key-Value, Column, Graph, Document, and Multi-Model. However, it is important to note that there are also less prevalent data store types, such as tabular, tuple, and RDF. In total, there are currently over 225 different implementations of NoSQL, some of which describe themselves in more than one way.

The trend seems to be that new NoSQL databases are often created to solve a particular problem. So we see new databases that are schema-free, offer simple replication support and an easy API, are eventually consistent (BASE rather than ACID), horizontally scalable, or distribution-optimized, as well as databases tailored to work with massive amounts of data.

The following table is my attempt at summarizing the most popular NoSQL data store types as well as the more obscure ones. It also lists the NoSQL implementations (databases) that use each type of data store.

| NoSQL Database | Short Description | Implementations |
|---|---|---|
| Key-value store | A collection of data characterized by a key and value pair. | Oracle NoSQL DB, Dbm, Redis |
| Document store | Each key is paired with a document in one of various document data types, such as XML, JSON or BSON (binary JSON). | MongoDB, CouchDB, Clusterpoint, DocumentDB, Elasticsearch, MarkLogic |
| Graph store | Uses graph structures to define data relationships. Allows relating data instances directly to one another. | Neo4j, AllegroGraph, Oracle Spatial and Graph, Teradata Aster, ArangoDB |
| Wide column store | A key-value type of database that uses tables, rows, and columns. In the same table, each row can have different column names and formats. | BigTable, Cassandra, Druid, HBase, Hypertable |
| Object database | An object-oriented type of database that combines database abilities with object-oriented programming capabilities. | DB4O, Objectivity/DB, Perst, Shoal, ZopeDB |
| Tabular | Data elements are arranged in columns and rows, forming cells at the column/row juncture. | Apache Accumulo, BigTable, Apache HBase, Hypertable, Mnesia, OpenLink Virtuoso |
| Tuple store | Data stored as an ordered list of elements. | Apache River, GigaSpaces, Tarantool, TIBCO ActiveSpaces, OpenLink Virtuoso |
| RDF database | Data elements are composed of subject-predicate-object triples. Similar to a relational database in that a query language retrieves data from a triplestore, but more optimized. | MarkLogic, Ontotext-OWLIM, Virtuoso Universal Server, Stardog |
| Hosted | Fast and flexible NoSQL database services hosted in a cloud infrastructure. | Amazon DynamoDB & SimpleDB, Datastore on Google App Engine, Cloudant Data Layer (CouchDB), Freebase, Microsoft Azure Tables & DocumentDB, OpenLink Virtuoso |
| Multivalue databases | A variation of the relational model that organizes data in multidimensional structures with relationships amongst data. | D3 Pick, jBASE Pick, ESE/NT, InfinityDB, OpenQM, OpenInsight, Rocket U2 |
| Multimodel database | Supports document, graph, relational, key-value data and many other models in one integrated backend. | Couchbase, FoundationDB, MarkLogic, OrientDB |

The following image outlines the key players/products in each of the major types of NoSQL database stores:

[Image: NoSQL data store types]

Image Copyright: http://www.tomsitpro.com/articles/rdbms-sql-cassandra-dba-developer,2-547-2.html

Document Type of NoSQL Database

Of all the previously mentioned ways of storing and working with data, the document data store is one of those most commonly associated with NoSQL databases. As a matter of fact, the open-source MongoDB and CouchDB, which store information as JSON-style documents and rely on JavaScript for querying, are currently also two of the most popular NoSQL databases.

What Is Document NoSQL and How Does It Work?

Document NoSQL databases usually pair each key saved to the database with a complex data structure, the so-called ‘document.’ The document itself can come in various formats, but the most common ones are JSON and XML, both well known to programmers. Document NoSQL databases are not restricted to these two formats, however; there are many different implementations, and some of them use, for example, the BSON (binary JSON) format.
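To make the key/document pairing more concrete, here is a minimal sketch that models a document store as a plain Python dictionary. The `store`, `put` and `get` names and the sample field values are my own illustrative assumptions; they do not correspond to the API of any real NoSQL database.

```python
import json

# Hypothetical in-memory "document store": each key maps to one JSON document.
store = {}

def put(key, document):
    """Serialize the document to JSON text and save it under the key."""
    store[key] = json.dumps(document)

def get(key):
    """Load the JSON text stored under the key back into a Python dict."""
    return json.loads(store[key])

# A document is a nested structure (lists, sub-objects), not a flat row.
put("1", {
    "user": "example-user",
    "tags": ["nosql", "document"],
    "profile": {"language": "en"},
})

print(get("1")["profile"]["language"])
```

The point of the sketch is only that the whole nested structure travels together under a single key.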

Document NoSQL Example

The following is a good example of how Elasticsearch (another document NoSQL database that uses JSON) indexes a new document. Note: Elasticsearch is another very popular implementation, and it is the NoSQL database that we currently use at my workplace. “It uses JSON documents to store data and provides RESTful HTTP API to create and update database documents.” (Nayak, A., Poriya, A., & Poojary, D., 2013).

Let’s assume we need to insert a brand new JSON document into the “database” index, under a type called “message” with an id of “1”. This is the command that we would need to issue to the Elasticsearch Index API:

PUT database/message/1
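As a rough sketch, the request line above could be built from Python like this. The host and port (`localhost:9200`, the usual Elasticsearch default) and the `build_index_request` helper are assumptions of mine, and the document body only reuses the “user” and “date” values mentioned later in this article. The snippet builds the request but does not send it, and note that the index/type/id path reflects the Elasticsearch version current when this was written, since mapping types were deprecated in later releases.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node listening on its default HTTP port.
BASE_URL = "http://localhost:9200"

# Illustrative body; "user" and "date" reuse values mentioned later in the article.
document = {"user": "Jozef Jarosciak", "date": "2017-01-29T10:15:17"}

def build_index_request(index, doc_type, doc_id, body):
    """Build (but do not send) a PUT request for the index/type/id path."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

request = build_index_request("database", "message", "1", document)
print(request.get_method(), request.full_url)
```

To actually index the document, the request object would be passed to `urllib.request.urlopen` against a running cluster.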

 

Elasticsearch would also reply in JSON, with the following message, confirming a successful save operation. Example:

 

Document NoSQL vs. SQL

SQL databases are often referred to as RDBMS (relational database management systems). These include the big ones such as Oracle, Microsoft SQL Server, and DB2 from IBM, as well as the open-source MySQL and PostgreSQL.

Image Copyright: slideshare.net/Couchbase/webinar-making-the-shift-from-relational-to-nosql

 

The following are some of the categories that come to mind when highlighting the differences between document NoSQL and typical RDBMS databases.

  • No Tables

The major difference between SQL and document NoSQL lies in the fact that, unlike an SQL database, which stores data inside tables, a NoSQL database of this type is document-based.

  • Dynamic Schema

Another differentiator is something that will be less familiar to someone coming from the RDBMS world. SQL databases must have a predefined schema, which is often not a requirement with document NoSQL databases. For example, a typical SQL INSERT statement would fail to store data if it referenced column names that are not part of the predefined table schema.

This is not how it works in document NoSQL databases. Let me illustrate this with an example. Let’s assume that we have a JSON document with ID #1 (the same as in our Elasticsearch sample) that contains a collection of name/value pairs (such as “user”: “Jozef Jarosciak” or “date”: “2017-01-29T10:15:17”) and that is saved under key #1 in the NoSQL database. A second JSON document, ID #2, which may have some additional name/value pairs (such as “subject”: “Welcome”), will still be saved under key #2 without error, because document NoSQL databases support dynamic schemas. This is also one of the biggest advantages of NoSQL solutions built on the document type of data store.
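The contrast can be sketched in a few lines of Python. The relational side uses the standard-library `sqlite3` module with a hypothetical `messages` table; the document side is just an in-memory dictionary standing in for a document store, which is my simplification and not a real database client.

```python
import sqlite3

# Relational side: the table schema is fixed up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, user TEXT, date TEXT)")
conn.execute(
    "INSERT INTO messages (id, user, date) "
    "VALUES (1, 'Jozef Jarosciak', '2017-01-29T10:15:17')"
)

try:
    # "subject" is not part of the predefined schema, so this insert is rejected.
    conn.execute(
        "INSERT INTO messages (id, user, subject) "
        "VALUES (2, 'Jozef Jarosciak', 'Welcome')"
    )
    sql_insert_failed = False
except sqlite3.OperationalError:
    sql_insert_failed = True

# Document side (hypothetical in-memory store): each key holds its own set of fields.
documents = {}
documents[1] = {"user": "Jozef Jarosciak", "date": "2017-01-29T10:15:17"}
documents[2] = {
    "user": "Jozef Jarosciak",
    "date": "2017-01-29T10:15:17",
    "subject": "Welcome",  # extra field, no schema change needed
}

print(sql_insert_failed, sorted(documents[2]))
```

In the relational case the table would first have to be altered to add the new column; in the document case the second document simply carries one more name/value pair.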

  • Scalability

NoSQL databases are typically horizontally scalable, whereas SQL databases have traditionally been scaled vertically. An easy way to explain this is that an SQL database is usually scaled by increasing the resources (CPU, RAM, disk) of the machine that runs it. NoSQL databases, on the other hand, are scaled horizontally, which means that to increase capacity we only need to add more servers to the overall pool.
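One simple way to picture horizontal scaling is to route each document key to a server by hashing it, so that adding a server spreads the same keys over a larger pool. The sketch below is only an illustration under that assumption; the server names and helper functions are hypothetical, and real systems generally use more careful schemes (such as consistent hashing) so that adding a node moves fewer keys.

```python
import hashlib

def pick_server(key, servers):
    """Route a document key to one server using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

def distribute(keys, servers):
    """Group keys by the server that would store them."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[pick_server(key, servers)].append(key)
    return placement

keys = [f"message-{i}" for i in range(1000)]
two_nodes = distribute(keys, ["node-a", "node-b"])
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])

# Each node's share of the data shrinks as the pool grows.
print({server: len(held) for server, held in two_nodes.items()})
print({server: len(held) for server, held in three_nodes.items()})
```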

  • Complex Queries

This is one of the disadvantages of NoSQL databases. SQL, in my opinion, allows much better control over the query and is better suited to executing even the most complex queries. NoSQL query languages do not appear to offer the same flexibility as SQL.

  • Property Types

NoSQL databases are usually discussed in terms of Brewer’s CAP theorem (Consistency, Availability, and Partition Tolerance), which states that a distributed data store cannot guarantee more than two of these three properties at the same time. This is a very different point of view from that of SQL databases, which provide ACID properties (Atomicity, Consistency, Isolation, and Durability) for each transaction.
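To show what the ACID side means in practice, here is a small sketch of atomicity using Python's standard-library `sqlite3` module. The `accounts` table, the names in it and the `transfer` helper are hypothetical and exist only for this illustration: either both updates of a transfer are committed, or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(amount):
    """Move money from alice to bob as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'",
                (amount,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to bob was rolled back too.
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))

print(transfer(500), balances())
print(transfer(40), balances())
```

The first transfer would overdraw the account, so the whole transaction is undone; the second one succeeds and both rows change together.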

  • Performance

NoSQL document databases can give much better performance for simple queries: there is no need to run SQL-style join operations on the data sets, as all associated data is contained within a single document. There are typically also no integrity checks, which usually slow down SQL databases.
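The difference in access pattern can be sketched as follows. The relational half uses the standard-library `sqlite3` module with two hypothetical tables, and the document half uses a plain dictionary as a stand-in for a document store; the table names, fields and sample data are assumptions made purely for illustration, and the snippet says nothing about actual timings.

```python
import sqlite3

# Relational layout: a message and its comments live in two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT)")
conn.execute(
    "CREATE TABLE comments (message_id INTEGER REFERENCES messages(id), body TEXT)"
)
conn.execute("INSERT INTO messages VALUES (1, 'Welcome')")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)", [(1, "Thanks!"), (1, "Hello")]
)

# Reading the message together with its comments requires a join.
rows = conn.execute(
    "SELECT m.subject, c.body FROM messages m "
    "JOIN comments c ON c.message_id = m.id "
    "WHERE m.id = 1 ORDER BY c.rowid"
).fetchall()

# Document layout: the comments are embedded, so one key lookup returns everything.
documents = {1: {"subject": "Welcome", "comments": ["Thanks!", "Hello"]}}
doc = documents[1]

print(rows)
print(doc)
```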

SQL databases are conventionally used by businesses mainly due to their advantages when working with structured data hosted on vertically scalable database infrastructure. However, they are not exactly performant when it comes to Big Data.

And that is where NoSQL databases came in as a response to the market’s demand for working efficiently with Big Data and its massive amounts of unstructured data, which often also comes in an extensive variety of data types. In this regard, document NoSQL databases are far better suited than typical relational database management systems, because they offer one of the most efficient ways of handling enormous data volumes.

References

Nayak, A., Poriya, A., & Poojary, D. (2013). Type of NOSQL databases and its comparison with relational databases. International Journal of Applied Information Systems, 5(4), 16-19. (Accessed: 27 January 2017).

Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39(4), 12-27. (Accessed: 27 January 2017).

Padhy, R. P., Patra, M. R., & Satapathy, S. C. (2011). RDBMS to NoSQL: reviewing some next-generation non-relational database’s. International Journal of Advanced Engineering Science and Technologies, 11(1), 15-30. (Accessed: 27 January 2017).

Oussous, A., Benjelloun, F. Z., Lahcen, A. A., & Belfkih, S. (2013). Comparison and classification of nosql databases for big data. International Journal of Database Theory and Application, 6(4.2013). (Accessed: 29 January 2017).

Keith, M., & Schincariol, M. (2013). Introduction. In Pro JPA 2 (pp. 1-14). Apress. (Accessed: 29 January 2017).

Issac, L.P. (2014) SQL vs NoSQL database differences explained with few example DB. Available at: http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/?utm_source=tuicool (Accessed: 28 January 2017).

Index API (2017) Available at: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html (Accessed: 28 January 2017).

Edlich, S. (2016) NOSQL databases. Available at: http://nosql-database.org/ (Accessed: 28 January 2017).

MongoDB (2017) NoSQL databases explained. Available at: https://www.mongodb.com/nosql-explained (Accessed: 29 January 2017).

Mayo, M. (2015) Top NoSQL Database Engines. Available at: http://www.kdnuggets.com/2016/06/top-nosql-database-engines.html (Accessed: 29 January 2017).
