Mapping Relational Databases and SQL to MongoDB

NoSQL databases have grown rapidly in popularity over the last few years owing to their less constrained structure, flexible schema design, and fast access for many workloads compared to traditional relational databases (RDBMS/SQL). MongoDB is an open source, document-oriented NoSQL database which stores data in the form of JSON-like objects. It has emerged as one of the leading databases due to its dynamic schema, high scalability, good query performance, indexing support and an active user community.

If you are coming from an RDBMS/SQL background, understanding NoSQL and MongoDB concepts can be a bit difficult at first because the two technologies represent data in very different ways. This article will walk you through how the RDBMS/SQL domain, its functionality, terms and query language map to MongoDB. By mapping, I mean that if we have a concept in RDBMS/SQL, we will see what its equivalent concept in MongoDB is.

We will start by mapping the basic relational concepts like table, row, column, etc. and move on to discuss indexing and joins. We will then look at SQL queries and discuss their corresponding MongoDB queries. The article assumes that you are aware of the basic relational database concepts and SQL, because throughout the article more stress will be laid on understanding how these concepts map to MongoDB. Let’s begin.

Each database in MongoDB consists of collections, just as an RDBMS database consists of SQL tables. Each collection stores data in the form of documents, just as a table stores data in rows. While a row stores data in its set of columns, a document has a JSON-like structure (stored in a binary format known as BSON in MongoDB). Lastly, the way we have columns in an SQL row, we have fields in a MongoDB document. Following is an example of a document (read row) having some fields (read columns) storing user data:
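A minimal sketch of such a document, written as a plain JavaScript object so that it also runs outside the shell. The field names and values are illustrative assumptions, and the _id is shown as a hex string where the shell would display an ObjectId.

```javascript
// Hypothetical user document; every field name and value is illustrative.
const userDocument = {
  // In the mongo shell this would be an ObjectId, shown here as its 24-character hex form.
  _id: "5146bb52d8524270060001f3",
  user_name: "Mark Hanks",
  email: "mark@abc.com",
  age: 25,
  city: "Los Angeles",
};

// Each key is a field (read column); the whole object is one document (read row).
console.log(Object.keys(userDocument));
```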

This document is equivalent to a single row in RDBMS. A collection consists of many such documents just as a table consists of many rows. Note that each document in a collection has a unique _id field, which by default holds a 12-byte ObjectId and serves as the primary key for the document. The field is auto-generated when a document is inserted without one and is used for uniquely identifying each document.

To understand the mappings better, let us take an example of an SQL table users and its corresponding structure in MongoDB. As shown in Fig 1, each row in the SQL table transforms to a document and each column to a field in MongoDB.

Figure 1 Mapping Table to Collection

One interesting thing to note here is that different documents within a collection can have different schemas. So, it is possible in MongoDB for one document to have five fields and another document to have seven fields. Fields can be easily added, removed and modified at any time. Also, there is no constraint on the data types of the fields. Thus, in one document a field can hold an integer and in another document the same field may hold an array.

These concepts must seem very different to readers coming from an RDBMS background, where the table structures, their columns, data types and relations are pre-defined. The ability to use a dynamic schema allows us to generate documents with varying structure at run time.

For instance, consider the following two documents inside the same collection but having different schemas (Fig 2):

Figure 2 Documents in a Collection having different structure

The first document contains the fields address and dob which are not present in the second document, while the second document contains the fields gender and occupation which are not present in the first one. If we had designed this in SQL, we would have needed four extra columns for address, dob, gender and occupation, some of which would store empty (or null) values and hence occupy unnecessary space.

This model of dynamic schema is the reason why NoSQL databases are so flexible in terms of design. Various complex schemas (hierarchical, tree-structured, etc.) which would require a number of RDBMS tables can be designed efficiently using such documents. A typical example would be to store user posts, their likes, comments and other associated information in the form of documents. An SQL implementation for the same would ideally have separate tables for storing posts, comments and likes, while MongoDB can store all of this information in a single document.
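As a rough sketch of the posts example, a single document could carry its likes and comments as arrays. The field names likes, comments and comment_text are assumptions made for illustration only.

```javascript
// Hypothetical post document; field names and values are illustrative.
const post = {
  post_text: "This is a sample post",
  user_name: "mark",
  likes: ["john", "sara"],
  comments: [
    { user_name: "john", comment_text: "Nice post" },
    { user_name: "sara", comment_text: "Thanks for sharing" },
  ],
};

// Data that SQL would spread over posts, likes and comments tables
// is read here from one document, without any join.
console.log(post.likes.length, post.comments.length);
```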

Relationships in RDBMS are modelled using primary and foreign keys and queried using joins. There is no such straightforward mapping in MongoDB; instead, relationships are designed using embedded documents and linked documents.

Consider an example wherein we need to store user information and corresponding contact information. An ideal SQL design would have two tables, say user_information and contact_information, with primary keys id and contact_id as shown in Fig 3. The contact_information table would also contain a column user_id which would be the foreign key linking to the id field of the user_information table.

Figure 3

Now we will see how we would design such relationships in MongoDB using the Linking documents and Embedded documents approaches. Observe that in the SQL schema, we generally add a column (like id and contact_id in our case) which acts as the primary key for that table. However, in MongoDB, we generally use the auto-generated _id field as the primary key to uniquely identify the documents.

The Linking approach uses two collections, user_information and contact_information, both having their unique _id fields. We will have a field user_id in the contact_information document which relates to the _id field of the user_information document, showing which user the contact corresponds to (see Fig 4). Note that in MongoDB, the relations and their corresponding operations have to be taken care of manually (for example, through code) as no foreign key constraints and rules apply.

Figure 4 Linking Documents in MongoDB

The user_id field in our document is simply a field that holds some data, and all the logic associated with it has to be implemented by us. For example, even if you insert a user_id in the contact_information document that does not exist in the user_information collection, MongoDB is not going to throw any error saying that the corresponding user_id was not found in the user_information collection (unlike SQL, where this would violate the foreign key constraint).
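The Linking approach can be sketched with plain JavaScript arrays standing in for the two collections. The _id values, the phone field and the findUserForContact helper are hypothetical, and the lookup shows the kind of logic the application has to supply itself.

```javascript
// Stand-ins for the two collections; _id values are shortened placeholders,
// not real ObjectIds, and the phone field is an assumption.
const userInformation = [
  { _id: "u1", user_name: "mark" },
];
const contactInformation = [
  { _id: "c1", user_id: "u1", phone: "123-456-7890" },
  // Nothing stops us from storing a user_id that matches no user.
  { _id: "c2", user_id: "u999", phone: "555-000-1111" },
];

// The "join" is done by hand: look up the user a contact points to.
function findUserForContact(contact) {
  return userInformation.find((user) => user._id === contact.user_id) || null;
}

console.log(findUserForContact(contactInformation[0])); // the user document for mark
console.log(findUserForContact(contactInformation[1])); // null, a dangling reference
```

The second contact is accepted even though its user_id points nowhere, which is exactly the check a foreign key constraint would have performed in SQL.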

The second approach is to embed the contact_information document inside the user_information document like this (Fig 5):

Figure 5 Embedding Documents in MongoDB

In the above example, we have embedded a small document of contact information inside the user information. In a similar manner, large complex documents and hierarchical data can be embedded to relate entities.
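A minimal sketch of the Embedded approach follows. The contact fields are assumptions, and the shell query is guarded because db only exists inside the mongo shell.

```javascript
// Hypothetical user document with the contact details embedded in it.
const userInformation = {
  _id: "u1", // placeholder; the shell would generate an ObjectId
  user_name: "mark",
  contact_information: {
    phone: "123-456-7890",
    email: "mark@abc.com",
  },
};

// The related data is reached by walking into the document, with no second lookup.
console.log(userInformation.contact_information.email);

// In the mongo shell, embedded fields can be matched using dot notation.
if (typeof db !== "undefined") {
  printjson(
    db.user_information
      .find({ "contact_information.email": "mark@abc.com" })
      .toArray()
  );
}
```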

Which of the Linking and Embedded approaches to use depends on the specific scenario. If the data to be embedded is expected to keep growing, it is better to use the Linking approach so that the document does not become too large (MongoDB limits a single document to 16 MB). The Embedded approach is generally used in cases where a limited amount of information (like the contact details in our example) has to be embedded.

To summarize, the following chart (Fig 6) represents the common correspondences we have discussed:

Figure 6 Mapping Chart

Now that we are comfortable with the basic mappings between RDBMS and MongoDB, we will discuss how the query language used to interact with the database differs between them.

For MongoDB queries, let us assume a collection posts with document structure as follows:
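The queries later in the article read the fields post_text, user_name, post_privacy and post_likes_count, so a sample document might look like the sketch below. The values are illustrative assumptions.

```javascript
// Sample document shape assumed for the query examples; values are illustrative.
const samplePost = {
  _id: "5146bb52d8524270060001f3", // an ObjectId in the shell, shown as hex text here
  post_text: "This is a sample post",
  user_name: "mark",
  post_privacy: "public",
  post_likes_count: 0,
};

console.log(samplePost);
```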

For SQL queries, we assume the table posts having five columns with the following structure:

Figure 7 Sample SQL Table

We will discuss queries for creating and altering collections (or tables), and for inserting, reading, updating and removing documents (or rows). There are two queries for each point, one for SQL and another for MongoDB. I will be explaining the MongoDB queries only, as we are already familiar with the SQL queries. The MongoDB queries presented here are written for the Mongo JavaScript shell, while the SQL queries are written for MySQL.

In MongoDB, there is no need to explicitly create the collection structure (as we do for tables using a CREATE TABLE query). The collection is created automatically when the first insert occurs, and each document carries its own structure. However, you can create an empty collection using the createCollection command.
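A sketch of the two sides, with the SQL shown as a comment for comparison. The column types in the SQL are assumptions, and the shell call is guarded because db only exists inside the mongo shell.

```javascript
// SQL (MySQL), with assumed column types:
//   CREATE TABLE posts (
//     id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
//     post_text VARCHAR(500) NOT NULL,
//     user_name VARCHAR(20) NOT NULL,
//     post_privacy VARCHAR(10) NOT NULL,
//     post_likes_count INT NOT NULL
//   );
const collectionName = "posts";

// MongoDB needs only a name; no columns or types are declared.
if (typeof db !== "undefined") {
  db.createCollection(collectionName);
}
```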

To insert a document in MongoDB, we use the insert method which takes an object with key-value pairs as its input. The inserted document will contain the auto-generated _id field. However, you can also explicitly provide your own unique value for _id (for example, a 12-byte ObjectId) along with the other fields.
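An insert might look like the following sketch. It uses insertOne, the method current shells provide for this (older shells accept db.posts.insert with the same document), and the field values are illustrative.

```javascript
// SQL: INSERT INTO posts (post_text, user_name, post_privacy, post_likes_count)
//      VALUES ('This is a sample post', 'mark', 'public', 0);
const newPost = {
  post_text: "This is a sample post",
  user_name: "mark",
  post_privacy: "public",
  post_likes_count: 0,
};

// No _id is supplied, so MongoDB generates one on insert.
if (typeof db !== "undefined") {
  const result = db.posts.insertOne(newPost);
  printjson(result); // reports the generated _id as insertedId
}
```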

There is no ALTER TABLE equivalent in MongoDB to change the document structure. As the documents are dynamic in schema, the structure changes as and when an update adds, removes or modifies fields in a document.

MongoDB uses the find method which is equivalent to the SELECT command in SQL. The following statements simply read all the documents from the posts collection.
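Reading everything can be sketched as a find call with an empty filter. The matchEverything name is only for illustration.

```javascript
// SQL: SELECT * FROM posts;
const matchEverything = {}; // an empty filter matches every document

if (typeof db !== "undefined") {
  printjson(db.posts.find(matchEverything).toArray());
}
```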

The following query does a conditional search for documents whose user_name field is mark. All the criteria for fetching the documents have to be placed in the first set of braces {}, separated by commas.
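A sketch of the conditional search, where the filter object plays the role of the WHERE clause:

```javascript
// SQL: SELECT * FROM posts WHERE user_name = 'mark';
const filter = { user_name: "mark" };

if (typeof db !== "undefined") {
  printjson(db.posts.find(filter).toArray());
}
```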

The following query fetches specific columns, post_text and post_likes_count, as specified in the second set of braces {}.

Note that MongoDB by default returns the _id field with each find statement. If we do not want this field in our result set, we have to specify the _id key with a 0 value in the list of columns to be retrieved. The 0 value of the key indicates that we want to exclude this field from the result set.
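Both forms of the projection can be sketched as follows. The variable names are illustrative, and the second argument to find is the list of fields to return.

```javascript
// SQL: SELECT post_text, post_likes_count FROM posts;
const withId = { post_text: 1, post_likes_count: 1 }; // _id still comes back
const withoutId = { _id: 0, post_text: 1, post_likes_count: 1 }; // _id excluded

if (typeof db !== "undefined") {
  printjson(db.posts.find({}, withId).toArray());
  printjson(db.posts.find({}, withoutId).toArray());
}
```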

The following query fetches specific fields based on the criteria that user_name is mark.

We will now add one more criterion to fetch the posts with privacy type as public. Criteria fields separated by commas represent a logical AND condition. Thus, this statement will look for documents having both user_name as mark and post_privacy as public.
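The last two queries can be sketched together: first a single criterion with a projection, then the same projection with a second criterion added. The variable names are illustrative.

```javascript
// SQL: SELECT post_text, post_likes_count FROM posts WHERE user_name = 'mark';
const byUser = { user_name: "mark" };

// SQL: SELECT post_text, post_likes_count FROM posts
//      WHERE user_name = 'mark' AND post_privacy = 'public';
const byUserAndPrivacy = { user_name: "mark", post_privacy: "public" };

const fields = { post_text: 1, post_likes_count: 1 };

if (typeof db !== "undefined") {
  printjson(db.posts.find(byUser, fields).toArray());
  // Keys listed side by side in one filter are combined with AND.
  printjson(db.posts.find(byUserAndPrivacy, fields).toArray());
}
```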

To use logical OR between the criteria in the find method, we use the $or operator.
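A sketch of the OR form, where $or takes an array of alternative criteria:

```javascript
// SQL: SELECT post_text, post_likes_count FROM posts
//      WHERE user_name = 'mark' OR post_privacy = 'public';
const eitherCondition = {
  $or: [{ user_name: "mark" }, { post_privacy: "public" }],
};
const fields = { post_text: 1, post_likes_count: 1 };

if (typeof db !== "undefined") {
  printjson(db.posts.find(eitherCondition, fields).toArray());
}
```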

Next, we will use the sort method which sorts the result in ascending order of post_likes_count (indicated by 1).

To sort the results in descending order, we specify -1 as the value of the field.
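Both sort directions can be sketched as follows. Filtering on user_name is an assumption carried over from the earlier queries.

```javascript
// SQL: SELECT * FROM posts WHERE user_name = 'mark' ORDER BY post_likes_count ASC;
const ascending = { post_likes_count: 1 };

// SQL: SELECT * FROM posts WHERE user_name = 'mark' ORDER BY post_likes_count DESC;
const descending = { post_likes_count: -1 };

if (typeof db !== "undefined") {
  printjson(db.posts.find({ user_name: "mark" }).sort(ascending).toArray());
  printjson(db.posts.find({ user_name: "mark" }).sort(descending).toArray());
}
```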

To limit the number of documents to be returned, we use the limit method specifying the number of documents.

Just as we use OFFSET in SQL to skip some number of records, we use the skip function in MongoDB. For example, the following statement would fetch ten posts, skipping the first five.
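A sketch of limit and skip used together. The in-memory array is only there to show which positions come back, and the sort on _id is an assumption added so that the skipped rows are well defined.

```javascript
// SQL: SELECT * FROM posts LIMIT 10 OFFSET 5;
const skipCount = 5;
const limitCount = 10;

// Illustration on positions 1..20: skipping five and taking ten leaves 6..15.
const positions = Array.from({ length: 20 }, (_, i) => i + 1);
const page = positions.slice(skipCount, skipCount + limitCount);
console.log(page);

if (typeof db !== "undefined") {
  printjson(
    db.posts.find().sort({ _id: 1 }).skip(skipCount).limit(limitCount).toArray()
  );
}
```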

The first parameter to the update method specifies the criteria to select the documents. The second parameter specifies the actual update operation to be performed. For example, the following query selects all the documents with user_name as mark and sets their post_privacy as private.

One difference here is that by default, the MongoDB update query updates only one document (the first match). To update all the matching documents, we have to provide a third parameter specifying multi as true, indicating that we want to update multiple documents.
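The update can be sketched with the $set operator. The example uses updateOne and updateMany, which current shells offer for the single-document and multi-document cases. The update method with multi described above is the older form of the same operation.

```javascript
// SQL: UPDATE posts SET post_privacy = 'private' WHERE user_name = 'mark';
const criteria = { user_name: "mark" };
const change = { $set: { post_privacy: "private" } };

if (typeof db !== "undefined") {
  // Touches only the first matching document.
  db.posts.updateOne(criteria, change);

  // Touches every matching document, like the SQL statement above.
  // Older form: db.posts.update(criteria, change, { multi: true });
  db.posts.updateMany(criteria, change);
}
```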

Removing documents is quite simple and similar to SQL.
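A sketch of a removal using deleteMany, which current shells provide. Older shells use db.posts.remove with the same criteria.

```javascript
// SQL: DELETE FROM posts WHERE user_name = 'mark';
const criteria = { user_name: "mark" };

if (typeof db !== "undefined") {
  // Removes every document that matches the criteria.
  db.posts.deleteMany(criteria);
}
```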
