MongoDB vs MySQL – which one to choose? Both are free and both are well-known and popular. Which one will allow your application to grow over time? Which one will scale when the time comes for larger data volumes?
Talk about MongoDB for a moment and you will find a flexible, schema-less database that you can use right away. No data modeller, no problem. Just store your data and build your application. When you have more data, add it to your MongoDB without changing the schema because it doesn’t have one nor does it ever need one. Compare that with MySQL and you will learn the opposite. Here is one database that will need a data modeller or at least, someone who is capable of seeing the entities and relationships in your data. A schema is needed in MySQL first before any data can be stored. Yes, this one needs some planning upfront unlike MongoDB.
These two databases are popular and make great data stores in any application but finding the right one will require you to understand how the data is to be queried upon. If there are relationships between the data then MySQL is the right choice. If there are no relationships between documents then MongoDB should be chosen over MySQL because MongoDB lacks joins and makes up the difference when you need to either find documents in an unstructured data store or when you are looking up a key value.
Before deciding on one over the other, let us look at both databases from a comparative point of view:
|Tables, rows, columns
|Performance with Large Data sets
MongoDB is known to be schema-less which makes it extremely flexible and versatile because you can drop a few documents in a collection and run with it. Documents can differ too, in the number of fields or type of data stored in these fields. As new documents are stored, there is no schema change required.
Not so for MySQL because any change to the schema requires the application to shut down and be disconnected from the database. MySQL requires a schema as a prerequisite and needs to be defined in terms of tables and relations with a fixed structure. Traditionally, a data modeller is required to conceptualize the data and model it into normalized tables to prevent data redundancy and consistency.
MySQL uses SQL to access data. SQL or Structured Query Language uses commands such as INSERT, SELECT, UPDATE and DELETE to manage data. Related information is implemented as joins and allows queries to retrieve data from multiple tables that are related. With a relational schema, there is no duplication of data.
With MongoDB, the find() command is used to find documents. You can specify as many key-value pairs as required to find documents that match a criteria. To manage data, you have UPDATE, SAVE and REMOVE methods.
One single advantage of MongoDB over MySQL is its ability to handle large unstructured data. The performance and speed using MongoDB in querying collections of documents is clearly faster than finding data in large datasets in MySQL.
Full text search is the ability to efficiently search strings within strings like finding a keyword or multiple keywords in a large text string. To speed up searches, every word is indexed. Also most database systems strip off stop words such as ‘a’, ‘the’ or ‘and’. With MongoDB, the performance is relatively good. That’s because there are many indexing options available with MongoDB. One disadvantage of MongoDB is that it does not provide much for partial text search. Luckily there are third party tools that assist with that.
When it comes to MySQL, full text search is serious business. FULLTEXT is the index type for full text search columns which can be defined for CHAR, VARCHAR or TEXT columns. Full text searching is performed using the MATCH..AGAINST syntax. You can specify a list of columns to search in and a string. There are three types of full-text searches including a boolean search, natural language search and a query expansion search.
With MongoDB, it is possible to split up a collection and store it in separate shards. This allows an application to grow beyond resource limits of a standalone server or replica sets.
Also it can handle better distribution of data across multiple nodes and help with load balancing with queries. MySQL also has the ability to perform range based data partitioning. This supports automatic data volume and transparent query routing.
Unlike MongoDB, MySQL has no standard sharding implementation. There are two ways to provide sharding but none of them are standard. MySQL Cluster and MySQL Fabric provide built-in automatic sharding but they are rarely deployed.
Both MongoDB and MySQL utilizes indexes to select data. The only difference is how each database stores the data and how well the database format matches the underlying data structure in your application.
With MongoDB, indexes help select whole collections of documents versus how MySQL selects individual rows in a table. Retrieving data by ID in both databases are fast but majority of times, applications don’t query data using ID’s. Instead, it is tries to query data using related fields that are more meaningful to us such as first names, last names and addresses.
MySQL depends on indexes to improve performance on database selects unless tables are too small. But for large tables, indexes can dramatically improve performance. Administrators of MySQL can perform query optimizations on table selects to analyze how the database executes the query. From the analysis, indexes can be created on columns that can improve performance. This practice is repeated for each table to determine the best indexes to be created and used over all the possible queries.
As much as an index helps improve performance, it also hinders it when an insert or update operation is performed. When an insert or update occurs, indexes are updated. This is the overhead of maintaining indexes but it is well worth the cost because it speeds up performance when selecting data.
Authentication is the process of validating a client that is trying to connect to a database. MongoDB offers various methods such as:
SCRAM-SHA-1:Salted Challenge Authentication Mechanism uses simple text-based usernames and passwords transmitted over a channel protected by transport layer security (TLS).
MongoDB-CR: Like SCRAM, MongoDB-CR verifies a username and password against an authentication database. MongoDB-CR was removed from Version 3.0, and only older iterations use it today.
When it comes to giving access to an authenticated user, one of the best features of MongoDB is the role-based security. Role-based security allows roles to be defined with permissions to the database. Once a user is defined with a role, the user cannot access the system beyond what is allowed.
Access to MySQL database tables is slightly different than of MongoDB. Instead of role-based, MySQL uses access control and account management. Every user is granted privileges and permissions to select, update, insert and delete data. A user account can be individually permitted to access the database in any way defined. This allows permissions to be defined at a very minute detail instead of granting access at a role level.
Previously MTV used a Java-based content management platform with a fixed structure schema that became a hinderance whenever new tables or columns were added. They decided to move to MongoDB to store hierarchical data, scale with database clustering and bring more brands onto their existing platform.
When Twitter was first implemented, they stored tweets by the same date range on the same shard. Scaling simply meant filling up one machine with shard after shard.
- Load Balancing. People are only interested in what is happening now. This caused queries to look at the newer machines and not the older machines that had older tweets.
- Costly. As a machine is being filled up with all of its replication slaves, the process is expensive.
- Complicated. Building a new cluster every three weeks is both time consuming and tedious.
Twitter now implements a tweet store which is built using Gizzard – a MySQL product.
On a limited budget, the BBC wanted to bring the news to its 35 million unique readers in real time. MySQL was chosen to provide up to date info on the top story read during the hour and the number of stories read during the day.
Size of the BBC database consists of 8 million rows with the largest table being up to 4 million records.
To help you decide on which database is better for you, we have assembled a table of the pros and cons of each database for you:
Can store data without a schema in place.
Allows ‘joins’ and to define cardinality.
Supports both master-slave and master-master replication.
Handles unstructured data very fast.
Supports atomic tranactions.
Lack of joins.
Need to define tables and columns before storing data.
Only supports master-slave replication.
Multiple copies of the same data can occur when tables are not normalized.
Does not support atomic transactions.
No standard sharding implementation.
Performance may hinder on large databases.
Some of the biggest names use MongoDB and includes:
- NBC Universal
- Under Armour
And with MySQL, there are equally the same number or more customers that include:
- Cardinal Health
- Continental Airlines
Disclosure of Material Connection: Some of the links in the post above are “affiliate links.” This means if you click on the link and purchase the item, I will receive an affiliate commission. Regardless, I only recommend products or services I use personally and believe will add value to my readers.