Back Story: Back in 2016, I was pursuing an Informatics degree at the University at Albany, and I had to take an Intro to Databases course.
In that course, I learned about two things: relational databases and non-relational databases.
I never truly learned the difference between the two, though, so I wanted to rekindle my curiosity about databases and dig into how they differ.
What is a relational database?
- Defined by E. F. Codd in 1970, a relational database is a digital database based on the relational model of data. The data is stored in tables made up of rows (each of which represents an entry) and columns (each of which holds a specific type of information). Relationships between tables are established through primary and foreign keys.
Example of a relational database model
Language: SQL (Structured Query Language)
Databases: MySQL, PostgreSQL, SQLite (sqlite3)
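To make the idea of tables, primary keys, and foreign keys a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `students` and `enrollments` tables and their columns are made up for this example; they are not from any particular course or system.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.execute("""
    CREATE TABLE students (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE enrollments (
        id         INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES students(id),
        course     TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO students (id, name) VALUES (1, 'Ada')")
conn.execute(
    "INSERT INTO enrollments (student_id, course) VALUES (1, 'Intro to Databases')"
)

# Join the two tables through the primary key / foreign key relationship.
rows = conn.execute("""
    SELECT students.name, enrollments.course
    FROM enrollments
    JOIN students ON students.id = enrollments.student_id
""").fetchall()
print(rows)  # [('Ada', 'Intro to Databases')]
```

Each row in `enrollments` points back to exactly one row in `students` through `student_id`, which is what lets the query combine the two tables.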
Advantages:
- Can handle lots of complex queries, database transactions, and routine analysis of data.
- ACID (Atomicity, Consistency, Isolation, Durability): A set of properties that ensure reliable database transactions.
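As a rough illustration of the "A" in ACID, the sketch below uses Python's `sqlite3` module to show a transaction that either applies all of its changes or none of them. The `accounts` table and the `transfer` helper are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# This would overdraw alice, so the CHECK constraint fails on the second update
# and the first update (the credit to bob) is rolled back as well.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)  # False {'alice': 100, 'bob': 50}
```

Even though the credit to `bob` ran first, it does not survive the failed transfer, which is the all-or-nothing behavior that atomicity describes.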
Disadvantages:
- Not well suited to storing complex or very large items such as images, designs, and multimedia files
- Can become very costly to maintain and to scale with new servers
Great! But what if our datasets are too large or unstructured?
Non-Relational Database or NoSQL
What is a Non-Relational Database?
- Non-relational databases have existed since the late 1960s, but the term "NoSQL" was not used until 1998, when Carlo Strozzi gave that name to his lightweight open-source database, which did not use SQL as its interface.
- "A NoSQL (originally referring to 'non SQL' or 'non relational') database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases." (Wikipedia)
Types of NoSQL databases:
Column Store: Uses the concept of a keyspace, which contains all the column families. Each column family contains rows, and each row contains the columns that store and organize the data.
Each row contains a family of columns
Databases: Apache HBase, Cassandra
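The keyspace, column family, row, and column nesting described above can be pictured with plain Python dictionaries. This is only a sketch of the logical layout, with made-up names; it says nothing about how HBase or Cassandra actually store data on disk.

```python
# keyspace -> column family -> row key -> {column name: value}
# All names and values here are hypothetical.
keyspace = {
    "users": {  # a column family
        "row-1": {"name": "Ada", "email": "ada@example.com"},
        "row-2": {"name": "Grace"},  # rows need not share the same columns
    }
}


def get_column(keyspace, family, row_key, column, default=None):
    """Look up one column of one row inside a column family."""
    return keyspace.get(family, {}).get(row_key, {}).get(column, default)


print(get_column(keyspace, "users", "row-1", "email"))  # ada@example.com
print(get_column(keyspace, "users", "row-2", "email"))  # None
```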
Key-Value Store: Uses an associative array (also called a map or dictionary) as its fundamental data model. The data is represented as a collection of key-value pairs, and a key will show up at most once in the collection. You can store a value, such as an integer, a string, a JSON structure, or an array, along with a key used to reference that value.
Each row has its own ID and values
Databases: Redis, Amazon DynamoDB
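A Python dictionary is itself an associative array, so it makes a handy stand-in for the key-value model. The `put` and `get` helpers below are hypothetical names for this sketch, not the API of Redis or DynamoDB.

```python
import json

store = {}  # the whole "database" is one associative array


def put(key, value):
    """Store a JSON-serializable value under a key, replacing any old value."""
    store[key] = json.dumps(value)


def get(key, default=None):
    """Fetch and decode the value for a key, or return default if absent."""
    raw = store.get(key)
    return default if raw is None else json.loads(raw)


put("user:1", {"name": "Ada", "langs": ["SQL", "Python"]})
put("visits", 41)
put("visits", 42)  # same key again: the value is overwritten, not duplicated

print(get("visits"))          # 42
print(get("user:1")["name"])  # Ada
print(len(store))             # 2
```

Writing to `visits` twice still leaves a single entry for that key, which is the "a key will show up at most once" rule in action.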
Graph: A database that uses graph structures to represent and store data. This lets users traverse quickly among all the connected values and find insights in the relationships.
All Movies Kevin Bacon acts in
Databases: Neo4j, OrientDB, Titan
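To give a feel for what traversing connected values means, here is a small sketch that keeps a graph of actors and movies in memory and walks it with a breadth-first search. It uses a plain dictionary rather than a real graph database, and the helper names `movies_of` and `hops` are made up for this example.

```python
from collections import deque

# Undirected "acted in" edges between actors and movies.
edges = [
    ("Kevin Bacon", "Footloose"),
    ("Kevin Bacon", "Apollo 13"),
    ("Tom Hanks", "Apollo 13"),
    ("Tom Hanks", "Forrest Gump"),
]

# Adjacency sets: every node maps to the nodes it is connected to.
graph = {}
for actor, movie in edges:
    graph.setdefault(actor, set()).add(movie)
    graph.setdefault(movie, set()).add(actor)


def movies_of(actor):
    """All movies directly connected to an actor, in alphabetical order."""
    return sorted(graph.get(actor, ()))


def hops(start, goal):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(movies_of("Kevin Bacon"))             # ['Apollo 13', 'Footloose']
print(hops("Kevin Bacon", "Forrest Gump"))  # 3
```

A graph database is built to answer this kind of "who is connected to whom, and how far apart" question directly, without joining tables.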
Document-Oriented Database: A document-oriented database, or document store, is a computer program designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.
Databases: MongoDB, Couchbase
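Semi-structured data is easiest to see by example. In the sketch below, each document is a Python dictionary, and two documents in the same collection do not have to share the same fields. The `find` and `find_by_tag` helpers are hypothetical and only mimic the idea of querying documents; they are not the MongoDB or Couchbase API.

```python
# A "collection" of documents; field names and values are made up.
documents = [
    {
        "_id": 1,
        "title": "Intro to Databases",
        "tags": ["sql", "nosql"],
        "author": {"name": "Ada"},  # nested document
    },
    {"_id": 2, "title": "Graph Basics", "tags": ["graph"]},  # no author field
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def find_by_tag(collection, tag):
    """Return documents whose 'tags' list contains the given tag."""
    return [doc for doc in collection if tag in doc.get("tags", [])]


print(find(documents, title="Graph Basics")[0]["_id"])       # 2
print([d["_id"] for d in find_by_tag(documents, "nosql")])   # [1]
```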
Advantages:
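- Can handle large volumes of structured, semi-structured, and unstructured data
- Flexible, easy-to-use data models that fit object-oriented programming (MongoDB, for example, stores JSON-like documents and its shell uses JavaScript, although the server itself is written mostly in C++)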
- Efficient, scale-out architecture instead of expensive, monolithic architecture
Disadvantages:
- Less commercial support, since many NoSQL databases are open-source projects
- Administration: NoSQL databases require technical skill to install and maintain.
- Less mature. NoSQL databases are still growing and many features are still being implemented.
Which one to use?
Companies utilize a mixture of both traditional and non-relational databases to meet their business requirements and needs.
Relational database:
Traditional relational databases are very good at keeping your data transactions secure and making complex queries to acquire information. Companies that are already structured and are not experiencing massive growth will most likely stick to traditional databases.
Non-relational databases:
Great at storing large amounts of data with little structure. Companies growing at a rapid pace, like startups, lean more on non-relational databases for their scalability and flexibility. Paired with the cloud, non-relational databases can also save companies a lot of money.
Data Fun Facts:
- Data is growing faster than ever before and by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.
- By then, our accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes.
- We are seeing a massive growth in video and photo data, where every minute up to 300 hours of video are uploaded to YouTube alone.
- This year, over 1.4 billion smartphones will be shipped, all packed with sensors capable of collecting all kinds of data, not to mention the data the users create themselves.
- Within five years there will be over 50 billion smart connected devices in the world, all developed to collect, analyze and share data.
References:
https://www.hadoop360.datasciencecentral.com/blog/advantages-and-disadvantages-of-nosql-databases-what-you-should-k
https://www.mongodb.com/scale/advantages-of-nosql
https://www.techwalla.com/articles/disadvantages-of-a-relational-database