
NoSQL: What is this database and how to work with it / ITech content


Previously, online services stored data mainly in relational databases (RDBs), which have strictly defined schemas and relationships between tables. This gave data a clear structure, for example, information about orders in an online store or about users. However, such databases are harder to scale and are not always suitable for datasets with diverse structures.

Over time, services became more complex and began to work actively with unstructured and semi-structured data, such as images, video, and audio. Storing it efficiently called for more flexible and simpler tools, which led to the rise of non-relational database management systems, known as NoSQL. In this article, we take a detailed look at the features and benefits of NoSQL databases, their use in modern projects, and the key points to consider when choosing a data storage system.

In this article, you will learn:

  • What NoSQL is;
  • Why the NoSQL model emerged;
  • What advantages NoSQL databases have;
  • How such databases work;
  • What tasks they are suitable for;
  • What types exist.


Skillbox expert and program director of the Data Science faculty, as well as the head of kongru.consulting. Author of the popular Telegram channel "Analytics Today." He has over 12 years of experience in analytics, which allows him to deeply understand modern trends and tools in this field.

What is NoSQL

NoSQL (Not only SQL) is a broad term covering data management technologies that differ from traditional relational databases that use SQL. NoSQL includes columnar, graph, and document-oriented databases, as well as key-value stores. These technologies provide greater flexibility and scalability, which makes them well suited to large volumes of data and changing structures.

The difference between relational and non-relational data stores lies in how they organize and store information. Relational databases structure data in tables, where each record has a fixed structure and the relationships between tables are implemented through keys. This ensures strict data integrity and makes it possible to run complex queries in SQL.

Non-relational data stores offer more flexible schemas and can hold data in a variety of formats, including documents, graphs, or key-value pairs. This makes it easier to adapt to changes in the data structure and to scale as the volume of information grows. Non-relational databases are particularly effective for large volumes of data that do not require a strict schema, which is why they are popular in modern web applications and big data processing systems.

Thus, the choice between relational and non-relational storage depends on the specific project requirements, the volume of data processed, and the required flexibility in information management.

In relational databases, data is structured in tables with a fixed number of columns. Tables can be related to each other through common fields. For example, the users table may contain a group field that holds the number of the group to which the user belongs. This field links to the groups table, which contains information about the different user groups. This model works well with uniform data, ensuring integrity and ease of management. Relational databases support complex queries and normalization rules, which reduce data duplication and simplify processing.

Working with data in different formats is difficult in relational databases because everything must first be brought to a common structure. The rigid structure of relational tables does not always suit tasks that need flexibility, where it is more convenient for each entity to carry only the fields it actually needs. MongoDB, one of the best-known NoSQL database management systems, offers this capability thanks to its document-oriented approach. In the following article, we will take a closer look at the features and benefits of working with MongoDB.

Why the NoSQL Model Emerged

The rigid structure of relational databases is only one of their drawbacks. Others include lower speed under load, dependence on a single point of access, limited scalability, and difficulty processing large volumes of data. These drawbacks largely follow from the ACID properties (atomicity, consistency, isolation, durability) on which relational databases are based. These properties ensure reliability and data integrity, but they can reduce the performance and flexibility of the system, especially under high load and with large amounts of information.

Atomicity guarantees that a transaction is executed in full or not at all. If an error or failure occurs, all changes made within the transaction are rolled back, and the database returns to its original state. A transaction here means a group of operations, such as adding, updating, or deleting records. Atomicity rules out partially completed operations, which is critical for the reliable operation of information systems.

Consistency means that the data must be in a valid state both before and after the transaction. This principle can be compared to the law of conservation of mass in physics: nothing disappears without a trace and nothing appears out of nowhere. Consistency makes the system predictable: every change is reflected correctly and does not lead to contradictions in the data.

When a bank customer transfers 100 rubles from account A to account B, the balance of account A decreases by 100 rubles, and the balance of account B increases by the same amount.

Consistency guarantees that if a failure occurs during the transaction, the combined balance of both accounts remains unchanged. For example, if an error occurs after the balance of account A has decreased but before the balance of account B has increased, the database is restored to a state in which the account balances comply with the established business rules.
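The transfer scenario above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `accounts` table, the starting balances, and the `fail_midway` flag that simulates a crash between the two updates are assumptions made for the example, not part of any real banking system.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from `src` to `dst` inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Hypothetical failure between the debit and the credit.
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

failed = transfer(conn, "A", "B", 100, fail_midway=True)
after_failure = balances(conn)   # the debit from A was rolled back
succeeded = transfer(conn, "A", "B", 100)
after_success = balances(conn)   # both updates were applied together
```

After the simulated failure the balances are unchanged, and after the successful call the total of the two accounts is the same as before, which is the behavior atomicity and consistency are meant to guarantee.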

Isolation is a property of transactions whereby concurrent operations cannot see each other's intermediate states. Transactions behave as if they were executed one after another, which prevents errors caused by simultaneous access to the same resources.

Imagine that two customers simultaneously want to buy the same product from an online store, and only one item remains.

Customer A adds the item to their cart, and the system starts a transaction to reserve the product. At the same time, Customer B selects the same item and adds it to their cart.

If Customer A's payment is successful, the system confirms the purchase and updates the number of available items on the website, reducing it to zero.

When Customer B then tries to complete the transaction, the system sees that the item has already been sold and cancels the purchase.
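The outcome of this scenario can be sketched with sqlite3. The `stock` table and the `buy` helper are hypothetical, and the two attempts run one after another on a single connection, which is the result isolation is supposed to produce for truly concurrent transactions. The check and the decrement are done in one conditional UPDATE so that no one can slip in between them.

```python
import sqlite3


def buy(conn, item):
    """Try to take one unit of `item`; return True only if it was in stock."""
    with conn:  # one transaction per purchase attempt
        cursor = conn.execute(
            "UPDATE stock SET quantity = quantity - 1 "
            "WHERE item = ? AND quantity > 0",
            (item,),
        )
    # rowcount is 1 if a row was updated, 0 if the item was already sold out.
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")
conn.execute("INSERT INTO stock VALUES ('lamp', 1)")
conn.commit()

customer_a = buy(conn, "lamp")  # gets the last item
customer_b = buy(conn, "lamp")  # is refused, stock never goes negative
remaining = conn.execute(
    "SELECT quantity FROM stock WHERE item = 'lamp'"
).fetchone()[0]
```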

Durability ensures that changes to the database are preserved after the transaction is completed, even in the event of failures. This protects data against loss, which is especially important for mission-critical systems and applications. Thanks to durability, users can be confident that committed data will remain intact and available for future use, regardless of external factors.

ACID guarantees make relational databases reliable, but they can reduce data processing speed, which makes such systems less suitable for high-load services. When working intensively with large volumes of data, relational databases can be inefficient, which prompts the search for alternatives.

With the spread of the internet, it became clear that the volume of data being processed was growing rapidly. This led to server overload and to the need to convert data into a unified format. Companies needed more and more servers and specialists, which meant significant costs. As a result, there was a need for alternative methods of storing information.

What are the advantages of NoSQL?

In the late 2000s, many large companies began actively using non-relational database management systems (DBMS), which offered a number of significant advantages over traditional relational systems. The main advantages of non-relational DBMS are high scalability, flexibility in data storage, and the ability to process large volumes of information in real time. These systems allow you to effectively work with unstructured data and support various data models, such as document, graph, and key-value. As a result, companies were able to optimize their business processes, increase productivity, and reduce data management costs.

  • The ability to work with any data format makes it possible to use one database for all company data. A single database is cheaper to store and easier to maintain.
  • NoSQL databases scale horizontally with relative ease: if the amount of data or the number of queries grows, more nodes are added. Relational databases are typically scaled vertically, meaning they are moved to a more powerful server. Furthermore, NoSQL databases are easier to migrate to the cloud.
  • High query performance improves overall application performance.
  • Developing applications with NoSQL is easier. Development teams can more quickly create and implement new features and services.
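Horizontal scaling from the list above can be illustrated with a minimal sketch of how keys might be spread across nodes. The node names and the `node_for` helper are hypothetical, and simple hash-modulo routing is used only because it is short; real systems usually rely on more elaborate schemes such as consistent hashing, so that adding a node moves fewer keys.

```python
import hashlib


def node_for(key, nodes):
    """Pick the node responsible for `key` using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


nodes = ["node-1", "node-2", "node-3"]          # hypothetical cluster
keys = [f"user:{number}" for number in range(1000)]

placement = {key: node_for(key, nodes) for key in keys}
load_per_node = {node: list(placement.values()).count(node) for node in nodes}

# Growing the cluster: the same function now spreads keys over four nodes.
bigger_cluster = nodes + ["node-4"]
new_placement = {key: node_for(key, bigger_cluster) for key in keys}
```

Each node ends up with only part of the keys, so both the data and the query load are divided among the machines.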

Companies experiencing rapid growth in data volumes and workloads can effectively and cost-effectively address these challenges with NoSQL databases. These technologies offer significant advantages, which are reflected in the BASE architectural standards. The acronym BASE stands for Basically Available, Soft State, and Eventually Consistent, emphasizing their primary focus on ensuring the availability and flexibility of data. Using NoSQL allows organizations to manage large volumes of information while ensuring high performance and scalability.

Basically available means that the system remains available for read and write operations at all times, even during failures or anomalies. This availability comes at a cost: some queries may return intermediate or outdated results. This tradeoff must be taken into account when designing systems where high data availability is critical.

Soft state means that the state of the system can change over time, even without new input, as it moves toward consistency. The data is not always in a strictly consistent state, but the system keeps it up to date as changes propagate. Soft state matters for distributed systems, where latency and possible failures must be taken into account, and it gives higher availability and resilience despite temporary inconsistencies.

Eventual consistency allows temporary inconsistencies in the data, with the guarantee that the data will reach a consistent state eventually. This approach improves availability and performance, which is especially important in distributed architectures and cloud services. The system can answer queries without waiting for every replica to apply every operation, which makes better use of resources and improves the user experience.
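The three BASE properties can be seen in a toy model. The `EventuallyConsistentStore` class below is a simplified assumption for illustration, not how any particular database is implemented: one primary replica accepts writes immediately, and followers receive them later when `sync` is called.

```python
class EventuallyConsistentStore:
    """Toy model: one primary replica plus followers updated with a delay."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # replication messages that have not been delivered yet

    def write(self, key, value):
        # The write is acknowledged as soon as the primary (replica 0) has it.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica=0):
        # Any replica answers immediately, even if its copy is stale.
        return self.replicas[replica].get(key)

    def sync(self):
        # Deliver all queued updates; afterwards every replica agrees.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("likes:post42", 10)
stale_read = store.read("likes:post42", replica=2)  # follower has not caught up
store.sync()
fresh_read = store.read("likes:post42", replica=2)  # replicas have converged
```

Before `sync` the follower still answers, just with an outdated value; after it, all replicas return the same data.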

Developers of many NoSQL databases chose to relax ACID guarantees in order to achieve high performance and easy scalability. This makes it possible to process large volumes of data efficiently and to adapt to changing business requirements. NoSQL databases provide the flexibility that modern applications need, which makes them especially popular in big data and cloud environments.

How NoSQL Databases Work

Unlike relational databases, which have a strict structure, NoSQL databases offer greater flexibility in data management. Entities in NoSQL are not required to adhere to a table format, which allows developers to freely add new fields, modify or delete existing ones, adapting the data structure to the specific needs of the application. This makes NoSQL an ideal solution for projects where data may change or evolve over time, providing high scalability and performance.

As an example, consider a portal dedicated to movie reviews and ratings. In a relational database, its data would live in tables with fixed columns and be written and read with SQL queries.
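A minimal sketch of what such a table and queries might look like, run here through Python's built-in sqlite3 module so that it is self-contained. The table name, the columns, and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Every movie must fit the same fixed set of columns.
conn.execute(
    """
    CREATE TABLE movies (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        year INTEGER,
        rating REAL
    )
    """
)
conn.executemany(
    "INSERT INTO movies (title, year, rating) VALUES (?, ?, ?)",
    [("First Example", 2010, 8.8), ("Second Example", 2016, 7.9)],  # made-up data
)
conn.commit()

# Read back the movies rated above 8, best first.
top_rated = conn.execute(
    "SELECT title, rating FROM movies WHERE rating > 8 ORDER BY rating DESC"
).fetchall()
```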

In a non-relational database, each movie can be represented as a separate document, especially in the case of a document-oriented database. This document includes all the necessary data about the film, allowing for efficient storage and processing of information. Non-relational databases provide flexibility in data structure, allowing you to easily add new attributes and modify existing ones. This makes them particularly suitable for working with large volumes of data and diverse types of film information.
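A movie document of this kind can be sketched as a plain Python dictionary serialized to JSON. The field names and values are made up for the example; the point is that nested data lives inside the document and that two documents in the same collection do not need the same fields.

```python
import json

# One document holds everything known about the film, including nested data.
movie = {
    "title": "First Example",
    "year": 2010,
    "rating": 8.8,
    "genres": ["sci-fi", "thriller"],
    "reviews": [{"author": "anna", "score": 9, "text": "Great plot"}],
}

# Another document in the same collection may have fewer fields.
another_movie = {"title": "Second Example", "year": 2016}

collection = [movie, another_movie]

serialized = json.dumps(movie, indent=2)  # the form in which it could be stored
restored = json.loads(serialized)
```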

Suppose we now want to add information about the film's awards. In a relational database, this requires a change to the schema.

We create a new table called "Awards" and establish a relationship between the "Movies" and "Awards" tables via a foreign key. This keeps the data organized and makes it possible to track the awards received by each film.
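A hedged sketch of this schema change, again using sqlite3. The column names and the sample award are assumptions; `PRAGMA foreign_keys = ON` is needed because SQLite does not enforce foreign keys by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE awards (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL REFERENCES movies(id),
        name TEXT NOT NULL,
        year INTEGER
    )
    """
)
conn.execute("INSERT INTO movies (id, title) VALUES (1, 'First Example')")
conn.execute(
    "INSERT INTO awards (movie_id, name, year) VALUES (1, 'Best Picture (sample)', 2011)"
)
conn.commit()

# Awards are read back by joining the two tables on the foreign key.
rows = conn.execute(
    "SELECT m.title, a.name, a.year "
    "FROM movies m JOIN awards a ON a.movie_id = m.id"
).fetchall()

# An award that points to a non-existent movie is rejected by the database.
try:
    conn.execute(
        "INSERT INTO awards (movie_id, name, year) VALUES (99, 'Orphan award', 2020)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    conn.rollback()
    orphan_rejected = True
```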

In a document-oriented database, the same change amounts to adding a new field to the movie documents.

Adding key-value pairs to a document is easy and does not risk breaking existing functionality: documents without the new field simply stay as they are. This provides flexibility and simplifies development and data management.
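As a rough illustration, the documents are modeled below as Python dictionaries in a list, and `add_award` is a hypothetical helper. With a real MongoDB collection, a comparable operation in pymongo would be an `update_one` call with the `$push` operator, as noted in the comment.

```python
# A small in-memory stand-in for a collection of movie documents.
movies = [
    {"_id": 1, "title": "First Example", "year": 2010},
    {"_id": 2, "title": "Second Example", "year": 2016},
]


def add_award(collection, movie_id, award):
    """Append `award` to the movie's "awards" list, creating the field if needed."""
    # Comparable pymongo call on a real collection:
    #   collection.update_one({"_id": movie_id}, {"$push": {"awards": award}})
    for document in collection:
        if document["_id"] == movie_id:
            document.setdefault("awards", []).append(award)
            return True
    return False


updated = add_award(movies, 1, {"name": "Best Picture (sample)", "year": 2011})

# Documents without awards are untouched and still valid.
with_awards = [movie["title"] for movie in movies if movie.get("awards")]
```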

Let's imagine that product managers have asked us to implement a new feature: a list of awards for each film. With a SQL database, we will need to complete several steps:

  • Design the schema for the new data: the award name, the year received, and the link to the film.
  • Create a new "Awards" table.
  • Establish a relationship with the "Movies" table using a foreign key, adding indexes where queries need them.
  • Change the code to insert award data into the table, whether it is entered manually or imported from external sources.

After that, the new functionality is tested to make sure the award information is displayed correctly in the user interface.

Relational databases (RDBs) require a strict data structure, so schema changes have to be planned in advance. If the database has already grown significantly or is actively used in production, the migration becomes more complex. It is necessary to create data backups, develop and test migration scripts, and update related applications and services. Proper preparation keeps the data safe and minimizes the risk of information loss.

In a NoSQL database, the same change takes two steps:

  • Add a new "Awards" field to the movie documents.
  • Fill this field with award data for each movie.

Non-relational databases are used in projects with a dynamically changing data structure. They are ideal for situations where you need to quickly adapt to new requirements and change information storage schemes without significant investment of time and resources.


What tasks are NoSQL databases suitable for?

NoSQL databases are an effective alternative to classic relational databases based on SQL. They are ideal for applications that require high scalability, fast response, and flexibility. Using NoSQL allows developers to easily adapt the architecture to changing requirements, while sacrificing strict data consistency in favor of performance and query speed. This approach is especially relevant in the context of dynamically evolving technologies and large volumes of data, where traditional solutions may prove ineffective.

NoSQL databases are often used in high-load services that require a high frequency of database queries, as well as the processing of large volumes of data, including data of an uncertain or changing format. These technologies are ideal for online games, IoT applications, and analytics systems that require fast processing and flexibility in data management. Using NoSQL allows you to effectively scale applications and adapt to rapidly changing business requirements, making them an essential tool in modern IT solutions.

Types of NoSQL Databases

Non-relational databases (NoSQL) are an alternative to traditional relational databases, offering flexibility in storing and processing data. In this article, we will cover the key types of non-relational databases and provide examples of simple Python queries for writing and reading data.

There are several main types of non-relational databases: document, graph, column-oriented, and key-value. Document databases, such as MongoDB, store data as JSON-like documents, which makes it easy to work with semi-structured data. Graph databases, such as Neo4j, are well suited to analyzing complex relationships in data. Column-oriented databases, such as Apache Cassandra, are optimized for processing large volumes of data, while key-value databases, such as Redis, offer high-speed data access.

Working with these databases in Python typically uses specialized libraries. For example, for MongoDB, you can use the pymongo library. To get started, install the library using pip:

```bash
pip install pymongo
```

Once installed, you can use the following code to write and read data from a MongoDB database:

```python
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['mycollection']

# Write data
data = {'name': 'Alice', 'age': 30}
collection.insert_one(data)

# Read data
result = collection.find_one({'name': 'Alice'})
print(result)
```

These simple examples demonstrate how easy it is to work with non-relational databases in Python. Non-relational databases offer a variety of features for storing and managing data, making them a popular choice for modern applications.

Key-value stores are one of the simplest and most popular NoSQL technologies. In such systems, data is organized into pairs consisting of a unique key and a corresponding value. Each piece of data is identified by its own key, allowing for efficient retrieval of its value on demand. This approach provides high-speed data access and flexible management, making key-value stores ideal for applications requiring fast and scalable information storage.

Such stores can be compared to a phone book, where the subscriber's name serves as the key and the phone number as the value. To get a specific person's number, you look up the value that corresponds to the key, that is, to their name. This makes information in the database easy to organize and quick to find.


This type of database works well for automatic text replacement. For example, a misspelled word can be replaced with the correct spelling, and an obscene expression with a neutral synonym. Key-value stores are also often used to log queries to other databases, which makes it possible to track and analyze interactions with the system.

Writing and reading data in the popular Redis database is done with simple commands. Redis is a high-performance NoSQL system that keeps its data structures in memory, which ensures fast query processing. To write data, Redis uses commands such as SET to store a value and HSET to set a field in a hash. Data is read with GET to retrieve a value and HGET to retrieve a field from a hash. This speed makes Redis a common choice for tasks that require fast data access, such as caching, user sessions, and real-time analytics.
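To show the semantics of these four commands without requiring a running server, here is a tiny in-memory stand-in written for this example; `TinyKeyValueStore` is not part of Redis or of any library. The redis-py client exposes methods with the same names (`set`, `get`, `hset`, `hget`) on a `redis.Redis` connection, although it returns bytes by default and talks to a real server.

```python
class TinyKeyValueStore:
    """In-memory stand-in that mimics the meaning of SET, GET, HSET, and HGET."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # SET: store a value under a unique key, overwriting any previous value.
        self._data[key] = value

    def get(self, key):
        # GET: return the value for the key, or None if the key does not exist.
        return self._data.get(key)

    def hset(self, name, field, value):
        # HSET: set one field inside the hash stored under `name`.
        self._data.setdefault(name, {})[field] = value

    def hget(self, name, field):
        # HGET: read one field from the hash stored under `name`.
        return self._data.get(name, {}).get(field)


store = TinyKeyValueStore()

# The phone book analogy: the name is the key, the number is the value.
store.set("phone:alice", "+1-555-0100")
number = store.get("phone:alice")
missing = store.get("phone:bob")

# A hash groups several fields under one key.
store.hset("user:1", "name", "Alice")
store.hset("user:1", "city", "Paris")
city = store.hget("user:1", "city")
```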

Key-value stores are used in a wide range of applications, from web development to big data, because they retrieve information quickly by unique key and scale well. Other key-value stores:

  • Memcached;
  • Amazon DynamoDB;
  • Riak.

Document-oriented databases store information as individual documents, usually in JSON or BSON format. Each document is a separate record, and its flexible structure can hold complex, nested data. MongoDB is one of the most popular document-oriented databases; writing and reading a document in it comes down to a couple of calls, such as insert_one and find_one in the pymongo example earlier in this section.

Document-oriented databases handle unstructured and semi-structured data efficiently, scale horizontally, and integrate easily with applications. They are well suited to web applications, content management systems, and analytics platforms. Other document-oriented databases:

  • Couchbase;
  • CouchDB;
  • Firebase.

Column-oriented databases store information in columns, which makes it easy to manage the properties of individual objects. For example, if we were building a movie library in a SQL database and decided to record whether a movie is available in 3D, we would have to add a separate "Available in 3D" column and fill most of its cells with the value "No." This leads to redundant data and complicates query processing. Column-oriented databases avoid such problems and make data management simpler and more flexible.

In a column-oriented database, we add the "Available in 3D" attribute only to the objects that need it, namely the movie cards that have 3D versions. Users can then easily find the movies available in 3D.
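The idea of storing an attribute only where it applies can be sketched with nested dictionaries, where each row key maps to just the columns that row actually has. This is a simplified model with made-up data and a hypothetical `rows_with_column` helper, not the storage format of any particular database.

```python
# Each row stores only the columns it needs; there are no "No" placeholders.
movies = {
    "movie:1": {"title": "First Example", "year": 2010, "available_in_3d": True},
    "movie:2": {"title": "Second Example", "year": 2016},
    "movie:3": {"title": "Third Example", "year": 2019},
}


def rows_with_column(table, column):
    """Return the keys of the rows that actually have `column`."""
    return sorted(key for key, columns in table.items() if column in columns)


in_3d = rows_with_column(movies, "available_in_3d")

# Cells actually stored, versus a fixed table where every row has every column.
stored_cells = sum(len(columns) for columns in movies.values())
all_columns = {name for columns in movies.values() for name in columns}
fixed_table_cells = len(movies) * len(all_columns)
```

In this small example, seven cells are stored instead of the nine a fixed three-column table would need, and the gap grows with every rarely used attribute.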

Writing and reading data in the popular Cassandra database are done with CQL (Cassandra Query Language): data is written with insert operations and read with queries that retrieve rows by key or by specific conditions. High availability and a distributed architecture make Cassandra a good fit for applications that need continuous data access and fast query processing.

Column-oriented databases are optimized for large volumes of data and deliver high performance on analytical queries. Unlike traditional relational databases, which store data in rows, they organize information in columns, which significantly speeds up reads that touch only a few columns of a large data set. These systems suit big data analytics tasks such as log processing, user behavior analysis, and complex queries.

Other column-oriented databases:

  • Google BigQuery;
  • Vertica;
  • Amazon Redshift;
  • ClickHouse;
  • Apache HBase.

Graph databases represent information as a graph consisting of nodes and edges. Nodes represent objects, and edges represent the relationships between them. Thanks to this structure, graph databases perform well on queries that follow relationships across many objects. This makes them particularly suitable for tasks where the connections between elements matter, such as social networks, recommender systems, and complex network analysis.


Social networks can use graph databases to store and manage information about users, represented as nodes, and their connections, represented as edges. This makes it possible to organize data about friends, subscriptions, and other interactions. Graph databases provide fast access to this information and simplify the analysis of social connections, which suits the large volumes of data on social platforms.

The popular graph database Neo4j uses the Cypher query language, which is designed for working with graph data. Cypher queries describe patterns of nodes and relationships, for example, to retrieve nodes together with their connections, and the same language is used for both reading and writing. This makes it easier to discover patterns and relationships between nodes. Thanks to this, Neo4j is a popular choice for projects that analyze complex networks and highly connected data.
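To make the idea of a relationship query concrete, here is a small Python sketch that stores a hypothetical "follows" graph as an adjacency dictionary. The user names and helper functions are assumptions for illustration; the Cypher pattern in the comment shows roughly how a similar friends-of-friends query could be written for Neo4j, assuming `User` nodes and `FOLLOWS` relationships.

```python
from collections import deque

# Nodes are users, edges are "follows" relationships (made-up data).
follows = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": set(),
}


def friends_of_friends(graph, start):
    """Users reachable in exactly two hops, excluding direct contacts and `start`."""
    # Roughly comparable Cypher pattern (illustrative labels and names):
    #   MATCH (a:User {name: 'alice'})-[:FOLLOWS]->()-[:FOLLOWS]->(fof)
    #   RETURN DISTINCT fof.name
    direct = graph.get(start, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return sorted(result - direct - {start})


def hops_between(graph, start, target):
    """Length of the shortest chain of edges from `start` to `target`, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


suggestions = friends_of_friends(follows, "alice")
distance = hops_between(follows, "alice", "erin")
unreachable = hops_between(follows, "erin", "alice")
```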

Neo4j is widely used for social network analysis and recommendation engines. Other graph databases, also optimized for managing relationships between objects:

  • OrientDB, which combines the capabilities of a graph and a document database;
  • ArangoDB, which supports several data models and query types;
  • InfiniteGraph.

What's Next

For a deeper understanding of non-relational databases, we recommend the books "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence" by Pramod J. Sadalage and Martin Fowler and "Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement" by Eric Redmond and Jim R. Wilson. In addition, the Stepik platform offers a free course on working with MongoDB in Python, which will be useful for beginner developers. Skillbox offers an extensive course on databases that covers both SQL and NoSQL. These resources will help you master the key concepts and practical skills for working with non-relational databases.
