Part 1: Introduction to MongoDB Aggregation Pipeline

Developing a deep understanding of the fundamentals of the MongoDB aggregation pipeline

INTRODUCTION

MongoDB is a popular NoSQL database system that excels at handling structured, semi-structured, and unstructured data. It offers high performance, scalability, and flexible data storage. Software engineers often query and analyze data with methods such as find(), findOne(), and populate(). These are provided by Mongoose, a Node.js-based Object Data Modeling (ODM) library for MongoDB, because they offer a better developer experience. However, these methods are abstractions built on top of MongoDB's native query capabilities.

ODM libraries like Mongoose are undoubtedly useful for a wide range of querying needs. However, relying on them for complex queries can noticeably affect database performance. To handle complex data processing efficiently, the aggregation pipeline framework lets you write more efficient queries that run inside the database itself.

Learn more about the comparison between Mongoose and MongoDB drivers here.

This article is the first of an interesting series on MongoDB's aggregation pipeline. It covers its definition, advantages, stages, and operators, as well as how it handles relationships between collections.

PREREQUISITES

To get the most out of this article, you should have:

  • Working knowledge of JavaScript.

  • Working knowledge of Node.js.

  • A basic understanding of MongoDB.

LEARNING OBJECTIVES

By the end of this article, you will:

  • Have a solid understanding of the MongoDB aggregation pipeline.

  • Understand the importance and benefits of mastering the MongoDB aggregation pipeline.

  • Understand the stages and operators in the MongoDB aggregation pipeline.

  • Know how to handle complex transformations and calculations.

  • Know how to handle relationships in the MongoDB aggregation pipeline.

  • Understand how to design an aggregation pipeline in MongoDB.

  • Know how to apply the concepts learned through a real-world application.

OVERVIEW OF MONGODB AGGREGATION PIPELINE

What Is the MongoDB Aggregation Pipeline?

The MongoDB aggregation pipeline is a powerful framework for processing and transforming data within MongoDB. It consists of a sequence of stages, each representing an operation or transformation to be applied to the data. These stages can perform various tasks such as filtering, grouping, sorting, projecting, and aggregating data using operators and expressions. The aggregation pipeline allows for complex data manipulations and computations, providing a flexible way to extract meaningful insights from MongoDB collections.

Let me explain with a more relatable example. The aggregation pipeline can be likened to the process of turning ingredients into a delicious meal. The ingredients pass through several stages before the meal is ready, and each stage contributes to the end goal. For instance, turning pepper into a delicious stew might involve the stages below (I'm not a chef, though):

Rinse → Grind → Cook → Add oil → Season it → Add vegetables → Add meat and fish → Allow to cook for some minutes

Just like the stew example above, the aggregation pipeline is multistage: the output of one stage becomes the input of the next. Typical stage operations include filtering, sorting, and grouping.
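Here is a minimal in-memory sketch in plain JavaScript that makes the idea of "the output of one stage is the input of the next" concrete. It is only a mental model, not how MongoDB executes pipelines. runPipeline and the three stage functions are hypothetical helpers that loosely mimic $match, $sort, and $project on a plain array.

```javascript
// Hypothetical sketch: each stage is a function that takes an array of
// documents and returns a new array. runPipeline feeds each stage's output
// into the next stage, the way an aggregation pipeline passes documents along.
function runPipeline(documents, stages) {
  return stages.reduce((docs, stage) => stage(docs), documents);
}

const users = [
  { name: "temi", active: true, age: 16 },
  { name: "adam", active: false, age: 17 },
  { name: "eve", active: true, age: 20 },
];

const result = runPipeline(users, [
  (docs) => docs.filter((d) => d.active),            // roughly like $match
  (docs) => [...docs].sort((a, b) => b.age - a.age), // roughly like $sort
  (docs) => docs.map((d) => ({ name: d.name })),     // roughly like $project
]);

console.log(result); // [ { name: 'eve' }, { name: 'temi' } ]
```

In MongoDB itself, the same three steps would be written as db.collection.aggregate([{ $match: { active: true } }, { $sort: { age: -1 } }, { $project: { name: 1 } }]). The real $project would also keep _id unless you exclude it.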

Importance and Benefits of Understanding the MongoDB Aggregation Pipeline

  • Data manipulation and transformation:

The aggregation pipeline allows you to manipulate and transform data within a MongoDB collection using a powerful set of operators and expressions. These enable operations such as filtering, grouping, sorting, projecting, and aggregating data.

  • Advanced analytics and insights:

It allows you to perform complex analytics and gain valuable insights into MongoDB data. It can calculate aggregates, perform statistical operations, extract key metrics, and generate meaningful reports. It also combines data from multiple documents, making it ideal for data analysis and reporting tasks.

  • Performance optimization:

The aggregation pipeline can take advantage of indexes, for example when a $match or $sort stage appears at the start of the pipeline. MongoDB also optimizes pipelines before running them. This makes the pipeline efficient for handling large volumes of data. Because data is filtered and aggregated at the database level, less data is transferred to and processed by your application, which results in faster data retrieval.

  • Flexibility and expressiveness:

The aggregation pipeline provides a flexible and expressive framework for data manipulation. Developers can chain multiple stages together and compose complex queries to achieve specific data transformations. This flexibility allows for the creation of custom data pipelines tailored to unique business requirements.

  • Integration with the MongoDB ecosystem:

The aggregation pipeline works seamlessly with other MongoDB features, such as queries and indexes, so developers can make the most of the MongoDB ecosystem. It also simplifies data processing and analysis, since data from MongoDB collections is consumed directly, without an extra translation or transformation step.

HOW IT WORKS

The diagram above illustrates the aggregation pipeline process: a series of transformations is applied to the data until the desired output (aggregated data) is obtained. The input is typically a single collection, but documents from other collections can be joined in later stages.

STAGES OF MONGODB AGGREGATION PIPELINE

The previous sections have given you background knowledge of the MongoDB aggregation pipeline. In this section, I will explain some of the stages and operators used in MongoDB for data analysis and processing.

If you want to follow along with the code and experiment with the queries, you can write your code in the sandbox here.

Aggregation Pipeline Stages:

  1. $match: This stage filters the documents in a collection based on specified criteria, allowing you to select the subset of documents that match certain conditions. It is usually placed at the start of the pipeline, where it can use indexes and reduce the number of documents that later stages process. For example, it can filter a users collection by a field called active, returning only the users whose active field is true. The query below demonstrates how to use the $match stage.

db.collection.aggregate([
  {
    $match: {
      active: true
    }
  }
]);

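The query above filters the users collection by its active field, so it returns only the active users. (In the playground, the collection is simply named collection; in your own database, you would call db.users.aggregate(...).) The output of the query should look like this: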

[
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "active": true,
    "email": "temiian@gmail.com",
    "firstname": "temi",
    "gender": "female",
    "lastname": "lan"
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "active": true,
    "email": "evelan@gmail.com",
    "firstname": "eve",
    "gender": "female",
    "lastname": "Lan"
  }
]

Find the code in the playground here.
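As a rough mental model, $match behaves like JavaScript's Array.prototype.filter. The sketch below uses a hypothetical matchStage helper that only handles top-level equality conditions such as { active: true }. The real $match also supports query operators such as $gt and $in and can use indexes, which this in-memory version cannot.

```javascript
// Hypothetical helper: approximates a $match stage in memory.
// It supports only top-level equality conditions such as { active: true }.
function matchStage(docs, criteria) {
  return docs.filter((doc) =>
    Object.entries(criteria).every(([field, value]) => doc[field] === value)
  );
}

const users = [
  { firstname: "temi", active: true },
  { firstname: "adam", active: false },
  { firstname: "eve", active: true },
  { firstname: "mathew", active: false },
];

const activeUsers = matchStage(users, { active: true });
console.log(activeUsers.map((user) => user.firstname)); // [ 'temi', 'eve' ]
```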

  2. $sort: The $sort stage sorts the documents by one or more fields, in ascending or descending order. To demonstrate, let's add an age field to the documents in our users collection. The query below shows how to use the $sort stage.
db.collection.aggregate([
  {
    $sort: {
      "age": -1
    }
  }
]);

The query above sorts the users collection by its age field and returns the users in descending order of age. Setting a field to -1 sorts it in descending order; setting it to 1 sorts it in ascending order.

The output of the above query should look like this:

[
  {
    "_id": ObjectId("5a934e000102030405000003"),
    "active": false,
    "age": 25,
    "email": "mathewsilva@gmail.com",
    "firstname": "mathew",
    "gender": "male",
    "lastname": "silva"
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "active": true,
    "age": 20,
    "email": "evelan@gmail.com",
    "firstname": "eve",
    "gender": "female",
    "lastname": "Lan"
  },
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "active": false,
    "age": 17,
    "email": "adamian@gmail.com",
    "firstname": "adam",
    "gender": "male",
    "lastname": "Ian"
  },
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "active": true,
    "age": 16,
    "email": "temiian@gmail.com",
    "firstname": "temi",
    "gender": "female",
    "lastname": "lan"
  }
]

Find the code in the playground here
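The sketch below models how a $sort specification is interpreted. A value of 1 means ascending and -1 means descending. When two documents tie on one key, the next key in the specification decides. sortStage is a hypothetical in-memory helper. It compares values with JavaScript's < and >, so it ignores MongoDB's rules for comparing values of different BSON types.

```javascript
// Hypothetical helper: approximates a $sort stage in memory.
// Each key maps to 1 (ascending) or -1 (descending); earlier keys take priority.
function sortStage(docs, spec) {
  const keys = Object.entries(spec);
  return [...docs].sort((a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every key
  });
}

const users = [
  { firstname: "temi", active: true, age: 16 },
  { firstname: "adam", active: false, age: 17 },
  { firstname: "eve", active: true, age: 20 },
  { firstname: "mathew", active: false, age: 25 },
];

const byAgeDesc = sortStage(users, { age: -1 });
console.log(byAgeDesc.map((u) => u.firstname)); // [ 'mathew', 'eve', 'adam', 'temi' ]

// Active users first, then by age ascending within each group.
const activeFirstThenAge = sortStage(users, { active: -1, age: 1 });
console.log(activeFirstThenAge.map((u) => u.firstname)); // [ 'temi', 'eve', 'adam', 'mathew' ]
```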

  3. $group: The $group stage groups documents by a specified key or set of keys. It allows you to perform aggregation operations such as sum, average, and count on the grouped documents. This makes it one of the most important and commonly used stages of the MongoDB aggregation pipeline. To demonstrate its basic usage, consider the following employees collection:
[
  {
    _id: 1,
    firstName: "Matt",
    lastName: "James",
    gender: "male",
    email: "mattjames@gmail.com",
    salary: 5000,
    department: {
      "name": "HR"
    }
  },
  {
    _id: 2,
    firstName: "Endo",
    lastName: "kiriku",
    gender: "male",
    email: "endokiriku@gmail.com",
    salary: 8000,
    department: {
      "name": "Finance"
    }
  },
  {
    _id: 3,
    firstName: "stewart",
    lastName: "cams",
    gender: "male",
    email: "stewartcams@gmail.com",
    salary: 7500,
    department: {
      "name": "Marketing"
    }
  },
  {
    _id: 4,
    firstName: "darijo",
    lastName: "srna",
    gender: "female",
    email: "darijosrna@gmail.com",
    salary: 5000,
    department: {
      "name": "HR"
    }
  },
  {
    _id: 5,
    firstName: "Muna",
    lastName: "Raja",
    gender: "male",
    email: "munaraja@gmail.com",
    salary: 4500,
    department: {
      "name": "Finance"
    }
  },
  {
    _id: 6,
    firstName: "Julian",
    lastName: "Draxler",
    gender: "male",
    email: "juliandraxler@gmail.com",
    salary: 7000,
    department: {
      "name": "Marketing"
    }
  }
]

To group the above data by the department's name, use the code below:

db.collection.aggregate([
  {
    $group: {
      _id: "$department.name"
    }
  }
])

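The query above groups the documents by department.name and returns one document per distinct department, with the department name as its _id. The result is shown below (MongoDB does not guarantee the order of the groups):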

[
  {
    "_id": "HR"
  },
  {
    "_id": "Finance"
  },
  {
    "_id": "Marketing"
  }
]

You can also modify the last query to get the accumulated value for each group as shown below:

db.collection.aggregate([
  {
    $group: {
      _id: "$department.name",
      totalEmployees: { $sum: 1 }
    }
  }
]);

The query above returns each department name together with a new field, totalEmployees, which holds the number of employees in that department. In the expression { $sum: 1 }, $sum is an accumulator operator that returns the sum of numeric values. Here it adds 1 for every document in the group, which counts the documents. In the sample data, each department has two employees, so every group has totalEmployees: 2. I'll explain $sum further when discussing operators.

Find the code for this example here
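To see why { $sum: 1 } counts documents, here is a hypothetical in-memory version of the grouping query above, written with a JavaScript Map. Each document adds 1 to the running total for its department key. It is a sketch of the idea, not MongoDB's implementation.

```javascript
// Hypothetical helper: approximates
//   { $group: { _id: "$department.name", totalEmployees: { $sum: 1 } } }
// in memory. A Map keeps first-seen order; MongoDB does not guarantee group order.
function countByDepartment(employees) {
  const groups = new Map();
  for (const employee of employees) {
    const key = employee.department.name;        // the "$department.name" group key
    groups.set(key, (groups.get(key) || 0) + 1); // { $sum: 1 } adds 1 per document
  }
  return Array.from(groups, ([_id, totalEmployees]) => ({ _id, totalEmployees }));
}

const employees = [
  { firstName: "Matt", department: { name: "HR" } },
  { firstName: "Endo", department: { name: "Finance" } },
  { firstName: "stewart", department: { name: "Marketing" } },
  { firstName: "darijo", department: { name: "HR" } },
  { firstName: "Muna", department: { name: "Finance" } },
  { firstName: "Julian", department: { name: "Marketing" } },
];

const counts = countByDepartment(employees);
console.log(counts);
```

For the six sample employees, every department ends up with totalEmployees: 2.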

  4. $project: This stage reshapes documents by including or excluding specific fields. It allows you to define which fields appear in the output before the documents are passed to the next stage in the pipeline. These can be existing fields or newly computed fields. The query below shows how to use the $project stage of the MongoDB aggregation pipeline.
db.collection.aggregate([
  {
    $project: {
      firstName: 1,
      lastName: 1
    }
  }
])

This query tells MongoDB to return only the firstName and lastName fields, plus _id, which $project includes by default unless you set _id: 0. The output of the query should look like this:

[
  {
    "_id": 1,
    "firstName": "Matt",
    "lastName": "James"
  },
  {
    "_id": 2,
    "firstName": "Endo",
    "lastName": "kiriku"
  },
  {
    "_id": 3,
    "firstName": "stewart",
    "lastName": "cams"
  },
  {
    "_id": 4,
    "firstName": "darijo",
    "lastName": "srna"
  },
  {
    "_id": 5,
    "firstName": "Muna",
    "lastName": "Raja"
  },
  {
    "_id": 6,
    "firstName": "Julian",
    "lastName": "Draxler"
  }
]

Operators can also be used within the $project stage to compute new fields. Find the code for this example here.
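The hypothetical projectStage helper below sketches how an inclusion projection like { firstName: 1, lastName: 1 } shapes each document. It also follows the rule that _id is kept unless it is explicitly excluded with _id: 0. It only handles inclusion of existing top-level fields; computed fields and exclusion projections are left out.

```javascript
// Hypothetical helper: approximates an inclusion-only $project such as
// { firstName: 1, lastName: 1 }. Like MongoDB, it keeps _id unless the
// specification sets _id to 0 (or false). Computed fields are not modelled.
function projectStage(docs, spec) {
  const keepId = !(spec._id === 0 || spec._id === false);
  return docs.map((doc) => {
    const out = {};
    if (keepId && "_id" in doc) out._id = doc._id;
    for (const [field, include] of Object.entries(spec)) {
      if (field !== "_id" && include && field in doc) out[field] = doc[field];
    }
    return out;
  });
}

const employees = [
  { _id: 1, firstName: "Matt", lastName: "James", salary: 5000 },
  { _id: 2, firstName: "Endo", lastName: "kiriku", salary: 8000 },
];

console.log(projectStage(employees, { firstName: 1, lastName: 1 })); // _id kept by default
console.log(projectStage(employees, { _id: 0, firstName: 1 }));      // _id excluded
```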

  5. $limit: This stage limits the number of documents passed to the next pipeline stage, restricting the output to a specific number of documents.

    The query below shows how to use the $limit stage.

db.collection.aggregate([
   { $limit : 3 }
]);

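The query above instructs MongoDB to return only the first 3 documents. Without a preceding $sort stage, which documents count as "first" is not guaranteed, so pair $limit with $sort when the order matters. The output of the query should look like this: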

[
  {
    "_id": 1,
    "department": {
      "name": "HR"
    },
    "email": "mattjames@gmail.com",
    "firstName": "Matt",
    "gender": "male",
    "lastName": "James",
    "salary": 5000
  },
  {
    "_id": 2,
    "department": {
      "name": "Finance"
    },
    "email": "endokiriku@gmail.com",
    "firstName": "Endo",
    "gender": "male",
    "lastName": "kiriku",
    "salary": 8000
  },
  {
    "_id": 3,
    "department": {
      "name": "Marketing"
    },
    "email": "stewartcams@gmail.com",
    "firstName": "stewart",
    "gender": "male",
    "lastName": "cams",
    "salary": 7500
  }
]

Find the code for this example here
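Because $limit simply keeps the first N documents it receives, it is most useful right after a $sort. The in-memory sketch below mimics [{ $sort: { salary: -1 } }, { $limit: 3 }] with plain array methods to find the three highest-paid employees in the sample data. topEarners is a hypothetical helper, not part of any MongoDB API.

```javascript
// Hypothetical sketch: mimics [{ $sort: { salary: -1 } }, { $limit: n }] in memory.
function topEarners(employees, n) {
  return [...employees]
    .sort((a, b) => b.salary - a.salary) // like $sort: { salary: -1 }
    .slice(0, n);                        // like $limit: n
}

const employees = [
  { firstName: "Matt", salary: 5000 },
  { firstName: "Endo", salary: 8000 },
  { firstName: "stewart", salary: 7500 },
  { firstName: "darijo", salary: 5000 },
  { firstName: "Muna", salary: 4500 },
  { firstName: "Julian", salary: 7000 },
];

const topThree = topEarners(employees, 3);
console.log(topThree.map((e) => e.firstName)); // [ 'Endo', 'stewart', 'Julian' ]
```

In MongoDB, the equivalent query would be db.collection.aggregate([{ $sort: { salary: -1 } }, { $limit: 3 }]).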

  6. $skip: The $skip stage skips a specified number of documents and passes the remaining documents to the next stage. It is useful for pagination or for skipping some initial documents.

    The code below shows how to use the $skip stage of the aggregation pipeline.

     db.collection.aggregate([
       {
         $skip: 2
       }
     ]);
    

    This query skips the first two documents in the collection. The output of the query should look like this:

     [
       {
         "_id": 3,
         "department": {
           "name": "Marketing"
         },
         "email": "stewartcams@gmail.com",
         "firstName": "stewart",
         "gender": "male",
         "lastName": "cams",
         "salary": 7500
       },
       {
         "_id": 4,
         "department": {
           "name": "HR"
         },
         "email": "darijosrna@gmail.com",
         "firstName": "darijo",
         "gender": "female",
         "lastName": "srna",
         "salary": 5000
       },
       {
         "_id": 5,
         "department": {
           "name": "Finance"
         },
         "email": "munaraja@gmail.com",
         "firstName": "Muna",
         "gender": "male",
         "lastName": "Raja",
         "salary": 4500
       },
       {
         "_id": 6,
         "department": {
           "name": "Marketing"
         },
         "email": "juliandraxler@gmail.com",
         "firstName": "Julian",
         "gender": "male",
         "lastName": "Draxler",
         "salary": 7000
       }
     ]
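A common pagination pattern combines $sort, $skip, and $limit. To fetch page p with n documents per page, skip (p - 1) * n documents and limit the output to n. The sketch below assumes 1-based page numbers and sorts by _id to keep pages stable. paginationStages is a hypothetical helper that only builds the stage array, which could then be passed to db.collection.aggregate(...). The last lines check the arithmetic on the six sample _id values.

```javascript
// Hypothetical helper: builds the stages for one page of results.
// page is 1-based; pageSize is the number of documents per page.
function paginationStages(page, pageSize) {
  return [
    { $sort: { _id: 1 } },            // a stable order so pages do not overlap
    { $skip: (page - 1) * pageSize }, // skip the documents on earlier pages
    { $limit: pageSize },             // keep one page worth of documents
  ];
}

// In-memory check of the same arithmetic on the six sample employee _ids.
const ids = [1, 2, 3, 4, 5, 6];
const stages = paginationStages(2, 2);
const skipCount = stages[1].$skip;
const limitCount = stages[2].$limit;
const pageTwo = ids.slice(skipCount, skipCount + limitCount);
console.log(pageTwo); // [ 3, 4 ]
```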
    
  7. $unwind: This stage deconstructs an array field from the input documents and outputs a separate document for each element of the array. It is particularly useful when you need to operate on individual array elements. To illustrate how to use the $unwind stage, let's add a hobbies field to the documents in our users collection.

        [
          {
            "firstname": "temi",
            "lastname": "lan",
            "email": "temiian@gmail.com",
            "gender": "female",
            "active": true,
            "hobbies": ["Football", "Gaming", "Dancing"]
          },
          {
            "firstname": "adam",
            "lastname": "Ian",
            "email": "adamian@gmail.com",
            "gender": "male",
            "active": false,
            "hobbies": ["Football", "Sleeping", "Reading"]
          },
          {
            "firstname": "eve",
            "lastname": "Lan",
            "email": "evelan@gmail.com",
            "gender": "female",
            "active": true,
            "hobbies": ["Swimming", "Gaming", "Dancing"]
          },
          {
            "firstname": "mathew",
            "lastname": "silva",
            "email": "mathewsilva@gmail.com",
            "gender": "male",
            "active": false,
            "hobbies": ["Traveling", "Reading"]
          }
        ]
    

    With a hobbies field added to each document as shown above, the query below deconstructs that field using $unwind.

     db.collection.aggregate([
       {
         $unwind: "$hobbies"
       }
     ])
    

    The query above uses $unwind to deconstruct the hobbies field of our users data. As a result, it returns one document for each item in each user's hobbies array, as shown below.

[
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "active": true,
    "email": "temiian@gmail.com",
    "firstname": "temi",
    "gender": "female",
    "hobbies": "Football",
    "lastname": "lan"
  },
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "active": true,
    "email": "temiian@gmail.com",
    "firstname": "temi",
    "gender": "female",
    "hobbies": "Gaming",
    "lastname": "lan"
  },
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "active": true,
    "email": "temiian@gmail.com",
    "firstname": "temi",
    "gender": "female",
    "hobbies": "Dancing",
    "lastname": "lan"
  },
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "active": false,
    "email": "adamian@gmail.com",
    "firstname": "adam",
    "gender": "male",
    "hobbies": "Football",
    "lastname": "Ian"
  },
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "active": false,
    "email": "adamian@gmail.com",
    "firstname": "adam",
    "gender": "male",
    "hobbies": "Sleeping",
    "lastname": "Ian"
  },
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "active": false,
    "email": "adamian@gmail.com",
    "firstname": "adam",
    "gender": "male",
    "hobbies": "Reading",
    "lastname": "Ian"
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "active": true,
    "email": "evelan@gmail.com",
    "firstname": "eve",
    "gender": "female",
    "hobbies": "Swimming",
    "lastname": "Lan"
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "active": true,
    "email": "evelan@gmail.com",
    "firstname": "eve",
    "gender": "female",
    "hobbies": "Gaming",
    "lastname": "Lan"
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "active": true,
    "email": "evelan@gmail.com",
    "firstname": "eve",
    "gender": "female",
    "hobbies": "Dancing",
    "lastname": "Lan"
  },
  {
    "_id": ObjectId("5a934e000102030405000003"),
    "active": false,
    "email": "mathewsilva@gmail.com",
    "firstname": "mathew",
    "gender": "male",
    "hobbies": "Traveling",
    "lastname": "silva"
  },
  {
    "_id": ObjectId("5a934e000102030405000003"),
    "active": false,
    "email": "mathewsilva@gmail.com",
    "firstname": "mathew",
    "gender": "male",
    "hobbies": "Reading",
    "lastname": "silva"
  }
]

Find the code for this example here
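Conceptually, $unwind is a flatMap over the array field. Every element produces a copy of the parent document in which the array is replaced by that single element. The hypothetical unwindStage helper below sketches this in memory. Like the default $unwind, it drops documents whose field is missing, null, or an empty array. MongoDB's preserveNullAndEmptyArrays option, which keeps such documents instead, is not modelled. The "guest" document is made up to show the empty-array case.

```javascript
// Hypothetical helper: approximates { $unwind: "$<field>" } in memory.
function unwindStage(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return []; // missing or null: dropped
    const items = Array.isArray(value) ? value : [value]; // a non-array counts as one element
    return items.map((item) => ({ ...doc, [field]: item })); // an empty array yields nothing
  });
}

const users = [
  { firstname: "temi", hobbies: ["Football", "Gaming", "Dancing"] },
  { firstname: "mathew", hobbies: ["Traveling", "Reading"] },
  { firstname: "guest", hobbies: [] }, // hypothetical document with no hobbies
];

const unwound = unwindStage(users, "hobbies");
console.log(unwound.length); // 5
```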

These are some of the most commonly used aggregation pipeline stages. To learn about the others, check the official MongoDB documentation. In the next section, I'll explain some of the operators used in the aggregation pipeline.

Aggregation Pipeline Operators:

  1. $sum: The $sum operator returns the sum of numeric values, ignoring non-numeric ones. It is available in both the $group and $project stages. This operator is useful for obtaining the sum of a field across multiple documents or for accumulating a set of numeric values. For example, suppose we have a sales collection with documents structured like this:

     [
       {
         _id: 1,
         product: "Book",
         quantity: 5,
         price: 10
       },
       {
         _id: 2,
         product: "Pen",
         quantity: 2,
         price: 2
       },
       {
         _id: 3,
         product: "Bag",
         quantity: 1,
         price: 20
       }
     ]
    

    Let's calculate the total sales for each product, where total sales = quantity * price. To achieve this, we will do the calculation in the $group stage of the pipeline.

db.collection.aggregate([
  {
    $group: {
      _id: "$product",
      totalSales: {
        $sum: {
          $multiply: [
            "$quantity",
            "$price"
          ]
        }
      }
    }
  }
])

Looks difficult? I know. Don't worry, I'll explain.

Remember, in the last section I said that $group groups documents by a specified key. In this case, we group our sales by the product field (the name of the product), as shown below:

{
    $group: {
        "_id": "$product"
    }
}

Then, we define an output field called totalSales to hold the total sales for each group.

{
  $group: {
    "_id": "$product",
    totalSales: {
      $sum: {

      }
    }
  }
}

Inside totalSales, we use $sum to accumulate the sales value (quantity * price) across all documents in each product group. To compute that value for each document, we use the $multiply operator, which multiplies the items in the array passed to it. Putting it all together gives the full query:

db.collection.aggregate([
  {
    $group: {
      _id: "$product",
      totalSales: {
        $sum: {
          $multiply: [
            "$quantity",
            "$price"
          ]
        }
      }
    }
  }
])

The query above returns this result:

[
  {
    "_id": "Book",
    "totalSales": 50
  },
  {
    "_id": "Pen",
    "totalSales": 4
  },
  {
    "_id": "Bag",
    "totalSales": 20
  }
]

Find the code for this example here
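Here is a hypothetical in-memory version of the total-sales query that shows the order of work. $multiply runs once per document, and $sum adds those per-document amounts into the running total for that document's product group. The sketch assumes quantity and price are always numbers, whereas the real $sum would skip non-numeric values. The extra Book sale at the end is invented purely to show two documents landing in the same group.

```javascript
// Hypothetical helper: approximates
//   { $group: { _id: "$product", totalSales: { $sum: { $multiply: ["$quantity", "$price"] } } } }
function totalSalesByProduct(sales) {
  const totals = new Map();
  for (const sale of sales) {
    const amount = sale.quantity * sale.price; // the $multiply expression, per document
    totals.set(sale.product, (totals.get(sale.product) || 0) + amount); // $sum, per group
  }
  return Array.from(totals, ([_id, totalSales]) => ({ _id, totalSales }));
}

const sales = [
  { _id: 1, product: "Book", quantity: 5, price: 10 },
  { _id: 2, product: "Pen", quantity: 2, price: 2 },
  { _id: 3, product: "Bag", quantity: 1, price: 20 },
];

console.log(totalSalesByProduct(sales)); // Book: 50, Pen: 4, Bag: 20

// Hypothetical extra Book sale: both Book documents accumulate into one group.
const withRepeat = totalSalesByProduct([...sales, { _id: 4, product: "Book", quantity: 1, price: 10 }]);
console.log(withRepeat[0]); // { _id: 'Book', totalSales: 60 }
```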

  2. $avg: It calculates and returns the average of numeric values, ignoring non-numeric ones. It is available in both the $group and $project stages. MongoDB sums the numeric values of the specified field and divides the total by the number of numeric values to derive the average. For example, the query below calculates the average price across all documents in the sales collection:
db.collection.aggregate([
  {
    $group: {
      _id: null,
      averagePrice: {
        $avg: "$price"
      }
    }
  }
])

Setting _id to null puts every document into a single group. The query therefore returns the average price of all products in the collection and stores it in the averagePrice field. Below is the result of the query.

[
  {
    "_id": null,
    "averagePrice": 10.666666666666666
  }
]
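The result above is simply (10 + 2 + 20) / 3 = 32 / 3 ≈ 10.67, using the three prices from the sales data. The hypothetical averageOf helper below reproduces that arithmetic in memory. It also mirrors how $avg skips non-numeric values: the extra "Gift card" document with a null price is made up to show that it does not change the average. It returns null when there is nothing numeric to average, as $avg does.

```javascript
// Hypothetical helper: approximates { $avg: "$price" } over a single group.
// Like $avg, it ignores values that are not numbers (including null and missing).
function averageOf(docs, field) {
  const numbers = docs.map((doc) => doc[field]).filter((v) => typeof v === "number");
  if (numbers.length === 0) return null; // no numeric values to average
  return numbers.reduce((sum, v) => sum + v, 0) / numbers.length;
}

const sales = [
  { product: "Book", price: 10 },
  { product: "Pen", price: 2 },
  { product: "Bag", price: 20 },
  { product: "Gift card", price: null }, // hypothetical: ignored like in $avg
];

console.log(averageOf(sales, "price")); // 10.666666666666666
```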
  3. $first: It returns the value from the first document in each group. It is used as an accumulator in the $group stage. Because "first" depends on the order in which documents reach $group, precede it with a $sort stage when the order matters. For example, to get the first product in each category, let's modify the data from our previous example by adding a category field to each document, then run the following query:
db.collection.aggregate([
  {
    $group: {
      _id: "$category",
      firstProduct: {
        $first: "$product"
      }
    }
  }
])

The query above returns the first product in each of the two categories. Below is the output of the query.

[
  {
    "_id": "vegetable",
    "firstProduct": "Carrot"
  },
  {
    "_id": "stationeries",
    "firstProduct": "Book"
  }
]

Find the output of the code here

  4. $last: It returns the value from the last document in each group and is used as an accumulator in the $group stage. Like $first, its result is only meaningful when the documents are in a defined order, for example after a $sort stage. To demonstrate, modify the query from the last example by replacing the $first operator with $last and renaming the output field to lastProduct, as shown below:
db.collection.aggregate([
  {
    $group: {
      _id: "$category",
      lastProduct: {
        $last: "$product"
      }
    }
  }
])

The query above returns the last product from each category in the collection. Below is the result of the query.

[
  {
    "_id": "stationeries",
    "lastProduct": "Ruler"
  },
  {
    "_id": "vegetable",
    "lastProduct": "Lettuce"
  }
]
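The hypothetical sketch below groups products by category in memory to show how document order drives $first and $last. The four products are made-up sample data chosen to be consistent with the category outputs shown above, since the article's full category data is not listed. Reversing the input order, as a { $sort: { product: -1 } } stage would, swaps which product comes first and which comes last.

```javascript
// Hypothetical helper: approximates
//   { $group: { _id: "$category", firstProduct: { $first: "$product" }, lastProduct: { $last: "$product" } } }
// "First" and "last" follow the order in which documents arrive.
function firstAndLastByCategory(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc.category);
    if (group) {
      group.lastProduct = doc.product; // $last: overwritten by every later document
    } else {
      groups.set(doc.category, {
        _id: doc.category,
        firstProduct: doc.product, // $first: set once, by the first document
        lastProduct: doc.product,
      });
    }
  }
  return [...groups.values()];
}

// Hypothetical sample data.
const products = [
  { product: "Book", category: "stationeries" },
  { product: "Carrot", category: "vegetable" },
  { product: "Ruler", category: "stationeries" },
  { product: "Lettuce", category: "vegetable" },
];

console.log(firstAndLastByCategory(products));

// Roughly like { $sort: { product: -1 } } before the $group stage.
const sortedDesc = [...products].sort((a, b) =>
  a.product < b.product ? 1 : a.product > b.product ? -1 : 0
);
console.log(firstAndLastByCategory(sortedDesc));
```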
  5. $max: It returns the highest value of an expression for each group. It is available in both the $group and $project stages and can be used with numeric values or dates. For example, we can use this operator to find the highest product price.

     db.collection.aggregate([
       {
         $group: {
           _id: null,
           maxPrice: {
             $max: "$price"
           }
         }
       }
     ])
    

    The query returns the highest price across all documents, as shown below:

[
  {
    "_id": null,
    "maxPrice": 18
  }
]
  6. $min: It returns the lowest value of an expression for each group. It is available in both the $group and $project stages. It works like $max, except that it returns the lowest value. To demonstrate, modify the query from the last example by changing the $max operator to $min and renaming the output field to minPrice, as shown below:
db.collection.aggregate([
  {
    $group: {
      _id: null,
      minPrice: {
        $min: "$price"
      }
    }
  }
])

The query above returns the lowest price across all documents in the collection.

Find the code for this example here
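The hypothetical priceRange helper below sketches $max and $min over a single group (_id: null). It uses the three-document sales data from the $sum example, so its result differs from the maxPrice output above, which came from the modified data. Like the real operators, it ignores null and missing values and returns null when nothing remains. Unlike them, it only considers numbers.

```javascript
// Hypothetical helper: approximates
//   { $group: { _id: null, maxPrice: { $max: "$price" }, minPrice: { $min: "$price" } } }
// It only considers numeric prices; null and missing values are skipped.
function priceRange(docs) {
  const prices = docs.map((doc) => doc.price).filter((p) => typeof p === "number");
  if (prices.length === 0) return { _id: null, maxPrice: null, minPrice: null };
  return { _id: null, maxPrice: Math.max(...prices), minPrice: Math.min(...prices) };
}

const sales = [
  { product: "Book", price: 10 },
  { product: "Pen", price: 2 },
  { product: "Bag", price: 20 },
];

console.log(priceRange(sales)); // { _id: null, maxPrice: 20, minPrice: 2 }
```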

$lookup: This is arguably the most powerful stage of the aggregation pipeline because it can relate two collections by performing a left outer join between them. It matches a field in the input documents (the local field) against a field in another collection (the foreign field, similar to a foreign key in relational databases) and retrieves the related data from that collection.

If you have some experience with SQL databases, you'll notice this is similar to the way they use JOINs to handle relationships between tables.

Knowledge of SQL databases is not required to understand the concepts explained here, although it can be helpful.

The basic form of the $lookup stage requires the following parameters:

  • The foreign collection to join with (from).

  • The local and foreign fields used for matching (localField and foreignField).

  • The output field that stores the joined documents (as).

Below is an example of how the $lookup stage works:

Assume we have two collections: orders and products. Each order document has a product_id field that corresponds to a specific product in the products collection.

Given the products and orders data below:

db={
  "products": [
    {
    "_id": ObjectId("5a934e000102030405000001"),
    "amount": 50,
    "name": "Shoes"
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "amount": 50,
    "name": "Bag"
  },
  {
    "_id": ObjectId("5a934e000102030405000003"),
    "amount": 50,
    "name": "Belt"
  },
  {
    "_id": ObjectId("5a934e000102030405000004"),
    "amount": 50,
    "name": "Cap"
  }
  ],
  "orders": [
    {
      _id: ObjectId("70b987654321fedcba987654"),
      order_number: "ORD123456",
      product_id: ObjectId("5a934e000102030405000001"),
      quantity: 3,
      customer_name: "John Doe"
    },
    {
      _id: ObjectId("70b987654321fedcba987655"),
      order_number: "ORD123456",
      product_id: ObjectId("5a934e000102030405000004"),
      quantity: 3,
      customer_name: "David Doe"
    }
  ]
}

Note:

I have presented the data above in the format supported by the online code editor I'm using. In a real database, each collection is stored and queried separately.

To relate the two collections, we define a $lookup stage on the orders collection using the code below:

db.orders.aggregate([
  {
    $lookup: {
      from: "products",         // Foreign collection name
      localField: "product_id", // Field in the local collection
      foreignField: "_id",      // Field in the foreign collection
      as: "product_info"        // Output field to store the joined documents
    }
  }
])

In this example, the $lookup stage will match each order document's product_id with the _id field of the products collection. The resulting documents will include an additional array field called product_info containing information from the joined products documents.

The result of the query is shown below:

[
  {
    "_id": ObjectId("70b987654321fedcba987654"),
    "customer_name": "John Doe",
    "order_number": "ORD123456",
    "product_id": ObjectId("5a934e000102030405000001"),
    "product_info": [
      {
        "_id": ObjectId("5a934e000102030405000001"),
        "amount": 50,
        "name": "Shoes"
      }
    ],
    "quantity": 3
  },
  {
    "_id": ObjectId("70b987654321fedcba987655"),
    "customer_name": "David Doe",
    "order_number": "ORD123456",
    "product_id": ObjectId("5a934e000102030405000004"),
    "product_info": [
      {
        "_id": ObjectId("5a934e000102030405000004"),
        "amount": 50,
        "name": "Cap"
      }
    ],
    "quantity": 3
  }
]

While $lookup is very useful, keep in mind that it can be resource-intensive, especially on large datasets. An index on the foreignField helps, as does a $match stage before $lookup that reduces the number of documents to join. Even so, use it with care.
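Conceptually, $lookup is a left outer join. Every input document is kept, and the as field receives an array of all matching documents from the other collection, which is empty when nothing matches. The hypothetical lookupStage helper below sketches this in memory. It uses plain strings instead of ObjectId values and strict equality for matching, and it does not model cases such as array-valued local fields. The third order is invented to show the empty-array case.

```javascript
// Hypothetical helper: approximates
//   { $lookup: { from, localField, foreignField, as } }
// as an in-memory left outer join. foreignDocs plays the role of the `from` collection.
function lookupStage(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((foreign) => foreign[foreignField] === doc[localField]),
  }));
}

// Plain strings stand in for the ObjectId values used in the article.
const products = [
  { _id: "p1", name: "Shoes", amount: 50 },
  { _id: "p4", name: "Cap", amount: 50 },
];

const orders = [
  { _id: "o1", product_id: "p1", quantity: 3, customer_name: "John Doe" },
  { _id: "o2", product_id: "p4", quantity: 3, customer_name: "David Doe" },
  { _id: "o3", product_id: "p9", quantity: 1, customer_name: "Hypothetical Customer" }, // no match
];

const joined = lookupStage(orders, products, {
  localField: "product_id",
  foreignField: "_id",
  as: "product_info",
});

console.log(joined.map((o) => o.product_info.map((p) => p.name))); // [ [ 'Shoes' ], [ 'Cap' ], [] ]
```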

CONCLUSION

In this article, you learned the fundamentals of the MongoDB aggregation pipeline framework: its importance, its most commonly used stages and operators, and how it handles relationships between collections. With this knowledge, you are ready to start applying it to complex data processing and analysis tasks in your next project.

Congratulations on getting this far; you have just taken the first step toward mastering the MongoDB aggregation pipeline. As the saying goes, “Practice makes perfect.” To retain the knowledge you just acquired, consider applying it in your next project or practicing here.

In the next article, I will demonstrate how to apply the concepts from this article by building a simple blog application.

Until then, feel free to leave your comments or questions in the comment section, and I will try to respond. Good luck!