Skip to the content.

MongoDB 常用 Aggregation Pipeline Stages - group and lookup etc

2020-02-01 10:00:00


最近有把一个小项目的底层数据库由 MySQL 修改成 MongoDB,借此更进一步熟悉了 Aggregation Pipeline Stages

本文仅包含翻译小项目 SQL 版本所需的 MongoDB Aggregation Pipeline Stages 知识,成文时 MongoDB 最新版本为 V4.2

db.collection.aggregate( [ { <stage> }, ... ] )

注:除了 $out、$merge、$geoNear 所有 stage(阶段)均可在 pipeline(管道)中出现多次,意味着本文所提这些均可出现多次

Examples

示例仅演示本文相关 stage 主要用法,详细请访问各个 stage 的官方文档

db.message.insertMany([
    {"id": 1,"num": 2},
    {"id": 1,"num": 4},
    {"id": 2,"num": 6},
    {"id": 2,"num": 4},
    {"id": 2,"num": 2},
    {"id": 3,"num": 2}
]);

db.user.insertMany([
    {"id": 1,"name": "n1"},
    {"id": 2,"name": "n2"},
    {"id": 3,"name": "n3"}
]);
db.message.aggregate([{
    $match: {
        id: 2
    }
}, {
    $sort: {
        num: 1
    }
}, {
    $limit: 2
}, {
    $project: {
        _id: 0
    }
}]);

[
    {"id": 2,"num": 2},
    {"id": 2,"num": 4}
]
db.message.aggregate([{
    $group: {
        _id: "$id",
        max: {
            $max: "$num"
        },
        sum: {
            $sum: "$num"
        },
        count: {
            $sum: 1
        }
    }
}]);

[
    {"_id": 3,"max": 2,"sum": 2,"count": 1},
    {"_id": 2,"max": 6,"sum": 12,"count": 3},
    {"_id": 1,"max": 4,"sum": 6,"count": 2}
]
db.user.aggregate([{
    $match: {
        id: 3
    }
}, {
    $lookup: {
        from: "message",
        localField: "id",
        foreignField: "id",
        as: "messages"
    }
}, {
    $project: {
        _id: 0,
        "messages._id": 0
    }
}]);

{"id": 3,"name": "n3","messages": [
        {"id": 3,"num": 2}
    ]
}

$match

{ $match: { <query> } }

Place the $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of processing down the pipe

If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes

$sort

{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

set the sort order to 1 or -1 to specify an ascending or descending sort respectively

When a $sort precedes a $limit and there are no intervening stages that modify the number of documents, the optimizer can coalesce the $limit into the $sort

The $sort stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $sort will produce an error. To allow for the handling of large datasets, set the allowDiskUse option to true to enable $sort operations to write to temporary files

$sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the $project, $unwind, and $group aggregation operators. If $project, $unwind, or $group occur prior to the $sort operation, $sort cannot use any indexes

$limit

{ $limit: <positive integer> }

$limit takes a positive integer that specifies the maximum number of documents to pass along

$group

{ $group: { _id: <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }

The _id field is mandatory; however, you can specify an _id value of null, or any other constant value, to calculate accumulated values for all the input documents as a whole

The remaining computed fields are optional and computed using the <accumulator> operators

Accumulator Operator

$avg Returns an average of numerical values. Ignores non-numeric values

$max Returns the highest expression value for each group

$min Returns the lowest expression value for each group

$sum Returns a sum of numerical values. Ignores non-numeric values

$lookup

Equality Match

To perform an equality match between a field from the input documents with a field from the documents of the “joined” collection

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

from Specifies the collection in the same database to perform the join with. The from collection cannot be sharded

localField Specifies the field from the documents input to the $lookup stage

foreignField Specifies the field from the documents in the from collection

as Specifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection. If the specified name already exists in the input document, the existing field is overwritten

if does not contain the localField or foreignField, the $lookup treats the value as null for matching purposes

Join Conditions and Uncorrelated Sub-queries

$skip

{ $skip: <positive integer> }

$skip takes a positive integer that specifies the maximum number of documents to skip

$sample

{ $sample: { size: <positive integer> } }

Randomly selects the specified number of documents from its input

$sample may output the same document more than once in its result set

$count

{ $count: <string> }

<string> is the name of the output field which has the count as its value. <string> must be a non-empty string, must not start with $ and must not contain the . character

The $count stage is equivalent to the following $group + $project sequence

db.collection.aggregate( [
   { $group: { _id: null, myCount: { $sum: 1 } } },
   { $project: { _id: 0 } }
] )