Node.js – the right approach to update many records in MongoDB using Mongoose

mongodbmongoosenode.js

I am pulling some records from MongoDB using Mongoose, importing them into another system and then I would like to set status (document attribute) for all these documents to processed.

I could find this solution: Update multiple documents by id set. Mongoose

I was wondering if that is the right approach, to build up a criterion consisting of all document ids and then perform the update. Please also take into account a fact that it's going to be many documents.

(What is the limit of the update query? Couldn't find it anywhere. Official documentation: http://mongoosejs.com/docs/2.7.x/docs/updating-documents.html)

Best Answer

The approach of building up a criterion consisting of all document ids and then performing the update is bound to cause potential issues. When you iterate a list of documents sending an update operation with each doc, in Mongoose you run the risk of blowing up your server especially when dealing with a large dataset because you are not waiting for an asynchronous call to complete before moving on to the next iteration. You will be essentially building a "stack" of unresolved operations until this causes a problem - Stackoverflow.

Take for example, supposing you had an array of document ids that you wanted to update the matching document on the status field:

const processedIds = [
  "57a0a96bd1c6ef24376477cd",
  "57a052242acf5a06d4996537",
  "57a052242acf5a06d4996538"
];

where you can use the updateMany() method

Model.updateMany(
  { _id: { $in: processedIds } }, 
  { $set: { status: "processed" } }, 
  callback
);

or alternatively for really small datasets you could use the forEach() method on the array to iterate it and update your collection:

processedIds.forEach(function(id)){
  Model.update({ _id: id}, { $set: { status: "processed" } }, callback);
});

The above is okay for small datasets. However, this becomes an issue when you are faced with thousands or millions of documents to update as you will be making repeated server calls of asynchronous code within the loop.

To overcome this use something like async's eachLimit and iterate over the array performing a MongoDB update operation for each item while never performing more than x parallel updates the same time.

The best approach would be to use the bulk API for this which is extremely efficient in processing updates in bulk. The difference in performance vs calling the update operation on each and every one of the many documents is that instead of sending the update requests to the server with each iteration, the bulk API sends the requests once in every 1000 requests (batched).

For Mongoose versions >=4.3.0 which support MongoDB Server 3.2.x, you can use bulkWrite() for updates. The following example shows how you can go about this:

const bulkUpdateCallback = function(err, r){
  console.log(r.matchedCount);
  console.log(r.modifiedCount);
}

// Initialize the bulk operations array
const bulkUpdateOps = [], counter = 0;

processedIds.forEach(function (id) {
  bulkUpdateOps.push({
    updateOne: {
      filter: { _id: id },
      update: { $set: { status: "processed" } }
    }
  });
  counter++;

  if (counter % 500 == 0) {
    // Get the underlying collection via the Node.js driver collection object
    Model.collection.bulkWrite(bulkUpdateOps, { ordered: true, w: 1 }, bulkUpdateCallback);
    bulkUpdateOps = []; // re-initialize
  }
})

// Flush any remaining bulk ops
if (counter % 500 != 0) {
  Model.collection.bulkWrite(bulkOps, { ordered: true, w: 1 }, bulkUpdateCallback);
}

For Mongoose versions ~3.8.8, ~3.8.22, 4.x which support MongoDB Server >=2.6.x, you could use the Bulk API as follows

var bulk = Model.collection.initializeOrderedBulkOp(),
    counter = 0;

processedIds.forEach(function(id) {
    bulk.find({ "_id": id }).updateOne({ 
        "$set": { "status": "processed" }
    });

    counter++;
    if (counter % 500 == 0) {
        bulk.execute(function(err, r) {
           // do something with the result
           bulk = Model.collection.initializeOrderedBulkOp();
           counter = 0;
        });
    }
});

// Catch any docs in the queue under or over the 500's
if (counter > 0) {
    bulk.execute(function(err,result) {
       // do something with the result here
    });
}

Related Solutions

Mongodb – Update MongoDB field using value of another field

The best way to do this is in version 4.2+ which allows using of aggregation pipeline in the update document and the updateOne, updateMany or update collection methods. Note that the latter has been deprecated in most if not all languages drivers.

###MongoDB 4.2+

Version 4.2 also introduced the $set pipeline stage operator which is an alias for $addFields. I will use $set here as it maps with what we are trying to achieve.

db.collection.<update method>(
    {},
    [
        {"$set": {"name": { "$concat": ["$firstName", " ", "$lastName"]}}}
    ]
)

Note that square brackets in the second argument to the method which define an aggregation pipeline instead of a plain update document. Using a plain document will not work correctly.

###MongoDB 3.4+

In 3.4+ you can use $addFields and the $out aggregation pipeline operators.

db.collection.aggregate(
    [
        { "$addFields": { 
            "name": { "$concat": [ "$firstName", " ", "$lastName" ] } 
        }},
        { "$out": "collection" }
    ]
)

Note that this does not update your collection but instead replace the existing collection or create a new one. Also for update operations that require "type casting" you will need client side processing, and depending on the operation, you may need to use the find() method instead of the .aggreate() method.

##MongoDB 3.2 and 3.0 The way we do this is by $projecting our documents and use the $concat string aggregation operator to return the concatenated string. we From there, you then iterate the cursor and use the $set update operator to add the new field to your documents using bulk operations for maximum efficiency.

###Aggregation query:

var cursor = db.collection.aggregate([ 
    { "$project":  { 
        "name": { "$concat": [ "$firstName", " ", "$lastName" ] } 
    }}
])

###MongoDB 3.2 or newer

from this, you need to use the bulkWrite method.

var requests = [];
cursor.forEach(document => { 
    requests.push( { 
        'updateOne': {
            'filter': { '_id': document._id },
            'update': { '$set': { 'name': document.name } }
        }
    });
    if (requests.length === 500) {
        //Execute per 500 operations and re-init
        db.collection.bulkWrite(requests);
        requests = [];
    }
});

if(requests.length > 0) {
     db.collection.bulkWrite(requests);
}

###MongoDB 2.6 and 3.0

From this version you need to use the now deprecated Bulk API and its associated methods.

var bulk = db.collection.initializeUnorderedBulkOp();
var count = 0;

cursor.snapshot().forEach(function(document) { 
    bulk.find({ '_id': document._id }).updateOne( {
        '$set': { 'name': document.name }
    });
    count++;
    if(count%500 === 0) {
        // Excecute per 500 operations and re-init
        bulk.execute();
        bulk = db.collection.initializeUnorderedBulkOp();
    }
})

// clean up queues
if(count > 0) {
    bulk.execute();
}

###MongoDB 2.4

cursor["result"].forEach(function(document) {
    db.collection.update(
        { "_id": document._id }, 
        { "$set": { "name": document.name } }
    );
})

Mongodb – How to get the last N records in mongodb

If I understand your question, you need to sort in ascending order.

Assuming you have some id or date field called "x" you would do ...

.sort()

db.foo.find().sort({x:1});

The 1 will sort ascending (oldest to newest) and -1 will sort descending (newest to oldest.)

If you use the auto created _id field it has a date embedded in it ... so you can use that to order by ...

db.foo.find().sort({_id:1});

That will return back all your documents sorted from oldest to newest.

Natural Order

You can also use a Natural Order mentioned above ...

db.foo.find().sort({$natural:1});

Again, using 1 or -1 depending on the order you want.

Use .limit()

Lastly, it's good practice to add a limit when doing this sort of wide open query so you could do either ...

db.foo.find().sort({_id:1}).limit(50);

db.foo.find().sort({$natural:1}).limit(50);