JavaScript – Please explain object versioning in the Node.js module system

Tags: architecture, encapsulation, javascript, node.js

This question is about the persistence of variables across different modules in nodejs when they don't directly "require" each other, but do "require" a common ancestor.

It is also the generalised version of this stackoverflow question. Whilst I received an answer there that helped me solve my specific case, I'm still unclear as to the answer to the architectural problem I posed. I hoped that the general problem was a good fit for a programmers.stackexchange question.

Assume we have a nodejs program structured like so:

  • Module Alpha – exports variable a.
  • Module Beta – requires module Alpha, reads variable a -> prints it to console periodically
  • Module Gamma – requires module Alpha, periodically writes to variable a

Is the instance of module Alpha, and thus variable a, common across all three modules? For example, can module Gamma make changes to variable a that are successfully printed to the console by module Beta?

If this is true, why is this true even though there is no direct "requires" relationship between modules Gamma and Beta?

If this is not true, can you explain the best way to share a variable that needs to be modified at runtime between multiple node modules?

Can you explain the underlying data model which is causing this behaviour in nodejs modules?

Best Answer

I think the critical piece you're missing is that the result of require("foo") is always the same object. Consider this REPL example:

> var myHttpModule = require("http")
{ ... }
> myHttpModule.someNewProperty = "someValue"
'someValue'
> require("http").someNewProperty
'someValue'

The second call to require("http") did not re-create the http module -- it summoned up the only http module that exists in Node's current execution environment, which had been altered on the previous line.

So, in your example, there's only one Alpha module. When Beta and Gamma call require("alpha"), they're each getting the same reference to the one-and-only Alpha module singleton object.

(Or, to be perfectly precise, they're getting the same reference to the exports value of the Alpha module. require creates a Module object, but returns only the exports property of that module. If you don't fully understand the distinction right now, it's not a big deal.)
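
As a rough illustration of the Alpha/Beta/Gamma case, here is a minimal sketch that writes three throwaway modules to a temporary directory and then loads them. The file names, the temp directory prefix, and the read/write helper names are assumptions made for the demo, not anything from the question.

```javascript
// Sketch only: alpha.js, beta.js and gamma.js are hypothetical demo modules.
const fs = require("fs");
const os = require("os");
const path = require("path");

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "alpha-demo-"));

// Alpha exports a single property, a.
fs.writeFileSync(path.join(dir, "alpha.js"), "exports.a = 1;\n");

// Beta requires Alpha and reads alpha.a each time read() is called.
fs.writeFileSync(
  path.join(dir, "beta.js"),
  'const alpha = require("./alpha");\nexports.read = () => alpha.a;\n'
);

// Gamma requires Alpha and writes to alpha.a.
fs.writeFileSync(
  path.join(dir, "gamma.js"),
  'const alpha = require("./alpha");\nexports.write = (value) => { alpha.a = value; };\n'
);

// Beta and Gamma never require each other; they only share Alpha.
const beta = require(path.join(dir, "beta.js"));
const gamma = require(path.join(dir, "gamma.js"));

const before = beta.read();
gamma.write(42);
const after = beta.read();

console.log(before, after); // prints: 1 42
```

Note that Beta reads the property from the shared exports object at call time. If Beta copied the value into a local variable when it loaded (for example const a = alpha.a), it would keep that copy and miss Gamma's later writes.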

Behind the scenes: what's really going on with require caching?

The first time you use require to include a module, Node actually runs the code in that module. The resulting module is stored in require.cache. Subsequent attempts to load the module with require first check for an already-loaded module object in require.cache. (Note: environment-native built-in Node modules like http are exceptions in that they don't use require.cache -- you'll need to test out my code below with a custom module.)

require.cache is an object whose keys are module file paths, with associated values that are module objects. For example, assume some module named "foo" in C:\node_modules\foo.js:

> require.cache                             // empty cache
{ } 
> require("foo")                            // require foo
{ bar: 'baz' }
> require.cache                             // cache now populated
{ 'C:\\node_modules\\foo.js':
   { id: 'C:\\node_modules\\foo.js',
     exports: { bar: 'baz' },
     parent: { ... },
     filename: 'C:\\node_modules\\foo.js',
     ...
     paths:
      [ ... ] } }

The current value of the foo module is in require.cache["C:\\node_modules\\foo.js"].exports (note the escaped backslashes in the string literal). We can use require.resolve to get the file path of the module from the name, so we can also express it as require.cache[require.resolve("foo")].exports.

If we call require("foo") a second time, Node sees that require.cache[require.resolve("foo")] is defined, and so it returns the value of require.cache[require.resolve("foo")].exports instead of re-running the module creation code. This exports property of the foo module in require.cache is the single instance of the foo module export value, returned with every subsequent call to require("foo").
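
The lookup described above can be checked with a small sketch. It assumes a throwaway foo.js written to a temp directory, and a hypothetical global counter, fooLoadCount, that the module bumps each time its body runs.

```javascript
// Sketch only: foo.js and global.fooLoadCount are hypothetical demo names.
const fs = require("fs");
const os = require("os");
const path = require("path");

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "cache-demo-"));
const fooPath = path.join(dir, "foo.js");

// The module body counts how many times it is executed.
fs.writeFileSync(
  fooPath,
  'global.fooLoadCount = (global.fooLoadCount || 0) + 1;\nexports.bar = "baz";\n'
);

const first = require(fooPath);            // runs the module code
const key = require.resolve(fooPath);      // the key used in require.cache
const cached = require.cache[key];         // the Module object for foo
const second = require(fooPath);           // served from the cache

const sameExports = cached.exports === first;
const sameOnSecondCall = first === second;

console.log(sameExports, sameOnSecondCall, global.fooLoadCount); // prints: true true 1
```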

One interesting implication here is that you can delete require.cache[require.resolve("foo")] to force a reload of the foo module with the next call to require("foo"), because the object describing the module is removed from the require.cache object.
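
A minimal sketch of that forced reload, again assuming a throwaway foo.js in a temp directory (the file and its counter property are made up for the demo):

```javascript
// Sketch only: foo.js and its counter property are hypothetical demo names.
const fs = require("fs");
const os = require("os");
const path = require("path");

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "reload-demo-"));
const fooPath = path.join(dir, "foo.js");
fs.writeFileSync(fooPath, "exports.counter = 0;\n");

const first = require(fooPath);
first.counter = 5; // mutate the cached exports object

// Remove the Module object from the cache; the next require re-runs foo.js.
delete require.cache[require.resolve(fooPath)];

const reloaded = require(fooPath);

console.log(first.counter, reloaded.counter, first === reloaded); // prints: 5 0 false
```

Any module that already holds a reference to the old exports object keeps that old object, so after a reload different parts of a program can end up looking at different copies.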

So what does this mean for me?

Without require, you could still share values between modules using global variables. For example, your Alpha/Beta/Gamma case would work just as well with just Beta and Gamma setting and reading the global value global.a. Instead, with Node's require system, you're actually setting and reading require.cache[require.resolve("alpha")].exports.a, where require.cache is a single object shared by every module in the process, and which Node lets you read neatly as just require("alpha").a.

In fact, if you just want to share a value and don't need to import your shared data as a module, you could also use a namespacing object, i.e., have Beta and Gamma set and read properties of the object global.alpha. The major advantage to using require is that you don't need to set up require("alpha") external to the other scripts, whereas you would need to define global.alpha = {} before requiring Beta and Gamma. (Alternatively, you could conditionally define global.alpha in each module based on a typeof global.alpha == 'undefined' check instead, to see if it's the first module to use it.)
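
For comparison, the global namespacing approach might look like the sketch below. Two plain functions stand in for the Beta and Gamma modules here, and the helper names are assumptions made for the demo.

```javascript
// Sketch only: ensureAlpha, gammaWrite and betaRead are hypothetical stand-ins
// for code that would live at the top of the Beta and Gamma modules.
function ensureAlpha() {
  // Whichever module runs first creates the shared namespace object.
  if (typeof global.alpha === "undefined") {
    global.alpha = {};
  }
  return global.alpha;
}

function gammaWrite(value) {
  ensureAlpha().a = value;
}

function betaRead() {
  return ensureAlpha().a;
}

gammaWrite(7);
const seen = betaRead();

console.log(seen); // prints: 7
```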
