How can I merge two Python dictionaries in a single expression?
For dictionaries x
and y
, z
becomes a shallowly-merged dictionary with values from y
replacing those from x
.
In Python 3.9.0 or greater (released 17 October 2020): PEP-584, discussed here, was implemented and provides the simplest method:
z = x | y # NOTE: 3.9+ ONLY
In Python 3.5 or greater:
z = {**x, **y}
In Python 2, (or 3.4 or lower) write a function:
def merge_two_dicts(x, y):
z = x.copy() # start with keys and values of x
z.update(y) # modifies z with keys and values of y
return z
and now:
z = merge_two_dicts(x, y)
Explanation
Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
The desired result is to get a new dictionary (z
) with the values merged, and the second dictionary's values overwriting those from the first.
>>> z
{'a': 1, 'b': 3, 'c': 4}
A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is
z = {**x, **y}
And it is indeed a single expression.
Note that we can merge in with literal notation as well:
z = {**x, 'foo': 1, 'bar': 2, **y}
and now:
>>> z
{'a': 1, 'b': 3, 'foo': 1, 'bar': 2, 'c': 4}
It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into the What's New in Python 3.5 document.
However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:
z = x.copy()
z.update(y) # which returns None since it mutates z
In both approaches, y
will come second and its values will replace x
's values, thus b
will point to 3
in our final result.
Not yet on Python 3.5, but want a single expression
If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant while the correct approach is to put it in a function:
def merge_two_dicts(x, y):
"""Given two dictionaries, merge them into a new dict as a shallow copy."""
z = x.copy()
z.update(y)
return z
and then you have a single expression:
z = merge_two_dicts(x, y)
You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:
def merge_dicts(*dict_args):
"""
Given any number of dictionaries, shallow copy and merge into a new dict,
precedence goes to key-value pairs in latter dictionaries.
"""
result = {}
for dictionary in dict_args:
result.update(dictionary)
return result
This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a
to g
:
z = merge_dicts(a, b, c, d, e, f, g)
and key-value pairs in g
will take precedence over dictionaries a
to f
, and so on.
Critiques of Other Answers
Don't use what you see in the formerly accepted answer:
z = dict(x.items() + y.items())
In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you're adding two dict_items
objects together, not two lists -
>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'
and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items()))
. This is a waste of resources and computation power.
Similarly, taking the union of items()
in Python 3 (viewitems()
in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don't do this:
>>> c = dict(a.items() | b.items())
This example demonstrates what happens when values are unhashable:
>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Here's an example where y
should have precedence, but instead the value from x
is retained due to the arbitrary order of sets:
>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}
Another hack you should not use:
z = dict(x, **y)
This uses the dict
constructor and is very fast and memory-efficient (even slightly more so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it's difficult to read, it's not the intended usage, and so it is not Pythonic.
Here's an example of the usage being remediated in django.
Dictionaries are intended to take hashable keys (e.g. frozenset
s or tuples), but this method fails in Python 3 when keys are not strings.
>>> c = dict(a, **b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings
From the mailing list, Guido van Rossum, the creator of the language, wrote:
I am fine with
declaring dict({}, **{1:3}) illegal, since after all it is abuse of
the ** mechanism.
and
Apparently dict(x, **y) is going around as "cool hack" for "call
x.update(y) and return x". Personally, I find it more despicable than
cool.
It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y)
is for creating dictionaries for readability purposes, e.g.:
dict(a=1, b=10, c=11)
instead of
{'a': 1, 'b': 10, 'c': 11}
Despite what Guido says, dict(x, **y)
is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.
Again, it doesn't work for 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict
broke this consistency in Python 2:
>>> foo(**{('a', 'b'): None})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{('a', 'b'): None})
{('a', 'b'): None}
This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.
I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.
More comments:
dict(x.items() + y.items())
is still the most readable solution for Python 2. Readability counts.
My response: merge_two_dicts(x, y)
actually seems much clearer to me, if we're actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.
{**x, **y}
does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.
Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first's values being overwritten by the second's - in a single expression.
Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:
from copy import deepcopy
def dict_of_dicts_merge(x, y):
z = {}
overlapping_keys = x.keys() & y.keys()
for key in overlapping_keys:
z[key] = dict_of_dicts_merge(x[key], y[key])
for key in x.keys() - overlapping_keys:
z[key] = deepcopy(x[key])
for key in y.keys() - overlapping_keys:
z[key] = deepcopy(y[key])
return z
Usage:
>>> x = {'a':{1:{}}, 'b': {2:{}}}
>>> y = {'b':{10:{}}, 'c': {11:{}}}
>>> dict_of_dicts_merge(x, y)
{'b': {2: {}, 10: {}}, 'a': {1: {}}, 'c': {11: {}}}
Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".
These approaches are less performant, but they will provide correct behavior.
They will be much less performant than copy
and update
or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence)
You can also chain the dictionaries manually inside a dict comprehension:
{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7
or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):
dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2
itertools.chain
will chain the iterators over the key-value pairs in the correct order:
from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2
I'm only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)
from timeit import repeat
from itertools import chain
x = dict.fromkeys('abcdefg')
y = dict.fromkeys('efghijk')
def merge_two_dicts(x, y):
z = x.copy()
z.update(y)
return z
min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
In Python 3.8.1, NixOS:
>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux
Resources on Dictionaries
A closure is a pairing of:
- A function, and
- A reference to that function's outer scope (lexical environment)
A lexical environment is part of every execution context (stack frame) and is a map between identifiers (ie. local variable names) and values.
Every function in JavaScript maintains a reference to its outer lexical environment. This reference is used to configure the execution context created when a function is invoked. This reference enables code inside the function to "see" variables declared outside the function, regardless of when and where the function is called.
If a function was called by a function, which in turn was called by another function, then a chain of references to outer lexical environments is created. This chain is called the scope chain.
In the following code, inner
forms a closure with the lexical environment of the execution context created when foo
is invoked, closing over variable secret
:
function foo() {
const secret = Math.trunc(Math.random()*100)
return function inner() {
console.log(`The secret number is ${secret}.`)
}
}
const f = foo() // `secret` is not directly accessible from outside `foo`
f() // The only way to retrieve `secret`, is to invoke `f`
In other words: in JavaScript, functions carry a reference to a private "box of state", to which only they (and any other functions declared within the same lexical environment) have access. This box of the state is invisible to the caller of the function, delivering an excellent mechanism for data-hiding and encapsulation.
And remember: functions in JavaScript can be passed around like variables (first-class functions), meaning these pairings of functionality and state can be passed around your program: similar to how you might pass an instance of a class around in C++.
If JavaScript did not have closures, then more states would have to be passed between functions explicitly, making parameter lists longer and code noisier.
So, if you want a function to always have access to a private piece of state, you can use a closure.
...and frequently we do want to associate the state with a function. For example, in Java or C++, when you add a private instance variable and a method to a class, you are associating state with functionality.
In C and most other common languages, after a function returns, all the local variables are no longer accessible because the stack-frame is destroyed. In JavaScript, if you declare a function within another function, then the local variables of the outer function can remain accessible after returning from it. In this way, in the code above, secret
remains available to the function object inner
, after it has been returned from foo
.
Uses of Closures
Closures are useful whenever you need a private state associated with a function. This is a very common scenario - and remember: JavaScript did not have a class syntax until 2015, and it still does not have a private field syntax. Closures meet this need.
Private Instance Variables
In the following code, the function toString
closes over the details of the car.
function Car(manufacturer, model, year, color) {
return {
toString() {
return `${manufacturer} ${model} (${year}, ${color})`
}
}
}
const car = new Car('Aston Martin','V8 Vantage','2012','Quantum Silver')
console.log(car.toString())
Functional Programming
In the following code, the function inner
closes over both fn
and args
.
function curry(fn) {
const args = []
return function inner(arg) {
if(args.length === fn.length) return fn(...args)
args.push(arg)
return inner
}
}
function add(a, b) {
return a + b
}
const curriedAdd = curry(add)
console.log(curriedAdd(2)(3)()) // 5
Event-Oriented Programming
In the following code, function onClick
closes over variable BACKGROUND_COLOR
.
const $ = document.querySelector.bind(document)
const BACKGROUND_COLOR = 'rgba(200,200,242,1)'
function onClick() {
$('body').style.background = BACKGROUND_COLOR
}
$('button').addEventListener('click', onClick)
<button>Set background color</button>
Modularization
In the following example, all the implementation details are hidden inside an immediately executed function expression. The functions tick
and toString
close over the private state and functions they need to complete their work. Closures have enabled us to modularise and encapsulate our code.
let namespace = {};
(function foo(n) {
let numbers = []
function format(n) {
return Math.trunc(n)
}
function tick() {
numbers.push(Math.random() * 100)
}
function toString() {
return numbers.map(format)
}
n.counter = {
tick,
toString
}
}(namespace))
const counter = namespace.counter
counter.tick()
counter.tick()
console.log(counter.toString())
Examples
Example 1
This example shows that the local variables are not copied in the closure: the closure maintains a reference to the original variables themselves. It is as though the stack-frame stays alive in memory even after the outer function exits.
function foo() {
let x = 42
let inner = function() { console.log(x) }
x = x+1
return inner
}
var f = foo()
f() // logs 43
Example 2
In the following code, three methods log
, increment
, and update
all close over the same lexical environment.
And every time createObject
is called, a new execution context (stack frame) is created and a completely new variable x
, and a new set of functions (log
etc.) are created, that close over this new variable.
function createObject() {
let x = 42;
return {
log() { console.log(x) },
increment() { x++ },
update(value) { x = value }
}
}
const o = createObject()
o.increment()
o.log() // 43
o.update(5)
o.log() // 5
const p = createObject()
p.log() // 42
Example 3
If you are using variables declared using var
, be careful you understand which variable you are closing over. Variables declared using var
are hoisted. This is much less of a problem in modern JavaScript due to the introduction of let
and const
.
In the following code, each time around the loop, a new function inner
is created, which closes over i
. But because var i
is hoisted outside the loop, all of these inner functions close over the same variable, meaning that the final value of i
(3) is printed, three times.
function foo() {
var result = []
for (var i = 0; i < 3; i++) {
result.push(function inner() { console.log(i) } )
}
return result
}
const result = foo()
// The following will print `3`, three times...
for (var i = 0; i < 3; i++) {
result[i]()
}
Final points:
- Whenever a function is declared in JavaScript closure is created.
- Returning a
function
from inside another function is the classic example of closure, because the state inside the outer function is implicitly available to the returned inner function, even after the outer function has completed execution.
- Whenever you use
eval()
inside a function, a closure is used. The text you eval
can reference local variables of the function, and in the non-strict mode, you can even create new local variables by using eval('var foo = …')
.
- When you use
new Function(…)
(the Function constructor) inside a function, it does not close over its lexical environment: it closes over the global context instead. The new function cannot reference the local variables of the outer function.
- A closure in JavaScript is like keeping a reference (NOT a copy) to the scope at the point of function declaration, which in turn keeps a reference to its outer scope, and so on, all the way to the global object at the top of the scope chain.
- A closure is created when a function is declared; this closure is used to configure the execution context when the function is invoked.
- A new set of local variables is created every time a function is called.
Links
Best Answer
To just merge the arrays (without removing duplicates)
ES5 version use
Array.concat
:ES6 version use destructuring
Since there is no 'built in' way to remove duplicates (ECMA-262 actually has
Array.forEach
which would be great for this), we have to do it manually:Then, to use it:
This will also preserve the order of the arrays (i.e, no sorting needed).
Since many people are annoyed about prototype augmentation of
Array.prototype
andfor in
loops, here is a less invasive way to use it:For those who are fortunate enough to work with browsers where ES5 is available, you can use
Object.defineProperty
like this: