Javascript – How to merge two arrays in JavaScript and de-duplicate items

arraysjavascriptmerge

I have two JavaScript arrays:

var array1 = ["Vijendra","Singh"];
var array2 = ["Singh", "Shakya"];

I want the output to be:

var array3 = ["Vijendra","Singh","Shakya"];

The output array should have repeated words removed.

How do I merge two arrays in JavaScript so that I get only the unique items from each array in the same order they were inserted into the original arrays?

Best Answer

To just merge the arrays (without removing duplicates)

ES5 version use `Array.concat`:

var array1 = ["Vijendra", "Singh"];
var array2 = ["Singh", "Shakya"];

console.log(array1.concat(array2));

ES6 version use destructuring

const array1 = ["Vijendra","Singh"];
const array2 = ["Singh", "Shakya"];
const array3 = [...array1, ...array2];

Since there is no 'built in' way to remove duplicates (ECMA-262 actually has Array.forEach which would be great for this), we have to do it manually:

Array.prototype.unique = function() {
    var a = this.concat();
    for(var i=0; i<a.length; ++i) {
        for(var j=i+1; j<a.length; ++j) {
            if(a[i] === a[j])
                a.splice(j--, 1);
        }
    }

    return a;
};

Then, to use it:

var array1 = ["Vijendra","Singh"];
var array2 = ["Singh", "Shakya"];
// Merges both arrays and gets unique items
var array3 = array1.concat(array2).unique();

This will also preserve the order of the arrays (i.e, no sorting needed).

Since many people are annoyed about prototype augmentation of Array.prototype and for in loops, here is a less invasive way to use it:

function arrayUnique(array) {
    var a = array.concat();
    for(var i=0; i<a.length; ++i) {
        for(var j=i+1; j<a.length; ++j) {
            if(a[i] === a[j])
                a.splice(j--, 1);
        }
    }

    return a;
}

var array1 = ["Vijendra","Singh"];
var array2 = ["Singh", "Shakya"];
    // Merges both arrays and gets unique items
var array3 = arrayUnique(array1.concat(array2));

For those who are fortunate enough to work with browsers where ES5 is available, you can use Object.defineProperty like this:

Object.defineProperty(Array.prototype, 'unique', {
    enumerable: false,
    configurable: false,
    writable: false,
    value: function() {
        var a = this.concat();
        for(var i=0; i<a.length; ++i) {
            for(var j=i+1; j<a.length; ++j) {
                if(a[i] === a[j])
                    a.splice(j--, 1);
            }
        }

        return a;
    }
});

Explanation

Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dictionary's values overwriting those from the first.

>>> z
{'a': 1, 'b': 3, 'c': 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, 'foo': 1, 'bar': 2, **y}

and now:

>>> z
{'a': 1, 'b': 3, 'foo': 1, 'bar': 2, 'c': 4}

It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into the What's New in Python 3.5 document.

However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x's values, thus b will point to 3 in our final result.

Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant while the correct approach is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dictionaries, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:

def merge_dicts(*dict_args):
    """
    Given any number of dictionaries, shallow copy and merge into a new dict,
    precedence goes to key-value pairs in latter dictionaries.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a to g:

z = merge_dicts(a, b, c, d, e, f, g)

and key-value pairs in g will take precedence over dictionaries a to f, and so on.

Critiques of Other Answers

Don't use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you're adding two dict_items objects together, not two lists -

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don't do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Here's an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it's difficult to read, it's not the intended usage, and so it is not Pythonic.

Here's an example of the usage being remediated in django.

Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

and

Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally, I find it more despicable than cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

dict(a=1, b=10, c=11)

instead of

{'a': 1, 'b': 10, 'c': 11}

Response to comments

Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.

Again, it doesn't work for 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{('a', 'b'): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{('a', 'b'): None})
{('a', 'b'): None}

This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

More comments:

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we're actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first's values being overwritten by the second's - in a single expression.

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

Usage:

>>> x = {'a':{1:{}}, 'b': {2:{}}}
>>> y = {'b':{10:{}}, 'c': {11:{}}}
>>> dict_of_dicts_merge(x, y)
{'b': {2: {}, 10: {}}, 'a': {1: {}}, 'c': {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence)

You can also chain the dictionaries manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2

itertools.chain will chain the iterators over the key-value pairs in the correct order:

from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2

Performance Analysis

I'm only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

from timeit import repeat
from itertools import chain

x = dict.fromkeys('abcdefg')
y = dict.fromkeys('efghijk')

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))

In Python 3.8.1, NixOS:

>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954

$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux

Resources on Dictionaries

My explanation of Python's dictionary implementation, updated for 3.6.
Answer on how to add new keys to a dictionary
Mapping two lists into a dictionary
The official Python docs on dictionaries
The Dictionary Even Mightier - talk by Brandon Rhodes at Pycon 2017
Modern Python Dictionaries, A Confluence of Great Ideas - talk by Raymond Hettinger at Pycon 2017

Javascript – How do JavaScript closures work

A closure is a pairing of:

A function, and
A reference to that function's outer scope (lexical environment)

A lexical environment is part of every execution context (stack frame) and is a map between identifiers (ie. local variable names) and values.

Every function in JavaScript maintains a reference to its outer lexical environment. This reference is used to configure the execution context created when a function is invoked. This reference enables code inside the function to "see" variables declared outside the function, regardless of when and where the function is called.

If a function was called by a function, which in turn was called by another function, then a chain of references to outer lexical environments is created. This chain is called the scope chain.

In the following code, inner forms a closure with the lexical environment of the execution context created when foo is invoked, closing over variable secret:

function foo() {
  const secret = Math.trunc(Math.random()*100)
  return function inner() {
    console.log(`The secret number is ${secret}.`)
  }
}
const f = foo() // `secret` is not directly accessible from outside `foo`
f() // The only way to retrieve `secret`, is to invoke `f`

In other words: in JavaScript, functions carry a reference to a private "box of state", to which only they (and any other functions declared within the same lexical environment) have access. This box of the state is invisible to the caller of the function, delivering an excellent mechanism for data-hiding and encapsulation.

And remember: functions in JavaScript can be passed around like variables (first-class functions), meaning these pairings of functionality and state can be passed around your program: similar to how you might pass an instance of a class around in C++.

If JavaScript did not have closures, then more states would have to be passed between functions explicitly, making parameter lists longer and code noisier.

So, if you want a function to always have access to a private piece of state, you can use a closure.

...and frequently we do want to associate the state with a function. For example, in Java or C++, when you add a private instance variable and a method to a class, you are associating state with functionality.

In C and most other common languages, after a function returns, all the local variables are no longer accessible because the stack-frame is destroyed. In JavaScript, if you declare a function within another function, then the local variables of the outer function can remain accessible after returning from it. In this way, in the code above, secret remains available to the function object inner, after it has been returned from foo.

Uses of Closures

Closures are useful whenever you need a private state associated with a function. This is a very common scenario - and remember: JavaScript did not have a class syntax until 2015, and it still does not have a private field syntax. Closures meet this need.

Private Instance Variables

In the following code, the function toString closes over the details of the car.

function Car(manufacturer, model, year, color) {
  return {
    toString() {
      return `${manufacturer} ${model} (${year}, ${color})`
    }
  }
}
const car = new Car('Aston Martin','V8 Vantage','2012','Quantum Silver')
console.log(car.toString())

Functional Programming

In the following code, the function inner closes over both fn and args.

function curry(fn) {
  const args = []
  return function inner(arg) {
    if(args.length === fn.length) return fn(...args)
    args.push(arg)
    return inner
  }
}

function add(a, b) {
  return a + b
}

const curriedAdd = curry(add)
console.log(curriedAdd(2)(3)()) // 5

Event-Oriented Programming

In the following code, function onClick closes over variable BACKGROUND_COLOR.

const $ = document.querySelector.bind(document)
const BACKGROUND_COLOR = 'rgba(200,200,242,1)'

function onClick() {
  $('body').style.background = BACKGROUND_COLOR
}

$('button').addEventListener('click', onClick)

<button>Set background color</button>

Modularization

In the following example, all the implementation details are hidden inside an immediately executed function expression. The functions tick and toString close over the private state and functions they need to complete their work. Closures have enabled us to modularise and encapsulate our code.

let namespace = {};

(function foo(n) {
  let numbers = []
  function format(n) {
    return Math.trunc(n)
  }
  function tick() {
    numbers.push(Math.random() * 100)
  }
  function toString() {
    return numbers.map(format)
  }
  n.counter = {
    tick,
    toString
  }
}(namespace))

const counter = namespace.counter
counter.tick()
counter.tick()
console.log(counter.toString())

Examples

Example 1

This example shows that the local variables are not copied in the closure: the closure maintains a reference to the original variables themselves. It is as though the stack-frame stays alive in memory even after the outer function exits.

function foo() {
  let x = 42
  let inner  = function() { console.log(x) }
  x = x+1
  return inner
}
var f = foo()
f() // logs 43

Example 2

In the following code, three methods log, increment, and update all close over the same lexical environment.

And every time createObject is called, a new execution context (stack frame) is created and a completely new variable x, and a new set of functions (log etc.) are created, that close over this new variable.

function createObject() {
  let x = 42;
  return {
    log() { console.log(x) },
    increment() { x++ },
    update(value) { x = value }
  }
}

const o = createObject()
o.increment()
o.log() // 43
o.update(5)
o.log() // 5
const p = createObject()
p.log() // 42

Example 3

If you are using variables declared using var, be careful you understand which variable you are closing over. Variables declared using var are hoisted. This is much less of a problem in modern JavaScript due to the introduction of let and const.

In the following code, each time around the loop, a new function inner is created, which closes over i. But because var i is hoisted outside the loop, all of these inner functions close over the same variable, meaning that the final value of i (3) is printed, three times.

function foo() {
  var result = []
  for (var i = 0; i < 3; i++) {
    result.push(function inner() { console.log(i) } )
  }
  return result
}

const result = foo()
// The following will print `3`, three times...
for (var i = 0; i < 3; i++) {
  result[i]() 
}

Final points:

Whenever a function is declared in JavaScript closure is created.
Returning a function from inside another function is the classic example of closure, because the state inside the outer function is implicitly available to the returned inner function, even after the outer function has completed execution.
Whenever you use eval() inside a function, a closure is used. The text you eval can reference local variables of the function, and in the non-strict mode, you can even create new local variables by using eval('var foo = …').
When you use new Function(…) (the Function constructor) inside a function, it does not close over its lexical environment: it closes over the global context instead. The new function cannot reference the local variables of the outer function.
A closure in JavaScript is like keeping a reference (NOT a copy) to the scope at the point of function declaration, which in turn keeps a reference to its outer scope, and so on, all the way to the global object at the top of the scope chain.
A closure is created when a function is declared; this closure is used to configure the execution context when the function is invoked.
A new set of local variables is created every time a function is called.