Given a number, find the next higher number which has the exact same set of digits as the original number

algorithmdigits

I just bombed an interview and made pretty much zero progress on my interview question. Can anyone let me know how to do this? I tried searching online but couldn't find anything:

Given a number, find the next higher number which has the exact same
set of digits as the original number. For example: given 38276 return
38627

I wanted to begin by finding the index of the first digit (from the right) that was less than the ones digit. Then I would rotate the last digits in the subset such that it was the next biggest number comprised of the same digits, but got stuck.

The interviewer also suggested trying to swap digits one at a time, but I couldn't figure out the algorithm and just stared at a screen for like 20-30 minutes. Needless to say, I think I'm going to have to continue the job hunt.

edit: for what its worth, I was invited to the next round of interviews

Best Answer

You can do it in O(n) (where n is the number of digits) like this:

Starting from the right, you find the first pair-of-digits such that the left-digit is smaller than the right-digit. Let's refer to the left-digit by "digit-x". Find the smallest number larger than digit-x to the right of digit-x, and place it immediately left of digit-x. Finally, sort the remaining digits in ascending order - since they were already in descending order, all you need to do is reverse them (save for digit-x, which can be placed in the correct place in O(n)).

An example will make this more clear:

123456784987654321
start with a number

123456784 987654321
         ^the first place from the right where the left-digit is less than the right  
         Digit "x" is 4

123456784 987654321
              ^find the smallest digit larger than 4 to the right

123456785 4 98764321
        ^place it to the left of 4

123456785 4 12346789
123456785123446789
         ^sort the digits to the right of 5.  Since all of them except 
         the '4' were already in descending order, all we need to do is 
         reverse their order, and find the correct place for the '4'

Proof of correctness:

Let's use capital letters to define digit-strings and lower-case for digits. The syntax AB means "the concatenation of strings A and B". < is lexicographical ordering, which is the same as integer ordering when the digit-strings are of equal length.

Our original number N is of the form AxB, where x is a single digit and B is sorted descending.
The number found by our algorithm is AyC, where y ∈ B is the smallest digit > x (it must exist due to the way x was chosen, see above), and C is sorted ascending.

Assume there is some number (using the same digits) N' such that AxB < N' < AyC. N' must begin with A or else it could not fall between them, so we can write it in the form AzD. Now our inequality is AxB < AzD < AyC, which is equivalent to xB < zD < yC where all three digit-strings contain the same digits.

In order for that to be true, we must have x <= z <= y. Since y is the smallest digit > x, z cannot be between them, so either z = x or z = y. Say z = x. Then our inequality is xB < xD < yC, which means B < D where both B and D have the same digits. However, B is sorted descending, so there is no string with those digits larger than it. Thus we cannot have B < D. Following the same steps, we see that if z = y, we cannot have D < C.

Therefore N' cannot exist, which means our algorithm correctly finds the next largest number.

Portable algorithms that don't need (or benefit from) any HW support

A pre-populated table lookup method can be very fast if your CPU has a large cache and you are doing lots of these operations in a tight loop. However it can suffer because of the expense of a 'cache miss', where the CPU has to fetch some of the table from main memory. (Look up each byte separately to keep the table small.) If you want popcount for a contiguous range of numbers, only the low byte is changing for groups of 256 numbers, making this very good.

If you know that your bytes will be mostly 0's or mostly 1's then there are efficient algorithms for these scenarios, e.g. clearing the lowest set with a bithack in a loop until it becomes zero.

I believe a very good general purpose algorithm is the following, known as 'parallel' or 'variable-precision SWAR algorithm'. I have expressed this in a C-like pseudo language, you may need to adjust it to work for a particular language (e.g. using uint32_t for C++ and >>> in Java):

GCC10 and clang 10.0 can recognize this pattern / idiom and compile it to a hardware popcnt or equivalent instruction when available, giving you the best of both worlds. (https://godbolt.org/z/qGdh1dvKK)

int numberOfSetBits(uint32_t i)
{
     // Java: use int, and use >>> instead of >>. Or use Integer.bitCount()
     // C or C++: use uint32_t
     i = i - ((i >> 1) & 0x55555555);        // add pairs of bits
     i = (i & 0x33333333) + ((i >> 2) & 0x33333333);  // quads
     i = (i + (i >> 4)) & 0x0F0F0F0F;        // groups of 8
     return (i * 0x01010101) >> 24;          // horizontal sum of bytes
}

For JavaScript: coerce to integer with |0 for performance: change the first line to i = (i|0) - ((i >> 1) & 0x55555555);

This has the best worst-case behaviour of any of the algorithms discussed, so will efficiently deal with any usage pattern or values you throw at it. (Its performance is not data-dependent on normal CPUs where all integer operations including multiply are constant-time. It doesn't get any faster with "simple" inputs, but it's still pretty decent.)

References:

How this SWAR bithack works:

i = i - ((i >> 1) & 0x55555555);

The first step is an optimized version of masking to isolate the odd / even bits, shifting to line them up, and adding. This effectively does 16 separate additions in 2-bit accumulators (SWAR = SIMD Within A Register). Like (i & 0x55555555) + ((i>>1) & 0x55555555).

The next step takes the odd/even eight of those 16x 2-bit accumulators and adds again, producing 8x 4-bit sums. The i - ... optimization isn't possible this time so it does just mask before / after shifting. Using the same 0x33... constant both times instead of 0xccc... before shifting is a good thing when compiling for ISAs that need to construct 32-bit constants in registers separately.

The final shift-and-add step of (i + (i >> 4)) & 0x0F0F0F0F widens to 4x 8-bit accumulators. It masks after adding instead of before, because the maximum value in any 4-bit accumulator is 4, if all 4 bits of the corresponding input bits were set. 4+4 = 8 which still fits in 4 bits, so carry between nibble elements is impossible in i + (i >> 4).

So far this is just fairly normal SIMD using SWAR techniques with a few clever optimizations. Continuing on with the same pattern for 2 more steps can widen to 2x 16-bit then 1x 32-bit counts. But there is a more efficient way on machines with fast hardware multiply:

Once we have few enough "elements", a multiply with a magic constant can sum all the elements into the top element. In this case byte elements. Multiply is done by left-shifting and adding, so a multiply of x * 0x01010101 results in x + (x<<8) + (x<<16) + (x<<24). Our 8-bit elements are wide enough (and holding small enough counts) that this doesn't produce carry into that top 8 bits.

A 64-bit version of this can do 8x 8-bit elements in a 64-bit integer with a 0x0101010101010101 multiplier, and extract the high byte with >>56. So it doesn't take any extra steps, just wider constants. This is what GCC uses for __builtin_popcountll on x86 systems when the hardware popcnt instruction isn't enabled. If you can use builtins or intrinsics for this, do so to give the compiler a chance to do target-specific optimizations.

With full SIMD for wider vectors (e.g. counting a whole array)

This bitwise-SWAR algorithm could parallelize to be done in multiple vector elements at once, instead of in a single integer register, for a speedup on CPUs with SIMD but no usable popcount instruction. (e.g. x86-64 code that has to run on any CPU, not just Nehalem or later.)

However, the best way to use vector instructions for popcount is usually by using a variable-shuffle to do a table-lookup for 4 bits at a time of each byte in parallel. (The 4 bits index a 16 entry table held in a vector register).

On Intel CPUs, the hardware 64bit popcnt instruction can outperform an SSSE3 PSHUFB bit-parallel implementation by about a factor of 2, but only if your compiler gets it just right. Otherwise SSE can come out significantly ahead. Newer compiler versions are aware of the popcnt false dependency problem on Intel.

https://github.com/WojciechMula/sse-popcount state-of-the-art x86 SIMD popcount for SSSE3, AVX2, AVX512BW, AVX512VBMI, or AVX512 VPOPCNT. Using Harley-Seal across vectors to defer popcount within an element. (Also ARM NEON)
Counting 1 bits (population count) on large data using AVX-512 or AVX-2
related: https://github.com/mklarqvist/positional-popcount - separate counts for each bit-position of multiple 8, 16, 32, or 64-bit integers. (Again, x86 SIMD including AVX-512 which is really good at this, with vpternlogd making Harley-Seal very good.)

Easy interview question got harder: given numbers 1..100, find the missing number(s) given exactly k are missing

Here's a summary of Dimitris Andreou's link.

Remember sum of i-th powers, where i=1,2,..,k. This reduces the problem to solving the system of equations

a₁ + a₂ + ... + a_k = b₁

a₁² + a₂² + ... + a_k² = b₂

...

a₁^k + a₂^k + ... + a_k^k = b_k

Using Newton's identities, knowing b_i allows to compute

c₁ = a₁ + a₂ + ... a_k

c₂ = a₁a₂ + a₁a₃ + ... + a_k-1a_k

...

c_k = a₁a₂ ... a_k

If you expand the polynomial (x-a₁)...(x-a_k) the coefficients will be exactly c₁, ..., c_k - see Viète's formulas. Since every polynomial factors uniquely (ring of polynomials is an Euclidean domain), this means a_i are uniquely determined, up to permutation.

This ends a proof that remembering powers is enough to recover the numbers. For constant k, this is a good approach.

However, when k is varying, the direct approach of computing c₁,...,c_k is prohibitely expensive, since e.g. c_k is the product of all missing numbers, magnitude n!/(n-k)!. To overcome this, perform computations in Z_q field, where q is a prime such that n <= q < 2n - it exists by Bertrand's postulate. The proof doesn't need to be changed, since the formulas still hold, and factorization of polynomials is still unique. You also need an algorithm for factorization over finite fields, for example the one by Berlekamp or Cantor-Zassenhaus.

High level pseudocode for constant k:

Compute i-th powers of given numbers
Subtract to get sums of i-th powers of unknown numbers. Call the sums b_i.
Use Newton's identities to compute coefficients from b_i; call them c_i. Basically, c₁ = b₁; c₂ = (c₁b₁ - b₂)/2; see Wikipedia for exact formulas
Factor the polynomial x^k-c₁x^k-1 + ... + c_k.
The roots of the polynomial are the needed numbers a₁, ..., a_k.

For varying k, find a prime n <= q < 2n using e.g. Miller-Rabin, and perform the steps with all numbers reduced modulo q.

EDIT: The previous version of this answer stated that instead of Z_q, where q is prime, it is possible to use a finite field of characteristic 2 (q=2^(log n)). This is not the case, since Newton's formulas require division by numbers up to k.

Best Answer

Related Solutions

How to count the number of set bits in a 32-bit integer

Portable algorithms that don't need (or benefit from) any HW support

How this SWAR bithack works:

With full SIMD for wider vectors (e.g. counting a whole array)

Easy interview question got harder: given numbers 1..100, find the missing number(s) given exactly k are missing

Related Topic