I recently interviewed at Amazon. During a coding session, the interviewer asked why I declared a variable in a method. I explained my process, and he challenged me to solve the same problem with fewer variables. For example (this wasn't from the interview), I started with Method A, then improved it to Method B by removing int s. He was pleased and said this would reduce the method's memory usage.
I understand the logic behind it, but my question is:
When is it appropriate to use Method A vs. Method B, and vice versa?
You can see that Method A is going to have higher memory usage, since int s is declared, but it only has to perform one calculation, i.e. a + b. On the other hand, Method B has lower memory usage, but has to perform two calculations, i.e. a + b twice. When do I use one technique over the other? Is one of the techniques always preferred over the other? What are the things to consider when evaluating the two methods?
Method A:
private bool IsSumInRange(int a, int b)
{
    int s = a + b;
    if (s > 1000 || s < -1000) return false;
    else return true;
}
Method B:
private bool IsSumInRange(int a, int b)
{
    if (a + b > 1000 || a + b < -1000) return false;
    else return true;
}
Best Answer
Instead of speculating about what may or may not happen, let's just look, shall we? I'll have to use C++ since I don't have a C# compiler handy (though see the C# example from VisualMelon), but I'm sure the same principles apply regardless.
We'll include the two alternatives you encountered in the interview. We'll also include a version that uses abs, as suggested by some of the answers.
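As a sketch, test.cpp looks something like this (the first two bodies are carried over from the question; the abs variant is my reconstruction of what the other answers suggest):

#include <cstdlib>  // std::abs

bool IsSumInRangeWithVar(int a, int b)
{
    int s = a + b;
    if (s > 1000 || s < -1000) return false;
    else return true;
}

bool IsSumInRangeWithoutVar(int a, int b)
{
    if (a + b > 1000 || a + b < -1000) return false;
    else return true;
}

// The abs-based variant suggested in other answers.
bool IsSumInRangeSuperOptimized(int a, int b)
{
    return std::abs(a + b) <= 1000;
}

Now compile it with no optimization whatsoever: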
g++ -c -o test.o test.cpp
Now we can see precisely what this generates:
objdump -d test.o
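Abridged, illustrative output (names demangled for readability; only the prologues matter here, and exact offsets depend on the compiler version):

<IsSumInRangeWithVar>:
   push   %rbp
   mov    %rsp,%rbp
   mov    %edi,-0x14(%rbp)   # a spilled 0x14 bytes into the frame
   mov    %esi,-0x18(%rbp)   # b just below it
   ...                       # add, store the sum into s, compare

<IsSumInRangeWithoutVar>:
   push   %rbp
   mov    %rsp,%rbp
   mov    %edi,-0x4(%rbp)    # a at the very top of the frame
   mov    %esi,-0x8(%rbp)    # b just below it
   ...                       # a + b recomputed for each comparison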
We can see from the stack addresses (for example, the -0x4 in mov %edi,-0x4(%rbp) versus the -0x14 in mov %edi,-0x14(%rbp)) that IsSumInRangeWithVar() uses 16 extra bytes on the stack.

Because IsSumInRangeWithoutVar() allocates no space on the stack to store the intermediate value s, it has to recalculate it, which makes that implementation 2 instructions longer.

Funny: IsSumInRangeSuperOptimized() looks a lot like IsSumInRangeWithoutVar(), except it compares to -1000 first and 1000 second.

Now let's compile with only the most basic optimizations:
g++ -O1 -c -o test.o test.cpp
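The result, abridged (a sketch reconstructed from the lea/setbe discussion below; names demangled, exact encodings will vary):

<IsSumInRangeWithVar>:
   lea    0x3e8(%rdi,%rsi,1),%eax   # eax = a + b + 1000 (0x3e8)
   cmp    $0x7d0,%eax               # compare against 2000 (0x7d0)
   setbe  %al                       # al = 1 if eax <= 2000, unsigned
   retq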
Would you look at that: each variant is identical. The compiler is able to do something quite clever: abs(a + b) <= 1000 is equivalent to a + b + 1000 <= 2000, given that setbe does an unsigned comparison, so a negative number becomes a very large positive number (for example, if a + b is -1500, then a + b + 1000 is -500, which as an unsigned value is enormous, so the <= 2000 check correctly fails). The lea instruction can actually perform all these additions in one instruction and eliminate all the conditional branches.

To answer your question: almost always, the thing to optimize for is not memory or speed, but readability. Reading code is a lot harder than writing it, and reading code that's been mangled to "optimize" it is a lot harder than reading code that's been written to be clear. More often than not, these "optimizations" have negligible, or as in this case exactly zero, actual impact on performance.
Let's measure! I've transcribed the examples to Python:
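A sketch of that transcription — the function bodies mirror the C++ versions above, and the dis/timeit harness is an assumed setup:

from dis import dis
import timeit

def IsSumInRangeWithVar(a, b):
    s = a + b
    if s > 1000 or s < -1000:
        return False
    else:
        return True

def IsSumInRangeWithoutVar(a, b):
    if a + b > 1000 or a + b < -1000:
        return False
    else:
        return True

def IsSumInRangeSuperOptimized(a, b):
    return abs(a + b) <= 1000

for f in (IsSumInRangeWithVar, IsSumInRangeWithoutVar, IsSumInRangeSuperOptimized):
    print('====', f.__name__, '====')
    dis(f)  # bytecode is a near-literal translation: no optimization happens
    print(timeit.timeit('f(42, 17)', globals={'f': f}))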
Run with Python 3.5.2, this prints each function's disassembly followed by its timing.
Disassembly in Python isn't terribly interesting, since the bytecode "compiler" doesn't do much in the way of optimization.
The performance of the three functions is nearly identical. We might be tempted to go with IsSumInRangeWithVar() due to its marginal speed gain, though I'll add that as I was trying different parameters to timeit, sometimes IsSumInRangeSuperOptimized() came out fastest. I suspect external factors are responsible for the difference, rather than any intrinsic advantage of any implementation.

If this is really performance-critical code, an interpreted language is simply a very poor choice. Running the same program with pypy, which uses JIT compilation to eliminate a lot of the interpreter overhead, yielded a performance improvement of one to two orders of magnitude. I was quite shocked to see IsSumInRangeWithVar() come out an order of magnitude faster than the others, so I changed the order of the benchmarks and ran again. It seems it's not actually anything about the implementation that makes it fast, but rather the order in which I do the benchmarking!
I'd love to dig into this more deeply, because honestly I don't know why this happens. But I believe the point has been made: micro-optimizations like whether to declare an intermediate value as a variable are rarely relevant. Whether you're using an interpreted language or a highly optimizing compiler, the first objective is still to write clear code.
If further optimization might be required, benchmark. Remember that the best optimizations come not from the little details but from the bigger algorithmic picture: pypy is going to be an order of magnitude faster than cpython for repeated evaluation of the same function because it uses a faster strategy (JIT compilation vs. interpretation) to evaluate the program. And there's the coded algorithm to consider as well: a search through a B-tree will be faster than a search through a linked list.
After ensuring you're using the right tools and algorithms for the job, be prepared to dive deep into the details of the system. The results can be very surprising, even for experienced developers, and this is why you must have a benchmark to quantify the changes.