There are all kinds of techniques for high-performance transaction processing and the one in Fowler's article is just one of many at the bleeding edge. Rather than listing a bunch of techniques which may or may not be applicable to anyone's situation, I think it's better to discuss the basic principles and how LMAX addresses a large number of them.
For a high-scale transaction processing system you want to do all of the following as much as possible:
1. Minimize time spent in the slowest storage tiers. From fastest to slowest on a modern server you have: CPU/L1 -> L2 -> L3 -> RAM -> Disk/LAN -> WAN. The jump from even the fastest modern magnetic disk to the slowest RAM is over 1000x for sequential access; random access is even worse.

2. Minimize or eliminate time spent waiting. This means sharing as little state as possible and, if state must be shared, avoiding explicit locks whenever possible (a small sketch of lock-free sharing follows this list).

3. Spread the workload. CPUs haven't gotten much faster in the past several years, but they have gotten smaller, and 8 cores is pretty common on a server. Beyond that, you can even spread the work over multiple machines, which is Google's approach; the great thing about this is that it scales everything, including I/O.
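To make #2 concrete, here's the sketch promised above: a minimal C# comparison (my own illustration, nothing from LMAX) of a counter guarded by an explicit lock versus a lock-free one built on `Interlocked`:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LockVersusLockFree
{
    static long _locked;                     // guarded by an explicit lock
    static long _lockFree;                   // updated with atomic instructions
    static readonly object _gate = new object();

    static void Main()
    {
        // Eight workers bump each counter a million times apiece.
        Parallel.For(0, 8, _ =>
        {
            for (int n = 0; n < 1_000_000; n++)
            {
                lock (_gate) { _locked++; }            // waits under contention
                Interlocked.Increment(ref _lockFree);  // never blocks
            }
        });

        Console.WriteLine($"{_locked} / {_lockFree}"); // 8000000 / 8000000
    }
}
```

Both counters end up correct; the difference is that the lock-free version never parks a thread while it waits its turn.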
According to Fowler, LMAX takes the following approach to each of these:
1. Keep all state in memory at all times. Most database engines will actually do this anyway, if the entire database can fit in memory, but they don't want to leave anything to chance, which is understandable on a real-time trading platform. In order to pull this off without adding a ton of risk, they had to build a bunch of lightweight backup and failover infrastructure (the first sketch after this list shows the general idea).

2. Use a lock-free queue ("disruptor") for the stream of input events (see the second sketch below). Contrast this with traditional durable message queues, which are definitively not lock-free and in fact usually involve painfully slow distributed transactions.

3. Not much. LMAX throws this one under the bus on the basis that its workloads are interdependent: the outcome of one changes the parameters for the others. This is a critical caveat, and one which Fowler explicitly calls out. They do make some use of concurrency in order to provide failover capabilities, but all of the business logic is processed on a single thread.
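On #1, the reason keeping everything in memory isn't reckless is that the input events themselves are durable, so state can always be rebuilt by replaying them. A toy sketch of that idea (my illustration; LMAX's real infrastructure is far more involved, and all the names here are made up):

```csharp
using System;
using System.Collections.Generic;

// Toy event-sourced account: all state lives in RAM, and durability comes
// from journaling the input events, not from writing state to disk.
record Deposit(decimal Amount);

class Account
{
    public decimal Balance { get; private set; }

    public void Apply(Deposit e) => Balance += e.Amount;

    // Failover/recovery: rebuild the in-memory state from the event journal.
    public static Account Replay(IEnumerable<Deposit> journal)
    {
        var account = new Account();
        foreach (var e in journal) account.Apply(e);
        return account;
    }
}

class Demo
{
    static void Main()
    {
        var journal = new List<Deposit> { new(100m), new(250m) };
        var recovered = Account.Replay(journal);
        Console.WriteLine(recovered.Balance);   // 350
    }
}
```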
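And on #2, the flavor of the lock-free queue idea in miniature: a single-producer/single-consumer ring buffer that coordinates through two counters instead of locks. This is a sketch of the general technique only, not the actual Disruptor (which adds multiple consumers, batching, and cache-line padding):

```csharp
using System.Threading;

// Bare-bones single-producer/single-consumer ring buffer. With exactly one
// producer and one consumer, two monotonic counters are enough: no locks,
// no CAS loops, just ordered reads and writes.
class SpscRingBuffer<T>
{
    private readonly T[] _slots;
    private readonly int _mask;
    private long _head;  // next slot to read  (consumer-owned)
    private long _tail;  // next slot to write (producer-owned)

    public SpscRingBuffer(int capacityPowerOfTwo)
    {
        _slots = new T[capacityPowerOfTwo];
        _mask = capacityPowerOfTwo - 1;
    }

    public bool TryEnqueue(T item)
    {
        long tail = Volatile.Read(ref _tail);
        if (tail - Volatile.Read(ref _head) == _slots.Length)
            return false;                       // buffer full
        _slots[tail & _mask] = item;
        Volatile.Write(ref _tail, tail + 1);    // publish only after the write
        return true;
    }

    public bool TryDequeue(out T item)
    {
        long head = Volatile.Read(ref _head);
        if (head == Volatile.Read(ref _tail))
        {
            item = default!;
            return false;                       // buffer empty
        }
        item = _slots[head & _mask];
        Volatile.Write(ref _head, head + 1);    // free the slot after the read
        return true;
    }
}
```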
LMAX is not the only approach to high-scale OLTP. And although it's quite brilliant in its own right, you do not need to use bleeding-edge techniques in order to pull off that level of performance.
Of all of the principles above, #3 is probably the most important and the most effective, because, frankly, hardware is cheap. If you can properly partition the workload across half a dozen cores and several dozen machines, then the sky's the limit for conventional parallel-computing techniques. You'd be surprised how much throughput you can get with nothing but a bunch of message queues and a round-robin distributor. It's obviously not as efficient as LMAX - actually not even close - but throughput, latency, and cost-effectiveness are separate concerns, and here we're talking specifically about throughput.
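In case "a bunch of message queues and a round-robin distributor" sounds hand-wavy, here it is boiled down to a toy C# sketch (in-process queues standing in for whatever real messaging product you'd actually deploy):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Toy round-robin distributor: one inbound stream fanned out over N worker
// queues, each drained by its own task.
class RoundRobinDistributor
{
    private readonly BlockingCollection<string>[] _queues;
    private int _next;

    public RoundRobinDistributor(int workers, Action<string> handler)
    {
        _queues = new BlockingCollection<string>[workers];
        for (int w = 0; w < workers; w++)
        {
            var queue = new BlockingCollection<string>();
            _queues[w] = queue;
            Task.Run(() =>
            {
                foreach (var message in queue.GetConsumingEnumerable())
                    handler(message);        // each worker owns its own queue
            });
        }
    }

    // Called from a single dispatcher thread; each message goes to the next
    // queue in turn, so workers share no state with one another.
    public void Dispatch(string message)
    {
        _queues[_next].Add(message);
        _next = (_next + 1) % _queues.Length;
    }
}

class Demo
{
    static void Main()
    {
        var distributor = new RoundRobinDistributor(4, m => Console.WriteLine(m));
        for (int n = 0; n < 20; n++) distributor.Dispatch($"order {n}");
        Console.ReadLine();   // crude: let the workers drain before exiting
    }
}
```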
If you have the same sort of special needs that LMAX does - in particular, a shared state which corresponds to a business reality as opposed to a hasty design choice - then I'd suggest trying out their component, because I haven't seen much else that's suited to those requirements. But if we're simply talking about high scalability then I'd urge you to do more research into distributed systems, because they are the canonical approach used by most organizations today (Hadoop and related projects, ESB and related architectures, CQRS which Fowler also mentions, and so on).
SSDs are also going to be a game-changer; arguably, they already are. You can now have permanent storage with access times far closer to RAM's than any spinning disk ever managed, and although server-grade SSDs are still horribly expensive, they will eventually come down in price as adoption grows. SSD performance has been benchmarked extensively, the results are pretty mind-boggling, and they will only get better over time, so the whole "keep everything in memory" concept is a lot less important than it used to be. Once again, I'd try to focus on concurrency whenever possible.
You write a program to solve a problem. That problem is accompanied by a specific set of requirements for solving it. If those requirements are met, the problem is solved and the objective is achieved.
That's it.
Now, the reason that best practices are observed is that some requirements have to do with maintainability, testability, performance guarantees, and so forth. Consequently, you have those pesky folks like me who require things like proper coding style. It doesn't take that much more effort to dot your I's and cross your T's, and it is a gesture of respect to those who have to read your code later and figure out what it does.
For large systems, this kind of restraint and discipline is essential, because you have to play nice with others to get it all to work, and you have to minimize technical debt so that the project doesn't collapse under its own weight.
At the opposite end of the spectrum are those one-off utilities that you write to solve a specific problem right now, utilities that you'll never use again. In those cases, style and best practices are completely irrelevant; you hack the thing together, run it, and get on with the next thing.
So, as with so many things in software development, it depends.
TL;DR - the two are equivalent at the IL level.

DotNetFiddle makes this pretty easy to answer, as it allows you to see the resulting IL.
I used a slightly different variation of your loop construct in order to make my testing quicker. The snippets below are sketches of the shape I tested (the loop body is a placeholder; the placement of the declaration is the point):
Variation 1 (declaration inside the loop):
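```csharp
using System;

class Program
{
    static void Main()
    {
        // Sketch: the local is declared anew on every pass through the loop.
        for (int x = 0; x < 5; x++)
        {
            int i = x;
            Console.WriteLine(i);
        }
    }
}
```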
Variation 2 (declaration hoisted out of the loop):
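```csharp
using System;

class Program
{
    static void Main()
    {
        // Sketch: the local is declared once, before the loop.
        int i;
        for (int x = 0; x < 5; x++)
        {
            i = x;
            Console.WriteLine(i);
        }
    }
}
```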
In both cases, the compiled IL output rendered the same.
So to answer your question: the compiler optimizes out the declaration of the variable, and renders the two variations equivalent.
To my understanding, the C# compiler moves all local variable declarations to the beginning of the method, but I couldn't find a good source that clearly stated that². In this particular example, you can see that it moved them up with a `.locals init` statement along these lines:
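```
// sketch of the init block; slot order and numbering vary by compiler and build
.locals init ([0] int32 i,
              [1] int32 x)
```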
Wherein we get a bit too obsessive in making comparisons...

Case A: do all variables get moved up?
To dig into this a bit further, I tested a function along these lines (a sketch; the branch bodies are placeholders):
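```csharp
using System;

class Program
{
    // Sketch of the test: which local gets declared depends on the comparison.
    static void Test(int value)
    {
        if (value % 2 == 0)
        {
            int i = value;
            Console.WriteLine(i);
        }
        else
        {
            string j = value.ToString();
            Console.WriteLine(j);
        }
    }

    static void Main() => Test(42);
}
```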
The difference here is that we declare either an `int i` or a `string j` based upon the comparison. Again, the compiler moves all the local variables to the top of the function² with:
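```
// sketch; the bool is a compiler temporary for the comparison (Debug builds)
.locals init ([0] int32 i,
              [1] string j,
              [2] bool V_2)
```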
I found it interesting to note that even though `int i` won't be declared in this example, the code to support it is still generated.

Case B: What about `foreach` instead of `for`?

It was pointed out that `foreach` has different behavior than `for` and that I wasn't checking the same thing that had been asked about. So I put in these two sections of code to compare the resulting IL.
`int` declaration outside of the loop (a sketch, with a placeholder collection):
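```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int i;                                       // declared once, outside
        foreach (var item in Enumerable.Range(0, 5))
        {
            i = item;
            Console.WriteLine(i);
        }
    }
}
```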
`int` declaration inside of the loop:
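```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        foreach (var item in Enumerable.Range(0, 5))
        {
            int i = item;                            // declared on every pass
            Console.WriteLine(i);
        }
    }
}
```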
The resulting IL with the `foreach` loop was indeed different from the IL generated using the `for` loop. Specifically, the init block and the loop section changed.

The `foreach` approach generated more local variables and required some additional branching. Essentially, on the first pass it jumps to the end of the loop to get the first iteration of the enumeration, then jumps back to almost the top of the loop to execute the loop body. From there it continues to loop through as you'd expect.

But beyond the branching differences caused by using the `for` and `foreach` constructs, there was no difference in the IL based upon where the `int i` declaration was placed. So we're still at the two approaches being equivalent.

Case C: What about different compiler versions?
In a comment that was left¹, there was a link to an SO question regarding a warning about variable access with `foreach` and closures. The part that really caught my eye in that question was that there may have been differences in how the .NET 4.5 compiler worked versus earlier versions of the compiler.
And that's where the DotNetFiddle site let me down - all it had available was .NET 4.5 and a version of the Roslyn compiler. So I brought up a local instance of Visual Studio and started testing the code. To make sure I was comparing the same things, I compared locally built code targeting .NET 4.5 to the DotNetFiddle output.
The only difference that I noted was with the local init block and variable declaration. The local compiler was a bit more specific in naming the variables.
But that minor difference aside, it was so far, so good: I had equivalent IL output between the DotNetFiddle compiler and what my local VS instance was producing.
So I then rebuilt the project targeting .NET 4, .NET 3.5, and for good measure .NET 3.5 Release mode.
And in all three of those additional cases, the generated IL was equivalent. The targeted .NET version had no effect on the IL that was generated in these samples.
To summarize this adventure: I think we can confidently say that the compiler does not care where you declare the primitive type, and that there is no effect upon memory or performance with either declaration method. And that holds true regardless of using a `for` or `foreach` loop.

I considered running yet another case that incorporated a closure inside of the `foreach` loop. But you had asked about the effects of where a primitive type variable was declared, so I figured I was delving too far beyond what you were interested in asking about. The SO question I mentioned earlier has a great answer that provides a good overview about closure effects on `foreach` iteration variables.

¹ Thank you to Andy for providing the original link to the SO question addressing closures within `foreach` loops.

² It's worth noting that the ECMA-335 spec addresses this in section I.12.3.2.2, 'Local variables and arguments'. I had to see the resulting IL and then read the section for it to be clear regarding what was going on. Thanks to ratchet freak for pointing that out in chat.