Code Security – The Case for Code Obfuscation

obfuscation

What are the top reasons to write obfuscated code, in terms of a real benefit to the people developing the code and to the business that runs it (if the code in question is in fact commercial code)? Are there documented cases (available online somewhere) that describe when obfuscation did more good than harm? Are there well-known examples where, for example, obfuscation was proven to meaningfully delay a malicious third party from getting at the code? It seems that, just as rolling up your car windows won't stop people from breaking them and stealing your stereo, obfuscating your code just keeps honest people honest.

=========

Background:

This is an attempt to purposely challenge my assumptions on this topic.

I'm big-time against using code obfuscation in general, but I'm curious if I'm missing something. I get why, in cases like JavaScript, minification helps things load faster (there's a real, functional benefit there), but I can't come up with a single reason why code obfuscation, used as an obstacle to discovering what a section of code or an algorithm does, is actually effective for any purpose whatsoever.

With open source being crazy popular, the question seems to be "share the code, or keep it proprietary?" When it comes to commercial code, I can understand why you can't share everything, and you've got the law on your side to fight theft.

BTW, if the reason someone is writing obfuscated code is "job security", then I would fire any programmer found to be consistently and purposely using obfuscation with the sole purpose of keeping their job, unless they could reasonably show that it had some business benefit. It's so completely anti-team that it's ridiculous, and it points to someone who's more concerned with keeping their job through misguided practices than with keeping it because they write awesome software.

I only mention this specific case because, while I realize people are usually joking, I'd like to deter any answers whose basic thrust is that obfuscation for job security alone is a good idea.

Best Answer

One very interesting use case for obfuscation is tracing the origin of illicit copies. Assuming that obfuscation is a relatively cheap operation, the original author can supply each client with a differently obfuscated version of the application. If an illicit copy is found, the author can compare it with the supplied versions and trace the piracy back to its source.

That's a form of steganography, inspired by and a variation of the "traitor tracing" cryptographic schemes. I have no idea if it's common¹, or even if it's a good idea, but I've seen it applied in practice under the following parameters:

  • Highly competitive nationwide market with just two vendors,
  • About 50 deployments covered the market,
  • Average development time for both applications was a couple of years (more or less),
  • Average obfuscation time for our application was a couple of hours,
  • Lifespan for both applications was expected to be about ten years.

The rationale was, of course, security through obscurity initially, and it evolved into the aforementioned scheme at some point². Both vendors had legal access to each other's binary code, and I think it's obvious that decompilation attempts from both were expected. Obfuscation did nothing in terms of security in the long run. Both vendors had highly motivated and talented teams working in an extremely profitable niche market. In the end our products were more similar than not, and any competitive advantage was gained through other, less obscure means.

I can't really expand, because (a) it was very early in my career and I didn't get a clear overview of the design decisions or the results of the tracing scheme (if any), and (b) some of my involvement with the project was under an NDA.

Another valid use case for obfuscation could be when you are somehow legally obliged to submit your code to a third party:

If your firm does IP work for technology companies, or is involved in cases involving software source code, you may be obliged to submit your client’s source code to the USPTO, a court or third party.

Since source code is considered a trade secret, most regulatory agencies use a "50%" rule. Source code submitted is obscured so that it cannot be used as-is.

IANAL, and the link is more relevant to hard copies of code rather than actual working code, so this might be completely irrelevant.

Now, as JavaScript is the canonical example for obfuscation, there's one side effect that's not commonly considered: obfuscated JavaScript can hide malicious code. Although there are definite advantages in minifying³ JavaScript, I don't see any point in actual obfuscation, and I'm happy Douglas Crockford agrees with me:

Then finally, there is that question of code privacy. This is a lost cause. There is no transformation that will keep a determined hacker from understanding your program. This turns out to be true for all programs in all languages, it is just more obviously true with JavaScript because it is delivered in source form. The privacy benefit provided by obfuscation is an illusion. If you don’t want people to see your programs, unplug your server.

As for obfuscation for "job security", that's a behaviour that should never pass code review, and if identified it shouldn't be tolerated. I wouldn't go as far as firing the culprit at first, but repeat offenders definitely deserve a good spanking, at least.

In conclusion, obfuscation is a typical example of security through obscurity; its only obvious merit is as a deterrent and nothing more. There might be creative use cases⁴ I don't know of, but in general the benefits are minimal at best.

1 After writing this I found this answer, which describes basically the same scheme, so it might be more common than I thought.
2 Although steganography is still security through obscurity.
3 Minification ~ removing whitespace and shortening tokens, not intentionally obscuring.
4 Does the International Obfuscated C Code Contest count?
