Why JavaScript is Not Compiled to Bytecode Before Network Transfer


You'll often see that JavaScript is transported over the web with content that doesn't need to be there: comments (particularly license headers), indentation and other whitespace ('\t', '\n'), and so on. Given enough time, this could end up wasting terabytes of data worldwide!
Would a JavaScript bytecode format cause another, bigger problem, or has nobody thought of this yet?

Best Answer

View Source

"View Source" was in the beginning, and still is to some extent, considered to be an important feature of the web. It is how generations of web developers learned web development, and the relevant standards bodies (ECMA TC39, W3C, WHATWG) still take it very seriously.

Minification

ECMAScript files are typically "minified" before being deployed. This includes removing all comments and all insignificant whitespace, renaming all identifiers to be as short as possible, and applying some higher-level optimizations such as dead-code elimination.
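For illustration, here is a hand-written sketch of what a minifier might do (not the output of any particular tool; all names are made up):

    // Before: readable source, with comments and whitespace.
    function computeTotalPrice(unitPrice, quantity) {
        // Flat 10% discount for bulk orders.
        var discount = quantity > 100 ? 0.9 : 1.0;
        return unitPrice * quantity * discount;
    }

    // After: comments and whitespace stripped, identifiers shortened.
    function c(u,q){var d=q>100?0.9:1;return u*q*d}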

Compression

Support for compression has existed in HTTP since HTTP/1.0 (early 1996). ECMAScript is text, and text compresses really well. In fact, ECMAScript is text with lots of redundancy (many occurrences of ;, {, }, (, ), ,, ., function, var, if, for, and so on), and compression algorithms thrive on redundancy. So, the amount of data actually transferred is much smaller than you make it out to be. As an experiment, try compressing an ECMAScript source file with one of the typical compression algorithms used on the web (e.g. gzip or deflate), and compare that to the size of the compiled bytecode of the same file.
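As a minimal sketch of that experiment using Node.js and its built-in zlib module (the file name compare-size.js is made up; pass any script as the first argument):

    // compare-size.js: print raw vs. gzip-compressed size of a file.
    const fs = require('fs');
    const zlib = require('zlib');

    const src = fs.readFileSync(process.argv[2]);
    const gz = zlib.gzipSync(src, { level: zlib.constants.Z_BEST_COMPRESSION });

    console.log('raw:     ' + src.length + ' bytes');
    console.log('gzipped: ' + gz.length + ' bytes');

Run it as, for example, node compare-size.js jquery.min.js.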

It turns out that compressed source code is actually pretty small, often comparable to or smaller than a typical bytecode file.

Also, there are specialized compression algorithms for what I will now term "web text".

Zopfli is an improved encoding algorithm for web text that is compatible with deflate/zlib. This means it can be decoded by any deflate/zlib-compliant decoder; in other words, it can be decompressed by every browser without changes. Compressing takes about 80 times longer than with deflate, for a 3%–8% improvement in output size over "naked" deflate. This might not make sense to do on-the-fly for dynamically created content, but pre-compressing something like jQuery might make sense.

Brotli is a newer compression algorithm based on LZ77, Huffman coding, context modeling, and some other tricks, e.g. a pre-defined dictionary of frequent text chunks extracted from a large corpus of web sites, texts, ECMAScript source files, CSS files, etc. It can achieve up to 25% better compression than deflate/zlib, and it is designed to be efficiently decoded on low-end portable devices.
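Node.js exposes Brotli through the same built-in zlib module, so a rough size comparison against gzip could look like this (a sketch, assuming Node.js 11.7 or later; the file name is made up):

    // brotli-vs-gzip.js: rough size comparison on one input file.
    const fs = require('fs');
    const zlib = require('zlib');

    const src = fs.readFileSync(process.argv[2]);
    const gz = zlib.gzipSync(src, { level: 9 });
    const br = zlib.brotliCompressSync(src, {
        // Maximum quality, as you would use when pre-compressing static assets.
        params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 }
    });

    console.log('gzip:   ' + gz.length + ' bytes');
    console.log('brotli: ' + br.length + ' bytes');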

Bytecode format

Which brings us to the next problem: there is no standardized bytecode format for ECMAScript. In fact, some implementations may not even use bytecode at all! For example, for the first couple of years, V8 compiled ECMAScript straight to native machine code, with no bytecode step in between. Chakra, SquirrelFish Extreme, and SpiderMonkey all use bytecode, but each uses a different one. dyn.js, TruffleJS, Nashorn, and Rhino don't use ECMAScript-specific bytecode; they compile to JVM bytecode. Likewise, IronJS compiles to CLI bytecode (CIL).

Now, you might say: why not define a standardized bytecode format for ECMAScript? The problems with this are two-fold:

  1. A bytecode format constrains the design of the execution engine. For example, look at JVMs: JVM implementations are much more similar to each other than ECMAScript engines are. Personally, I believe the "performance race" of the late 2000s / early 2010s would not have been possible without the wide range of experimentation that the lack of a standardized bytecode format afforded.

  2. Not only is it hard to get all ECMAScript engine vendors to agree on a common standardized bytecode format, but consider this: it doesn't make sense to add a bytecode format to the browser for ECMAScript alone. If you define a common bytecode format, it would be nice if it supported ActionScript, VBScript, Python, Ruby, Perl, Lua, PHP, etc. as well. But now you have the same problem as in #1, except exponentially increased: not only do all ECMAScript engine vendors need to agree on a common bytecode format, you also have to get the PHP, Perl, Ruby, Python, Lua, etc. communities to agree as well!

Caching

Well-known, widely used libraries are hosted at canonical URIs, from which they can be referenced by multiple sites. They therefore only need to be downloaded once and can be cached client-side.
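For a library served at a stable, versioned URI, the server can mark the response as cacheable essentially forever; a typical response header might look like this (illustrative values):

    Cache-Control: public, max-age=31536000, immutable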

CDN

Many libraries use CDNs, so they are actually served from a location close to the user.

Wasm / asm.js

WebAssembly (Wasm) is a compact binary instruction format that is currently being standardized by the W3C and is already shipping in Firefox, Chrome, Safari, and Edge. However, it is not designed as a bytecode format for ECMAScript; rather, it is designed as a low-level portable machine code and as a compilation target for languages like C, C++, and Rust.

Before Wasm, there was already asm.js, which had similar goals, but it was designed as a syntactic and semantic subset of ECMAScript: you could run it unmodified in a non-asm.js-aware engine, and it would work, just much more slowly.
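To make that concrete, here is a minimal asm.js-style module (a sketch of the idiom, not taken from any real codebase):

    // Plain ECMAScript, so any engine can run it; the "use asm" prologue
    // and the |0 coercions let an asm.js-aware engine infer that everything
    // here is an int32 and compile it ahead of time.
    function AsmModule(stdlib, foreign, heap) {
        "use asm";
        function add(a, b) {
            a = a | 0;            // coerce parameter to int32
            b = b | 0;
            return (a + b) | 0;   // the result is int32 too
        }
        return { add: add };
    }

    var add = AsmModule(this, {}, new ArrayBuffer(0x10000)).add;
    // add(2, 3) === 5 in any engine; only the speed differs.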
