# The JavaScript Binary AST

.center[A proposal to speed up the web]

.center[Mozilla, Bloomberg, Facebook]

.center[David Teller, Mozilla]

.center[Fosdem 2018]

---

## Making the web fast enough?

- On the web, there's no such thing as fast enough.
- JavaScript has become very fast but...

---

## ...looking at trends

- How much JS in Google Sheets, Google Docs, Yahoo!, LinkedIn, Facebook?
    - 3-7+ MB of compressed JS code.

--

- Updated *very* often.
    - Sometimes every 5 minutes.

--

- Facebook: 500-900ms just *parsing* JavaScript
    - Chrome & Firefox.

--

- And things are only getting worse.

---

## Why is JavaScript loading so slow?

---

### Loading JavaScript

1. Download source file.
1. Decompress source file.
1. Convert encoding.
1. Tokenize.
1. Parse.
1. Generate bytecode.
1. Start execution.

---

### Step 1. Get the code

* Download and decompress source file.
* Convert encoding.

(typically two or three successive tasks)

---

### Step 2. Tokenize the text (1)

```js
function foo(x) {
  return y;
}
```

.center[⇓]

* `Token.FunctionKeyword`
* `Token.Identifier(foo)`
* `Token.LPar`
* `Token.Identifier(x)`
* `Token.RPar`
* `Token.LBrace`
* ...

---

### Step 2. Tokenize the text (2)

Exercises:

* Is `for` an identifier or a keyword?

--

* What does `/` mean?

--

* Is `"use strict"` a string or a directive?

--

* How can I store my string efficiently?

--

* ...

--

**Hint:** it depends.

--

Yeah, tokenization is hard.

---

### Step 3. Parse the tokens (1)

`[FunctionKeyword, Identifier(foo), ...]`

--

.center[⇓]

--

```yaml
FunctionDeclaration:
  isAsync: false
  isGenerator: false
  scope: ...
  name:
    BindingIdentifier:
      name: "foo"
  params:
    - FormalParameters:
        items:
          - BindingIdentifier:
              name: "x"
        rest: null
  body: ...
```

---

### Step 3. Parse the tokens (2)

Exercise:

```javascript
var x = 10;
function foo(isReady) {
  if (isReady) {
    return x + 10;
  }
  // ...
}
```

What does `foo(true)` return?

--

**Hint:** it depends.

---

### Step 3. Parse the tokens (3)

Parsing JS is hard:

```js
// Program A
var x = 10;
function foo(isReady) {
  if (isReady) {
    return x + 10;
  }
}
console.log(foo(true)); // 20

// Program B (a separate script): one extra line at the end of `foo`
var x = 10;
function foo(isReady) {
  if (isReady) {
    return x + 10; // the local `x` is hoisted and still `undefined` here
  }
  var x = 10;
}
console.log(foo(true)); // NaN
```

---

### Step 3. Parse the tokens (4)

* ...handle variables;

--

* ...handle `this`;

--

* ...handle `eval`;

--

* ...handle `with`;

--

* ...handle `"use strict"`;

--

* also, you can't skip anything.

---

### Step 4. Wait, there's more!

* Perform safety checks on the code.
* Generate browser-specific bytecode.
* Execute!

---

## So what can we make faster?

1. Download source file.
1. Decompress source file.
1. Convert encoding.
1. Tokenize.
1. Parse.
1. Perform safety checks.
1. Generate bytecode.
1. Start execution.

---

### Things people have tried

* Browser improvements
    * Lazy parsers (1)
    * Bytecode caching (2)
* JavaScript frameworks/toolchains
    * Minifiers (1)
    * Lazy loaders (3)
* Browser APIs
    * Wasm (3)
    * ServiceWorker loaders (3, 4)

.small[
(1) May decrease overall performance.

(2) Only when the JS of the page doesn't change.

(3) Requires a rewrite. Might not help.

(4) Hurts the web.
]

---

## Introducing the JavaScript Binary AST

---

### What is it?

* A proposal for the JavaScript language, by Mozilla, Bloomberg, Facebook.
* A new file format for JavaScript code.
* Smaller than .js, much faster to parse.
* **Not** uglified.
* **Not** a bytecode.

---

### Recall parsing

```yaml
FunctionDeclaration:
  isAsync: false
  isGenerator: false
  scope: ...
  name:
    BindingIdentifier:
      name: "foo"
  params:
    - FormalParameters:
        items:
          - BindingIdentifier:
              name: "x"
        rest: null
  body: ...
```

That's an Abstract Syntax Tree (AST).

---

### Storing the AST

```js
1: "foo",
[
  /*FunctionDeclaration*/ 42,
  /*isAsync*/ 0,
  /*isGenerator*/ 0,
  ...,
  /*name*/ /*BindingIdentifier*/ 43, /*foo*/ 1,
  ...
]
```

(except compressed)

---

## What's the point?

---

### Things the browser needs to do

1. Download binjs file.
1. Tokenize + Parse + Check **only what you use**.
1. Generate bytecode.
1. Start execution.

---

### What is faster?

1. Smaller file (WIP).
1. Tokenization + Parsing can start earlier.
1. Trivial tokenization.
1. Format is **proof-carrying**.
    * Hard cases are already processed.
    * Checks are fast.
1. Parse **only** the code you execute, when needed.
1. Parse strings, identifiers, ... only **once**.
1. More opportunities for concurrency.

---

## Is there time for a demo?

---

## Calendar

---

### Early 2017

* Status: Proof of concept.
* Security: None.
* Speed: Extreme improvements (1).
* File size: Pretty good improvements (1).
* Standardization: Ecma stage 1.

(1) No hard numbers because it's too early to make promises.

---

### Early 2018

* Status: Prototype 3.
* Security: As secure as JavaScript, just easier to check.
* Reference implementation: ~90%.
* Firefox implementation: ~30%.
* Speed: Not measured yet.
* File size: Improvements... but we want more.
* Specifications: High-level specifications ~90%.

---

### Hopefully late summer 2018

* Experimental deployment on a few major websites.
* Firefox implementation for opt-in users.
* Discussion with other browser vendors on further optimizations.

--

* Your help welcome :)
    * https://github.com/binast/binjs-ref
    * https://binast.github.io/ecmascript-binary-ast/

---

## Thanks for listening.

[github.com/Yoric/Fosdem-2018](https://github.com/Yoric/Fosdem-2018)

### Any questions?
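
---

### Backup: a toy version of "Storing the AST"

The "Storing the AST" slide shows a tree turned into a string table plus a list of numbers, and "What is faster?" notes that strings and identifiers are parsed only once. A minimal sketch of that idea follows, assuming a made-up layout: the `TAG_*` numbers and the `encode`/`decode` helpers are hypothetical and are **not** the actual Binary AST format, which is also compressed.

```javascript
// Toy illustration only: a hypothetical layout, not the real Binary AST format.
// Every distinct string (node kind, field name, identifier) is stored once in
// a table; the tree itself becomes a flat array of small integers.
const TAG_NULL = 0, TAG_BOOL = 1, TAG_STRING = 2, TAG_LIST = 3, TAG_NODE = 4;

function encode(ast) {
  const strings = [];        // string table, each entry stored once
  const indexOf = new Map(); // string -> index in `strings`
  const stream = [];         // flat numeric encoding of the tree

  function intern(s) {
    if (!indexOf.has(s)) {
      indexOf.set(s, strings.length);
      strings.push(s);
    }
    return indexOf.get(s);
  }

  function write(value) {
    if (value === null) {
      stream.push(TAG_NULL);
    } else if (typeof value === "boolean") {
      stream.push(TAG_BOOL, value ? 1 : 0);
    } else if (typeof value === "string") {
      stream.push(TAG_STRING, intern(value));
    } else if (Array.isArray(value)) {
      stream.push(TAG_LIST, value.length);
      for (const item of value) write(item);
    } else {
      // A node is an object { type, ...fields }.
      const { type, ...fields } = value;
      const names = Object.keys(fields);
      stream.push(TAG_NODE, intern(type), names.length);
      for (const name of names) {
        stream.push(intern(name));
        write(fields[name]);
      }
    }
  }

  write(ast);
  return { strings, stream };
}

function decode({ strings, stream }) {
  let pos = 0;
  function read() {
    const tag = stream[pos++];
    switch (tag) {
      case TAG_NULL: return null;
      case TAG_BOOL: return stream[pos++] === 1;
      case TAG_STRING: return strings[stream[pos++]];
      case TAG_LIST: {
        const length = stream[pos++];
        const items = [];
        for (let i = 0; i < length; i++) items.push(read());
        return items;
      }
      case TAG_NODE: {
        const node = { type: strings[stream[pos++]] };
        const count = stream[pos++];
        for (let i = 0; i < count; i++) {
          const name = strings[stream[pos++]];
          node[name] = read();
        }
        return node;
      }
      default: throw new Error(`Unknown tag ${tag}`);
    }
  }
  return read();
}

// The `foo` declaration from the parsing slides, with `scope` and `body` left out.
const ast = {
  type: "FunctionDeclaration",
  isAsync: false,
  isGenerator: false,
  name: { type: "BindingIdentifier", name: "foo" },
  params: {
    type: "FormalParameters",
    items: [{ type: "BindingIdentifier", name: "x" }],
    rest: null,
  },
};
const encoded = encode(ast);
const decoded = decode(encoded);
```

In this toy, `"BindingIdentifier"` and `"name"` each appear once in `encoded.strings` even though the tree uses them twice, and reading the tree back needs no tokenizer: every step is a table lookup driven by a number.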