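The first thing to understand is that the DOM is an API: a set of language-neutral interfaces rather than an implementation. The DOM that front-end developers usually talk about is really just the browser’s implementation of the DOM interfaces through ECMAScript (JavaScript).
The second thing to know is that the DOM was specified for both HTML and XML. Since each has some parts specific to it, DOM Level 1, the cornerstone of the DOM standards, is split into two parts: Core and HTML. Core defines the fundamental interfaces and the extended interfaces, which are respectively the shared base interfaces and an “XML extension pack”, while the HTML part consists entirely of an “HTML extension pack”. The Document interface that the question asks about is defined among the fundamental interfaces in Core, whereas the HTMLDocument interface is defined in the HTML part and inherits (in the sense of interface inheritance) from Document.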
SW-Precache is a great Service Worker tool from Google. It is a Node module designed to be integrated into your build process and to generate a service worker for you. Though you can use sw-precache out of the box, you might still wonder what happens under the hood. If so, this article is written for you!
sw-precache.js is the main entry of the module. It reads the configuration, processes the parameters, populates the service-worker.tmpl template and writes the result into the specified file. functions.js is just a module containing a bunch of external functions that are all injected into the generated service worker file as helpers.
Since the end effect of sw-precache is produced by the generated service worker file at runtime, an easy way to get an idea of what happens is to check out the source code inside service-worker.tmpl. It’s not hard to understand the essentials, and I will help you.
Initialization
The generated service worker file (let’s call it sw.js) gets its configuration through text interpolation when sw-precache.js populates service-worker.tmpl.
// service-worker.tmpl
var precacheConfig = <%= precacheConfig %>;
After interpolation, precacheConfig is a list of relative URLs and MD5 hashes. In fact, one thing that sw-precache.js does at build time is calculate the hash of each file it is asked to “precache” through the staticFileGlobs parameter.
In sw.js, precacheConfig is transformed into an ES6 Map with the structure Map {absoluteUrl => cacheKey}. Note that I omit the origin part (e.g. http://localhost) of URLs for brevity.
Instead of using the raw URL as the cache key, sw-precache appends a _sw-precache=[hash] parameter to the end of each URL when populating and updating its cache, and even when fetching these subresources. This _sw-precache=[hash] is what we call a cache-busting parameter*. It prevents the service worker from indefinitely serving and caching out-of-date responses found in the browser’s HTTP cache.
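To make the mapping concrete, here is a minimal sketch of how such a map could be built. The config entries, the shortened hash values and the helper names createCacheKey and buildUrlsToCacheKeys are assumptions for illustration; this is not a copy of what the generated file contains.

```javascript
// Hypothetical build output: [relativeUrl, hash] pairs (hashes shortened for readability).
const precacheConfig = [
  ['js/a.js', '3cb4f0'],
  ['css/b.css', 'c5a951'],
];

// Append the cache-busting parameter, keeping any existing query string intact.
function createCacheKey(absoluteUrl, paramName, paramValue) {
  const url = new URL(absoluteUrl);
  url.searchParams.append(paramName, paramValue);
  return url.toString();
}

// Resolve each relative URL against a base URL and map it to its hashed cache key.
function buildUrlsToCacheKeys(config, baseUrl) {
  return new Map(
    config.map(([relativeUrl, hash]) => {
      const absoluteUrl = new URL(relativeUrl, baseUrl).toString();
      return [absoluteUrl, createCacheKey(absoluteUrl, '_sw-precache', hash)];
    })
  );
}

const urlsToCacheKeys = buildUrlsToCacheKeys(precacheConfig, 'http://localhost/');
console.log(urlsToCacheKeys);
```

In a real service worker the base URL would come from the worker’s own location rather than a hard-coded string.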
Because each build recalculates the hashes and generates a new sw.js whose precacheConfig contains the new hashes, sw.js can determine the version of each subresource and thus decide which parts of its cache need an update. This is pretty similar to what we commonly do to achieve long-term caching with webpack or gulp-rev: doing a byte-diff ahead of runtime.
*: Developers can opt out of this behaviour with the dontCacheBustUrlsMatching option if they set their HTTP caching headers right. More details in Jake’s post.
On Install
ServiceWorker gives you an install event. You can use this to get stuff ready, stuff that must be ready before you handle other events.
During the install phase, sw.js opens the cache and starts to populate it. One cool thing it does for you is its incremental update mechanism.
sw-precache looks up each cache key (the values of urlsToCacheKeys) in cachedUrls, an ES6 Set containing the URLs of all requests already stored in the current version of the cache. It only fetches and cache.puts the resources that cannot be found there, i.e. those never cached before, and thus reuses cached resources as much as possible.
If you don’t fully understand it yet, don’t worry. We will recap it later; for now, let’s move on.
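The lookup described above boils down to a set difference. The sketch below uses plain Map and Set objects with made-up URLs and hashes; the function name cacheKeysToFetch is hypothetical, and the real worker would go on to fetch and cache.put each returned key.

```javascript
// What the current build expects to be cached (illustrative values).
const urlsToCacheKeys = new Map([
  ['http://localhost/js/a.js', 'http://localhost/js/a.js?_sw-precache=d6420f'],
  ['http://localhost/css/b.css', 'http://localhost/css/b.css?_sw-precache=c5a951'],
]);

// What an earlier install already stored in the cache.
const cachedUrls = new Set([
  'http://localhost/js/a.js?_sw-precache=3cb4f0',
  'http://localhost/css/b.css?_sw-precache=c5a951',
]);

// Keep only the cache keys that are not in the cache yet.
function cacheKeysToFetch(expected, alreadyCached) {
  return [...expected.values()].filter((cacheKey) => !alreadyCached.has(cacheKey));
}

const toFetch = cacheKeysToFetch(urlsToCacheKeys, cachedUrls);
console.log(toFetch);
```

With this sample data only the new version of a.js needs a network request, because b.css is already cached under the same hash.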
On Activate
Once a new ServiceWorker has installed & a previous version isn’t being used, the new one activates, and you get an activate event. Because the old version is out of the way, it’s a good time to handle schema migrations in IndexedDB and also delete unused caches.
During the activation phase, sw.js compares all existing requests in the cache, named existingRequests (note that it now contains the resources just cached during the installation phase), with setOfExpectedUrls, an ES6 Set built from the values of urlsToCacheKeys, and deletes any request that does not match from the cache.
Although the comments in the source code explain everything well, I want to highlight some points about how requests are intercepted.
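The cleanup is the mirror image of the install step. In this sketch the cache contents are represented by an array of URL strings (in a real worker, cache.keys() yields Request objects and their url property would be compared); the sample values and the function name staleRequestUrls are assumptions.

```javascript
// URLs of everything currently in the cache, right after installation (illustrative).
const existingRequestUrls = [
  'http://localhost/js/a.js?_sw-precache=3cb4f0',
  'http://localhost/js/a.js?_sw-precache=d6420f',
  'http://localhost/css/b.css?_sw-precache=c5a951',
];

// The cache keys the current build expects.
const setOfExpectedUrls = new Set([
  'http://localhost/js/a.js?_sw-precache=d6420f',
  'http://localhost/css/b.css?_sw-precache=c5a951',
]);

// Anything cached but no longer expected is stale and can be deleted.
function staleRequestUrls(existing, expected) {
  return existing.filter((url) => !expected.has(url));
}

const toDelete = staleRequestUrls(existingRequestUrls, setOfExpectedUrls);
console.log(toDelete);
```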
Should Respond?
First, we need to determine whether the request is included in our “pre-caching list”. If it is, the request should already have been pre-fetched and pre-cached, so we can respond to it directly from the cache.
// sw.js*
var url = event.request.url;
shouldRespond = urlsToCacheKeys.has(url);
Note that we are matching raw URLs (e.g. http://localhost/js/a.js) instead of the hashed ones. This saves us from calculating hashes at runtime, which would have a significant cost. And since we have kept the relationship in urlsToCacheKeys, it’s easy to look the hashed one up.
* In real cases, sw-precache would take ignoreUrlParametersMatching and directoryIndex options into consideration.
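As a rough idea of what taking those two options into consideration could look like, here is a simplified sketch. It assumes ignoreUrlParametersMatching is a list of regular expressions tested against query parameter names and directoryIndex is a file name appended to URLs ending in a slash; the helper names and sample data are illustrative and the logic is not copied from the generated worker.

```javascript
// Drop query parameters whose names match any of the given patterns.
function stripIgnoredUrlParameters(originalUrl, ignoreUrlParametersMatching) {
  const url = new URL(originalUrl);
  const kept = [...url.searchParams].filter(
    ([name]) => !ignoreUrlParametersMatching.some((regex) => regex.test(name))
  );
  url.search = new URLSearchParams(kept).toString();
  return url.toString();
}

// Turn "/some/dir/" into "/some/dir/index.html".
function addDirectoryIndex(originalUrl, index) {
  const url = new URL(originalUrl);
  if (url.pathname.endsWith('/')) {
    url.pathname += index;
  }
  return url.toString();
}

// Return the precached URL this request maps to, or null if there is none.
function findPrecachedUrl(requestUrl, urlsToCacheKeys, options) {
  let url = stripIgnoredUrlParameters(requestUrl, options.ignoreUrlParametersMatching);
  if (urlsToCacheKeys.has(url)) return url;
  url = addDirectoryIndex(url, options.directoryIndex);
  return urlsToCacheKeys.has(url) ? url : null;
}

// Illustrative data.
const precached = new Map([
  ['http://localhost/index.html', 'http://localhost/index.html?_sw-precache=abc123'],
]);
const lookupOptions = { ignoreUrlParametersMatching: [/^utm_/], directoryIndex: 'index.html' };
console.log(findPrecachedUrl('http://localhost/?utm_source=feed', precached, lookupOptions));
```

Here a request for the site root with a tracking parameter still resolves to the precached index.html entry.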
Navigation Fallback
One interesting feature that sw-precache provides is navigateFallback (previously defaultRoute), which detects navigation requests and responds with a preset fallback HTML document when the URL of the navigation request does not exist in urlsToCacheKeys.
It is intended for SPAs using History API based routing, allowing arbitrary URLs to be answered with the single HTML entry defined in navigateFallback, kind of reimplementing an Nginx rewrite in the service worker*. Do note that a service worker only intercepts documents (navigation requests) inside its scope (and any resources referenced in those documents, of course), so navigations outside the scope are not affected.
* navigateFallbackWhitelist can be provided to limit the “rewrite” scope.
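The fallback decision can be sketched as a pure function. This is a simplified illustration under a few assumptions: the request is a plain object with url and mode, the whitelist is a list of regular expressions tested against the URL path, an empty whitelist allows everything, and the scopeUrl parameter stands in for the worker’s own location. The function name is hypothetical.

```javascript
// Return the fallback URL to serve for this request, or null if no fallback applies.
function resolveNavigateFallback(request, urlsToCacheKeys, options, scopeUrl) {
  const { navigateFallback, navigateFallbackWhitelist = [] } = options;
  // Only navigation requests are candidates for the fallback.
  if (!navigateFallback || request.mode !== 'navigate') return null;
  // A URL that is precached as-is does not need a fallback.
  if (urlsToCacheKeys.has(request.url)) return null;
  const path = new URL(request.url).pathname;
  const whitelisted =
    navigateFallbackWhitelist.length === 0 ||
    navigateFallbackWhitelist.some((regex) => regex.test(path));
  if (!whitelisted) return null;
  // The fallback document itself must be part of the precache list.
  const fallbackUrl = new URL(navigateFallback, scopeUrl).toString();
  return urlsToCacheKeys.has(fallbackUrl) ? fallbackUrl : null;
}

// Illustrative data.
const precachedUrls = new Map([
  ['http://localhost/index.html', 'http://localhost/index.html?_sw-precache=abc123'],
]);
const fallbackOptions = {
  navigateFallback: '/index.html',
  navigateFallbackWhitelist: [/^\/app\//],
};
const navigation = { url: 'http://localhost/app/settings', mode: 'navigate' };
console.log(resolveNavigateFallback(navigation, precachedUrls, fallbackOptions, 'http://localhost/'));
```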
Respond from Cache
Finally, we look up the appropriate cache key (the hashed URL) for the raw URL in urlsToCacheKeys and invoke event.respondWith() to respond to the request from the cache directly. Done!
* The code was “ES6-fied” with error handling part removed.
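For readers who want to see the shape of such a handler, here is a minimal sketch of the same idea. The factory function createFetchHandler and its parameters are hypothetical; the cache storage is passed in so the handler can be exercised outside a browser, and error handling and network fallback are left out.

```javascript
// Build a fetch handler that serves precached URLs straight from the cache.
// cacheStorage is expected to behave like the global `caches` object.
function createFetchHandler(urlsToCacheKeys, cacheStorage, cacheName) {
  return function onFetch(event) {
    if (event.request.method !== 'GET') return;
    // Look up the hashed cache key by the raw request URL.
    const cacheKey = urlsToCacheKeys.get(event.request.url);
    if (!cacheKey) return; // Not precached: let the browser handle the request.
    event.respondWith(
      cacheStorage.open(cacheName).then((cache) => cache.match(cacheKey))
    );
  };
}

// In a service worker this would be registered roughly as:
// self.addEventListener('fetch', createFetchHandler(urlsToCacheKeys, caches, cacheName));
```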
Cache Management Recap
Let’s recap the cache management part with a full lifecycle simulation.
The first build
Suppose this is the very first load: cachedUrls is an empty set, so all subresources listed for pre-caching are fetched and put into the cache at SW install time.
// SW Network Logs
[sw] GET a.js?_sw-precache=3cb4f0
[sw] GET b.css?_sw-precache=c5a951
After that, it starts to control the page immediately because sw.js calls clients.claim() by default. This means sw.js starts to intercept future fetches and tries to serve them from the cache, which is good for performance.
On the second load, all subresources have been cached and are served directly from the cache, so no requests are sent from sw.js.
Once we change the bytes of our subresources (e.g. we modify a.js to a new version with the hash value d6420f) and re-run the build process, a new version of sw.js is generated as well.
The new sw.js runs alongside the existing one and starts its own installation phase.
// SW Network Logs
[sw] GET a.js?_sw-precache=d6420f
This time, sw.js sees that a new version of a.js is requested, so it fetches /js/a.js?_sw-precache=d6420f and puts the response into the cache. In fact, at this moment we have two versions of a.js in the cache at the same time.
// what's in cache?
http.../js/a.js?_sw-precache=3cb4f0
http.../js/a.js?_sw-precache=d6420f
http.../css/b.css?_sw-precache=c5a951
By default, the sw.js generated by sw-precache calls self.skipWaiting(), so it takes over the page and moves on to the activation phase immediately.
// the one deleted
http.../js/a.js?_sw-precache=3cb4f0
By comparing the existing requests in the cache with the set of expected ones, the old version of a.js is deleted from the cache. This ensures there is only one version of each of our site’s resources at any time.
That’s it! We have finished the simulation successfully.
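The whole walkthrough can be replayed with a few lines of code. In this sketch a Set of cache keys stands in for the real Cache object, so no network or browser is involved; the install and activate functions and the key helper are simplified stand-ins written for illustration, using the hash values from the example above.

```javascript
// Install: add every expected cache key that is missing, and report what was "fetched".
function install(cache, urlsToCacheKeys) {
  const fetched = [];
  for (const cacheKey of urlsToCacheKeys.values()) {
    if (!cache.has(cacheKey)) {
      cache.add(cacheKey); // a real worker would fetch(cacheKey) and cache.put() the response
      fetched.push(cacheKey);
    }
  }
  return fetched;
}

// Activate: delete every cached key the current build no longer expects.
function activate(cache, urlsToCacheKeys) {
  const expected = new Set(urlsToCacheKeys.values());
  const deleted = [...cache].filter((cacheKey) => !expected.has(cacheKey));
  deleted.forEach((cacheKey) => cache.delete(cacheKey));
  return deleted;
}

const key = (path, hash) => `http://localhost/${path}?_sw-precache=${hash}`;
const firstBuild = new Map([
  ['http://localhost/js/a.js', key('js/a.js', '3cb4f0')],
  ['http://localhost/css/b.css', key('css/b.css', 'c5a951')],
]);
const secondBuild = new Map([
  ['http://localhost/js/a.js', key('js/a.js', 'd6420f')],
  ['http://localhost/css/b.css', key('css/b.css', 'c5a951')],
]);

const cache = new Set();
const firstInstall = install(cache, firstBuild);   // first load: fetches a.js and b.css
activate(cache, firstBuild);                        // nothing to delete yet
const secondInstall = install(cache, secondBuild); // new build: fetches only the new a.js
const removed = activate(cache, secondBuild);       // drops the old a.js
console.log({ firstInstall, secondInstall, removed, cache: [...cache] });
```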
Conclusions
As its name implies, sw-precache is designed specifically for precaching critical static resources. It only does one thing, but does it well. I’d love to give you some opinionated suggestions, but it is up to you to decide whether it suits your requirements.
Precaching is NOT free
So don’t precache everything. sw-precache uses an “On install - as a dependency” strategy for your precache config. A huge list of requests delays the moment the service worker finishes installing and, in addition, wastes users’ bandwidth and disk space.
For instance, if you want to build an offline-capable blog, you had better not include things like 'posts/*.html' in staticFileGlobs. It would be a disaster for users on metered data plans if you have hundreds of posts. Use runtime caching instead.
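As an illustration of that advice, a config for such a blog might look roughly like the sketch below. The directory layout, glob patterns and URL pattern are hypothetical, and the option names (staticFileGlobs, stripPrefix, runtimeCaching with urlPattern and handler) reflect my understanding of the sw-precache documentation, so check them against the version you use.

```javascript
// Hypothetical sw-precache config for a small blog: paths and patterns are made up.
const swPrecacheConfig = {
  // Precache only the small, critical "shell" of the site.
  staticFileGlobs: [
    'public/index.html',
    'public/css/*.css',
    'public/js/*.js',
  ],
  // Remove the build directory from the generated URLs.
  stripPrefix: 'public',
  // Cache posts lazily, the first time a reader actually opens them.
  runtimeCaching: [
    { urlPattern: /\/posts\//, handler: 'networkFirst' },
  ],
};

console.log(swPrecacheConfig.staticFileGlobs);
```

The point of the split is that the install step stays small, while posts only cost bandwidth for readers who actually visit them.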
“App Shell”
A helpful analogy is to think of your App Shell as the code and resources that would be published to an app store for a native iOS or Android application.
Though I have always thought the term “App Shell” is too narrow to cover how it is actually used now, it is widely used and commonly known. I personally prefer calling it a “Web Installation Package”, plainly, because it can truly be installed onto users’ disks and our web app can boot up directly from it in any network environment. The only difference between a “Web Installation Package” and an iOS/Android app is that we need to strive to keep it within a reasonable size.
Precaching is perfect for this kind of resource, such as the entry HTML, visual placeholders, offline pages etc., because they are static within one version, small, and most importantly, part of the critical rendering path. We want to deliver the first meaningful paint to our users ASAP, so we precache them to eliminate HTTP round-trip time.
BTW, if you have been using the HTML5 Application Cache, sw-precache is a really good replacement because it covers nearly all the use cases that App Cache provides.
This is not the end
sw-precache is just one of the awesome tools that can help you build a service worker. If you are planning to add some service worker power to your website, don’t hesitate to check out sw-toolbox, sw-helper (a new tool Google is working on) and many more from the community.
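In many people’s eyes, Flash and HTML5 are purely competitors: we should cheer the victory of HTML5 and Open Web standards and curse Flash into its grave. But most people have forgotten, or never knew, that a large part of HTML5 (strictly speaking, of the Web APIs covered by its marketing meaning) is precisely the contribution that the Flash platform and the Flash community made to web standards.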
Adobe has long played a leadership role in advancing interactivity and creative content – from video, to games and more – on the web. Where we’ve seen a need to push content and interactivity forward, we’ve innovated to meet those needs. Where a format didn’t exist, we invented one – such as with Flash and Shockwave. And over time, as the web evolved, these new formats were adopted by the community, in some cases formed the basis for open standards, and became an essential part of the web.
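When we (businesses and users) needed the web platform to carry all kinds of rich interactive content, including video and games, and the web platform itself did not yet have that capability, we met everyone’s needs by giving the platform a new format. This is the objective reason why Flash Player, a proprietary platform and browser plugin, could for a time become a de facto standard of the web.
Today, these capabilities that the web platform once lacked, having won the approval of the market and the community, have gradually been absorbed from Flash (the good kept, the rest discarded) and have become genuine Open Web standards such as HTML5 Video/Audio/Canvas and WebGL. At that point, the proprietary platforms that were quite innovative at birth and served as a “transitional measure”, a “shim”, naturally and gradually stop being needed.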
Over the years, Flash has helped bring the Web to greatness with innovations in media and animation, which ultimately have been added to the core web platform.
Flash led the way on the web for rich content, gaming, animations, and media of all kinds, and inspired many of the current web standards powering HTML5.
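The example above may look as if it deliberately uses “self-reference” to construct a proof by contradiction, but that is only for the convenience of the proof. In reality there are many problems that, like the halting problem, are undecidable for modern computers, such as every problem of the form “decide whether a program will behave in a certain way on a given input” and Hilbert’s tenth problem; Wikipedia even has a List of undecidable problems.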
“Avoiding success at all cost” is the informal motto behind Haskell. It can be parenthesized in two ways: either “Avoiding (success at all cost)” or “(Avoiding success) (at all cost)”.
I’m not going to interpret them directly, but rather share some thoughts on “success vs. costs” based on my own understanding and experience.
The success vs. cost of language design
There are always trade-offs (or compromises) in any software design, and programming language design is no exception.
In other words, every language design decision that makes a language “successful”, i.e. popular and widely used in industry or education for some reason, comes with its own “costs”: being unsafe, having limited expressiveness, having bad performance, etc.
Whether or not a “cost” is a problem really depends on the scenario, or on the language’s goals. For instance, Python and JavaScript are both very expressive and beginner-friendly thanks to being dynamically typed, sacrificing type safety and performance. Java, in contrast, uses a much safer and more optimization-friendly type system but is much less expressive. Another typical comparison is memory management, where languages that are “managed” (by either ARC or a garbage collector) can be much easier and safer (in terms of memory) for most programmers but are also considered slower than languages that are “closer to the metal”.
None of these “costs”, or “differences”, really prevents them from being enduringly popular.
For Haskell, the story is quite different: being research-oriented means the goal of the language is to pursue some “ultimate” things: the “ultimate” simplicity of the intermediate representation, the “ultimate” type system where safety and expressiveness can coexist, the “ultimate” compilation speed and runtime performance, the “ultimate” concise and elegant concrete syntax, the “ultimate”… I don’t know. But it has to be some “ultimate” thing that is very difficult, probably endless and impossible, to achieve.
As a result, every language decision in Haskell became very hard and slow, because almost nothing could be sacrificed. That’s why Haskell insisted on being lazy to “guard” its purity, regardless of some problems of being “call-by-need”; why a decent I/O mechanism was missing for the first four years after the project started, until Philip Wadler found monads; and why type classes, first proposed in Wadler’s 1989 paper, took years to implement and popularize.
As a side note, this doesn’t mean there is no compromise in Haskell at all. Compromises were just kept as minimal as possible along the way. When an audience member asked why Haskell and OCaml, which are quite similar at a very high level, have both survived, SPJ replied:
There’s just a different set of compromises.
The success vs. cost of language design process
Another common but extremely controversial (if not the most controversial) topic in programming language design is the design process itself: Would you prefer a dictatorship or a committee (in other words, a dictatorship of many)? Would you prefer being proprietary or standardized? In which form would you write the standard: natural language, pseudocode, or formal semantics? How many breaking changes dare you make, and how frequently? Would you let the open source community get involved?
Again, I think there is no single answer to all these questions. The majority of popular programming languages arrived, and are still moving along, on very different paths.
Python’s creator, Guido van Rossum, known as the “Benevolent Dictator For Life” (BDFL), i.e. the good kind of dictator, still played the central role in Python’s development (until July 2018) after Python became popular and adopted an open source, community-based development model. This directly contributed to the fact that Python 3, a breaking (not completely backward-compatible and not easy to port to) but good (in terms of language design and consistency) revision of the language, could still land despite much pressure from the community. Many other languages (Ruby, Perl, Elm) have chosen to follow this route as well.
JavaScript, widely known for being created by Brendan Eich in 10 days, in comparison quickly evolved into a committee-driven (TC39) and standardized (ECMAScript) language, due to both the open nature of the Web and its own fast adoption. But Brendan, as the creator, wasn’t even powerful enough to push the committee into landing ES4, which was also a breaking but much better revision. Instead, after many political “fights” between different parties (e.g. Mozilla, Microsoft, Yahoo etc.), the committee ended up with ES5 (the outcome of the “Harmony” compromise), a backward-compatible yet much less ambitious version, so history wasn’t changed. Even the recent rise and yearly releases of “modern” JavaScript (ES6 or ES2015, 2016, 2017…) are mainly driven by a new generation of committee members (+ Google, Facebook, Airbnb etc.) and still happen in a very open and standardized way.
As you can see, even the history and progress of two rather similar languages can be this different, not to mention more proprietary languages such as Java from Sun/Oracle, C# from Microsoft and Objective-C/Swift from Apple (though the latter was open sourced), or more academic and standardized languages like SML and Scheme, both of which have a standard written in formal semantics.
So it’s not surprising that Haskell also chose its own unique process to suit its unique goal. Although it is backed by academia, it chose a rather practical, less formal approach to defining the language, i.e. the compiler implementation over standardization (plus many “formal” fragments scattered among papers, though), which is more like C++/OCaml from this point of view. It has a committee, but instead of being very open and conservative, it is more dictatorial (from the average user’s point of view) and super aggressive about making breaking changes. As a result, however, it has trained a group of very change-tolerant people in its community… All of these quirks and oddities combined work very well and keep Haskell from “becoming too successful too quickly”.
End thoughts
To be fair, Haskell is already very “successful” nowadays, particularly in academia (for education, as a sexy type laboratory etc.) but also in industry, whether by being used in real business or by being highly reputed among programmers (as both hard and fun).
I am neither confident nor qualified enough to say that Haskell has succeeded to the right degree at the right time. But it’s great to see it, after more than 20 and now almost 30 years, slowly figuring out its very own way to “Escape from the Ivory Tower” and keep going beyond.