Content integrity in LibResilient

January 18, 2022

So far, most LibResilient development has focused on proving the concept and implementing different content-fetching plugins. After the project got a small NGI Assure grant, the focus for the previous milestone shifted to making the project itself more, well, resilient.

Today another milestone was completed, focusing on integrity of content fetched via LibResilient, and thus on the security of websites deploying it.

The problem

On a very basic level, LibResilient’s job is fetching website content from places other than the original domain of that website.

This can mean alternative endpoints controlled by the website owners (say, on secondary domains, or just IP addresses), or it can mean third-party endpoints like IPFS gateways, Tor2web proxies, or any location where the website’s operator can upload website content, and from which that content can then be fetched.

This, however, creates a problem: operators of such third-party services effectively get the ability to modify the content (accidentally, or… maliciously).

Ensuring content integrity

The solution is to verify content integrity. We can leverage the Subresource Integrity (SRI) feature of modern browsers — but that has several downsides:

it works only when integrity data is available at the time of a request;
setting it in HTML is quite unwieldy and effectively impractical if we want to use it for all content and assets;
the integrity attribute is only defined for <script> and <link> elements.

Thankfully, it turns out integrity data can be provided for fetch requests in JavaScript for any content type, and once provided the browser will do the heavy lifting of verifying it!
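
A minimal sketch of that mechanism follows. The helper name fetchWithIntegrity and its injectable fetchImpl parameter are assumptions made for illustration; only the integrity option itself is part of the Fetch API.

```javascript
// Hypothetical helper: fetch `url` and let the browser enforce the given
// SRI string (of the form "<algorithm>-<base64 digest>").
// `fetchImpl` is injectable only so the sketch can be exercised without a network.
async function fetchWithIntegrity(url, integrity, fetchImpl = fetch) {
  // `integrity` is a standard Fetch API option; if the fetched content does
  // not match the digest, the returned promise rejects instead of resolving.
  return fetchImpl(url, { integrity });
}
```

Because the check happens inside the browser's fetch implementation, the calling code only has to handle the rejection.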

That also means we can have plugins that provide that integrity data: for example, directly from the config, or through a somewhat elaborate but considerably more flexible process of fetching a signed file with the relevant integrity data for each request separately.

Finally, for transport plugins that do not rely on the Fetch API, and thus do not benefit from the browser checking if integrity data matches fetched content, we now have a wrapper plugin that explicitly checks integrity of any resource, using the SubtleCrypto API.

Work done in this milestone

Let’s break down specific work done in this latest milestone.

1. Making sure LibResilient supports Subresource Integrity (SRI) fully:

This meant, first and foremost, identifying how SRI should be supported in LibResilient. Options included supporting it directly in the service worker code, or delegating it to the plugins. In the end, the latter approach was chosen as more flexible.

Then, identifying places in service-worker.js and plugin code where SRI was not being correctly handled, and fixing that.

Some research was also necessary to establish if SRI can be set (and if it is enforced by the JS Fetch API) for resources other than scripts and CSS.

2. Implementing integrity-related wrapper plugins:

Once we had the basic SRI compatibility ensured, it was possible to write SRI-related wrapper plugins.

The first, basic-integrity, makes it possible to statically configure integrity data for specific URLs.

It doesn’t check the integrity itself, just makes sure that integrity data configured for a given URL is added to the request data when the URL is being fetched by LibResilient. Actual verification is assumed to be done by any plugin wrapped by it.
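
The idea can be sketched roughly as below. This is not the plugin's actual code or configuration format: the names integrityConfig and withStaticIntegrity, and the choice to key the configuration by URL path, are assumptions made for illustration.

```javascript
// Hypothetical static configuration mapping URL paths to SRI strings.
// The sample value is the SHA-256 digest of the text "hello".
const integrityConfig = {
  '/index.html': 'sha256-LPJNul+wow4m6DsqxbninhsWHlwfp0JecwQzYpOLmCQ=',
};

// Wrap another fetch-like function (`wrapped`, standing in for the wrapped
// plugin) so that configured integrity data is attached to each request.
function withStaticIntegrity(wrapped, config) {
  return (url, init = {}) => {
    // The base URL is only used to resolve relative URLs in this sketch.
    const path = new URL(url, 'https://example.org').pathname;
    const integrity = config[path];
    // No verification happens here: the integrity value is only passed on,
    // and the wrapped function is expected to enforce it.
    return wrapped(url, integrity ? { ...init, integrity } : init);
  };
}
```

URLs without a configured entry are passed through unchanged, so the wrapper only affects resources listed in the configuration.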

Second, the integrity-check wrapper plugin uses the SubtleCrypto API to implement the integrity check directly in JS.

This makes it possible to check integrity (if present in the request being handled) of content fetched by transport plugins that do not guarantee integrity will be checked by the browser — such as any plugin not using the Fetch API.
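
Such a check could look roughly like the sketch below. It is an illustration under stated assumptions rather than the plugin's code: the function names sriDigest and matchesIntegrity are hypothetical, and only a single hash per integrity string is handled, while real SRI values may list several.

```javascript
// `crypto.subtle` is a global in browsers and service workers; the fallback
// only exists so the sketch also runs under older Node.js versions.
const subtle = (globalThis.crypto ?? require('node:crypto').webcrypto).subtle;

// Map SRI algorithm prefixes to the names SubtleCrypto expects.
const SRI_ALGORITHMS = new Map([
  ['sha256', 'SHA-256'],
  ['sha384', 'SHA-384'],
  ['sha512', 'SHA-512'],
]);

// Compute an SRI string ("<algorithm>-<base64 digest>") for an ArrayBuffer
// or typed array.
async function sriDigest(data, algo = 'sha384') {
  const name = SRI_ALGORITHMS.get(algo);
  if (!name) throw new Error(`Unsupported algorithm: ${algo}`);
  const digest = await subtle.digest(name, data);
  const base64 = btoa(String.fromCharCode(...new Uint8Array(digest)));
  return `${algo}-${base64}`;
}

// Return true if `data` matches the single-hash SRI string `integrity`.
async function matchesIntegrity(data, integrity) {
  const algo = integrity.slice(0, integrity.indexOf('-'));
  return (await sriDigest(data, algo)) === integrity;
}
```

A wrapper built on this would read the response body as an ArrayBuffer, call matchesIntegrity with the integrity value from the request, and reject the response on a mismatch.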

Finally, the signed-integrity plugin is a proof-of-concept demonstrating how SRI could be used in LibResilient for sites that are not completely static. For each content URL being fetched, it first fetches integrity data from a URL built by appending .integrity to the content URL, expecting a JSON Web Token.

That JWT’s signature is verified using a pre-configured public key (the assumption being that it was signed with the corresponding private key on the server). The JWT’s payload should contain an “integrity” field, which is then used to set the SRI data on the request being handled.

The plugin itself does not check integrity; it is assumed that the wrapped plugin will do that check.
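
The flow described above can be sketched as follows. The signature algorithm is an assumption (ES384, that is ECDSA over P-384 with SHA-384), as are all function names and the wrapped and fetchImpl parameters; the post only specifies the .integrity suffix, the JWT format, and the “integrity” payload field.

```javascript
// `crypto.subtle` is a global in browsers and service workers; the fallback
// only exists so the sketch also runs under older Node.js versions.
const subtle = (globalThis.crypto ?? require('node:crypto').webcrypto).subtle;

// Decode a base64url string (the encoding used inside JWTs) into bytes.
function base64UrlToBytes(text) {
  const base64 = text.replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);
  return Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
}

// Verify `jwt` against `publicKey` (a CryptoKey usable for ECDSA
// verification) and return the "integrity" field of its payload.
// Assumption: the token is signed with ES384.
async function integrityFromJwt(jwt, publicKey) {
  const parts = jwt.split('.');
  if (parts.length !== 3) throw new Error('Malformed JWT');
  const [header, payload, signature] = parts;
  // The signature covers the encoded header and payload joined by a dot.
  const signedData = new TextEncoder().encode(`${header}.${payload}`);
  const valid = await subtle.verify(
    { name: 'ECDSA', hash: 'SHA-384' },
    publicKey,
    base64UrlToBytes(signature),
    signedData,
  );
  if (!valid) throw new Error('JWT signature verification failed');
  const claims = JSON.parse(new TextDecoder().decode(base64UrlToBytes(payload)));
  if (typeof claims.integrity !== 'string') {
    throw new Error('No integrity field in JWT payload');
  }
  return claims.integrity;
}

// Fetch "<url>.integrity", verify it, then hand the content request to
// `wrapped` (standing in for the wrapped plugin) with the SRI data set.
async function fetchWithSignedIntegrity(url, publicKey, wrapped, fetchImpl = fetch) {
  const response = await fetchImpl(`${url}.integrity`);
  const integrity = await integrityFromJwt(await response.text(), publicKey);
  return wrapped(url, { integrity });
}
```

The sketch fixes the algorithm in code instead of trusting the alg value in the token header, which avoids letting the token choose how it is verified.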

By combining these plugins, it is possible to provide SRI for transport plugins not built around the Fetch API. For example, signed-integrity can retrieve integrity data for content and wrap integrity-check, which in turn wraps a transport plugin and verifies the integrity of whatever that plugin fetches.
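
The layering can be illustrated with plain function composition. All names below are hypothetical and do not reflect LibResilient's plugin API or configuration format; the lookup and verify functions are supplied by the caller and stand in for whatever provides and checks the integrity data.

```javascript
// Each wrapper takes a fetch-like function and returns a new one.

// Outer layer: obtains integrity data through a caller-supplied `lookup(url)`
// function and attaches it to the request.
const provideIntegrity = (lookup) => (wrapped) => async (url, init = {}) =>
  wrapped(url, { ...init, integrity: await lookup(url) });

// Middle layer: checks the fetched body against `init.integrity` using a
// caller-supplied `verify(body, integrity)` function.
const checkIntegrity = (verify) => (wrapped) => async (url, init = {}) => {
  const body = await wrapped(url, init);
  if (init.integrity && !(await verify(body, init.integrity))) {
    throw new Error(`Integrity check failed for ${url}`);
  }
  return body;
};

// Compose the layers: integrity provider -> integrity check -> transport.
function buildPipeline(transport, lookup, verify) {
  return provideIntegrity(lookup)(checkIntegrity(verify)(transport));
}
```

The order matters: the layer that supplies the integrity data has to sit outside the layer that checks it, and the transport sits innermost.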

3. Documentation

https://gitlab.com/rysiekpl/libresilient/-/blob/master/docs/CONTENT_INTEGRITY.md

A document on content integrity in the context of LibResilient was also created. It discusses SRI, different available plugins, pros and cons of different approaches to content integrity when using LibResilient, and mentions possible future developments.

Code quality and such

Code written for this milestone is of course covered by tests, and so overall test coverage for the project went up to ~60%.

All of the functionality in this milestone was implemented without any external dependencies (npm and package.json are only used for running the unit tests, and have nothing to do with LibResilient’s browser-side code). The aim remains for LibResilient to be deployable by simply copying a few JS files over to the directory from which a website is served. No dependency hell, no bundling, no stress.

Next steps

Work has already started on the next milestone, focusing on being able to deploy LibResilient configuration changes even when the original website is not available.