The grant extension also helped to define the milestones for 2023. Here they are, in no particular order — the order they get implemented depends on many factors, including some non-obvious interplay between them. Some of these milestones are pretty simple, some will require substantial re-writes. Exciting times ahead!
LibResilient needs a “still loading” screen to be displayed when loading HTML resources over slow transports. For example, retrieving content from IPFS can sometimes take upwards of 20 seconds. Currently the user experience here is lacking: depending on browser timeout defaults, the page fails to load, showing an obscure error, and then suddenly loads and displays content.
Also, LibResilient currently kicks in on the second request to a site that uses it, as that’s when the ServiceWorker actually gets loaded. This could be improved by using Clients.claim() — to be implemented if it makes sense.
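For context, a minimal sketch (an illustration, not LibResilient’s actual code) of how a ServiceWorker can take control of already-open pages immediately upon activation:

self.addEventListener("activate", (event) => {
  // Take control of all open clients (tabs) immediately,
  // instead of waiting for the next navigation.
  event.waitUntil(self.clients.claim());
});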
When retrieving content using certain transports (for example IPFS), MIME-type information is not available. Currently, plugins that face this issue naïvely try to guess the MIME-type based on file extension. What is needed is a facility to reliably establish MIME-types of requested content, made available by the ServiceWorker itself to all plugins that need it.
The interplay between LibResilient, CORB, CORS, and CSP is not well documented. Documenting it well requires substantial testing on small but purpose-built infrastructure.
The assumption is that cookies and other credentials should not be exposed to alternative transports, but this needs strict testing and documentation. Perhaps this should also be configurable.
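To illustrate (an assumption about possible future behavior, not what LibResilient currently does): a transport plugin could avoid exposing credentials to third-party endpoints by omitting them from its requests.

// Hypothetical: request content from an alternative endpoint without
// sending cookies or other credentials along.
const response = await fetch(alternativeEndpointUrl, { credentials: "omit" });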
Currently LibResilient uses Jest and Node.js for tests of the browser-side code, and Deno for the CLI and CLI-related tests. This makes maintenance difficult. Deno is a much better choice, as it implements WebAPIs natively. So, browser-side code tests should be re-written for Deno, and the Node.js dependency completely removed from the project.
LibResilient has to handle request errors better. It needs to be smart about displaying the original 404 page from the original domain, and otherwise needs to show some form of plugin call stack or other explanation if a request failed. Perhaps a “development” mode should be implemented.
Currently, when a request fails (for example, due to a 404 error, or because an integrity check fails, or…), a browser-internal “request failed” page is displayed to the user. This is not very helpful when debugging issues, and at the same time it is confusing to users.
From the developer (and user) perspective, it’s difficult to figure out what went wrong when a resource is not successfully fetched, and if the failure is related to LibResilient or not. This is especially problematic when fetching resources that are protected by subresource integrity.
This will require substantial rewrites of crucial pieces of the ServiceWorker and plugins.
When loading config.json via alternative transports, if config.json is broken but the original website is not available, a LibResilient-enabled site might end up in a broken state. LibResilient should verify that a newly loaded config is valid and not broken (say, by deploying the new configuration and attempting to re-fetch the config.json file it just loaded), reverting to the previous, clearly working config otherwise.
There are several “papercut” issues in LibResilient and its plugins that are too small to be considered separate project plan items, but are nonetheless important to fix.
For example: how should the basic-integrity plugin treat URIs without a domain name? And DNSLink-based plugins as currently implemented can only use DNS-over-HTTPS servers that offer JSON endpoints — implementing pure DNS-over-HTTPS would greatly (by several orders of magnitude) increase the number of DoH servers DNSLink-based plugins can use.
Now, documentation (both general docs and per-plugin documentation) is available directly on the website. Still not perfect, but considerably better nonetheless. Importantly, plugin documentation is also gathered in a single place.
Documentation is divided into two parts: general documentation and per-plugin documentation.
This should make the information there more easily available and discoverable. There is no search (yet?), as the size of the project’s documentation is still small and arguably manageable with the help of decent index pages.
These are documentation resources discussing LibResilient generally, or providing step-by-step guides on deployment.
This includes a high-level overview of the philosophy guiding the project, and a (still not entirely complete) description of its architecture. There is also an extensive Frequently Asked Questions section, diving into things like interactions with web analytics systems and admin panels, how LibResilient handles interactivity on a website, and a deep-dive into Service Workers as used by LibResilient.
The technical, step-by-step guides include the Quickstart guide and the example deployment document. There are also technical write-ups focusing on specific features of LibResilient: its ability to update configuration even during disruption, and its approach to ensuring security and content integrity — a topic particularly important when using third-party-run alternative endpoints.
This section contains documentation of every plugin available in LibResilient’s main code tree.
How extensively a plugin is documented differs between plugins. Some, like dnslink-fetch, offer a reasonably good write-up. Others, like gun-ipfs, are documented much more sparsely. Some plugins are considered stable or late beta, some are broken and need to be re-written. This is clearly marked on the plugin overview page.
It’s important to recognize that good documentation is crucial to adoption. Quite a lot of work went into improving LibResilient’s documentation situation, but by no means is it done and perfect. If you’d like to get involved in helping out with this — or with any other aspect of LibResilient — check out the code in GitLab.
General documentation available on this website is built from the content of the /docs/ directory in the repository. Plugin documentation is built from the README.md files in each individual plugin’s directory. And if you’re wondering how that’s achieved, the absolutely horrendously ugly code for that is here. Actual LibResilient code is much cleaner, pinky promise!
dnslink-ipfs expects content to be published on IPFS, and the latest IPFS address to then be pushed to DNS. How to push this information and data out has so far been left to website administrators, creating a relatively large obstacle to LibResilient adoption. Now this might gradually start getting solved, thanks to lrcli, the LibResilient CLI.
The big problem with creating a consistent CLI for a tool like LibResilient is that basically all relevant functionality is related to specific LibResilient plugins. Additionally, at least in the case of some plugins, there is more than one way to push the information where it needs to be for LibResilient to be able to make use of it.
Consider the alt-fetch plugin, which allows LibResilient to fetch content from alternative HTTPS endpoints. Can lrcli make assumptions regarding how content should be pushed to them? Should FTP or SFTP be used? Or maybe some REST API needs to be used instead, and PUT HTTPS requests need to be issued? Or perhaps it’s some kind of proprietary service that requires a specific proprietary protocol?
There is a growing number of plugins that might need some CLI functionality to make LibResilient easy to deploy when using them. And in the case of many if not most of these plugins, there are simply too many possible ways of pushing the information out for there to be a general CLI that implements all of them.
The plugin-based approach seems to work well for LibResilient itself. It might, then, work well for the CLI as well. After all, the author of a plugin probably knows best what kind of tools a website administrator might need to properly push the content and any necessary additional information out for the plugin to be able to make use of it.
The LibResilient CLI is built around a simple plugin architecture. It assumes a cli.js file in the plugin’s main directory. The file should be a valid Deno module (lrcli is written for the Deno JS runtime), and export an object that defines the name, description, version, and actions implemented by the plugin. Based on that, the CLI knows how to run specific actions and interpret relevant command line arguments.
Here is a simple example of the shape such a module might take for the basic-integrity plugin (a sketch based on the CLI output shown below; the field names are assumptions, not the actual source):
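// cli.js — hypothetical sketch of a LibResilient CLI plugin module;
// the exported fields are assumptions based on lrcli's help output.
export default {
  name: "basic-integrity",
  description: "CLI used to generate subresource integrity hashes for provided files.",
  version: "0.0.1",
  actions: {
    "get-integrity": {
      description: "calculate subresource integrity hashes for provided files",
      options: {
        algorithm: { default: "SHA-256" }, // SubtleCrypto.digest-compatible name
        output: { default: "json" },       // 'json' or 'text'
      },
      // run() receives parsed options and positional file paths
      run: async (options, files) => { /* compute and print digests */ },
    },
  },
};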
With time, more plugins will gain a CLI component. For some of them — like basic-integrity or signed-integrity, which already have it — the CLI’s role is going to be limited to generating data locally, for use with other tools in the publishing pipeline. For other plugins — for example, IPFS-based transport plugins — it makes sense to implement actions that push content out, actually publishing it.
And in some cases, this will remain somewhat complicated. There are simply too many ways of pushing content out to a simple HTTPS endpoint — often also very specific to a given website — for all of them to be implementable in a single LibResilient CLI plugin. The same is probably true for pushing out the DNS updates required by DNSLink-based plugins. In such cases, the most broadly used mechanisms will probably be implemented (FTP/SFTP for fetch-based plugins? DNS UPDATE for DNSLink-based plugins?), but anything fancier than that will have to be left to the website admin, who knows their infrastructure and how to distribute content on it.
When run, lrcli expects the name of the plugin to load, and tries to be helpful in guiding the user on its usage:
$ cli/lrcli.js
Command-line interface for LibResilient.
This script creates a common interface to CLI actions implemented by LibResilient plugins.
Usage:
lrcli.js [options] [plugin-name [plugin-options]]
Options:
-h, --help [plugin-name]
Print this message, if no plugin-name is given.
If plugin-name is provided, print usage information of that plugin.
Plugin names are assumed to be sub-directories under the plugins/ directory in LibResilient’s code directory:
$ cli/lrcli.js no-such-plugin
*** TypeError: Module not found "file:///home/user/Projects/libresilient/plugins/no-such-plugin/cli.js". ***
If the plugin exists, usage information can be printed, based on the data exported by the plugin’s cli.js module:
$ cli/lrcli.js basic-integrity
*** No action specified for plugin ***
CLI plugin:
basic-integrity
Plugin Description:
Verifying subresource integrity for resources fetched by other plugins.
CLI used to generate subresource integrity hashes for provided files.
Usage:
lrcli.js [general-options] basic-integrity [plugin-action [action-options]]
General Options:
-h, --help [plugin-name]
Print this message, if no plugin-name is given.
If plugin-name is provided, print usage information of that plugin.
Actions and Action Options:
get-integrity [options...] <file...>
calculate subresource integrity hashes for provided files
<file...>
paths of files to be processed
--algorithm (default: SHA-256)
SubtleCrypto.digest-compatible algorithm names to use when calculating digests (default: "SHA-256")
--output (default: json)
a string, defining output mode ('json' or 'text'; 'json' is default)
The plugin controls its output, but a good practice is to provide support for at least json and text when useful data is returned, to simplify integration with any other tools the website admin chooses to use in their deployment pipeline:
$ cli/lrcli.js basic-integrity get-integrity libresilient.js
{"libresilient.js":["sha256-UrkUn2KwKBQ93jS/pSd3Kt0/+9XkDT6Rj93jec/lOZY="]}
$ cli/lrcli.js basic-integrity get-integrity libresilient.js --output text
libresilient.js: sha256-UrkUn2KwKBQ93jS/pSd3Kt0/+9XkDT6Rj93jec/lOZY=
The two new DNSLink-enabled plugins fetch content using means that have been employed by LibResilient plugins before, but use DNSLink to figure out where to fetch content from.
DNSLink is a standard for storing information on where content related to a given domain can be found, directly in DNS, using TXT records.
Let’s say you are running a website at https://example.org and want to provide information on where relevant content can be found in case, for example, the main site goes down. You could put this in some place on the site itself, but this creates a chicken-and-egg problem: information on where to get content if the site is not available is only accessible as long as the site is available.
Instead, you could use DNSLink and put that information directly in DNS. To do that you would create TXT records for the _dnslink.example.org label, like so:
_dnslink.example.org. 60 IN TXT "dnslink=/ipfs/Qm..."
_dnslink.example.org. 60 IN TXT "dnslink=/https/gateway.ipfs.io/ipfs/Qm..."
_dnslink.example.org. 60 IN TXT "dnslink=/https/example.com/"
Software that understands DNSLink would take this to mean that content for example.org is also available directly on IPFS, or via HTTPS on specific IPFS gateways, or via HTTPS on an alternative endpoint (in this case, example.com). As long as DNS remains available, if client software understands DNSLink and supports the relevant transport protocols, the content can be retrieved even if the https://example.org site itself is down.
This is where the new LibResilient plugins come in.
With the new plugins, LibResilient turns any modern browser into client software that understands DNSLink and can retrieve content related to a website that happens to be down (as long as that particular visitor had visited that site once before, so that the Service Worker got loaded).
The first of the two, dnslink-fetch, is very similar to the alt-fetch plugin: given alternative endpoints, it performs HTTP fetch() requests to them to pull relevant content. But instead of requiring the endpoints to be configured explicitly in the config.json configuration file (and thus being somewhat inflexible), it pulls the endpoints from DNS, in accordance with the DNSLink standard.
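Resolving DNSLink records from inside a browser can be done over DNS-over-HTTPS. As a rough sketch (using Cloudflare’s public DoH JSON endpoint as an example; this is not the actual plugin code):

// Query a DoH JSON endpoint for _dnslink TXT records and extract
// the "dnslink=..." entries pointing at alternative content locations.
async function resolveDnslink(domain) {
  const url = `https://cloudflare-dns.com/dns-query?name=_dnslink.${domain}&type=TXT`;
  const response = await fetch(url, { headers: { accept: "application/dns-json" } });
  const data = await response.json();
  return (data.Answer || [])
    .map((record) => record.data.replace(/^"|"$/g, "")) // strip surrounding quotes
    .filter((txt) => txt.startsWith("dnslink="))
    .map((txt) => txt.slice("dnslink=".length)); // e.g. "/ipfs/Qm..." or "/https/example.com/"
}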
The second plugin, dnslink-ipfs, uses DNSLink to figure out which IPFS CID to use when fetching the content from IPFS. This is necessary because IPFS uses content-addressing: when the content itself changes, the address will change too. When using IPFS for content retrieval, the address of the current, up-to-date version of the content must first be known. DNSLink provides a good way of making that information available, and the plugin leans on that.
Keeping those DNSLink records up to date is not, however, LibResilient’s responsibility. So, if you want to use these DNSLink-based plugins, you will need to separately implement some way of updating the relevant TXT records when content gets modified or new content gets published. This can be done via your DNS hosting provider’s APIs, using the DNS UPDATE query if your DNS nameservers support that, or by some other means.
At least for now.
Some work has started on implementing a command-line tool that would simplify deployment of websites that use LibResilient, so perhaps one day this will be handled by the LibResilient CLI. Stay tuned!
Your config sets up the fetch plugin, then local cache, and then alt-fetch with some nice independent endpoints (say, an IPFS gateway here, a Tor Onion gateway there). Perhaps with some content integrity checking deployed too, for peace of mind (no need to completely trust those third-party gateway operators, after all).
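Such a config might look roughly like this (an illustrative sketch: the endpoint URLs are made up, and the exact schema may differ from the real config format):

{
  "plugins": [
    { "name": "fetch" },
    { "name": "cache" },
    {
      "name": "alt-fetch",
      "endpoints": [
        "https://ipfs-gateway.example.net/",
        "https://onion-gateway.example.net/"
      ]
    }
  ]
}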
Obviously you run your own IPFS node and a Tor Hidden Service for these to work correctly, but your website visitors do not need any special software, extensions, or configuration — they just visit your site in their regular browser; LibResilient handles everything else behind the scenes.
Maybe your server keeled over, or maybe it’s a DDoS.
Good news is: for all visitors who had visited your site before, everything seems to work just fine (if perhaps a tiny bit slower than normal). Your website content is cached on IPFS nodes, and IPFS gateways are happily serving the requests LibResilient is sending their way. Local cache makes the experience quite seamless for content those visitors had viewed before, and the IPFS-related slowdown for content they have not is still small.
For whatever reason, however, you figure out the outage will last a bit longer, and you’d like to swap out the fetch plugin completely (no reason for visitors to wait for something that isn’t going to work anyway for the time being). You’d perhaps also want to remove the Tor Onion gateways from the alt-fetch endpoints — after all, your Tor Hidden Service is down as well.
Bad news is: LibResilient’s config is a JavaScript file imported directly in the Service Worker; you have no way to update it until your site comes back up.
This is what this milestone (third one supported by a small grant from NGI Assure) was all about.
To make config updates possible during disruption and outage, the config format needed to be changed (JSON was the obvious choice), and then the whole machinery of verifying, loading, and caching it needed to be implemented.
And so now, the config file (config.json) is just a regular file that can be retrieved via any configured plugins. You don’t have to do anything special for this to work.
Let’s dive deeper into what exactly has been done this month.
This was done because the ServiceWorkers API does not provide a way to update JavaScript scripts that were imported into the Service Worker via an importScripts() call, in any way other than via a direct HTTPS fetch() to the original website. For obvious reasons, that’s unworkable for updating the config during disruption or outage.
A bunch of research was required, as expected. In the end, LibResilient needed a roughly full implementation of what the browser does with scripts imported via importScripts(): fetching config.json, caching it, and establishing when it is stale so that it can be re-fetched.
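In rough pseudocode, the flow looks something like this (a sketch, not the actual implementation; the cache name is made up, while the 24-hour staleness criterion based on the Date: header is the one described below):

// Fetch config.json, cache it, and treat the cached copy as stale
// after 24 hours, based on the Date: header of the cached response.
const CONFIG_URL = "/config.json";
const MAX_AGE_MS = 24 * 60 * 60 * 1000;

async function getConfig() {
  const cache = await caches.open("libresilient-config"); // hypothetical cache name
  const cached = await cache.match(CONFIG_URL);
  if (cached) {
    const fetchedAt = new Date(cached.headers.get("Date")).getTime();
    if (Date.now() - fetchedAt < MAX_AGE_MS) {
      return cached.json(); // still fresh, use it
    }
  }
  // Stale or missing: re-fetch (via any configured plugin, in the real thing).
  const fresh = await fetch(CONFIG_URL);
  await cache.put(CONFIG_URL, fresh.clone());
  return fresh.json();
}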
An additional benefit of this is that the config file is no longer code; it’s an “inert” format (JSON). It is no longer possible to include running code directly in the configuration file. This is important for various reasons that the LANGSEC community explores at length.
This work also included implementing validity checks on the config file — something that was not really possible when config was written in directly-loaded JavaScript.
Implementing this change required me to finally dive deep into the ServiceWorker lifecycle, especially the parts of it that are mostly glossed over or not mentioned at all in most documentation: what exactly happens when a ServiceWorker has been registered and installed, but is now stopped, and is being restarted?
This research was crucial to implementing the JSON config change correctly, and provided important insight that will potentially be very useful for implementing future improvements.
Once the JSON config change was implemented, it was possible to implement background fetching of the updated config.json file.
This required cleaning up and refactoring the code implementing JSON config support, and deciding what criteria to use when establishing if a cached config.json is “stale” (currently: over 24 hours old, based on the Date: header of the cached response).
The biggest issue was figuring out what should happen if the freshly retrieved config file configures plugins that have not been loaded upon Service Worker installation. Because an updated config.json file is processed after a Service Worker restart (so, not during installation), importScripts() is not available.
A decision was made to test for such config changes and reject such a config file outright, falling back to the already cached (if stale) config.json, if the updated file was not retrieved using a regular fetch.
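In rough pseudocode (a sketch of the decision described above, not the actual implementation):

// Reject an updated config that requires plugins we have not loaded,
// unless it was retrieved via a regular fetch from the original site.
function acceptUpdatedConfig(newConfig, loadedPlugins, viaRegularFetch) {
  const missing = newConfig.plugins
    .map((plugin) => plugin.name)
    .filter((name) => !loadedPlugins.includes(name));
  if (missing.length > 0 && !viaRegularFetch) {
    return false; // keep using the cached, possibly stale, config.json
  }
  return true;
}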
The rationale for this is that in such circumstances the updated config.json file was retrieved using an alternative transport, and the currently used config.json is clearly functional, as we were, in fact, able to retrieve the updated config.json with it. Ideas for potential further improvements to this are listed here.
Documentation was written on how JSON config loading and updating works, and how the config can be updated during disruption or outage. It explains the rationale behind implementation decisions and their technological context.
There is obviously more work needed to make this documentation more useful and readable. But it’s a start.
Code written for this milestone is of course covered by tests; overall test coverage went up to ~62%.
As before, I have avoided any external dependencies whatsoever. LibResilient remains easily deployable by simply copying a few JS files (and now, a single JSON file) and adding a single line to your HTML.
There are four milestones on the todo list. Unclear which one I will focus on next, but that should be resolved soon. Keep an eye on the issues assigned to those milestones if you want to be the first to know!
Today another milestone was completed, focusing on the integrity of content fetched via LibResilient, and thus on the security of websites deploying it.
On a very basic level, LibResilient’s job is fetching website content from places other than the original domain of that website.
This can mean alternative endpoints controlled by the website owners (say, on secondary domains, or just IP addresses), or it can mean third-party endpoints like IPFS gateways, Tor2web proxies, or any location where the website’s operator can upload website content, and from which that content can then be fetched.
This, however, creates a problem — operators of such third party services effectively get the ability to modify the content (accidentally, or… maliciously).
The solution is to verify content integrity. We can leverage the Subresource Integrity (SRI) feature of modern browsers — but that has several downsides; most notably, the integrity attribute is only defined for <script> and <link> elements.
Thankfully, it turns out integrity data can be provided for fetch requests in JavaScript for any content type, and once provided, the browser will do the heavy lifting of verifying it!
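In other words, something like this is enough to have the browser enforce integrity on an arbitrary fetched resource (the URL and hash below are purely illustrative):

// The browser itself verifies the response body against the provided
// SRI digest, and rejects the response if it does not match.
const response = await fetch("/some/resource.html", {
  integrity: "sha256-UrkUn2KwKBQ93jS/pSd3Kt0/+9XkDT6Rj93jec/lOZY=",
});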
That also means we can have plugins that provide that integrity data: for example, directly from the config, or through a somewhat elaborate but considerably more flexible process of fetching a signed file with the relevant integrity data for each request separately.
Finally, for transport plugins that do not rely on the Fetch API, and thus do not benefit from the browser checking if integrity data matches fetched content, we now have a wrapper plugin that explicitly checks integrity of any resource, using the SubtleCrypto API.
Let’s break down specific work done in this latest milestone.
This meant, first and foremost, identifying how SRI should be supported in LibResilient. Options included supporting it directly in the service worker code, or bubbling it down to the plugins. In the end, the latter approach was chosen as the more flexible one.
Then, identifying places in service-worker.js and plugin code where SRI was not being correctly handled, and fixing that.
Some research was also necessary to establish if SRI can be set (and if it is enforced by the JS Fetch API) for resources other than scripts and CSS.
Once basic SRI compatibility was ensured, it was possible to write SRI-related wrapper plugins.
The first, basic-integrity, makes it possible to statically configure integrity data for specific URLs.
It doesn’t check the integrity itself, just makes sure that integrity data configured for a given URL is added to the request data when the URL is being fetched by LibResilient. Actual verification is assumed to be done by any plugin wrapped by it.
Secondly, the integrity-check wrapper plugin uses the SubtleCrypto API to implement integrity checking directly in JS.
This makes it possible to check integrity (if present in the request being handled) of content fetched by transport plugins that do not guarantee integrity will be checked by the browser — such as any plugin not using the Fetch API.
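The core of such a check can be sketched as follows (an illustration using the SubtleCrypto API, not the actual plugin code):

// Verify a response body against an SRI string such as
// "sha256-<base64-digest>", using the SubtleCrypto API.
const SRI_ALGORITHMS = { sha256: "SHA-256", sha384: "SHA-384", sha512: "SHA-512" };

async function verifyIntegrity(response, integrity) {
  const separator = integrity.indexOf("-");
  const algorithm = SRI_ALGORITHMS[integrity.slice(0, separator)];
  const expected = integrity.slice(separator + 1);
  const digest = await crypto.subtle.digest(algorithm, await response.clone().arrayBuffer());
  const actual = btoa(String.fromCharCode(...new Uint8Array(digest)));
  return actual === expected;
}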
Finally, the signed-integrity plugin is a proof-of-concept demonstrating how SRI could be used in LibResilient for sites that are not completely static. For each content URL being fetched, it first fetches integrity data from a URL built by appending .integrity to the content URL, expecting a JSON Web Token. That JWT’s signature is verified using a pre-configured public key (the assumption being that it was signed with the related private key on the server). The JWT’s payload should contain an “integrity” field, which is then used to set the SRI data on the request being handled.
The plugin itself does not check integrity; it is assumed that the wrapped plugin will do that check.
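The overall flow is roughly this (verifyJwt() is a hypothetical stand-in for actual JWT signature verification; this is not the plugin’s real code):

// Fetch the signed integrity data for a content URL and return the SRI
// string to set on the actual content request.
async function getIntegrityFor(url, publicKey) {
  const token = await (await fetch(url + ".integrity")).text(); // a JWT
  const payload = await verifyJwt(token, publicKey); // throws if the signature is invalid
  return payload.integrity; // e.g. "sha256-..."
}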
By combining these plugins (for example, signed-integrity to retrieve integrity data for content, wrapping the integrity-check plugin that actually verifies the integrity of content fetched by a transport plugin wrapped by it in turn), it is possible to provide SRI for transport plugins not built around the Fetch API.
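In config terms, such a chain of wrapped plugins might look roughly like this (a sketch; the nesting syntax shown here is an assumption, not the documented schema):

{
  "plugins": [{
    "name": "signed-integrity",
    "publicKey": "<base64-encoded public key>",
    "uses": [{
      "name": "integrity-check",
      "uses": [{ "name": "gun-ipfs" }]
    }]
  }]
}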
A document on content integrity in the context of LibResilient was also created. It discusses SRI, different available plugins, pros and cons of different approaches to content integrity when using LibResilient, and mentions possible future developments.
Code written for this milestone is of course covered by tests, and so overall test coverage for the project went up to ~60%.
All of the functionality in this milestone was implemented without any external dependencies (npm and package.json are only used for running the unit tests, and have nothing to do with LibResilient’s browser-side code).
The aim remains for LibResilient to be deployable by simply copying a few JS files over to the directory from which a website is served. No dependency hell, no bundling, no stress.
Work has already started on the next milestone, focusing on being able to deploy LibResilient configuration changes even when the original website is not available.
First, that means that development is actually happening again. Second, one of the goals of this milestone was to create a decent testing harness, so that any errors or breakage caused by code changes can be caught early and fixed.
Third, this is the first milestone covered by the NLnet grant for LibResilient. You read that right: LibResilient development is now supported as part of the NGI Assure project. There are six more milestones defined as part of that grant.
The biggest specific pieces of work done for this milestone focused on test coverage, security testing infrastructure, refactoring, and error handling.
Currently, test coverage is at ~53% overall. This might not sound like a lot, until one considers that for all mature plugins (fetch, cache, any-of, alt-fetch) it’s at 100%, and for service-worker.js – the most important and complicated piece of the project – it’s at ~95%.
This means development can go ahead with reasonable confidence, as most bugs get caught early.
Less stable plugins (gun-ipfs, ipns-ipfs) and the user-facing libresilient.js do not have a lot of test coverage currently, as they will require substantial rewrites in the near future.
Security testing infrastructure was also set up, which led to code quality improvements.
This was a rather hairy piece of work, as it required refactoring a lot of tightly-coupled code which was handling core functionality of LibResilient, and which used to have certain old assumptions (like: “a plugin can only be used once in the config”) baked in.
Having a testing harness helped a lot in making those changes without worrying whether LibResilient remained functional, limiting the chance of serious bugs being introduced.
This also laid down the groundwork for future work on other milestones, for example by making the config-handling code cleaner and easier to reason about.
Along the way some old code was re-written and assorted issues fixed; importantly, more error handling code was added to service-worker.js.
Improved error handling means LibResilient should work better in Safari now (Safari implementing some new web APIs in the meantime also helped, of course).
I’m going to continue to work on LibResilient in a rather organic way – without a very strict plan. There’s plenty to be done, and basically all planned improvements are intertwined with one another. Setting up a proper website for the project, with some demos and examples, is high on the list though.