opam

NOTE (September 2016): updated proposal from OCaml 2016 workshop is available, including links to prototype implementation.

This is an initial proposal on signing the OPAM repository. Comments and discussion are expected on the platform mailing-list.

The purpose of this proposal is to enable a secure distribution of OCaml packages. The package repository does not have to be trusted if package developers sign their releases.

Like Python's pip, Ruby's gems or more recently Haskell's hackage, we are going to implement a flavour of The Upgrade Framework (TUF). This is good because:

it has been designed by people who know the stuff much better than us
it is built upon a threat model including many kinds of attacks, and there are some non-obvious ones (see the specification, and below)
it has been thoroughly reviewed
following it may help us avoid a lot of mistakes

Importantly, it doesn't enforce any specific cryptography, allowing us to go with what we have at the moment in native OCaml, and evolve later, e.g. by allowing ed25519.

There are several differences between the goal of TUF and opam, namely TUF distributes a directory structure containing the code archive, whereas opam distributes metadata about OCaml packages. Opam uses git (and GitHub at the moment) as a first class citizen: new packages are submitted as pull requests by developers who already have a GitHub account.

Note that TUF specifies the signing hierarchy and the format to deliver and check signatures, but allows a lot of flexibility in how the original files are signed: we can have packages automatically signed on the official repository, or individually signed by developers. Or indeed allow both, depending on the package.

Below, we tried to explain the specifics of our implementation, and mostly the user and developer-visible changes. It should be understandable without prior knowledge of TUF.

We are inspired by Haskell's adjustments (and e2e) to TUF using a git repository for packages. A signed repository and signed packages are orthogonal. In this proposal, we aim for both, but will describe them independently.

Threat model

An attacker can compromise at least one of the package distribution system's online trusted keys.
An attacker compromising multiple keys may do so at once or over a period of time.
An attacker can respond to client requests (MITM or server compromise) during downloading of the repository, a package, and also while uploading a new package release.
An attacker knows of vulnerabilities in historical versions of one or more packages, but not in any current version (protecting against zero-day exploits is emphatically out-of-scope).
Offline keys are safe and securely stored.

An attacker is considered successful if they can cause a client to build and install (or leave installed) something other than the most up-to-date version of the software the client is updating. If the attacker is preventing the installation of updates, they want clients to not realize there is anything wrong.

Attacks

Arbitrary package: an attacker should not be able to provide a package they created in place of a package a user wants to install (via MITM during package upload, package download, or server compromise).
Rollback attacks: an attacker should not be able to trick clients into installing software that is older than that which the client previously knew to be available.
Indefinite freeze attacks: an attacker should not be able to respond to client requests with the same, outdated metadata without the client being aware of the problem.
Endless data attacks: an attacker should not be able to respond to client requests with huge amounts of data (extremely large files) that interfere with the client's system.
Slow retrieval attacks: an attacker should not be able to prevent clients from being aware of interference with receiving updates by responding to client requests so slowly that automated updates never complete.
Extraneous dependencies attacks: an attacker should not be able to cause clients to download or install software dependencies that are not the intended dependencies.
Mix-and-match attacks: an attacker should not be able to trick clients into using a combination of metadata that never existed together on the repository at the same time.
Malicious repository mirrors: should not be able to prevent updates from good mirrors.
Wrong developer attack: an attacker should not be able to upload a new version of a package for which they are not the real developer.

Trust

A difficult problem in a cryptosystem is key distribution. In TUF and this proposal, a set of root keys are distributed with opam. A threshold of these root keys needs to sign (transitively) all keys which are used to verify opam repository and its packages.

Root keys

The root of trust is stored in a set of root keys. In the case of the official opam OCaml repository, the public keys are to be stored in the opam source, allowing it to validate the whole trust chain. The private keys will be held by the opam and repository maintainers, and stored password-encrypted, securely offline, preferably on unplugged storage.

They are used to sign all the top-level keys, using a quorum. The quorum has several benefits:

the compromise of a number of root keys less than the quorum is harmless
it allows to safely revoke and replace a key, even if it was lost

The added cost is more maintenance burden, but this remains small since these keys are not often used (only when keys are going to expire, were compromised or in the event new top-level keys need to be added).

The initial root keys could be distributed as such:

Louis Gesbert, opam maintainer, OCamlPro
Anil Madhavapeddy, main repository maintainer, OCaml Labs
Thomas Gazagnaire, main repository maintainer, OCaml Labs
Grégoire Henry, OCamlPro safekeeper
Someone in the OCaml team ?

Keys will be set with an expiry date so that one expires each year in turn, leaving room for smooth rollover.

For other repositories, there will be three options:

no signatures (backwards compatible ?), e.g. for local network repositories. This should be allowed, but with proper warnings.
trust on first use: get the root keys on first access, let the user confirm their fingerprints, then fully trust them.
let the user manually supply the root keys.

End-to-end signing

This requires the end-user to be able to validate a signature made by the original developer. There are two trust paths for the chain of trust (where "→" stands for "signs for"):

(high) root keys → repository maintainer keys → (signs individually) package delegation + developer key → package files
(low) root keys → snapshot key → (signs as part of snapshot) package delegation + developer key → package files

It is intended that packages may initially follow the low trust path, adding as little burden and delay as possible when adding new packages, and may then be promoted to the high path with manual intervention, after verification, from repository maintainers. This way, most well-known and widely used packages will be provided with higher trust, and the scope of an attack on the low trust path would be reduced to new, experimental or little-used packages.

Repository signing

This provides consistent, up-to-date snapshots of the repository, and protects against a whole different class of attacks than end-to-end signing (e.g. rollbacks, mix-and-match, freeze, etc.)

This is done automatically by a snapshot bot (might run on the repository server), using the snapshot key, which is signed directly by the root keys, hence the chain of trust:

root keys → snapshot key → commit-hash

Where "commit-hash" is the head of the repository's git repository (and thus a valid cryptographic hash of the full repository state, as well as its history)

Repository maintainer (RM) keys

Repository maintainers hold the central role in monitoring the repository and warranting its security, with or without signing. Their keys (called targets keys in the TUF framework) are signed directly by the root keys. As they have a high security potential, in order to reduce the consequences of a compromise, we will be requiring a quorum for signing sensitive operations

These keys are stored password-encrypted on the RM computers.

Snapshot key

This key is held by the snapshot bot and signed directly by the root keys. It is used to guarantee consistency and freshness of repository snapshots, and does so by signing a git commit-hash and a time-stamp.

It is held online and used by the snapshot bot for automatic signing: it has lower security than the RM keys, but also a lower potential: it can not be used directly to inject malicious code or metadata in any existing package.

Delegate developer keys

These keys are used by the package developers for end-to-end signing. They can be generated locally as needed by new packagers (e.g. by the opam-publish tool), and should be stored password-encrypted. They can be added to the repository through pull-requests, waiting to be signed (i) as part of snapshots (which also prevents them to be modified later, but we'll get to it) and (ii) directly by RMs.

Initial bootstrap

We'll need to start somewhere, and the current repository isn't signed. An additional key, initial-bootstrap, will be used for guaranteeing integrity of existing, but yet unverified packages.

This is a one-go key, signed by the root keys, and that will then be destroyed. It is allowed to sign for packages without delegation.

Trust chain and revocation

In order to build the trust chain, the opam client downloads a keys/root key file initially and before every update operation. This file is signed by the root keys, and can be verified by the client using its built-in keys (or one of the ways mentioned above for unofficial repositories). It must be signed by a quorum of known root keys, and contains the comprehensive set of root, RM, snapshot and initial bootstrap keys: any missing keys are implicitly revoked. The new set of root keys is stored by the opam client and used instead of the built-in ones on subsequent runs.

Developer keys are stored in files keys/dev/<id>, self-signed, possibly RM signed (and, obviously, snapshot-signed). The conditions of their verification, removal or replacement are included in our validation of metadata update (see below).

File formats and hierarchy

Signed files and tags

The files follow the opam syntax: a list of fields fieldname: followed by contents. The format is detailed in opam's documentation.

The signature of files in opam is done on the canonical textual representation, following these rules:

any existing signature: field is removed
one field per line, ending with a newline
fields are sorted lexicographically by field name
newlines, backslashes and double-quotes are escaped in string literals
spaces are limited to one, and to these cases: after field leaders fieldname:, between elements in lists, before braced options, between operators and their operands
comments are erased
fields containing an empty list, or a singleton list containing an empty list, are erased

The signature: field is a list with elements in the format of string triplets [ "<keyid>" "<algorithm>" "<signature>" ]. For example:

opam-version: "1.2"
name: "opam"
signature: [
  [ "louis.gesbert@ocamlpro.com" "RSASSA-PSS" "048b6fb4394148267df..." ]
]

Signed tags are git annotated tags, and their contents follow the same rules. In this case, the format should contain the field commit:, pointing to the commit-hash that is being signed and tagged.

File hierarchy

The repository format is changed by the addition of:

a directory keys/ at the root
delegation files packages/<pkgname>/delegate and compilers/<patchname>.delegate
signed checksum files at packages/<pkgname>/<pkgname>.<version>/signature

Here is an example:

repository root /
|--packages/
|  |--pkgname/
|  |  |--delegation                    - signed by developer, repo maintainer
|  |  |--pkgname.version1/
|  |  |  |--opam
|  |  |  |--descr
|  |  |  |--url
|  |  |  `--signature                  - signed by developer1
|  |  `--pkgname.version2/ ...
|  `--pkgname2/ ...
|--compilers/
|  |--version/
|  |  |--version+patch/
|  |  |  |--version+patch.comp
|  |  |  |--version+patch.descr
|  |  |  `--version+patch.signature
|  |  `--version+patch2/ ...
|  |--patch.delegate
|  |--patch2.delegate
|  `--version2/ ...
`--keys/
   |--root
   `--dev/
      |--developer1-email              - signed by developer1,
      `--developer2-email ...            and repo maint. once verified

Keys are provided in different files as string triplets [ [ "keyid" "algo" "key" ] ]. keyid must not conflict with any previously-defined keys, and algo may be "rsa" and keys encoded in PEM format, with further options available later.

For example, the keys/root file will have the format:

date: "2015-06-04T13:53:00Z"
root-keys: [ [ "keyid" "{expire-date}" "algo" "key" ] ]
snapshot-keys: [ [ "keyid" "algo" "key" ] ]
repository-maintainer-keys: [ [ "keyid" "algo" "key" ] ]

This file is signed by current and past root keys -- to allow clients to update. The date: field provides further protection against rollback attacks: no clients may accept a file with a date older than what they currently have. Date is in the ISO 8601 standard with 0 UTC offset, as suggested in TUF.

Delegation files

/packages/pkgname/delegation delegates ownership on versions of package pkgname. The file contains version constraints associated with keyids, e.g.:

name: pkgname
delegates: [
  "thomas@gazagnaire.org"
  "louis.gesbert@ocamlpro.com" {>= "1.0"}
]

The file is signed:

by the original developer submitting it
or by a developer previously having delegation for all versions, for changes
or directly by repository maintainers, validating the delegation, and increasing the level of trust

Every key a developer delegates trust to must also be signed by the developer.

compilers/patch.delegate files follow a similar format (we are considering changing the hierarchy of compilers to match that of packages, to make things simpler).

The delegates: field may be empty: in this case, no packages by this name are allowed on the repository. This may be useful to mark deletion of obsolete packages, and make sure a new, different package doesn't take the same name by mistake or malice.

Package signature files

These guarantee the integrity of a package: this includes metadata and the package archive itself (which may, or may not, be mirrored on the the opam repository server).

The file, besides the package name and version, has a field package-files: containing a list of files below packages/<pkgname>/<pkgname>.<version> together with their file sizes in bytes and one or more hashes, prefixed by their kind, and a field archive: containing the same details for the upstream archive. For example:

name: pkgname
version: pkgversion
package-files: [
  "opam" {901 [ sha1 "7f9bc3cc8a43bd8047656975bec20b578eb7eed9" md5 "1234567890" ]}
  "descr" {448 [ sha1 "8541f98524d22eeb6dd669f1e9cddef302182333" ]}
  "url" {112 [ sha1 "0a07dd3208baf4726015d656bc916e00cd33732c" ]}
  "files/ocaml.4.02.patch" {17243 [ sha1 "b3995688b9fd6f5ebd0dc4669fc113c631340fde" ]}
]
archive: [ 908460 [ sha1 "ec5642fd2faf3ebd9a28f9de85acce0743e53cc2" ] ]

This file is signed either:

by the initial-bootstrap key, only initially
by a delegate key (i.e. by a delegated-to developer)
by a quorum of repository maintainers

The latter is needed to hot-fix packages on the repository: repository maintainers often need to do so. A quorum is still required, to prevent a single RM key compromise from allowing arbitrary changes to every package. The quorum is not initially required to sign a delegation, but is, consistently, required for any change to an existing, signed delegation.

Compiler signature files <version>+<patch>.signature are similar, with fields compiler-files containing checksums for <version>+<patch>.*, the same field archive: and an additional optional field patches:, containing the sizes and hashes of upstream patches used by this compiler.

If the delegation or signature can't be validated, the package or compiler is ignored. If any file doesn't correspond to its size or hashes, it is ignored as well. Any file not mentioned in the signature file is ignored.

Snapshots and linearity

Main snapshot role

The snapshot key automatically adds a signed annotated tag to the top of the served branch of the repository. This tag contains the commit-hash and the current timestamp, effectively ensuring freshness and consistency of the full repository. This protects against mix-and-match, rollback and freeze attacks.

The signed annotated tag is deleted and recreated by the snapshot bot, after checking the validity of the update, periodically and after each change.

Linearity

The repository is served using git: this means, not only the latest version, but the full history of changes are known. This as several benefits, among them, incremental downloads "for free"; and a very easy way to sign snapshots. Another good point is that we have a working full OCaml implementation.

We mentioned above that we use the snapshot signatures not only for repository signing, but also as an initial guarantee for submitted developer's keys and delegations. One may also have noticed, in the above, that we sign for delegations, keys etc. individually, but without a bundle file that would ensure no signed files have been maliciously removed.

These concerns are all addressed by a linearity condition on the repository's git: the snapshot bot does not only check and sign for a given state of the repository, it checks every individual change to the repository since the last well-known, signed state: patches have to follow from that git commit (descendants on the same branch), and are validated to respect certain conditions: no signed files are removed or altered without signature, etc.

Moreover, this check is also done on clients, every time they update: it is slightly weaker, as the client don't update continuously (an attacker may have rewritten the commits since last update), but still gives very good guarantees.

A key and delegation that have been submitted by a developer and merged, even without RM signature, are signed as part of a snapshot: git and the linearity conditions allow us to guarantee that this delegation won't be altered or removed afterwards, even without an individual signature. Even if the repository is compromised, an attacker won't be able to roll out malicious updates breaking these conditions to clients.

The linearity invariants are:

no key, delegation, or package version (signed files) may be removed
a new key is signed by itself, and optionally by a RM
a new delegation is signed by the delegate key, optionally by a RM. Signing keys must also sign the delegate keys
a new package or package version is signed by a valid key holding a valid delegation for this package version
keys can only be modified with signature from the previous key or a quorum of RM keys
delegations can only be modified with signature by a quorum of RMs, or possibly by a former delegate key (without version constraints) in case there was previously no RM signature
any package modification is signed by an appropriate delegate key, or by a quorum of RM keys

It is sometimes needed to do operations, like key revocation, that are not allowed by the above rules. These are enabled by the following additional rules, that require the commit including the changes to be signed by a quorum of repository maintainers using an annotated tag:

package or package version removal
removal (revocation) of a developer key
removal of a package delegation (it's in general preferable to leave an empty delegation)

Changes to the keys/root file, which may add, modify or revoke keys for root, RMs and snapshot keys is verified in the normal way, but needs to be handled for checking linearity since it decides the validity of RM signatures. Since this file may be needed before we can check the signed tag, it has its own timestamp to prevent rollback attacks.

In case the linearity invariant check fail:

on the GitHub repository, this is marked and the RMs are advised not to merge (or to complete missing tag signatures)
on the clients, the update is refused, and the user informed of what's going on (the repository has likely been compromised at that point)
on the repository (checks by the snapshot bot), update is stalled and all repository maintainers immediately warned. To recover, the broken commits (between the last signed tag and master) need to be amended.

Work and changes involved

General

Write modules for key handling ; signing and verification of opam files.

Write the git synchronisation module with linearity checks.

Rewrite the default HTTP repository synchronisation module to use git fetch, verify, and git pull. This should be fully transparent, except:

in the cases of errors, of course
when registering a non-official repository
for some warnings with features that disable signatures, like source pinning (probably only the first time would be good)

Include the public root keys for the default repository, and implement management of updated keys in ~/.opam/repo/name.

Handle the new formats for checksums and non-repackaged archives.

Allow a per-repository security threshold (e.g. allow all, allow only signed packages, allow only packages signed by a verified key, allow only packages signed by their verified developer). It should be easy but explicit to add a local network, unsigned repository. Backends other than git won't be signed anyway (local, rsync...).

opam-publish

Generate keys, handle locally stored keys, generate signature files, handle signing, submit signatures, check delegation, submit new delegation, request delegation change (might require repository maintainer intervention if an RM signed the delegation), delete developer, delete package.

Manage local keys. Probably including re-generating, and asking for revocation.

opam-admin

Most operations on signatures and keys will be implemented as independent modules (as to be usable from e.g. unikernels working on the repository). We should also make them available from opam-admin, for testing and manual management. Special tooling will also be needed by RMs.

fetch the archives (but don't repackage as pkg+opam.tar.gz anymore)
allow all useful operations for repository maintainers (maybe in a different tool ?):
- manage their keys
- list and sign changed packages directly
- list and sign waiting delegations to developer keys
- validate signatures, print reports
- sign tags, including adding a signature to an existing tag to meet the quorum
- list quorums waiting to be met on a given branch
generate signed snapshots (same as the snapshot bot, for testing)

Signing bots

If we don't want to have this processed on the publicly visible host serving the repository, we'll need a mechanism to fetch the repository, and submit the signed tag back to the repository server.

Doing this through mirage unikernels would be cool, and provide good isolation. We could imagine running this process regularly:

fetch changes from the repository's git (GitHub)
check for consistency (linearity)
generate and sign the signed tag
push tag back to the release repository

Travis

All security information and check results should be available to RMs before they make the decision to merge a commit to the repository. This means including signature and linearity checks in a process running on Travis, or similarly on every pull-request to the repository, and displaying the results in the GitHub tracker.

This should avoid most cases where the snapshot bot fails the validation, leaving it stuck (as well as any repository updates) until the bad commits are rewritten.

Some more detailed scenarios

`opam init` and `update` scenario

On init, the client would clone the repository and get to the signed tag, get and check the associated keys/root file, and validate the signed tag according to the new keyset. If all goes well, the new set of root, RM and snapshot keys is registered.

Then all files' signatures are checked following the trust chains, and copied to the internal repository mirror opam will be using (~/.opam/repo/<name>). When a package archive is needed, the download is done either from the repository, if the file is mirrored, or from its upstream, in both cases with known file size and upper limit: the download is stopped if going above the expected size, and the file removed if it doesn't match both.

On subsequent updates, the process is the same except that a fetch operation is done on the existing clone, and that the repository is forwarded to the new signed tag only if linearity checks passed (and the update is aborted otherwise).

`opam-publish` scenario

The first time a developer runs opam-publish submit, a developer key is generated, and stored locally.
Upon opam-publish submit, the package is signed using the key, and the signature is included in the submission.
If the key is known, and delegation for this package matches, all is good
If the key is not already registered, it is added to /keys/dev/ within the pull-request, self-signed.
If there is no delegation for the package, the /packages/pkgname/delegation file is added, delegating to the developer key and signed by it.
If there is an existing delegation that doesn't include the auhor's key, this will require manual intervention from the repository managers. We may yet submit a pull-request adding the new key as delegate for this package, and ask the repository maintainers -- or former developers -- to sign it.

Security analysis

We claim that the above measures give protection against:

Arbitrary packages: if an existing package is not signed, it is not installed (or even visible) to the user. Anybody can submit new unclaimed packages (but, in the current setting, still need GitHub write access to the repository, or to bypass GitHub's security).
Rollback attacks: git updates must follow the currently known signed tag. if the snapshot bot detects deletions of packages, it refuses to sign, and clients double-check this. The keys/root file contains a timestamp.
Indefinite freeze attacks: the snapshot bot periodically signs the signed tag with a timestamp, if a client receives a tag older than the expected age it will notice.
Endless data attacks: we rely on the git protocol and this does not defend against endless data. Downloading of package archive (of which the origin may be any mirror), though, is protected. The scope of the attack is mitigated in our setting, because there are no unattended updates: the program is run manually, and interactively, so the user is most likely to notice.
Slow retrieval attacks: same as above.
Extraneous dependencies attacks: metadata is signed, and if the signature does not match, it is not accepted.

NOTE: the provides field -- yet unimplemented, see the document in opam/doc/design -- could provide a vector in this case, by advertising a replacement for a popular package. Additional measures will be taken when implementing the feature, like requiring a signature for the provided package.

Mix-and-match attacks: the repository has a linearity condition, and partial repositories are not possible.
Malicious repository mirrors: if the signature does not match, reject.
Wrong developer attack: if the developer is not in the delegation, reject.

GitHub repository

Is the link between GitHub (opam-repository) and the signing bot special? If there is a MITM on this link, they can add arbitrary new packages, but due to missing signatures only custom universes. No existing package can be altered or deleted, otherwise consistency condition above does not hold anymore and the signing bot will not sign.

Certainly, the access can be frozen, thus the signing bot does not receive updates, but continues to sign the old repository version.

Snapshot key

If the snapshot key is compromised, an attacker is able to:

Add arbitrary (non already existing) packages, as above.
Freeze, by forever re-signing the signed tag with an updated timestamp.

Most importantly, the attacker won't be able to tamper with existing packages. This hudgely reduces the potential of an attack, even with a compromised snapshot key.

The attacks above would also require either a MITM between the repository and the client, or a compromise of the opam repository: in the latter case, since the linearity check is reproduces even from the clients:

any tamper could be detected very quickly, and measures taken.
a freeze would be detected as soon as a developer checks that their package is really online. That currently happens several times a day.

The repository would then just have to be reset to before the attack, which git makes as easy as it can get, and the holders of the root keys would sign a new /auth/root, revoking the compromised snapshot key and introducing a new one.

In the time before the signing bot can be put back online with the new snapshot key -- i.e. the breach has been found and fixed -- a developer could manually sign time-stamped tags before they expire (e.g. once a day) so as not to hold back updates.

Repository Maintainer keys

Repository maintainers are powerful, they can modify existing opam files and sign them (as hotfix), introduce new delegations for packages, etc.).

However, by requiring a quorum for sensitive operations, we limit the scope of a single RM key compromise to the validation of new developer keys or delegations (which should be the most common operation done by RMs): this enables to raise the level of security of the new, malicious packages but otherwise doesn't change much from what can be done with just access to the git repository.

A further compromise of a quorum of RM keys would allow to remove or tamper with any developer key, delegation or package: any of these amounts to being able to replace any package with a compromised version. Cleaning up would require replacing all but the root keys, and resetting the repository to before any malicious commit.

Difference to TUF

we use git
thus get linearity "for free"
and already have a hash over the entire repository
TUF provides a mechanism for delegation, but it's both heavier and not expressive enough for what we wanted -- delegate to packages directly.
We split in lots more files, and per-package ones, to fit with and nicely extend the git-based workflow that made the success of opam. The original TUF would have big json files signing for a lot of files, and likely to conflict. Both developers and repository maintainers should be able to safely work concurrently without issue. Signing bundles in TUF gives the additional guarantee that no file is removed without proper signature, but this is handled by git and signed tags.
instead of a single file with all signed packages by a specific developer, one file per package

Differences to Haskell:

use TUF keys, not gpg
e2e signing

Signing the OPAM repositoryOn 2015-06-05, by Louis Gesbert and Hannes Mehnert

Threat model

Attacks

Trust

Root keys

End-to-end signing

Repository signing

Repository maintainer (RM) keys

Snapshot key

Delegate developer keys

Initial bootstrap

Trust chain and revocation

File formats and hierarchy

Signed files and tags

File hierarchy

Delegation files

Package signature files

Snapshots and linearity

Main snapshot role

Linearity

Work and changes involved

General

opam

opam-publish

opam-admin

Signing bots

Travis

Some more detailed scenarios

opam init and update scenario

opam-publish scenario

Security analysis

GitHub repository

Snapshot key

Repository Maintainer keys

Difference to TUF

Differences to Haskell:

Signing the OPAM repository
On 2015-06-05, by Louis Gesbert and Hannes Mehnert

`opam init` and `update` scenario

`opam-publish` scenario