More detailed descriptions of curator workflows.

Adam Bergmark 2016-12-19 19:37:42 +01:00
parent cc72e288f9
commit 3f5dd28276

@ -4,7 +4,7 @@ Originally this was handled largely by Michael Snoyman,
but now we are a team of 4 people handling requests weekly in rotation.
Curation activities are mostly automated, and do not take up a significant amount of time.
## Workflow
## Workflow overview
This section sketches out at a high level how the entire Stackage build/curation
process works:
@ -23,10 +23,30 @@ process works:
The typical story on pull requests is: If Travis accepts it and the
author only added packages under his/her own name, merge it. If the
build later fails (see below), then block the package until it's
fixed.
build later fails (see "Adding Debian packages for required system tools or libraries"),
then block the package until it's fixed.
If benchmarks, haddocks, or test suites fail at this point we
typically also block the package until these issues are fixed. This is
so that packages are added with a clean slate.
Optionally we can check if packdeps says the package is up to date.
Visit http://packdeps.haskellers.com/feed?needle=<package-name>
Builds may fail because of unrelated bounds changes. If this happens,
first add any version bounds to get master into a passing state (see
"Fixing bounds issues"), then re-run the Travis build.
A common issue is that authors submit newly uploaded packages before
they have synced across the Stack infrastructure, which can take up to
an hour. You can usually compare the versions of the package in
https://github.com/commercialhaskell/all-cabal-metadata/tree/master/packages/
to what's on Hackage to see if this is the case. If so, wait an hour and
re-run the pull request.
Tests also commonly fail due to missing test files, and sometimes due
to doctest limitations. You can point the maintainer to
https://github.com/bergmark/blog/blob/master/2016/package-faq.md
## Fixing bounds issues
@ -36,20 +56,121 @@ issue on the Stackage repo about the problem, and modifying the
build-constraints.yaml file to work around it in one of the ways below. Be sure
to refer to the issue for workarounds added to that file.
* __Temporary upper bounds__ Most common technique, just prevent a new version of a library from being included immediately
* __Skipping tests and benchmarks__ If the upper bound is only in a test suite or benchmark, you can add the relevant package to skipped-tests or skipped-benchmarks. For example, if conduit had an upper bound on criterion for a benchmark, you could add conduit as a skipped benchmark.
* __Excluding packages__ In an extreme case of a non-responsive maintainer, you can remove the package entirely from Stackage. We try to avoid that whenever possible
### Temporary upper bounds
This is the most common technique: prevent a new version of a library
from being included immediately. This also applies when only benchmarks
and tests are affected.
* Copy the stackage-curator output and create a new issue, see e.g.
https://github.com/fpco/stackage/issues/2108
* Add a new entry under the "Stackage upper bounds" section of `build-constraints.yaml`. For the above example it would be:
```yaml
"Stackage upper bounds":
# https://github.com/fpco/stackage/issues/2108
- pipes < 4.3.0
```
* Commit (message e.g. "Upper bound for #2108")
* Optionally: Verify with `stackage-curator check` locally
* Push
* Verify that everything works on the build server (you can restart the build or wait for it to run again)
Sometimes releases for different packages are tightly coupled. Then it
can make sense to combine them into one issue, as in
https://github.com/fpco/stackage/issues/2143.
If a dependency that is not explicitly in Stackage is causing test or
benchmark failures, you can skip them or expect them to fail (see "Skipping
tests and benchmarks" and "Expecting test/benchmark/haddock
failures"). Bonus points for reporting this upstream to that package's
maintainer.
### Lifting upper bounds
You can try this when you notice that a package has been updated. You
can also periodically try to lift bounds (I think it's good to do this
at the start of your week /@bergmark)
If stackage-curator is happy, commit the change ("Remove upper bounds and close #X").
### Amending upper bounds
With the `pipes` example above there was later a new release of
`pipes-safe` that required the **newer** version of `pipes`. You can
add that package to the same upper bounds section
(e.g. https://github.com/fpco/stackage/commit/6429b1eb14db3f2a0779813ef2927085fa4ad673),
as we want to lift both bounds simultaneously.
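As a rough sketch, the amended section could look something like the fragment below. The `pipes` entry is the one from the example above. The `pipes-safe` version number is an illustrative placeholder and is not taken from the linked commit, so check that commit for the real bound.

```yaml
"Stackage upper bounds":
    # https://github.com/fpco/stackage/issues/2108
    - pipes < 4.3.0
    # Placeholder bound for illustration only; see the linked commit
    # for the value that was actually used.
    - pipes-safe < 2.2.5
```

Keeping both entries under the same issue comment makes it easier to remove them together once the issue is resolved.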
### Skipping tests and benchmarks
Sometimes test and benchmark dependencies are forgotten or neglected.
To disable compilation of a package's tests or benchmarks, add the
package to `skipped-tests` or `skipped-benchmarks`. The test suites or
benchmarks of packages listed in these sections won't be compiled, and
their dependencies won't be taken into account.
There are subsections under these headers that are used to group types
of failures together, and also to document what types of failures
exist.
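A minimal sketch of what such entries might look like in `build-constraints.yaml` follows. The group comments and `some-package` are hypothetical, and `conduit` appears only because of the hypothetical criterion upper bound used as an example earlier in this document. The exact layout of the real file may differ.

```yaml
skipped-tests:
    # Outdated test dependencies (hypothetical group description)
    - some-package    # hypothetical package name

skipped-benchmarks:
    # Outdated benchmark dependencies (hypothetical group description)
    - conduit         # e.g. if its benchmark had an upper bound on criterion
```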
### Expecting test/benchmark/haddock failures
The difference from the `skipped` sections is that items listed here
are compiled and their dependencies are taken into account. These
sections also have subsections with groups and descriptions.
One big category of test suites in this section is those requiring
running services. We don't want to run those, but we do want to check
their dependencies and compile them.
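For illustration, an entry for a test suite that needs a running service might be grouped roughly like this. The group comment and the package name are hypothetical, and the real file may be laid out differently.

```yaml
expected-test-failures:
    # Requires a running service (hypothetical group description)
    - some-db-client    # hypothetical package name
```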
If there are no version bounds that would fix the issue or if you
can't figure it out, file an issue
(e.g. https://github.com/fpco/stackage/issues/2133) to ask the
maintainer for help.
### Waiting for new releases
Sometimes there is a failure reported on a (now possibly closed) issue
on an external tracker. If an issue gets resolved but there is no
Hackage release yet, we'd like to get notified when it's uploaded.
Add the package with its current version to the
`tell-me-when-its-released` section. This will cause the build to stop
when the new version is out.
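A sketch of such an entry is shown below. It assumes the `name-version` form for entries, and both the package name and the version are hypothetical. Check the existing entries in `build-constraints.yaml` for the exact format before copying this.

```yaml
tell-me-when-its-released:
    # Fixed upstream, waiting for a Hackage release (hypothetical note)
    - some-package-1.2.3    # hypothetical package and current version
```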
### Excluding packages
In an extreme case of a non-responsive maintainer, you can remove the
package entirely from Stackage. We try to avoid that whenever
possible.
This typically happens when we move to a new major GHC release or when
there are only a few packages waiting for updates on an upper bounds
issue.
Comment out the offending packages in the "packages" section and add
a comment saying why they were disabled:
```
# - swagger # BLOCKED aeson 1.0
```
## Updating the content of the Docker image used for building
### Adding Debian packages for required system tools or libraries
Additional (non-Haskell) system libraries or tools should be added to `stackage/debian-bootstrap.sh`.
Committing the changes to a branch should trigger a DockerHub build. Normally only the nightly branch needs to be updated
Committing the changes to a branch should trigger a DockerHub build. Normally only the `nightly` branch needs to be updated
since new packages are not added to the current LTS release.
Use [Ubuntu Package content search](http://packages.ubuntu.com/) to determine which package provides particular dev files (it defaults to xenial which is the version used to build Nightly).
Note we generally don't install/run services needed for testsuites in the docker images - packages with tests requiring some system service can be add to expected-test-failures.
Note that we generally don't install/run services needed for testsuites in the docker images - packages with tests requiring some system service can be added to `expected-test-failures`.
It's good to inform the maintainer of any disabled tests (commenting in the PR is sufficient).
If a new package fails to build because of missing system libraries we often ask the maintainer to help figure out what to install.
### Upgrading GHC version
The Dockerfile contains information on which GHC versions should be used. You
@ -106,7 +227,7 @@ we're just not there yet.
/opt/stackage-build/stackage/automated/build.sh lts-3.0
```
Recommended: run these from inside a `screen` session. If you get version bound
Recommended: run these from inside a `tmux` session. If you get version bound
problems on nightly or LTS major, you need to fix build-constraints.yaml (see
info above). For an LTS minor bump, you'll typically want to use the
`CONSTRAINTS` environment variable, e.g.:
@ -126,13 +247,13 @@ If a build fails for bounds reasons, see all of the advice above. If the code
itself doesn't build, or tests fail, open up an issue and then either put in a
version bound to avoid that version or something else. It's difficult to give
universal advice on how to solve things, since each situation is unique. Let's
develop this advice over time. For now: if you're not sure, ask Michael for
guidance.
develop this advice over time. For now: if you're not sure, ask for guidance.
__`NOPLAN=1`__ If you wish to rerun a build without recalculating a
build plan, you can set the environment variable `NOPLAN=1`. This is
useful for such cases as an intermittent test failure, out of memory
condition, or manually tweaking the plan file.
condition, or manually tweaking the plan file. This is the default for
LTS builds.
### Timing
@ -146,9 +267,33 @@ LTS minor bumps typically are run on Sundays.
### Website sync debugging (and other out of disk space errors)
* You can detect the problem by running `df`. If you see that `/` is out of space, we have a problem
* There are many temp files inside `/home/ubuntu/stackage-server-cron` that can be cleared out occasionally
* You can then manually run `/home/ubuntu/stackage-server-cron.sh`, or wait for the cron job to do it
* (outdated) There are many temp files inside `/home/ubuntu/stackage-server-cron` that can be cleared out occasionally
* (outdated) You can then manually run `/home/ubuntu/stackage-server-cron.sh`, or wait for the cron job to do it
### Wiping the cache
Sometimes the cache can get corrupted which might manifest as `can't load .so/.DLL`. You can wipe the nightly cache and rebuild everything by doing `rm -rf /opt/stackage-build/stackage/automated/nightly`.
Sometimes the cache can get corrupted which might manifest as `can't load .so/.DLL`.
You can wipe the nightly cache and rebuild everything by doing
`rm -rf /var/stackage/stackage/automated/nightly`.
Replace `nightly` with `lts7` to wipe the LTS 7 cache.
## Local curator setup
We don't run the full stackage build locally as that might take too
much time. Some steps on the other hand are much faster to do
yourself.
It's useful to be able to modify constraints locally before pushing to
the repository. To do this first install stackage-curator:
`git clone git@github.com:fpco/stackage-curator.git && cd stackage-curator && stack install`
or get the Linux binary: https://s3.amazonaws.com/stackage-travis/stackage-curator/stackage-curator.bz2
(it's a good idea to upgrade stackage-curator at least at the start of your week as curator).
Then clone the stackage repo `git clone git@github.com:fpco/stackage.git`.
Inside it run `stack update && stackage-curator check` to get new packages and do dependency resolution.
This can be used to make sure all version bounds are in place
(including for test suites and benchmarks), to check whether bounds
can be lifted, and to get `tell-me-when-its-released` notifications.
Notably this doesn't build anything, so you won't see any compilation
errors for builds/tests/benchmarks.