From 3f5dd282764f55525637a4ad3fa5f504faf316e2 Mon Sep 17 00:00:00 2001
From: Adam Bergmark <adam@bergmark.nl>
Date: Mon, 19 Dec 2016 19:37:42 +0100
Subject: [PATCH] More detailed descriptions of curator workflows.

---
 CURATORS.md | 175 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 160 insertions(+), 15 deletions(-)
diff --git a/CURATORS.md b/CURATORS.md
index 7b5f8997..0af3d9f3 100644
--- a/CURATORS.md
+++ b/CURATORS.md
@@ -4,7 +4,7 @@ Originally this was handled largely by Michael Snoyman,
 but now we are a team of 4 people handling requests weekly in rotation.
 Curation activities are mostly automated, and do not take up a significant amount of time.
 
-## Workflow
+## Workflow overview
 
 This section sketches out at a high level how the entire Stackage build/curation
 process works:
@@ -23,10 +23,30 @@ process works:
 
 The typical story on pull requests is: If Travis accepts it and the
 author only added packages under his/her own name, merge it.  If the
-build later fails (see below), then block the package until it's
-fixed.
+build later fails (see "Adding Debian packages for required system tools or libraries"),
+then block the package until it's fixed.
+
+If benchmarks, haddocks, or test suites fail at this point, we
+typically also block the package until these issues are fixed. This is
+so that packages are added with a clean slate.
 
 Optionally we can check if packdeps says the package is up to date.
+Visit http://packdeps.haskellers.com/feed?needle=<package-name>
+
+Builds may fail because of unrelated bounds changes. If this happens,
+first add any version bounds needed to get master into a passing state (see
+"Fixing bounds issues"), then re-run the Travis build.
+
+A common issue is that authors submit newly uploaded packages. It can
+take up to an hour before a new upload has synced across the stack
+infrastructure. You can usually compare the versions of the package in
+https://github.com/commercialhaskell/all-cabal-metadata/tree/master/packages/
+to what's on Hackage to see if this is the case. If so, wait an hour and
+re-run the pull request.
+
+Tests also commonly fail due to missing test files, and sometimes due
+to doctest limitations. You can point the maintainer to
+https://github.com/bergmark/blog/blob/master/2016/package-faq.md
 
 ## Fixing bounds issues
 
@@ -36,20 +56,121 @@ issue on the Stackage repo about the problem, and modifying the
 build-constraints.yaml file to work around it in one of the ways below. Be sure
 to refer to the issue for workarounds added to that file.
 
-* __Temporary upper bounds__ Most common technique, just prevent a new version of a library from being included immediately
-* __Skipping tests and benchmarks__ If the upper bound is only in a test suite or benchmark, you can add the relevant package to skipped-tests or skipped-benchmarks. For example, if conduit had an upper bound on criterion for a benchmark, you could added conduit as a skipped benchmark.
-* __Excluding packages__ In an extreme case of a non-responsive maintainer, you can remove the package entirely from Stackage. We try to avoid that whenever possible
+### Temporary upper bounds
+
+The most common technique: just prevent a new version of a library from
+being included immediately. This also applies when only benchmarks
+and tests are affected.
+
+* Copy the stackage-curator output and create a new issue, see e.g.
+https://github.com/fpco/stackage/issues/2108
+
+* Add a new entry under the "Stackage upper bounds" section of `build-constraints.yaml`. For the above example it would be:
+
+```yaml
+    "Stackage upper bounds":
+        # https://github.com/fpco/stackage/issues/2108
+        - pipes < 4.3.0
+```
+
+* Commit (message e.g. "Upper bound for #2108")
+* Optionally: Verify with `stackage-curator check` locally
+* Push
+* Verify that everything works on the build server (you can restart the build or wait for it to run again)
+
+Sometimes releases for different packages are tightly coupled. Then it
+can make sense to combine them into one issue, as in
+https://github.com/fpco/stackage/issues/2143.
+
+If a dependency that is not explicitly in Stackage is causing test or
+benchmark failures, you can skip or expect them to fail (see "Skipping
+tests and benchmarks" and "Expecting test/benchmark/haddock
+failures"). Bonus points for reporting this upstream to that package's
+maintainer.
+
+### Lifting upper bounds
+
+You can try this when you notice that a package has been updated. You
+can also periodically try to lift bounds (I think it's good to do this
+at the start of your week /@bergmark).
+
+If stackage-curator is happy, commit the change ("Remove upper bounds and close #X").
+
+### Amending upper bounds
+
+With the `pipes` example above there was later a new release of
+`pipes-safe` that required the **newer** version of `pipes`. You can
+add that package to the same upper bounds section
+(e.g. https://github.com/fpco/stackage/commit/6429b1eb14db3f2a0779813ef2927085fa4ad673),
+as we want to lift them simultaneously.
+
+### Skipping tests and benchmarks
+
+Sometimes test and benchmark dependencies are forgotten or not cared
+for. To disable compilation for them, add the package to `skipped-tests` or
+`skipped-benchmarks`. If a package is added to these sections, its tests or
+benchmarks won't be compiled, and their dependencies won't be taken into account.
+
+There are subsections under these headers that are used to group types
+of failures together, and also to document what types of failures
+exist.
+
+### Expecting test/benchmark/haddock failures
+
+The difference from the `skipped` sections is that items listed here
+are compiled and their dependencies are taken into account. These
+sections also have subsections with groups and descriptions.
+
+One big category of test suites in this section is those requiring
+running services. We don't want to run those, but we do want to check
+dependencies and compile them.
+
+If there are no version bounds that would fix the issue or if you
+can't figure it out, file an issue
+(e.g. https://github.com/fpco/stackage/issues/2133) to ask the
+maintainer for help.
+
+### Waiting for new releases
+
+Sometimes there is a failure reported on a (now possibly closed) issue
+on an external tracker. If an issue gets resolved but there is no
+Hackage release yet, we'd like to get notified when it's uploaded.
+
+Add the package with its current version to the
+`tell-me-when-its-released` section. This will cause the build to stop
+when the new version is out.
+
+### Excluding packages
+
+In an extreme case of a non-responsive maintainer, you can remove the
+package entirely from Stackage. We try to avoid that whenever
+possible.
+
+This typically happens when we move to a new major GHC release or when
+there are only a few packages waiting for updates on an upper bounds
+issue.
+
+Comment out the offending packages from the "packages" section and add
+a comment saying why they were disabled:
+
+```yaml
+        # - swagger # BLOCKED aeson 1.0
+```
+
 
 ## Updating the content of the Docker image used for building
 
 ### Adding Debian packages for required system tools or libraries
 Additional (non-Haskell) system libraries or tools should be added to `stackage/debian-bootstrap.sh`.
-Committing the changes to a branch should trigger a DockerHub. Normally only the nightly branch needs to be updated
+Committing the changes to a branch should trigger a DockerHub build. Normally only the `nightly` branch needs to be updated
 since new packages are not added to the current lts release.
 
 Use [Ubuntu Package content search](http://packages.ubuntu.com/) to determine which package provides particular dev files (it defaults to xenial which is the version used to build Nightly).
 
-Note we generally don't install/run services needed for testsuites in the docker images - packages with tests requiring some system service can be add to expected-test-failures.
+Note that we generally don't install/run services needed for test suites in the Docker images; packages with tests requiring some system service can be added to `expected-test-failures`.
+It's good to inform the maintainer of any disabled tests (commenting in the PR is sufficient).
+
+If a new package fails to build because of missing system libraries, we often ask the maintainer to help figure out what to install.
 
 ### Upgrading GHC version
 The Dockerfile contains information on which GHC versions should be used. You
@@ -106,7 +227,7 @@ we're just not there yet.
 /opt/stackage-build/stackage/automated/build.sh lts-3.0
 ```
 
-Recommended: run these from inside a `screen` session. If you get version bound
+Recommended: run these from inside a `tmux` session. If you get version bound
 problems on nightly or LTS major, you need to fix build-constraints.yaml (see
 info above). For an LTS minor bump, you'll typically want to use the
 `CONSTRAINTS` environment variable, e.g.:
@@ -126,13 +247,13 @@ If a build fails for bounds reasons, see all of the advice above. If the code
 itself doesn't build, or tests fail, open up an issue and then either put in a
 version bound to avoid that version or something else. It's difficult to give
 universal advice on how to solve things, since each situation is unique. Let's
-develop this advice over time. For now: if you're not sure, ask Michael for
-guidance.
+develop this advice over time. For now: if you're not sure, ask for guidance.
 
 __`NOPLAN=1`__ If you wish to rerun a build without recalculating a
 build plan, you can set the environment variable `NOPLAN=1`. This is
 useful for such cases as an intermittent test failure, out of memory
-condition, or manually tweaking the plan file.
+condition, or manually tweaking the plan file. This is the default for
+LTS builds.
 
 ### Timing
 
@@ -146,9 +267,33 @@ LTS minor bumps typically are run on Sundays.
 ### Website sync debugging (and other out of disk space errors)
 
 * You can detect the problem by running `df`. If you see that `/` is out of space, we have a problem
-* There are many temp files inside `/home/ubuntu/stackage-server-cron` that can be cleared out occasionally
-* You can then manually run `/home/ubuntu/stackage-server-cron.sh`, or wait for the cron job to do it
+* (outdated) There are many temp files inside `/home/ubuntu/stackage-server-cron` that can be cleared out occasionally
+* (outdated) You can then manually run `/home/ubuntu/stackage-server-cron.sh`, or wait for the cron job to do it
 
 ### Wiping the cache
 
-Sometimes the cache can get corrupted which might manifest as `can't load .so/.DLL`. You can wipe the nightly cache and rebuild everything by doing `rm -rf /opt/stackage-build/stackage/automated/nightly`.
+Sometimes the cache can get corrupted, which might manifest as `can't load .so/.DLL`.
+You can wipe the nightly cache and rebuild everything by doing
+`rm -rf /var/stackage/stackage/automated/nightly`.
+Replace `nightly` with `lts7` to wipe the LTS 7 cache.
+
+## Local curator setup
+
+We don't run the full Stackage build locally as that might take too
+much time. Some steps, on the other hand, are much faster to do
+yourself.
+
+It's useful to be able to modify constraints locally before pushing to
+the repository. To do this, first install stackage-curator:
+`git clone git@github.com:fpco/stackage-curator.git && cd stackage-curator && stack install`
+or get the Linux binary: https://s3.amazonaws.com/stackage-travis/stackage-curator/stackage-curator.bz2
+(it's a good idea to upgrade stackage-curator at least at the start of your week as curator).
+Then clone the stackage repo: `git clone git@github.com:fpco/stackage.git`.
+Inside it, run `stack update && stackage-curator check` to get new packages and do dependency resolution.
+
+This can be used to make sure all version bounds are in place
+(including for test suites and benchmarks), to check whether bounds
+can be lifted, and to get `tell-me-when-its-released` notifications.
+
+Notably, this doesn't build anything, so you won't see any compilation
+errors for builds/tests/benchmarks.