diff --git a/doc/posts/Announcement.anansi b/doc/posts/Announcement.anansi
index dd5abfd..6b13430 100644
--- a/doc/posts/Announcement.anansi
+++ b/doc/posts/Announcement.anansi
@@ -10,13 +10,62 @@
an API description. This is much closer to the traditional use of `QuickCheck`.

The most obvious use-case is checking that properties hold of an *entire*
server rather than of
-individual endpoints.
+individual endpoints. (But there are other uses that you can skip to if they
+sound more interesting.)

## `serverSatisfies`

-There are a variety of best practices in writing web APIs that aren't always
-obvious. As a running example, let's use a simple service that allows adding,
-removing, and querying biological species. Our SQL schema is:
+A useful guideline when writing and maintaining software is that, if there isn't
+a test for a behaviour or property, sooner or later that property will be broken.
+Another important perspective is that tests are a form of documentation - the
+present developer telling future ones "this matters, and should be this way".
+
+The advantage of using tests for this form of documentation is that there's
+simply too much information to convey, some of it relevant only to very specific
+use cases. Rather than overload developers with an inexhaustible quantity of
+details that would be hard to keep track of or remember, tests are a mechanism
+for reminding developers of *only the relevant information, at the right time*.
+
+We might hope to use tests to communicate the wide array of best practices that
+have developed around APIs. About to return a top-level integer in JSON? A test
+should say that's bad practice. About to let an exception escape rather than
+return a more meaningful HTTP status code? Another test there to stop you.
+
+Traditionally, in web services these things get done at the level of *individual*
+endpoints. But this means that if a developer who hasn't had extensive experience
+with web programming best practices writes a *new* endpoint which *does* return
+a top-level integer literal, there's no test there to stop her. Code review might
+help, but code review is much more error-prone than tests, and really only meant
+for those things that are too subtle to automate. (Indeed, if code review were
+such a reliable defense mechanism against bugs and bad code, why have tests and
+linters at all?)
+
+The problem, then, with thinking about tests as existing only at the level of
+individual endpoints is that there are no tests *for* tests - tests that check
+that new behaviour, and new tests, conform to higher-level, more general best
+practices.
+
+`servant-quickcheck` aims to solve that. It allows describing properties that
+*all* endpoints must satisfy. If a new endpoint comes along, it too will be
+tested for that property, without any further work.
+
+Why isn't this idea already popular? Well, most web frameworks don't have a
+reified description of APIs. When you don't know what the endpoints of an
+application are, and what request bodies they expect, trying to generate
+arbitrary requests will almost always result in 404s (Not Found) and 400s (Bad
+Request). Maybe one in a thousand requests will actually test a handler. Not
+very useful.
+
+`servant` applications, on the other hand, have a machine-readable API description
+already available. And they already associate "correct" requests with particular
+types. It's a small step, therefore, to generate 'arbitrary' values for these
+requests, and all of them will go through to your handlers.
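To make that concrete, here's a minimal sketch of what a `serverSatisfies` test
looks like, modelled on the `servant-quickcheck` README. The toy `API`, `api`,
and `server` are placeholders standing in for the species service developed
below; `withServantServer`, `serverSatisfies`, `defaultArgs`, the predicates,
and `<%>` come from `Servant.QuickCheck`:

```haskell
{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeOperators     #-}

import Data.Aeson (Value, object, (.=))
import Data.Proxy (Proxy (..))
import Servant
import Servant.QuickCheck
import Test.Hspec

-- A stand-in API; the species service below would take its place.
type API = "species" :> Get '[JSON] Value

api :: Proxy API
api = Proxy

server :: IO (Server API)
server = return (return (object ["species" .= ["red panda" :: String]]))

spec :: Spec
spec = describe "the API" $
  it "follows best practices, on every endpoint at once" $
    withServantServer api server $ \burl ->
      serverSatisfies api burl defaultArgs
        (   not500           -- no endpoint ever responds with a 500
        <%> onlyJsonObjects  -- JSON bodies are objects, never top-level literals
        <%> mempty)
```

Every request generated here is derived from the API type, so each one reaches
a real handler; add an endpoint to `API` and it is covered by the same test,
with no further work.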
+(Note: all of the uses of `servant-quickcheck` work with applications *not*
+written with servant-server - and indeed not *in Haskell* - but the API must be
+described with the servant DSL.)
+
+Let's see how this works in practice. As a running example, let's use a simple
+service that allows adding, removing, and querying biological species. Our SQL
+schema is:

:d schema.sql

@@ -150,6 +199,60 @@
instance Arbitrary Species where
  arbitrary = Species <$> arbitrary <*> arbitrary
:
+But this fails in quite a few ways.
+
+### Why best practices are good
+
+As a side note: you might have wondered "why bother with API best practices?".
+It is, one might say, a lot of extra work (beyond just getting the feature
+done), for dubious benefit. And indeed, the relevance of discoverability, for
+example, is unclear, since not that many tools take advantage of it.
+
+But `servant-quickcheck` both makes it *easier* to conform to best practices,
+and exemplifies their advantages. If we pick 201 (Created, meaning the
+'resource' was created) rather than the more generic 200 (OK),
+`servant-quickcheck` knows this means there should be some representation of
+the resource in the response.
+
+## `serversEqual`
+
+There's another very appealing application of the ability to generate "sensible"
+arbitrary requests: testing that two applications are equal. Generate arbitrary
+requests, send them to both servers (in the same order), and check that the
+responses are equivalent. (This was, in fact, one of the first applications of
+`servant-client`, albeit in a much more manual way, when we rewrote in Haskell
+a microservice originally written in Python.) Generally with rewrites, even if
+some behaviour isn't optimal, if a lot of things already depend on that service
+it makes sense to first mimic the original behaviour *exactly*, and only then
+aim for improvements.
+
+`servant-quickcheck` provides a single function, `serversEqual`, that attempts
+to verify the equivalence of servers. Since some aspects of responses might not
+be relevant (for example, whether the `Server` header is the same, or whether
+two JSON responses have the same formatting), it allows you to provide a custom
+equivalence function. Other than that, you need only provide an API type and two
+URLs for testing, and the rest `serversEqual` handles. (There's a sketch of what
+such a test might look like at the end of this post.)
+
+## Future directions: benchmarking
+
+What else could benefit from tooling that can automatically generate sensible
+(*vis-à-vis* a particular application's expectations) requests?
+
+One area is extensive automatic benchmarking. Currently we use tools such as
+`ab`, `wrk`, and `httperf` in a very manual way - we pick a particular request
+that we are interested in, and have that one request made thousands of times.
+But now we can have a multiplicity of requests to benchmark with! This allows
+*finding* slow endpoints, as well as (I would imagine, though I haven't actually
+tried this yet) synchronization issues that make threads wait for too long (such
+as waiting on an MVar that's not really needed), or bad asymptotics with respect
+to some other type of request.
+
+(On this last point, imagine not having an index in a database for "people",
+and having a tool that discovers that the latency of a search by first name
+grows linearly with the number of POST requests to a *different* endpoint! We'd
+need to do some work to do this well, possibly involving some machine learning,
+but it's an interesting and probably useful idea.)
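To make the benchmarking idea a little more concrete, here is a rough sketch,
not a worked-out tool. It assumes `servant-quickcheck`'s request generator,
exposed through its internal modules as `runGenRequest` in recent versions
(the name and exact type have varied between releases), and a hypothetical
`speciesApi` proxy for the API above:

```haskell
import Control.Monad (replicateM)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import Network.HTTP.Client (defaultManagerSettings, httpLbs, newManager)
import Servant.Client (parseBaseUrl)
-- Assumption: internal module and generator name; check your installed version.
import Servant.QuickCheck.Internal (runGenRequest)
import Test.QuickCheck (generate)

-- Fire a thousand *different*, but well-formed, requests at the service and
-- time them, instead of hand-writing one request to repeat.
-- `speciesApi` is a placeholder: the Proxy for the species API above.
benchmark :: IO ()
benchmark = do
  mgr    <- newManager defaultManagerSettings
  burl   <- parseBaseUrl "http://localhost:8080"
  mkReqs <- generate (replicateM 1000 (runGenRequest speciesApi))
  start  <- getCurrentTime
  mapM_ (\mkReq -> httpLbs (mkReq burl) mgr) mkReqs
  end    <- getCurrentTime
  putStrLn ("1000 arbitrary requests in " ++ show (diffUTCTime end start))
```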
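And, as promised above, a sketch of what a `serversEqual` test might look like,
following the shape of the `servant-quickcheck` README. The two base URLs are
placeholders for wherever the old and new services happen to be running, and
`speciesApi` again stands in for the API type:

```haskell
import Servant.Client (parseBaseUrl)
import Servant.QuickCheck
import Test.Hspec

-- `speciesApi` is a placeholder: the Proxy for the API described earlier.
spec :: Spec
spec = describe "the Haskell rewrite" $
  it "behaves like the Python original" $ do
    -- Placeholder URLs for the two running implementations.
    old <- parseBaseUrl "http://localhost:8000"
    new <- parseBaseUrl "http://localhost:8080"
    -- Compare response bodies only, ignoring details such as the Server header.
    serversEqual speciesApi old new defaultArgs bodyEquality
```

`bodyEquality` is one ready-made equivalence; a custom `ResponseEquality` can
ignore whatever other differences (formatting, headers) don't matter to you.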
+ **Note**: This post is an anansi literate file that generates multiple source files. They are: