Godkjenn: approval testing for Python 3
- godkjenn
/go:kjen/ Approve (Norwegian)
Approval testing for Python 3.
Introduction
Godkjenn is a tool for performing approval testing in Python. Approval testing is a simple yet powerful technique for testing systems that may not be amenable to more traditional kinds of tests, or for which more traditional tests might not be particularly valuable. Approval testing can also be extremely useful for adding tests to legacy systems, where simply capturing the behavior of the system as a whole is the first step toward adding more sophisticated tests.
Principle
The principle of approval testing is simple. Given a function or program that you consider correct, you store its output for a given input. This is the accepted or golden version of its output. Then, as you change your code, you reproduce the output (we call this the received output) and compare it to the accepted version. If they match, the test passes. Otherwise, it fails.
A test failure can mean one of two things. First, it could mean that you actually broke your program and need to fix it so that the received output matches the accepted. Second, it could mean that the received output is now correct, the accepted output is out of date, and you need to update the accepted output with the received.
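To make the pattern concrete, here’s a minimal sketch of the idea in plain Python. This is not godkjenn’s implementation; the check_against_golden function and the golden-file layout are invented for illustration:
from pathlib import Path

def check_against_golden(received: bytes, golden_file: Path):
    # Minimal golden-master check: compare received output to the stored accepted output.
    if not golden_file.exists():
        raise AssertionError("No accepted output yet; review the received output and store it.")
    accepted = golden_file.read_bytes()
    if received != accepted:
        raise AssertionError("Received output does not match the accepted output.")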
As an approval testing tool, godkjenn aims to streamline and simplify this kind of testing.
Core elements
There are a few core elements to godkjenn. These are the parts of the approval testing system that are independent of any particular testing framework. Generally speaking, you won’t need to work with these directly; the integration with your testing framework will hide most of the low-level details.
Vaults
Vaults are where the accepted outputs are stored. (The term vault is a bit of a play on words: the accepted output is “golden”, and you keep gold in vaults.)
The vault abstraction defines an API for storing and retrieving accepted (and received) output.
godkjenn provides a simple vault, FSVault, that stores its data on the filesystem. Other vaults can be provided via a plugin system.
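As a rough sketch, you can think of a vault as implementing an interface along these lines (the names here are hypothetical; see the godkjenn source for the actual API):
from typing import Protocol

class Vault(Protocol):
    # Hypothetical sketch of the vault interface: per-test storage keyed by test ID.
    def accepted(self, test_id: str) -> bytes:
        """Return the accepted ("golden") data for a test."""
        ...

    def receive(self, test_id: str, data: bytes) -> None:
        """Store newly received data for a test."""
        ...

    def accept(self, test_id: str) -> None:
        """Promote the received data for a test to accepted."""
        ...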
Verification
The core verification algorithm compares newly received data with the accepted data for a given test. If there’s a mismatch, or if no accepted output exists, this triggers a test failure and instructs the user on what to do next.
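In outline, verification looks something like this. This is a simplified sketch building on the hypothetical vault interface above, not godkjenn’s actual code:
def verify(vault, test_id: str, received: bytes):
    # Record the received data, then compare it to the accepted data (if any).
    vault.receive(test_id, received)
    try:
        accepted = vault.accepted(test_id)
    except KeyError:  # assume a missing-data error means "no accepted data"
        raise AssertionError(
            f'There is no accepted data. To accept, run: godkjenn accept "{test_id}"'
        )
    if received != accepted:
        raise AssertionError(
            f'Received data does not match accepted. To accept, run: godkjenn accept "{test_id}"'
        )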
Diffing
When an approval test fails, godkjenn provides tools for viewing the differences between the accepted and received data. godkjenn comes with some very basic fallback diffing support, and it provides a way to run external diffing tools. You can even have it use different tools for different types of files.
Godkjenn tutorial
This will take you through the process of installing, setting up, and using godkjenn for approval testing in a Python project.
Note
This tutorial will use godkjenn’s pytest integration. Godkjenn does not mandate the use of pytest, though currently it’s the only testing framework for which godkjenn provides an integration. Integrating with other frameworks is straightforward and encouraged!
Installing godkjenn
First you need to install godkjenn. The simplest and most common way to do this is with pip:
pip install godkjenn
For pytest integration you’ll also want to install the necessary plugin:
pip install "godkjenn[pytest-plugin]"
A first test
Now let’s create our first test that uses godkjenn. Create a directory to contain the code for the rest of this tutorial. We’ll refer to it as TEST_DIR or $TEST_DIR.
Once you have that directory, create the file TEST_DIR/pytest.ini. This can be empty; it just exists to tell pytest where the “top” of your tests is.
Next create the file TEST_DIR/test_tutorial.py with these contents:
1 def test_demo(godkjenn):
2     test_data = b'12345'
3     godkjenn.verify(test_data, mime_type='application/octet-stream')
This will be mostly familiar if you’ve used pytest: it’s just a single test function with a fixture.
On line 1 we define the test function. The godkjenn parameter tells pytest that we want to use the godkjenn fixture. This fixture gives us an object that we use for verifying our test data.
On line 2 we simply invent some test data. Notice that it’s a bytes object. Godkjenn ultimately requires all of its data to be bytes, so for this tutorial we’ve just created some simple data. In practice, this data would be the output from some function that you want to test.
Finally on line 3 we call godkjenn.verify(), passing in our test_data. This call will take the data we pass in and compare it to the currently-accepted “golden” output. If the data we pass in does not match the accepted output, the test is flagged as a failure. Similarly - as will happen for us - we’ll get a failure if there is no existing accepted output.
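In a more realistic test, the bytes would come from serializing the output of your own code. Here’s a sketch using a hypothetical render_report function; JSON serialization is just one reasonable choice:
import json

def render_report():
    # Hypothetical function under test; in real code this would be your own code.
    return {"status": "ok", "total": 42}

def test_report(godkjenn):
    # Serialize the output deterministically, then encode it to bytes for godkjenn.
    test_data = json.dumps(render_report(), sort_keys=True).encode("utf-8")
    godkjenn.verify(test_data, mime_type="application/json")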
Running the test
Now we can run the tests with pytest. For now just run the test from TEST_DIR:
cd $TEST_DIR
pytest .
You should see some output like this:
$ pytest .
========================================================================================================= test session starts =========================================================================================================
platform darwin -- Python 3.8.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /Users/abingham/repos/sixty-north/godkjenn/docs/tutorial, configfile: pytest.ini
plugins: hypothesis-6.4.0, godkjenn-2.0.1
collected 1 item
test_tutorial.py F [100%]
============================================================================================================== FAILURES ===============================================================================================================
______________________________________________________________________________________________________________ test_demo ______________________________________________________________________________________________________________
There is no accepted data
If you wish to accept the received result, run:
godkjenn -C . accept "test_tutorial.py::test_demo"
======================================================================================================= short test summary info =======================================================================================================
FAILED test_tutorial.py::test_demo
========================================================================================================== 1 failed in 0.07s ==========================================================================================================
We see that - as expected - our test failed. The report tells us that “There is no accepted data”. This means that this is the first time we’ve run the test and we haven’t accepted any output for it. We’re also given instructions on how to accept the output if we believe it to be correct.
This idea of “accepting” output is central to the notion of approval testing. At some point we have to decide that our code is correct and that the output it produces is indicative of that proper functioning. For now let’s assume that our data is correct.
Status
Before accepting the data, let’s use the godkjenn status command to see the state of our approval tests:
$ godkjenn status
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Test ID ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ test_tutorial.py::test_demo │ initialized │
└─────────────────────────────┴─────────────┘
This tells us that we have one approval test in our system and that its state is “initialized”. This means that it has some received data (i.e. the data from the test we just ran) but no accepted data.
Accepting the data
Since we believe that the data we passed to godkjenn.verify() represents the correct output from our program, we want to accept it. We can use the command provided to us in the test output:
$ godkjenn accept "test_tutorial.py::test_demo"
Now if we run status we don’t get any output:
$ godkjenn status
This is because all of our tests are “up-to-date”, i.e. all of them have accepted data and no received data. In order to see the status of all test IDs, including those that are up-to-date, you can use the -a/--show-all option of the status command:
$ godkjenn status --show-all
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Test ID ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ test_tutorial.py::test_demo │ up-to-date │
└─────────────────────────────┴────────────┘
And that’s it! You’ve created your first godkjenn approval test and accepted its output.
Accepting new data
Over time, of course, your code may change such that its correct output no longer matches your accepted output. When this happens your test will fail and you’ll have to accept the new data. To see this, let’s change our test_tutorial.py to look like this:
1 def test_demo(godkjenn):
2     test_data = b'1234567890'
3     godkjenn.verify(test_data, mime_type='application/octet-stream')
You can see on line 2 that test_data now has more digits. When we run our test we get a failure because of this change:
$ pytest test_tutorial.py
=================================================================================================================================== test session starts ===================================================================================================================================
platform darwin -- Python 3.8.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /Users/abingham/repos/sixty-north/godkjenn/docs/tutorial, configfile: pytest.ini
plugins: hypothesis-6.4.0, godkjenn-2.0.1
collected 1 item
test_tutorial.py F [100%]
======================================================================================================================================== FAILURES =========================================================================================================================================
________________________________________________________________________________________________________________________________________ test_demo ________________________________________________________________________________________________________________________________________
Received data does not match accepted
If you wish to accept the received result, run:
godkjenn -C . accept "test_tutorial.py::test_demo"
================================================================================================================================= short test summary info =================================================================================================================================
FAILED test_tutorial.py::test_demo
==================================================================================================================================== 1 failed in 0.05s ====================================================================================================================================
You can see the failure was because “Received data does not match accepted”. That is, the data we’re passing to godkjenn.verify() doesn’t match the accepted data.
If we run godkjenn status again, we see a new status for our test:
$ godkjenn status
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Test ID ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ test_tutorial.py::test_demo │ mismatch │
└─────────────────────────────┴──────────┘
The status “mismatch” means that the received and accepted data are different.
Seeing the difference
Our job now is to decide if the received data is correct and should become the accepted data. To make this decision it can be very helpful to see the accepted data, the received data, and the differences between them.
To see the accepted data we can use the godkjenn accepted command:
$ godkjenn accepted "test_tutorial.py::test_demo" -
12345%
Similarly, we can see the received data using the godkjenn received command:
$ godkjenn received "test_tutorial.py::test_demo" -
1234567890%
In this case it’s pretty easy to see the difference. In other cases it might be more difficult. To help with this, godkjenn also lets you view the difference between the files with the godkjenn diff command. By default godkjenn diff uses a very basic diff display:
$ godkjenn diff "test_tutorial.py::test_demo"
WARNING:godkjenn.cli:No review tools configured. Fallback differs will be used.
---
+++
@@ -1 +1 @@
-12345
+1234567890
Again, in a simple case like this the default diff output is enough to make the difference clear. In more complex cases you might need more powerful tools, and we’ll look at how to use those next.
Configuring an external diff tool
The built-in diff tool in godkjenn is sufficient for simple cases, but many people have other, more sophisticated diff tools that they would prefer to use with approval testing. Godkjenn allows you to specify these tools in your configuration.
For this tutorial we’re going to configure godkjenn to use Beyond Compare as its default diff tool. To do this you need to create the file “.godkjenn/config.toml”. Put these contents in that file:
[godkjenn.differs]
default_command = "bcomp {accepted.path} {received.path}"
This tells godkjenn to run the command “bcomp” (the Beyond Compare executable) to display diffs. The first argument to “bcomp” will be the path to the current accepted data, and the second is the path to the received data. With this configuration, whenever godkjenn needs to display a diff (e.g. in the “diff” and “review” commands), it will use “bcomp”.
If you don’t have Beyond Compare installed, you can replace “bcomp” with many other commands like “diff”, “vimdiff”, and “p4diff”.
Once you’ve made this change, you can run your godkjenn diff command again and see your configured diff tool being used.
Note
Godkjenn supports fairly sophisticated configuration of diff tools, allowing you to use different diff tools for different MIME types. See the configuration documentation for details.
Accepting the new data
We’ll assume that our new data is actually correct and accept it:
godkjenn accept "test_tutorial.py::test_demo"
Once we do that we see that our status is back to “up-to-date”:
$ godkjenn status --show-all
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Test ID ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ test_tutorial.py::test_demo │ up-to-date │
└─────────────────────────────┴────────────┘
Reviewing multiple tests
The godkjenn diff command lets you view the difference between the accepted and received data for a single test. In many cases, though, you have several - and in some cases a great many - tests for which you need to see the diff. The godkjenn review command lets you view all of the diffs for ‘mismatch’ tests in sequence.
In effect, the review command calls diff for each test that’s in the mismatch state, one after the other, using the diffing tools that you’ve configured.
To see review in action, let’s first add a new test. Here are the new contents of “test_tutorial.py”:
def test_demo(godkjenn):
    test_data = b"1234567890"
    godkjenn.verify(test_data, mime_type="application/octet-stream")

def test_second_demo(godkjenn):
    test_data = b"8675309"
    godkjenn.verify(test_data, mime_type="application/octet-stream")
We’ll run the tests and accept the received data in order to lay a foundation for running review:
$ pytest test_tutorial.py
==================================================== test session starts =====================================================
platform darwin -- Python 3.8.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /Users/abingham/repos/sixty-north/godkjenn/docs/tutorial/sandbox, configfile: pytest.ini
plugins: hypothesis-6.4.0, godkjenn-4.0.0
collected 2 items
test_tutorial.py FF [100%]
========================================================== FAILURES ==========================================================
_________________________________________________________ test_demo __________________________________________________________
Received data does not match accepted
If you wish to accept the received result, run:
godkjenn -C . accept "test_tutorial.py::test_demo"
______________________________________________________ test_second_demo ______________________________________________________
There is no accepted data
If you wish to accept the received result, run:
godkjenn -C . accept "test_tutorial.py::test_second_demo"
================================================== short test summary info ===================================================
FAILED test_tutorial.py::test_demo
FAILED test_tutorial.py::test_second_demo
===================================================== 2 failed in 0.03s ======================================================
$ godkjenn accept-all
Now we’ll modify the tests so that each produces different output:
def test_demo(godkjenn):
    test_data = b"-- 1234567890 --"
    godkjenn.verify(test_data, mime_type="application/octet-stream")

def test_second_demo(godkjenn):
    test_data = b"-- 8675309 --"
    godkjenn.verify(test_data, mime_type="application/octet-stream")
If you run the tests again, godkjenn status shows that both tests are in the ‘mismatch’ state:
$ pytest test_tutorial.py
... elided ...
$ godkjenn status
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Test ID ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ test_tutorial.py::test_second_demo │ mismatch │
│ test_tutorial.py::test_demo │ mismatch │
└────────────────────────────────────┴──────────┘
Now if you run godkjenn review, godkjenn will run your configured diff tool twice, once for each test, letting you view and potentially modify the accepted data.
Critically, if the received and accepted data are identical after godkjenn runs your diff tool (i.e. if you edit them through your diff tool), then godkjenn will accept the data and mark the test as ‘up-to-date’. This gives you a convenient way to rapidly iterate through a set of received data, verifying each result in turn and updating the accepted data for each affected test.
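Putting those pieces together, review behaves roughly like this (a simplified sketch using hypothetical vault helpers, not godkjenn’s actual code):
def review(vault, run_diff_tool):
    # Show the diff for each mismatched test; accept any that now match.
    for test_id in vault.mismatched_test_ids():  # hypothetical helper
        run_diff_tool(test_id)  # launches your configured diff tool
        # If you edited the files so that received == accepted, promote to up-to-date.
        if vault.received(test_id) == vault.accepted(test_id):
            vault.accept(test_id)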
Note
The godkjenn review command can be very useful if you have a collection of received data for which you need to manually verify each item. However, if you somehow know that all of the received data should be accepted, then it’s even faster to use the godkjenn accept-all command.
Configuring godkjenn
Godkjenn supports some minor configuration through the file config.toml in the data directory. This file must contain a top-level ‘godkjenn’ key, for example:
[godkjenn]
Everything under this key is part of the godkjenn configuration.
Configurable options
differs.default_command
The godkjenn.differs.default_command option specifies the default tool to use for displaying diffs. The value is a template used with the str.format() method that is passed Artifact instances for the received and accepted data. The call looks like this:
command_template = ...  # the value of differs.default_command
command = command_template.format(accepted=accepted_artifact, received=received_artifact)
As you can see, the artifacts are passed using the ‘accepted’ and ‘received’ keyword arguments.
A common template value would extract the path attribute from each artifact and use those paths as arguments to a diff/merge tool. For example, here’s a config that passes the artifact paths to vimdiff:
[godkjenn.differs]
default_command = "vimdiff {accepted.path} {received.path}"
This option is only used if there is not a MIME-type-specific tool defined in godkjenn.differs.mime_types.
differs.mime_types
The godkjenn.differs.mime_types option allows you to specify diff commands for specific MIME types. It is a mapping from MIME types to command templates (as described for the differs.default_command option). When godkjenn needs to run a diff tool for an artifact, the MIME type of the received data is looked up in the differs.mime_types mapping, and if it’s found then the value is used as the command template. If no match is found, then differs.default_command is used as the default.
For example, here’s how you could specify that the image-diff tool should be used for PNGs, vimdiff should be used for plain text, and bcomp for everything else:
[godkjenn.differs]
default_command = "bcomp {accepted.path} {received.path}"
[godkjenn.differs.mime_types]
"image/png" = "image-diff {accepted.path} {received.path}"
"text/plain" = "vimdiff {accepted.path} {received.path}"
Note
If there is no differs.mime_types entry for an artifact, and if differs.default_command is not set, godkjenn falls back to some fairly primitive built-in diffing tools. You’re almost always best off configuring at least the differs.default_command option.
Locating data
TL;DR
Tell godkjenn where to start looking for the data directory with the -C option. Tell it the name of the data directory with the -d option. E.g. godkjenn -C tests -d .godkjenn_dir status.
-C defaults to “.” (i.e. the current directory), and -d defaults to “.godkjenn”.
Overview
Godkjenn stores data for each call to verify() in your tests. The data it stores includes any accepted or received data for the test, as well as any metadata required for the other data (e.g. MIME types, encoding, etc.). Godkjenn wraps all of this up in the concept of a vault, and the vault stores this data on the filesystem.
When you run godkjenn, it needs to know where to find the vault data. It finds this based on two pieces of data: a starting directory and the name of the data directory. By default the starting directory is whichever directory you run godkjenn from. Similarly, the default value for the data directory name is “.godkjenn”. Godkjenn starts by looking for the data directory name in the starting directory. If it doesn’t find it, it checks the parent of the starting directory. It continues looking up the ancestor directories until either a) it finds a directory containing the data directory or b) it runs out of ancestors.
Assuming a data directory is found, godkjenn now has the information it needs to run.
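In outline, the search is the familiar walk-up-the-ancestors pattern. Here’s a sketch (not godkjenn’s actual code):
from pathlib import Path

def find_data_dir(start_dir: Path, data_dir_name: str = ".godkjenn") -> Path:
    # Check start_dir and each of its ancestors for the data directory.
    for directory in [start_dir, *start_dir.parents]:
        candidate = directory / data_dir_name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"No {data_dir_name} directory found above {start_dir}")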
An example
Suppose you had a directory structure like this:
my_project/
tests/
.godkjenn
Here the “.godkjenn” directory is the data directory that godkjenn needs to find.
The simplest mode for godkjenn is if you run it from the ‘tests’ directory:
cd my_project/tests
godkjenn status
Run this way, godkjenn will find the “.godkjenn” directory using its default settings. It will first start its search from the “tests” directory because that’s where we’re running godkjenn from. Since by default it looks for a directory called “.godkjenn”, it will find it immediately.
Specifying a start directory
Suppose though that you wanted to start godkjenn from another directory. If we wanted to start godkjenn from the ‘my_project’ directory, we’d need to tell it where to start looking for “.godkjenn”. We can use the -C command line argument to do this. Here’s how it looks if we start godkjenn from the ‘my_project’ directory:
cd my_project
godkjenn -C tests status
In this case, instead of starting the search in the “my_project” directory as it would by default, godkjenn starts the search from the ‘tests’ directory.
Using a different data directory name
While unusual, it is technically possible to use a different name for the data directory. Suppose you instead had this directory structure:
my_project/
tests/
.godkjenn_data
You can tell godkjenn to look for a different name with the -d command line option. For example, to run godkjenn from the ‘my_project’ directory with this structure you’d use this:
cd my_project
godkjenn -C tests -d .godkjenn_data status
Again, you won’t normally need this, but it’s there if you do.
Creating the data directory
Probably the most common way to create the data directory is to let godkjenn do it for you automatically. This generally happens in one of two ways. First, when you run tests using the pytest integration, it will create a “.godkjenn” directory in the pytest root directory. Pytest uses a fairly involved algorithm to determine the location of the root directory.
Another way to create the data directory is with the godkjenn receive command. If no data directory already exists (as determined by the algorithm described above), then this command will try to create a “.godkjenn” directory in your current directory. If you use the -C and/or -d arguments to change where godkjenn starts its search, then godkjenn receive will create the directory there instead.
If you don’t want godkjenn receive to create a directory, you can pass the --no-init argument.