Property-based testing is a technique where you make assertions about a system's output with respect to its input. For example, if the input to a system (a function, a server, etc) is two numbers, property-based testing could assert that the output of the system should be the sum of these numbers. This type of testing frees you from having to come up with input data. Instead, you define relationships between the system's input and output. Then the test runner verifies that the relationships hold.
Stateful property-based testing (SPBT) is another technique for when the tested system retains a state. This is the case, for example, when the system is a database or a queue or a file. If I write an entry to a database and then list all entries in the database, I would expect the entry I wrote to be part of the list. That is a stateful property of the database.
There are libraries available in several different languages for SPBT. In this article, I will use
quickcheck-state-machine. I like
quickcheck-state-machine for many reasons:
- It is written in Haskell, which means you get access to Haskell's type safety and fast performance.
- Its opinionated structure splits SPBT into component parts, which helped my learning process.
- It builds a state machine, which can be manipulated outside of the test.
prettyCommandsuses the state machine, for example, to make really nice logs after the test is run.
- Fine-grained control of generation and shrinking is possible. This allows you to do more targeted testing.
- Its use of the higher-kinded types (HKTs)
Concrete. It allows you to extract commands from a state machine using the
SymbolicHKT and then run it using the
- It can test the parallel execution pf commands to find bugs arising from race conditions.
This article shows how to use
quickcheck-state-machine to build a state machine and use it for SPBT. It uses version
quickcheck-state-machine. The system under test will be a FIFO queue of integers that uses the file system to store entries.
quickcheck-state-machine library is under active development, the API is subject to change. I will do my best to revise this article as the API changes.
The fundamental building blocks of a state machine built with
quickcheck-state-machine come in three types. One represents a model of the system, one represents the commands that can be issued to the system and one represents responses to the commands.
All three need to be polymorphic in accepting an HKT with the signature
(Type -> Type), which I'll call
r. This polymorphism will never be used directly. But it is used by
quickcheck-state-machine internally to inject two different HKTs:
Symbolic HKT is used by
quickcheck-state-machine when generating a series of commands from a state machine. In contrast, the
Concrete HKT is used when the state machine is executing. In models that can be created in pure contexts like the one below, this distinction is not useful. But when models use types that only exist in monadic contexts, the distinction is important.
data Model (r :: Type -> Type) = Model [Int] deriving (Show, Eq, Generic) deriving anyclass instance ToExpr (Model Concrete) data Command (r :: Type -> Type) = Push Int | Pop | AskLength deriving stock (Eq, Show, Generic1) deriving anyclass (Rank2.Functor, Rank2.Foldable, Rank2.Traversable, CommandNames) data Response (r :: Type -> Type) = Pushed | Popped (Maybe Int) | TellLength Int deriving stock (Show, Generic1) deriving anyclass (Rank2.Foldable)
Let's unpack what's going on here. The
Model is an array of integers that we'll use to simulate a FIFO queue. There are three
Commands - you can
Push an integer onto the queue,
Pop something off of the queue (either nothing or an integer), and
AskLength to the queue. The
Responses to these three commands are confirming that a value has been
Pushed, telling us the integer that has been
It isn't necessary to have a one-to-one correspondence between commands and responses. Haskell's pattern matching will allow us to define the function for any valid command/response pair.
Here is a FIFO queue for integers that reads and writes the queue to the file system. Each integer is separated by a colon:
-- push to the head of the queue pushToQueue :: String -> Int -> IO () pushToQueue fname x = do fe <- doesFileExist fname if (not fe) then do withFile fname WriteMode $ \handle -> hPutStr handle $ show x -- write the number else do txt <- withFile fname ReadMode $ \handle -> hGetLine handle let split = splitOn ":" txt -- append the number to the beginning of the string withFile fname WriteMode $ \handle -> hPutStr handle $ intercalate ":" (show x : split) -- pop from the back of the queue popFromQueue :: String -> IO (Maybe Int) popFromQueue fname = do fe <- doesFileExist fname if (not fe) then return $ Nothing else do txt <- withFile fname ReadMode $ \handle -> hGetLine handle let split = splitOn ":" txt if (length split == 1) then -- remove the file if queue is empty removeFile fname else -- remove the last element withFile fname WriteMode $ \handle -> hPutStr handle $ intercalate ":" $ init split return $ if null split then Nothing else Just (read (last split) :: Int) -- get the length of the queue lengthQueue :: String -> IO Int lengthQueue fname = do fe <- doesFileExist fname if (not fe) then return 0 else do txt <- withFile fname ReadMode $ \handle -> hGetLine handle let split = splitOn ":" txt return $ length split
The first step in creating my state machine is to initialize the model. The initializer function needs to be polymorphic as it will eventually accept the
Concrete HKTs depending on if you are in generation or execution mode. In this case, as I am using an array as the underlying model, the logical initializer is an empty array.
initModel :: Model r initModel = Model 
The next thing I need to do for my state machine is to create transitions. The transitions are used to both generate commands and execute the tests, so the function needs to remain polymorphic.
The transition function takes a model, a command, and a response. It then returns the underlying model after the command has been applied. You can think of the model as transitioning from one state to the next.
In the implementation below, I make my own FIFO queue with
AskLength will return the length of the model:
transition :: Model r -> Command r -> Response r -> Model r transition (Model m) (Push x) Pushed = Model (x : m) transition (Model m) Pop (Popped _) = Model (if null m then m else init m) transition m AskLength (TellLength _) = m
Preconditions are guards that apply to certain commands based on the current state.
Top represents the precondition always being satisfied.
Bot is the opposite, the precondition is never satisfied. The
Logic type contains various boolean operators that can be applied to the model and command. The outcome of the operator determines if the precondition is satisfied or not.
Because the pre-condition is only used when generating lists of programs, it doesn't need to use concrete values. So it doesn't need to be polymorphic and exists only for the
In this model, every command can be executed irrespective of the state. So I return
precondition :: Model Symbolic -> Command Symbolic -> Logic precondition _ _ = Top
Postconditions are where the correctness of the response is asserted. I like this API because it provides a one-stop-shop for all assertions. In other SPBT libraries, it is easy to litter assertions all over the place, which makes the code more difficult to read. In
quickcheck-state-machine, the only checks for correct behavior are in the postconditions.
Postconditions only are checked when the state machine is actually running. This means they only exist in the
Note that the model passed to the postcondition function is the one before the command executes. It is often useful to apply the transition to the model when evaluating the response, as I do below:
postcondition :: Model Concrete -> Command Concrete -> Response Concrete -> Logic -- after a push, assert the pushed element is at the head of the new model postcondition mod cmd@(Push x) resp = x .== head m' where Model m' = transition mod cmd resp -- after a pop, assert that the popped element is at the end of the old model postcondition (Model m) Pop (Popped x) = x .== if null m then Nothing else Just $ last m -- the length of the model and the length of the SUT should always be aligned postcondition (Model m) AskLength (TellLength x) = length m .== x
Invariants take a model and assert that the model is always in a certain state, irrespective of the command and response. Invariants also run after every step in the state machine, which makes them expensive to run. Because of this,
quickcheck-state-machine uses a
Maybe to allow for no invariants to be returned.
As there is no invariant behavior I want to see in this model, I can return
invariant = Nothing
The generator is one of the places that
quickcheck-state-machine really shines. You create generators using
QuickCheck combinators, so any existing
QuickCheck custom combinators can be repurposed for
Here, I use the
oneof combinator, which generates commands with a uniform distribution. Because I am in the command generation phase, the
Symbolic HKT is used:
generator :: Model Symbolic -> Maybe (Gen (Command Symbolic)) generator _ = Just $ oneof [(pure Pop), (Push <$> arbitrary), (pure AskLength)]
QuickCheck, the shrinker takes a value and returns an array of new values to test. Most
QuickCheck programs never use the shrinker directly, but here, I use it to specify what does and doesn't need to be shrunk. This allows the generation to move fast through values that have no logical relationship.
For example, below, I only apply the shrinker to numbers pushed onto the stack, as I want to test if the size of the numbers matters. In all other places, there is no shrinker used:
shrinker :: Model Symbolic -> Command Symbolic -> [Command Symbolic] shrinker _ (Push x) = [ Push x' | x' <- shrink x ] shrinker _ _ = 
Semantics take a command using the
Concrete HKT, which signifies that it is used only when the tests are actually executing, and returns the result of the execution in the monadic context (here
semantics :: String -> Command Concrete -> IO (Response Concrete) semantics fname (Push x) = do pushToQueue fname x return Pushed semantics fname Pop = do val <- popFromQueue fname return $ Popped val semantics fname AskLength = do val <- lengthQueue fname return $ TellLength val
The purpose of Mock is to generate dummy responses when the state machine is in command generation mode (thus the
Symbolic HKT). It is the foil to Semantics, which creates
Concrete responses from real commands during text execution mode. The content of the mock responses is thrown away, as all the library uses
mock for is to create a
Response used to effectuate a transition between states.
One nice thing about
mock is that, if you want to, you can create a full-fledged mock of your model, and this can be useful if you'd like to use SPBT to generate
(Comamnd Response) pairs that can be used to induce a spec of the model (ie to induce a JSON schema or an OpenAPI spec).
mock :: Model Symbolic -> Command Symbolic -> GenSym (Response Symbolic) mock _ (Push _) = pure Pushed mock _ Pop = pure $ Popped Nothing mock _ AskLength = pure $ TellLength 0
Cleanup, like the semantics, exists within the monad that the system is executing. It's called after each series of commands is executed.
Since I don't need any cleanup for our queue, I can write an empty function in the
IO monadic context:
cleanup :: model Concrete -> IO () cleanup _ = return ()
Now that I have all of the ingredients, I can build my state machine.
Because the system under test takes one argument, I also pass that argument to the state machine.
sm :: String -> StateMachine Model Command IO Response sm s = StateMachine initModel transition precondition postcondition invariant generator shrinker (semantics s) mock cleanup
Now for the fun part, let's run my tests!
First, I want each test to execute in its own FIFO queue, which means a different file for each queue. I chose the pcg unique random number generator to accomplish this. This guarantees that each number generated will be unique during the run of a program.
newRand :: IO Int newRand = do g <- create i <- uniform g return i
Then, I define the property.
forAllCommands uses a state machine in its first argument to generate the commands. This is a state machine that will only run for the
Symbolic HKT, not the
Concrete HKT, so it won't ever touch
IO. The next argument is a lower bound for the number of commands in a sequence. I use
Nothing for the lower bound, meaning no lower bound.
The last argument to
forAllCommands is a function that accepts a sequence of commands and returns a monadic property. Monadic properties are defined in
QuickCheck.Monadic and can be used whenever a property exists in a monadic context. The convenience method
monadicIO can be used to define properties in the
IO context, and
run lifts the result of the monadic execution to the
QuickCheck's monad transformer, and in our case we are transforming the
IO monad. To learn more about monadic transformations,
mtl is a great place to start.
So, the sequence below is:
- Create a new random number and lift it to the
- Create a file name using this number.
- Create a state machine using this filename.
quickcheck-state-machine, which already executes in the
PropertyMcontext, so there is no need to prefix it with
quickcheck-state-machine's pretty printer
prettyCommandson the histogram generated by run result.
state_machine_properties :: Property state_machine_properties = forAllCommands (sm "") Nothing $ \cmds -> monadicIO $ do id <- run newRand let fname = "queues/queue" <> (show id) <> ".txt" let sm' = sm fname (hist, _model, res) <- runCommands sm' cmds prettyCommands sm' hist (checkCommandNames cmds (res === Ok))
Lastly, I execute the test and create the
queues directory if it doesn't exist yet:
main :: IO () main = do createDirectoryIfMissing False "queues" quickCheck state_machine_properties
When I run
stack test from the command line, I see the following:
quickcheck-state-machine-tutorial> test (suite: quickcheck-state-machine-tutorial-test) +++ OK, passed 100 tests. Commands (264 in total): 33.7% Pop 33.7% Push 32.6% AskLength quickcheck-state-machine-tutorial> Test suite quickcheck-state-machine-tutorial-test passed Completed 2 action(s).
And voila! Our tests pass.
All of this is on the github repo 'meeshkan/quickcheck-state-machine-example'.
Stateful property based testing is a great way to find bugs in stateful systems. SPBT exists in several other frameworks as well:
- hypothesis in Python and Java
- proper in Erlang and Elixir
- FsCheck in F# and C#
I hope you find this technique useful! If you'd like your repos to benefit from automatic SPBT, I will take this opportunity to shamelessly plug the Meeshkan alpha on meeshkan.com.
Here are three follow up exercises you can do to expand your understanding of
quickcheck-state-machine and SPBT!
Let's create a bug in the queue!
In the implementation of the FIFO queue, instead of adding the number to the head via
show x : split, add it to the tail using
split ++ [show x]. See if it's caught.
Create a new bug where the queue stops accepting new values once there are 50 values.
You can do this in the
pushToQueue function. If you run the tests as-is, you won't find it. There are three ways you can find the bug:
- Increase the number of times QuickCheck runs
- Change the generator so that the frequency of push is greater than the frequency of pop. For inspiration, check out QuickCheck generator combinators and see if there is one that allows certain outcomes to happen with greater frequency than others.
- Change the lower bound in
Nothingto something a bit higher.
This may take a long time to run depending on your parameters because of the shrinker. The shrinker will try to find a specific range of values to produce the bug, but because the bug is not linked to the specific value, it won't be able to meaningfully shrink.
For example, here is an excerpt of console output that would happen if you are able to provoke the bug. Here, I see that the postcondition for
AskLength failed at the barrier of 50 results in the queue.
Model [+0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0] == AskLength ==> TellLength 50 [ 0 ] Model [0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0] PostconditionFailed "PredicateC (51 :/= 50)" /= Ok
Can you anticipate what the bug will be? Once you run into the bug, was your guess right? How does this bug show the difference between