14 November 2025
Arrange: Set up the data, inputs, and environment required for the test
Act: Execute the function or code being tested
Assert: Check that the outcome matches expectations
I start by writing all my tests with these three comments, and fill them in as I go along.
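A minimal sketch of what that three-comment skeleton can look like once filled in. The function `add` is a hypothetical stand-in for the code under test, not something from these slides.

```python
def add(a, b):
    """Hypothetical function under test."""
    return a + b


def test_add_returns_sum():
    # arrange: set up the inputs and the expected result
    a, b = 2, 3
    expected = 5

    # act: call the code being tested
    actual = add(a, b)

    # assert: check the outcome matches expectations
    assert actual == expected
```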
This assumes that in both cases (R and Python) your code is arranged as a package, e.g.
R: an R/ folder and DESCRIPTION/NAMESPACE files
Python: a pyproject.toml and a src/ folder.
IDEs (e.g. RStudio, Positron, VSCode) will have a way to run tests for you.
(to be easier to test)
Consider this example
There is a lot going on here. Testing that each of these parts works correctly, along with potential edge cases, will be tricky!
We can use a much simpler dataframe than might be expected in the real use of these functions.
By using simpler dataframes we can ensure that the only changes are the ones we are expecting.
In this case, a new column, value, is added.
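A rough sketch of such a test. The body of `mutate_data` here is an assumption (it adds `value = x / y`), and a plain list of dicts stands in for a dataframe so the sketch runs with the standard library alone. With pandas the same shape would apply, comparing with `pandas.testing.assert_frame_equal`.

```python
def mutate_data(rows):
    """Hypothetical stand-in: add a `value` column computed as x / y."""
    return [{**row, "value": row["x"] / row["y"]} for row in rows]


def test_mutate_data_adds_value_column():
    # arrange: a tiny two-row input is enough to see the change
    rows = [{"x": 1, "y": 4}, {"x": 2, "y": 5}]
    expected = [
        {"x": 1, "y": 4, "value": 0.25},
        {"x": 2, "y": 5, "value": 0.4},
    ]

    # act
    actual = mutate_data(rows)

    # assert: existing columns are untouched and only `value` is new
    assert actual == expected
```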
As before, a much simpler dataframe than the real data is enough, and keeping it simple ensures that the only changes are the ones we are expecting.
In this case, we are expecting fewer rows of data, but the same structure of columns.
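A sketch of the matching test for the filtering step. The filter rule (keep rows where `x > 0`) is an assumption made for illustration, and a list of dicts again stands in for a dataframe.

```python
def filter_data(rows):
    """Hypothetical stand-in: keep only rows where x is greater than 0."""
    return [row for row in rows if row["x"] > 0]


def test_filter_data_drops_rows_but_keeps_columns():
    # arrange
    rows = [
        {"x": 0, "y": 3, "value": 0.0},
        {"x": 1, "y": 4, "value": 0.25},
        {"x": 2, "y": 5, "value": 0.4},
    ]

    # act
    actual = filter_data(rows)

    # assert: fewer rows ...
    assert len(actual) == 2
    assert [row["x"] for row in actual] == [1, 2]
    # ... with the same columns as the input
    assert all(row.keys() == rows[0].keys() for row in actual)
```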
Testing mutate_data and filter_data was easy, but what about:
get_data, which needs access to the database?
plot_data, how can we test a plot?
my_function, which calls all of the other functions?
In a unit test, mock objects can simulate the behavior of complex, real objects and are therefore useful when a real object is impractical or impossible to incorporate into a unit test. [7]
Note: we create a “Mock” object for each of the functions we want to mock.
These mocks will simply return the values we pass in.
When the function is called, the mock will capture the call and the values of the arguments it was called with.
Note: we can now validate that our functions (mocks) have been called the correct number of times, and that they have been called with the correct arguments.
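The same idea can be sketched with Python's built-in `unittest.mock`. To keep this self-contained, the hypothetical `my_function` below receives its collaborators as arguments rather than having them patched by name, which is a simplification and not necessarily how the slides' code is organised.

```python
from unittest.mock import Mock


def my_function(get_data, mutate_data, filter_data, plot_data):
    """Hypothetical pipeline: each step feeds the next."""
    df = get_data()
    df = mutate_data(df)
    df = filter_data(df)
    return plot_data(df)


def test_my_function_calls_other_functions_correctly():
    # arrange: each mock simply returns the value we give it
    m_get_data = Mock(return_value="get_data")
    m_mutate_data = Mock(return_value="mutate_data")
    m_filter_data = Mock(return_value="filter_data")
    m_plot_data = Mock(return_value="plot_data")

    # act
    actual = my_function(m_get_data, m_mutate_data, m_filter_data, m_plot_data)

    # assert: called once each, with the previous step's return value
    assert actual == "plot_data"
    m_get_data.assert_called_once_with()
    m_mutate_data.assert_called_once_with("get_data")
    m_filter_data.assert_called_once_with("mutate_data")
    m_plot_data.assert_called_once_with("filter_data")
```

Because each mock returns a recognisable string, a wrong call order or a dropped step shows up as a failed `assert_called_once_with`.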
Note the difference between mocking functions which are imported directly and functions which live in modules that are imported.
This requires the pytest-mock plugin to be installed (via pip).
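A sketch of why that difference matters, using `unittest.mock.patch` from the standard library (pytest-mock's `mocker.patch` takes the same kind of target string). The three in-memory modules and all their names are hypothetical, built here only so the example runs on its own.

```python
import sys
import types
from unittest.mock import patch

# A module that owns the real function.
demo_db = types.ModuleType("demo_db")
exec("def get_data():\n    return 'real data'\n", demo_db.__dict__)
sys.modules["demo_db"] = demo_db

# Style 1: the function itself is imported into this module's namespace.
demo_direct = types.ModuleType("demo_direct")
exec(
    "from demo_db import get_data\n\ndef run():\n    return get_data()\n",
    demo_direct.__dict__,
)
sys.modules["demo_direct"] = demo_direct

# Style 2: the module is imported, and the function is looked up at call time.
demo_via_module = types.ModuleType("demo_via_module")
exec(
    "import demo_db\n\ndef run():\n    return demo_db.get_data()\n",
    demo_via_module.__dict__,
)
sys.modules["demo_via_module"] = demo_via_module


def results_when_patching(target):
    """Patch `target`, then call both styles and return what each one saw."""
    with patch(target, return_value="fake data"):
        return demo_direct.run(), demo_via_module.run()


# Patching the defining module only affects code that looks the function up there.
print(results_when_patching("demo_db.get_data"))
# Patching the importing module's own name affects the direct-import style.
print(results_when_patching("demo_direct.get_data"))
```

The patch target has to be the place where the name is looked up when the code under test runs.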
my_function
test_that("it calls other functions correctly", {
  # arrange
  # mock(), expect_called() and expect_args() come from the mockery package
  m_get_data <- mock("get_data")
  m_mutate_data <- mock("mutate_data")
  m_filter_data <- mock("filter_data")
  m_plot_data <- mock("plot_data")
  local_mocked_bindings(
    get_data = m_get_data,
    mutate_data = m_mutate_data,
    filter_data = m_filter_data,
    plot_data = m_plot_data
  )
  # ...
})

my_function
test_that("it calls other functions correctly", {
  # ...
  # act
  actual <- my_function()
  # assert
  expect_equal(actual, "plot_data")
  expect_called(m_get_data, 1)
  expect_args(m_get_data, 1)
  expect_args(m_mutate_data, 1, "get_data")
  expect_args(m_filter_data, 1, "mutate_data")
  expect_args(m_plot_data, 1, "filter_data")
})
my_function
test_that("fn calls all the other functions", {
  # arrange
  df <- data.frame(x = c(0, 1, 2), y = c(3, 4, 5))
  expected_df <- data.frame(x = c(1, 2), y = c(4, 5), value = c(0.25, 0.4))
  # only the database and plotting steps are mocked; mock() comes from mockery
  m_get_data <- mock(df)
  m_plot_data <- mock("plot_data")
  local_mocked_bindings(get_data = m_get_data, plot_data = m_plot_data)
  # act
  actual <- my_function()
  # assert
  expect_equal(actual, "plot_data")
  expect_args(m_plot_data, 1, expected_df)
})

The first time we run a snapshot test of the plot, it will create a snapshot of the plot. This will be a file saved to disk.
Next time we run the test, it will compare the before/after to see if the output of the function has changed.
If the snapshot ever changes, you can run snapshot_accept() to use the new snapshot.
Similar to R, but you need to install the pytest-snapshot plugin first.
Then, you need to run pytest --snapshot-update to generate the initial snapshot, and run that same command any time you want to update the snapshot.
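To show the mechanics only, here is a standard-library sketch of what a snapshot check does. `check_snapshot` and its `update` flag are hypothetical and are not the pytest-snapshot API. The flag plays the role of `--snapshot-update`.

```python
import tempfile
from pathlib import Path


def check_snapshot(actual, snapshot_file, update=False):
    """Hypothetical helper: compare `actual` with the text saved on disk."""
    if update:
        # record (or overwrite) the snapshot with the current output
        snapshot_file.write_text(actual)
        return
    assert snapshot_file.exists(), "no snapshot yet; run once with update=True"
    assert actual == snapshot_file.read_text(), "output differs from snapshot"


with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "plot.txt"
    check_snapshot("bars: 1, 2", snap, update=True)  # generate the initial snapshot
    check_snapshot("bars: 1, 2", snap)               # unchanged output passes
    try:
        check_snapshot("bars: 1, 3", snap)           # changed output fails
    except AssertionError as err:
        print(err)
```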
view slides at the-strategy-unit.github.io/data_science/presentations