Coffee and Coding

Making my analytical workflow more reproducible with {targets}

Jan 25, 2024

`{targets}` for analysts

Tom previously presented about {targets} at a coffee and coding last March and you can revisit his presentation and learn about the reasons why you should use the package to manage your pipeline and see a simple demonstration of how to use the package.
Matt has presented previously about {targets} and making your workflows (pipelines) reproducible.
So….. if you aren’t really even sure why your pipeline needs managing as an analyst or whether you actually have one (you do) then links to their presentations are at the end

Aims

In this presentation we aim to demonstrate the real-world use of {targets} in an analysis project, but first a brief explanation

Without {targets} we

Write a script
Execute script
Make changes
Go to step 2

With {targets} we will

learn how the various stages of our analysis fit together
save time by only running necessary stages as we cycle through the process
help future you and colleagues re-visiting the analysis - Matt says “its like a time-capsule”
make Reproducible Analytical Pipelines

Explain the live project

original project had 30+ metrics
multiple inter-related processing steps
each time a metric changed or a process was altered it impacted across the project
there was potential for mistakes, duplication, lots of wasted time
using targets provides a structure that handles these inter-relationships

How {targets} can help

gets you thinking about your analysis and its building blocks
targets forces you into a functions approach to workflow
entire pipeline is reproducible
visualise on one page
saves time
(maybe we need an advanced function writing session in another C&C?)

Demonstration in a live project

Let’s look at a real life example in a live project…

Visualising

Current project in {targets} and visualised with tar_visnetwork()

A directed graph made with the targets R package where each node is a function or object and arrows between them indicate their dependencies.

Legend for graph made with the targets R package

A directed graph made with the targets R package where each node is a function or object and arrows between them indicate their dependencies. A node has been selected, highlighting all its upstream nodes.

Code

it’s like a recipe of steps
it’s easier to read
you have built functions which you can transfer and reuse
it’s efficient, good practice
debugging is easier because if/when it fails you know exactly which target it has failed on
it creates intermediate cached objects you can fetch at any time

How can I start using it?

You could “retro-fit” it to your project, but … ideally you should start your project off using {targets}
There are at least three of us in SU who have used it in our projects.
We are offering to hand hold you to get started with your next project.
Matt, Tom, Jacqueline

Useful {targets} links

Tom’s previous coffee and coding presentation
Matt’s previous presentations
The {targets} documentation is detailed and easy to follow.
A demo repository demonstrated in last weeks NHSE C&C
Software Carpentry are developing a course here Pre-alpha targets course
Live project demonstrated in this presentation using {targets}