Project Structure
RStudio Projects
Analytical projects should be self-contained and portable. This means that all the materials required for an analysis should be organised into a single folder that can be shared in its entirety, ideally via GitHub, and re-run by other people.
We recommend RStudio Projects as a system for creating standardised project structures that meet these goals. The {usethis} package contains a number of helper functions to get you started, including usethis::create_project().
Dependency management
One of the most common issues you’ll face when using a project that someone else has created, or that you created previously, is maintaining the packages required to run it. It isn’t always obvious which packages a particular project needs, and packages can change over time, rendering code that once worked unusable.
The {renv} R package helps solve this problem by:
- Keeping track of the packages that are required for a particular project.
- Logging the installed version of all of the packages.
- Maintaining a per-project library of packages, so projects don’t interfere with one another.
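As a rough sketch of how these three points map onto {renv} in practice, a typical session might look like the following. It assumes {renv} is already installed and that the commands are run interactively from the project's root directory.

```r
# Run these interactively in the R console from the project's root directory.
# Assumes {renv} is installed, e.g. via install.packages("renv").

# Create a per-project library and an renv.lock lockfile for this project
renv::init()

# After installing or updating packages, log the versions currently in use
renv::snapshot()

# Check whether the project library and the lockfile have drifted apart
renv::status()

# On another machine, or after cloning the project, reinstall the logged versions
renv::restore()
```

The renv.lock file is what you would share alongside your code so that others can recreate the same set of packages.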
Workflow management
It’s helpful to split discrete analytical tasks into separate script files, which can make it easier to handle the codebase in context and provide an obvious order of operations. For example, 01_read.R, 02_wrangle.R, 03_model.R, etc.
You could still forget to re-run one of the numbered files, however, and re-running every step can take a long time if you’ve only made one small change to the code. This is where a workflow manager is useful.
We recommend the {targets} R package as a workflow manager. You write a series of steps and {targets} automatically works out the relationships between functions and objects, which it represents as a graph. This means {targets} knows the order in which the steps should be run and which steps need to be re-run when something upstream changes. It’s a well-documented and supported package.
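To illustrate, a minimal pipeline file might look something like the sketch below. The file path data/raw.csv and the functions read_data(), wrangle_data() and fit_model() are hypothetical; they stand in for your own data and for functions you would define in scripts inside the project's R folder.

```r
# _targets.R: a pipeline definition saved in the project's root directory.
# The file path and the three functions used below are hypothetical examples.

library(targets)

# Source every script in the R/ folder so the pipeline can see your functions
tar_source()

# Each tar_target() is one step; {targets} infers the order from the names used
list(
  # Track the raw file itself, so that editing it invalidates later steps
  tar_target(raw_file, "data/raw.csv", format = "file"),
  tar_target(raw_data, read_data(raw_file)),
  tar_target(clean_data, wrangle_data(raw_data)),
  tar_target(model, fit_model(clean_data))
)
```

With a file like this in place, targets::tar_make() runs only the steps that are out of date, targets::tar_outdated() lists them without running anything, and targets::tar_read(model) retrieves a stored result.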
Functions
It’s beneficial to convert code into discrete functions where possible. This makes it easier to:
- reduce the chance of errors, because you’ll avoid repetitive and mistake-prone copy-pasting of code
- understand your scripts, because code can be condensed into simpler calls that are easier to read
- reuse your code, because functions allow you to consistently call the same code more than once and can be copied into other projects
- debug, because the source of an error can be more easily traced and your code can be tested more easily
Consider the DRY (Don’t Repeat Yourself) principle when deciding whether or not to convert some code into a function. It may be better to write a function if you’ve used the same piece of code more than once in an analysis, especially if it contains many lines.
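As a small illustration of the principle, the sketch below replaces copy-pasted rescaling code with one function. The data frame, its column names and the function name rescale_to_unit() are made up for this example.

```r
# A made-up data frame for illustration
scores <- data.frame(maths = c(10, 20, 30), english = c(5, 10, 25))

# Instead of pasting the same arithmetic once per column, write it once.
# Rescales a numeric vector so its minimum is 0 and its maximum is 1.
# Note: a vector whose values are all identical would produce NaN.
rescale_to_unit <- function(x) {
  value_range <- range(x, na.rm = TRUE)
  (x - value_range[1]) / (value_range[2] - value_range[1])
}

# The same logic is now applied consistently with short, readable calls
scores$maths_scaled <- rescale_to_unit(scores$maths)
scores$english_scaled <- rescale_to_unit(scores$english)
```

If the rescaling rule ever needs to change, there is now only one place to edit.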
Function names should be short but descriptive and should contain a verb that describes what the function does. For example, get_geospatial_data() may be better than the generic get_data(), which is certainly better than the uninformative data().
In a project, it’s conventional to put your functions in a folder called R in the project’s root directory. You can group functions into separate R scripts with meaningful names to make it easier to organise them (read-data.R, model.R, etc.). You can then source() these function scripts into your analytical scripts as required.
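One possible way to do this, assuming the working directory is the project's root (as it is by default when you open an RStudio Project), is to source every script in the R folder near the top of an analytical script:

```r
# Near the top of an analytical script such as 01_read.R.
# Assumes the working directory is the project's root directory.

# Find every file ending in .R inside the R/ folder
function_files <- list.files("R", pattern = "\\.R$", full.names = TRUE)

# Source each file in turn so its functions become available
for (file in function_files) {
  source(file)
}
```

Sourcing individual files by name, for example source("R/read-data.R"), works just as well if a script needs only some of the functions.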
Packages
It may be beneficial to gather your functions into a discrete package so that you and others can install and reuse them for other projects.
The {usethis} package has a number of shortcuts to help you set up a package. You can begin with usethis::create_package() to generate the basic structure and then usethis::use_r() and usethis::use_test() to add scripts and {testthat} tests into the correct folder structure.
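A first session might look roughly like this. The names "mypackage" and "read-data" are placeholders, and the commands are intended to be run interactively with {usethis} installed.

```r
# Run interactively in the R console. Assumes {usethis} is installed.
# "mypackage" and "read-data" are placeholder names.

# Create the basic package structure in a new folder called mypackage
usethis::create_package("mypackage")

# Then, from inside the new package project:
usethis::use_r("read-data")     # creates R/read-data.R for your functions
usethis::use_testthat()         # sets up the tests/testthat/ folder
usethis::use_test("read-data")  # creates tests/testthat/test-read-data.R
```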
We recommend you include a number of extra files in your package to make its purpose clear and to encourage collaboration. These include:
- a README file to describe the purpose of your package and provide some simple examples, which you can set up with usethis::use_readme_md() or usethis::use_readme_rmd() if it contains R code that you want to execute
- a NEWS file with usethis::use_news_md(), which is used to communicate the latest changes to your package
- a CODE_OF_CONDUCT file with usethis::use_code_of_conduct() to explain to collaborators how they should engage with your project
- vignettes with usethis::use_vignette(), which are short documents that let you mix code with prose to describe how to use the functions in your package
We recommend semantic versioning as you develop your package. In this system, the version number is composed of three numbers (like ‘1.2.3’) that are incremented, from left to right, when you make major breaking changes, minor changes, and patches or bug fixes. The usethis::use_version() function can help you to do this and to automatically update the DESCRIPTION and NEWS files.
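Each part of a version number is a whole number rather than a single digit, so versions should be compared part by part and not as text or decimals. The short sketch below uses base R's package_version() to show this; the version numbers themselves are made up.

```r
# Made-up version numbers for illustration
current <- package_version("1.9.3")
next_minor <- package_version("1.10.0")

# Compared part by part, 1.10.0 is later than 1.9.3 because 10 > 9
is_later <- next_minor > current

# Compared as plain text, the order comes out wrong because "1" sorts before "9"
text_says_later <- "1.10.0" > "1.9.3"

print(is_later)
print(text_says_later)
```

Inside a package project, a call such as usethis::use_version("minor") would carry out the equivalent minor bump for you; "major" and "patch" are the corresponding options for the other two parts.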
Use {pkgdown} to autogenerate a website from your package’s documentation. This lets people see your documentation rendered nicely on the internet, without the need to install the package. You can serve this site on the web and update it automatically using GitHub Pages and GitHub Actions.
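As a sketch of one way to set this up, assuming {usethis} and {pkgdown} are installed and the package is hosted on GitHub:

```r
# Run interactively from inside the package project.
# Assumes {usethis} and {pkgdown} are installed.

# Add the basic pkgdown configuration to the package
usethis::use_pkgdown()

# Build the site locally so you can preview it
pkgdown::build_site()

# Alternatively, for a package hosted on GitHub: configure pkgdown together
# with a GitHub Actions workflow that rebuilds the site and publishes it
# to GitHub Pages
usethis::use_pkgdown_github_pages()
```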