Overview

Welcome to the third and final post in the Reproducible Programming for Biologists Who Code series.

The items in this post are things that are nice to do. Where the Must Dos and Should Dos posts suggested ways to make your code more likely to be reproducible, some of the ideas in this post instead make your work easier to reproduce.

If this is the first post you’ve seen from the series, I recommend that you read the previous posts before this one. I put a good amount of thought into sorting the advice into three tiers of importance, so I believe your return on time invested will be greater with advice from the first two posts.

The first half of this post (Testing Code to Code Review) discusses practices that improve code quality. The second half (Continuous Integration to Archiving Code) is about useful tools.

Testing Code

How do you know that your code works? Is it that it runs without throwing any errors? Is it that the output is in the correct shape? Or that the genes that are output at the end seem biologically plausible?

One way to help ensure your code works is to test it. You probably already test your code in an ad hoc manner by running it as you write it. Instead of doing this testing just once, you can write the test down in a function called a test case and run it after each code change. By taking a more structured approach to testing your code, you can assure both yourself and future users that everything works correctly. As an added bonus, you’ll find that writing test cases encourages you to structure your code into functions.

There are a number of types of tests used in software development, but the two main types used in science are unit tests and regression tests. Unit tests are test cases that ensure each function you write behaves the way you expect it to. Regression tests are test cases you write after finding and fixing a bug, to ensure that the bug doesn’t come back. The two look the same in code but are written for different reasons; if you do further reading about how to write test cases, it will help to be aware of both terms.

To be less abstract, imagine you were writing a function to find a sequence’s reverse complement. Once you finished the function, you would write unit tests that make sure the function behaved correctly. For example, you could assert that all of the resulting characters were still DNA characters, and that the reverse complement function got the correct result on a known sequence. When using the function later on, you might find that the resulting sequence was missing its last character. You would then write a regression test that ensures the original and reverse complement sequences have the same length to keep the bug from coming back in the future.
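For instance, a minimal sketch of those tests might look like the code below (the reverse_complement implementation here is a stand-in for your own):

# test_reverse_complement.py

def reverse_complement(sequence):
    '''Return the reverse complement of a DNA sequence'''
    complements = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    return ''.join(complements[base] for base in reversed(sequence))

def test_known_sequence():
    # Unit test: the function gets the correct result on a known sequence
    assert reverse_complement('ATCG') == 'CGAT'

def test_valid_characters():
    # Unit test: all of the resulting characters are still DNA characters
    assert set(reverse_complement('AATTCCGG')) <= set('ATCG')

def test_length_unchanged():
    # Regression test: guards against the missing last character bug
    assert len(reverse_complement('ATCG')) == len('ATCG')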

There are a number of libraries that help organize and write these test cases, but the ones I’ve selected are pytest for Python and testthat for R.

pytest

Pytest is similar to the built-in unittest package, but is simpler to set up and work with. After installation, you can run pytest <test_dir>, and pytest will find all the files whose names start with test_ or end with _test and look through them for test cases. Test cases in pytest are functions whose names start with test. Pytest will then run all the test cases and report which tests passed and which failed.

Some example pytest code is below:

# test_utils.py

import utils  # A file with functions to be tested

def test_square():
    assert utils.square(0) == 0
    assert utils.square(3) == 9
    assert utils.square(-1) == 1
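
For the test above to pass, utils.py would need a square function along these lines (a sketch, since the original module isn’t shown):

# utils.py

def square(x):
    '''Return the square of a number'''
    return x * x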

testthat

In terms of finding test functions, testthat works similarly to pytest. It also creates the tests/testthat directory for you when you run usethis::use_testthat(). The logic of how the test cases are structured is different from pytest’s though, so be sure to read the documentation if you’re coming from a Python background.

Code for testthat looks like this:

# test-square.R

# Assumes square() is defined in your package; devtools::test() loads it automatically

test_that("Square function works correctly", {
    expect_equal(square(0), 0)
    expect_equal(square(3), 9)
    expect_equal(square(-1), 1)
})

Further Reading:

  • This article discusses the impact of bugs in science.
  • For a tutorial on pytest, read this post.
  • The documentation for testthat is somewhat vague on how testthat should actually be used, but it links to a textbook chapter that is more descriptive.

Even More Readable Code

It’s not a mistake that code readability has shown up in multiple posts. While the concept itself is simple, it’s a deep subject with a lot of nuance. Just like in prose, writing code that people can understand is both challenging and important.

Code Cleanup

Before publishing your code, it’s good to give it an inspection to make sure everything is in order. I personally end up with a lot of commented-out code and debugging print statements over the course of writing a program. While these were helpful to me once, they aren’t helpful to others, and the code looks much nicer when they’re removed. Likewise, unused module imports and functions can and should be removed.

Linting

I wish I had a program that would correct me when I made fashion mistakes. A computer dispassionately telling me that I shouldn’t wear blue stripes with blue plaid would save me a lot of brainpower when putting together outfits. To my knowledge a program for correcting personal style doesn’t exist, but programs for fixing programming style do.

These programs are called linters, and they keep you from having to keep track of tomes’ worth of style information. Using a linter both makes your code easier for you and others to read and saves you the energy of having to memorize the style standards of the language you’re using.

There are a number of linters in Python, but I would recommend using either Pylint or Black. Pylint is a straightforward linter that lets you know if your code breaks the PEP8 style standard, but otherwise leaves formatting up to you. Black is technically an autoformatter rather than a linter: instead of reporting style issues, it rewrites your code to match its own idea of style. The choice between the two is whether you would rather have more flexibility with Pylint or never have to think about anything style-related with Black.
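
To make that concrete, here’s a sketch of the kind of rewriting Black does (the exact output can vary between Black versions):

# Before running Black
def get_means(df,columns):
    return {column:df[column].mean()
        for column in columns}

# After running Black
def get_means(df, columns):
    return {column: df[column].mean() for column in columns}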

In R your choices are between the built-in RStudio static analysis tools and lintr. The choice between them is whether you want the added options present in lintr or the ease of having the tool built into RStudio.

Further Reading:

Python Typing

There is a library called mypy that allows you to catch type-related bugs before you ever run your code by marking the data types in your Python code. “But wait,” I’m sure you’re asking, “if I wanted to write statically typed code I’d use Rust instead of Python.” It’s a fair point. But what if I told you that the code is only a little bit statically typed?

Having to specify the type of every variable in your code is kind of a pain, so I don’t do it. Instead, I just use type hints to keep track of the types of function arguments. Since a good docstring already records those types anyway, moving them from the documentation into the function signature costs almost nothing, and you get mypy’s error checking for free. The code below shows an example of what that looks like.

# Types in the documentation

def get_gene_count(gene_file):
    '''Count the number of genes in a file produced
    by `microarray_rnaseq_gene_intersection.py`

    Arguments
    ---------
    gene_file: str
        The path to a file containing the list of genes created by
        microarray_rnaseq_gene_intersection.py

    Returns
    -------
    num_genes: int
        The number of genes present in the gene file
    '''
# Types in the function arguments

def get_gene_count(gene_file: str) -> int:
    '''Count the number of genes in a file produced
    by `microarray_rnaseq_gene_intersection.py`

    Arguments
    ---------
    gene_file: The path to a file containing the list of genes created by
               microarray_rnaseq_gene_intersection.py

    Returns
    -------
    num_genes: The number of genes present in the gene file
    '''
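
With the hints in place, mypy can flag type errors without ever running the code. For example, if a script passed the wrong type (count_genes.py and gene_utils are hypothetical names here):

# count_genes.py

from gene_utils import get_gene_count  # hypothetical module with the function above

num_genes = get_gene_count(42)  # An int instead of a file path

Running mypy count_genes.py would then report something like:

error: Argument 1 to "get_gene_count" has incompatible type "int"; expected "str"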

Some work has been done to make a mypy equivalent in R, but those efforts haven’t reached the same level of maturity so far.

Further Reading:

Code Review

Linus’s Law says “given enough eyeballs, all bugs are shallow”. Unfortunately, the median number of eyeballs applied to research code is two. The standard way to get another person’s opinion on your code is a process called code review. A typical workflow involves creating a pull request with the new analysis/feature you’ve added, asking someone to review your code, editing your code based on their comments, and merging your code into your project’s main branch. Reviewing other lab members’ code and having them review yours is a good way to stay up to date with other work going on in your lab and ensure everyone’s code quality is good.

Further Reading:

Continuous Integration

At this point you’re probably thinking “That sure is a lot of programs to keep track of running when I write code.” Luckily, there is a type of software called a continuous integration service that is designed to run your tests and linters automatically. Even better, it syncs with your version control platform.

The idea behind continuous integration (CI) is that there is a set of tasks that you want to run each time you add code to your repository. CI services will do these tasks on a different computer for you and even give you a shiny badge to put in your README so everyone knows your code works.

You have tons of options when it comes to CI services, including Travis CI, CircleCI, TeamCity, and Github Actions. They all do more or less the same thing, so I’d recommend picking whichever seems the easiest to set up for you. That being said, Github Actions still feels like it’s a beta release. If you can find prebuilt actions that do what you want, and you’re already using Github as your version control platform, then go ahead. Just be aware that it can be a pain to work with conda in Actions.
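
To give a rough idea of what configuring a CI service involves, here’s a minimal sketch of a Github Actions workflow that runs pytest on every push (the file paths and versions are illustrative):

# .github/workflows/ci.yml
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: Install dependencies
        run: pip install -r requirements.txt pytest
      - name: Run the test suite
        run: pytest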

Mostly, CI is used for basic tasks such as testing code, linting it, and publishing it to PyPI. It can also be used more creatively to do things like continuously rebuild a manuscript as you write it. (Full disclosure, I’m in GreeneLab, as are some of the authors on the paper.)

Further Reading:

Using Containers

Code always works on the computer of the person who writes it. Wouldn’t it be nice if you could just give people your computer so they could run your code? The idea behind containers is exactly that. Containers keep you from having to spend the time and effort making your code work on every computer, because everyone can run it in the same packaged environment.

Docker is the most popular containerization software¹. While a full tutorial on how to use Docker is beyond the scope of this post, I think this tutorial is a pretty good starting point for Python and this tutorial is a good one for R. If you’re using the Python tutorial, I encourage you to read the section on sharing your image from the R tutorial, as the Python tutorial lacks it.
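
As a rough sketch, a Dockerfile for a Python analysis might look like the following (analysis.py and requirements.txt stand in for your own files):

# Dockerfile
FROM python:3.8-slim

# Install the project's dependencies first so Docker can cache this layer
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the rest of the code into the image
COPY . .

# The command run when the container starts
CMD ["python", "analysis.py"]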

If your project is entirely in R or Jupyter notebooks, learning Docker might be unnecessary. Binder is a service that takes your repository, reads its requirements.txt/environment.yml file, and creates a hosted Docker image for you. As long as your code uses less than 2GB of RAM and runs for around an hour or less, Binder will allow people to run your code from a public URL. This is great for small analyses, visualizations, and generally allowing other people to use your code frictionlessly. If you need more power for your analysis, you’ll have to bring your own compute.

Further Reading:

Creating a Documentation Site

It’s nice to have a website to point people to for information about your project and how to use it. Unfortunately, if you’re like me you may have made a promise to yourself to not get back into web development. Thankfully there are tools that keep you from having to write website code if you have been consistently documenting your project.

In Python, the tool you use depends on the style of your docstrings. If you’ve been using reStructuredText, then you can use a package called Sphinx. If you’ve been using NumPy or Google style docstrings, you’ll use a Sphinx extension called Napoleon instead. The two are used the same way, though you’ll have to add Napoleon to a config file to run it. You can also integrate Sphinx with CI to build the site automatically whenever you add new code to your project.
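
That config change is a one-liner: in the conf.py file that Sphinx generates, enabling Napoleon looks something like this (Napoleon ships with Sphinx, so there’s nothing extra to install):

# conf.py (generated by sphinx-quickstart)

extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']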

If you’re using roxygen or a similar tool to document your R code, then running pkgdown will generate a website automatically from your function documentation and README. You can then call pkgdown from your CI software to host the website as a GitHub Pages site. After just a few function calls you’ll have a nice website that looks like the pkgdown website itself.

Further Reading:

Linux Package Management

Linux packages tend to be more stable and backwards compatible than Python and R packages. That being said, it can still be helpful to keep track of which versions of which Linux packages you used. Among other things, Guix acts as a package manager for a Linux environment. If you program in Linux and have a lot of C library/Linux package dependencies, you might want to try it out.

Archiving Code

One important aspect of science is the ability to access and reference past work. Hosting code on the internet is a great way to make it easy to access. That being said, pieces of the internet rapidly evolve and disappear, making it difficult to ensure that past work stays available.

One way to address this issue is to add your code to an archive like Zenodo or Software Heritage. Doing so only takes a few minutes, and in the case of Zenodo also gets you a DOI so people can cite your code.

Further Reading:

Conclusion

Thank you so much for reading this series of posts, I hope you enjoyed them! Going forward I don’t intend for this blog to be solely about reproducible research. I’ll post about a few one-off analyses soon, but currently I intend for the focus of the blog to be on writing down the unwritten rules of how to develop good research code. If you’re interested, follow me on Twitter, or do whatever it is that people do with RSS feeds using the wifi symbol at the bottom of the page.

Further Reading

Acknowledgements

Thanks to Wayne for introducing me to Binder, and to Sabrina Granger for telling me about Guix and Software Heritage. Thanks again to Rachel Ungar for proofreading. This post is funded in part by the Gordon and Betty Moore Foundation’s Data-driven Discovery initiative through grant GBMF4552.

Footnotes

1: Kubernetes is also popular, but is overkill for most research applications.

BibTeX Citation:

@misc{Heil_2020,
  title={Reproducible Programming for Biologists Who Code - Part 3: Nice To Dos},
  url={https://autobencoder.com/2020-07-27-nicetodo/},
  journal={AutoBenCoding},
  author={Heil, Benjamin J.},
  year={2020},
  month={07}
}