Reproducible Research: Write your Clinical Chemistry paper using R Markdown

Abstract
Background: This blog post shows how to write a reproducible article in the field of clinical chemistry using R Markdown. The only things that will change from journal to journal are the reference formatting and perhaps the section numbering. The source code itself is provided so that you can use it as a template.

Methods: The paper uses R, R Markdown, bookdown and pandoc. References are handled with BibTeX, and reference formatting is managed with Zotero CSL files.

Results: The result will be a manuscript that anyone can reproduce.

Conclusions: R Markdown makes reproducible research through literate programming pretty easy.

1 Background

Last week at the MSACL conference, Dr. Keith Baggerly from MD Anderson Cancer Center’s Bioinformatics and Computational Biology Group spoke about the importance of reproducible research, using the Duke University ovarian cancer biomarker scandal as a backdrop. The talk…was…incredible and illustrated how easy it is to introduce catastrophic errors into your research papers through the use of GUI analytical tools. The now retracted article that Baggerly dismantled is here. I urge everyone in our field to watch similar talks from Keith discussing biomarker analysis in mass spectrometric proteomic data and microarray data. Afterwards, Shannon Haymond and I discussed how to make a reproducible submission in the field of clinical chemistry. This article does not cover the basics of R and R Markdown, but it will serve as a guide for those who know a little about them: it gives you a working YAML header and shows how to get the citations and cross-references correct.

2 Overhead

2.1 YAML

R Markdown articles require a YAML header that tells R Markdown how to process your article. This is the YAML code that worked for me. I am sure there are other ways to do this:
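A minimal sketch of such a header, assuming the bookdown output formats and the file names mentioned elsewhere in this post (bibliography.bib, clinical-chemistry.csl, abbreviations.json). The title, author placeholder and option values are illustrative assumptions, so adjust them to your own manuscript:

```yaml
---
title: "Reproducible Research: Write your Clinical Chemistry paper using R Markdown"
author: "Your Name"            # placeholder, not from the original template
date: "`r Sys.Date()`"
bibliography: bibliography.bib # BibTeX file in the same folder as the .Rmd
output:
  bookdown::pdf_document2:
    toc: false
    number_sections: true
    # CSL style and abbreviation database are passed straight to pandoc
    pandoc_args: ["--csl", "clinical-chemistry.csl", "--citation-abbreviations", "abbreviations.json"]
# To switch output type, comment out the format above and uncomment one of these:
#  bookdown::word_document2:
#    pandoc_args: ["--csl", "clinical-chemistry.csl", "--citation-abbreviations", "abbreviations.json"]
#  bookdown::html_document2:
#    pandoc_args: ["--csl", "clinical-chemistry.csl", "--citation-abbreviations", "abbreviations.json"]
---
```

The nesting is carried by two-space indentation, which is why changing the spacing tends to break the render.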

Indenting and spacing really matter a lot in YAML so don’t mess with them. You can generate PDF, MS Word or HTML output as needed by uncommenting and commenting out the output type as appropriate in the YAML above.

2.2 CSL Files

CSL files take care of citation formatting for you. Depending on which journal you are submitting to, you will need a different .csl file. They exist for every conceivable journal. In the world of non-reproducible reports, this process is taken care of by reference managers in GUI word processors, but since GUI word processors do not produce reproducible research, we must break up with them.

From the YAML, you will see that you need a file called “clinical-chemistry.csl” which I downloaded from here. Put this file in the same folder as your R Markdown file. The Clinical Chemistry .csl depends on the .csl file of the American Association for Cancer Research but it will be downloaded for you on the fly. If you need a different reference format, search the CSL GitHub repository for the appropriate file.

2.3 BibTeX

Reference management in R Markdown is taken care of by BibTeX. You can see from the YAML that we need a bibliography text file called “bibliography.bib”. You can name it whatever you like but you will need to change the YAML accordingly. In any case, any citation you intend to make will have to be in the .bib file. I am going to cite Shannon Haymond because this article was her idea and I will toss in a couple of other references so you can see that they get cited in order as we would like.

Below is my bibliography.bib file. I put it in the same folder as my R Markdown file. You make a .bib file using a text editor (or RStudio) by copying and pasting the BibTeX citations from Google Scholar.
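As a rough illustration of what entries in such a file look like, here are two records rebuilt from the reference list at the end of this post. The key shannon2017 is the one cited in this post; the key li2017 is a hypothetical name chosen for the example, and author names are given as initials because that is all the reference list shows:

```bibtex
@article{shannon2017,
  title     = {Contribution of symmetric dimethylarginine to {GFR} decline in pediatric chronic kidney disease},
  author    = {Brooks, E. R. and Haymond, S. and Rademaker, A. and Pierce, C. and Helenowski, I. and Passman, R. and others},
  journal   = {Pediatric Nephrology},
  pages     = {1--8},
  year      = {2017},
  publisher = {Springer}
}

@article{li2017,
  title     = {Wellness initiatives: Benefits and limitations},
  author    = {Li, M. and Diamandis, E. P. and Paneth, N. and Yeo, K.-T. J. and Vogt, H. and Master, S. R.},
  journal   = {Clinical Chemistry},
  volume    = {63},
  pages     = {1063--1068},
  year      = {2017},
  publisher = {Clinical Chemistry}
}
```

The braces around {GFR} protect the capitalization, which citation styles may otherwise lower-case.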

2.4 Journal Abbreviations

Now, the fussiest thing I had to do was get the journal names in the references to abbreviate properly. We need an abbreviation database. Fortunately, I could download the abbreviation database from the Web of Science as a .csv file and then convert it to a JSON file. This was Stephen’s idea. I had to deal with a couple of badly behaving characters from some journal titles. This script, if embedded in your document, will download the .csv for you and then make the abbreviation database for you. That way your citations will say “J Clin Pathol” and not “Journal of Clinical Pathology”, etc.
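The conversion step can be sketched roughly as follows. This is an illustration under stated assumptions rather than the original script: a small in-memory table with hypothetical column names (full, short) stands in for the downloaded .csv, and the output uses the nested default / container-title layout that pandoc reads through --citation-abbreviations.

```r
library(jsonlite)

# Hypothetical stand-in for the downloaded abbreviation table;
# in practice this would come from read.csv() on the .csv file.
abbrev <- data.frame(
  full  = c("Journal of Clinical Pathology", "Clinical Chemistry", "Pediatric Nephrology"),
  short = c("J Clin Pathol", "Clin Chem", "Pediatr Nephrol"),
  stringsAsFactors = FALSE
)

# Drop incomplete rows and trim stray whitespace from both columns
abbrev <- abbrev[!is.na(abbrev$full) & !is.na(abbrev$short), ]
abbrev$full  <- trimws(abbrev$full)
abbrev$short <- trimws(abbrev$short)

# Named list mapping each full journal title to its abbreviation
titles <- as.list(setNames(abbrev$short, abbrev$full))

# Nest the mapping under default -> container-title
abbrev_db <- list(default = list(`container-title` = titles))

# Write the database next to the .Rmd file
writeLines(toJSON(abbrev_db, auto_unbox = TRUE, pretty = TRUE), "abbreviations.json")
```

The file name abbreviations.json is the one passed to pandoc in the YAML header, so the two have to agree.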

2.5 Citation Management

Now we can cite articles from our .bib file freely by inserting syntax like this: [@shannon2017]. Shannon wrote this interesting article on symmetric dimethylarginine (1), Stephen wrote about wellness initiatives with Dr. Diamandis (2) and Dan wrote a paper about PTH when he was a resident (3). That makes for a total of 3 citations in this manuscript. (1–3)
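In the .Rmd source, the paragraph above might be written along these lines. Only the key shannon2017 comes from this post; li2017 and holmes2005 are hypothetical keys standing in for whatever your .bib file uses. Several keys separated by semicolons inside one pair of brackets give a grouped citation such as (1–3):

```markdown
Shannon wrote this interesting article on symmetric dimethylarginine [@shannon2017],
Stephen wrote about wellness initiatives with Dr. Diamandis [@li2017] and
Dan wrote a paper about PTH when he was a resident [@holmes2005].
That makes for a total of 3 citations in this manuscript. [@shannon2017; @li2017; @holmes2005]
```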

3 Reporting Your Results

3.1 Figures

You can embed figures in R Markdown as local images or hyperlinks as follows:

![Grumpy Cat does not care about reproducibility](grumpy.jpg)

Grumpy Cat does not care about reproducibility

If you need cross-references to figures that renumber automatically when you insert another figure, insert your figure with an R code chunk and give the code chunk a name.
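A named chunk of this kind might look like the sketch below. The chunk label ancient-aliens matches the cross-reference used in this post, while the image file name and the out.width value are assumptions for illustration:

```{r ancient-aliens, echo=FALSE, out.width="50%", fig.cap="This is the ancient aliens guy."}
# Include a local image file; the path is hypothetical
knitr::include_graphics("ancient-aliens.jpg")
```

The fig.cap option supplies the caption, and bookdown builds the reference label from the chunk name by prefixing it with fig:.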


Figure 3.1: This is the ancient aliens guy.

Then you can reference your code chunk using syntax like this: See figure \@ref(fig:ancient-aliens)

and you will automatically create appropriately cross-referenced figures that get automatically numbered like this: See figure 3.1. Of course, the hallmark of reproducibility is to embed R code right into the document. See for example figure 3.2.

Figure 3.2: This is a reproducible figure
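A chunk that generates a figure like this directly from code could be sketched as follows. The simulated data, seed, axis label and chunk label are all assumptions made for illustration, not the code behind the original figure:

```{r reproducible-figure, echo=FALSE, fig.cap="This is a reproducible figure"}
set.seed(42)                          # fixed seed so every render draws the same figure
x <- rnorm(200, mean = 44, sd = 5)    # simulated, hypothetical results
hist(x, main = "", xlab = "Concentration (mmol/L)")
```

With this label, the figure would be referenced in the text as \@ref(fig:reproducible-figure).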

3.2 Inline Calculations

When you are reporting your amazing results, you can have inline code calculations using syntax that looks like this: `r round(median(x),1)`, which would result in the median value of \(x\) being reported as 43.9 mmol/L.
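For the inline expression to work, x has to be defined in an earlier code chunk. A minimal sketch with three hypothetical values, chosen only so that the arithmetic is easy to check by hand:

```r
# Hypothetical results in mmol/L; in a real paper x would come from your data
x <- c(41.2, 43.9, 47.5)

# The same expression that the inline code evaluates
median_x <- round(median(x), 1)
print(median_x)
```

Because the number in the sentence is computed at render time, it updates by itself whenever the data change.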

3.3 Tables

Tables are not a problem either and can be made with the kable() function of the knitr package or with the xtable package. Tables can be cross-referenced analogously to figures. See table 3.1.

Table 3.1: This is a great table

| First | Second | Third |
|-------|--------|-------|
| 1     | 2      | 2     |
| 2     | 3      | 6     |
| 3     | 4      | 12    |
| 4     | 5      | 20    |
| 5     | 6      | 30    |
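A table like the one above can be produced with a chunk along these lines. The chunk label great-table and the way the third column is derived (First times Second, which matches the values shown) are assumptions for illustration:

```{r great-table, echo=FALSE}
# Build the data; Third is the product of the first two columns
tab <- data.frame(First = 1:5, Second = 2:6)
tab$Third <- tab$First * tab$Second

# The caption is what makes the table numbered and referenceable in bookdown
knitr::kable(tab, caption = "This is a great table")
```

With this label, the table would be referenced in the text as \@ref(tab:great-table).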

3.4 Math

Math works pretty magically using \(\LaTeX\) syntax. For example, inline math can be written like so: $\sin^2x + \cos^2x = 1$. This will result in: \(\sin^2x + \cos^2x = 1\). Math can also be set as a display block like so:

Gauss’ Law says: \[
\oint_S E_n \, dA = \frac{1}{\varepsilon_0} Q_{\mathrm{inside}}
\]

Equations can be cross-referenced just like tables and figures.
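In bookdown, this is done by placing a label of the form (\#eq:label) inside an equation environment. A sketch using Gauss’ Law, where the label name gauss is an arbitrary choice for the example:

```latex
\begin{equation}
\oint_S E_n \, dA = \frac{1}{\varepsilon_0} Q_{\mathrm{inside}}
(\#eq:gauss)
\end{equation}
```

The equation is then cited in the text with \@ref(eq:gauss), in the same way as figures and tables.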

4 Conclusion

I hope this makes writing a reproducible paper easier for you. A minimal template to produce output in PDF is here, and the PDF output itself is here. You’ll need \(\LaTeX\) installed, of course.

Parting Thought

You can cite books and get Greek letters too:

“Very truly I tell you,” Jesus answered, “before Abraham was born, \(\varepsilon\gamma\omega\) \(\varepsilon\iota\mu\iota\) (I Am)!” (4)

References

1. Brooks ER, Haymond S, Rademaker A, Pierce C, Helenowski I, Passman R, et al. Contribution of symmetric dimethylarginine to GFR decline in pediatric chronic kidney disease. Pediatr Nephrol. Springer; 2017;1–8.

2. Li M, Diamandis EP, Paneth N, Yeo K-TJ, Vogt H, Master SR. Wellness initiatives: Benefits and limitations. Clin Chem. Clinical Chemistry; 2017;63:1063–8.

3. Holmes DT, Levin A, Forer B, Rosenberg F. Preanalytical influences on DPC IMMULITE 2000 intact PTH assays of plasma and serum from dialysis patients. Clin Chem. Clinical Chemistry; 2005;51:915–7.

4. John the A, Christ J, Spirit H, God F. The Gospel According to John, 8:58. 1 Heavenly Way: Hosannah Press; 80AD.


3 thoughts on “Reproducible Research: Write your Clinical Chemistry paper using R Markdown”

  1. Nice post! Thanks for sharing!

    Just one minor thing FYI: csl can be set in YAML just like bibliography (instead of using the --csl command-line argument).

    1. Thanks Yihui – that’s very helpful. Thank you for making all this possible with all your hard work on knitr and bookdown.

    2. Hi Yihui,

      Turns out that in this case it does not work to put the csl in the YAML because the csl file is dependent on downloading another parent csl file. It all executes properly when the YAML contains this:

      pandoc_args: [
      "--csl", "csl/clinical-chemistry.csl", "--citation-abbreviations", "csl/abbreviations.json"
      ]

      but when I try this:

      csl: clinical-chemistry.csl

      it halts with:
      Error: pandoc document conversion failed with error 83

      However, if I use the parent style which happens to be american-association-for-cancer-research.csl which does not require an additional download, this does work:

      csl: american-association-for-cancer-research.csl

      So it looks like I do have to leave it as a pandoc argument wherein the download happens on the fly.
