RMarkdown Template that Manages Academic Affiliations – docx or PDF output

Background

I like writing my academic papers in RMarkdown because it allows reproducible research. The cleanest way to submit a manuscript made in RMarkdown is using the LaTeX code that it generates using the YAML switch keep_tex = true. A minimalist YAML header would look like so:

Introduction

However, when you want mutliple authors affiliations you discover that you can’t do as you would in LaTeX because Pandoc does not know what to do with the affiliations and you end out a dishearting PDF that looks like the output shown in figure 1 below:


This is so sad.

Figure 1: This is so sad.

The situation worsens if you want MS-Word output. As those of us in medical fields know, most journals (with some notable exceptions like the Clinical Mass Spectrometry Journal and other Elsevier journals like Clinical Biochemistry and Clinica Chimica Acta) require submission of a document in MS-Word format which goes against all that Data Science and Reprodicible Research stands for–he says, with hyperbole. Parenthetically, it is my hope that since AACC has indicated that they intend to make Data Science a strategic priority for Lab Medicine, they will soon accept submissons to Clinical Chemistry and Journal of Applied Laboratory Medicine written reproducibly in RMardown or LaTeX.

In the mean time, here are the workarounds for getting the affiliations to display correctly along with all the other stuff we want, namely, cross referencing of figures and tables and correct reference formatting and abbreviation of journal names. This allows you to avoid the horror of manually fixing your Word document after it generated from RMarkdown. In any case, let’s start with MS-Word.

Dependencies for MS-Word and the Associated YAML

You will also need to install Pandoc which is the Swiss Army Knife of document conversion. It’s going to turn your code into a .docx file for you. Mac users can do this with Homebrew on the terminal command line:

There are some extra installs required to help Pandoc do its job. Install the prebuilt binaries if you can.

Finally, you need to use some scripts written in the Lua scripting language which means you will need the language itself:

And you will need two Lua scripts:

These are in Pandoc github repository:

You want the files named scholarly-metadata.lua and author-info-blocks.lua.

You will need to choose a .csl file for your journal. This will tell Pandoc how to format the references. You can download the correct .csl file here. You will also need a journal abbreviations database. I have made one for you from the Web of Science list and you can download it here.

You will need to create a .bibtex database which is just your list of references. This can be exported from various reference managers or built by hand. Name the file mybibfile.bib.

Now follow the bouncing ball:

  1. Go to the directory containing your .Rmd file.
  2. Create a directory in it called “Extras”
  3. Put the two Lua scripts, the Bibtex database, the abbreviations database and the .csl file into the “Extras” folder.
  4. If you want to avoid Pandoc’s goofy default .docx formatting, then put this word document in the same folder.

OR

Download the contents of this folder from my github repo that has everything set up as I describe above.

For two authors, your YAML will need to look like this:

Et voila! Figure 2 shows that we have something reasonable.

This is so great

Figure 2: This is so great

Dependencies for LaTeX and the Associated YAML

It goes without saying that you need to install LaTeX. LaTeX markup language is available here: Mac, Windows. For Linux, just install from the command line with your package manager. Do a full install with all the glorious bloat of all LaTeX packages. This saves many headaches in the future.

You don’t need the lua scripts for LaTeX although you can use them. The issue with LaTeX is that the .tex template that Pandoc uses for generating LaTeX files does not support author affiliations as descibed in the Pandoc documentation. So what you need to do is modify the Pandoc LaTeX template. To get your current working copy of the Pandoc LaTeX template open up a terminal (Mac/Linux) and type:

This will push the contents to a file. Move the file to the “Extras” folder discussed above. If that seems difficult, you can also download it here. Now you have to edit it. Open it up in a text editor and find the section that reads:

Replace this with this code that will invoke the LaTeX authblk package.

Then make your YAML header look like this:

And as you can see in figure 3 you get a correctly list of authors.


This is also great.

Figure 3: This is also great.

Cross Reference of a Table

Of course, tables can be cross referenced in the same manner as figures. Here is a cross reference to table 1 using the code \@ref(tab:mytable) .

Table 1: A short table
term estimate std.error statistic p.value
(Intercept) 36.908 2.191 16.847 0.000
hp -0.019 0.015 -1.275 0.213
cyl -2.265 0.576 -3.933 0.000

This Template also Takes Care of Reference Abbreviation.

As usual, you can make a citation with the code [@bibtexname], where bibtexname is the articles’s abbreviated handle in your bibtex database. Here is a great resource on the bookdown package [1] and reproducible research [2] and here are references where the journal title is longer [3,4]. The references in your documnent (and shown below) will have appropriate abbreviations based on the .json abbreviations database I have provided. In this case, I have chosen the .csl file for Clinical Mass Spectrometry–’cause MSACL.

Other Ways to Skin the YAML Cat

I came across some other ways to deal with this that I did not like as much but they are simpler. Here is one using a footnote.

And you can also misuse the date variable:

Conclusion

This concludes my long personal struggle to get a completely reproducible .docx manusript genereated by RMarkdown and Pandoc. Here is the output for PDF and Word.

Parting Thought

Let us not become weary in doing good, for at the proper time we will reap a harvest if we do not give up.

Galations 6:9

References

[1] Y. Xie, J.J. Allaire, G. Grolemund, R markdown: The definitive guide, Chapman; Hall/CRC, 2018. https://bookdown.org/yihui/bookdown.

[2] R.D. Peng, Reproducible research in computational science, Science. 334 (2011) 1226–1227.

[3] G. Eisenhofer, C. Durán, T. Chavakis, C.V. Cannistraci, Steroid metabolomics: Machine learning and multidimensional diagnostics for adrenal cortical tumors, hyperplasias, and related disorders, Curr. Opin. Endocr. Metab. Res. 8 (2019) 40–49. doi:https://doi.org/10.1016/j.coemr.2019.07.002.

[4] F.B. Vicente, D.C. Lin, S. Haymond, Automation of chromatographic peak review and order to result data transfer in a clinical mass spectrometry laboratory, Clin. Chim. Acta. 498 (2019) 84–89. doi:https://doi.org/10.1016/j.cca.2019.08.004.

Reproducible Research: Write your Clinical Chemistry paper using R Markdown

Abstract
Background: This blog post is going to show you how to write a reproducible article in the field of clinical chemistry using R Markdown. The only thing that will change for journal to journal will be the reference fomating and perhaps section numbering. The source code itself will be provided so that you can use it as a template.

Methods: The paper will use R, R-Markdown, bookdown and pandoc. The references will be taken care of using BibTeX and reference formatting will be managed with Zotero csl files.

Results: The result will be a manuscript that anyone can reproduce.

Conclusions: R Markdown makes reproducible research through literate programming pretty easy.

1 Background

Last week at the MSACL conference Dr. Keith Baggerly from MD Anderson Cancer Centre’s Bioformatics and Computational Biology Group spoke about the importance of reproducible research using the Duke University ovarian cancer biomarker scandal as a backdrop. The talk…was…incredible and illustrated how easy it is to introduce catastrophic errors into your research papers through the use of GUI analytical tools. The now retracted article that Baggerly dismantled is here. I urge everyone in our field to watch similar talks from Keith discussing biomarker analysis in mass spectrometric proteomic data and microarray data. Shannon Haymond and I were then discussing how to make a submission in the field of clinical chemistry that is reproducible. While this article will not discuss the basics of R and R Markdown, it will serve as a guide for those who know a little about these and give you a working YAML header and get the citations and cross-references correct.

2 Overhead

2.1 YAML

RMarkdown articles require a YAML header to instruct R Markdown how to process your article. This is the YAML code that worked for me. I am sure there are other ways to do this:

Indenting and spacing really matter a lot in YAML so don’t mess with them. You can generate PDF, MS Word or HTML output as needed by uncommenting and commenting out the output type as appropriate in the YAML above.

2.2 CSL Files

CSL files take care of citation formatting for you. Depending on what journal you are making submission to, you will need a different .csl file. They exist for every conceivable journal. In the world of non-reproducible reports, this process is taken care of by reference managers in GUI word processors but since GUI word processors do not produce reproducible research, we must break up with them.

From the YAML, you will see that you need a file called “clinical-chemistry.csl” which I downloaded from here. Put this file in the same folder as your R Markdown file. The Clinical Chemistry .csl depends on the .csl file of the American Association for Cancer Research but it will be downloaded for you on the fly. If you need a different reference format, search the CSL GitHub repository for the appropriate file.

2.3 BibTeX

Reference management in R Markdown is taken care of by BibTeX. You can see from the YAML that we need a bibliography text file called “bibliography.bib”. You can name it whatever you like but you will need to change the YAML accordingly. In any case, any citation you intend to make will have to be in the .bib file. I am going to cite Shannon Haymond because this article was her idea and I will toss in a couple of other references so you can see that they get cited in order as we would like.

Below is my bibliography.bib file. I put it in the same folder as my R Markdown file. You make a .bib file using a text file editor (or RStudio) by cutting and pasting the BibTex citations from Google Scholar

2.4 Journal Abbreviations

Now, the fussiest thing I had to do was get the references abbreviating properly. We need an abbreviation database. Fortunately, I could download the abbreviation database from the Web of Science as a .csv file and then convert it to a JSON file. This was Stephen’s idea. I had to deal with a couple of badly behaving characters from some journal titles. This script, if embedded in your document, will download the .csv for you and then make the abbreviation database for you. That way your citations will say, “J Clin Pathol” and not “Journal of Clinical Pathology” etc.

2.5 Citation Management

Now we can proceed to cite articles from our .bib file freely inserting syntax like this: [@shannon2017]. Shannon wrote this interesting article on symmetric dimethylarginine (1), Stephen wrote about wellness initiatives with Dr. Diamandis (2) and Dan wrote a paper about PTH when he was a resident (3). That makes for a total of 3 citations in this manuscript. (1–3)

3 Reporting Your Results

3.1 Figures

You can embed figures in R Markdown as local images or hyperlinks as follows:

![Grumpy Cat does not care about reproducibility](grumpy.jpg)

Grumpy Cat does not care about reproducibility

Grumpy Cat does not care about reproducibility

If you need to do cross-referencing of figures in your document which will change automatically for you if you insert another figure, you can do this by inserting your figure with an R code-chunk and giving the code-chunk a name.


This is the ancient aliens guy.

Figure 3.1: This is the ancient aliens guy.

Then you can reference your code chunk using syntax like this: See figure \@ref(fig:ancient-aliens)

and you will automatically create appropriately cross-referenced figures that get automatically numbered like this: See figure 3.1. Of course the hallmark of the reproducibility to embed R code right into the document. See for example figure 3.2

 

This is a reproducible figure

Figure 3.2: This is a reproducible figure

3.2 Inline Calculations

When you are reporting your amazing results you can have inline code calculations by syntax that looks like this: `r round(median(x),1)` and would result in the median value of \(x\) being reported as 43.9 mmol/L.

3.3 Tables

Tables are not a problem either and can be made with the kable() function of the knitr package or with the xtable package. Tables can be crossreferenced analogously to figures. See table 3.1

Table 3.1: This is a great table
First Second Third
1 2 2
2 3 6
3 4 12
4 5 20
5 6 30

3.4 Math

Math works pretty magically using \(\LaTeX\) syntax. For example inline math can be done like so $\sin^2x + \cos^2x = 1$. This will result in: \(\sin^2x + \cos^2x = 1\). And math can also be done as a code block like so:

Gauss’ Law says: \[
\oint_S {E_n dA = \frac{1}{{\varepsilon _0 }}} Q_{inside}
\]

Equations can be cross-referenced just like tables and figures.

4 Conclusion

I hope this makes writing a reproducible paper easier for you. A minimal template to produce output in PDF is here. And the PDF output itself is here. You’ll need \(\LaTeX\) installed of course.

 

Parting Thought

You can cite books too and get Greek letters too:

“Very truly I tell you,” Jesus answered, “before Abraham was born, \(\varepsilon\gamma\omega\) \(\varepsilon\iota\mu\iota\) (I Am)!” (4)

References

1. Brooks ER, Haymond S, Rademaker A, Pierce C, Helenowski I, Passman R, et al. Contribution of symmetric dimethylarginine to gfr decline in pediatric chronic kidney disease. Pediatr Nephrol. Springer; 2017;1–8.

2. Li M, Diamandis EP, Paneth N, Yeo K-TJ, Vogt H, Master SR. Wellness initiatives: Benefits and limitations. Clin Chem. Clinical Chemistry; 2017;63:1063–8.

3. Holmes DT, Levin A, Forer B, Rosenberg F. Preanalytical influences on DPC IMMULITE 2000 intact PTH assays of plasma and serum from dialysis patients. Clin Chem. Clinical Chemistry; 2005;51:915–7.

4. John the A, Chist J, Spirit H, God F. The Gospel According to John, 8:58. 1 Heavenly Way: Hosannah Press; 80AD.