clinical pathology – Page 2

A Shiny App for Passing Bablok and Deming Regression

August 15, 2016August 15, 2016 dtholmes@mail.ubc.ca

Background

Back in 2011 I was not aware of any tool in R for Passing Bablok (PB) regression, a form of robust regression described in a series of three papers in Clinical Chemistry and Laboratory Medicine (then J Clin Chem and Biochem) available here, here and here. For reasons that are not entirely clear to me, this regression methodology is favoured by clinical chemists but seems largely ignored by other disciplines. However since reviewers clinical chemistry journals will demand the use of PB regression, it seemed expeditious to me to code it in R. This is what spawned a small project for a piece of software to do PB (and Deming and ordinary least squares) regression using a self-contained executable that could be downloaded, unzipped on a Windows Desktop and just ran. You can download here and instructions for installation and use are here and here respectively. The calculations are all done in R, the GUI is built with Python and Py-Qt4 and the executable with cx_freeze. I made it run without an installer because hospital IT often refuse to install software that has not been officially vetted and purchased. The tool was a lot more popular than I anticipated now having about 2000 downloads. In any case, maintenance, upgrades, bug fixing and dealing with operating system updates that break things (like OSX El Capitan's security policies) are no-fun so a Shiny based solution to the same problem makes a lot of sense.

Update

Since 2011, statisticians at Roche Diagnostics programmed the mcr package for PB and Deming regression. Additionally, there is also the MethComp package and the deming package from the Mayo Clinic which both offer PB regression.

Shiny App

Enter Burak Bahar, a like-minded Clinical Pathologist who is currently doing a fellowship at Yale. He liked my cp-R program but he saw the need for a web-based equivalent.

Burak and his wife Ayse, also a physician, have coded a Shiny App for doing Deming, PB and least squares regression in R which is capable of producing publication quality figures and provides all the regression statistics you would need for method-validation or publication. It can also produce a regression report in PDF, Word or HTML. The dynamic duo of the Bahar-MDs deserve all credit here as my only contribution related to suggestions related to usability. This project was presented at the 2016 American Association of Clinical Chemistry meeting in Philadelphia.

The app URL is bahar.shinyapps.io/method_compare. Go to the data tab on the left and then cut and paste your data from an spreadsheet program. Shortcuts CTRL-C (copy) and CTRL-V (paste) work natively in the table. The table is pre-populated with some random data for demonstration purposes. Once your data is pasted in, click on the Plots tab and choose the Bland-Altman or Scatter Plot.

Example

Here is an image generated with the Bahar Shiny app using method comparison data obtained from St. Paul's Hospital Laboratory in migrating from Siemens Immulite 2000 XPi to Roche Cobas e601 for Calcitonin determination. Don't worry, we did more than 33 comparison–I am just showing the low end.

Try adjusting some of the plot parameters. The figures will update in real time. Thanks to Burak and Ayse Bahar for your work!

(Dan's) Parting Thought

There are straight lines that matter a lot more than regression.

I will make justice the measuring line and righteousness the plumb line
(Isa 28:17)

Flat File Interface your Mass Spectrometer to the Laboratory Information System with R

February 27, 2016March 1, 2016 dtholmes@mail.ubc.ca

The Problem

As Clinical Pathologists we work hard to create laboratory developed tests (LDTs) using liquid chromatography and tandem mass spectrometry (LC-MS/MS) that are robust, repeatable, accurate and have a wider dynamic range than commercial immunoassays. In our experience, properly developed LC-MS/MS assays are much less expensive and outperform their commercial immunoassay counterparts from an analytical standpoint.

However, despite mass spectrometry's communal obsession with analytical performance of our LDTs, sometimes we overlook the matter of handling the data we generate. Unlike traditional diagnostic companies (e.g. Siemens, Roche) who take care of upload and download of patient data and results via HL7 streams to the laboratory information system (LIS), mass spectrometry companies have not yet made this a priority. This leaves us either paying out a lot of money for custom middleware solutions or manually transcribing our LC-MS/MS results.

We might naively think, “How bad can the transcription be?” but over time, it becomes painfully evident that manual transcription of result is tedious, error–prone and inefficient use of tech–time.

Many LIS vendors offer what is called a “flat-file interface”. In this case, there is no HL7 stream generated using a communication socket between instrument and LIS. Rather, the results are saved in an ASCII text file with a pre-defined format and then transferred to the LIS via a secure shell (SSH) connection.

For this post, we are going to take some sample flat files from a SCIEX API5000 triple quadrupole mass spectrometer and prepare a flat file for the SunQuest LIS. Please note that this code is provided to you as is under the GNU Public Licence and without any guarantee. You know how all the LC-MS/MS vendors say their instruments are for “research use only”? –yeah, I'm giving this to you in the same spirit. If you use or modify it, you do so at your own risk. Any changes to how your flatfile is generated by your mass spectrometer or any upgrades to your LC-MS/MS software could make this code malfunction. You have been warned.

The Required Format

SunQuest requires the output file to be a comma separated values (CSV) file with a unique specimen or internal QC result in each row. The first column is the instrument ID, the second columns is the specimen container ID (an E followed by a 10–digit integer), the third is testcode and the fourth is the result. The file itself is required to have a time–stamp so that it has a traceable name and should have no header. For an instrument named PAPI (short for Providence API 5000) and a testcode TES (for testosterone), the file might look like this:

PAPI,E2324434511,TES,3.12 
PAPI,E2324434542,TES,8.75 
PAPI,E2324434565,TES,25.34 
...

PAPI,E2324434511,TES,3.12

PAPI,E2324434542,TES,8.75

PAPI,E2324434565,TES,25.34

...

The Starting Material

After we have completed an analytical run and reviewed all peaks to generate our fileable results, we can export the quatified sample batch to an ASCII text file. The file contains a whole lot of diagnostic information about the run like which multiple reaction monitoring (MRM) transitions we used, what the internal standard (IS) counts were, results from the quantifier and qualifier ion, fitted values for the calibrators etc. There are more than 80 columns in a typical file and we could talk about all the things we might do with this data but in this case, we are concerned with extracting and preparing the results file.

Dialogue Box

If we are actually going to make an R script usable by a human, it would be good to be able to choose which file we want to process and what test we want to extract using a simple graphical user interface (GUI). There are a number of tools one can use to build GUIs in R but the most rudimentary is TclTk. I have to confess that I find the language constructs for GUI creation both non–intuitive and boring. For this reason, I present without discussion, a modification of a recipe for creating a box with radio–buttons. We are going to choose which of three analytes (you can increase this number as you please) for which we wish to process a flat–file. These are: aldosterone, cortisol and testosterone. Please note that if you execute this code on a Mac, you will have to install XQuartz because Macs don't have native X-windows support despite the BSD Linux heritage of OSX.

library(tcltk2)
#make a radiobutton widget
#source for tk widget modifed from http://www.sciviews.org/recipes/tcltk/TclTk-radiobuttons/
#accessed Feb 10, 2016
win1 <- tktoplevel()
win1$env$rb1 <- tk2radiobutton(win1)
win1$env$rb2 <- tk2radiobutton(win1)
win1$env$rb3 <- tk2radiobutton(win1)
rbValue <- tclVar("Aldosterone")
tkconfigure(win1$env$rb1, variable = rbValue, value = "Aldosterone")
tkconfigure(win1$env$rb2, variable = rbValue, value = "Cortisol")
tkconfigure(win1$env$rb3, variable = rbValue, value = "Testosterone")
tkgrid(tk2label(win1, text = "Which analyte are you processing?"),
       columnspan = 2, padx = 10, pady = c(15, 5))
tkgrid(tk2label(win1, text = "Aldosterone"), win1$env$rb1,
       padx = 10, pady = c(0, 5))
tkgrid(tk2label(win1,text = "Cortisol"), win1$env$rb2,
       padx = 10, pady = c(0, 5))
tkgrid(tk2label(win1,text = "Testosterone"), win1$env$rb3,
       padx = 10, pady = c(0, 15))

onOK <- function() {
  rbVal <- as.character(tclvalue(rbValue))
  tkdestroy(win1)
}

win1$env$butOK <- tk2button(win1, text = "OK", width = -6, command = onOK)
tkgrid(win1$env$butOK, columnspan = 2, padx = 10, pady = c(5, 15))
tkfocus(win1)
#this final line is necessary to prevent to the program from proceeding until this radio button widget has closed
tkwait.window(win1)

library(tcltk2)

#make a radiobutton widget

#source for tk widget modifed from http://www.sciviews.org/recipes/tcltk/TclTk-radiobuttons/

#accessed Feb 10, 2016

win1 <- tktoplevel()

win1$env$rb1 <- tk2radiobutton(win1)

win1$env$rb2 <- tk2radiobutton(win1)

win1$env$rb3 <- tk2radiobutton(win1)

rbValue <- tclVar("Aldosterone")

tkconfigure(win1$env$rb1, variable = rbValue, value = "Aldosterone")

tkconfigure(win1$env$rb2, variable = rbValue, value = "Cortisol")

tkconfigure(win1$env$rb3, variable = rbValue, value = "Testosterone")

tkgrid(tk2label(win1, text = "Which analyte are you processing?"),

columnspan = 2, padx = 10, pady = c(15, 5))

tkgrid(tk2label(win1, text = "Aldosterone"), win1$env$rb1,

padx = 10, pady = c(0, 5))

tkgrid(tk2label(win1,text = "Cortisol"), win1$env$rb2,

padx = 10, pady = c(0, 5))

tkgrid(tk2label(win1,text = "Testosterone"), win1$env$rb3,

padx = 10, pady = c(0, 15))

onOK <- function() {

rbVal <- as.character(tclvalue(rbValue))

tkdestroy(win1)

}

win1$env$butOK <- tk2button(win1, text = "OK", width = -6, command = onOK)

tkgrid(win1$env$butOK, columnspan = 2, padx = 10, pady = c(5, 15))

tkfocus(win1)

#this final line is necessary to prevent to the program from proceeding until this radio button widget has closed

tkwait.window(win1)

This will give us the following pop-up window with radiobuttons in which I have selected testosterone.

You will notice that Tk windows do not appear native to the operating system. We can live with this because we are not shallow.

After you hit the OK button, the Tk widget then puts the chosen value into an Tk variable called rbValue. We can determine the value using the command tclvalue(rbValue). The reason we need to know which analyte we are working with is because the name of the MRM we want to pull out of the flat file is dependent on the analyte of course. We will also need to replace results below the limit of quantitation (LoQ) with “< x”, whatever x happens to be, which will be a different threshold for each analyte.

In our case, the testcodes for aldosterone, cortisol and testosterone are ALD,CORT and TES respectively, the LoQs are 50 pmol/L, 1 nmol/L and 0.05 nmol/L and the MRM names are “Aldo 1”, “Aldo 2”, “Cortisol 1”, “Cortisol 2” and “Testo 1” and “Testo 2” as we defined them within SCIEX Analyst Software. We will use the switch() function to define three variables (test.code, LoQ, and MRM.names) which we will use later to process the flat–file. We will also define the name of the worksheet in a variable called worksheet. These are the parameters you would have to change in order to modify the code for your purposes.

#set the testcode by test
test.code <- switch(tclvalue(rbValue),
                    "Aldosterone" = "ALD",
                    "Testosterone" = "TES",
                    "Cortisol" = "CORT"
)

#set the LoQ by test
LoQ <- switch(tclvalue(rbValue),
                    "Aldosterone" = "<50",
                    "Testosterone" = "<0.05",
                    "Cortisol" = "<1"
)

#set the MRM names by test
MRM.names <- switch(tclvalue(rbValue),
              "Aldosterone" = c("Aldo 1", "Aldo 2"),
              "Testosterone" = c("Testo 1", "Testo 2"),
              "Cortisol" = c("Cortisol 1", "Cortisol 2")
)

#set the worksheet name for your site
worksheet <- "PAPI"

#set the testcode by test

test.code <- switch(tclvalue(rbValue),

"Aldosterone" = "ALD",

"Testosterone" = "TES",

"Cortisol" = "CORT"

)

#set the LoQ by test

LoQ <- switch(tclvalue(rbValue),

"Aldosterone" = "<50",

"Testosterone" = "<0.05",

"Cortisol" = "<1"

)

#set the MRM names by test

MRM.names <- switch(tclvalue(rbValue),

"Aldosterone" = c("Aldo 1", "Aldo 2"),

"Testosterone" = c("Testo 1", "Testo 2"),

"Cortisol" = c("Cortisol 1", "Cortisol 2")

)

#set the worksheet name for your site

worksheet <- "PAPI"

Building File Names

Now we will prompt the user to tell them that they are to choose an instrument flat–file and we will determine the path of the chosen file. We will need the path to both read in the appropriate file but also to write the output later.

#choose the flat file to process
tkmessageBox(message="You are about to choose a flat file to process.")
flat.file.path <- tk_choose.files(default = "", caption = "Select File", multi = FALSE, filters = NULL, index = 1)
#determine the directory name of the chosen file
flat.file.dir <- dirname(flat.file.path)
#determine the file name of the chosen file
flat.file.name <- basename(flat.file.path)

#choose the flat file to process

tkmessageBox(message="You are about to choose a flat file to process.")

flat.file.path <- tk_choose.files(default = "", caption = "Select File", multi = FALSE, filters = NULL, index = 1)

#determine the directory name of the chosen file

flat.file.dir <- dirname(flat.file.path)

#determine the file name of the chosen file

flat.file.name <- basename(flat.file.path)

This code will create this message box:

and this file choice dialogue box:

and after a file is selected and the Open is pressed, the path to the flat–file is stored in the variable flat.file.path.

Behold: The Data

So we chosen the file we want to read in but what does this file look like? To just get a gander at it, we could open it with Excel and see how it is laid out. But since we have broken up with Excel, we won't do this. SCIEX Analyst exports tab (not comma) delimited files. R has a built in function read.delim() for reading these files but we will quickly discover that read.delim() assumes the files have a rectangular structure, having the same number of columns in each row. R will make assumptions about the shape of the data file based on the first few rows and then try to read it in. In this case, it will fail and you will get gibberish. To get this to work for us we will need to tell R how many rows to skip before the real data starts or we will need to tell R the number of columns the file has (which is not guaranteed to be consistent between versions of vendor software). There are lots of ways to do this but I think the simplest is to use grep().

I did this by reading the file in with no parsing of the tabs using the readLines() function. This function creates a vector for which each successive value is the entire content of the row of the file. I display the first 30 lines of the file. Suppose that we chose a testosterone flat file.

x <- readLines(flat.file.path)
x[1:30]

x <- readLines(flat.file.path)

x[1:30]

##  [1] "Peak Name: Testo-d3 2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
##  [2] "Use as Internal Standard"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
##  [3] "Q1/Q3 Masses: 292.50/97.20 Da"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
##  [4] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
##  [5] "Peak Name: Testo 1"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
##  [6] "Internal Standard: Testo-d3 2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
##  [7] "Q1/Q3 Masses: 289.50/97.20 Da"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
##  [8] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
##  [9] "Fit\tQuadratic\tWeighting\t1 / x\tIterate\tNo"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [10] "a0\t0.00658"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
## [11] "a1\t0.2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
## [12] "a2\t-0.000443"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [13] "Correlation coefficient\t0.9999"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
## [14] "Use Area"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [15] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [16] "Peak Name: Testo 2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
## [17] "Internal Standard: Testo-d3 2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [18] "Q1/Q3 Masses: 289.50/109.10 Da"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
## [19] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [20] "Fit\tQuadratic\tWeighting\t1 / x\tIterate\tNo"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [21] "a0\t0.00359"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
## [22] "a1\t0.17"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [23] "a2\t-0.000313"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [24] "Correlation coefficient\t0.9999"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
## [25] "Use Area"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [26] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [27] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [28] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [29] "Sample Name\tSample ID\tSample Type\tSample Comment\tSet Number\tAcquisition Method\tAcquisition Date\tRack Type\tRack Position\tVial Position\tPlate Type\tPlate Position\tFile Name\tDilution Factor\tWeight To Volume Ratio\tSample Annotation\tDisposition\tAnalyte Peak Name\tAnalyte Units\tAnalyte Peak Area (counts)\tAnalyte Peak Area for DAD (mAU x min)\tAnalyte Peak Height (cps)\tAnalyte Peak Height for DAD (mAU)\tAnalyte Concentration (nmol/L)\tAnalyte Retention Time (min)\tAnalyte Expected RT (min)\tAnalyte RT Window (sec)\tAnalyte Centroid Location (min)\tAnalyte Start Scan\tAnalyte Start Time (min)\tAnalyte Stop Scan\tAnalyte Stop Time (min)\tAnalyte Integration Type\tAnalyte Signal To Noise\tAnalyte Peak Width (min)\tStandard Query Status\tAnalyte Mass Ranges (Da)\tAnalyte Wavelength Ranges (nm)\tArea Ratio\tHeight Ratio\tAnalyte Annotation\tAnalyte Channel\tAnalyte Peak Width at 50% Height (min)\tAnalyte Slope of Baseline (%/min)\tAnalyte Processing Alg.\tAnalyte Peak Asymmetry\tAnalyte Integration Quality\tIS Peak Name\tIS Units\tIS Peak Area (counts)\tIS Peak Area for DAD (mAU x min)\tIS Peak Height (cps)\tIS Peak Height for DAD (mAU)\tIS Concentration (nmol/L)\tIS Retention Time (min)\tIS Expected RT (min)\tIS RT Window (sec)\tIS Centroid Location (min)\tIS Start Scan\tIS Start Time (min)\tIS Stop Scan\tIS Stop Time (min)\tIS Integration Type\tIS Signal To Noise\tIS Peak Width (min)\tIS Mass Ranges (Da)\tIS Wavelength Ranges (nm)\tIS Channel\tIS Peak Width at 50% Height (min)\tIS Slope of Baseline (%/min)\tIS Processing Alg.\tIS Peak Asymmetry\tIS Integration Quality\tUse Record\tRecord Modified\tCalculated Concentration (nmol/L)\tCalculated Concentration for DAD (nmol/L)\tRelative Retention Time\tAccuracy (%)\tResponse Factor\tAcq. Start Time (min)\tInjection Volume used\t"
## [30] "Blank\t\tBlank\tBlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;\t0\tTesto_DWP_S2.dam\t2/11/2013 5:06:45 PM\tDeep Well MTP 96 Cooled\t1\t1\tDeep Well MTP 96 Cooled\t2\t140305B1.wiff\t1.00\t0.00\t\t\tTesto 1\tnmol/L\t0\tN/A\t0.00e+000\tN/A\t0.00\t0.00\t1.29\t30.0\t0.00\t0\t0.00\t0\t0.00\tNo Peak\tN/A\t0.00\tN/A\t289.500/97.200 Da\tN/A\t0.00e+000\t0.00e+000\t\tN/A\t0.00\t0.00e+000\tSpecify Parameters - MQIII\t0.00\t0.00\tTesto-d3 2\tnmol/L\t158416\tN/A\t5.58e+004\tN/A\t1.00\t1.27\t1.29\t20.0\t1.28\t115\t1.18\t139\t1.43\tBase To Base\tN/A\t0.248\t292.500/97.200 Da\tN/A\tN/A\t4.36e-002\t1.06e+000\tSpecify Parameters - MQIII\t1.60\t0.956\t\t0\tN/A\tN/A\t0.00\tN/A\tN/A\t2.85\t25\t"

## [1] "Peak Name: Testo-d3 2"

## [2] "Use as Internal Standard"

## [3] "Q1/Q3 Masses: 292.50/97.20 Da"

## [4] ""

## [5] "Peak Name: Testo 1"

## [6] "Internal Standard: Testo-d3 2"

## [7] "Q1/Q3 Masses: 289.50/97.20 Da"

## [8] ""

## [9] "Fit\tQuadratic\tWeighting\t1 / x\tIterate\tNo"

## [10] "a0\t0.00658"

## [11] "a1\t0.2"

## [12] "a2\t-0.000443"

## [13] "Correlation coefficient\t0.9999"

## [14] "Use Area"

## [15] ""

## [16] "Peak Name: Testo 2"

## [17] "Internal Standard: Testo-d3 2"

## [18] "Q1/Q3 Masses: 289.50/109.10 Da"

## [19] ""

## [20] "Fit\tQuadratic\tWeighting\t1 / x\tIterate\tNo"

## [21] "a0\t0.00359"

## [22] "a1\t0.17"

## [23] "a2\t-0.000313"

## [24] "Correlation coefficient\t0.9999"

## [25] "Use Area"

## [26] ""

## [27] ""

## [28] ""

## [29] "Sample Name\tSample ID\tSample Type\tSample Comment\tSet Number\tAcquisition Method\tAcquisition Date\tRack Type\tRack Position\tVial Position\tPlate Type\tPlate Position\tFile Name\tDilution Factor\tWeight To Volume Ratio\tSample Annotation\tDisposition\tAnalyte Peak Name\tAnalyte Units\tAnalyte Peak Area (counts)\tAnalyte Peak Area for DAD (mAU x min)\tAnalyte Peak Height (cps)\tAnalyte Peak Height for DAD (mAU)\tAnalyte Concentration (nmol/L)\tAnalyte Retention Time (min)\tAnalyte Expected RT (min)\tAnalyte RT Window (sec)\tAnalyte Centroid Location (min)\tAnalyte Start Scan\tAnalyte Start Time (min)\tAnalyte Stop Scan\tAnalyte Stop Time (min)\tAnalyte Integration Type\tAnalyte Signal To Noise\tAnalyte Peak Width (min)\tStandard Query Status\tAnalyte Mass Ranges (Da)\tAnalyte Wavelength Ranges (nm)\tArea Ratio\tHeight Ratio\tAnalyte Annotation\tAnalyte Channel\tAnalyte Peak Width at 50% Height (min)\tAnalyte Slope of Baseline (%/min)\tAnalyte Processing Alg.\tAnalyte Peak Asymmetry\tAnalyte Integration Quality\tIS Peak Name\tIS Units\tIS Peak Area (counts)\tIS Peak Area for DAD (mAU x min)\tIS Peak Height (cps)\tIS Peak Height for DAD (mAU)\tIS Concentration (nmol/L)\tIS Retention Time (min)\tIS Expected RT (min)\tIS RT Window (sec)\tIS Centroid Location (min)\tIS Start Scan\tIS Start Time (min)\tIS Stop Scan\tIS Stop Time (min)\tIS Integration Type\tIS Signal To Noise\tIS Peak Width (min)\tIS Mass Ranges (Da)\tIS Wavelength Ranges (nm)\tIS Channel\tIS Peak Width at 50% Height (min)\tIS Slope of Baseline (%/min)\tIS Processing Alg.\tIS Peak Asymmetry\tIS Integration Quality\tUse Record\tRecord Modified\tCalculated Concentration (nmol/L)\tCalculated Concentration for DAD (nmol/L)\tRelative Retention Time\tAccuracy (%)\tResponse Factor\tAcq. Start Time (min)\tInjection Volume used\t"

## [30] "Blank\t\tBlank\tBlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;\t0\tTesto_DWP_S2.dam\t2/11/2013 5:06:45 PM\tDeep Well MTP 96 Cooled\t1\t1\tDeep Well MTP 96 Cooled\t2\t140305B1.wiff\t1.00\t0.00\t\t\tTesto 1\tnmol/L\t0\tN/A\t0.00e+000\tN/A\t0.00\t0.00\t1.29\t30.0\t0.00\t0\t0.00\t0\t0.00\tNo Peak\tN/A\t0.00\tN/A\t289.500/97.200 Da\tN/A\t0.00e+000\t0.00e+000\t\tN/A\t0.00\t0.00e+000\tSpecify Parameters - MQIII\t0.00\t0.00\tTesto-d3 2\tnmol/L\t158416\tN/A\t5.58e+004\tN/A\t1.00\t1.27\t1.29\t20.0\t1.28\t115\t1.18\t139\t1.43\tBase To Base\tN/A\t0.248\t292.500/97.200 Da\tN/A\tN/A\t4.36e-002\t1.06e+000\tSpecify Parameters - MQIII\t1.60\t0.956\t\t0\tN/A\tN/A\t0.00\tN/A\tN/A\t2.85\t25\t"

All of the \t's that you see are the tabs in the file which are has read in literally when we use readLines(). We can see that in this file nothing of use happens until line 29 but this is not consistent from file to file so we should not just assume that 29 is always the magic number where the good stuff begins. We can see that the line starting “Sample Name \t Sample ID” is the real starting point so we can determine how many lines to skip by using grep() and prepare for some error–handling with a variable called problem by which we can deal with the circumstance that no approriate starting row is identified.

skip.val <- grep("Sample Name\tSample ID", x, fixed = TRUE) - 1

#if no such row is found, then the wrong file has been chosen
if (length(skip.val)==0){
  problem <- TRUE
} else {
  problem <- FALSE
}

skip.val

skip.val <- grep("Sample Name\tSample ID", x, fixed = TRUE) - 1

#if no such row is found, then the wrong file has been chosen

if (length(skip.val)==0){

problem <- TRUE

} else {

problem <- FALSE

}

skip.val

## [1] 28

## [1] 28

Now that we know how many lines to skip we can read in the data:

my.data <- read.delim(flat.file.path, sep = "\t", strip.white = TRUE, skip = skip.val, header = TRUE, stringsAsFactors = FALSE)

1 2	my.data <- read.delim(flat.file.path, sep = "\t", strip.white = TRUE, skip = skip.val, header = TRUE, stringsAsFactors = FALSE)

We can have a look at the structure of this file

str(my.data)

1 2	str(my.data)

## 'data.frame':    196 obs. of  83 variables:
##  $ Sample.Name                              : chr  "Blank" "Blank" "STD1" "STD1" ...
##  $ Sample.ID                                : logi  NA NA NA NA NA NA ...
##  $ Sample.Type                              : chr  "Blank" "Blank" "Standard" "Standard" ...
##  $ Sample.Comment                           : chr  "BlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "BlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "StandardMPX_SAMPLE_ID:189002;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "StandardMPX_SAMPLE_ID:189002;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" ...
##  $ Set.Number                               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Acquisition.Method                       : chr  "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" ...
##  $ Acquisition.Date                         : chr  "2/11/2013 5:06:45 PM" "2/11/2013 5:06:45 PM" "2/11/2013 5:13:14 PM" "2/11/2013 5:13:14 PM" ...
##  $ Rack.Type                                : chr  "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" ...
##  $ Rack.Position                            : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Vial.Position                            : int  1 1 13 13 25 25 37 37 49 49 ...
##  $ Plate.Type                               : chr  "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" ...
##  $ Plate.Position                           : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ File.Name                                : chr  "140305B1.wiff" "140305B1.wiff" "140305B1.wiff" "140305B1.wiff" ...
##  $ Dilution.Factor                          : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Weight.To.Volume.Ratio                   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sample.Annotation                        : logi  NA NA NA NA NA NA ...
##  $ Disposition                              : logi  NA NA NA NA NA NA ...
##  $ Analyte.Peak.Name                        : chr  "Testo 1" "Testo 2" "Testo 1" "Testo 2" ...
##  $ Analyte.Units                            : chr  "nmol/L" "nmol/L" "nmol/L" "nmol/L" ...
##  $ Analyte.Peak.Area..counts.               : int  0 0 5273 3464 19412 16195 37994 32722 87815 74821 ...
##  $ Analyte.Peak.Area.for.DAD..mAU.x.min.    : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Peak.Height..cps.                : num  0 0 1830 1300 6620 5700 13600 11400 30900 26100 ...
##  $ Analyte.Peak.Height.for.DAD..mAU.        : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Concentration..nmol.L.           : chr  "0.00" "0.00" "0.108" "0.108" ...
##  $ Analyte.Retention.Time..min.             : num  0 0 1.29 1.29 1.29 1.28 1.29 1.29 1.29 1.29 ...
##  $ Analyte.Expected.RT..min.                : num  1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 ...
##  $ Analyte.RT.Window..sec.                  : num  30 30 30 30 30 30 30 30 30 30 ...
##  $ Analyte.Centroid.Location..min.          : num  0 0 1.3 1.29 1.29 1.29 1.3 1.3 1.3 1.29 ...
##  $ Analyte.Start.Scan                       : int  0 0 119 120 119 118 120 120 119 119 ...
##  $ Analyte.Start.Time..min.                 : num  0 0 1.22 1.23 1.22 1.21 1.23 1.23 1.22 1.22 ...
##  $ Analyte.Stop.Scan                        : int  0 0 137 130 137 135 135 138 141 139 ...
##  $ Analyte.Stop.Time..min.                  : num  0 0 1.41 1.33 1.41 1.39 1.39 1.42 1.45 1.43 ...
##  $ Analyte.Integration.Type                 : chr  "No Peak" "No Peak" "Base To Base" "Base To Base" ...
##  $ Analyte.Signal.To.Noise                  : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Peak.Width..min.                 : num  0 0 0.186 0.103 0.186 0.176 0.155 0.186 0.227 0.207 ...
##  $ Standard.Query.Status                    : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Mass.Ranges..Da.                 : chr  "289.500/97.200 Da" "289.500/109.100 Da" "289.500/97.200 Da" "289.500/109.100 Da" ...
##  $ Analyte.Wavelength.Ranges..nm.           : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Area.Ratio                               : num  0 0 0.0304 0.02 0.113 0.094 0.219 0.188 0.468 0.398 ...
##  $ Height.Ratio                             : num  0 0 0.0304 0.0216 0.108 0.0933 0.225 0.189 0.471 0.398 ...
##  $ Analyte.Annotation                       : logi  NA NA NA NA NA NA ...
##  $ Analyte.Channel                          : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Peak.Width.at.50..Height..min.   : num  0 0 0.0447 0.0447 0.046 0.0432 0.0439 0.0454 0.0438 0.045 ...
##  $ Analyte.Slope.of.Baseline....min.        : num  0 0 16.4 23.2 2.89 7.54 4.65 3.46 0.631 3 ...
##  $ Analyte.Processing.Alg.                  : chr  "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" ...
##  $ Analyte.Peak.Asymmetry                   : num  0 0 1.8 0.874 1.83 1.41 1.51 2.18 2.18 2 ...
##  $ Analyte.Integration.Quality              : num  0 0 0.379 0.233 0.697 0.62 0.794 0.765 0.907 0.875 ...
##  $ IS.Peak.Name                             : chr  "Testo-d3 2" "Testo-d3 2" "Testo-d3 2" "Testo-d3 2" ...
##  $ IS.Units                                 : chr  "nmol/L" "nmol/L" "nmol/L" "nmol/L" ...
##  $ IS.Peak.Area..counts.                    : int  158416 158416 173383 173383 172263 172263 173811 173811 187783 187783 ...
##  $ IS.Peak.Area.for.DAD..mAU.x.min.         : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Peak.Height..cps.                     : num  55800 55800 60100 60100 61200 61200 60300 60300 65700 65700 ...
##  $ IS.Peak.Height.for.DAD..mAU.             : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Concentration..nmol.L.                : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ IS.Retention.Time..min.                  : num  1.27 1.27 1.27 1.27 1.27 1.27 1.28 1.28 1.28 1.28 ...
##  $ IS.Expected.RT..min.                     : num  1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 ...
##  $ IS.RT.Window..sec.                       : num  20 20 20 20 20 20 20 20 20 20 ...
##  $ IS.Centroid.Location..min.               : num  1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 ...
##  $ IS.Start.Scan                            : int  115 115 117 117 115 115 117 117 118 118 ...
##  $ IS.Start.Time..min.                      : num  1.18 1.18 1.2 1.2 1.18 1.18 1.2 1.2 1.21 1.21 ...
##  $ IS.Stop.Scan                             : int  139 139 140 140 140 140 139 139 139 139 ...
##  $ IS.Stop.Time..min.                       : num  1.43 1.43 1.44 1.44 1.44 1.44 1.43 1.43 1.43 1.43 ...
##  $ IS.Integration.Type                      : chr  "Base To Base" "Base To Base" "Base To Base" "Base To Base" ...
##  $ IS.Signal.To.Noise                       : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Peak.Width..min.                      : num  0.248 0.248 0.238 0.238 0.258 0.258 0.227 0.227 0.217 0.217 ...
##  $ IS.Mass.Ranges..Da.                      : chr  "292.500/97.200 Da" "292.500/97.200 Da" "292.500/97.200 Da" "292.500/97.200 Da" ...
##  $ IS.Wavelength.Ranges..nm.                : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Channel                               : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Peak.Width.at.50..Height..min.        : num  0.0436 0.0436 0.0445 0.0445 0.0435 0.0435 0.0455 0.0455 0.0451 0.0451 ...
##  $ IS.Slope.of.Baseline....min.             : num  1.06 1.06 1.17 1.17 1.39 1.39 1.51 1.51 1.93 1.93 ...
##  $ IS.Processing.Alg.                       : chr  "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" ...
##  $ IS.Peak.Asymmetry                        : num  1.6 1.6 2.19 2.19 1.77 1.77 1.88 1.88 2.18 2.18 ...
##  $ IS.Integration.Quality                   : num  0.956 0.956 0.971 0.971 0.968 0.968 0.969 0.969 0.97 0.97 ...
##  $ Use.Record                               : int  NA NA 1 1 1 1 1 1 1 1 ...
##  $ Record.Modified                          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Calculated.Concentration..nmol.L.        : chr  "N/A" "N/A" "0.119" "0.0962" ...
##  $ Calculated.Concentration.for.DAD..nmol.L.: chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Relative.Retention.Time                  : num  0 0 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 ...
##  $ Accuracy....                             : chr  "N/A" "N/A" "111." "89.1" ...
##  $ Response.Factor                          : chr  "N/A" "N/A" "0.282" "0.185" ...
##  $ Acq..Start.Time..min.                    : num  2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 ...
##  $ Injection.Volume.used                    : int  25 25 25 25 25 25 25 25 25 25 ...
##  $ X                                        : logi  NA NA NA NA NA NA ...

## 'data.frame': 196 obs. of 83 variables:

## $ Sample.Name : chr "Blank" "Blank" "STD1" "STD1" ...

## $ Sample.ID : logi NA NA NA NA NA NA ...

## $ Sample.Type : chr "Blank" "Blank" "Standard" "Standard" ...

## $ Sample.Comment : chr "BlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "BlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "StandardMPX_SAMPLE_ID:189002;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "StandardMPX_SAMPLE_ID:189002;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" ...

## $ Set.Number : int 0 0 0 0 0 0 0 0 0 0 ...

## $ Acquisition.Method : chr "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" ...

## $ Acquisition.Date : chr "2/11/2013 5:06:45 PM" "2/11/2013 5:06:45 PM" "2/11/2013 5:13:14 PM" "2/11/2013 5:13:14 PM" ...

## $ Rack.Type : chr "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" ...

## $ Rack.Position : int 1 1 1 1 1 1 1 1 1 1 ...

## $ Vial.Position : int 1 1 13 13 25 25 37 37 49 49 ...

## $ Plate.Type : chr "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" ...

## $ Plate.Position : int 2 2 2 2 2 2 2 2 2 2 ...

## $ File.Name : chr "140305B1.wiff" "140305B1.wiff" "140305B1.wiff" "140305B1.wiff" ...

## $ Dilution.Factor : num 1 1 1 1 1 1 1 1 1 1 ...

## $ Weight.To.Volume.Ratio : num 0 0 0 0 0 0 0 0 0 0 ...

## $ Sample.Annotation : logi NA NA NA NA NA NA ...

## $ Disposition : logi NA NA NA NA NA NA ...

## $ Analyte.Peak.Name : chr "Testo 1" "Testo 2" "Testo 1" "Testo 2" ...

## $ Analyte.Units : chr "nmol/L" "nmol/L" "nmol/L" "nmol/L" ...

## $ Analyte.Peak.Area..counts. : int 0 0 5273 3464 19412 16195 37994 32722 87815 74821 ...

## $ Analyte.Peak.Area.for.DAD..mAU.x.min. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Peak.Height..cps. : num 0 0 1830 1300 6620 5700 13600 11400 30900 26100 ...

## $ Analyte.Peak.Height.for.DAD..mAU. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Concentration..nmol.L. : chr "0.00" "0.00" "0.108" "0.108" ...

## $ Analyte.Retention.Time..min. : num 0 0 1.29 1.29 1.29 1.28 1.29 1.29 1.29 1.29 ...

## $ Analyte.Expected.RT..min. : num 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 ...

## $ Analyte.RT.Window..sec. : num 30 30 30 30 30 30 30 30 30 30 ...

## $ Analyte.Centroid.Location..min. : num 0 0 1.3 1.29 1.29 1.29 1.3 1.3 1.3 1.29 ...

## $ Analyte.Start.Scan : int 0 0 119 120 119 118 120 120 119 119 ...

## $ Analyte.Start.Time..min. : num 0 0 1.22 1.23 1.22 1.21 1.23 1.23 1.22 1.22 ...

## $ Analyte.Stop.Scan : int 0 0 137 130 137 135 135 138 141 139 ...

## $ Analyte.Stop.Time..min. : num 0 0 1.41 1.33 1.41 1.39 1.39 1.42 1.45 1.43 ...

## $ Analyte.Integration.Type : chr "No Peak" "No Peak" "Base To Base" "Base To Base" ...

## $ Analyte.Signal.To.Noise : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Peak.Width..min. : num 0 0 0.186 0.103 0.186 0.176 0.155 0.186 0.227 0.207 ...

## $ Standard.Query.Status : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Mass.Ranges..Da. : chr "289.500/97.200 Da" "289.500/109.100 Da" "289.500/97.200 Da" "289.500/109.100 Da" ...

## $ Analyte.Wavelength.Ranges..nm. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Area.Ratio : num 0 0 0.0304 0.02 0.113 0.094 0.219 0.188 0.468 0.398 ...

## $ Height.Ratio : num 0 0 0.0304 0.0216 0.108 0.0933 0.225 0.189 0.471 0.398 ...

## $ Analyte.Annotation : logi NA NA NA NA NA NA ...

## $ Analyte.Channel : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Peak.Width.at.50..Height..min. : num 0 0 0.0447 0.0447 0.046 0.0432 0.0439 0.0454 0.0438 0.045 ...

## $ Analyte.Slope.of.Baseline....min. : num 0 0 16.4 23.2 2.89 7.54 4.65 3.46 0.631 3 ...

## $ Analyte.Processing.Alg. : chr "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" ...

## $ Analyte.Peak.Asymmetry : num 0 0 1.8 0.874 1.83 1.41 1.51 2.18 2.18 2 ...

## $ Analyte.Integration.Quality : num 0 0 0.379 0.233 0.697 0.62 0.794 0.765 0.907 0.875 ...

## $ IS.Peak.Name : chr "Testo-d3 2" "Testo-d3 2" "Testo-d3 2" "Testo-d3 2" ...

## $ IS.Units : chr "nmol/L" "nmol/L" "nmol/L" "nmol/L" ...

## $ IS.Peak.Area..counts. : int 158416 158416 173383 173383 172263 172263 173811 173811 187783 187783 ...

## $ IS.Peak.Area.for.DAD..mAU.x.min. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Peak.Height..cps. : num 55800 55800 60100 60100 61200 61200 60300 60300 65700 65700 ...

## $ IS.Peak.Height.for.DAD..mAU. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Concentration..nmol.L. : num 1 1 1 1 1 1 1 1 1 1 ...

## $ IS.Retention.Time..min. : num 1.27 1.27 1.27 1.27 1.27 1.27 1.28 1.28 1.28 1.28 ...

## $ IS.Expected.RT..min. : num 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 ...

## $ IS.RT.Window..sec. : num 20 20 20 20 20 20 20 20 20 20 ...

## $ IS.Centroid.Location..min. : num 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 ...

## $ IS.Start.Scan : int 115 115 117 117 115 115 117 117 118 118 ...

## $ IS.Start.Time..min. : num 1.18 1.18 1.2 1.2 1.18 1.18 1.2 1.2 1.21 1.21 ...

## $ IS.Stop.Scan : int 139 139 140 140 140 140 139 139 139 139 ...

## $ IS.Stop.Time..min. : num 1.43 1.43 1.44 1.44 1.44 1.44 1.43 1.43 1.43 1.43 ...

## $ IS.Integration.Type : chr "Base To Base" "Base To Base" "Base To Base" "Base To Base" ...

## $ IS.Signal.To.Noise : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Peak.Width..min. : num 0.248 0.248 0.238 0.238 0.258 0.258 0.227 0.227 0.217 0.217 ...

## $ IS.Mass.Ranges..Da. : chr "292.500/97.200 Da" "292.500/97.200 Da" "292.500/97.200 Da" "292.500/97.200 Da" ...

## $ IS.Wavelength.Ranges..nm. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Channel : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Peak.Width.at.50..Height..min. : num 0.0436 0.0436 0.0445 0.0445 0.0435 0.0435 0.0455 0.0455 0.0451 0.0451 ...

## $ IS.Slope.of.Baseline....min. : num 1.06 1.06 1.17 1.17 1.39 1.39 1.51 1.51 1.93 1.93 ...

## $ IS.Processing.Alg. : chr "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" ...

## $ IS.Peak.Asymmetry : num 1.6 1.6 2.19 2.19 1.77 1.77 1.88 1.88 2.18 2.18 ...

## $ IS.Integration.Quality : num 0.956 0.956 0.971 0.971 0.968 0.968 0.969 0.969 0.97 0.97 ...

## $ Use.Record : int NA NA 1 1 1 1 1 1 1 1 ...

## $ Record.Modified : int 0 0 0 0 0 0 0 0 0 0 ...

## $ Calculated.Concentration..nmol.L. : chr "N/A" "N/A" "0.119" "0.0962" ...

## $ Calculated.Concentration.for.DAD..nmol.L.: chr "N/A" "N/A" "N/A" "N/A" ...

## $ Relative.Retention.Time : num 0 0 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 ...

## $ Accuracy.... : chr "N/A" "N/A" "111." "89.1" ...

## $ Response.Factor : chr "N/A" "N/A" "0.282" "0.185" ...

## $ Acq..Start.Time..min. : num 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 ...

## $ Injection.Volume.used : int 25 25 25 25 25 25 25 25 25 25 ...

## $ X : logi NA NA NA NA NA NA ...

Just Tell Me the Results

And we see that there is lots of stuff we don't need. What we do need are the columns titled “Sample.Name” (which is the specimen container ID in this case), the “Analyte.Peak.Name” (which is the MRM, either quantifier or qualifier), and the one whose name starts with “Calculated.Concentration..”. The last of these also contains the units of measure which is analyte–dependent. To get rid of this analyte–dependence of the column name, we can find out which column this is and rename it:

conc.col.num <- grep("Calculated.Concentration..",names(my.data), fixed = TRUE)
names(my.data)[conc.col.num]<- "Calculated.Concentration"

conc.col.num <- grep("Calculated.Concentration..",names(my.data), fixed = TRUE)

names(my.data)[conc.col.num]<- "Calculated.Concentration"

Now we can pull out the three columns of interest and put them into a dataframe named results.

#pull out the columns of interest
results <- my.data[,c("Sample.Name", "Analyte.Peak.Name","Calculated.Concentration")]
names(results) <- c("sampleID", "mrm", "conc")

#pull out the columns of interest

results <- my.data[,c("Sample.Name", "Analyte.Peak.Name","Calculated.Concentration")]

names(results) <- c("sampleID", "mrm", "conc")

Now we only need the quantifier ion results which we were defined by the user with Tk GUI, so we can pull them out with grep. I will pull out the qualifiers also but we do not need them unless we wanted to compute ion-ratios, for example.

#handle non-numeric results
quantifiers <- results[grep(MRM.names[1], results$mrm),]
quantifiers$conc <- as.numeric(quantifiers$conc)

#handle non-numeric results

quantifiers <- results[grep(MRM.names[1], results$mrm),]

quantifiers$conc <- as.numeric(quantifiers$conc)

## Warning: NAs introduced by coercion

1	## Warning: NAs introduced by coercion

qualifiers <- results[grep(MRM.names[2], results$mrm),]
qualifiers$conc <- as.numeric(qualifiers$conc)

qualifiers <- results[grep(MRM.names[2], results$mrm),]

qualifiers$conc <- as.numeric(qualifiers$conc)

## Warning: NAs introduced by coercion

1	## Warning: NAs introduced by coercion

Having pulled out the MRM of interest, we can define which rows correspond to standards, QC and patients by appropriate use of grep(). It happens that the CIDs all start with E followed by a 10 digit number so we can search for this pattern with a simple regular expression. Since we only need the QCs and patient data, the variable standards is calculated only as a matter of completeness.

#separate out sample types
standards <- grep("Blank|STD",quantifiers$sampleID)
qc <- grep("C-", quantifiers$sampleID)
#create a regular expression to identify samples (E followed by 10 digits)
regexp<-"(^E)([[:digit:]]{10})"
patients <-grep(pattern=regexp,quantifiers$sampleID)
output.data <- quantifiers[c(qc,patients),]

#separate out sample types

standards <- grep("Blank|STD",quantifiers$sampleID)

qc <- grep("C-", quantifiers$sampleID)

#create a regular expression to identify samples (E followed by 10 digits)

regexp<-"(^E)([[:digit:]]{10})"

patients <-grep(pattern=regexp,quantifiers$sampleID)

output.data <- quantifiers[c(qc,patients),]

Preparing Data for Output

Now we can prepare to write a dataframe corresponding to the required format of the output file. To do so, we'll need to find out how many rows we are writing and then prepare a vector of the same length repeating the name of the worksheet and testcode:

#prepare the final data
num.rows <- length(output.data$sampleID)
final.output.data <- data.frame(rep(worksheet,num.rows), output.data$sampleID, rep(test.code, num.rows), output.data$conc)
names(final.output.data) <- c("worksheet","sample","test","conc")

#prepare the final data

num.rows <- length(output.data$sampleID)

final.output.data <- data.frame(rep(worksheet,num.rows), output.data$sampleID, rep(test.code, num.rows), output.data$conc)

names(final.output.data) <- c("worksheet","sample","test","conc")

Now we can replace all the NA values that replaced “No Peak” with the correct LoQ according to which analyte we are looking at.

#to put LOQs in, we need to convert to character
#this assumes that all non numeric results are undetectable
final.output.data$conc <- as.character(final.output.data$conc)
final.output.data$conc[is.na(final.output.data$conc)] <- LoQ

#to put LOQs in, we need to convert to character

#this assumes that all non numeric results are undetectable

final.output.data$conc <- as.character(final.output.data$conc)

final.output.data$conc[is.na(final.output.data$conc)] <- LoQ

Our final.output.data dataframe looks like it behaved properly.

head(final.output.data,10)

1 2	head(final.output.data,10)

##    worksheet      sample test  conc
## 1       PAPI     C-LY1LR  TES 0.557
## 2       PAPI       C-LY1  TES  5.65
## 3       PAPI       C-LY2  TES  20.6
## 4       PAPI       C-LY3  TES  28.1
## 5       PAPI      C-PTES  TES 0.737
## 6       PAPI E1234083035  TES  1.04
## 7       PAPI E1234109065  TES  14.1
## 8       PAPI E1234086634  TES  19.2
## 9       PAPI E1234107491  TES    13
## 10      PAPI E1234114052  TES  18.6

## worksheet sample test conc

## 1 PAPI C-LY1LR TES 0.557

## 2 PAPI C-LY1 TES 5.65

## 3 PAPI C-LY2 TES 20.6

## 4 PAPI C-LY3 TES 28.1

## 5 PAPI C-PTES TES 0.737

## 6 PAPI E1234083035 TES 1.04

## 7 PAPI E1234109065 TES 14.1

## 8 PAPI E1234086634 TES 19.2

## 9 PAPI E1234107491 TES 13

## 10 PAPI E1234114052 TES 18.6

Timestamping, Writing and Archiving

And finally, we create directories to archive our data (if those directories do not exist) and write the files with an appropriate timestamp determined using Sys.time(). Since colons (i.e : ) don't play nice in all operating systems as filenames, we can use gsub() to get rid of them. We also pass along error messages or confirmation messages to the user as appropriate.

#If the data file happens to be empty because you selected the wrong file, abort
if(nrow(final.output.data)==0){
  tkmessageBox(message="Your flat file contained no patient data. Aborting file output")
} else if (nrow(final.output.data)>0) {

  #create the output directory if it does not exist
  if(!dir.exists(file.path(flat.file.dir, "Processed"))){
    dir.create(file.path(flat.file.dir, "Processed"))
  }

  if(!dir.exists(file.path(flat.file.dir, "Raw"))){
    dir.create(file.path(flat.file.dir, "Raw"))
  }

  #create a  ISO 8601 compliant timestamp - get rid of spaces and colons
  time.stamp <- gsub(":","", Sys.time(), fixed = TRUE)
  time.stamp <- gsub(" ","T", time.stamp, fixed = TRUE)

  #save a copy of the input file
  flat.file.copy.name <- paste(test.code,"_",time.stamp, "_Raw.txt", sep="")
  file.copy(flat.file.path, file.path(flat.file.dir,"Raw", flat.file.copy.name ))

  #write the final output file
  final.output.name <-  paste(test.code,"_",time.stamp, ".txt", sep="")
  final.output.path <- file.path(flat.file.dir,"Processed" ,final.output.name)
  write.table(file = final.output.path, final.output.data, quote = FALSE, row.names = FALSE, col.names = FALSE, sep = ",")

  #check that the file was created as expected
  if(file.exists(final.output.path)){
    tkmessageBox(message="Data successfully processed \n Check Processed directory")
  } else {
    tkmessageBox(message="Your file was not created. There was a problem")
  }
}

#If the data file happens to be empty because you selected the wrong file, abort

if(nrow(final.output.data)==0){

tkmessageBox(message="Your flat file contained no patient data. Aborting file output")

} else if (nrow(final.output.data)>0) {

#create the output directory if it does not exist

if(!dir.exists(file.path(flat.file.dir, "Processed"))){

dir.create(file.path(flat.file.dir, "Processed"))

}

if(!dir.exists(file.path(flat.file.dir, "Raw"))){

dir.create(file.path(flat.file.dir, "Raw"))

}

#create a ISO 8601 compliant timestamp - get rid of spaces and colons

time.stamp <- gsub(":","", Sys.time(), fixed = TRUE)

time.stamp <- gsub(" ","T", time.stamp, fixed = TRUE)

#save a copy of the input file

flat.file.copy.name <- paste(test.code,"_",time.stamp, "_Raw.txt", sep="")

file.copy(flat.file.path, file.path(flat.file.dir,"Raw", flat.file.copy.name ))

#write the final output file

final.output.name <- paste(test.code,"_",time.stamp, ".txt", sep="")

final.output.path <- file.path(flat.file.dir,"Processed" ,final.output.name)

write.table(file = final.output.path, final.output.data, quote = FALSE, row.names = FALSE, col.names = FALSE, sep = ",")

#check that the file was created as expected

if(file.exists(final.output.path)){

tkmessageBox(message="Data successfully processed \n Check Processed directory")

} else {

tkmessageBox(message="Your file was not created. There was a problem")

}

Finally, we would wrap all of the directory–creation and file–operation in an if statement tied to the variable called problem we created previously. You will see this in the final source–code linked below.

Other Things You Can Do

Now, you can easily modify this to deal with multiple anlytes that are always on the same run, such as Vitamin D2 and Vitamin D3. If you wanted to suppress results failing ion ratio criteria (which could be concentration–dependent of course) or if you had specimens unexpectedly low IS counts, you could easily censor them to prevent their upload and then review them manually. You could also append canned comments to your results with a dash between your result and the comment. In fact, you could theoretically develop very elaborate middleware for QC evaluation and interpretation. You could also use RMarkdown to generate PDF reports for the run which could include calibration curve plots, plots of quantifier results vs qualifier results, and results that fail various criteria.

Source

You can download the source code and three example flat files here. Setting the source–code up as a “clickable” script is somewhat dependent on the operating system you are working on. Since most of you will be on a windows system you can follow this tutorial. You can also use a windows batch file to call your script.

Final Thought

Now that your file is generated, it is read to upload via ssh. This is usually performed manually but could be automated. Don't implement this code into routine use unless you know what you are doing and you have tested it extensively. By using and/or modifying it, you become entirely responsible for its correct operation. Excel is like a butter knife and R is like Swiss Army Knife. You must be careful with it because…

From everyone who has been given much, much will be demanded; and from the one who has been entrusted with much, much more will be asked.

Luke 12:48

Count The Mondays in a Time Interval with Lubridate

November 25, 2015November 25, 2015 dtholmes@mail.ubc.ca

Recently, while working on quantifying the inpatient workload volume of routine tests as a function of the day of the week, I needed to be able to count the number of Mondays, Tuesdays, etc in a time–interval so I could calculate the average volume for each weekday in a time–interval.

The lubridate package makes this a very easy thing to do. Suppose the first date in your series is 21-May-2015 and the last date is 19-Aug-2015.

library(lubridate)
startDate <- dmy("21-May-2015")
endDate <- dmy("19-Aug-2015")

library(lubridate)

startDate <- dmy("21-May-2015")

endDate <- dmy("19-Aug-2015")

Now build a sequence between the dates:

myDates <-seq(from = startDate, to = endDate, by = "days")
head(myDates)

myDates <-seq(from = startDate, to = endDate, by = "days")

head(myDates)

## [1] "2015-05-21 UTC" "2015-05-22 UTC" "2015-05-23 UTC" "2015-05-24 UTC"
## [5] "2015-05-25 UTC" "2015-05-26 UTC"

1 2	## [1] "2015-05-21 UTC" "2015-05-22 UTC" "2015-05-23 UTC" "2015-05-24 UTC" ## [5] "2015-05-25 UTC" "2015-05-26 UTC"

The function wday() tells you which weekday a date corresponds to with Sunday being 1, Monday being 2 etc.

wday(startDate)

1 2	wday(startDate)

## [1] 5

## [1] 5

This means that 2015-05-21 was a Thursday. To get the abbreviation, you can enter:

wday(startDate, label = TRUE)

1 2	wday(startDate, label = TRUE)

## [1] Thurs
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat

1 2	## [1] Thurs ## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat

and to get the full name of the day:

wday(startDate, label = TRUE, abbr = FALSE)

1 2	wday(startDate, label = TRUE, abbr = FALSE)

## [1] Thursday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

1 2	## [1] Thursday ## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

Leap years are accounted for:

wday(dmy("29-Feb-1504"), label = TRUE, abbr = FALSE)

1 2	wday(dmy("29-Feb-1504"), label = TRUE, abbr = FALSE)

## [1] Monday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

1 2	## [1] Monday ## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

So, we can use this as follows to find the Mondays:

which(wday(myDates)==2)

1 2	which(wday(myDates)==2)

##  [1]  5 12 19 26 33 40 47 54 61 68 75 82 89

1	## [1] 5 12 19 26 33 40 47 54 61 68 75 82 89

So the whole code to count them is:

startDate <- dmy("21-May-2015")
endDate <- dmy("19-Aug-2015")
myDates <-seq(from = startDate, to = endDate, by = "days")
length(which(wday(myDates)==2))

startDate <- dmy("21-May-2015")

endDate <- dmy("19-Aug-2015")

myDates <-seq(from = startDate, to = endDate, by = "days")

length(which(wday(myDates)==2))

## [1] 13

## [1] 13

I was born on August 04, 1971. This was a Wednesday. How many Wednesdays since I was born?

startDate <- dmy("04-Aug-1971")
endDate <- dmy("25-Nov-2015")
myDates <-seq(from = startDate, to = endDate, by = "days")
length(which(wday(myDates, label = TRUE)=="Wed"))

startDate <- dmy("04-Aug-1971")

endDate <- dmy("25-Nov-2015")

myDates <-seq(from = startDate, to = endDate, by = "days")

length(which(wday(myDates, label = TRUE)=="Wed"))

## [1] 2313

1	## [1] 2313

Which means, today I am 2312 weeks old today! Hurray. This is not a typo. The time interval is flanked by Wednesdays so there is one more Wednesday than the number of weeks in the interval. I thank my first–year calculus prof for beating this into me with reference to Simpson's Rule numerical integration.

Hope that comes in handy.

-Dan

Teach us to number our days, that we may gain a heart of wisdom. Psalm 90:12.

Making Youden Plots in R

October 15, 2015October 15, 2015 dtholmes@mail.ubc.ca

Background

I was honoured by a site visit by Drs. Yeo-Min Yun and Junghan Song of the Korean Society for Clinical Chemistry a few weeks ago. As both professors are on the organizing committee of the Cherry Blossom Symposium for Lab Automation in Seoul in Spring 2016, their primary motivation for visiting was to discuss mass spectrometry sample prep automation but later we got on to the topic of automated reporting for quality assurance schemes.

Naturally, I was promoting R, R-Markdown and knitr as a good pipeline for automated Quality Assurance (QA) report.

This brought to mind Youden Plots which are used by DGKL in their reports. I like DGKL reports for three reasons:

They are accuracy based against GC-MS when it comes to steroids.
I can see all the other LC-MS/MS methods immediately.
Youden plots look like a target and can be assessed rapidly.

The data for a Youden plot is generated by providing a number of laboratories aliquots from two separate unknown samples, which we will call A and B. Every lab analyzes both samples and a scatter plot of the A and B results are generated–the A results on the $x$–axis and the B results on the $y$–axis. Once this is completed, limits of acceptability are plotted and outliers can be identified.

In Youden's original formulation of the plot (see page 133-1 this online document) he required that the concentrations of the A and B samples be close to one another. As you might guess, in clinical medicine, this is not all that useful because we often want to test more than one part of the analytical range in an external quality assurance (EQA) scheme. One workaround for this is the make a Youden plot of the standard normal variates for the A and B samples, that is to plot $z_b = \frac{b_i-\bar{b}} {\sigma_b}$ vs $z_a = \frac{a_i-\bar{a}} {\sigma_a}$, where $a_i$ and $b_i$ are the individual values of the A and B samples from the $i$ labs. This has the disadvantage of representing the results in a manner that is not easily assessed from a clinical perspective.

While there are published approaches to coping with this problem, these are out of scope here but I will show you a couple of other ways I have seen Youden plots represented. If you want to see R code to generate the classic Youden Plot, it can be found in this this stackoverflow post and below.

Random Data

Let's start by generating some data. For the sake of argument, let's say we are looking at testosterone results in males and measured in nmol/L. Suppose that the A sample has a true concentration of 5.3 nmol/L and the B sample has a true concentration of 16.2 nmol/L. Let's also assume that they are all performed by the same analytical method. If you have looked at EQA reports, you will know that a scatter plot of results for the A and B samples does not typically look like this.

plot of chunk unnamed-chunk-1

The (mock) data abaove are bivariate Gaussian and uncorrelated. In reality we often see something that looks a little more like this:

plot of chunk unnamed-chunk-2

That is, the A and B values are usually correlated.

Rectangular Youden Plots

The most common manner in which you will see a Youden plot prepared is just a box with mean $\pm$ 2SD and $\pm$ 3SD limits.

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")
grid()

A.mu <- mean(A)
A.sd <- sd(A)
B.mu <- mean(B)
B.sd <- sd(B)

#draw a box around the 2SD limit
rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")
#draw a box around the 3SD limit
rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")


#draw a diagonal line - which is unnecessary but you will see people do it
lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))
#draw horizontal line
lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))
#draw vertical line
lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))

#add a legend
legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")

grid()

A.mu <- mean(A)

A.sd <- sd(A)

B.mu <- mean(B)

B.sd <- sd(B)

#draw a box around the 2SD limit

rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")

#draw a box around the 3SD limit

rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")

#draw a diagonal line - which is unnecessary but you will see people do it

lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))

#draw horizontal line

lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))

#draw vertical line

lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))

#add a legend

legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-3

Non Parametric

Obviously, if you prefer, you could prepare this Youden plot in a non-parametric fashion by plotting the medians and the non-parametrically calculated 1st, 2.5th, 97.5th, and 99th percentiles. In this case, the code would be:

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")
grid()

A.med <- median(A)
A.quants <- quantile(A, probs=c(0.01,0.025,0.975,0.99))
B.med <- median(B)
B.quants <- quantile(B, probs=c(0.01,0.025,0.975,0.99))

#draw a box around the central 95% limit
rect(xleft = A.quants[2], ybottom = B.quants[2], xright = A.quants[3], ytop = B.quants[3], lwd = 2, border = "orange")
#draw a box around the central 99% limit
rect(xleft = A.quants[1], ybottom = B.quants[1], xright = A.quants[4], ytop = B.quants[4], lwd = 2, border = "red")

#draw a vertical line
lines(c(A.quants[1],A.quants[4]),c(B.med,B.med))
#draw vertical line
lines(c(A.med,A.med),c(B.quants[1],B.quants[4]))

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")

grid()

A.med <- median(A)

A.quants <- quantile(A, probs=c(0.01,0.025,0.975,0.99))

B.med <- median(B)

B.quants <- quantile(B, probs=c(0.01,0.025,0.975,0.99))

#draw a box around the central 95% limit

rect(xleft = A.quants[2], ybottom = B.quants[2], xright = A.quants[3], ytop = B.quants[3], lwd = 2, border = "orange")

#draw a box around the central 99% limit

rect(xleft = A.quants[1], ybottom = B.quants[1], xright = A.quants[4], ytop = B.quants[4], lwd = 2, border = "red")

#draw a vertical line

lines(c(A.quants[1],A.quants[4]),c(B.med,B.med))

#draw vertical line

lines(c(A.med,A.med),c(B.quants[1],B.quants[4]))

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-4

Note that if you increase the number of points, this non-parametric plot will not quite converge to look like the parametric one shown above because $\mu \pm 2\sigma$ actually encompases 95.45% of the data in a univariate normal distribution, and $\mu \pm 3\sigma$ actually encompases 99.73% of the data.

FYI

Be ye not deceived. Even with the non-parametric approach or a parametric approach with the correct $z$-scores, the orange (“Central 95%” or “2SD”) boxes shown above do not house 95% of the data and the red (“Central 99%” or “3SD”) boxes do not house 99% of the data. You can see this pretty easily if you consider the case of uncorrelated data. Let's take 100000 random pairs of uncorrelated Gaussian data with $\mu = 10$ and $\sigma = 1$.

Five percent of data points are excluded by the vertical orange lines shown below at the $\mu_A \pm 1.96 \sigma_A$ and 5% of data are excluded by the horizontal orange lines positioned at $\mu_B \pm 1.96 \sigma_B$.

Points will fall into the one of the 4 areas shaded yellow 0.95 x 0.025 or 2.375 % of the time and points will fall into one of the 4 areas shaded purple 0.025 x 0.025 or 0.0625 % of the time. This means that the “2SD” box actually encloses 100 – 4×2.375 – 4×0.0625 % = 90.25 % of the data.

plot of chunk unnamed-chunk-5

Much more directly, the probability of both A and B falling inside the center square is 0.95×0.95 = 0.9025 = 90.25%.

You can do a random simulation to prove this to yourself:

#generate the data, n=10000
A <- rnorm(10^5,10,1)
B <- rnorm(10^5,10,1)
# convert results to z-scores
scale.df <- data.frame(scA = scale(A), scB = scale(B))
#calculate how many samples are inside the centre box
nrow(subset(scale.df, abs(scA)<1.96 & abs(scB)<1.96))

#generate the data, n=10000

A <- rnorm(10^5,10,1)

B <- rnorm(10^5,10,1)

# convert results to z-scores

scale.df <- data.frame(scA = scale(A), scB = scale(B))

#calculate how many samples are inside the centre box

nrow(subset(scale.df, abs(scA)<1.96 & abs(scB)<1.96))

## [1] 90245

1	## [1] 90245

This is pretty darn close to the 90.25% we were expecting.

Elliptical Youden Plots

The rectangular plot shown works (with caveats described) but there is something slightly undesirable about it because a point could be off in the corner, far away from the other data, but still inside the 3SD box. It seems much preferable to encircle the data with an ellipse. Fortunately, there is a built in function to achieve this in the car package which makes the code is very simple. The other nice thing is that the ellipses are actually calculated to house 95% and 99% of the data respectively.

library(car)
dataEllipse(A, B, levels=c(0.95, 0.99), fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col=c("#00000080", "red"), center.pch = FALSE)

library(car)

dataEllipse(A, B, levels=c(0.95, 0.99), fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col=c("#00000080", "red"), center.pch = FALSE)

plot of chunk unnamed-chunk-8

To generate a two–colour equivalent to what we have above we draw the Youden plot in two stages.

dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE)

dataEllipse(A, B, levels = 0.99, fill = FALSE, col = c(NA, "red"), center.pch = FALSE, plot.points = FALSE)

legend("topleft", c("Central 95%","Central 99%"), col = c("orange","red"), lty=c(1,1))

dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE)

dataEllipse(A, B, levels = 0.99, fill = FALSE, col = c(NA, "red"), center.pch = FALSE, plot.points = FALSE)

legend("topleft", c("Central 95%","Central 99%"), col = c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-9

Now we might want to add the horizontal, vertical lines.

#store the points of the outer ellipse in a matrix
outer.ellipse <- dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), draw = TRUE, center.pch = FALSE)
#plot the innter ellipse
dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)

#find the value of x that is closest to the mean value of A
vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))
#calculate the vertical distance to centre
vert.dist <- outer.ellipse[,2][vert.index] - B.mu
#draw the line one end of ellipse to the other
lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))

#find the value of y that is closest to the mean valye of B
horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))
#calculate the horizonal distance to centre
horiz.dist <- outer.ellipse[,1][horiz.index] - A.mu
#draw the line one end of ellipse to the other
lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))
#add a legend
legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

#store the points of the outer ellipse in a matrix

outer.ellipse <- dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), draw = TRUE, center.pch = FALSE)

#plot the innter ellipse

dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)

#find the value of x that is closest to the mean value of A

vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))

#calculate the vertical distance to centre

vert.dist <- outer.ellipse[,2][vert.index] - B.mu

#draw the line one end of ellipse to the other

lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))

#find the value of y that is closest to the mean valye of B

horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))

#calculate the horizonal distance to centre

horiz.dist <- outer.ellipse[,1][horiz.index] - A.mu

#draw the line one end of ellipse to the other

lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))

#add a legend

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-10

And if you wanted a little more groove you can add shading.

dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), center.pch = FALSE, plot.points = TRUE, fill = TRUE, fill.alpha = 0.1)
dataEllipse(A, B, levels=0.95 ,xlim = c(0,10), ylim = c(0,30), pch = 19, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE, fill = TRUE, fill.alpha = 0.3)
lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist), col = "blue")
lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu), col = "blue")
legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), center.pch = FALSE, plot.points = TRUE, fill = TRUE, fill.alpha = 0.1)

dataEllipse(A, B, levels=0.95 ,xlim = c(0,10), ylim = c(0,30), pch = 19, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE, fill = TRUE, fill.alpha = 0.3)

lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist), col = "blue")

lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu), col = "blue")

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-11

Build a Function

We can also fold all of this into a function:

youden <- function(A,B,shape){

    A.mu <- mean(A)
    A.sd <- sd(A)
    B.mu <- mean(B)
    B.sd <- sd(B)

    plot(A,B, pch=19, col = "#00000080", xlim = c(A.mu - 5*A.sd,A.mu + 5*A.sd), ylim = c(B.mu - 5*B.sd,B.mu + 5*B.sd))
    grid()

  if (missing(shape)){
    shape <- "ellipse"
  }

  if (shape == "rectangle"){
    rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")
    rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")
    lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))
    lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))
    lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))
    legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))

  } else if (shape=="ellipse") {
    outer.ellipse <- dataEllipse(A, B, levels = 0.99, pch = 19, col=c("#00000080", "red"), draw = TRUE, center.pch = FALSE, plot.points = FALSE)
    dataEllipse(A, B, levels = 0.95, fill = FALSE, pch = 19, col=c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)
    vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))
    vert.dist <- outer.ellipse[,2][vert.index] - B.mu
    lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))
    horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))
    horiz.dist <- outer.ellipse[,1][horiz.index] - A.mu
    lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))
    legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))
  }
}

#use the function
par(mfrow=c(1,2))
youden(A,B, shape = "rectangle")
youden(A,B, shape = "ellipse")

youden <- function(A,B,shape){

A.mu <- mean(A)

A.sd <- sd(A)

B.mu <- mean(B)

B.sd <- sd(B)

plot(A,B, pch=19, col = "#00000080", xlim = c(A.mu - 5*A.sd,A.mu + 5*A.sd), ylim = c(B.mu - 5*B.sd,B.mu + 5*B.sd))

grid()

if (missing(shape)){

shape <- "ellipse"

}

if (shape == "rectangle"){

rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")

rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")

lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))

lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))

lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))

legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))

} else if (shape=="ellipse") {

outer.ellipse <- dataEllipse(A, B, levels = 0.99, pch = 19, col=c("#00000080", "red"), draw = TRUE, center.pch = FALSE, plot.points = FALSE)

dataEllipse(A, B, levels = 0.95, fill = FALSE, pch = 19, col=c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)

vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))

vert.dist <- outer.ellipse[,2][vert.index] - B.mu

lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))

horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))

horiz.dist <- outer.ellipse[,1][horiz.index] - A.mu

lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

}

#use the function

par(mfrow=c(1,2))

youden(A,B, shape = "rectangle")

youden(A,B, shape = "ellipse")

plot of chunk unnamed-chunk-12

Comparison with the Classic Youden Plot

If the data happens to be uncorrelated, and has identical average A and B values, the elliptical approach will generate a (nearly) circular Youden plot according to the original description. The youden.classic() function code shown below is borrowed from the stackoverflow post mentioned above.

youden.classic <- function(A, B){
    plot(A,B,asp = 1, xlab = "A", ylab = "B", pch=".") 
    mB <- median(B)
    mA <- median(A)
    abline(h = mB, v = mA)
    curve(x-(mA-mB),add=TRUE)
    d <- mean(A-B)
    d_prime <- A-B-d
    r <- 2.45*mean(abs(d_prime))*sqrt(pi)/2
    t <- seq(0,2*pi,by=0.01)
    x <- r*cos(t)+mA
    y <- r*sin(t)+mB
    lines(x,y)
}

#generate 10000 pairs of bivariate data with a mean of 10 for both A and B and 15% CV
A <- rnorm(10000,10,1.5)
B <- rnorm(10000,10,1.5)
youden.classic(A,B)

#overlay ellipse
dataEllipse(A, B, levels = 0.95, fill = TRUE, fill.alpha = 0.3, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE)

youden.classic <- function(A, B){

plot(A,B,asp = 1, xlab = "A", ylab = "B", pch=".")

mB <- median(B)

mA <- median(A)

abline(h = mB, v = mA)

curve(x-(mA-mB),add=TRUE)

d <- mean(A-B)

d_prime <- A-B-d

r <- 2.45*mean(abs(d_prime))*sqrt(pi)/2

t <- seq(0,2*pi,by=0.01)

x <- r*cos(t)+mA

y <- r*sin(t)+mB

lines(x,y)

}

#generate 10000 pairs of bivariate data with a mean of 10 for both A and B and 15% CV

A <- rnorm(10000,10,1.5)

B <- rnorm(10000,10,1.5)

youden.classic(A,B)

#overlay ellipse

dataEllipse(A, B, levels = 0.95, fill = TRUE, fill.alpha = 0.3, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE)

plot of chunk unnamed-chunk-13

Conclusion

There you have it. With a Youden Plot it is easy to separate the sheep from the goats. There are lots of ways that you can dress up your plot to suit your needs. Of course, this could be embedded into an automated EQA report generated with R, Rmarkdown and knitr.

I hope that was helpful.

-Dan

Deming and Passing Bablok Regression in R

September 14, 2015September 21, 2015 dtholmes@mail.ubc.ca

Regression Methods

In this post we will be discussing how to perform Passing Bablok and Deming regression in R. Those who work in Clinical Chemistry know that these two approaches are required by the journals in the field. The idiosyncratic affection for these two forms of regression appears to be historical but this is something unlikely to change in my lifetime–hence the need to cover it here.

Along the way, we shall touch on the ways in which Deming and Passing Bablok differ from ordinary least squares (OLS) and from one another.

Creating some random data

Let's start by making some heteroscedastic random data that we can use for regression. We will use the command set.seed() to begin with because by this means, the reader can generate the same random data as the post. This function takes any number you wish as its argument, but if you set the same seed, you will get the same random numbers. We will generate 100 random $x$ values in the uniform distribution and then an accompanying 100 random $y$ values with proportional bias, constant bias and random noise that increases with $x$. I have added a bit of non–linearity because we do see this a fair bit in our work.

set.seed(20)
x <- runif(100,0,100)
y <- 1.10*x - 0.001*x^2 + rnorm(100,0,1)*(2 + 0.05*x) + 15

set.seed(20)

x <- runif(100,0,100)

y <- 1.10*x - 0.001*x^2 + rnorm(100,0,1)*(2 + 0.05*x) + 15

The constants I chose are arbitrary. I chose them to produce something resembling a comparison of, say, two automated immunoassays.

Let's quickly produce a scatter plot to see what our data looks like:

plot(x,y, main = "Regression Comparison", xlab = "Current Method", ylab = "New Method")

1	plot(x,y, main = "Regression Comparison", xlab = "Current Method", ylab = "New Method")

plot of chunk unnamed-chunk-2

Residuals in OLS

OLS regression minimizes the sum of squared residuals. In the case of OLS, the residual of a point is defined as the vertical distance from that point to the regression line. The regression line is chosen so that the sum of the squares of the residuals in minimal.

OLS regression assumes that there is no error in the $x$–axis values and that there is no heteroscedasticity, that is, the scatter of $y$ is constant. Neither of these assumptions is true in the case of bioanaytical method comparisons. In contrast, for calibration curves in mass–spectrometry, a linear response is plotted as a function of pre–defined calibrator concentration. This means that the $x$–axis has very little error and so OLS regression is an appropriate choice (though I doubt that the assumption about homoscedasticity is generally met).

OLS is part of R's base package. We can find the OLS regression line using lm() and we will store the results in the variable lin.reg.

plot(x,y, main = "Regression Comparison", xlab = "Current Method", ylab = "New Method")
lin.reg <- lm(y~x)
abline(lin.reg, col="blue")

plot(x,y, main = "Regression Comparison", xlab = "Current Method", ylab = "New Method")

lin.reg <- lm(y~x)

abline(lin.reg, col="blue")

plot of chunk unnamed-chunk-3

Just to demonstrate the point about residuals graphically, the following shows them in vertical red lines.

plot of chunk unnamed-chunk-4

Deming Regression

Deming regression differs from OLS regression in that it does not make the assumption that the $x$ values are free of error. It (more or less) defines the residual as the perpendicular distance from a point to its fitted value on the regression line.

Deming regression does not come as part of R's base package but can be performed using the MethComp and mcr packages. In this case, we will use the latter. If not already installed, you must install the mcr package with install.packages("mcr").

Then to perform Deming regression, we will load the mcr library and execute the following using the mcreg() command, storing the output in the variable dem.reg.

library(mcr)
dem.reg <- mcreg(x,y, method.reg = "Deming")

1 2	library(mcr) dem.reg <- mcreg(x,y, method.reg = "Deming")

By performing the str() command on dem.reg, we can see that the regression parameters are stored in the slot @para. Because the authors have used an S4 object as the output of their function, we don't address output as we would in lists (with a $), but rather with an @.

str(dem.reg)

1	str(dem.reg)

## Formal class 'MCResultResampling' [package "mcr"] with 21 slots
##   ..@ glob.coef  : num [1:2] 15.58 1.04
##   ..@ glob.sigma : num [1:2] 0.8165 0.0147
##   ..@ xmean      : num 46.8
##   ..@ nsamples   : int 999
##   ..@ nnested    : num 25
##   ..@ B0         : num [1:999] 15.9 15.4 16 16.1 15.6 ...
##   ..@ B1         : num [1:999] 1.01 1.04 1.02 1.04 1.03 ...
##   ..@ sigmaB0    : num [1:999] 0.794 0.766 0.846 0.815 0.737 ...
##   ..@ sigmaB1    : num [1:999] 0.0141 0.0142 0.0155 0.0141 0.0135 ...
##   ..@ MX         : num [1:999] 46.8 45.9 45.4 48.9 45.5 ...
##   ..@ bootcimeth : chr "quantile"
##   ..@ rng.seed   : num NA
##   ..@ rng.kind   : chr NA
##   ..@ data       :'data.frame':  100 obs. of  3 variables:
##   .. ..$ sid: Factor w/ 100 levels "S1","S10","S100",..: 1 13 24 35 46 57 68 79 90 2 ...
##   .. ..$ x  : num [1:100] 87.8 76.9 27.9 52.9 96.3 ...
##   .. ..$ y  : num [1:100] 110.8 93.5 45.6 76.6 116.6 ...
##   ..@ para       : num [1:2, 1:4] 15.58 1.04 NA NA 14.45 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "Intercept" "Slope"
##   .. .. ..$ : chr [1:4] "EST" "SE" "LCI" "UCI"
##   ..@ mnames     : chr [1:2] "Method1" "Method2"
##   ..@ regmeth    : chr "Deming"
##   ..@ cimeth     : chr "bootstrap"
##   ..@ error.ratio: num 1
##   ..@ alpha      : num 0.05
##   ..@ weight     : Named num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "names")= chr [1:100] "S1" "S2" "S3" "S4" ...

## Formal class 'MCResultResampling' [package "mcr"] with 21 slots

## ..@ glob.coef : num [1:2] 15.58 1.04

## ..@ glob.sigma : num [1:2] 0.8165 0.0147

## ..@ xmean : num 46.8

## ..@ nsamples : int 999

## ..@ nnested : num 25

## ..@ B0 : num [1:999] 15.9 15.4 16 16.1 15.6 ...

## ..@ B1 : num [1:999] 1.01 1.04 1.02 1.04 1.03 ...

## ..@ sigmaB0 : num [1:999] 0.794 0.766 0.846 0.815 0.737 ...

## ..@ sigmaB1 : num [1:999] 0.0141 0.0142 0.0155 0.0141 0.0135 ...

## ..@ MX : num [1:999] 46.8 45.9 45.4 48.9 45.5 ...

## ..@ bootcimeth : chr "quantile"

## ..@ rng.seed : num NA

## ..@ rng.kind : chr NA

## ..@ data :'data.frame': 100 obs. of 3 variables:

## .. ..$ sid: Factor w/ 100 levels "S1","S10","S100",..: 1 13 24 35 46 57 68 79 90 2 ...

## .. ..$ x : num [1:100] 87.8 76.9 27.9 52.9 96.3 ...

## .. ..$ y : num [1:100] 110.8 93.5 45.6 76.6 116.6 ...

## ..@ para : num [1:2, 1:4] 15.58 1.04 NA NA 14.45 ...

## .. ..- attr(*, "dimnames")=List of 2

## .. .. ..$ : chr [1:2] "Intercept" "Slope"

## .. .. ..$ : chr [1:4] "EST" "SE" "LCI" "UCI"

## ..@ mnames : chr [1:2] "Method1" "Method2"

## ..@ regmeth : chr "Deming"

## ..@ cimeth : chr "bootstrap"

## ..@ error.ratio: num 1

## ..@ alpha : num 0.05

## ..@ weight : Named num [1:100] 1 1 1 1 1 1 1 1 1 1 ...

## .. ..- attr(*, "names")= chr [1:100] "S1" "S2" "S3" "S4" ...

dem.reg@para

1	dem.reg@para

##                EST SE       LCI       UCI
## Intercept 15.57790 NA 14.446677 16.810321
## Slope      1.03658 NA  1.006434  1.066066

## EST SE LCI UCI

## Intercept 15.57790 NA 14.446677 16.810321

## Slope 1.03658 NA 1.006434 1.066066

The intercept and slope are stored in demreg@para[1] and dem.reg@para[2] respectively. Therefore, we can add the regression line as follows:

plot(x,y, main = "Regression Comparison", xlab = "Current Method", ylab = "New Method")
abline(dem.reg@para[1:2], col = "blue")

1 2	plot(x,y, main = "Regression Comparison", xlab = "Current Method", ylab = "New Method") abline(dem.reg@para[1:2], col = "blue")

plot of chunk unnamed-chunk-7

To emphasize how the residuals are different from OLS we can plot them as before:

plot of chunk unnamed-chunk-8

We present the figure above for instructional purposes only. The usual way to present a residuals plot is to show the same picture rotated until the line is horizontal–this is a slight simplification but is essentially what is happening:

plot of chunk unnamed-chunk-9

Ratio of Variances

It is important to mention that if one knows that the $x$–axis method is subject to a different amount of random analytical variability than the $y$–axis method, one should provide the ratio of the variances of the two methods to mcreg(). In general, this requires us to have “CV” data from precision studies already available. Another approach is to perform every analysis in duplicate by both methods and use the data to estimate this ratio.

If the methods happen to have similar CVs throughout the analytical range, the default value of 1 is assumed. But suppose that the ratio of the CVs of the $x$ axis method to the $y$–axis method was 1.2, we could provide this in the regression call by setting the error.ratio parameter. The resulting regression parameters will be slightly different.

mcreg(x,y, method.reg = "Deming", error.ratio = 1.2)@para

1	mcreg(x,y, method.reg = "Deming", error.ratio = 1.2)@para

##                 EST SE      LCI       UCI
## Intercept 15.534921 NA 14.39904 16.777065
## Slope      1.037499 NA  1.00792  1.067316

## EST SE LCI UCI

## Intercept 15.534921 NA 14.39904 16.777065

## Slope 1.037499 NA 1.00792 1.067316

Weighting

In the case of heteroscedastic data, it would be customary to weight the regression which in the case of the mcr package is weighted as $1/x^2$. This means that having 0's in your $x$–data will cause the calculation to “crump”. In any case, if we wanted weighted regression parameters we would make the call:

w.dem.reg <- mcreg(x,y, method.reg = "WDeming")

1	w.dem.reg <- mcreg(x,y, method.reg = "WDeming")

## The global.sigma is calculated with Linnet's method

1	## The global.sigma is calculated with Linnet's method

w.dem.reg@para

1	w.dem.reg@para

##                 EST SE       LCI       UCI
## Intercept 13.788450 NA 12.858803 14.861006
## Slope      1.088119 NA  1.058042  1.116879

## EST SE LCI UCI

## Intercept 13.788450 NA 12.858803 14.861006

## Slope 1.088119 NA 1.058042 1.116879

And plotting both on the same figure:

plot(x,y, main = "Regression Comparison", xlab = "Current Method", ylab = "New Method")
abline(dem.reg@para[1:2], col = "blue")
abline(w.dem.reg@para[1:2], col = "green")
legend("topleft", c("Deming","Weighted Deming"), lty=c(1,1), col = c("blue","green"))

plot(x,y, main = "Regression Comparison", xlab = "Current Method", ylab = "New Method")

abline(dem.reg@para[1:2], col = "blue")

abline(w.dem.reg@para[1:2], col = "green")

legend("topleft", c("Deming","Weighted Deming"), lty=c(1,1), col = c("blue","green"))

plot of chunk unnamed-chunk-12

Passing Bablok

Passing Bablok regression is not performed by the minimization of residuals. Rather, all possible pairs of $x$–$y$ points are determined and slopes are calculated using each pair of points. Work–arounds are undertaken for pairs of points that generate infinite slopes and other peculiarities. In any case, the median of the $\frac{N(N-1)}{2!}$ possible slopes becomes the final slope estimate and the corresponding intercept can be calculated. With regards to weighted Passing Bablok regression, I’d like to acknowledge commenter glen_b for bringing to my attention that there is a paradigm for calculating the weighted median of pairwise slopes. See the comment section for a discussion.

Passing Bablok regression takes a lot of computational time as the number of points grows, so expect some delays on data sets larger than $N=100$ if you are using an ordinary computer. To get the Passing Bablok regression equation, we just change the method.reg parameter:

PB.reg <- mcreg(x,y, method.reg = "PaBa")
PB.reg@para

1 2	PB.reg <- mcreg(x,y, method.reg = "PaBa") PB.reg@para

##                 EST SE       LCI       UCI
## Intercept 14.684463 NA 13.648554 16.495846
## Slope      1.046021 NA  1.015893  1.075632

## EST SE LCI UCI

## Intercept 14.684463 NA 13.648554 16.495846

## Slope 1.046021 NA 1.015893 1.075632

and the procedures to plot this regression are identical. The mcreg() function does have an option for Passing Bablok regression on large data sets. See the instructions by typing help("mcreg") in the R terminal.

Outlier Effects

As a consequence of the means by which the slope is determined, the Passing Bablok method is relatively resistant to the effect of outlier(s) as compared to OLS and Deming. To demonstrate this, we can add on outlier to some data scattered about the line $y=x$ and show how all three methods are affected.

x <- 1:20
y <- c(1:19,10) + rnorm(20,0,0.5)

1 2	x <- 1:20 y <- c(1:19,10) + rnorm(20,0,0.5)

plot of chunk unnamed-chunk-15

Because of this outlier, the OLS slope drops to 0.84, the Deming slope to 0.91, while the Passing Bablok is much better off at 0.99.

Generating a Pretty Plot

The code authors of the mcr package have created a feature such that if you put the regression model inside the plot function, you can quickly generate a figure for yourself that has all the required information on it. For example,

plot(PB.reg)

1	plot(PB.reg)

plot of chunk unnamed-chunk-16

But this method of out–of–the–box figure is not very customizable and you may want it to appear differently for your publication. Never fear. There is a solution. The MCResult.plot() function offers complete customization of the figure so that you can show it exactly as you wish for your publication.

MCResult.plot(PB.reg, equal.axis = TRUE, x.lab = "x method", y.lab = "y method", points.col = "#FF7F5060", points.pch = 19, ci.area = TRUE, ci.area.col = "#0000FF50", main = "My Passing Bablok Regression", sub = "", add.grid = FALSE, points.cex = 1)

1	MCResult.plot(PB.reg, equal.axis = TRUE, x.lab = "x method", y.lab = "y method", points.col = "#FF7F5060", points.pch = 19, ci.area = TRUE, ci.area.col = "#0000FF50", main = "My Passing Bablok Regression", sub = "", add.grid = FALSE, points.cex = 1)

custom mcr plot

In this example, I have created semi–transparent “darkorchid4” (hex = #68228B) points and a semi–transparent blue (hex = #0000FF) confidence band of the regression. Maybe darkorchid would not be my first choice for a publication after all, but it demonstrates the customization. Additionally, I have suppressed my least favourite features of the default plot method. Specifically, the sub="" term removes the sentence at the bottom margin and the add.grid = FALSE prevents the grid from being plotted. Enter help(MCResult.plot) for the complete low–down on customization.

Conclusion

We have seen how to perform Deming and Passing Bablok regression in the R programming language and have touched on how the methods differ “under the hood”. We have used the mcr to perform the regressions and have shown how you can beautify your plot.

The reader should have a look at the rlm() function in the MASS package and the rq() function in the quantreg package to see other robust (outlier–resistant) regression approaches. A good tutorial can be found here

I hope that makes it easy for you.

-Dan

May all your paths (and regressions) be straight:

Trust in the Lord with all your heart
and lean not on your own understanding;
in all your ways submit to him,
and he will make your paths straight.

Proverbs 3:5-6

Background

Update

Shiny App

Example

The Problem

The Required Format

The Starting Material

Dialogue Box

Building File Names

Behold: The Data

Just Tell Me the Results

Preparing Data for Output

Timestamping, Writing and Archiving

Other Things You Can Do

Source

Final Thought

Teach us to number our days, that we may gain a heart of wisdom. Psalm 90:12.

Background

Random Data

Rectangular Youden Plots

Non Parametric

FYI

Elliptical Youden Plots

Build a Function

Comparison with the Classic Youden Plot

Conclusion

Regression Methods

Creating some random data

Residuals in OLS

Deming Regression

Ratio of Variances

Weighting

Passing Bablok

Outlier Effects

Generating a Pretty Plot

Conclusion

Trust in the Lord with all your heart and lean not on your own understanding; in all your ways submit to him, and he will make your paths straight. Proverbs 3:5-6

Trust in the Lord with all your heart
and lean not on your own understanding;
in all your ways submit to him,
and he will make your paths straight.

Proverbs 3:5-6