Quality Management – The Lab-R-torian

Calculate all the CVs of all the QC Levels of all the Methods of all the Instruments at all the Sites all at once … with Sunquest LIS and dplyr

February 4, 2020November 24, 2020 dtholmes@mail.ubc.ca

Background

As part of our lab accreditation requirements, we have to provide measurement uncertianty estimates for all tests at all hospital sites. As you might imagine, with thousands of testcodes in Sunquest LIS, getting all the coefficients of variation (CVs) represents a daunting task for the quality technologist to accomplish. As it turns out, by capturing the ssh session in a .txt file, you can use R’s dplyr package to do this all in few lines of code.

Getting the Raw Data

You need to get the raw data from Sunquest. You can capture the telnet (yes… older versions of Sunquest use telnet and pass patient information and user passwords unencrypted across the hospital network o_O) or the ssh session to a file using the Esker SmarTerm which Sunquest packages in their product and refers to as “roll-n-scroll”. People disparriage SmarTerm as an old “dos tool”–whereas Sunquest is hosted on AIX operating system. SmarTerm access to Sunquest is a gagillion times faster than the GUI and permits us to capture the raw QC data we need. To capture the session select from the dropdown menu as shown here:

SQ Screenshot1

If you are using Mac OS or Linux OS, you can also capture the ssh session by connecting from the terminal and using tee to dump the session to a file.

ssh user@serverIPaddress | tee captured_session.txt

Once you have connected, use the QC function and select output printer 0 (meaning the screen) and make these selections, changing the dates as appropriate:

SQ Screenshot1

If you make no selections at all for any of:

TEST:
WORKSHEET:
METHOD:
CONTROL:
SHIFT #:
TECH:
TESTS REQUESTED:

then you will extract everything, which is what you want and which will make for a very big .txt file. There will be a delay and then thousands of QC results will dump to the screen and to your file. When this is complete, end your SmarTerm or ssh or telnet (cringe) session. I saved my text dump as raw_SQ8.txt.

Getting it intro R and parsing it

Your data will come out as a fixed with file with no delimiters. It will also have a bunch of junk at the bottom and top of the file detailing your commands from the start and end of the session. These need to be discarded. I just used grep() to find all the lines with the appropriate date pattern. After reading it in, because I am lazy, I wrote it back out and read it in again with read.fwf()

library(tidyverse)
library(lubridate)
library(knitr)

# Note to my friend SK - yes... this is mostly in base-R... 

# create a connection
con < file(file.path("raw_SQ8.txt"))
raw.qc.data <- readLines(con)
close(con)
#find good rows
good.data <- grep("[0-9]{2}(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[2][0][0-9]{6}",raw.qc.data)
raw.qc.data <- raw.qc.data[good.data]
#remove a screwball encoding character
raw.qc.data[1] <- substr(raw.qc.data[1],6,nchar(raw.qc.data[1]))
con <- file("temp.txt")
#rewrite the file with no garbage in it.
writeLines(raw.qc.data, con)
close(con)
raw.qc.data <- read.fwf("temp.txt",c(6,6,6,20,13,6,2,15,100))
file.remove("temp.txt")
names(raw.qc.data) <- c("test.code","instr.code","qc.name","qc.expire",
                        "date.performed","tech.code","shift",
                        "result","modifier")
raw.qc.data <- data.frame(lapply(raw.qc.data, trimws))
raw.qc.data$result <- as.numeric(as.character(raw.qc.data$result))
raw.qc.data$date.performed <- dmy_hm(raw.qc.data$date.performed)
raw.qc.data$tech.code <- as.numeric(raw.qc.data$tech.code) #anonymize tech codes
raw.qc.data <- arrange(raw.qc.data, instr.code, test.code)

library(tidyverse)

library(lubridate)

library(knitr)

# Note to my friend SK - yes... this is mostly in base-R...

# create a connection

con < file(file.path("raw_SQ8.txt"))

raw.qc.data <- readLines(con)

close(con)

#find good rows

good.data <- grep("[0-9]{2}(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[2][0][0-9]{6}",raw.qc.data)

raw.qc.data <- raw.qc.data[good.data]

#remove a screwball encoding character

raw.qc.data[1] <- substr(raw.qc.data[1],6,nchar(raw.qc.data[1]))

con <- file("temp.txt")

#rewrite the file with no garbage in it.

writeLines(raw.qc.data, con)

close(con)

raw.qc.data <- read.fwf("temp.txt",c(6,6,6,20,13,6,2,15,100))

file.remove("temp.txt")

names(raw.qc.data) <- c("test.code","instr.code","qc.name","qc.expire",

"date.performed","tech.code","shift",

"result","modifier")

raw.qc.data <- data.frame(lapply(raw.qc.data, trimws))

raw.qc.data$result <- as.numeric(as.character(raw.qc.data$result))

raw.qc.data$date.performed <- dmy_hm(raw.qc.data$date.performed)

raw.qc.data$tech.code <- as.numeric(raw.qc.data$tech.code) #anonymize tech codes

raw.qc.data <- arrange(raw.qc.data, instr.code, test.code)

Now that all the data munging is done, we can examine the data:

test.code	instr.code	qc.name	qc.expire	date.performed	tech.code	shift	result	modifier
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-15 09:17:00	68	2	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-15 20:51:00	68	3	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-15 21:47:00	68	3	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-15 21:50:00	68	3	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-17 07:10:00	15	1	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-17 07:11:00	15	1	122	NA

And finally, we can make the dplyr magic happen and discard results for which the counts are too small, which I have chosen to be <20:

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%
  group_by(instr.code,test.code,qc.name,qc.expire) %>%
  summarise(median = median(result),
            IQR = IQR(result),
            mean = mean(result),
            SD = sd(result),
            min = min(result),
            max = max(result),
            CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),
            count = n()) %>%
  filter(count ≥ 20) %>%
  arrange(instr.code, test.code, median) -> summary.table

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%

group_by(instr.code,test.code,qc.name,qc.expire) %>%

summarise(median = median(result),

IQR = IQR(result),

mean = mean(result),

SD = sd(result),

min = min(result),

max = max(result),

CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),

count = n()) %>%

filter(count ≥ 20) %>%

arrange(instr.code, test.code, median) -> summary.table

Which gives us output like this:

head(summary.table)

1 2	head(summary.table)

instr.code	test.code	qc.name	qc.expire	median	IQR	mean	SD	min	max	CV	count
JBGAS	BCL	RAD3	R0141 EXP SEP 2017	65.0	1.000	65.145454	0.6503043	63.0	66.0	1.00	55
JBGAS	BCL	RAD2	R0175 EXP MAR 2021	97.0	0.000	97.128205	0.3364820	97.0	98.0	0.35	78
JBGAS	BCL	RAD1	R0173 EXP MAR 2021	122.0	0.000	122.122807	0.5691527	121.0	124.0	0.47	57
JBGAS	BGLUC	RAD1	R0173 EXP MAR 2021	1.5	0.000	1.507017	0.0257713	1.5	1.6	1.71	57
JBGAS	BGLUC	RAD2	R0175 EXP MAR 2021	5.6	0.075	5.585897	0.0639081	5.4	5.7	1.14	78
JBGAS	BGLUC	RAD3	R0141 EXP SEP 2017	13.7	0.100	13.763636	0.1310409	13.4	14.1	0.95	55

This permits us to toss out results with low counts. But what about handling outliers? Well, we can calculate the z-scores of the raw data by joining the the mean and SD results back to the raw data.

raw.qc.data %>%
  left_join(select(summary.table,c(instr.code:qc.expire, mean, SD)),
             by = c("test.code","instr.code", "qc.name", "qc.expire")) %>%
  mutate(z.score = (result - mean)/SD) -> raw.qc.data

raw.qc.data %>%

left_join(select(summary.table,c(instr.code:qc.expire, mean, SD)),

by = c("test.code","instr.code", "qc.name", "qc.expire")) %>%

mutate(z.score = (result - mean)/SD) -> raw.qc.data

This will permit you to suppress results outside a certain z-score. So, let’s suppress all results with an undefined z-score and all results with a z-score >= 4:

raw.qc.data %>%
  drop_na(z.score) %>%
  filter(abs(z.score) < 4) -> raw.qc.data

raw.qc.data %>%

drop_na(z.score) %>%

filter(abs(z.score) < 4) -> raw.qc.data

Now , we can re-run the dplyr summary:

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%
  group_by(instr.code,test.code,qc.name,qc.expire) %>%
  summarise(median = median(result),
            IQR = IQR(result),
            mean = mean(result),
            SD = sd(result),
            min = min(result),
            max = max(result),
            CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),
            count = n()) %>%
  filter(count ≥ 20) %>%
  arrange(instr.code, test.code, median) -> summary.table.no.outliers

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%

group_by(instr.code,test.code,qc.name,qc.expire) %>%

summarise(median = median(result),

IQR = IQR(result),

mean = mean(result),

SD = sd(result),

min = min(result),

max = max(result),

CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),

count = n()) %>%

filter(count ≥ 20) %>%

arrange(instr.code, test.code, median) -> summary.table.no.outliers

And now we have a summary of every QC CV in our Sunquest system with outliers suppressed:

head(summary.table.no.outliers)

1 2	head(summary.table.no.outliers)

instr.code	test.code	qc.name	qc.expire	median	IQR	mean	SD	min	max	CV	count
JBGAS	BCL	RAD3	R0141 EXP SEP 2017	65.0	1.000	65.145454	0.6503043	63.0	66.0	1.00	55
JBGAS	BCL	RAD2	R0175 EXP MAR 2021	97.0	0.000	97.128205	0.3364820	97.0	98.0	0.35	78
JBGAS	BCL	RAD1	R0173 EXP MAR 2021	122.0	0.000	122.122807	0.5691527	121.0	124.0	0.47	57
JBGAS	BGLUC	RAD1	R0173 EXP MAR 2021	1.5	0.000	1.507017	0.0257713	1.5	1.6	1.71	57
JBGAS	BGLUC	RAD2	R0175 EXP MAR 2021	5.6	0.075	5.585897	0.0639081	5.4	5.7	1.14	78
JBGAS	BGLUC	RAD3	R0141 EXP SEP 2017	13.7	0.100	13.763636	0.1310409	13.4	14.1	0.95	55

And there we have it:

SQ Screenshot1

Now I can write the output file

write_csv(summary.table.no.outliers, "QC_summary.csv")

1 2	write_csv(summary.table.no.outliers, "QC_summary.csv")

With dplyr, if you direct your energies to the right place, you reap much. Similarly:

“But seek ye first the kingdom of God, and his righteousness; and all these things shall be added unto you.”

Matthew 6:33

Determine the CV of a Calculated Lab Reportable – Bioavailable Testosterone

August 7, 2017August 7, 2017 dtholmes@mail.ubc.ca

Background

At the AACC meeting last week, some of my friends were bugging me that I had not made a blog post in 10 months. Without getting into it too much, let's just say I can blame Cerner. Thanks also to a prod from a friend, here is an approach to a fairly common problem.

We all report calculated quantities out of our laboratories–quantities such as LDL cholesterol, non-HDL cholesterol, aldosterone:renin ratio, free testosterone, eGFR etc. How does one determine the precision (i.e. imprecision) of a calculated quantity. While earlier in my life, I might go to the trouble of trying to do such calculations analytically using the rules of error propagation, in my later years, I am more pragmatic and I'm happy to use a computational approach.

In this example, we will model the precision in calculated bioavailable testosterone (CBAT). Without explanation, I provide an R function for CBAT (and free testosterone) where testosterone is reported in nmol/L, sex hormone binding globulin (SHBG) is reported in nmol/L, and albumin is reported in g/L. Using the Vermeulen Equation as discussed in this publication, you can calculate CBAT as follows:

cbat <- function(TT,SHBG,ALB = 43){
    Kalb <- 3.6*10^4
    Kshbg <- 10^9
    N <- 1 + Kalb*ALB/69000
    a <- N*Kshbg
    b <- N + Kshbg*(SHBG - TT)/10^9
    c <- -TT/10^9
    FT <- (-b + sqrt(b^2 - 4*a*c))/(2*a)*10^9
    cbat <- N*FT
    return(list(free.T = FT, cbat = cbat))
}

cbat <- function(TT,SHBG,ALB = 43){

Kalb <- 3.6*10^4

Kshbg <- 10^9

N <- 1 + Kalb*ALB/69000

a <- N*Kshbg

b <- N + Kshbg*(SHBG - TT)/10^9

c <- -TT/10^9

FT <- (-b + sqrt(b^2 - 4*a*c))/(2*a)*10^9

cbat <- N*FT

return(list(free.T = FT, cbat = cbat))

}

To sanity-check this, we can use this online calculator. Taking a typical male testosterone of 20 nmol/L, an SHBG of 50 nmol/L and an albumin of 43 g/L, we get the following:

cbat(20,50)

1 2	cbat(20,50)

## $free.T
## [1] 0.3273049
## 
## $cbat
## [1] 7.670319

## $free.T

## [1] 0.3273049

## $cbat

## [1] 7.670319

which is confirmed by the online calculator. Because the function is vectorized, we an submit a vector of testosterone results and SHBG results and get a vector of CBAT results.

cbat(c(10,20,30), c(40,50,60))

1 2	cbat(c(10,20,30), c(40,50,60))

## $free.T
## [1] 0.1738837 0.3273049 0.4661380
## 
## $cbat
## [1]  4.074926  7.670319 10.923842

## $free.T

## [1] 0.1738837 0.3273049 0.4661380

## $cbat

## [1] 4.074926 7.670319 10.923842

Precision of Components

We now need some precision data for the three components. However, in our lab, we just substitute 43 g/L for the albumin, so we will leave that term out of the analysis and limit our precision calculation to testosterone and SHBG. This will allow us to present the precision as surface plots as a function of total testosterone and SHBG.

We do testosterone by LC-MS/MS using Deborah French's method. In the last three months, the precision has been 3.9% at 0.78 nmol/L, 5.5% at 6.7 nmol/L, 5.2% at 18.0 nmol/L, and 6.0% at 28.2 nmol/L. We are using the Roche Cobas e601 SHBG method which, according to the package insert, has precision of 1.8% at 14.9 nmol/L, 2.1 % at 45.7 nmol/L, and 4.0% at 219 nmol/L.

cv.tt <- c(3.9, 5.5, 5.2, 6.0)
conc.tt <- c(0.78, 6.7, 18.0, 28.2)
tt.df <- data.frame(conc.tt,cv.tt)

plot(cv.tt ~ conc.tt, data = tt.df,
                    main = "Precision Profile of Testosterone",
                    xlab = "Testosterone (nmol/L)",
                    ylab = "CV Testosterone (%)",
                    ylim = c(0,8),
                    type = "o")

cv.tt <- c(3.9, 5.5, 5.2, 6.0)

conc.tt <- c(0.78, 6.7, 18.0, 28.2)

tt.df <- data.frame(conc.tt,cv.tt)

plot(cv.tt ~ conc.tt, data = tt.df,

main = "Precision Profile of Testosterone",

xlab = "Testosterone (nmol/L)",

ylab = "CV Testosterone (%)",

ylim = c(0,8),

type = "o")

plot of chunk unnamed-chunk-4

cv.shbg <- c(1.8, 2.1, 4.0)
conc.shbg <- c(14.9,45.7,219)
shbg.df <- data.frame(cv.shbg, conc.shbg)
plot(cv.shbg ~ conc.shbg, data = shbg.df,
                    main = "Precision Profile of SHBG",
                    xlab = "SHBG (nmol/L)",
                    ylab = "CV SHGB (%)",
                    ylim = c(0,5),
                    type = "o")

cv.shbg <- c(1.8, 2.1, 4.0)

conc.shbg <- c(14.9,45.7,219)

shbg.df <- data.frame(cv.shbg, conc.shbg)

plot(cv.shbg ~ conc.shbg, data = shbg.df,

main = "Precision Profile of SHBG",

xlab = "SHBG (nmol/L)",

ylab = "CV SHGB (%)",

ylim = c(0,5),

type = "o")

plot of chunk unnamed-chunk-4

Build Approximation Functions

We will want to generate linear interpolations of these precision profiles. Generally, we might watnt to use non-linear regression to do this but I will just linearly interpolate with the approxfun() function. This will allow us to just call a function to get the approximate CV at concentrations other than those for which we have data.

tt.fun <- approxfun(x = tt.df$conc.tt, y = tt.df$cv.tt)
shbg.fun <- approxfun(x = shbg.df$conc.shbg, y = shbg.df$cv.shbg)

tt.fun <- approxfun(x = tt.df$conc.tt, y = tt.df$cv.tt)

shbg.fun <- approxfun(x = shbg.df$conc.shbg, y = shbg.df$cv.shbg)

Now, if we want to know the precision of SHBG at, say, 100 nmol/L, we can just write,

shbg.fun(100)

1 2	shbg.fun(100)

## [1] 2.695326

1	## [1] 2.695326

to obtain our precision result.

Random Simulation

Now let's build a grid of SHBG and total testosterone (TT) values at which we will calculate the precision for CBAT.

shbg <- seq(from = 15, to = 200, by = 5)
tt <- seq(from = 1, to = 28, by = 1)

shbg <- seq(from = 15, to = 200, by = 5)

tt <- seq(from = 1, to = 28, by = 1)

At each point on the grid, we will have to generate, say, 100000 random TT values and 100000 random SHBG values with the appropriate precision and then calculate the expected precision of CBAT at those concentrations.

Let's do this for a single pair of concentrations by way of example modelling the random analytical error as Gaussian using the rnorm() function.

# [SHBG] = 15 nmol/L
# [TT] = 5.0 nmol/L
set.seed(100) #just to get consistent results
rng.tt <- rnorm(100000, mean = 5.0, sd = tt.fun(5.0)/100*5.0)
rng.shbg <- rnorm(100000, mean = 15, sd = shbg.fun(15)/100*15)
rng.cbat <- cbat(rng.tt, rng.shbg)
cv.cbat <- sd(rng.cbat$cbat)/mean(rng.cbat$cbat)*100
cv.cbat

# [SHBG] = 15 nmol/L

# [TT] = 5.0 nmol/L

set.seed(100) #just to get consistent results

rng.tt <- rnorm(100000, mean = 5.0, sd = tt.fun(5.0)/100*5.0)

rng.shbg <- rnorm(100000, mean = 15, sd = shbg.fun(15)/100*15)

rng.cbat <- cbat(rng.tt, rng.shbg)

cv.cbat <- sd(rng.cbat$cbat)/mean(rng.cbat$cbat)*100

cv.cbat

## [1] 5.30598

1	## [1] 5.30598

So, we can build the process of calculating the CV of CBAT into a function as follows:

cbat.cv <- function(TT, SHBG, N = 100000){
  rng.tt <- rnorm(N, mean = TT, sd = tt.fun(TT)/100*TT)
  rng.shbg <- rnorm(N, mean = SHBG, sd = shbg.fun(SHBG)/100*SHBG)
  rng.cbat <- cbat(rng.tt, rng.shbg)
  cv <- sd(rng.cbat$cbat)/mean(rng.cbat$cbat)*100
  return(cv)
}

cbat.cv <- function(TT, SHBG, N = 100000){

rng.tt <- rnorm(N, mean = TT, sd = tt.fun(TT)/100*TT)

rng.shbg <- rnorm(N, mean = SHBG, sd = shbg.fun(SHBG)/100*SHBG)

rng.cbat <- cbat(rng.tt, rng.shbg)

cv <- sd(rng.cbat$cbat)/mean(rng.cbat$cbat)*100

return(cv)

}

Now, we can make a matrix of the data for presenting a plot, calculating the CV and appending it to the dataframe.

cv.grid <- expand.grid(tt, shbg)
names(cv.grid) <- c("tt", "shbg")
cv.grid$cv.cbat <- mapply(cbat.cv, cv.grid$tt, cv.grid$shbg)

cv.grid <- expand.grid(tt, shbg)

names(cv.grid) <- c("tt", "shbg")

cv.grid$cv.cbat <- mapply(cbat.cv, cv.grid$tt, cv.grid$shbg)

Now make plot using the wireframe() function.

library(lattice)
wireframe(cv.cbat ~ tt*shbg, data = cv.grid,
          xlab = "Testo \n (nmol/L)",
          ylab = "SHBG \n (nmol/L)",
          zlab = "CV \n (%)",
          drape = TRUE,
          colorkey = TRUE,
          col.regions = colorRampPalette(c("blue", "red", "yellow"))(100),
          scales = list(arrows=FALSE,cex=.5,tick.number = 10)
          )

library(lattice)

wireframe(cv.cbat ~ tt*shbg, data = cv.grid,

xlab = "Testo \n (nmol/L)",

ylab = "SHBG \n (nmol/L)",

zlab = "CV \n (%)",

drape = TRUE,

colorkey = TRUE,

col.regions = colorRampPalette(c("blue", "red", "yellow"))(100),

scales = list(arrows=FALSE,cex=.5,tick.number = 10)

)

plot of chunk unnamed-chunk-11

This shows us that the CV of CBAT ranges from about 4–8% over the TT and SHBG ranges we have looked at.

Conclusion

We have determined the CV of calculated bioavailable testosterone using random number simulations using empirical CV data and produced a surface plot of CV. This allows us to comment on the CV of this lab reportable as a function of the two variables by which it is determined.

Parting Thought on Monte Carlo Simulations

The die is cast into the lap, but its every decision is from the LORD.

(Prov 16:33)

Conditional Formatting of a Table in R

November 7, 2016 dtholmes@mail.ubc.ca

Background

There are a few ways to approach the problem of a conditionally formatted table in R. You can use the ReporteRs package's FlexTable() function, the formattable package, or the condformat package. These allow you to produce a conditionally formatted tables in HTML. You can also use xtable package and essentially program what you want in LaTeX via the xtable() function.

In my desire for something simple-ish, I am going do this graphically using the image() function as suggested here. The benefit is that I can then push the table into an RMarkdown generated PDF document easily.

The Problem

Suppose that you want to prepare a summary of how resident and medical student orders are placed on various wards. You obtain data that is formatted in the following manner.

head(orders,10)

1 2	head(orders,10)

##    ward order.type cosigned
## 1   Med       CPOE     TRUE
## 2   Med    Written    FALSE
## 3   Med       CPOE     TRUE
## 4   Med    Written     TRUE
## 5   Med    Written     TRUE
## 6   Med    Written     TRUE
## 7   Med       CPOE     TRUE
## 8   Med       CPOE    FALSE
## 9   Med       CPOE     TRUE
## 10  Med       CPOE     TRUE

## ward order.type cosigned

## 1 Med CPOE TRUE

## 2 Med Written FALSE

## 3 Med CPOE TRUE

## 4 Med Written TRUE

## 5 Med Written TRUE

## 6 Med Written TRUE

## 7 Med CPOE TRUE

## 8 Med CPOE FALSE

## 9 Med CPOE TRUE

## 10 Med CPOE TRUE

There are 4 wards: medicine, surgery, ER and orthopedics. Orders can come in as computerized physician order entry (CPOE), verbal or written. The orders have to be cosigned by staff and this is recorded as TRUE/FALSE because staff are not always compliant in logging on to the EMR to cosign the trainee orders.

str(orders)

1 2	str(orders)

## 'data.frame':    550 obs. of  3 variables:
##  $ ward      : Factor w/ 4 levels "Med","Surg","ER",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ order.type: Factor w/ 3 levels "CPOE","Verbal",..: 1 3 1 3 3 3 1 1 1 1 ...
##  $ cosigned  : logi  TRUE FALSE TRUE TRUE TRUE TRUE ...

## 'data.frame': 550 obs. of 3 variables:

## $ ward : Factor w/ 4 levels "Med","Surg","ER",..: 1 1 1 1 1 1 1 1 1 1 ...

## $ order.type: Factor w/ 3 levels "CPOE","Verbal",..: 1 3 1 3 3 3 1 1 1 1 ...

## $ cosigned : logi TRUE FALSE TRUE TRUE TRUE TRUE ...

summary(orders)

1 2	summary(orders)

##    ward       order.type   cosigned      
##  Med :150   CPOE   :291   Mode :logical  
##  Surg:100   Verbal : 54   FALSE:195      
##  ER  :200   Written:205   TRUE :355      
##  Orth:100                 NA's :0

## ward order.type cosigned

## Med :150 CPOE :291 Mode :logical

## Surg:100 Verbal : 54 FALSE:195

## ER :200 Written:205 TRUE :355

## Orth:100 NA's :0

Preparing Proportions Table

Let's start with the assumption that we want to apply the same conditional formatting to all data in the table. That is, we want to color code all results with the same algorithm. We can used the image() function to get this done. Let's display the rates at which different order types (CPOE, verbal,or written) from the four wards. We can generate the proportions table in percent very easily with the prop.table() and table() functions operating on the first two columns of our orders data:

my.data <- round(prop.table(table(orders[,1:2]),1)*100,1)
my.data

my.data <- round(prop.table(table(orders[,1:2]),1)*100,1)

my.data

##       order.type
## ward   CPOE Verbal Written
##   Med  49.3    8.7    42.0
##   Surg 30.0    4.0    66.0
##   ER   89.5    7.0     3.5
##   Orth  8.0   23.0    69.0

## order.type

## ward CPOE Verbal Written

## Med 49.3 8.7 42.0

## Surg 30.0 4.0 66.0

## ER 89.5 7.0 3.5

## Orth 8.0 23.0 69.0

A DIY Approach with the Image Function

The image() function produces a tile plot based on matrix of z values, where z = f(x,y) using colours we can define and thresholds for switching from one colour to the next based on a breaks parameter. In our case, we will say that if the result is less than equal to 25%, we will colour the tile blue, if it is greater than 25% but less than or equal to 50%, we will colour it red, and if it greater than 50%, it will be yellow.

You will note that we have to transpose the data with the t() function because the image function plots the rows on the x axis on the columns on the y axis. You will also notice that we need to plot y descending on the y-axis to account for the fact that our tabular data has increasing index going down but the tile plot will default to have increasing y going up. We can also need to suppress the axes and their labels. The reader can comment out the lines xaxt = 'n' and yaxt = 'n' to see what is going on in terms of x and y values.

x = 1:ncol(my.data)
y = 1:nrow(my.data)
centers <- expand.grid(y,x)

#make the plot margins a little bigger
par(mar = c(2,7,4,2))

image(x, y, t(my.data),
      col = c(rgb(0,0,1,0.3),rgb(1,0,0,0.3), rgb(1,1,0,0.3)),
      breaks = c(0, 25, 50, 100),
      xaxt = 'n', 
      yaxt = 'n', 
      xlab = '', 
      ylab = '',
      ylim = c(max(y) + 0.5, min(y) - 0.5)
      )

x = 1:ncol(my.data)

y = 1:nrow(my.data)

centers <- expand.grid(y,x)

#make the plot margins a little bigger

par(mar = c(2,7,4,2))

image(x, y, t(my.data),

col = c(rgb(0,0,1,0.3),rgb(1,0,0,0.3), rgb(1,1,0,0.3)),

breaks = c(0, 25, 50, 100),

xaxt = 'n',

yaxt = 'n',

xlab = '',

ylab = '',

ylim = c(max(y) + 0.5, min(y) - 0.5)

)

plot of chunk unnamed-chunk-5

Now we can write our values over top with the text() function.

text(centers[,2], centers[,1], c(my.data), col= "black")

1 2	text(centers[,2], centers[,1], c(my.data), col= "black")

plot of chunk unnamed-chunk-7

And then we can write the variable names (which we yank from the attributes of the table) into the figure margin and draw some lines to make it look pretty. It was necessary to use the adj and padj parameters to make it look a little cleaner.

#add margin text
mtext(paste(attributes(my.data)$dimnames[[2]],"(%)"), at=1:ncol(my.data), padj = -1)
mtext(attributes(my.data)$dimnames[[1]], at=1:nrow(my.data), side = 2, las = 1, adj = 1.2)

#add black lines
abline(h=y + 0.5)
abline(v=x + 0.5)

#add margin text

mtext(paste(attributes(my.data)$dimnames[[2]],"(%)"), at=1:ncol(my.data), padj = -1)

mtext(attributes(my.data)$dimnames[[1]], at=1:nrow(my.data), side = 2, las = 1, adj = 1.2)

#add black lines

abline(h=y + 0.5)

abline(v=x + 0.5)

plot of chunk unnamed-chunk-9

Conditionally Coloured Text

Now, if you want to make the text colour match the background colour, we will need a little function.

color.picker <- function(z){
  if(z <= 25){return("blue")}
  else if( z > 25 & z <= 50){return("red")}
  else {return("darkorange4")}
}

color.picker <- function(z){

if(z <= 25){return("blue")}

else if( z > 25 & z <= 50){return("red")}

else {return("darkorange4")}

}

and then apply it over the values of the matrix:

text.cols <- sapply(c(my.data), color.picker)
text(centers[,2], centers[,1], c(my.data), col= text.cols)

text.cols <- sapply(c(my.data), color.picker)

text(centers[,2], centers[,1], c(my.data), col= text.cols)

plot of chunk unnamed-chunk-12

Different Conditions for Different Columns

Now suppose you wanted different conditional formatting for each column. This is kind of a pain because you will need to provide the image() function a matrix to generate an appropriate fill-colour and a different matrix for the data to be written in each cell. Let's imagine for example that we want to include the compliance rate for co-signing in a fourth column and this is the only column we want coloured. To this column we want a colour scheme applied wherein if compliance is less than or equal to 20%, the colour is red, between 20% and 80%, it is yellow, and above 80% it is green.

We can calculate a proportions table based on columns 1 and 3 of the orders dataframe and then we can define a matrix fill.data that has NA on all the rates we calculated above.

my.data <- cbind(my.data,Cosigned = round(prop.table(table(orders[,c(1,3)]),1)*100,1)[,2])
fill.data <- my.data
fill.data[,1:3] <- matrix(NA, nrow = nrow(my.data), ncol = ncol(my.data) - 1)

my.data <- cbind(my.data,Cosigned = round(prop.table(table(orders[,c(1,3)]),1)*100,1)[,2])

fill.data <- my.data

fill.data[,1:3] <- matrix(NA, nrow = nrow(my.data), ncol = ncol(my.data) - 1)

Now the proportions matrix is as follows:

my.data

my.data

##      CPOE Verbal Written Cosigned
## Med  49.3    8.7    42.0     75.3
## Surg 30.0    4.0    66.0     52.0
## ER   89.5    7.0     3.5     88.5
## Orth  8.0   23.0    69.0     13.0

## CPOE Verbal Written Cosigned

## Med 49.3 8.7 42.0 75.3

## Surg 30.0 4.0 66.0 52.0

## ER 89.5 7.0 3.5 88.5

## Orth 8.0 23.0 69.0 13.0

and the fill data is:

fill.data

fill.data

##      CPOE Verbal Written Cosigned
## Med    NA     NA      NA     75.3
## Surg   NA     NA      NA     52.0
## ER     NA     NA      NA     88.5
## Orth   NA     NA      NA     13.0

## CPOE Verbal Written Cosigned

## Med NA NA NA 75.3

## Surg NA NA NA 52.0

## ER NA NA NA 88.5

## Orth NA NA NA 13.0

Now we can apply the image() function to the fill.data matrix. When it comes to writing the data in the cells, we will use the original my.data matrix and we will adjust out color.picker() function.

color.picker <- function(z){
  if(is.na(z)){return("black")}
  else if(z <= 20){return("red")}
  else if( z > 20 & z <= 80){return("darkorange4")}
  else {return("darkgreen")}
}

x = 1:ncol(my.data)
y = 1:nrow(my.data)
centers <- expand.grid(y,x)

par(mar = c(2,7,4,2))
image(x, y, t(fill.data),
      col = c(rgb(1,0,0,0.3),rgb(1,1,0,0.3), rgb(0,1,0,0.3)),
      breaks = c(0, 20, 80, 100),
      xaxt='n', 
      yaxt='n', 
      xlab='', 
      ylab='',
      ylim = c(max(y) + 0.5, min(y) - 0.5)
)

#write in values
text.cols <- sapply(c(fill.data), color.picker)
text(centers[,2], centers[,1], format(c(my.data),nsmall = 1), col= text.cols)

#add margin text
mtext(paste(attributes(my.data)$dimnames[[2]],"(%)"), at=1:ncol(my.data), padj = -1)
mtext(attributes(my.data)$dimnames[[1]], at=1:nrow(my.data), side = 2, las = 1, adj = 1.2)

#add black lines
abline(h=y + 0.5)
abline(v=x + 0.5)

color.picker <- function(z){

if(is.na(z)){return("black")}

else if(z <= 20){return("red")}

else if( z > 20 & z <= 80){return("darkorange4")}

else {return("darkgreen")}

}

x = 1:ncol(my.data)

y = 1:nrow(my.data)

centers <- expand.grid(y,x)

par(mar = c(2,7,4,2))

image(x, y, t(fill.data),

col = c(rgb(1,0,0,0.3),rgb(1,1,0,0.3), rgb(0,1,0,0.3)),

breaks = c(0, 20, 80, 100),

xaxt='n',

yaxt='n',

xlab='',

ylab='',

ylim = c(max(y) + 0.5, min(y) - 0.5)

)

#write in values

text.cols <- sapply(c(fill.data), color.picker)

text(centers[,2], centers[,1], format(c(my.data),nsmall = 1), col= text.cols)

#add margin text

mtext(paste(attributes(my.data)$dimnames[[2]],"(%)"), at=1:ncol(my.data), padj = -1)

mtext(attributes(my.data)$dimnames[[1]], at=1:nrow(my.data), side = 2, las = 1, adj = 1.2)

#add black lines

abline(h=y + 0.5)

abline(v=x + 0.5)

plot of chunk unnamed-chunk-16

So, it looks like this could become super–awkward if we had elaborate conditions to apply. This is where a packages like condformat and formattable come in handy. If you use the condformat package, you can include the table in an RMarkdown generated PDF or HTML document. However, the formattable() function, though capable of much prettier output, does not work with PDFs generated using RMarkdown.

First, here is a condformat example. Suppose we wanted to colourized CPOE in shades of green because CPOE is more operationally desirable and verbal/written orders in shades of red because they are less operationally desirable. We also want the red/yellow/green formatting in the Cosigned column. Using condformat we could do the following:

library(condformat)

my.data <- as.data.frame(my.data)

color.picker <- function(z){
  if(is.na(z)){return(0)}
  else if(z <= 20){return(1)}
  else if( z > 20 & z <= 80){return(2)}
  else {return(3)}
}

condformat(my.data) +
rule_fill_gradient(CPOE, low = rgb(1,1,1), high = rgb(0,1,0)) +
rule_fill_gradient(Verbal, low = rgb(1,1,1), high = rgb(1,0,0)) +
rule_fill_gradient(Written, low = rgb(1,1,1), high = rgb(1,0,0)) +
rule_fill_discrete(Cosigned, expression = sapply(Cosigned,
       color.picker),colours=c("0" = "white", "1" = "red", 
       "2" =  "yellow", "3" = "lightgreen"))

library(condformat)

my.data <- as.data.frame(my.data)

color.picker <- function(z){

if(is.na(z)){return(0)}

else if(z <= 20){return(1)}

else if( z > 20 & z <= 80){return(2)}

else {return(3)}

}

condformat(my.data) +

rule_fill_gradient(CPOE, low = rgb(1,1,1), high = rgb(0,1,0)) +

rule_fill_gradient(Verbal, low = rgb(1,1,1), high = rgb(1,0,0)) +

rule_fill_gradient(Written, low = rgb(1,1,1), high = rgb(1,0,0)) +

rule_fill_discrete(Cosigned, expression = sapply(Cosigned,

color.picker),colours=c("0" = "white", "1" = "red",

"2" = "yellow", "3" = "lightgreen"))

	CPOE	Verbal	Written	Cosigned
1	49.3	8.7	42.0	75.3
2	30.0	4.0	66.0	52.0
3	89.5	7.0	3.5	88.5
4	8.0	23.0	69.0	13.0

You can see that the rownames are suppressed with condformat(). You could circumvent this by putting the rownames into their own column. This package is pretty easy to use and with PDF rendering (shown below) it produces something more LaTeX-ish than what is shown above which was generated straight to HTML.

plot of chunk unnamed-chunk-18

For something more attractive looking, here is an example of something similar using the formattable package (borrowing heavily from the code author's examples ):

library(formattable)

color.picker <- function(z){
  if(is.na(z)){return("black")}
  else if(z <= 20){return("red")}
  else if( z > 20 & z <= 80){return("darkorange")}
  else {return("darkgreen")}
}

bg.picker <- function(z){
  if(is.na(z)){return("black")}
  else if(z <= 20){return("pink")}
  else if( z > 20 & z <= 80){return("yellow")}
  else {return("lightgreen")}
}

my.data <- as.data.frame(my.data)

formattable(my.data, list(
  CPOE = color_tile("white", "green"),
  Verbal = color_tile("white", "red"),
  Written = color_tile("white", "red"),
  Cosigned = formatter("span",
       style = x ~ style(display = "block",
       "border-radius" = "4px",
       "padding-right" = "4px",
       color = sapply(x,color.picker),
       "background-color" = sapply(x,bg.picker)),
       x ~ sprintf("%.2f (rank: %02d)", x, rank(-x)))
))

library(formattable)

color.picker <- function(z){

if(is.na(z)){return("black")}

else if(z <= 20){return("red")}

else if( z > 20 & z <= 80){return("darkorange")}

else {return("darkgreen")}

}

bg.picker <- function(z){

if(is.na(z)){return("black")}

else if(z <= 20){return("pink")}

else if( z > 20 & z <= 80){return("yellow")}

else {return("lightgreen")}

}

my.data <- as.data.frame(my.data)

formattable(my.data, list(

CPOE = color_tile("white", "green"),

Verbal = color_tile("white", "red"),

Written = color_tile("white", "red"),

Cosigned = formatter("span",

style = x ~ style(display = "block",

"border-radius" = "4px",

"padding-right" = "4px",

color = sapply(x,color.picker),

"background-color" = sapply(x,bg.picker)),

x ~ sprintf("%.2f (rank: %02d)", x, rank(-x)))

))

	CPOE	Verbal	Written	Cosigned
Med	49.3	8.7	42.0	75.30 (rank: 02)
Surg	30.0	4.0	66.0	52.00 (rank: 03)
ER	89.5	7.0	3.5	88.50 (rank: 01)
Orth	8.0	23.0	69.0	13.00 (rank: 04)

I hope that this points you in the right direction.

And as for conditions:

“If you declare with your mouth, “Jesus is Lord,” and believe in your heart that God raised him from the dead, you will be saved.”

Romans 10:9

Make Easy Heatmaps to Visualize your Turnaround Times

September 8, 2016 dtholmes@mail.ubc.ca

The Problem

In two previous posts, I discussed visualizing your turnaround times (TATs). These posts are here and here. One other nice way to visualize your TAT is by means of a heatmap. In particular, we would like to look at the TAT for every hour of the week in a single figure. This manner of dataviz bling seems to be particularly attractive to managers because it costs you $0 to do this with R, but with commercial tools like Tableau, you'd have to pay a fortune and, as with Excel, your report would not be readily reproducible. Further, to make it autogenerate a PDF would mean you had to fork out more money for a report-generation module. Pffft.

The Data

We're going to read in a year's worth of order times and result times for a stat immunoassay test offered to a particular ward. The data, as I've formatted it, has two columns, ord and res.

test.data <- read.csv("test_data.csv")
head(test.data)

test.data <- read.csv("test_data.csv")

head(test.data)

##                   ord                 res
## 1 2015-01-01 13:24:00 2015-01-01 14:29:00
## 2 2015-01-01 06:16:00 2015-01-01 07:43:00
## 3 2015-01-01 06:32:00 2015-01-01 07:43:00
## 4 2015-01-01 06:32:00 2015-01-01 07:43:00
## 5 2015-01-01 12:12:00 2015-01-01 13:13:00
## 6 2015-01-01 12:12:00 2015-01-01 13:13:00

## ord res

## 1 2015-01-01 13:24:00 2015-01-01 14:29:00

## 2 2015-01-01 06:16:00 2015-01-01 07:43:00

## 3 2015-01-01 06:32:00 2015-01-01 07:43:00

## 4 2015-01-01 06:32:00 2015-01-01 07:43:00

## 5 2015-01-01 12:12:00 2015-01-01 13:13:00

## 6 2015-01-01 12:12:00 2015-01-01 13:13:00

Now, of course, we want to look at data collected from a long period of time so that we can be sure that the observations we are not simply an artifact of recent instrument downtime, maintenance, or whoever happened to be running the instrument. This is why I chose a year's worth of data. We are going to visualize the median order-to-file TAT for this test.

Formatting and Calculations

To calculate the hourly medians, we'll need to be able to label every TAT with the day it was run and the hour in the day it was run. This is pretty easy with the lubridate package. We'll do three things:

We'll convert the dates to POSIXct objects
We'll use the difftime() function to calculate the TATs
We'll use the wday() function to determine which day of the week the specimen was run on
We'll pull out the hour of the day on which it was run with the format() function.

library("dplyr")
library("lubridate")
library("fields")
library("magrittr")

test.data$ord <- ymd_hms(test.data$ord)
test.data$res <- ymd_hms(test.data$res)
test.data <- mutate(test.data,otf = difftime(res,ord,units="min"))
test.data <- mutate(test.data,dow = wday(ord))
test.data <- mutate(test.data,hod = as.numeric(format(test.data$ord, "%H")))

library("dplyr")

library("lubridate")

library("fields")

library("magrittr")

test.data$ord <- ymd_hms(test.data$ord)

test.data$res <- ymd_hms(test.data$res)

test.data <- mutate(test.data,otf = difftime(res,ord,units="min"))

test.data <- mutate(test.data,dow = wday(ord))

test.data <- mutate(test.data,hod = as.numeric(format(test.data$ord, "%H")))

And now the data will look like this:

head(test.data)

1 2	head(test.data)

##                   ord                 res     otf dow hod
## 1 2015-01-01 13:24:00 2015-01-01 14:29:00 65 mins   5  13
## 2 2015-01-01 06:16:00 2015-01-01 07:43:00 87 mins   5   6
## 3 2015-01-01 06:32:00 2015-01-01 07:43:00 71 mins   5   6
## 4 2015-01-01 06:32:00 2015-01-01 07:43:00 71 mins   5   6
## 5 2015-01-01 12:12:00 2015-01-01 13:13:00 61 mins   5  12
## 6 2015-01-01 12:12:00 2015-01-01 13:13:00 61 mins   5  12

## ord res otf dow hod

## 1 2015-01-01 13:24:00 2015-01-01 14:29:00 65 mins 5 13

## 2 2015-01-01 06:16:00 2015-01-01 07:43:00 87 mins 5 6

## 3 2015-01-01 06:32:00 2015-01-01 07:43:00 71 mins 5 6

## 4 2015-01-01 06:32:00 2015-01-01 07:43:00 71 mins 5 6

## 5 2015-01-01 12:12:00 2015-01-01 13:13:00 61 mins 5 12

## 6 2015-01-01 12:12:00 2015-01-01 13:13:00 61 mins 5 12

where the order-to-file TAT is in the otf column, the day-of-week is in the dow column and the hour-of-day is in the hod column. Now we can cycle though the days of the week and the hours of the day and calculate the year's median TAT for each hour, storing it in a matrix:

#prepare an empty matrix
heat.data <- matrix(rep(NA,7*24),nrow = 7, ncol = 24)
#loop over the days and hours and calculate the median TAT
for(i in 1:7){
  for(j in 0:23){
    heat.data[i,j+1] <- subset(test.data, test.data$dow==i & test.data$hod==j)$otf %>% median
  }
}

#prepare an empty matrix

heat.data <- matrix(rep(NA,7*24),nrow = 7, ncol = 24)

#loop over the days and hours and calculate the median TAT

for(i in 1:7){

for(j in 0:23){

heat.data[i,j+1] <- subset(test.data, test.data$dow==i & test.data$hod==j)$otf %>% median

}

Making the Heatmap

There are many ways to make the heatmap but I am particularly fond of the appearance of surface plots made with the fields package.

image.plot(1:7,seq(from=0.5, to=23.5, by = 1),heat.data,axes=FALSE, 
           xlab = "Day of Week", ylab = "Hour of Day", ylim=c(0,24))
# the following pointless command is necessary to make the custom axis labels non-transparent
# google revealed this among a number of other workarounds.
points(0,0)
# now these will display properly
axis(side=1, at=1:7, labels=as.character(wday(1:7, label=TRUE)), las=2, cex.axis = 0.8)
axis(side=2, at= 0:24, labels=0:24, las=1, cex.axis=0.8)

image.plot(1:7,seq(from=0.5, to=23.5, by = 1),heat.data,axes=FALSE,

xlab = "Day of Week", ylab = "Hour of Day", ylim=c(0,24))

# the following pointless command is necessary to make the custom axis labels non-transparent

# google revealed this among a number of other workarounds.

points(0,0)

# now these will display properly

axis(side=1, at=1:7, labels=as.character(wday(1:7, label=TRUE)), las=2, cex.axis = 0.8)

axis(side=2, at= 0:24, labels=0:24, las=1, cex.axis=0.8)

plot of chunk unnamed-chunk-5

Overlay Printed Times

We can see that there is a morning slowdown that is particularly bad on Saturday. But what if we wanted to know the exact value for these eye-catching problem times? We'd have trouble, unless we overlaid some text.

It turns out that if you use white printing, you can't read the numbers when the background colour is yellow and green. There is a 64 colour gradient used in the image.plot() function, so I calculated which integers in 0–64 were the problem and found the TATs that would correspond. It turned out that colours 20–45 out of the 64 colours in the gradient are the problem. By this means, I can make the printing black over the yellows and greens but white everywhere else:

image.plot(1:7,seq(from=0.5, to=23.5, by = 1),heat.data,axes=FALSE, 
           xlab = "Day of Week", ylab = "Hour of Day", ylim=c(0,24))
points(0,0) #random command that resets par
axis(side=1, at=1:7, labels=as.character(wday(1:7, label=TRUE)), las=2, cex.axis = 0.8)
axis(side=2, at= 0:24, labels=0:24, las=1, cex.axis=0.8)

# calculate the lowest and highest TAT
min.z <- min(heat.data)
max.z <- max(heat.data)
# determine which TAT's will have yellow to green shading
z.yellows <- min.z + (max.z - min.z)/64*c(20,45) 
# print the labels
for(i in 1:7){
  for(j in 1:24){
    if((heat.data[i,j] > z.yellows[1])&(heat.data[i,j] < z.yellows[2])){
      text(i,j-0.5,heat.data[i,j], col="black", cex = 0.8)
    }else{
      text(i,j-0.5,heat.data[i,j], col="white", cex = 0.8)     
    }
  }
}

image.plot(1:7,seq(from=0.5, to=23.5, by = 1),heat.data,axes=FALSE,

xlab = "Day of Week", ylab = "Hour of Day", ylim=c(0,24))

points(0,0) #random command that resets par

axis(side=1, at=1:7, labels=as.character(wday(1:7, label=TRUE)), las=2, cex.axis = 0.8)

axis(side=2, at= 0:24, labels=0:24, las=1, cex.axis=0.8)

# calculate the lowest and highest TAT

min.z <- min(heat.data)

max.z <- max(heat.data)

# determine which TAT's will have yellow to green shading

z.yellows <- min.z + (max.z - min.z)/64*c(20,45)

# print the labels

for(i in 1:7){

for(j in 1:24){

if((heat.data[i,j] > z.yellows[1])&(heat.data[i,j] < z.yellows[2])){

text(i,j-0.5,heat.data[i,j], col="black", cex = 0.8)

}else{

text(i,j-0.5,heat.data[i,j], col="white", cex = 0.8)

}

plot of chunk unnamed-chunk-6

So, that is not too bad, and if you wanted to look at the 75th percentile instead you would only have to adjust the heat.data calculation as follows:

#prepare an empty matrix
heat.data <- matrix(rep(NA,7*24),nrow = 7, ncol = 24)
#loop over the days and hours and calculate the median TAT
for(i in 1:7){
  for(j in 0:23){
    heat.data[i,j+1] <- subset(test.data, test.data$dow==i & test.data$hod==j)$otf %>% quantile(.,probs=0.75)
  }
}

#prepare an empty matrix

heat.data <- matrix(rep(NA,7*24),nrow = 7, ncol = 24)

#loop over the days and hours and calculate the median TAT

for(i in 1:7){

for(j in 0:23){

heat.data[i,j+1] <- subset(test.data, test.data$dow==i & test.data$hod==j)$otf %>% quantile(.,probs=0.75)

}

And this is what you will get.

plot of chunk unnamed-chunk-8

Hmmm…we'd better look at Saturday morning, 6 am. I hope you have found this helpful.

And as for heat

“He will sit as a refiner and purifier of silver”

Malachi 3:3

Flat File Interface your Mass Spectrometer to the Laboratory Information System with R

February 27, 2016March 1, 2016 dtholmes@mail.ubc.ca

The Problem

As Clinical Pathologists we work hard to create laboratory developed tests (LDTs) using liquid chromatography and tandem mass spectrometry (LC-MS/MS) that are robust, repeatable, accurate and have a wider dynamic range than commercial immunoassays. In our experience, properly developed LC-MS/MS assays are much less expensive and outperform their commercial immunoassay counterparts from an analytical standpoint.

However, despite mass spectrometry's communal obsession with analytical performance of our LDTs, sometimes we overlook the matter of handling the data we generate. Unlike traditional diagnostic companies (e.g. Siemens, Roche) who take care of upload and download of patient data and results via HL7 streams to the laboratory information system (LIS), mass spectrometry companies have not yet made this a priority. This leaves us either paying out a lot of money for custom middleware solutions or manually transcribing our LC-MS/MS results.

We might naively think, “How bad can the transcription be?” but over time, it becomes painfully evident that manual transcription of result is tedious, error–prone and inefficient use of tech–time.

Many LIS vendors offer what is called a “flat-file interface”. In this case, there is no HL7 stream generated using a communication socket between instrument and LIS. Rather, the results are saved in an ASCII text file with a pre-defined format and then transferred to the LIS via a secure shell (SSH) connection.

For this post, we are going to take some sample flat files from a SCIEX API5000 triple quadrupole mass spectrometer and prepare a flat file for the SunQuest LIS. Please note that this code is provided to you as is under the GNU Public Licence and without any guarantee. You know how all the LC-MS/MS vendors say their instruments are for “research use only”? –yeah, I'm giving this to you in the same spirit. If you use or modify it, you do so at your own risk. Any changes to how your flatfile is generated by your mass spectrometer or any upgrades to your LC-MS/MS software could make this code malfunction. You have been warned.

The Required Format

SunQuest requires the output file to be a comma separated values (CSV) file with a unique specimen or internal QC result in each row. The first column is the instrument ID, the second columns is the specimen container ID (an E followed by a 10–digit integer), the third is testcode and the fourth is the result. The file itself is required to have a time–stamp so that it has a traceable name and should have no header. For an instrument named PAPI (short for Providence API 5000) and a testcode TES (for testosterone), the file might look like this:

PAPI,E2324434511,TES,3.12 
PAPI,E2324434542,TES,8.75 
PAPI,E2324434565,TES,25.34 
...

PAPI,E2324434511,TES,3.12

PAPI,E2324434542,TES,8.75

PAPI,E2324434565,TES,25.34

...

The Starting Material

After we have completed an analytical run and reviewed all peaks to generate our fileable results, we can export the quatified sample batch to an ASCII text file. The file contains a whole lot of diagnostic information about the run like which multiple reaction monitoring (MRM) transitions we used, what the internal standard (IS) counts were, results from the quantifier and qualifier ion, fitted values for the calibrators etc. There are more than 80 columns in a typical file and we could talk about all the things we might do with this data but in this case, we are concerned with extracting and preparing the results file.

Dialogue Box

If we are actually going to make an R script usable by a human, it would be good to be able to choose which file we want to process and what test we want to extract using a simple graphical user interface (GUI). There are a number of tools one can use to build GUIs in R but the most rudimentary is TclTk. I have to confess that I find the language constructs for GUI creation both non–intuitive and boring. For this reason, I present without discussion, a modification of a recipe for creating a box with radio–buttons. We are going to choose which of three analytes (you can increase this number as you please) for which we wish to process a flat–file. These are: aldosterone, cortisol and testosterone. Please note that if you execute this code on a Mac, you will have to install XQuartz because Macs don't have native X-windows support despite the BSD Linux heritage of OSX.

library(tcltk2)
#make a radiobutton widget
#source for tk widget modifed from http://www.sciviews.org/recipes/tcltk/TclTk-radiobuttons/
#accessed Feb 10, 2016
win1 <- tktoplevel()
win1$env$rb1 <- tk2radiobutton(win1)
win1$env$rb2 <- tk2radiobutton(win1)
win1$env$rb3 <- tk2radiobutton(win1)
rbValue <- tclVar("Aldosterone")
tkconfigure(win1$env$rb1, variable = rbValue, value = "Aldosterone")
tkconfigure(win1$env$rb2, variable = rbValue, value = "Cortisol")
tkconfigure(win1$env$rb3, variable = rbValue, value = "Testosterone")
tkgrid(tk2label(win1, text = "Which analyte are you processing?"),
       columnspan = 2, padx = 10, pady = c(15, 5))
tkgrid(tk2label(win1, text = "Aldosterone"), win1$env$rb1,
       padx = 10, pady = c(0, 5))
tkgrid(tk2label(win1,text = "Cortisol"), win1$env$rb2,
       padx = 10, pady = c(0, 5))
tkgrid(tk2label(win1,text = "Testosterone"), win1$env$rb3,
       padx = 10, pady = c(0, 15))

onOK <- function() {
  rbVal <- as.character(tclvalue(rbValue))
  tkdestroy(win1)
}

win1$env$butOK <- tk2button(win1, text = "OK", width = -6, command = onOK)
tkgrid(win1$env$butOK, columnspan = 2, padx = 10, pady = c(5, 15))
tkfocus(win1)
#this final line is necessary to prevent to the program from proceeding until this radio button widget has closed
tkwait.window(win1)

library(tcltk2)

#make a radiobutton widget

#source for tk widget modifed from http://www.sciviews.org/recipes/tcltk/TclTk-radiobuttons/

#accessed Feb 10, 2016

win1 <- tktoplevel()

win1$env$rb1 <- tk2radiobutton(win1)

win1$env$rb2 <- tk2radiobutton(win1)

win1$env$rb3 <- tk2radiobutton(win1)

rbValue <- tclVar("Aldosterone")

tkconfigure(win1$env$rb1, variable = rbValue, value = "Aldosterone")

tkconfigure(win1$env$rb2, variable = rbValue, value = "Cortisol")

tkconfigure(win1$env$rb3, variable = rbValue, value = "Testosterone")

tkgrid(tk2label(win1, text = "Which analyte are you processing?"),

columnspan = 2, padx = 10, pady = c(15, 5))

tkgrid(tk2label(win1, text = "Aldosterone"), win1$env$rb1,

padx = 10, pady = c(0, 5))

tkgrid(tk2label(win1,text = "Cortisol"), win1$env$rb2,

padx = 10, pady = c(0, 5))

tkgrid(tk2label(win1,text = "Testosterone"), win1$env$rb3,

padx = 10, pady = c(0, 15))

onOK <- function() {

rbVal <- as.character(tclvalue(rbValue))

tkdestroy(win1)

}

win1$env$butOK <- tk2button(win1, text = "OK", width = -6, command = onOK)

tkgrid(win1$env$butOK, columnspan = 2, padx = 10, pady = c(5, 15))

tkfocus(win1)

#this final line is necessary to prevent to the program from proceeding until this radio button widget has closed

tkwait.window(win1)

This will give us the following pop-up window with radiobuttons in which I have selected testosterone.

You will notice that Tk windows do not appear native to the operating system. We can live with this because we are not shallow.

After you hit the OK button, the Tk widget then puts the chosen value into an Tk variable called rbValue. We can determine the value using the command tclvalue(rbValue). The reason we need to know which analyte we are working with is because the name of the MRM we want to pull out of the flat file is dependent on the analyte of course. We will also need to replace results below the limit of quantitation (LoQ) with “< x”, whatever x happens to be, which will be a different threshold for each analyte.

In our case, the testcodes for aldosterone, cortisol and testosterone are ALD,CORT and TES respectively, the LoQs are 50 pmol/L, 1 nmol/L and 0.05 nmol/L and the MRM names are “Aldo 1”, “Aldo 2”, “Cortisol 1”, “Cortisol 2” and “Testo 1” and “Testo 2” as we defined them within SCIEX Analyst Software. We will use the switch() function to define three variables (test.code, LoQ, and MRM.names) which we will use later to process the flat–file. We will also define the name of the worksheet in a variable called worksheet. These are the parameters you would have to change in order to modify the code for your purposes.

#set the testcode by test
test.code <- switch(tclvalue(rbValue),
                    "Aldosterone" = "ALD",
                    "Testosterone" = "TES",
                    "Cortisol" = "CORT"
)

#set the LoQ by test
LoQ <- switch(tclvalue(rbValue),
                    "Aldosterone" = "<50",
                    "Testosterone" = "<0.05",
                    "Cortisol" = "<1"
)

#set the MRM names by test
MRM.names <- switch(tclvalue(rbValue),
              "Aldosterone" = c("Aldo 1", "Aldo 2"),
              "Testosterone" = c("Testo 1", "Testo 2"),
              "Cortisol" = c("Cortisol 1", "Cortisol 2")
)

#set the worksheet name for your site
worksheet <- "PAPI"

#set the testcode by test

test.code <- switch(tclvalue(rbValue),

"Aldosterone" = "ALD",

"Testosterone" = "TES",

"Cortisol" = "CORT"

)

#set the LoQ by test

LoQ <- switch(tclvalue(rbValue),

"Aldosterone" = "<50",

"Testosterone" = "<0.05",

"Cortisol" = "<1"

)

#set the MRM names by test

MRM.names <- switch(tclvalue(rbValue),

"Aldosterone" = c("Aldo 1", "Aldo 2"),

"Testosterone" = c("Testo 1", "Testo 2"),

"Cortisol" = c("Cortisol 1", "Cortisol 2")

)

#set the worksheet name for your site

worksheet <- "PAPI"

Building File Names

Now we will prompt the user to tell them that they are to choose an instrument flat–file and we will determine the path of the chosen file. We will need the path to both read in the appropriate file but also to write the output later.

#choose the flat file to process
tkmessageBox(message="You are about to choose a flat file to process.")
flat.file.path <- tk_choose.files(default = "", caption = "Select File", multi = FALSE, filters = NULL, index = 1)
#determine the directory name of the chosen file
flat.file.dir <- dirname(flat.file.path)
#determine the file name of the chosen file
flat.file.name <- basename(flat.file.path)

#choose the flat file to process

tkmessageBox(message="You are about to choose a flat file to process.")

flat.file.path <- tk_choose.files(default = "", caption = "Select File", multi = FALSE, filters = NULL, index = 1)

#determine the directory name of the chosen file

flat.file.dir <- dirname(flat.file.path)

#determine the file name of the chosen file

flat.file.name <- basename(flat.file.path)

This code will create this message box:

and this file choice dialogue box:

and after a file is selected and the Open is pressed, the path to the flat–file is stored in the variable flat.file.path.

Behold: The Data

So we chosen the file we want to read in but what does this file look like? To just get a gander at it, we could open it with Excel and see how it is laid out. But since we have broken up with Excel, we won't do this. SCIEX Analyst exports tab (not comma) delimited files. R has a built in function read.delim() for reading these files but we will quickly discover that read.delim() assumes the files have a rectangular structure, having the same number of columns in each row. R will make assumptions about the shape of the data file based on the first few rows and then try to read it in. In this case, it will fail and you will get gibberish. To get this to work for us we will need to tell R how many rows to skip before the real data starts or we will need to tell R the number of columns the file has (which is not guaranteed to be consistent between versions of vendor software). There are lots of ways to do this but I think the simplest is to use grep().

I did this by reading the file in with no parsing of the tabs using the readLines() function. This function creates a vector for which each successive value is the entire content of the row of the file. I display the first 30 lines of the file. Suppose that we chose a testosterone flat file.

x <- readLines(flat.file.path)
x[1:30]

x <- readLines(flat.file.path)

x[1:30]

##  [1] "Peak Name: Testo-d3 2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
##  [2] "Use as Internal Standard"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
##  [3] "Q1/Q3 Masses: 292.50/97.20 Da"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
##  [4] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
##  [5] "Peak Name: Testo 1"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
##  [6] "Internal Standard: Testo-d3 2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
##  [7] "Q1/Q3 Masses: 289.50/97.20 Da"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
##  [8] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
##  [9] "Fit\tQuadratic\tWeighting\t1 / x\tIterate\tNo"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [10] "a0\t0.00658"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
## [11] "a1\t0.2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
## [12] "a2\t-0.000443"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [13] "Correlation coefficient\t0.9999"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
## [14] "Use Area"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [15] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [16] "Peak Name: Testo 2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
## [17] "Internal Standard: Testo-d3 2"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [18] "Q1/Q3 Masses: 289.50/109.10 Da"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
## [19] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [20] "Fit\tQuadratic\tWeighting\t1 / x\tIterate\tNo"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [21] "a0\t0.00359"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
## [22] "a1\t0.17"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [23] "a2\t-0.000313"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
## [24] "Correlation coefficient\t0.9999"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
## [25] "Use Area"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [26] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [27] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [28] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [29] "Sample Name\tSample ID\tSample Type\tSample Comment\tSet Number\tAcquisition Method\tAcquisition Date\tRack Type\tRack Position\tVial Position\tPlate Type\tPlate Position\tFile Name\tDilution Factor\tWeight To Volume Ratio\tSample Annotation\tDisposition\tAnalyte Peak Name\tAnalyte Units\tAnalyte Peak Area (counts)\tAnalyte Peak Area for DAD (mAU x min)\tAnalyte Peak Height (cps)\tAnalyte Peak Height for DAD (mAU)\tAnalyte Concentration (nmol/L)\tAnalyte Retention Time (min)\tAnalyte Expected RT (min)\tAnalyte RT Window (sec)\tAnalyte Centroid Location (min)\tAnalyte Start Scan\tAnalyte Start Time (min)\tAnalyte Stop Scan\tAnalyte Stop Time (min)\tAnalyte Integration Type\tAnalyte Signal To Noise\tAnalyte Peak Width (min)\tStandard Query Status\tAnalyte Mass Ranges (Da)\tAnalyte Wavelength Ranges (nm)\tArea Ratio\tHeight Ratio\tAnalyte Annotation\tAnalyte Channel\tAnalyte Peak Width at 50% Height (min)\tAnalyte Slope of Baseline (%/min)\tAnalyte Processing Alg.\tAnalyte Peak Asymmetry\tAnalyte Integration Quality\tIS Peak Name\tIS Units\tIS Peak Area (counts)\tIS Peak Area for DAD (mAU x min)\tIS Peak Height (cps)\tIS Peak Height for DAD (mAU)\tIS Concentration (nmol/L)\tIS Retention Time (min)\tIS Expected RT (min)\tIS RT Window (sec)\tIS Centroid Location (min)\tIS Start Scan\tIS Start Time (min)\tIS Stop Scan\tIS Stop Time (min)\tIS Integration Type\tIS Signal To Noise\tIS Peak Width (min)\tIS Mass Ranges (Da)\tIS Wavelength Ranges (nm)\tIS Channel\tIS Peak Width at 50% Height (min)\tIS Slope of Baseline (%/min)\tIS Processing Alg.\tIS Peak Asymmetry\tIS Integration Quality\tUse Record\tRecord Modified\tCalculated Concentration (nmol/L)\tCalculated Concentration for DAD (nmol/L)\tRelative Retention Time\tAccuracy (%)\tResponse Factor\tAcq. Start Time (min)\tInjection Volume used\t"
## [30] "Blank\t\tBlank\tBlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;\t0\tTesto_DWP_S2.dam\t2/11/2013 5:06:45 PM\tDeep Well MTP 96 Cooled\t1\t1\tDeep Well MTP 96 Cooled\t2\t140305B1.wiff\t1.00\t0.00\t\t\tTesto 1\tnmol/L\t0\tN/A\t0.00e+000\tN/A\t0.00\t0.00\t1.29\t30.0\t0.00\t0\t0.00\t0\t0.00\tNo Peak\tN/A\t0.00\tN/A\t289.500/97.200 Da\tN/A\t0.00e+000\t0.00e+000\t\tN/A\t0.00\t0.00e+000\tSpecify Parameters - MQIII\t0.00\t0.00\tTesto-d3 2\tnmol/L\t158416\tN/A\t5.58e+004\tN/A\t1.00\t1.27\t1.29\t20.0\t1.28\t115\t1.18\t139\t1.43\tBase To Base\tN/A\t0.248\t292.500/97.200 Da\tN/A\tN/A\t4.36e-002\t1.06e+000\tSpecify Parameters - MQIII\t1.60\t0.956\t\t0\tN/A\tN/A\t0.00\tN/A\tN/A\t2.85\t25\t"

## [1] "Peak Name: Testo-d3 2"

## [2] "Use as Internal Standard"

## [3] "Q1/Q3 Masses: 292.50/97.20 Da"

## [4] ""

## [5] "Peak Name: Testo 1"

## [6] "Internal Standard: Testo-d3 2"

## [7] "Q1/Q3 Masses: 289.50/97.20 Da"

## [8] ""

## [9] "Fit\tQuadratic\tWeighting\t1 / x\tIterate\tNo"

## [10] "a0\t0.00658"

## [11] "a1\t0.2"

## [12] "a2\t-0.000443"

## [13] "Correlation coefficient\t0.9999"

## [14] "Use Area"

## [15] ""

## [16] "Peak Name: Testo 2"

## [17] "Internal Standard: Testo-d3 2"

## [18] "Q1/Q3 Masses: 289.50/109.10 Da"

## [19] ""

## [20] "Fit\tQuadratic\tWeighting\t1 / x\tIterate\tNo"

## [21] "a0\t0.00359"

## [22] "a1\t0.17"

## [23] "a2\t-0.000313"

## [24] "Correlation coefficient\t0.9999"

## [25] "Use Area"

## [26] ""

## [27] ""

## [28] ""

## [29] "Sample Name\tSample ID\tSample Type\tSample Comment\tSet Number\tAcquisition Method\tAcquisition Date\tRack Type\tRack Position\tVial Position\tPlate Type\tPlate Position\tFile Name\tDilution Factor\tWeight To Volume Ratio\tSample Annotation\tDisposition\tAnalyte Peak Name\tAnalyte Units\tAnalyte Peak Area (counts)\tAnalyte Peak Area for DAD (mAU x min)\tAnalyte Peak Height (cps)\tAnalyte Peak Height for DAD (mAU)\tAnalyte Concentration (nmol/L)\tAnalyte Retention Time (min)\tAnalyte Expected RT (min)\tAnalyte RT Window (sec)\tAnalyte Centroid Location (min)\tAnalyte Start Scan\tAnalyte Start Time (min)\tAnalyte Stop Scan\tAnalyte Stop Time (min)\tAnalyte Integration Type\tAnalyte Signal To Noise\tAnalyte Peak Width (min)\tStandard Query Status\tAnalyte Mass Ranges (Da)\tAnalyte Wavelength Ranges (nm)\tArea Ratio\tHeight Ratio\tAnalyte Annotation\tAnalyte Channel\tAnalyte Peak Width at 50% Height (min)\tAnalyte Slope of Baseline (%/min)\tAnalyte Processing Alg.\tAnalyte Peak Asymmetry\tAnalyte Integration Quality\tIS Peak Name\tIS Units\tIS Peak Area (counts)\tIS Peak Area for DAD (mAU x min)\tIS Peak Height (cps)\tIS Peak Height for DAD (mAU)\tIS Concentration (nmol/L)\tIS Retention Time (min)\tIS Expected RT (min)\tIS RT Window (sec)\tIS Centroid Location (min)\tIS Start Scan\tIS Start Time (min)\tIS Stop Scan\tIS Stop Time (min)\tIS Integration Type\tIS Signal To Noise\tIS Peak Width (min)\tIS Mass Ranges (Da)\tIS Wavelength Ranges (nm)\tIS Channel\tIS Peak Width at 50% Height (min)\tIS Slope of Baseline (%/min)\tIS Processing Alg.\tIS Peak Asymmetry\tIS Integration Quality\tUse Record\tRecord Modified\tCalculated Concentration (nmol/L)\tCalculated Concentration for DAD (nmol/L)\tRelative Retention Time\tAccuracy (%)\tResponse Factor\tAcq. Start Time (min)\tInjection Volume used\t"

## [30] "Blank\t\tBlank\tBlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;\t0\tTesto_DWP_S2.dam\t2/11/2013 5:06:45 PM\tDeep Well MTP 96 Cooled\t1\t1\tDeep Well MTP 96 Cooled\t2\t140305B1.wiff\t1.00\t0.00\t\t\tTesto 1\tnmol/L\t0\tN/A\t0.00e+000\tN/A\t0.00\t0.00\t1.29\t30.0\t0.00\t0\t0.00\t0\t0.00\tNo Peak\tN/A\t0.00\tN/A\t289.500/97.200 Da\tN/A\t0.00e+000\t0.00e+000\t\tN/A\t0.00\t0.00e+000\tSpecify Parameters - MQIII\t0.00\t0.00\tTesto-d3 2\tnmol/L\t158416\tN/A\t5.58e+004\tN/A\t1.00\t1.27\t1.29\t20.0\t1.28\t115\t1.18\t139\t1.43\tBase To Base\tN/A\t0.248\t292.500/97.200 Da\tN/A\tN/A\t4.36e-002\t1.06e+000\tSpecify Parameters - MQIII\t1.60\t0.956\t\t0\tN/A\tN/A\t0.00\tN/A\tN/A\t2.85\t25\t"

All of the \t's that you see are the tabs in the file which are has read in literally when we use readLines(). We can see that in this file nothing of use happens until line 29 but this is not consistent from file to file so we should not just assume that 29 is always the magic number where the good stuff begins. We can see that the line starting “Sample Name \t Sample ID” is the real starting point so we can determine how many lines to skip by using grep() and prepare for some error–handling with a variable called problem by which we can deal with the circumstance that no approriate starting row is identified.

skip.val <- grep("Sample Name\tSample ID", x, fixed = TRUE) - 1

#if no such row is found, then the wrong file has been chosen
if (length(skip.val)==0){
  problem <- TRUE
} else {
  problem <- FALSE
}

skip.val

skip.val <- grep("Sample Name\tSample ID", x, fixed = TRUE) - 1

#if no such row is found, then the wrong file has been chosen

if (length(skip.val)==0){

problem <- TRUE

} else {

problem <- FALSE

}

skip.val

## [1] 28

## [1] 28

Now that we know how many lines to skip we can read in the data:

my.data <- read.delim(flat.file.path, sep = "\t", strip.white = TRUE, skip = skip.val, header = TRUE, stringsAsFactors = FALSE)

1 2	my.data <- read.delim(flat.file.path, sep = "\t", strip.white = TRUE, skip = skip.val, header = TRUE, stringsAsFactors = FALSE)

We can have a look at the structure of this file

str(my.data)

1 2	str(my.data)

## 'data.frame':    196 obs. of  83 variables:
##  $ Sample.Name                              : chr  "Blank" "Blank" "STD1" "STD1" ...
##  $ Sample.ID                                : logi  NA NA NA NA NA NA ...
##  $ Sample.Type                              : chr  "Blank" "Blank" "Standard" "Standard" ...
##  $ Sample.Comment                           : chr  "BlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "BlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "StandardMPX_SAMPLE_ID:189002;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "StandardMPX_SAMPLE_ID:189002;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" ...
##  $ Set.Number                               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Acquisition.Method                       : chr  "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" ...
##  $ Acquisition.Date                         : chr  "2/11/2013 5:06:45 PM" "2/11/2013 5:06:45 PM" "2/11/2013 5:13:14 PM" "2/11/2013 5:13:14 PM" ...
##  $ Rack.Type                                : chr  "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" ...
##  $ Rack.Position                            : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Vial.Position                            : int  1 1 13 13 25 25 37 37 49 49 ...
##  $ Plate.Type                               : chr  "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" ...
##  $ Plate.Position                           : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ File.Name                                : chr  "140305B1.wiff" "140305B1.wiff" "140305B1.wiff" "140305B1.wiff" ...
##  $ Dilution.Factor                          : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Weight.To.Volume.Ratio                   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sample.Annotation                        : logi  NA NA NA NA NA NA ...
##  $ Disposition                              : logi  NA NA NA NA NA NA ...
##  $ Analyte.Peak.Name                        : chr  "Testo 1" "Testo 2" "Testo 1" "Testo 2" ...
##  $ Analyte.Units                            : chr  "nmol/L" "nmol/L" "nmol/L" "nmol/L" ...
##  $ Analyte.Peak.Area..counts.               : int  0 0 5273 3464 19412 16195 37994 32722 87815 74821 ...
##  $ Analyte.Peak.Area.for.DAD..mAU.x.min.    : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Peak.Height..cps.                : num  0 0 1830 1300 6620 5700 13600 11400 30900 26100 ...
##  $ Analyte.Peak.Height.for.DAD..mAU.        : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Concentration..nmol.L.           : chr  "0.00" "0.00" "0.108" "0.108" ...
##  $ Analyte.Retention.Time..min.             : num  0 0 1.29 1.29 1.29 1.28 1.29 1.29 1.29 1.29 ...
##  $ Analyte.Expected.RT..min.                : num  1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 ...
##  $ Analyte.RT.Window..sec.                  : num  30 30 30 30 30 30 30 30 30 30 ...
##  $ Analyte.Centroid.Location..min.          : num  0 0 1.3 1.29 1.29 1.29 1.3 1.3 1.3 1.29 ...
##  $ Analyte.Start.Scan                       : int  0 0 119 120 119 118 120 120 119 119 ...
##  $ Analyte.Start.Time..min.                 : num  0 0 1.22 1.23 1.22 1.21 1.23 1.23 1.22 1.22 ...
##  $ Analyte.Stop.Scan                        : int  0 0 137 130 137 135 135 138 141 139 ...
##  $ Analyte.Stop.Time..min.                  : num  0 0 1.41 1.33 1.41 1.39 1.39 1.42 1.45 1.43 ...
##  $ Analyte.Integration.Type                 : chr  "No Peak" "No Peak" "Base To Base" "Base To Base" ...
##  $ Analyte.Signal.To.Noise                  : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Peak.Width..min.                 : num  0 0 0.186 0.103 0.186 0.176 0.155 0.186 0.227 0.207 ...
##  $ Standard.Query.Status                    : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Mass.Ranges..Da.                 : chr  "289.500/97.200 Da" "289.500/109.100 Da" "289.500/97.200 Da" "289.500/109.100 Da" ...
##  $ Analyte.Wavelength.Ranges..nm.           : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Area.Ratio                               : num  0 0 0.0304 0.02 0.113 0.094 0.219 0.188 0.468 0.398 ...
##  $ Height.Ratio                             : num  0 0 0.0304 0.0216 0.108 0.0933 0.225 0.189 0.471 0.398 ...
##  $ Analyte.Annotation                       : logi  NA NA NA NA NA NA ...
##  $ Analyte.Channel                          : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Analyte.Peak.Width.at.50..Height..min.   : num  0 0 0.0447 0.0447 0.046 0.0432 0.0439 0.0454 0.0438 0.045 ...
##  $ Analyte.Slope.of.Baseline....min.        : num  0 0 16.4 23.2 2.89 7.54 4.65 3.46 0.631 3 ...
##  $ Analyte.Processing.Alg.                  : chr  "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" ...
##  $ Analyte.Peak.Asymmetry                   : num  0 0 1.8 0.874 1.83 1.41 1.51 2.18 2.18 2 ...
##  $ Analyte.Integration.Quality              : num  0 0 0.379 0.233 0.697 0.62 0.794 0.765 0.907 0.875 ...
##  $ IS.Peak.Name                             : chr  "Testo-d3 2" "Testo-d3 2" "Testo-d3 2" "Testo-d3 2" ...
##  $ IS.Units                                 : chr  "nmol/L" "nmol/L" "nmol/L" "nmol/L" ...
##  $ IS.Peak.Area..counts.                    : int  158416 158416 173383 173383 172263 172263 173811 173811 187783 187783 ...
##  $ IS.Peak.Area.for.DAD..mAU.x.min.         : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Peak.Height..cps.                     : num  55800 55800 60100 60100 61200 61200 60300 60300 65700 65700 ...
##  $ IS.Peak.Height.for.DAD..mAU.             : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Concentration..nmol.L.                : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ IS.Retention.Time..min.                  : num  1.27 1.27 1.27 1.27 1.27 1.27 1.28 1.28 1.28 1.28 ...
##  $ IS.Expected.RT..min.                     : num  1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 ...
##  $ IS.RT.Window..sec.                       : num  20 20 20 20 20 20 20 20 20 20 ...
##  $ IS.Centroid.Location..min.               : num  1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 ...
##  $ IS.Start.Scan                            : int  115 115 117 117 115 115 117 117 118 118 ...
##  $ IS.Start.Time..min.                      : num  1.18 1.18 1.2 1.2 1.18 1.18 1.2 1.2 1.21 1.21 ...
##  $ IS.Stop.Scan                             : int  139 139 140 140 140 140 139 139 139 139 ...
##  $ IS.Stop.Time..min.                       : num  1.43 1.43 1.44 1.44 1.44 1.44 1.43 1.43 1.43 1.43 ...
##  $ IS.Integration.Type                      : chr  "Base To Base" "Base To Base" "Base To Base" "Base To Base" ...
##  $ IS.Signal.To.Noise                       : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Peak.Width..min.                      : num  0.248 0.248 0.238 0.238 0.258 0.258 0.227 0.227 0.217 0.217 ...
##  $ IS.Mass.Ranges..Da.                      : chr  "292.500/97.200 Da" "292.500/97.200 Da" "292.500/97.200 Da" "292.500/97.200 Da" ...
##  $ IS.Wavelength.Ranges..nm.                : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Channel                               : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ IS.Peak.Width.at.50..Height..min.        : num  0.0436 0.0436 0.0445 0.0445 0.0435 0.0435 0.0455 0.0455 0.0451 0.0451 ...
##  $ IS.Slope.of.Baseline....min.             : num  1.06 1.06 1.17 1.17 1.39 1.39 1.51 1.51 1.93 1.93 ...
##  $ IS.Processing.Alg.                       : chr  "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" ...
##  $ IS.Peak.Asymmetry                        : num  1.6 1.6 2.19 2.19 1.77 1.77 1.88 1.88 2.18 2.18 ...
##  $ IS.Integration.Quality                   : num  0.956 0.956 0.971 0.971 0.968 0.968 0.969 0.969 0.97 0.97 ...
##  $ Use.Record                               : int  NA NA 1 1 1 1 1 1 1 1 ...
##  $ Record.Modified                          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Calculated.Concentration..nmol.L.        : chr  "N/A" "N/A" "0.119" "0.0962" ...
##  $ Calculated.Concentration.for.DAD..nmol.L.: chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Relative.Retention.Time                  : num  0 0 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 ...
##  $ Accuracy....                             : chr  "N/A" "N/A" "111." "89.1" ...
##  $ Response.Factor                          : chr  "N/A" "N/A" "0.282" "0.185" ...
##  $ Acq..Start.Time..min.                    : num  2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 ...
##  $ Injection.Volume.used                    : int  25 25 25 25 25 25 25 25 25 25 ...
##  $ X                                        : logi  NA NA NA NA NA NA ...

## 'data.frame': 196 obs. of 83 variables:

## $ Sample.Name : chr "Blank" "Blank" "STD1" "STD1" ...

## $ Sample.ID : logi NA NA NA NA NA NA ...

## $ Sample.Type : chr "Blank" "Blank" "Standard" "Standard" ...

## $ Sample.Comment : chr "BlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "BlankMPX_SAMPLE_ID:189001;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "StandardMPX_SAMPLE_ID:189002;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" "StandardMPX_SAMPLE_ID:189002;Stream Number:2;Plate Code:Deep Well MTP 96 Cooled;Injection Volume:25;" ...

## $ Set.Number : int 0 0 0 0 0 0 0 0 0 0 ...

## $ Acquisition.Method : chr "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" "Testo_DWP_S2.dam" ...

## $ Acquisition.Date : chr "2/11/2013 5:06:45 PM" "2/11/2013 5:06:45 PM" "2/11/2013 5:13:14 PM" "2/11/2013 5:13:14 PM" ...

## $ Rack.Type : chr "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" ...

## $ Rack.Position : int 1 1 1 1 1 1 1 1 1 1 ...

## $ Vial.Position : int 1 1 13 13 25 25 37 37 49 49 ...

## $ Plate.Type : chr "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" "Deep Well MTP 96 Cooled" ...

## $ Plate.Position : int 2 2 2 2 2 2 2 2 2 2 ...

## $ File.Name : chr "140305B1.wiff" "140305B1.wiff" "140305B1.wiff" "140305B1.wiff" ...

## $ Dilution.Factor : num 1 1 1 1 1 1 1 1 1 1 ...

## $ Weight.To.Volume.Ratio : num 0 0 0 0 0 0 0 0 0 0 ...

## $ Sample.Annotation : logi NA NA NA NA NA NA ...

## $ Disposition : logi NA NA NA NA NA NA ...

## $ Analyte.Peak.Name : chr "Testo 1" "Testo 2" "Testo 1" "Testo 2" ...

## $ Analyte.Units : chr "nmol/L" "nmol/L" "nmol/L" "nmol/L" ...

## $ Analyte.Peak.Area..counts. : int 0 0 5273 3464 19412 16195 37994 32722 87815 74821 ...

## $ Analyte.Peak.Area.for.DAD..mAU.x.min. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Peak.Height..cps. : num 0 0 1830 1300 6620 5700 13600 11400 30900 26100 ...

## $ Analyte.Peak.Height.for.DAD..mAU. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Concentration..nmol.L. : chr "0.00" "0.00" "0.108" "0.108" ...

## $ Analyte.Retention.Time..min. : num 0 0 1.29 1.29 1.29 1.28 1.29 1.29 1.29 1.29 ...

## $ Analyte.Expected.RT..min. : num 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 ...

## $ Analyte.RT.Window..sec. : num 30 30 30 30 30 30 30 30 30 30 ...

## $ Analyte.Centroid.Location..min. : num 0 0 1.3 1.29 1.29 1.29 1.3 1.3 1.3 1.29 ...

## $ Analyte.Start.Scan : int 0 0 119 120 119 118 120 120 119 119 ...

## $ Analyte.Start.Time..min. : num 0 0 1.22 1.23 1.22 1.21 1.23 1.23 1.22 1.22 ...

## $ Analyte.Stop.Scan : int 0 0 137 130 137 135 135 138 141 139 ...

## $ Analyte.Stop.Time..min. : num 0 0 1.41 1.33 1.41 1.39 1.39 1.42 1.45 1.43 ...

## $ Analyte.Integration.Type : chr "No Peak" "No Peak" "Base To Base" "Base To Base" ...

## $ Analyte.Signal.To.Noise : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Peak.Width..min. : num 0 0 0.186 0.103 0.186 0.176 0.155 0.186 0.227 0.207 ...

## $ Standard.Query.Status : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Mass.Ranges..Da. : chr "289.500/97.200 Da" "289.500/109.100 Da" "289.500/97.200 Da" "289.500/109.100 Da" ...

## $ Analyte.Wavelength.Ranges..nm. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Area.Ratio : num 0 0 0.0304 0.02 0.113 0.094 0.219 0.188 0.468 0.398 ...

## $ Height.Ratio : num 0 0 0.0304 0.0216 0.108 0.0933 0.225 0.189 0.471 0.398 ...

## $ Analyte.Annotation : logi NA NA NA NA NA NA ...

## $ Analyte.Channel : chr "N/A" "N/A" "N/A" "N/A" ...

## $ Analyte.Peak.Width.at.50..Height..min. : num 0 0 0.0447 0.0447 0.046 0.0432 0.0439 0.0454 0.0438 0.045 ...

## $ Analyte.Slope.of.Baseline....min. : num 0 0 16.4 23.2 2.89 7.54 4.65 3.46 0.631 3 ...

## $ Analyte.Processing.Alg. : chr "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" ...

## $ Analyte.Peak.Asymmetry : num 0 0 1.8 0.874 1.83 1.41 1.51 2.18 2.18 2 ...

## $ Analyte.Integration.Quality : num 0 0 0.379 0.233 0.697 0.62 0.794 0.765 0.907 0.875 ...

## $ IS.Peak.Name : chr "Testo-d3 2" "Testo-d3 2" "Testo-d3 2" "Testo-d3 2" ...

## $ IS.Units : chr "nmol/L" "nmol/L" "nmol/L" "nmol/L" ...

## $ IS.Peak.Area..counts. : int 158416 158416 173383 173383 172263 172263 173811 173811 187783 187783 ...

## $ IS.Peak.Area.for.DAD..mAU.x.min. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Peak.Height..cps. : num 55800 55800 60100 60100 61200 61200 60300 60300 65700 65700 ...

## $ IS.Peak.Height.for.DAD..mAU. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Concentration..nmol.L. : num 1 1 1 1 1 1 1 1 1 1 ...

## $ IS.Retention.Time..min. : num 1.27 1.27 1.27 1.27 1.27 1.27 1.28 1.28 1.28 1.28 ...

## $ IS.Expected.RT..min. : num 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 1.29 ...

## $ IS.RT.Window..sec. : num 20 20 20 20 20 20 20 20 20 20 ...

## $ IS.Centroid.Location..min. : num 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 1.28 ...

## $ IS.Start.Scan : int 115 115 117 117 115 115 117 117 118 118 ...

## $ IS.Start.Time..min. : num 1.18 1.18 1.2 1.2 1.18 1.18 1.2 1.2 1.21 1.21 ...

## $ IS.Stop.Scan : int 139 139 140 140 140 140 139 139 139 139 ...

## $ IS.Stop.Time..min. : num 1.43 1.43 1.44 1.44 1.44 1.44 1.43 1.43 1.43 1.43 ...

## $ IS.Integration.Type : chr "Base To Base" "Base To Base" "Base To Base" "Base To Base" ...

## $ IS.Signal.To.Noise : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Peak.Width..min. : num 0.248 0.248 0.238 0.238 0.258 0.258 0.227 0.227 0.217 0.217 ...

## $ IS.Mass.Ranges..Da. : chr "292.500/97.200 Da" "292.500/97.200 Da" "292.500/97.200 Da" "292.500/97.200 Da" ...

## $ IS.Wavelength.Ranges..nm. : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Channel : chr "N/A" "N/A" "N/A" "N/A" ...

## $ IS.Peak.Width.at.50..Height..min. : num 0.0436 0.0436 0.0445 0.0445 0.0435 0.0435 0.0455 0.0455 0.0451 0.0451 ...

## $ IS.Slope.of.Baseline....min. : num 1.06 1.06 1.17 1.17 1.39 1.39 1.51 1.51 1.93 1.93 ...

## $ IS.Processing.Alg. : chr "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" "Specify Parameters - MQIII" ...

## $ IS.Peak.Asymmetry : num 1.6 1.6 2.19 2.19 1.77 1.77 1.88 1.88 2.18 2.18 ...

## $ IS.Integration.Quality : num 0.956 0.956 0.971 0.971 0.968 0.968 0.969 0.969 0.97 0.97 ...

## $ Use.Record : int NA NA 1 1 1 1 1 1 1 1 ...

## $ Record.Modified : int 0 0 0 0 0 0 0 0 0 0 ...

## $ Calculated.Concentration..nmol.L. : chr "N/A" "N/A" "0.119" "0.0962" ...

## $ Calculated.Concentration.for.DAD..nmol.L.: chr "N/A" "N/A" "N/A" "N/A" ...

## $ Relative.Retention.Time : num 0 0 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 ...

## $ Accuracy.... : chr "N/A" "N/A" "111." "89.1" ...

## $ Response.Factor : chr "N/A" "N/A" "0.282" "0.185" ...

## $ Acq..Start.Time..min. : num 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 2.85 ...

## $ Injection.Volume.used : int 25 25 25 25 25 25 25 25 25 25 ...

## $ X : logi NA NA NA NA NA NA ...

Just Tell Me the Results

And we see that there is lots of stuff we don't need. What we do need are the columns titled “Sample.Name” (which is the specimen container ID in this case), the “Analyte.Peak.Name” (which is the MRM, either quantifier or qualifier), and the one whose name starts with “Calculated.Concentration..”. The last of these also contains the units of measure which is analyte–dependent. To get rid of this analyte–dependence of the column name, we can find out which column this is and rename it:

conc.col.num <- grep("Calculated.Concentration..",names(my.data), fixed = TRUE)
names(my.data)[conc.col.num]<- "Calculated.Concentration"

conc.col.num <- grep("Calculated.Concentration..",names(my.data), fixed = TRUE)

names(my.data)[conc.col.num]<- "Calculated.Concentration"

Now we can pull out the three columns of interest and put them into a dataframe named results.

#pull out the columns of interest
results <- my.data[,c("Sample.Name", "Analyte.Peak.Name","Calculated.Concentration")]
names(results) <- c("sampleID", "mrm", "conc")

#pull out the columns of interest

results <- my.data[,c("Sample.Name", "Analyte.Peak.Name","Calculated.Concentration")]

names(results) <- c("sampleID", "mrm", "conc")

Now we only need the quantifier ion results which we were defined by the user with Tk GUI, so we can pull them out with grep. I will pull out the qualifiers also but we do not need them unless we wanted to compute ion-ratios, for example.

#handle non-numeric results
quantifiers <- results[grep(MRM.names[1], results$mrm),]
quantifiers$conc <- as.numeric(quantifiers$conc)

#handle non-numeric results

quantifiers <- results[grep(MRM.names[1], results$mrm),]

quantifiers$conc <- as.numeric(quantifiers$conc)

## Warning: NAs introduced by coercion

1	## Warning: NAs introduced by coercion

qualifiers <- results[grep(MRM.names[2], results$mrm),]
qualifiers$conc <- as.numeric(qualifiers$conc)

qualifiers <- results[grep(MRM.names[2], results$mrm),]

qualifiers$conc <- as.numeric(qualifiers$conc)

## Warning: NAs introduced by coercion

1	## Warning: NAs introduced by coercion

Having pulled out the MRM of interest, we can define which rows correspond to standards, QC and patients by appropriate use of grep(). It happens that the CIDs all start with E followed by a 10 digit number so we can search for this pattern with a simple regular expression. Since we only need the QCs and patient data, the variable standards is calculated only as a matter of completeness.

#separate out sample types
standards <- grep("Blank|STD",quantifiers$sampleID)
qc <- grep("C-", quantifiers$sampleID)
#create a regular expression to identify samples (E followed by 10 digits)
regexp<-"(^E)([[:digit:]]{10})"
patients <-grep(pattern=regexp,quantifiers$sampleID)
output.data <- quantifiers[c(qc,patients),]

#separate out sample types

standards <- grep("Blank|STD",quantifiers$sampleID)

qc <- grep("C-", quantifiers$sampleID)

#create a regular expression to identify samples (E followed by 10 digits)

regexp<-"(^E)([[:digit:]]{10})"

patients <-grep(pattern=regexp,quantifiers$sampleID)

output.data <- quantifiers[c(qc,patients),]

Preparing Data for Output

Now we can prepare to write a dataframe corresponding to the required format of the output file. To do so, we'll need to find out how many rows we are writing and then prepare a vector of the same length repeating the name of the worksheet and testcode:

#prepare the final data
num.rows <- length(output.data$sampleID)
final.output.data <- data.frame(rep(worksheet,num.rows), output.data$sampleID, rep(test.code, num.rows), output.data$conc)
names(final.output.data) <- c("worksheet","sample","test","conc")

#prepare the final data

num.rows <- length(output.data$sampleID)

final.output.data <- data.frame(rep(worksheet,num.rows), output.data$sampleID, rep(test.code, num.rows), output.data$conc)

names(final.output.data) <- c("worksheet","sample","test","conc")

Now we can replace all the NA values that replaced “No Peak” with the correct LoQ according to which analyte we are looking at.

#to put LOQs in, we need to convert to character
#this assumes that all non numeric results are undetectable
final.output.data$conc <- as.character(final.output.data$conc)
final.output.data$conc[is.na(final.output.data$conc)] <- LoQ

#to put LOQs in, we need to convert to character

#this assumes that all non numeric results are undetectable

final.output.data$conc <- as.character(final.output.data$conc)

final.output.data$conc[is.na(final.output.data$conc)] <- LoQ

Our final.output.data dataframe looks like it behaved properly.

head(final.output.data,10)

1 2	head(final.output.data,10)

##    worksheet      sample test  conc
## 1       PAPI     C-LY1LR  TES 0.557
## 2       PAPI       C-LY1  TES  5.65
## 3       PAPI       C-LY2  TES  20.6
## 4       PAPI       C-LY3  TES  28.1
## 5       PAPI      C-PTES  TES 0.737
## 6       PAPI E1234083035  TES  1.04
## 7       PAPI E1234109065  TES  14.1
## 8       PAPI E1234086634  TES  19.2
## 9       PAPI E1234107491  TES    13
## 10      PAPI E1234114052  TES  18.6

## worksheet sample test conc

## 1 PAPI C-LY1LR TES 0.557

## 2 PAPI C-LY1 TES 5.65

## 3 PAPI C-LY2 TES 20.6

## 4 PAPI C-LY3 TES 28.1

## 5 PAPI C-PTES TES 0.737

## 6 PAPI E1234083035 TES 1.04

## 7 PAPI E1234109065 TES 14.1

## 8 PAPI E1234086634 TES 19.2

## 9 PAPI E1234107491 TES 13

## 10 PAPI E1234114052 TES 18.6

Timestamping, Writing and Archiving

And finally, we create directories to archive our data (if those directories do not exist) and write the files with an appropriate timestamp determined using Sys.time(). Since colons (i.e : ) don't play nice in all operating systems as filenames, we can use gsub() to get rid of them. We also pass along error messages or confirmation messages to the user as appropriate.

#If the data file happens to be empty because you selected the wrong file, abort
if(nrow(final.output.data)==0){
  tkmessageBox(message="Your flat file contained no patient data. Aborting file output")
} else if (nrow(final.output.data)>0) {

  #create the output directory if it does not exist
  if(!dir.exists(file.path(flat.file.dir, "Processed"))){
    dir.create(file.path(flat.file.dir, "Processed"))
  }

  if(!dir.exists(file.path(flat.file.dir, "Raw"))){
    dir.create(file.path(flat.file.dir, "Raw"))
  }

  #create a  ISO 8601 compliant timestamp - get rid of spaces and colons
  time.stamp <- gsub(":","", Sys.time(), fixed = TRUE)
  time.stamp <- gsub(" ","T", time.stamp, fixed = TRUE)

  #save a copy of the input file
  flat.file.copy.name <- paste(test.code,"_",time.stamp, "_Raw.txt", sep="")
  file.copy(flat.file.path, file.path(flat.file.dir,"Raw", flat.file.copy.name ))

  #write the final output file
  final.output.name <-  paste(test.code,"_",time.stamp, ".txt", sep="")
  final.output.path <- file.path(flat.file.dir,"Processed" ,final.output.name)
  write.table(file = final.output.path, final.output.data, quote = FALSE, row.names = FALSE, col.names = FALSE, sep = ",")

  #check that the file was created as expected
  if(file.exists(final.output.path)){
    tkmessageBox(message="Data successfully processed \n Check Processed directory")
  } else {
    tkmessageBox(message="Your file was not created. There was a problem")
  }
}

#If the data file happens to be empty because you selected the wrong file, abort

if(nrow(final.output.data)==0){

tkmessageBox(message="Your flat file contained no patient data. Aborting file output")

} else if (nrow(final.output.data)>0) {

#create the output directory if it does not exist

if(!dir.exists(file.path(flat.file.dir, "Processed"))){

dir.create(file.path(flat.file.dir, "Processed"))

}

if(!dir.exists(file.path(flat.file.dir, "Raw"))){

dir.create(file.path(flat.file.dir, "Raw"))

}

#create a ISO 8601 compliant timestamp - get rid of spaces and colons

time.stamp <- gsub(":","", Sys.time(), fixed = TRUE)

time.stamp <- gsub(" ","T", time.stamp, fixed = TRUE)

#save a copy of the input file

flat.file.copy.name <- paste(test.code,"_",time.stamp, "_Raw.txt", sep="")

file.copy(flat.file.path, file.path(flat.file.dir,"Raw", flat.file.copy.name ))

#write the final output file

final.output.name <- paste(test.code,"_",time.stamp, ".txt", sep="")

final.output.path <- file.path(flat.file.dir,"Processed" ,final.output.name)

write.table(file = final.output.path, final.output.data, quote = FALSE, row.names = FALSE, col.names = FALSE, sep = ",")

#check that the file was created as expected

if(file.exists(final.output.path)){

tkmessageBox(message="Data successfully processed \n Check Processed directory")

} else {

tkmessageBox(message="Your file was not created. There was a problem")

}

Finally, we would wrap all of the directory–creation and file–operation in an if statement tied to the variable called problem we created previously. You will see this in the final source–code linked below.

Other Things You Can Do

Now, you can easily modify this to deal with multiple anlytes that are always on the same run, such as Vitamin D2 and Vitamin D3. If you wanted to suppress results failing ion ratio criteria (which could be concentration–dependent of course) or if you had specimens unexpectedly low IS counts, you could easily censor them to prevent their upload and then review them manually. You could also append canned comments to your results with a dash between your result and the comment. In fact, you could theoretically develop very elaborate middleware for QC evaluation and interpretation. You could also use RMarkdown to generate PDF reports for the run which could include calibration curve plots, plots of quantifier results vs qualifier results, and results that fail various criteria.

Source

You can download the source code and three example flat files here. Setting the source–code up as a “clickable” script is somewhat dependent on the operating system you are working on. Since most of you will be on a windows system you can follow this tutorial. You can also use a windows batch file to call your script.

Final Thought

Now that your file is generated, it is read to upload via ssh. This is usually performed manually but could be automated. Don't implement this code into routine use unless you know what you are doing and you have tested it extensively. By using and/or modifying it, you become entirely responsible for its correct operation. Excel is like a butter knife and R is like Swiss Army Knife. You must be careful with it because…

From everyone who has been given much, much will be demanded; and from the one who has been entrusted with much, much more will be asked.

Luke 12:48

Making Youden Plots in R

October 15, 2015October 15, 2015 dtholmes@mail.ubc.ca

Background

I was honoured by a site visit by Drs. Yeo-Min Yun and Junghan Song of the Korean Society for Clinical Chemistry a few weeks ago. As both professors are on the organizing committee of the Cherry Blossom Symposium for Lab Automation in Seoul in Spring 2016, their primary motivation for visiting was to discuss mass spectrometry sample prep automation but later we got on to the topic of automated reporting for quality assurance schemes.

Naturally, I was promoting R, R-Markdown and knitr as a good pipeline for automated Quality Assurance (QA) report.

This brought to mind Youden Plots which are used by DGKL in their reports. I like DGKL reports for three reasons:

They are accuracy based against GC-MS when it comes to steroids.
I can see all the other LC-MS/MS methods immediately.
Youden plots look like a target and can be assessed rapidly.

The data for a Youden plot is generated by providing a number of laboratories aliquots from two separate unknown samples, which we will call A and B. Every lab analyzes both samples and a scatter plot of the A and B results are generated–the A results on the $x$–axis and the B results on the $y$–axis. Once this is completed, limits of acceptability are plotted and outliers can be identified.

In Youden's original formulation of the plot (see page 133-1 this online document) he required that the concentrations of the A and B samples be close to one another. As you might guess, in clinical medicine, this is not all that useful because we often want to test more than one part of the analytical range in an external quality assurance (EQA) scheme. One workaround for this is the make a Youden plot of the standard normal variates for the A and B samples, that is to plot $z_b = \frac{b_i-\bar{b}} {\sigma_b}$ vs $z_a = \frac{a_i-\bar{a}} {\sigma_a}$, where $a_i$ and $b_i$ are the individual values of the A and B samples from the $i$ labs. This has the disadvantage of representing the results in a manner that is not easily assessed from a clinical perspective.

While there are published approaches to coping with this problem, these are out of scope here but I will show you a couple of other ways I have seen Youden plots represented. If you want to see R code to generate the classic Youden Plot, it can be found in this this stackoverflow post and below.

Random Data

Let's start by generating some data. For the sake of argument, let's say we are looking at testosterone results in males and measured in nmol/L. Suppose that the A sample has a true concentration of 5.3 nmol/L and the B sample has a true concentration of 16.2 nmol/L. Let's also assume that they are all performed by the same analytical method. If you have looked at EQA reports, you will know that a scatter plot of results for the A and B samples does not typically look like this.

plot of chunk unnamed-chunk-1

The (mock) data abaove are bivariate Gaussian and uncorrelated. In reality we often see something that looks a little more like this:

plot of chunk unnamed-chunk-2

That is, the A and B values are usually correlated.

Rectangular Youden Plots

The most common manner in which you will see a Youden plot prepared is just a box with mean $\pm$ 2SD and $\pm$ 3SD limits.

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")
grid()

A.mu <- mean(A)
A.sd <- sd(A)
B.mu <- mean(B)
B.sd <- sd(B)

#draw a box around the 2SD limit
rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")
#draw a box around the 3SD limit
rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")


#draw a diagonal line - which is unnecessary but you will see people do it
lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))
#draw horizontal line
lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))
#draw vertical line
lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))

#add a legend
legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")

grid()

A.mu <- mean(A)

A.sd <- sd(A)

B.mu <- mean(B)

B.sd <- sd(B)

#draw a box around the 2SD limit

rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")

#draw a box around the 3SD limit

rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")

#draw a diagonal line - which is unnecessary but you will see people do it

lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))

#draw horizontal line

lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))

#draw vertical line

lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))

#add a legend

legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-3

Non Parametric

Obviously, if you prefer, you could prepare this Youden plot in a non-parametric fashion by plotting the medians and the non-parametrically calculated 1st, 2.5th, 97.5th, and 99th percentiles. In this case, the code would be:

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")
grid()

A.med <- median(A)
A.quants <- quantile(A, probs=c(0.01,0.025,0.975,0.99))
B.med <- median(B)
B.quants <- quantile(B, probs=c(0.01,0.025,0.975,0.99))

#draw a box around the central 95% limit
rect(xleft = A.quants[2], ybottom = B.quants[2], xright = A.quants[3], ytop = B.quants[3], lwd = 2, border = "orange")
#draw a box around the central 99% limit
rect(xleft = A.quants[1], ybottom = B.quants[1], xright = A.quants[4], ytop = B.quants[4], lwd = 2, border = "red")

#draw a vertical line
lines(c(A.quants[1],A.quants[4]),c(B.med,B.med))
#draw vertical line
lines(c(A.med,A.med),c(B.quants[1],B.quants[4]))

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")

grid()

A.med <- median(A)

A.quants <- quantile(A, probs=c(0.01,0.025,0.975,0.99))

B.med <- median(B)

B.quants <- quantile(B, probs=c(0.01,0.025,0.975,0.99))

#draw a box around the central 95% limit

rect(xleft = A.quants[2], ybottom = B.quants[2], xright = A.quants[3], ytop = B.quants[3], lwd = 2, border = "orange")

#draw a box around the central 99% limit

rect(xleft = A.quants[1], ybottom = B.quants[1], xright = A.quants[4], ytop = B.quants[4], lwd = 2, border = "red")

#draw a vertical line

lines(c(A.quants[1],A.quants[4]),c(B.med,B.med))

#draw vertical line

lines(c(A.med,A.med),c(B.quants[1],B.quants[4]))

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-4

Note that if you increase the number of points, this non-parametric plot will not quite converge to look like the parametric one shown above because $\mu \pm 2\sigma$ actually encompases 95.45% of the data in a univariate normal distribution, and $\mu \pm 3\sigma$ actually encompases 99.73% of the data.

FYI

Be ye not deceived. Even with the non-parametric approach or a parametric approach with the correct $z$-scores, the orange (“Central 95%” or “2SD”) boxes shown above do not house 95% of the data and the red (“Central 99%” or “3SD”) boxes do not house 99% of the data. You can see this pretty easily if you consider the case of uncorrelated data. Let's take 100000 random pairs of uncorrelated Gaussian data with $\mu = 10$ and $\sigma = 1$.

Five percent of data points are excluded by the vertical orange lines shown below at the $\mu_A \pm 1.96 \sigma_A$ and 5% of data are excluded by the horizontal orange lines positioned at $\mu_B \pm 1.96 \sigma_B$.

Points will fall into the one of the 4 areas shaded yellow 0.95 x 0.025 or 2.375 % of the time and points will fall into one of the 4 areas shaded purple 0.025 x 0.025 or 0.0625 % of the time. This means that the “2SD” box actually encloses 100 – 4×2.375 – 4×0.0625 % = 90.25 % of the data.

plot of chunk unnamed-chunk-5

Much more directly, the probability of both A and B falling inside the center square is 0.95×0.95 = 0.9025 = 90.25%.

You can do a random simulation to prove this to yourself:

#generate the data, n=10000
A <- rnorm(10^5,10,1)
B <- rnorm(10^5,10,1)
# convert results to z-scores
scale.df <- data.frame(scA = scale(A), scB = scale(B))
#calculate how many samples are inside the centre box
nrow(subset(scale.df, abs(scA)<1.96 & abs(scB)<1.96))

#generate the data, n=10000

A <- rnorm(10^5,10,1)

B <- rnorm(10^5,10,1)

# convert results to z-scores

scale.df <- data.frame(scA = scale(A), scB = scale(B))

#calculate how many samples are inside the centre box

nrow(subset(scale.df, abs(scA)<1.96 & abs(scB)<1.96))

## [1] 90245

1	## [1] 90245

This is pretty darn close to the 90.25% we were expecting.

Elliptical Youden Plots

The rectangular plot shown works (with caveats described) but there is something slightly undesirable about it because a point could be off in the corner, far away from the other data, but still inside the 3SD box. It seems much preferable to encircle the data with an ellipse. Fortunately, there is a built in function to achieve this in the car package which makes the code is very simple. The other nice thing is that the ellipses are actually calculated to house 95% and 99% of the data respectively.

library(car)
dataEllipse(A, B, levels=c(0.95, 0.99), fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col=c("#00000080", "red"), center.pch = FALSE)

library(car)

dataEllipse(A, B, levels=c(0.95, 0.99), fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col=c("#00000080", "red"), center.pch = FALSE)

plot of chunk unnamed-chunk-8

To generate a two–colour equivalent to what we have above we draw the Youden plot in two stages.

dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE)

dataEllipse(A, B, levels = 0.99, fill = FALSE, col = c(NA, "red"), center.pch = FALSE, plot.points = FALSE)

legend("topleft", c("Central 95%","Central 99%"), col = c("orange","red"), lty=c(1,1))

dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE)

dataEllipse(A, B, levels = 0.99, fill = FALSE, col = c(NA, "red"), center.pch = FALSE, plot.points = FALSE)

legend("topleft", c("Central 95%","Central 99%"), col = c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-9

Now we might want to add the horizontal, vertical lines.

#store the points of the outer ellipse in a matrix
outer.ellipse <- dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), draw = TRUE, center.pch = FALSE)
#plot the innter ellipse
dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)

#find the value of x that is closest to the mean value of A
vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))
#calculate the vertical distance to centre
vert.dist <- outer.ellipse[,2][vert.index] - B.mu
#draw the line one end of ellipse to the other
lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))

#find the value of y that is closest to the mean valye of B
horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))
#calculate the horizonal distance to centre
horiz.dist <- outer.ellipse[,1][horiz.index] - A.mu
#draw the line one end of ellipse to the other
lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))
#add a legend
legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

#store the points of the outer ellipse in a matrix

outer.ellipse <- dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), draw = TRUE, center.pch = FALSE)

#plot the innter ellipse

dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)

#find the value of x that is closest to the mean value of A

vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))

#calculate the vertical distance to centre

vert.dist <- outer.ellipse[,2][vert.index] - B.mu

#draw the line one end of ellipse to the other

lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))

#find the value of y that is closest to the mean valye of B

horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))

#calculate the horizonal distance to centre

horiz.dist <- outer.ellipse[,1][horiz.index] - A.mu

#draw the line one end of ellipse to the other

lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))

#add a legend

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-10

And if you wanted a little more groove you can add shading.

dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), center.pch = FALSE, plot.points = TRUE, fill = TRUE, fill.alpha = 0.1)
dataEllipse(A, B, levels=0.95 ,xlim = c(0,10), ylim = c(0,30), pch = 19, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE, fill = TRUE, fill.alpha = 0.3)
lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist), col = "blue")
lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu), col = "blue")
legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), center.pch = FALSE, plot.points = TRUE, fill = TRUE, fill.alpha = 0.1)

dataEllipse(A, B, levels=0.95 ,xlim = c(0,10), ylim = c(0,30), pch = 19, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE, fill = TRUE, fill.alpha = 0.3)

lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist), col = "blue")

lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu), col = "blue")

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-11

Build a Function

We can also fold all of this into a function:

youden <- function(A,B,shape){

    A.mu <- mean(A)
    A.sd <- sd(A)
    B.mu <- mean(B)
    B.sd <- sd(B)

    plot(A,B, pch=19, col = "#00000080", xlim = c(A.mu - 5*A.sd,A.mu + 5*A.sd), ylim = c(B.mu - 5*B.sd,B.mu + 5*B.sd))
    grid()

  if (missing(shape)){
    shape <- "ellipse"
  }

  if (shape == "rectangle"){
    rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")
    rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")
    lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))
    lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))
    lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))
    legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))

  } else if (shape=="ellipse") {
    outer.ellipse <- dataEllipse(A, B, levels = 0.99, pch = 19, col=c("#00000080", "red"), draw = TRUE, center.pch = FALSE, plot.points = FALSE)
    dataEllipse(A, B, levels = 0.95, fill = FALSE, pch = 19, col=c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)
    vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))
    vert.dist <- outer.ellipse[,2][vert.index] - B.mu
    lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))
    horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))
    horiz.dist <- outer.ellipse[,1][horiz.index] - A.mu
    lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))
    legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))
  }
}

#use the function
par(mfrow=c(1,2))
youden(A,B, shape = "rectangle")
youden(A,B, shape = "ellipse")

youden <- function(A,B,shape){

A.mu <- mean(A)

A.sd <- sd(A)

B.mu <- mean(B)

B.sd <- sd(B)

plot(A,B, pch=19, col = "#00000080", xlim = c(A.mu - 5*A.sd,A.mu + 5*A.sd), ylim = c(B.mu - 5*B.sd,B.mu + 5*B.sd))

grid()

if (missing(shape)){

shape <- "ellipse"

}

if (shape == "rectangle"){

rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")

rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")

lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))

lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))

lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))

legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))

} else if (shape=="ellipse") {

outer.ellipse <- dataEllipse(A, B, levels = 0.99, pch = 19, col=c("#00000080", "red"), draw = TRUE, center.pch = FALSE, plot.points = FALSE)

dataEllipse(A, B, levels = 0.95, fill = FALSE, pch = 19, col=c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)

vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))

vert.dist <- outer.ellipse[,2][vert.index] - B.mu

lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))

horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))

horiz.dist <- outer.ellipse[,1][horiz.index] - A.mu

lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))

legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

}

#use the function

par(mfrow=c(1,2))

youden(A,B, shape = "rectangle")

youden(A,B, shape = "ellipse")

plot of chunk unnamed-chunk-12

Comparison with the Classic Youden Plot

If the data happens to be uncorrelated, and has identical average A and B values, the elliptical approach will generate a (nearly) circular Youden plot according to the original description. The youden.classic() function code shown below is borrowed from the stackoverflow post mentioned above.

youden.classic <- function(A, B){
    plot(A,B,asp = 1, xlab = "A", ylab = "B", pch=".") 
    mB <- median(B)
    mA <- median(A)
    abline(h = mB, v = mA)
    curve(x-(mA-mB),add=TRUE)
    d <- mean(A-B)
    d_prime <- A-B-d
    r <- 2.45*mean(abs(d_prime))*sqrt(pi)/2
    t <- seq(0,2*pi,by=0.01)
    x <- r*cos(t)+mA
    y <- r*sin(t)+mB
    lines(x,y)
}

#generate 10000 pairs of bivariate data with a mean of 10 for both A and B and 15% CV
A <- rnorm(10000,10,1.5)
B <- rnorm(10000,10,1.5)
youden.classic(A,B)

#overlay ellipse
dataEllipse(A, B, levels = 0.95, fill = TRUE, fill.alpha = 0.3, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE)

youden.classic <- function(A, B){

plot(A,B,asp = 1, xlab = "A", ylab = "B", pch=".")

mB <- median(B)

mA <- median(A)

abline(h = mB, v = mA)

curve(x-(mA-mB),add=TRUE)

d <- mean(A-B)

d_prime <- A-B-d

r <- 2.45*mean(abs(d_prime))*sqrt(pi)/2

t <- seq(0,2*pi,by=0.01)

x <- r*cos(t)+mA

y <- r*sin(t)+mB

lines(x,y)

}

#generate 10000 pairs of bivariate data with a mean of 10 for both A and B and 15% CV

A <- rnorm(10000,10,1.5)

B <- rnorm(10000,10,1.5)

youden.classic(A,B)

#overlay ellipse

dataEllipse(A, B, levels = 0.95, fill = TRUE, fill.alpha = 0.3, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE)

plot of chunk unnamed-chunk-13

Conclusion

There you have it. With a Youden Plot it is easy to separate the sheep from the goats. There are lots of ways that you can dress up your plot to suit your needs. Of course, this could be embedded into an automated EQA report generated with R, Rmarkdown and knitr.

I hope that was helpful.

-Dan

A Closer Look at TAT Time Dependence

August 28, 2015September 2, 2015 dtholmes@mail.ubc.ca

The Problem

We want to have a closer look at the time–dependence of turn around times (TATs). In particular, we would like to see if there is a significant trend in TAT over time (improvement or deterioration) and we would like the data to inform us of slowdowns and potentially unexpected problems that occur throughout each week. This should allow us to identify areas of the pre-analytical and/or analytical process phlebotomists that require attention.

My interest in this topic (which in past seemed entirely banal) came from the frustration of receiving monthly TAT reports showing spaghetti plots produced in Excel. In examining these figures is was entirely unclear to me whether any observed changes in the median (the only measure of central tendency provided) represented stochastic behaviour or a real problem. Utlimately, we want to be able to identify real problems in the preanalytical and analytical process but to do this, we need to visualize the data in a more sophisticated manner.

To do this, we are going to look at order–to–file times for a whole year for a nameless test X. You should be able to modify this approach to the manner in which your data is provided to you.

The real data was a little dirty but I have pre–cleaned it—this will have to be the topic of another post. In short, I purged the cancelled tests, removed duplicate records and limited my analysis to stat tests based on a stat flag that is stored in the laboratory information system (LIS). I won’t discuss this process here. The buffed–up file is named “2014_and_All_Clean.txt”. This happens to be a tab–delimited txt file. For this reason, I used read.delim() rather than read.csv(). These are basically the same function with different defaults for the seperator–one uses a comma and the other uses a tab. Please see our first post on TAT to understand how we are using the lubridate function ymd_hm().

Loading the Data

library(lubridate)
myData <- read.delim(file = "2014_and_All_clean.txt")
myData$ordered <- ymd_hm(myData$ordered)
myData$collected <- ymd_hm(myData$collected)
myData$received <- ymd_hm(myData$received)
myData$resulted <- ymd_hm(myData$resulted)
#confirm success
head(myData)

library(lubridate)

myData <- read.delim(file = "2014_and_All_clean.txt")

myData$ordered <- ymd_hm(myData$ordered)

myData$collected <- ymd_hm(myData$collected)

myData$received <- ymd_hm(myData$received)

myData$resulted <- ymd_hm(myData$resulted)

#confirm success

head(myData)

##               ordered           collected            received
## 1 2014-01-01 17:53:00 2014-01-01 17:54:00 2014-01-01 18:08:00
## 2 2014-01-01 15:10:00 2014-01-01 15:19:00 2014-01-01 15:21:00
## 3 2014-01-01 17:07:00 2014-01-01 17:15:00 2014-01-01 17:17:00
## 4 2014-01-01 18:20:00 2014-01-01 18:30:00 2014-01-01 18:35:00
## 5 2014-01-01 11:19:00 2014-01-01 11:25:00 2014-01-01 11:29:00
## 6 2014-01-01 11:00:00 2014-01-01 11:08:00 2014-01-01 11:11:00
##              resulted
## 1 2014-01-01 18:45:00
## 2 2014-01-01 16:33:00
## 3 2014-01-01 18:09:00
## 4 2014-01-01 19:45:00
## 5 2014-01-01 13:33:00
## 6 2014-01-01 11:47:00

## ordered collected received

## 1 2014-01-01 17:53:00 2014-01-01 17:54:00 2014-01-01 18:08:00

## 2 2014-01-01 15:10:00 2014-01-01 15:19:00 2014-01-01 15:21:00

## 3 2014-01-01 17:07:00 2014-01-01 17:15:00 2014-01-01 17:17:00

## 4 2014-01-01 18:20:00 2014-01-01 18:30:00 2014-01-01 18:35:00

## 5 2014-01-01 11:19:00 2014-01-01 11:25:00 2014-01-01 11:29:00

## 6 2014-01-01 11:00:00 2014-01-01 11:08:00 2014-01-01 11:11:00

## resulted

## 1 2014-01-01 18:45:00

## 2 2014-01-01 16:33:00

## 3 2014-01-01 18:09:00

## 4 2014-01-01 19:45:00

## 5 2014-01-01 13:33:00

## 6 2014-01-01 11:47:00

Now we want to look at a TAT. As in our first post on this topic, we will look at the order–to–file time.

otf <- difftime(myData$resulted, myData$ordered,units = "min")
myData <- cbind(myData,otf)

1 2	otf <- difftime(myData$resulted, myData$ordered,units = "min") myData <- cbind(myData,otf)

Sanity Check

Let’s just have a quick look at this to make sure nothing crazy is happening.

hist(as.numeric(myData$otf),xlim = c(0,200),breaks = 150, col = "orange", xlab = "TAT for X (min)", main = "Histogram of TAT for X")

1	hist(as.numeric(myData$otf),xlim = c(0,200),breaks = 150, col = "orange", xlab = "TAT for X (min)", main = "Histogram of TAT for X")

summary(as.numeric(myData$otf))

1	summary(as.numeric(myData$otf))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   51.00   62.00   70.85   77.00 1656.00

1 2	## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 3.00 51.00 62.00 70.85 77.00 1656.00

Some Nutty Stuff

We do note one thing—there is a sample with a TAT of 1656 min. This is a little crazy so we could investigate those samples to see if this is real (because of a lost sample) or an artifact of an add–on analysis being misidentified as a stat or some other similar nonsensical event.

If you wanted to list all of these extreme outliers for the year, you could do so like this:

library("dplyr")

1	library("dplyr")

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Attaching package: 'dplyr'

## The following objects are masked from 'package:lubridate':

## intersect, setdiff, union

## The following objects are masked from 'package:stats':

## filter, lag

## The following objects are masked from 'package:base':

## intersect, setdiff, setequal, union

head(arrange(myData,desc(otf)),10)

1	head(arrange(myData,desc(otf)),10)

##                ordered           collected            received
## 1  2014-09-21 21:50:00 2014-09-21 21:58:00 2014-09-21 22:04:00
## 2  2014-09-07 14:59:00 2014-09-08 12:00:00 2014-09-08 12:04:00
## 3  2014-03-18 13:45:00 2014-03-18 14:10:00 2014-03-18 14:29:00
## 4  2014-04-01 01:48:00 2014-04-01 02:03:00 2014-04-01 02:06:00
## 5  2014-04-01 02:21:00 2014-04-01 02:28:00 2014-04-01 02:31:00
## 6  2014-09-04 21:35:00 2014-09-04 21:45:00 2014-09-04 22:08:00
## 7  2014-10-08 10:38:00 2014-10-08 10:42:00 2014-10-08 10:46:00
## 8  2014-05-25 13:23:00 2014-05-25 13:35:00 2014-05-25 13:45:00
## 9  2014-08-06 16:50:00 2014-08-06 17:09:00 2014-08-06 17:15:00
## 10 2014-08-02 11:52:00 2014-08-02 21:30:00 2014-08-02 21:44:00
##               resulted       otf
## 1  2014-09-23 01:26:00 1656 mins
## 2  2014-09-08 13:26:00 1347 mins
## 3  2014-03-19 11:17:00 1292 mins
## 4  2014-04-01 17:29:00  941 mins
## 5  2014-04-01 16:20:00  839 mins
## 6  2014-09-05 11:07:00  812 mins
## 7  2014-10-08 22:41:00  723 mins
## 8  2014-05-26 01:20:00  717 mins
## 9  2014-08-07 03:46:00  656 mins
## 10 2014-08-02 22:44:00  652 mins

## ordered collected received

## 1 2014-09-21 21:50:00 2014-09-21 21:58:00 2014-09-21 22:04:00

## 2 2014-09-07 14:59:00 2014-09-08 12:00:00 2014-09-08 12:04:00

## 3 2014-03-18 13:45:00 2014-03-18 14:10:00 2014-03-18 14:29:00

## 4 2014-04-01 01:48:00 2014-04-01 02:03:00 2014-04-01 02:06:00

## 5 2014-04-01 02:21:00 2014-04-01 02:28:00 2014-04-01 02:31:00

## 6 2014-09-04 21:35:00 2014-09-04 21:45:00 2014-09-04 22:08:00

## 7 2014-10-08 10:38:00 2014-10-08 10:42:00 2014-10-08 10:46:00

## 8 2014-05-25 13:23:00 2014-05-25 13:35:00 2014-05-25 13:45:00

## 9 2014-08-06 16:50:00 2014-08-06 17:09:00 2014-08-06 17:15:00

## 10 2014-08-02 11:52:00 2014-08-02 21:30:00 2014-08-02 21:44:00

## resulted otf

## 1 2014-09-23 01:26:00 1656 mins

## 2 2014-09-08 13:26:00 1347 mins

## 3 2014-03-19 11:17:00 1292 mins

## 4 2014-04-01 17:29:00 941 mins

## 5 2014-04-01 16:20:00 839 mins

## 6 2014-09-05 11:07:00 812 mins

## 7 2014-10-08 22:41:00 723 mins

## 8 2014-05-26 01:20:00 717 mins

## 9 2014-08-07 03:46:00 656 mins

## 10 2014-08-02 22:44:00 652 mins

which gives you the TAT of the 10 (or whatever number you prefer) worst specimens for the year. Obviously when you do this kind of analysis on your own data, you will retain the specimen ID in the data set and you could explore what is going on here–whether these are add–ons etc. You discover interesting things when you dig into your data.

Time Dependence

But we are interested in time-dependence of the TAT, so let’s look at a scatterplot of the whole year.

start <- ceiling_date(min(myData$collected))
finish <- start+days(356)
plot(myData$collected,myData$otf, pch = 19, xlim = c(start,finish),col = "#00000020",cex = 0.5, ylim = c(0,200), ylab = "TAT of X (min)", xlab = "Date")

start <- ceiling_date(min(myData$collected))

finish <- start+days(356)

plot(myData$collected,myData$otf, pch = 19, xlim = c(start,finish),col = "#00000020",cex = 0.5, ylim = c(0,200), ylab = "TAT of X (min)", xlab = "Date")

So, that’s pretty hard to draw inferences from. We can see that there are some outliers with inconceivably low TAT. We will have to investigate what is going on with those collections but not right now. These outliers will not affect the non–parametric measures of central tendency.

Tunnelling Down

Let’s have a look at one week.

finish <- start + days(7)
ticks <- seq(from = start, to = finish, by = "days")
plot(myData$collected,myData$otf, pch = 19, xlim = c(start,finish), col = "#00000020", cex = 0.5, ylim = c(0,200), ylab = "TAT of X (min)", xlab = "", xaxt = "n")
axis.POSIXct(side = 1, ticks, at = ticks, las = 2, cex.axis = 0.6, col.axis = "gray30", format = "%b %d %Y")
mtext("Date of analysis", side = 1, line = 4)

finish <- start + days(7)

ticks <- seq(from = start, to = finish, by = "days")

plot(myData$collected,myData$otf, pch = 19, xlim = c(start,finish), col = "#00000020", cex = 0.5, ylim = c(0,200), ylab = "TAT of X (min)", xlab = "", xaxt = "n")

axis.POSIXct(side = 1, ticks, at = ticks, las = 2, cex.axis = 0.6, col.axis = "gray30", format = "%b %d %Y")

mtext("Date of analysis", side = 1, line = 4)

See the first post on this topic for more information about the plotting parameters.

We can see there is a definite (and unsurprising) periodicity in the number of tests per hour. We can look at “volumes” another time. What we want to do now is look for time–dependence in the TAT so we can ultimately investigate what days of the week and times of the day are worse. But we don’t want to do this for one week—we want to do this for all weeks in the year. It would be nice, for example to plot all the Sundays, Mondays, Tuesdays etc overlapping and then see if we can see day–of–week and time–of–day trends.

Some More Lubridate Magic

Therefore, we need to assign every point in our myData dataframe a day of the week. The lubridate function wday() does this for us.

start

start

## [1] "2014-01-01 02:39:00 UTC"

1	## [1] "2014-01-01 02:39:00 UTC"

wday(start, label = TRUE)

1	wday(start, label = TRUE)

## [1] Wed
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat

1 2	## [1] Wed ## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat

#if you want them as numbers, just leave the label option out.
wday(start)

1 2	#if you want them as numbers, just leave the label option out. wday(start)

## [1] 4

## [1] 4

So, January 1, 2014 was a Wednesday, which is the 4th day of the week. Let’s assign the day of the week for all our days and then bind this to our data.

weekday <- wday(myData$collected)
myData <- cbind(myData,weekday)
head(myData)

weekday <- wday(myData$collected)

myData <- cbind(myData,weekday)

head(myData)

##               ordered           collected            received
## 1 2014-01-01 17:53:00 2014-01-01 17:54:00 2014-01-01 18:08:00
## 2 2014-01-01 15:10:00 2014-01-01 15:19:00 2014-01-01 15:21:00
## 3 2014-01-01 17:07:00 2014-01-01 17:15:00 2014-01-01 17:17:00
## 4 2014-01-01 18:20:00 2014-01-01 18:30:00 2014-01-01 18:35:00
## 5 2014-01-01 11:19:00 2014-01-01 11:25:00 2014-01-01 11:29:00
## 6 2014-01-01 11:00:00 2014-01-01 11:08:00 2014-01-01 11:11:00
##              resulted      otf weekday
## 1 2014-01-01 18:45:00  52 mins       4
## 2 2014-01-01 16:33:00  83 mins       4
## 3 2014-01-01 18:09:00  62 mins       4
## 4 2014-01-01 19:45:00  85 mins       4
## 5 2014-01-01 13:33:00 134 mins       4
## 6 2014-01-01 11:47:00  47 mins       4

## ordered collected received

## 1 2014-01-01 17:53:00 2014-01-01 17:54:00 2014-01-01 18:08:00

## 2 2014-01-01 15:10:00 2014-01-01 15:19:00 2014-01-01 15:21:00

## 3 2014-01-01 17:07:00 2014-01-01 17:15:00 2014-01-01 17:17:00

## 4 2014-01-01 18:20:00 2014-01-01 18:30:00 2014-01-01 18:35:00

## 5 2014-01-01 11:19:00 2014-01-01 11:25:00 2014-01-01 11:29:00

## 6 2014-01-01 11:00:00 2014-01-01 11:08:00 2014-01-01 11:11:00

## resulted otf weekday

## 1 2014-01-01 18:45:00 52 mins 4

## 2 2014-01-01 16:33:00 83 mins 4

## 3 2014-01-01 18:09:00 62 mins 4

## 4 2014-01-01 19:45:00 85 mins 4

## 5 2014-01-01 13:33:00 134 mins 4

## 6 2014-01-01 11:47:00 47 mins 4

Now let’s plot all of the Monday–data for the whole year and look at the with–day–trends for Mondays. We are going to convert all of the TATs and times at which they are collected to decimal numbers so we don’t run into any hassles. (Yes, I ran into hassles when I did not do this.)

This little function accomplishes this for us:

#convert the time to decimal number of hours since midnight for simplicity in plotting
timeconvert = function(t){hour(t)+minute(t)/60}

1 2	#convert the time to decimal number of hours since midnight for simplicity in plotting timeconvert = function(t){hour(t)+minute(t)/60}

mondayData <- subset(myData,weekday == 2)
mondayTimes <- timeconvert(mondayData$collected)
mondayData <- cbind(mondayData,times = mondayTimes)
mondayData$otf <- as.numeric(mondayData$otf)
head(mondayData)

mondayData <- subset(myData,weekday == 2)

mondayTimes <- timeconvert(mondayData$collected)

mondayData <- cbind(mondayData,times = mondayTimes)

mondayData$otf <- as.numeric(mondayData$otf)

head(mondayData)

##                 ordered           collected            received
## 225 2014-01-06 23:35:00 2014-01-06 23:30:00 2014-01-07 00:09:00
## 226 2014-01-06 11:02:00 2014-01-06 11:10:00 2014-01-06 11:14:00
## 227 2014-01-06 14:08:00 2014-01-06 14:20:00 2014-01-06 14:24:00
## 228 2014-01-06 10:06:00 2014-01-06 10:16:00 2014-01-06 10:19:00
## 229 2014-01-06 14:42:00 2014-01-06 15:09:00 2014-01-06 15:16:00
## 230 2014-01-06 20:02:00 2014-01-06 20:17:00 2014-01-06 20:22:00
##                resulted otf weekday    times
## 225 2014-01-07 00:46:00  71       2 23.50000
## 226 2014-01-06 12:42:00 100       2 11.16667
## 227 2014-01-06 16:02:00 114       2 14.33333
## 228 2014-01-06 11:44:00  98       2 10.26667
## 229 2014-01-06 16:04:00  82       2 15.15000
## 230 2014-01-06 20:55:00  53       2 20.28333

## ordered collected received

## 225 2014-01-06 23:35:00 2014-01-06 23:30:00 2014-01-07 00:09:00

## 226 2014-01-06 11:02:00 2014-01-06 11:10:00 2014-01-06 11:14:00

## 227 2014-01-06 14:08:00 2014-01-06 14:20:00 2014-01-06 14:24:00

## 228 2014-01-06 10:06:00 2014-01-06 10:16:00 2014-01-06 10:19:00

## 229 2014-01-06 14:42:00 2014-01-06 15:09:00 2014-01-06 15:16:00

## 230 2014-01-06 20:02:00 2014-01-06 20:17:00 2014-01-06 20:22:00

## resulted otf weekday times

## 225 2014-01-07 00:46:00 71 2 23.50000

## 226 2014-01-06 12:42:00 100 2 11.16667

## 227 2014-01-06 16:02:00 114 2 14.33333

## 228 2014-01-06 11:44:00 98 2 10.26667

## 229 2014-01-06 16:04:00 82 2 15.15000

## 230 2014-01-06 20:55:00 53 2 20.28333

So this seems to have worked and now we can make a scatter plot.

Monday Monday, So Good to Me?

plot(mondayData$times,mondayData$otf, pch = 19, col = "#00000020", cex = 0.5, ylim = c(0,200), xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")

1 2	plot(mondayData$times,mondayData$otf, pch = 19, col = "#00000020", cex = 0.5, ylim = c(0,200), xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)") axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")

But now for the interesting part. We want to see how the median TAT is related to the time of day. We might want to look at, say, the running median over one–hour window all day long. Notice that I have made the times, t, go from 0.5 to 23.5 because these are the only times for which a 60 min moving median can be calculated. Otherwise we’d have this really annoying situation where we’d have to fetch data from the last half-hour of Sunday and the first half-hour of Tuesday. I don’t need that level of perfectionism at present.

plot(mondayData$times,mondayData$otf, pch = 19, col = "#00000020",cex = 0.5, ylim = c(30,100),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")

#60 min moving median calculation:
#Create a point at every minute of the day
t <- seq(from = 0.5,to = 23.5, by = 1/60)
#create and empty vector to store the data
med.tats <- vector()

for(i in 1:length(t)){
  #get all points within an hour of the minute
  tats <- subset(mondayData,(mondayData$times >= (t[i]-0.5)) & (mondayData$times < (t[i]+0.5)))
  #alternatively using filter() from the dplyr package
  #tats <- filter(mondayData, times >= (t[i] - 0.5) & times < (t[i] + 0.5))
  med.tats[i] <- median(tats$otf)
}

lines(t,med.tats, col = "blue")
grid(NA,NULL)

plot(mondayData$times,mondayData$otf, pch = 19, col = "#00000020",cex = 0.5, ylim = c(30,100),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")

axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")

#60 min moving median calculation:

#Create a point at every minute of the day

t <- seq(from = 0.5,to = 23.5, by = 1/60)

#create and empty vector to store the data

med.tats <- vector()

for(i in 1:length(t)){

#get all points within an hour of the minute

tats <- subset(mondayData,(mondayData$times >= (t[i]-0.5)) & (mondayData$times < (t[i]+0.5)))

#alternatively using filter() from the dplyr package

#tats <- filter(mondayData, times >= (t[i] - 0.5) & times < (t[i] + 0.5))

med.tats[i] <- median(tats$otf)

}

lines(t,med.tats, col = "blue")

grid(NA,NULL)

Removing the For–ness

Many R folks don’t like for–loops and would rather use the apply() family of functions. I’m not sure I always understand contempt towards loops for small simple tasks but if you wanted to accomplish the same looping task without using a for–loop, you could do as follows:

t <- seq(from = 0.5, to = 23.5, by = 1/60)
movingmed<-function(t,x){
  tats <- filter(x, times >= (t - 0.5) & times < (t + 0.5))
  median(tats$otf)
}
med.tats <- sapply(t, movingmed, x=mondayData)

t <- seq(from = 0.5, to = 23.5, by = 1/60)

movingmed<-function(t,x){

tats <- filter(x, times >= (t - 0.5) & times < (t + 0.5))

median(tats$otf)

}

med.tats <- sapply(t, movingmed, x=mondayData)

Smoothing

This approach is reasonable but the problem (I have found) is that it is computationally expensive on large data sets. For this reason, it is nice to use a canned smoothing algorithm like LOWESS which is much faster. The parameter f of the lowess function has a default of 2/3 which in our case results in a fit that is way–too smoothed. I played around with f until I got something that more or less tracked with the 60–min moving median. There are many approaches to smoothing–don’t get lost in the vortex.

Lowess Smoothing

plot(mondayData$times,mondayData$otf, pch = 19, col = "#00000020",cex = 0.5, ylim = c(30,100),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")
#running median
#create a point at every minute of the day
t <- seq(from = 0.5,to = 23.5, by = 1/60)
med.tats <- vector()
for(i in 1:length(t)){
  #get all points within an hour of the minute
  tats <- subset(mondayData,(mondayData$times >= (t[i] - 0.5)) & (mondayData$times < (t[i] + 0.5)))
  #alternatively using filter() from the dplyr package
  #tats <- filter(mondayData, times >= (t[i] - 0.5) & times < (t[i] + 0.5))
  med.tats[i] <- median(tats$otf)
}
lines(t,med.tats, col = "blue")
grid(col = "black")
mondayFit <- lowess(mondayData$times,mondayData$otf,f = 0.05)
lines(mondayFit,col = "red")

plot(mondayData$times,mondayData$otf, pch = 19, col = "#00000020",cex = 0.5, ylim = c(30,100),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")

axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")

#running median

#create a point at every minute of the day

t <- seq(from = 0.5,to = 23.5, by = 1/60)

med.tats <- vector()

for(i in 1:length(t)){

#get all points within an hour of the minute

tats <- subset(mondayData,(mondayData$times >= (t[i] - 0.5)) & (mondayData$times < (t[i] + 0.5)))

#alternatively using filter() from the dplyr package

#tats <- filter(mondayData, times >= (t[i] - 0.5) & times < (t[i] + 0.5))

med.tats[i] <- median(tats$otf)

}

lines(t,med.tats, col = "blue")

grid(col = "black")

mondayFit <- lowess(mondayData$times,mondayData$otf,f = 0.05)

lines(mondayFit,col = "red")

So, that’s cool. Now lets loop over all the days of the week and make plots for each day.

#create a 2x4 plot window
par(mfrow = c(2,4))
#loop over days
for (i in 1:7){
  #make the lowess fit
  mydayData <- subset(myData,weekday == i)
  mydayTimes <- timeconvert(mydayData$collected)
  mydayData <- cbind(mydayData,times = mydayTimes)
  mydayFit <- lowess(mydayData$times,mydayData$otf,f = 0.05)
  plot(NA,NA,ylim = c(30,100), xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
  axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")
  
  #running median
  #create a point at every minute of the day
  t <- seq(from = 0.5,to = 23.5, by = 1/60)
  med.tats <- vector()
  for(j in 1:length(t)){
    #get all points within an hour of the minute
    tats <- subset(mydayData,(mydayData$times >= (t[j] - 0.5)) & (mydayData$times < (t[j] + 0.5)))
    #alternatively using filter() from the dplyr package
    #tats <- filter(mydayData, times >= (t[j] - 0.5) & times < (t[j] + 0.5))
    med.tats[j] <- median(tats$otf)
  }
  lines(t,med.tats, col = "blue")
  grid(col = "black")
  mydayFit <- lowess(mydayData$times,mydayData$otf,f = 0.05)
  lines(mydayFit,col = "red")
  
  #put in horizontal gridding
  grid(NA,NULL)
  #you could add a legend on all the plots if you wanted
  #legend("bottomright", c("moving median","lowess"), lty = c(1,1), col = c("blue","red", byt = "n"))
}

#create a 2x4 plot window

par(mfrow = c(2,4))

#loop over days

for (i in 1:7){

#make the lowess fit

mydayData <- subset(myData,weekday == i)

mydayTimes <- timeconvert(mydayData$collected)

mydayData <- cbind(mydayData,times = mydayTimes)

mydayFit <- lowess(mydayData$times,mydayData$otf,f = 0.05)

plot(NA,NA,ylim = c(30,100), xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")

axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")

#running median

#create a point at every minute of the day

t <- seq(from = 0.5,to = 23.5, by = 1/60)

med.tats <- vector()

for(j in 1:length(t)){

#get all points within an hour of the minute

tats <- subset(mydayData,(mydayData$times >= (t[j] - 0.5)) & (mydayData$times < (t[j] + 0.5)))

#alternatively using filter() from the dplyr package

#tats <- filter(mydayData, times >= (t[j] - 0.5) & times < (t[j] + 0.5))

med.tats[j] <- median(tats$otf)

}

lines(t,med.tats, col = "blue")

grid(col = "black")

mydayFit <- lowess(mydayData$times,mydayData$otf,f = 0.05)

lines(mydayFit,col = "red")

#put in horizontal gridding

grid(NA,NULL)

#you could add a legend on all the plots if you wanted

#legend("bottomright", c("moving median","lowess"), lty = c(1,1), col = c("blue","red", byt = "n"))

}

Now, let’s overplot all the lowess fits on a single graph and see what practical observations we can make. I have increased the lowess() smoothing to make things easier to look at.

par(mfrow = c(1,1))
plot(0,0,ylim = c(40,80),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")
for (i in 1:7){
  mydayData <- subset(myData,weekday == i)
  mydayTimes <- timeconvert(mydayData$collected)
  mydayData <- cbind(mydayData,times = mydayTimes)
  mydayFit <- lowess(mydayData$times,mydayData$otf,f = 0.1)
  lines(mydayFit, col = rainbow(7)[i], ylim = c(40,80),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)", main = paste("TAT for",wday(i,label = TRUE),"in 2014"))
}
legend("bottomright",as.character(wday(1:7,label = TRUE)), col = rainbow(7), lty = rep(1,7), cex = 0.5)

par(mfrow = c(1,1))

plot(0,0,ylim = c(40,80),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")

axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")

for (i in 1:7){

mydayData <- subset(myData,weekday == i)

mydayTimes <- timeconvert(mydayData$collected)

mydayData <- cbind(mydayData,times = mydayTimes)

mydayFit <- lowess(mydayData$times,mydayData$otf,f = 0.1)

lines(mydayFit, col = rainbow(7)[i], ylim = c(40,80),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)", main = paste("TAT for",wday(i,label = TRUE),"in 2014"))

}

legend("bottomright",as.character(wday(1:7,label = TRUE)), col = rainbow(7), lty = rep(1,7), cex = 0.5)

Observations

We can immediately see some issues. Weekends in the early hours of the morning are bad. 8 am is bad across all days. Noon is generally problematic and particularly so on Saturdays. There is also a slowdown in mid–afternoon and in the early evenings. Saturday midnight is the most problematic time, although the endpoints of the figure have fewer local weighting points and their confidence intervals are wider. This is something we can cover another time.

Remember, also, this is only the median we have looked at. Other horrors may be lurking in the 90th percentile.

Next time what we will do is move all of this TAT visualization to a 3D representation so we can more easily spot the problematic times.

-Dan

The lot is cast into the lap, but its every decision is from the LORD.

Proverbs 16:33

Generating Meaningful Turaround Time Plots for Clinical Laboratory Medicine

August 11, 2015August 11, 2015 dtholmes@mail.ubc.ca

The Problem

It is standard practice in Clinical Laboratory Medicine to monitor turn around times (TATs) for high volume tests like potassium (K), Troponin (Tn) and Hemoglobin (Hb). The term TAT is typically understood to mean “the time elapsed from when the doctor orders the test to the time the result is available in the Laboratory Information System (LIS)”. This of course does not take into account the lag between the result availability and the time when the physician logs in to view it and respond, but let’s just say that we are not there yet.

Traditionally, some dedicated soul would take .csv extracts from the LIS and do laborious things in Excel to generate the median TAT for the month for each test and each lab location for which they were responsible. Not only is it impossible to automate such a process, it is entirely manual and produces fairly uninformative output since (at least at our site) only medians were generated.

What really frustrates physicians is not where the median goes each month, it is the behaviour of, say, the 90th percentile of TAT or the outliers. These are the ones they remember.

R allows us to produce a much more informative figure in an automatable fashion. I provide here an example of a TAT figure for Hb with some statistical metric included.

Look at the Data

Let’s start by reading in our data and looking at how it is structured.

myData<-read.csv(file = "Hb_TAT_data.csv",header = TRUE)
str(myData)
head(myData)

myData<-read.csv(file = "Hb_TAT_data.csv",header = TRUE)

str(myData)

head(myData)

## 'data.frame':    4497 obs. of  6 variables:
##  $ specimenID: int  4221 5281 5308 5320 5356 5374 5375 5376 5241 5270 ...
##  $ ordered   : Factor w/ 4244 levels "2015-06-30 23:15",..: 4 68 86 92 115 126 127 128 46 61 ...
##  $ collected : Factor w/ 4245 levels "2015-07-01 00:02",..: 2 72 88 95 114 126 128 129 46 65 ...
##  $ received  : Factor w/ 4098 levels "2015-07-01 00:07",..: 2 69 81 88 107 119 120 122 45 62 ...
##  $ resulted  : Factor w/ 3765 levels "2015-07-01 00:11",..: 2 70 79 86 104 113 114 116 42 63 ...
##  $ result    : int  126 110 134 117 135 113 109 111 106 129 ...

## specimenID ordered collected received
## 1 4221 2015-06-30 23:28 2015-07-01 00:08 2015-07-01 00:29
## 2 5281 2015-07-01 14:12 2015-07-01 14:25 2015-07-01 14:30
## 3 5308 2015-07-01 15:56 2015-07-01 16:03 2015-07-01 16:10
## 4 5320 2015-07-01 16:57 2015-07-01 17:12 2015-07-01 17:20
## 5 5356 2015-07-01 20:00 2015-07-01 20:07 2015-07-01 20:18
##6 5374 2015-07-01 21:37 2015-07-01 21:40 2015-07-01 21:44
 resulted result
1 2015-07-01 00:37 126
2 2015-07-01 14:50 110
3 2015-07-01 16:16 134
4 2015-07-01 17:23 117
5 2015-07-01 20:23 135
6 2015-07-01 21:53 113

## 'data.frame': 4497 obs. of 6 variables:

## $ specimenID: int 4221 5281 5308 5320 5356 5374 5375 5376 5241 5270 ...

## $ ordered : Factor w/ 4244 levels "2015-06-30 23:15",..: 4 68 86 92 115 126 127 128 46 61 ...

## $ collected : Factor w/ 4245 levels "2015-07-01 00:02",..: 2 72 88 95 114 126 128 129 46 65 ...

## $ received : Factor w/ 4098 levels "2015-07-01 00:07",..: 2 69 81 88 107 119 120 122 45 62 ...

## $ resulted : Factor w/ 3765 levels "2015-07-01 00:11",..: 2 70 79 86 104 113 114 116 42 63 ...

## $ result : int 126 110 134 117 135 113 109 111 106 129 ...

## specimenID ordered collected received

## 1 4221 2015-06-30 23:28 2015-07-01 00:08 2015-07-01 00:29

## 2 5281 2015-07-01 14:12 2015-07-01 14:25 2015-07-01 14:30

## 3 5308 2015-07-01 15:56 2015-07-01 16:03 2015-07-01 16:10

## 4 5320 2015-07-01 16:57 2015-07-01 17:12 2015-07-01 17:20

## 5 5356 2015-07-01 20:00 2015-07-01 20:07 2015-07-01 20:18

##6 5374 2015-07-01 21:37 2015-07-01 21:40 2015-07-01 21:44

resulted result

1 2015-07-01 00:37 126

2 2015-07-01 14:50 110

3 2015-07-01 16:16 134

4 2015-07-01 17:23 117

5 2015-07-01 20:23 135

6 2015-07-01 21:53 113

In this simplified anonymized data set we can see that we have 4497 observations with all of the necessary time points to calculate the turnaround times of the preanalytical and analytical processes. For the sake of this example, let’s focus on the order-to-file time.

We are going to need to handle the dates, for which there is only one package worth discussing, namely lubridate.

library(lubridate)

1	library(lubridate)

Basic Data Preparation

The first thing we need to do is to convert the order, collect, receive and result times to lubridate objects (i.e. time and date objects) so that we can do some algebra on them. We can see from the structure of myData that the order, collect, receive and result time points are in the format “YYYY-MM-DD HH:MM”. Therefore we can use the lubridate function ymd_hm() to perform the conversion.

myData$ordered<-ymd_hm(myData$ordered)
myData$collected<-ymd_hm(myData$collected)
myData$received<-ymd_hm(myData$received)
myData$resulted<-ymd_hm(myData$resulted)

myData$ordered<-ymd_hm(myData$ordered)

myData$collected<-ymd_hm(myData$collected)

myData$received<-ymd_hm(myData$received)

myData$resulted<-ymd_hm(myData$resulted)

Applying str() again to myData, you will see that the dates and times are now POSIXct, that is, they are now dates and times. This allows use to calculate the order-to-file TAT, we can do with the difftime() function exporting the result in minutes. We will also append the order-to-file (otf) TAT to the dataframe and do some quick sanity-checking.

Sanity Check

otf<-difftime(myData$resulted,myData$ordered,units = "min")
myData<-cbind(myData,otf)
summary(as.numeric(myData$otf))
hist(as.numeric(myData$otf), main = "Histogram of Hb TATs", breaks = 60, col = "darkred",xlim = c(0,200), xlab = "Order to File in Minutes")

otf<-difftime(myData$resulted,myData$ordered,units = "min")

myData<-cbind(myData,otf)

summary(as.numeric(myData$otf))

hist(as.numeric(myData$otf), main = "Histogram of Hb TATs", breaks = 60, col = "darkred",xlim = c(0,200), xlab = "Order to File in Minutes")

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   22.00   30.00   36.58   43.00  694.00

1 2	## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 3.00 22.00 30.00 36.58 43.00 694.00

This looks reasonable, so we can proceed with a TAT scatterplot.

Scatterplot

plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT",xlab = "Date",ylab = "TAT (min)")

1	plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT",xlab = "Date",ylab = "TAT (min)")

Beautifying

This is kind-of problematic because we really want to focus on results in the 0-200 minute range. There are some wild-outliers as occurs in real life because of instrument down-time, add-ons, etc. We can leave this matter for the present. Notice that I have displayed every day on the x-axis because this will allow us to investigate any problems we see. So we will adjust the ylim and we will also make the plot points semitransparent by using hexidecimal colour codes followed by a fractional transparency expressed in hexidecimal. Black is “#000000” and “20” is hexidecimal for 32 which is 32/256 or 12.5% opacity.

#make the points semistranparent and a little smaller
plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT",xlab = "Date of Analysis",ylab = "TAT (min)", ylim = c(0,200), col = "#00000020",cex = 0.5)

1 2	#make the points semistranparent and a little smaller plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT",xlab = "Date of Analysis",ylab = "TAT (min)", ylim = c(0,200), col = "#00000020",cex = 0.5)

We’ll accept the fact that we know that there are a number of outliers. We could easily have a plot that displayed them or a tabular summary of them.

Now we will need to prepare the vector of daily medians, 10th and 90th percentiles to plot. We will loop through each day of the month and then calculate the statistics for that day.

#calculate the start and end date
#fist collection in the month's reporting is always collected in the previous month so ceiling forces startDate to be first day of the month we are interested in. Same is true for endDate.
startDate <- ceiling_date(min(myData$ordered),"day")
endDate <- ceiling_date(max(myData$ordered),"day")
days <- seq (from = startDate, to = endDate, by = 'days')
#when we plot stastics, we want them at mid day
middays <- days + hours(12)
tenth<-vector()
fiftieth<-vector()
ninetieth<-vector()

for ( i in seq_along(days) ) {
  daysData<-subset(myData,myData$ordered >= days[i]&myData$ordered<days[i+1])
  tenth[i]<-quantile(daysData$otf,probs = 0.10)
  fiftieth[i]<-median(daysData$otf)
  ninetieth[i]<-quantile(daysData$otf,probs = 0.90)
}

quantileData<-data.frame(middays,tenth,fiftieth,ninetieth)

plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT",xlab = "Date of Analysis", ylab = "TAT (min)", ylim = c(0,200), col = "#00000020",cex = 0.5)
lines(quantileData$middays,quantileData$tenth, col = "red")
lines(quantileData$middays,quantileData$ninetieth, col = "red")
lines(quantileData$middays,quantileData$fiftieth, col = "blue")

#calculate the start and end date

#fist collection in the month's reporting is always collected in the previous month so ceiling forces startDate to be first day of the month we are interested in. Same is true for endDate.

startDate <- ceiling_date(min(myData$ordered),"day")

endDate <- ceiling_date(max(myData$ordered),"day")

days <- seq (from = startDate, to = endDate, by = 'days')

#when we plot stastics, we want them at mid day

middays <- days + hours(12)

tenth<-vector()

fiftieth<-vector()

ninetieth<-vector()

for ( i in seq_along(days) ) {

daysData<-subset(myData,myData$ordered >= days[i]&myData$ordered<days[i+1])

tenth[i]<-quantile(daysData$otf,probs = 0.10)

fiftieth[i]<-median(daysData$otf)

ninetieth[i]<-quantile(daysData$otf,probs = 0.90)

}

quantileData<-data.frame(middays,tenth,fiftieth,ninetieth)

plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT",xlab = "Date of Analysis", ylab = "TAT (min)", ylim = c(0,200), col = "#00000020",cex = 0.5)

lines(quantileData$middays,quantileData$tenth, col = "red")

lines(quantileData$middays,quantileData$ninetieth, col = "red")

lines(quantileData$middays,quantileData$fiftieth, col = "blue")

But this is not all that easy to look at. First, it’s kind-of ugly and second, if we find a problem date, we can’t read it from the figure. So let’s start by fixing the x-axis labels:

plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT", xlab = "", ylab = "TAT (min)", ylim = c(0,200), col = "#00000020",cex = 0.4,xaxt = "n")
#don't plot first day of next month on axis
axis.POSIXct(side = 1, quantileData$middays[1:length(quantileData$middays)-1],at = quantileData$middays[1:length(quantileData$middays)-1], las = 2, cex.axis = 0.6, col.axis = "gray30", format = "%b %d %Y")
#allows me to move the xlab down manually so as not to overwrite the dates.
mtext("Date of analysis", side = 1, line = 4)

plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT", xlab = "", ylab = "TAT (min)", ylim = c(0,200), col = "#00000020",cex = 0.4,xaxt = "n")

#don't plot first day of next month on axis

axis.POSIXct(side = 1, quantileData$middays[1:length(quantileData$middays)-1],at = quantileData$middays[1:length(quantileData$middays)-1], las = 2, cex.axis = 0.6, col.axis = "gray30", format = "%b %d %Y")

#allows me to move the xlab down manually so as not to overwrite the dates.

mtext("Date of analysis", side = 1, line = 4)

To paint the central 80% as a band, we will need to use the polygon() function. I am going to write a function to which and x-vector and two y-vectors is supplied which then fills the area between then with a supplied color. Naturally, the three vectors must have the same length.

#colours in the space between two curves.
fillitin<-function(x,ymin,ymax,colour){
  for (i in 1:length(x)){
    #define the x coordinates of the vertices of the polygon
    xvert<-c(x[i],x[i],x[i+1],x[i+1])
    #define the y coordinates of the vertices of the polygon
    yvert<-c(ymin[i],ymax[i],ymax[i+1],ymin[i+1])
    polygon(xvert,yvert, col = colour, border = NA)
  }
}

#now add these effects to the existing figure
fillitin(middays,tenth,ninetieth,"#FF000020")
lines(quantileData$middays,quantileData$fiftieth, col = "blue")
lines(quantileData$middays,quantileData$tenth, col = "red")
lines(quantileData$middays,quantileData$ninetieth, col = "red")

#colours in the space between two curves.

fillitin<-function(x,ymin,ymax,colour){

for (i in 1:length(x)){

#define the x coordinates of the vertices of the polygon

xvert<-c(x[i],x[i],x[i+1],x[i+1])

#define the y coordinates of the vertices of the polygon

yvert<-c(ymin[i],ymax[i],ymax[i+1],ymin[i+1])

polygon(xvert,yvert, col = colour, border = NA)

}

#now add these effects to the existing figure

fillitin(middays,tenth,ninetieth,"#FF000020")

lines(quantileData$middays,quantileData$fiftieth, col = "blue")

lines(quantileData$middays,quantileData$tenth, col = "red")

lines(quantileData$middays,quantileData$ninetieth, col = "red")

Final Product

Now we should just finish it off with a legend.

legend("topright",c("Median","Central 80%"),lty = c(1,1),col = c("blue","red"),inset = .05)

1	legend("topright",c("Median","Central 80%"),lty = c(1,1),col = c("blue","red"),inset = .05)

And that is a little more informative. There are many features you could add from this point – like smoothing, statistical analysis, outlier report. You could also loop over different tests, examine both the preanalytical and analytical processes at different locations, and produce a pdf report using MarkDown for all the institutions you look after.

-Dan

Background

Getting the Raw Data

Getting it intro R and parsing it

Background

Precision of Components

Build Approximation Functions

Random Simulation

Conclusion

Background

The Problem

Preparing Proportions Table

A DIY Approach with the Image Function

Conditionally Coloured Text

Different Conditions for Different Columns

The Problem

The Data

Formatting and Calculations

Making the Heatmap

Overlay Printed Times

The Problem

The Required Format

The Starting Material

Dialogue Box

Building File Names

Behold: The Data

Just Tell Me the Results

Preparing Data for Output

Timestamping, Writing and Archiving

Other Things You Can Do

Source

Final Thought

Background

Random Data

Rectangular Youden Plots

Non Parametric

FYI

Elliptical Youden Plots

Build a Function

Comparison with the Classic Youden Plot

Conclusion

The Problem

Loading the Data

Sanity Check

Some Nutty Stuff

Time Dependence

Tunnelling Down

Some More Lubridate Magic

Monday Monday, So Good to Me?

Removing the For–ness

Smoothing

Lowess Smoothing

Observations

The lot is cast into the lap, but its every decision is from the LORD.

Proverbs 16:33

The Problem

Look at the Data

Basic Data Preparation

Sanity Check

Scatterplot

Beautifying

Final Product

“The LORD detests dishonest scales, but accurate weights find favor with him.”

Proverbs 11:1