medicine – The Lab-R-torian

Unit Converter

September 2, 2015 / September 5, 2015, Stephen Master

Introduction

Dan continues to crank out book chapter-length posts, which probably means that I should jump in before getting further behind…so here we go.

In the next few posts, I’d like to cover some work to help you to process aggregated proficiency testing (PT) data. Interpreting PT data from groups such as the College of American Pathologists (CAP) is, of course, a fundamental task for lab management. Comparing your lab’s results to peer group data from other users of the same instrumentation helps to ensure that your patients receive consistent results, and it provides at least a crude measure to ensure that your instrument performance is “in the ballpark”. Of course, many assays show significant differences between instrument models and manufacturers that can lead to results that are not comparable as a patient moves from institution to institution (or when your own lab changes instruments!). There are a number of standardization and harmonization initiatives underway (see http://harmonization.net, for example) to address this, and understanding which assays show significant bias compared to benchmark studies or national guidelines is a critical task for laboratorians. All of this is further complicated by the fact that sample matrix can significantly affect assay results, and sample commutability is one important reason why we can’t just take, say, CAP PT survey results (not counting the accuracy-based surveys) and determine which assays aren’t harmonized.

However.

With all of those caveats, it can still be useful to look through PT data in a systematic way to compare instruments. Ideally, we’d like to have everything in an R-friendly format that would allow us to ask systematic questions about data (things like “for how many assays does instrument X differ from instrument Y by >30% using PT material?”, or “how many PT materials give good concordance across all manufacturers?”). If we have good, commutable, accuracy-based testing materials, we can do even better. The first task in all of this fun, however, is getting the data into a format that R is happy with; no one I know likes the idea of retyping numbers from paper reports. I’m hoping to talk more about this in a future post, as there are lots of fun R text processing issues lurking here. In the meantime, though, we have a much more modest preliminary task to tackle.

Simple unit conversion

I’m currently staring at a CAP PT booklet. It happens to be D-dimer, but you can pick your own favorite analyte (and PT provider, for that matter). Some of the results are in ng/mL, some are ug/mL, and one is in mg/L. Let’s create an R function that allows us to convert between sets of comparable units. Now, although I know that Dan is in love with SI units (#murica), we’ll start by simply converting molar→molar and gravimetric→gravimetric. Yes, we can add fancy analyte-by-analyte conversion tables in the future…but right now we just want to get things on the same scale. In the process, we’ll cover three useful R command families.

First of all, we should probably decide how we want the final function to look. I’m thinking of something like this:

results <- labunit.convert(2.3, "mg/dL", "g/L")
results

## [1] 0.023

…which converts 2.3 mg/dL to 0.023 g/L. We should also give ourselves bonus points if we can make it work with vectors. For example, we may have this data frame:

mydata

##   Value   Units Target.Units
## 1  2.30    g/dL         mg/L
## 2 47.00 nmol/mL      mmol/dL
## 3  0.19    IU/L        mIU/L

and we would like to be able to use our function like this:

labunit.convert(mydata$Value, mydata$Units, mydata$Target.Units)

## [1] 2.3e+04 4.7e-03 1.9e+02

We should also handle things that are simpler:

labunit.convert(0.23, "g", "mg")

## [1] 230

Getting started

Now that we know where we’re going, let’s start by writing a function that just converts between two units and returns the log difference. We’ll call this function convert.one.unit(), and it will take two arguments:

convert.one.unit("mg", "ng")

## [1] 6

Basically, we want to take a character variable (like, say, “dL”) and break it into two pieces: the metric prefix (“d”) and the base unit (“L”). If it isn’t something we recognize, the function should quit and complain (you could also make it return ‘NA’ and just give a warning instead, but we’ll hold off on that for now). We’ll start with a list of things that we want to recognize.

convert.one.unit <- function (unitin, unitout) {
  metric.prefixes <- c("y", "z", "a", "f", "p", "n", "u", "m", "c", "d", "", "da", "h", "k", "M", "G", "T", "P", "E", "Z", "Y")
  metric.logmultipliers <- c(-24, -21, -18, -15, -12, -9, -6, -3, -2, -1, 0, 1, 2, 3, 6, 9, 12, 15, 18, 21, 24)
  units.for.lab <- c("mol", "g", "L", "U", "IU")

Notice that the metric.prefixes variable contains the appropriate one- or two-character prefixes, and metric.logmultipliers has the corresponding log multiplier (for example, metric.prefixes[8] = “m”, and metric.logmultipliers[8] is -3). It’s also worth noting the "" (metric.prefixes[11]), which corresponds to a log multiplier of 0. The fact that "" is a zero-length string instead of a null means that we can search for it in a vector…which will be very handy!
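Here is a minimal sketch of that lookup idea, using shortened, purely illustrative vectors (prefixes and logmultipliers are stand-in names, not the full vectors from the function). It shows that match() happily finds the zero-length string, which is what lets "no prefix" map to a multiplier of 0.

```r
# Shortened illustrative vectors (same idea as the full ones in the function)
prefixes <- c("n", "u", "m", "", "k")
logmultipliers <- c(-9, -6, -3, 0, 3)

# Look up the log multiplier for "m"
milli <- logmultipliers[match("m", prefixes)]
milli
## [1] -3

# The empty string is a perfectly good thing to search for
none <- logmultipliers[match("", prefixes)]
none
## [1] 0
```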

And now for some regular expressions

This is the point where we tackle the first of the three command families that I told you about. If you’re not familiar with “regular expressions” in R or another language (Perl, Python, whatever), this is your entry point into some very useful text searching capabilities. Basically, a regular expression is a way of specifying a search for a matching text pattern, and it’s used with a number of R commands (grep(), grepl(), gsub(), regexpr(), regexec(), etc.). We’ll use gsub() as an example, since it’s one that many people are familiar with. Suppose that I have the character string “This is not a test”, and I want to change it to “This is a test”. I can feed gsub() a pattern that I want to recognize and some text that I want to use to replace the pattern. For example:

my.string <- "This is not a test"
my.altered.string <- gsub("not ", "", my.string)   # replace "not " with an empty string, ""
my.altered.string

## [1] "This is a test"

That’s fine as far as it goes, but we will drive ourselves crazy if we’re limited to explicit matches. What if, for example, we also want to recognize “This is not…a test”, or “This is not my kind of a test”? We could write three different gsub statements, but that would get old fairly quickly. Instead of exactly matching the text, we’ll use a pattern. A regular expression that will match all three of our input statements is "not.+a ". Because the pattern swallows the final “a ”, we put it back in the replacement:

gsub("not.+a ", "a ", "This is not a test")

## [1] "This is a test"

gsub("not.+a ", "a ", "This is not my kind of a test")

## [1] "This is a test"

You can read the regular expression "not.+a " as “match the letters ‘not’, followed by one or more characters of any kind (‘.’ matches any character and ‘+’ means ‘one or more’), followed by an ‘a’ and a space”. You can find some very nice tutorials on regular expressions through Google, but for the purposes of this brief lesson I’ll give you a mini-cheat sheet that probably handles 90% of the regular expressions that I have to write:

Special Character	Meaning
.	match any character
\d	match any digit
\D	match anything that isn’t a digit
\s	match white space
\S	match anything that isn’t white space
\t	match a tab (less important in R, since you usually already have things in a data frame)
^	match the beginning of the string (i.e. “^Bob” matches “Bob my uncle” but not “Uncle Bob”)
$	match the end of the string
*	match the previous thing when it occurs 0 or more times
+	match the previous thing when it occurs 1 or more times
?	match the previous thing when it occurs 0 or 1 times
( .. )	(parentheses) enclose a group of choices or a particular substring in the match
|	match this OR that (e.g. “(Bob|Pete)” matches “Dr. Bob Smith” or “Dr. Pete Jones” but not “Dr. Sam Jones”)

It’s also important to remember for things like "\d" that R uses backslashes as the escape character…so you actually have to write a double backslash, like this: "\\d". A regular expression to match one or more digits would be "\\d+".
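To make the double-backslash rule concrete, here is a small sketch with made-up input strings (the variable names are only for illustration). It uses "\\d+" from the paragraph above and the "^Bob" pattern from the cheat sheet.

```r
# "\\d+" in R source is the regular expression \d+ (one or more digits)
masked <- gsub("\\d+", "#", "Room 101, bed 4")
masked
## [1] "Room #, bed #"

# "^" anchors the match to the beginning of the string
starts.with.bob <- grepl("^Bob", c("Bob my uncle", "Uncle Bob"))
starts.with.bob
## [1]  TRUE FALSE
```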

OK, back to work. Our next step is to remove all white space from the unit text (we want "dL" to be handled the same way as " dL" or "dL "), so we’ll add the following lines:

  unitin <- gsub("\\s", "", unitin)
  unitout <- gsub("\\s", "", unitout)

See what we’ve done? We asked gsub() to replace every instance of white space (the regular expression is "\\s") with "". Easy.

Paste, briefly

Next, we want to put together a regular expression that will detect any of our metric.prefixes or units.for.lab. To save typing, we’ll do it with paste(), the second of our three R command families for the day. You probably already know about paste(), but if not, it’s basically the way to join R character variables into one big string. paste("Hi", "there") gives “Hi there” (paste() defaults to joining things with a space), paste("Super", "cali", "fragi", "listic", sep="") changes the separator to "" and gives us “Supercalifragilistic”. paste0() does the same thing as paste(..., sep=""). The little nuance that it’s worth noting today is that we are going to join together elements from a single vector rather than a bunch of separate variables…so we need to use the collapse = "..." option, where we set collapse to whatever character we want. You remember from the last section that | (OR) lets us put a bunch of alternative matches into our regular expression, so we will join all of the prefixes like this:

  prefix.combo <- paste0(metric.prefixes, collapse = "|")
  prefix.combo

## [1] "y|z|a|f|p|n|u|m|c|d||da|h|k|M|G|T|P|E|Z|Y"

What we’re really after is a regular expression that matches the beginning of the string, followed by 0 or 1 matches to one of the prefixes, followed by a match to one of the units. Soooo…

  prefix.combo <- paste0(metric.prefixes, collapse = "|")
  unit.combo <- paste0(units.for.lab, collapse = "|")
  
  unit.search <- paste0("^(", prefix.combo, ")?(", unit.combo, ")$")

  unit.search

## [1] "^(y|z|a|f|p|n|u|m|c|d||da|h|k|M|G|T|P|E|Z|Y)?(mol|g|L|U|IU)$"

So much nicer than trying to type that by hand. Next we’ll do actual pattern matching using the regexec() command. regexec(), as the documentation so nicely states, returns a list of vectors of substring matches. This is useful, since it means that we’ll get one match for the prefix (in the first set of parentheses of our regular expression), and one match for the units (in the second set of parentheses of our regular expression). I don’t want to belabor the details of this, but if we feed the output of regexec() to the regmatches() command, we can pull out one string for our prefix and another for our units. Since these are returned as a list, we’ll also use unlist() to coerce our results into one nice vector. If the length of that vector is 0, indicating no match, an error is generated.

  match.unit.in <- unlist(regmatches(unitin, regexec(unit.search, unitin)))
  match.unit.out <- unlist(regmatches(unitout, regexec(unit.search, unitout)))
  
  if (length(match.unit.in) == 0) stop(paste0("Can't parse input units (", unitin, ")"))
  if (length(match.unit.out) == 0) stop(paste0("Can't parse output units (", unitout, ")"))
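As a standalone illustration of what regexec() plus regmatches() hands back, here is a sketch with a deliberately shortened pattern (unit.pattern is a hypothetical stand-in for the full unit.search string). A successful match yields the full match followed by one entry per set of parentheses; a failed match yields a zero-length vector.

```r
# Shortened, illustrative pattern: optional prefix, then base unit
unit.pattern <- "^(n|u|m|c|d)?(mol|g|L)$"

parts <- unlist(regmatches("mg", regexec(unit.pattern, "mg")))
parts
## [1] "mg" "m"  "g"

# Something we don't recognize gives an empty result
no.match <- unlist(regmatches("furlong", regexec(unit.pattern, "furlong")))
length(no.match)
## [1] 0
```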

If we were to take a closer look at match.unit.in, we would see that the first entry is the full match, the second entry is the prefix match, and the third entry is the unit match. To make sure that the units agree (i.e. that we’re not trying to convert grams into liters or something similar), we use:

  if (match.unit.in[3] != match.unit.out[3]) stop("Base units don't match")

…and then finish by using the match() command to find the index in the metric.prefixes vector corresponding to the correct prefix (note that if there’s no prefix matched, it matches the "" entry of the vector–very handy). That index allows us to pull out the corresponding log multiplier, and we then return the difference to get a conversion factor. Our final function looks like this [1]:

convert.one.unit <- function (unitin, unitout) {
  # the prefix codes for the metric system
  metric.prefixes <- c("y", "z", "a", "f", "p", "n", "u", "m", "c", "d", "", "da", "h", "k", "M", "G", "T", "P", "E", "Z", "Y")
  # ...and their corresponding log multipliers
  metric.logmultipliers <- c(-24, -21, -18, -15, -12, -9, -6, -3, -2, -1, 0, 1, 2, 3, 6, 9, 12, 15, 18, 21, 24)
  # The units that we'd like to detect.  I guess we could add distance, but that's not too relevant to most of the analytes that I can think of
  units.for.lab <- c("mol", "g", "L", "U", "IU")

  # remove white space
  unitin <- gsub("\\s", "", unitin)
  unitout <- gsub("\\s", "", unitout)
  
  # build the pieces of our regular expression...
  prefix.combo <- paste0(metric.prefixes, collapse = "|")
  unit.combo <- paste0(units.for.lab, collapse = "|")

  # ...and stitch it all together
  unit.search <- paste0("^(", prefix.combo, ")?(", unit.combo, ")$")

  # identify the matches
  match.unit.in <- unlist(regmatches(unitin, regexec(unit.search, unitin)))
  match.unit.out <- unlist(regmatches(unitout, regexec(unit.search, unitout)))
  
  if (length(match.unit.in) == 0) stop(paste0("Can't parse input units (", unitin, ")"))
  if (length(match.unit.out) == 0) stop(paste0("Can't parse output units (", unitout, ")"))
  
  if (match.unit.in[3] != match.unit.out[3]) stop("Base units don't match")
  
  # get the appropriate log multipliers
  logmult.in <- metric.logmultipliers[match(match.unit.in[2], metric.prefixes)]
  logmult.out <- metric.logmultipliers[match(match.unit.out[2], metric.prefixes)]
  
  # return the appropriate (log) conversion factor
  return(logmult.in - logmult.out)
}


# Try it out
convert.one.unit("mL","L")

## [1] -3

‘Apply’-ing yourself

We’re actually most of the way there now. The final family of commands that we’d like to use is apply(), with various flavors that allow you to repeatedly apply (no surprise) a function to many entries of a variable. Dan mentioned this in his last post. He also mentioned not understanding the bad press that for loops get when they’re small. I completely agree with him, but the issue tends to arise when you’re used to a language like C (yes, I know we’re talking about compiled vs. interpreted in that case), where your loops are blazingly fast. You come to R and try nested loops that run from 1:10000, and then you have to go for coffee. lapply(), sapply(), mapply(), apply(), etc. have advantages in the R world. Might as well go with the flow on this one.

We’re going to make a convert.multiple.units() function that takes unitsin and unitsout vectors, binds them together as two columns, and then runs apply() to feed them to convert.one.unit(). Because apply() lets us iterate a function over either dimension of a matrix, we can bind the two columns (a vector of original units and a vector of target units) and then iterate over each pair by rows (that’s what the 1 means as the second argument of apply(): it applies the function by row). If the anonymous function syntax throws you off…let us know in the comments, and we’ll cover it some time. For now, just understand that the last part of the line feeds values to the convert.one.unit() function.

convert.multiple.units <- function (unitsin, unitsout) {
  apply(cbind(unitsin, unitsout), 1, function (x) {convert.one.unit(x[1], x[2])})
}
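If the row-wise apply() is hard to picture, this sketch runs the same pattern with a harmless stand-in function (paste() instead of convert.one.unit(), and unit.pairs is just an illustrative name). Each call of the anonymous function receives one row, so x[1] is the original unit and x[2] is the target unit.

```r
# Two columns: original units and target units
unit.pairs <- cbind(c("mg", "dL"), c("g", "L"))

# MARGIN = 1 means "once per row"
described <- apply(unit.pairs, 1, function (x) paste(x[1], "->", x[2]))
described
## [1] "mg -> g" "dL -> L"
```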

Finally, we’ll go back to our original labunit.convert() function. Our overall plan is to split each unit by recognizing the “/” character using strsplit(). This returns a list of vectors of split groups (i.e. “mg/dL” becomes a list where the first element is a character vector (“mg”, “dL”)). We then make sure that the lengths match (i.e. if the input is “mg/dL” and the output is “g/mL” that’s OK, but if the output is “g” then that’s a problem), obtain all the multipliers, and then add them all up. We add because they’re logs…and actually we mostly subtract, because we’re dividing. For cuteness points, we return 2*x[1] - sum(x), which will accurately calculate not only conversions like mg→g and mg/dL→g/L, but will even do crazy stuff like U/g/L→mU/kg/dL. Don’t ask me why you’d want to do that, but it works. The final multiplier is used to convert the vector of values (good for you if you notice that we didn’t check to make sure that the length of the values vector matched the unitsin vector…but we can always recycle our values that way).

labunit.convert <- function (values, unitsin, unitsout) {
  insep <- strsplit(unitsin, "/")
  outsep <- strsplit(unitsout, "/")

  lengthsin <- sapply(insep, length)
  lengthsout <- sapply(outsep, length)
  
  if (!all(lengthsin == lengthsout)) stop("Input and output units can't be converted")

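  # SIMPLIFY = FALSE keeps one vector of log multipliers per unit string,
  # so vectors of simple units (e.g. "g" -> "mg") are also handled one by one
  multipliers <- mapply(convert.multiple.units, insep, outsep, SIMPLIFY = FALSE)
  
  final.multiplier <- sapply(multipliers, function (x) {2*x[1] - sum(x)})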
  
  return(values * 10^final.multiplier)
}
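The 2*x[1] - sum(x) trick is easier to trust with numbers in hand. This sketch works through g/dL→mg/L by hand; the values in x are the log multipliers that convert.one.unit() returns for each piece (g→mg is 3, dL→L is -1), typed in directly so the snippet stands alone.

```r
# Log multipliers for g/dL -> mg/L, one per "/"-separated piece
x <- c(3, -1)

# 2*x[1] - sum(x) is just x[1] - x[2] - x[3] - ... (numerator minus denominators)
final.log <- 2 * x[1] - sum(x)
final.log
## [1] 4

# So 2.3 g/dL becomes 2.3 * 10^4 mg/L
converted <- 2.3 * 10^final.log
converted
## [1] 23000
```

That agrees with the 2.3e+04 in the data frame example near the top of the post.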

OK, enough. Back over to you, Dan. We now have a piece of code that we can use when we start comparing PT data from different instruments. That’s the immediate plan for future posts [2], and before long there may even be an entry with nice graphics like those of my Canadian colleague.

-SRM

[1] I received a request to convert “G/L” to “M/mL”, which was interpreted as converting billions/L to millions/mL. This requires changing our convert.one.unit() function to handle a “no units” case. Actually, it’s not as difficult as it sounds; if we just add an empty string (i.e. "") to the end of the units.for.lab vector, our regular expression does the right thing. Your edited line would read units.for.lab <- c("mol", "g", "L", "U", "IU", ""). The reason this works, incidentally, is that there’s no overlap (except "") between the prefixes and the units, so the pattern match doesn’t have a chance to be confused.
[2] Following Dan’s lead, I should point out a major caveat to any such plans is James 4:13-15. Double extra credit if you are interested enough to look it up.

A Closer Look at TAT Time Dependence

August 28, 2015 / September 2, 2015, dtholmes@mail.ubc.ca

The Problem

We want to have a closer look at the time–dependence of turn around times (TATs). In particular, we would like to see if there is a significant trend in TAT over time (improvement or deterioration) and we would like the data to inform us of slowdowns and potentially unexpected problems that occur throughout each week. This should allow us to identify areas of the pre-analytical and/or analytical process that require attention.

My interest in this topic (which in the past seemed entirely banal) came from the frustration of receiving monthly TAT reports showing spaghetti plots produced in Excel. In examining these figures, it was entirely unclear to me whether any observed changes in the median (the only measure of central tendency provided) represented stochastic behaviour or a real problem. Ultimately, we want to be able to identify real problems in the preanalytical and analytical process but to do this, we need to visualize the data in a more sophisticated manner.

To do this, we are going to look at order–to–file times for a whole year for a nameless test X. You should be able to modify this approach to the manner in which your data is provided to you.

The real data was a little dirty but I have pre–cleaned it; this will have to be the topic of another post. In short, I purged the cancelled tests, removed duplicate records and limited my analysis to stat tests based on a stat flag that is stored in the laboratory information system (LIS). I won’t discuss this process here. The buffed–up file is named “2014_and_All_clean.txt”. This happens to be a tab–delimited txt file. For this reason, I used read.delim() rather than read.csv(). These are basically the same function with different defaults for the separator: one uses a comma and the other uses a tab. Please see our first post on TAT to understand how we are using the lubridate function ymd_hm().
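If you want to convince yourself that the two readers differ only in their defaults, here is a quick sketch that feeds each one a tiny made-up table through the text argument instead of a file (the column names and values are invented for the illustration).

```r
# Same two-column table, once tab-separated and once comma-separated
tab.version <- read.delim(text = "ordered\tresulted\n1\t2")
csv.version <- read.csv(text = "ordered,resulted\n1,2")

tab.version
##   ordered resulted
## 1       1        2

# Compare the two results
same.result <- isTRUE(all.equal(tab.version, csv.version))
same.result
## [1] TRUE
```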

Loading the Data

library(lubridate)
myData <- read.delim(file = "2014_and_All_clean.txt")
myData$ordered <- ymd_hm(myData$ordered)
myData$collected <- ymd_hm(myData$collected)
myData$received <- ymd_hm(myData$received)
myData$resulted <- ymd_hm(myData$resulted)
#confirm success
head(myData)

##               ordered           collected            received
## 1 2014-01-01 17:53:00 2014-01-01 17:54:00 2014-01-01 18:08:00
## 2 2014-01-01 15:10:00 2014-01-01 15:19:00 2014-01-01 15:21:00
## 3 2014-01-01 17:07:00 2014-01-01 17:15:00 2014-01-01 17:17:00
## 4 2014-01-01 18:20:00 2014-01-01 18:30:00 2014-01-01 18:35:00
## 5 2014-01-01 11:19:00 2014-01-01 11:25:00 2014-01-01 11:29:00
## 6 2014-01-01 11:00:00 2014-01-01 11:08:00 2014-01-01 11:11:00
##              resulted
## 1 2014-01-01 18:45:00
## 2 2014-01-01 16:33:00
## 3 2014-01-01 18:09:00
## 4 2014-01-01 19:45:00
## 5 2014-01-01 13:33:00
## 6 2014-01-01 11:47:00

Now we want to look at a TAT. As in our first post on this topic, we will look at the order–to–file time.

otf <- difftime(myData$resulted, myData$ordered,units = "min")
myData <- cbind(myData,otf)
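For a feel of what difftime() returns, here is a sketch using the first row of the data shown above, typed in by hand with base R’s as.POSIXct() rather than read from the file (the variable names are just for this example).

```r
# First record from head(myData): ordered 17:53, resulted 18:45
ordered.example  <- as.POSIXct("2014-01-01 17:53:00", tz = "UTC")
resulted.example <- as.POSIXct("2014-01-01 18:45:00", tz = "UTC")

otf.example <- difftime(resulted.example, ordered.example, units = "mins")
otf.example
## Time difference of 52 mins

# Strip the units when you need a plain number (e.g. for hist())
as.numeric(otf.example)
## [1] 52
```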

Sanity Check

Let’s just have a quick look at this to make sure nothing crazy is happening.

hist(as.numeric(myData$otf),xlim = c(0,200),breaks = 150, col = "orange", xlab = "TAT for X (min)", main = "Histogram of TAT for X")

summary(as.numeric(myData$otf))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   51.00   62.00   70.85   77.00 1656.00

Some Nutty Stuff

We do note one thing—there is a sample with a TAT of 1656 min. This is a little crazy so we could investigate those samples to see if this is real (because of a lost sample) or an artifact of an add–on analysis being misidentified as a stat or some other similar nonsensical event.

If you wanted to list all of these extreme outliers for the year, you could do so like this:

library("dplyr")

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

head(arrange(myData,desc(otf)),10)

##                ordered           collected            received
## 1  2014-09-21 21:50:00 2014-09-21 21:58:00 2014-09-21 22:04:00
## 2  2014-09-07 14:59:00 2014-09-08 12:00:00 2014-09-08 12:04:00
## 3  2014-03-18 13:45:00 2014-03-18 14:10:00 2014-03-18 14:29:00
## 4  2014-04-01 01:48:00 2014-04-01 02:03:00 2014-04-01 02:06:00
## 5  2014-04-01 02:21:00 2014-04-01 02:28:00 2014-04-01 02:31:00
## 6  2014-09-04 21:35:00 2014-09-04 21:45:00 2014-09-04 22:08:00
## 7  2014-10-08 10:38:00 2014-10-08 10:42:00 2014-10-08 10:46:00
## 8  2014-05-25 13:23:00 2014-05-25 13:35:00 2014-05-25 13:45:00
## 9  2014-08-06 16:50:00 2014-08-06 17:09:00 2014-08-06 17:15:00
## 10 2014-08-02 11:52:00 2014-08-02 21:30:00 2014-08-02 21:44:00
##               resulted       otf
## 1  2014-09-23 01:26:00 1656 mins
## 2  2014-09-08 13:26:00 1347 mins
## 3  2014-03-19 11:17:00 1292 mins
## 4  2014-04-01 17:29:00  941 mins
## 5  2014-04-01 16:20:00  839 mins
## 6  2014-09-05 11:07:00  812 mins
## 7  2014-10-08 22:41:00  723 mins
## 8  2014-05-26 01:20:00  717 mins
## 9  2014-08-07 03:46:00  656 mins
## 10 2014-08-02 22:44:00  652 mins

which gives you the TAT of the 10 (or whatever number you prefer) worst specimens for the year. Obviously when you do this kind of analysis on your own data, you will retain the specimen ID in the data set and you could explore what is going on here–whether these are add–ons etc. You discover interesting things when you dig into your data.

Time Dependence

But we are interested in time-dependence of the TAT, so let’s look at a scatterplot of the whole year.

start <- ceiling_date(min(myData$collected))
finish <- start+days(365)
plot(myData$collected,myData$otf, pch = 19, xlim = c(start,finish),col = "#00000020",cex = 0.5, ylim = c(0,200), ylab = "TAT of X (min)", xlab = "Date")

So, that’s pretty hard to draw inferences from. We can see that there are some outliers with inconceivably low TAT. We will have to investigate what is going on with those collections but not right now. These outliers will not affect the non–parametric measures of central tendency.
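A toy example shows why we can postpone worrying about those points. The numbers below are invented TATs in minutes, not values from the data set; one vector has a single implausibly low value swapped in.

```r
# Invented TATs (minutes); the second vector has one implausibly low value
typical      <- c(45, 51, 62, 77, 90)
with.outlier <- c(3, 51, 62, 77, 90)

# The median does not move...
c(median(typical), median(with.outlier))
## [1] 62 62

# ...but the mean does
c(mean(typical), mean(with.outlier))
## [1] 65.0 56.6
```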

Tunnelling Down

Let’s have a look at one week.

finish <- start + days(7)
ticks <- seq(from = start, to = finish, by = "days")
plot(myData$collected,myData$otf, pch = 19, xlim = c(start,finish), col = "#00000020", cex = 0.5, ylim = c(0,200), ylab = "TAT of X (min)", xlab = "", xaxt = "n")
axis.POSIXct(side = 1, ticks, at = ticks, las = 2, cex.axis = 0.6, col.axis = "gray30", format = "%b %d %Y")
mtext("Date of analysis", side = 1, line = 4)

See the first post on this topic for more information about the plotting parameters.

We can see there is a definite (and unsurprising) periodicity in the number of tests per hour. We can look at “volumes” another time. What we want to do now is look for time–dependence in the TAT so we can ultimately investigate what days of the week and times of the day are worse. But we don’t want to do this for one week—we want to do this for all weeks in the year. It would be nice, for example to plot all the Sundays, Mondays, Tuesdays etc overlapping and then see if we can see day–of–week and time–of–day trends.

Some More Lubridate Magic

Therefore, we need to assign every point in our myData dataframe a day of the week. The lubridate function wday() does this for us.

start

## [1] "2014-01-01 02:39:00 UTC"

wday(start, label = TRUE)

## [1] Wed
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat

#if you want them as numbers, just leave the label option out.
wday(start)

## [1] 4

So, January 1, 2014 was a Wednesday, which is the 4th day of the week. Let’s assign the day of the week for all our days and then bind this to our data.

weekday <- wday(myData$collected)
myData <- cbind(myData,weekday)
head(myData)

##               ordered           collected            received
## 1 2014-01-01 17:53:00 2014-01-01 17:54:00 2014-01-01 18:08:00
## 2 2014-01-01 15:10:00 2014-01-01 15:19:00 2014-01-01 15:21:00
## 3 2014-01-01 17:07:00 2014-01-01 17:15:00 2014-01-01 17:17:00
## 4 2014-01-01 18:20:00 2014-01-01 18:30:00 2014-01-01 18:35:00
## 5 2014-01-01 11:19:00 2014-01-01 11:25:00 2014-01-01 11:29:00
## 6 2014-01-01 11:00:00 2014-01-01 11:08:00 2014-01-01 11:11:00
##              resulted      otf weekday
## 1 2014-01-01 18:45:00  52 mins       4
## 2 2014-01-01 16:33:00  83 mins       4
## 3 2014-01-01 18:09:00  62 mins       4
## 4 2014-01-01 19:45:00  85 mins       4
## 5 2014-01-01 13:33:00 134 mins       4
## 6 2014-01-01 11:47:00  47 mins       4

Now let’s plot all of the Monday data for the whole year and look at the within–day trends for Mondays. We are going to convert all of the TATs and times at which they are collected to decimal numbers so we don’t run into any hassles. (Yes, I ran into hassles when I did not do this.)

This little function accomplishes this for us:

#convert the time to decimal number of hours since midnight for simplicity in plotting
timeconvert = function(t){hour(t)+minute(t)/60}
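To check the arithmetic by hand, here is a base-R sketch of the same calculation for a single collection time of 11:10. It uses as.POSIXlt() fields instead of lubridate’s hour() and minute(), purely so that it runs without any packages; the variable names are illustrative.

```r
# One collection time, 11:10 on a Monday
t.example <- as.POSIXlt("2014-01-06 11:10:00", tz = "UTC")

# Hours since midnight as a decimal: 11 + 10/60
hours.decimal <- t.example$hour + t.example$min / 60
hours.decimal
## [1] 11.16667
```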

mondayData <- subset(myData,weekday == 2)
mondayTimes <- timeconvert(mondayData$collected)
mondayData <- cbind(mondayData,times = mondayTimes)
mondayData$otf <- as.numeric(mondayData$otf)
head(mondayData)

##                 ordered           collected            received
## 225 2014-01-06 23:35:00 2014-01-06 23:30:00 2014-01-07 00:09:00
## 226 2014-01-06 11:02:00 2014-01-06 11:10:00 2014-01-06 11:14:00
## 227 2014-01-06 14:08:00 2014-01-06 14:20:00 2014-01-06 14:24:00
## 228 2014-01-06 10:06:00 2014-01-06 10:16:00 2014-01-06 10:19:00
## 229 2014-01-06 14:42:00 2014-01-06 15:09:00 2014-01-06 15:16:00
## 230 2014-01-06 20:02:00 2014-01-06 20:17:00 2014-01-06 20:22:00
##                resulted otf weekday    times
## 225 2014-01-07 00:46:00  71       2 23.50000
## 226 2014-01-06 12:42:00 100       2 11.16667
## 227 2014-01-06 16:02:00 114       2 14.33333
## 228 2014-01-06 11:44:00  98       2 10.26667
## 229 2014-01-06 16:04:00  82       2 15.15000
## 230 2014-01-06 20:55:00  53       2 20.28333

So this seems to have worked and now we can make a scatter plot.

Monday Monday, So Good to Me?

plot(mondayData$times,mondayData$otf, pch = 19, col = "#00000020", cex = 0.5, ylim = c(0,200), xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")

But now for the interesting part. We want to see how the median TAT is related to the time of day. We might want to look at, say, the running median over a one-hour window all day long. Notice that I have made the times, t, go from 0.5 to 23.5 because these are the only times for which a 60 min moving median can be calculated from Monday’s data alone. Otherwise we’d have this really annoying situation where we’d have to fetch data from the last half-hour of Sunday and the first half-hour of Tuesday. I don’t need that level of perfectionism at present.

plot(mondayData$times,mondayData$otf, pch = 19, col = "#00000020",cex = 0.5, ylim = c(30,100),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")

#60 min moving median calculation:
#Create a point at every minute of the day
t <- seq(from = 0.5,to = 23.5, by = 1/60)
#create an empty vector to store the data
med.tats <- vector()

for(i in 1:length(t)){
  #get all points within half an hour either side of the minute
  tats <- subset(mondayData,(mondayData$times >= (t[i]-0.5)) & (mondayData$times < (t[i]+0.5)))
  #alternatively using filter() from the dplyr package
  #tats <- filter(mondayData, times >= (t[i] - 0.5) & times < (t[i] + 0.5))
  med.tats[i] <- median(tats$otf)
}

lines(t,med.tats, col = "blue")
grid(NA,NULL)

Removing the For–ness

Many R folks don’t like for-loops and would rather use the apply() family of functions. I’m not sure I always understand the contempt towards loops for small, simple tasks, but if you wanted to accomplish the same looping task without using a for-loop, you could do as follows:

#filter() here is from the dplyr package
library(dplyr)
t <- seq(from = 0.5, to = 23.5, by = 1/60)
movingmed<-function(t,x){
  tats <- filter(x, times >= (t - 0.5) & times < (t + 0.5))
  median(tats$otf)
}
med.tats <- sapply(t, movingmed, x=mondayData)
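
If you would rather not load another package, the same idea can be sketched in base R with logical indexing and vapply(). The data frame toy and the names movingmed_base, t_toy and med_toy are made up for illustration; toy simply stands in for mondayData.

```r
# toy data standing in for mondayData (hypothetical values)
toy <- data.frame(
  times = c(0.2, 0.6, 0.9, 1.1, 1.4, 2.0),
  otf   = c(50, 60, 70, 80, 90, 100)
)

# median TAT in the window [t - 0.5, t + 0.5)
movingmed_base <- function(t, x) {
  in_window <- x$times >= (t - 0.5) & x$times < (t + 0.5)
  median(x$otf[in_window])
}

# evaluate the moving median at three window centres
t_toy <- c(0.5, 1.0, 1.5)
med_toy <- vapply(t_toy, movingmed_base, numeric(1), x = toy)
med_toy
```

vapply() behaves like sapply() here but insists that every call returns a single number, which catches surprises early.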

Smoothing

This approach is reasonable but the problem (I have found) is that it is computationally expensive on large data sets. For this reason, it is nice to use a canned smoothing algorithm like LOWESS, which is much faster. The parameter f of the lowess() function has a default of 2/3, which in our case results in a fit that is way too smooth. I played around with f until I got something that more or less tracked the 60-min moving median. There are many approaches to smoothing; don’t get lost in the vortex.

Lowess Smoothing

plot(mondayData$times,mondayData$otf, pch = 19, col = "#00000020",cex = 0.5, ylim = c(30,100),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")
#running median
#create a point at every minute of the day
t <- seq(from = 0.5,to = 23.5, by = 1/60)
med.tats <- vector()
for(i in 1:length(t)){
  #get all points within half an hour either side of the minute
  tats <- subset(mondayData,(mondayData$times >= (t[i] - 0.5)) & (mondayData$times < (t[i] + 0.5)))
  #alternatively using filter() from the dplyr package
  #tats <- filter(mondayData, times >= (t[i] - 0.5) & times < (t[i] + 0.5))
  med.tats[i] <- median(tats$otf)
}
lines(t,med.tats, col = "blue")
grid(col = "black")
mondayFit <- lowess(mondayData$times,mondayData$otf,f = 0.05)
lines(mondayFit,col = "red")
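
To get a feel for what f does before tuning it on real data, here is a small sketch on synthetic data. Everything in it is assumed for illustration: the 6-hour cycle, the noise level and the names fit_default and fit_tight. It is not derived from the LIS data.

```r
set.seed(1)

# synthetic data (not from the LIS): a 6-hour cycle plus noise
x <- runif(2000, min = 0, max = 24)
y <- 60 + 10 * sin(2 * pi * x / 6) + rnorm(2000, sd = 5)

fit_default <- lowess(x, y)            # default span, f = 2/3
fit_tight   <- lowess(x, y, f = 0.05)  # much smaller span

# vertical spread of each fitted curve: the tighter span follows the cycle,
# the default span flattens most of it out
diff(range(fit_default$y))
diff(range(fit_tight$y))
```

Smaller values of f use fewer neighbouring points per local fit, so the curve wiggles more; larger values average over more of the day.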

So, that’s cool. Now let’s loop over all the days of the week and make plots for each day.

#create a 2x4 plot window
par(mfrow = c(2,4))
#loop over days
for (i in 1:7){
  #make the lowess fit
  mydayData <- subset(myData,weekday == i)
  mydayTimes <- timeconvert(mydayData$collected)
  mydayData <- cbind(mydayData,times = mydayTimes)
  mydayFit <- lowess(mydayData$times,mydayData$otf,f = 0.05)
  plot(NA,NA,ylim = c(30,100), xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
  axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")
  
  #running median
  #create a point at every minute of the day
  t <- seq(from = 0.5,to = 23.5, by = 1/60)
  med.tats <- vector()
  for(j in 1:length(t)){
    #get all points within half an hour either side of the minute
    tats <- subset(mydayData,(mydayData$times >= (t[j] - 0.5)) & (mydayData$times < (t[j] + 0.5)))
    #alternatively using filter() from the dplyr package
    #tats <- filter(mydayData, times >= (t[j] - 0.5) & times < (t[j] + 0.5))
    med.tats[j] <- median(tats$otf)
  }
  lines(t,med.tats, col = "blue")
  grid(col = "black")
  mydayFit <- lowess(mydayData$times,mydayData$otf,f = 0.05)
  lines(mydayFit,col = "red")
  
  #put in horizontal gridding
  grid(NA,NULL)
  #you could add a legend on all the plots if you wanted
  #legend("bottomright", c("moving median","lowess"), lty = c(1,1), col = c("blue","red"), bty = "n")
}

Now, let’s overplot all the lowess fits on a single graph and see what practical observations we can make. I have increased the lowess() smoothing to make things easier to look at.

par(mfrow = c(1,1))
plot(0,0,ylim = c(40,80),xlim = c(0,24), xaxt = "n", xlab = "Hour of Day", ylab = "Turnaround Time (min)")
axis(side = 1, 0:24, at = 0:24, las = 2, cex.axis = 0.6, col.axis = "gray30")
for (i in 1:7){
  mydayData <- subset(myData,weekday == i)
  mydayTimes <- timeconvert(mydayData$collected)
  mydayData <- cbind(mydayData,times = mydayTimes)
  mydayFit <- lowess(mydayData$times,mydayData$otf,f = 0.1)
  lines(mydayFit, col = rainbow(7)[i])
}
legend("bottomright",as.character(wday(1:7,label = TRUE)), col = rainbow(7), lty = rep(1,7), cex = 0.5)

Observations

We can immediately see some issues. Weekends in the early hours of the morning are bad. 8 am is bad across all days. Noon is generally problematic and particularly so on Saturdays. There is also a slowdown in mid–afternoon and in the early evenings. Saturday midnight is the most problematic time, although the endpoints of the figure have fewer local weighting points and their confidence intervals are wider. This is something we can cover another time.

Remember, also, this is only the median we have looked at. Other horrors may be lurking in the 90th percentile.
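
The windowing idea used for the moving median extends directly to any other quantile. A minimal sketch on made-up numbers follows; toy and movingq are hypothetical names and the values are invented, not taken from the analysis above.

```r
# toy data standing in for one weekday's TATs (hypothetical values)
toy <- data.frame(
  times = c(8.1, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1, 9.2, 9.3, 9.4, 9.8),
  otf   = c(300, 40, 45, 50, 55, 60, 65, 70, 120, 200, 30)
)

# chosen quantile of TAT in the window [t - 0.5, t + 0.5)
movingq <- function(t, x, prob) {
  in_window <- x$times >= (t - 0.5) & x$times < (t + 0.5)
  unname(quantile(x$otf[in_window], probs = prob))
}

# median versus 90th percentile for the hour centred on 09:00
movingq(9.0, toy, 0.5)
movingq(9.0, toy, 0.9)
```

In this toy window the median looks unremarkable while the 90th percentile is more than double it, which is exactly the kind of thing a median-only report hides.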

Next time what we will do is move all of this TAT visualization to a 3D representation so we can more easily spot the problematic times.

-Dan

The lot is cast into the lap, but its every decision is from the LORD.

Proverbs 16:33

Generating Meaningful Turnaround Time Plots for Clinical Laboratory Medicine

August 11, 2015 dtholmes@mail.ubc.ca

The Problem

It is standard practice in Clinical Laboratory Medicine to monitor turn around times (TATs) for high volume tests like potassium (K), Troponin (Tn) and Hemoglobin (Hb). The term TAT is typically understood to mean “the time elapsed from when the doctor orders the test to the time the result is available in the Laboratory Information System (LIS)”. This of course does not take into account the lag between the result availability and the time when the physician logs in to view it and respond, but let’s just say that we are not there yet.

Traditionally, some dedicated soul would take .csv extracts from the LIS and do laborious things in Excel to generate the median TAT for the month for each test and each lab location for which they were responsible. Not only is such a process entirely manual and hard to automate, it also produces fairly uninformative output since (at least at our site) only medians were generated.

What really frustrates physicians is not where the median goes each month, it is the behaviour of, say, the 90th percentile of TAT or the outliers. These are the ones they remember.

R allows us to produce a much more informative figure in an automatable fashion. I provide here an example of a TAT figure for Hb with some statistical metrics included.

Look at the Data

Let’s start by reading in our data and looking at how it is structured.

myData<-read.csv(file = "Hb_TAT_data.csv",header = TRUE)
str(myData)
head(myData)

## 'data.frame':    4497 obs. of  6 variables:
##  $ specimenID: int  4221 5281 5308 5320 5356 5374 5375 5376 5241 5270 ...
##  $ ordered   : Factor w/ 4244 levels "2015-06-30 23:15",..: 4 68 86 92 115 126 127 128 46 61 ...
##  $ collected : Factor w/ 4245 levels "2015-07-01 00:02",..: 2 72 88 95 114 126 128 129 46 65 ...
##  $ received  : Factor w/ 4098 levels "2015-07-01 00:07",..: 2 69 81 88 107 119 120 122 45 62 ...
##  $ resulted  : Factor w/ 3765 levels "2015-07-01 00:11",..: 2 70 79 86 104 113 114 116 42 63 ...
##  $ result    : int  126 110 134 117 135 113 109 111 106 129 ...

##   specimenID          ordered        collected         received
## 1       4221 2015-06-30 23:28 2015-07-01 00:08 2015-07-01 00:29
## 2       5281 2015-07-01 14:12 2015-07-01 14:25 2015-07-01 14:30
## 3       5308 2015-07-01 15:56 2015-07-01 16:03 2015-07-01 16:10
## 4       5320 2015-07-01 16:57 2015-07-01 17:12 2015-07-01 17:20
## 5       5356 2015-07-01 20:00 2015-07-01 20:07 2015-07-01 20:18
## 6       5374 2015-07-01 21:37 2015-07-01 21:40 2015-07-01 21:44
##           resulted result
## 1 2015-07-01 00:37    126
## 2 2015-07-01 14:50    110
## 3 2015-07-01 16:16    134
## 4 2015-07-01 17:23    117
## 5 2015-07-01 20:23    135
## 6 2015-07-01 21:53    113

In this simplified anonymized data set we can see that we have 4497 observations with all of the necessary time points to calculate the turnaround times of the preanalytical and analytical processes. For the sake of this example, let’s focus on the order-to-file time.

We are going to need to handle the dates, for which there is only one package worth discussing, namely lubridate.

library(lubridate)

Basic Data Preparation

The first thing we need to do is to convert the order, collect, receive and result times to lubridate objects (i.e. time and date objects) so that we can do some algebra on them. We can see from the structure of myData that the order, collect, receive and result time points are in the format “YYYY-MM-DD HH:MM”. Therefore we can use the lubridate function ymd_hm() to perform the conversion.

myData$ordered<-ymd_hm(myData$ordered)
myData$collected<-ymd_hm(myData$collected)
myData$received<-ymd_hm(myData$received)
myData$resulted<-ymd_hm(myData$resulted)

Applying str() again to myData, you will see that the dates and times are now POSIXct, that is, they are now true date-time objects. This allows us to calculate the order-to-file TAT, which we can do with the difftime() function, expressing the result in minutes. We will also append the order-to-file (otf) TAT to the dataframe and do some quick sanity-checking.
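
Before applying difftime() to the whole data frame, here is a one-row illustration using the order and result times of the first specimen shown above. It uses base R only; the UTC time zone and the names ordered, resulted and otf_one are assumptions made for the sketch.

```r
# first row of the data shown above; the time zone (UTC) is an assumption
ordered  <- as.POSIXct("2015-06-30 23:28", tz = "UTC")
resulted <- as.POSIXct("2015-07-01 00:37", tz = "UTC")

# elapsed time from order to result, expressed in minutes
otf_one <- difftime(resulted, ordered, units = "mins")
otf_one

# strip the units when a plain number is needed, e.g. for hist() or summary()
as.numeric(otf_one)
```

The result is a difftime object that carries its units with it, which is why the code below wraps otf in as.numeric() before summarising it.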

Sanity Check

otf<-difftime(myData$resulted,myData$ordered,units = "min")
myData<-cbind(myData,otf)
summary(as.numeric(myData$otf))
hist(as.numeric(myData$otf), main = "Histogram of Hb TATs", breaks = 60, col = "darkred",xlim = c(0,200), xlab = "Order to File in Minutes")

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   22.00   30.00   36.58   43.00  694.00

This looks reasonable, so we can proceed with a TAT scatterplot.

Scatterplot

plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT",xlab = "Date",ylab = "TAT (min)")

Beautifying

This is kind of problematic because we really want to focus on results in the 0-200 minute range. There are some wild outliers, as occurs in real life because of instrument downtime, add-ons, etc. We can leave this matter for the present. Notice that I have displayed every day on the x-axis because this will allow us to investigate any problems we see. So we will adjust the ylim and we will also make the plot points semitransparent by using hexadecimal colour codes followed by a transparency value, also expressed in hexadecimal. Black is “#000000” and “20” is hexadecimal for 32, which is 32/255 or about 12.5% opacity.

#make the points semitransparent and a little smaller
plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT",xlab = "Date of Analysis",ylab = "TAT (min)", ylim = c(0,200), col = "#00000020",cex = 0.5)
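
If the eight-digit colour string looks opaque (pun intended), base R can decode it. This is a small sketch using col2rgb() and rgb() from grDevices; the variable name alpha_frac is only for illustration.

```r
# decode the colour used above into red, green, blue and alpha (each 0-255)
col2rgb("#00000020", alpha = TRUE)

# alpha as a fraction of fully opaque
alpha_frac <- col2rgb("#00000020", alpha = TRUE)["alpha", 1] / 255
alpha_frac

# build the same colour string from its components
rgb(0, 0, 0, alpha = 32, maxColorValue = 255)
```

Building the string with rgb() can be handy if you want to tune the transparency numerically rather than editing hex digits by hand.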

We’ll accept the fact that we know that there are a number of outliers. We could easily have a plot that displayed them or a tabular summary of them.

Now we will need to prepare the vector of daily medians, 10th and 90th percentiles to plot. We will loop through each day of the month and then calculate the statistics for that day.

#calculate the start and end date
#first collection in the month's reporting is always ordered in the previous month, so ceiling_date() forces startDate to be the first day of the month we are interested in. Same is true for endDate.
startDate <- ceiling_date(min(myData$ordered),"day")
endDate <- ceiling_date(max(myData$ordered),"day")
days <- seq (from = startDate, to = endDate, by = 'days')
#when we plot statistics, we want them at mid day
middays <- days + hours(12)
tenth<-vector()
fiftieth<-vector()
ninetieth<-vector()

for ( i in seq_along(days) ) {
  daysData<-subset(myData,myData$ordered >= days[i]&myData$ordered<days[i+1])
  tenth[i]<-quantile(daysData$otf,probs = 0.10)
  fiftieth[i]<-median(daysData$otf)
  ninetieth[i]<-quantile(daysData$otf,probs = 0.90)
}

quantileData<-data.frame(middays,tenth,fiftieth,ninetieth)

plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT",xlab = "Date of Analysis", ylab = "TAT (min)", ylim = c(0,200), col = "#00000020",cex = 0.5)
lines(quantileData$middays,quantileData$tenth, col = "red")
lines(quantileData$middays,quantileData$ninetieth, col = "red")
lines(quantileData$middays,quantileData$fiftieth, col = "blue")
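
For comparison, the daily statistics can also be computed without an explicit loop by grouping on the calendar date. This is only a sketch on made-up values using base R's tapply(); the names toy, day and daily are hypothetical, and the UTC time zone is an assumption.

```r
# toy data (hypothetical): order times over two days and TATs in minutes
toy <- data.frame(
  ordered = as.POSIXct(c("2015-07-01 08:00", "2015-07-01 12:00",
                         "2015-07-01 18:00", "2015-07-02 09:00",
                         "2015-07-02 15:00"), tz = "UTC"),
  otf = c(20, 30, 40, 25, 35)
)

# group by calendar day, then summarise each group
day <- as.Date(toy$ordered, tz = "UTC")
daily <- data.frame(
  day       = sort(unique(day)),
  tenth     = as.numeric(tapply(toy$otf, day, quantile, probs = 0.10)),
  fiftieth  = as.numeric(tapply(toy$otf, day, median)),
  ninetieth = as.numeric(tapply(toy$otf, day, quantile, probs = 0.90))
)
daily
```

Unlike the loop, this only produces rows for days that actually have data, so a day with no orders would be missing rather than NA.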

But this is not all that easy to look at. First, it’s kind-of ugly and second, if we find a problem date, we can’t read it from the figure. So let’s start by fixing the x-axis labels:

plot(myData$ordered,myData$otf, pch = 19, main = "Hemoglobin TAT", xlab = "", ylab = "TAT (min)", ylim = c(0,200), col = "#00000020",cex = 0.4,xaxt = "n")
#don't plot first day of next month on axis
axis.POSIXct(side = 1, head(quantileData$middays, -1), at = head(quantileData$middays, -1), las = 2, cex.axis = 0.6, col.axis = "gray30", format = "%b %d %Y")
#allows me to move the xlab down manually so as not to overwrite the dates.
mtext("Date of analysis", side = 1, line = 4)

To paint the central 80% as a band, we will need to use the polygon() function. I am going to write a function to which an x-vector and two y-vectors are supplied and which then fills the area between them with a supplied colour. Naturally, the three vectors must have the same length.

#colours in the space between two curves.
fillitin<-function(x,ymin,ymax,colour){
  #stop one short of the end because each polygon spans points i and i+1
  for (i in seq_len(length(x)-1)){
    #define the x coordinates of the vertices of the polygon
    xvert<-c(x[i],x[i],x[i+1],x[i+1])
    #define the y coordinates of the vertices of the polygon
    yvert<-c(ymin[i],ymax[i],ymax[i+1],ymin[i+1])
    polygon(xvert,yvert, col = colour, border = NA)
  }
}

#now add these effects to the existing figure
fillitin(middays,tenth,ninetieth,"#FF000020")
lines(quantileData$middays,quantileData$fiftieth, col = "blue")
lines(quantileData$middays,quantileData$tenth, col = "red")
lines(quantileData$middays,quantileData$ninetieth, col = "red")
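
An alternative to drawing one small polygon per interval is to trace the whole band as a single outline: along the lower curve, then back along the upper one. The sketch below uses made-up curves; band_vertices, low and high are hypothetical names, not part of the code above.

```r
# build the outline of the band: out along the lower curve, back along the upper
band_vertices <- function(x, ymin, ymax) {
  list(x = c(x, rev(x)), y = c(ymin, rev(ymax)))
}

# toy curves (hypothetical values)
x    <- 1:5
low  <- c(10, 12, 11, 13, 12)
high <- c(30, 35, 33, 36, 34)

v <- band_vertices(x, low, high)

# empty plot, then the shaded band and its two edges
plot(x, high, type = "n", ylim = c(0, 40), xlab = "Day", ylab = "TAT (min)")
polygon(v$x, v$y, col = "#FF000020", border = NA)
lines(x, low, col = "red")
lines(x, high, col = "red")
```

One caveat with this approach: any NA entries in the curves (for example a day with no data) would need to be dropped first, whereas the interval-by-interval version simply skips them.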

Final Product

Now we should just finish it off with a legend.

legend("topright",c("Median","Central 80%"),lty = c(1,1),col = c("blue","red"),inset = .05)

And that is a little more informative. There are many features you could add from this point, like smoothing, statistical analysis or an outlier report. You could also loop over different tests, examine both the preanalytical and analytical processes at different locations, and produce a PDF report using Markdown for all the institutions you look after.

-Dan

“The LORD detests dishonest scales, but accurate weights find favor with him.”

Proverbs 11:1