Quality Control – The Lab-R-torian

Calculate all the CVs of all the QC Levels of all the Methods of all the Instruments at all the Sites all at once … with Sunquest LIS and dplyr

February 4, 2020 / November 24, 2020 dtholmes@mail.ubc.ca

Background

As part of our lab accreditation requirements, we have to provide measurement uncertainty estimates for all tests at all hospital sites. As you might imagine, with thousands of test codes in Sunquest LIS, getting all the coefficients of variation (CVs) is a daunting task for the quality technologist. As it turns out, by capturing the ssh session in a .txt file, you can use R’s dplyr package to do this in a few lines of code.

Getting the Raw Data

You need to get the raw data from Sunquest. You can capture the telnet session (yes… older versions of Sunquest use telnet and pass patient information and user passwords unencrypted across the hospital network o_O) or the ssh session to a file using Esker SmarTerm, which Sunquest packages with their product and refers to as “roll-n-scroll”. People disparage SmarTerm as an old “DOS tool”, even though Sunquest is hosted on the AIX operating system. SmarTerm access to Sunquest is a gazillion times faster than the GUI and permits us to capture the raw QC data we need. To capture the session, select from the dropdown menu as shown here:

SQ Screenshot1

If you are using Mac OS or Linux OS, you can also capture the ssh session by connecting from the terminal and using tee to dump the session to a file.

ssh user@serverIPaddress | tee captured_session.txt

Once you have connected, use the QC function and select output printer 0 (meaning the screen) and make these selections, changing the dates as appropriate:

SQ Screenshot1

If you make no selections at all for any of:

TEST:
WORKSHEET:
METHOD:
CONTROL:
SHIFT #:
TECH:
TESTS REQUESTED:

then you will extract everything, which is what you want and which will make for a very big .txt file. There will be a delay and then thousands of QC results will dump to the screen and to your file. When this is complete, end your SmarTerm or ssh or telnet (cringe) session. I saved my text dump as raw_SQ8.txt.

Getting it into R and parsing it

Your data will come out as a fixed-width file with no delimiters. It will also have a bunch of junk at the top and bottom of the file detailing your commands from the start and end of the session. These need to be discarded. I just used grep() to find all the lines with the appropriate date pattern. After reading it in, because I am lazy, I wrote it back out and read it in again with read.fwf().

library(tidyverse)
library(lubridate)
library(knitr)

# Note to my friend SK - yes... this is mostly in base-R... 

# create a connection
con <- file(file.path("raw_SQ8.txt"))
raw.qc.data <- readLines(con)
close(con)
#find good rows
good.data <- grep("[0-9]{2}(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[2][0][0-9]{6}",raw.qc.data)
raw.qc.data <- raw.qc.data[good.data]
#remove a screwball encoding character
raw.qc.data[1] <- substr(raw.qc.data[1],6,nchar(raw.qc.data[1]))
con <- file("temp.txt")
#rewrite the file with no garbage in it.
writeLines(raw.qc.data, con)
close(con)
raw.qc.data <- read.fwf("temp.txt",c(6,6,6,20,13,6,2,15,100))
file.remove("temp.txt")
names(raw.qc.data) <- c("test.code","instr.code","qc.name","qc.expire",
                        "date.performed","tech.code","shift",
                        "result","modifier")
raw.qc.data <- data.frame(lapply(raw.qc.data, trimws))
raw.qc.data$result <- as.numeric(as.character(raw.qc.data$result))
raw.qc.data$date.performed <- dmy_hm(raw.qc.data$date.performed)
raw.qc.data$tech.code <- as.numeric(factor(raw.qc.data$tech.code)) #anonymize tech codes as integer labels
raw.qc.data <- arrange(raw.qc.data, instr.code, test.code)
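The round trip through temp.txt can be skipped, because read.fwf() also accepts a connection. Here is a minimal sketch, assuming the same column widths as above. The sample lines are made up for illustration (the 15Nov20190917 date layout is inferred from the grep pattern) and are not real Sunquest output:

```r
# Column widths as used in the post
widths <- c(6, 6, 6, 20, 13, 6, 2, 15, 100)

# Hypothetical captured session: junk lines around two QC result lines
fmt <- "%-6s%-6s%-6s%-20s%-13s%-6s%-2s%-15s"
session.lines <- c(
  "FUNCTION: QC",
  sprintf(fmt, "BCL", "JBGAS", "RAD1", "R0173 EXP MAR 2021",
          "15Nov20190917", "68", "2", "122"),
  sprintf(fmt, "BCL", "JBGAS", "RAD1", "R0173 EXP MAR 2021",
          "15Nov20192051", "68", "3", "121"),
  "END OF REPORT"
)

# Keep only the lines that contain the ddMonyyyyhhmm date stamp
date.pattern <- "[0-9]{2}(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)20[0-9]{6}"
good.lines <- session.lines[grepl(date.pattern, session.lines)]

# Parse straight from memory through a text connection (no temp file)
tc <- textConnection(good.lines)
parsed <- read.fwf(tc, widths = widths, stringsAsFactors = FALSE)
close(tc)

parsed[] <- lapply(parsed, trimws)   # strip the padding from every column
parsed$V8 <- as.numeric(parsed$V8)   # the result column
```

The columns come back as V1 to V9, so the names() assignment from the main script applies unchanged.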

Now that all the data munging is done, we can examine the data:

test.code	instr.code	qc.name	qc.expire	date.performed	tech.code	shift	result	modifier
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-15 09:17:00	68	2	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-15 20:51:00	68	3	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-15 21:47:00	68	3	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-15 21:50:00	68	3	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-17 07:10:00	15	1	122	NA
BCL	JBGAS	RAD1	R0173 EXP MAR 2021	2019-11-17 07:11:00	15	1	122	NA

And finally, we can make the dplyr magic happen and discard results for which the counts are too small, which I have chosen to be <20:

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%
  group_by(instr.code,test.code,qc.name,qc.expire) %>%
  summarise(median = median(result),
            IQR = IQR(result),
            mean = mean(result),
            SD = sd(result),
            min = min(result),
            max = max(result),
            CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),
            count = n()) %>%
  filter(count >= 20) %>%
  arrange(instr.code, test.code, median) -> summary.table
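If you want to spot-check one row of the summary by hand, the same CV arithmetic can be written in base R. This is only a sketch: cv.percent() is a hypothetical helper (not part of the pipeline above) and the toy values are invented.

```r
# Hypothetical helper: CV in percent, or NA when there are too few results
cv.percent <- function(x, min.n = 20) {
  x <- x[!is.na(x)]
  if (length(x) < min.n) return(NA_real_)
  round(sd(x) / mean(x) * 100, 2)
}

# Invented QC results for two control levels
toy <- data.frame(
  qc.name = rep(c("LOW", "HIGH"), each = 3),
  result  = c(1, 2, 3, 10, 10, 13)
)

# One CV per control level; min.n lowered so the toy data qualifies
toy.cv <- tapply(toy$result, toy$qc.name, cv.percent, min.n = 3)
toy.cv

# With the default threshold of 20, the same data is rejected
cv.percent(toy$result[toy$qc.name == "LOW"])
```

For the LOW level the SD is 1 and the mean is 2, so the CV is 50%.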

Which gives us output like this:

head(summary.table)

instr.code	test.code	qc.name	qc.expire	median	IQR	mean	SD	min	max	CV	count
JBGAS	BCL	RAD3	R0141 EXP SEP 2017	65.0	1.000	65.145454	0.6503043	63.0	66.0	1.00	55
JBGAS	BCL	RAD2	R0175 EXP MAR 2021	97.0	0.000	97.128205	0.3364820	97.0	98.0	0.35	78
JBGAS	BCL	RAD1	R0173 EXP MAR 2021	122.0	0.000	122.122807	0.5691527	121.0	124.0	0.47	57
JBGAS	BGLUC	RAD1	R0173 EXP MAR 2021	1.5	0.000	1.507017	0.0257713	1.5	1.6	1.71	57
JBGAS	BGLUC	RAD2	R0175 EXP MAR 2021	5.6	0.075	5.585897	0.0639081	5.4	5.7	1.14	78
JBGAS	BGLUC	RAD3	R0141 EXP SEP 2017	13.7	0.100	13.763636	0.1310409	13.4	14.1	0.95	55

This permits us to toss out results with low counts. But what about handling outliers? Well, we can calculate the z-scores of the raw data by joining the mean and SD results back to the raw data.

raw.qc.data %>%
  left_join(select(summary.table,c(instr.code:qc.expire, mean, SD)),
             by = c("test.code","instr.code", "qc.name", "qc.expire")) %>%
  mutate(z.score = (result - mean)/SD) -> raw.qc.data

This will permit you to suppress results outside a certain z-score. So, let’s suppress all results with an undefined z-score and all results with an absolute z-score of 4 or more:

raw.qc.data %>%
  drop_na(z.score) %>%
  filter(abs(z.score) < 4) -> raw.qc.data
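To see what the z-score filter does on something small, here is a base-R sketch with invented numbers. It uses ave() to compute the z-scores within each group rather than the join used above, which is just a different route to the same quantity.

```r
# Invented data: group A has 19 identical results and one wild value,
# group B is constant, so its SD is 0 and its z-scores are undefined
toy <- data.frame(
  grp    = c(rep("A", 20), rep("B", 3)),
  result = c(rep(10, 19), 100, 5, 5, 5)
)

# z-score of every result relative to its own group
toy$z.score <- ave(toy$result, toy$grp,
                   FUN = function(x) (x - mean(x)) / sd(x))

# Drop undefined z-scores and anything with |z| >= 4
kept <- toy[!is.na(toy$z.score) & abs(toy$z.score) < 4, ]

max(toy$z.score, na.rm = TRUE)   # z-score of the wild value
nrow(kept)                       # the 19 well-behaved results remain
```

One thing worth knowing: with the sample SD, no single value in a group of n results can have an absolute z-score above (n - 1)/sqrt(n). That bound only reaches 4 once n is 18, so a cutoff of 4 cannot remove anything from smaller groups. The count threshold of 20 used above keeps us clear of that.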

Now, we can re-run the dplyr summary:

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%
  group_by(instr.code,test.code,qc.name,qc.expire) %>%
  summarise(median = median(result),
            IQR = IQR(result),
            mean = mean(result),
            SD = sd(result),
            min = min(result),
            max = max(result),
            CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),
            count = n()) %>%
  filter(count >= 20) %>%
  arrange(instr.code, test.code, median) -> summary.table.no.outliers

And now we have a summary of every QC CV in our Sunquest system with outliers suppressed:

head(summary.table.no.outliers)

instr.code	test.code	qc.name	qc.expire	median	IQR	mean	SD	min	max	CV	count
JBGAS	BCL	RAD3	R0141 EXP SEP 2017	65.0	1.000	65.145454	0.6503043	63.0	66.0	1.00	55
JBGAS	BCL	RAD2	R0175 EXP MAR 2021	97.0	0.000	97.128205	0.3364820	97.0	98.0	0.35	78
JBGAS	BCL	RAD1	R0173 EXP MAR 2021	122.0	0.000	122.122807	0.5691527	121.0	124.0	0.47	57
JBGAS	BGLUC	RAD1	R0173 EXP MAR 2021	1.5	0.000	1.507017	0.0257713	1.5	1.6	1.71	57
JBGAS	BGLUC	RAD2	R0175 EXP MAR 2021	5.6	0.075	5.585897	0.0639081	5.4	5.7	1.14	78
JBGAS	BGLUC	RAD3	R0141 EXP SEP 2017	13.7	0.100	13.763636	0.1310409	13.4	14.1	0.95	55

And there we have it:

SQ Screenshot1

Now I can write the output file:

write_csv(summary.table.no.outliers, "QC_summary.csv")

With dplyr, if you direct your energies to the right place, you reap much. Similarly:

“But seek ye first the kingdom of God, and his righteousness; and all these things shall be added unto you.”

Matthew 6:33

Parse an Online Table into an R Dataframe – Westgard’s Biological Variation Database

August 14, 2017 dtholmes@mail.ubc.ca

Background

From time to time I have wanted to bring an online table into an R dataframe. While in principle the data can be cut and pasted into Excel, sometimes the table is very large and sometimes the columns get goofed up in the process. Fortunately, there are a number of R tools for accomplishing this. I am just going to show one approach using the rvest package. The rvest package also makes it possible to interact with forms on webpages to request specific material which can then be scraped. I think you will see the potential if you look here.

In our (simple) case, we will apply this process to Westgard's desirable assay specifications as shown on his website. The goal is to parse out the biological variation tables, get them into a dataframe and then write to csv or xlsx.

Reading in the Data

The first thing to do is to load the rvest and httr packages and define an html session with the html_session() function (renamed session() in rvest 1.0.0, where html_session() still works but is deprecated).

library(rvest)
library(httr)
wg <- html_session("https://www.westgard.com/biodatabase1.htm", user_agent("LabRtorian"))

Now looking at the webpage, you can see that there are 8 columns in the tables of interest. So, we will define an empty dataframe with 8 columns.

#define empty table to hold all the content
biotable = data.frame(matrix(NA,0, 8))

We need to know which part of the document to scrape. This is a little obscure, but following the instructions in this post, we can determine that the xpaths we need are:

/html/body/div[1]/div[3]/div/main/article/div/table[1]

/html/body/div[1]/div[3]/div/main/article/div/table[2]

/html/body/div[1]/div[3]/div/main/article/div/table[3]

…

etc.

There are 8 such tables in the whole webpage. We can define a character vector for these as such:

xpaths <- paste0("/html/body/div[1]/div[3]/div/main/article/div/table[", 1:8, "]")

Now we make a loop to scrape the 8 tables and, with each iteration of the loop, append the scraped subtable to the main dataframe called biotable using the rbind() function. We have to use the parameter fill = TRUE in the html_table() function because the tables do not always have a uniform number of columns.

for (j in 1:8){                
  subtable <- wg %>%
  read_html() %>%
  html_nodes(xpath =  xpaths[j]) %>%
  html_table(., fill = TRUE) 
  subtable <- subtable[[1]]
  biotable <- rbind(biotable,subtable)
}
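If the absolute XPaths ever break (they are tied to the page layout), one alternative is to select every table element with a CSS selector and bind the results in one go. This is only a sketch on a made-up inline page, and it assumes that every table on the page is one you want, which may not hold for the real site. I have left out fill = TRUE here because, as far as I know, rvest 1.0 and later always fills missing cells.

```r
library(rvest)

# Made-up page with two small tables, standing in for the real website
toy.html <- paste0(
  "<html><body>",
  "<table>",
  "<tr><td></td><td>Analyte</td><td>CVI</td></tr>",
  "<tr><td>S-</td><td>Analyte A</td><td>10.0</td></tr>",
  "</table>",
  "<table>",
  "<tr><td></td><td>Analyte</td><td>CVI</td></tr>",
  "<tr><td>U-</td><td>Analyte B</td><td>20.0</td></tr>",
  "</table>",
  "</body></html>"
)

page <- read_html(toy.html)

# html_table() on a set of nodes returns a list with one table per node
tables <- html_table(html_nodes(page, "table"))

# Stack the tables, then drop the repeated header rows
combined <- as.data.frame(do.call(rbind, tables))
combined <- combined[combined$X2 != "Analyte", ]
combined
```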

Clean Up

Now that we have the raw data out, we can have a quick look at it:

X1	X2	X3	X4	X5	X6	X7	X8
	Analyte	Number of Papers	Biological Variation	Biological Variation	Desirable specification	Desirable specification	Desirable specification
	Analyte	Number of Papers	CVI	CVg	I(%)	B(%)	TE(%)
S-	11-Desoxycortisol	2	21.3	31.5	10.7	9.5	27.1
S-	17-Hydroxyprogesterone	2	19.6	50.4	9.8	13.5	29.7
U-	4-hydroxy-3-methoximandelate (VMA)	1	22.2	47.0	11.1	13.0	31.3
S-	5' Nucleotidase	2	23.2	19.9	11.6	7.6	26.8
U-	5'-Hydroxyindolacetate, concentration	1	20.3	33.2	10.2	9.7	26.5
S-	α1-Acid Glycoprotein	3	11.3	24.9	5.7	6.8	16.2
S-	α1-Antichymotrypsin	1	13.5	18.3	6.8	5.7	16.8
S-	α1-Antitrypsin	3	5.9	16.3	3.0	4.3	9.2

We can see that we need to define column names and we need to get rid of some rows containing extraneous column header information. There are actually 8 such sets of headers to remove.

table.header <- c("Sample", "Analyte" ,"NumPapers", "CVI", "CVG", "I", "B","TE")
names(biotable) <- table.header

Let's now find rows we don't want and remove them.

for.removal <- grep("Analyte", biotable$Analyte)
biotable <- biotable[-for.removal,]

You will find that the table has missing data, which is written as “---”. This should now be replaced by NA, and the row names should be reset to sequential integers. Also, we will remove all the minus signs after the specimen type. I'm not sure what they add.

biotable[biotable == "---"] <- NA
row.names(biotable) <- 1:nrow(biotable)
biotable$Sample <- gsub("-", "", biotable$Sample, fixed = TRUE)

Check it Out

Just having another look at the first 10 rows:

Sample	Analyte	NumPapers	CVI	CVG	I	B	TE
S	11-Desoxycortisol	2	21.3	31.5	10.7	9.5	27.1
S	17-Hydroxyprogesterone	2	19.6	50.4	9.8	13.5	29.7
U	4-hydroxy-3-methoximandelate (VMA)	1	22.2	47.0	11.1	13.0	31.3
S	5' Nucleotidase	2	23.2	19.9	11.6	7.6	26.8
U	5'-Hydroxyindolacetate, concentration	1	20.3	33.2	10.2	9.7	26.5
S	α1-Acid Glycoprotein	3	11.3	24.9	5.7	6.8	16.2
S	α1-Antichymotrypsin	1	13.5	18.3	6.8	5.7	16.8
S	α1-Antitrypsin	3	5.9	16.3	3.0	4.3	9.2
S	α1-Globulins	2	11.4	22.6	5.7	6.3	15.7
U	α1-Microglobulin, concentration, first morning	1	33.0	58.0	16.5	16.7	43.9

Now examining the structure:

str(biotable)

## 'data.frame':    370 obs. of  8 variables:
##  $ Sample   : chr  "S" "S" "U" "S" ...
##  $ Analyte  : chr  "11-Desoxycortisol" "17-Hydroxyprogesterone" "4-hydroxy-3-methoximandelate (VMA)" "5' Nucleotidase" ...
##  $ NumPapers: chr  "2" "2" "1" "2" ...
##  $ CVI      : chr  "21.3" "19.6" "22.2" "23.2" ...
##  $ CVG      : chr  "31.5" "50.4" "47.0" "19.9" ...
##  $ I        : chr  "10.7" "9.8" "11.1" "11.6" ...
##  $ B        : chr  "9.5" "13.5" "13.0" "7.6" ...
##  $ TE       : chr  "27.1" "29.7" "31.3" "26.8" ...

It's kind-of undesirable to have numbers as characters so…

#convert appropriate columns to numeric
biotable[,3:8] <- lapply(biotable[3:8], as.numeric)
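The clean-up steps are easy to try on a toy dataframe first. The rows below are invented. One small variation from the code above: sub("-$", ...) removes only a trailing hyphen, which would matter if a specimen code ever contained an internal hyphen (the "X-Y-" code here is hypothetical).

```r
# Invented rows in the same shape as the scraped table
toy <- data.frame(
  Sample  = c("S-", "U-", "X-Y-"),
  Analyte = c("Analyte A", "Analyte B", "Analyte C"),
  CVI     = c("21.3", "---", "5.9"),
  stringsAsFactors = FALSE
)

toy[toy == "---"] <- NA                   # missing-data marker becomes NA
toy$Sample <- sub("-$", "", toy$Sample)   # strip the trailing hyphen only
toy$CVI <- as.numeric(toy$CVI)            # characters to numbers

str(toy)
```

Replacing the marker with NA before calling as.numeric() also avoids the coercion warning you would otherwise get.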

Write the Data

Using the xlsx package, you can output the table to an Excel file in the current working directory.

library(xlsx)
write.xlsx(biotable,
            file = "Westgard_Biological_Variation.xlsx",
            row.names = FALSE)

If you are having trouble getting xlsx to install, then just write as csv.

write.csv(biotable,
            file = "Westgard_Biological_Variation.csv",
            row.names = FALSE)

Conclusion

You can now use the same general approach to parse any table you have web access to, no matter how small or big it is. Here is a complete script in one place:

library(httr)
library(rvest)
library(xlsx)

wg <- html_session("https://www.westgard.com/biodatabase1.htm", user_agent("yournamehere"))
xpaths <- paste0("/html/body/div[1]/div[3]/div/main/article/div/table[", 1:8, "]")

#define empty dataframe
biotable = data.frame(matrix(NA,0, 8))

#loop over the 8 html tables
for (j in 1:8){                
  subtable <- wg %>%
  read_html() %>%
  html_nodes(xpath =  xpaths[j] ) %>%
  html_table(., fill = TRUE) 
  subtable <- subtable[[1]]
  biotable <- rbind(biotable,subtable)
}

table.header <- c("Sample", "Analyte" ,"NumPapers", "CVI", "CVG", "I", "B","TE")
names(biotable) <- table.header

#remove extraneous rows
for.removal <- grep("Analyte", biotable$Analyte)
biotable <- biotable[-for.removal,]

#make missing data into NA
biotable[ biotable == "---" ] <- NA
row.names(biotable) <- 1:nrow(biotable)

#convert appropriate columns to numeric
biotable[,3:8] <- lapply(biotable[3:8], as.numeric)

#get rid of minus signs in column 1
biotable$Sample <- gsub("-", "", biotable$Sample, fixed = TRUE)

write.xlsx(biotable,
            file = "Westgard_Biological_Variation.xlsx",
            row.names = FALSE)

write.csv(biotable,
            file = "Westgard_Biological_Variation.csv",
            row.names = FALSE)

Parting Thought on Tables

You prepare a table before me in the presence of my enemies. You anoint my head with oil; my cup overflows.

(Psalm 23:5)

Determine the CV of a Calculated Lab Reportable – Bioavailable Testosterone

August 7, 2017 dtholmes@mail.ubc.ca

Background

At the AACC meeting last week, some of my friends were bugging me that I had not made a blog post in 10 months. Without getting into it too much, let's just say I can blame Cerner. Thanks also to a prod from a friend, here is an approach to a fairly common problem.

We all report calculated quantities out of our laboratories, quantities such as LDL cholesterol, non-HDL cholesterol, aldosterone:renin ratio, free testosterone, eGFR etc. How does one determine the precision (i.e. imprecision) of a calculated quantity? While earlier in my life, I might go to the trouble of trying to do such calculations analytically using the rules of error propagation, in my later years, I am more pragmatic and I'm happy to use a computational approach.

In this example, we will model the precision in calculated bioavailable testosterone (CBAT). Without explanation, I provide an R function for CBAT (and free testosterone) where testosterone is reported in nmol/L, sex hormone binding globulin (SHBG) is reported in nmol/L, and albumin is reported in g/L. Using the Vermeulen Equation as discussed in this publication, you can calculate CBAT as follows:

cbat <- function(TT,SHBG,ALB = 43){
    Kalb <- 3.6*10^4
    Kshbg <- 10^9
    N <- 1 + Kalb*ALB/69000
    a <- N*Kshbg
    b <- N + Kshbg*(SHBG - TT)/10^9
    c <- -TT/10^9
    FT <- (-b + sqrt(b^2 - 4*a*c))/(2*a)*10^9
    cbat <- N*FT
    return(list(free.T = FT, cbat = cbat))
}
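For anyone wondering where a, b and c in the function come from, here is my reading of the algebra, reconstructed from the code rather than quoted from the publication. It assumes one testosterone binding site per SHBG molecule and treats the albumin concentration as unchanged by binding. All concentrations are in mol/L, which is why the code divides nmol/L values by 10^9 and converts albumin with ALB/69000.

```latex
\begin{align}
TT &= FT + K_{alb}[\mathrm{Alb}]\,FT
      + \frac{K_{SHBG}\,FT\,[\mathrm{SHBG}]}{1 + K_{SHBG}\,FT} \\
   &= N\,FT + \frac{K_{SHBG}\,FT\,[\mathrm{SHBG}]}{1 + K_{SHBG}\,FT},
      \qquad N = 1 + K_{alb}[\mathrm{Alb}] \\
0  &= \underbrace{N K_{SHBG}}_{a}\,FT^{2}
      + \underbrace{\bigl(N + K_{SHBG}([\mathrm{SHBG}] - TT)\bigr)}_{b}\,FT
      + \underbrace{(-TT)}_{c} \\
FT &= \frac{-b + \sqrt{b^{2} - 4ac}}{2a},
      \qquad \mathrm{CBAT} = N \cdot FT
\end{align}
```

The third line follows from multiplying the second through by (1 + K_SHBG FT) and collecting powers of FT. Only the positive root is physically meaningful, which is why the function takes the plus sign.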

To sanity-check this, we can use this online calculator. Taking a typical male testosterone of 20 nmol/L, an SHBG of 50 nmol/L and an albumin of 43 g/L, we get the following:

cbat(20,50)

## $free.T
## [1] 0.3273049
## 
## $cbat
## [1] 7.670319

which is confirmed by the online calculator. Because the function is vectorized, we can submit a vector of testosterone results and a vector of SHBG results and get a vector of CBAT results.

cbat(c(10,20,30), c(40,50,60))

## $free.T
## [1] 0.1738837 0.3273049 0.4661380
## 
## $cbat
## [1]  4.074926  7.670319 10.923842

Precision of Components

We now need some precision data for the three components. However, in our lab, we just substitute 43 g/L for the albumin, so we will leave that term out of the analysis and limit our precision calculation to testosterone and SHBG. This will allow us to present the precision as surface plots as a function of total testosterone and SHBG.

We do testosterone by LC-MS/MS using Deborah French's method. In the last three months, the precision has been 3.9% at 0.78 nmol/L, 5.5% at 6.7 nmol/L, 5.2% at 18.0 nmol/L, and 6.0% at 28.2 nmol/L. We are using the Roche Cobas e601 SHBG method which, according to the package insert, has precision of 1.8% at 14.9 nmol/L, 2.1% at 45.7 nmol/L, and 4.0% at 219 nmol/L.

cv.tt <- c(3.9, 5.5, 5.2, 6.0)
conc.tt <- c(0.78, 6.7, 18.0, 28.2)
tt.df <- data.frame(conc.tt,cv.tt)

plot(cv.tt ~ conc.tt, data = tt.df,
                    main = "Precision Profile of Testosterone",
                    xlab = "Testosterone (nmol/L)",
                    ylab = "CV Testosterone (%)",
                    ylim = c(0,8),
                    type = "o")

plot of chunk unnamed-chunk-4

cv.shbg <- c(1.8, 2.1, 4.0)
conc.shbg <- c(14.9,45.7,219)
shbg.df <- data.frame(cv.shbg, conc.shbg)
plot(cv.shbg ~ conc.shbg, data = shbg.df,
                    main = "Precision Profile of SHBG",
                    xlab = "SHBG (nmol/L)",
                    ylab = "CV SHBG (%)",
                    ylim = c(0,5),
                    type = "o")

plot of chunk unnamed-chunk-4

Build Approximation Functions

We will want to generate linear interpolations of these precision profiles. Generally, we might want to use non-linear regression to do this, but I will just linearly interpolate with the approxfun() function. This will allow us to just call a function to get the approximate CV at concentrations other than those for which we have data.

tt.fun <- approxfun(x = tt.df$conc.tt, y = tt.df$cv.tt)
shbg.fun <- approxfun(x = shbg.df$conc.shbg, y = shbg.df$cv.shbg)

Now, if we want to know the precision of SHBG at, say, 100 nmol/L, we can just write,

shbg.fun(100)

## [1] 2.695326

to obtain our precision result.
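A caution on approxfun(): by default (rule = 1) it returns NA outside the range of the data, so any concentration you feed it has to stay between the lowest and highest points of the precision profile. The grids used below (1 to 28 nmol/L for testosterone, 15 to 200 nmol/L for SHBG) do. If you would rather hold the CV constant at the nearest end point, rule = 2 does that. A small sketch using the SHBG numbers from above, where shbg.fun.clamped is a name I made up for illustration:

```r
conc.shbg <- c(14.9, 45.7, 219)
cv.shbg <- c(1.8, 2.1, 4.0)

# Default behaviour: NA outside the measured range
shbg.fun <- approxfun(x = conc.shbg, y = cv.shbg)

# rule = 2: use the value at the closest data extreme instead
shbg.fun.clamped <- approxfun(x = conc.shbg, y = cv.shbg, rule = 2)

inside <- shbg.fun(100)                     # interpolated, as before
outside <- shbg.fun(250)                    # NA, beyond 219 nmol/L
outside.clamped <- shbg.fun.clamped(250)    # held at the 219 nmol/L value
```

Whether holding the CV flat beyond the data is reasonable is a judgment call for your own assay.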

Random Simulation

Now let's build a grid of SHBG and total testosterone (TT) values at which we will calculate the precision for CBAT.

shbg <- seq(from = 15, to = 200, by = 5)
tt <- seq(from = 1, to = 28, by = 1)

At each point on the grid, we will have to generate, say, 100000 random TT values and 100000 random SHBG values with the appropriate precision and then calculate the expected precision of CBAT at those concentrations.

Let's do this for a single pair of concentrations by way of example, modelling the random analytical error as Gaussian using the rnorm() function.

# [SHBG] = 15 nmol/L
# [TT] = 5.0 nmol/L
set.seed(100) #just to get consistent results
rng.tt <- rnorm(100000, mean = 5.0, sd = tt.fun(5.0)/100*5.0)
rng.shbg <- rnorm(100000, mean = 15, sd = shbg.fun(15)/100*15)
rng.cbat <- cbat(rng.tt, rng.shbg)
cv.cbat <- sd(rng.cbat$cbat)/mean(rng.cbat$cbat)*100
cv.cbat

## [1] 5.30598
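As a cross-check on the simulated value, the analytical route mentioned in the background can be approximated numerically: first-order error propagation with the partial derivatives estimated by central differences. This is a sketch, not the method used in this post. cbat.cv.delta() is a hypothetical helper, the errors in TT and SHBG are assumed to be independent, and the definitions are repeated so the block runs on its own.

```r
# Definitions repeated from above so the sketch is self-contained
cbat <- function(TT, SHBG, ALB = 43){
  Kalb <- 3.6*10^4
  Kshbg <- 10^9
  N <- 1 + Kalb*ALB/69000
  a <- N*Kshbg
  b <- N + Kshbg*(SHBG - TT)/10^9
  c <- -TT/10^9
  FT <- (-b + sqrt(b^2 - 4*a*c))/(2*a)*10^9
  list(free.T = FT, cbat = N*FT)
}
tt.fun <- approxfun(x = c(0.78, 6.7, 18.0, 28.2), y = c(3.9, 5.5, 5.2, 6.0))
shbg.fun <- approxfun(x = c(14.9, 45.7, 219), y = c(1.8, 2.1, 4.0))

# Hypothetical helper: first-order (delta method) CV of CBAT in percent
cbat.cv.delta <- function(TT, SHBG, h = 1e-4){
  f <- function(tt, shbg) cbat(tt, shbg)$cbat
  # central-difference estimates of the two partial derivatives
  d.tt <- (f(TT + h, SHBG) - f(TT - h, SHBG))/(2*h)
  d.shbg <- (f(TT, SHBG + h) - f(TT, SHBG - h))/(2*h)
  # component SDs from the interpolated precision profiles
  sd.tt <- tt.fun(TT)/100*TT
  sd.shbg <- shbg.fun(SHBG)/100*SHBG
  # combine the contributions, assuming independent errors
  sd.cbat <- sqrt((d.tt*sd.tt)^2 + (d.shbg*sd.shbg)^2)
  sd.cbat/f(TT, SHBG)*100
}

cv.delta <- cbat.cv.delta(5.0, 15)
cv.delta
```

At TT = 5.0 nmol/L and SHBG = 15 nmol/L this lands at about 5.3%, in line with the simulation. The simulation remains the more general tool, since it does not rely on the function being close to linear over the range of the errors.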

So, we can build the process of calculating the CV of CBAT into a function as follows:

cbat.cv <- function(TT, SHBG, N = 100000){
  rng.tt <- rnorm(N, mean = TT, sd = tt.fun(TT)/100*TT)
  rng.shbg <- rnorm(N, mean = SHBG, sd = shbg.fun(SHBG)/100*SHBG)
  rng.cbat <- cbat(rng.tt, rng.shbg)
  cv <- sd(rng.cbat$cbat)/mean(rng.cbat$cbat)*100
  return(cv)
}

Now, we can make a matrix of the data for presenting a plot, calculating the CV and appending it to the dataframe.

cv.grid <- expand.grid(tt, shbg)
names(cv.grid) <- c("tt", "shbg")
cv.grid$cv.cbat <- mapply(cbat.cv, cv.grid$tt, cv.grid$shbg)

Now make a plot using the wireframe() function.

library(lattice)
wireframe(cv.cbat ~ tt*shbg, data = cv.grid,
          xlab = "Testo \n (nmol/L)",
          ylab = "SHBG \n (nmol/L)",
          zlab = "CV \n (%)",
          drape = TRUE,
          colorkey = TRUE,
          col.regions = colorRampPalette(c("blue", "red", "yellow"))(100),
          scales = list(arrows=FALSE,cex=.5,tick.number = 10)
          )
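If a 3D surface is hard to read values from, a flat contour plot in base R is an alternative. The sketch below uses a stand-in surface so that it runs on its own. The formula for cv.cbat here is invented purely for illustration and should be replaced by the simulated column.

```r
tt <- seq(from = 1, to = 28, by = 1)
shbg <- seq(from = 15, to = 200, by = 5)
cv.grid <- expand.grid(tt = tt, shbg = shbg)

# Stand-in surface for illustration only; use the real cv.cbat values instead
cv.grid$cv.cbat <- 4 + cv.grid$tt/28 + cv.grid$shbg/200

# expand.grid() varies its first argument fastest, so filling a matrix
# column-wise puts tt along the rows and shbg along the columns
cv.matrix <- matrix(cv.grid$cv.cbat, nrow = length(tt), ncol = length(shbg))

contour(x = tt, y = shbg, z = cv.matrix,
        xlab = "Testosterone (nmol/L)",
        ylab = "SHBG (nmol/L)",
        main = "CV of CBAT (%)")
```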

plot of chunk unnamed-chunk-11

This shows us that the CV of CBAT ranges from about 4–8% over the TT and SHBG ranges we have looked at.

Conclusion

We have determined the CV of calculated bioavailable testosterone using random number simulations using empirical CV data and produced a surface plot of CV. This allows us to comment on the CV of this lab reportable as a function of the two variables by which it is determined.

Parting Thought on Monte Carlo Simulations

The die is cast into the lap, but its every decision is from the LORD.

(Prov 16:33)