OBJECTIVES
- To understand how reference ranges are determined and how to use them to interpret laboratory data
- To understand common statistical terms related to laboratory diagnostics
- To understand how disease prevalence affects predictive values of laboratory tests
- To understand the basic principle of quality control as applied to a diagnostic test
KEY TERMS
Accuracy- closeness of agreement of a measured value and the true value
Analytical variation- observed differences in the value of an analyte after it has been prepared for analysis
Coefficient of variation- a relative measure of precision, determined by dividing the standard deviation by the mean
Intraindividual variation- differences in true value of an analyte within the same individual
Interindividual variation- differences in true value of analyte between individuals
Mean- the arithmetic average
Negative predictive value- the fraction of negative values which are correct; determined by dividing the true negatives by the sum of the true negatives and false negatives
Positive predictive value- the fraction of positive values which are correct; determined by dividing the true positives by the sum of the true positives and false positives
Precision- the closeness of agreement between independent measurements; generally expressed as coefficient of variation or standard deviation
Reference range- the inner 95% of values for a laboratory test as measured in a defined population; the subject population is typically disease free with regard to the test of interest
Sensitivity- the ability of a test to detect a true positive; determined by dividing the true positives by the sum of the true positives and false negatives
Specificity- the ability of a test to detect a true negative; determined by dividing the true negatives by the sum of the true negatives and false positives
Standard deviation- a measure of precision (square root of the variance)
BACKGROUND/SIGNIFICANCE
A basic understanding of statistics is assumed for this chapter. In order to interpret laboratory tests it is essential that pharmacists understand sources of variation, reference ranges, predictive values and how laboratory results are interpreted.
SOURCES OF VARIATION
Proper interpretation of results requires an understanding of the sources of variation which influence laboratory tests.
Analytical variation is produced by conditions which affect the sample and the testing system from the moment the sample is removed from the patient until the final result is generated. (It is helpful to further subdivide this category into preanalytical factors, which include everything that can happen to a sample as it is collected, transported, processed, and stored, and analytical factors, which affect the testing process itself.) All test results are subject to analytical variation. Important interferences with laboratory tests include hemolysis (rupture of red blood cells, which releases their contents into plasma), lipemia (excess lipids in a plasma sample), and icterus (high concentrations of bilirubin).
Table 1. Hemolysis Interferences with Common Lab Tests | |||
---|---|---|---|
Test | Erythrocyte | Plasma | Effect of Hemolysis |
Sodium, mEq/L | 16 | 140 | Lowers plasma result |
Potassium, mEq/L | 100 | 4.4 | Raises plasma result |
Transaminase (ALT), U/L | 500 | 25 | Raises plasma result |
Folates, ng/dL | 200-1200 | 2.5-1.5 | Raises plasma result |
Table 2. Examples of Effects of Drugs on Laboratory Tests | ||
---|---|---|
Test | Effect | Mechanism |
Bilirubin | Decrease | Barbiturates induce glucuronyl transferase |
Bilirubin | Increase | Any drug with liver toxicity (e.g., acetaminophen) |
Amylase | Increase | Opiates cause constriction of sphincter of Oddi |
Digoxin | Increase | Quinidine releases digoxin from heart muscle and decreases renal clearance causing substantial increases in serum digoxin |
Intraindividual variation is produced by conditions which cause a single individual’s laboratory values to change at different times of day or under different physiologic conditions. Examples of factors which contribute to intraindividual variation include circadian rhythms, hydration, activity, stress, posture, and food intake. When we use the results of serial testing to follow the course of disease in a patient, it is important to recognize the potential contribution of normal physiologic factors and try to distinguish it from medically important variation.
Interindividual variation reflects the many different factors which cause laboratory test results to vary from one individual to another within a population. Examples of such variables include age, sex, diet, body mass, general activity level, and genetics. The results of a test performed on a group of individuals will reflect analytical variation and intraindividual as well as interindividual variation.
The remainder of this chapter addresses each of these categories in more detail, together with the applicable statistical concepts. Familiarity with the normal distribution (bell-shaped curve, or Gaussian distribution), including the concepts of the mean and standard deviation, is assumed.
ESTABLISHING REFERENCE RANGES
When we establish a reference range, we want to have a tool for comparing the test result from one individual with those from a relatively large number of other members of a similar population. What we want to determine is the expected range of interindividual variation. We already know that the results of a test performed on a group of people will reflect intraindividual and analytical variation as well as interindividual variation; it is obvious that if the contributions from the first two are relatively large, they will obscure the part of the total variation that is due to actual differences among individuals. Part of the process of establishing a reference range is simply taking steps to reduce the magnitude of this obscuring effect.
Define the reference population. Demographically, it should match the population whose laboratory results will be compared to this reference range. Based on what is already known about the analyte, consider whether separate reference ranges should be established for adults versus children, men versus women, and so forth. Profound biochemical changes take place in the period between birth and adulthood, and many of these are reflected in clinical chemistry test values that differ significantly from those considered normal in adults. The most pronounced and/or accelerated changes are seen in the newborn period and during puberty. Table 3 gives examples of laboratory tests that are affected by age. Some hospitals report only adult reference ranges, even alongside children's results (for which those ranges are incorrect), so it is important to know which values change with age.
Table 3. Lab Results that are affected by age | ||
---|---|---|
Lab Tests that are Higher in Newborns and Children | Lab Tests that are Lower in Newborns and Children | |
Alkaline Phosphatase | Bicarbonate | |
Ammonia | Albumin | |
AST | Amylase | |
Bilirubin | Cholesterol | |
Creatine Kinase (CK) | Creatinine | |
Potassium | Copper | |
Gamma glutamyl transferase (GGT) | Glucose | |
Thyroid stimulating hormone (TSH) | Haptoglobin | |
Thyroxine (T4) | IgA, IgM, IgE | |
| Osmolality | |
If the distribution is Gaussian, a parametric method may be used. The reference range is defined as the mean plus or minus two standard deviations: Reference Range = $\bar{X} \pm 2\,SD$.
If the distribution is non-Gaussian, a non-parametric method must be used, but the reference range is still the inner 95% of the defined population.
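The two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 200 sodium-like values are simulated, and the function names `parametric_range` and `nonparametric_range` are hypothetical, not part of any laboratory standard.

```python
import random
import statistics

def parametric_range(values):
    """Mean +/- 2 SD; appropriate only if the values look Gaussian."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return mean - 2 * sd, mean + 2 * sd

def nonparametric_range(values):
    """Inner 95%: the 2.5th and 97.5th percentiles of the observed values."""
    # n=40 yields 39 cut points spaced 2.5% apart; the first is the
    # 2.5th percentile and the last is the 97.5th percentile.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]

# Simulated (hypothetical) results from 200 healthy subjects, in mmol/L
rng = random.Random(0)
values = [rng.gauss(140, 2) for _ in range(200)]

p_low, p_high = parametric_range(values)
n_low, n_high = nonparametric_range(values)
print(f"parametric:     {p_low:.1f} to {p_high:.1f}")
print(f"non-parametric: {n_low:.1f} to {n_high:.1f}")
```

For data that really are Gaussian, the two ranges should be close. For skewed data, only the percentile-based range still encloses the inner 95% of the population.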
We have a specific term for the values of healthy individuals which fall outside the limits of the reference range: we call them “false positives” regardless of whether they fall at the high or low end of the range. The terms “positive” and “negative” in this context have nothing to do with high or low numbers, but rather indicate positivity versus negativity for disease.
Figure 1: Graphic Representation of a Reference Range
USING REFERENCE RANGES
Meaningful interpretation of laboratory data requires an understanding of test results to be expected for patients having various diseases and conditions, as well as for healthy individuals. It would be ideal if the ranges of results seen in disease never overlapped the reference range for healthy individuals, but since that is rarely the case, the next section addresses the interpretation of overlapping result distributions.
TAXONOMY OF OVERLAPPING DISTRIBUTIONS
Figure 2 shows a pair of curves, representing hypothetical distributions of test results from two distinct populations, one healthy and one diseased with a region of overlap. The horizontal line represents the continuum of possible result values for the analyte we are measuring. The curve on the left represents the distribution of results from the healthy reference population. The curve on the right represents the distribution of results from a group of people known to have the disease. The vertical line represents the upper limit of the reference range.
All of the results which fall to the left of the vertical line are called negative. All of the results which fall to the right of the line are called positive. Some diseased patients have “negative” results, since part of their distribution falls to the left of the vertical line. We call this group of results “false negatives.” Conversely, when we defined our reference range we already acknowledged that a small group of healthy individuals would have “false positive” results. To complete the taxonomy, we call the results which accurately reflect the status of the individuals from which they came “true positives” and “true negatives,” respectively.
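The taxonomy can be written as a small classification rule. The sketch below assumes a hypothetical upper reference limit of 50 and six made-up subjects; only the four labels come from the text.

```python
from collections import Counter

def classify(result, upper_limit, has_disease):
    """Label one result as TP, FP, TN, or FN relative to the upper limit."""
    positive = result > upper_limit  # falls to the right of the vertical line
    if positive:
        return "TP" if has_disease else "FP"
    return "FN" if has_disease else "TN"

# Hypothetical (result, has_disease) pairs
subjects = [(32, False), (47, False), (53, False),
            (44, True), (61, True), (75, True)]

counts = Counter(classify(r, 50, d) for r, d in subjects)
print(dict(counts))
```

Note that the label depends on two independent pieces of information: which side of the cutoff the result falls on, and the subject's true disease status.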
Sensitivity and specificity are performance characteristics of a test. To determine these characteristics, it is necessary to obtain test results on populations in whom the presence or absence of disease has been established by some method independent of this test.
Sensitivity is defined as the proportion of diseased subjects correctly classified by the test; i.e., the ability to detect a true positive in a person afflicted with the disease.
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (1)$$
Figure 2: Overlapping Distribution of Test Results for Healthy and Diseased Subjects
Specificity is defined as the proportion of healthy subjects correctly classified; i.e., the ability to exclude a diagnosis in a healthy person.
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (2)$$
A convenient format for arranging data is shown in Tables 4 and 5. Note that sensitivity refers only to the diseased population while specificity refers only to the healthy population. The relative sizes of the two populations do not affect sensitivity or specificity.
Table 4. Sensitivity and Specificity | |||
---|---|---|---|
Group | Number of Subjects with Positive Test | Number of Subjects with Negative Test | TOTALS |
Number of Subjects with Disease | TP | FN | TP + FN |
Number of Subjects without Disease | FP | TN | FP + TN |
TOTALS | TP + FP | FN + TN | TP + FP + FN + TN |
Table 5. Example of Sensitivity and Specificity | |||
---|---|---|---|
Group | Number of Subjects with Positive Test | Number of Subjects with Negative Test | TOTALS |
Number of Subjects with Disease | 68 | 32 | 100 (sensitivity = 68%) |
Number of Subjects without Disease | 2 | 98 | 100 (specificity = 98%) |
TOTALS | 70 | 130 | 200 |
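As a quick check of the Table 5 figures, the two definitions can be computed directly from the cell counts. The function names here are illustrative only.

```python
def sensitivity(tp, fn):
    """Fraction of diseased subjects with a positive test."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of healthy subjects with a negative test."""
    return tn / (tn + fp)

# Cell counts from Table 5
tp, fn, fp, tn = 68, 32, 2, 98
print(f"sensitivity = {sensitivity(tp, fn):.0%}")  # 68%
print(f"specificity = {specificity(tn, fp):.0%}")  # 98%

# Ten times as many healthy subjects with the same proportions
# (20 FP, 980 TN) leaves specificity unchanged.
print(f"specificity = {specificity(980, 20):.0%}")  # 98%
```

Because each function uses counts from only one row of the table, changing the relative size of the healthy and diseased groups cannot change either value.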
Sensitivity and specificity tell us how well a test performs when run on groups of people in whom we already know the diagnosis. In clinical practice, we do not use tests this way. Often, we are running a test on one patient for whom we have not yet made a diagnosis. What we want to know about the test is the probability that the result will correctly classify our patient with respect to the diagnosis we are considering.
PREDICTIVE VALUES
Predictive values describe the probability that the result of a test will correctly classify an individual with respect to the disease or condition under consideration. To determine predictive values, we need to know the prevalence of the disease in the population we are testing. Prevalence is the fraction of the population which has the disease.
The predictive value of a positive test result is the fraction of positive test results which are correct, or the true positives divided by all the positives, both true and false.
$$PV^{+} = \frac{TP}{TP + FP} \qquad (3)$$
The predictive value of a negative test is the fraction of all negative results which are correct, or the true negatives divided by all the negatives, both true and false.
$$PV^{-} = \frac{TN}{TN + FN} \qquad (4)$$
Using the hypothetical data provided in Table 5, we can calculate the predictive value of a positive result to be 97% and that of a negative result to be 75%.
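Substituting the Table 5 counts (TP = 68, FP = 2, TN = 98, FN = 32) into the definitions of the predictive values gives the arithmetic behind those two figures:

$$PV^{+} = \frac{TP}{TP + FP} = \frac{68}{68 + 2} = \frac{68}{70} \approx 97\%$$

$$PV^{-} = \frac{TN}{TN + FN} = \frac{98}{98 + 32} = \frac{98}{130} \approx 75\%$$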
It is important to recognize the impact of disease prevalence on predictive values. Tables 6 and 7 show two more hypothetical data sets. Both have the same sensitivity and specificity as shown in Table 5, but the prevalence has been decreased, demonstrating the impact on predictive values. In general, as the prevalence of disease increases, the predictive value of a positive test improves. As the prevalence of disease decreases, the predictive value of a negative test improves, and the predictive value of a positive test is diminished by increasing numbers of false positive results.
Table 6. Effect of Low Prevalence | ||||
---|---|---|---|---|
Group | Positive | Negative | Total | |
Diseased | 68 | 32 | 100 | |
Healthy | 20 | 980 | 1,000 | Sensitivity = 68% |
Total | 88 | 1,012 | 1,100 | Specificity = 98% |
Predictive value | PV+ = 77% | PV- = 97% | | Prevalence = $\frac{100}{1100}$ = 9% |
Table 7. Effect of Further Decrease in Prevalence | ||||
---|---|---|---|---|
Group | Positive | Negative | Total | |
Diseased | 68 | 32 | 100 | |
Healthy | 200 | 9,800 | 10,000 | Sensitivity = 68% |
Total | 268 | 9,832 | 10,100 | Specificity = 98% |
Predictive value | PV+ = 25% | PV- = 99.7% | | Prevalence = $\frac{100}{10100}$ = 1% |
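The dependence on prevalence can be reproduced with a short calculation. The sketch below is one way to do it, assuming sensitivity and specificity stay fixed at 68% and 98%. The function name `predictive_values` is hypothetical, and the three group sizes are those of Tables 5, 6, and 7.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PV+, PV-) for a test applied at the given prevalence."""
    tp = sens * prevalence              # diseased, test positive
    fn = (1 - sens) * prevalence        # diseased, test negative
    tn = spec * (1 - prevalence)        # healthy, test negative
    fp = (1 - spec) * (1 - prevalence)  # healthy, test positive
    return tp / (tp + fp), tn / (tn + fn)

# (diseased, healthy) group sizes from Tables 5, 6, and 7
for diseased, healthy in [(100, 100), (100, 1000), (100, 10000)]:
    prev = diseased / (diseased + healthy)
    ppv, npv = predictive_values(0.68, 0.98, prev)
    print(f"prevalence {prev:.1%}: PV+ = {ppv:.1%}, PV- = {npv:.1%}")
```

Working with population fractions rather than counts makes it easy to try any prevalence, not only the three tabulated here.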
INTRAINDIVIDUAL VS. ANALYTICAL VARIATION
Up to this point we have focused on the interpretation of individual results relative to group results, or analysis of interindividual variation. We will now focus on determining whether a change in a patient's laboratory values after a therapeutic intervention is large enough to be distinguished from analytical variation.
How do you know if your therapy has changed the patient's lab values? Determine whether the difference between the first and subsequent measurements is greater than 3 times the standard deviation of the assay. If it is, you can be 95% confident that the difference between the two measurements is not due to chance (Kaplan LA, Pesce AJ, Kazmierczak SC. Clinical Chemistry: Theory, Analysis, Correlation, 4th edition. St. Louis: Mosby, 2003, page 385).
Figure 3: Quality Control Chart for Sodium with a Mean of 113.2 and a Standard Deviation of 0.5 mmol/L
EXAMPLE
A patient had a sodium of 120 mmol/L on day 1. After treatment the patient's sodium increased to 126 mmol/L. Has the patient become less hyponatremic?
In order to answer this question you need to know the analytical variation of the lab’s sodium analysis. This can be determined by calling the lab and asking what the standard deviation of the sodium assay is. The laboratory runs quality control specimens with each batch of samples and will know the analytical variability of the assay in the range of interest. A typical quality control chart is shown in Figure 3. The standard deviation for this sodium control is 0.5 mmol/L. Three times 0.5 mmol/L is 1.5 mmol/L. Since the observed change (6 mmol/L) is greater than 1.5 mmol/L, you can be 95% confident that the patient’s sodium has increased to a degree which can be detected analytically.
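The rule is simple enough to express as a one-line check. In the sketch below, the function name `change_is_detectable` and the `factor` parameter are hypothetical, and the default of 3 is the chapter's rule of thumb. For context, the 95% limit for the difference between two independent measurements with the same SD is 1.96 × √2 ≈ 2.8 SD, which is close to 3. The check considers analytical variation only, not intraindividual variation.

```python
def change_is_detectable(first, second, assay_sd, factor=3.0):
    """True if two results differ by more than factor x the assay SD."""
    return abs(second - first) > factor * assay_sd

# Sodium example: 120 -> 126 mmol/L with an assay SD of 0.5 mmol/L
print(change_is_detectable(120, 126, 0.5))  # True: 6 > 1.5

# A 1 mmol/L change would not exceed the 1.5 mmol/L threshold
print(change_is_detectable(120, 121, 0.5))  # False: 1 < 1.5
```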
The primary reason to run controls is to assess whether the test system is functioning properly and generating reliable test results. When the technologists in the laboratory examine the results obtained on the control sample, they are expecting some variation, and they are trying to distinguish between two possible sources of variation: random analytic variation and systematic error.
Random analytic variation is inevitable and all its points will fall within a Gaussian distribution. Systematic error occurs when some new variable is introduced, such as deterioration of a reagent, clogging of a tube within the instrument, etc. The problem with systematic error is that it is likely to compromise the accuracy of test results.
Each time a technologist sets up an analytical test for patient samples, he or she first calibrates the assay with standards containing a known concentration of the analyte. To ensure that the calibration is accurate, the technologist then runs a series of controls. Typically controls are run at three levels: low, normal, and elevated concentrations. The technologist then checks the control values to see whether each result falls within the pattern of a Gaussian distribution before analyzing and reporting patient samples. Once the mean and standard deviation have been established for a particular control for a particular analyte, 95% of the control results should fall within ± 2 SD of the mean. About one in 20 results will fall outside these limits, but within 3 SD. If this is just a random event, the next time the control is tested, the result will return to within 2 SD 95% of the time. If it persists outside 2 SD, this is interpreted as most likely a sign of some systematic error, and the technologist proceeds to investigate and take corrective action. Likewise, if any result is more than 3 SD from the mean, it is interpreted as probable systematic error and treated as such.
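The decision rules just described can be sketched as a small function. This is a simplified illustration, not a full laboratory quality control scheme. The labels "accept", "warning", and "reject" and the function name `evaluate_controls` are assumptions made for the example. The mean and SD are those of the Figure 3 sodium control, and the six control results are hypothetical.

```python
def evaluate_controls(results, mean, sd):
    """Label each control result as 'accept', 'warning', or 'reject'."""
    labels = []
    previous_outside_2sd = False
    for value in results:
        z = abs(value - mean) / sd  # distance from the mean in SD units
        if z > 3:
            label = "reject"  # beyond 3 SD: probable systematic error
        elif z > 2:
            # One result between 2 and 3 SD may be random;
            # a second consecutive one suggests systematic error.
            label = "reject" if previous_outside_2sd else "warning"
        else:
            label = "accept"
        previous_outside_2sd = z > 2
        labels.append(label)
    return labels

# Mean and SD from Figure 3; the control results are hypothetical.
labels = evaluate_controls(
    [113.0, 114.4, 113.5, 114.3, 114.5, 115.0], mean=113.2, sd=0.5
)
print(labels)
# ['accept', 'warning', 'accept', 'warning', 'reject', 'reject']
```

The first warning clears because the next result returns to within 2 SD. The second is followed by another result outside 2 SD, so that run is rejected, as is the final result beyond 3 SD.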
Clinical laboratory technologists do not issue test results if the control results do not fall within established limits. Their objective is to generate the most accurate results possible given the methodology and instrumentation available. The principles of quality control are a major component of their education and training. As Point-of-Care testing becomes more widespread, it is important for pharmacists and other care providers to understand the need to verify the integrity of the test system before using it.