Worked examples of measurement uncertainty

Serology, ELISA, CFT and others (National Centre for Disease Investigation, NZ)


Use of ‘control sample’ or ‘top-down’ approach.

A traditional control sample procedure, run many times by all analysts at various times, usually covers all the major sources of uncertainty for the testing within a lab. Sources of uncertainty during laboratory testing include storage conditions (e.g. fridges, freezers), sample effects, reagent quality (e.g. type, origin, batch of test etc.), reference material, volumetric manipulations (e.g. pipetting), environmental conditions (e.g. room temperature), contamination (e.g. haemolysed samples), equipment effects, analyst or operator bias, and other unknown or random effects.

The variation of the control samples can be used as an estimate of those combined sources of uncertainty, which is expressed as the test method's precision. Repeatability and reproducibility are both measures of precision. Repeatability and other within-laboratory precision estimates (intra- and inter-assay variation) are available through continuous monitoring of internal quality control data (e.g. analysis of data using basic statistics, Levey-Jennings charts etc.). This covers the control sample approach for estimating measurement uncertainty (MU) (2). Refer to SOP NCDI-QC-02.
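As a minimal sketch of the control sample approach, the Python below summarises repeated results for one control sample (mean, SD, CV%) and derives Levey-Jennings chart lines at ±2 SD and ±3 SD. The control values and the helper name `control_summary` are hypothetical, and the ±2 SD/±3 SD lines are the usual Levey-Jennings convention, not necessarily the rules set out in SOP NCDI-QC-02.

```python
import statistics


def control_summary(results):
    """Summarise repeated results for one control sample (hypothetical helper)."""
    if len(results) < 2:
        raise ValueError("need at least two results to estimate an SD")
    mean = statistics.mean(results)
    sd = statistics.stdev(results)  # sample SD, n - 1 denominator
    cv_percent = 100 * sd / mean
    # Levey-Jennings lines: mean +/- 2 SD (warning) and mean +/- 3 SD (action)
    limits = {k: (mean - k * sd, mean + k * sd) for k in (2, 3)}
    return {"n": len(results), "mean": mean, "sd": sd,
            "cv_percent": cv_percent, "limits": limits}


# Hypothetical control results gathered across runs, analysts and days
control = [52.1, 55.4, 49.8, 57.0, 53.6, 50.9, 54.2, 56.1]
summary = control_summary(control)
print(f"n={summary['n']}  mean={summary['mean']:.2f}  "
      f"sd={summary['sd']:.2f}  CV={summary['cv_percent']:.1f}%")
for k, (low, high) in summary["limits"].items():
    print(f"mean +/- {k} SD: {low:.2f} to {high:.2f}")
```

The more runs, analysts, reagent batches and days the control results span, the more of the sources of uncertainty listed above the SD captures.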

Reproducibility estimates are available through participation in external proficiency testing programs, for example those organised by the Australian National Quality Assurance Program (ANQAP) (e.g. application of z-scores, robust statistics, Youden plots etc.). This covers the method characteristics approach (3) for estimating MU. Refer to SOP NCDI-QC-01.
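To illustrate how z-scores and robust statistics fit together, here is a minimal sketch assuming the assigned value is the participants' median and the robust SD is 1.483 × the median absolute deviation (MAD). This is one common robust scheme, not necessarily the one ANQAP uses; the interpretation bands (|z| ≤ 2, 2 < |z| < 3, |z| ≥ 3) are the conventional ones, and the participant results are hypothetical.

```python
import statistics


def robust_z_scores(results):
    """z-scores against a robust consensus: median and scaled MAD (assumed scheme)."""
    assigned = statistics.median(results)
    mad = statistics.median(abs(x - assigned) for x in results)
    robust_sd = 1.483 * mad  # scales the MAD to estimate an SD for normal data
    if robust_sd == 0:
        raise ValueError("robust SD is zero; z-scores are undefined")
    return assigned, robust_sd, [(x - assigned) / robust_sd for x in results]


def assess(z):
    """Conventional bands: |z| <= 2 satisfactory, 2 < |z| < 3 questionable, |z| >= 3 unsatisfactory."""
    if abs(z) <= 2:
        return "satisfactory"
    if abs(z) < 3:
        return "questionable"
    return "unsatisfactory"


# Hypothetical results reported by participating laboratories for one survey sample
lab_results = [48.0, 51.5, 50.2, 49.1, 52.3, 47.6, 50.8, 58.9]
assigned, robust_sd, z_scores = robust_z_scores(lab_results)
print(f"assigned value {assigned:.2f}, robust SD {robust_sd:.2f}")
for x, z in zip(lab_results, z_scores):
    print(f"{x:5.1f}  z = {z:+.2f}  {assess(z)}")
```

Using the median and MAD stops a single outlying laboratory (58.9 here) from inflating the SD that everyone else is scored against.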

Serology, ELISA (Australian Animal Health Laboratory)


Use of ‘control sample’ or ‘top-down’ approach using an internal control sample close to the cut-off in an ELISA.

As the uncertainty is to be estimated at the threshold, which is not necessarily the reaction level of the low positive control serum, the relative standard deviation, rsd (or coefficient of variation, if expressed as a percentage), provides a convenient transformation:

$$\mathrm{rsd}(x) = \frac{\mathrm{sd}(x)}{\bar{x}}$$

To simplify assessment, the transformed result is regarded as the assay output result (x). In this example, a competitive ELISA, results are standardised by forming the ratio of each optical density (OD) value to the OD of a non-reactive (negative) control (OD_N). This ratio is subtracted from 1 to place antibody activity on a positively correlated scale: the greater the antibody level, the greater the calculated value. The adjusted value is expressed as a percentage, referred to as the percentage inhibition (PI) value. So for the low positive control serum (OD_L), the transformation is:

$$\mathrm{PI}_L = 100 \times \left(1 - \frac{\mathrm{OD}_L}{\mathrm{OD}_N}\right)$$

The relative standard deviation becomes:

$$\mathrm{rsd}(\mathrm{PI}_L) = \frac{\mathrm{sd}(\mathrm{PI}_L)}{\overline{\mathrm{PI}}_L}$$
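A minimal Python sketch of the two transformations: `percent_inhibition` implements the PI formula and `relative_sd` the rsd. Both names are hypothetical, and the (OD_L, OD_N) pairs are placeholder values, not data from this example.

```python
import statistics


def percent_inhibition(od_sample, od_negative):
    """PI = 100 x (1 - OD_sample / OD_negative) for a competitive ELISA."""
    if od_negative <= 0:
        raise ValueError("negative-control OD must be positive")
    return 100.0 * (1.0 - od_sample / od_negative)


def relative_sd(values):
    """rsd = sd / mean (multiply by 100 for a CV%)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical (OD_L, OD_N) pairs: low positive and negative control, one pair per assay
runs = [(0.62, 1.45), (0.55, 1.38), (0.71, 1.52), (0.58, 1.40), (0.66, 1.49)]
pi_low = [percent_inhibition(od_l, od_n) for od_l, od_n in runs]
print("PI_L per assay:", [round(p, 1) for p in pi_low])
print(f"rsd(PI_L) = {relative_sd(pi_low):.3f}")
```

Standardising each run against its own negative control first means plate-to-plate shifts in absolute OD do not inflate the rsd.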

A limited data set is shown below. Ideally, in applying this ‘top-down’ method, a large data set would be used, so that effects on precision resulting from changes in operator and assay components (other than the control serum) can be accounted for.


[Table: PI (%) of the low positive control serum per assay, with summary rows Std Dev (sd), Assays (n) and rsd = sd/mean]

From the limited data set:

$$\mathrm{rsd}(\mathrm{PI}_L) = \frac{7.9}{56.3} = 0.14$$

(or 14% when expressed as a coefficient of variation).

Note: if the uncertainty associated with accuracy (bias) is to be included, it is added at this stage by combining the squared rsd values of the individual components, i.e.:

$$\frac{u(x)}{\bar{x}} = \sqrt{\sum_i \mathrm{rsd}_i(x)^2}$$

However, for many serology assays, this information is not readily available and will not be considered in this article.

Multiplying rsd(PI_L) by a coverage factor of 2 expands the uncertainty and allows approximate 95% confidence limits to be calculated around the threshold value.

$$\text{Expanded uncertainty } U_{95\%} = 2 \times \mathrm{rsd} = 0.28$$

This estimate can then be applied at the threshold level (in this case, PI = 50%):

$$95\%\ \mathrm{CI} = 50 \pm (50 \times 0.28) = 50 \pm 14\%$$

Interpretation: a positive result (PI > 50%) that is less than 64% cannot be regarded as positive at the 95% confidence level. Similarly, a negative result (PI < 50%) that is 36% or greater cannot be regarded as negative at the 95% confidence level.
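The interval arithmetic and interpretation can be sketched in Python as follows, assuming a coverage factor k = 2 and the rsd of 0.14 from the limited data set. The function names and the label for results inside the grey zone are hypothetical. A result of exactly 64% is treated as positive, following the wording "less than 64% ... cannot be regarded as positive".

```python
def grey_zone(threshold, rsd, k=2.0):
    """Threshold +/- k * rsd * threshold; k = 2 gives approximately 95% coverage."""
    half_width = k * rsd * threshold
    # Round away floating-point noise so boundary values compare as intended
    return round(threshold - half_width, 6), round(threshold + half_width, 6)


def interpret(pi, threshold=50.0, rsd=0.14, k=2.0):
    """Classify a PI result against the grey zone around the cut-off."""
    lower, upper = grey_zone(threshold, rsd, k)
    if pi >= upper:
        return "positive"
    if pi < lower:
        return "negative"
    # Inside the grey zone the result cannot be distinguished from the cut-off
    return "not confirmed at 95% confidence"


lower, upper = grey_zone(50.0, 0.14)
print(f"95% interval around PI = 50%: {lower:.0f}% to {upper:.0f}%")
for pi in (30.0, 40.0, 55.0, 70.0):
    print(f"PI = {pi:.0f}%: {interpret(pi)}")
```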

Serology, ELISA (QLD)


Combination of ‘components approach’ or ‘bottom-up’ and ‘control sample’ or ‘top-down’ approach.

The characteristic data was accumulated over many runs on different days by different analysts.

Comment from Peter McKinley:

Whilst we realise that the main contributor to the MU was the precision of the absorbance readings (12%) and that the other contributors only added a further 3%, we felt that it was a valuable exercise for our people to include these other contributors, so that they had a better appreciation of the sources of variation in the results and of MU estimation. At that stage, we weren't aware of the extent of the contribution of each set of variables. Having set up the spreadsheet, it was relatively easy to then apply it to other methods. I realise that one could choose to estimate the MU in these cases based on the precision alone.

Serology, ELISA (WA)


Use of ‘control sample’ or ‘top-down’ approach.

UM (uncertainty of measurement) is not about the range of results for a range of samples but about the range of results for a single sample (in this case a ‘weak positive’).

Look at the range of variation that will actually occur in your own lab. For example, if only one staff member ever performs a test, then staff is not really a variable. The same applies to sets of pipettes etc., and to how often kit batches change.

A full design would be, say, 24 samples (3 columns of an ELISA plate) × number of staff × number of sets of pipettes × number of days × number of kits. I would suggest 24 × 3 × 3 × 2 × 2.

Not all of these need to be tested separately; they can be tested concurrently (although there are some downsides to this approach). For example, 3 staff can each use a different set of pipettes and make up their own fresh reagents.

I suggest 24 samples done twice on one day in the same plate, using two batches of kit (48 samples + controls). The whole process is then repeated by each of 3 staff, each using a different set of pipettes.
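To check the arithmetic of the two designs, the Python sketch below counts the full factorial design and builds a run sheet for the reduced, concurrent design (3 staff, each paired with their own pipette set, × 2 kit batches × 24 replicates). The staff, pipette and kit labels are hypothetical. Note that in the reduced design staff and pipette set are confounded, which is one of the downsides of testing concurrently.

```python
from itertools import product

# Full factorial: samples x staff x pipette sets x days x kits
full_design_size = 24 * 3 * 3 * 2 * 2

# Reduced, concurrent design: each staff member is paired with one pipette set
# and runs 24 replicates with each of two kit batches on the same plate and day.
staff_pipettes = [("staff A", "pipette set 1"),
                  ("staff B", "pipette set 2"),
                  ("staff C", "pipette set 3")]
kit_batches = ["kit batch 1", "kit batch 2"]
replicates = range(1, 25)

run_sheet = [
    {"staff": staff, "pipettes": pipettes, "kit": kit, "replicate": rep}
    for (staff, pipettes), kit, rep in product(staff_pipettes, kit_batches, replicates)
]

print("full factorial:", full_design_size)
print("reduced design:", len(run_sheet))
for row in run_sheet[:3]:
    print(row)
```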

Document all details of the variables used.

Then calculate the mean and SD for each of the 3 staff data sets and for each kit, and then for all 144 repeats.

±2 × SD then becomes the UM (95%), i.e. the grey zone.
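A minimal sketch of this calculation in Python, assuming results are stored as one record per well with `staff`, `kit` and `result` fields (a hypothetical layout). The weak-positive results are simulated with a fixed seed purely for illustration.

```python
import random
import statistics
from collections import defaultdict


def mean_sd_by(rows, key):
    """Mean and SD of the 'result' field for each level of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["result"])
    return {level: (statistics.mean(v), statistics.stdev(v)) for level, v in groups.items()}


# Simulated weak-positive results: 3 staff x 2 kits x 24 replicates = 144 rows
rng = random.Random(42)
rows = [
    {"staff": staff, "kit": kit, "result": rng.gauss(55.0, 4.0)}
    for staff in ("A", "B", "C")
    for kit in ("kit 1", "kit 2")
    for _ in range(24)
]

by_staff = mean_sd_by(rows, "staff")
by_kit = mean_sd_by(rows, "kit")
all_results = [row["result"] for row in rows]
overall_mean = statistics.mean(all_results)
um_95 = 2 * statistics.stdev(all_results)  # +/- 2 SD grey zone

for label, table in (("staff", by_staff), ("kit", by_kit)):
    for level, (m, sd) in table.items():
        print(f"{label} {level}: mean {m:.1f}, SD {sd:.1f}")
print(f"all {len(all_results)}: mean {overall_mean:.1f}, UM (95%) +/- {um_95:.1f}")
```

Comparing the per-staff and per-kit SDs with the overall SD is what supports the reality check: a subgroup that stands out points to where the UM is coming from.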

Reality-check the outcome: if the UM is too big, why? Check what contributed most significantly to the UM. If a problem can be identified (training, kits, equipment etc.), it should be rectified and that part recalculated.

Keep all data and repeat for each test.

Decide what the UM means for the interpretation of the data.

Modify the methods to include any change to interpretation and the UM data.

Australasian Society of Clinical Immunology and Allergy (ASCIA)

National Association of Testing Authorities, Australia (NATA) recommended approach:

Use of ‘control sample’ or ‘top-down’ approach.

Uncertainty of measurement parameter to be assessed: expanded uncertainty = standard deviation (SD) × 2 (rather than coefficient of variation [CV] × 2).

Note: CV = (SD / Mean) x 100%.

Number of measurements

At least 60 degrees of freedom, i.e. at least n = 61 measurements.

Across different lots/batches.
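A minimal Python sketch of the parameter described above: it enforces at least 61 results (60 degrees of freedom) and returns 2 × SD as the expanded uncertainty, with CV% reported alongside only for reference. The function name is hypothetical and the reference-material results are simulated.

```python
import random
import statistics

MIN_MEASUREMENTS = 61  # at least 60 degrees of freedom (n - 1 >= 60)


def expanded_uncertainty_2sd(results):
    """Expanded uncertainty as 2 x SD, with CV% reported alongside for reference."""
    n = len(results)
    if n < MIN_MEASUREMENTS:
        raise ValueError(f"need at least {MIN_MEASUREMENTS} results, got {n}")
    mean = statistics.mean(results)
    sd = statistics.stdev(results)
    return {"n": n, "mean": mean, "sd": sd,
            "cv_percent": 100 * sd / mean, "expanded_u": 2 * sd}


# Simulated reference-material results collected across several kit lots
rng = random.Random(7)
results = [rng.gauss(12.0, 0.6) for _ in range(61)]
out = expanded_uncertainty_2sd(results)
print(f"n={out['n']}  mean={out['mean']:.2f}  U = 2 x SD = {out['expanded_u']:.2f}  "
      f"(CV {out['cv_percent']:.1f}%)")
```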

What tests does this apply to?

Any test where quantitative values are used to obtain the result, including tests reported qualitatively.

Assays where a quantitative value is used to generate a test result (e.g. Anti-ENA ELISA).

Reference material

Any external reference material (commercial or pooled patient samples) used for each assay.

The value should be close to the cut-off or clinical decision points, OR in the mid-range where the curve is linear.

The kit controls should not be used instead of external reference material, as the assessment will be across different lots/batches.


At present, we would not advise including the expanded uncertainty parameter on the report.


We acknowledge the above expanded uncertainty parameter does not take into account bias or accuracy.

MU example for quantitative tests, Royal College of Pathologists of Australasia

NATA recommended approach:

Use of ‘control sample’ or ‘top-down’ approach (serology, biochemistry).

  1. Make a list of all of the tests where the result is reported as a number.
  2. From this, make a sub-list of every test which is already in the in-house CV% data.
  3. Multiply the CV% by 2 and record this figure as the uncertainty of measurement against each of these tests. This should cover the majority of tests that a particular laboratory reports as a number.
  4. Determine the CV% for the remaining tests, either by:
    1. reviewing internal QC data (30 sets) and determining the CV%
    2. obtaining CV% for method employed from an external QA programme
    3. obtaining CV% for method employed from the manufacturer (reagent kit insert).
  5. Check if it is either impractical or irrelevant to determine the uncertainty of measurement for the remaining tests, and document your reasons if these are not obvious.
  6. Finally, document the policy for determining the uncertainty of measurement, the calculations used, and the sources of data used for these calculations.
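The steps above can be sketched as a small register-building routine in Python. The source labels, test names and CV% values are hypothetical, and the preference order among the step 4 sources (internal QC, then external QA, then the kit insert) is an assumption, since the procedure only says "either by".

```python
# Sources of CV% in the order the steps above consult them (labels are hypothetical)
SOURCE_ORDER = [
    "in-house CV data",          # step 2
    "internal QC (30 sets)",     # step 4.1
    "external QA programme",     # step 4.2
    "manufacturer kit insert",   # step 4.3
]


def mu_register(tests):
    """Map each test to (MU% = 2 x CV%, source used), or None if no CV% is available."""
    register = {}
    for name, cvs in tests.items():
        for source in SOURCE_ORDER:
            cv = cvs.get(source)
            if cv is not None:
                register[name] = (2 * cv, source)
                break
        else:
            # Step 5: document why MU is impractical or irrelevant for this test
            register[name] = None
    return register


# Hypothetical tests reported as numbers, with whatever CV% data happen to exist
tests = {
    "test A": {"in-house CV data": 1.2},
    "test B": {"external QA programme": 6.5, "manufacturer kit insert": 5.0},
    "test C": {},
}
reg = mu_register(tests)
for name, entry in reg.items():
    print(name, entry if entry else "no CV% available: document reason")
```

Recording the source next to each MU figure covers the documentation required in step 6.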

Example from the American Society for Veterinary Clinical Pathology, Principles of Quality Assurance and Standards for Veterinary Reference Values

NATA recommended approach:

Decision limits are required for interpretation of laboratory test values. Decision limits may be cut-points or reference intervals. At least one type of decision limit is required to answer relevant clinical problems or questions.

Cut-points are used to differentiate health from disease or, of greater clinical relevance, to differentiate diseases with similar presenting problems. Diagnostic sensitivity and diagnostic specificity are determined at different cut-points. Receiver-operating-characteristic (ROC) curves can be used to help identify optimal cut-points for different clinical needs. If disease prevalence is known, the predictive values of a positive and negative test can be calculated.
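To make the relationship between cut-points, sensitivity, specificity and predictive values concrete, here is a minimal Python sketch. The test values for animals of known status and the prevalence of 5% are hypothetical. Each cut-point gives one (1 − specificity, sensitivity) point on an ROC curve, and the predictive values follow from Bayes' theorem.

```python
def sens_spec(diseased, healthy, cut_point):
    """Diagnostic sensitivity and specificity when values >= cut_point are called positive."""
    sensitivity = sum(v >= cut_point for v in diseased) / len(diseased)
    specificity = sum(v < cut_point for v in healthy) / len(healthy)
    return sensitivity, specificity


def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' theorem."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test values for animals of known status
diseased = [62, 71, 55, 80, 48, 67, 74, 59, 66, 90]
healthy = [22, 35, 41, 18, 52, 30, 27, 45, 38, 33]

for cut in (40, 50, 60):
    se, sp = sens_spec(diseased, healthy, cut)
    ppv, npv = predictive_values(se, sp, prevalence=0.05)
    print(f"cut {cut}: Se {se:.2f}, Sp {sp:.2f}, PPV {ppv:.2f}, NPV {npv:.2f}")
```

At low prevalence, even a fairly specific cut-point can give a modest positive predictive value, which is why prevalence matters when choosing cut-points for a given clinical question.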

Alternatively, clinicians may rely upon reference intervals for interpreting patient test values. Due to the many combinations of factors affecting test results, including instrumentation, reagents, technologists, collection and processing of samples, animal-related factors, etc., reference intervals should be confirmed in each laboratory. The reference intervals reported by the laboratory typically are the lowest and highest value expected for the central 95% of similar healthy individuals.

International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) guidelines are recommended for use when developing reference intervals in veterinary diagnostic laboratories. The IFCC discourages use of the term reference ‘range’ due to the statistical definition of range as a single number, i.e. the difference between the upper and lower limit. It is almost impossible for each veterinary diagnostic laboratory to develop and maintain reliable, current, method-related reference intervals for each test and for each subset of animals of potential clinical interest. Frequent changes in technology and the variables of species, breed, age, sex, animal husbandry, diet, and geographic location may create significant differences among laboratories. In spite of this challenge, we recommend, at minimum, that a veterinary laboratory provide the following information to clinicians for every test:

Reference intervals for at least one well-defined subset of each major species. It should be stated clearly whether reference intervals were determined de novo using current methodology or whether reference intervals from a previous method were modified following comparison of old and new methods. It also should be noted if the reference intervals were historic or extrapolated from the literature.

Reference interval calculations. The number of reference observations used for calculating reference intervals, whether data did, or did not, have normal distribution, and the parametric or non-parametric method used to calculate the 2.5 and 97.5 percentiles should be indicated. If observations were excluded, these should be listed with explanations.

Confidence intervals (tolerance limits) for the lower and upper reference limit. This calculation requires a minimum number of observations.

Additional information upon request. The laboratory should be able to provide users with additional information about the relevance and reliability of each test method. This information should include the signalment and state of health of the reference individuals, how this was determined, fasting state, sampling time, collection, storage, and processing of samples, etc. In addition, information regarding handling of outliers and observations excluded for other reasons should be described. This information should allow clients to determine whether the reference observations represent the patient(s) in question, e.g., regarding breed, age, sex and management.

The IFCC recommends a minimum of 120 observations for reference intervals. With this sample size, confidence intervals can be calculated for the lower and upper reference limits independent of whether observations have a normal (Gaussian) or non-normal distribution. The confidence interval around the lower and upper limits for 60 observations with normal distribution is similar to that for 100–120 observations with non-normal distribution. Confidence intervals indicate the reliability of the reference limits and whether a test is able to meet clinical expectations.

Although reference intervals often are developed using fewer observations, until >40 observations are available the best estimates for the central 95% reference intervals are the lowest and highest values observed.
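A minimal Python sketch of a non-parametric central 95% reference interval. It assumes one common rank-based convention: with more than 40 observations, interpolate at ranks 0.025(n + 1) and 0.975(n + 1); with 40 or fewer, return the lowest and highest values observed, as stated above. Confidence intervals for the limits are not computed here, and the observations are simulated.

```python
import math
import random


def reference_interval(values):
    """Central 95% reference interval, non-parametric (rank-based) estimate.

    With 40 or fewer observations, return the lowest and highest values observed.
    Otherwise interpolate at ranks 0.025 * (n + 1) and 0.975 * (n + 1).
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("no observations")
    if n <= 40:
        return data[0], data[-1]

    def value_at_rank(rank):
        # rank is 1-based; interpolate linearly between neighbouring order statistics
        below = math.floor(rank)
        fraction = rank - below
        return data[below - 1] + fraction * (data[below] - data[below - 1])

    return value_at_rank(0.025 * (n + 1)), value_at_rank(0.975 * (n + 1))


# Simulated results from 120 healthy reference animals
rng = random.Random(3)
observations = [rng.gauss(100.0, 10.0) for _ in range(120)]
print("n = 120:", reference_interval(observations))
print("n = 30:", reference_interval(observations[:30]))  # falls back to min and max
```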

Bacteriology, sensitivity testing (QLD)

The main component of the total uncertainty for the procedure is the precision of zone diameter readings, which was determined as a Type A standard uncertainty from 45 repeated measurements. The MU uses a precision component (14%) and an accuracy component (16%).
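The text does not say how the two components are combined. Assuming they are relative standard uncertainties combined by root sum of squares (the rule given in the ELISA note on combining squared rsd values) and expanded with a coverage factor of 2, the calculation would be as follows. Both the combination rule and k = 2 are assumptions here, not the laboratory's documented method.

```python
import math


def combined_relative_uncertainty(*components):
    """Root sum of squares of relative standard uncertainties."""
    return math.sqrt(sum(c ** 2 for c in components))


precision = 0.14  # Type A, from 45 repeated zone-diameter readings
accuracy = 0.16   # accuracy component quoted above
u_c = combined_relative_uncertainty(precision, accuracy)
U = 2 * u_c  # coverage factor k = 2 (assumed), approximately 95%
print(f"combined relative uncertainty {u_c:.1%}, expanded (k = 2) {U:.1%}")
```

Because the components add in quadrature, the larger of the two dominates: the combined value is well below the simple sum of 30%.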

Bacteriology, CFU (QLD)

MU is estimated using the approach published by the Canadian Association for Environmental Analytical Laboratories (CAEAL). Refer to: CAEAL Policy on the Uncertainty of Measurement in Environmental Testing (Revision 1.7, 24/10/03).

The approach considers only precision as a significant contributor to the method MU.

Parasitology, fecal egg count (QLD)

Use of ‘control sample’ or ‘top-down’ approach.

Comment from Peter McKinley:

Whilst we realise that the main contributor of the MU was the precision of the counts, we felt that it was a valuable exercise for our people to include these other contributors so that they had a better appreciation of the sources of variation of the results and of MU estimations. I realise that one could choose to estimate the MU in this case just based on the precision and obtained:

61% (cf. 62%) for a mean of fewer than ten eggs per chamber.

22% (cf. 24%) for a mean of ten or more eggs per chamber.

The precision of counts is determined from the normalised difference data (the difference divided by the mean) of replicated counts of characteristic samples. The characteristic data were accumulated over many runs on different days by different analysts of the 3 veterinary laboratories. The standard deviation of the normalised differences is divided by √2 to transform from a standard deviation for pairwise differences to the standard deviation for single values. The estimated precision relative standard deviation for single count determinations is calculated as 30% for the lower range and 11% for the higher range.
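The duplicate-count procedure can be sketched in Python as below: normalised differences (a − b)/mean, their SD divided by √2, computed separately for pairs with a mean below and at or above ten eggs per chamber. The counts are hypothetical, skipping pairs where both counts are zero is an assumption, and the "× 2" expanded figure reflects that the quoted 61% and 22% appear to be roughly twice the single-count rsd values of 30% and 11%.

```python
import math
import statistics


def duplicate_rsd(pairs):
    """Relative SD for single counts from duplicate counts.

    Each pair (a, b) gives a normalised difference (a - b) / mean(a, b). The SD of
    these differences is divided by sqrt(2) to convert from pairwise differences
    to single determinations.
    """
    normalised = []
    for a, b in pairs:
        mean = (a + b) / 2
        if mean == 0:
            continue  # undefined when both counts are zero (assumption: skip the pair)
        normalised.append((a - b) / mean)
    return statistics.stdev(normalised) / math.sqrt(2)


def split_by_level(pairs, boundary=10):
    """Separate pairs whose mean is below `boundary` eggs per chamber from the rest."""
    low = [p for p in pairs if sum(p) / 2 < boundary]
    high = [p for p in pairs if sum(p) / 2 >= boundary]
    return low, high


# Hypothetical duplicate counts (eggs per chamber)
pairs = [(3, 5), (7, 6), (2, 3), (9, 7), (4, 4), (6, 9),
         (22, 25), (15, 13), (40, 37), (18, 19), (31, 27), (12, 14)]
for label, subset in zip(("mean < 10", "mean >= 10"), split_by_level(pairs)):
    rsd = duplicate_rsd(subset)
    print(f"{label}: rsd {rsd:.0%}, expanded (x 2) {2 * rsd:.0%}")
```

Splitting by count level matters because low counts are dominated by counting (Poisson-type) variation, which is why the relative precision is so much poorer below ten eggs per chamber.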