Abstract
Objective: To assess the accuracy of transcutaneous bilirubinometry (TcB) measurements over an 11-year period and to investigate whether performance differs by ethnic background. We also aimed to establish a TcB screening threshold that triggers measurement of the serum bilirubin (SBR) for each group.
Methods: We conducted a retrospective review using neonatal unit data from 2013–2024. All neonates with numerical paired TcB and SBR values were included (n=1,464). Ethnicity had been recorded for 1,196 neonates; TcB and SBR results for these patients were compared across 4 broad groups: White, South Asian, Chinese and Black ethnic groups. The neonatal unit used several different bilirubinometers during this time.
Findings: Among 1,464 patients, the mean TcB and SBR were 262 and 256 μmol/L respectively (p<0.00001). There was an absolute mean difference of -6 μmol/L, with 95% confidence intervals from -8 to -4 μmol/L, confirming a systematic difference. In this QIP, we have chosen to represent the difference between SBR and TcB (SBR-TcB) with the Δ symbol, for clarity. The mean Δ was 0, -3, -13 and -33 μmol/L for the Chinese, White, South Asian and Black ethnic groups respectively. Thus, TcB systematically overestimates SBR in all groups except the Chinese ethnic group. The overestimate is larger, and more statistically significant, in groups with likely darker skin tones. There was no significant change in Δ in any group over the 11 years. For all ethnicities, the 95th centiles for Δ ranged from +55 to +66 μmol/L. Thus, setting a cut-off of +65 μmol/L regardless of skin color would ensure that 95% of children are managed safely.
Keywords
Jaundice, Bilirubin, Bilirubinometry, Transcutaneous, Kernicterus, Ethnicity, Diversity, Technology
Introduction
Neonatal jaundice is a common presentation, occurring in 60% of newborns and in 80% of premature newborns [1]. The jaundice is a result of hyperbilirubinaemia; there is either increased production of bilirubin or decreased clearance of it. At high levels, bilirubin deposits into the subcutaneous tissue and sclera, thus giving a yellow tinge. Whilst most of the cases are physiological and do not require treatment, a small proportion of neonates can accumulate extremely high bilirubin. This can cause kernicterus, and thus accurate monitoring of bilirubin is vital.
Bilirubin can be initially measured with a Transcutaneous Bilirubinometer (TcB), which gives a non-invasive, immediate estimate of bilirubin levels. However, serum bilirubin (SBR) readings remain the gold standard method of bilirubin measurement.
The published accuracy of the TcB monitors that are widely used in the UK varies. Some companies report correlation coefficients whilst others report ± ranges. However, correlation coefficients give no sense of accuracy; they only demonstrate that one measurement moves with the other. It is not clear from brochures whether the ± ranges are standard errors of the mean, standard deviations or root mean square errors. Only one brochure published the repeatability. Therefore, it is impossible to know the accuracy of any company’s TcB device in a real-world scenario [2–5].
Current evidence around skin color and TcB accuracy is inconsistent; some research suggests the TcB is unaffected by skin color, whilst other sources suggest that TcB underestimates the bilirubin in babies with darker skin more than it does in babies with lighter skin and thus is not reliable [6–8].
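The distinction between correlation and accuracy can be illustrated with a minimal sketch. The values below are hypothetical and are not taken from our dataset: a device that always reads 40 μmol/L above the serum value correlates perfectly with it, yet is never accurate.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical serum values (umol/L) and a device that always reads 40 higher.
sbr = [180.0, 220.0, 250.0, 290.0, 330.0]
tcb = [value + 40.0 for value in sbr]

r = pearson_r(sbr, tcb)
# Mean difference, defined as SBR minus TcB.
mean_bias = sum(s - t for s, t in zip(sbr, tcb)) / len(sbr)
print(f"r = {r:.3f}, mean SBR-TcB = {mean_bias:.1f} umol/L")
```

A correlation coefficient alone would describe this hypothetical device as ideal, which is why a measure of agreement such as the mean difference is also needed.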
We conducted a Quality Improvement Project (QIP) as part of our Equality, Diversity and Inclusion programme. The project primarily aims to shed more light on the accuracy of our TcB measurements in different ethnicities (White ethnic group, South-Asian ethnic group, Black ethnic group and Chinese ethnic group). We also used the data to track the trend in bilirubin presentation levels in neonates for each ethnic group over the first 10 days of life.
Since the introduction of TcB technology, new devices have been marketed. As this project includes data from 2013 to 2024, we explored any improvement in the accuracy of TcB measurements over the past decade. Our unit has utilized 4 different TcB devices over this time period, with some overlap in their periods of use. In chronological order, our department used BiliChek™ (Philips), JM102™ (Minolta), JM-105™ (Dräger) and, since 2020, BiliCare™ (Mennen).
Materials & Methods
This quality improvement project was a retrospective review, using neonatal data from Sheffield Teaching Hospital’s Rapid Access Clinic from January 2013 to July 2024.
More than 3,000 babies attended our rapid access clinic. Of these, 1,464 had paired TcB and SBR values, and 1,196 also had age at sample measurement and ethnicity recorded and belonged to an ethnic group with more than 30 patients. If a patient had multiple paired TcB and SBRs, then only the first pair of measurements was included. Patients who had missing or unclear data regarding paired TcB and SBR values, ethnicity, or age at presentation were excluded from the study.
The patients formed 4 broad ethnic groups: White ethnic group (including White British, Irish British, and any other White background), South Asian ethnic group (including those of Indian, Pakistani, and Bangladeshi ethnic groups), Black ethnic group (including Caribbean, African, and any other Black background) and Chinese ethnic group. Whilst ethnicity does not relate to a specific skin tone, we have used it to broadly demonstrate the differences in TcB as skin tone darkens.
There are a number of potential confounding factors that may also affect TcB readings, such as gestational age, birthweight, postnatal age distribution, device type, and operator variability. Unfortunately, due to limitations in data collection, we have not been able to account for these variables.
The average bilirubin value, and the numerical and percentage differences between SBR and TcB were recorded for each ethnic group. Statistical analyses were performed using The British Standards Institution’s Coefficient of Agreement and unpaired T-tests. The T-tests were used to compare the SBRs and TcBs, to compare the mean TcBs between groups and to compare the mean SBRs between groups. A significance level of 0.05 was used.
The progression of TcB accuracy through time was also investigated. The total patient set was split into ‘early’ (2013–2017) and ‘late’ (2020–2024), with a sample of the first 500 patients compared with the last 500.
We also explored the day of presentation of serum bilirubin levels for each ethnic group. A graph of age (in days) for the first bilirubin measurement against the average SBR was plotted.
Microsoft Excel was used for recording data. Online SPSS was used for statistical calculations. A Shapiro-Wilk test was applied to the data, which showed a normal distribution, as expected with most continuous biological variables. Hence, we used parametric unpaired T-tests to compare different groups. A Bland-Altman plot [9] was used to depict the average difference in TcB and SBR against the average SBR value for each ethnic group. In the Bland-Altman plot, the Limits of Agreement were established as the mean bias ± 1.96 × standard deviation, so that 95% of the points fall between the limits. Proportional bias was not tested in this study. In addition, we calculated the Pearson correlation coefficient.
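The Limits of Agreement calculation described above can be sketched as follows. This is an illustrative sketch only, not the SPSS procedure we ran; the function name and the five paired readings are hypothetical.

```python
import statistics


def bland_altman_limits(sbr, tcb):
    """Return (mean bias, lower limit, upper limit) for paired readings.

    The difference is defined as SBR minus TcB, and the limits are
    mean bias +/- 1.96 x the sample standard deviation of the differences.
    """
    diffs = [s - t for s, t in zip(sbr, tcb)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings in umol/L (not study data).
sbr = [250.0, 260.0, 240.0, 300.0, 210.0]
tcb = [262.0, 259.0, 255.0, 296.0, 230.0]

bias, lower, upper = bland_altman_limits(sbr, tcb)
print(f"bias = {bias:.1f}, limits of agreement = {lower:.1f} to {upper:.1f}")
```

A negative bias here means that, on average, the TcB reads higher than the SBR.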
Results
One thousand four hundred sixty-four patients had paired measurements. However, only 1,196 fell into our four broad ethnic groups. The remainder had no ethnicity recorded or the population of the ethnicity was <30. The paired TcB and SBR values were analyzed within each group.
For 1,464 patients with paired TcB and SBRs, the mean TcB was 262±42 μmol/L, with 95% CI of the mean as 260–264 μmol/L. The mean SBR was 256±53 μmol/L, with 95% CI of the mean as 253–259 μmol/L. As the 95% confidence intervals do not overlap, the two means were significantly different from each other (p<0.0001).
The mean Δ was -6 μmol/L and the 95% CI of the mean Δ was -7.9 to -3.9 μmol/L, illustrating that there was a systematic overestimation by the TcB in comparison to the SBR. The 95th centile for differences was +63 μmol/L. This means that only 5% of our neonates had an SBR that was more than 63 μmol/L higher than the measured TcB.
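As a worked check of the intervals above, the sketch below recomputes the 95% confidence interval of each mean from the summary values. It assumes that the ± figures are standard deviations and uses the normal approximation (mean ± 1.96 × SD/√n); the function name is hypothetical.

```python
import math


def ci95_of_mean(mean, sd, n):
    """95% confidence interval of a mean, using the normal approximation."""
    standard_error = sd / math.sqrt(n)
    half_width = 1.96 * standard_error
    return mean - half_width, mean + half_width


# Summary values from the text; treating +/- as a standard deviation is an assumption.
tcb_low, tcb_high = ci95_of_mean(262, 42, 1464)
sbr_low, sbr_high = ci95_of_mean(256, 53, 1464)

print(f"TcB: {tcb_low:.1f} to {tcb_high:.1f} umol/L")
print(f"SBR: {sbr_low:.1f} to {sbr_high:.1f} umol/L")
```

Under that assumption the rounded results agree with the reported intervals of 260–264 and 253–259 μmol/L.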
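The meaning of a 95th centile of Δ can be sketched with a simple percentile function. The interpolation method below is an assumption (statistical packages differ slightly in how they interpolate), and the Δ values are hypothetical rather than study data.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical differences (SBR minus TcB) in umol/L: -50, -49, ..., +50.
deltas = list(range(-50, 51))

cutoff = percentile(deltas, 95)
share_above = sum(1 for d in deltas if d > cutoff) / len(deltas)
print(f"95th centile = {cutoff}, share of babies above it = {share_above:.3f}")
```

Babies whose Δ lies above the cut-off are those whose SBR exceeds the TcB reading by more than that amount, which is the group a screening margin is intended to protect.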
Table 1 shows data for each of our four main ethnicities. The mean difference increased as likely skin color darkened. With the exception of the Chinese ethnicity, there was a systematic overestimation of TcB compared with SBR. However, the 95th percentile across the ethnic groups ranged from +55 to +66 μmol/L.
SBR values for the Black ethnic group were significantly lower compared to those of every other ethnic background (p<0.01).
There was a significantly larger (more negative) mean Δ in the Black ethnic group compared with the other ethnic groups (p<0.001), suggesting that the difference between TcB and SBR readings is worse for the darkest of skin tones.
The average measured TcB is significantly lower in the White ethnic group than in the South Asian ethnic group (p<0.0001) and in the Chinese ethnic group (p<0.05).
TcB and SBRs have a surprisingly weak correlation using Pearson’s Correlation Coefficients; r=0.687 for 1,464 neonates.
There was a significant difference between SBR and TcB values for all groups (p<0.033), except for the Chinese ethnic group.
Figure 1 shows a systematic error between TcB and SBR for groups A, B, and D as the 95% confidence intervals do not cross 0. The mean Δ therefore differs significantly from zero in the White, South Asian, and Black ethnic groups; TcB is significantly higher than the paired SBR in these groups. There is no significant difference in the Chinese ethnic group. The p-values, as seen in Table 1, show that the TcB is overestimated most significantly in the Black ethnic group, then the South Asian ethnic group, and lastly in the White ethnic group.
Figure 1. Bland-Altman plot showing the mean (solid lines) and 95% Limits of Agreement for the mean (dotted lines) for each ethnicity.
| Group | n | Average age in days at presentation (95% confidence intervals of mean) | Mean TcB in μmol/L, (95% confidence intervals of mean) | Mean SBR in μmol/L, (95% confidence intervals of mean) | Pearsons Correlation Coefficient between TcB and SBR | Δ (SBR-TcB) in μmol/L, (95% confidence intervals of mean Δ) | 95% centile of the mean Δ in μmol/L |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Chinese ethnicity | 32 | 4 (4 to 4) | 275 (260–290) | 275 (253–297) | 0.794 | 0 (-14 to 14) | +66 |
| White ethnicity | 914 | 4 (4 to 4) | 258 (256–261) | 256 (252–259)† | 0.712 | -3 (-5 to -0.22) | +62 |
| South Asian ethnicity | 182 | 4 (4 to 4) | 273 (268–278) | 260 (253–267)† | 0.6308 | -13 (-19 to -8)‡ | +55 |
| Black ethnicity | 68 | 4 (3 to 5) | 263 (253–272) | 230 (214–245)† | 0.690 | -33 (-44 to -22)* | +59 |
Figure 2 shows that at lower bilirubin levels, TcB is more likely to overestimate serum bilirubin. At higher bilirubin levels, TcB is more likely to underestimate it. This correlation is significant for all identified ethnic groups (p<0.001), with R² varying between 0.0498 and 0.4079.
For the earliest 500 patients recorded (starting from 2013) the mean Δ (with 95% confidence intervals of the mean) was -5.9 (-9.6 to -2.3) μmol/L. For the most recent 500 patients (working backwards from 2024), the mean Δ was -8.7 (-12 to -5.4) μmol/L. There is no statistically significant change in accuracy with time.
Figure 2. Scatter chart showing the average bilirubin against percentage Δ (difference of SBR-TcB) for each ethnicity.
Figure 3 plots the difference between SBR and TcB (and thus TcB reliability) for each ethnicity over the past 11 years and shows that it has not changed. For the 1,464 babies there was no significant correlation between accuracy and time passing (r=0.008367, p=0.751).
Figure 3. Scatter graph showing the differences between SBR and TcB (Δ) over the past 11 years.
Figure 4 illustrates that, over the first 10 days of life, the serum bilirubin at the day of referral/presentation seems to behave differently in babies of different ethnic groups. In babies of White or South Asian ethnicity, a proportion present on day 0/1 with high bilirubin levels, but this does not seem to happen in babies of Chinese or Black ethnic origin. However, the small sample sizes in the Black and Chinese ethnic groups mean that it is difficult to justify conclusions from the trajectories seen.
Figure 4. Comparison of the average bilirubin by day of presentation for each ethnicity for the first 10 days of life.
Discussion
Before TcB technology, newborns were visually assessed for jaundice, which proved inaccurate and particularly unreliable in babies with darker skin tones. In some countries, initial assessment is still by visual inspection, followed by an SBR [10].
Transcutaneous bilirubinometers were first introduced in 1980 [11]. The TcB device works by detecting the yellow bilirubin pigment in the subcutaneous tissue via optical spectroscopy [11]. More recently, the technology has developed and TcBs are now based on microspectroscopy [7]. TcBs are commonly used as they are a cost-effective, point-of-care method and can avoid the trauma and potential anemia associated with blood sampling for serum bilirubin (SBR) [12]. They also reduce unnecessary SBRs and maximize the efficiency of the workforce [13].
Studies have suggested that TcB accuracy can vary due to a host of factors, including skin color, birth weight, and gestational age [14,15]. Whilst most research deems TcBs to be a reliable way of estimating the serum bilirubin [16,17], some authors show that there are inconsistencies in how well it performs in different skin tones. A 2018 South Indian study demonstrates a higher correlation between TcB and SBR in lighter skin tones than in darker ones [7], suggesting that reliability decreases as skin tone darkens.
Studies have shown that TcB tends to underestimate the bilirubin in lighter and medium skin tones and overestimate the bilirubin in darker skin tones [16–19]. Interestingly, a 2014 in vitro study showed the darker the skin tone, the larger the underestimation of TcB [9].
In this study, we have used ethnicity as a proxy for skin color. This needs to be taken into consideration when discussing the results, as it limits biological interpretation and external validity.
Our data show that there is an overall poor correlation between TcB and SBR (r=0.687) in the total sample (n=1,464), and within each group (r=0.631 to 0.794). The strongest correlation is in the Chinese ethnic group. However, correlation is not a marker of accuracy and should not be the only measure presented in TcB promotional material.
The average SBR in presenting neonates was statistically significantly lower in the Black ethnic group than in every other ethnic group, which is interesting given the level of overestimation by the TcB. As a result of the greater overestimation by TcB for babies from a Black ethnic background, these babies are having more serum bilirubins tested even though they have lower bilirubin levels. This seems to imply that clinical staff are being appropriately vigilant in picking up jaundice in babies with darker skin tones, and that they are picking up jaundice in these babies at a lower level than in babies in other ethnic groups. However, this does not explain the absence of Black and Chinese ethnicity babies presenting on day 1 of life in our dataset with higher bilirubin values.
The average bilirubin by day of first presentation for each ethnic group is shown in Figure 4. The time course is similar in the White and South Asian ethnic groups; the average bilirubin in these groups seems to peak in the first 24 hours, which is not seen in the Chinese or Black ethnic groups. The Black ethnic group and Chinese ethnic group show very different trajectories. This may mean that, due to the challenges of assessing jaundice in babies that have different or darker skin tones, jaundice in some of these neonates is being missed in the early days of life. However, the jaundice is being clinically picked up in South Asian neonates, who generally have darker skin tones than those of White ethnicity. Another potential hypothesis is that bilirubin levels follow different trajectories in these ethnic groups. The sample sizes are particularly small for the Black and Chinese ethnic groups, so ultimately these data are too limited to make firm conclusions.
The Δ differs significantly from zero in the White, South Asian and Black ethnic groups, with absolute mean differences of -3, -13, and -33 μmol/L respectively. Thus, TcB is overestimating the bilirubin in all these groups, and the overestimation increases as likely skin tone, as suggested by ethnicity, darkens. The absolute mean Δ in the Chinese ethnic group is 0 μmol/L, showing a reliable estimate by TcB. It should be noted, however, that this is the smallest population and may not have been large enough to detect a systematic error between TcB and SBR.
Surprisingly, the 95th centile difference was relatively similar between groups, ranging from +55 to +66 μmol/L. For all the 1,464 babies with paired measurements the 95th centile was 63 μmol/L. Thus, whilst accuracy depends on the ethnic background, setting a TcB threshold of around 60 to 65 μmol/L within treatment level, regardless of skin color, would ensure around 95% of children will receive a subsequent SBR and can thus be managed safely.
The statistical and clinical significance of this change varies between ethnicities. For babies of Chinese ethnicity, there is no statistical or clinical significance in changing the TcB threshold; for babies of White ethnicity, there is statistical but not clinical significance. For South Asian and Black ethnic group babies, there is both. It is important to remember that these are mean differences, not individual ones. Specifically in the Black ethnic group, underestimation by the TcB in an individual baby could mean that some babies who need exchange transfusion will not even receive an SBR.
Figure 5 shows the ‘Treatment threshold graph’, published by NICE CG98 [20] in 2010. In this figure, we have shown NICE recommendations for TcB thresholds, as well as our own. NICE CG98 states a transcutaneous bilirubinometer should be used for babies that have a gestation of ≥38 weeks and are ≥24 hours old, and if the TcB level is ≥250 μmol/L (as indicated by the red dotted line), then the serum bilirubin should be tested [20]. The purple line illustrates our recommendation for the TcB threshold, as it includes TcB values up to 65 μmol/L within the phototherapy treatment line.
Figure 5 indicates that from 24–78 hours of age, there are a number of babies that will have TcB values under 250 μmol/L, so would not be treated under the current NICE guidance. However, some of these babies will have had their bilirubin underestimated by up to 66 μmol/L by the TcB device, and thus will not receive an SBR or treatment despite needing it. Changing the TcB threshold will therefore protect these neonates. Additionally, from 78 hours onward, there are a number of babies that will have an SBR done under NICE guidance, but do not fall within our threshold; thus, these babies will be having unnecessary blood tests. Babies that present with jaundice within 24 hours of birth should have a serum blood test rather than a TcB due to underlying pathology, so we have not included these babies in our recommendation.
Figure 5. Treatment threshold graph, based on NICE guidance (CG98) and our recommendations in regard to TcB thresholds for measuring the SBR.
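The screening rule we recommend can be sketched as a simple comparison. This is a minimal sketch, not a clinical tool: the function name is hypothetical, the phototherapy threshold must be read from the NICE chart for the baby's age, the example value of 300 μmol/L is purely illustrative and not taken from NICE, and treating a reading exactly on the line as a trigger is an assumption. The rule is not intended for babies under 24 hours old, who should have a serum test directly.

```python
# Margin below the phototherapy line within which an SBR is triggered (umol/L).
SCREENING_MARGIN_UMOL_L = 65.0


def needs_sbr(tcb, phototherapy_threshold, margin=SCREENING_MARGIN_UMOL_L):
    """Return True if the TcB reading lies within `margin` of the treatment line.

    `phototherapy_threshold` is the age-specific treatment level read from
    the threshold chart; it is an input here, not something this code knows.
    """
    return tcb >= phototherapy_threshold - margin


# Illustrative treatment level only (hypothetical, not a NICE value).
example_threshold = 300.0

print(needs_sbr(240.0, example_threshold))  # within 65 umol/L of the line
print(needs_sbr(230.0, example_threshold))  # more than 65 umol/L below the line
```

Keeping the margin as a parameter allows the 60 to 65 μmol/L range to be explored without changing the rule itself.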
It is disappointing, in terms of light technology improvement, that the vast improvement seen in smartphone cameras from 2011 to 2024 does not seem to have translated into the accuracy of TcB monitors. Unlike phone cameras, there has been no step change in performance (see Figure 3). This may relate to the regulation of medical devices, or to the complexity of proprietary technological validation.
This study is limited by its sample sizes. The largest group in our study was that of the White ethnic group, with 914 patients. Comparatively, our smallest group was the Chinese ethnic group with only 32 patients. Thus, the confidence intervals will always be wider in the Chinese ethnic group. An additional limitation to this study is that we have used the recorded maternal ethnicity as a proxy for the neonate’s ethnicity. This may not accurately reflect the baby’s ethnicity nor skin tone, as there is no paternal data recorded or utilized.
Lastly, the TcB measurements were done by jobbing clinicians, not in laboratory conditions. This may potentially affect accuracy but also allows this to be a real-world test of TcB performance.
This QIP has been based on the ‘Plan-Do-Study-Act’ (PDSA) cycle. We have planned the methodology of this study, collected the data and studied it. Whilst we have also acted on the data by raising the screening threshold for TcB readings to 60–65 μmol/L within treatment level, the more crucial ‘act’ must be done by the technology companies that produce transcutaneous bilirubinometers. The accuracy of the technology needs to be vastly improved, and skin color needs to be taken into account.
Conclusions
Further research should be done using actual measurements of skin tone, for example using the Fitzpatrick scale, instead of using ethnicity as a proxy, to allow for more reliable results. Future studies should also aim to have a larger data set for different ethnicities/skin tones and should consider a multi-center approach, which ideally would include premature babies and higher bilirubin values. Additionally, an exploration into the trajectories of bilirubin increase in the first few days of life would be a valuable insight, as this study’s sample sizes may be too small to properly analyse whether different ethnicities have different courses or whether our results are misleading due to small sample sizes. Because this project was a pragmatic QI review, we did not have data comprehensive enough to justify a multi-logistic regression. This may be a valuable addition to future prospective studies.
Conflicts Of Interest
All authors declare that they have no conflicts of interest.
Funding
No funding was given or used for this project.
Approval
This QI project was approved as part of our in-house Equality, Diversity and Inclusion (EDI) work.
References
2. Croyde Medical. Croyde Jaundice Meter FAQs. Accessed: 24 August 2025. Available at: https://knowledgehub.croydemedical.co.uk/croyde-jaundice-meter-faqs.
3. Delta Medical International. Jaundice Meter. 2017 [Accessed: 24 August 2025]. Available at: https://deltamedint.com/products/jaundice-meter/.
4. Dräger. Dräger Jaundice Meter JM-105. Accessed: 24 August 2025. Available at: https://www.draeger.com/en_uk/Products/Jaundice-Meter-JM-105.
5. Philips. BiliChek System. Accessed: 24 August 2025. Available at: https://www.healthcare.shop.philips.nl/Philips-Global-Category/mother-and-child-care/supplies-product-family/supplies-product-type/bilichek-bilirubinometer/p/989805644871?salesOrg=NL90.
6. Varughese PM, Krishnan L. Does color really matter? Reliability of transcutaneous bilirubinometry in different skin-colored babies. Indian J Paediatr Dermatol. 2018 Oct 1;19(4):315–20.
7. Afanetti M, Eleni Dit Trolli S, Yousef N, Jrad I, Mokhtari M. Transcutaneous bilirubinometry is not influenced by term or skin color in neonates. Early Hum Dev. 2014 Aug;90(8):417–20.
8. Dam-Vervloet AJ, Morsink CF, Krommendijk ME, Nijholt IM, van Straaten HLM, Poot L, et al. Skin color influences transcutaneous bilirubin measurements: a systematic in vitro evaluation. Pediatr Res. 2025 Apr;97(5):1706–10.
9. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986 Feb 8;1(8476):307–10.
10. van der Geest BAM, de Mol MJS, Barendse ISA, de Graaf JP, Bertens LCM, Poley MJ, et al; STARSHIP Study Group. Assessment, management, and incidence of neonatal jaundice in healthy neonates cared for in primary care: a prospective cohort study. Sci Rep. 2022 Aug 23;12(1):14385.
11. Yamanouchi I, Yamauchi Y, Igarashi I. Transcutaneous bilirubinometry: preliminary studies of noninvasive transcutaneous bilirubin meter in the Okayama National Hospital. Pediatrics. 1980 Feb;65(2):195–202.
12. Dai J, Parry DM, Krahn J. Transcutaneous bilirubinometry: its role in the assessment of neonatal jaundice. Clin Biochem. 1997 Feb;30(1):1–9.
13. Jnah A, Newberry DM, Eisenbeisz E. Comparison of Transcutaneous and Serum Bilirubin Measurements in Neonates 30 to 34 Weeks' Gestation Before, During, and After Phototherapy. Adv Neonatal Care. 2018 Apr;18(2):144–53.
14. Knüpfer M, Pulzer F, Braun L, Heilmann A, Robel-Tillig E, Vogtmann C. Transcutaneous bilirubinometry in preterm infants. Acta Paediatr. 2001 Aug;90(8):899–903.
15. Karen T, Bucher HU, Fauchère JC. Comparison of a new transcutaneous bilirubinometer (Bilimed) with serum bilirubin measurements in preterm and full-term infants. BMC Pediatr. 2009 Nov 12;9:70.
16. Bhutani VK, Gourley GR, Adler S, Kreamer B, Dalin C, Johnson LH. Noninvasive measurement of total serum bilirubin in a multiracial predischarge newborn population to assess the risk of severe hyperbilirubinemia. Pediatrics. 2000 Aug;106(2):E17.
17. Maya-Enero S, Candel-Pau J, Garcia-Garcia J, Duran-Jordà X, López-Vílchez MÁ. Reliability of transcutaneous bilirubin determination based on skin color determined by a neonatal skin color scale of our own. Eur J Pediatr. 2021 Feb;180(2):607–16.
18. Wainer S, Rabi Y, Parmar SM, Allegro D, Lyon M. Impact of skin tone on the performance of a transcutaneous jaundice meter. Acta Paediatr. 2009 Dec;98(12):1909–15.
19. Samiee-Zafarghandy S, Feberova J, Williams K, Yasseen AS, Perkins SL, Lemyre B. Influence of skin colour on diagnostic accuracy of the jaundice meter JM 103 in newborns. Arch Dis Child Fetal Neonatal Ed. 2014 Nov;99(6):F480–4.
20. National Institute for Health and Care Excellence (NICE). Neonatal jaundice: treatment threshold graphs (CG98). London: NICE; 2010 [cited 2026 Apr 8]. Available from: https://www.nice.org.uk/guidance/CG98.




