Science
Improving the Accuracy of Forensic Age Estimation Through Bias Reduction
Key Points
Chronological age estimation can provide supporting information in forensic casework when traditional identification methods are limited. DNA methylation, a stable epigenetic mark, has emerged as a promising tool for predicting chronological age from trace samples. However, many existing age estimation models rely on linear regression, which often yields prediction errors that are biased across the age distribution (i.e., the model residuals depend significantly on age). In this study, we compared three approaches to age estimation modeling: multivariable linear regression, random forest regression, and maximum likelihood estimation. The first two approaches are well established; for the third, we constructed and validated a DNA methylation-based maximum likelihood model built on LOESS regression, using forensically relevant CpG markers. In all cases, model performance was evaluated by leave-one-out cross-validation (LOOCV). We used three independent, publicly accessible methylation datasets generated by droplet digital PCR (ddPCR) to determine which method performs best in terms of accuracy and bias. Notably, the maximum likelihood approach showed less age-associated bias in its residuals than either multivariable linear regression or random forest regression. These findings highlight the utility of non-linear modeling techniques for reducing bias in epigenetic age estimation for forensic applications.
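To make the notion of age-dependent residuals concrete, the following is a minimal sketch of how such bias could be measured for a linear model under LOOCV. It is an illustration only: the data are synthetic, the single marker and its noise level are invented, and the function names `loocv_linear_predictions` and `residual_age_slope` are hypothetical. It does not reproduce the study's datasets, markers, or maximum likelihood model.

```python
import numpy as np


def loocv_linear_predictions(X, y):
    """Predict each sample's age from a linear model fitted to all other samples."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # intercept column + methylation values
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # leave sample i out of the fit
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        preds[i] = design[i] @ coef
    return preds


def residual_age_slope(age, predicted):
    """Slope of (predicted - true age) regressed on true age.

    A slope near 0 indicates no age-dependent bias; a negative slope means
    young samples tend to be overestimated and old samples underestimated.
    """
    residuals = predicted - age
    slope, _intercept = np.polyfit(age, residuals, 1)
    return slope


# Synthetic data (assumption): one hypothetical CpG marker whose methylation
# rises linearly with age, measured with Gaussian noise.
rng = np.random.default_rng(0)
age = rng.uniform(18, 80, size=120)
meth = 0.2 + 0.006 * age + rng.normal(0, 0.04, size=120)

pred = loocv_linear_predictions(meth.reshape(-1, 1), age)
slope = residual_age_slope(age, pred)
mae = np.mean(np.abs(pred - age))

print(f"MAE: {mae:.2f} years, residual-vs-age slope: {slope:.3f}")
```

In this toy setting the slope comes out negative even though the simulated marker is perfectly linear in age, because regressing age on a noisy predictor pulls predictions toward the mean age. A residual-versus-age slope of this kind is one simple way to quantify the bias that the comparison of methods is concerned with.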