Research Article - (2024) Volume 15, Issue 2
Received: 02-Apr-2024, Manuscript No. jfr-24-131270;
Editor assigned: 04-Apr-2024, Pre QC No. P-131270;
Reviewed: 18-Apr-2024, QC No. Q-131270;
Revised: 23-Apr-2024, Manuscript No. R-131270;
Published: 30-Apr-2024, DOI: 10.37421/2157-7145.2024.15.606
Citation: Camille Ropert, Simon Buré, Adèle Sourisce and Caroline Gallois, et al. “DNA Methylation-based Age Prediction from Blood Samples of French Child and Adult Individuals.” J Forensic Res 15 (2024): 606.
Copyright: © 2024 Ropert C, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Objective: Identifying the origin of human biological traces detected at crime scenes by comparing DNA profiles to national or international forensic databases is often key to providing new directions for police investigations. However, when a profile remains unidentified, investigators can benefit from forensic genetics to propose new leads, for example by predicting the physical appearance of individuals. Since blood traces are of primary interest for forensic investigators and often lead to the extraction of usable genetic material, we developed in this study a methodology to predict biological age from blood samples based on the analysis of DNA methylation of human genomic regions.
Methods: We first established a cohort of blood samples obtained from 170 French donors aged from 0 to 101 years. We analyzed the methylation status of 5 age-associated CpG sites using the SNaPshot method, a primer-extension-based assay routinely used in French forensic police laboratories. Using a training set of 136 samples, we generated an age-prediction model based on multiple regression analyses of DNA methylation data and tested its predictive performance on a validation set.
Results: The SNaPshot assay was adapted to limiting quantities of genomic DNA relevant for forensic investigations. The DNA methylation levels were established for 5 age-related CpG sites in 170 blood samples collected from French male and female donors. We established a statistical model optimized for 5 CpG sites that can explain 97% of age variation with a Mean Absolute Error (MAE) of 3.45 years between the estimated biological and chronological age of individuals.
Conclusion: We developed an approach, compatible with routine genetic analyses in French forensic police laboratories, to predict the biological age of individuals based strictly on the methylation levels of 5 CpG sites in circulating blood samples.
DNA methylation • Bisulfite conversion • SNaPshot • Age prediction • Mathematical model • Chronological and biological age comparison • Blood samples
In the early 20th century, Alphonse Bertillon introduced anthropometric measurements as a statistically sound approach for forensic identification in criminal investigations in France [1]. More than a century later, criminal investigations rely heavily on molecular biology and genetics to establish DNA profiles, based on the analysis of short tandem repeat sequences, that can be compared to national or international forensic DNA databases in order to identify individuals. When established DNA profiles remain unidentified, an alternative genetic approach can be proposed to guide investigations, narrow down potential suspects and help reveal the truth. Indeed, during the last decade, a broad base of forensic genetics studies has focused on predicting phenotypic traits such as gender, skin, eye and hair colour as well as the biogeographic origins of individuals and, to a lesser extent, predisposition to baldness and freckles [2-8]. While these phenotypic features were mainly inferred from Single Nucleotide Polymorphism (SNP) analyses [4-6,9,10], predictions of additional features such as biological age have also emerged from epigenetic studies of DNA methylation [11-16]. DNA methylation is defined as the transfer of a methyl group onto the C5 position of cytosines primarily located at CpG sites throughout the genome. DNA methylation marks follow highly dynamic patterns to control gene expression, and contribute to defining cell and tissue identity as stable marks throughout an individual's life [12]. Interestingly, several studies have revealed that subsets of CpG sites instead display variable DNA methylation levels, either gains or losses, during aging in humans [17]. These observations at the genomic scale led to the concept of the epigenetic clock [12].
The correlation between DNA methylation levels and chronological age, based on birth declarations, has driven a large research effort to predict the biological age of individuals from age-related changes in the epigenetic landscape and to evaluate the impact of environmental factors or diseases on human life span [18]. Being able to predict the age of an individual also has obvious applications in the forensic field, where the analysis of unknown DNA traces collected at crime scenes can facilitate the identification of victims or potential suspects.
In recent years, several studies have proposed age prediction models based on the analysis of DNA methylation changes for different tissues of primary interest in forensic investigations: semen [19], blood [20-22], saliva [22-24], or hard tissues such as teeth and bones [22,25,26]. Concomitantly, different methodologies have been developed either to analyze a limited number of CpG sites (SNaPshot, EpiTYPER, Pyrosequencing) or to perform genome-wide studies (massively parallel sequencing) [2,20,22,24,27-30]. As a forensic laboratory, our priority was to select an efficient and cost-effective approach to analyze blood samples that could be directly implemented in routine laboratory analyses: the SNaPshot assay. Previous studies have reported the development of age-prediction models from blood samples, combining SNaPshot quantification of DNA methylation levels at 5 CpG sites located in the vicinity of the ELOVL2, FHL2, KLF14, C1orf132 and TRIM59 genes with linear regression models for age prediction [20,24,29-31] (Figure 1). These 5 CpG sites were also studied in saliva and buccal swabs and are referred to as robust multi-tissue age-predictive epigenetic marks [23,32]. In the present study, our main goal was to adapt the SNaPshot assay and statistical age-prediction models to sets of blood samples obtained from the French population. We first challenged the SNaPshot approach with low-input DNA samples, using the 5 CpG sites mentioned above, to cope with the limited amounts of genomic DNA collected from crime scenes. We then analyzed a population of 136 individuals, with equal gender representation and a homogeneous age distribution from newborns to centenarians, and established an optimized age prediction model based on the DNA methylation levels detected at 5 CpG sites (Figure 1).
Sample collection
This study was performed in accordance with the recommendations of the French National Ethics Committee legal framework (Comité Consultatif National d’Ethique pour les Sciences de la Vie et de la Santé). All adult participants, and the parents of minors under the legal age of 18, signed a written informed consent for the use of collected samples for research purposes. Peripheral blood samples were obtained from 170 living and anonymized French donors, including 78 males, 88 females and 4 donors who did not declare their gender on the consent forms. The donors were evenly distributed among 6 age classes: 0-14 years, 15-29 years, 30-44 years, 45-59 years, 60-74 years, and 75 years and over. From these samples, 144 (70 females, 70 males and 4 of undetermined gender) were used to establish a training population, and the remaining 26 (18 females and 8 males) a validation population. Blood samples were either collected in EDTA and stored at 4 °C until processing, or collected on cotton swabs and processed promptly after collection, following standard operating procedures.
DNA extraction and quantification
Genomic DNA was extracted from each blood sample using the Nucleospin™ Plasma XS Kit (Macherey Nagel, Düren, Germany) following the manufacturer’s instructions. Extracted DNA samples were eluted in 50 μL of 5 mM Tris-HCl and quantified using the Quantifiler™ Trio DNA Quantification Kit (Applied Biosystems, Foster City, CA, USA) following the manufacturer's protocol.
Bisulfite conversion
Genomic DNA samples were resuspended in 20 μL and subjected to bisulfite conversion using the Premium Bisulfite kit (Diagenode, Liège, Belgium) following the manufacturer’s instructions. The converted DNA samples were eluted in 15 μL of the elution buffer supplied with the kit.
To determine the sensitivity and ruggedness of the bisulfite conversion, an assay was conducted on six different amounts of input DNA (1 ng, 5 ng, 10 ng, 20 ng, 50 ng and 100 ng), obtained by serial dilution of one sample of genomic DNA and analyzed in triplicate.
To ensure the consistency and reliability of the results for both the training and validation samples, 20 ng of genomic DNA was used for each bisulfite conversion and all samples were analyzed in duplicate. A negative control (no DNA) was used to detect potential contamination. Two additional controls were also included to validate each SNaPshot assay. First, a blood sample with a known profile was systematically included to assess the variability of results between series of SNaPshot analyses. Second, a control composed of unconverted and unmethylated human genomic DNA (EpiTect Control DNA Set, Qiagen, Hilden, Germany) was included to assess the bisulfite conversion efficiency.
PCR, multiplex SNaPshot and capillary electrophoresis
The Polymerase Chain Reaction (PCR) steps, multiplex SNaPshot and capillary electrophoresis were carried out under the same conditions as previously described [33,34]. Briefly, five CpG sites located in the vicinity of the ELOVL2, FHL2, KLF14, C1orf132 and TRIM59 genes were considered in our study. The characteristics of the selected CpG sites are described in Table S1. Converted DNA samples were submitted to multiplex PCR amplification in 20 µL assays containing 2 µL of converted DNA, 4 µL of 5X primer mix (concentrations specified in Table S1), 11.6 µL of pure H2O, 2 µL of 10X Gold ST*R Buffer (Promega Corporation, Madison, WI, USA), and 0.4 µL of AmpliTaq Gold® polymerase (Applied Biosystems, Foster City, CA, USA). The PCR amplification program began with an initial denaturation at 95 °C for 11 min, followed by 34 cycles of denaturation at 94 °C for 20 s, annealing at 56 °C for 1 min, and extension at 72 °C for 30 s, and ended with a final extension step at 72 °C for 7 min.
Next, 2 µL of ExoSAP-IT™ (Applied Biosystems, Foster City, CA, USA) was added to 10 µL of each PCR-amplified product, and the digestion was conducted for 45 min at 37 °C followed by 15 min at 80 °C.
The Single Base Extension (SBE) step was performed using the SNaPshot™ multiplex kit (Applied Biosystems, Foster City, CA, USA). SBE reactions of 10 µL, containing 2 µL of ExoSAP-IT™-treated amplified DNA, 1 µL of a 10X primer mix (concentrations specified in Table S1), 2 µL of BigDye™ Terminator 5X Sequencing Buffer (Applied Biosystems, Foster City, CA, USA), 1 µL of SNaPshot reaction mix (Applied Biosystems, Foster City, CA, USA), and 4 µL of water, were subjected to 25 cycles of 10 s at 96 °C, 5 s at 50 °C and 30 s at 60 °C.
A final treatment was conducted by adding 1 µL of Shrimp Alkaline Phosphatase (Applied Biosystems, Foster City, CA, USA) to each sample, and incubating the resulting mixes for 45 min at 37 °C followed by 15 min at 80 °C.
The resulting digested SBE products were analyzed using the 3500 xL Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). The methylation rates (0 to 1) at individual CpG sites were determined using the GeneMapper™ Software (version 5) (Applied Biosystems, Foster City, CA, USA). Briefly, we quantified the nucleotide intensities defined by the peak height of the converted and unconverted nucleotides (C or G) and we calculated a ratio of methylated intensities over total peak intensities, as described by Jung SE, et al. [24].
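The ratio described above can be sketched in a few lines of Python. This is a minimal illustration and not the GeneMapper workflow itself: the function name `methylation_rate` is hypothetical, and the only assumption taken from the text is that the rate equals the methylated peak height divided by the sum of the methylated and unmethylated peak heights.

```python
def methylation_rate(methylated_peak, unmethylated_peak):
    """Return the methylation rate (0 to 1) at one CpG site.

    Both arguments are peak heights read from the electropherogram
    (hypothetical inputs; any consistent intensity unit works).
    """
    if methylated_peak < 0 or unmethylated_peak < 0:
        raise ValueError("peak heights must be non-negative")
    total = methylated_peak + unmethylated_peak
    if total == 0:
        raise ValueError("no signal detected at this CpG site")
    # Ratio of the methylated intensity over the total peak intensity.
    return methylated_peak / total
```

For instance, hypothetical peak heights of 300 (methylated) and 700 (unmethylated) would correspond to a methylation rate of 0.3.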
DNA methylation linearity assay
To assess the linearity of DNA methylation detection by the SNaPshot assay for 20 ng of DNA input, completely methylated and completely unmethylated bisulfite-converted control DNA samples (EpiTect PCR Control DNA, Qiagen, Hilden, Germany) were mixed to create nine samples with increasing methylation percentages: 0%, 5%, 10%, 25%, 50%, 75%, 90%, 95%, and 100%. These samples were then analyzed in triplicate as previously described, and the measured methylation levels were compared to the expected ratios for each mix.
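One way to compare measured and expected methylation levels is the coefficient of determination of a straight-line fit, sketched below. The function name `r_squared` and the measured values in the usage note are illustrative assumptions; the text does not state which software was used for this comparison.

```python
def r_squared(expected, measured):
    """Coefficient of determination of the least-squares line measured ~ expected.

    For a simple linear regression this equals the squared Pearson correlation.
    """
    n = len(expected)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(expected) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, measured))
    if sxx == 0 or syy == 0:
        raise ValueError("values must not all be identical")
    return (sxy * sxy) / (sxx * syy)


# Expected methylation percentages of the nine control mixes.
EXPECTED_MIXES = [0, 5, 10, 25, 50, 75, 90, 95, 100]
```

Calling `r_squared(EXPECTED_MIXES, measured)` with the mean measured percentage of each mix (for example a made-up series such as `[2, 8, 14, 30, 52, 71, 85, 90, 93]`) returns a value close to 1 when detection is linear.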
SNaPshot method validation for low input DNA samples
In recent years, several studies have described the development, for forensic applications, of DNA methylation-based age prediction models from blood samples using the SNaPshot assay (Suppl. Figure 1). Our goal was to propose a methodology to predict the age of unidentified individuals from blood traces discovered at crime scenes that would be directly compatible with the routine analyses performed in French police forensic laboratories. To this end, we first re-evaluated the multiplex methylation assay developed by Jung SE, et al. [24] using multi-tissue age-related CpG sites identified in the vicinity of the ELOVL2, FHL2, KLF14, C1orf132 and TRIM59 genes (Suppl. Table 1). We verified the linearity of detection of DNA methylation levels, individually for each CpG site, using purified and bisulfite-converted unmethylated and methylated control genomic DNA samples (Suppl. Figure 2A). Performing the SNaPshot assay with DNA sample mixes containing increasing proportions of methylated DNA, ranging from 0 to 100%, revealed a high degree of correlation between the measured and expected methylation levels for the ELOVL2 (R2=0.95), FHL2 (R2=0.98), KLF14 (R2=0.97), C1orf132 (R2=0.99) and TRIM59 (R2=0.96) CpG sites (Suppl. Figure 2A).
Since DNA samples collected from crime scenes are often available in limited amounts, we next assessed the reliability of the bisulfite conversion coupled to the SNaPshot assay for low-input genomic DNA samples. Genomic DNA was extracted from a peripheral blood sample and the initial bisulfite conversion was performed on increasing amounts of DNA, ranging from 1 ng to 100 ng, to detect the KLF14, C1orf132, ELOVL2, FHL2 and TRIM59 CpGs simultaneously in a multiplex approach (Suppl. Figure 2B). The DNA methylation levels measured were consistent with previous studies [20,24,29]. While the KLF14 and C1orf132 CpGs displayed low (<10%) and high (>70%) methylation levels, respectively, the ELOVL2, FHL2 and TRIM59 CpGs were associated with intermediate (35-40%) methylation levels for the blood sample analyzed. This analysis revealed a high variability of the measured DNA methylation levels for the lowest DNA input samples (1 ng, 5 ng and 10 ng) (Suppl. Figure 2B), consistent with previous observations [35]. In contrast, the methylation values for the 5 CpG sites were highly consistent when using 20 ng, 50 ng and 100 ng of input DNA for the bisulfite conversion step, with standard deviations ranging from 1% to 2.4%. Therefore, in order to cope with the limited availability of DNA traces collected in forensic cases while ensuring a highly reproducible detection of DNA methylation for multiple CpG sites, 20 ng of genomic DNA was defined as the standard input for the bisulfite conversion and multiplex SNaPshot methylation assays in this study.
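The reproducibility comparison across input amounts can be illustrated with a short sketch that computes the mean and standard deviation of replicate measurements for each DNA input. The function names, the use of the sample standard deviation, and the threshold passed to `reproducible_inputs` are assumptions made for illustration; the text does not specify how the standard deviations were computed.

```python
from statistics import mean, stdev


def replicate_summary(measurements):
    """Map each DNA input (ng) to (mean, sample SD) of its replicate values.

    `measurements` maps an input amount to a list of methylation
    percentages measured for one CpG site (at least two replicates).
    """
    return {ng: (mean(values), stdev(values)) for ng, values in measurements.items()}


def reproducible_inputs(measurements, max_sd):
    """Return the sorted input amounts whose replicate SD does not exceed max_sd."""
    summary = replicate_summary(measurements)
    return sorted(ng for ng, (_, sd) in summary.items() if sd <= max_sd)
```

With made-up triplicates such as `{1: [20, 45, 60], 20: [38, 39, 40], 100: [39, 40, 41]}` and `max_sd=2.5`, only the 20 ng and 100 ng inputs would be retained.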
Establishing a training set of blood samples
The methylation levels of the CpGs near the ELOVL2, FHL2, KLF14, C1orf132 and TRIM59 genes were analyzed in 144 of the 170 peripheral blood samples collected from French individuals between 0 and 101 years old (Figure 1). The blood donors included 70 female and 70 male participants and 4 donors of undeclared gender, evenly distributed across 6 age classes (Figure 1). Following genomic DNA extraction and bisulfite conversion of each sample, the methylation levels of the 5 CpGs were simultaneously measured using the SNaPshot assay, as previously described [24]. As expected, all 5 CpG sites showed age-dependent changes in peak distribution for methylated and non-methylated nucleotides on the electropherograms (Suppl. Figure 3). The DNA methylation rates of individual CpGs were inferred from the average peak intensity measured in duplicate for each sample. The distribution of methylation values for each CpG site was then analyzed and 8 outlier samples were excluded based on Bonferroni corrected p-values exceeding 0.05. For the remaining 136 samples, changes in DNA methylation rates for the 5 CpG sites were highly correlated with the chronological age of the donors (Figure 2A). The strongest correlations were observed for the CpG sites in ELOVL2 (R2 = 0.92), FHL2 (R2 = 0.87), C1orf132 (R2 = 0.84) and TRIM59 (R2 = 0.84), while the lowest correlation was observed for the KLF14 CpG site (R2 = 0.66). These results are consistent with previous studies of age determination based on epigenetic DNA modifications in human blood samples [20,24,29,30]. Since the gender distribution is relatively homogeneous across the different age classes of our sample set (Figure 1), we next confirmed that the distribution of the DNA methylation levels measured was not significantly different between females and males for each of the 5 CpG sites (p-values between 0.62 and 0.86, Figure 2B).
In conclusion, we established a training set of 136 blood samples, obtained from 66 females, 66 males and 4 donors of undetermined gender, that can be used to model the French population to assess age-predictive statistical methods (Figure 2).
Figure 2. A. Scatter plots representing the correlation between the chronological age and DNA methylation levels at each of the 5 CpG sites analyzed in the ELOVL2, FHL2, KLF14, C1orf132 and TRIM59 genes, for a training set composed of 136 blood samples from individuals aged from 0 to 101 years. The trend lines and the coefficient of determination (R2) values are indicated in each graph. B. Box plot representing the distribution of DNA methylation levels for each CpG site in males (blue) and females (yellow) from the training set. The edges of the boxes represent the first and third quartiles, the line within each box represents the median, and the whiskers extend to the maximum and minimum values. For each CpG site, the exact p-values of Student’s t-tests are indicated.
Development of an age prediction model for blood samples of the French population
Next, we analyzed the predictive potential of each CpG site based on simple linear regression and polynomial regression models (Table 1). Although a strong correlation was observed between DNA methylation-based predicted ages and chronological ages for all of these models (R2: 0.81-0.91), their predictive capacities proved to be rather limited, with MAE values between 6.18 and 9.05 years and high AIC values indicative of high prediction error rates (Table 1). To improve the age-predictive capacities for forensic applications, multiple linear regression models simultaneously including the 5 CpG sites, as well as several data transformation methods, were considered. After comparing multiple modelling methods, an optimized Age Prediction Model (APM) was defined with the following formula:
CpG-associated gene | R2 | MAE (years) | RMSE | AIC | Correct predictions ± 5 years | Age-predictive model |
---|---|---|---|---|---|---|
ELOVL2 | 0.91 | 6.18 | 8.24 | 1008 | 54 % | Simple linear regression |
FHL2 | 0.85 | 7.73 | 10.57 | 1083 | 46 % | 3rd-order polynomial regression |
KLF14 | 0.81 | 9.05 | 12.01 | 1127 | 36 % | 3rd-order polynomial regression |
C1orf132 | 0.87 | 8.15 | 10.06 | 1082 | 40 % | 2nd-order polynomial regression |
TRIM59 | 0.86 | 7.71 | 10.36 | 1082 | 43 % | 2nd-order polynomial regression |
Predicted age (years) = 55.2403 + 66.0422 × (% ELOVL2 CpG methylation) + 29.1731 × (% FHL2 CpG methylation) + 16.4241 × log(% KLF14 CpG methylation) + 1.6526 × log2(% KLF14 CpG methylation) - 25.8812 × (% C1orf132 CpG methylation) + 18.9406 × (% TRIM59 CpG methylation) (Table 1).
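The formula above can be evaluated with a short function, sketched here under several assumptions that the text does not confirm: methylation values are entered as fractions between 0 and 1 (the scale reported by the GeneMapper analysis), "log" is taken as the natural logarithm, and the "log2" term is read by default as the squared logarithm, with log base 2 available as an alternative reading. The function and constant names are hypothetical; only the coefficients come from the formula.

```python
import math

# Coefficients copied from the published formula.
INTERCEPT = 55.2403
B_ELOVL2 = 66.0422
B_FHL2 = 29.1731
B_KLF14_LOG = 16.4241
B_KLF14_LOG2 = 1.6526
B_C1ORF132 = -25.8812
B_TRIM59 = 18.9406


def predict_age(elovl2, fhl2, klf14, c1orf132, trim59, squared_log=True):
    """Evaluate the age prediction formula for one sample.

    ASSUMPTIONS (not confirmed by the text): inputs are methylation
    fractions in (0, 1]; "log" is the natural logarithm; "log2" is the
    squared logarithm if squared_log is True, otherwise log base 2.
    """
    if klf14 <= 0:
        raise ValueError("KLF14 methylation must be positive to take its logarithm")
    log_klf14 = math.log(klf14)
    second_term = log_klf14 ** 2 if squared_log else math.log2(klf14)
    return (
        INTERCEPT
        + B_ELOVL2 * elovl2
        + B_FHL2 * fhl2
        + B_KLF14_LOG * log_klf14
        + B_KLF14_LOG2 * second_term
        + B_C1ORF132 * c1orf132
        + B_TRIM59 * trim59
    )
```

Because the scale and the logarithm conventions are assumptions, predictions from this sketch should be checked against a sample of known age before any use.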
Using this optimized APM, the biological age of the 136 donors was predicted and compared to their declared chronological age (Figure 3). This APM based on 5 CpGs explained 97% of the total variance observed in the training set (R2=0.97), and its performance was characterized by an MAE of 3.45 years, an RMSE of 4.79 and an AIC of 828. To evaluate the accuracy of the APM predictions, 1000 partial subsets of 43 individuals were randomly generated from the training set. This approach revealed that the model could predict the age of an individual within ± 5 years with an accuracy of 75%, strictly based on the DNA methylation levels of 5 CpG sites (Table 2 and Figure 3).
Sample set | R2 | MAE (years) | RMSE | AIC | Correct predictions ± 5 years |
---|---|---|---|---|---|
Training set | 0.97 | 3.45 | 4.79 | 828 | 75 % |
Validation set | 0.88 | 4.49 | 7.17 | NA | 62 % |
Combined | 0.96 | 3.64 | 5.26 | NA | 73 % |
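The performance metrics reported in Table 2 can be computed from paired chronological and predicted ages as sketched below. The function names are hypothetical, and `subsample_accuracy` reflects only one plausible reading of the subset procedure (averaging the ± 5-year hit rate over random subsets of 43 individuals); the text does not detail how the subsets were used.

```python
import math
import random


def mae(true_ages, predicted_ages):
    """Mean absolute error between chronological and predicted ages."""
    return sum(abs(p - t) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def rmse(true_ages, predicted_ages):
    """Root mean square error between chronological and predicted ages."""
    squared = sum((p - t) ** 2 for t, p in zip(true_ages, predicted_ages))
    return math.sqrt(squared / len(true_ages))


def within_tolerance(true_ages, predicted_ages, tol=5.0):
    """Fraction of predictions falling within +/- tol years of the true age."""
    hits = sum(1 for t, p in zip(true_ages, predicted_ages) if abs(p - t) <= tol)
    return hits / len(true_ages)


def subsample_accuracy(true_ages, predicted_ages, subset_size=43,
                       n_subsets=1000, tol=5.0, seed=0):
    """Average the +/- tol hit rate over random subsets drawn without replacement.

    ASSUMPTION: this is an illustrative reading of the subset procedure,
    not a confirmed description of the authors' method.
    """
    rng = random.Random(seed)
    indices = range(len(true_ages))
    total = 0.0
    for _ in range(n_subsets):
        chosen = rng.sample(indices, subset_size)
        total += within_tolerance([true_ages[i] for i in chosen],
                                  [predicted_ages[i] for i in chosen], tol)
    return total / n_subsets
```

The fixed `seed` only makes the random subsets repeatable between runs; any other value would serve equally well.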
Figure 3. Scatter plot representing the correlation between the chronological age and the age predicted by a multiple regression model (APM) trained on the methylation levels detected at the ELOVL2, FHL2, KLF14, C1orf132 and TRIM59 CpG sites for blood samples from newborn to 101-year-old individuals (n=136). Females, males and individuals of undefined gender are represented by yellow, blue and grey squares, respectively. The line and R2 value indicate the trend line and the coefficient of determination, respectively.
Validation of the age prediction model for blood samples
To validate the established APM, an independent validation set composed of the remaining 26 blood samples was analyzed with the SNaPshot approach, in the same way as the training sample set. Strong correlations were once again observed between the DNA methylation levels for each of the 5 CpG sites and the chronological age of donors (Figure 4A). Moreover, when applied to the validation set, the APM confirmed a highly significant correlation between the predicted biological and chronological ages, with a coefficient of determination of 0.88 (Figure 4B), an MAE of 4.49 years and an RMSE of 7.17 (Table 2). When combining the training and validation datasets, the APM demonstrated improved accuracy of age prediction (R2: 0.96, MAE: 3.64 years and RMSE: 5.26) compared to the validation dataset alone (Table 2 and Figure 4).
Figure 4. A. Scatter plots representing the correlation between the chronological age and DNA methylation levels at each of the 5 CpG sites analyzed in the ELOVL2, FHL2, KLF14, C1orf132 and TRIM59 genes, for a validation set composed of blood samples from individuals aged from 2 to 88 years (n=26). The lines defining the trend curves and the coefficient of determination (R2) values are indicated in each graph. B. Scatter plot representing the correlation between chronological and predicted biological ages inferred from the age prediction model (APM) for the validation set of blood samples from French individuals (n=26). The line indicates the expected theoretical age for each chronological age.
When analyzing the distribution of the model prediction errors between the chronological and predicted biological ages for all 136 individuals of the training set, we noticed that the error range increased with the age of the donors (Figure 5A). Similar conclusions were reported in previous studies on blood samples [29,30,35]. Indeed, the median age prediction error was particularly high for individuals older than 60 years (Figure 5B). While the APM predictions were quite accurate for the youngest individuals of the training set (0 to 29 years old), we observed a markedly higher dispersion of prediction errors for volunteers older than 40 years (Figure 5).
Figure 5. A. Biological age prediction errors versus chronological ages for each individual of the training set (n=136). B. Boxplots representing the distribution of absolute errors in each 15-year age class. The edges of the boxes represent the first and third quartiles, the line within each box represents the median, and the whiskers extend to the maximum and minimum values.
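The per-class error analysis summarized in Figure 5B can be sketched as follows. The class labels mirror the six age classes described in the sample collection section, with the last class taken to start at 75 years; the function names and the choice of the median as the summary statistic are illustrative assumptions.

```python
from statistics import median

# Six 15-year age classes; the last one is open-ended (assumed to start at 75).
AGE_CLASS_LABELS = ["0-14", "15-29", "30-44", "45-59", "60-74", "75+"]


def age_class(age):
    """Return the label of the 15-year age class containing `age` (in years)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    index = min(int(age // 15), len(AGE_CLASS_LABELS) - 1)
    return AGE_CLASS_LABELS[index]


def median_abs_error_by_class(chronological_ages, predicted_ages):
    """Median absolute prediction error for each age class that has samples."""
    errors = {}
    for true_age, predicted in zip(chronological_ages, predicted_ages):
        errors.setdefault(age_class(true_age), []).append(abs(predicted - true_age))
    return {label: median(errors[label]) for label in AGE_CLASS_LABELS if label in errors}
```

Grouping by chronological rather than predicted age keeps each donor in a fixed class regardless of how far off the prediction is.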
The SNaPshot approach has already been used in the past decade to study the relationship between DNA epigenetic modifications (5mC methylation) and chronological age in blood samples of Korean [24], Portuguese [30], Polish [20], Italian [29] and Turkish [31] populations, and to propose age-prediction models. Differences in DNA methylation profiles have been reported for specific CpG sites based on the ancestry or biogeographic origin of individuals, between Japanese and German populations [36] and between Middle Eastern and Central European populations [37]. Consistently, several studies have investigated the potential impact of biogeographic ancestry on DNA methylation-based age predictions [36-39]. Altogether, these studies advocate for the development of age-prediction models adapted to the population of interest and to the methodology used. Therefore, in this study, we applied the SNaPshot approach for the first time to study the relationship between DNA methylation and chronological age in the French population, in the context of a forensic laboratory. Previous studies investigating the importance of DNA methylation changes for age prediction from blood samples relied on 100 ng [31], 40-200 ng [24], 200-400 ng [30], 400 ng [29] and 2 μg [20] of input DNA for bisulfite conversion. However, forensic investigations often depend on limited or even rare biological material collected from crime scenes, which restricts the number and type of molecular and genetic analyses that can be performed in comparison with studies conducted in fundamental research laboratories. Along this line, our objective was to perform bisulfite conversion coupled to SNaPshot assays from scarce DNA input samples. Our results demonstrate that this methodology can be applied with high reproducibility to as little as 20 ng of genomic DNA, a quantity we decided to use as the standard input for this study to cope with the limitations of forensic studies.
Interestingly, the results obtained for DNA methylation detection and age prediction accuracy are quite comparable with previous studies performed from significantly larger DNA input samples, therefore suggesting that this limit may actually be surpassed in the future. Accordingly, our data also strongly suggest these levels could be lowered to 5 ng of input DNA (Suppl. Figure 2B). Additional experiments should be performed on a wider sample cohort and by increasing the number of technical replicates per sample to confirm this hypothesis and therefore increase the scope of criminal investigations that could benefit from DNA methylation based age-prediction analyses in the future in France.
Similarly to published studies [20,24,29-31,40], we observed consistent modifications of the DNA methylation levels of 5 previously described CpG sites in blood samples depending on the chronological age of donors. Indeed, the ELOVL2, FHL2, KLF14 and TRIM59 CpG sites tend to become more methylated as age increases, while the methylation levels of the C1orf132 site were inversely correlated with age [20,24,29,30,40]. In addition, as previously described in blood samples obtained from different Asian or European populations, the predicted biological age based on DNA methylation levels and the chronological age tend to display higher correlations for the youngest individuals and increased error rates for individuals older than 45 [20,30,35].
When applied to the validation sample set, the APM displayed a significant yet lower correlation (R2 = 0.88 vs. 0.97) between the chronological and predicted ages, increased MAE (4.49 vs. 3.45 years) and RMSE (7.17 vs. 4.79) values and a reduced 5-year prediction accuracy compared to the training set (62% vs. 75%) (Table 2). This observation is in line with the conclusions reported by numerous studies using different APMs and sample sets [20,24,29-31,40]. Yet, it likely also results from the smaller number of samples in the validation set (n=26) compared to the training set (n=136). Increasing the number of blood samples analyzed to validate APMs should therefore be considered in future studies.
A current limitation of age prediction from DNA methylation analysis remains the representativeness of the training populations used for mathematical models, including the age and gender distributions, the total number of samples analyzed, as well as anonymized information relative to the health status of each person involved in these studies. With these parameters taken into consideration, we established a training cohort of 136 participants covering a large age distribution, from newborns to 101 years old, with a similar representation of the different age classes. In comparison, except for the Dias HC, et al. report (59 individuals aged 1-94) [30], previous studies focused on more limited age distributions, including 18-65 years [29], 18-74 years [24], 20-83 years [31] or 2-75 years [20]. While our study covers a broad age distribution with equal gender representation, one limitation remains the total number of individuals analyzed to train and validate the age prediction model (APM). Increased numbers of DNA methylation measurements should be considered in future studies to support more robust biological age prediction models, both by increasing the number of participants and by increasing the number of CpG sites analyzed [2,22]. Additional studies are also needed to establish whether a single panel of age-related CpG sites should be considered for these models, or whether two or more panels should be used to predict the biological age for different age classes. Different CpG loci in the vicinity of genes of interest can undergo multiple epigenetic modifications: for example, 7, 10 and 4 distinct cytosines have been shown to be methylated in the vicinity of the ELOVL2, FHL2 and KLF14 genes, respectively. Taking this complexity and heterogeneity into consideration should also help to build more reliable age-predictive models [2,22,28,41].
In addition, different types of models, such as quantile regression or convolutional neural networks, should be considered in the future to analyze the relationship between DNA methylation and age as a non-linear relationship [35,40-42].
In conclusion, we established a methodology to predict the biological age of individuals from the French population based on a DNA methylation analysis method compatible with forensic laboratory routines (MAE = 3.64 years, with 73% of predictions correct within ± 5 years). Although criminal investigations by the French police services could already benefit from this biological age prediction model, we anticipate that this methodology should be further improved in the near future before it is adopted in French police forensic laboratories. Indeed, defining additional features describing unidentified individuals, such as age, remains an interesting perspective. It should rely on additional valuable age-associated CpG sites, on more robust statistical models accounting for the intrinsic variability of DNA methylation measurements within a complex biological population, or on age-category-specific APMs, in order to provide the most accurate predictions, support strong investigation leads and help reveal the truth.
The authors thank Emmanuelle Sciacca, Magali Faivre, Joanna Fombonne and Emilie Lessoud for their valuable comments on this project, and Isabelle Vignon from the Novelab Ingels Vignon laboratory (Société NOVELAB S.E.L.A.S., 69400 Villefranche sur Saône, France) for coordinating the collection of blood samples from donors.
This research was funded by the Service National de Police Scientifique of the French National Police.
The authors declare no conflicts of interest.