{"id":72120,"date":"2025-08-10T10:27:18","date_gmt":"2025-08-10T10:27:18","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/72120\/"},"modified":"2025-08-10T10:27:18","modified_gmt":"2025-08-10T10:27:18","slug":"integrated-phenotypic-analysis-predictive-modeling-and-identification-of-novel-trait-associated-loci-in-a-diverse-theobroma-cacao-collection-bmc-plant-biology","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/72120\/","title":{"rendered":"Integrated phenotypic analysis, predictive modeling, and identification of novel trait-associated loci in a diverse Theobroma cacao collection | BMC Plant Biology"},"content":{"rendered":"<p>Phenotypic diversity and trait interrelationships in the TARS collection<\/p>\n<p>To further explore the structure within the phenotypic data and the relationships among traits for the 173 accessions evaluated in TARS, PCA and Hierarchical Clustering were performed using all measured horticultural traits. The PCA revealed that the first principal component (PC1) accounted for 47.2% of the total phenotypic variance, while the second principal component (PC2) explained an additional 18.8%, collectively representing 66.0% of the variation. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a> displays the distribution of all 173 individual accessions (black dots) in the PCA space based on their phenotypic profiles. The red labels represent the K\u2009=\u20097 genetic structure groups defined by Bekele et al. [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Bekele FL, Bidaisee GG, Allegre M, Argout X, Fouet O, Boccara M, et al. 
Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential. PLoS One. 2022;17:e0260907.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR22\" id=\"ref-link-section-d211955441e862\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>] and are positioned at the mean coordinates of the 28 common accessions belonging to each respective group. The mean points for the accessions show some differentiation. For instance, the means for the \u201cNA, Pound\u201d and \u201cPA\u201d Bekele groups are positioned in the lower-left quadrant, while the mean for \u201cAMAZ, IMC\u201d is in the upper-right quadrant, \u201cLCT, EEN, SCA, MO\u201d is in the upper-left quadrant, and \u201cSPEC\u201d and \u201cCRIOLLO\u201d are in the lower-right quadrant. This visualization helps to position these genetically defined subsets within the broader phenotypic landscape of the entire TARS collection.<\/p>\n<p>The contributions of individual traits to these principal components are illustrated in the loading plot (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">S1<\/a>). Traits such as pod width, pod length, pod weight, seed water content, and fresh seed weight loaded heavily on PC1, indicating their primary role in driving the main axis of variation. Both PC1 and PC2 influenced dry seed weight, pod index, and total pods, though their directional influence varied. PC2 predominantly influenced Yield (kg\/tree\/year), number of seeds, and infection rate. 
The number of infected pods showed a minimal impact, primarily along PC2.<\/p>\n<p>Hierarchical clustering using Ward\u2019s method, based on all phenotypic traits, grouped the 173 accessions into 13 distinct clusters (Fig. S2). This clustering reflects the overall phenotypic similarity among accessions. A corresponding hierarchical clustering of the traits themselves revealed their interrelationships across the dataset (Fig. S3). Notably, dry seed weight clustered closely with pod length. Fresh seed weight formed a cluster with seed water content, which was adjacent to a group containing pod weight and pod width. The number of infected pods and infection rate clustered together, and this pair was closely associated with pod index. Finally, total pods and Yield (kg\/tree\/year) formed a distinct group, which was linked to the number of seeds. These multivariate analyses provide insights into the underlying structure of phenotypic diversity and the correlational patterns among key horticultural traits in this cacao collection.<\/p>\n<p>Fig. 1<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y\/figures\/1\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig1\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/08\/12870_2025_7128_Fig1_HTML.png\" alt=\"figure 1\" loading=\"lazy\" width=\"685\" height=\"409\"\/><\/a><\/p>\n<p>Principal component analysis of 173 cacao accessions based on all measured phenotypic traits. The plot displays the distribution of individual accessions along the first two principal components (PC1 and PC2). The 28 accessions with known genetic group assignments from Bekele et al. 
[<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Bekele FL, Bidaisee GG, Allegre M, Argout X, Fouet O, Boccara M, et al. Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential. PLoS One. 2022;17:e0260907.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR22\" id=\"ref-link-section-d211955441e886\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>], are color-coded according to their respective group: AMAZ, IMC (red); CRIOLLO (orange); LCT_EEN, SCA, MO (yellow); NA, Pound (green); PA (blue); and SPEC (purple). The remaining 145 accessions are shown as black dots. Red vectors indicate the mean coordinates for each genetic group<\/p>\n<p>To understand the interrelationships among the horticultural traits measured in Mayaguez, PR, 2007\u20132011, and to explore phenotypic consistencies with other studies, several correlation analyses were performed. Pearson\u2019s correlation analysis among all traits measured in the TARS collection revealed numerous significant associations, visualized as a heatmap and scatter plots in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2<\/a> (detailed p-values and correlation coefficients are provided in Supplementary Data 1). Notably, a strong positive correlation was observed between \u2018Yield\u2019 and \u2018Total pods\u2019 (r\u2009=\u20090.83, p\u2009&lt;\u20090.0001). \u2018Yield\u2019 also showed a significant negative correlation with \u2018Infection rate\u2019 (r\u2009=\u2009\u2212\u20090.26, p\u2009=\u20090.0006). 
The \u2018Number of infected pods\u2019 was strongly positively correlated with \u2018Infection rate\u2019 (r\u2009=\u20090.70, p\u2009&lt;\u20090.0001). Strong positive correlations (often r\u2009&gt;\u20090.70, p\u2009&lt;\u20090.0001) were prevalent among pod size-related traits, including \u2018Pod length\u2019, \u2018Pod weight\u2019, and \u2018Pod width\u2019. Other notable strong correlations included \u2018Dry seed weight\u2019 with \u2018Pod index\u2019 (r\u2009=\u2009\u2212\u20090.95, p\u2009&lt;\u20090.0001), \u2018Fresh seed weight\u2019 with \u2018Pod weight\u2019 (r\u2009=\u20090.99, p\u2009&lt;\u20090.0001), and \u2018Seed water content\u2019 with \u2018Pod weight\u2019 (r\u2009=\u20090.97, p\u2009&lt;\u20090.0001), suggesting that seed water content is a major contributor to pod weight. \u2018Number of seeds\u2019 per pod was positively correlated with the total \u2018Dry seed weight\u2019 per pod (r\u2009=\u20090.45, p\u2009&lt;\u20090.0001); consequently, a higher number of seeds was also strongly and negatively correlated with \u2018Pod index\u2019 (r\u2009=\u2009\u2212\u20090.50, p\u2009&lt;\u20090.0001), as fewer of these productive pods were required to yield one kilogram of dry beans. \u2018Pod weight\u2019 showed a significant negative correlation with the \u2018Number of infected pods\u2019 (r\u2009=\u2009\u2212\u20090.23, p\u2009=\u20090.0027). \u2018Total pods\u2019 exhibited significant negative correlations with \u2018Dry seed weight\u2019 (r\u2009=\u2009\u2212\u20090.32, p\u2009&lt;\u20090.0001), \u2018Seed water content\u2019 (r\u2009=\u2009\u2212\u20090.46, p\u2009&lt;\u20090.0001), \u2018Pod weight\u2019 (r\u2009=\u2009\u2212\u20090.48, p\u2009&lt;\u20090.0001), and \u2018Infection rate\u2019 (r\u2009=\u2009\u2212\u20090.32, p\u2009&lt;\u20090.0001), the latter suggesting that higher pod production may be associated with a lower overall infection rate.<\/p>\n<p>Fig. 
2<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y\/figures\/2\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig2\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/08\/12870_2025_7128_Fig2_HTML.png\" alt=\"figure 2\" loading=\"lazy\" width=\"685\" height=\"610\"\/><\/a><\/p>\n<p>Pearson\u2019s correlation matrix of horticultural traits from TARS cacao phenotype data. The upper triangle displays a heatmap of Pearson correlation coefficients (r), where red indicates positive correlations and blue indicates negative correlations. The lower triangle shows scatter plots for each pair of traits, with a fitted regression line. Trait names are indicated on the diagonal. Detailed correlation coefficients and p-values are provided in Supplementary Data 1<\/p>\n<p>Comparative analyses of TARS traits with ICGT and Agrosavia collections<\/p>\n<p>To explore potential associations between genetic structure and horticultural traits in a subset of the cacao collection [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 23\" title=\"Irish BM, Goenaga R, Zhang D, Schnell R, Brown JS, Motamayor JC. Microsatellite fingerprinting of the USDA-ARS tropical agriculture research station Cacao (Theobroma Cacao L.) germplasm collection. Crop Sci. 2010;50:656\u201367.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR23\" id=\"ref-link-section-d211955441e1030\" rel=\"nofollow noopener\" target=\"_blank\">23<\/a>], Pearson\u2019s correlation coefficients were calculated. This analysis involved 28 accessions common to the TARS study and the Bekele et al. 
[<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Bekele FL, Bidaisee GG, Allegre M, Argout X, Fouet O, Boccara M, et al. Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential. PLoS One. 2022;17:e0260907.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR22\" id=\"ref-link-section-d211955441e1033\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>] investigation. Membership coefficients for these accessions to the K\u2009=\u20097 genetic clusters defined by Bekele et al. [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Bekele FL, Bidaisee GG, Allegre M, Argout X, Fouet O, Boccara M, et al. Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential. PLoS One. 2022;17:e0260907.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR22\" id=\"ref-link-section-d211955441e1036\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>] were correlated with their phenotypic trait values from the TARS evaluation. Several statistically significant correlations were identified (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig3\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a>, detailed in Supplementary Data 1). 
Notably, a higher membership coefficient for the \u2018AMAZ, IMC\u2019 genetic cluster was positively correlated with Dry seed weight (r\u2009=\u20090.39, p\u2009=\u20090.041), Number of Seeds per pod (r\u2009=\u20090.59, p\u2009=\u20090.001), and overall Yield (kg\/tree\/year) (r\u2009=\u20090.50, p\u2009=\u20090.0065), while being negatively correlated with Pod Index (r\u2009=\u2009\u2212\u20090.40, p\u2009=\u20090.0344). Membership in the \u2018LCT, EEN, SCA, MO\u2019 cluster showed a positive correlation with Total Pods per tree (r\u2009=\u20090.48, p\u2009=\u20090.0097), whereas membership in the \u2018SPEC\u2019 cluster was negatively correlated with this trait (r\u2009=\u2009\u2212\u20090.48, p\u2009=\u20090.0101). The \u2018CRIOLLO\u2019 genetic group membership showed positive correlations with Fresh seed weight (r\u2009=\u20090.38, p\u2009=\u20090.0454), seed water content (r\u2009=\u20090.38, p\u2009=\u20090.0475), Pod Length (r\u2009=\u20090.43, p\u2009=\u20090.0231), and Infection Rate (r\u2009=\u20090.50, p\u2009=\u20090.0074), and was negatively correlated with total pods (r\u2009=\u2009\u2212\u20090.41, p\u2009=\u20090.0318). Stronger affiliation with the \u2018GU\u2019 (Guiana) group was associated with fewer Infected Pods (r\u2009=\u2009\u2212\u20090.40, p\u2009=\u20090.034). 
This correlation was possible because the analysis used continuous membership coefficients; however, the \u2018GU\u2019 group is not shown in Figs.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig3\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a>, as no accession had \u2018GU\u2019 as its highest assigned ancestry group. Interestingly, membership in the \u2018PA\u2019 (Parinari) group was strongly correlated with a higher (less efficient) Pod Index (r\u2009=\u20090.61, p\u2009=\u20090.0005) and negatively correlated with Dry Seed Weight (r\u2009=\u2009\u2212\u20090.48, p\u2009=\u20090.0098). These correlations suggest linkages between the genetic background and the expression of these specific horticultural traits within this subset of 28 accessions evaluated in Puerto Rico.<\/p>\n<p>Fig. 3<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y\/figures\/3\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig3\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/08\/12870_2025_7128_Fig3_HTML.png\" alt=\"figure 3\" loading=\"lazy\" width=\"685\" height=\"413\"\/><\/a><\/p>\n<p>Correlation heatmap of genetic cluster membership and phenotypic traits. The heatmap displays Pearson\u2019s correlation coefficients (r) between membership coefficients for 28 common accessions to the K\u2009=\u20097 genetic clusters defined by Bekele et al. 
[<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Bekele FL, Bidaisee GG, Allegre M, Argout X, Fouet O, Boccara M, et al. Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential. PLoS One. 2022;17:e0260907.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR22\" id=\"ref-link-section-d211955441e1152\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>] and the corresponding phenotypic traits evaluated in Puerto Rico. Red squares indicate positive correlations and blue squares indicate negative correlations, with color intensity corresponding to the magnitude of the correlation coefficient<\/p>\n<p>A summary of the ANOVA results and post-hoc pairwise comparisons (p\u2009&lt;\u20090.05) for traits showing statistically significant differences among genetic groups is provided in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig4\" rel=\"nofollow noopener\" target=\"_blank\">4<\/a>. For Dry Seed Weight, the ANOVA revealed a statistically significant difference among the genetic structure groups (F\u2009=\u20093.7625, p\u2009=\u20090.0130). Post-hoc comparisons indicated that the \u2018AMAZ, IMC\u2019 group had a significantly higher mean dry seed weight (50.10\u00a0g) than the \u2018PA\u2019 (30.64\u00a0g) and \u2018LCT, EEN, SCA, MO\u2019 (35.83\u00a0g) groups. For Pod Index, a significant difference was also found among the groups (F\u2009=\u20095.3806, p\u2009=\u20090.0022). 
The \u2018PA\u2019 group had a significantly higher (less efficient) mean pod index (33.12) compared to all other groups except \u2018LCT, EEN, SCA, MO\u2019. Regarding yield components, the mean number of Total Pods per tree differed significantly among the genetic groups (F\u2009=\u20092.9832, p\u2009=\u20090.0333). The \u2018LCT, EEN, SCA, MO\u2019 group had a significantly higher mean number of total pods (30.43) than the \u2018CRIOLLO\u2019 (15.25), \u2018SPEC\u2019 (15.08), and \u2018NA, POUND\u2019 (18.05) groups. Finally, Yield (kg\/tree\/year) also showed a statistically significant difference among the groups (F\u2009=\u20094.7307, p\u2009=\u20090.0044). The \u2018AMAZ, IMC\u2019 group exhibited the highest mean yield (1.258 kg\/tree\/year), which was significantly greater than the \u2018PA\u2019 (0.626 kg\/tree\/year) and \u2018CRIOLLO\u2019 (0.630\u00a0kg\/tree\/year) groups. Other horticultural traits evaluated did not show statistically significant variation across these genetic groups for this subset of 28 accessions and are therefore not depicted in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig4\" rel=\"nofollow noopener\" target=\"_blank\">4<\/a> (detailed statistics in Supplementary Data 1). These characterizations provide insights into the phenotypic diversity associated with the genetic structure within this subset of the cacao collection.<\/p>\n<p>Fig. 
4<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y\/figures\/4\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig4\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/08\/12870_2025_7128_Fig4_HTML.png\" alt=\"figure 4\" loading=\"lazy\" width=\"685\" height=\"635\"\/><\/a><\/p>\n<p>Comparison of key horticultural traits among 28 cacao accessions grouped by genetic clusters. Boxplots illustrate the distribution of Dry Seed weight (g), Pod index, Total pods (count), and Yield (kg\/tree\/year) for accessions assigned to the K\u2009=\u20097 genetic clusters. Other horticultural traits evaluated did not show statistically significant differences among these genetic groups in this subset of accessions. Boxes represent the interquartile range (IQR), the horizontal line within the box indicates the median, and whiskers extend to 1.5 times the IQR. Different letters above the boxes indicate statistically significant differences (p\u2009&lt;\u20090.05) between mean values for the genetic groups, based on ANOVA and post-hoc Student\u2019s t-tests<\/p>\n<p>Spearman\u2019s rank correlation was used to compare traits from the TARS study with those reported by Bekele et al. [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Bekele FL, Bidaisee GG, Allegre M, Argout X, Fouet O, Boccara M, et al. Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential. PLoS One. 
2022;17:e0260907.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR22\" id=\"ref-link-section-d211955441e1218\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>] for 27 overlapping cacao accessions evaluated at the ICGT. This analysis was chosen due to the inclusion of ranked and continuous data in the ICGT dataset. The correlation matrix is presented in Fig. S4 and Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Tab1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>, with full details available in Supplementary Data 1.<\/p>\n<p>Evidence for trait stability across the two locations was observed. A significant positive correlation was found between \u2018Number of seeds (TARS)\u2019 and \u2018Seed, number (ICGT)\u2019 (\u03c1\u2009=\u20090.50, p\u2009=\u20090.0082), suggesting a degree of genetic control over this yield component. Similarly, \u2018Dry weight (TARS)\u2019 was strongly correlated with \u2018Wet bean mass (ICGT)\u2019 (\u03c1\u2009=\u20090.56, p\u2009=\u20090.0026), indicating consistency in the overall seed mass potential of these accessions.<\/p>\n<p>The analysis also confirmed fundamental biological trade-offs. \u2018Pod index (TARS)\u2019, an inverse measure of efficiency, was strongly and negatively correlated with \u2018Wet bean mass (ICGT)\u2019 (\u03c1\u2009=\u2009\u2212\u20090.55, p\u2009=\u20090.0027). 
Conversely, \u2018Total pods (TARS)\u2019 showed a negative correlation with \u2018Cotyledon mass (ICGT)\u2019 (\u03c1\u2009=\u2009\u2212\u20090.48, p\u2009=\u20090.0118), demonstrating the classic inverse relationship between pod number and individual seed size.<\/p>\n<p>Among the strongest relationships identified, \u2018Pod length (TARS)\u2019 was highly correlated with \u2018Wet bean mass (ICGT)\u2019 (\u03c1\u2009=\u20090.64, p\u2009=\u20090.0004), suggesting that the potential for longer pods is a strong indicator of the potential for heavier beans. Furthermore, an intriguing positive correlation was found between \u2018Infection rate (TARS)\u2019 and \u2018Cotyledon mass (ICGT)\u2019 (\u03c1\u2009=\u20090.44, p\u2009=\u20090.0205).<\/p>\n<p>Table 1 Significant Spearman\u2019s rank correlations between phenotypic traits from the Puerto Rico (TARS) and International Cocoa Genebank, Trinidad (ICGT) evaluations. The table lists significant (p\u2009&lt;\u20090.05) Spearman\u2019s rank correlation coefficients (\u03c1) comparing traits measured in the 27 overlapping accessions from the TARS evaluation against traits from the ICGT dataset [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Bekele FL, Bidaisee GG, Allegre M, Argout X, Fouet O, Boccara M, et al. Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential. PLoS One. 2022;17:e0260907.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR22\" id=\"ref-link-section-d211955441e1295\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>]. 
The corresponding p-value for each correlation is provided<\/p>\n<p>Pearson\u2019s correlation analysis was conducted between numerical traits from the TARS study and those reported by Osorio-Guar\u00edn et al. [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Osorio-Guar\u00edn JA, Berdugo-Cely JA, Coronado-Silva RA, Baez E, Jaimes Y, Yockteng R. Genome-wide association study reveals novel candidate genes associated with productivity and disease resistance to moniliophthora spp. In Cacao (Theobroma Cacao L.). G3 genes genomes genet. 2020;10:1713\u201325.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR7\" id=\"ref-link-section-d211955441e2391\" rel=\"nofollow noopener\" target=\"_blank\">7<\/a>] for 20 overlapping accessions evaluated in Colombia. The correlation matrix is shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig5\" rel=\"nofollow noopener\" target=\"_blank\">5<\/a>, with TARS traits highlighted in green (full details in Supplementary Data 1). Notably, \u2018Infection rate (TARS)\u2019, primarily reflecting black pod rot incidence caused by Phytophthora spp. in Puerto Rico, showed strong positive correlations with disease metrics from the Agrosavia study. Specifically, it was correlated with \u2018AUDPC flower cushion broom infected by WBD\u2019 (r\u2009=\u20090.86, p\u2009&lt;\u20090.0001) and \u2018AUDPC deformed branches infected by WBD\u2019 (r\u2009=\u20090.72, p\u2009=\u20090.0016). This result indicates that accessions susceptible to black pod rot in Puerto Rico were also susceptible to Witches\u2019 Broom Disease (WBD) in Colombia. 
Other significant correlations included \u2018Fresh seed weight (TARS)\u2019 with \u2018AUDPC flower cushion broom infected by WBD\u2019 (r\u2009=\u20090.56, p\u2009=\u20090.025), \u2018Total number of pods (TARS)\u2019 with \u2018AUDPC pods infected by WBD\u2019 (r\u2009=\u20090.50, p\u2009=\u20090.024), and \u2018Pod weight (TARS)\u2019 with \u2018AUDPC flower cushion broom infected by WBD\u2019 (r\u2009=\u20090.56, p\u2009=\u20090.024).<\/p>\n<p>Fig. 5<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y\/figures\/5\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig5\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/08\/12870_2025_7128_Fig5_HTML.png\" alt=\"figure 5\" loading=\"lazy\" width=\"685\" height=\"596\"\/><\/a><\/p>\n<p>Comparative analysis using Pearson\u2019s correlations of traits between the TARS evaluation dataset and the Agrosavia collection dataset for 20 overlapping cacao accessions. Traits from the TARS study are highlighted with a green background on the axes. Pearson correlation coefficients (r) are visualized by color intensity (red for positive, blue for negative). Detailed correlation coefficients and p-values are provided in Supplementary Data 1<\/p>\n<p>Machine learning models predict Cacao yield from TARS phenotypic traits<\/p>\n<p>To assess the utility of phenotypic traits for predicting cacao yield in the PR study, a model screening approach employing various ML algorithms was conducted using JMP Pro 17 with 5-fold cross-validation. 
The predictive performance and key contributing traits were evaluated in an iterative manner.<\/p>\n<p>Initially, when all other measured phenotypic traits were used as predictors for \u2018Yield\u2019, a Neural Boosted model demonstrated superior performance, achieving an average R<sup>2<\/sup> of 0.9985 across the 5 validation folds. Other models, such as Fit Stepwise (average R<sup>2<\/sup>\u2009=\u20090.9571), Generalized Regression Lasso (average R<sup>2<\/sup>\u2009=\u20090.9539), Boosted Tree (average R<sup>2<\/sup>\u2009=\u20090.9332), and Fit Least Squares (average R<sup>2<\/sup>\u2009=\u20090.9234), also performed well, while Bootstrap Forest, SVM-RBF, Decision Tree, and KNN models had average R<sup>2<\/sup> values below 0.90. The best individual validation model from this iteration was a Neural Boosted model (Fold 1, R<sup>2<\/sup>\u2009=\u20090.9994). For this model, \u2018Total pods\u2019 was the most influential predictor, with a total effect of 0.763, followed by \u2018Pod index\u2019 (total effect\u2009=\u20090.213) and \u2018Dry weight\u2019 (total effect\u2009=\u20090.054). Other traits had total effects of less than 0.002 (details in Supplementary Data 1). This near-perfect R-squared value is expected, as several key predictors (e.g., \u2018Total pods\u2019, \u2018Pod index\u2019) are mathematical components of the target variable \u2018Yield\u2019. The value of this analysis, therefore, lies not in prediction per se, but in confirming and ranking the hierarchy of these components through our iterative approach.<\/p>\n<p>In a second iteration, \u2018Total pods\u2019 was removed from the set of predictor variables. The Neural Boosted algorithm remained the top-performing model, with an average validation R-squared of 0.9935 across 5 folds. The next best models were SVM-RBF (average R<sup>2<\/sup>\u2009=\u20090.76) and Generalized Regression Lasso (average R<sup>2<\/sup>\u2009=\u20090.7371). The best individual validation model in this iteration was a Neural Boosted model (Fold 3, R<sup>2<\/sup>\u2009=\u20090.999). 
The primary predictor identified by this model was \u2018Infection rate\u2019, with a total effect of 0.722. Other important predictors included \u2018Fresh seed weight\u2019 (total effect\u2009=\u20090.469), \u2018Infected pods\u2019 (total effect\u2009=\u20090.274), \u2018Seed water content\u2019 (total effect\u2009=\u20090.259), and \u2018Pod weight\u2019 (total effect\u2009=\u20090.224) (Supplementary Data 1).<\/p>\n<p>Finally, with both \u2018Total pods\u2019 and \u2018Infection rate\u2019 excluded as predictors, the Neural Boosted model still provided the best average predictive performance across 5 folds, though the average R-squared decreased to 0.5214. Bootstrap Forest (average R<sup>2<\/sup>\u2009=\u20090.3606) and Fit Stepwise (average R<sup>2<\/sup>\u2009=\u20090.3417) were the next best. The best individual validation model was again a Neural Boosted model (Fold 1), achieving an R<sup>2<\/sup> of 0.6884. In this constrained model, \u2018Pod weight\u2019 emerged as the most significant predictor (total effect\u2009=\u20090.543), followed by \u2018Infected pods\u2019 (total effect\u2009=\u20090.217), \u2018Number of seeds\u2019 (total effect\u2009=\u20090.118), \u2018Pod index\u2019 (total effect\u2009=\u20090.091), and \u2018Pod width\u2019 (total effect\u2009=\u20090.09) (Supplementary Data 1). These iterative modeling experiments highlight the strong predictive capacity of phenotypic traits for cacao yield and reveal a hierarchy of trait importance.<\/p>\n<p>Marker-trait association identifies loci on multiple chromosomes linked to key horticultural traits<\/p>\n<p>To identify genetic markers associated with phenotypic variation observed in the accessions evaluated in TARS, SNP data from Bekele et al. [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Bekele FL, Bidaisee GG, Allegre M, Argout X, Fouet O, Boccara M, et al. 
Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential. PLoS One. 2022;17:e0260907.\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#ref-CR22\" id=\"ref-link-section-d211955441e2511\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>] were integrated with the TARS phenotypic dataset for 27 common accessions. Using 671 SNPs, a response screening analysis was conducted in JMP Pro 17. This analysis identified several SNPs significantly associated with key horticultural traits after applying an FDR correction (p\u2009&lt;\u20090.01). For \u2018Total pods\u2019, three markers passed the significance threshold (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig6\" rel=\"nofollow noopener\" target=\"_blank\">6<\/a>a). These included the genetic marker TcSNP475, located within a putative zinc finger stress-associated protein gene (Tc05_t008610); TcSNP428, associated with a magnesium-protoporphyrin cyclase gene (Tc05v2_g011210); and TcSNP154, linked to a xyloglucan endotransglucosylase\/hydrolase protein gene (Tc01v2_g018540). For \u2018Infection rate\u2019, a single marker, TcSNP508, was found to be significant (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig6\" rel=\"nofollow noopener\" target=\"_blank\">6<\/a>b) and is associated with a cysteine proteinase 15A gene (Tc08v2_g002970). 
For \u2018Yield\u2019, two SNPs were significantly associated (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y#Fig6\" rel=\"nofollow noopener\" target=\"_blank\">6<\/a>c). One was again the genetic locus TcSNP475, reinforcing its link to overall productivity. The other significant marker was TcSNP483, located within a gene for an ATP synthase subunit (Tc03v2_g019660). No other SNPs reached this level of genome-wide significance for the other horticultural traits tested with this FDR threshold. The repeated association of the genetic marker TcSNP475 on chromosome 5 with both \u2018Total pods\u2019 and \u2018Yield\u2019 is particularly noteworthy. This locus is located within the gene Tc05_t008610, annotated as encoding a \u201cZinc finger A20 and AN1 domain-containing stress-associated protein 4\u201d. This finding strongly points to this genomic region as a potential quantitative trait locus (QTL) influencing yield components in this subset of cacao germplasm, with a candidate gene potentially involved in stress response pathways.<\/p>\n<p>Fig. 6<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/bmcplantbiol.biomedcentral.com\/articles\/10.1186\/s12870-025-07128-y\/figures\/6\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig6\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/08\/12870_2025_7128_Fig6_HTML.png\" alt=\"figure 6\" loading=\"lazy\" width=\"685\" height=\"1359\"\/><\/a><\/p>\n<p>Volcano plots illustrating marker associations with key horticultural traits from the TARS cacao collection. Response screening analysis using genetic markers from the ICGT study as predictors for traits in 28 common accessions. 
In each plot, the x-axis represents the mean difference associated with each marker, the y-axis represents the Logworth score, and each point represents a single marker. The horizontal dashed line indicates the significance threshold (FDR p\u2009&lt;\u20090.01). Points colored red or blue highlight the markers that surpassed this threshold. (a) Volcano plot for the \u2018Total pods\u2019 trait, highlighting three significant markers (the locus TcSNP475, and SNPs TcSNP428 and TcSNP154). (b) Volcano plot for the \u2018Infection rate\u2019 trait, highlighting one significant SNP (TcSNP508). (c) Volcano plot for the \u2018Yield\u2019 trait, highlighting two significant markers (the locus TcSNP475 and SNP TcSNP483)<\/p>\n","protected":false},"excerpt":{"rendered":"Phenotypic diversity and trait interrelationships in the TARS collection To further explore the structure within the phenotypic data&hellip;\n","protected":false},"author":2,"featured_media":72121,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[50],"tags":[2342,200,38349,79,50425],"class_list":{"0":"post-72120","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-agriculture","9":"tag-genetics","10":"tag-plant-sciences","11":"tag-science","12":"tag-tree-biology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/72120","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=72120"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/72120\/revisions"}],"w
p:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/72121"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=72120"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=72120"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=72120"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}