Decomposition of phenotypic heterogeneity in autism reveals underlying genetic programs

Identifying the structures of autism phenotypes

To best reflect the complexity of presentations across autistic individuals, we identified 239 item-level and composite phenotype features present in 5,392 individuals from the SPARK cohort33, a nationwide effort to collect and track genetic and clinical presentations of autism. Briefly, these features represent responses on standard diagnostic questionnaires (the Social Communication Questionnaire-Lifetime (SCQ)34, Repetitive Behavior Scale-Revised (RBS-R)35, Child Behavior Checklist 6–18 (CBCL)36) and a background history form focused on developmental milestones. These data were analyzed with a general finite mixture model (GFMM) to minimize statistical assumptions while accommodating heterogeneous (continuous, binary and categorical) data types (Methods). The model captures the underlying distributions in the data and provides an inherently person-centered approach, separating individuals into classes rather than fragmenting each individual into separate phenotypic categories (Fig. 1a).

Fig. 1: Overview of study design and description of identified subclasses.

a, Study design for parsing the phenotypic heterogeneity of autism and deciphering the genetic factors contributing to individual presentations. A GFMM was trained on a matrix of probands (n = 5,392 individuals) by phenotype (239 features describing item-level and composite phenotypic measure data). We describe four data-driven classes of autism that exhibit differing phenotypic presentations and trait patterns. These four subclasses were further characterized by external validations and genetic analyses. b, To demonstrate differences in phenotypic patterns, we assessed the propensity of each class toward seven phenotype categories. Values close to 1 indicate that the majority of phenotypes within the category were significantly and positively enriched for the phenotype domain compared to probands in other classes (indicating higher difficulties), and values close to −1 indicate significant negative enrichment or depletion for a given phenotype domain compared to probands in other classes (indicating lower difficulties). Sample sizes for all analyses shown were as follows: Broadly affected, n = 554 (magenta); Social/behavioral, n = 1,976 (green); Mixed ASD with DD, n = 1,002 (blue); Moderate challenges, n = 1,860 (orange); unaffected siblings, n = 1,972. c, Distributions of two key developmental milestones: the age when the individual first walked and the age when they first used words (both in months) across the four classes, with nonautistic siblings as a control (n = 1,972). d, Individual total scores from the SCQ by class, with nonautistic siblings as a control (n = 1,972). Center lines in all box plots represent the median, box limits represent the 25th and 75th percentiles, whiskers extend to show 1.5× the interquartile range, and outliers are shown separately as open circles. One-sided independent t-tests with adjustment for multiple comparisons (Benjamini–Hochberg correction) were used to determine the significance of enrichment for each class compared to siblings, as well as for all possible class–class comparisons (*FDR < 0.1, **FDR < 0.05, ***FDR < 0.01; NS, not significant). Stars directly above the box plot indicate comparisons to siblings, whereas stars accompanied by horizontal bars indicate class–class comparisons. Schematic in a created with BioRender.com.

We selected a GFMM with four latent classes representing four different patterns of phenotype profile by considering six standard model fit statistical measures and the overall interpretability of the model solutions. After training models with two to ten latent classes, we found that four classes presented the best balance of model fit as measured by the Bayesian information criterion (BIC), validation log likelihood and other statistical measures of fit (Extended Data Fig. 1 and Supplementary Table 1). In addition, a four-class solution offered the best interpretability in terms of phenotypic separation (Extended Data Fig. 2), as evaluated by clinical collaborators with extensive experience working with autistic individuals. We also found the four-class model to be highly stable and robust to various perturbations (Extended Data Fig. 3).

As observed clinically, classes differed not only in severity of autism symptoms but also in the degree to which co-occurring cognitive, behavioral and psychiatric concerns factored into their presentation. For clinical interpretability, we assigned each of the 239 phenotype features to one of the following seven categories defined in the literature35,37,38,39: limited social communication, restricted and/or repetitive behavior, attention deficit, disruptive behavior, anxiety and/or mood symptoms, developmental delay (DD) and self-injury (Fig. 1b). We identified one class that demonstrated high scores (greater difficulties) across core autism categories of social communication and restricted and/or repetitive behaviors compared to other autistic children, as well as disruptive behavior, attention deficit and anxiety, but no reports of developmental delays; this class was named Social/behavioral (n = 1,976). A second class, Mixed ASD with DD (n = 1,002), showed a more nuanced presentation, with some features enriched and some depleted among the restricted and/or repetitive behavior, social communication and self-injury categories and overall strong enrichment of developmental delays compared to both nonautistic siblings and individuals in other classes (false discovery rate (FDR) < 0.01; 0.19 < Cohen’s d <0.46; Fig. 1c, Extended Data Fig. 4a and Supplementary Table 2). Individuals in the last two classes scored consistently lower (fewer difficulties) and consistently higher than other autistic children across all seven categories. These two classes were termed Moderate challenges (n = 1,860) and Broadly affected (n = 554). Although individuals in the Moderate challenges class scored below other autistic children across these measured categories, those in all classes still scored significantly higher than nonautistic siblings on the SCQ, the only diagnostic questionnaire with sibling responses, supporting their ASD diagnoses (Fig. 1d). Furthermore, classes displayed significant differences across measures (Supplementary Table 2) and significantly greater between-class variability than within-class variability (Extended Data Fig. 4b), further supporting their phenotypic separation. Additional characteristics of the classes, including sex and age distributions, can be seen in Extended Data Fig. 5.

Clinical attributes and replication of phenotype classes

The characteristics of the four phenotypic classes we identified were consistent with data on diagnoses of co-occurring conditions and parent reports that were external to our modeling and class identification. A medical history questionnaire, with reports on diagnoses of conditions such as attention-deficit hyperactivity disorder (ADHD), obsessive–compulsive disorder, language delays, depression and anxiety, was not included in the GFMM, but we found that enrichment patterns of these diagnoses matched the class-specific phenotypic profiles and further distinguished the classes phenotypically (Fig. 2a and Supplementary Table 3). The Broadly affected class displayed significant enrichment in almost all measured co-occurring conditions, with the Social/behavioral class matching or exceeding the same diagnostic levels for ADHD, anxiety and major depression (Social/behavioral FDR < 0.01, 1.65 < fold enrichment (FE) < 2.36 compared to out-of-class probands; Fig. 2a), reflecting enrichments in phenotypic profiles (Fig. 1b). The Mixed ASD with DD class was highly enriched in language delay, intellectual disability and motor disorders, compared to both siblings (FDR < 0.01, 8.8 < FE < 20.0) and probands in other classes (FDR < 0.01, 1.38 < FE < 2.33), consistent with the high scores of this class in the categories of developmental delay and restricted and/or repetitive behavior, and individuals in this class showed significantly lower levels of ADHD, anxiety and depression, as expected based on their phenotypic profile. The two classes with greater developmental delays, Mixed ASD with DD and Broadly affected, also showed significantly higher reported levels of cognitive impairment (FDR < 0.01, 1.74 < FE < 3.14), lower levels of language ability (FDR < 0.01, 0.51 < FE < 0.78) and much earlier ages at diagnosis (FDR < 0.01, 0.22 < Cohen’s d < 0.98) than the two classes without substantial developmental delays (Fig. 2b, Extended Data Fig. 5d and Supplementary Table 4). In addition, average numbers of interventions (such as medication, counseling, physical therapy or other forms of therapy) were highest among the Broadly affected and Social/behavioral classes (Fig. 2b). These diagnostic data represented the best available external validation, although the natural associations between behavioral diagnoses and the behavioral questionnaires on which our model was trained meant that this was not a fully orthogonal validation set. However, the consistency observed here further supported the validity of the self-reported data. Together, these analyses of medical features show that the four classes were phenotypically consistent, supporting their separation in genetic analyses.

Fig. 2: Clinical validation and replication of subclass characteristics.

a, Clinical validation of classes with external medical diagnoses. We computed the FE (x axis) and statistical significance (FDR) for a selection of available diagnoses for each class. Open circles indicate FDR > 0.05, whereas closed circles indicate FDR < 0.05. Statistical comparisons were computed against siblings as background using one-sided binomial tests and adjusted for multiple comparisons (Benjamini–Hochberg correction). The dotted line indicates FE = 1. Sample sizes for all analyses shown were as follows: Broadly affected, n = 554 (magenta); Social/behavioral, n = 1,970 (green); Mixed ASD with DD, n = 1,001 (blue); Moderate challenges, n = 1,856 (orange); unaffected siblings (n = 1,599). b, External validation of classes with additional parent-reported data from background history and medical history questionnaires. Displayed are: language level at enrollment, parent report with four levels reflecting language abilities (0, nonverbal; 1, single words; 2, phrases; 3, sentences), total number of interventions for probands (including options such as medication, social skills groups, speech therapy and counseling), cognitive impairment at enrollment, a binary indicator of a diagnosis of intellectual disability or cognitive impairment, and age at diagnosis in months. Box plots (center lines represent the median, box limits represent the 25th and 75th percentiles, whiskers represent 1.5× the interquartile range, open circles represent outliers) were plotted for continuous variables (one-sided independent t-tests), whereas bar plots displaying means and 95% confidence intervals were plotted for binary variables (one-sided binomial tests) and categorical variables (one-sided independent t-tests; distributions shown in Extended Data Fig. 5d). *FDR < 0.1, **FDR < 0.05, ***FDR < 0.01. c, Replication of phenotype classes in the SSC. An independent model was trained on the SPARK dataset (n = 6,393) for features matching across the two cohorts and applied to the SSC dataset for all individuals with complete data across features (n = 861). Bars display Pearson correlation coefficients (x axis) between SPARK and SSC category enrichment proportions across four classes.

Furthermore, the four phenotype classes were replicated well in an independent autism cohort that was deeply phenotyped by trained clinicians, the Simons Simplex Collection (SSC)40. Most phenotypic questionnaires used in the SPARK model were available for SSC, with the exception of item-level CBCL data. We combined these matched data, resulting in 108 training features present for both cohorts. To demonstrate the generalizability of our model to the SSC cohort (n = 861), we applied a GFMM trained on SPARK data to the SSC test set, as well as independently training a GFMM on the SSC data. We computed the enrichment and depletion of each feature within each class across the seven phenotype categories for both cohorts, as described above for the original SPARK model. We demonstrated strong replication of the autism classes in the SSC cohort, with highly similar feature enrichment patterns across all seven categories (Fig. 2c and Extended Data Fig. 6a). We further assessed the significance of the overall model similarity with several permutation tests (Extended Data Fig. 6b,c), shuffling both the SSC class labels and the SPARK phenotypes before training, and never observed a higher correlation value with permuted data than the true correlation of 0.927 (P < 1 × 10−4). The phenotypic classes defined here are therefore concordant with clinical data and can be replicated in an external cohort.

Dissimilar genetic signals underlie phenotypic heterogeneity

We expected that the differences in phenotypes, co-occurring diagnoses and developmental milestones across the four autism classes would correspond to class-specific patterns in genetic signals for common variants. We computed polygenic scores (PGS) of children with European ancestry for autism and five other well-powered genome-wide association studies (GWAS) of related traits and conditions (ancestry principal component analysis in Extended Data Fig. 7; GWAS references in Supplementary Table 5); these showed significant differences across the four classes that qualitatively matched their clinical and phenotypic characteristics (Fig. 3a). Several PGS signals within the classes matched their diagnostic burdens, with the Broadly affected and Social/behavioral classes showing significantly higher ADHD signals relative to both nonautistic siblings and other classes (FDR < 0.01, Cohen’s d > 0.22 compared to siblings; FDR < 0.06, Cohen’s d > 0.13 for significant cross-class comparisons; Supplementary Table 6) and significant enrichment of ADHD diagnoses (Supplementary Table 3). The Social/behavioral class also showed both the highest average PGS signal and highest diagnostic burden for major depressive disorder (FDR = 0.00327, Cohen’s d = 0.204; Figs. 2a and 3a). In addition, the Broadly affected class, which was most enriched for intellectual disability, cognitive impairment and developmental delays, exhibited significantly lower educational attainment and IQ PGS compared with siblings and other classes (FDR < 0.1, Cohen’s d > 0.17), demonstrating that co-occurring conditions were associated with common genetic variation that significantly differed among the four identified classes. Notably, none of the classes had a statistically significant signal for the autism PGS, owing to the high variance of this score across our cohort and their siblings.

Fig. 3: Genetic analyses of genome-wide common and rare variant signals.

a, PGS for ASD GWAS and related phenotypes and conditions. PGS were normalized by the mean of sibling scores within each condition. Sample sizes were as follows: Moderate challenges, n = 822; Broadly affected, n = 225, Social/behavioral, n = 425; Mixed ASD with DD, n = 822; unaffected siblings, n = 476. b, Count per offspring of high-impact DNVs (left) and high-impact rare inherited variants (right) across all protein-coding genes. High-impact variants were defined as variants predicted to be either high-confidence LoF or likely pathogenic missense. Sample sizes for b and c were as follows: Moderate challenges, n = 809; Broadly affected, n = 145; Social/behavioral, n = 640; Mixed ASD with DD, n = 419; unaffected siblings, n = 1,013. c, Analysis of evolutionarily constrained genes across autism classes and nonautistic siblings. Using the gene-centric measure of evolutionary constraint, pLI, we assigned genes with pLI ≥ 0.5 to one of two categories: pLI ≥ 0.995 (higher constraint genes) or 0.5 ≤ pLI < 0.995 (lower constraint genes). Count burdens (dnLoF) per offspring were then computed for each class. In all parts, circles indicate the mean and error bars show the 95% confidence intervals. Statistical significance was computed with one-sided independent t-tests and adjusted for multiple comparisons using Benjamini–Hochberg correction.*FDR < 0.1, **FDR < 0.05, ***FDR < 0.01. Stars above the 95% confidence intervals indicate comparisons to siblings, whereas stars accompanied by horizontal bars indicate direct class comparisons. All statistically significant comparisons are shown.

In addition to dissimilar patterns of common variants, we observed significant differences in rare genetic variation between the phenotypic classes (Fig. 3b and Supplementary Table 7). We conducted de novo and inherited variant calling on the whole exomes of individuals from the cohort using HAT41 and further classified variants as either loss-of-function (LoF), missense or synonymous. Count burden enrichments were then computed for each variant type in the four classes and among nonautistic siblings (Methods). The Broadly affected class displayed the greatest enrichment for high-confidence de novo LoF (dnLoF) and de novo missense (dnMis) variants compared to both nonautistic siblings and other classes (FDR = 0.01, FE = 1.66 compared to siblings; 0.044 < FDR < 0.086, 1.28 < FE < 1.43 for all class comparisons), whereas the Social/behavioral class displayed the lowest enrichment compared to siblings (FDR = 0.086, FE = 1.17), although we found significant burden in all four classes (Fig. 3b, left; FDR: 0.01, 0.01, 0.086, 0.04). Rare inherited LoF and missense variants displayed statistically significant increases only in the Mixed ASD with DD class (FDR = 0.016, FE = 2.55 compared to siblings), with a significantly higher count per proband in this class compared to the Moderate challenges and Social/behavioral classes (FDR < 0.016, FE > 1.05; Fig. 3b, right; FDR: 0.57, 0.29, 0.97, 0.016). Our analyses differentiated the two classes with greater intellectual disability and developmental delays, showing that the Broadly affected class had more high-impact de novo variants (DNVs), whereas the Mixed ASD with DD class had a combination of high-impact de novo and rare inherited variants compared to nonautistic controls, suggesting a stronger inherited component for the children in this class.

We also observed differences across categories of LoF constraint, including potential significance that had been masked by grouping heterogeneous classes of probands together. Using the gene-level measure of probability of LoF intolerance (pLI)42, we assigned genes into one of two categories: high-constraint genes (pLI ≥ 0.995) and intermediate-constraint genes (0.5 ≤ pLI < 0.995). When we examined the burden of dnLoF variants across the classes in high-constraint genes, we observed a pattern consistent with our findings above: the Broadly affected class displayed the greatest burden counts in high-constraint genes compared to both siblings and other classes (FDR < 0.1 for two of three classes, 1.76 < FE < 2.3), although there was a significant increase relative to siblings among all autistic classes (Fig. 3c, left; FDR: 6 × 10−4, 0.013, 0.013, 0.013; odds ratio (OR): 3.69, 6.31, 2.87, 3.25). This finding was consistent with prior work showing an excess burden of mutations in high-constraint genes among probands9,43. However, previous work identified no significant increase in intermediate-constraint genes in probands. By separating heterogeneous classes, we found significant enrichment of dnLoF variation in genes of intermediate constraint in the Moderate challenges class (Fig. 3c, right, Supplementary Table 7; FDR: 0.09, 0.26, 0.26, 0.26, OR: 2.54, 2.84, 1.75, 1.95), suggesting that perhaps less-essential genes are affected in this class of individuals. The genetic differences observed between and across classes support the importance of separation of individuals to identify the genetic architectures underlying clinical phenotypic presentations.

Unique gene sets and pathways associate with phenotypes

We obtained further insight into specific genes and processes dysregulated by the variants across the phenotype classes by investigating the count burdens of de novo and inherited variants in ASD-relevant gene sets9,19,44,45,46 (Supplementary Data). Although all four classes displayed enrichment of de novo variation among ASD-related gene sets compared to nonautistic siblings, there were clear differences in the levels and patterns of enrichment among classes (Fig. 4a, Extended Data Fig. 8a,b and Supplementary Tables 8 and 9). For example, all autism-specific gene sets had significantly higher dnLoF mutation burdens (compared to siblings) in the classes with greater developmental delays (average FE: 17.2, 10.9) than in the classes with lower developmental delays (average FE: 5.8, 5.0), suggesting that cognitive outcomes are associated with rare high-impact mutations in a small subset of relevant genes (Fig. 4a). OR analysis further subdivided the classes, with ORs for dnLoF variants being greatest in the classes with greater developmental delays (Fig. 4b, left; FDR: 0.09, 0.002, 0.009, 0.001; OR: 6.3, 28.7, 11.2, 19.7), whereas the ORs for de novo synonymous variants were uniformly distributed, with no significant increases compared with siblings (Fig. 4b, left; FDR: 0.85, 0.85, 0.85, 1.0; OR: 0.7, 1.6, 1.4, 1.1; other gene sets are shown in Extended Data Fig. 8c). Furthermore, DNVs in fragile X mental retardation protein (FMRP) target genes were strongly associated with the Broadly affected class, with significant enrichment over probands in other classes (FDR = 0.04, FE = 2.2, OR = 12.8). Individuals with fragile X syndrome exhibit developmental delays and intellectual disabilities, as seen across both the Broadly affected and Mixed ASD with DD classes, but they also tend to display mood disorders such as anxiety and impulsive, hyperactive and aggressive behaviors47, which we observed only in the Broadly affected class.

Fig. 4: Functional gene set analyses reveal differing genetic profiles.

a, Enrichment and significance of dnLoF burden in each class and gene set. We retrieved seven relevant gene sets and computed the aggregated dnLoF burden for each individual in every gene set. FE (bubble size) and FDR significance (open versus closed circles) were computed relative to nonautistic siblings using one-sided independent t-tests followed by adjustment for multiple comparisons (Benjamini–Hochberg correction). An FDR cutoff of 0.05 was used to determine significance. b, ORs (y axis) across classes for dnLoF variation in ASD risk genes (left) and FMRP target genes (right). ORs for de novo synonymous (dnSyn) variation are displayed for ASD risk genes. Statistical significance was computed with Fisher’s exact tests comparing each class to nonautistic siblings and adjusted for multiple comparisons using Benjamini–Hochberg correction. For ASD risk genes, FDR: 0.09, 0.002, 0.009, 0.001 for dnLoF; FDR: 0.85, 0.85, 0.85, 1.0 for dnSyn; for FMRP genes, FDR: 0.004, 0.0004, 0.0005, 0.002 for dnLoF. *FDR < 0.1, **FDR < 0.05; ***FDR < 0.01. c, Top significantly affected gene ontology (GO) biological processes and molecular functions are reported for each class against a genome-wide background of protein-coding genes. Gene sets for GO enrichment analyses include all protein-coding genes affected by high-confidence dnLoF or pathogenic missense variation present in individuals from each class. The plots display FE (x axis) and log-transformed FDR (bubble size, hypergeometric test with multiple hypothesis correction). Terms were selected by FDR and sorted by FE. For the Moderate challenges, Social/behavioral and Mixed ASD with DD classes, an FDR cutoff of 0.05 was used, whereas a cutoff of 0.1 was used for the Broadly affected class. Shaded boxes represent GO biological processes, and unshaded boxes represent GO molecular functions. Sample sizes in all analyses shown were as follows: Moderate challenges, n = 809; Broadly affected, n = 145; Social/behavioral, n = 640; Mixed ASD with DD, n = 419; siblings, n = 1,013.

Molecular pathways affected by the patterns of genetic variation observed across phenotypic classes also suggested different underlying biological mechanisms. Analysis of biological processes affected by high-confidence dnLoF or damaging missense variations in each class revealed little overlap in the top enriched biological processes and no overlap in top molecular functions between all four classes, suggesting that affected genes represent pathways uniquely associated with class-specific phenotypes (Fig. 4c and Supplementary Table 10). In particular, compared to a gene-based background, the Social/behavioral class was highly enriched for processes of chromatin organization (FE = 3.5, FDR = 1.9 × 10−3), regulation of DNA repair (FE = 5.3, FDR = 1.0 × 10−2) and microtubule activity (FE = 34.2, FDR = 1.4 × 10−4). The Moderate challenges class displayed strong enrichment for histone modification (FE = 3.56, FDR = 1.1 × 10−4) and chromatin organization (FE = 3.5, FDR = 2.0 × 10−5). By contrast, the Mixed ASD with DD class was characterized by processes of neuronal action potential (FE = 19.0, FDR = 1.0 × 10−2) and membrane depolarization (FE = 25.0, FDR = 1.5 × 10−3), negative regulation of protein depolymerization (FE = 13.7, FDR = 3.6 × 10−3) and voltage-gated sodium channel activity (FE = 28.8, FDR = 3.5 × 10−3). Overall, our analysis directly suggests hypotheses for specific biological dysregulations underlying each autism class, providing a framework for directed examination of mechanistic insights in continuing autism research.

Class-specific developmental gene expression patterns

We found that genes affected by variants in each ASD phenotypic class were associated with unique patterns of gene expression trajectories throughout brain development. This analysis leveraged cell-type-specific developmental gene expression trajectories of the human prefrontal cortex (Fig. 5a and Methods)48. We found that the Mixed ASD with DD class was enriched for dnLoF variants that affected genes expressed in all major prefrontal cortex cell types, mostly during the fetal and neonatal stages, with declining expression later in development (these patterns were termed ‘trans down’ and ‘down’ by Herring et al.48) (Fig. 5b and Extended Data Fig. 9). By contrast, the Social/behavioral class was enriched only for LoF variants in genes highly expressed postnatally compared to nonautistic siblings (termed ‘trans up’ and ‘up’ genes; Fig. 5b) in inhibitory interneurons of the medial ganglionic eminence. Furthermore, the Mixed ASD with DD class was significantly enriched for high-impact variation in genes with the ‘down’ trend compared to the Social/behavioral class in principal excitatory neurons and inhibitory interneurons (FE > 2, FDR < 0.05; Supplementary Table 11); conversely, the Social/behavioral class was enriched for variation in genes with the ‘up’ trend compared to the Mixed ASD with DD class (FDR = 0.001, FE = 2.84 for principal excitatory neurons). These developmental gene expression patterns were aligned with the developmental clinical milestones of the classes: the Mixed ASD with DD class had the latest average age of developmental milestone attainment (FDR < 1.9 × 10−19, Cohen’s d > 0.38 compared to the Social/behavioral class; Fig. 1c and Supplementary Table 2) and the earliest average age of diagnosis (FDR = 6.97 × 10−150, Cohen’s d = 0.99 compared to the Social/behavioral class; Fig. 2b and Supplementary Table 4). By contrast, the Social/behavioral class, which had variants in later-expressed genes, showed less impact on early development, with later ages at diagnosis and developmental milestones almost in line with those of nonautistic siblings (Cohen’s d < 0.07 compared to siblings). The Moderate challenges class displayed enrichment in mostly prenatal gene sets (‘down’ and ‘trans down’ genes), although the genes affected in this class tended to be of lower evolutionary constraint than genes affected in the Mixed ASD with DD class, which may have contributed to the differences in outcomes (Moderate challenges, median pLI = 0.75; Mixed ASD with DD, median pLI = 0.95; P = 0.026). Finally, the Broadly affected class displayed significance for all trends, indicating broad dysregulation across developmental stages and cell types. Our findings thus demonstrate that there are class-specific differences in the developmental timing of genes that are dysregulated, and that these correspond to differences in clinical milestones and outcomes between the classes.

Fig. 5: Cell-type-specific and developmental-stage-specific analysis of variant impacts.

a, Trends from Herring et al.48 representing the gene expression trajectories of brain development genes differentially expressed across developmental stages. Gene expression trajectories follow one of four general patterns: ‘up’ (first), ‘trans up’ (second), ‘trans down’ (third), ‘down’ (fourth). Trends are measured across the six stages of development (x axis): fetal, neonatal, infancy, childhood, adolescence and adulthood. b, Patterns of dnLoF variant enrichment across classes (x axis), major cell types of the prefrontal cortex (y axis) and gene expression trends (y axis). For each class, we computed the FE (bubble size) and corrected P values (FDR, one-sided independent t-tests adjusted for multiple comparisons using Benjamini–Hochberg correction) of variant burden compared to that of nonautistic siblings. Open circles indicate FDR > 0.05 (not significant), and closed circles indicate significant enrichment (FDR ≤ 0.05). Each column is colored by the corresponding phenotypic class color, with purple representing the combined pool of all probands (n = 2,013). Sample sizes in the analysis were as follows: Moderate challenges, n = 809; Broadly affected, n = 145; Social/behavioral, n = 640; Mixed ASD with DD, n = 419; siblings n = 1,013. Statistics for all class–class comparisons for this analysis can be found in Supplementary Table 11. Cell type and trend combinations with no significant enrichment in any class are not shown. MGE, medial ganglionic eminence; CGE, caudal ganglionic eminence.

Decomposition of phenotypic heterogeneity in autism reveals underlying genetic programs

Tags: