To identify nucleotides within potential CRMs that are necessary for their activity, we developed the d-MPRA. Most nucleotides identified are suspected to be within TF binding sites, though other roles, e.g., sites that affect chromatin structure, might also play a role. Libraries with point mutations were created via error-prone PCR amplification of each NR1-3 CRM. Each library was then cloned into a vector backbone. An intraplasmid duplication (IPD) was then made to serve as a barcode before inserting a GFP reporter into the completed library (Figure 7A). In the IPD process, the mutant library was nicked on the plus strand upstream and the minus strand downstream of the mutated CRM fragments. Strand-displacing polymerase, lacking exonuclease activity, extended these nicks, linearizing the libraries such that each end carried an exact copy of a particular mutated fragment. The final step involved inserting a minimal promoter and GFP ORF cassette between the two duplicated CRM copies. The 3’ copy of the mutated CRM was thus in the 3’ UTR and served as a barcode in the cDNA (Figure 7A). This configuration allowed the possibility that the 5’ and/or 3’ mutated CRM could influence transcription of GFP. In addition, it was possible that the 3’ fragment could influence the stability of the transcript. To somewhat override this effect, a WPRE sequence was also included in the 3’ UTR of all library members. The IPD method was used so that the step of making and sequencing the association library, as was done for the LS-MPRA, was unnecessary.

Degenerate massively parallel reporter assays (MPRAs) to identify functional residues within candidate cis‐regulatory modules (CRMs).
(A) Schematic of d-MPRA library assembly. Point mutations were introduced into Olig2 CRM fragments via error-prone PCR, followed by intraplasmid duplication (IPD) to generate constructs with duplicated mutant CRMs flanking a minimal promoter and GFP ORF. The 3′ CRM copy, located in the 3′ untranslated region (UTR), served as a barcode, with a WPRE sequence included to potentially stabilize transcripts. (B) Conceptual diagram illustrating expected d-MPRA results, showing predicted changes in CRM activity upon disruption of enhancer or repressor binding sites. (C–E) d-MPRA plots displaying log₂ fold changes in mutational frequencies using a 5-base pair sliding window average, normalized across the Olig2-NR1 (C), Olig2-NR2 (D), and Olig2-NR3 (E) regions.
Mutations reducing positive activity, presumed TF binding that enhances activity, should result in depleted barcode abundance, while those diminishing repressor TF binding should lead to enrichment (Figure 7B). Similar to LS-MPRAs, the abundance of barcodes of mutated fragments was normalized to their baseline abundance in the d-MPRA plasmid libraries.
The d-MPRA libraries for Olig2 NR1-3 CRMs were constructed with a mutational frequency of ~0.02–0.2% per nucleotide and a complexity of at least 2×107 transformants (see Source data 2 for mutational rate, co-occurrence across CRMs, and analyses comparing bulk fragments vs singleton mutations). These libraries were electroporated into E14 ex vivo retinas, which were incubated for 24 hr before RNA extraction. The 3’ UTR region containing transcribed mutated CRMs was reverse transcribed to generate cDNA, PCR amplified, gel purified, indexed, and sequenced. Normalized log2-transformed values were averaged across replicates, and a sliding window average (window size = 5) was applied to smooth the data and reduce noise. This approach ensured robust comparisons across experimental conditions. The resulting d-MPRA plots for Olig2-NR1 (Figure 7C), Olig2-NR2 (Figure 7D), and Olig2-NR3 (Figure 7E) revealed peaks and troughs, indicating that mutations at multiple sites influence CRM activity. In three independent electroporation experiments for each CRM, the d-MPRA readouts exhibited low variance between experimental replicates. The clear alignment of peaks and troughs with specific nucleotide positions supports the notion that these regions are important for CRM activity.
To identify over-represented sequences in sites corresponding to peaks or troughs in the d-MPRA plots, de novo motif discovery analysis, HOMER, was performed. This analysis focused on all three Olig2 CRMs, with an emphasis on NR2, which is proximal to the Olig2 TSS. Across the Olig2-NR2 region, 16 de novo motifs were identified (Figure 8A), several of which overlapped with d-MPRA peaks or troughs. The presence and density of binding motifs in this region are consistent with the open chromatin profile observed in ATAC-seq data. Notably, motifs matching known binding sites for Foxn4, Pax6, Mybl1, and Otx2—recognizable regulators of retinal development (Cvekl and Callaerts, 2017; Farhy et al., 2013; Kaufman et al., 2019; Li et al., 2004; Marquardt et al., 2001; Nishida et al., 2003; Oron-Karni et al., 2008; Remez et al., 2017; Wang et al., 2014)—were detected (Figure 8B–D). Mybl1 and Otx2 motifs (Motifs 1 and 14, respectively) corresponded to troughs in the d-MPRA plot, whereas the shared motif (Motif 3) for Foxn4 (typically an enhancer) and Pax6 (frequently a repressor) did not align with any prominent peaks or troughs (Figure 8A). The shared motif for Foxn4 and Pax6 in Olig2-NR2 likely represents a regulatory balance between activation and repression, reflecting the complex interplay of these TFs in fine-tuning Olig2 expression.

Transcription factor (TF) binding sites within the Olig2-NR2 cis‐regulatory module (CRM).
(A) TF binding motifs predicted using HOMER identified within the Olig2-NR2 CRM candidate, aligned with the average d-massively parallel reporter assay (MPRA) plot for this region (from Figure 7). (B–D) Position frequency matrices of TF binding sites aligned to motifs predicted by HOMER in Olig2-NR2: Mybl1 to Motif 1 (B), Foxn4 and Pax6 to Motif 3 (C), and Otx2 to Motif 14 (D). (E) UMAP visualization of scRNA profiles from E14 mouse retinas, with cell types previously identified by marker genes by Clark et al., 2019. (F) Expression of Olig2 on UMAP. (G) Co-expression (pink) of Olig2 (blue) with Mybl1, Foxn4, Pax6, or Otx2 (red) on UMAP. Percentages of respective populations that co-express Olig2 are indicated (* denotes significance using Fisher’s exact test, p≤0.05). (H) TF occupancy track aligned to (i) Olig2 locus-specific massively parallel reporter assay (LS-MPRA) barcode enrichment after 12 hr LY411575 treatment (replicate of Figure 4B) and (ii) gene models. Occupancy peaks for Mybl1, Foxn4, and Pax6 align with the Olig2-NR2 CRM candidate (blue box).
To assess whether RPCs that express Olig2 could use these TFs to regulate Olig2, scRNAseq data (Clark et al., 2019) were examined for co-expression with Olig2 at E14. The dataset, originally generated using the 10X Genomics platform, was further processed in Seurat, where filtering, normalization, and dimensionality reduction was carried out to identify distinct retinal cell populations (Figure 8E). As expected, Olig2 expression was enriched in a subset of neurogenic RPCs (Figure 8F). Notably, Mybl1, Pax6, Foxn4, and Otx2 were co-expressed with Olig2 in subsets of these cells (Figure 8G), with 37.7%, 55.5%, 30.7%, and 67.3% of neurogenic RPCs that expressed Olig2 also expressed Mybl1, Pax6, Foxn4, and Otx2, respectively. The expression of Mybl1, Foxn4, and Otx2 were significantly enriched in Olig2+ RPCs (Fisher’s exact test Odds Ratios [OR]: 11.8 [p<0.0001], 7.1 [p<0.0001], and 8.5 [p<0.0001], respectively), whereas Pax6 (OR: 0.96, p=0.31) was not, perhaps reflecting a role as a repressor. These results support the hypothesis that these factors are functionally intertwined in the regulation of Olig2 expression.
To examine whether the suggested Olig2 regulatory TF’s bind to the Olig2-NR2 CRM, a CUT&RUN analysis was carried out. A significant enrichment of Mybl1, Pax6, and Foxn4 binding, with peaks that aligned with the NR2 region (Figure 8H), provided evidence for their direct regulatory roles via the Olig2-NR2 CRM. Together, these findings suggest that the Olig2-NR2 CRM is a target for multiple retinal developmental regulators, potentially mediating Olig2 expression in neurogenic RPCs.
Using de novo motif discovery analysis on Olig2‐NR1, 20 motifs were identified (Figure 8—figure supplement 1A), some of which correspond to binding sites for several retinal TFs, including Sox4, Sox11, Lhx2, Dlx2, Isl1, Foxp1, Mybl1, and Otx2. These factors play critical roles in retinal development: Sox4 and Sox11 are active in neurogenic RPCs and regulate the generation and survival of RGCs (Jiang et al., 2013); Lhx2 is suggested to influence the progression of RPC competence states (Gordon et al., 2013); Dlx2 is required for terminal RGC differentiation (de Melo et al., 2005; Zhang et al., 2017); Isl1 is essential in the gene regulatory network governing RGC development (Mu et al., 2008); Foxp1 has been implicated in regulating the competence of RPCs to generate early-born retinal cell types (Zhang et al., 2023); and Mybl1 and Otx2 are known to influence neural progenitor cell cycle regulation (Kaufman et al., 2019), and photoreceptor and horizontal cell development (Nishida et al., 2003; Wang et al., 2014), respectively.
In the d-MPRA analysis, specific motifs in Olig2‐NR1 aligned with these predicted TF binding sites. For instance, Motifs 13 and 16 matched shared binding sites for Sox4 and Sox11 (Figure 8—figure supplement 1B); scRNA-seq co-expression data of RPCs (Figure 8—figure supplement 1G) revealed that 74.1% of Olig2-expressing neurogenic RPCs co-expressed Sox4 (enrichment OR: 5.3 [p<0.0001]) and 89.3% co-expressed Sox11 (enrichment OR: 3.1 [p<0.0001]). In contrast, Motif 11 contained a shared binding site for both Lhx2 and Dlx2 (Figure 8—figure supplement 1C); among Olig2+ neurogenic RPCs, 29.6% co-expressed Lhx2 (enrichment OR: 1.0, p=0.94) and 21.1% co-expressed Dlx2 (enrichment OR: 3.7 [p<0.0001]). Motif 8 matched an Isl1 binding site (Figure 8—figure supplement 1D) and was associated with only 5.0% co-expression in Olig2+ RPCs (Figure 8—figure supplement 1G) (enrichment OR: 0.33 [p<0.0001]), implying it might suppress Olig2 expression. Motif 18 contained a Foxp1 binding site (Figure 8—figure supplement 1E), with 40.9% of Olig2+ neurogenic RPCs co-expressing Foxp1 (enrichment OR: 0.81 [p<0.0001]), as shown in Figure 8—figure supplement 1G. Additionally, binding sites for Mybl1 and Otx2 were identified in Motifs 2 and 12, respectively (Figure 8—figure supplement 1F); scRNA-seq data (Figure 8G) indicate that these TFs may also contribute to the regulation of Olig2 expression via Olig2‐NR1. Taken together, and considering the alignment of motifs with d-MPRA peaks and troughs as well as the quantified co-expression data from Figure 8G, Figure 8—figure supplement 1G, these results suggest that Sox4, Sox11, Dlx2, and Isl1 may predominantly function to repress Olig2 expression. Lhx2, Foxp1, Mybl1, and Otx2 are likely to act as activators through the Olig2‐NR1 CRM.
The de novo motif discovery analysis of Olig2-NR3 identified 19 motifs (Figure 8—figure supplement 1H), several of which aligned with peaks or troughs in the d-MPRA plot. Among these, four motifs containing known TF binding sites active in the developing retina were associated with troughs, suggesting potential enhancer activity. Motif 13 matched a binding site for Bhlhb5 (encoded by Bhlhe22) (Figure 8—figure supplement 1I), a TF involved in retinal interneuron subtype specification (Feng et al., 2006; Huang et al., 2014); scRNA-seq co-expression data (Figure 8—figure supplement 1L) revealed that among Olig2+ RPCs, 21.6% co-expressed Bhlhb5 (enrichment OR: 3.5 [p<0.0001]). Motifs 3 and 5 contained binding sites for Ngn2 (Figure 8—figure supplement 1J), a TF expressed in neurogenic RPCs during both early and late retinal development (Hufnagel et al., 2010; Kowalchuk et al., 2018; Ma and Wang, 2006); 81.4% of Olig2-expressing RPCs co-expressed Ngn2 (Figure 8—figure supplement 1L) (enrichment OR: 15.4 [p<0.0001]). Motif 15 matched a binding site for Lhx9 (Figure 8—figure supplement 1K), a TF expressed in early retinal development that is necessary for interneuron subtype specification (Balasubramanian et al., 2014; Balasubramanian et al., 2018). scRNA-seq analysis (Figure 8—figure supplement 1L) established that 7.7% of RPCs expressing Olig2 co-expressed Lhx9 (enrichment OR: 3.3 [p<0.0001]). The combined scRNA-seq analysis, d-MPRA, and motif discovery analysis confirmed that there are likely Bhlhb5, Ngn2, and Lhx9 binding sites in relevant regions of Olig2-NR3, and that they exhibit some degree of co-expression with Olig2, supporting the hypothesis that these TFs may regulate Olig2 expression via the Olig2-NR3 CRM.
The d-MPRA and motif analyses reveal a complex transcriptional network potentially regulating Olig2 expression, with some of the same TFs as candidate regulators across Olig2-NR1, NR2, and NR3 CRMs. Olig2-NR2 emerged as potentially the most important regulatory hub, where motif discovery, scRNA-seq enrichment, and CUT&RUN validation confirmed direct binding of Mybl1, Pax6, and Foxn4. The d-MPRA approach provided a robust, high-resolution method for nominating functionally relevant TF binding sites, with motif-associated peaks and troughs aligning with known retinal regulators. The identification of additional candidates in Olig2-NR1 and NR3 provides candidates to expand our understanding of the gene regulatory network governing Olig2 expression, setting the stage for future in vivo studies to dissect their functional roles in retinal development.