{"id":30793,"date":"2025-07-29T09:52:10","date_gmt":"2025-07-29T09:52:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/uk\/30793\/"},"modified":"2025-07-29T09:52:10","modified_gmt":"2025-07-29T09:52:10","slug":"integrated-in-vivo-combinatorial-functional-genomics-and-spatial-transcriptomics-of-tumours-to-decode-genotype-to-phenotype-relationships","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/uk\/30793\/","title":{"rendered":"Integrated in vivo combinatorial functional genomics and spatial transcriptomics of tumours to decode genotype-to-phenotype relationships"},"content":{"rendered":"<p>Animal experiments<\/p>\n<p>Group size was determined on the basis of our experience in previous experiments<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 68\" title=\"Tschaharganeh, D. F. et al. p53-Dependent nestin regulation links tumor suppression to cellular plasticity in liver cancer. Cell 158, 579&#x2013;592 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR68\" id=\"ref-link-section-d96566545e2341\" rel=\"nofollow noopener\" target=\"_blank\">68<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 69\" title=\"Revia, S. et al. Histone H3K27 demethylase KDM6A is an epigenetic gatekeeper of mTORC1 signalling in cancer. Gut 71, 1613&#x2013;1628 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR69\" id=\"ref-link-section-d96566545e2344\" rel=\"nofollow noopener\" target=\"_blank\">69<\/a>. For HDTV injections, eight-week-old female C57Bl\/6 animals were purchased from Envigo. The mice were injected (into the lateral tail vein in 5\u20137\u2009s) with 5\u2009\u03bcg DNA of each of a total of eight perturbation plasmids (40\u2009\u00b5g total DNA) mixed with 20\u2009\u00b5g CMV-SB13 transposase (1:2 ratio) prepared in sterile 0.9% sodium chloride solution in a total volume corresponding to 10% of the body weight, as described before. Two animals were used. We labelled this approach RUBIX as the perturbation plasmids are equipped with molecular barcodes (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig8\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>) and the plasmid mixtures injected allow for all possible combinations to become integrated in the genome of hepatocytes (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1a<\/a>). All animals were monitored twice weekly and animal experiments were performed in compliance with all relevant ethical regulations determined in the animal permit. After tumours were palpable (10\u2009weeks), the animals were euthanized and their livers were harvested (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig9\" rel=\"nofollow noopener\" target=\"_blank\">2<\/a>). As a control group, two animals were injected with 40\u2009\u00b5g pT3-EF1-shRen and 20\u2009\u00b5g CMV-SB13 transposase (1:2 ratio) prepared in sterile 0.9% sodium chloride. For fixation, livers were incubated in 4% paraformaldehyde for 48\u2009h. The sample processing procedure is illustrated in Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig11\" rel=\"nofollow noopener\" target=\"_blank\">4<\/a>. For the leave-one-out experiments (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig7\" rel=\"nofollow noopener\" target=\"_blank\">7<\/a>), HDTV injections were performed essentially as described before. For omission of either VEGFA or mtCTNNB1 perturbation from the plasmid pools, the respective plasmids were replaced with a control plasmid. Housing conditions for the mice were: 12\u2009h light\u201312\u2009h dark cycle, an ambient temperature of 20\u201324\u2009\u00b0C and relative humidity of 45\u201365%. All animal experiments were approved by the regional board Karlsruhe, Germany.<\/p>\n<p>10X Visium for FFPE spatial transcriptomics<\/p>\n<p>Spatial transcriptomics were performed using the manual 10X Visium workflow for samples embedded in paraffin blocks or the 10X Visium CytAssist workflow for samples already placed on glass slides and stained with H&amp;E (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig9\" rel=\"nofollow noopener\" target=\"_blank\">2<\/a>). Both workflows were carried out according to the manufacturer\u2019s protocol (CytAssist, CG000495, RevC; manual Visium, CG000407, RevD). Briefly, slices of approximately 5\u2009\u00b5m were cut from FFPE blocks using a microtome and floated onto a water bath at 42\u2009\u00b0C until all wrinkles were resolved. For the manual Visium workflow the slice was then placed inside the capture frame of the spatial transcriptomics slide (M.R.). Slices used for the CytAssist workflow were placed on frosted glass slides. Deparaffinization and staining of the slides was similar between both workflows. After drying the slide, paraffin was removed through incubation at 60\u2009\u00b0C for 2\u2009h and a subsequent incubation in xylol. Rehydration was done by sequential washes with decreasing ethanol concentrations. After rehydration, the tissue was stained with H&amp;E and imaged using a Leica Aperio AT2 microscope at \u00d740 magnification. Following imaging, the slides were destained by incubation in 0.1\u2009N HCl and formalin crosslinks were removed by incubation with TE buffer pH\u20099.0 for 1\u2009h at 70\u2009\u00b0C (manual workflow) or decrosslinking buffer for 1\u2009h at 95\u2009\u00b0C (CytAssist workflow.) The tissue was then permeabilized and incubated with RTL probes at 50\u2009\u00b0C for approximately 20\u2009h. Free probes were washed away and the bound probes were ligated, followed by washing steps to remove unligated probes. For the manual workflow, the probes were released by treating the slices with RNase and a permeabilization enzyme. For the CytAssist workflow, the slices were stained with a diluted eosin solution and placed in the CytAssist together with the Visium spatial transcriptomics slides and incubated for 30\u2009min at 37\u2009\u00b0C with RNase and permeabilization enzyme. For both protocols, the spatial barcode was added to the probes by extending them and the probes were released using a 0.08\u2009M potassium hydroxide solution. For the CytAssist workflow, a pre-amplification PCR consisting of eight cycles was performed. After clean-up with 1.2\u00d7 SPRIselect beads, 25% of the product was used as input for the index PCR. For both protocols, a quantitative PCR was used to select the number of cycles for the index PCR. To reduce PCR duplicates and avoid overamplification, cycle number at a Cq value of 10% was used for the index PCR. The PCR product was purified using 0.85\u00d7 SPRIselect beads.<\/p>\n<p>For samples already stained and mounted on a slide, the slides were first imaged and then incubated in xylol to remove the coverslip. Sample rehydration was done by sequential washes with decreasing ethanol concentrations. The slides were destained and decrosslinking was performed by incubating with a decrosslinking buffer at 95\u2009\u00b0C for 1\u2009h. After decrosslinking, the samples were incubated with probes for 20\u2009h at 50\u2009\u00b0C. Excess probes were washed away and the probes were ligated. Thereafter, the unligated probes were washed away. The samples were stained again with eosin and placed in the 10X CytAssist together with the spatial transcriptomics slides. A mixture of RNase and permeabilization enzyme was added to the spatial transcriptomics slides and the 10X CytAssist was started. After incubation, the spatial transcriptomics slides were removed and the enzymes were washed off. The spatial barcodes were attached to the probes with an extension enzyme. Probes were released using 0.08\u2009M potassium hydroxide solution. The probes were then amplified through eight PCR cycles; 25% of the purified PCR products were used as input for the index PCR. The cycle number of the index PCR was determined using the cycle number at a Cq value of 10%.<\/p>\n<p>For all samples, the final sample concentration was determined using Agilent Tapestation 4150 with D1000 HS tapes. Sequencing for both protocols was performed on an Illumina NovaSeq6000 system. Four samples were pooled on one SP flow cell with 100 cycles to aim for a read count of 250\u2009\u00d7\u2009106 reads per sample. The FASTQ files and the alignment were done using spaceranger 2.0.1. A total of 12 spatial transcriptomics datasets were generated (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig9\" rel=\"nofollow noopener\" target=\"_blank\">2<\/a>). Note that the utility of sample ML-II_B_2Cyt is constrained by tissue detachment of the sample during the processing for 10X Visium CytAssist.<\/p>\n<p>The 10X Visium for FFPE protocol engages RTL probes that capture a 50-nt sequence specific to endogenous transcripts (note that we used Visium mouse transcriptome probe set v1). We leveraged this strategy to likewise capture molecular barcodes via RTL probes (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>). We hence labelled this approach PERTURB-CAST.<\/p>\n<p>Economized molecular barcode selection<\/p>\n<p>To enable PERTURB-CAST, we aimed to avoid additional expenses and protocol modifications by redeploying RTL probes from commercially available 10X Visium reagents (originally designed to detect endogenous transcripts) as barcode identification reagents, provided that the selected transcripts are not expressed in mouse liver. We named this strategy redeploy probes for barcode capture (REDPRO-BC; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>). To this end, we initially analysed a publicly available bulk RNA-seq dataset including a total of 128 murine liver samples (<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/geo\/query\/acc.cgi?acc=GSE137385\" rel=\"nofollow noopener\" target=\"_blank\">GSE137385<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 70\" title=\"Chi, Y. et al. Comparative impact of dietary carbohydrates on the liver transcriptome in two strains of mice. Physiol. Genomics 53, 456&#x2013;472 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR70\" id=\"ref-link-section-d96566545e2421\" rel=\"nofollow noopener\" target=\"_blank\">70<\/a> to identify transcripts that were generally not detected (fragments per kilobase of transcript per million mapped reads\u2009=\u20090 over all samples). Note that this approach can be error-prone due to the initial source data. Consequently, we went on to validate non-expression of selected transcripts (olfactory, vomeronasal and taste receptors) in additional datasets (including bulk RNA-seq from <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/geo\/query\/acc.cgi?acc=GSE148379\" rel=\"nofollow noopener\" target=\"_blank\">GSE148379<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Molina-S&#xE1;nchez, P. et al. Cooperation between distinct cancer driver genes underlies intertumor heterogeneity in hepatocellular carcinoma. Gastroenterology 159, 2203&#x2013;2220 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR19\" id=\"ref-link-section-d96566545e2433\" rel=\"nofollow noopener\" target=\"_blank\">19<\/a>, information provided in MGI GXD<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 71\" title=\"Ringwald, M. et al. Mouse Genome Informatics (MGI): latest news from MGD and GXD. Mamm. Genome 33, 4&#x2013;18 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR71\" id=\"ref-link-section-d96566545e2437\" rel=\"nofollow noopener\" target=\"_blank\">71<\/a> as well as 10X Visium data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"Guilliams, M. et al. Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches. Cell 185, 379&#x2013;396 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR24\" id=\"ref-link-section-d96566545e2441\" rel=\"nofollow noopener\" target=\"_blank\">24<\/a>. The endogenous transcripts associated with the REDPRO-BCs used in this study are illustrated in Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig10\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a> and the respective nucleotide sequences for 10X Visium RTL-probe capture barcodes (reverse complement to RTL-probe sequence provided by 10X Genomics) are listed under the section \u2018Molecular cloning\u2019. Note that we used Visium mouse transcriptome probe set v1. Visium mouse transcriptome probe set v2 is not compatible with the barcodes employed in this study.<\/p>\n<p>Molecular cloning<\/p>\n<p>The transposon plasmids used in this study including overexpression constructs for NICD, mutant human CTNNB1 (T41A) and human MYC and potent shRNA constructs targeting Trp53, Pten, Kmt2c and Renilla luciferase were previously described and validated in animal experiments<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Molina-S&#xE1;nchez, P. et al. Cooperation between distinct cancer driver genes underlies intertumor heterogeneity in hepatocellular carcinoma. Gastroenterology 159, 2203&#x2013;2220 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR19\" id=\"ref-link-section-d96566545e2475\" rel=\"nofollow noopener\" target=\"_blank\">19<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 68\" title=\"Tschaharganeh, D. F. et al. p53-Dependent nestin regulation links tumor suppression to cellular plasticity in liver cancer. Cell 158, 579&#x2013;592 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR68\" id=\"ref-link-section-d96566545e2478\" rel=\"nofollow noopener\" target=\"_blank\">68<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 72\" title=\"Zhu, C. et al. MLL3 regulates the CDKN2A tumor suppressor locus in liver cancer. eLife 12, e80854 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR72\" id=\"ref-link-section-d96566545e2481\" rel=\"nofollow noopener\" target=\"_blank\">72<\/a>. For spatial transcriptomics, note that NICD overexpression can be investigated via Notch1 expression given that the 10X Visium RTL probe identifies the exogenous transcript introduced. However, endogenous Notch1 is similarly identified. VEGFA overexpression plasmid was cloned by insertion of a codon-optimized gene fragment (gBlock, IDT) based on Vegfa NCBI reference sequence NP_033531.3 by replacing hMYC from a previously validated expression plasmid<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 68\" title=\"Tschaharganeh, D. F. et al. p53-Dependent nestin regulation links tumor suppression to cellular plasticity in liver cancer. Cell 158, 579&#x2013;592 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR68\" id=\"ref-link-section-d96566545e2498\" rel=\"nofollow noopener\" target=\"_blank\">68<\/a> using NEBuilder HiFi-DNA assembly according to the manufacturer\u2019s protocol (NEB). All plasmids were individually modified to express molecular barcodes. Specifically, fluorescent protein-based peptide barcodes (mKate2, mOrange2 and mWasabi) were ordered as codon-optimized gene fragments (gBlock, IDT) and cloned into previously validated shRNA expression plasmids to replace GFP<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 68\" title=\"Tschaharganeh, D. F. et al. p53-Dependent nestin regulation links tumor suppression to cellular plasticity in liver cancer. Cell 158, 579&#x2013;592 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR68\" id=\"ref-link-section-d96566545e2502\" rel=\"nofollow noopener\" target=\"_blank\">68<\/a> using NEBuilder HiFi-DNA assembly according to the manufacturer\u2019s protocol. Small-peptide barcodes (for example, AU1, AU5 and so on), as described previously<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Dhainaut, M. et al. Spatial CRISPR genomics identifies regulators of the tumor microenvironment. Cell 185, 1223&#x2013;1239 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR14\" id=\"ref-link-section-d96566545e2506\" rel=\"nofollow noopener\" target=\"_blank\">14<\/a>, were ordered as oligonucleotides (Sigma), annealed and cloned using NEBuilder HiFi-DNA assembly according to the manufacturer\u2019s protocol. Long RNA barcodes (stretches of at least 650\u2009nt derived from the combination of multiple oligonucleotide\u2013miner probe sequences designed against Arabidopsis thaliana Chr1 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 73\" title=\"Beliveau, B. J. et al. OligoMiner provides a rapid, flexible environment for the design of genome-scale oligonucleotide in situ hybridization probes. Proc. Natl Acad. Sci. USA 115, E2183&#x2013;E2192 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR73\" id=\"ref-link-section-d96566545e2513\" rel=\"nofollow noopener\" target=\"_blank\">73<\/a>) were ordered as gene fragments (gBlock, IDT) and cloned using NEBuilder HiFi-DNA assembly according to the manufacturer\u2019s protocol. REDPRO-BC triplet arrays were ordered as gene fragments (gBlock, IDT) and cloned using NEBuilder HiFi-DNA assembly according to the manufacturer\u2019s protocol. Briefly, each REDPRO-BC triplet array incorporates one 50\u2009nt sequence derived from olfactory receptors, one 50\u2009nt sequence derived from taste receptors and one 50\u2009nt sequence derived from vomeronasal receptors (based on 10X Genomics RTL-probe sequences against murine transcripts; note that we used Visium mouse transcriptome probe set v1; sequences below), separated and flanked by spacer sequences of approximately 20\u2009nt to avoid potential steric hindrance during hybridization. The spacer sequences used were derived from T7 and T3 promoters (described in ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"Askary, A. et al. In situ readout of DNA barcodes and single base edits facilitated by in vitro transcription. Nat. Biotechnol. 38, 66&#x2013;75 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR56\" id=\"ref-link-section-d96566545e2517\" rel=\"nofollow noopener\" target=\"_blank\">56<\/a>) and\/or AsCas12a-DR sequences (described in ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 61\" title=\"DeWeirdt, P. C. et al. Optimization of AsCas12a for combinatorial genetic screens in human cells. Nat. Biotechnol. 39, 94&#x2013;104 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR61\" id=\"ref-link-section-d96566545e2522\" rel=\"nofollow noopener\" target=\"_blank\">61<\/a>), and\/or the 10X Capture sequences cs1 and cs2 (described in ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Replogle, J. M. et al. Combinatorial single-cell CRISPR screens by direct guide RNA capture and targeted sequencing. Nat. Biotechnol. 38, 954&#x2013;961 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR57\" id=\"ref-link-section-d96566545e2526\" rel=\"nofollow noopener\" target=\"_blank\">57<\/a>), and as such provide additional functionality, which was not tested in this study. Single 50-nt REDPRO-BCs were ordered as oligonucleotides (Sigma), annealed and cloned using NEBuilder HiFi-DNA assembly according to the manufacturer\u2019s protocol. Note that the REDPRO-BC length should enable straightforward en masse cloning engaging commercially available oligonucleotide pools (such as in ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 61\" title=\"DeWeirdt, P. C. et al. Optimization of AsCas12a for combinatorial genetic screens in human cells. Nat. Biotechnol. 39, 94&#x2013;104 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR61\" id=\"ref-link-section-d96566545e2530\" rel=\"nofollow noopener\" target=\"_blank\">61<\/a>), which was not tested in this study. Peptide barcodes were integrated in frame with the respective coding regions. RNA barcodes were integrated in the 3\u2032 untranslated region of coding regions expressed under the control of a polymerase II promoter (EF1), unless otherwise specified. Subsets of perturbation plasmids were equipped with REDPRO-BC arrays either 5\u2032 and 3\u2032 of the shRNA expression cassette (mir-E-based) to account for mir-E processing<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 74\" title=\"Fellmann, C. et al. An optimized microRNA backbone for effective single-copy RNAi. Cell Rep. 5, 1704&#x2013;1713 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR74\" id=\"ref-link-section-d96566545e2534\" rel=\"nofollow noopener\" target=\"_blank\">74<\/a>, or with additional REDPRO-BC arrays driven by a polymerase III promoter (hU6) in reverse orientation to the EF1-driven transcript. Extended Data Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig8\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig10\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a> provide simplified illustrations of the plasmid design and barcode position. Respective FASTA sequences of plasmids are available on request. Plasmids were validated by restriction digest and Sanger sequencing (Microsynth).<\/p>\n<p>Selected REDPRO-BC barcode sequences based on Visium mouse transcriptome probe set v1 were as follows.<\/p>\n<p>Myc<\/p>\n<p>Olfr103, TGGGAGTGAGAGACATACAAGAACCACAGCCCTTTCTCTTTGCTATTTTC; Tas2r102, AACACAAGTGTGAATACCATGAGCAATGACCTTGCAATGTGGACCGAGCT; Vmn1r1, TAAAAGGCAGTGTCAGTACCTTCACAACACCAGCATTTCCCGCAAAGCAT; Olfr1018, CAGTTCCATGGTTATCAATGTTCTCACCTTGAGTTTGCCCTACTGTGGAC; Tas2r118, TTATTGGCACTGTGTTTGATAAGAAATCTTGGTTCTGGGTCTGCGAAGCT; Vmn1r174, ACTTCAACCAGAGGCCAGAGCAGCAAACACAATTCTCATGCTGATGATCA; Olfr1, TGGCCAGCATCTTTCTTGTCCTTCCATTTGCACTCATTACCATGTCCTAT.<\/p>\n<p>mtCtnnb1<\/p>\n<p>Olfr1000, GGCACAGTAGGTATGTTCACTGGTCTGATAATTCTGGGGTCCTATGTATG; Tas2r103, TGTCACTAATCACAGGGTTCTTGGTATCATTATTGGACCCAGCTTTATTG; Vmn1r178, GTCTCTTCATGAGTCATTTCAGTAAAGTTTTTGCTGCAGGATTCCCCACT; Olfr1019, TGCTTGGTCCTAATGCTGGGCTCTTACTTCGCTGGCCTAGTGAGTTTAGT; Tas2r119, GATATCCAGGTTGGTGCCATGGCTGATCCTGGCATCTGTGGTCTATGTAA; Vmn1r175, AGTACAAACATGTGCTCCACCTGCTTTCTGAGCACTTATCAGCTTGTCAC.<\/p>\n<p>NICD<\/p>\n<p>Olfr1006, AGGGAACATGTTGCTGGTTGTTTTAATCCGAATTGATTCTAGACTGCATA; Tas2r105, GACCTCGGAGATGTACTGGGAGAAAAGGCAATTCACTATTAACTACGTTT; Vmn1r139, AAGCATTGGCAAGTCACAGGCAAAGAGTGACACAGAGACGTTCCTCAATT.<\/p>\n<p>VEGFA<\/p>\n<p>Olfr1002, AGGCCTTATAAGCACTGTGGTCCATACTACTTCTGCATTTATTCTTCCAT; Tas2r104, TAACGTGGCTAGCTTCCTTTCCGCTAGCTGTGAAGGTCATTAAAGATGTT; Vmn1r12, ACTACATTGTCAGGAGCTTGATTTTAACTGTGACAACTTCCAGGGATATG.<\/p>\n<p>shPTEN<\/p>\n<p>Olfr1013, GTACACATTGACTTTGATGGGAAATAGCTCCCTCATTATGTTAATCTGCA; Tas2r110, ACTAGTGAATATCATGGACTGGACCAAGAGAAGAAGCATTTCATCAGCGG; Vmn1r170, TGATTCTCCTGAACAGACACCACCACAGACTGCAGCATATTCAATCCACA; Olfr1015, TGTCTATGTGAAAATCCTTTCCAGTATGGTGGGCTTCACTGTCCTCTCAA; Tas2r114, TGTAATTTGTCTGTTAATCCCAGAAAGCAACTTGTTATTCATGTTTGGTT; Vmn1r172, GGAAGTAAATGCCCAGAGAGTCTTCAAAGGAAGACAGTCATAGCTGTTTT.<\/p>\n<p>shREN<\/p>\n<p>Olfr1008, CCAGGCTCTGCTATTCACCAGTAAAATTTTCACATTAACTTTCTGTGGCT; Tas2r106, AAGGCACTGAAGCAATTAAAATGCCATAAGAAAGACAAGGACGTCAGAGT; Vmn1r157, CAGATCCTCTTGCTTTGCCATTTTGAGGTTGGGACCGTGGCCAATGTCTT.<\/p>\n<p>shTRP53<\/p>\n<p>Olfr1009, CCAGAGACTCTGCATACAGCTGGTGATCGGACCCTATGCTGTTGGCTTTT; Tas2r107, GCTCTCTAAGATCGGTTTCATTCTCATTGGCTTGGCGATTTCCAGAATTG; Vmn1r167, GTTTCAGTATAGGCATGCGCATCTTATCATTTGCCCATGATGGAGTGTTC; Olfr1014, TTGCTGTGTATGCATTAACTGTGTTAGGAAACAGCACCCTCATTGTGTTG; Tas2r113, GATCAATCATTGTAACTTTTGGCTTACTGCAAACTTGAGCATCCTTTATT; Vmn1r171, AACAGCACTGCCCTCATGATCACTATTCCGTTGACCAATGAAGTTGTCTC; Olfr107, TTACTGCTTTCTTGCTCAGACACTCACCTCAGTGAGGGCCTGATGATGGC.<\/p>\n<p>shKMT2C<\/p>\n<p>Olfr1012, ATCTACTCTCGGCCAAGTTCCAGTTATTCCTTGGAAAGGGATAAAATGGT; Tas2r109, TTCTAGAATTTTCCTGCTCTGGTTCATGCTAGTAGGTTTTCCAATTAGCT; Vmn1r169, GGTACCTGGGGTAGGGTGATGCTCCATGGAAGAGCCCCCAAATTTGTGAG.<\/p>\n<p>Histopathology<\/p>\n<p>After fixation, representative specimens of the liver were routinely dehydrated, embedded in paraffin and cut into 4-\u03bcm-thick sections. The tissue sections were stained with H&amp;E according to standard protocols. Slides were scanned using a SCN400 slide scanner (Leica Biosystems) at \u00d720 magnification.<\/p>\n<p>Nodule annotation was initially performed by experienced pathologists (D.F.T. and H.W.) based on H&amp;E-stained sections using the quPath software and the 10X Loupe Browser software. Nodule annotation was further refined based on specific transcript expression (example provided in Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a>) using the 10X Loupe Browser software (H.W. and M.B.).<\/p>\n<p>IHC<\/p>\n<p>After heat-induced antigen retrieval at pH\u20096 or pH\u20099, FFPE tissue sections were incubated overnight with the primary antibody and blocked with hydrogen peroxide if necessary. Depending on the primary antibody, an anti-mouse or anti-rabbit secondary antibody conjugated to horseradish peroxidase (HRP) and alkaline phosphatase (AP), respectively, was applied (PolyviewPlus, ENZO Life Sciences GmbH). The signal was visualized using either 3,3\u2032-diaminobenzidine (Dako liquid DAB+ substrate, Agilent Technologies, Inc.) or AP (Permanent AP red, Zytomed Systems) as a chromogen. Details are given in Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Tab1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>.<\/p>\n<p>Table 1 Details of antibodies used in IHC<\/p>\n<p>Slides were scanned using a SCN400 slide scanner (Leica Biosystems) at \u00d720 magnification. The individual histochemical GFP, RFP and GS staining was evaluated using the quPath software as: high, very intense uniform staining; moderate, moderate intensity or intense non-uniform staining; and low, low intensity and non-uniform staining. Mapping of barcode signals to respective IHC data was performed by manual assessment of marker staining on IHC images using the quPath software (H.W. and M.B.). Next, corresponding tumour nodules were selected, categorized and stratified using the 10X Loupe Browser software (H.W. and M.B.). Note that this approach can be error-prone due to shifts in the z-plane based on serial sectioning for each IHC sample and samples used for 10X Visium.<\/p>\n<p>Computational data analysis<\/p>\n<p>We used Python (v3.9.12) and the packages anndata (v0.11), scanpy (v1.9.8), squidpy (v1.4.1), sagenet (v1.1.0), Cell2module (GitHub version, retrieved in February 2024), pandas (v2.0.3), Torch (v2.1.1), Numpy (v1.24.3), Matplotlib (v3.7.2), Pyro (v1.8.6), SciPy (v1.11.3) and alpha_shape (GitHub clone c171a7d). We used R (v4.3.0) and the packages SingleCellExperiment (v1.24.0), ZellKonverter (v1.12.1), scater (v1.30.1), ComplexHeatmap (v2.16.0), glasso (v1.11), FSA (0.9.6), dplyr(1.1.4), ggplot2 (v3.5.1), igraph (v2.0.1.1) and scran (v1.28.2).<\/p>\n<p>GenotypingBarcode expression pre-preprocessing<\/p>\n<p>Before analysing the 10X Visium data, we applied a filtering criterion of unique molecular identifier counts of &gt;5,000. For the CytAssist platform, we excluded the outermost layer of spots due to unexpectedly high unique molecular identifier counts. In addition to manually identifying cancerous nodule regions, we annotated \u2018normal tissue\u2019 regions to acquire representation of areas without any cancerous cells. The selection of normal regions was based on a minimum distance of 250\u2013700\u2009\u00b5m from the nearest tumour, depending on the tissue section, to minimize contamination from adjacent tumour regions. Tumour nodules were defined as described in the \u2018Histopathology\u2019 section.<\/p>\n<p>Bayesian modelling of perturbation probabilities from barcode counts<\/p>\n<p>In our model, the observed expression count matrix Ds,b (spots s by barcode genes b) is assumed to follow a negative binomial distribution. This matrix has a mean \u03bbs,b and overdispersion \u03d5b. The overdispersion parameter \u03d5 is sampled from a Gamma distribution (shape\u2009=\u20091,000; rate\u2009=\u20090.03) skewed towards higher values to encourage the likelihood to approximate a Poisson distribution in the absence of overdispersion evidence.<\/p>\n<p>The mean expression for each spot is calculated as:<\/p>\n<p>$${{\\rm{\\lambda }}}_{{s},{g}}={{{\\mu }}}_{{s}}{\\sum }_{{r}}{{A}}_{{s},{r}}{\\sum }_{{g}}{{G}}_{{r},{g}}{{B}}_{{g},{b}}{{\\kappa}_{{b}}}+{{\\rm{\\xi }}}_b$$<\/p>\n<p>where \u03bcs represents the sensitivity of each spot, As,r maps spots to clonal nodules r, Gr,g estimates the expected number of integrated copies of plasmid g, Bg,b links plasmids to their corresponding barcodes and \u03bab is the barcode expression rate; \u03beb is a barcode-specific additive noise term. The per-nodule plasmid integration number (Gr,g) is modelled as an expected count of integration events, described by Fr,g,o. Here Fr,g,o captures the probability of no integration and higher indices reflect the integration of increasing numbers of copies. This is modelled using a Dirichlet distribution with a uniform concentration parameter and an order o\u2009=\u20096. This assumes the maximum of six copies of the same plasmid per clone, balancing the need to capture dosage-dependent variation with the practicality of limiting the number of parameters to be learnt. For normal tissue regions, the probability of perturbation presence was fixed to 1\u2009\u00d7\u200910\u22123 to indicate near absence, but not zero, to prevent numerical instabilities.<\/p>\n<p>Both \u03bag and \u03beg are sampled from a weakly regularized exponential distribution with a rate of one. Spot sensitivity \u03bcs is modelled by a gamma distribution (shape\u2009=\u20093; rate\u2009=\u20090.3) that is weakly centred around one. This parameter accounts for both the sensitivity variability across 10X Visium spots and the dilution effects on the barcode signal due to varying tumour purity.<\/p>\n<p>Perturbation probability model inference<\/p>\n<p>We infer our Bayesian model using a variational posterior approximation. Specifically, we employ a log-normal guide distribution to approximate the parameters that have exponential and gamma distributed priors. In addition, we use a Dirichlet approximation for the posterior of Fr,g,o. The model and its inference framework are implemented in Pyro (v1.8.6)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 75\" title=\"Bingham, E. et al. Pyro: deep universal probabilistic programming. J. Mach. Learn. Res. 20, 973&#x2013;978 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR75\" id=\"ref-link-section-d96566545e3327\" rel=\"nofollow noopener\" target=\"_blank\">75<\/a>. The variational approximation is conducted via the stochastic variational inference method<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 76\" title=\"Wolf, F. A., Angerer, P. &amp; Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR76\" id=\"ref-link-section-d96566545e3331\" rel=\"nofollow noopener\" target=\"_blank\">76<\/a>, employing the Adam optimizer set at a learning rate of 0.01 and using three samples for Kullback\u2013Leibler divergence estimation. We perform inference over 10,000 gradient steps, monitoring the evidence lower bound to assess convergence.<\/p>\n<p>Occurrence of individual perturbation combinations<\/p>\n<p>Considering the probabilistic nature of our estimates for Fr,g,o, we utilized samples from the estimated posterior to analyse tumour populations. Due to uncertain integration copy number estimates, we focused on presence\/absence categories. These probabilities were computed as 1\u2009\u2212\u2009Fr,g,o, and representative genotypes were sampled with Bernoulli distribution for each region and perturbation. We aggregated the data across 324 nodules into 256 possible genotype states, which allows us to compute medians and CIs for marginal integration numbers (indicating the count of different plasmids integrated) and individual perturbation for frequencies as well as frequencies of individual genotypes.<\/p>\n<p>Although it may be tempting to interpret high frequencies of genotype occurrence as advantageous for tumour proliferation, such raw frequencies could be confounded by technical factors such as initial plasmid concentration and integration rate. To address this, we constructed a null hypothesis (H0) over the 256 individual genotype numbers. This hypothesis holds the marginal expected number of integrations and perturbation frequencies constant across the population, attributing variations solely to technical effects, and assumes that the genotypes are independently distributed.<\/p>\n<p>In practice, we adjust the observed perturbation frequencies to account for technical biases by normalizing these frequencies\u2014dividing the average observed perturbation frequency for each perturbation by the sum of all plasmid frequencies and multiplying by the expected number of integrations. We then simulate the distribution of genotypes by drawing Bernoulli samples using these rescaled probabilities for each perturbation. This process is repeated for the number of nodules (324) and the results are aggregated back into the 256 genotype states to create a sampling strategy that reflects the desired properties. By comparing deviations between 5,000 samples drawn from both the inferred posterior (observed) and the simulated H0 (expected), we can identify genotypes with significant tumorigenic effects (observed\u2009&gt;\u2009expected) or disadvantageous effects (observed\u2009&lt;\u2009expected).<\/p>\n<p>Co-occurrence ORs and model of second-order interaction effect<\/p>\n<p>With posterior estimates of the genotypes within the tumour population, we can test for interaction effects between individual perturbations. Wrange of variables accountinge categorize the frequencies of perturbations A and B into four groups: p00(A\u2212B\u2212), p01(A\u2212 and B+), p10 (A+B\u2212) and p11(A+B+). The system can be described using a softmax linear model expressed as:<\/p>\n<p>$${{p}}_{{i},\\;{j}}=\\exp ({{\\rm{\\theta }}}_{00}+{{i}{\\rm{\\theta}}}_{10}+{{j}{\\rm{\\theta }}}_{01}+{{ij}{\\rm{\\theta }}}_{11})\/{\\sum }_{ij}\\exp ({{\\rm{\\theta }}}_{00}+{{i}{\\rm{\\theta }}}_{10}+{{j}{\\rm{\\theta }}}_{01}+{{ij}{\\rm{\\theta }}}_{11})$$<\/p>\n<p>Here, \u03b800 and \u03b801 represent the effects of individual perturbations and \u03b811 is the interaction effect. By setting \u03b800 to zero to eliminate softmax non-identifiability and using Z as the normalization constant \\({\\Sigma}_{i,\\;j}\\;{\\rm{e}}^{\\theta_{i,\\;j}}\\), we derive the following relationships:<\/p>\n<p>$${p}_{10}\/{p}_{00}=[{e}^{{\\theta }_{10}}\/Z\\;]\/[1\/Z\\;]={e}^{{\\theta }_{10}}$$<\/p>\n<p>similarly,<\/p>\n<p>$${p}_{01}\/{p}_{00}={e}^{{\\rm{\\theta }}_{01}}$$<\/p>\n<p>and<\/p>\n<p>$${{{p}}}_{11}\/{{{p}}}_{00}={{{e}}}^{{\\rm{\\theta }}_{10}+{\\rm{\\theta }}_{01}+{\\rm{\\theta }}_{11}}$$<\/p>\n<p>Thus, computing the pairwise odds ratios (OR) effectively determines the interaction effect \u03b811:<\/p>\n<p>$$\\begin{array}{l}{\\rm{OR}}={{{p}}}_{11}{{{p}}}_{00}\/{{{p}}}_{10}{{{p}}}_{01}={{{p}}}_{11}{{{p}}}_{00}\/[{{{p}}}_{10}\/{{{p}}}_{01}][{{{p}}}_{01}\/{{{p}}}_{00}]\\\\\\quad\\quad={{{e}}}^{{\\rm{\\theta }}_{10}+{\\rm{\\theta }}_{01}+{\\rm{\\theta }}_{11}}\/{{{e}}}^{{\\rm{\\theta }}_{10}}\\,{{{e}}}^{{\\rm{\\theta}}_{01}}={{{e}}}^{{\\rm{\\theta }}_{11}}\\end{array}$$<\/p>\n<p>For each gene pair, we estimated the ORs and assessed their significance by drawing 2,000 samples from the posterior probability of perturbation presence for each nodule. By fitting a softmax linear model directly to the data and setting the interaction effect \u03b811 to zero (OR\u2009=\u20091), we simulated the expected probabilities for the pairwise groups under the assumption of no interaction.<\/p>\n<p>Genotype-to-phenotype GLM<\/p>\n<p>To explore the relationships between inferred perturbation probabilities and phenotypic features\u2014specifically, IHC staining status and gene expression\u2014we employed a GLM model. Here the inferred perturbation probabilities serve as the explanatory variables X.<\/p>\n<p>For the IHC staining analysis conducted at the nodule level, we used binary annotations (positive\/negative) and modelled the outcomes with a Bernoulli distribution. The staining status for each nodule, Yr,m,k (r, region; m, gene; k, sample) is modelled as:<\/p>\n<p>$${{{Y}}}_{{{r}},{{m}},{{k}}} \\sim {\\rm{Bernoulli}}({\\rm{\\sigma }}({\\sum }_{{{g}}}{{{X}}}_{{{r}},{{g}}}{{{w}}}_{{{g}},{{m}}})+{{{z}}}_{{{k}}})$$<\/p>\n<p>where \u03c3(x)\u2009=1\u2009\/\u20091\u2009+\u2009e\u2212x is the sigmoid link function. The weight matrix wg,m, akin to L1 regularization, is sampled from a Laplace distribution centred at zero with a scale parameter b. We set the scale to one for the intercept and 0.1 for the perturbation weights to impose stronger regularization on the perturbations. The batch effect zk for each sample k is also sampled from a Laplace distribution (0, 1). The explanatory variable Xr,g is directly sampled from 1\u2009\u2212\u2009Fr,g,0, estimated by the perturbation probability model (non-learnable in the GLM).<\/p>\n<p>Gene expression is modelled similarly, with few key differences. As gene expression is recorded as a non-zero integer at the spot level s, we use a Poisson distribution:<\/p>\n<p>$${{Y}}_{{s},{m},{k}} \\sim {\\rm{Poisson}}({{{\\mu }}}_{{s}}\\exp ({\\sum }_{{g}}{{X}}_{{s},{g}}{{w}}_{{g},{m}}+{{z}}_{k}))$$<\/p>\n<p>In addition to the parameters used in the IHC model, spot sensitivity \u03bcs is factored in, sampled from the posterior of the perturbation probability model. Xs,g is calculated as \\({\\sum }_{{r}}{{A}}_{{s},{r}}(1-{{F}}_{{r},{g},0})\\). The weight matrix wg,m for perturbation-related weights is sampled from a Laplace distribution centred at zero with a strongly regularizing scale b\u2009=\u20091\u2009\u00d7\u200910\u22123.<\/p>\n<p>GLM inference<\/p>\n<p>We infer our Bayesian model using a mean field variational posterior approximation. The model and its inference framework are implemented in Pyro (v1.8.6)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 75\" title=\"Bingham, E. et al. Pyro: deep universal probabilistic programming. J. Mach. Learn. Res. 20, 973&#x2013;978 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR75\" id=\"ref-link-section-d96566545e4858\" rel=\"nofollow noopener\" target=\"_blank\">75<\/a>. The variational approximation is conducted via the stochastic variational inferederive the following relationships:nce method<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 76\" title=\"Wolf, F. A., Angerer, P. &amp; Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR76\" id=\"ref-link-section-d96566545e4862\" rel=\"nofollow noopener\" target=\"_blank\">76<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 77\" title=\"Hoffman, M. D., Blei, D. M., Wang, C. &amp; Paisley, J. Stochastic variational inference. J. Mach. Learn. Res. 14, 1303&#x2013;1347 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR77\" id=\"ref-link-section-d96566545e4865\" rel=\"nofollow noopener\" target=\"_blank\">77<\/a>, employing the Adam optimizer set at a learning rate of 0.01 and using three samples for Kullback\u2013Leibler divergence estimation. At each gradient descent step, the parameters X and \u03bc are sampled from their respective posterior distributions, as estimated by the perturbation probability model. This approach integrates the uncertainties associated with their estimation directly into the GLM framework. We perform inference over 2,000 gradient steps, monitoring the evidence lower bound to assess convergence.<\/p>\n<p>Statistical analysis of the leave-one-out experiment<\/p>\n<p>The leave-one-out experiment (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig7\" rel=\"nofollow noopener\" target=\"_blank\">7<\/a>) evaluates whether the removal of the previously identified epistatically interacting perturbation, VEGFA or mtCTNNB1, from the eight-plasmid mix affects the formation of HCC or CCA. Three setups were tested: (1) all eight perturbations, (2) all perturbations minus VEGFA and (3) all perturbations minus mtCTNNB1. For each setup, four mice were used. Tumours were identified and counted on two independent liver sections per mouse, with H&amp;E histopathology and CK19 staining employed to differentiate between HCC and CCA subtypes. To assess group differences using a commonly applied approach, we performed a two-sided Kruskal\u2013Wallis test, followed by Dunn\u2019s post-hoc test with Holm\u2013Bonferroni correction for multiple testing. We further used an alternative approach to model tumour burden across experimental groups while accounting for biological variability and applied a Bayesian hierarchical Poisson regression.<\/p>\n<p>This statistical model aims to estimate the occurrence rate of HCC or CCA types across different experimental conditions, accounting for group-level and animal-specific variability. Tumour counts, yi, observed for each slide i are modelled as Poisson-distributed outcomes. The rate parameter \u03bci for each observation depends on the experimental condition and animal-specific effects. Group-level experiment effects (\u03b2g) are drawn from a normal distribution \u03b2g\u2009~\u2009N(0, \u03c3\u03b2), where variability controlled by the hyperprior \u03c3\u03b2\u2009~\u2009HalfCauchy(1.0). Animal-specific random effects (\u03b1a) account for individual variability and are sampled from N(0, \u03c3a), with \u03c3\u03b1\u2009~\u2009HalfCauchy(0.05). The latter is assumed to be less strong than the group-level effect, which is reflected in a more regularised hyperprior. The model specifies the expected log-counts as log(\u03bci)\u2009=\u2009\u03b2g\u2009+\u2009\u03b1a, where g and a index the group and animal associated with observation i. The tumour counts are then sampled from a Poisson distribution, yi\u2009~\u2009Poisson(\u03bci).<\/p>\n<p>The posterior parameter distribution was estimated using 3,000 Markov chain Monte Carlo samples. The significance of differences between experimental effects was assessed by comparing the group-level parameters (\u03b2g) across groups. P values were computed as the proportion of posterior samples where the differences in \u03b2g exceeded or fell below zero, depending on the direction of the expected effect. Model-derived estimates corroborated the findings of the non-parametric Kruskal\u2013Wallis analysis.<\/p>\n<p>Reading Visium space ranger output into data objects<\/p>\n<p>We utilized the anndata package (v0.11) in Python and the SingleCellExperiment package (v1.24.0) in R to generate and manage 10X Visium data objects. To facilitate communication between R and Python, we employed ZellKonverter (v1.12.1). For data processing, we employed Scanpy (v1.9.8), squidpy (v1.4.1) and scater (v1.30.1) in Python and R<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 76\" title=\"Wolf, F. A., Angerer, P. &amp; Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR76\" id=\"ref-link-section-d96566545e5006\" rel=\"nofollow noopener\" target=\"_blank\">76<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 78\" title=\"Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171&#x2013;178 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR78\" id=\"ref-link-section-d96566545e5009\" rel=\"nofollow noopener\" target=\"_blank\">78<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 79\" title=\"McCarthy, D. J., Campbell, K. R., Lun, A. T. L. &amp; Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179&#x2013;1186 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR79\" id=\"ref-link-section-d96566545e5012\" rel=\"nofollow noopener\" target=\"_blank\">79<\/a>. We refined our analysis by subsetting all objects to include only features shared across all 11 slides, resulting in a total of 19,464 genes.<\/p>\n<p>Publicly available databases of cell-type markers<\/p>\n<p>We used scLiverDB, PanglaoDB and MSigDB to collect an initial set of marker genes for prevalent cell types and gene sets in normal and tumour liver tissues of mouse and human<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Pan, Q. et al. scLiverDB: a database of human and mouse liver transcriptome landscapes at single-cell resolution. Small Methods 7, e2201421 (2023).\" href=\"#ref-CR80\" id=\"ref-link-section-d96566545e5025\">80<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Franz&#xE9;n, O., Gan, L.-M. &amp; Bj&#xF6;rkegren, J. L. M. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database 2019, baz046 (2019).\" href=\"#ref-CR81\" id=\"ref-link-section-d96566545e5025_1\">81<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 82\" title=\"Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417&#x2013;425 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR82\" id=\"ref-link-section-d96566545e5028\" rel=\"nofollow noopener\" target=\"_blank\">82<\/a>. This yielded a list of a total 2,323 genes.<\/p>\n<p>Data normalization and preprocessing<\/p>\n<p>After applying filtering criteria to exclude genes with raw counts of &lt;10 or &gt;1\u2009\u00d7\u2009106 for any individual slide, as well as barcode genes, the count matrices were normalized to each spot to ensure a total count of 1\u2009\u00d7\u2009104. Subsequently, the normalized values were log-transformed (log(x\u2009+\u20091)). This preprocessing was executed in Python using Scanpy (v1.9.8). Utilizing Scanpy (v1.9.8) with the Seurat flavour, we identified 15,000 highly variable genes for each of the 11 Visium and Visium CytAssist slides. The intersection of these sets resulted in 9,205 genes. Subsequently, in the final refinement step, we narrowed the gene set down to 80 core markers, resulting in 7,361 genes. This final gene set was employed for all subsequent phenotype analyses and visualizations.<\/p>\n<p>Gene coexpression networks<\/p>\n<p>Using the spot level-normalized expression values after filtering out uninformative genes, we conducted Gaussian graphical modelling<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 83\" title=\"Friedman, J., Hastie, T. &amp; Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432&#x2013;441 (2008).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR83\" id=\"ref-link-section-d96566545e5055\" rel=\"nofollow noopener\" target=\"_blank\">83<\/a> to infer a sparse gene coexpression network. We utilized the R package glasso (v1.11) for this purpose. The regularization parameter was optimized through a grid-search approach and set to 0.3. Following the construction of the initial Gaussian graphical modelling, we refined the network by filtering edges to retain only those with a Pearson\u2019s pairwise correlation coefficient of at least 0.25. The isolated genes were, in turn, dropped from the graph. We used the R package igraph (v2.0.1.1) to visualize the graphs.<\/p>\n<p>Nodule-level expression aggregation<\/p>\n<p>We computed two types of aggregates for normalized expression values within each nodule: mean-based and quantile-based. For the mean-based aggregates, we calculated the average normalized expression of each gene across all spots within each nodule. For the quantile-based aggregates, we determined q95 of expression values across all spots per nodule. We used these aggregates in the subsequent nodule-level analyses.<\/p>\n<p>Binarization of nodule phenotypes<\/p>\n<p>After quantile normalization, we computed the average of the scaled expression values for the core markers per phenotype, we then thresholded the values by 0.5, where all nodules with an aggregate value of &gt;0.5 are considered to have the corresponding phenotype signature and otherwise not. The mean-based aggregates for each gene are quantile-normalized further by<\/p>\n<p>$${x}-{{q}}_{25}\/{q}_{99}-{{q}}_{25}$$<\/p>\n<p>where q25 and q99 represent the 25th and 99th quantile values of the aggregate gene expression for the corresponding gene across all nodules and all slides. We then binarized the values &gt;0.5 as one (on) and otherwise zero (off).<\/p>\n<p>Binarization of nodule TME signatures<\/p>\n<p>We binarized the TME signatures following the same procedure as for nodule phenotypes but using the initial quantile-based aggregates instead.<\/p>\n<p>Phenotype and TME heatmaps<\/p>\n<p>We generated heatmaps of scaled gene expression using the processed expression values at the nodule level, employing ComplexHeatmap (v2.16.0)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 84\" title=\"Gu, Z., Eils, R. &amp; Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847&#x2013;2849 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR84\" id=\"ref-link-section-d96566545e5170\" rel=\"nofollow noopener\" target=\"_blank\">84<\/a>. Clustering of both rows and columns was performed based on Spearman\u2019s correlations. In addition, hierarchical clustering was applied to subdivide genes into clusters. The colour bar associated with genes indicates their corresponding phenotypes, with emphasis on the core markers. An attached annotation heatmap illustrates scaled (p10) estimated plasmid probabilities per nodule.<\/p>\n<p>Spatial integration and nodule unification<\/p>\n<p>To integrate all slides into a unified embedding space and classify spots based on their phenotypic signatures in an unbiased manner, we employed an ensemble spatially-aware classifier implemented in SageNet (v1.1.0)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 85\" title=\"Heidari, E. et al. Supervised spatial inference of dissociated single-cell data with SageNet. Preprint at bioRxiv &#010;                https:\/\/doi.org\/10.1101\/2022.04.14.488419&#010;                &#010;               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR85\" id=\"ref-link-section-d96566545e5186\" rel=\"nofollow noopener\" target=\"_blank\">85<\/a>. Data from all slides were trained and fed into this classifier. Subsequently, we clustered the spots within the embedded space using Scanpy\u2019s wrapper of Leiden clustering (with a resolution of one)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 86\" title=\"Traag, V. A., Waltman, L. &amp; van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#ref-CR86\" id=\"ref-link-section-d96566545e5190\" rel=\"nofollow noopener\" target=\"_blank\">86<\/a>. We then performed voting classification to classify nodules to the most-dominant class across the spots belonging to the corresponding nodules. We call these classes the \u2018unified nodule annotations\u2019. Finally, we extracted spatially informative genes from the SageNet model.<\/p>\n<p>Inter-nodule differential gene expression<\/p>\n<p>To delve deeper into inter-nodule transcriptional differences, we conducted differential gene expression analysis using the FindMarkers method from the R package scran (v1.28.2). This method allowed us to perform a light-weight differential gene expression analysis on the unified nodule annotations.<\/p>\n<p>Cell type-informed factor analysis<\/p>\n<p>We concatenated all raw anndata objects and subsetted them to the set of \u2018core markers\u2019 and associated genes (as listed in Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#Fig5\" rel=\"nofollow noopener\" target=\"_blank\">5<\/a>) as well as 500 highly variable genes across slides. We then used cell2module (<a href=\"http:\/\/github.com\/vitkl\/cell2module\" rel=\"nofollow noopener\" target=\"_blank\">github.com\/vitkl\/cell2module<\/a>) to perform non-negative matrix factorization. The cell2module model treats raw RNA count data D as negative-binomial (NB) distributed, given transcription rate \u03bcc,g and a range of variables accounting for technical effects:<\/p>\n<p>$${{D}}_{{c},{g}} \\sim {\\rm{NB}}({{\\mu }}={{{\\mu }}}_{{c},{g}},{{\\rm{\\alpha }}}_{{a},{g}})$$<\/p>\n<p>and<\/p>\n<p>$${{{\\mu }}}_{{c},{g}}=({\\sum }_{{f}}{{w}}_{{c},\\;{f},}{{g}}_{{f},{g}}+{{s}}_{{e},{g}}) {{y}}_{{c}}$$<\/p>\n<p>where \u03bcc,g denotes expected RNA count g in each cell c, \u03b1a,g denotes the per gene g stochastic\/unexplained overdispersion for each covariate \u03b1, wc,f denotes cell loadings of each factor f for each cell c, gf,g denotes gene loadings of each factor f for each cell c, se,g denotes additive background for each gene g and for each experiment c to account for contaminating RNA and yc denotes normalization for each spot c to account for RNA-detection sensitivity, sequencing depth. We recovered 40 factors representing groups of coexpressing cell-type signatures using the default cell2module parameters. After training, we inferred the posterior of the gene loadings per factor. Subsequently, we extracted genes with the top five posterior median values and compared them with predefined marker gene lists per cell type. Finally, we mapped each factor to the cell type with the highest number of overlapping genes.<\/p>\n<p>Reporting summary<\/p>\n<p>Further information on research design is available in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41551-025-01437-1#MOESM2\" rel=\"nofollow noopener\" target=\"_blank\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n","protected":false},"excerpt":{"rendered":"Animal experiments Group size was determined on the basis of our experience in previous experiments68,69. For HDTV injections,&hellip;\n","protected":false},"author":2,"featured_media":30794,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[18924,3251,18921,18922,3250,916,90,18923,56,54,55],"class_list":{"0":"post-30793","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-biomedical-engineering-biotechnology","9":"tag-biomedicine","10":"tag-cancer-models","11":"tag-epistasis","12":"tag-general","13":"tag-genetics","14":"tag-science","15":"tag-tumour-heterogeneity","16":"tag-uk","17":"tag-united-kingdom","18":"tag-unitedkingdom"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/30793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/comments?post=30793"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/30793\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media\/30794"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media?parent=30793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/categories?post=30793"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/tags?post=30793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}