{"id":212761,"date":"2025-10-14T13:06:09","date_gmt":"2025-10-14T13:06:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/212761\/"},"modified":"2025-10-14T13:06:09","modified_gmt":"2025-10-14T13:06:09","slug":"sagpek-fast-and-flexible-approach-to-identify-genotypes-of-sanger-sequencing-data-bmc-bioinformatics","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/212761\/","title":{"rendered":"SAGPEK: fast and flexible approach to identify genotypes of Sanger sequencing data | BMC Bioinformatics"},"content":{"rendered":"<p>Software architecture<\/p>\n<p>SAGPEK mainly processes Sanger sequencing results of two kinds of samples: diploids such as humans and livestock animals and haploids such as viruses and bacteria. For diploids, SAGPEK reports the homozygous or heterozygous genotypes of targeted loci. For haploids, it reports haplotypes. If the detected loci are located in a coding region, SAGPEK also reports alterations in amino acids.<\/p>\n<p>SAGPEK performs three main functions (see Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>). First, it reads in the ABI-format input files, extracts normalized fluorescence intensity signals for A, C, G, and T, detects the positions of maximum peaks in each channel, and calculates the corresponding peak amplitudes. Second, SAGPEK performs base calling to determine the primary and secondary bases by identifying the highest peak intensity and detecting whether a secondary signal reaches \u2265 33% of the strongest peak. Third, SAGPEK maps anchoring sequences to the primary bases to locate the targeted loci, obtains their genotypes, and writes the results to a result file. 
Users should supply SAGPEK with an anchoring sequence for each target locus, usually a short sequence of 6 to 10 nucleotides immediately in front of the target locus in the orientation of the 5\u2032 primer. SAGPEK searches for the anchoring sequences in the primary sequence to locate the target locus. After the coordinates of the target locus in the primary sequence are obtained, SAGPEK can judge whether the locus is homozygous or heterozygous and determine the genotype. In some situations, Sanger sequencing is performed from the reverse orientation, e.g., from 3\u2032 to 5\u2032, in which case the -r option should be invoked, and the anchoring sequences should be in the 5\u2032 to 3\u2032 orientation. SAGPEK can also report amino acid alterations for each locus in each input ABI file. By default, amino acid alterations are written to the output files. If they are not needed, users should turn the function off with the -AA option. If enabled, the amino acid alteration option will also process loci in non-coding regions, returning \u201cNA\u201d for such sites. This does not affect genotype calling but may add unnecessary \u201cNA\u201d entries, so disabling it is recommended for non-coding analyses.<\/p>\n<p>Fig.\u00a01<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5\/figures\/1\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig1\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/10\/12859_2025_6271_Fig1_HTML.png\" alt=\"figure 1\" loading=\"lazy\" width=\"685\" height=\"406\"\/><\/a><\/p>\n<p>Schematic of SAGPEK implementation. 
The input files are in ABI format and are read by SAGPEK to generate intermediate files that record the signal intensities of the four bases (A, C, G, T) at each base position of each input file. Then, SAGPEK performs base calling at each base position and records a status flag indicating whether the site is heterozygous or homozygous. At this step, SAGPEK generates two sequences: the primary sequence, which consists of the bases with the highest signal intensity, and the secondary sequence, which consists of the bases with the second highest signal intensity. Next, SAGPEK locates target loci by mapping anchoring sequences to the primary sequence. The anchoring sequences are either built into SAGPEK or provided by users. After successfully mapping the anchoring sequences, SAGPEK obtains the coordinates of the target loci and extracts the primary bases, the secondary bases, and the status flag at those positions. Based on the status flag, SAGPEK determines the genotypes of the target loci and writes the results to the output files. SAGPEK also reports the corresponding amino acid alterations in the output files<\/p>\n<p>Software functionalities<\/p>\n<p>SAGPEK has built-in anchoring sequences for common drug-resistant loci of HBV, the loci of inherited metabolic diseases of the newborn population, and mutations in citrullinemia (CN) [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Gebreselassie G, De S, Mukherjee A, Gohain M. Allele specific PCR based genotyping reveals lack of bovine leukocyte adhesion deficiency, bovine citrullinaemia and complex vertebral malformation carriers in Indian cattle population. IntJCurrMicrobiolAppSci. 2019;8:1581\u20139. 
&#010;                  https:\/\/doi.org\/10.20546\/ijcmas.2019.805.183&#010;                  &#010;                .\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#ref-CR38\" id=\"ref-link-section-d632090e702\" rel=\"nofollow noopener\" target=\"_blank\">38<\/a>] and deficiency of uridine monophosphate synthase (DUMPS) [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Polish Federation of Cattle Breeders and Dairy Farmers in Warsaw, Poland, Gozdek M, Kolenda M, Department of Biotechnology and Animal Genetics, University of Technology and Life Sciences in Bydgoszcz, Poland, Kamola D, Department of Physiological Sciences, Warsaw University of Life Sciences (SGGW), et al. Report on the incidence of hereditary disorders (BLAD, DUMPS) in the Polish population of Holstein-Friesian cattle. Acta Sci Pol Zootech. 2021;19:15\u201322. &#010;                  https:\/\/doi.org\/10.21005\/asp.2020.19.3.02&#010;                  &#010;                \" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#ref-CR28\" id=\"ref-link-section-d632090e705\" rel=\"nofollow noopener\" target=\"_blank\">28<\/a>] in cattle. For each locus, we provide its anchoring sequence in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">S1<\/a>, enabling users to identify and select appropriate anchors for their specific workflows.<\/p>\n<p>Users need to use the \u201c-type HBV\u201d option to invoke automapping of targeted loci for common drug-resistant loci of HBV. 
SAGPEK stores anchoring sequence tags for 25 common drug-resistant mutations in the reverse transcriptase domain (POL\/RT) of the HBV polymerase (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2<\/a>). The polymerase number is defined according to the serial number of amino acids of the reverse transcriptase on the basis of the HBVdb X20763 protein number (<a href=\"https:\/\/hbvdb.lyon.inserm.fr\/HBVdb\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/hbvdb.lyon.inserm.fr\/HBVdb\/<\/a>). SAGPEK reads in the input ABI files, generates nucleotide sequences, and locates the drug-resistant mutation loci by mapping anchoring tags. If the anchoring tags are mapped to the nucleotide sequences, the sequence of the corresponding codon is determined and translated to the amino acid. By comparing the amino acid sequence with the built-in amino acid sequence reference, SAGPEK reveals whether the mutation occurs and what the mutation is. If one of the 25 common loci is not detected, SAGPEK will generate the unavailable field in the output file for the locus.<\/p>\n<p>Fig.\u00a02<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5\/figures\/2\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig2\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/10\/12859_2025_6271_Fig2_HTML.png\" alt=\"figure 2\" loading=\"lazy\" width=\"685\" height=\"304\"\/><\/a><\/p>\n<p>Schematic diagram of the detection of HBV polymerase gene mutations in SAGPEK. 
SAGPEK stores 25 anchoring sequences corresponding to 25 common drug-resistance mutations ranging from rt166 to rt256. All these mutations are located within the POL\/RT domain of the HBV polymerase. The POL\/RT domain is numbered from rt1 (first amino acid) to rt344 (last amino acid) according to standard HBV polymerase nomenclature<\/p>\n<p>For inherited metabolic diseases of the newborn population, SAGPEK stores anchoring sequence tags for the 14 common mutation sites of the phenylalanine hydroxylase (PAH) gene (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">S1<\/a>, Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#Fig3\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a>), which is the critical gene in phenylketonuria (PKU) disease [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"van Spronsen FJ, Blau N, Harding C, Burlina A, Longo N, Bosch AM. Phenylketonuria. Nat Rev Dis Primers. 2021;7:36. &#10;                  https:\/\/doi.org\/10.1038\/s41572-021-00267-0&#10;                  &#10;                .\" href=\"#ref-CR39\" id=\"ref-link-section-d632090e750\">39<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Hillert A, Anikster Y, Belanger-Quintana A, Burlina A, Burton BK, Carducci C, et al. The genetic landscape and epidemiology of phenylketonuria. Am J Hum Genet. 2020;107:234\u201350. 
&#10;                  https:\/\/doi.org\/10.1016\/j.ajhg.2020.06.006&#10;                  &#10;                .\" href=\"#ref-CR40\" id=\"ref-link-section-d632090e750_1\">40<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Jin X, Yan Y, Zhang C, Tai Y, An L, Yu X, et al. Identification of novel deep intronic PAH gene variants in patients diagnosed with phenylketonuria. Hum Mutat. 2022;43:56\u201366. &#010;                  https:\/\/doi.org\/10.1002\/humu.24292&#010;                  &#010;                .\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#ref-CR41\" id=\"ref-link-section-d632090e753\" rel=\"nofollow noopener\" target=\"_blank\">41<\/a>]. Phenylketonuria is an autosomal recessive genetic disease. Owing to gene mutations, phenylalanine cannot be metabolized normally in the body, and the abnormal accumulation of harmful metabolites produced by its bypass metabolism leads to severe mental retardation and neurobehavioral abnormalities. Pathogenic mutations, including missense, splice-site, silent, and nonsense mutations, have been detected in the 13 exons (E), the introns (I), the 5\u2032 untranslated region (5\u2032-UTR), and the 3\u2032 untranslated region (3\u2032-UTR) of the PAH gene. 
The mutation numbering is based on naming rules according to the BIOPKU database (<a href=\"http:\/\/biopku.org\/home\/home.asp\" rel=\"nofollow noopener\" target=\"_blank\">http:\/\/biopku.org\/home\/home.asp<\/a>).<\/p>\n<p>Fig.\u00a03<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5\/figures\/3\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig3\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/10\/12859_2025_6271_Fig3_HTML.png\" alt=\"figure 3\" loading=\"lazy\" width=\"685\" height=\"264\"\/><\/a><\/p>\n<p>Schematic diagram of the human PAH gene structure. The PAH gene has 13 coding regions and consists of 452 amino acids. SAGPEK stores 14 anchoring sequences for 14 common mutations in the PAH gene ranging from p.48 to p.415, where p.48 represents the 48th amino acid and p.415 represents the 415th amino acid of the whole protein. The 14 mutations are dispersed in the three domains of the PAH protein<\/p>\n<p>For cattle genetic defects, SAGPEK stores anchoring sequences for mutations in citrullinemia (CN) [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Gebreselassie G, De S, Mukherjee A, Gohain M. Allele specific PCR based genotyping reveals lack of bovine leukocyte adhesion deficiency, bovine citrullinaemia and complex vertebral malformation carriers in Indian cattle population. IntJCurrMicrobiolAppSci. 2019;8:1581\u20139. 
&#010;                  https:\/\/doi.org\/10.20546\/ijcmas.2019.805.183&#010;                  &#010;                .\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#ref-CR38\" id=\"ref-link-section-d632090e784\" rel=\"nofollow noopener\" target=\"_blank\">38<\/a>] and deficiency of uridine monophosphate synthase (DUMPS) [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Polish Federation of Cattle Breeders and Dairy Farmers in Warsaw, Poland, Gozdek M, Kolenda M, Department of Biotechnology and Animal Genetics, University of Technology and Life Sciences in Bydgoszcz, Poland, Kamola D, Department of Physiological Sciences, Warsaw University of Life Sciences (SGGW), et al. Report on the incidence of hereditary disorders (BLAD, DUMPS) in the Polish population of Holstein-Friesian cattle. Acta Sci Pol Zootech. 2021;19:15\u201322. &#010;                  https:\/\/doi.org\/10.21005\/asp.2020.19.3.02&#010;                  &#010;                \" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#ref-CR28\" id=\"ref-link-section-d632090e787\" rel=\"nofollow noopener\" target=\"_blank\">28<\/a>]. For CN, the 86th amino acid codon encoded by the argininosuccinate synthetase (ASS) gene on chromosome 11 of dairy cows has a nonsense mutation (CGA-&gt;TGA), resulting in the loss of the synthetic peptide chain, which results in the loss of ASS function in recessive homozygous individuals. For DUMPS, there is a C\/T point mutation at the 405th codon of the C-terminus encoded by the UMPS gene, which leads to the mutation of the CGA of the arginine codon into the TGA terminator and results in the loss of 76 amino acids in uridine synthase.<\/p>\n<p>In addition to the built-in anchoring sequence tags, SAGPEK takes user-defined anchoring sequence tags with the \u201c-type custom\u201d option. 
Users should provide a header-free file with two columns: the first lists the tag names, and the second lists the anchoring sequences. If amino acid alterations need to be generated in the output file, ensure that the anchoring sequences are located just in front of the first base of the investigated codons.<\/p>\n<p>While SAGPEK\u2019s primary function is to generate genotype files, it also supports the optional output of sequencing chromatograms. By specifying the parameter -chromatogram on, users can generate chromatogram images for selected samples, enabling visual inspection of sequencing peaks.<\/p>\n<p>Illustrative examples<\/p>\n<p>SAGPEK is implemented in Perl. Users should install a Perl interpreter before using SAGPEK. SAGPEK works on both Windows and Unix-like platforms. After installing the dependencies, download and unzip SAGPEK from <a href=\"https:\/\/github.com\/JINPENG-WANG\/SAGPEK\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/github.com\/JINPENG-WANG\/SAGPEK<\/a>. A folder with five subfolders (ABI, chromatogram, Custom, Genotype and lib) and one executable file (SAGPEK.pl) will be generated. The SAGPEK.pl script is used to execute the program. The ABI-format input files should be placed in the ABI folder. SAGPEK is easy to use; we describe its usage below. A summary of options is provided in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">S2<\/a>.<\/p>\n<p>Before using SAGPEK, users should install a Perl interpreter, for example from the ActiveState website (<a href=\"https:\/\/www.activestate.com\/products\/perl\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/www.activestate.com\/products\/perl\/<\/a>). 
After the installation, the directories in which the executable programs are located should be added to the PATH environment variable. Before running the package, users should put the ABI-format files generated by Sanger sequencing into the \u201cABI\u201d folder of the SAGPEK package. Users should then open a command prompt window and change the working directory to the SAGPEK package directory with the \u201ccd\u201d command. The program is run with the \u201cperl SAGPEK.pl -type [TYPE]\u201d command, where TYPE should be replaced with HBV, PAH, cattle_CN, cattle_DUMPS, TEST, or custom according to the actual situation. If the \u201ccustom\u201d type is used, users should also provide the name of a file storing the customized anchoring sequences located in the \u201cCustom\u201d directory and add the file name to the command as \u201cperl SAGPEK.pl -type custom -tag ab.tags.txt\u201d. The tag file must consist of two columns. The first column stores the names of the anchoring sequences, and the second column stores the sequences. After completion, a result file containing genotypes or amino acid alterations is generated in the Genotype directory. Each output column is described in detail, including its name, definition, and example values, in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-025-06271-5#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">S3<\/a>. Users can also generate chromatograms for each input ABI file with the \u201cperl SAGPEK.pl -type TEST -chromatogram on\u201d command.<\/p>\n<p>Usage examples<\/p>\n<p>We list a set of common uses of SAGPEK below. 
We provide several ABI-format files in the ABI folder as a test dataset, which can be used to test SAGPEK.<\/p>\n<p>SAGPEK can be used on Windows, Linux, and Mac systems as follows:<\/p>\n<p>$ perl SAGPEK.pl -type TEST<\/p>\n<p>This will generate a test output file (TEST.genotype.txt) in the .\/SAGPEK\/Genotype\/ directory, allowing users to verify correct installation and execution.<\/p>\n<p>If -AA is set to off, SAGPEK can be run as follows:<\/p>\n<p>$ perl SAGPEK.pl -type TEST -AA off<\/p>\n<p>This will generate a test output file that reports only the genotypes of the tested loci, not the amino acid alterations.<\/p>\n<p>Users can also test SAGPEK with the custom type. We have provided a tag file named \u201cab.tags.txt\u201d in the Custom folder, which stores anchoring sequences and their names. Users can run SAGPEK with the following command:<\/p>\n<p>$ perl SAGPEK.pl -type custom -tag ab.tags.txt<\/p>\n<p>This will generate an output file (Custom.mutants.txt) in the .\/SAGPEK\/Genotype\/ directory, which stores the genotypes of the loci mapped with the anchoring sequences in the \u201cab.tags.txt\u201d file.<\/p>\n<p>For the HBV, PAH, cattle_CN, and cattle_DUMPS types, SAGPEK has built-in anchoring sequences.<\/p>\n<p>If chromatograms are needed, SAGPEK can be run with the following command:<\/p>\n<p>$ perl SAGPEK.pl -type custom -tag ab.tags.txt -chromatogram on<\/p>\n<p>This will generate the chromatogram plots in the chromatogram directory for all investigated input ABI files.<\/p>\n","protected":false},"excerpt":{"rendered":"Software architecture SAGPEK mainly processes Sanger sequencing results of two kinds of samples: diploids such as humans 
and&hellip;\n","protected":false},"author":2,"featured_media":212762,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[3974,130665,64,63,15373,51634,51635,130666,336,17253,86095,125628,128],"class_list":{"0":"post-212761","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-algorithms","9":"tag-animal-molecular-breeding","10":"tag-au","11":"tag-australia","12":"tag-bioinformatics","13":"tag-computational-biology-bioinformatics","14":"tag-computer-appl-in-life-sciences","15":"tag-drug-resistance-mutation","16":"tag-genetics","17":"tag-microarrays","18":"tag-molecular-diagnostics","19":"tag-sanger-sequencing","20":"tag-science"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/212761","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=212761"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/212761\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/212762"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=212761"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=212761"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=212761"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}