The Polyploid Gene Assembler (PGA), developed and tested in this study, represents a new strategy to perform gene-space assembly from complex genomes using low coverage DNA sequencing. The pipeline integrates reference-assisted loci and de novo assembly strategies to construct high-quality sequences focused on gene content. Pipeline validation was conducted with wheat (Triticum aestivum), a hexaploid species, using barley (Hordeum vulgare) as reference, that resulted in the identification of more than 90% of genes and several new genes. Moreover, PGA was used to assemble gene content in Saccharum spontaneum species, a parental lineage for hybrid sugarcane cultivars. Saccharum spontaneum gene sequence obtained was used to reference-guided transcriptome analysis of six different tissues. A total of 39,234 genes were identified, 60.4% clustered into known grass gene families. Thirty-seven gene families were expanded when compared with other grasses, three of them highlighted by the number of gene copies potentially involved in initial development and stress response. In addition, 3,108 promoters (many showing tissue specificity) were identified in this work. In summary, PGA can reconstruct high-quality gene sequences from polyploid genomes, as shown for wheat and S. spontaneum species, and it is more efficient than conventional genome assemblers using low coverage DNA sequencing.
Nascimento, Leandro Costa, et al. “Unraveling the complex genome of Saccharum spontaneum using Polyploid Gene Assembler.” DNA Research (2019).