Complex genomes and their applications in agroenergy

Marcelo F. Carazzolle

Institute of Biology and Center for Computing in Engineering & Sciences – University of Campinas

The CCES bioinformatics research group has developed a methodology that allows complex genome sequences to be reconstructed without heavy investment in DNA sequencing and with low computational demand. The work was coordinated by Dr. Marcelo Falsarella Carazzolle, a researcher at the Laboratory of Genomics and BioEnergy (LGE) at the Institute of Biology of the University of Campinas.

Comparison between hybrid cultivars of sugarcane (left) and energy-cane (right) under water stress conditions, showing the drought resistance and high yield of energy-cane.

With advances in DNA sequencing technologies, many genomes have been studied in detail, vastly expanding the knowledge about the organisms of our planet. Genomes of microorganisms and plants are important sources of genes for identifying new proteins with applications in biotechnology, such as fungus resistance, resistance to drought, and biochemical production. In parallel, the knowledge of the human genome, more specifically of the genome of each person, has been a very important tool for personalized medicine, especially for more accurate diagnoses.

Even with these advances in genomics, the genomes of many important organisms have still not been sequenced because of their high complexity. A complex genome is a very large genome, typically with tens of billions of base pairs (for comparison, the human genome has 3.2 billion base pairs), that has many copies of each chromosome, usually different from each other, and large regions containing only repetitive sequences.

Several agricultural crops, such as corn, wheat, sugarcane and other grasses, have complex genomes. Their genomes have been sequenced by large consortia involving many scientists and large investments in massive DNA sequencing. Among these scientists, bioinformatics researchers play a fundamental role in developing the computational algorithms that reconstruct these genomes from the sequencing data. This genome assembly step typically requires high-performance computing.

“We realized that focusing only on the reconstruction of gene sequences, rather than on the complete genome of those organisms, would be computationally simpler and would reduce the cost of DNA sequencing,” Carazzolle says. “Thus, we developed a computational pipeline that integrates several bioinformatics programs to perform assembly focused on the gene space, based on the concept of reference assembly. For that, we use phylogenetically closely related species with available complete genome sequences. With this protocol, it was possible to reconstruct 90% of the genes in the wheat genome (which has 6 copies of each gene in its DNA), and to identify several genes that had not been previously described.”
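The idea of restricting assembly to the gene space can be illustrated with a small Python sketch. It is not the PGA implementation, which integrates existing bioinformatics programs; it is a toy example that assumes reads can be grouped by the number of k-mers they share with gene sequences from a related reference species. The function names, the sequences, and the parameters `k` and `min_shared` are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reads_by_gene(reads, reference_genes, k=8, min_shared=2):
    """Assign each read to the reference gene with which it shares the most k-mers.

    Reads sharing fewer than min_shared k-mers with every gene are discarded,
    which is how non-genic and repetitive reads drop out in this toy model.
    """
    # Index: k-mer -> set of reference genes that contain it.
    index = defaultdict(set)
    for gene, seq in reference_genes.items():
        for km in kmers(seq, k):
            index[km].add(gene)

    bins = defaultdict(list)
    for read in reads:
        hits = defaultdict(int)
        for km in kmers(read, k):
            for gene in index.get(km, ()):
                hits[gene] += 1
        if hits:
            best = max(hits, key=hits.get)
            if hits[best] >= min_shared:
                bins[best].append(read)
    return dict(bins)


# Hypothetical gene sequences from a closely related, fully sequenced species.
gene_a = "ATGGCGTACGTTAGCCTAGGCTAACGTTGA"
gene_b = "ATGCCCTTTGGGAAACCCTTTGGGTAA"
reference_genes = {"geneA": gene_a, "geneB": gene_b}

# Hypothetical reads from the polyploid target genome.
read_1 = gene_a[2:18]          # exact match to geneA
read_2 = gene_a[10:25] + "T"   # geneA copy with one differing base
read_3 = gene_b[5:21]          # exact match to geneB
read_4 = "ACACACACACACACAC"    # repetitive read outside the gene space

bins = bin_reads_by_gene([read_1, read_2, read_3, read_4], reference_genes)
for gene, gene_reads in bins.items():
    print(gene, len(gene_reads))
```

In this sketch each bin holds only the reads belonging to one gene, including slightly divergent copies, so each gene could then be assembled independently from a small set of reads instead of assembling the whole genome at once.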

This pipeline was then used to study the complex genome of Saccharum spontaneum, one of the species that gave rise to most modern sugarcane cultivars. S. spontaneum contributes to the resistance of sugarcane to pathogens and climatic stress. It is also used to produce energy-cane hybrids, plants with high bagasse content and high productivity that are economically attractive for the production of second-generation biofuels and other biochemicals. About 40,000 genes were identified and studied by comparison with known genes from other grasses. This made it possible to understand the remarkable characteristics of the species, such as its high productivity and resistance to stress. These results can be used in functional and genetic studies for the development of new cultivars of sugarcane and energy-cane, increasing the use of bioenergy in Brazil.

The study “Unraveling the complex genome of Saccharum spontaneum using Polyploid Gene Assembler” was published in the journal DNA Research in January 2019.

Associated scientific work:

L. C. Nascimento et al., Unraveling the complex genome of Saccharum spontaneum using Polyploid Gene Assembler, DNA Research, 2019 https://doi.org/10.1093/dnares/dsz001

Software: PGA (Polyploid Gene Assembler): http://www.lge.ibi.unicamp.br/pga
