Research – compgenomics

Needle Retention in Christmas Trees: Balsam, Canaan, and Fraser fir

PLANT COMPUTATIONAL GENOMICS LAB – JILL WEGRZYN

Department of Ecology and Evolutionary Biology

Transcriptomics of Tamarack Needle Senescence

Genome Assembly of Sugar Maple and Box Elder

Conifer genome annotation: Douglas-fir

Examining Resistance to Emerald Ash Borer in Green Ash

Epigenetics of Sugar Pine to examine WPBR Resistance

Comparative genomics across Juglandaceae

Toward Genomic Selection in Loblolly Pine

Active Research

PineRefSeq - Conifer reference sequence project

We are developing high-quality reference genome sequences for loblolly pine, Douglas-fir, and sugar pine. The methods are intended to serve as a model for sequencing other large, complex genomes. They should also help the forest tree biology community and the broader biological research community put this resource to practical use. Our lab is focused on developing methodologies to improve the sensitivity and specificity of gene annotation. We are also interested in conifer genome biology, including adaptations for exceptionally long introns and gymnosperm-specific gene family evolution.
Team: Sumaira Zaman, Madison Caballero
Publications:
Neale, D. B., McGuire, P. E., Wheeler, N. C., Stevens, K. A., Crepeau, M. W., Cardeno, C., … Wegrzyn, J. L. (2017). The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae. G3: Genes|Genomes|Genetics. https://doi.org/10.1534/g3.117.300078

Zimin, A. V., Stevens, K. A., Crepeau, M. W., Puiu, D., Wegrzyn, J. L., Yorke, J. A., … Salzberg, S. L. (2017). An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. GigaScience, 6(1), 1–4. https://doi.org/10.1093/gigascience/giw016

Stevens, K. A., Wegrzyn, J. L., Zimin, A., Puiu, D., Crepeau, M., Cardeno, C., Paul, R., Gonzalez-Ibeas, D., Koriabine, M., Holtz-Morris, A. E., Martínez-García, P. J., Sezen, U. U., Marçais, G., Jermstad, K., McGuire, P. E., Loopstra, C. A., Davis, J. M., Eckert, A., de Jong, P., Yorke, J. A., Salzberg, S. L., Neale, D. B., & Langley, C. H. (2016). Sequence of the sugar pine megagenome. Genetics, 204(4), 1613-1626.

Gonzalez-Ibeas, D., Martinez-Garcia, P. J., Famula, R. A., Delfino-Mix, A., Stevens, K. A., Loopstra, C. A., Langley, C. H., Neale, D. B., & Wegrzyn, J. L. (2016). Assessing the gene content of the megagenome: Sugar pine (Pinus lambertiana). G3: Genes, Genomes, Genetics, 6(12), 3787-3802.

Neale, D. B., Wegrzyn, J. L., Stevens, K. A., Zimin, A. V., Puiu, D., Crepeau, M. W., … Liechty, J. D. (2014). Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biology, 15(3), R59.

Wegrzyn, J. L., Liechty, J. D., Stevens, K. A., Wu, L.-S., Loopstra, C. A., Vasquez-Gross, H. A., … Martínez-García, P. J. (2014). Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation. Genetics, 196(3), 891-909.

Zimin, A., Stevens, K. A., Crepeau, M. W., Holtz-Morris, A., Koriabine, M., Marçais, G., Wegrzyn, J. L., … de Jong, P. J. (2014). Sequencing and Assembly of the 22-Gb Loblolly Pine Genome. Genetics, 196(3), 875-890.

Wegrzyn, J. L., Lin, B., Zieve, J., Dougherty, M., Martínez-García, P. J., Koriabine, M., Holtz-Morris, A., de Jong, P., Crepeau, M., Langley, C. H., Puiu, D., Salzberg, S. L., Neale, D. B., & Stevens, K. A. (2013). Insights into the loblolly pine genome: characterization of BAC and fosmid sequences. PLoS ONE, 8(9), e72439.

Role of epigenomics and gene expression in the regulation of immune responses to WPBR in sugar pine

In highly repetitive, densely methylated conifer genomes, DNA methylation may play an important role in the response to pathogens. Natural populations of conifer species show varying levels of resistance and high levels of phenotypic plasticity. Conifers have some distinctive features in the RNA-directed DNA methylation (RdDM) pathway:
- a low frequency of 24-nt sRNAs;
- low numbers of Dicer-like 3 (DCL3) proteins;
- MIR loci composed of many long terminal repeat retrotransposons (LTR-RTs);
- a high frequency of 21-nt sRNAs.

However, large and complex draft genomes, limited genome-wide studies, outcrossing populations, and long generation times have complicated the study of transgenerational inheritance in conifers. This project will use the white pine blister rust (WPBR) and sugar pine pathosystem to understand the role of DNA methylation in regulating transgenerational immune responses. WPBR is a devastating fungal disease that causes great economic and ecological losses in North American five-needle pines such as sugar pine (Pinus lambertiana). In our previous studies of this pathosystem, we genotyped a large number of individuals for the presence or absence of resistance, both major gene resistance (MGR) and quantitative resistance. These individuals were cloned and have been grown in contrasting environments for the past 20-30 years. To further examine resistance in sugar pine, we will construct a high-resolution map of genome-wide DNA methylation. We will then examine whole-genome DNA methylation and gene expression in the progeny of resistant and susceptible trees after initial pathogen infection. The complex, fragmented genome assembly will be improved with long-read sequencing and Hi-C. The resulting gains in contiguity and accuracy will be paired with transcriptomic data from PacBio Iso-Seq.
This work will improve our understanding of transgenerational epigenetics. That knowledge could shorten the very long time (~20 years) that traditional breeding requires to generate resistant individuals. Specifically, we aim to:
1) assess whether RdDM pathway activity declines during maturity because of a decrease in the frequency of 24-nt sRNAs, resulting in increased transposable element (TE) activity;
2) identify the parental environmental factors (biotic and abiotic) that lead to hypomethylation and, in turn, heritable increased expression of NBS-LRR resistance genes in the offspring;
3) evaluate changes in DNA methylation and in the expression of NBS-LRR and other MGR resistance-related genes across infection stages.
Team: Susan McEvoy
Collaboration with: Amanda De La Torre (NAU)

Investigating Genetic Variation in Green Ash to Reduce Tree Mortality from EAB

With the recent invasion of emerald ash borer (EAB) in North America, the health and abundance of American ash (genus Fraxinus) trees have never been more at risk. EAB attacks every member of the genus, including Fraxinus americana (white ash), Fraxinus pennsylvanica (green ash), and Fraxinus nigra (black ash), and is proving detrimental to the American ash population as a whole. The pest was introduced to the United States from Asia through international trade. It attacks both healthy and unhealthy ash trees and feeds on their nutrient-conducting tissues, killing infested trees in less than three years with a mortality rate near 100%. Within five years of its discovery in 2002, EAB had killed more than 53 million native ash trees. It has since spread rapidly throughout New England and as far west as Colorado. Losing this natural resource would have vast economic and ecological impacts, so researchers are under immense pressure to find ways of protecting these species. We will study the impact of EAB through informed sampling of affected populations and perform a genome-wide association study using ddRADseq genotyping. The variants will be tested for association with phenotypes assessed by our colleagues (metabolomics, growth traits, and disease resistance). We are unlikely to find full resistance to this high-mortality pest. We hope, however, to uncover candidate genes in lingering ash that may reduce susceptibility.
Team: Jeremy Bennett, Ava Fritz

Comparative genomics in hardwoods: assembly and annotation of three maples

Sugar maple, a long-lived deciduous hardwood native to northeastern North America, is a dominant species in temperate forests. This shade-tolerant tree grows at low to mid elevations and has a continuous distribution from southern Quebec to the southeastern United States. Sugar maples promote nitrogen mineralization, reduce nitrate loads in groundwater, and generate beneficial nutrients for the soil. They are also an iconic species associated with vibrant autumn hues. Economically, sugar maple may be the most valuable member of northeastern temperate forests, as the primary source of maple syrup and a source of hardwood timber.

Over the past several decades, sugar maple decline has increased across the species' natural range, with notable reductions in northeastern forests. Declining populations show loss of crown vigor, dieback of fine branches, reduced radial growth, and low regeneration. Regional patterns are inconsistent in both managed and unmanaged forests, so the cause of the decline remains elusive despite substantial research.

Adaptive genetic variation in relevant genes and phenotypic plasticity are essential for survival under future conditions. Sugar maple is predominantly outcrossing, and its populations are long-lived and sessile, which makes the species well suited for associating genetic diversity with environmental metrics. Despite this advantage, genomic resources for hardwood species have been slow to develop. Hardwoods' evolutionary histories have produced divergent chromosomal architectures along with multiple whole-genome duplications and rearrangements. These features complicate the assembly and annotation of reference sequences. The falling cost and wider availability of long-read technologies have enabled recent efforts to sequence, assemble, and annotate three Acer species. The resulting comparative genomic analyses will help clarify their adaptive potential.
Team: Susan McEvoy
Collaboration with: Nathan Swenson (UMD)

Development and use of genomic tools to improve firs for use as Christmas trees

Next-generation sequencing (NGS) technologies are being used to accelerate the development and use of genetic information to improve firs grown as Christmas trees, an important specialty crop. The primary focus is improving post-harvest needle retention. A two-step process will be used to identify single nucleotide polymorphism (SNP) markers with predictive power:
1) candidate genes will be identified via RNA sequencing;
2) SNPs in candidate genes will be screened for association with phenotypes by targeted sequencing of genomic DNA.

Additional traits of interest include partial resistance to Phytophthora in Trojan fir.
Lab Team: Alex Trouern-Trend, Alyssa Ferreira
Collaboration with North Carolina State University: John Frampton, Ross Whetten and Lilian Matallana
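The second step of the Christmas tree fir workflow screens SNPs for association with a phenotype. It can be illustrated with a minimal single-marker test. The sketch below makes three assumptions: genotypes are coded as allele dosage (0, 1, or 2 copies of the alternate allele), needle retention is a continuous score, and the helper name and toy data are hypothetical. It is not the project's actual pipeline. A real analysis would also need to account for population structure, relatedness, and multiple testing.

```python
import math
from typing import Sequence


def single_snp_association(dosage: Sequence[int], phenotype: Sequence[float]) -> tuple[float, float]:
    """Regress a phenotype on allele dosage for one SNP (hypothetical helper).

    dosage: copies of the alternate allele per individual (0, 1, or 2).
    Returns (slope, t_statistic) from ordinary least squares.
    """
    n = len(dosage)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matching inputs for at least 3 individuals")
    mean_x = sum(dosage) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (slope and intercept fitted)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(dosage, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the t statistic is unbounded
        t_stat = math.copysign(math.inf, slope) if slope != 0 else 0.0
    else:
        t_stat = slope / se
    return slope, t_stat


# Toy data (hypothetical): one candidate SNP and needle-retention scores
dosage = [0, 0, 1, 1, 2, 2]
retention = [3.1, 2.9, 4.0, 4.2, 5.1, 4.9]
slope, t_stat = single_snp_association(dosage, retention)
print(f"slope = {slope:.2f}, t = {t_stat:.1f}")
```

With this toy data, each additional copy of the alternate allele raises the retention score by about 1.0 units. In practice the t statistic would be converted to a p-value and compared against a multiple-testing threshold across all screened SNPs.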

Not so evergreen: investigating leaf senescence in a deciduous conifer

Eastern larch (or tamarack) is one of a handful of deciduous conifers. Only 19 gymnosperms undergo autumn leaf senescence, and most belong to the genus Larix (larch). Most conifers shed their needles gradually, with individual needles persisting for about 4-5 years. Tamaracks instead behave like many deciduous angiosperms: they turn a deep yellow in early to mid autumn and lose all of their needles by winter. Leaf (or needle) senescence is considered the final stage of leaf development and is associated with cell death. The process is highly coordinated, with parallel changes in metabolism and cell structure. A better understanding of autumn leaf senescence in gymnosperms can provide further insight into their evolutionary history. We sampled and sequenced the abscission zone (the tissue that attaches the needle to the stem) in a replicated time-course study to identify gene expression changes through seasonal senescence. We are currently investigating which pathways and genes are well conserved with broadleaf angiosperms and which are unique to conifers.
Lab Team: Olivia Maher

Genetic diversity of Armenian grape varieties (modern and ancient)

We are using low-coverage RAD sequencing (RADseq) to assess the genomes of 50 modern wild and neglected grape varieties across Armenia. The resulting preliminary genetic diversity data will be used to:
1) produce information on the phylogenetic links between neglected and wild varieties;
2) understand adaptive plasticity in response to changing climatic conditions;
3) inform efforts to conserve grape genetic diversity within Armenia.
Lab Team: Madison Caballero
Collaboration with: Nelli Hovhannisyan (YSU), Alexia Smith (Department of Anthropology, UConn), Rachel O'Neill (Department of MCB, UConn)

Associative transcriptomics and metagenomics to evaluate adaptation to acid rain in two hardwood species

Understanding population genetic structure and gene expression patterns as they relate to different soil conditions can help predict future trajectories of forest composition. No genetic studies have been carried out on the trees at Hubbard Brook Experimental Forest (HBEF) in New Hampshire, a long-term ecological monitoring site. Field monitoring of growth performance has revealed that sugar maple is in decline. American beech, by contrast, is performing well in increasingly cation-depleted soils. Controlled field experiments have examined the effects of calcium (Ca) and aluminum (Al) treatments applied to the soil. Dominant sugar maple trees remained unaffected, but non-dominant trees responded positively to Ca amendment. Transcriptomics of plant tissues and metagenomics of the associated soil microbial and fungal communities are underway to build a more complete picture of forest response.
Lab Team: Alex Trouern-Trend
Collaboration with: Uzay Sezen and Paul Schaberg (US Forest Service)

Towards genomic breeding in forest trees

Intensively managed pine plantations are the major source of wood, fiber, and biomass for bio-based energy. Loblolly pine, the most economically important timber species in the US, has been established on 30 million plantation acres. Southern pine plantations produce about 16% of the global wood supply. Demand for forest products is rising while available land shrinks. To keep pace, tree breeders need fast-growing, higher-yielding trees that require fewer inputs, resist disease, and adapt to environmental change. Worldwide, forest ecosystems play a critical role in protecting land and water resources, preserving biodiversity, and mitigating the rising levels of CO2 that contribute to climate change. The loblolly pine reference genome (v2.01) was recently completed, and extensive resequencing resources exist for large breeding populations. Together these provide a foundation for developing genotyping resources to implement genomic selection. A moderate-density SNP assay was developed from the available genomic resources, including genotyping-by-sequencing (GBS) and exome capture data from the PineMAP project. Extensive bioinformatic analysis and strict selection criteria will be necessary to determine the final SNPs for this assay.
Lab Team: Madison Caballero
Collaboration with: Fikret Isik and Juan Acosta (North Carolina State University), Andrew Eckert (VCU), and Richard Sniezko (USFS)
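The loblolly pine project does not list its "strict criteria" for choosing assay SNPs. As a hedged illustration only, a common first pass filters candidate SNPs on genotype call rate and minor allele frequency (MAF). The `SnpSummary` record, the helper functions, and the thresholds (90% call rate, MAF of at least 0.05) below are hypothetical, not the project's actual criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpSummary:
    """Hypothetical per-SNP record: allele dosages (0/1/2), None = missing call."""
    snp_id: str
    genotypes: list[Optional[int]]


def call_rate(snp: SnpSummary) -> float:
    """Fraction of individuals with a genotype call."""
    if not snp.genotypes:
        return 0.0
    called = [g for g in snp.genotypes if g is not None]
    return len(called) / len(snp.genotypes)


def minor_allele_frequency(snp: SnpSummary) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in snp.genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def select_snps(snps: list[SnpSummary], min_call_rate: float = 0.9,
                min_maf: float = 0.05) -> list[str]:
    """Keep IDs of SNPs passing both illustrative thresholds."""
    return [
        s.snp_id
        for s in snps
        if call_rate(s) >= min_call_rate and minor_allele_frequency(s) >= min_maf
    ]


candidates = [
    SnpSummary("snp_a", [0, 1, 2, 1, 0, 1, 2, 0, 1, 1]),  # common, fully called
    SnpSummary("snp_b", [0] * 10),  # monomorphic
    SnpSummary("snp_c", [0, 1, None, None, 2, None, 1, 0, None, 1]),  # 40% missing
]
print(select_snps(candidates))  # ['snp_a']
```

A production selection would add criteria this sketch omits, such as assay-design constraints on flanking sequence, genomic spacing, and redundancy from linkage disequilibrium.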

TreeGenes Database

TreeGenes provides custom informatics tools and databases to manage the wealth of information produced by high-throughput genomics projects in forest trees, from sample collection to downstream analysis. The resource is further enhanced by connections to federated databases, automated data flows, machine learning analysis, standardized annotations, and quality control processes. The supporting TreeGenes database contains several curated modules that store data and underpin web-based search and visualization tools. Annotated transcriptomic studies from next-generation sequencing are now available for several forest tree species. Customizable search interfaces offer a variety of outputs, allowing users to dissect traits at high resolution and relate molecular diversity to functional variation. Recent development has focused on web services that connect geo-referenced individuals with important ecological and trait databases through a new utility, CartograTree. Together, the TreeGenes resources serve as a powerful knowledge environment for genotype-phenotype information from many large-scale genomics projects.

TreeGenes Database
Lab Team: Emily Grau, Sean Buehler, Peter Richter, Risharde Ramnath, Charlie Demurjian, Sumaira Zaman
Collaboration with: Nic Herndon (ECU), Meg Staton (Department of Entomology and Plant Pathology, UTenn)
Current Funding: USDA NIFA FACT 2019-67021-29920

Publications:
Spoor, S., Cheng, C. H., Sanderson, L. A., Condon, B., Almsaeed, A., Chen, M., ... & Bett, K. (2019). Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases. Database, 2019.

Wegrzyn, J. L., Staton, M., Street, N., Main, D., Grau, E., Herndon, N., ... & Richter, P. (2019). Cyberinfrastructure to improve forest health and productivity: the role of tree databases in connecting genomes, phenomes, and the environment. Frontiers in Plant Science, 10, 813.

Falk, T., Herndon, N., Grau, E., Buehler, S., Richter, P., Zaman, S., ... & Wegrzyn, J. L. (2018). Growing and cultivating the forest genomics database, TreeGenes. Database, 2018.

Harper, L., Campbell, J., Cannon, E. K., Jung, S., Poelchau, M., Walls, R., ... & Cannon, S. (2018). AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture. Database, 2018.

Wegrzyn, J. L., Main, D., Figueroa, B., Choi, M., Yu, J., Neale, D. B., ... & Ficklin, S. (2012). Uniform standards for genome databases in forest and fruit trees. Tree Genetics & Genomes, 8(3), 549-557.

EnTAP: Eukaryotic Non-Model Transcriptome Annotation Pipeline

EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. The package addresses fragmentation and related assembly issues that inflate transcript estimates and lower annotation rates. Transcripts are first filtered by assessing true expression and by frame selection. Open-source tools then functionally annotate the translated proteins. Downstream features include:
- fast similarity search across three repositories;
- protein domain assignment;
- orthologous gene family assessment;
- Gene Ontology term assignment.

The final annotation integrates results across multiple databases. It selects an optimal assignment using a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness. Researchers can add filters to identify and remove contaminants, identify pathways, and prepare the transcripts for enrichment analysis. This fully featured pipeline is easy to install and configure, and it runs much faster than comparable functional annotation packages. It was developed to address many of the shortcomings of existing software solutions.

Software Access
Software Documentation
Lab Team: Alex Hart
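EnTAP chooses a final annotation from several weighted metrics. The general idea can be sketched as a weighted score over candidate hits. This is not EnTAP's actual scoring scheme, which is defined in its documentation and source. The `Hit` fields, the bitscore normalization, the weights, and the list of uninformative terms are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keywords marking a hit description as uninformative
UNINFORMATIVE_TERMS = ("hypothetical protein", "uncharacterized protein", "unknown")


@dataclass
class Hit:
    subject: str
    description: str
    bitscore: float
    taxonomic_match: bool  # True if the hit's taxon is the same as or close to the query's


def informativeness(hit: Hit) -> float:
    """1.0 for a descriptive hit, 0.0 if the description is uninformative."""
    desc = hit.description.lower()
    return 0.0 if any(term in desc for term in UNINFORMATIVE_TERMS) else 1.0


def best_hit(hits: list[Hit], w_score: float = 0.5, w_taxon: float = 0.3,
             w_info: float = 0.2) -> Optional[Hit]:
    """Pick the hit with the highest weighted score (weights are illustrative)."""
    if not hits:
        return None
    max_bits = max(h.bitscore for h in hits)

    def combined(h: Hit) -> float:
        # Scale bitscore relative to the best hit for this query
        score_term = h.bitscore / max_bits if max_bits > 0 else 0.0
        return (w_score * score_term
                + w_taxon * (1.0 if h.taxonomic_match else 0.0)
                + w_info * informativeness(h))

    return max(hits, key=combined)


hits = [
    Hit("sp|A", "Uncharacterized protein", 410.0, True),
    Hit("sp|B", "Chlorophyll a-b binding protein", 395.0, True),
    Hit("sp|C", "Chlorophyll a-b binding protein", 420.0, False),
]
print(best_hit(hits).subject)  # sp|B
```

In this toy case, the slightly lower-scoring hit sp|B is chosen because it is both informative and taxonomically close. It beats the top-scoring hit from a distant taxon and the uninformative hit. Changing the weights shifts that trade-off.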

gFACs: Filtering, Analysis, and Conversion to Unify Genome Annotations

Published genome annotations often contain erroneous gene models, with problems in frame, start site identification, splice sites, and related structural features. These inconsistencies can often be traced to conversion between the text file formats that describe long-read alignments and predicted gene structures. Most gene prediction frameworks do not provide downstream filtering to remove problematic gene annotations. Nor do they represent these annotations in a format consistent with current file standards. In addition, these frameworks do not consider functional attributes, such as the presence or absence of protein domains, which can be used for gene model validation. gFACs accepts inputs from a wide range of alignment, analysis, and gene prediction software. It offers a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers. It also reports extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space.

Software Access
Software Documentation
Lab Team: Madison Caballero, Susan McEvoy
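The sketch below illustrates the kind of structural test a gene-model filter applies. It checks that:
- the coding sequence (CDS) starts with a start codon;
- it ends with a stop codon;
- its length is a multiple of three;
- it has no internal in-frame stop;
- each intron uses canonical GT-AG splice sites.

This is not gFACs' code or interface. The function name, its inputs (a genomic string plus 0-based, half-open, forward-strand exon coordinates), and the GT-AG-only rule are assumptions. A real filter would also handle the reverse strand and minor intron classes such as GC-AG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_gene_model(genome: str, exons: list[tuple[int, int]]) -> list[str]:
    """Return structural problems for a forward-strand coding gene model.

    exons: sorted, non-overlapping CDS segments as 0-based, half-open
    (start, end) pairs. An empty result means every check passed.
    """
    problems = []
    cds = "".join(genome[start:end] for start, end in exons).upper()
    if len(cds) % 3 != 0:
        problems.append("CDS length not a multiple of 3 (frame error)")
    if not cds.startswith("ATG"):
        problems.append("missing start codon")
    if cds[-3:] not in STOP_CODONS:
        problems.append("missing stop codon")
    # Look for premature in-frame stop codons, excluding the terminal codon
    internal = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]
    if any(codon in STOP_CODONS for codon in internal):
        problems.append("internal in-frame stop codon")
    # Each intron spans the gap between consecutive exons; expect GT...AG
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genome[prev_end:next_start].upper()
        if not (intron.startswith("GT") and intron.endswith("AG")):
            problems.append(f"non-canonical splice site in intron {prev_end}-{next_start}")
    return problems


# Toy sequence (hypothetical): exon 1 = ATGAAA, intron = GTAAGTTTAG, exon 2 = TGGTAA
genome = "CC" + "ATGAAA" + "GTAAGTTTAG" + "TGGTAA" + "CC"
print(check_gene_model(genome, [(2, 8), (18, 24)]))  # []
print(check_gene_model(genome, [(2, 8), (17, 24)]))  # exon 2 start shifted by one base
```

Shifting the second exon boundary by a single base breaks the reading frame and the intron's acceptor site. Both problems are reported, which mirrors how small coordinate errors introduced during format conversion show up as structural defects.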

EASEL: Efficient, Accurate, Scalable Eukaryotic modeLs

A high-quality annotation and its associated genome assembly are necessary to locate co-expressed genes, determine the genomic context of a variant correlated with a phenotype, or find regions associated with epigenetic control. Variation in gene number and structure provides a framework for examining morphological, physiological, and behavioral traits. Observed variation in gene families may reflect genuine gene family evolution, but it may also be an artifact of assembly and annotation errors. In the era of high-throughput sequencing, the size and complexity of the genomes being sequenced have increased dramatically. However, over 91% of these genomes contain gene annotation errors. Common errors include:
- gene assignments to retroelements;
- conflicting models;
- frame inconsistencies;
- fragmented models;
- contaminant gene models;
- structural errors related to exon/intron lengths and splice sites.

Despite recent advances in sequence assembly technologies, robust frameworks for gene annotation are lacking. An independent framework is critical. It should handle diverse sets of inputs, integrate structural and functional evaluation, apply advanced machine learning techniques, and format outputs to community-adopted standards. Beyond being scalable and accurate, the framework must also be distributed in a form accessible to biologists.
Lab Team: Jeremy Bennett, Peter Richter