Gene set correlation enrichment analysis for interpreting and annotating gene expression profiles

The research team of Associate Professor 林峻宇 published its findings in Nucleic Acids Res

Pathway analysis, including nontopology-based (non-TB) and topology-based (TB) methods, is widely used to interpret the biological phenomena underlying expression differences between two phenotypes. Because they account for dependencies and interactions between genes, TB methods usually outperform non-TB methods at identifying pathways that contain genes closely relevant to, or directly causative of, a given phenotype. However, most TB methods can be limited by the incompleteness of the pathway data used as the reference network, or by the difficulty of choosing an appropriate reference network for each research topic. Here, we propose Gscore, a gene set correlation enrichment analysis method that uses a coexpression network derived from the expression dataset itself to examine whether a differentially expressed gene (DEG) list, or each DEG in it, is associated with a known gene set. Across 89 human disease expression datasets, Gscore identified target pathways better than eight other state-of-the-art methods, and it offers insight into how disease-wide and pathway-wide associations reflect clinical outcomes. Applied to RNA-seq data from COVID-19-related cells and patient samples, Gscore provided a means of studying how DEGs are implicated in COVID-19-related pathways. In summary, Gscore offers a powerful approach for annotating individual DEGs, DEG lists, and genome-wide expression profiles based on existing biological knowledge.
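The general idea of scoring a DEG list against a known gene set through a coexpression network built from the expression data itself can be illustrated with a minimal Python sketch. This is not Gscore's published algorithm: the Pearson-correlation network, the mean-correlation score, the gene-set permutation test, and all function names (`pearson`, `coexpression`, `set_correlation`, `permutation_test`) are hypothetical choices made for illustration only. Passing a one-gene list scores an individual DEG.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation; treat it as 0
    return sxy / math.sqrt(sxx * syy)


def coexpression(expr):
    """Build a dense coexpression network: (gene_a, gene_b) -> correlation.

    expr maps each gene name to its expression values across the same samples.
    """
    genes = list(expr)
    net = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            net[(g, h)] = r
            net[(h, g)] = r
    return net


def set_correlation(net, degs, gene_set):
    """Mean correlation between every DEG and every gene-set member (self-pairs skipped)."""
    values = [net[(d, g)] for d in degs for g in gene_set if (d, g) in net]
    return sum(values) / len(values) if values else 0.0


def permutation_test(expr, net, degs, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with random gene sets of the same size (hypothetical test)."""
    rng = random.Random(seed)
    observed = set_correlation(net, degs, gene_set)
    deg_set = set(degs)
    universe = [g for g in expr if g not in deg_set]  # draw random sets from non-DEGs
    k = len(gene_set)
    if k > len(universe):
        raise ValueError("gene set is larger than the background universe")
    hits = sum(
        set_correlation(net, degs, rng.sample(universe, k)) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


if __name__ == "__main__":
    # Toy data: B and C track A, D is anti-correlated with A, E is uncorrelated with A.
    expr = {
        "A": [1, 2, 3, 4, 5],
        "B": [2, 4, 6, 8, 10],
        "C": [1, 3, 5, 7, 9],
        "D": [5, 4, 3, 2, 1],
        "E": [2, 1, 2, 1, 2],
    }
    net = coexpression(expr)
    score, p = permutation_test(expr, net, ["A"], ["B", "C"], n_perm=200)
    print(f"score={score:.3f}, p={p:.3f}")
```

With so few genes the permutation p-value is coarse; on real data the network would be built from the full expression matrix, where a vectorized routine such as `numpy.corrcoef` is far faster than this pure-Python loop.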