1) Building integrative CNS networks for genomic analysis of autism

  • PI: Kasper Lage
  • Co-PIs: Daniel Geschwind & Alexis Battle
  • Funding: Collaborative R01, National Institute of Mental Health

Large-scale genomic investigations have begun to illuminate the genetic contributions to major psychiatric illnesses. In autism spectrum disorder (ASD), rare, large-effect variants provide novel causal anchors for understanding its neurobiological basis and the convergence and divergence of disease mechanisms. As in other psychiatric disorders, these findings also emphasize extreme genetic heterogeneity. We and others have shown that gene and protein networks provide an organizing framework for understanding heterogeneous psychiatric disease genetic risk in a unified biological context. The emergence of large-scale genomic and epigenetic data from human brain, and growing knowledge of genetic variation involved in ASD and other psychiatric diseases, coupled with advances in methodology, provides an unprecedented opportunity for comprehensive integrative analyses. We bring together a collaborative group of investigators, each with distinct areas of expertise critical for understanding psychiatric diseases that have not been combined before, to develop a framework for integrative genomic network analysis.

The work proposed here is an ambitious multi-PI project (UCLA/UW, MGH/Harvard, and Johns Hopkins) that brings together three principal investigators and collaborators with a track record in the approaches necessary to perform this work using state-of-the-art and novel methodologies. Through close collaboration we aim to develop a comprehensive framework and test optimal methods for integration of gene expression and protein-protein interaction (PPI) networks in the brain with genetic and epigenetic data – networks that will be iteratively refined using experimental data. We will construct networks representing multiple brain regions in adulthood and development through rigorous combination of multiple transcriptomic data sets from ASD and control brain, developing and validating methods for integration of splicing and expression levels within gene networks. These networks will be refined to inform tissue-specific PPI inference, validated via experimental tissue-specific PPI. We will further identify causal drivers by integration of genetic and epigenetic data, identifying QTL effects on RNA, splicing and protein levels. We will experimentally validate network regulatory predictions for a subset of putative causal drivers, prioritizing network hubs and ASD-associated genes. In addition, we will use our networks to predict high-likelihood risk genes, whose relationship to ASD will be assessed using data from large-scale sequencing in ASD and related psychiatric disease cohorts, as well as our own focused experimental validation via molecular inversion probe (MIP) sequencing. Completion of these aims will lead to more valid and comprehensive CNS networks, thereby significantly advancing our understanding of ASD-associated variants and causal neurobiological pathways. As is our usual practice, our data, networks, and all code will be made freely available in a web-based format to be of maximum utility to the community.

  • Geschwind Lab UCLA: Geschwind Lab
  • Battle Lab Johns Hopkins University: Battle Lab
  • Lage Lab team members: Sandrine Muller, April Kim, Edyta Malolepsza, Taibo Li

2) Brain networks perturbed by genetics in psychiatric diseases

  • PI: Kasper Lage
  • Co-Investigators: Kevin Eggan, Mark Daly, Steve Carr, Jake Jaffe, Monica Schenone
  • Funding: Stanley Center for Psychiatric Diseases at the Broad Institute


The recent explosion in genome-wide association studies and exome sequencing projects in psychiatric disorders has revealed many genes likely to be involved in these debilitating disorders. The Broad Institute’s Stanley Center is a major driver of efforts that are resulting in exciting glimpses of molecular pathways emerging from the data (e.g., those involved in synaptic plasticity, neurodevelopment, as well as signaling channels and receptors of the brain). While such examples illustrate how genes linked to psychiatric disorders interact at the level of proteins to form networks involved in diverse areas of neurobiology, most of the identified genes do not fall into any well-defined cellular pathway, and it is now clear that the biology also includes largely uncharted and incomplete networks that are probably unique to the human brain. This is a key bottleneck towards biological insight and therapeutic intervention.

Here, we propose to overcome these challenges through an integrative approach that leverages recent genetic discoveries with large-scale proteomics experiments to derive a human brain network (of physically interacting proteins) perturbed by genetics and targeted by therapeutics in psychiatric disorders. This network will serve as an accelerator of functional insight from current and future psychiatric genetics data, and it sits at the inflection point of transformative technology and data that have just become mature: First, we will capitalize on new unbiased genetic data to choose corresponding proteins (termed “index proteins” hereafter) as the starting point of the network analysis. Second, we will exploit new proteomics technology developed at the Broad Institute to map the tissue-specific quantitative interaction networks of these index proteins at high resolution. We also propose to map the networks of proteins physically interacting with targets of drugs currently being used in the clinic and to identify systematic overlaps and links between the “therapy” network and the “disease” network, which will lead to an increased understanding of the mechanism of action of current drugs. We believe that this network will be of broad value to interpret current and future studies in psychiatric genetics, and that it will immediately contribute to guiding therapeutic insight and intervention. Third, the proteomics experiments will be coupled to exciting progress in our ability to generate human neurons from induced pluripotent stem cells, so that the interactions of index proteins are derived in a biologically meaningful cellular (and human) context. Moreover, it is an important aspect of this application that we will establish a robust statistical methodology, which is currently lacking, for integrative analyses of experimental proteomics networks and genetic data that can be a model for others to use in any area of genetics in the future.

We will use the recent genome-wide association study of schizophrenia to define the index proteins from which the network will be derived as it is the largest (and arguably most robust) genetic analysis in any neuropsychiatric disorder, and it converges on many of the biological processes – and one of the drug targets – that have previously been implicated in this group of diseases. Therefore, we believe that insights on networks derived mainly from the schizophrenia study will be the most statistically and functionally robust while simultaneously having broad impact by informing many different psychiatric disorders.

Overall, the goal of this project is to leverage the genetic analyses that have been driven and executed by the Stanley Center's investigators and their collaborators to map the brain-specific cellular networks perturbed by genetics and targeted by therapeutics in psychiatric disorders. This will catalyze biological insight and inform future therapeutic opportunities.
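
One simple way to quantify the overlap between the “therapy” network and the “disease” network described above is a hypergeometric enrichment test. The following is a minimal sketch under stated assumptions, not the project's statistical methodology (which the text notes is still to be established): the function name `overlap_pvalue`, the toy protein labels, and the idea that both networks are drawn from one shared universe of detectable proteins are all illustrative.

```python
from math import comb


def overlap_pvalue(disease, therapy, universe_size):
    """Return (observed overlap, P(overlap >= observed)) under a hypergeometric null.

    Assumes both sets are subsets of one protein universe of size `universe_size`.
    """
    n_disease, n_therapy = len(disease), len(therapy)
    if max(n_disease, n_therapy) > universe_size:
        raise ValueError("universe_size must be at least as large as each network")
    observed = len(disease & therapy)
    # Sum the probability of every overlap size from the observed one upward.
    tail = sum(
        comb(n_disease, i) * comb(universe_size - n_disease, n_therapy - i)
        for i in range(observed, min(n_disease, n_therapy) + 1)
    )
    return observed, tail / comb(universe_size, n_therapy)


# Hypothetical toy networks; the labels are placeholders, not real proteins.
disease_network = {"P1", "P2", "P3", "P4"}
therapy_network = {"P3", "P4", "P5"}
shared, p_value = overlap_pvalue(disease_network, therapy_network, universe_size=10)
print(shared, round(p_value, 3))
```

In practice the choice of universe (for example, all proteins detectable in the relevant tissue) strongly affects the result, so it would need to be defined before any such test is interpreted.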

  • Stanley Center for Psychiatric Research: Stanley Center at the Broad Institute
  • Eggan Lab: Eggan Lab
  • Broad Institute Proteomics Platform: Proteomics Platform
  • Lage Lab team members: April Kim, Edyta Malolepsza, Taibo Li
  • Eggan Lab team members: Kevin Eggan, Eugene Nacu, Kiki Lilliehook, Billy Chrystal
  • Proteomics Platform team members: Jake Jaffe, Monica Schenone, Benjamin Tanenbaum 

3) Broad Institute Web Platform for Genome Networks – GeNets

GeNets is a web platform for analyzing and interpreting genetic variation in the context of gene-gene networks (e.g., based on protein-protein interactions [InWeb], cancer synthetic lethality relationships [AchillesNet], similarities in cell perturbation profiles [LINCSNet], phylogenetic profiles [CLIMENet] and gene expression correlations from Gene Expression Omnibus [GEONet]). GeNets includes a pathway discovery algorithm [Quack] that can be trained on any network to find nontrivial pathway relationships between gene sets emerging from genetic analyses. It also includes quantitative methods to compare the biological signal between different networks, and to determine which pathways have good coverage and signal in any network dataset. This enables users to tailor the choice of biological network to the biological problem and genetic dataset they are interested in.

The GeNets project has won a best poster award at the 2014 Annual Retreat of the Broad Institute and a best poster award at the 2015 Annual Meeting of the Program in Computational Biology and Bioinformatics of the Broad Institute. It was featured in an invited plenary talk at the 2016 Annual Retreat of the Broad Institute and in an invited platform talk at the 2015 Annual Meeting of the American Society of Human Genetics.


  • Access GeNets here: GeNets
  • About GeNets and user guide: User guide
  • About Quack: Quack
  • About InWeb: InWeb
  • About AchillesNet, CLIMENet, GEONet and LINCSNet: GeNets Networks
  • GeNets presentation at the 2015 Broad Retreat: Broad Retreat
  • GeNets presentation at the 2015 Meeting of the American Society of Human Genetics: ASHG2015
  • Lage Lab team members: John Mercer, Taibo Li & April Kim
  • Broad Institute Knowledge, Design and User Experience (KDUX) team: Andrew Zimmer, Liraz Greenfeld & David An
  • Funding: SPARC Grant Broad Institute, Junior Faculty Development Award Harvard Medical School, Eleanor and Miles Shore Fellowship Award, Stanley Center for Psychiatric Research

4) Protein networks perturbed by genetics in myocardial infarction

  • PI: Kasper Lage
  • Co-Investigators: Sekar Kathiresan, Steve Carr, Jake Jaffe, Monica Schenone
  • Funding: Broad Institute Broadnext10 Committee

The recent explosion in genome-wide association studies and exome sequencing projects in early-onset myocardial infarction (MI) and coronary heart disease has revealed many genes likely to be involved in these debilitating disorders. Led by Sekar Kathiresan, David Altshuler, Mark Daly and Ben Neale amongst others, the Broad Institute’s Program in Medical and Population Genetics is a major driver of efforts that are resulting in exciting glimpses of biology emerging from the data. A subset of the genes and loci discovered in this way (<25%) encode proteins that assemble into molecular networks that regulate blood lipid levels (i.e., low density lipoprotein [LDL], high density lipoprotein [HDL], and triglycerides [TG]) in the human liver – a known set of biological pathways linked to causal risk for MI. While these examples illustrate how some genes linked to MI interact at the level of proteins to form tissue-specific networks involved in blood lipid biology (particularly LDL and HDL), our understanding of the networks involved in TG biology remains limited. More importantly, the majority of loci and genes (~75%) do not affect MI risk through blood lipid levels, and it is becoming clear that the biology critical to MI also includes largely uncharted and incomplete networks of unknown biological function and tissue-specificity. Our lack of a complete understanding of TG and non-lipid-related biology underlying MI risk is a key bottleneck towards biological insight and therapeutic intervention. Understanding the non-lipid biology of MI is considered key to unlocking new therapeutic paradigms in the disease.

The hypothesis underlying this proposal is i) that the genes involved in TG biology will interact in liver-specific networks (at the level of proteins) that remain to be mapped and characterized and ii) that the 75% of genes and loci linked to risk for MI that are not currently linked to blood lipid biology assemble into MI networks of unknown composition, tissue-specificity and biology. To test these hypotheses, we here propose an integrative genetic and proteomics approach to derive tissue-specific networks (of physically interacting proteins) perturbed by genetics in MI. These networks will expand our current knowledge of blood lipid biology. More importantly, our approach will point to new biological paradigms in MI that are not related to blood lipid levels. The proposal sits at the inflection point of transformative technology and data that have just become mature:

First, we will capitalize on newly published unbiased genetic data from the Broad to choose corresponding proteins (termed “index proteins” throughout the proposal) as the starting point of the network analyses. Second, we will exploit new proteomics technology developed at the Broad Institute to map the tissue-specific quantitative interaction networks of these index proteins and thoroughly validate the resulting network through statistical genetics, replication genotyping, and follow-up experiments. The networks will contribute to identifying new potential drug targets, and inform new therapeutic paradigms. Having these data will also increase the power to genetically stratify patients and thereby predict response to specific therapeutics. Third, not only will the proteomics experiments be executed in a relevant human tissue to derive tissue-specific protein networks involved in MI biology, but we will also establish approaches that combine genetics, expression quantitative trait loci, and epigenetic data with the protein-protein interaction networks to fine-map the human tissues relevant for non-blood-lipid MI biology. Working in human cells is particularly relevant for MI because this disease is poorly modeled in nonhuman species.

We believe that the analyses outlined here will be of broad value to interpret current and future studies in MI and that they will immediately contribute to guiding therapeutic insight and intervention as well as risk prediction based on genetics.


  • Lage Lab team members: April Kim, Edyta Malolepsza, Taibo Li, Anna Sappington
  • Kathiresan Lab team members: Sekar Kathiresan, Martin Qiuyu Zhu, Raj Gupta, Andy R. Snyder
  • Proteomics Platform team members: Steve Carr, Jake Jaffe, Monica Schenone, Benjamin Tanenbaum 

5) Cancer Complex Compendium – CanComSquared

  • PI: Kasper Lage
  • Co-Investigators: Steve Carr, Jake Jaffe, Monica Schenone, Ben Ebert, Matthew Meyerson, Bill Hahn, Cigall Kadoch
  • Funding: Broad Institute Broadnext10 Committee

The Broad Institute has played a significant role in the recent revolution in cancer sequencing studies, which have revealed many cancer genes that point to new tumor biology of diagnostic and therapeutic value. Despite the resounding success of these studies, it remains unclear how most cancer genes integrate into tissue-specific cellular networks to drive cancer progression as cohesive biological modules, to what extent such networks are quantitatively rewired by patient-derived alleles, and how to optimally kill cancers through combination therapies informed by the current catalog of cancer genes. To successfully provide cures for cancer we must overcome these challenges.

It is the hypothesis of this project that the many hundreds of genes linked significantly to cancer integrate into a much smaller number of transient or stable networks of physically interacting proteins – also known as protein complexes – that are quantitatively rewired in a cell-type-specific manner by somatic driver mutations. We believe that we must operate at the level of protein complexes to understand and effectively treat cancers, because these complexes – rather than individual proteins – are most likely the true effectors of cellular functions. Therefore, characterizing the protein complexes in which candidate cancer genes and alleles participate is more effective and scalable than characterizing new genes in isolation, and provides a basis for establishing logical groupings of cellular functions that may become dysregulated in cancer etiology.

To overcome the challenges mentioned above we propose to create the first high-quality compendium of the quantitative molecular protein complexes involved in human cancers with allele and cell-type specificity to fill in the blanks of our current knowledge of the networks of cancer gene products. This resource, which we call the Broad Institute Cancer Complex Compendium (CanCom2), will augment the biological and genetic discoveries from current and future cancer sequencing data, inform therapeutic opportunities, and complement and enable several Broad Institute flagship projects (e.g., Project Achilles and the Target Accelerator).

This project is a cross-disciplinary collaboration between the Lage Lab (genetics, computational analysis of cancer complexes), the Broad Institute Proteomics Platform (quantitative interaction proteomics), and leading cancer domain experts Benjamin Ebert, Matthew Meyerson, Bill Hahn, and Cigall Kadoch (affiliated with the Broad Institute Cancer Program, Dana Farber Cancer Institute, Brigham and Women’s Hospital, and Harvard Medical School).


  • Lage Lab team members: Edyta Malolepsza (team leader), April Kim, Taibo Li, Jakob Berg Jespersen, Heiko Horn, Justin Lim
  • Meyerson Lab: Matthew Meyerson, Alice Berger, Iris Fung, Srinivas Viswanathan, Peter Choi, Joshua Pan
  • Kadoch Lab: Cigall Kadoch, Steven Poynter
  • Hahn Lab: Bill Hahn, Joseph Rosenbluh, Andrew Aguirre, Katherine Walsh, Rita Sulahian
  • Ebert Lab team members: Ben Ebert, Zuzana Tothova, Siddhartha Jaiswal, Ellen Beauchamp, Josephine Kahn, Alexander Silver, Roger Belizaire, Brian Liddicoat, Sebastian Koochaki, Esther Obeng
  • Proteomics Platform team members: Steve Carr, Monica Schenone, Jake Jaffe, Christina Hartigan, Benjamin Tanenbaum, Gaelen Guzman

6) A human protein-protein interaction network to catalyze genomic interpretation

Human protein-protein interaction networks are critical to understanding cell biology and interpreting genetic and genomic data, but are challenging to produce in individual large-scale experiments. We have for a number of years been developing a data integration and quality control framework to provide a scored human protein-protein interaction network (InWeb_IM). Compared with other similar resources, the most recent update of InWeb_IM has 2.8 times more interactions (~585K) and a superior functional signal, showing that the added interactions reflect real cellular biology. InWeb_IM is a versatile resource for accurate and cost-efficient functional interpretation of massive genomic datasets, which we have illustrated, for example, by annotating candidate genes from >4,700 cancer genomes and genes involved in neuropsychiatric diseases.

7) Interpreting population-scale genetic variation using protein structures and features

With the advent of increasingly large population-scale genomic datasets (e.g., from the 1000 Genomes Project and the Exome Aggregation Consortium), we now have access to large datasets of genomic variation across human populations. However, functionally interpreting these data is nontrivial. This project aims to analyze and interpret genetic variation in the context of protein structures and features (e.g., protein domains, protein-protein interactions, post-translational modification sites). It is also the aim of the project to develop pipelines and code that enable these analyses to be incorporated into clinical sequencing workflows with collaborators.
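
As an illustration of what placing variants in the context of protein features can look like, here is a minimal sketch. It is not the project's pipeline: the `Feature` class, the two functions, the feature names, and the residue coordinates are hypothetical, and it assumes variants have already been mapped to 1-based residue positions in a single protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    kind: str   # e.g. "domain", "ptm_site", "interaction_interface"
    start: int  # first residue, 1-based inclusive
    end: int    # last residue, 1-based inclusive


def annotate_variants(positions, features):
    """Map each distinct variant position to the names of the features containing it."""
    return {
        pos: [f.name for f in features if f.start <= pos <= f.end]
        for pos in positions
    }


def variants_per_residue(positions, feature):
    """Number of variants inside a feature divided by the feature's length."""
    hits = sum(1 for pos in positions if feature.start <= pos <= feature.end)
    return hits / (feature.end - feature.start + 1)


# Hypothetical protein with one domain and one modification site inside it.
features = [
    Feature("kinase_domain", "domain", 10, 50),
    Feature("phosphosite_30", "ptm_site", 30, 30),
]
variant_positions = [5, 30, 45]

print(annotate_variants(variant_positions, features))
print(variants_per_residue(variant_positions, features[0]))
```

Comparing the per-residue variant density of a feature with that of the rest of the protein is one way such annotations could feed into a test for depletion or enrichment of variation; the comparison itself is left out of this sketch.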

The project has won a best poster award at the Third European International Society for Computational Biology (ISCB) Student Council Symposium 2014 Strasbourg and was chosen for a talk by the Program Committee of the 2016 EMBO Workshop ‘Integrating Genomics and Biophysics to Comprehend Functional Genetic Variation’.

  • 1000 Genomes Project: 1KG
  • Exome Aggregation Consortium: ExAC
  • More information about best poster award Jakob Jespersen: ISCB
  • About EMBO workshop ‘Integrating Genomics & Biophysics to Comprehend Functional Genetic Variation’: EMBO
  • Team: Jakob Jespersen
  • Funding: Lundbeck Foundation PhD scholarship (Jakob Jespersen), American Cancer Society Institutional Research Grant (MGH), International Society of Computational Biology Student Council

8) Network mutation burden analysis of 4,700 cancer genomes identifies new oncogenes

A large number of patients is required to find cancer driver genes with intermediate (2-20%) or low (<2%) mutational frequencies, suggesting that specific genes or networks of biological and therapeutic value may currently be obscured. In this project it is our hypothesis that it will be possible to use molecular network information to improve signal in underpowered cancer genome datasets and to integrate these predictions with a functional validation assay to systematically predict and validate hidden driver genes in existing tumor genomes. To test this hypothesis, we formalize a classifier that accurately calculates the significance level of mutations in a gene’s network (NetSig) by combining functional genomics networks and somatic mutations from 4,742 cancer genomes spanning 21 tumor types. We validated that the NetSig approach indeed identifies known cancer genes and recently proposed driver genes in the majority of these tumor types. We predict 62 new putative cancer genes, including 35 with a clear connection to cancer and 27 genes that may suggest new cancer biology. We experimentally tested 33 of these predictions in vivo in multiplexed tumorigenesis assays (TumorPlex), followed by reanalysis of tumor genomes stratified to only include patients without known driver mutations, thus validating 11 of our predictions as new oncogenic driver genes. Overall, NetSig identifies proportionally more (4x) low-frequency driver genes than existing gene-based methods, and our integrated computational and experimental pipeline exemplifies how to identify molecular clues in patients with no established driver mutations using existing sequencing data. The general and integrated computational and experimental framework we present here is scalable to the rapid production of sequencing data and should become increasingly useful as the number of cancer genomes continues to expand.
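
The idea of scoring the mutation signal in a gene's network, rather than in the gene itself, can be sketched with a plain permutation test. This is a simplified illustration and should not be read as the NetSig algorithm, whose details are not given here: the function name `network_burden_pvalue`, the scoring statistic (mean of -log10 p over neighbors), the uniform random sampling of background genes, and the toy gene labels are all assumptions made for the example.

```python
import math
import random


def network_burden_pvalue(gene, network, gene_pvalues, n_permutations=2000, seed=0):
    """Empirical p-value for the mutation signal among a gene's network neighbors.

    `network` maps a gene to its interaction partners; `gene_pvalues` maps genes to
    per-gene mutation p-values (all > 0). The gene's own p-value is excluded, so the
    score reflects only its neighborhood. Returns None if no neighbor has a p-value.
    """
    neighbors = sorted({n for n in network.get(gene, ()) if n in gene_pvalues and n != gene})
    if not neighbors:
        return None

    def score(genes):
        return sum(-math.log10(gene_pvalues[g]) for g in genes) / len(genes)

    observed = score(neighbors)
    background = sorted(g for g in gene_pvalues if g != gene)
    rng = random.Random(seed)
    # Count random gene sets of the same size that score at least as high.
    hits = sum(
        1
        for _ in range(n_permutations)
        if score(rng.sample(background, len(neighbors))) >= observed
    )
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: G0 is never mutated itself but sits next to three
# strongly mutated genes, while G10 has an unremarkable neighborhood.
pvalues = {f"G{i}": 0.5 for i in range(20)}
pvalues.update({"G1": 1e-6, "G2": 1e-6, "G3": 1e-6})
toy_network = {"G0": ["G1", "G2", "G3"], "G10": ["G11", "G12", "G13"]}

print(network_burden_pvalue("G0", toy_network, pvalues))
print(network_burden_pvalue("G10", toy_network, pvalues))
```

A realistic null model would also need to account for properties such as how many interaction partners each gene has, since highly connected genes are not comparable to random gene sets; the uniform sampling above ignores this for brevity.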

This project is a collaboration between the Lage Lab, the Getz Lab (Broad Institute, MGH Department of Pathology), the Boehm Lab (Broad Institute), and the Target Accelerator (Broad Institute).


  • Getz Lab: GetzLab
  • Boehm Lab: BoehmLab
  • Lage Lab team members: Heiko Horn, Anika Gupta, Jessica Xin Hu (former lab member), Alireza Kashani (former lab member)
  • Funding: Fund for Medical Discovery Grant from the Executive Committee on Research (MGH), American Cancer Society Institutional Research Grant (MGH), P01 from the National Institute of Child Health and Human Development (NICHD)