Gene expression is highly correlated: many genes rise and fall together. We take advantage of this high degree of correlation to reduce the number of measurements needed to generate meaningful gene expression data for the approximately 20,000 genes in the human genome. CMap relies on a carefully chosen set of 1,000 genes, which we call the Landmark genes, to infer the activity of the other 19,000 genes.
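The inference step can be illustrated with a minimal sketch. It assumes a plain linear model (ordinary least squares with an intercept) that maps landmark expression to each non-landmark gene. The function names `fit_inference_model` and `infer_targets` and the synthetic data are hypothetical. This is not the CMap production pipeline, and real training would use measured reference profiles rather than simulated ones.

```python
import numpy as np


def fit_inference_model(landmarks: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Hypothetical helper: learn a linear map from landmark to non-landmark expression.

    landmarks: (n_samples, n_landmarks) training matrix
    targets:   (n_samples, n_targets) training matrix
    Returns a (n_landmarks + 1, n_targets) weight matrix; row 0 is the intercept.
    """
    n = landmarks.shape[0]
    # Prepend a column of ones so each target gene gets its own intercept.
    design = np.hstack([np.ones((n, 1)), landmarks])
    # Solves one least-squares problem per target column in a single call.
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def infer_targets(landmarks: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical helper: predict non-landmark expression from landmark measurements only."""
    n = landmarks.shape[0]
    design = np.hstack([np.ones((n, 1)), landmarks])
    return design @ weights


# Synthetic demo (assumption): 10 "landmark" genes linearly drive 40 other genes, plus noise.
rng = np.random.default_rng(0)
n_samples, n_landmarks, n_targets = 200, 10, 40
landmarks = rng.normal(size=(n_samples, n_landmarks))
true_w = rng.normal(size=(n_landmarks, n_targets))
targets = landmarks @ true_w + 0.1 * rng.normal(size=(n_samples, n_targets))

# Fit on the first 150 samples, then infer the held-out 50 from landmarks alone.
train, test = slice(0, 150), slice(150, 200)
w = fit_inference_model(landmarks[train], targets[train])
pred = infer_targets(landmarks[test], w)

# Per-gene Pearson correlation between inferred and actual held-out values.
per_gene_r = [np.corrcoef(pred[:, j], targets[test][:, j])[0, 1] for j in range(n_targets)]
print(f"mean per-gene r = {np.mean(per_gene_r):.3f}")
```

The sketch passes a multi-column target matrix to `np.linalg.lstsq`, so one call fits a separate least-squares model for each target gene. The point is only that, once the weights are learned, a new sample needs landmark measurements alone.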
If the landmark genes are carefully chosen, their representation will not be biased toward a particular cellular model. We selected the landmark genes by analyzing the gene expression profiles of cells collected from normal tissues and primary tumors. The selected genes are: 1) minimally redundant, 2) widely expressed across different cellular contexts, and 3) highly connected in our inference models.
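The three selection criteria can be sketched as a greedy filter. This is a hypothetical illustration, not the procedure CMap used, and it rests on three assumptions. "Widely expressed" means above a threshold `min_expr` in at least a fraction `min_frac` of samples. Mean absolute correlation with all other genes serves as a stand-in for "connectivity." A candidate counts as redundant if its absolute correlation with an already-selected landmark exceeds `max_redundancy`. All names and thresholds are illustrative.

```python
import numpy as np


def select_landmarks(expr, k, min_expr=1.0, min_frac=0.8, max_redundancy=0.7):
    """Hypothetical greedy landmark selection.

    expr: (n_samples, n_genes) expression matrix
    Returns up to k selected gene (column) indices.
    """
    # Criterion 2 (assumed definition): expressed above min_expr in >= min_frac of samples.
    widely = np.mean(expr > min_expr, axis=0) >= min_frac
    corr = np.abs(np.corrcoef(expr, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    # Criterion 3 (stand-in): mean absolute correlation with every other gene.
    connectivity = corr.mean(axis=1)
    # Visit widely expressed genes from most to least connected.
    candidates = [g for g in np.argsort(-connectivity) if widely[g]]
    selected = []
    for g in candidates:
        # Criterion 1: skip genes too correlated with an already-selected landmark.
        if all(corr[g, s] < max_redundancy for s in selected):
            selected.append(int(g))
        if len(selected) == k:
            break
    return selected


# Synthetic demo (assumption): three co-regulated modules of 10 genes each (columns 0-29),
# 5 uncorrelated genes (30-34), and 5 module-0 genes that are barely expressed (35-39).
rng = np.random.default_rng(1)
n = 300
factors = rng.normal(size=(n, 3))
modules = [factors[:, [m]] + 0.2 * rng.normal(size=(n, 10)) + 5.0 for m in range(3)]
noise = rng.normal(size=(n, 5)) + 5.0
low = factors[:, [0]] + 0.2 * rng.normal(size=(n, 5)) - 5.0
expr = np.hstack(modules + [noise, low])

picked = select_landmarks(expr, k=3)
print(picked)
```

On this toy data the sketch keeps one gene from each co-regulated module. It skips the highly connected but barely expressed genes, which mirrors the intent of criteria 1 through 3.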
By analyzing several query:result pairs (marked with * in the graph) from well-known published and unpublished CMap connections, we determined that 1,000 genes capture approximately 80% of the information. The table below shows examples of Landmark genes used in the L1000 assay.
| L1000 ID | Symbol | Entrez ID | Description |
|---|---|---|---|
| YYYE11F11 | PSME1 | 5720 | proteasome (prosome, macropain) activator subunit 1 (PA28 alpha) |
| BBBBC1D1 | CISD1 | 55847 | CDGSH iron sulfur domain 1 |
| BBBBE1F1 | SPDEF | 25803 | SAM pointed domain containing ets transcription factor |
| QQQA8B8 | ATF1 | 466 | activating transcription factor 1 |
| ZZZC4D4 | RHEB | 6009 | Ras homolog enriched in brain |
| BBBBC2D2 | IGF1R | 3480 | insulin-like growth factor 1 receptor |
| TTTE12F12 | FOXO3 | 2309 | forkhead box O3 |
| BBBBG2H2 | GSTM2 | 2946 | glutathione S-transferase mu 2 (muscle) |
| ZZZE4F4 | RHOA | 387 | ras homolog gene family, member A |
| VVVE5F5 | IL1B | 3553 | interleukin 1, beta |
| QQQG7H7 | ASAH1 | 427 | N-acylsphingosine amidohydrolase (acid ceramidase) 1 |
| ZZZC2D2 | RALA | 5898 | v-ral simian leukemia viral oncogene homolog A (ras related) |
| QQQG6H6 | ARHGEF12 | 23365 | Rho guanine nucleotide exchange factor (GEF) 12 |
| AAAAC2D2 | SOX2 | 6657 | SRY (sex determining region Y)-box 2 |
| ZZZC10D10 | SERPINE1 | 5054 | serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 |
| UUUG8H8 | HLA-DMA | 3108 | major histocompatibility complex, class II, DM alpha |
| SSSG12H12 | EGF | 1950 | epidermal growth factor |
| BBBBC5D5 | SPTLC2 | 9517 | serine palmitoyltransferase, long chain base subunit 2 |
| QQQG5H5 | APP | 351 | amyloid beta (A4) precursor protein |
| BBBBG5H5 | TSKU | 25987 | tsukushi small leucine rich proteoglycan homolog (Xenopus laevis) |
Download the full list of landmark genes.