What is a signature?

Cells are treated with a pharmacological or genetic perturbagen, and the consequence of this treatment is assayed by measuring the abundance of ~1,000 transcripts using the L1000 assay. The data are then normalized and compared to a reference profile (e.g., vehicles like DMSO, control vectors like LacZ, or the population of perturbagens on the same plate). The result is a list of all ~1,000 genes along with the extent of their differential expression (z-score). This list is termed a signature. Note that some analyses use only a subset of the differentially expressed genes, e.g., the top 50 most differentially expressed genes or a signature gene set.
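As a rough illustration of the idea, the sketch below computes a per-gene z-score for one treated profile against a set of vehicle-control profiles and then picks the most differentially expressed genes. It is a minimal sketch, not the production L1000 pipeline: the plain mean/standard-deviation z-score, the function names `compute_signature` and `top_genes`, and the dictionary-based data layout are all assumptions made for the example.

```python
from statistics import mean, stdev


def compute_signature(treated, controls):
    """Return a gene -> z-score mapping for one treated profile.

    treated:  dict mapping gene name -> normalized expression (hypothetical layout)
    controls: list of dicts with the same keys, one per reference profile
              (at least two are needed to estimate a standard deviation)
    """
    signature = {}
    for gene, value in treated.items():
        reference = [profile[gene] for profile in controls]
        mu = mean(reference)
        sigma = stdev(reference)
        # Assumption: a gene with no variation in the reference gets a score of 0.
        signature[gene] = (value - mu) / sigma if sigma > 0 else 0.0
    return signature


def top_genes(signature, n=50):
    """Return the n genes with the largest absolute z-score."""
    ranked = sorted(signature, key=lambda gene: abs(signature[gene]), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    treated = {"A": 10.0, "B": 5.0, "C": 1.0}
    controls = [
        {"A": 4.0, "B": 5.0, "C": 2.0},
        {"A": 6.0, "B": 5.0, "C": 4.0},
    ]
    sig = compute_signature(treated, controls)
    print(top_genes(sig, n=2))
```

Ranking by absolute z-score treats strongly up- and down-regulated genes alike; an analysis that needs separate up and down gene sets would sort by the signed score instead.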

What data is available?

This website provides access to data from several LINCS-related efforts and Broad Institute Connectivity Map efforts.

In particular, it serves as a repository and analytical environment for L1000 datasets from LINCS and from other Broad Institute projects (CMap, CDRP, Therapeutics, Genomic Perturbagen profiling, etc.).

For LINCS, in addition to L1000 datasets, this website serves proteomic datasets generated by Jake Jaffe (LINCS U01).

While the data and tools were funded through multiple mechanisms, we believe the scientific community is best served if all of the data is available through one integrated resource. For this reason, we are consolidating all the perturbational data into this website and the lincscloud.org analytical environment.

How do I access data?

Our goal is to make data analyzable via a web interface, with the data remaining in the cloud. We anticipate that the best way to access and interact with the latest versions of the datasets will be through an integrated environment that co-locates data and tools and is accessible both programmatically and via web apps, rather than by downloading large files. We are working to develop and broadly release such an environment.

In the interim, you can download the data and work with it yourself. For this purpose, we provide downloadable versions of all forms of the data (raw scans, normalized matrices and signatures) in the form of files and data matrices.

What are some of the challenges in using the data?

It's important to stress that while our data generation processes are mature, the analytical aspects are rapidly improving but still present many challenges.

Indeed, we are making the data available pre-publication precisely to involve the community in developing solutions.

While there are many important challenges to address in data pre-processing (e.g., better normalization methods, batch effect corrections, etc.), our sense is that the key challenge is one of interpretation. With over 1M profiles in the database, any query is going to return thousands of equivalently correlated signatures. Yet experimental follow-up is hard, and typically only a few hypotheses can be tested in the lab.

Hence, there is a need to develop methods that prioritize results, including (a) integration with other data types, e.g., protein-protein interaction data; (b) pathways; and (c) methods to adjust for statistical biases and multiple-hypothesis correction.

While we are working on our solutions to these challenges, we make available the data so that you might also work on these on your own.
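To make the multiple-hypothesis point concrete, here is a minimal sketch of one widely used correction, the Benjamini-Hochberg false discovery rate procedure. The choice of this particular procedure is an assumption for illustration, and the function name `benjamini_hochberg` is hypothetical. It takes one p-value per returned signature and produces adjusted values that can be thresholded to decide which hits merit follow-up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    p_values: list of raw p-values, e.g. one per signature returned by a query.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.3f}  adjusted={q:.3f}")
```

With a million-profile database, even a strict raw p-value threshold lets through many chance matches; ranking by an adjusted value is one simple way to shrink the list before applying the other prioritization approaches.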

I work in a for-profit company, how do I access the resources?

Please contact us: lincs@broadinstitute.org

What are data access keys and why are they needed?

There are two main reasons: (1) The size and hierarchical structure of the datasets make it technically difficult to post the data as links downloadable via HTTP or FTP. Furthermore, with multiple users, serving the data from a single website would cause network bottlenecks. (2) lincscloud.org is an integrated resource of data, tools, APIs, web apps, and elastic compute that we have developed. Once authorized by LINCS, users should be able to easily access datasets and applications without having to maintain multiple credentials.

Data APIs and access keys are commonly used by data-intensive platforms such as Twitter and Google toolkits. There are several GUI and command-line utilities that make usage quite simple. Thus, the resources we've built on lincscloud.org use a credentialing system that allows authenticated access for human users (i.e., via standard userid/password) as well as programmatic access (e.g., via Python, JavaScript, etc.).

Is this a LINCS resource or a CMap resource?


While the data and tools were funded through multiple mechanisms, including LINCS, we believe the scientific community is best served if all of the data is available through one integrated resource. For this reason, we are consolidating all the perturbational data into this website and the lincscloud.org analytical environment.

What are your future plans?

  1. Address the challenge of how to make LINCS data useful to researchers so that it can help accelerate biomedical discovery. For this, we are developing web apps that allow bench researchers to query the data via a simple user interface. It is also important to bring advanced computational methods to bear on the challenge. To enable this, we are creating a cloud-based compute and data access mechanism where researchers can quickly bring their ideas and algorithms to the dataset without the need for laborious downloads or pre-processing.
  2. Improve the dataset. We are working to enhance the dataset by creating a comprehensive reference library of pharmacologicals, profiling the knock-down and over-expression of all genes in the genome (using CRISPRs, shRNAs and cDNAs), and employing a wide diversity of cell types (cancer lines, primary cells, iPSC-derived cells, etc.) in these experiments.