For 2021, we present:
CAMDA encourages an open contest, where all analyses of the contest data sets are of interest, not limited to the questions suggested here. There is an online forum for the free discussion of the contest data sets and their analysis, in which you are encouraged to participate.
We look forward to a lively contest!
Given the recent events of the COVID-19 pandemic, we expect a steadily increasing flood of data from cells or patients infected with SARS-CoV-2 in the upcoming years. Incidence is still rising worldwide despite current vaccination campaigns, and several initiatives around the world are emerging to collect data systematically. These studies have great value by themselves. Moreover, a joint effort of all the research institutions participating in the Disease Maps initiative has allowed the compilation of a COVID-19 mechanistic map that captures our current knowledge of the disease. This permits first models of the cellular response to infection from a mechanistic perspective. Mechanistic pathway models can then provide a causal bridge from variations in gene activity or integrity to consequential changes in phenotype. This makes them a useful tool for identifying deregulated mechanisms and functions in the search for candidate targets or intervention points that might reverse the phenotype or slow down progression of the disease. We challenge participants to expand our mechanistic understanding of COVID-19; the most promising ideas can also be tested experimentally.
Possible approaches include (but are not restricted to):
Analysis suggestions:
Check out and download the Disease Maps COVID-19 network in GPML, SBML, SBGN-ML or SIF format.
To download a Disease Maps COVID-19 sub-map, zoom in and select the sub-map you are interested in (for example, the PAMP signalling sub-map), then right-click on the map and select the format that best suits you (GPML, SBML, SBGN-ML). The Simple Interaction Format (SIF) file derived from each Disease Maps COVID-19 sub-map can be downloaded from here.
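As a starting point for working with the SIF downloads, the following minimal Python sketch reads SIF lines into an edge list. It assumes the common SIF layout of a source node, an interaction type, and one or more target nodes per line; the node names and interaction labels in the example are placeholders, not entries from the COVID-19 map.

```python
from collections import defaultdict


def parse_sif(lines):
    """Parse SIF lines into (source, interaction, target) triples."""
    edges = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Fields are tab-separated if the line contains a tab,
        # otherwise separated by any whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            continue  # a node listed on its own has no edges
        source, interaction, targets = fields[0], fields[1], fields[2:]
        for target in targets:
            edges.append((source, interaction, target))
    return edges


def adjacency(edges):
    """Group outgoing edges by source node."""
    graph = defaultdict(list)
    for source, interaction, target in edges:
        graph[source].append((interaction, target))
    return dict(graph)


# Placeholder content (hypothetical node names and interaction types).
example_lines = [
    "GENE_A\tactivation\tGENE_B",
    "GENE_B\tinhibition\tGENE_C\tGENE_D",
    "GENE_E",
]
edges = parse_sif(example_lines)
graph = adjacency(edges)
```

For a downloaded file, the same function can be applied to an open file handle, since iterating over a text file yields its lines.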
We provide a collection of recent relevant gene expression profile studies after registration and login. As always in CAMDA challenges, however, you can use any other datasets in addition or instead, including any other data modalities (transcriptomic, genomic, proteomic, GWAS, etc.), as long as these are also available to colleagues.
Please sign up to announcements from the CAMDA general forum for alerts.
Please read and accept the data download agreement for access.
Unexpected Drug-Induced Liver Injury (DILI) is still one of the main killers of promising novel drug candidates. It is a clinically significant disease that can lead to severe outcomes such as acute liver failure and even death. It remains one of the primary liabilities in drug development and regulatory clearance due to the limited performance of mandated preclinical models even today. The free text of scientific publications is still the main medium carrying DILI results from clinical practice or experimental studies, and this textual data still has to be analysed manually. This process is tedious and prone to human mistakes or omissions, as results are very rarely available in a standardized or organized form. There is thus great hope that modern techniques from machine learning or natural language processing could provide powerful tools to better process free-form texts and derive the underlying knowledge within them. The pressing need to process potential drug candidates faster in the current COVID epidemic, combined with recent advances in Artificial Intelligence for text processing, makes this Challenge particularly topical.
We have compiled a large set of PubMed papers relevant to DILI (positives) to be contrasted with a challenging set of unrelated papers (negatives). Both titles and abstracts have been collected. Can you build a classifier using modern AI or NLP techniques to identify the relevant papers?
Together, this recreates the problem faced by human experts: after the obvious, easy negatives and positives have been removed by basic algorithms, how can we identify true positives and negatives for the less obvious cases?
The released data should be used for both training and (nested) cross-validation to avoid over-fitting. Participants will then receive independent performance scores from the withheld additional test data.
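One way to organize nested cross-validation is sketched below in plain Python. This is an illustration rather than a prescribed protocol: the fold counts, the seed, and the function names are assumptions. The key property is that the inner folds used for model selection are drawn only from the outer training portion, so each outer test fold stays untouched until the final scoring step.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits is a list of (inner_train, inner_val) pairs built only
    from outer_train, so outer_test never influences model selection.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [
            idx
            for j, fold in enumerate(outer_folds)
            if j != i
            for idx in fold
        ]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [
                idx
                for n, fold in enumerate(inner_folds)
                if n != m
                for idx in fold
            ]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# Hypothetical size: 20 documents, 5 outer folds, 3 inner folds.
splits = list(nested_cv_splits(n_samples=20, outer_k=5, inner_k=3))
```

Hyperparameters would be tuned on the inner splits, the chosen configuration refitted on `outer_train`, and performance reported on `outer_test` only.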
Because the overall prevalence of DILI-relevant papers is very low across all manuscripts in PubMed, we will also provide another independent performance score where the negative reference set has been expanded considerably. This assesses how well the models can be applied to larger candidate collections that are naturally highly unbalanced.
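To see why an expanded negative set matters, the short sketch below computes the precision expected from a classifier with fixed sensitivity and specificity at different prevalences. The numbers are hypothetical and chosen only for illustration; they are not properties of the challenge data.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Expected precision (positive predictive value) at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% sensitivity and 90% specificity.
balanced = precision_at_prevalence(0.9, 0.9, prevalence=0.5)
rare = precision_at_prevalence(0.9, 0.9, prevalence=0.01)

print(f"precision at 50% prevalence: {balanced:.3f}")
print(f"precision at 1% prevalence:  {rare:.3f}")
```

With these assumed values, precision drops from 0.9 on a balanced set to roughly 0.083 when only 1% of candidates are relevant, even though the classifier itself is unchanged.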
Data are provided in the form of text tables. Both files contain paper titles and abstracts (where available).
Please sign up to announcements from the CAMDA toxicity forum for alerts.
Please read and accept the data download agreement for access.
We thank the Institute of Advanced Research in Artificial Intelligence (IARAI) for its support in the preparation of this Challenge.
Phages, or bacteriophages, are viruses that infect bacteria and are the most abundant viruses on the planet. A systematic characterization of their striking variety, however, has been challenging and became viable only in the age of metagenomics. Recently, the idea of phage therapy has been rediscovered by researchers in academia and the pharmaceutical industry as a potential alternative to classical antibiotics in modern medicine. Moreover, a better understanding of phages and of how they spread their genetic material can be of great value to public health. Rather than fight outbreaks of superbugs as they emerge, the aim would be to prevent outbreaks or nip them in the bud.
The co-evolution of microbes and their viruses, analyses of correspondences, gene transfer, and other mechanisms of spread are yet to be explored systematically, especially on a metagenomic scale. Such knowledge, however, would help in the prediction of anti-microbial resistance events from metagenomic samples collected in strategic monitoring areas. Understanding the relations between viruses and their hosts is likely to be critical for this, as anti-microbial resistance can indeed spread through phages. In this CAMDA challenge we thus explore the systematic characterization of metagenomic samples with the aim of finding phages and pro-phages that may be associated with anti-microbial resistance and compiling a 'resistome'.
We provide a dataset containing
Samples are placed in 124 tar-compressed folders with names corresponding to their IDs and AMR class (high or low). Within each sample you will find:
Depth of sequencing may vary, and we cannot wait to hear what you think about this property in terms of phage metagenomics!
Data is based on an initial large-scale analysis of anti-microbial resistance of the MetaSUB International Consortium.
Questions of interest in this exploratory study include (but are not limited to):
Analysis suggestions:
The FASTQ files containing raw metagenomics reads of the aforementioned samples are made available for the first time, together with the corresponding metadata and the results of the MetaSUB AMR analysis.
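Since depth of sequencing may vary between samples, a quick per-file read and base count can be a useful first check. The sketch below assumes the standard four-line FASTQ record layout (header, sequence, separator, quality); the demo records are made up for illustration.

```python
import io


def fastq_stats(handle):
    """Return (number of reads, total bases) for a four-line-per-record FASTQ stream."""
    n_reads = 0
    n_bases = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near read %d" % (n_reads + 1))
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in read %d" % (n_reads + 1))
        n_reads += 1
        n_bases += len(sequence)
    return n_reads, n_bases


# Made-up demo records; real input would be an opened FASTQ file.
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\nIIIIII\n")
reads, bases = fastq_stats(demo)
```

If the files are gzip-compressed (not stated here), the same function should work on a handle opened with `gzip.open(path, "rt")` from the Python standard library.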
Please sign up to announcements from the CAMDA metagenomics forum for alerts.
Please read and accept the data download agreement for access.
There is an amazing, comprehensive collection of matched genomic, transcriptomic, and epigenomic molecular patient profiles that characterizes the complex changes that occur in cancers. The most prominent data sets are provided by the Genomic Data Commons (GDC, formerly through the TCGA). The main goal of this challenge is to develop and demonstrate novel methods for gaining new biological insights or improving support for Precision Medicine, as showcased for data from cancer patients. Innovation can build on
This presents a unique opportunity to examine algorithm performance in a real-world clinical setting! We know that many approaches work well on some data sets yet not on others. We here challenge you to demonstrate a single unified approach that matches or outperforms the current state of the art for
and for at least one of the less well studied
Please visit and participate in the open CAMDA data integration forum for free discussion related to this contest.
Analysis suggestions:
Biological:
Technical:
Contest data comprises raw and pre-processed data from matched molecular profiles with complementary clinical information.
For convenience, we provide a local copy of the data. In addition, anonymized RNA-seq read level data are now available.
Please sign up to announcements from the CAMDA data integration forum for alerts.
Please read and accept the data download agreement for access.