About SemPoD

One of the primary challenges in translational research data management is breaking down the barriers between data silos and integrating 'omics data with clinical information to complete the cycle from bench to bedside.

Contextual metadata, also called provenance information, is a key factor in effective data integration, reproducibility of results, correct attribution of original sources, and answering research queries involving “What”, “Where”, “When”, “Which”, “Who”, “How”, and “Why” (also known as the W7 model).
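
As an illustration of the W7 model, the sketch below represents the provenance of one experiment as a record with one field per question. The class name `W7Record`, the helper `answer`, and all field values are hypothetical and are not taken from SemPoD or SysPro.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class W7Record:
    """One provenance record with a field for each W7 question (hypothetical layout)."""
    what: str
    where: str
    when: str
    which: str
    who: str
    how: str
    why: str


def answer(record: W7Record, question: str) -> str:
    """Return the stored answer to one W7 question, ignoring case and spaces."""
    return asdict(record)[question.strip().lower()]


# Illustrative values only; they do not describe a real experiment.
EXAMPLE = W7Record(
    what="mass-spectrometry run",
    where="proteomics core facility",
    when="2012-03-01",
    which="instrument 2",
    who="analyst A",
    how="AP-MS workflow",
    why="protein complex identification",
)

print(answer(EXAMPLE, "Who"))
```

Keeping all seven answers in one record means any of the W7 questions can be asked of an experiment with a simple lookup.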

We introduce an ontology-driven, intuitive Semantic Proteomics Dashboard (SemPoD) that uses provenance together with domain information (semantic provenance) to enable researchers to query, compare, and correlate data across multiple projects and data types, and to integrate legacy data in support of their current research.

The SemPoD platform, currently in use at the Case Center for Proteomics and Bioinformatics (CPB), consists of three components, the Ontology-driven Visual Query Composer, the Result Explorer, and the Query Manager, shown in Figure 1 below. Currently, SemPoD allows provenance-aware querying of 1153 mass-spectrometry experiments from 20 different projects.

Figure4.jpg

Figure 1: Semantic Proteomics Dashboard

SemPoD uses the systems molecular biology provenance ontology (SysPro) to support a dynamic query composition interface, which automatically updates itself based on previous user selections and efficiently prunes the result set using a “smart filtering” approach. The SysPro ontology re-uses terms from the PROV ontology (PROV-O) being developed by the World Wide Web Consortium (W3C) provenance working group, the minimum information required for reporting a molecular interaction experiment (MIMIx), and the minimum information about a proteomics experiment (MIAPE). SemPoD was evaluated in terms of user feedback as well as scalability with increasing data size and query complexity.
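
The page does not describe how “smart filtering” is implemented. The following is only a minimal sketch of the general idea, assuming that each experiment is a flat record and that the interface offers, for every field not yet chosen, only the values that still lead to a non-empty result. The records, field names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical experiment records; field names and values are illustrative only.
EXPERIMENTS = [
    {"project": "P1", "workflow": "AP-MS", "instrument": "Orbitrap"},
    {"project": "P1", "workflow": "shotgun", "instrument": "Orbitrap"},
    {"project": "P2", "workflow": "shotgun", "instrument": "Q-TOF"},
]


def apply_selections(records, selections):
    """Keep only the records that match every field value chosen so far."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in selections.items())
    ]


def remaining_options(records, selections):
    """For each field not yet chosen, list the values that still yield results."""
    options = defaultdict(set)
    for record in apply_selections(records, selections):
        for field, value in record.items():
            if field not in selections:
                options[field].add(value)
    return {field: sorted(values) for field, values in options.items()}


# After the user picks the shotgun workflow, the other menus shrink accordingly.
print(remaining_options(EXPERIMENTS, {"workflow": "shotgun"}))
```

Recomputing the options after each selection keeps the user from composing a query that cannot return any experiments, which is one plausible reading of pruning the result set as the query is built.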

Figure1.jpg

Figure 2: SemPoD Architecture

SemPoD is an intuitive and powerful provenance ontology-driven data access and query platform to create an integrated view over large-scale systems molecular biology datasets. SemPoD can be deployed over many existing database applications storing ‘omics data, for instance, the LabKey data-management system as shown in Figure 2.

Initial user feedback on the usability and functionality of SemPoD has been very positive, and the platform is being considered for wider deployment beyond the proteomics domain and at other ‘omics centers.

Methods

We use two principal proteomics workflows as exemplars to describe the design and implementation of SemPoD, namely:

  1. The affinity-purification mass-spectrometry (AP-MS) workflow, which enables the identification of specific protein complexes, thus identifying proteins that are associated with one another.
  2. The shotgun expression proteomics workflow, which identifies and quantifies proteins in an unbiased manner from cells or tissues of interest.

Together, these two workflows account for approximately 50% of all experiments performed in the CPB and have been used in approximately 20 separate projects, generating over 3 terabytes (TB) of data.