Reach Project

About Reach

Reach stands for Reading and Assembling Contextual and Holistic Mechanisms from Text. Reach is an open-source information extraction system for the biomedical domain, designed to extract signaling pathway fragments from biomedical publications.

Reach recognizes 17 biochemical interactions (e.g., phosphorylation, complex assembly, catalysis), their participants (genes, proteins, simple chemicals, sites, mutations), and contextual information (species, organs, cell types, cell lines). In the latest DARPA program evaluation, Reach obtained the highest overall performance (combined precision and throughput) of all participating systems, with precision approaching 70% and a throughput of more than 20,000 interactions extracted from 1,000 papers. With processing parallelized on a 48-core machine, the average runtime was less than 5 seconds per paper.

Reach is a product of the Computational Language Understanding (CLU) Lab at the University of Arizona.

The development of Reach was funded by the DARPA Big Mechanism program under ARO contract W911NF-14-1-0395.

Welcome to Reach

Welcome to the Reach project site. Reach is an open-source natural language processing (NLP) project that reads biomedical literature and extracts cancer signaling pathways. This site provides resources, code, and tools that help researchers in biomedicine, information extraction, and NLP learn about and work with Reach.

Reach Services

Currently, the Reach project site provides both interactive and programmatic access to a variety of processing and information services:

Application Programming Interface (API)

The Reach API provides a RESTful interface to the Reach information extraction engine. API services use standard HTTP GET and POST methods to accept either biomedical text fragments or entire PubMed documents in NXML form. The biological events extracted from the text are returned in either FRIES Consortium JSON format or MITRE IndexCard JSON format. These formats are different representations of the linguistic, biological, and contextual information derived from the text.
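As a rough illustration of the text-fragment workflow described above, the following Python sketch builds an HTTP POST request that submits a text fragment and selects one of the two JSON output formats. The base URL, the endpoint path, and the `text` and `output` parameter names are assumptions made for this example only; this page does not specify them, so consult the API documentation for the actual values.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTIONS: the base URL, endpoint path, and form field names below are
# placeholders for illustration. They are not taken from the Reach API.
BASE_URL = "http://localhost:8080"
TEXT_ENDPOINT = "/api/text"
OUTPUT_FORMATS = ("fries", "indexcard")


def build_text_request(text, output="fries"):
    """Build (but do not send) a POST request carrying a text fragment."""
    if output not in OUTPUT_FORMATS:
        raise ValueError(f"output must be one of {OUTPUT_FORMATS}")
    # Encode the fields as an ordinary URL-encoded form body.
    body = urllib.parse.urlencode({"text": text, "output": output}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + TEXT_ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def extract_events(text, output="fries", timeout=60):
    """Send the request and parse the JSON response into Python objects."""
    request = build_text_request(text, output)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server reachable at the assumed address, `extract_events("MEK phosphorylates ERK.")` would return the parsed JSON document; the structure of that document depends on the output format chosen.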

Bio Visualizer

The Bio Visualizer allows the submission of biomedical text and visualizes a variety of data about the extracted entities, events, and their relationships. It also displays the extraction rules used, a visualization of the syntactic relationships in the text, BRAT annotations for both the syntax and the events, and syntactic tree graphs for the input sentences.

NXML Uploader

The NXML Uploader allows a user to interactively submit an open source PubMed document file to the Reach system for processing. The user's NXML document file is uploaded from their local computer via a web browser and JSON results are returned directly to the web browser, where they can be viewed or saved to a file on the user's computer.

Reach Results Explorer

As Reach processes a text, it identifies a sequence of biochemical events and their participating physical entities. In July 2015, Reach extracted 2.685 million interaction events from more than 175,000 cancer-related publications in the PubMed open-source corpus. The Results Explorer is an online application that allows interactive exploration of that result space using a simple agent/predicate/patient matching paradigm. Events may be queried by any combination of agent (controller entity), interaction type (predicate), patient (controlled entity), and PubMed Central document ID. For all events matching a given query, the 40 most frequent agents, patients, and document IDs are displayed, along with categorizations and totals for the matching events' interaction types.
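The agent/predicate/patient matching paradigm can be pictured with a small in-memory sketch. This is not the Results Explorer's implementation: the `Event` record, the function names, and the sample events and document IDs are all made up for illustration. It shows only the general idea of filtering on any combination of fields and then ranking the most frequent values among the matches.

```python
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical record for one extracted interaction event."""
    agent: str      # controller entity
    predicate: str  # interaction type
    patient: str    # controlled entity
    doc_id: str     # document ID


def match_events(events, agent=None, predicate=None, patient=None, doc_id=None):
    """Return events matching every field that was given (None = wildcard)."""
    wanted = {"agent": agent, "predicate": predicate,
              "patient": patient, "doc_id": doc_id}
    return [
        e for e in events
        if all(getattr(e, field) == value
               for field, value in wanted.items() if value is not None)
    ]


def summarize(matches, top_n=40):
    """Rank the most frequent agents, patients, and documents among matches."""
    return {
        "agents": Counter(e.agent for e in matches).most_common(top_n),
        "patients": Counter(e.patient for e in matches).most_common(top_n),
        "doc_ids": Counter(e.doc_id for e in matches).most_common(top_n),
        "interaction_types": dict(Counter(e.predicate for e in matches)),
    }


# Made-up sample data; the document IDs are placeholders.
EVENTS = [
    Event("MEK", "phosphorylation", "ERK", "PMC-A"),
    Event("MEK", "phosphorylation", "ERK", "PMC-B"),
    Event("RAF", "phosphorylation", "MEK", "PMC-A"),
    Event("RAS", "binding", "RAF", "PMC-C"),
]

summary = summarize(match_events(EVENTS, predicate="phosphorylation"))
```

Leaving a field as `None` treats it as a wildcard, which mirrors querying by "any combination" of the four fields.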
