An ultra-rapid bioinformatics pipeline that uses state-of-the-art computational algorithms to analyze real-time next-generation sequencing (NGS) data for infectious pathogen detection and discovery.


  • Web-based interface through which clients pay to use the pipeline
  • Integration into existing bioinformatics suites
  • Coupling of SURPI analysis with the development of NGS-based clinical assay kits or platforms
  • Integration into existing Laboratory Information Management Systems (LIMS)
  • Support for genomics research in a variety of different fields, including clinical diagnostics, blood bank screening, purity and safety testing of foods, drugs, and biologics, and environmental metagenomics


Currently available in-depth sequencing analysis tools are time-consuming and cumbersome to use, and they do not present results in an easily accessible format that can be interpreted for clinical diagnosis. There is a great need to analyze NGS data in real time, particularly in the field of infectious disease, where turnaround time is paramount for clinical action. The presented technology, SURPI, addresses these challenges by providing the following advantages:

  • An ultra-rapid bioinformatics analysis pipeline for microbial identification in clinically actionable timeframes
  • Increased sensitivity and accuracy over other pipelines
  • Allows researchers, clinicians, and laboratories to analyze NGS data without the need for a local cluster of computational servers (hardware) or significant bioinformatics expertise
  • Implementation on a cloud server such as the Amazon EC2 cloud computing platform
  • Deployment on a single desktop or portable laptop
  • Accurately classifies sequence reads by simultaneous alignment to the entirety of the GenBank nucleotide (NT) database
  • User-friendly output incorporating a specially designed clinical interface

Scientists at UCSF have developed a novel system called SURPI, a cloud-compatible pipeline that combines two open-source algorithms, SNAP and RAPSearch, with customized programs and scripts written in C, Perl, Python, and the Linux Bash shell. SURPI identifies sequences and contigs (assemblies of contiguous sequences) corresponding to pathogens in datasets of 5 to 300 million reads in minutes.

SURPI can accurately classify every read by alignment to the GenBank NT database, a comprehensive reference database that includes all human, bacterial, fungal, parasitic, animal, insect, and viral sequences identified to date.
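
As a rough illustration of the read-classification idea described above, the following Python sketch labels each read using a nucleotide-level search first and a protein-level search as a fallback, then tallies reads per label. It is not SURPI's code: the function `classify_reads`, the `Aligner` type, the toy lookup tables, and the ordering of the two stages are all assumptions made for illustration, and plain dictionary lookups stand in for real aligners such as SNAP and RAPSearch.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical aligner interface (an assumption, not a real SNAP/RAPSearch API):
# takes a read sequence and returns a taxon label, or None if nothing matches.
Aligner = Callable[[str], Optional[str]]


def classify_reads(
    reads: Iterable[Tuple[str, str]],
    nucleotide_align: Aligner,
    protein_align: Aligner,
) -> Tuple[Dict[str, Tuple[str, str]], Counter]:
    """Label every read and count reads per label.

    Assumed staging: try the nucleotide search first; fall back to the
    protein search only for reads the first stage could not place.
    """
    assignments: Dict[str, Tuple[str, str]] = {}
    counts: Counter = Counter()
    for read_id, sequence in reads:
        taxon = nucleotide_align(sequence)
        stage = "nucleotide"
        if taxon is None:
            taxon = protein_align(sequence)
            stage = "protein"
        if taxon is None:
            taxon, stage = "unclassified", "none"
        assignments[read_id] = (taxon, stage)
        counts[taxon] += 1
    return assignments, counts


if __name__ == "__main__":
    # Toy reference tables (made-up sequences and labels, for illustration only).
    toy_nucleotide_db = {"ACGTACGT": "Homo sapiens", "TTGACCTA": "Escherichia coli"}
    toy_protein_db = {"GGGCCCAA": "divergent virus"}

    reads = [
        ("read1", "ACGTACGT"),
        ("read2", "TTGACCTA"),
        ("read3", "GGGCCCAA"),
        ("read4", "NNNNNNNN"),
    ]
    assignments, counts = classify_reads(
        reads, toy_nucleotide_db.get, toy_protein_db.get
    )
    for read_id, (taxon, stage) in assignments.items():
        print(f"{read_id}\t{taxon}\t{stage}")
    for taxon, n in counts.most_common():
        print(f"{taxon}: {n}")
```

Passing the two search stages in as callables keeps the sketch independent of any particular aligner; a per-taxon tally like `counts` is the kind of summary a clinical results view could be built on.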

Additionally, SURPI runs on a number of different platforms, including a single multi-core computational server, a desktop, a laptop, or a cloud server. The principal investigators have also created a clinical user interface that displays and summarizes results from the pipeline, so that no bioinformatics expertise is needed to analyze the results.