An ultra-rapid bioinformatics pipeline that uses state-of-the-art computational algorithms to analyze next-generation sequencing (NGS) data in real time for infectious pathogen detection and discovery.
Potential applications include:
- Web-based interface offering clients fee-based access to the pipeline
- Integration into existing bioinformatics suites
- Coupling of SURPI analysis with the development of NGS-based clinical assay kits or platforms
- Integration into existing Laboratory Information Management Systems (LIMS)
- Support for genomics work in a variety of fields, including clinical diagnostics, blood bank screening, purity and safety testing of foods, drugs, and biologics, and environmental metagenomics
Currently available in-depth sequencing analysis tools are time-consuming and cumbersome to use, and they do not present results in an accessible format that can be interpreted for clinical diagnosis. There is a great need to analyze NGS data in real time, particularly in infectious disease, where turnaround time is critical to clinical decision-making. SURPI addresses these challenges and offers the following advantages:
- An ultra-rapid bioinformatics analysis pipeline for microbial identification in clinically actionable timeframes
- Greater sensitivity and accuracy than other pipelines
- Enables researchers, clinicians, and laboratories to analyze NGS data without a local cluster of computational servers or significant bioinformatics expertise
- Can be deployed on a cloud computing platform such as Amazon EC2
- Can also run on a single desktop or laptop computer
- Classifies sequence reads accurately by aligning them simultaneously against the entire GenBank nucleotide (NT) database
- User-friendly output through a purpose-built clinical interface
Scientists at UCSF have developed SURPI, a cloud-compatible pipeline that combines two open-source algorithms, SNAP and RAPSearch, with custom programs and scripts written in C, Perl, Python, and the Linux Bash shell. SURPI identifies sequences and contigs (contiguous sequences assembled from overlapping reads) that correspond to pathogens in datasets of 5 to 300 million reads within minutes.
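The description above does not say how SNAP and RAPSearch are ordered inside SURPI. One plausible way to combine a fast nucleotide aligner with a more sensitive translated protein search is a cascade: every read goes through the nucleotide search first, and only reads that fail to match are passed to the slower protein search. The minimal sketch below illustrates that routing under this assumption. The function `cascade_classify` and the two search callables are hypothetical stand-ins, not SURPI's code or the SNAP/RAPSearch interfaces.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical search function: takes a read sequence and returns a taxon
# label on a hit, or None when nothing matches. In a real pipeline these
# would wrap external tools; here they are plain Python callables.
SearchFn = Callable[[str], Optional[str]]


def cascade_classify(
    reads: Iterable[Tuple[str, str]],
    nucleotide_search: SearchFn,
    protein_search: SearchFn,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Route each read through the nucleotide search first; send only the
    unmatched reads on to the protein search.

    Returns {read_id: (stage, taxon)}, where stage is "nucleotide",
    "protein", or "unclassified".
    """
    results: Dict[str, Tuple[str, Optional[str]]] = {}
    for read_id, seq in reads:
        taxon = nucleotide_search(seq)
        if taxon is not None:
            results[read_id] = ("nucleotide", taxon)
            continue
        # Only reads with no nucleotide-level hit reach this (slower) stage.
        taxon = protein_search(seq)
        if taxon is not None:
            results[read_id] = ("protein", taxon)
        else:
            results[read_id] = ("unclassified", None)
    return results


if __name__ == "__main__":
    # Toy exact-match "indexes" standing in for real reference databases.
    nt_index = {"ACGTACGT": "Homo sapiens"}
    protein_index = {"TTGACCA": "novel virus"}
    reads = [("r1", "ACGTACGT"), ("r2", "TTGACCA"), ("r3", "GGGG")]
    print(cascade_classify(reads, nt_index.get, protein_index.get))
```

The design choice behind a cascade is cost: the cheaper nucleotide search removes most reads early, so the expensive protein search runs on a much smaller set, while divergent sequences that only match at the protein level are still caught.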
SURPI can accurately classify every read by aligning it to the GenBank NT database, a comprehensive reference collection of human, bacterial, fungal, parasitic, animal, insect, and viral sequences deposited to date.
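Classifying reads by alignment generally means choosing, for each read, the reference hit it matches most closely and then tallying reads per organism. The minimal sketch below shows that idea, assuming alignment results have already been parsed into hypothetical `(read_id, taxon, edit_distance)` tuples. The tuple layout, the function names, and the `max_distance` cutoff are illustrative assumptions, not SURPI's output format or parameters. Ties are broken by keeping the first hit seen; a production classifier might instead assign tied reads to a shared higher-level taxon.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical alignment record: (read_id, taxon, edit_distance).
# A lower edit distance means a closer match to the reference sequence.
Hit = Tuple[str, str, int]


def best_hits(hits: Iterable[Hit]) -> Dict[str, Tuple[str, int]]:
    """Keep the lowest-edit-distance hit for each read (first one wins ties)."""
    best: Dict[str, Tuple[str, int]] = {}
    for read_id, taxon, dist in hits:
        if read_id not in best or dist < best[read_id][1]:
            best[read_id] = (taxon, dist)
    return best


def summarize(hits: Iterable[Hit], max_distance: int = 12) -> Counter:
    """Count reads per taxon, discarding reads whose best hit exceeds
    the (hypothetical) max_distance cutoff."""
    counts: Counter = Counter()
    for taxon, dist in best_hits(hits).values():
        if dist <= max_distance:
            counts[taxon] += 1
    return counts


if __name__ == "__main__":
    hits = [
        ("r1", "Homo sapiens", 2),
        ("r1", "Pan troglodytes", 5),
        ("r2", "Influenza A virus", 1),
        ("r3", "Escherichia coli", 20),  # too divergent, dropped
    ]
    print(summarize(hits))
```

A per-taxon count like this is the kind of summary a results interface could display instead of raw alignments.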
SURPI also runs on a range of platforms, including a single multi-core computational server, a desktop or laptop computer, or a cloud server. The principal investigators have also created a clinical user interface that displays and summarizes the pipeline's results, so no bioinformatics expertise is needed to interpret them.