COURSE ON BIOINFORMATICS FOR BIOMEDICAL RESEARCH
Vall d’Hebron Institut de Recerca (VHIR)
Health research institute accredited by the Instituto de Salud Carlos III (ISCIII)

NEXT GENERATION SEQUENCING TECHNOLOGIES AND APPLICATIONS
Rosa Prieto
Head of the High Tech Unit
rosa.prieto@vhir.org
15/05/2014

Index
1 INTRODUCTION TO NGS
2 NGS TECHNOLOGY OVERVIEW
3 NGS APPLICATIONS OVERVIEW
4 WHAT IS NEXT IN SEQUENCING TECHNOLOGIES?

Introduction
Personalized medicine era
-The right therapeutic strategy for the right person at the right time
-Predisposition to disease
-Early and targeted prevention
Biomarker identification:
•Diagnostic
•Susceptibility/risk (prevention)
•Prognostic (indolent vs. aggressive)
•Predictive (response)

Introduction: “omics”
Omics aims at the collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms (Wikipedia).
High-throughput technologies:
•Genomics
•Transcriptomics
•Proteomics
•Epigenomics
•Metagenomics
•Metabolomics
•Lipidomics
http://www.genomicglossaries.com/content/omes.asp

Everything can be sequenced…
Next generation sequencing: the future is here, now?

Introduction to NGS technologies
[Timeline figure: ABI automatic sequencer (1987); 3,234.83 Mb (haploid), $2.7 billion; GS20; 1st, 2nd and 3rd generation sequencing]
Source: http://www.ipc.nxgenomics.org/newsletter/no11.htm

Sequencing technology milestones
First generation sequencing
Second generation sequencing
NGS increases capacity and reduces costs
Moore’s Law: the number of transistors in an integrated circuit doubles roughly every 2 years (1965).

Date      Cost per Mb   Cost per Genome   % cost vs. Sep-01
Sep-01    $5,292.39     $95,263,072       100%
Sep-02    $3,413.80     $61,448,422       64.5039%
Oct-03    $2,230.98     $40,157,554       42.1544%
Oct-04    $1,028.85     $18,519,312       19.4402%
Oct-05    $766.73       $13,801,124       14.4874%
Oct-06    $581.92       $10,474,556       10.9954%
Oct-07    $397.09       $7,147,571        7.5030%
Oct-08    $3.81         $342,502          0.3595%
Oct-09    $0.78         $70,333           0.0738%
Oct-10    $0.32         $29,092           0.0305%
Oct-11    $0.086        $7,743            0.0081%
Oct-12    $0.074        $6,618            0.0069%
Oct-13    $0.057        $5,096            0.0053%
Jan-14    $0.045        $4,008            0.0042%
Source - NHGRI: http://www.genome.gov/sequencingcosts/

Sanger sequencing vs. NGS (2nd and 3rd generation)
Sanger
1. DNA fragmentation
2. Cloning into vectors; bacterial transformation; growth and isolation of vector DNA
3. Cycle sequencing (template sequence, primer, polymerase, dNTPs, labelled ddNTPs), followed by electrophoresis (1 sequence per capillary), e.g. CTATGCTCG
4. Image processing
2nd-generation NGS
1. DNA fragmentation
2. In vitro adaptor ligation and clonal amplification
3. Massively parallel sequencing
4. Image processing and data analysis
3rd-generation NGS
1. DNA fragmentation
2. and 3. In vitro adaptor ligation and massive sequencing WITHOUT amplification
4. Image processing and data analysis

Comparison of different NGS platforms
-Similarities (and differences vs. Sanger):
•library preparation: starting material is short fragments of nucleic acids; adapter ligation; multiplexing (MID tags)
•clonal amplification (not for 3rd generation sequencing)
•massively parallel sequencing
•the use of physical location to identify unique reads is a critical concept for all next generation sequencing systems. The density of the reads and the ability to record them without interfering noise is vital to the throughput of a given instrument.
•signal needs to be processed and post-treated to get the individual sequences
•complex data analysis due to the large amount of data
-Differences:
•Clonal amplification method/sequencing technology/signal detection
•Throughput
•Read length
•Run time
•Cost per base

2nd-generation NGS platforms and benchtop instruments
•Roche: GS Junior 454, GS FLX+ 454
•Illumina: NextSeq500, HiSeq 2500, HiSeq X-Ten (exp. 2014), MiSeq
•Life Technologies: SOLiD 5500xl, Ion Proton, Ion PGM

NGS general workflow
1 Library preparation: DNA fragmentation and in vitro adaptor ligation. Different kinds of libraries (amplicons, shotgun, cDNA…)
2 Clonal amplification: emulsion PCR or bridge PCR
3 Cyclic array sequencing:
•Pyrosequencing (454 sequencing)
•Semiconductor sequencing (Ion Proton/PGM)
•4-colour fluorescent nucleotides (Illumina technology)

Clonal amplification by emPCR (454, Ion)
emPCR-based systems (Roche, SOLiD, Ion)
High-speed shaker
-1 starting effective fragment per microreactor
-~10^6 microreactors per ml
-All processed in parallel (clonal amplification)

Clonal amplification by emPCR (454, Ion)
Clonal amplification??
•No empty beads
•No beads containing more than one amplified fragment
1) Titration of beads vs. starting DNA quantity
2) Optimal enrichment: 5-20% OK
[Enrichment diagram: dsDNA; melt; binding of the biotin-labelled primer to capture beads carrying ssDNA; addition of streptavidin-coated magnetic beads; melt]

Bridge amplification (Illumina)
HiSeq 2500: 2 flow cells, 8 lanes per flow cell
•Binding of single strands to the adaptors
•Cluster generation: bridge PCR
•Double-stranded clonal clusters
•Removal of the reverse strands
•Blocking and addition of the sequencing primer
100-200 million clusters

GS FLX 454 sequencing
Metal-coated PTP reduces crosstalk
29 μm well diameter (20/bead)
3,400,000 wells per PTP

GS FLX 454 sequencing
Pyrosequencing (sequencing by synthesis)
CCD camera
“Flowgram” (signal intensity is proportional to the number of nucleotides incorporated in the sequence)
-throughput limited by the number of wells in the PTP
-errors in homopolymers (454)
-long sequences (up to 1000 bp) are achieved
-low throughput, very expensive reagents

Illumina sequencing
Reversible dye terminator nucleotides (sequencing by synthesis)
•Sequential release of 4 fluorescent nucleotides
•Incorporation
•Image capture
•Removal of the 3’ terminator
-Limited by the fragment length that can “bridge”
-Labelled nucleotides are not incorporated as efficiently as native ones
-Short sequences
-Strand-specific errors, substitutions towards the end of the read, base substitution errors (systematic error GGT>GGG)
-High throughput, expensive machines, cost per Mb OK

Ion Torrent sequencing
ION TORRENT (Life Technologies)
-Fragmentation and adaptor sequences
-Clonal amplification (emPCR on beads)
-Deposition of the beads+DNA into the wells of the chip
1. Sequential release of unmodified nucleotides
2. Incorporation of a nucleotide by the polymerase releases an H+
3. Direct and simultaneous detection of a pH change in all wells.
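The flow-based detection described above (one nucleotide species released per flow, signal proportional to the number of bases incorporated) can be sketched in a few lines of Python. This is only an idealised, noise-free toy model, not the vendors' base-calling software: the function names `simulate_flowgram` and `call_bases` and the flow order "TACG" are assumptions made for illustration, and the input is taken to be the strand being synthesised.

```python
from itertools import cycle


def simulate_flowgram(synth_strand, flow_order="TACG"):
    """Return (nucleotide, signal) pairs for an idealised, noise-free run.

    Hypothetical helper: each flow releases one nucleotide species, and the
    signal equals the number of consecutive bases incorporated in that flow.
    """
    unknown = set(synth_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")

    flows = []
    pos = 0
    for nuc in cycle(flow_order):
        if pos >= len(synth_strand):
            break
        run = 0
        # A homopolymer stretch is incorporated entirely within a single flow.
        while pos < len(synth_strand) and synth_strand[pos] == nuc:
            run += 1
            pos += 1
        flows.append((nuc, run))
    return flows


def call_bases(flows):
    """Turn flow signals back into a sequence by rounding each signal."""
    return "".join(nuc * round(signal) for nuc, signal in flows)


if __name__ == "__main__":
    flows = simulate_flowgram("TTAGG")
    print(flows)
    print(call_bases(flows))
```

Because a homopolymer such as "TT" or "GG" produces a single signal of height 2 rather than two separate events, the base caller has to tell a signal of 6 from a signal of 7 for long runs, which is why flow-based chemistries (454, Ion Torrent) struggle with homopolymers.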
•pH meter, no optical system: rapid output improvement based on chips
•Fast runs (native nucleotides)
•Inexpensive machine and reagents
•Fails at homopolymer detection

NGS data analysis
Pyrosequencing (454 sequencing)

NGS platforms comparison
PLATFORM              ROCHE GS FLX+ 454   ILLUMINA HISEQ 2500          ION PROTON
Clonal amplification  emPCR               Bridge amplification         emPCR
Sequencing chemistry  Pyrosequencing      Reversible dye terminators   pH change
Read length           Up to 1000 bp       From 2x125 bp to 2x300 bp    Up to 200 bp
Run time              22 hrs              7 hrs-6 days                 From 2 to 4 hrs
Throughput/run        Up to 700 Mb        500-1000 Gb (1 Tb)           10 Gb (PI), 100 Gb (PII)
Equipment cost        $500,000            $750,000                     $250,000
Reagents cost/run     $8,000              $5,500                       $1,000
GOOD!
•Roche GS FLX+ 454: longest read length
•Illumina HiSeq 2500: high throughput/low cost per base/ease of use
•Ion Proton: quick, easy to use and cheap
BAD!
•Roche GS FLX+ 454: high error rate in homopolymers (>6); very expensive; low throughput; not automated at all
•Illumina HiSeq 2500: short sequences; strand-specific errors, substitutions towards the end of the read, base substitution errors (systematic error GGT>GGG)
•Ion Proton: library preparation; errors in homopolymers; higher bias than Illumina

NGS High-Throughput Platforms comparison
HiSeq 2500
•Two modes: Rapid Run and High Output
•Single/dual flow cells
•PE 2 x 125 bp
•120 Gb in 27 hours (Rapid)
•1 Tb in 6 days (High)
•20 exomes in a day
•1 human genome in a day
•30 RNAseq samples in 5 hours
HiSeq X Ten (10 HiSeq X)
•Only High Output mode
•Single/dual flow cells
•PE 2 x 150 bp
•600 Gb in a day (dual flow cell)
•1.8 Tb in 3 days (4x faster than HiSeq 2500)
•HiSeq X Ten: 10,000 genomes at 30x per year
Approximate costs:
•Human exome, 30x: approx. 800-1000 €
•Human RNAseq (30 M reads, 100 bp PE, strand specific): approx. 800-1000 €
•Human whole genome, 30x: 4000 €
Source: Nextgenseek.com & Allseq.com. All these costs are indicative as of May 2014 and are in no way binding for the UAT.
Ion Proton
Ion PI chip:
•Up to 20 Gb output (specification: 10 Gb)
•Read length: up to 200 bp
•Run time: 2-4 hrs
•1 human exome (approx. 1000 €)
Ion PII chip:
•Up to 100 Gb output (expected 2014), now reduced to 20-30 Gb at launch
•Run time: 2-4 hrs
•Read length: 100 bp
•Human whole genome (10x, ?)
Ion PIII chip (???):
•200 Gb output per run

NGS Platforms specifications and applications
Illumina
Ion PGM/Ion Proton

NGS Platforms specifications and applications
Roche 454
PacBio RSII (3rd generation)

NGS advantages and limitations
Journal of Investigative Dermatology (2013) 133
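The throughput figures quoted in the platform comparison slides can be related to sequencing depth with a back-of-the-envelope calculation: mean coverage is roughly total sequenced bases divided by genome size. The sketch below is only an approximation under stated assumptions: the 3,234.83 Mb haploid size from the introduction is assumed to refer to the human genome, every sequenced base is assumed to be usable (no duplicates, unmapped reads or uneven coverage), and the function names are hypothetical.

```python
# Haploid genome size quoted in the slides (3,234.83 Mb), assumed here to be
# the human genome.
HAPLOID_GENOME_BP = 3234.83e6


def mean_coverage(total_bases, genome_bp=HAPLOID_GENOME_BP):
    """Approximate mean depth: total sequenced bases / genome size."""
    return total_bases / genome_bp


def genomes_per_run(total_bases, depth=30, genome_bp=HAPLOID_GENOME_BP):
    """Whole genomes that fit in one run at the requested mean depth."""
    return int(mean_coverage(total_bases, genome_bp) // depth)


if __name__ == "__main__":
    # Run outputs taken from the slides (bases per run).
    runs = {
        "HiSeq X, 3-day run (1.8 Tb)": 1.8e12,
        "Ion PII chip, expected (100 Gb)": 100e9,
        "Ion PII chip, at launch (30 Gb)": 30e9,
    }
    for name, bases in runs.items():
        print(f"{name}: ~{mean_coverage(bases):.1f}x, "
              f"{genomes_per_run(bases)} genome(s) at 30x")
```

Under these assumptions a 20-30 Gb Ion PII run yields a single-digit mean depth on a human genome, which is in line with the "10x, ?" question mark on the slide.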