Bioinformatics - From Genomes to Therapies
  Contents  
 
  Volume 1  
  Preface XXV
  List of Contributors XXIX
Part 1 Introduction 1
1 Bioinformatics - From Genomes to Therapies
Thomas Lengauer
1
1 Introduction 1
2 The Molecular Basis of Disease 1
3 The Molecular Approach to Curing Diseases 6
4 Finding Protein Targets 8
4.1 Genomics versus Proteomics 10
4.2 Extent of Information Available on the Genes/Proteins 11
5 Developing Drugs 12
6 Optimizing Therapies 14
7 Organization of the Book 15
References 23
Part 2 Sequencing Genomes 25
2 Bioinformatics Support for Genome-Sequencing Projects
Knut Reinert and Daniel Huson
25
1 Introduction 25
2 Assembly Strategies for Large Genomes 25
2.1 Introduction 25
2.2 Properties of the Data 29
2.2.1 Reads, Mate-pairs and Quality Values 29
2.2.2 Physical Maps 30
2.3 Assembly Strategies 31
3 Algorithmic Problems and their Treatment 33
3.1 Overlap Comparison of all Reads 34
3.2 Contig Phase: Layout of Reads 37
3.3 Error Correction and Resolving Repeats 40
3.4 Layout of Contigs 42
3.5 Computation of the Consensus Sequences 45
4 Examples of Existing Assemblers 47
4.1 The Celera Assembler 47
4.2 The GigAssembler 48
4.3 The ARACHNE Assembler 48
4.4 The JAZZ Assembler 49
4.5 The RePS Assembler 49
4.6 The Barnacle Assembler 49
4.7 The PCAP Assembler 50
4.8 The Phusion Assembler 50
4.9 The Atlas Assembler 51
4.10 Other Assemblers 52
5 Conclusion 52
References 53
Part 3 Sequence Analysis 57
3 Sequence Alignment and Sequence Database Search
Martin Vingron
57
1 Introduction 57
2 Pairwise Sequence Comparison 58
2.1 Dot Plots 58
2.2 Sequence Alignment 60
3 Database Searching I: Single-sequence Heuristic Algorithms 65
4 Alignment and Search Statistics 68
5 Multiple Sequence Alignment 71
6 Multiple Alignments, HMMs and Database Searching II 74
7 Protein Families and Protein Domains 78
8 Conclusions 79
References 79
4 Phylogeny Reconstruction
Ingo Ebersberger, Arndt von Haeseler and Heiko A. Schmidt
83
1 Introduction 83
1.1 Reconstructing a Tree from its Leaves 84
1.2 Phylogenetic Relationships of Taxa and their Characters 85
1.2.1 The Problem of Character Inconsistencies 86
1.2.2 Finding the Appropriate Character Set 87
2 Modeling DNA Sequence Evolution 88
2.1 Nucleotide Substitution Models 90
2.2 Modeling Rate Heterogeneity 90
2.3 Codon Models 91
3 Tracing the Evolutionary Signal 92
3.1 The Parsimony Principle of Evolution 93
3.1.1 Generalized Parsimony 94
3.1.2 Multiple/Parallel Hits 95
3.2 Distance-based Methods 95
3.2.1 UPGMA 95
3.2.2 Neighbors-relation Methods 96
3.2.3 Neighbor-joining Method 97
3.2.4 Least-squares Methods 98
3.3 The Criterion of Likelihood 98
3.4 Calculating the Likelihood of a Tree 99
3.5 Bayesian Statistics in Phylogenetic Analysis 99
3.6 Rooting Trees/Molecular Clock 101
3.6.1 Outgroup Rooting 101
3.6.2 Midpoint Rooting and Molecular Clock 102
4 Finding the Optimal Tree 103
4.1 Exhaustive Search Methods 103
4.2 Heuristic Search Methods 104
4.2.1 Hill Climbing and the Problem of Local Optimization 105
4.2.2 Modeling Tree Quality 108
4.2.3 Heuristics for Large Datasets 108
5 The Advent of Phylogenomics 109
5.1 Multilocus Datasets 109
5.2 Combining Incomplete Multilocus Datasets: Supertrees and their Methods 112
5.2.1 Agreement Supertrees 112
5.2.2 Optimization Supertrees 114
5.2.3 The Supertrees/Consensus versus Total Evidence Debate 115
5.2.4 Medium-level Combination 115
6 Phylogenetic Network Methods 116
6.1 From Trees to Split Networks 116
6.1.1 Split Systems and their Visualization 116
6.1.2 Constructing Split Systems from Trees 118
6.1.3 Constructing Split Systems from Sequence Data 118
6.2 Reconstructing Reticulate Evolution and Further Analyses 119
References 121
5 Finding Protein-coding Genes
David C. Kulp
129
1 Introduction 129
2 Basic DNA Terminology 129
3 Detecting Coding Sequences 131
3.1 Reading Frames 132
3.2 Coding Potential 132
4 Gene Contents 135
5 Gene Signals 137
5.1 Splice Sites 137
5.2 Translation Initiation 140
5.3 Translation and Transcription Termination 140
6 Integrating Gene Features 141
6.1 Combining Local Features 141
6.2 Dynamic Programming 142
6.3 Gene Grammars 143
7 Performance Comparisons 145
8 Using Homology 147
8.1 cDNA Clustering and Alignments 147
8.2 Orthologous DNA 150
8.3 Protein Homology 152
8.4 Integrative Methods 153
9 Pitfalls: Pseudogenes, Splice Variants and the Cruel Biological Reality 153
10 Further Reading 154
References 155
6 Analyzing Regulatory Regions in Genomes
Thomas Werner
159
1 General Features of Regulatory Regions in Eukaryotic Genomes 159
1.1 General Functions of Regulatory Regions 159
1.2 Most Important Elements in Regulatory Regions 160
1.3 TFBSs 160
1.4 Sequence Features 161
1.5 Structural Elements 161
1.6 Organizational Principles of Regulatory Regions 162
1.6.1 Overall Structure of Pol II Promoters 162
1.6.2 TFBS in Promoters 162
1.6.3 Module Properties of the Core Promoter 163
1.7 Bioinformatics Models for the Analysis and Detection of Regulatory Regions 168
1.8 Statistical Models 168
1.8.1 Mixed Models 168
1.8.2 Organizational Models 169
2 Methods for Element Detection 169
2.1 Detection of TFBSs 169
2.2 Detection of Novel TFBS Motifs 171
2.3 Detection of Structural Elements 172
2.4 Assessment of Other Elements 172
3 Analysis of Regulatory Regions 173
3.1 Comparative Sequence Analysis 173
3.2 Training Set Selection 173
3.3 Statistical and Biological Significance 174
3.4 Context Dependency 174
4 Methods for Detection of Regulatory Regions 175
4.1 Scaffold/Matrix Attachment Regions (S/MARs) 176
4.2 Enhancers/Silencers 177
4.3 Promoters 177
4.4 Programs for Recognition of Regulatory Sequences 177
4.4.1 Programs Based on Statistical Models (General Promoter Prediction) 178
4.4.2 Programs Utilizing Mixed Models 179
4.4.3 Programs Based on Specific Promoter Recognition 179
4.4.4 Early Attempts at Promoter Prediction 181
5 Annotation of Large Genomic Sequences 182
5.1 Balance between Sensitivity and Specificity 182
5.2 Genes – Transcripts – Promoters 183
5.3 Sources for Finding Alternative Transcripts and Promoters 185
5.4 Comparative Genomics of Promoters 185
6 Genome-wide Analysis of Transcription Control 186
6.1 Context-specific Transcripts and Pathways 187
6.2 Consequences for Microarray Analysis 187
7 Conclusions 189
References 190
7 Finding Repeats in Genome Sequences
Brian J. Haas and Steven L. Salzberg
197
1 Introduction 197
2 Algorithms and Tools for Mining Repeats 199
2.1 Finding Intra- and Inter-sequence Repeats as Pairwise Alignments 200
2.2 Miropeats (alias Printrepeats) 201
2.3 REPuter 202
2.4 RepeatFinder 206
2.5 RECON 207
2.6 PILER 209
2.7 RepeatScout 212
3 Tandem Repeats 215
3.1 TRF 216
3.2 STRING (Search for Tandem Repeats IN Genomes) 218
3.3 MREPS 219
4 Repeats and Genome Assembly Algorithms 220
4.1 Repeat Management in the Celera Assembler and other Assemblers 221
4.2 Repeat Identification by k-mer Counts 221
4.3 Repeat Identification by Depth of Coverage (Arrival Rates) 222
4.4 Repeat Identification by Conflicting Links 223
4.5 Repeat Placement: Rocks and Stones 223
4.6 Repeat Placement: Surrogates 223
4.7 Repeat Resolution in Euler 224
5 Untangling the Mosaic Nature of Repeats (The A-Bruijn Graph) 225
6 Repeat Annotation in Genomes 227
References 230
8 Analyzing Genome Rearrangements
Guillaume Bourque
235
1 Introduction 235
2 Basic Concepts 236
2.1 Genome Representation 236
2.1.1 Circular, Linear and Multichromosomal Genomes 237
2.1.2 Unsigned Genomes 238
2.1.3 Unequal Gene Content 238
2.1.4 Homology Markers 238
2.2 Types of Genome Rearrangements 239
3 Distance between Two Genomes 240
3.1 Breakpoint Distance 240
3.2 Rearrangement Distance 241
3.2.1 HP Theory 242
3.3 Conservation Distance 244
3.3.1 Common Intervals 244
3.3.2 Conserved Intervals 245
4 Genome Rearrangement Phylogenies 245
4.1 Distance-based Methods 246
4.2 Maximum Parsimony Methods 247
4.3 Maximum Likelihood Methods 248
5 Recent Applications 249
5.1 Rearrangements in Large Genomes 249
5.2 Genome Rearrangements and Cancer 252
6 Conclusion 253
6.1 Challenges 253
6.2 Promising New Approaches 255
References 256
Part 4 Molecular Structure Prediction 261
9 Predicting Simplified Features of Protein Structure
Dariusz Przybylski and Burkhard Rost
261
1 Introduction 261
1.1 Protein Structures are Determined Much Slower than Sequences 261
1.2 Reliable and Comprehensive Computations of 3-D Structures are not yet Possible 261
1.3 Predictions of Simplified Aspects of 3-D Structure are often very Successful 262
2 Secondary Structure Prediction 262
2.1 Assignment of Secondary Structure from 3-D Structure 262
2.1.1 Regular Secondary Structure Formation is Mostly a Local Process 262
2.1.2 Secondary Structures can be Somehow Flexible 263
2.1.3 Automatic Assignments of Secondary Structure 263
2.1.4 Reduction to Three Secondary Structure States 264
2.2 Measuring Performance 265
2.2.1 Performance has Many Aspects Relating to Many Different Measures 265
2.2.2 Per-residue Percentage Accuracy: QK 266
2.2.3 Per-residue Confusion between Regular Elements: BAD 266
2.2.4 Per-segment Prediction Accuracy: SOV 266
2.3 Comparing Different Methods 267
2.3.1 Generic Problems 267
2.3.2 Numbers can often not be Compared between Two Different Publications 267
2.3.3 Appropriate Comparisons of Methods Require Large, “Blind” Data Sets 268
2.4 History 269
2.4.1 First Generation: Single-residue Statistics 269
2.4.2 Second Generation: Segment Statistics 269
2.4.3 Third Generation: Evolutionary Information 269
2.4.4 Recent Improvements of Third-generation Methods 271
2.4.5 Meta-predictors Improve Somehow 272
2.5 State-of-the-art Performance 272
2.5.1 Average Predictions Have Good Quality 272
2.5.2 Prediction Accuracy Varies among Proteins 273
2.5.3 Reliability of Prediction Correlates with Accuracy 273
2.5.4 Understandable Why Certain Proteins Predicted Poorly? 274
2.6 Applications 274
2.6.1 Better Database Searches 274
2.6.2 One-dimensional Predictions Assist in the Prediction of Higher-dimensional Structure 275
2.6.3 Predicted Secondary Structure Helps Annotating Function 275
2.6.4 Secondary Structure-based Classifications in the Context of Genome Analysis 276
2.6.5 Regions Likely to Undergo Structural Change Predicted Successfully 276
2.7 Things to Remember when using Predictions 277
2.7.1 Special Classes of Proteins 277
2.7.2 Better Alignments Yield Better Predictions 277
2.8 Resources 277
2.8.1 Internet Services are Widely Available 277
2.8.2 Interactive Services 277
2.8.3 Servers 278
3 Transmembrane Regions 278
3.1 Transmembrane Proteins are an Extremely Important Class of Proteins 278
3.2 Prediction Methods 279
3.3 Performance 279
3.4 Servers 280
4 Solvent Accessibility 280
4.1 Solvent Accessibility Somehow Distinguishes Structurally Important from Functionally Important 280
4.2 Measuring Solvent Accessibility 280
4.3 Best Methods Combine Evolutionary Information with Machine Learning 281
4.4 Performance 282
4.5 Servers 282
5 Inter-residue Contacts 282
5.1 Two-dimensional Predictions may be a Step Toward 3-D Structures 282
5.2 Measuring Performance 282
5.3 Prediction Methods 283
5.4 Performance and Applications 283
5.5 Servers 283
6 Flexible and Intrinsically Disordered Regions 284
6.1 Local Mobility, Rigidity and Disorder all are Features that Relate to Function 284
6.2 Measuring Flexibility and Disorder 284
6.3 Prediction Methods 284
6.4 Servers 285
7 Protein Domains 285
7.1 Independent Folding Units 285
7.2 Prediction Methods 285
7.3 Servers 286
References 286
10 Homology Modeling in Biology and Medicine
Roland L. Dunbrack, Jr.
297
1 Introduction 297
1.1 The Concept of Homology Modeling 297
1.2 How do Homologous Proteins Arise? 298
1.3 The Purposes of Homology Modeling 299
1.4 The Effect of the Genome Projects 301
2 Input Data 303
3 Methods 307
3.1 Modeling at Different Levels of Complexity 307
3.2 Side-chain Modeling 309
3.2.1 Input Information 309
3.2.2 Rotamers and Rotamer Libraries 311
3.2.3 Side-chain Prediction Methods 312
3.2.4 Available Programs for Side-chain Prediction 317
3.3 Loop Modeling 317
3.3.1 Input Information 317
3.3.2 Loop Conformational Analysis 318
3.3.3 Loop Prediction Methods 320
3.3.4 Available Programs 321
3.4 Methods for Complete Modeling 322
3.4.1 MODELLER 322
3.4.2 MolIDE: A Graphical User Interface for Modeling 323
3.4.3 RAMP and PROTINFO 323
3.4.4 SWISS-MODEL 323
4 Results 324
4.1 Range of Targets 324
4.2 Example: Protein Kinase STK11/LKB1 324
4.3 The Importance of Protein Interactions 331
5 Strengths and Limitations 334
6 Validation 335
6.1 The CASP Meeting 336
6.2 Protein Health 336
References 337
11 Protein Fold Recognition Based on Distant Homologs
Ingolf Sommer
351
1 Introduction 351
2 Overview of Template-based Modeling 352
2.1 Key Steps in Template-based Modeling 352
2.1.1 Identifying Templates 352
2.1.2 Assessing Significance 353
2.1.3 Model Building 353
2.1.4 Evaluation 354
2.2 Template Databases 354
3 Sequence-based Methods for Identifying Templates 356
3.1 Sequence–Sequence Comparison Methods 356
3.2 Frequency Profile Methods 357
3.2.1 Definition of a Frequency Profile and PSSM 357
3.2.2 Generating Frequency Profiles 359
3.2.3 Scoring Frequency Profiles 360
3.2.4 Scoring Profiles Against Sequences 360
3.2.5 Scoring Profiles against Profiles 361
3.3 Hidden Markov Models (HMMs) 363
3.3.1 Definition 363
3.3.2 Profile HMM Technology 364
3.3.3 HMMs in Fold Recognition 365
3.3.4 HMM–HMM Comparisons 365
3.4 Support Vector Machines (SVMs) 365
3.4.1 Definition 365
3.4.2 Various Kernels 366
3.4.3 Experimental Assessment 366
4 Structure-based Methods for Identifying Templates 367
4.1 Boltzmann’s Principle and Knowledge-based Potentials 368
4.2 Threading Using Pair-interaction Potentials 369
4.3 Threading using Frozen Approximation Algorithms 371
5 Hybrid Methods and Recent Developments 372
5.1 Using Different Sources of Information 372
5.1.1 Incorporating Secondary Structure Prediction into Frequency Profiles and HMMs 372
5.1.2 Intrinsically Disordered Regions in Proteins 373
5.1.3 Incorporating 3-D Structure into Frequency Profiles 374
5.2 Combining Information 374
5.3 Meta-servers 375
6 Assessment of Models 376
6.1 Estimating Significance of Sequence Hits 376
6.2 Scoring 3-D Model Quality: Model Quality Assessment Programs (MQAPs) 377
6.3 Evaluation of Protein Structure Prediction: Critical Assessment of Techniques for Protein Structure Prediction 378
7 Programs and Web Resources 379
References 380
12 De Novo Structure Prediction: Methods and Applications
Richard Bonneau
389
1 Introduction 389
1.1 Scope of this Review and Definition of De Novo Structure Prediction 389
1.2 The Role of Structure Prediction in Biology 390
1.3 De novo Structure Prediction in a Genome Annotation Context, Synergy with Other Methods 391
2 Core Features of Current Methods of De Novo Structure Prediction 393
2.1 Rosetta De Novo 393
2.2 Evaluation of Structure Predictions 396
2.3 Domain Prediction is Key 399
2.4 Local Structure Prediction and Reduced Complexity Models are Central to Current De Novo Methods 403
2.5 Clustering as a Heuristic Approach to Approximating Entropic Determinants of Protein Folding 405
2.6 Balancing Resolution with Sampling, Prospects for Improved Accuracy and Atomic Detail 406
3 Applying Structure Prediction: De Novo Structure Prediction in a Systems Biology Context 408
3.1 Structure Prediction as a Road to Function 408
3.2 Initial Application of De Novo Structure Prediction 409
3.3 Application on Genome-wide Scale and Examples of Data Integration 410
3.4 Scaling-up De Novo Structure Prediction: Rosetta on the World Community Grid 412
4 Future Directions 412
4.1 Structure Prediction and Systems Biology: Data Integration 412
4.2 Need for Improved Accuracy and Extending the Reach of De Novo Methods 413
References 413
13 Structural Genomics
Philip E. Bourne and Adam Godzik
419
1 Overview 419
1.1 What is Structural Genomics? 419
1.2 What are the Motivators? 419
1.2.1 Fold Coverage as a Motivator 420
1.2.2 Structural Coverage of an Organism as a Motivator 424
1.2.3 Structure Coverage of Central Metabolism Pathways as a Motivator 424
1.2.4 Disease as a Motivator 425
1.3 How Does Structural Genomics Relate to Conventional Structural Biology? 425
2 Methodology 427
2.1 Target Selection 427
2.2 Crystallomics 428
2.3 Data Collection 429
2.4 Structure Solution 430
2.5 Structure Refinement 431
2.6 PDB Deposition 431
2.7 Functional Annotation 432
2.7.1 Biological Multimeric State 432
2.7.2 Active-site Determination 432
2.8 Publishing 433
3 Results – Number and Characteristics of Structures Determined 434
4 Discussion 435
4.1 Follow-up Studies 435
4.2 Examples of Functional Discoveries 436
5 The Future 436
References 436
14 RNA Secondary Structures
Ivo L. Hofacker and Peter F. Stadler
439
1 Secondary Structure Graphs 439
1.1 Introduction 439
1.2 Secondary Structure Graphs 440
1.3 Mountain Plots and Dot Plots 443
1.4 Trees and Forests 443
1.5 Notes 444
2 Loop-based Energy Model 444
2.1 Loop Decomposition 444
2.2 Energy Parameters 445
2.3 Notes 447
3 The Problem of RNA Folding 447
3.1 Counting Structures and Maximizing Base Pairs 447
3.2 Backtracing 449
3.3 Energy Minimization in the Loop-based Energy Model 450
3.4 RNA Hybridization 453
3.5 Pseudoknotted Structures 454
3.6 Notes 454
4 Conserved Structures, Consensus Structures and RNA Gene Finding 456
4.1 The Phylogenetic Method 456
4.2 Conserved Structures 457
4.3 Consensus Structures 459
4.4 RNA Gene Finding 460
4.5 Notes 463
5 Grammars for RNA Structures 463
5.1 Context-free Grammars (CFGs) and RNA Secondary Structures 463
5.2 Cocke–Younger–Kasami (CYK) Algorithm 465
5.3 Inside and Outside Algorithms 465
5.4 Parameter Estimation 466
5.5 Algebraic Dynamic Programming 466
5.6 Notes 467
6 Comparison of Secondary Structures 468
6.1 String-based Alignments 469
6.2 Tree Editing 469
6.3 Tree Alignments 472
6.4 The Sankoff Algorithm and Variants 475
6.5 Multiple Alignments 475
6.6 Notes 476
7 Kinetic Folding 476
7.1 Folding Energy Landscapes 476
7.2 Kinetic Folding Algorithms 477
7.3 Approximate Folding Trajectories and Barrier Trees 478
7.4 RNA Switches 480
7.5 Notes 481
8 Concluding Remarks 481
References 482
15 RNA Tertiary Structure Prediction
François Major and Philippe Thibault
491
1 Introduction 491
2 Annotation 493
2.1 Nucleotide Conformations 494
2.2 Nucleotide Interactions 501
2.2.1 Base Stacking 502
2.2.2 Base Pairing 505
2.2.3 Isosteric Base Pairs 508
3 Motif Discovery 508
3.1 RNA Motifs 509
3.1.1 Classical Examples 509
3.2 Catalytic Motifs 513
3.3 Transport and Localization 519
4 Modeling 521
4.1 The CSP 522
4.2 MC-Sym 524
4.2.1 Backbone Optimization 527
4.2.2 Probabilistic Backtracking 529
4.2.3 “Divide and Conquer” 529
4.3 MC-Sym at Work 530
4.3.1 Modeling a Yeast tRNA-Phe Stem–Loop 532
4.3.2 Modeling a Pseudoknot 533
4.3.3 Cycles of Interactions 535
5 Perspectives 535
References 536
  Volume 2  
Part 5 Analysis of Molecular Interactions 541
16 Docking and Scoring for Structure-based Drug Design
Matthias Rarey, Jörg Degen and Ingo Reulecke
541
1 Introduction 541
1.1 A Taxonomy of Docking Problems 543
1.2 Application Scenarios in Structure-based Molecular Design 544
2 Scoring Protein–Ligand Complexes 546
2.1 Modeling Protein–Ligand Interactions 546
2.2 Scoring Functions based on Force Fields 548
2.3 Empirical Scoring 550
2.4 Knowledge-based Scoring 551
2.5 Evaluation 551
3 Methods for Protein–Ligand Docking 552
3.1 Rigid-body Docking Algorithms 552
3.1.1 Approaches based on Clique Search 553
3.1.2 Geometric Hashing 554
3.1.3 Pose Clustering 555
3.1.4 Fast Shape Comparison 557
3.2 Flexible Ligand-docking Algorithms 558
3.2.1 Conformation Ensembles 558
3.2.2 Flexible Docking based on Fragmentation 559
3.2.2.1 “Place & Join” Algorithms 559
3.2.2.2 Incremental Construction Algorithms 560
3.2.3 Genetic Algorithms and Evolutionary Programming 563
3.2.4 Distance Geometry 565
3.2.5 Random Search 565
3.3 Docking by Simulation 566
3.3.1 Simulated Annealing 566
3.3.2 MD Simulations 567
3.3.3 MC Algorithms 568
3.3.4 Hybrid Methods 570
4 Structure-based Virtual Screening 570
4.1 Considering Pharmacophoric Constraints 571
4.2 Docking of Combinatorial Libraries 571
4.3 Database Approaches 573
5 From Molecules to Fragment Spaces: Structure-based De Novo Design 574
5.1 Modeling Fragment Spaces 575
5.2 De Novo Design Algorithms 575
5.2.1 Rigid-body Algorithms 576
5.2.2 Simulation Methods 576
5.2.3 “Place & Join” Algorithms 577
5.2.4 Sequential Growth Algorithms 578
5.2.5 Genetic Algorithms and Evolutionary Programming 579
5.3 Synthetic Accessibility 580
5.3.1 Fragment Selection 580
5.3.2 Virtual Synthesis 581
5.3.3 Compound Analysis 581
6 Structure-based Drug Design at Work: Validation Studies and Applications 582
7 Concluding Remarks 583
References 584
17 Modeling Protein–Protein and Protein–DNA Docking
Andreas Hildebrandt, Oliver Kohlbacher and Hans-Peter Lenhof
601
1 Introduction 601
2 Protein–Protein Interactions 603
2.1 Basic Concepts of Docking 603
2.2 Rigid Body Docking 606
2.2.1 Correlation Techniques 606
2.2.2 Graph-based Structure Generation Methods 610
2.2.3 Slice Decomposition and Polygon Descriptors 612
2.2.4 Critical Surface Points and Geometric Hashing 614
2.2.5 Other Approaches 615
2.3 Realizing Protein Flexibility 615
2.3.1 Side Chain Placement 617
2.3.1.1 Dead End Elimination 618
2.3.1.2 “Branch & Bound” and the A* Algorithm 619
2.3.1.3 Integer Linear Programming 621
2.3.2 Hinge-bending 624
2.3.3 Biased Probability Monte Carlo (BPMC) Conformational Search 626
2.4 Scoring Functions 627
2.4.1 Empirical Potentials 628
2.4.2 Knowledge-based Potentials 630
2.5 Data-driven Docking 632
2.5.1 Experimental Techniques 632
2.5.2 Algorithmic Approaches 633
2.6 Assessment of Docking Predictions 634
3 Protein–DNA Interactions 638
3.1 Peculiarities of Protein–DNA Binding 638
3.2 Algorithmic Techniques 639
3.2.1 Correlation Techniques 639
3.2.2 Monte Carlo Techniques 640
3.3 Scoring Functions 641
4 Conclusion 642
References 644
18 Lead Identification by Virtual Screening
Andreas Kämper, Didier Rognan and Thomas Lengauer
651
1 Introduction 651
1.1 Screening Techniques 652
1.2 Drug Discovery Process 653
1.3 Compound Collections 654
2 Filtering and Preparation of Ligands 655
2.1 Library Preprocessing 656
2.2 Bioavailability 658
2.3 Drug-likeness 659
2.4 Molecular Diversity 660
3 Ligand-based VS 662
3.1 Descriptor-based Similarity Measures 664
3.2 Bit String Descriptors 665
3.3 Feature Trees 666
3.4 Molecular Superimposition Approaches 667
3.5 Pharmacophore Searches 669
3.6 QSARs 670
3.7 Other Techniques 672
4 Postprocessing of Hitlists 672
4.1 Data Mining 673
4.2 Analysis of the Protein–Ligand Interface 674
4.3 Consensus Techniques 675
4.4 Visualization 676
5 Critical Evaluation of Structure-based VS 677
5.1 Influence of Parameter Settings 677
5.1.1 Which Library? 677
5.1.2 Which Ligand Conformation(s)? 678
5.1.3 Which Protein Coordinates? 678
5.1.4 Which Docking Tool? 678
5.1.5 Which Scoring Function? 679
5.1.6 Which Postprocessing? 680
5.2 Recent Success Stories 681
5.2.1 Some Privileged Targets 681
5.2.2 First-in-class Compounds 684
5.2.3 Fragment Screening 685
5.2.4 Lead Optimization 686
5.2.5 Homology Models as VS Targets 686
5.3 Concluding Remarks 687
6 Critical Evaluation of Ligand-based VS 687
6.1 Influence of Parameter Settings 687
6.2 Recent Success Stories 688
6.3 Comparison of Structure- and Ligand-based Techniques 691
6.4 Concluding Remarks 692
References 693
19 Efficient Strategies for Lead Optimization by Simultaneously Addressing Affinity, Selectivity and Pharmacokinetic Parameters
Karl-Heinz Baringhaus and Hans Matter
705
1 Introduction 705
2 The Origin of Lead Structures 708
3 Optimization for Affinity and Selectivity 711
3.1 Lead Optimization as a Challenge in Drug Discovery 711
3.2 Use and Limitation of Structure-based Design Approaches 712
3.3 Integration of Ligand- and Structure-based Design Concepts 713
3.4 The Selectivity Challenge from the Ligand’s Perspective 716
3.5 Selectivity Approaches Considering Binding Site Topologies 717
4 Addressing Pharmacokinetic Problems 721
4.1 Prediction of Physicochemical Properties 721
4.2 Prediction of ADME Properties 722
4.3 Prediction of Toxicity 724
4.4 Physicochemical and ADMET Property-based Design 724
5 ADME/Antitarget Models for Lead Optimization 724
5.1 Global ADME Models for Intestinal Absorption and Protein Binding 724
5.2 Selected Examples to Address ADME/Toxicology Antitargets 728
6 Integrated Approach 732
6.1 Strategy and Risk Assessment 732
6.2 Integration 734
6.3 Literature and Aventis Examples on Aspects of Multidimensional Optimization 735
7 Conclusions 742
References 743
Part 6 Molecular Networks 755
20 Modeling and Simulating Metabolic Networks
Stefan Schuster and David Fell
755
1 Introduction 755
2 Fundamentals 756
2.1 Motivation 756
2.2 Stoichiometry 757
2.3 Balance Equations 759
2.4 Enzyme Kinetics 760
3 Network Analysis 762
3.1 Conservation Relations 762
3.2 Stationary States and Stability Analysis 764
3.3 Constraints on Steady-state Fluxes 766
3.4 Defining Component Pathways of a Network 769
3.5 Examples of Elementary-modes Analysis 771
3.6 Extreme Pathways 777
3.7 Optimization of Molar Yields and Flux Balance Analysis (FBA) 778
3.8 Analyzing the Robustness of Metabolism 781
4 Dynamic Simulation 782
4.1 How is a Dynamic Model Constructed? 782
4.2 Metabolic Databases 788
4.3 Example: Red Blood Cell Metabolism 790
4.4 Oscillations 792
4.5 Whole-cell Modeling 794
5 Conclusions 797
References 798
21 Inferring Gene Regulatory Networks
Michael Q. Zhang
807
1 Introduction 807
2 Gene Regulation at the Transcriptional Level 808
2.1 Finding TFBSs and Motifs 809
2.2 Identifying Target Genes 809
2.3 Discovering Novel Motifs and Target Genes 810
2.4 Inferring GRN Modules and Integrating Diverse Types of Data 812
3 Gene Regulation at the Level of RNA Processing 813
3.1 Identification of Splicing Enhancers and Silencers 814
3.2 Splicing Microarrays 814
4 Gene Regulation at the Translational Level 815
5 Gene Regulation by Small ncRNAs 816
6 GRNs in Development and Evolution 817
References 819
22 Modeling Cell Signaling Networks
Anthony Hasseldine, Azi Lipshtat, Ravi Iyengar and Avi Ma’ayan
829
1 Introduction 829
1.1 Components and Cascades 829
1.2 From Pathways to Networks 832
1.2.1 Interactions between Signaling Pathways 832
1.2.2 Implications of Network Topology 834
1.2.3 Network Motifs 835
2 Types of Models and the Information they can Yield 839
2.1 Boolean Networks and Bayesian Networks Modeling Approaches 839
2.2 Quantitative Dynamics Modeling 841
2.2.1 Deterministic Models 843
2.2.2 Stochastic Models 846
2.2.3 Hybrid Models 849
3 Identifying Parameters/Data Sets for Modeling 850
3.1 Functionally Relevant Connections 850
3.2 Qualitative Relationships 850
3.3 Quantitative Specifications 851
4 Model Validation 853
4.1 Parameter Variation and Sensitivity Analysis 853
4.2 Constraints and Predictions 854
5 Perspective 855
References 858
23 Dynamics of Virus–Host Cell Interaction
Udo Reichl and Yury Sidorenko
861
1 Introduction 861
2 Viral Infection of Cells 863
2.1 Viral Infection of Prokaryotic Cells 864
2.2 Viral Infection of Eukaryotic Cells 866
3 Mathematical Models of Virus Dynamics 868
3.1 Unstructured Models of Virus Dynamics 869
3.2 Structured Models of Virus Dynamics 871
4 Influenza Virus as an Example for Virus–Host Cell Interaction 872
4.1 The Influenza A Virus Life Cycle 873
4.2 Mathematical Model of the Influenza A Virus Life Cycle 877
4.3 Influenza A Virus Growth Dynamics 880
4.4 Discussion and Outlook 886
5 Conclusions 887
References 892
Part 7 Analysis of Expression Data 899
24 DNA Microarray Technology and Applications – An Overview
John Quackenbush
899
1 Introduction to DNA Microarrays 899
2 Microarrays and Clinical Applications 899
3 Microarray Data Collection, Transformation and Representation 902
4 Identifying Patterns of Expression 905
5 Class Discovery 906
5.1 Hierarchical Clustering 906
5.2 k-means Clustering 907
5.3 Other Unsupervised Approaches 911
6 Classification 911
6.1 kNN Classification 912
7 Validation 914
8 Sample Selection and Classification 915
9 Limitations and Success of Classification 915
10 Data Reporting and Comparisons 916
11 Meta-analysis 919
12 The Path Forward 922
References 923
25 Low-level Analysis of Microarray Experiments
Wolfgang Huber, Anja von Heydebreck and Martin Vingron
929
1 Introduction 929
1.1 Microarray Technology 929
1.2 Prerequisites 930
1.3 Preprocessing 931
2 Visualization and Exploration of the Raw Data 932
2.1 Image Analysis 932
2.2 Dynamic Range and Spatial Effects 933
2.3 Scatterplot 934
2.4 Batch Effects 938
2.5 Along Chromosome Plots 941
2.6 Sensitivity and Specificity of Probes 942
3 Error Models 943
3.1 Motivation 943
3.1.1 Obtaining Optimal Estimates 943
3.1.2 Biological Inference 944
3.1.3 Quality Control 944
3.2 The Additive–Multiplicative Error Model 944
3.2.1 Induction from Data 944
3.2.2 A Theoretical Deduction 946
4 Normalization 947
5 Detection of Differentially Expressed Genes 949
5.1 Stepwise versus Integrated Approaches 949
5.2 Measures of Differential Expression: The Variance Bias Trade-off 950
5.3 Identifying Differentially Expressed Genes from Replicated Measurements 951
6 Software 953
References 954
26 Classification of Patients
Claudio Lottaz, Dennis Kostka and Rainer Spang
957
1 Introduction 957
2 Molecular Diagnosis 958
2.1 Problem Statement 958
2.1.1 Notation 959
2.1.2 Loss and Risk 960
2.1.3 Bayes Classifier and Bayes Error 960
2.1.4 Minimal Empirical Risk and Maximum Likelihood 961
2.1.5 Regularized Risk and Priors 961
2.2 Supervised Classification 963
2.2.1 Discriminant Analysis and Feature Selection 964
2.2.2 Penalized Logistic Regression 965
2.2.3 Support Vector Classification 966
2.2.4 Bagging 967
2.2.5 Boosting 968
2.3 Gene Selection 968
2.3.1 Filter Approaches 969
2.3.2 Wrapper Approaches 969
2.4 Adaptive Model Selection and Validation 970
2.4.1 Adaptive Model Selection 970
2.4.1.1 Bias-variance Trade-off 970
2.4.1.2 Choosing a Trade-off via the Hold Out 971
2.4.1.3 Using Data More Efficiently via Cross-Validation 972
2.4.2 Validation of the Predictive Performance of a Molecular Signature 972
2.4.2.1 Estimating Error Rates 973
2.4.2.2 Selection Bias and Nested Loop Cross-validation 974
2.5 Discussion 975
3 Finding Molecular Disease Entities 975
3.1 Clustering 976
3.1.1 Clustering Algorithms 976
3.1.2 The Problem of Distances 977
3.2 Searching for Partitionings 978
3.2.1 Overlapping Partitionings 978
3.2.2 Search and Find 978
3.2.3 ISIS – Identifying Splits with Clear Separation 978
3.2.4 Overabundance of Differential Genes 980
3.2.5 Best-fitting Gaussian Model 980
3.3 Biclustering 980
3.4 Semisupervised Methods 981
3.4.1 Molecular Symptoms 981
3.4.2 Survival-driven Class-finding 981
3.4.3 Towards Survival Prediction 983
3.5 Validating Unsupervised Analysis 983
3.5.1 Statistical Significance 983
3.5.2 Stability 983
3.5.3 Detect Consensus by Subsampling 984
3.5.4 Adding Simulated Noise 984
3.5.5 Over-represented Pathways 984
4 Conclusions 985
References 986
27 Classification of Genes
Jörg Rahnenführer and Thomas Lengauer
993
1 Introduction 993
2 Overview of Gene Classification Tasks 994
2.1 Grouping Genes without Additional Information 995
2.2 Functional Predictions 995
3 Grouping Genes on the Basis of Expression Data 996
3.1 Cluster Analysis 996
3.1.1 Similarity Measures 996
3.1.2 Hierarchical Clustering Algorithms 997
3.1.3 Partitioning Clustering Algorithms 999
3.1.4 Model-based Clustering 1001
3.1.5 Biclustering Algorithms 1002
3.2 Heuristic Gene Grouping of Expression Data 1003
3.2.1 CLICK Algorithm 1003
3.2.2 CAST 1004
3.2.3 Gene Shaving 1004
4 Predicting Gene Function from Expression Data 1005
4.1 Classification Methods 1006
4.1.1 Support Vector Machines (SVMs) 1006
4.1.2 Rule-based Models 1006
4.2 Supplementing Expression Data with Additional Biological Information 1007
4.2.1 Adding Sequence Data 1009
4.2.2 Adding Gene Ontology Data 1009
4.2.3 Integrating Pathway Information 1011
4.2.4 Combination of Multiple Data Types 1012
5 Evaluation 1014
5.1 Assessing the Biological Relevance of Gene Groups 1015
5.1.1 Validation of Clustering Results 1015
5.1.2 Estimating the Number of Clusters in a Data Set 1016
5.2 Assessing Function Prediction Accuracy 1017
6 Conclusions 1017
References 1018
28 Proteomics: Beyond cDNA
Patricia M. Palagi, Yannick Brunner, Jean-Charles Sanchez and Ron D. Appel
1023
1 Introduction and Principles 1023
2 Proteomics Analytical Methods 1026
2.1 Electrophoresis Gels 1026
2.2 LC 1028
2.3 MS 1030
2.4 Protein Chips 1033
3 Computer Analysis of Proteomics Images 1034
3.1 Analysis of 2-DE Gels 1034
3.1.1 Data Analysis and Validation 1035
3.1.2 Annotation and Databases 1038
3.2 Analysis of LC-MS Images 1038
4 Identification and Characterization of Proteins after Separation 1039
4.1 Identification with MS 1041
4.2 Characterization with MS 1046
5 Proteome Databases 1047
5.1 Protein Sequence Databases 1048
5.2 2-DE Gel Databases 1049
5.3 Mass Spectra Repositories 1051
5.4 PTM Databases 1051
5.5 General Considerations on Databases 1053
6 Conclusion 1053
References 1054
  Volume 3  
Part 8 Protein Function Prediction 1061
29 Ontologies for Molecular Biology
Chris Wroe and Robert Stevens
1061
1 Introduction 1061
2 Ontologies and their Components 1063
2.1 Ontology Representation 1065
3 Ontologies in the Real World 1067
3.1 Ontology Tools 1068
3.2 Bio-ontology Communities 1069
3.3 Incremental Development of Ontologies 1071
3.4 Ontology Features to Manage Database Content 1072
3.4.1 A Controlled Vocabulary with Human Readable Definitions 1072
3.4.1.1 Gene Ontology 1072
3.4.1.2 MGED Ontology 1073
3.4.2 A Structured Controlled Vocabulary 1075
3.4.3 A Subsumption Hierarchy 1075
3.4.4 Multiple Hierarchies 1076
3.4.5 Formal Definition of Concepts 1077
3.5 Ontology Features to Manage Data Schemata 1080
3.5.1 TAMBIS 1081
3.6 Ontologies for Prediction and Simulation 1082
3.6.1 EcoCyc 1082
3.7 The Physiome Project 1083
4 Summary 1083
References 1085
30 Inferring Protein Function from Sequence
Douglas Lee Brutlag
1087
1 Introduction 1087
2 Sequence-based Motif Representations 1090
2.1 Consensus Sequences as Regular Expressions 1090
2.2 Accuracy and Precision of Motifs 1091
2.3 Position-specific Scoring Matrix (PSSM) Motifs 1094
2.4 Dirichlet-mixture Prior Probabilities and Pseudocounts 1094
2.5 Sensitivity and Specificity of PSSM Motifs 1096
2.6 HMMs 1098
2.7 Network Models 1099
2.8 Neural Networks 1101
3 Descriptions of Several Useful Motif Databases 1101
3.1 The Prosite Database 1101
3.2 The Blocks Databases 1104
3.3 The PRINTS Database 1105
3.4 The eBLOCKs Database 1106
3.5 The eMOTIF Database 1107
3.6 The eMATRIX Database 1108
3.7 HMM Databases 1109
3.8 The InterPro Database 1110
3.9 Supervised versus Unsupervised Learning of Motifs 1111
4 Summary and Conclusions 1112
References 1113
31 Analyzing Protein Interaction Networks
Johannes Goll and Peter Uetz
1121
1 Introduction 1121
2 Experimental Methods and Interaction Data 1122
3 Validation of Experimental Protein–Protein Interaction Data 1125
3.1 Crystal Structures as Benchmarks 1126
3.2 Overlap with Protein Complex Data 1126
3.3 Correlation with Expression Data 1126
3.4 Functional Annotation 1127
3.5 Localization 1127
3.6 Paralogous Proteins and Evolutionary Rate 1127
3.7 Other Approaches 1128
3.8 Combined Approaches 1128
3.9 Comparison of Specific Data Sets 1129
3.9.1 Comparison of Tandem Affinity Purification (TAP) and High-throughput MS (HMS) Complex Purification Data 1129
3.9.2 Comparison between Y2H and MS Data Sets 1131
3.9.3 Comparison of Spoke versus Matrix Models 1131
4 Predicting Protein–Protein Interactions 1133
4.1 Predictions Based on Genomic Context 1138
4.1.1 The Rosetta Stone Method 1138
4.1.2 Gene Neighborhood 1139
4.1.3 Phylogenetic Profiles 1139
4.1.4 Similarity of Phylogenetic Trees (SPT) 1140
4.1.5 In Silico Two-hybrid (I2H) 1140
4.2 Predictions Based on Known 3-D Structures 1140
4.3 Predicting Interaction Domains 1140
4.4 Predicting Homologous Interactions: Interologs 1141
4.5 Predictions Based on Literature Mining 1143
4.6 Validation of Predicted Protein–Protein Interactions 1144
5 Representing Protein–Protein Interactions as Graphs 1145
5.1 Graph Terminology 1145
5.2 Network Models 1148
5.3 Random Networks 1149
5.4 Small-world Networks 1149
5.5 Scale-free Networks 1150
5.6 Connectivity Distributions of Protein–Protein Interaction Networks 1151
5.7 Error Tolerance and Attack Vulnerability 1151
5.8 Modules and Motifs in Networks 1152
5.9 Comparing Protein Interaction Networks: Pathblast 1152
6 Integrating Multiple Protein–Protein Interaction Evidence 1154
6.1 Protein Interactions and Gene Expression Data 1157
6.2 Integration for Predicting Protein Function 1157
7 Predicting Protein Functions from Protein Networks 1157
8 Evolution of Protein–Protein Interactions 1158
8.1 The Network Level 1159
8.1.1 The Rates of Interaction Loss and Gain 1159
8.2 Sequence and Interaction Divergence in Proteins 1160
8.2.1 Protein Evolution Rate and Protein–Protein Interactions 1161
8.2.2 Phylogenetic Relationships between Families of Interacting Proteins 1162
8.3 Structural Aspects of Conserved Interactions 1165
9 Databases and Other Information Sources 1165
10 Analysis and Visualization Tools 1166
11 Outlook/Perspectives 1166
References 1171
32 Inferring Protein Function from Genomic Context
Christian von Mering
1179
1 Introduction 1179
1.1 Genomic Context – Genomes, Genes and Gene Arrangements 1179
1.2 Genome Comparisons Reveal Protein–Protein Associations 1180
1.3 Prerequisites for Genomic Context Analysis 1181
1.4 How Specific are the Inferred Functions? 1182
2 Gene Neighborhood 1183
2.1 Conserved Neighborhood versus Simple Synteny 1183
2.2 Operons and “Über-Operons” 1185
2.3 Divergently Transcribed Gene Pairs 1187
2.4 Gene Neighborhood in Eukaryotes 1189
3 Gene Fusion 1190
3.1 Gene Fusions and Gene Fissions 1190
3.2 Functional Implications 1191
3.3 Gene Fusions versus Domain Analysis 1193
4 Gene Co-occurrence 1194
4.1 Phylogenetic Profiles 1194
4.2 Discrete versus Continuous Profiles 1196
4.3 Profile Distance Measures 1197
4.4 Tree-based Methods 1198
4.5 Anti-correlated Profiles 1199
5 Outlook 1200
5.1 Methods based on Sequence Evolution 1200
5.2 Web-based Implementations of Genomic Context Tools 1202
5.3 Scoring and Integration 1204
5.4 Genome Sequencing Strategies: Impact on Genomic Context Analysis 1205
5.5 Environmental Context 1207
References 1208
33 Inferring Protein Function from Protein Structure
Francisco S. Domingues and Thomas Lengauer
1211
1 Introduction 1211
1.1 Different Levels of Protein Function 1212
1.2 Structural Models 1212
1.3 Homology and Function 1213
1.4 Structure and Function 1214
1.5 Why Predict Function from Structure 1216
1.6 The Challenges of Automatic Prediction of Function from Structure 1217
1.7 Structure of the Chapter 1217
2 Localization of Functional Sites 1218
2.1 Supersites 1218
2.2 Electrostatics 1218
2.3 Surface Geometry 1218
2.4 Structure and Evolutionary Information 1219
2.4.1 Evolutionary Trace (ET) 1219
2.4.2 ConSurf 1220
2.4.3 Residue Conservation and Structural Information 1221
2.5 Network Centrality 1222
2.6 Combined Approaches 1223
2.6.1 Catalytic Sites in Enzymes 1223
2.6.2 Protein–protein Interactions 1223
3 Characterization of Molecular Function 1224
3.1 General Principles 1224
3.1.1 Homology versus Nonhomology 1224
3.1.2 Uncertainty and Flexibility in the Structural Models 1225
3.1.3 Functional Descriptors, Comparison and Scoring 1226
3.2 Descriptors based on Atom Coordinates 1227
3.2.1 ASSAM 1227
3.2.2 SPASM 1228
3.2.3 PINTS 1228
3.2.4 SuMo 1229
3.2.5 TESS and Jess 1230
3.3 Descriptors based on Chemical Environment and Surface 1232
3.3.1 FEATURE 1232
3.3.2 CavBase and SiteEngine 1233
3.3.3 eF-site 1234
3.3.4 pvSOAR 1234
3.3.5 Enzyme Classifier 1236
3.3.6 3D Shape Descriptors 1236
3.4 Databases of Functional Sites 1237
3.4.1 Relibase 1237
3.4.2 MSDsite 1238
3.4.3 CSA 1238
3.4.4 SURFACE 1238
3.4.5 Databases of Structural Motifs 1239
3.4.6 Protein–protein Binding Sites 1239
4 Integration Efforts 1239
5 Resources for Structural Characterization 1241
5.1 Available Tools and Databases 1241
5.2 Characterizing a Protein 1242
6 Current Applications 1243
7 Future Perspectives 1244
References 1245
34 Mining Information on Protein Function from Text
Martin Krallinger and Alfonso Valencia
1253
1 Introduction 1253
2 Information Types of Protein Function Descriptions 1255
3 Literature Databases in Biomedicine 1256
4 NLP 1258
4.1 Grammatical Features 1258
4.2 Morphological Features 1259
4.3 Syntactic Features 1259
4.4 Semantic Features 1260
4.5 Contextual Features 1261
5 Main NLP Tasks 1261
5.1 IR 1261
5.2 IE 1265
5.3 QA 1266
5.4 NLG 1268
6 Difficulties when Processing Biological Texts 1268
7 Strategies of Extracting Functional Information from Text 1271
7.1 NER and Protein Tagging 1271
7.2 Associating Proteins with Biological Features from Databases and Ontologies 1274
7.3 Mining Interactions and Relations from Text 1278
7.4 Discovering Information Associated with Groups of Proteins 1281
7.5 Other Applications 1282
8 Evaluation of Text Mining Strategies 1283
9 Resources for Text Mining 1285
9.1 Literature Databases 1286
9.2 Annotated Text Corpora 1286
9.3 Generic NLP Tools 1286
9.4 Dictionaries and Ontologies 1288
9.5 Biomedical Domain NLP Systems 1289
10 Concluding Remarks 1289
References 1291
35 Integrating Information for Protein Function Prediction
William Stafford Noble and Asa Ben-Hur
1297
1 Introduction 1297
2 Vector-space Integration 1298
3 Classifier Integration 1301
4 Kernel Methods 1302
5 Learning Functional Relationships 1304
6 Learning Function from Networks of Pairwise Relationships 1307
7 Discussion 1311
References 1311
36 The Molecular Basis of Predicting Druggability
Bissan Al-Lazikani, Anna Gaulton, Gaia Paolini, Jerry Lanfear, John Overington and Andrew Hopkins
1315
1 Introduction 1315
2 Chemical Properties of Drugs, Leads and Tools 1316
3 Molecular Recognition is the Basis for Druggability 1316
4 Estimating the Size of the Druggable Genome 1319
4.1 Initial Estimates 1320
4.2 Hopkins and Groom’s Method 1320
4.3 Orth and Coworkers Update 2004
4.4 Russ and Lampel’s Update 2005
5 Homology-based Analysis of Drug Targets 1322
6 Feature-based Druggability Prediction 1327
7 Structure-based Druggability Analysis of Protein Data Base (PDB) Structures 1327
8 How Many Drug Targets are Accessible to Protein Therapeutics? 1329
9 Conclusions 1331
References 1333
Part 9 Comparative Genomics and Evolution of Genomes 1335
37 Comparative Genomics
Martin S. Taylor and Richard R. Copley
1335
1 Introduction 1335
2 The Genomic Landscape 1336
3 Concepts 1339
4 Practicalities 1343
4.1 Available Genomic Sequences 1343
4.2 Defining and Obtaining Genomic Sequences 1345
5 Technology 1347
5.1 Alignments 1347
5.1.1 Local Genomic Alignments 1349
5.1.2 Global Genomic Alignments 1350
5.1.3 Multiple Sequence Alignments 1351
5.1.4 Assessing the Quality of Genomic Alignment Tools 1353
5.1.5 Using Whole-genome Alignments 1354
5.2 Visualizing Genomic Alignments 1355
5.3 Detecting Selection 1357
6 Applications 1361
6.1 How Much of the Human Genome is Constrained? 1362
6.2 Ultra-conserved Regions 1363
6.3 Specific Locus Studies 1364
7 Challenges and Future Directions 1367
8 Conclusion 1368
References 1368
38 Association Studies of Complex Diseases
Momiao Xiong and Li Jin
1375
1 Introduction 1375
2 Linkage Disequilibrium (LD), Haplotype and Association Studies 1378
2.1 Concepts of LD 1378
2.2 Measures of LD 1379
2.2.1 LD Coefficient D 1379
2.2.2 Normalized Measure of LD D’ 1379
2.2.3 Correlation Coefficient r 1380
2.2.4 Composite Measure of LD 1380
2.2.5 The Relationship between the Measure of LD and Physical Distance 1380
2.3 SNPs and Haplotype Blocks in the Human Genome 1381
2.3.1 SNPs 1381
2.3.2 Tagging SNPs 1381
2.3.3 Haplotype Block Model 1381
2.3.4 Definitions of Haplotype Block 1383
2.3.4.1 Definition of Haplotype Blocks based on Pairwise LD 1384
2.3.4.2 Definition of Haplotype Blocks based on Haplotype Diversity 1384
2.3.4.3 Definition of Haplotype Blocks based on both Pairwise LD and Haplotype Diversity 1384
2.3.5 Haplotype Reconstruction 1385
2.3.5.1 Clark’s Algorithm 1385
2.3.5.2 Expectation Maximization (EM) Algorithm 1386
2.3.5.3 Bayesian and Coalescence-based Methods 1386
2.3.6 Measure of Haplotype Block LD 1387
3 A General Framework for Population-based Association Studies 1387
3.1 Motivation 1387
3.2 The Traditional χ² Test Statistic
3.3 Test Statistics 1391
3.4 Null Distribution of the Nonlinear Statistics 1392
3.5 Power of the Nonlinear Test Statistics and the Standard χ² Test Statistic 1393
4 Similarity-based Statistics for Association Studies 1400
4.1 Similarity Measures 1400
4.1.1 Matching Measure 1402
4.1.2 Counting Measure 1403
4.1.3 Length Measure 1403
4.2 Test Statistics 1403
5 Generalized T² Test Statistic 1404
5.1 Test Statistic 1405
5.2 Nonlinear T² Test 1406
6 Family-based Association Studies 1406
6.1 TDT at a Single Locus with Two Alleles 1407
6.2 TDT at a Single Locus with Multiple Alleles or at Multiple Loci with Phase-known Haplotypes 1407
6.3 Sib-TDT 1409
6.3.1 Comparison of Genotype Frequencies 1409
6.3.2 Comparison of Allele Frequencies 1410
7 Nonlinear Transmission/Disequilibrium Test 1410
7.1 General Procedures for the Construction of the Nonlinear TDT 1412
7.1.1 A Single Locus with Two Alleles 1412
7.1.2 A Single Locus with Multiple Alleles or Multiple Loci with Phase-known Haplotypes 1413
7.2 Power of the Nonlinear TDT 1414
7.3 Real Examples 1415
8 Perspective of Genome-wide Association Studies 1416
References 1417
39 Pharmacogenetics/Pharmacogenomics
Xing Jian Lou, Russ B. Altman and Teri E. Klein
1427
1 Introduction 1427
2 An Overview of Pharmacogenetics and Pharmacogenomics 1427
2.1 Background of Pharmacogenetics and Pharmacogenomics 1428
2.2 Influence of Pharmacogenetics and Pharmacogenomics on Drug Development and Therapy 1429
3 Biomedical Informatics Resources Relevant to Pharmacogenomics 1430
4 Building the PharmGKB 1433
4.1 Establishing a Repository of Pharmacogenetics and Pharmacogenomics Information 1435
4.1.1 The Data Model 1435
4.1.2 Primary Data 1436
4.1.3 Data from Literature 1437
4.1.4 Linking to other Data Resources 1438
4.2 Turning Data into Knowledge 1439
4.2.1 Categorizing Data 1440
4.2.1.1 Genotype 1440
4.2.1.2 Clinical Outcome 1441
4.2.1.3 Pharmacodynamics and Drug Responses 1441
4.2.1.4 Pharmacokinetics 1441
4.2.1.5 Molecular and Cellular Functional Assays 1442
4.2.2 Establishing Genotype–Phenotype Correlation 1442
4.2.3 Using Pathways to Summarize Current Pharmacogenetics and Pharmacogenomics Knowledge 1443
4.3 Providing Easy Access of Knowledge for the Research Community 1443
4.3.1 Querying System 1445
4.3.2 Visualization and Browsing 1445
4.3.3 Privacy Protection 1447
4.3.4 Data Exchange Strategy 1449
5 Analytic Tools for Pharmacogenomics 1449
6 Future Perspectives on Informatics for Pharmacogenetics/Pharmacogenomics 1451
References 1452
40 Evolution of Drug Resistance in HIV
Niko Beerenwinkel, Kirsten Roomp and Martin Däumer
1457
1 Introduction 1457
2 Biomedical Background 1458
2.1 Biology of HIV 1458
2.1.1 Epidemiology of HIV/AIDS 1458
2.1.2 Structure, Genome and Replication Cycle 1459
2.1.3 Basic Immunology and Course of Infection 1461
2.2 Antiretroviral Therapy 1462
2.2.1 Antiretroviral Drugs 1462
2.2.2 Drug Resistance 1464
2.3 Resistance Testing 1464
2.3.1 Genotypic Resistance Testing 1465
2.3.2 Phenotypic Resistance Testing 1465
3 Prediction of Phenotypic Resistance from Genotypes 1466
3.1 Drug Resistance Data 1466
3.2 Methods of Phenotype Prediction 1467
3.3 Comparisons 1468
4 Development of Resistance-associated Mutations 1470
4.1 Viral Evolution 1470
4.2 Learning Mutational Pathways 1472
4.3 Genetic Barrier 1473
4.4 Transitions between Sequence Clusters 1475
5 Selecting Optimal Combination Therapies 1476
5.1 Clinical Databases 1477
5.2 Simple Scoring Functions 1477
5.3 Look-ahead Techniques 1478
5.4 Rules-based Approaches 1479
6 Host Genetic Profiles and Viral Evolution 1480
6.1 Immunobiological Background 1480
6.1.1 HLA Genes 1480
6.1.2 Chemokine Receptors 1482
6.2 Epitope Prediction 1483
6.2.1 Problem Definition 1483
6.2.2 Methods 1484
6.3 Analysis of Escape Mutations 1485
7 Conclusions 1488
8 Web Resources 1488
8.1 Los Alamos HIV Databases (http://www.hiv.lanl.gov) 1488
8.2 Stanford HIV Drug Resistance Database (http://hivdb.stanford.edu) 1488
8.3 Geno2pheno (http://www.geno2pheno.org) 1489
8.4 IMGT/HLA Databases (http://www.ebi.ac.uk/imgt/hla) 1489
References 1489
41 Analyzing the Evolution of Infectious Bacteria
Dawn Field, Edward J. Feil, Gareth Wilson and Paul Swift
1497
1 Introduction 1497
1.1 Introduction to Molecular Evolutionary Theory 1498
1.2 The Quantity and Quality of Data Available 1501
1.3 A Practical Overview of Online Resources 1502
2 Identification and Study of Determinants of Virulence and Pathogenicity 1504
2.1 Homology-based Detection 1506
2.2 Pattern-based Detection 1506
2.3 Comparative Genomic Methods of Detection 1507
2.4 Taxonomically Restricted Genes (TRGs) and Orphans 1508
3 Putting Isolates of Infectious Bacteria into a Phylogenetic Framework 1509
4 Mixing of Genetic Material among Bacteria 1512
4.1 The Importance of Phage and Plasmids 1513
5 Coevolution of Infectious Bacteria with Their Hosts 1516
5.1 Reconstructing Metabolic Pathways 1516
5.2 The Genetic Arms Race between Pathogen and Host 1517
6 Conclusions 1518
References 1520
Part 10 Basic Bioinformatics Technologies 1525
42 Integrating Biological Databases
Zoé Lacroix, Bertram Ludäscher and Robert Stevens
1525
1 Biological Resources 1525
2 Data Modeling 1527
2.1 Conceptual Model 1528
2.1.1 ER 1528
2.1.2 Unified Modeling Language 1530
2.2 “Flat” Data Models 1532
2.3 Tree-structured Representations 1533
2.4 Graph Representations 1534
2.5 Multi-dimensional Data Model 1536
3 Data Integration 1537
3.1 Scientific View of Data 1537
3.2 Data Warehouse 1540
3.3 Link-driven Federations 1541
3.4 Mediations 1541
4 Integrating Applications and Data 1542
4.1 Middleware 1543
4.2 CORBA 1544
4.3 Web Services 1545
4.4 P2P 1546
4.5 Grid 1547
5 Semantic Integration 1547
5.1 Identifying Objects 1549
5.2 Representing Metadata 1550
5.3 Ontologies and Data Integration 1552
5.3.1 Example 1553
5.3.2 From Information to Reasoning 1554
5.3.3 Biological Ontologies 1555
5.3.4 Ontologies and Data Integration 1556
5.4 Semantic Web 1557
6 Scientific Workflows 1558
6.1 Example: Promoter Identification Workflow (PIW) 1559
6.2 Scientific Workflow Requirements and Desiderata 1561
6.3 Semantic Extensions and Scientific Workflow Design 1565
7 Conclusion 1567
References 1567
43 Visualization of Biological Data
Harry Hochheiser, Kevin W. Eliceiri and Ilya G. Goldberg
1573
1 Introduction 1573
2 Microscopy Image Visualization 1574
2.1 Fluorescence Microscopy Techniques Applicable to HCS Screening 1574
2.1.1 Spectral Imaging 1575
2.1.2 Lifetime Imaging 1575
2.1.3 Fluorescence Resonant Energy Transfer (FRET) 1576
2.1.4 Optical Sectioning 1577
2.1.5 MP Imaging 1578
2.1.6 Second Harmonic Imaging 1579
2.2 Functional Genomics 1580
2.2.1 RNAi 1580
2.2.2 Chemical Compound Libraries 1581
2.3 Tools for Scientist-driven Analysis Development and Deployment 1582
2.3.1 ImageJ 1582
2.3.2 VisBio 1583
3 Biological Information Visualization 1585
3.1 Genome and Sequence Data 1586
3.2 Gene Expression Data 1594
3.3 Proteomics 1601
3.4 Interaction Networks and Pathways 1601
3.5 Phylogenies and Taxonomies 1605
3.6 Phenotypes and Lineages 1607
3.7 Visualization of the Scientific Process 1608
4 Image Informatics 1608
4.1 Data and Information Management 1611
4.2 Image Analysis 1611
4.3 Analysis Workflows 1613
4.4 Provenance 1613
4.5 Federation 1614
4.6 Visualization and User Tools 1614
5 Conclusion: Research Questions and Challenges 1614
References 1616
44 Using Distributed Data and Tools in Bioinformatics Applications
Robert Stevens, Phillip Lord and Duncan Hull
1627
1 Introduction to Distributed Resources 1627
2 Heterogeneity in Bioinformatics Resources 1629
3 Type Systems in Bioinformatics 1631
4 Plumbing Bioinformatics Resources 1634
4.1 CORBA 1635
4.2 XML in Bioinformatics 1638
4.3 Web Services 1640
5 Case Studies in Distributed Bioinformatics 1642
5.1 ISYS 1642
5.2 BioMOBY 1643
5.2.1 MOBY-S 1643
5.2.2 S-MOBY 1644
5.3 The Grid Future – the myGrid Project 1644
5.4 The myGrid Project 1645
6 Discussion 1647
References 1649
Part 11 Outlook 1651
45 Future Trends
Thomas Lengauer
1651
1 Introduction 1651
2 Building Blocks – Post-translational Modification of Proteins 1653
3 Regulation – Synthesis and Degradation Pipeline of RNA and Proteins 1655
4 Regulation – RNAi 1656
5 Regulation – Tiling Arrays, ChIP-on-chip and array-CGH 1657
6 Regulation – Epigenetics 1659
7 Protein Function – Alternative Splicing 1663
8 Interaction Networks – Immunoinformatics 1665
9 Cell Engineering – Synthetic Biology 1670
9.1 Genetic Engineering 1671
9.2 Protein Engineering 1672
9.3 Genetic Networks 1672
10 Imaging 1673
10.1 Obtaining Pictures of Cellular Structures 1673
10.2 Movies of Cellular Processes 1675
10.3 Organism Development 1676
11 Modeling Organs 1676
12 Outlook 1677
References 1678
  Index 1687
  Name Index 1727

 
©2012 Wiley-VCH Verlag GmbH & Co. KGaA - Betreiber
http://www.wiley-vch.de - mailto: info@wiley-vch.de