Pattern Recognition in Computational Molecular Biology
Techniques and Approaches
Wiley Series in Bioinformatics
1st Edition, February 2016
656 Pages, Hardcover
Wiley & Sons Ltd
A comprehensive overview of high-performance pattern recognition techniques and approaches to Computational Molecular Biology
This book surveys developments in pattern recognition techniques and approaches related to Computational Molecular Biology. Providing broad coverage of the field, the authors present fundamental and technical information on these techniques and approaches and discuss their related problems. The text consists of twenty-nine chapters, organized into seven parts: Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks.
* Surveys the development of techniques and approaches for pattern recognition in biomolecular data
* Discusses pattern recognition in primary, secondary, tertiary and quaternary structures, as well as microarrays, phylogenetic trees and biological networks
* Includes case studies and examples to further illustrate the concepts discussed in the book
Pattern Recognition in Computational Molecular Biology: Techniques and Approaches is a reference for practitioners and professional researchers in Computer Science, Life Science, and Mathematics. The book also serves as supplementary reading for graduate students and young researchers interested in Computational Molecular Biology.
PREFACE xxvii
I PATTERN RECOGNITION IN SEQUENCES 1
1 COMBINATORIAL HAPLOTYPING PROBLEMS 3
Giuseppe Lancia
1.1 Introduction / 3
1.2 Single Individual Haplotyping / 5
1.3 Population Haplotyping / 12
References / 23
2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28
Sima Behpour and Bhaskar DasGupta
2.1 Introduction / 28
2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32
2.3 Entropy-Based Information Content Technique for Designing Approximation Algorithms for String Barcoding Problems / 34
2.4 Techniques for Proving Inapproximability Results for String Barcoding Problems / 36
2.5 Heuristic Algorithms for String Barcoding Problems / 39
2.6 Conclusion / 40
Acknowledgments / 41
References / 41
3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43
Matteo Comin and Davide Verzotto
3.1 Introduction / 43
3.2 Whole-Genome Sequence Analysis / 44
3.3 Underlying Approach / 47
3.4 Experimental Results / 54
3.5 Conclusion / 61
Author's Contributions / 62
Acknowledgments / 62
References / 62
4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65
Chengpeng Bi
4.1 Introduction / 65
4.2 Multiple Sequence Local Alignment / 67
4.3 Motif Finding Algorithms / 70
4.4 Time Complexity / 75
4.5 Case Studies / 75
4.6 Conclusion / 80
References / 81
5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83
Carl Barton, Tomás Flouri, Costas S. Iliopoulos, and Solon P. Pissis
5.1 Introduction / 83
5.2 Definitions and Notation / 85
5.3 Problem Definition / 87
5.4 Algorithms / 88
5.5 Conclusion / 94
References / 95
II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97
6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99
Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang
6.1 Introduction / 99
6.2 Representative Protein Secondary Structure Prediction Methods / 102
6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106
6.4 Conclusion / 110
Acknowledgments / 110
References / 111
7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS: APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114
Yann Guermeur and Fabien Lauer
7.1 Introduction / 114
7.2 Biological Sequence Segmentation / 115
7.3 MSVMpred / 117
7.4 Postprocessing with a Generative Model / 119
7.5 Dedication to Protein Secondary Structure Prediction / 120
7.6 Conclusions and Ongoing Research / 125
Acknowledgments / 126
References / 126
8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129
Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih
8.1 Introduction / 129
8.2 A Few Basic Concepts / 130
8.3 State of the Art / 135
8.4 A Novel Geometrical Approach to Motif Retrieval / 138
8.5 Implementation Notes / 149
8.6 Conclusions and Future Work / 151
Acknowledgment / 152
References / 152
9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY 155
Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang
9.1 Introduction / 155
9.2 Background / 156
9.3 Methodology / 157
9.4 Results and Interpretation / 161
9.5 Conclusion / 162
References / 163
III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165
10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES 167
Wajdi Dhifli and Engelbert Mephu Nguifo
10.1 Introduction / 167
10.2 From Protein 3D-Structures to Protein Graphs / 169
10.3 Graph Mining / 172
10.4 Subgraph Mining / 173
10.5 Frequent Subgraph Discovery / 173
10.6 Feature Selection / 179
10.7 Feature Selection for Subgraphs / 180
10.8 Discussion / 183
10.9 Conclusion / 185
Acknowledgments / 185
References / 186
11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION OF PROTEIN TERTIARY STRUCTURES 190
Chinua Umoja, Xiaxia Yu, and Robert Harrison
11.1 Introduction / 190
11.2 Genetic Algorithms / 192
11.3 Supervised Machine Learning Algorithm / 201
11.4 Fuzzy Application / 204
11.5 Conclusion / 207
References / 208
12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212
Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat
12.1 Introduction / 212
12.2 Protein Structure Overview / 213
12.3 Technical Challenges and Open Issues / 214
12.4 Prediction Assessment / 215
12.5 Current Approaches / 216
12.6 Domain Boundary Prediction Using Enhanced General Regression Network / 220
12.7 Inter-Domain Linkers Prediction Using Compositional Index and Simulated Annealing / 227
12.8 Conclusion / 232
References / 233
13 PREDICTION OF PROLINE CIS-TRANS ISOMERIZATION 236
Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya
13.1 Introduction / 236
13.2 Methods / 238
13.3 Model Evaluation and Analysis / 243
13.4 Conclusion / 245
References / 245
IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249
14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251
Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad Jahandideh, and Kuo-Chen Chou
14.1 Introduction / 251
14.2 Protein Structure Prediction / 255
14.3 Template-Based Predictions / 257
14.4 Critical Assessment of Protein Structure Prediction / 258
14.5 Quaternary Structure Prediction / 258
14.6 Conclusion / 261
Acknowledgments / 261
References / 261
15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266
Sheng-Lung Peng and Yu-Wei Tsay
15.1 Introduction / 266
15.2 Similarity in the Graph Model / 268
15.3 Measuring Structural Similarity via MCES / 272
15.4 Protein Comparison via Graph Spectra / 279
15.5 Conclusion / 287
References / 287
16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN-PROTEIN INTERACTIONS 291
Mina Maleki, Michael Hall, and Luis Rueda
16.1 Introduction / 291
16.2 Structural Domains / 293
16.3 The Prediction Framework / 293
16.4 Feature Extraction and Prediction Properties / 294
16.5 Feature Selection / 299
16.6 Classification / 301
16.7 Evaluation and Analysis / 304
16.8 Results and Discussion / 304
16.9 Conclusion / 309
References / 310
V PATTERN RECOGNITION IN MICROARRAYS 315
17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317
Hasan Oğul
17.1 Introduction / 317
17.2 Information Retrieval: Terminology and Background / 318
17.3 Content-Based Retrieval / 320
17.4 Microarray Data and Databases / 322
17.5 Methods for Retrieving Microarray Experiments / 324
17.6 Similarity Metrics / 327
17.7 Evaluating Retrieval Performance / 329
17.8 Software Tools / 330
17.9 Conclusion and Future Directions / 331
Acknowledgment / 332
References / 332
18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335
Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa
18.1 Introduction / 335
18.2 From Microarray Image to Signal / 336
18.3 Microarray Signal Analysis / 337
18.4 Algorithms for DE Gene Selection / 339
18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343
18.6 Conclusion / 345
References / 345
19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE PATTERN ANALYSIS 347
Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and Paola Bertolazzi
19.1 Introduction / 347
19.2 Transcriptome Analysis / 348
19.3 Microarrays / 349
19.4 RNA-Seq / 351
19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353
19.6 Gene Expression Profile Analysis / 356
19.7 Real Case Studies / 364
19.8 Conclusions / 367
References / 368
20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371
Li Teng
20.1 Introduction / 371
20.2 Patterns with Similarity / 373
20.3 Conclusion / 391
References / 391
21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394
Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa
21.1 Overview / 394
21.2 Arrow Plot / 399
21.3 Significance Analysis of Microarrays / 404
21.4 Correspondence Analysis / 405
21.5 Impact of the Preprocessing Methods / 407
21.6 Conclusions / 412
Acknowledgments / 413
References / 413
VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417
22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419
David A. Morrison
22.1 Introduction / 419
22.2 Networks and Trees / 420
22.3 Patterns and Their Processes / 424
22.4 The Types of Patterns / 427
22.5 Fingerprints / 431
22.6 Constructing Networks / 433
22.7 Multi-Labeled Trees / 435
22.8 Conclusion / 436
References / 437
23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION: IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN RECOGNITION 439
Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas
23.1 Introduction / 440
23.2 Overview on Methods and Frameworks for Phylogenetic Tree Reconstruction / 440
23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree Reconstruction / 445
23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446
23.5 Influence of Diverse Evolutionary Processes on Species Tree Reconstruction / 447
23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals of Pattern Recognition / 449
23.7 Concluding Remarks / 449
Acknowledgments / 450
References / 450
24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457
David Dao, Tomás Flouri, and Alexandros Stamatakis
24.1 Introduction / 457
24.2 Preliminaries / 459
24.3 A Naïve Approach / 462
24.4 Toward a Faster Method / 463
24.5 Improved Algorithm / 467
24.6 Implementation / 473
24.7 Evaluation / 474
24.8 Conclusion / 479
Acknowledgment / 481
References / 481
25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483
Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov
25.1 Introduction / 483
25.2 Methods / 485
25.3 Experimental Study / 491
25.4 Results and Discussion / 501
25.5 Conclusion / 502
References / 503
VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505
26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507
Christos Makris and Evangelos Theodoridis
26.1 Introduction / 507
26.2 Measures/Metrics / 508
26.3 Models of Biological Networks / 511
26.4 Reconstructing and Partitioning Biological Networks / 511
26.5 PPI Networks / 513
26.6 Mining PPI Networks--Interaction Prediction / 517
26.7 Conclusions / 519
References / 519
27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO SPECIES INTERACTIONS 525
Andrej Aderhold, V Anne Smith, and Dirk Husmeier
27.1 Introduction / 525
27.2 Molecular Systems / 528
27.3 Ecological Systems / 528
27.4 Models and Evaluation / 529
27.5 Learning Gene Regulation Networks / 532
27.6 Learning Species Interaction Networks / 540
27.7 Conclusion / 550
References / 550
28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING: APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC LEUKEMIA 555
Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas Stamatopoulos, and Lefteris Angelis
28.1 Introduction / 555
28.2 Toll-Like Receptors / 557
28.3 Structural Equation Modeling / 560
28.4 Application / 566
28.5 Conclusion / 580
References / 581
29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585
Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi
29.1 Introduction / 585
29.2 Related Work / 587
29.3 Problem Formulation / 589
29.4 Experimental Setup / 592
29.5 Experimental Analysis / 596
29.6 Conclusions / 605
Acknowledgments / 606
References / 606
INDEX 609