Deep Learning for Physical Scientists

Accelerating Research with Machine Learning

Pyzer-Knapp, Edward O. / Benatan, Matthew

1st edition, October 2021
208 pages, hardcover
Wiley & Sons Ltd

ISBN: 978-1-119-40833-8

John Wiley & Sons

Discover the power of machine learning in the physical sciences with this one-stop resource from a leading voice in the field

Deep Learning for Physical Scientists: Accelerating Research with Machine Learning delivers an insightful analysis of the transformative techniques being used in deep learning within the physical sciences. The book offers readers the ability to understand, select, and apply the best deep learning techniques for their individual research problem and interpret the outcome.

Designed to teach researchers to think in useful new ways about how to achieve results in their research, the book provides scientists with new avenues to attack problems and avoid common pitfalls and problems. Practical case studies and problems are presented, giving readers an opportunity to put what they have learned into practice, with exemplar coding approaches provided to assist the reader.

From modelling basics to feed-forward networks, the book offers a broad cross-section of machine learning techniques to improve physical science research. Readers will also enjoy:
* A thorough introduction to basic classification and regression with perceptrons
* An exploration of training algorithms, including back propagation, stochastic gradient descent, and the parallelization of training
* An examination of multi-layer perceptrons for learning from descriptors and de-noising data
* Discussions of recurrent neural networks for learning from sequences and convolutional neural networks for learning from images
* A treatment of Bayesian optimization for tuning deep learning architectures

Perfect for academic and industrial research professionals in the physical sciences, Deep Learning for Physical Scientists: Accelerating Research with Machine Learning will also earn a place in the libraries of industrial researchers who have access to large amounts of data but have yet to learn the techniques to fully exploit that access.

Market Description

This book introduces the reader to the transformative techniques involved in deep learning. A range of methodologies is addressed, including:
* Basic classification and regression with perceptrons
* Training algorithms, such as back propagation and stochastic gradient descent, and the parallelization of training
* Multi-layer perceptrons for learning from descriptors and de-noising data
* Recurrent neural networks for learning from sequences
* Convolutional neural networks for learning from images
* Bayesian optimization for tuning deep learning architectures

Each of these areas has direct application to physical science research. By the end of the book, the reader should feel comfortable selecting the methodology best suited to their situation, and be able to implement the deep learning model and interpret its outcome.

The book is designed to teach researchers to think in new ways, providing them with new avenues to attack problems and avoid roadblocks within their research. This is achieved through case-study-like problems at the end of each chapter, which give the reader a chance to practice what they have just learnt in a close-to-real-world setting, with example 'solutions' provided through an online resource.

About the Authors

Acknowledgements

1 Prefix - Learning to "Think Deep"

1.1 So What Do I Mean by Changing the Way You Think?

2 Setting Up a Python Environment for Deep Learning Projects

2.1 Python Overview

2.2 Why Use Python for Data Science?

2.3 Anaconda Python

2.3.1 Why Use Anaconda?

2.3.2 Downloading and Installing Anaconda Python

2.3.2.1 Installing TensorFlow

2.4 Jupyter Notebooks

2.4.1 Why Use a Notebook?

2.4.2 Starting a Jupyter Notebook Server

2.4.3 Adding Markdown to Notebooks

2.4.4 A Simple Plotting Example

2.4.5 Summary

3 Modelling Basics

3.1 Introduction

3.2 Start Where You Mean to Go On - Input Definition and Creation

3.3 Loss Functions

3.3.1 Classification and Regression

3.3.2 Regression Loss Functions

3.3.2.1 Mean Absolute Error

3.3.2.2 Root Mean Squared Error

3.3.3 Classification Loss Functions

3.3.3.1 Precision

3.3.3.2 Recall

3.3.3.3 F1 Score

3.3.3.4 Confusion Matrix

3.3.3.5 (Area Under) Receiver Operator Curve (AU-ROC)

3.3.3.6 Cross Entropy

3.4 Overfitting and Underfitting

3.4.1 Bias-Variance Trade-Off

3.5 Regularisation

3.5.1 Ridge Regression

3.5.2 LASSO Regularisation

3.5.3 Elastic Net

3.5.4 Bagging and Model Averaging

3.6 Evaluating a Model

3.6.1 Holdout Testing

3.6.2 Cross Validation

3.7 The Curse of Dimensionality

3.7.1 Normalising Inputs and Targets

3.8 Summary

Notes

4 Feedforward Networks and Multilayered Perceptrons

4.1 Introduction

4.2 The Single Perceptron

4.2.1 Training a Perceptron

4.2.2 Activation Functions

4.2.3 Back Propagation

4.2.3.1 Weight Initialisation

4.2.3.2 Learning Rate

4.2.4 Key Assumptions

4.2.5 Putting It All Together in TensorFlow

4.3 Moving to a Deep Network

4.4 Vanishing Gradients and Other "Deep" Problems

4.4.1 Gradient Clipping

4.4.2 Non-saturating Activation Functions

4.4.2.1 ReLU

4.4.2.2 Leaky ReLU

4.4.2.3 ELU

4.4.3 More Complex Initialisation Schemes

4.4.3.1 Xavier

4.4.3.2 He

4.4.4 Mini Batching

4.5 Improving the Optimisation

4.5.1 Bias

4.5.2 Momentum

4.5.3 Nesterov Momentum

4.5.4 (Adaptive) Learning Rates

4.5.5 AdaGrad

4.5.6 RMSProp

4.5.7 Adam

4.5.8 Regularisation

4.5.9 Early Stopping

4.5.10 Dropout

4.6 Parallelisation of learning

4.6.1 Hogwild!

4.7 High and Low-level Tensorflow APIs

4.8 Architecture Implementations

4.9 Summary

4.10 Papers to Read

5 Recurrent Neural Networks

5.1 Introduction

5.2 Basic Recurrent Neural Networks

5.2.1 Training a Basic RNN

5.2.2 Putting It All Together in TensorFlow

5.2.3 The Problem with Vanilla RNNs

5.3 Long Short-Term Memory (LSTM) Networks

5.3.1 Forget Gate

5.3.2 Input Gate

5.3.3 Output Gate

5.3.4 Peephole Connections

5.3.5 Putting It All Together in TensorFlow

5.4 Gated Recurrent Units

5.4.1 Putting It All Together in TensorFlow

5.5 Using Keras for RNNs

5.6 Real World Implementations

5.7 Summary

5.8 Papers to Read

6 Convolutional Neural Networks

6.1 Introduction

6.2 Fundamental Principles of Convolutional Neural Networks

6.2.1 Convolution

6.2.2 Pooling

6.2.2.1 Why Use Pooling?

6.2.2.2 Types of Pooling

6.2.3 Stride and Padding

6.2.4 Sparse Connectivity

6.2.5 Parameter Sharing

6.2.6 Convolutional Neural Networks with TensorFlow

6.3 Graph Convolutional Networks

6.3.1 Graph Convolutional Networks in Practice

6.4 Real World Implementations

6.5 Summary

6.6 Papers to Read

7 Auto-Encoders

7.1 Introduction

7.1.1 Auto-Encoders for Dimensionality Reduction

7.2 Getting a Good Start - Stacked Auto-Encoders, Restricted Boltzmann Machines, and Pretraining

7.2.1 Restricted Boltzmann Machines

7.2.2 Stacking Restricted Boltzmann Machines

7.3 Denoising Auto-Encoders

7.4 Variational Auto-Encoders

7.5 Sequence to Sequence Learning

7.6 The Attention Mechanism

7.7 Application in Chemistry: Building a Molecular Generator

7.8 Summary

7.9 Real World Implementations

7.10 Papers to Read

8 Optimising Models Using Bayesian Optimisation

8.1 Introduction

8.2 Defining Our Function

8.3 Grid and Random Search

8.4 Moving Towards an Intelligent Search

8.5 Exploration and Exploitation

8.6 Greedy Search

8.6.1 Key Fact One - Exploitation Heavy Search is Susceptible to Initial Data Bias

8.7 Diversity Search

8.8 Bayesian Optimisation

8.8.1 Domain Knowledge (or Prior)

8.8.2 Gaussian Processes

8.8.3 Kernels

8.8.3.1 Stationary Kernels

8.8.3.2 Noise Kernel

8.8.4 Combining Gaussian Process Prediction and Optimisation

8.8.4.1 Probability of Improvement

8.8.4.2 Expected Improvement

8.8.5 Balancing Exploration and Exploitation

8.8.6 Upper and Lower Confidence Bound Algorithm

8.8.7 Maximum Entropy Sampling

8.8.8 Optimising the Acquisition Function

8.8.9 Cost Sensitive Bayesian Optimisation

8.8.10 Constrained Bayesian Optimisation

8.8.11 Parallel Bayesian Optimisation

8.8.11.1 qEI

8.8.11.2 Constant Liar and Kriging Believer

8.8.11.3 Local Penalisation

8.8.11.4 Parallel Thompson Sampling

8.8.11.5 K-Means Batch Bayesian Optimisation

8.9 Summary

8.10 Papers to Read

Case Study 1 Solubility Prediction Case Study

CS 1.1 Step 1 - Import Packages

CS 1.2 Step 2 - Importing the Data

CS 1.3 Step 3 - Creating the Inputs

CS 1.4 Step 4 - Splitting into Training and Testing

CS 1.5 Step 5 - Defining Our Model

CS 1.6 Step 6 - Running Our Model

CS 1.7 Step 7 - Automatically Finding an Optimised Architecture Using Bayesian Optimisation

Case Study 2 Time Series Forecasting with LSTMs

CS 2.1 Simple LSTM

CS 2.2 Sequence-to-Sequence LSTM

Case Study 3 Deep Embeddings for Auto-Encoder-Based Featurisation

Index

Dr Edward O. Pyzer-Knapp is the worldwide lead for AI Enriched Modelling and Simulation at IBM Research. He obtained his PhD from the University of Cambridge, using state-of-the-art computational techniques to accelerate materials design, and then moved to Harvard, where he was in charge of the day-to-day running of the Harvard Clean Energy Project, a collaboration with IBM that combined massive distributed computing, quantum-mechanical simulations, and machine learning to accelerate discovery of the next generation of organic photovoltaic materials. He is also the Visiting Professor of Industrially Applied AI at the University of Liverpool and the Editor in Chief of Applied AI Letters, a journal with a focus on real-world application and validation of AI.

Dr Matt Benatan received his PhD in Audio-Visual Speech Processing from the University of Leeds, after which he went on to pursue a career in AI research within industry. His work to date has involved the research and development of AI techniques for a broad variety of domains, from applications in audio processing through to materials discovery. His research interests include Computer Vision, Signal Processing, Bayesian Optimization, and Scalable Bayesian Inference.