Machine Learning in Chemical Safety and Health

Fundamentals with Applications

Wang, Qingsheng / Cai, Changjie (Editors)

1st Edition, December 2022
320 Pages, Hardcover
Wiley & Sons Ltd

ISBN: 978-1-119-81748-2

John Wiley & Sons

Price: 159,00 €

Price incl. VAT, excl. Shipping

Introduces Machine Learning Techniques and Tools and Provides Guidance on How to Implement Machine Learning Into Chemical Safety and Health-related Model Development

There is a growing interest in the application of machine learning algorithms in chemical safety and health-related model development, with applications in areas including property and toxicity prediction, consequence prediction, and fault detection. This book is the first to review the current status of machine learning implementation in chemical safety and health research and to provide guidance for implementing machine learning techniques and algorithms into chemical safety and health research.

Written by an international team of authors and edited by renowned experts in the areas of process safety and occupational and environmental health, the work covers sample topics including:
* An introduction to the fundamentals of machine learning, including regression, classification and cross-validation, and an overview of software and tools
* Detailed reviews of various applications in the areas of chemical safety and health, including flammability prediction, consequence prediction, asset integrity management, predictive nanotoxicity and environmental exposure assessment, and more
* Perspectives on the possible future development of this field

Machine Learning in Chemical Safety and Health serves as an essential guide on both the fundamentals and applications of machine learning for industry professionals and researchers in the fields of process safety, chemical safety, occupational and environmental health, and industrial hygiene.

List of Contributors xiii

Preface xvii

1 Introduction 1

Pingfan Hu and Qingsheng Wang

1.1 Background 2

1.2 Current State 5

1.2.1 Flammability Characteristics Prediction Using Quantitative Structure-Property Relationship 5

1.2.2 Consequence Prediction Using Quantitative Property-Consequence Relationship 6

1.2.3 Machine Learning in Process Safety and Asset Integrity Management 6

1.2.4 Machine Learning for Process Fault Detection and Diagnosis 7

1.2.5 Intelligent Method for Chemical Emission Source Identification 7

1.2.6 Machine Learning and Deep Learning Applications in Medical Image Analysis 7

1.2.7 Predictive Nanotoxicology: Nanoinformatics Approach to Toxicity Analysis of Nanomaterials 8

1.2.8 Machine Learning in Environmental Exposure Assessment 8

1.2.9 Air Quality Prediction Using Machine Learning 8

1.3 Software and Tools 9

1.3.1 R 9

1.3.2 Python 12

References 13

2 Machine Learning Fundamentals 19

Yan Yan

2.1 What Is Learning? 19

2.1.1 Machine Learning Applications and Examples 20

2.1.2 Machine Learning Tasks 21

2.2 Concepts of Machine Learning 22

2.3 Machine Learning Paradigms 24

2.4 Probably Approximately Correct Learning 25

2.4.1 Deterministic Setting 26

2.4.2 Stochastic Setting 29

2.5 Estimation and Approximation 31

2.6 Empirical Risk Minimization 32

2.6.1 Empirical Risk Minimizer 32

2.6.2 VC-dimension Generalization Bound 33

2.6.3 General Loss Functions 34

2.7 Regularization 35

2.7.1 Regularized Loss Minimization 35

2.7.2 Constrained and Regularized Problem 36

2.7.3 Trade-off Between Estimation and Approximation Error 37

2.8 Maximum Likelihood Principle 38

2.8.1 Maximum Likelihood Estimation 39

2.8.2 Cross Entropy Minimization 40

2.9 Optimization 41

2.9.1 Linear Regression: An Example 42

2.9.2 Closed-form Solution 42

2.9.3 Gradient Descent 43

2.9.4 Stochastic Gradient Descent 45

References 46

3 Flammability Characteristics Prediction Using QSPR Modeling 47

Yong Pan and Juncheng Jiang

3.1 Introduction 47

3.1.1 Flammability Characteristics 47

3.1.2 QSPR Application 48

3.1.2.1 Concept of QSPR 48

3.1.2.2 Trends and Characteristics of QSPR 48

3.2 Flowchart for Flammability Characteristics Prediction 49

3.2.1 Dataset Preparation 51

3.2.2 Structure Input and Molecular Simulation 52

3.2.3 Calculation of Molecular Descriptors 53

3.2.4 Preliminary Screening of Molecular Descriptors 54

3.2.5 Descriptor Selection and Modeling 55

3.2.6 Model Validation 57

3.2.6.1 Model Fitting Ability Evaluation 57

3.2.6.2 Model Stability Analysis 59

3.2.6.3 Model Predictivity Evaluation 60

3.2.7 Model Mechanism Explanation 61

3.2.8 Summary of QSPR Process 61

3.3 QSPR Review for Flammability Characteristics 62

3.3.1 Flammability Limits 62

3.3.1.1 LFLT and LFL 62

3.3.1.2 UFLT and UFL 64

3.3.2 Flash Point 65

3.3.3 Auto-ignition Temperature 68

3.3.4 Heat of Combustion 69

3.3.5 Minimum Ignition Energy 70

3.3.6 Gas-liquid Critical Temperature 70

3.3.7 Other Properties 72

3.4 Limitations 72

3.5 Conclusions and Future Prospects 73

References 73

4 Consequence Prediction and Quantitative Property-Consequence Relationship Models 81

Zeren Jiao and Qingsheng Wang

4.1 Introduction 81

4.2 Conventional Consequence Prediction Methods 82

4.2.1 Empirical Method 82

4.2.2 Computational Fluid Dynamics (CFD) Method 83

4.2.3 Integral Method 84

4.3 Machine Learning and Deep Learning-Based Consequence Prediction Models 84

4.4 Quantitative Property-Consequence Relationship Models 86

4.4.1 Consequence Database 88

4.4.2 Property Descriptors 89

4.4.3 Machine Learning and Deep Learning Algorithms 89

4.5 Challenges and Future Directions 90

References 91

5 Machine Learning in Process Safety and Asset Integrity Management 93

Ming Yang, Hao Sun and Rustam Abubarkirov

5.1 Opportunities and Threats 93

5.2 State-of-the-Art Reviews 95

5.2.1 Artificial Neural Networks (ANNs) 95

5.2.2 Principal Component Analysis (PCA) 97

5.2.3 Genetic Algorithm (GA) 97

5.3 Case Study of Asset Integrity Assessment 98

5.4 Data-Driven Model of Asset Integrity Assessment 105

5.4.1 Condition Monitoring Data Collection 106

5.4.2 Data Processing and Storage 106

5.4.3 Data Mining for Risk Quantification and Monitoring Control 107

5.4.4 AIM Application 107

5.4.5 The Application of the Framework 108

5.5 Conclusion 109

References 109

6 Machine Learning for Process Fault Detection and Diagnosis 113

Rajeevan Arunthavanathan, Salim Ahmed, Faisal Khan and Syed Imtiaz

6.1 Background 113

6.2 Machine Learning Approaches in Fault Detection and Diagnosis 114

6.3 Supervised Methods for Fault Detection and Diagnosis 115

6.3.1 Neural Network 115

6.3.1.1 Neural Network Theory and Algorithm 115

6.3.1.2 Neural Network Learning for Fault Classification 117

6.3.1.3 Algorithm for Fault Classification Using Neural Network 118

6.3.2 Support Vector Machine 118

6.3.2.1 Support Vector Machine Theory and Algorithm 118

6.3.3 Support Vector Machine Model Selection and Algorithm 120

6.3.4 Support Vector Machine Multiclass Classification 121

6.4 Unsupervised Learning Models for Fault Detection and Diagnosis 122

6.4.1 K-Nearest Neighbors 122

6.4.2 One-Class Support Vector Machine 123

6.4.3 One-Class Neural Network 124

6.4.4 Comparison Between Deep Learning with Machine Learning in Fault Detection and Diagnosis 126

6.5 Intelligent FDD Using Machine Learning 127

6.5.1 Model Development 127

6.5.2 Data Collection 129

6.5.2.1 Model Development Steps 129

6.5.2.2 Result Comparison 130

6.6 Concluding Remarks 134

References 134

7 Intelligent Method for Chemical Emission Source Identification 139

Denglong Ma

7.1 Introduction 139

7.1.1 Development of Detecting Gas Emission 139

7.1.2 Development of Source Term Identification 140

7.2 Intelligent Methods for Recognizing Gas Emission 141

7.2.1 Leakage Recognition of Sequestrated CO2 in the Atmosphere 141

7.2.1.1 Gas Leakage Recognition for CO2 Geological Sequestration 142

7.2.1.2 Case Studies for CO2 Recognition 144

7.2.2 Emission Gas Identification with Artificial Olfactory 149

7.2.2.1 Features of Responses in AOS 150

7.2.2.2 Support Vector Machine Models for Gas Identification 150

7.2.2.3 Deep Learning Models for Gas Identification 155

7.3 Intelligent Methods for Identifying Emission Sources 158

7.3.1 Source Estimation with Intelligent Optimization Method 158

7.3.1.1 Principle of Source Estimation with Optimization Method 158

7.3.1.2 Case Studies of Source Estimation with Optimization Method 159

7.3.2 Source Estimation with MRE-PSO Method 159

7.3.2.1 Principle of PSO-MRE for Source Estimation 161

7.3.2.2 Case Studies 163

7.3.3 Source Estimation with PSO-Tikhonov Regularization Method 164

7.3.3.1 Principle of PSO-Tikhonov Regularization Hybrid Method 164

7.3.3.2 Case Study 167

7.3.4 Source Estimation with MCMC-MLA Method 168

7.3.4.1 Forward Gas Dispersion Model Based on MLA 168

7.3.4.2 Source Estimation with MCMC-MLA Method 169

7.3.4.3 Case Study 172

7.4 Conclusions and Future Work 173

7.4.1 Conclusions 173

7.4.2 Limitations and Future Work 177

References 178

8 Machine Learning and Deep Learning Applications in Medical Image Analysis 183

Pingfan Hu, Changjie Cai, Yu Feng and Qingsheng Wang

8.1 Introduction 183

8.1.1 Machine Learning in Medical Imaging 183

8.1.2 Deep Learning in Medical Imaging 183

8.2 CNN-Based Models for Classification 184

8.2.1 ResNet50 184

8.2.2 YOLOv4 (Darknet53) 185

8.2.3 Grad-CAM 186

8.3 Case Study 186

8.3.1 Background 186

8.3.2 Study Design 187

8.3.3 Training and Testing Database Preparation 187

8.3.4 Results 190

8.3.4.1 Classification Performance of the Modified ResNet50 Model 190

8.3.4.2 Classification Performance of the YOLOv4 Model 190

8.3.4.3 Post-Processing Via Grad-CAM Model and HSV 193

8.3.5 Conclusion 194

8.4 Limitations and Future Work 194

References 195

9 Predictive Nanotoxicology: Nanoinformatics Approach to Toxicity Analysis of Nanomaterials 199

Bilal M. Khan and Yoram Cohen

9.1 Predictive Nanotoxicology 199

9.1.1 Introduction 199

9.1.2 Nano Quantitative Structure-Activity Relationship (QSAR) 200

9.1.3 Importance of Data for Nanotoxicology 204

9.2 Machine Learning Modeling for Predictive Nanotoxicology 205

9.2.1 Overview 205

9.2.2 Unsupervised Learning 211

9.2.2.1 Data Exploration Via Self-Organizing Maps (SOMs) 211

9.2.2.2 Evaluating Associations among Sublethal Toxicity Responses 214

9.2.3 Supervised Learning 215

9.2.3.1 Random Forest Models 216

9.2.3.2 Support Vector Machines 216

9.2.3.3 Bayesian Networks 216

9.2.3.4 Supervised Classification and Regression-Based Models for Nano-(Q)SARs 218

9.2.4 Predictive Nano-(Q)SARs for the Assessment of Causal Relationships 220

9.3 Development of Machine Learning Based Models for Nano-(Q)SARs 224

9.3.1 Overview 224

9.3.1.1 Data-Driven Models 224

9.3.1.2 Mechanistic/Theoretical Models 225

9.3.2 Data Generation, Collection, and Preprocessing 225

9.3.3 Descriptor Selection 226

9.3.4 Model Selection and Training 229

9.3.5 Model Validation 230

9.3.5.1 Descriptor Importance 231

9.3.5.2 Applicability Domain 231

9.3.6 Model Diagnosis and Debugging 231

9.4 Nanoinformatics Approaches to Predictive Nanotoxicology 234

9.5 Summary 235

References 238

10 Machine Learning in Environmental Exposure Assessment 251

Gregory L. Watson

10.1 Introduction 251

10.2 Environmental Exposure Modeling 252

10.3 Machine Learning Exposure Models 254

10.4 Model Evaluation 257

10.5 Case Study 258

10.6 Other Topics 260

10.6.1 Bias and Fairness 260

10.6.2 Wearable Sensors 260

10.6.3 Interpretability 260

10.6.4 Extreme Events 260

10.7 Conclusion 261

References 261

11 Air Quality Prediction Using Machine Learning 267

Lan Gao, Changjie Cai and Xiao-Ming Hu

11.1 Introduction 267

11.2 Air Quality and Climate Data Acquisition 269

11.2.1 Earth Satellite Observation Datasets 269

11.2.1.1 Basics of Earth Satellite Observations 269

11.2.1.2 Earth Satellite Products 270

11.2.2 Ground-Based In Situ Observation Datasets 276

11.2.2.1 Basics of the Ground-Based In Situ Observations 276

11.2.2.2 Ground-Based In Situ Products 277

11.3 Applications of Machine Learning in Air Quality Study 279

11.3.1 Shallow Learning 280

11.3.2 Deep Learning 280

11.4 An Application Practice Example 281

11.4.1 Satellite Data Acquisition and Variable Selections 282

11.4.2 Machine Learning and Deep Learning Algorithms 282

References 283

12 Current Challenges and Perspectives 289

Changjie Cai and Qingsheng Wang

12.1 Current Challenges 289

12.1.1 Data Development and Cleaning 289

12.1.2 Hardware Issues 290

12.1.3 Data Confidentiality 290

12.1.4 Other Challenges 291

12.2 Perspectives 291

12.2.1 Real-Time Monitoring and Forecast of Chemical Hazards 291

12.2.2 Toolkits for Dummies 292

12.2.3 Physics-Informed Machine Learning 292

References 293

Index

Qingsheng Wang is Associate Professor of Chemical Engineering and George Armistead '23 Faculty Fellow at Texas A&M University. He has over 15 years of experience in the areas of process safety and fire protection. His experience is wide ranging, involving machine learning in chemical safety, flame retardant materials, fire and explosion dynamics, and composite manufacturing for safety and sustainability. He is a registered professional engineer (PE) and certified safety professional (CSP), and currently a principal member of the NFPA 18 and NFPA 30 committees. Professor Wang established the Multiscale Process Safety Laboratory at Texas A&M and currently leads it. He has published over 150 peer-reviewed journal articles and 6 book chapters. His work has been heavily cited, and he is internationally recognized as a world leader in the field of process safety.

Changjie Cai is Assistant Professor of Occupational and Environmental Health in the Hudson College of Public Health at the University of Oklahoma Health Sciences Center. Dr Cai has formed an interdisciplinary research lab focusing on three major areas: (i) developing portable and cost-effective devices to identify, assess, and control safety and health hazards; (ii) integrating artificial intelligence techniques into the safety and health fields; (iii) modeling hazard dispersion and its climate effects using chemical transport models.