Deep Reinforcement Learning for Wireless Communications and Networking

Theory, Applications and Implementation

Hoang, Dinh Thai / Huynh, Nguyen Van / Nguyen, Diep N. / Hossain, Ekram / Niyato, Dusit

1st edition, July 2023
288 pages, hardcover
Wiley & Sons Ltd

ISBN: 978-1-119-87367-9

John Wiley & Sons

Comprehensive guide to Deep Reinforcement Learning (DRL) as applied to wireless communication systems

Deep Reinforcement Learning for Wireless Communications and Networking presents an overview of the development of DRL and provides fundamental knowledge of its theory, formulation, design, learning models, algorithms, and implementation, together with a detailed case study for practice. The book also covers diverse applications of DRL to problems in wireless networks, such as caching, offloading, resource sharing, and security. The authors discuss open issues and introduce advanced DRL approaches that address emerging problems in wireless communications and networking.

The book covers advanced DRL models, such as the deep dueling architecture and generative adversarial networks, as well as emerging problems in wireless networks, such as ambient backscatter communication, intelligent reflecting surfaces, and edge intelligence. It is the first comprehensive book on applications of DRL to wireless networks that presents state-of-the-art research in architecture, protocol, and application design.

Deep Reinforcement Learning for Wireless Communications and Networking covers specific topics such as:
* Deep reinforcement learning models, covering deep learning, deep reinforcement learning, and models of deep reinforcement learning
* Physical layer applications, covering signal detection, decoding, and beamforming; power and rate control; and physical-layer security
* Medium access control (MAC) layer applications, covering resource allocation, channel access, and user/cell association
* Network layer applications, covering traffic routing, network classification, and network slicing

With comprehensive coverage of an exciting and noteworthy new technology, Deep Reinforcement Learning for Wireless Communications and Networking is an essential learning resource for researchers and communications engineers, along with developers and entrepreneurs in autonomous systems, who wish to harness this technology in practical applications.

Notes on Contributors

Foreword

Preface

Acknowledgments

Acronyms

Introduction

Part I Fundamentals of Deep Reinforcement Learning

1 Deep Reinforcement Learning and Its Applications

1.1 Wireless Networks and Emerging Challenges

1.2 Machine Learning Techniques and Development of DRL

1.2.1 Machine Learning

1.2.2 Artificial Neural Network

1.2.3 Convolutional Neural Network

1.2.4 Recurrent Neural Network

1.2.5 Development of Deep Reinforcement Learning

1.3 Potentials and Applications of DRL

1.3.1 Benefits of DRL in Human Lives

1.3.2 Features and Advantages of DRL Techniques

1.3.3 Academic Research Activities

1.3.4 Applications of DRL Techniques

1.3.5 Applications of DRL Techniques in Wireless Networks

1.4 Structure of this Book and Target Readership

1.4.1 Motivations and Structure of this Book

1.4.2 Target Readership

1.5 Chapter Summary

References

2 Markov Decision Process and Reinforcement Learning

2.1 Markov Decision Process

2.2 Partially Observable Markov Decision Process

2.3 Policy and Value Functions

2.4 Bellman Equations

2.5 Solutions of MDP Problems

2.5.1 Dynamic Programming

2.5.1.1 Policy Evaluation

2.5.1.2 Policy Improvement

2.5.1.3 Policy Iteration

2.5.2 Monte Carlo Sampling

2.6 Reinforcement Learning

2.7 Chapter Summary

References

3 Deep Reinforcement Learning Models and Techniques

3.1 Value-Based DRL Methods

3.1.1 Deep Q-Network

3.1.2 Double DQN

3.1.3 Prioritized Experience Replay

3.1.4 Dueling Network

3.2 Policy-Gradient Methods

3.2.1 REINFORCE Algorithm

3.2.1.1 Policy Gradient Estimation

3.2.1.2 Reducing the Variance

3.2.1.3 Policy Gradient Theorem

3.2.2 Actor-Critic Methods

3.2.3 Advantage of Actor-Critic Methods

3.2.3.1 Advantage of Actor-Critic (A2C)

3.2.3.2 Asynchronous Advantage Actor-Critic (A3C)

3.2.3.3 Generalized Advantage Estimate (GAE)

3.3 Deterministic Policy Gradient (DPG)

3.3.1 Deterministic Policy Gradient Theorem

3.3.2 Deep Deterministic Policy Gradient (DDPG)

3.3.3 Distributed Distributional DDPG (D4PG)

3.4 Natural Gradients

3.4.1 Principle of Natural Gradients

3.4.2 Trust Region Policy Optimization (TRPO)

3.4.2.1 Trust Region

3.4.2.2 Sample-Based Formulation

3.4.2.3 Practical Implementation

3.4.3 Proximal Policy Optimization (PPO)

3.5 Model-Based RL

3.5.1 Vanilla Model-Based RL

3.5.2 Robust Model-Based RL: Model-Ensemble TRPO (ME-TRPO)

3.5.3 Adaptive Model-Based RL: Model-Based Meta-Policy Optimization (MB-MPO)

3.6 Chapter Summary

References

4 A Case Study and Detailed Implementation

4.1 System Model and Problem Formulation

4.1.1 System Model and Assumptions

4.1.1.1 Jamming Model

4.1.1.2 System Operation

4.1.2 Problem Formulation

4.1.2.1 State Space

4.1.2.2 Action Space

4.1.2.3 Immediate Reward

4.1.2.4 Optimization Formulation

4.2 Implementation and Environment Settings

4.2.1 Install TensorFlow with Anaconda

4.2.2 Q-Learning

4.2.2.1 Codes for the Environment

4.2.2.2 Codes for the Agent

4.2.3 Deep Q-Learning

4.3 Simulation Results and Performance Analysis

4.4 Chapter Summary

References

Part II Applications of DRL in Wireless Communications and Networking

5 DRL at the Physical Layer

5.1 Beamforming, Signal Detection, and Decoding

5.1.1 Beamforming

5.1.1.1 Beamforming Optimization Problem

5.1.1.2 DRL-Based Beamforming

5.1.2 Signal Detection and Channel Estimation

5.1.2.1 Signal Detection and Channel Estimation Problem

5.1.2.2 RL-Based Approaches

5.1.3 Channel Decoding

5.2 Power and Rate Control

5.2.1 Power and Rate Control Problem

5.2.2 DRL-Based Power and Rate Control

5.3 Physical-Layer Security

5.4 Chapter Summary

References

6 DRL at the MAC Layer

6.1 Resource Management and Optimization

6.2 Channel Access Control

6.2.1 DRL in the IEEE 802.11 MAC

6.2.2 MAC for Massive Access in IoT

6.2.3 MAC for 5G and B5G Cellular Systems

6.3 Heterogeneous MAC Protocols

6.4 Chapter Summary

References

7 DRL at the Network Layer

7.1 Traffic Routing

7.2 Network Slicing

7.2.1 Network Slicing-Based Architecture

7.2.2 Applications of DRL in Network Slicing

7.3 Network Intrusion Detection

7.3.1 Host-Based IDS

7.3.2 Network-Based IDS

7.4 Chapter Summary

References

8 DRL at the Application and Service Layer

8.1 Content Caching

8.1.1 QoS-Aware Caching

8.1.2 Joint Caching and Transmission Control

8.1.3 Joint Caching, Networking, and Computation

8.2 Data and Computation Offloading

8.3 Data Processing and Analytics

8.3.1 Data Organization

8.3.1.1 Data Partitioning

8.3.1.2 Data Compression

8.3.2 Data Scheduling

8.3.3 Tuning of Data Processing Systems

8.3.4 Data Indexing

8.3.4.1 Database Index Selection

8.3.4.2 Index Structure Construction

8.3.5 Query Optimization

8.4 Chapter Summary

References

Part III Challenges, Approaches, Open Issues, and Emerging Research Topics

9 DRL Challenges in Wireless Networks

9.1 Adversarial Attacks on DRL

9.1.1 Attacks Perturbing the State Space

9.1.1.1 Manipulation of Observations

9.1.1.2 Manipulation of Training Data

9.1.2 Attacks Perturbing the Reward Function

9.1.3 Attacks Perturbing the Action Space

9.2 Multiagent DRL in Dynamic Environments

9.2.1 Motivations

9.2.2 Multiagent Reinforcement Learning Models

9.2.2.1 Markov/Stochastic Games

9.2.2.2 Decentralized Partially Observable Markov Decision Process (DPOMDP)

9.2.3 Applications of Multiagent DRL in Wireless Networks

9.2.4 Challenges of Using Multiagent DRL in Wireless Networks

9.2.4.1 Nonstationarity Issue

9.2.4.2 Partial Observability Issue

9.3 Other Challenges

9.3.1 Inherent Problems of Using RL in Real-World Systems

9.3.1.1 Limited Learning Samples

9.3.1.2 System Delays

9.3.1.3 High-Dimensional State and Action Spaces

9.3.1.4 System and Environment Constraints

9.3.1.5 Partial Observability and Nonstationarity

9.3.1.6 Multiobjective Reward Functions

9.3.2 Inherent Problems of DL and Beyond

9.3.2.1 Inherent Problems of DL

9.3.2.2 Challenges of DRL Beyond Deep Learning

9.3.3 Implementation of DL Models in Wireless Devices

9.4 Chapter Summary

References

10 DRL and Emerging Topics in Wireless Networks

10.1 DRL for Emerging Problems in Future Wireless Networks

10.1.1 Joint Radar and Data Communications

10.1.2 Ambient Backscatter Communications

10.1.3 Reconfigurable Intelligent Surface-Aided Communications

10.1.4 Rate Splitting Communications

10.2 Advanced DRL Models

10.2.1 Deep Reinforcement Transfer Learning

10.2.1.1 Reward Shaping

10.2.1.2 Intertask Mapping

10.2.1.3 Learning from Demonstrations

10.2.1.4 Policy Transfer

10.2.1.5 Reusing Representations

10.2.2 Generative Adversarial Network (GAN) for DRL

10.2.3 Meta Reinforcement Learning

10.3 Chapter Summary

References

Index

Dinh Thai Hoang, Ph.D., is a faculty member at the University of Technology Sydney, Australia. He is also an Associate Editor of IEEE Communications Surveys & Tutorials and an Editor of IEEE Transactions on Wireless Communications, IEEE Transactions on Cognitive Communications and Networking, and IEEE Transactions on Vehicular Technology.

Nguyen Van Huynh, Ph.D., obtained his Ph.D. from the University of Technology Sydney in 2022. He is currently a Research Associate in the Department of Electrical and Electronic Engineering, Imperial College London, UK.

Diep N. Nguyen, Ph.D., is Director of the Agile Communications and Computing Group and a member of the Faculty of Engineering and Information Technology at the University of Technology Sydney, Australia.

Ekram Hossain, Ph.D., is a Professor in the Department of Electrical and Computer Engineering at the University of Manitoba, Canada, and a Fellow of the IEEE. He co-authored the Wiley title Radio Resource Management in Multi-Tier Cellular Wireless Networks (2013).

Dusit Niyato, Ph.D., is a Professor in the School of Computer Science and Engineering at Nanyang Technological University, Singapore. He co-authored the Wiley title Radio Resource Management in Multi-Tier Cellular Wireless Networks (2013).

D. T. Hoang, University of Technology Sydney, Australia; N. V. Huynh, University of Technology Sydney, Australia; Imperial College London, U; D. N. Nguyen, University of Technology Sydney, Australia; E. Hossain, University of Manitoba, Canada; D. Niyato, Nanyang Technological University, Singapore