
Multi-Agent Coordination

A Reinforcement Learning Approach

Sadhu, Arup Kumar / Konar, Amit

Wiley - IEEE


1st Edition, January 2021
320 Pages, Hardcover
Wiley & Sons Ltd

ISBN: 978-1-119-69903-3
John Wiley & Sons


Price: 132,00 €

Price incl. VAT, excl. Shipping

Further versions: ePub, Mobi, PDF

Discover the latest developments in multi-robot coordination techniques with this insightful and original resource

Multi-Agent Coordination: A Reinforcement Learning Approach delivers a comprehensive, insightful, and unique treatment of the development of multi-robot coordination algorithms that impose a smaller computational burden and reduced storage requirements compared to traditional algorithms. The authors, accomplished academics and engineers, provide readers with both a high-level introduction to and overview of multi-robot coordination, and with in-depth analyses of learning-based planning algorithms.

You'll learn how to accelerate exploration of the team goal, and about alternative approaches to speeding up the convergence of traditional multi-agent Q-learning (TMAQL) by identifying the preferred joint action for the team. The authors also propose novel approaches to consensus Q-learning that address the equilibrium selection problem, and a new way of evaluating the threshold value for uniting empires without imposing any significant computational overhead. The book concludes with an examination of likely directions for future research in this rapidly developing field.
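For orientation, the algorithms summarized above build on the standard multi-agent Q-learning update over joint actions. The Python sketch below illustrates that baseline update only; the table names, toy dimensions, and learning-rate values are illustrative assumptions, and the plain maximum used for the next-state value stands in for the Nash, correlated, or consensus-based equilibrium selection developed in the book. It is not the authors' code.

```python
import numpy as np

# Toy dimensions (illustrative assumptions): 2 agents, 5 states, 3 actions each.
N_STATES, N_ACTIONS = 5, 3
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor (assumed values)

# One joint-action Q-table per agent: Q[i][s, a1, a2].
Q = [np.zeros((N_STATES, N_ACTIONS, N_ACTIONS)) for _ in range(2)]

def equilibrium_value(q_i, s):
    """Value agent i assigns to state s.

    Shown here as the best joint-action payoff; equilibrium-based MAQL
    replaces this with the payoff at a Nash, correlated, or consensus
    joint action, which is where the book's schemes differ.
    """
    return q_i[s].max()

def maql_update(s, a1, a2, rewards, s_next):
    """One multi-agent Q-learning step for both agents' joint-action tables."""
    for i in range(2):
        target = rewards[i] + GAMMA * equilibrium_value(Q[i], s_next)
        Q[i][s, a1, a2] += ALPHA * (target - Q[i][s, a1, a2])

# Example step: in state 0 the agents take joint action (1, 2),
# receive individual rewards, and the team moves to state 3.
maql_update(0, 1, 2, rewards=(1.0, 1.0), s_next=3)
```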

Readers will discover cutting-edge techniques for multi-agent coordination, including:
* An introduction to multi-agent coordination by reinforcement learning and evolutionary algorithms, including topics like the Nash equilibrium and correlated equilibrium (a minimal example appears after this list)
* Improving convergence speed of multi-agent Q-learning for cooperative task planning
* Consensus Q-learning for multi-agent cooperative planning
* Efficient computation of correlated equilibria for cooperative Q-learning-based multi-agent planning
* A modified imperialist competitive algorithm for multi-agent stick-carrying applications
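
The first bullet above touches on the Nash and correlated equilibrium, which Chapter 1 introduces through small static games. As a concrete anchor, the sketch below enumerates the pure-strategy Nash equilibria of a two-player static game; the payoff matrices are made up for illustration and nothing here reflects the authors' implementation.

```python
import numpy as np

# Illustrative 2x2 coordination game (made-up payoffs, not from the book).
R = np.array([[4, 0],
              [0, 2]])   # row player's payoffs
C = np.array([[4, 0],
              [0, 2]])   # column player's payoffs

def pure_nash_equilibria(R, C):
    """Return joint actions (i, j) where neither player gains by deviating unilaterally."""
    eqs = []
    for i in range(R.shape[0]):
        for j in range(R.shape[1]):
            row_best = R[i, j] >= R[:, j].max()   # row cannot do better by switching rows
            col_best = C[i, j] >= C[i, :].max()   # column cannot do better by switching columns
            if row_best and col_best:
                eqs.append((i, j))
    return eqs

print(pure_nash_equilibria(R, C))  # -> [(0, 0), (1, 1)] for this game
```

The equilibrium-selection problem addressed by the consensus Q-learning chapter is precisely the question of which of the returned equilibria the team should jointly adopt; a correlated equilibrium relaxes the setting further by letting a shared signal randomize over joint actions.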

Perfect for academics, engineers, and professionals who regularly work with multi-agent learning algorithms, Multi-Agent Coordination: A Reinforcement Learning Approach also belongs on the bookshelves of anyone with an advanced interest in machine learning and artificial intelligence as it applies to the field of cooperative or competitive robotics.

Preface xi

Acknowledgments xix

About the Authors xxi

1 Introduction: Multi-agent Coordination by Reinforcement Learning and Evolutionary Algorithms 1

1.1 Introduction 2

1.2 Single Agent Planning 4

1.2.1 Terminologies Used in Single Agent Planning 4

1.2.2 Single Agent Search-Based Planning Algorithms 10

1.2.2.1 Dijkstra's Algorithm 10

1.2.2.2 A* (A-star) Algorithm 11

1.2.2.3 D* (D-star) Algorithm 15

1.2.2.4 Planning by STRIPS-Like Language 15

1.2.3 Single Agent RL 17

1.2.3.1 Multiarmed Bandit Problem 17

1.2.3.2 DP and Bellman Equation 20

1.2.3.3 Correlation Between RL and DP 21

1.2.3.4 Single Agent Q-Learning 21

1.2.3.5 Single Agent Planning Using Q-Learning 24

1.3 Multi-agent Planning and Coordination 25

1.3.1 Terminologies Related to Multi-agent Coordination 25

1.3.2 Classification of MAS 26

1.3.3 Game Theory for Multi-agent Coordination 28

1.3.3.1 Nash Equilibrium 31

1.3.3.2 Correlated Equilibrium 36

1.3.3.3 Static Game Examples 38

1.3.4 Correlation Among RL, DP, and GT 40

1.3.5 Classification of MARL 40

1.3.5.1 Cooperative MARL 42

1.3.5.2 Competitive MARL 56

1.3.5.3 Mixed MARL 59

1.3.6 Coordination and Planning by MAQL 84

1.3.7 Performance Analysis of MAQL and MAQL-Based Coordination 85

1.4 Coordination by Optimization Algorithm 87

1.4.1 PSO Algorithm 88

1.4.2 Firefly Algorithm 91

1.4.2.1 Initialization 92

1.4.2.2 Attraction to Brighter Fireflies 92

1.4.2.3 Movement of Fireflies 93

1.4.3 Imperialist Competitive Algorithm 93

1.4.3.1 Initialization 94

1.4.3.2 Selection of Imperialists and Colonies 95

1.4.3.3 Formation of Empires 95

1.4.3.4 Assimilation of Colonies 96

1.4.3.5 Revolution 96

1.4.3.6 Imperialistic Competition 97

1.4.4 Differential Evolution Algorithm 98

1.4.4.1 Initialization 99

1.4.4.2 Mutation 99

1.4.4.3 Recombination 99

1.4.4.4 Selection 99

1.4.5 Off-line Optimization 99

1.4.6 Performance Analysis of Optimization Algorithms 99

1.4.6.1 Friedman Test 100

1.4.6.2 Iman-Davenport Test 100

1.5 Summary 101

References 101

2 Improve Convergence Speed of Multi-Agent Q-Learning for Cooperative Task Planning 111

2.1 Introduction 112

2.2 Literature Review 116

2.3 Preliminaries 118

2.3.1 Single Agent Q-learning 119

2.3.2 Multi-agent Q-learning 119

2.4 Proposed MAQL 123

2.4.1 Two Useful Properties 124

2.5 Proposed FCMQL Algorithms and Their Convergence Analysis 128

2.5.1 Proposed FCMQL Algorithms 129

2.5.2 Convergence Analysis of the Proposed FCMQL Algorithms 130

2.6 FCMQL-Based Cooperative Multi-agent Planning 131

2.7 Experiments and Results 134

2.8 Conclusions 141

2.9 Summary 143

2.A More Details on Experimental Results 144

2.A.1 Additional Details of Experiment 2.1 144

2.A.2 Additional Details of Experiment 2.2 159

2.A.3 Additional Details of Experiment 2.4 161

References 162

3 Consensus Q-Learning for Multi-agent Cooperative Planning 167

3.1 Introduction 167

3.2 Preliminaries 169

3.2.1 Single Agent Q-Learning 169

3.2.2 Equilibrium-Based Multi-agent Q-Learning 170

3.3 Consensus 171

3.4 Proposed CoQL and Planning 173

3.4.1 Consensus Q-Learning 173

3.4.2 Consensus-Based Multi-robot Planning 175

3.5 Experiments and Results 176

3.5.1 Experimental Setup 176

3.5.2 Experiments for CoQL 177

3.5.3 Experiments for Consensus-Based Planning 177

3.6 Conclusions 179

3.7 Summary 180

References 180

4 An Efficient Computing of Correlated Equilibrium for Cooperative Q-Learning-Based Multi-Robot Planning 183

4.1 Introduction 183

4.2 Single-Agent Q-Learning and Equilibrium-Based MAQL 186

4.2.1 Single Agent Q-Learning 187

4.2.2 Equilibrium-Based MAQL 187

4.3 Proposed Cooperative MAQL and Planning 188

4.3.1 Proposed Schemes with Their Applicability 189

4.3.2 Immediate Rewards in Scheme-I and -II 190

4.3.3 Scheme-I-Induced MAQL 190

4.3.4 Scheme-II-Induced MAQL 193

4.3.5 Algorithms for Scheme-I and II 200

4.3.6 Constraint ΩQL-I/ΩQL-II (CΩQL-I/CΩQL-II) 201

4.3.7 Convergence 201

4.3.8 Multi-agent Planning 207

4.4 Complexity Analysis 209

4.4.1 Complexity of CQL 210

4.4.1.1 Space Complexity 210

4.4.1.2 Time Complexity 210

4.4.2 Complexity of the Proposed Algorithms 210

4.4.2.1 Space Complexity 211

4.4.2.2 Time Complexity 211

4.4.3 Complexity Comparison 213

4.4.3.1 Space Complexity 213

4.4.3.2 Time Complexity 214

4.5 Simulation and Experimental Results 215

4.5.1 Experimental Platform 215

4.5.1.1 Simulation 215

4.5.1.2 Hardware 216

4.5.2 Experimental Approach 217

4.5.2.1 Learning Phase 217

4.5.2.2 Planning Phase 217

4.5.3 Experimental Results 218

4.6 Conclusion 226

4.7 Summary 226

4.A Supporting Algorithm and Mathematical Analysis 227

References 228

5 A Modified Imperialist Competitive Algorithm for Multi-Robot Stick-Carrying Application 233

5.1 Introduction 234

5.2 Problem Formulation for Multi-Robot Stick-Carrying 239

5.3 Proposed Hybrid Algorithm 242

5.3.1 An Overview of ICA 242

5.3.1.1 Initialization 242

5.3.1.2 Selection of Imperialists and Colonies 243

5.3.1.3 Formation of Empires 243

5.3.1.4 Assimilation of Colonies 244

5.3.1.5 Revolution 244

5.3.1.6 Imperialistic Competition 245

5.4 An Overview of FA 247

5.4.1 Initialization 247

5.4.2 Attraction to Brighter Fireflies 247

5.4.3 Movement of Fireflies 248

5.5 Proposed ICFA 248

5.5.1 Assimilation of Colonies 251

5.5.1.1 Attraction to Powerful Colonies 251

5.5.1.2 Modification of Empire Behavior 251

5.5.1.3 Union of Empires 252

5.6 Simulation Results 254

5.6.1 Comparative Framework 254

5.6.2 Parameter Settings 254

5.6.3 Analysis on Explorative Power of ICFA 254

5.6.4 Comparison of Quality of the Final Solution 255

5.6.5 Performance Analysis 258

5.7 Computer Simulation and Experiment 265

5.7.1 Average Total Path Deviation (ATPD) 265

5.7.2 Average Uncovered Target Distance (AUTD) 265

5.7.3 Experimental Setup in Simulation Environment 265

5.7.4 Experimental Results in Simulation Environment 266

5.7.5 Experimental Setup with Khepera Robots 268

5.7.6 Experimental Results with Khepera Robots 269

5.8 Conclusion 270

5.9 Summary 272

5.A Additional Comparison of ICFA 272

References 275

6 Conclusions and Future Directions 281

6.1 Conclusions 281

6.2 Future Directions 283

Index 285

Arup Kumar Sadhu, PhD, received his doctorate in Multi-Robot Coordination by Reinforcement Learning from Jadavpur University, India, in 2017. He works as a scientist with Research & Innovation Labs, Tata Consultancy Services.

Amit Konar, PhD, received his doctorate from Jadavpur University, India, in 1994. He is a Professor in the Department of Electronics and Tele-Communication Engineering at Jadavpur University, where he serves as the Founding Coordinator of the M.Tech. program in Intelligent Automation and Robotics.