Multi-Processor System-on-Chip 1

Architectures

Andrade, Liliana / Rousseau, Frédéric

1st edition, May 2021
320 pages, hardcover
Wiley & Sons Ltd

ISBN: 978-1-78945-021-7

John Wiley & Sons


A Multi-Processor System-on-Chip (MPSoC) is the key component for complex applications. These applications put huge pressure on memory, communication devices and computing units. This book, presented in two volumes (Architectures and Applications), therefore celebrates the 20th anniversary of MPSoC, an interdisciplinary forum that focuses on multi-core and multi-processor hardware and software systems. It is this interdisciplinarity that has led MPSoC to bring together experts in these fields from around the world over the last two decades.

Multi-Processor System-on-Chip 1 covers the key components of MPSoC: processors, memory, interconnect and interfaces. It describes advanced features of these components and the technologies used to build efficient MPSoC architectures. All the main components are detailed: the use of memories and their technologies, communication support and consistency, and specific processor architectures for general-purpose or dedicated applications.

Foreword
Ahmed JERRAYA

Acknowledgments
Liliana ANDRADE and Frédéric ROUSSEAU

Part 1. Processors

Chapter 1. Processors for the Internet of Things
Pieter VAN DER WOLF and Yankin TANURHAN

1.1. Introduction

1.2. Versatile processors for low-power IoT edge devices

1.2.1. Control processing, DSP and machine learning

1.2.2. Configurability and extensibility

1.3. Machine learning inference

1.3.1. Requirements for low/mid-end machine learning inference

1.3.2. Processor capabilities for low-power machine learning inference

1.3.3. A software library for machine learning inference

1.3.4. Example machine learning applications and benchmarks

1.4. Conclusion

1.5. References

Chapter 2. A Qualitative Approach to Many-core Architecture
Benoît DUPONT DE DINECHIN

2.1. Introduction

2.2. Motivations and context

2.2.1. Many-core processors

2.2.2. Machine learning inference

2.2.3. Application requirements

2.3. The MPPA3 many-core processor

2.3.1. Global architecture

2.3.2. Compute cluster

2.3.3. VLIW core

2.3.4. Coprocessor

2.4. The MPPA3 software environments

2.4.1. High-performance computing

2.4.2. KaNN code generator

2.4.3. High-integrity computing

2.5. Conclusion

2.6. References

Chapter 3. The Plural Many-core Architecture - High Performance at Low Power
Ran GINOSAR

3.1. Introduction

3.2. Related works

3.3. Plural many-core architecture

3.4. Plural programming model

3.5. Plural hardware scheduler/synchronizer

3.6. Plural networks-on-chip

3.6.1. Scheduler NoC

3.6.2. Shared memory NoC

3.7. Hardware and software accelerators for the Plural architecture

3.8. Plural system software

3.9. Plural software development tools

3.10. Matrix multiplication algorithm on the Plural architecture

3.11. Conclusion

3.12. References

Chapter 4. ASIP-Based Multi-Processor Systems for an Efficient Implementation of CNNs
Andreas BYTYN, René AHLSDORF and Gerd ASCHEID

4.1. Introduction

4.2. Related works

4.3. ASIP architecture

4.4. Single-core scaling

4.5. MPSoC overview

4.6. NoC parameter exploration

4.7. Summary and conclusion

4.8. References

Part 2. Memory

Chapter 5. Tackling the MPSoC Data Locality Challenge
Sven RHEINDT, Akshay SRIVATSA, Oliver LENKE, Lars NOLTE, Thomas WILD and Andreas HERKERSDORF

5.1. Motivation

5.2. MPSoC target platform

5.3. Related work

5.4. Coherence-on-demand: region-based cache coherence

5.4.1. RBCC versus global coherence

5.4.2. OS extensions for coherence-on-demand

5.4.3. Coherency region manager

5.4.4. Experimental evaluations

5.4.5. RBCC and data placement

5.5. Near-memory acceleration

5.5.1. Near-memory synchronization accelerator

5.5.2. Near-memory queue management accelerator

5.5.3. Near-memory graph copy accelerator

5.5.4. Near-cache accelerator

5.6. The big picture

5.7. Conclusion

5.8. Acknowledgments

5.9. References

Chapter 6. mMPU: Building a Memristor-based General-purpose In-memory Computation Architecture
Adi ELIAHU, Rotem BEN HUR, Ameer HAJ ALI and Shahar KVATINSKY

6.1. Introduction

6.2. MAGIC NOR gate

6.3. In-memory algorithms for latency reduction

6.4. Synthesis and in-memory mapping methods

6.4.1. SIMPLE

6.4.2. SIMPLER

6.5. Designing the memory controller

6.6. Conclusion

6.7. References

Chapter 7. Removing Load/Store Helpers in Dynamic Binary Translation
Antoine FARAVELON, Olivier GRUBER and Frédéric PÉTROT

7.1. Introduction

7.2. Emulating memory accesses

7.3. Design of our solution

7.4. Implementation

7.4.1. Kernel module

7.4.2. Dynamic binary translation

7.4.3. Optimizing our slow path

7.5. Evaluation

7.5.1. QEMU emulation performance analysis

7.5.2. Our performance overview

7.5.3. Optimized slow path

7.6. Related works

7.7. Conclusion

7.8. References

Chapter 8. Study and Comparison of Hardware Methods for Distributing Memory Bank Accesses in Many-core Architectures
Arthur VIANES and Frédéric ROUSSEAU

8.1. Introduction

8.1.1. Context

8.1.2. MPSoC architecture

8.1.3. Interconnect

8.2. Basics on banked memory

8.2.1. Banked memory

8.2.2. Memory bank conflict and granularity

8.2.3. Efficient use of memory banks: interleaving

8.3. Overview of software approaches

8.3.1. Padding

8.3.2. Static scheduling of memory accesses

8.3.3. The need for hardware approaches

8.4. Hardware approaches

8.4.1. Prime modulus indexing

8.4.2. Interleaving schemes using hash functions

8.5. Modeling and experimenting

8.5.1. Simulator implementation

8.5.2. Implementation of the Kalray MPPA cluster interconnect

8.5.3. Objectives and method

8.5.4. Results and discussion

8.6. Conclusion

8.7. References

Part 3. Interconnect and Interfaces

Chapter 9. Network-on-Chip (NoC): The Technology that Enabled Multi-processor Systems-on-Chip (MPSoCs)
K. Charles JANAC

9.1. History: transition from buses and crossbars to NoCs

9.1.1. NoC architecture

9.1.2. Extending the bus comparison to crossbars

9.1.3. Bus, crossbar and NoC comparison summary and conclusion

9.2. NoC configurability

9.2.1. Human-guided design flow

9.2.2. Physical placement awareness and NoC architecture design

9.3. System-level services

9.3.1. Quality-of-service (QoS) and arbitration

9.3.2. Hardware debug and performance analysis

9.3.3. Functional safety and security

9.4. Hardware cache coherence

9.4.1. NoC protocols, semantics and messaging

9.5. Future NoC technology developments

9.5.1. Topology synthesis and floorplan awareness

9.5.2. Advanced resilience and functional safety for autonomous vehicles

9.5.3. Alternatives to von Neumann architectures for SoCs

9.5.4. Chiplets and multi-die NoC connectivity

9.5.5. Runtime software automation

9.5.6. Instrumentation, diagnostics and analytics for performance, safety and security

9.6. Summary and conclusion

9.7. References

Chapter 10. Minimum Energy Computing via Supply and Threshold Voltage Scaling
Jun SHIOMI and Tohru ISHIHARA

10.1. Introduction

10.2. Standard-cell-based memory for minimum energy computing

10.2.1. Overview of low-voltage on-chip memories

10.2.2. Design strategy for area- and energy-efficient SCMs

10.2.3. Hybrid memory design towards energy- and area-efficient memory systems

10.2.4. Body biasing as an alternative to power gating

10.3. Minimum energy point tracking

10.3.1. Basic theory

10.3.2. Algorithms and implementation

10.3.3. OS-based approach to minimum energy point tracking

10.4. Conclusion

10.5. Acknowledgments

10.6. References

Chapter 11. Maintaining Communication Consistency During Task Migrations in Heterogeneous Reconfigurable Devices
Arief WICAKSANA, Olivier MULLER, Frédéric ROUSSEAU and Arif SASONGKO

11.1. Introduction

11.1.1. Reconfigurable architectures

11.1.2. Contribution

11.2. Background

11.2.1. Definitions

11.2.2. Problem scenario and technical challenges

11.3. Related works

11.3.1. Hardware context switch

11.3.2. Communication management

11.4. Proposed communication methodology in hardware context switching

11.5. Implementation of the communication management on reconfigurable computing architectures

11.5.1. Reconfigurable channels in FIFO

11.5.2. Communication infrastructure

11.6. Experimental results

11.6.1. Setup

11.6.2. Experiment scenario

11.6.3. Resource overhead

11.6.4. Impact on the total execution time

11.6.5. Impact on the context extract and restore time

11.6.6. System responsiveness to context switch requests

11.6.7. Hardware task migration between heterogeneous FPGAs

11.7. Conclusion

11.8. References

List of Authors

Authors Biographies

Index

Liliana Andrade is Associate Professor at TIMA Lab, Université Grenoble Alpes in France. She received her PhD in Computer Science, Telecommunications and Electronics from Université Pierre et Marie Curie in 2016. Her research interests include system-level modeling and validation of systems-on-chip, and the acceleration of heterogeneous systems simulation.

Frédéric Rousseau is Full Professor at TIMA Lab, Université Grenoble Alpes in France. His research interests concern Multi-Processor Systems-on-Chip design and architecture, and the prototyping of hardware/software systems, including reconfigurable systems and high-level synthesis for embedded systems.