Information Theory Meets Power Laws
Stochastic Processes and Language Models

1st edition, April 2021
384 pages, Hardcover
Wiley & Sons Ltd
Discover new theoretical connections between stochastic phenomena and the structure of natural language with this powerful volume!
Information Theory Meets Power Laws: Stochastic Processes and Language Models presents readers with a novel probabilistic approach to language, based on the statistical laws of texts and their analysis by means of information theory. The distinguished author insightfully and rigorously examines the linguistic and mathematical subject matter while eschewing needlessly abstract and superfluous constructions.
The book begins with a less formal treatment of its subjects in the first chapter, introducing its concepts to readers without mathematical training and allowing those unfamiliar with linguistics to grasp the book's motivations. Despite the inherent complexity of its subject, Information Theory Meets Power Laws: Stochastic Processes and Language Models is a surprisingly approachable treatment of idealized mathematical models of human language.
The author succeeds in developing some of the theory underlying fundamental stochastic and semantic phenomena, such as strong nonergodicity, in a way that has not previously been seriously attempted. In doing so, he covers topics including:
* Zipf's and Herdan's laws for natural language (see the illustrative sketch after this list)
* Power laws for information, repetitions, and correlations
* Markov, finite-state, and Santa Fe processes
* Bayesian and frequentist interpretations of probability
* Ergodic decomposition, Kolmogorov complexity, and universal coding
* Theorems about facts and words
* Information measures for fields
* Rényi entropies, recurrence times, and subword complexity
* Asymptotically mean stationary processes
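To make the first topic above concrete: Zipf's law states that the frequency of the r-th most frequent word in a text scales roughly as f(r) ∝ r^(-α), with α close to 1 for natural language. The following minimal Python sketch is our own illustration, not code from the book; it builds a synthetic corpus that obeys the law by construction (the corpus size and word names are arbitrary assumptions) and recovers the exponent by a least-squares fit of log frequency against log rank.

```python
# Illustrative sketch (not from the book): estimating the Zipf exponent
# alpha in f(rank) ~ C / rank^alpha from word frequencies.
import math
from collections import Counter

def zipf_exponent(words) -> float:
    """Estimate alpha by least-squares regression of
    log frequency against log rank."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
    return -slope  # Zipf's law predicts alpha close to 1

# Synthetic toy corpus: word "w<r>" occurs round(10000 / r) times,
# so the corpus satisfies Zipf's law with alpha = 1 by construction.
corpus = [f"w{r}" for r in range(1, 501) for _ in range(round(10000 / r))]
print(f"estimated Zipf exponent: {zipf_exponent(corpus):.2f}")  # ~ 1.00
```

On real text the log-log plot is only approximately linear, so the fitted exponent depends on the range of ranks included; the book treats such power laws rigorously rather than by curve fitting.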
Written primarily for mathematics graduate students and professionals interested in information theory or discrete stochastic processes, Information Theory Meets Power Laws: Stochastic Processes and Language Models also belongs on the bookshelves of doctoral students and researchers in artificial intelligence, computational and quantitative linguistics, and the physics of complex systems.
Table of contents

Acknowledgments xiv
Basic notations 1
1 Guiding ideas 3
1.1 The motivating question 3
1.2 Further questions about texts 7
1.3 Zipf's and Herdan's laws 10
1.4 Markov and finite-state processes 17
1.5 More general stochastic processes 23
1.6 Two interpretations of probability 26
1.7 Insights from information theory 28
1.8 Estimation of entropy rate 31
1.9 Entropy of natural language 34
1.10 Algorithmic information theory 38
1.11 Descriptions of a random world 41
1.12 Facts and words related 47
1.13 Repetitions and entropies 52
1.14 Decay of correlations 57
1.15 Recapitulation 59
2 Probabilistic preliminaries 63
2.1 Probability measures 65
2.2 Product measurable spaces 68
2.3 Discrete random variables 71
2.4 From IID to finite-state processes 74
Problems 79
3 Probabilistic toolbox 81
3.1 Borel σ-fields and a fair coin 83
3.2 Integral and expectation 87
3.3 Inequalities and corollaries 91
3.4 Semi-distributions 96
3.5 Conditional probability 99
3.6 Modes of convergence 106
3.7 Complete spaces 107
Problems 110
4 Ergodic properties 113
4.1 Plain relative frequency 115
4.2 Birkhoff ergodic theorem 120
4.3 Ergodic and mixing criteria 124
4.4 Ergodic decomposition 129
Problems 132
5 Entropy and information 135
5.1 Shannon measures for partitions 137
5.2 Block entropy and its limits 143
5.3 Shannon measures for fields 150
5.4 Block entropy limits revisited 160
5.5 Convergence of entropy 164
5.6 Entropy as self-information 166
Problems 169
6 Equipartition and universality 173
6.1 SMB theorem 175
6.2 Universal semi-distributions 177
6.3 PPM probability 178
6.4 SMB theorem revisited 184
6.5 PPM-based statistics 186
Problems 193
7 Coding and computation 195
7.1 Elements of coding 197
7.2 Kolmogorov complexity 203
7.3 Algorithmic coding theorems 213
7.4 Limits of mathematics 222
7.5 Algorithmic randomness 226
Problems 232
8 Power laws for information 237
8.1 Hilberg exponents 239
8.2 Second order SMB theorem 246
8.3 Probabilistic and algorithmic facts 250
8.4 Theorems about facts and words 256
Problems 263
9 Power laws for repetitions 267
9.1 Rényi-Arimoto entropies 269
9.2 Generalized entropy rates 274
9.3 Recurrence times 277
9.4 Subword complexity 281
9.5 Two maximal lengths 288
9.6 Logarithmic power laws 293
Problems 299
10 AMS processes 301
10.1 AMS and pseudo-AMS measures 302
10.2 Quasi-periodic coding 305
10.3 Synchronizable coding 308
10.4 Entropy rate in the AMS case 310
Problems 314
11 Toy examples 317
11.1 Finite and ultra-finite energy 319
11.2 Santa Fe processes and the like 325
11.3 Encoding into a finite alphabet 334
11.4 Random hierarchical association 346
11.5 Towards better models 357
Problems 360
Future research 361
Index 363
Bibliography 366