The concept of a data lake is less than 10 years old, but data lakes are already widely implemented within large companies. Their goal is to deal efficiently with ever-growing volumes of heterogeneous data, while also meeting varied and sophisticated user needs. However, defining and building a data lake is still a challenge, as no consensus has been reached so far.
Data Lakes presents recent outcomes and trends in the field of data repositories. The main topics discussed are the data-driven architecture of a data lake; the management of metadata supplying key information about the stored data, master data and reference data; the roles of linked data and fog computing in a data lake ecosystem; and how gravity principles apply in the context of data lakes.
A variety of case studies are also presented, providing the reader with practical examples of data lake management.
1. Introduction to Data Lakes: Definitions and Discussions, Anne Laurent, Dominique Laurent and Cedrine Madera.
2. Architecture of Data Lakes, Houssem Chihoub, Cedrine Madera, Christoph Quix and Rihan Hai.
3. Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures, Marianne Huchard, Anne Laurent, Therese Libourel, Cedrine Madera and Andre Miralles.
4. Metadata in Data Lake Ecosystems, Asma Zgolli, Christine Collet and Cedrine Madera.
5. A Use Case of Data Lake Metadata Management, Imen Megdiche, Franck Ravat and Yan Zhao.
6. Master Data and Reference Data in Data Lake Ecosystems, Cedrine Madera.
7. Linked Data Principles for Data Lakes, Alessandro Adamou and Mathieu D'Aquin.
8. Fog Computing, Arnault Ioualalen.
9. The Gravity Principle in Data Lakes, Anne Laurent, Therese Libourel, Cedrine Madera and Andre Miralles.
Anne Laurent is a Full Professor at the University of Montpellier, France, and teaches at the Polytech Montpellier Engineering School. She is also a member of the LIRMM laboratory at the University of Montpellier, where she works on the semantic web, data mining, data warehousing, data lakes and fuzzy logic.
Dominique Laurent is Emeritus Professor at Cergy-Pontoise University, France. He is a member of the ETIS-CNRS laboratory and his main research interests include database theory, database updates, data mining and data warehousing.
Cedrine Madera is an Executive Information Architect at IBM, France. She holds a doctorate in data science and works on the evolution of information systems in close collaboration with academia.