Distributed, Ambient and Pervasive Interactions. Smart Environments, Ecosystems, and Cities
The two-volume set LNCS 13325 and 13326 constitutes the refereed proceedings of the 10th International Conference on Distributed, Ambient and Pervasive Interactions, DAPI 2022, held as part of the 24th International Conference, HCI International 2022, which took place during June-July 2022. The conference was held virtually due to the COVID-19 pandemic. The 58 papers of DAPI 2022 are organized in topical sections named for each volume. Part I: User Experience and Interaction Design for Smart Ecosystems; Smart Cities, Smart Islands, and Intelligent Urban Living; Smart Artifacts in Smart Environments; and Opportunities and Challenges for the Near Future Smart Environments. Part II: Smart Living in Pervasive IoT Ecosystems; Distributed, Ambient, and Pervasive Education and Learning; Distributed, Ambient, and Pervasive Well-being and Healthcare; and Smart Creativity and Art.
Extending Oracle Application Express with Oracle Cloud Features
This book shows Oracle Application Express (APEX) developers how to take advantage of Oracle Cloud Infrastructure (OCI) features for APEX that might otherwise go unnoticed. You will learn how to use OCI features for data science tasks such as detecting anomalies in your data, training machine learning models, and much more. The book provides an in-depth look at Oracle Cloud features and demonstrates how they can be easily integrated into an APEX application. While the book focuses on developing for APEX, the approaches covered are also applicable to any other modern web development framework for applications running on the OCI platform. For many organizations, the database is the heart of operations. Those who opt to invest in the Oracle Database can learn from this book how to maximize their return on investment. The book begins with an introduction to OCI and help with setting up your OCI developer environment. From there you begin with security, learning to provide single sign-on via the Oracle Identity Cloud Service. Subsequent chapters take you through cloud-focused features such as Object Storage, Oracle Functions, Oracle Machine Learning REST Services, and Oracle Cloud Anomaly Detection. You will even learn to troubleshoot email delivery services. What you will learn: be aware of Oracle Cloud Infrastructure features for developers; integrate with cloud native services such as cloud-based object storage and serverless functions; enhance APEX applications with machine learning features; implement natural language processing and anomaly detection algorithms; troubleshoot email delivery services when sending emails using the APEX_MAIL package; and design and implement an APEX environment that is secure. Who this book is for: APEX developers who are looking to extend their applications' capabilities using features and resources available through the Oracle Cloud, and cloud solutions architects who support development teams and help design and implement architectures that benefit business operations.
Data Spaces
This open access book aims to educate data space designers to understand what is required to create a successful data space. It explores cutting-edge theory, technologies, methodologies, and best practices for data spaces for both industrial and personal data, and provides the reader with a basis for understanding the design, deployment, and future directions of data spaces. The book captures the early lessons and experience in creating data spaces. It arranges these contributions into three parts covering design, deployment, and future directions, respectively. The first part explores the design space of data spaces. Its chapters detail organisational design for data spaces, data platforms, data governance, federated learning, personal data sharing, data marketplaces, and hybrid artificial intelligence for data spaces. The second part describes the use of data spaces within real-world deployments. Its chapters are co-authored with industry experts and include case studies of data spaces in sectors including Industry 4.0, food safety, FinTech, health care, and energy. The third and final part details future directions for data spaces, including challenges and opportunities for common European data spaces and privacy-preserving techniques for trustworthy data sharing. The book is of interest to two primary audiences: first, researchers interested in data management and data sharing, and second, practitioners and industry experts engaged in data-driven systems where the sharing and exchange of data within an ecosystem are critical.
Distributed, Ambient and Pervasive Interactions. Smart Living, Learning, Well-Being and Health, Art and Creativity
The two-volume set LNCS 13325 and 13326 constitutes the refereed proceedings of the 10th International Conference on Distributed, Ambient and Pervasive Interactions, DAPI 2022, held as part of the 24th International Conference, HCI International 2022, which took place during June-July 2022. The conference was held virtually due to the COVID-19 pandemic. The 58 papers of DAPI 2022 are organized in topical sections named for each volume. Part I: User Experience and Interaction Design for Smart Ecosystems; Smart Cities, Smart Islands, and Intelligent Urban Living; Smart Artifacts in Smart Environments; and Opportunities and Challenges for the Near Future Smart Environments. Part II: Smart Living in Pervasive IoT Ecosystems; Distributed, Ambient, and Pervasive Education and Learning; Distributed, Ambient, and Pervasive Well-being and Healthcare; and Smart Creativity and Art.
Database Principles and Technologies - Based on Huawei GaussDB
This open access book contains eight chapters that deal with database technologies, including the development history of databases, database fundamentals, an introduction to SQL syntax, the classification of SQL syntax, database security fundamentals, the database development environment, database design fundamentals, and the application of Huawei's cloud database product GaussDB. The book can be used as a textbook for database courses in colleges and universities, and is also suitable as a reference for the HCIA-GaussDB V1.5 certification examination. The Huawei GaussDB (for MySQL) used in the book is a Huawei cloud-based, high-performance, highly applicable relational database that fully supports the syntax and functionality of the open source database MySQL. All the experiments in this book can be run on this database platform. As the world's leading provider of ICT (information and communication technology) infrastructure and smart terminals, Huawei offers products ranging from digital data communication, cyber security, wireless technology, data storage, cloud computing, and smart computing to artificial intelligence.
Advances in Knowledge Discovery and Data Mining
The 3-volume set LNAI 13280, LNAI 13281 and LNAI 13282 constitutes the proceedings of the 26th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2022, which was held during May 2022 in Chengdu, China. The 121 papers included in the proceedings were carefully reviewed and selected from a total of 558 submissions. They were organized in topical sections as follows: Part I: Data Science and Big Data Technologies; Part II: Foundations; and Part III: Applications.
Database Systems for Advanced Applications
The three-volume set LNCS 13245, 13246 and 13247 constitutes the proceedings of the 27th International Conference on Database Systems for Advanced Applications, DASFAA 2022, held online in April 2022. The 72 full papers and 76 short papers presented in this three-volume set were carefully reviewed and selected from 543 submissions. Additionally, 13 industrial papers, 9 demo papers and 2 PhD consortium papers are included. The conference was planned to take place in Hyderabad, India, but was held virtually due to the COVID-19 pandemic.
World Yearbook of Education 2021
Providing a comprehensive introduction to the topic of accountability and datafication in the governance of education, the World Yearbook of Education 2021 considers global policy dynamics and policy enactment processes. Chapters pay particular attention to the role of international organizations and the private sector in the promotion of performance-based accountability (PBA) in different educational settings and at multiple policy scales. Organized into three sections, chapters cover: the global/local construction of accountability and datafication; global discourse and national translations of performance-based accountability policies; and enactments and effects of accountability and datafication, including controversies and critical issues. With carefully chosen international contributions from around the globe, the World Yearbook of Education 2021 is ideal reading for anyone interested in the future of accountability and datafication in the governance of education.
Enterprise Architecture and Cartography
This textbook provides guidance to both students and practitioners of enterprise architecture (EA) on how to develop and maintain enterprise models. Rather than providing yet another A-to-Z list of EA notations and frameworks, it focuses on methods for performing such tasks. The problem of EA maintenance, here named enterprise cartography, is an important aspect addressed in this book, because EA is a never-ending challenge that grows as the pace of organizational transformation increases. The long time perspective also entails the evolution of architectural frameworks and notations, something that does not occur when developing new models. Thus, a catalogue of patterns, principles, and methods is presented for developing and maintaining EA models and views. After a general introduction to the book in Chapter 1, Chapter 2 presents basic concepts for EA modeling. Chapter 3 further details the set of EA concepts needed to present the patterns and principles, which are subsequently introduced in Chapter 4. Next, Chapter 5 describes enterprise cartography concepts and principles. The remainder of the book then turns to techniques and methodologies. In Chapter 6 an EA development method is summarized. In Chapter 7 an enterprise strategy design approach is proposed, while in Chapter 8 a business process design methodology is described. Chapters 9 and 10 focus on information architecture and information systems architecture design approaches, including information systems architecture planning and application portfolio management. Chapter 11 then describes a method for enterprise cartography (EC) design. Last but not least, several case studies on EA and EC are presented in the final chapter.
Decidability of Parameterized Verification
While the classic model checking problem is to decide whether a finite system satisfies a specification, the goal of parameterized model checking is to decide, given finite systems 𝒮(n) parameterized by n ∈ ℕ, whether, for all n ∈ ℕ, the system 𝒮(n) satisfies a specification. In this book we consider the important case of 𝒮(n) being a concurrent system, where the number of replicated processes depends on the parameter n but each process is independent of n. Examples are cache coherence protocols, networks of finite-state agents, and systems that solve mutual exclusion or scheduling problems. Further examples are abstractions of systems, where the processes of the original systems actually depend on the parameter. The literature in this area has studied a wealth of computational models based on a variety of synchronization and communication primitives, including token passing, broadcast, and guarded transitions. Often, different terminology is used in the literature, and results are based on implicit assumptions. In this book, we introduce a computational model that unites the central synchronization and communication primitives of many models, and unveils hidden assumptions from the literature. We survey existing decidability and undecidability results, and give a systematic view of the basic problems in this exciting research area.
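Stated compactly (a restatement of the definition above; the parallel-composition notation is this sketch's assumption, not necessarily the book's):

```latex
% Parameterized model checking, stated as a decision problem:
\[
\text{Given } \big(\mathcal{S}(n)\big)_{n\in\mathbb{N}} \text{ and a specification } \varphi,
\text{ decide whether } \forall n \in \mathbb{N}:\ \mathcal{S}(n) \models \varphi .
\]
% For the concurrent case treated in the book, S(n) is the parallel
% composition of n replicated processes, each independent of n:
\[
\mathcal{S}(n) \;=\; P_1 \parallel P_2 \parallel \cdots \parallel P_n .
\]
```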
Distributed Graph Coloring
The focus of this monograph is on symmetry breaking problems in the message-passing model of distributed computing. In this model a communication network is represented by an n-vertex graph G = (V, E), whose vertices host autonomous processors. The processors communicate over the edges of G in discrete rounds. The goal is to devise algorithms that use as few rounds as possible. A typical symmetry-breaking problem is the problem of graph coloring. Denote by Δ the maximum degree of G. While coloring G with Δ + 1 colors is trivial in the centralized setting, the problem becomes much more challenging in the distributed one. One can also compromise on the number of colors, if this allows for more efficient algorithms. Other typical symmetry-breaking problems are the problems of computing a maximal independent set (MIS) and a maximal matching (MM). The study of these problems dates back to the very early days of distributed computing. The founding fathers of distributed computing laid firm foundations for the area of distributed symmetry breaking already in the eighties. In particular, they showed that all these problems can be solved in randomized logarithmic time. Also, Linial showed that an O(Δ²)-coloring can be computed very efficiently deterministically. However, fundamental questions were left open for decades. In particular, it is not known if MIS or (Δ + 1)-coloring can be solved in deterministic polylogarithmic time. Moreover, until recently it was not known if in deterministic polylogarithmic time one can color a graph with significantly fewer than Δ² colors. Additionally, it was open (and still open to some extent) if one can have sublogarithmic randomized algorithms for the symmetry breaking problems. Recently, significant progress was achieved in the study of these questions. More efficient deterministic and randomized (Δ + 1)-coloring algorithms were achieved. Deterministic Δ^(1+o(1))-coloring algorithms with polylogarithmic running time were devised. Improved (and often sublogarithmic-time) randomized algorithms were devised. Drastically improved lower bounds were given. Wide families of graphs in which these problems are solvable much faster than on general graphs were identified. The objective of our monograph is to cover most of these developments, and as a result to provide a treatise on theoretical foundations of distributed symmetry breaking in the message-passing model. We hope that our monograph will stimulate further progress in this exciting area.
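As a taste of the subject, here is a minimal Python sketch (not from the book) of one round of the classic randomized (Δ + 1)-coloring trial: each uncolored vertex proposes a color from its remaining palette and keeps it only if no competing neighbor proposed the same one. The graph encoding and function names are invented for illustration.

```python
import random

def trial_round(graph, color, palette_size):
    """One round of the classic randomized (Delta+1)-coloring trial.

    graph:        dict vertex -> set of neighbors
    color:        dict vertex -> fixed color, or None if still uncolored
    palette_size: Delta + 1, where Delta is the maximum degree of G
    """
    # Every uncolored vertex proposes a color from its palette,
    # excluding colors already fixed by neighbors. The palette has
    # Delta + 1 colors and at most Delta neighbors, so it is never empty.
    tentative = {}
    for v in graph:
        if color[v] is None:
            taken = {color[u] for u in graph[v] if color[u] is not None}
            tentative[v] = random.choice(
                [c for c in range(palette_size) if c not in taken])

    # A vertex keeps its proposal only if no competing (still
    # uncolored) neighbor proposed the same color in this round.
    for v, c in tentative.items():
        if all(tentative.get(u) != c for u in graph[v]):
            color[v] = c

# Example: color a 4-cycle with Delta + 1 = 3 colors.
G = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
colors = {v: None for v in G}
while any(c is None for c in colors.values()):
    trial_round(G, colors, palette_size=3)
print(colors)   # a proper 3-coloring; terminates with probability 1
```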
Distributed Computing by Oblivious Mobile Robots
The study of what can be computed by a team of autonomous mobile robots, originally started in robotics and AI, has become increasingly popular in theoretical computer science (especially in distributed computing), where it is now an integral part of the investigations on computability by mobile entities. The robots are identical computational entities located and able to move in a spatial universe; they operate without explicit communication and are usually unable to remember the past; they are extremely simple, with limited resources, and individually quite weak. However, collectively the robots are capable of performing complex tasks, and form a system with desirable fault-tolerant and self-stabilizing properties. The research has been concerned with the computational aspects of such systems. In particular, the focus has been on the minimal capabilities that the robots should have in order to solve a problem. This book focuses on the recent algorithmic results in the field of distributed computing by oblivious mobile robots (unable to remember the past). After introducing the computational model with its nuances, we focus on basic coordination problems: pattern formation, gathering, scattering, leader election, as well as on dynamic tasks such as flocking. For each of these problems, we provide a snapshot of the state of the art, reviewing the existing algorithmic results. In doing so, we outline solution techniques, and we analyze the impact of the different assumptions on the robots' computability power. Table of Contents: Introduction / Computational Models / Gathering and Convergence / Pattern Formation / Scatterings and Coverings / Flocking / Other Directions
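To illustrate the Look-Compute-Move style of algorithm the book analyzes, here is a toy Python sketch (an assumption-laden simplification, not the book's model): under a fully synchronous scheduler with rigid moves, the center-of-gravity rule gathers the robots in one step; the subtleties studied in the literature arise precisely when these assumptions are dropped.

```python
def cog_step(positions):
    """One fully synchronous Look-Compute-Move step of the
    center-of-gravity rule: every oblivious robot observes the current
    configuration (Look), computes its centroid (Compute), and moves
    there (Move). With rigid moves and full synchrony the robots gather
    immediately; the interesting cases are the semi-synchronous and
    asynchronous schedulers and non-rigid moves."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    return [(cx, cy)] * n

print(cog_step([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]))  # all at (1.0, 1.0)
```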
IoT and Big Data Analytics for Smart Cities
The book IoT and Big Data Analytics (IoT-BDA) for Smart Cities - A Global Perspective emphasizes the challenges, architectural models, and intelligent frameworks with smart decision-making systems using Big Data and IoT, with case studies. The book illustrates the benefits of Big Data and IoT methods in framing smart systems for smart applications. The text is a coordinated amalgamation of research contributions and industrial applications in the field of smart cities. Features: establishes the necessity of converging Big Data Analytics and IoT techniques in smart city applications; discusses the challenges and roles of IoT and Big Data in smart city applications; presents Big Data-IoT intelligent smart systems from a global perspective; provides a predictive framework that can handle traffic on abnormal days, such as weekends and festival holidays; offers solutions and ideas for smart traffic development in smart cities; and gives a brief overview of the available algorithms and techniques of Big Data and IoT, with guidance on developing solutions for smart city applications. This book is primarily aimed at IT professionals. Undergraduates, graduates, and researchers in the area of computer science and information technology will also find this book useful.
Advanced Information Systems Engineering
This book constitutes the refereed proceedings of the 34th International Conference on Advanced Information Systems Engineering, CAiSE 2022, which was held in Leuven, Belgium, during June 6-10, 2022. The 31 full papers included in these proceedings were selected from 203 submissions. They were organized in topical sections as follows: process mining; sustainable and explainable applications; tools and methods to support research and design; process modeling; natural language processing techniques in IS engineering; process monitoring and simulation; graph and network models; model analysis and comprehension; recommender systems; conceptual models, metamodels and taxonomies; and services engineering and digitalization.
Implementation and Application of Automata
This book constitutes the proceedings of the 26th International Conference on Implementation and Application of Automata, CIAA 2022, held in Rouen, France, in June/July 2022. The 16 regular papers presented in this book, together with 3 invited lectures, were carefully reviewed and selected from 26 submissions. The papers cover various fields in the application, implementation, and theory of automata and related structures.
Distributed Computing Pearls
Computers and computer networks are among the most incredible inventions of the 20th century, having an ever-expanding role in our daily lives by enabling complex human activities in areas such as entertainment, education, and commerce. One of the most challenging problems in computer science for the 21st century is to improve the design of distributed systems, where computing devices have to work together as a team to achieve common goals. In this book, I have tried to gently introduce the general reader to some of the most fundamental issues and classical results of computer science underlying the design of algorithms for distributed systems, so that the reader can get a feel for the nature of this exciting and fascinating field called distributed computing. The book will appeal to the educated layperson and requires no computer-related background. I strongly suspect that most computer-knowledgeable readers, too, will be able to learn something new.
Introduction to Distributed Self-Stabilizing Algorithms
This book aims to be a comprehensive and pedagogical introduction to the concept of self-stabilization, introduced by Edsger Wybe Dijkstra in 1973. Self-stabilization characterizes the ability of a distributed algorithm to converge within finite time to a configuration from which its behavior is correct (i.e., satisfies a given specification), regardless of the arbitrary initial configuration of the system. This arbitrary initial configuration may be the result of the occurrence of a finite number of transient faults. Hence, self-stabilization is actually considered a versatile non-masking fault tolerance approach, since it recovers from the effect of any finite number of such faults in a unified manner. Another major interest of such an automatic recovery method comes from the difficulty of resetting malfunctioning devices in a large-scale (and so, geographically spread) distributed system (the Internet, peer-to-peer networks, and delay-tolerant networks are examples of such distributed systems). Furthermore, self-stabilization is usually recognized as a lightweight property to achieve fault tolerance as compared to other classical fault tolerance approaches. Indeed, the overhead, both in terms of time and space, of state-of-the-art self-stabilizing algorithms is commonly small. This makes self-stabilization very attractive for distributed systems equipped with processes of low computational and memory capabilities, such as wireless sensor networks. After more than 40 years of existence, self-stabilization is now sufficiently established as an important field of research in theoretical distributed computing to justify its teaching in advanced research-oriented graduate courses. This book is an initiation course, which consists of the formal definition of self-stabilization and its related concepts, followed by a deep review and study of classical (simple) algorithms, commonly used proof schemes and design patterns, as well as premium results from the self-stabilizing community. As often happens in the self-stabilizing area, in this book we focus on the proof of correctness and the analytical complexity of the studied distributed self-stabilizing algorithms. Finally, we underline that most of the algorithms studied in this book are actually dedicated to the high-level atomic-state model, which is the most commonly used computational model in the self-stabilizing area. However, in the last chapter, we present general techniques to achieve self-stabilization in the low-level message passing model, as well as example algorithms.
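To give the flavor, here is a Python sketch of Dijkstra's 1973 K-state token ring, the seminal self-stabilizing algorithm. For simplicity the sketch sweeps the ring in index order each round, whereas Dijkstra's model has a scheduler execute one enabled machine at a time; the stabilization phenomenon is the same.

```python
def stabilizing_round(S, K):
    """One sweep of Dijkstra's K-state token ring (1973). S[i] is the
    state of machine i on a unidirectional ring; K > len(S) guarantees
    stabilization. A machine "holds the token" when its guard is
    enabled. Dijkstra's model fires one enabled machine at a time; the
    in-order sweep here is just a simple demonstration schedule."""
    n = len(S)
    if S[0] == S[n - 1]:             # bottom machine: increment mod K
        S[0] = (S[0] + 1) % K
    for i in range(1, n):            # other machines: copy predecessor
        if S[i] != S[i - 1]:
            S[i] = S[i - 1]
    return S

S = [3, 1, 4, 1, 5]                  # arbitrary, possibly corrupted state
for _ in range(3):
    S = stabilizing_round(S, K=7)
print(S)   # uniform states: exactly one guard enabled, one token circulates
```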
Network Topology and Fault-Tolerant Consensus
As the structure of contemporary communication networks grows more complex, practical networked distributed systems become prone to component failures. Fault-tolerant consensus in message-passing systems allows participants in the system to agree on a common value despite the malfunction or misbehavior of some components. It is a task of fundamental importance for distributed computing, due to its numerous applications. We summarize studies on the topological conditions that determine the feasibility of consensus, mainly focusing on directed networks and the case of restricted topology knowledge at each participant. Recently, significant efforts have been devoted to fully characterizing the underlying communication networks in which variations of fault-tolerant consensus can be achieved. Although the deduction of analogous topological conditions for undirected networks of known topology followed shortly after the introduction of the problem, their extension to the directed network case has proven to be a highly non-trivial task. Moreover, global knowledge restrictions, inherent in modern large-scale networks, require more elaborate arguments concerning the locality of distributed computations. In this work, we present the techniques and ideas used to resolve these issues. Recent studies indicate a number of parameters that affect the topological conditions under which consensus can be achieved, namely, the fault model, the degree of system synchrony (synchronous vs. asynchronous), the type of agreement (exact vs. approximate), the level of topology knowledge, and the algorithm class used (general vs. iterative). We outline the feasibility and impossibility results for various combinations of the above parameters, extensively illustrating the relation between network topology and consensus.
State-Space Control Systems
These days, nearly all engineering problems are solved with the aid of suitable computer packages. This book shows how MATLAB/Simulink can be used to solve state-space control problems. It is assumed that you are familiar with the theory and concepts of state-space control, i.e., you took or are taking a course on state-space control systems, and you are reading this book to learn how to solve state-space control problems with the aid of MATLAB/Simulink. The book is composed of three chapters. Chapter 1 shows how a state-space mathematical model can be entered into the MATLAB/Simulink environment. Chapter 2 shows how a nonlinear system can be linearized around a desired operating point with the aid of tools provided by MATLAB/Simulink. Finally, Chapter 3 shows how a state-space controller can be designed with the aid of MATLAB and tested with Simulink. The book will be useful for students and practicing engineers who want to design a state-space control system.
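The book itself works in MATLAB/Simulink; as a rough Python/SciPy analogue of the Chapter 1 task (entering a state-space model and simulating it), consider this sketch with an invented plant:

```python
import numpy as np
from scipy import signal

# Invented second-order plant: x' = A x + B u, y = C x + D u.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

sys = signal.StateSpace(A, B, C, D)  # the state-space LTI model
t, y = signal.step(sys)              # simulate the unit-step response
print(y[-1])                         # ~0.5, the DC gain D - C A^{-1} B
```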
Feedback Control Systems
Feedback Control Systems is an important course in aerospace engineering, chemical engineering, electrical engineering, mechanical engineering, and mechatronics engineering, to name just a few. Feedback control systems improve the system's behavior so that the desired response can be achieved. The first course on control engineering deals with Continuous Time (CT) Linear Time Invariant (LTI) systems. Plenty of good textbooks on the subject are available on the market, so there is no need to add one more. This book does not focus on control engineering theories, as it is assumed that the reader is familiar with them, i.e., took or is taking a course on control engineering, and now wants to learn the applications of MATLAB® in control engineering. The focus of this book is control engineering applications of MATLAB® for a first course on control engineering.
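In the same spirit (the book uses MATLAB®; this is a hedged Python/SciPy stand-in), here is the closing of a unity negative-feedback loop around an invented plant G(s) = 1/(s² + s):

```python
import numpy as np
from scipy import signal

# Invented CT LTI plant G(s) = 1 / (s^2 + s); unity negative feedback
# yields the closed loop T(s) = G/(1 + G) = 1 / (s^2 + s + 1).
num = [1.0]
den = [1.0, 1.0, 0.0]

# For a SISO plant in unity feedback, the closed-loop denominator is
# den + num (np.polyadd right-aligns coefficients of unequal length).
closed = signal.TransferFunction(num, np.polyadd(den, num))

t, y = signal.step(closed)
print(y[-1])   # approaches 1.0: the integrator removes steady-state error
```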
Control Systems Synthesis
This book introduces the so-called "stable factorization approach" to the synthesis of feedback controllers for linear control systems. The key to this approach is to view the multi-input, multi-output (MIMO) plant for which one wishes to design a controller as a matrix over the fraction field F associated with a commutative ring with identity, denoted by R, which also has no divisors of zero. In this setting, the set of single-input, single-output (SISO) stable control systems is precisely the ring R, while the set of stable MIMO control systems is the set of matrices whose elements all belong to R. The set of unstable, meaning not necessarily stable, control systems is then taken to be the field of fractions F associated with R in the SISO case, and the set of matrices with elements in F in the MIMO case. The central notion introduced in the book is that, in most situations of practical interest, every matrix P whose elements belong to F can be "factored" as a "ratio" of two matrices N, D whose elements belong to R, in such a way that N, D are coprime. In the familiar case where the ring R corresponds to the set of bounded-input, bounded-output (BIBO)-stable rational transfer functions, coprimeness is equivalent to two functions not having any common zeros in the closed right half-plane including infinity. However, the notion of coprimeness extends readily to discrete-time systems, distributed-parameter systems in both the continuous- as well as discrete-time domains, and to multi-dimensional systems. Thus the stable factorization approach enables one to capture all these situations within a common framework. The key result in the stable factorization approach is the parametrization of all controllers that stabilize a given plant. It is shown that the set of all stabilizing controllers can be parametrized by a single parameter R, whose elements all belong to R. Moreover, every transfer matrix in the closed-loop system is an affine function of the design parameter R. Thus problems of reliable stabilization, disturbance rejection, robust stabilization etc. can all be formulated in terms of choosing an appropriate R. This is a reprint of the book Control System Synthesis: A Factorization Approach originally published by M.I.T. Press in 1985. Table of Contents: Filtering and Sensitivity Minimization / Robustness / Extensions to General Settings
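A concrete, textbook-style instance of the factorization idea over the ring of proper stable rational functions (standard material, not quoted from this book):

```latex
% The unstable SISO plant P(s) = 1/(s-1), viewed over the ring R of
% proper stable rational functions, admits the stable factorization
\[
P(s) \;=\; \frac{N(s)}{D(s)}, \qquad
N(s) = \frac{1}{s+1}, \qquad
D(s) = \frac{s-1}{s+1}, \qquad N, D \in R .
\]
% N and D are coprime: they satisfy a Bezout identity over R,
\[
X(s)N(s) + Y(s)D(s) = 1
\quad \text{with } X(s) = 2,\; Y(s) = 1,
\]
% since 2/(s+1) + (s-1)/(s+1) = (s+1)/(s+1) = 1.
```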
Control Systems Synthesis
This book introduces the so-called "stable factorization approach" to the synthesis of feedback controllers for linear control systems. The key to this approach is to view the multi-input, multi-output (MIMO) plant for which one wishes to design a controller as a matrix over the fraction field F associated with a commutative ring with identity, denoted by R, which also has no divisors of zero. In this setting, the set of single-input, single-output (SISO) stable control systems is precisely the ring R, while the set of stable MIMO control systems is the set of matrices whose elements all belong to R. The set of unstable, meaning not necessarily stable, control systems is then taken to be the field of fractions F associated with R in the SISO case, and the set of matrices with elements in F in the MIMO case. The central notion introduced in the book is that, in most situations of practical interest, every matrix P whose elements belong to F can be "factored" as a "ratio" of two matrices N, D whose elements belong to R, in such a way that N, D are coprime. In the familiar case where the ring R corresponds to the set of bounded-input, bounded-output (BIBO)-stable rational transfer functions, coprimeness is equivalent to two functions not having any common zeros in the closed right half-plane including infinity. However, the notion of coprimeness extends readily to discrete-time systems, distributed-parameter systems in both the continuous- as well as discrete-time domains, and to multi-dimensional systems. Thus the stable factorization approach enables one to capture all these situations within a common framework. The key result in the stable factorization approach is the parametrization of all controllers that stabilize a given plant. It is shown that the set of all stabilizing controllers can be parametrized by a single parameter R, whose elements all belong to R. Moreover, every transfer matrix in the closed-loop system is an affine function of the design parameter R. Thus problems of reliable stabilization, disturbance rejection, robust stabilization etc. can all be formulated in terms of choosing an appropriate R. This is a reprint of the book Control System Synthesis: A Factorization Approach originally published by M.I.T. Press in 1985. Table of Contents: Introduction / Proper Stable Rational Functions / Scalar Systems: An Introduction / Matrix Rings / Stabilization
Visual Analysis of Multilayer Networks
The emergence of multilayer networks as a concept from the field of complex systems provides many new opportunities for the visualization of network complexity, and has also raised many new exciting challenges. The multilayer network model recognizes that the complexity of relationships between entities in real-world systems is better embraced as several interdependent subsystems (or layers) rather than a simple graph approach. Despite only recently being formalized and defined, this model can be applied to problems in the domains of life sciences, sociology, digital humanities, and more. Within the domain of network visualization there already are many existing systems, which visualize data sets having many characteristics of multilayer networks, and many techniques, which are applicable to their visualization. In this Synthesis Lecture, we provide an overview and structured analysis of contemporary multilayer network visualization. This is not only for researchers in visualization, but also for those who aim to visualize multilayer networks in the domain of complex systems, as well as those solving problems within application domains. We have explored the visualization literature to survey visualization techniques suitable for multilayer network visualization, as well as tools, tasks, and analytic techniques from within application domains. We also identify the research opportunities and examine outstanding challenges for multilayer network visualization along with potential solutions and future research directions for addressing them.
Data Stream Management
Many applications process high volumes of streaming data, among them Internet traffic analysis, financial tickers, and transaction log mining. In general, a data stream is an unbounded data set that is produced incrementally over time, rather than being available in full before its processing begins. In this lecture, we give an overview of recent research in stream processing, ranging from answering simple queries on high-speed streams to loading real-time data feeds into a streaming warehouse for off-line analysis. We will discuss two types of systems for end-to-end stream processing: Data Stream Management Systems (DSMSs) and Streaming Data Warehouses (SDWs). A traditional database management system typically processes a stream of ad-hoc queries over relatively static data. In contrast, a DSMS evaluates static (long-running) queries on streaming data, making a single pass over the data and using limited working memory. In the first part of this lecture, we will discuss research problems in DSMSs, such as continuous query languages, non-blocking query operators that continually react to new data, and continuous query optimization. The second part covers SDWs, which combine the real-time response of a DSMS (loading new data as soon as they arrive) with a data warehouse's ability to manage terabytes of historical data on secondary storage. Table of Contents: Introduction / Data Stream Management Systems / Streaming Data Warehouses / Conclusions
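To make the DSMS idea concrete, here is a tiny Python sketch (invented for illustration, not from the lecture) of a non-blocking windowed operator that answers a continuous query in a single pass with bounded memory:

```python
from collections import deque

class WindowedAverage:
    """Toy non-blocking stream operator: the average of the last `size`
    items, maintained in one pass with O(size) working memory, emitting
    an updated answer as each tuple arrives (a continuous query)."""
    def __init__(self, size):
        self.window, self.size, self.total = deque(), size, 0.0

    def push(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:       # expire the oldest tuple
            self.total -= self.window.popleft()
        return self.total / len(self.window)   # answer emitted immediately

op = WindowedAverage(size=3)
print([op.push(x) for x in [3, 5, 7, 9, 11]])  # [3.0, 4.0, 5.0, 7.0, 9.0]
```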
Articulation and Intelligibility
Immediately following the Second World War, between 1947 and 1955, several classic papers quantified the fundamentals of human speech information processing and recognition. In 1947 French and Steinberg published their classic study on the articulation index. In 1948 Claude Shannon published his famous work on the theory of information. In 1950 Fletcher and Galt published their theory of the articulation index, a theory that Fletcher had worked on for 30 years, which integrated his classic works on loudness and speech perception with models of speech intelligibility. In 1951 George Miller then wrote the first book Language and Communication, analyzing human speech communication with Claude Shannon's just published theory of information. Finally in 1955 George Miller published the first extensive analysis of phone decoding, in the form of confusion matrices, as a function of the speech-to-noise ratio. This work extended the Bell Labs' speech articulation studies with ideas from Shannon's Information theory. Both Miller and Fletcher showed that speech, as a code, is incredibly robust to mangling distortions of filtering and noise. Regrettably much of this early work was forgotten. While the key science of information theory blossomed, other than the work of George Miller, it was rarely applied to aural speech research. The robustness of speech, which is the most amazing thing about the speech code, has rarely been studied. It is my belief (i.e., assumption) that we can analyze speech intelligibility with the scientific method. The quantitative analysis of speech intelligibility requires both science and art. The scientific component requires an error analysis of spoken communication, which depends critically on the use of statistics, information theory, and psychophysical methods. The artistic component depends on knowing how to restrict the problem in such a way that progress may be made. It is critical to tease out the relevant from the irrelevant and dig for the key issues. This will focus us on the decoding of nonsense phonemes with no visual component, which have been mangled by filtering and noise. This monograph is a summary and theory of human speech recognition. It builds on and integrates the work of Fletcher, Miller, and Shannon. The long-term goal is to develop a quantitative theory for predicting the recognition of speech sounds. In Chapter 2 the theory is developed for maximum entropy (MaxEnt) speech sounds, also called nonsense speech. In Chapter 3, context is factored in. The book is largely reflective, and quantitative, with a secondary goal of providing an historical context, along with the many deep insights found in these early works.
Latent Semantic Mapping
Latent semantic mapping (LSM) is a generalization of latent semantic analysis (LSA), a paradigm originally developed to capture hidden word patterns in a text document corpus. In information retrieval, LSA enables retrieval on the basis of conceptual content, instead of merely matching words between queries and documents. It operates under the assumption that there is some latent semantic structure in the data, which is partially obscured by the randomness of word choice with respect to retrieval. Algebraic and/or statistical techniques are brought to bear to estimate this structure and get rid of the obscuring "noise." This results in a parsimonious continuous parameter description of words and documents, which then replaces the original parameterization in indexing and retrieval. This approach exhibits three main characteristics: discrete entities (words and documents) are mapped onto a continuous vector space; this mapping is determined by global correlation patterns; and dimensionality reduction is an integral part of the process. Such fairly generic properties are advantageous in a variety of different contexts, which motivates a broader interpretation of the underlying paradigm. The outcome (LSM) is a data-driven framework for modeling meaningful global relationships implicit in large volumes of (not necessarily textual) data. This monograph gives a general overview of the framework, and underscores the multifaceted benefits it can bring to a number of problems in natural language understanding and spoken language processing. It concludes with a discussion of the inherent tradeoffs associated with the approach, and some perspectives on its general applicability to data-driven information extraction. Contents: I. Principles / Introduction / Latent Semantic Mapping / LSM Feature Space / Computational Effort / Probabilistic Extensions / II. Applications / Junk E-mail Filtering / Semantic Classification / Language Modeling / Pronunciation Modeling / Speaker Verification / TTS Unit Selection / III. Perspectives / Discussion / Conclusion / Bibliography
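At its core, the LSA/LSM mapping is a truncated singular value decomposition. A minimal Python sketch, with an invented toy term-document matrix:

```python
import numpy as np

# Toy term-document matrix (rows: terms, columns: documents); the
# counts are invented purely for illustration.
X = np.array([[2, 0, 1, 0],
              [1, 3, 0, 0],
              [0, 1, 0, 2],
              [0, 0, 2, 1]], dtype=float)

# Truncated SVD: X ~ U_k S_k V_k^T maps terms and documents into one
# k-dimensional continuous vector space (the "LSM feature space").
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T    # one k-dim vector per document

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(doc_vecs[0], doc_vecs[1]))   # conceptual doc-doc similarity
```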
Natural Language Processing for Historical Texts
More and more historical texts are becoming available in digital form. Digitization of paper documents is motivated by the aim of preserving cultural heritage and making it more accessible, both to laypeople and scholars. As digital images cannot be searched for text, digitization projects increasingly strive to create digital text, which can be searched and otherwise automatically processed, in addition to facsimiles. Indeed, the emerging field of digital humanities heavily relies on the availability of digital text for its studies. Together with the increasing availability of historical texts in digital form, there is a growing interest in applying natural language processing (NLP) methods and tools to historical texts. However, the specific linguistic properties of historical texts -- the lack of standardized orthography, in particular -- pose special challenges for NLP. This book aims to give an introduction to NLP for historical texts and an overview of the state of the art in this field. The book starts with an overview of methods for the acquisition of historical texts (scanning and OCR), discusses text encoding and annotation schemes, and presents examples of corpora of historical texts in a variety of languages. The book then discusses specific methods, such as creating part-of-speech taggers for historical languages or handling spelling variation. A final chapter analyzes the relationship between NLP and the digital humanities. Certain recently emerging textual genres, such as SMS, social media, and chat messages, or newsgroup and forum postings share a number of properties with historical texts, for example, nonstandard orthography and grammar, and profuse use of abbreviations. The methods and techniques required for the effective processing of historical texts are thus also of interest for research in other domains. Table of Contents: Introduction / NLP and Digital Humanities / Spelling in Historical Texts / Acquiring Historical Texts / Text Encoding and Annotation Schemes / Handling Spelling Variation / NLP Tools for Historical Languages / Historical Corpora / Conclusion / Bibliography
User-Centered Data Management
This lecture covers several core issues in user-centered data management, including how to design usable interfaces that suitably support database tasks, and relevant approaches to visual querying, information visualization, and visual data mining. Novel interaction paradigms, e.g., mobile and interfaces that go beyond the visual dimension, are also discussed. Table of Contents: Why User-Centered / The Early Days: Visual Query Systems / Beyond Querying / More Advanced Applications / Non-Visual Interfaces / Conclusions
Automatic Parallelization
Compiling for parallelism is a longstanding topic of compiler research. This book describes the fundamental principles of compiling "regular" numerical programs for parallelism. We begin with an explanation of analyses that allow a compiler to understand the interaction of data reads and writes in different statements and loop iterations during program execution. These analyses include dependence analysis, use-def analysis and pointer analysis. Next, we describe how the results of these analyses are used to enable transformations that make loops more amenable to parallelization, and discuss transformations that expose parallelism to target shared memory multicore and vector processors. We then discuss some problems that arise when parallelizing programs for execution on distributed memory machines. Finally, we conclude with an overview of solving Diophantine equations and suggestions for further readings in the topics of this book to enable the interested reader to delve deeper into the field. Table of Contents: Introduction and overview / Dependence analysis, dependence graphs and alias analysis / Program parallelization / Transformations to modify and eliminate dependences / Transformation of iterative and recursive constructs / Compiling for distributed memory machines / Solving Diophantine equations / A guide to further reading
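As a small illustration of the Diophantine machinery behind dependence analysis, here is a hedged Python sketch of the classical GCD test (standard compiler-textbook material; the function name and loop examples are invented):

```python
from math import gcd

def gcd_test(a, b, c):
    """GCD dependence test: accesses A[a*i] and A[b*j + c] can refer to
    the same element only if a*i - b*j = c has an integer solution,
    which holds iff gcd(a, b) divides c. The test is conservative:
    True means "dependence possible", not "dependence certain"."""
    return c % gcd(a, b) == 0

# A[2*i] written, A[2*i + 1] read: gcd(2, 2) = 2 does not divide 1,
# so the loop provably carries no dependence between the two accesses.
print(gcd_test(2, 2, 1))   # False -> safe to parallelize/vectorize
# A[2*i] written, A[3*i + 1] read: gcd(2, 3) = 1 divides 1.
print(gcd_test(2, 3, 1))   # True -> a dependence may exist
```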
Sentiment Analysis and Opinion Mining
Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online. Table of Contents: Preface / Sentiment Analysis: A Fascinating Problem / The Problem of Sentiment Analysis / Document Sentiment Classification / Sentence Subjectivity and Sentiment Classification / Aspect-Based Sentiment Analysis / Sentiment Lexicon Generation / Opinion Summarization / Analysis of Comparative Opinions / Opinion Search and Retrieval / Opinion Spam Detection / Quality of Reviews / Concluding Remarks / Bibliography / Author Biography
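As a minimal taste of the document-level techniques the book surveys, here is a toy lexicon-based scorer in Python; the lexicon and negation rule are deliberately simplistic stand-ins, not the book's methods:

```python
LEXICON = {"good": 1, "great": 2, "bad": -1, "awful": -2,
           "love": 2, "hate": -2}        # toy sentiment lexicon
NEGATORS = {"not", "never", "no"}

def score(text):
    """Document-level polarity as the sum of word scores, with a crude
    one-token negation rule; > 0 positive, < 0 negative."""
    tokens = text.lower().split()
    total = 0
    for i, tok in enumerate(tokens):
        s = LEXICON.get(tok.strip(".,!?"), 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            s = -s                       # "not good" counts as negative
        total += s
    return total

print(score("The camera is great but the battery is not good"))  # 2 - 1 = 1
```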
Uncertain Schema Matching
Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. Schema matching is one of the basic operations required by the process of data and schema integration, and thus has a great effect on its outcomes, whether these involve targeted content delivery, view integration, database integration, query rewriting over heterogeneous sources, duplicate data elimination, or automatic streamlining of workflow activities that involve heterogeneous data sources. Although schema matching research has been ongoing for over 25 years, more recently a realization has emerged that schema matchers are inherently uncertain. Since 2003, work on the uncertainty in schema matching has picked up, along with research on uncertainty in other areas of data management. This lecture presents various aspects of uncertainty in schema matching within a single unified framework. We introduce basic formulations of uncertainty and provide several alternative representations of schema matching uncertainty. Then, we cover two common methods that have been proposed to deal with uncertainty in schema matching, namely ensembles and top-K matchings, and analyze them in this context. We conclude with a set of real-world applications. Table of Contents: Introduction / Models of Uncertainty / Modeling Uncertain Schema Matching / Schema Matcher Ensembles / Top-K Schema Matchings / Applications / Conclusions and Future Work
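A similarity matrix is the common representation of uncertain matches. The following Python sketch (string similarity and schema names invented for illustration) ranks top-K candidate correspondences; note that the book's top-K matchings rank complete matchings, a richer notion than individual pairs:

```python
from difflib import SequenceMatcher

def name_sim(a, b):
    """Character-level similarity in [0, 1] as a stand-in matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

source = ["CustName", "Addr", "Phone"]
target = ["CustomerName", "Address", "Telephone", "Fax"]

# The similarity matrix: entry (i, j) is the matcher's degree of belief
# that source attribute i corresponds to target attribute j.
M = [[name_sim(s, t) for t in target] for s in source]

# Top-K candidate correspondences, ranked by similarity.
pairs = sorted(((M[i][j], s, t)
                for i, s in enumerate(source)
                for j, t in enumerate(target)), reverse=True)
for sim, s, t in pairs[:3]:
    print(f"{s} -> {t}  ({sim:.2f})")
```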
Relational and XML Data Exchange
Data exchange is the problem of finding an instance of a target schema, given an instance of a source schema and a specification of the relationship between the source and the target. Such a target instance should correctly represent information from the source instance under the constraints imposed by the target schema, and it should allow one to evaluate queries on the target instance in a way that is semantically consistent with the source data. Data exchange is an old problem that re-emerged as an active research topic recently, due to the increased need for exchange of data in various formats, often in e-business applications. In this lecture, we give an overview of the basic concepts of data exchange in both relational and XML contexts. We give examples of data exchange problems, and we introduce the main tasks that need to be addressed. We then discuss relational data exchange, concentrating on issues such as relational schema mappings, materializing target instances (including canonical solutions and cores), query answering, and query rewriting. After that, we discuss metadata management, i.e., handling schema mappings themselves. We pay particular attention to operations on schema mappings, such as composition and inverse. Finally, we describe both data exchange and metadata management in the context of XML. We use mappings based on transforming tree patterns, and we show that they lead to a host of new problems that did not arise in the relational case, but they need to be addressed for XML. These include consistency issues for mappings and schemas, as well as imposing tighter restrictions on mappings and queries to achieve tractable query answering in data exchange. Table of Contents: Overview / Relational Mappings and Data Exchange / Metadata Management / XML Mappings and Data Exchange
Mobile Robotics for Multidisciplinary Study
This lecture provides an introduction to the field of mobile robotics and the intersection between multiple robotics-related disciplines, including electrical, mechanical, computer, and software engineering, as well as computer science. It is intended for upper-level undergraduate or first-year graduate students interested in mobile robotics and artificial intelligence who have some experience in object-oriented programming and controls. Focus areas include robotics history, hardware, control, and software. Specific topics include robot components, effectors and actuators, locomotion, kinematics, sensors, feedback control, control architectures, representation, navigation, localization, and mapping. The end of each chapter includes review questions as well as exercises that provide applications for the concepts and opportunities for further study. Table of Contents: Introduction / Hardware / Control / Software
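As one example of the kinematics material such a course covers, here is a standard differential-drive odometry update in Python (a generic textbook model, not specific to this lecture):

```python
from math import cos, sin

def dd_update(x, y, theta, v_left, v_right, axle, dt):
    """Odometry update for a differential-drive robot: integrate the
    pose (x, y, theta) over a time step dt from the two wheel speeds."""
    v = (v_right + v_left) / 2.0     # linear velocity of the axle midpoint
    w = (v_right - v_left) / axle    # angular velocity (yaw rate)
    return (x + v * cos(theta) * dt,
            y + v * sin(theta) * dt,
            theta + w * dt)

pose = (0.0, 0.0, 0.0)
for _ in range(100):                 # equal wheel speeds -> straight line
    pose = dd_update(*pose, v_left=0.2, v_right=0.2, axle=0.3, dt=0.01)
print(pose)                          # ~(0.2, 0.0, 0.0) after one second
```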
Mining Structures of Factual Knowledge from Text
Real-world data, though massive, is largely unstructured, taking the form of natural-language text. It is challenging but highly desirable to mine structures from massive text data without extensive human annotation and labeling. In this book, we investigate the principles and methodologies of mining structures of factual knowledge (e.g., entities and their relationships) from massive, unstructured text corpora. Departing from many existing structure extraction methods that rely heavily on human-annotated data for model training, our effort-light approach leverages human-curated facts stored in external knowledge bases as distant supervision and exploits rich data redundancy in large text corpora for context understanding. This effort-light mining approach leads to a series of new principles and powerful methodologies for structuring text corpora, including (1) entity recognition, typing, and synonym discovery, (2) entity relation extraction, and (3) open-domain attribute-value mining and information extraction. This book introduces this new research frontier and points out some promising research directions.
Data Mining and Market Intelligence
This book is written to address issues relating to data gathering, data warehousing, and data analysis, all of which are useful when working with large amounts of data. Using practical examples of market intelligence, the book is designed to inspire and inform readers on how to conduct market intelligence by leveraging data and technology to support smart decision making. The book explains suitable methodologies for data analysis that are based on robust statistical methods. For illustrative purposes, the author uses real-life data for all the examples in this book. In addition, the book discusses the concepts, techniques, and applications of digital media and mobile data mining. Hence, this book is a guide for policy makers, academics, and practitioners whose areas of interest are statistical inference, applied statistics, applied mathematics, business mathematics, quantitative techniques, and economic and social statistics.
Scalable Processing of Spatial-Keyword Queries
Text data that is associated with location data has become ubiquitous. A tweet is an example of this type of data, where the text in a tweet is associated with the location where the tweet has been issued. We use the term spatial-keyword data to refer to this type of data. Spatial-keyword data is being generated at massive scale. Almost all online transactions have an associated spatial trace, derived from GPS coordinates, IP addresses, or cell-phone-tower locations. Hundreds of millions or even billions of spatial-keyword objects are being generated daily. Spatial-keyword data has numerous applications that require efficient processing and management of massive amounts of spatial-keyword data. This book starts by overviewing some important applications of spatial-keyword data and demonstrates the scale at which spatial-keyword data is being generated. Then, it formalizes and classifies the various types of queries that execute over spatial-keyword data. Next, it discusses important and desirable properties of spatial-keyword query languages that are needed to express queries over spatial-keyword data. As will be illustrated, existing spatial-keyword query languages vary in the types of spatial-keyword queries that they can support. There are many systems that process spatial-keyword queries. Systems differ from each other in various aspects, e.g., whether the system is batch-oriented or stream-based, and whether the system is centralized or distributed. Moreover, spatial-keyword systems vary in the types of queries that they support. Finally, systems vary in the types of indexing techniques that they adopt. This book provides an overview of the main spatial-keyword data-management systems (SKDMSs) and classifies them according to their features. Moreover, the book describes the main approaches adopted when indexing spatial-keyword data in the centralized and distributed settings. Several case studies of SKDMSs are presented, along with the applications and query types that these SKDMSs target and the indexing techniques they utilize for processing their queries. Optimizing the performance and the query processing of SKDMSs still poses many research challenges and open problems. The book concludes with a discussion of several important open research problems in the domain of scalable spatial-keyword processing.
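The most basic query type is the Boolean range spatial-keyword query. A naive Python sketch follows (a linear scan over invented objects; real SKDMSs use hybrid spatial-textual indexes instead):

```python
from math import hypot

# Toy spatial-keyword objects: (x, y, keyword set); all values invented.
objects = [
    (1.0, 2.0, {"coffee", "wifi"}),
    (4.0, 1.0, {"pizza"}),
    (1.5, 2.5, {"coffee", "bakery"}),
]

def boolean_range_query(objs, qx, qy, radius, keywords):
    """Boolean range spatial-keyword query: objects containing ALL query
    keywords within `radius` of (qx, qy). A linear scan for clarity;
    SKDMSs answer this with hybrid indexes such as R-trees augmented
    with inverted files."""
    return [o for o in objs
            if keywords <= o[2]
            and hypot(o[0] - qx, o[1] - qy) <= radius]

print(boolean_range_query(objects, 1.0, 2.0, 1.0, {"coffee"}))
```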
Exploratory Causal Analysis with Time Series Data
Many scientific disciplines rely on observational data of systems for which it is difficult (or impossible) to implement controlled experiments. Data analysis techniques are required for identifying causal information and relationships directly from such observational data. This need has led to the development of many different time series causality approaches and tools including transfer entropy, convergent cross-mapping (CCM), and Granger causality statistics. A practicing analyst can explore the literature to find many proposals for identifying drivers and causal connections in time series data sets. Exploratory causal analysis (ECA) provides a framework for exploring potential causal structures in time series data sets and is characterized by a myopic goal to determine which data series from a given set of series might be seen as the primary driver. In this work, ECA is used on several synthetic and empirical data sets, and it is found that all of the tested time series causality tools agree with each other (and intuitive notions of causality) for many simple systems but can provide conflicting causal inferences for more complicated systems. It is proposed that such disagreements between different time series causality tools during ECA might provide deeper insight into the data than could be found otherwise.
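To show the flavor of such tools, here is a crude Python sketch of a Granger-style test (a bare-bones variance-reduction comparison on invented data; real ECA uses proper F-statistics and several independent tools side by side):

```python
import numpy as np

def granger_gain(x, y, lag=2):
    """Relative reduction in residual variance of an AR(lag) model of x
    when lagged values of y are added as regressors. 0 means lagged y
    adds no predictive power; this is the core idea behind Granger
    causality, minus the formal F-test."""
    target = x[lag:]
    own = np.array([[x[t - k] for k in range(1, lag + 1)]
                    for t in range(lag, len(x))])
    both = np.array([[*(x[t - k] for k in range(1, lag + 1)),
                      *(y[t - k] for k in range(1, lag + 1))]
                     for t in range(lag, len(x))])

    def rss(A):
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        return float(np.sum((target - A @ beta) ** 2))

    return 1.0 - rss(both) / rss(own)

rng = np.random.default_rng(0)
y = rng.normal(size=500)
x = np.roll(y, 1) + 0.1 * rng.normal(size=500)   # x driven by lagged y
                                                 # (np.roll wraps one sample;
                                                 # harmless for this demo)
print(granger_gain(x, y))   # close to 1: lagged y predicts x
print(granger_gain(y, x))   # near 0: lagged x does not predict y
```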
Similarity Joins in Relational Database Systems
State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects, equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques needed to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify the edit distance as the de facto standard for comparing complex objects. Since the edit distance is computationally expensive, token-based distances have been introduced to speed up edit distance computations. The basic idea is to decompose complex objects into sets of tokens that can be compared efficiently. Token-based distances are used to compute an approximation of the edit distance and prune expensive edit distance calculations. A key observation when computing similarity joins is that many of the object pairs for which the similarity is computed are very different from each other. Filters exploit this property to improve the performance of similarity joins: a filter preprocesses the input data sets and produces a set of candidate pairs, and the distance function is evaluated on the candidate pairs only. We describe the essential query processing techniques for filters based on lower and upper bounds. For token equality joins we describe prefix, size, positional, and partitioning filters, which can be used to avoid the computation of small intersections that are not needed since the similarity would be too low.
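The prefix filter mentioned above can be sketched compactly. The following Python sketch, with illustrative names and a q-gram tokenizer, performs a self-join under a Jaccard threshold: each token set is sorted by a global token ordering, only a short prefix is indexed, and the actual similarity is verified on candidate pairs only.

```python
from collections import defaultdict
from math import ceil

def qgrams(s, q=2):
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def prefix_filter_join(strings, t=0.6, q=2):
    sets = {s: qgrams(s, q) for s in strings}
    freq = defaultdict(int)
    for toks in sets.values():
        for tok in toks:
            freq[tok] += 1
    order = {tok: i for i, tok in enumerate(sorted(freq, key=freq.get))}
    index = defaultdict(set)                  # prefix token -> strings
    candidates = set()
    for s, toks in sets.items():
        prefix_len = len(toks) - ceil(t * len(toks)) + 1   # rare tokens first
        for tok in sorted(toks, key=order.get)[:prefix_len]:
            candidates.update((r, s) for r in index[tok])
            index[tok].add(s)
    jaccard = lambda a, b: len(a & b) / len(a | b)
    # verification: evaluate the real similarity on the candidates only
    return [(r, s) for r, s in candidates if jaccard(sets[r], sets[s]) >= t]

print(prefix_filter_join(["database", "databases", "bass"]))
```

Two strings can only reach the threshold if their ordered prefixes share a token, which is why indexing the prefixes alone is safe.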
Incomplete Data and Data Dependencies in Relational Databases
The chase has long been used as a central tool to analyze dependencies and their effect on queries. It has been applied to different relevant problems in database theory, such as query optimization, query containment and equivalence, dependency implication, and database schema design. Recent years have seen a renewed interest in the chase as an important tool in several database applications, such as data exchange and integration, query answering over incomplete data, and many others. It is well known that the chase algorithm might be non-terminating; thus, in order for it to find practical applicability, it is crucial to identify cases where its termination is guaranteed. Another important aspect to consider when dealing with the chase is that it can introduce null values into the database, thereby leading to incomplete data. Thus, in several scenarios where the chase is used, the problem of dealing with data dependencies and incomplete data arises. This book discusses fundamental issues concerning data dependencies and incomplete data, with a particular focus on the chase and its applications in different database areas. We report recent results on the crucial issue of identifying conditions that guarantee chase termination. Different database applications where the chase is a central tool are discussed, with particular attention devoted to query answering in the presence of data dependencies and to database schema design. Table of Contents: Introduction / Relational Databases / Incomplete Databases / The Chase Algorithm / Chase Termination / Data Dependencies and Normal Forms / Universal Repairs / Chase and Database Applications
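For intuition, here is a toy Python sketch of the chase idea, using a deliberately simplified encoding that is not the book's formalism: tuples with labeled nulls are chased with functional dependencies, equating values of tuples that agree on the left-hand side.

```python
# Labeled nulls are strings starting with "_"; an FD X -> A equates the
# A-values of any two tuples that agree on X, replacing a null by the
# other side's value.

def chase_fds(tuples, fds):
    tuples = [list(t) for t in tuples]
    changed = True
    while changed:          # terminates for FDs; general dependencies may not
        changed = False
        for lhs, rhs in fds:
            for t1 in tuples:
                for t2 in tuples:
                    if all(t1[i] == t2[i] for i in lhs) and t1[rhs] != t2[rhs]:
                        if str(t1[rhs]).startswith("_"):
                            t1[rhs] = t2[rhs]; changed = True
                        elif str(t2[rhs]).startswith("_"):
                            t2[rhs] = t1[rhs]; changed = True
                        else:
                            raise ValueError("chase failure: constants disagree")
    return tuples

# emp(name, dept, mgr) with the FD dept -> mgr; "_m" is a labeled null
rows = [("ann", "db", "_m"), ("bob", "db", "carl")]
print(chase_fds(rows, [((1,), 2)]))   # the null is resolved to "carl"
```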
Query Answer Authentication
In data publishing, the owner delegates the role of satisfying user queries to a third-party publisher. As the servers of the publisher may be untrusted or susceptible to attacks, we cannot assume that they would always process queries correctly; hence, there is a need for users to authenticate their query answers. This book introduces various notions that the research community has studied for defining the correctness of a query answer. In particular, it is important to guarantee the completeness, authenticity, and minimality of the answer, as well as its freshness. We present authentication mechanisms for a wide variety of queries in the context of relational and spatial databases, text retrieval, and data streams. We also explain the cryptographic protocols from which the authentication mechanisms derive their security properties. Table of Contents: Introduction / Cryptography Foundation / Relational Queries / Spatial Queries / Text Search Queries / Data Streams / Conclusion
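One classic cryptographic building block in this setting is the Merkle hash tree. The sketch below, with made-up records, shows how an owner-signed root digest lets a user verify a returned record from a short authentication path; it is a minimal illustration, not a complete query-authentication scheme.

```python
import hashlib

h = lambda b: hashlib.sha256(b).digest()

def build_levels(leaves):
    levels = [[h(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1] + ([levels[-1][-1]] if len(levels[-1]) % 2 else [])
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def auth_path(levels, idx):
    path = []
    for level in levels[:-1]:
        level = level + ([level[-1]] if len(level) % 2 else [])
        path.append(level[idx ^ 1])       # the sibling at this level
        idx //= 2
    return path

def verify(record, idx, path, root):
    node = h(record)
    for sib in path:
        node = h(node + sib) if idx % 2 == 0 else h(sib + node)
        idx //= 2
    return node == root

records = [b"alice:700", b"bob:520", b"carol:880", b"dave:310"]
levels = build_levels(records)
root = levels[-1][0]                      # the owner signs this digest once
print(verify(b"bob:520", 1, auth_path(levels, 1), root))   # True
print(verify(b"bob:999", 1, auth_path(levels, 1), root))   # False: tampered
```

The publisher ships each answer with the sibling hashes needed to recompute the root, and the user accepts only if the recomputed root matches the signed one.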
Deep Web Query Interface Understanding and Integration
There are millions of searchable data sources on the Web, and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieving this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art techniques for extracting, understanding, and integrating the query interfaces of deep Web data sources. These techniques are critical for producing an integrated query interface for each domain. The interface serves as the mediator for searching all data sources in that domain. While query interface integration is only relevant for the deep Web integration approach, the extraction and understanding of query interfaces are critical for both deep Web exploration approaches. This book aims to provide in-depth and comprehensive coverage of the key technologies needed to create high-quality integrated query interfaces automatically. The following technical issues are discussed in detail: query interface modeling, query interface extraction, query interface clustering, query interface matching, query interface attribute integration, and query interface integration. Table of Contents: Introduction / Query Interface Representation and Extraction / Query Interface Clustering and Categorization / Query Interface Matching / Query Interface Attribute Integration / Query Interface Integration / Summary and Future Research
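As a toy illustration of one of these steps, query interface attribute matching, the following Python sketch aligns fields from two search forms by the Jaccard similarity of their label tokens; the forms, labels, and threshold are all invented, and the book's techniques are considerably more sophisticated.

```python
def tokens(label):
    return set(label.lower().split())

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def match_attributes(form_a, form_b, threshold=0.3):
    matches = []
    for a in form_a:
        # greedily pick the most label-similar field of the other form
        best = max(form_b, key=lambda b: jaccard(tokens(a), tokens(b)))
        if jaccard(tokens(a), tokens(best)) >= threshold:
            matches.append((a, best))
    return matches

flights_a = ["departure date", "return date", "passengers"]
flights_b = ["depart date", "return date", "number of passengers"]
print(match_attributes(flights_a, flights_b))
```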
Foundations of Data Quality Management
Data quality is one of the most important problems in data management. A database system typically aims to support the creation, maintenance, and use of large amounts of data, focusing on the quantity of data. However, real-life data are often dirty: inconsistent, duplicated, inaccurate, incomplete, or stale. Dirty data in a database routinely generate misleading or biased analytical results and decisions, and lead to loss of revenues, credibility, and customers. With this comes the need for data quality management. In contrast to traditional data management tasks, data quality management enables the detection and correction of errors in the data, syntactic or semantic, in order to improve the quality of the data and, hence, add value to business processes. While data quality has been a longstanding problem for decades, the prevalent use of the Web has increased the risks, on an unprecedented scale, of creating and propagating dirty data. This monograph gives an overview of fundamental issues underlying central aspects of data quality, namely, data consistency, data deduplication, data accuracy, data currency, and information completeness. We promote a uniform logical framework for dealing with these issues, based on data quality rules. The text is organized into seven chapters, focusing on relational data. Chapter One introduces data quality issues. A conditional dependency theory is developed in Chapter Two, for capturing data inconsistencies. It is followed by practical techniques in Chapter Three for discovering conditional dependencies, and for detecting inconsistencies and repairing data based on conditional dependencies. Matching dependencies are introduced in Chapter Four, as matching rules for data deduplication. A theory of relative information completeness is studied in Chapter Five, revising the classical Closed World Assumption and the Open World Assumption to characterize incomplete information in the real world. A data currency model is presented in Chapter Six, to identify the current values of entities in a database and to answer queries with the current values, in the absence of reliable timestamps. Finally, interactions between these data quality issues are explored in Chapter Seven. Important theoretical results and practical algorithms are covered, but formal proofs are omitted. The bibliographical notes contain pointers to papers in which the results were presented and proven, as well as references to materials for further reading. This text is intended for a seminar course at the graduate level. It also serves as a useful resource for researchers and practitioners who are interested in the study of data quality. The fundamental research on data quality draws on several areas, including mathematical logic, computational complexity, and database theory. It has raised as many questions as it has answered, and is a rich source of questions and vitality. Table of Contents: Data Quality: An Overview / Conditional Dependencies / Cleaning Data with Conditional Dependencies / Data Deduplication / Information Completeness / Data Currency / Interactions between Data Quality Issues
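To make conditional dependencies concrete, here is a minimal Python sketch of checking one conditional functional dependency, zip -> city holding only where country = "UK" (an example in the spirit of this literature); the encoding and the data are illustrative assumptions.

```python
def cfd_violations(rows, pattern, lhs, rhs):
    """rows: list of dicts; pattern: required constants; lhs/rhs: attributes."""
    seen, violations = {}, []
    for r in rows:
        if any(r[a] != v for a, v in pattern.items()):
            continue                       # the dependency does not apply here
        if r[lhs] in seen and seen[r[lhs]][rhs] != r[rhs]:
            violations.append((seen[r[lhs]], r))
        seen.setdefault(r[lhs], r)
    return violations

rows = [
    {"country": "UK", "zip": "EH4 1DT", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH4 1DT", "city": "London"},     # inconsistent
    {"country": "US", "zip": "07974", "city": "Murray Hill"},  # pattern not met
]
print(cfd_violations(rows, {"country": "UK"}, "zip", "city"))
```

A plain functional dependency would have to hold on every tuple; the pattern tableau is what restricts the rule to the subset of the data where it is meaningful.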
Outlier Detection for Temporal Data
Outlier (or anomaly) detection is a very broad field which has been studied in the context of a large number of research areas like statistics, data mining, sensor networks, environmental science, distributed systems, spatio-temporal mining, etc. Initial research in outlier detection focused on time series-based outliers (in statistics). Since then, outlier detection has been studied on a large variety of data types including high-dimensional data, uncertain data, stream data, network data, time series data, spatial data, and spatio-temporal data. While there have been many tutorials and surveys for general outlier detection, we focus on outlier detection for temporal data in this book. A large number of applications generate temporal datasets. For example, in our everyday life, various kinds of records like credit, personnel, financial, judicial, medical, etc., are all temporal. This stresses the need for an organized and detailed study of outliers with respect to such temporal data. In the past decade, there has been a lot of research on various forms of temporal data, including consecutive data snapshots, series of data snapshots, and data streams. Besides the initial work on time series, researchers have focused on rich forms of data including multiple data streams, spatio-temporal data, network data, community distribution data, etc. Compared to general outlier detection, techniques for temporal outlier detection are very different. In this book, we present an organized picture of both recent and past research in temporal outlier detection. We start with the basics and then ramp up the reader to the main ideas in state-of-the-art outlier detection techniques. We motivate the importance of temporal outlier detection and outline the challenges that go beyond those of conventional outlier detection. Then, we present a taxonomy of proposed techniques for temporal outlier detection. Such techniques broadly include statistical techniques (like AR models, Markov models, histograms, neural networks), distance- and density-based approaches, grouping-based approaches (clustering, community detection), network-based approaches, and spatio-temporal outlier detection approaches. We summarize by presenting a wide collection of applications where temporal outlier detection techniques have been applied to discover interesting outliers. Table of Contents: Preface / Acknowledgments / Figure Credits / Introduction and Challenges / Outlier Detection for Time Series and Data Sequences / Outlier Detection for Data Streams / Outlier Detection for Distributed Data Streams / Outlier Detection for Spatio-Temporal Data / Outlier Detection for Temporal Network Data / Applications of Outlier Detection for Temporal Data / Conclusions and Research Directions / Bibliography / Authors' Biographies
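As a taste of the statistical techniques listed above, here is a toy numpy sketch that fits an AR(p) model by least squares and flags points whose one-step prediction residual exceeds k standard deviations; parameters and data are invented.

```python
import numpy as np

def ar_outliers(series, p=3, k=3.0):
    y = np.asarray(series, dtype=float)
    # design matrix: intercept plus the p previous values
    X = np.column_stack([np.ones(len(y) - p)]
                        + [y[p - j: len(y) - j] for j in range(1, p + 1)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    z = (resid - resid.mean()) / resid.std()
    return [int(i) + p for i in np.flatnonzero(np.abs(z) > k)]

rng = np.random.default_rng(1)
ts = np.sin(np.linspace(0, 20, 300)) + rng.normal(scale=0.05, size=300)
ts[150] += 2.0                            # inject a point anomaly
print(ar_outliers(ts))                    # the spike (and possibly its
                                          # immediate successors) is flagged
```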
Probabilistic Databases
Probabilistic databases are databases where the value of some attributes or the presence of some records are uncertain and known only with some probability. Applications in many areas such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk assessment produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database. This book presents the state of the art in representation formalisms and query processing techniques for probabilistic data. It starts by discussing the basic principles for representing large probabilistic databases, by decomposing them into tuple-independent tables, block-independent-disjoint tables, or U-databases. Then it discusses two classes of techniques for query evaluation on probabilistic databases. In extensional query evaluation, the entire probabilistic inference can be pushed into the database engine and, therefore, processed as effectively as the evaluation of standard SQL queries. The relational queries that can be evaluated this way are called safe queries. In intensional query evaluation, the probabilistic inference is performed over a propositional formula called lineage expression: every relational query can be evaluated this way, but the data complexity dramatically depends on the query being evaluated, and can be #P-hard. The book also discusses some advanced topics in probabilistic data management such as top-k query processing, sequential probabilistic databases, indexing and materialized views, and Monte Carlo databases. Table of Contents: Overview / Data and Query Model / The Query Evaluation Problem / Extensional Query Evaluation / Intensional Query Evaluation / Advanced Techniques
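The tuple-independent case admits a one-line extensional computation: the probability that some tuple satisfying a predicate exists is 1 - prod(1 - p) over the matching tuples. A minimal Python sketch with made-up data:

```python
from math import prod

# Each tuple is present independently with its own marginal probability,
# e.g., a confidence score produced by information extraction.

def prob_exists(table, pred):
    return 1 - prod(1 - p for row, p in table if pred(row))

purchases = [
    ({"cust": "ann", "item": "book"}, 0.9),
    ({"cust": "ann", "item": "pen"}, 0.5),
    ({"cust": "bob", "item": "book"}, 0.3),
]
# P(ann bought something) = 1 - (1 - 0.9)(1 - 0.5) = 0.95
print(prob_exists(purchases, lambda r: r["cust"] == "ann"))
```

This independence assumption is precisely what lets safe queries be evaluated inside the database engine; when tuples are correlated, evaluation falls back to inference over the lineage expression.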
P2P Techniques for Decentralized Applications
As an alternative to traditional client-server systems, Peer-to-Peer (P2P) systems provide major advantages in terms of scalability, autonomy and dynamic behavior of peers, and decentralization of control. Thus, they are well suited for large-scale data sharing in distributed environments. Most of the existing P2P approaches for data sharing rely on either structured networks (e.g., DHTs) for efficient indexing, or unstructured networks for ease of deployment, or some combination of the two. However, these approaches have limitations, such as the lack of freedom in data placement in DHTs, and the high latency and high network traffic of unstructured networks. To address these limitations, gossip protocols, which are easy to deploy and scale well, can be exploited. In this book, we give an overview of these different P2P techniques and architectures, discuss their trade-offs, and illustrate their use for decentralizing several large-scale data sharing applications. Table of Contents: P2P Overlays, Query Routing, and Gossiping / Content Distribution in P2P Systems / Recommendation Systems / Top-k Query Processing in P2P Systems
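For intuition about why gossip scales well, here is a small Python simulation sketch, with invented parameters, of push-style dissemination: every informed peer forwards to a random fanout of peers each round, and near-full coverage is typically reached in a logarithmic number of rounds.

```python
import random

def gossip_rounds(n_peers=1000, fanout=3, max_rounds=100, seed=7):
    rng = random.Random(seed)
    informed, rounds = {0}, 0           # peer 0 starts with the message
    while len(informed) < n_peers and rounds < max_rounds:
        rounds += 1
        # each informed peer pushes to `fanout` uniformly random peers
        pushes = {rng.randrange(n_peers)
                  for _ in informed for _ in range(fanout)}
        informed |= pushes
    return rounds, len(informed)

print(gossip_rounds())   # e.g., full coverage after a handful of rounds
```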