Data Science
This two-volume set (CCIS 1628 and 1629) constitutes the refereed proceedings of the 8th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2022, held in Chengdu, China, in August 2022. The 65 full papers and 26 short papers presented in these two volumes were carefully reviewed and selected from 261 submissions. The papers are organized in topical sections on: Big Data Management and Applications; Data Security and Privacy; Applications of Data Science; Infrastructure for Data Science; Education Track; Regulatory Technology in Finance.
Data Science
This two-volume set (CCIS 1628 and 1629) constitutes the refereed proceedings of the 8th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2022, held in Chengdu, China, in August 2022. The 65 full papers and 26 short papers presented in these two volumes were carefully reviewed and selected from 261 submissions. The papers are organized in topical sections on: Big Data Mining and Knowledge Management; Machine Learning for Data Science; Multimedia Data Management and Analysis.
Human Aspects of Information Security and Assurance
This book constitutes the proceedings of the 16th IFIP WG 11.12 International Symposium on Human Aspects of Information Security and Assurance, HAISA 2022, held in Mytilene, Lesbos, Greece, in July 2022. The 25 papers presented in this volume were carefully reviewed and selected from 30 submissions. They are organized in the following topical sections: cyber security education and training; cyber security culture; privacy; and cyber security management.
Reverse Mathematics
Reverse mathematics studies the complexity of proving mathematical theorems and solving mathematical problems. Typical questions include: Can we prove this result without first proving that one? Can a computer solve this problem? A highly active part of mathematical logic and computability theory, the subject offers beautiful results as well as significant foundational insights. This text provides a modern treatment of reverse mathematics that combines computability-theoretic reductions and proofs in formal arithmetic to measure the complexity of theorems and problems from all areas of mathematics. It includes detailed introductions to techniques from computable mathematics, Weihrauch-style analysis, and other parts of computability that have become integral to research in the field. Topics and features: a complete introduction to reverse mathematics, including necessary background from computability theory, second-order arithmetic, forcing, induction, and model construction; a comprehensive treatment of the reverse mathematics of combinatorics, including Ramsey's theorem, Hindman's theorem, and many other results; central results and methods from the past two decades, appearing in book form for the first time, including preservation techniques and applications of probabilistic arguments; and a large number of exercises of varying levels of difficulty supplementing each chapter. The text will be accessible to students with a standard first-year course in mathematical logic. It will also be a useful reference for researchers in reverse mathematics, computability theory, proof theory, and related areas. Damir D. Dzhafarov is an Associate Professor of Mathematics at the University of Connecticut, CT, USA. Carl Mummert is a Professor of Computer and Information Technology at Marshall University, WV, USA.
Exploratory Examination of Agent-Based Modeling for the Study of Social Movements
Social movement research is becoming increasingly important, as information and communications technologies (ICTs) have altered the ways movements form, organize, mobilize, and act, as well as the ways in which they are surveilled and disrupted. The authors of this report explore the use of agent-based modeling as a method for studying the effects of ICTs on the formation, maintenance, and dissolution of social movements over time.
Computer, Communication, and Signal Processing
This book constitutes the refereed proceedings of the 6th International Conference on Computer, Communication, and Signal Processing, ICCSP 2022, hosted from Chennai, India, in February 2022; owing to the COVID-19 pandemic, the conference took place as a virtual event. The 21 full and 2 short papers presented in this volume were carefully reviewed and selected from 111 submissions. The papers are categorized into the following topical sub-headings: artificial intelligence and machine learning; cyber security; and internet of things.
Principles of Data Management
Data is a valuable corporate asset and its effective management is vital to success. This professional guide covers all the key areas of data management, including database development and corporate data modelling. The new edition adds chapters on linked data, concept systems and big data and artificial intelligence.
Designing Data Spaces
This open access book provides a comprehensive view of data ecosystems and platform economics, from methodical and technological foundations up to reports from practical implementations and applications in various industries. To this end, the book is structured in four parts: Part I "Foundations and Contexts" provides a general overview of building, running, and governing data spaces and an introduction to the IDS and Gaia-X projects. Part II "Data Space Technologies" subsequently details various implementation aspects of IDS and Gaia-X, including data usage control, the use of blockchain technologies, and semantic data integration and interoperability. Next, Part III describes "Use Cases and Data Ecosystems" from application areas such as agriculture, healthcare, industry, energy, and mobility. Part IV finally offers an overview of several "Solutions and Applications", including, for example, products and experiences from companies like Google, SAP, Huawei, T-Systems, Innopay, and many more. Overall, the book provides professionals in industry with an encompassing overview of the technological and economic aspects of data spaces, based on the International Data Spaces and Gaia-X initiatives. It presents implementations and business cases and gives an outlook on future developments. In doing so, it aims to promote the vision of a social data market economy based on data spaces that embrace trust and data sovereignty.
Emerging Computing Paradigms
A holistic overview of major new computing paradigms of the 21st century. In Emerging Computing Paradigms: Principles, Advances and Applications, international scholars offer a compendium of essential knowledge on promising new computing paradigms. The book examines the characteristics and features of emerging computing technologies and provides insight into recent technological developments and their potential real-world applications that promise to shape the future. This book is a useful resource for all those who wish to quickly grasp new concepts of, and insights on, emerging computing paradigms and to pursue further research or develop novel applications harnessing these concepts. Key features: comprehensive coverage of new technologies that have the potential to shape the future of our world, namely quantum computing, computational intelligence, advanced wireless networks, and blockchain technology; a revisit of mainstream ideas now being widely adopted, such as cloud computing, the Internet of Things (IoT), and cybersecurity; and recommendations and practical insights to assist readers in applying these technologies. Aimed at IT professionals, educators, researchers, and students, Emerging Computing Paradigms: Principles, Advances and Applications is a comprehensive resource for getting ahead of the curve in examining and exploiting emerging concepts and technologies. Business executives will also find the book valuable for gaining an advantage over competitors in harnessing the concepts it examines.
Tidy Modeling with R
Get going with tidymodels, a collection of R packages for modeling and machine learning. Whether you're just starting out or have years of experience with modeling, this practical introduction shows data analysts, business analysts, and data scientists how the tidymodels framework offers a consistent, flexible approach for your work. RStudio engineers Max Kuhn and Julia Silge demonstrate ways to create models by focusing on an R dialect called the tidyverse. Software that adopts tidyverse principles shares both a high-level design philosophy and low-level grammar and data structures, so learning one piece of the ecosystem makes it easier to learn the next. You'll understand why the tidymodels framework has been built to be used by a broad range of people. With this book, you will: learn the steps necessary to build a model from beginning to end; understand how to use different modeling and feature engineering approaches fluently; examine the options for avoiding common pitfalls of modeling, such as overfitting; learn practical methods to prepare your data for modeling; tune models for optimal performance; and use good statistical practices to compare, evaluate, and choose among models.
Advances in Knowledge Discovery and Data Mining
The 3-volume set LNAI 13280, LNAI 13281 and LNAI 13282 constitutes the proceedings of the 26th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2022, which was held during May 2022 in Chengdu, China. The 121 papers included in the proceedings were carefully reviewed and selected from a total of 558 submissions. They were organized in topical sections as follows: Part I: Data Science and Big Data Technologies, Part II: Foundations; and Part III: Applications.
Sport Business Analytics
Developing and implementing a systematic analytics strategy can result in a sustainable competitive advantage within the sport business industry. This timely and relevant book provides practical strategies to collect data and then convert that data into meaningful, value-added information and actionable insights. Its primary objective is to help sport business organizations utilize data-driven decision-making to generate optimal revenue from such areas as ticket sales and corporate partnerships. To that end, the book includes in-depth case studies from such leading sports organizations as the Orlando Magic, Tampa Bay Buccaneers, Duke University, and the Aspire Group. The core purpose of sport business analytics is to convert raw data into information that enables sport business professionals to make strategic business decisions that result in improved company financial performance and a measurable and sustainable competitive advantage. Readers will learn about the role of big data and analytics in ticket pricing, season ticket member retention, fan engagement, sponsorship valuation, customer relationship management, digital marketing, market research, and data visualization. This book examines changes in the ticketing marketplace and spotlights innovative ticketing strategies used in various sport organizations. It shows how to engage fans with social media and digital analytics, presents techniques to analyze engagement and marketing strategies, and explains how to utilize analytics to leverage fan engagement to enhance revenue for sport organizations. Filled with insightful case studies, this book benefits both sport business professionals and students. The concluding chapter on teaching sport analytics further enhances its value to academics.
Building IoT Visualizations using Grafana
The IoT developer's complete guide to building powerful dashboards, analyzing data, and integrating with other platforms. Key features: connect devices, store and manage data, and build powerful data visualizations; integrate Grafana with other systems, such as Prometheus, OpenSearch, and LibreNMS; and learn about message brokers and data forwarders to send data from sensors and systems to different platforms. Book description: Grafana is powerful open source software that helps you visualize and analyze data gathered from various sources. It allows you to share valuable information through clear dashboards, run analytics, and send notifications. Building IoT Visualizations Using Grafana offers how-to procedures, useful resources, and advice that will help you implement IoT solutions with confidence. You'll begin by installing and configuring Grafana according to your needs. Next, you'll acquire the skills needed to implement your own IoT system using communication brokers, databases, and metric management systems, and to integrate everything with Grafana. You'll learn to collect data from IoT devices and store it in databases, as well as discover how to connect databases to Grafana, make queries, and build insightful dashboards. 
Finally, the book will help you implement analytics for visualizing data, performing automation, and delivering notifications. By the end of this Grafana book, you'll be able to build insightful dashboards, perform analytics, and deliver notifications that apply to IoT and IT systems. What you will learn: install and configure Grafana in different types of environments; enable communication between your IoT devices using different protocols; build data sources by ingesting data from IoT devices; gather data from Grafana using different types of data sources; build actionable insights using plugins and analytics; deliver notifications across several communication channels; and integrate Grafana with other platforms. Who this book is for: This book is for IoT developers who want to build powerful visualizations and analytics for their projects and products. Technicians from the embedded world looking to learn how to build systems and platforms using open source software will also benefit from this book. If you have an interest in technology, IoT, open source, and related subjects, then this book is for you. Basic knowledge of administration tasks on Linux-based systems, IP networks and network services, protocols, ports, and related topics will help you make the most out of this book.
Mastering Microsoft Power BI - Second Edition
Plan, design, develop, and manage robust Power BI solutions to generate meaningful insights and make data-driven decisions. Purchase of the print or Kindle book includes a free eBook in PDF format. Key features: master the latest dashboarding and reporting features of Microsoft Power BI; combine data from multiple sources, create stunning visualizations, and publish Power BI apps to thousands of users; and get the most out of Microsoft Power BI with real-world use cases and examples. Book description: Mastering Microsoft Power BI, Second Edition, provides an advanced understanding of Power BI to get the most out of your data and maximize business intelligence. This updated edition walks through each essential phase and component of Power BI, and explores the latest, most impactful Power BI features. Using best practices and working code examples, you will connect to data sources, shape and enhance source data, and develop analytical data models. You will also learn how to apply custom visuals, implement new DAX commands and paginated SSRS-style reports, manage application workspaces and metadata, and understand how content can be staged and securely distributed via Power BI apps. Furthermore, you will explore top report and interactive dashboard design practices using features such as bookmarks and the Power KPI visual, alongside the latest capabilities of Power BI mobile applications and self-service BI techniques. Additionally, important management and administration topics are covered, including application lifecycle management via Power BI pipelines, the on-premises data gateway, and Power BI Premium capacity. 
By the end of this Power BI book, you will be confident in creating sustainable and impactful charts, tables, reports, and dashboards with any kind of data using Microsoft Power BI. What you will learn: build efficient data retrieval and transformation processes with the Power Query M language and dataflows; design scalable, user-friendly DirectQuery, import, and composite data models; create basic and advanced DAX measures; add ArcGIS Maps to create interesting data stories; build pixel-perfect paginated reports; discover the capabilities of Power BI mobile applications; manage and monitor a Power BI environment as a Power BI administrator; and scale up a Power BI solution for an enterprise via Power BI Premium capacity. Who this book is for: Business intelligence professionals and intermediate Power BI users looking to master Power BI for all their data visualization and dashboarding needs will find this book useful. An understanding of basic BI concepts is required, and some familiarity with Microsoft Power BI will help you make the most out of this book. Table of contents: Planning Power BI Projects; Preparing Data Sources; Connecting to Sources and Transforming Data with M; Designing Import, DirectQuery, and Composite Data Models; Developing DAX Measures and Security Roles; Planning Power BI Reports; Creating and Formatting Visualizations; Applying Advanced Analytics; Designing Dashboards; Managing Workspaces and Content; Managing the On-Premises Data Gateway; Deploying Paginated Reports; Creating Power BI Apps and Content Distribution; Administering Power BI for an Organization; Building Enterprise BI with Power BI Premium.
Optical Switching
Comprehensive coverage of optical switching technologies and their applications in optical networks. Optical Switching: Device Technology and Applications in Networks delivers an accessible exploration of the evolution of optical networks, with clear explanations of the current state of the art in the field and of modern challenges in the development of Internet-of-Things devices. A variety of optical switches, including MEMS-based, magneto-optical, photonic, and SOA-based switches, are discussed, as is the application of optical switches in networks. The book is written in a tutorial style, easily understood by both undergraduate and graduate students. It describes the fundamentals and recent developments in optical switch networks and examines the architectural and design challenges faced by those who design and construct emerging optical switch networks, as well as how to overcome those challenges. The book offers ways to assess and analyze systems and applications, comparing a variety of approaches available to the reader. It also provides: a thorough introduction to switch characterization, including optical, electro-optical, thermo-optical, magneto-optical, and acousto-optic switches; comprehensive explorations of MEMS-based, SOA-based, liquid crystal, photonic crystal, and optical-electrical-optical (OEO) switches; practical discussions of quantum optical switches, as well as nonlinear optical switches; and in-depth examinations of the application of optical switches in networks, including switch fabric control and optical switching for high-performance computing. Perfect for researchers and professionals in the fields of telecommunications, Internet of Things, and optoelectronics, Optical Switching: Device Technology and Applications in Networks will also earn a place in the libraries of advanced undergraduate and graduate students studying optical networks, optical communications, and sensor applications.
Oracle Database Programming with Java
Databases have become an integral part of modern life. Today's society is an information-driven society, and database technology has a direct impact on all aspects of daily life. Decisions are routinely made by organizations based on the information collected and stored in databases. Database management systems such as Oracle are crucial for applying data in industrial or commercial systems. Equally crucial is a graphical user interface (GUI) that enables users to access and manipulate data in databases. The Apache NetBeans IDE with Java is an ideal candidate for developing a GUI with programming functionality. Oracle Database Programming with Java: Ideas, Designs, and Implementations is written for college students and software programmers who want to develop practical and commercial database programs with Java and relational databases such as Oracle Database XE 18c. The book details practical considerations and applications of database programming with Java and is filled with authentic examples as well as detailed explanations. Advanced topics in Java Web, like Java Web Applications and Java Web Services, are covered in real project examples to show how to handle database programming issues in the Apache NetBeans IDE environment. 
This book features: a real sample database, CSE_DEPT, which is built with Oracle SQL Developer and used throughout the book; step-by-step, detailed illustrations and descriptions of how to design and build a practical relational database; fundamental and advanced Java database programming techniques practical for both beginning students and experienced programmers; updated Java desktop and Web database programming techniques, such as Java Enterprise Edition 7, JavaServer Pages, JavaServer Faces, Enterprise JavaBeans, Web applications, and Web services, including the GlassFish and Tomcat Web servers; more than 30 real database programming projects with detailed illustrations; actual JDBC APIs and JDBC drivers, along with code explanations; and homework and selected solutions for each chapter to strengthen and improve students' learning and understanding of the topics they have studied.
Analyzing Spatial Models of Choice and Judgment
With recent advances in computing power and the widespread availability of preference, perception, and choice data, such as public opinion surveys and legislative voting, the empirical estimation of spatial models using scaling and ideal point estimation methods has never been more accessible. The second edition of Analyzing Spatial Models of Choice and Judgment demonstrates how to estimate and interpret spatial models with a variety of methods using the open-source programming language R. Requiring only basic knowledge of R, the book enables social science researchers to apply the methods to their own data. Also suitable for experienced methodologists, it presents the latest methods for modeling the distances between points. The authors explain the basic theory behind empirical spatial models, then illustrate the estimation technique behind implementing each method, exploring the advantages and limitations while providing visualizations to understand the results. This second edition updates and expands the methods and software discussed in the first edition, including new coverage of methods for ordinal data and anchoring vignettes in surveys, as well as an entire chapter dedicated to Bayesian methods. The second edition is made easier to use by the inclusion of an R package, which provides all data and functions used in the book. David A. Armstrong II is Canada Research Chair in Political Methodology and Associate Professor of Political Science at Western University. His research interests include measurement, democracy, and state repressive action. Ryan Bakker is Reader in Comparative Politics at the University of Essex. His research interests include applied Bayesian modeling, measurement, Western European politics, and EU politics. Royce Carroll is Professor in Comparative Politics at the University of Essex. His research focuses on measurement of ideology and the comparative politics of legislatures and political parties. 
Christopher Hare is Assistant Professor in Political Science at the University of California, Davis. His research focuses on ideology and voting behavior in US politics, political polarization, and measurement. Keith T. Poole is Philip H. Alston Jr. Distinguished Professor of Political Science at the University of Georgia. His research interests include methodology, US political-economic history, economic growth and entrepreneurship. Howard Rosenthal is Professor of Politics at NYU and Roger Williams Straus Professor of Social Sciences, Emeritus, at Princeton. Rosenthal's research focuses on political economy, American politics and methodology.
Web Engineering
This book constitutes the thoroughly refereed proceedings of the 22nd International Conference on Web Engineering, ICWE 2022, held in Bari, Italy, in July 2022. The 23 revised full papers and 5 short papers presented were carefully reviewed and selected from 81 submissions. The book also contains 6 demonstration and poster papers, 7 symposium papers, and 5 tutorial papers. They are organized in topical sections named: recommender systems based on web technology; social web applications; web applications modelling and engineering; web big data and web data analytics; web mining and knowledge extraction; web security and privacy; web user interfaces.
Digital Business and Intelligent Systems
This book constitutes the refereed proceedings of the 15th International Baltic Conference on Digital Business and Intelligent Systems, Baltic DB&IS 2022, held in Riga, Latvia, in July 2022. The 16 revised full papers and 1 short paper presented were carefully reviewed and selected from 42 submissions. The papers are centered on topics such as architectures and quality of information systems, artificial intelligence in information systems, data and knowledge engineering, enterprise and information systems engineering, and security of information systems.
Natural Language Processing and Information Systems
This book constitutes the refereed proceedings of the 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022, held in Valencia, Spain, in June 2022. The 28 full papers and 20 short papers were carefully reviewed and selected from 106 submissions. The papers are organized in the following topical sections: Sentiment Analysis and Social Media; Text Classification; Applications; Argumentation; Information Extraction and Linking; User Profiling; Semantics; Language Resources and Evaluation.
Fundamentals of Data Engineering
Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology. This book will help you: get a concise overview of the entire data engineering landscape; assess data engineering problems using an end-to-end framework of best practices; cut through marketing hype when choosing data technologies, architecture, and processes; use the data engineering lifecycle to design and build a robust architecture; and incorporate data governance and security across the data engineering lifecycle.
Developing High-Frequency Trading Systems
Use your programming skills to create and optimize high-frequency trading systems in no time with Java, C++, and Python. Key features: learn how to build high-frequency trading systems with ultra-low latency; understand the critical components of a trading system; and optimize your systems with high-level programming techniques. Book description: The world of trading markets is complex, but it can be made easier with technology. Sure, you know how to code, but where do you start? What programming language do you use? How do you solve the problem of latency? This book answers all these questions. It will help you navigate the world of algorithmic trading and show you how to build a high-frequency trading (HFT) system from complex technological components, supported by accurate data. Starting off with an introduction to HFT, exchanges, and the critical components of a trading system, this book quickly moves on to the nitty-gritty of optimizing hardware and your operating system for low-latency trading, such as bypassing the kernel, memory allocation, and the danger of context switching. Monitoring your system's performance is vital, so you'll also focus on logging and statistics. As you move beyond the traditional HFT programming languages, such as C++ and Java, you'll learn how to use Python to achieve high levels of performance. And what book on trading is complete without diving into cryptocurrency? This guide delivers on that front as well, teaching how to perform high-frequency crypto trading with confidence. By the end of this trading book, you'll be ready to take on the markets with HFT systems. Who this book is for: This book is for software engineers, quantitative developers or researchers, and DevOps engineers who want to understand the technical side of high-frequency trading systems and the optimizations that are needed to achieve ultra-low latency systems. 
Prior experience working with C++ and Java will help you grasp the topics covered in this book more easily. Table of contents: Fundamentals of a High-Frequency Trading System; The Critical Components of a Trading System; Understanding the Trading Exchange Dynamics; HFT System Foundations: From Hardware to OS; Networking in Motion; HFT Optimization: Architecture and Operating System; HFT Optimization: Logging, Performance, and Networking; C++: The Quest for Microsecond Latency; Java and JVM for Low-Latency Systems; Python: Interpreted but Open to High Performance; High-Frequency FPGA and Crypto.
Mapping White Identity Terrorism and Racially or Ethnically Motivated Violent Extremism
The authors reviewed literature on White identity terrorism and racially or ethnically motivated violent extremism (REMVE) and analyzed social media data from six platforms that host extremist content. They developed a network map that evaluates REMVE network construction, connectivity, geographic location, and proclivity to violence and found that users in the United States are overwhelmingly responsible for REMVE discourse online.
Russian Disinformation Efforts on Social Media
Russia is conducting wide-reaching information warfare with the West. This report describes Russia's information warfare waged via social media and provides recommendations to better counter this threat. Although popular portrayals of the Russian disinformation machine imply an organized and well-resourced operation, evidence suggests that it is neither. Nonetheless, Russian activity can be harmful to U.S. interests and is likely to evolve.
Algorithms and Data Structures for Massive Datasets
Massive modern datasets make traditional data structures and algorithms grind to a halt. This fun and practical guide introduces cutting-edge techniques that can reliably handle even the largest distributed datasets. In Algorithms and Data Structures for Massive Datasets you will learn: probabilistic sketching data structures for practical problems; choosing the right database engine for your application; evaluating and designing efficient on-disk data structures and algorithms; understanding the algorithmic trade-offs involved in massive-scale systems; deriving basic statistics from streaming data; correctly sampling streaming data; and computing percentiles with limited space resources. Algorithms and Data Structures for Massive Datasets reveals a toolbox of new methods that are perfect for handling modern big data applications. You'll explore the novel data structures and algorithms that underpin Google, Facebook, and other enterprise applications that work with truly massive amounts of data. These effective techniques can be applied to any discipline, from finance to text analysis. Graphics, illustrations, and hands-on industry examples make complex ideas practical to implement in your projects, and there are no mathematical proofs to puzzle over. Work through this one-of-a-kind guide, and you'll find the sweet spot of saving space without sacrificing your data's accuracy. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. About the technology: Standard algorithms and data structures may become slow, or fail altogether, when applied to large distributed datasets. Choosing algorithms designed for big data saves time, increases accuracy, and reduces processing cost. This unique book distills cutting-edge research papers into practical techniques for sketching, streaming, and organizing massive datasets on-disk and in the cloud. 
About the book: Algorithms and Data Structures for Massive Datasets introduces processing and analytics techniques for large distributed data. Packed with industry stories and entertaining illustrations, this friendly guide makes even complex concepts easy to understand. You'll explore real-world examples as you learn to map powerful algorithms like Bloom filters, count-min sketch, HyperLogLog, and LSM-trees to your own use cases.
What's inside: probabilistic sketching data structures; choosing the right database engine; designing efficient on-disk data structures and algorithms; algorithmic trade-offs in massive-scale systems; computing percentiles with limited space resources.
About the reader: Examples in Python, R, and pseudocode.
About the authors: Dzejla Medjedovic earned her PhD in the Applied Algorithms Lab at Stony Brook University, New York. Emin Tahirovic earned his PhD in biostatistics from the University of Pennsylvania. Illustrator Ines Dedovic earned her PhD at the Institute for Imaging and Computer Vision at RWTH Aachen University, Germany.
Table of Contents:
1 Introduction
PART 1 HASH-BASED SKETCHES
2 Review of hash tables and modern hashing
3 Approximate membership: Bloom and quotient filters
4 Frequency estimation and count-min sketch
5 Cardinality estimation and HyperLogLog
PART 2 REAL-TIME ANALYTICS
6 Streaming data: Bringing everything together
7 Sampling from data streams
8 Approximate quantiles on data streams
PART 3 DATA STRUCTURES FOR DATABASES AND EXTERNAL MEMORY ALGORITHMS
9 Introducing the external memory model
10 Data structures for databases: B-trees, Bε-trees, and LSM-trees
11 External memory sorting
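Bloom filters, one of the structures this book covers, answer approximate membership queries in a fixed number of bits. The following is a minimal Python sketch, not the book's code, assuming SHA-256-based double hashing and the standard sizing formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions; the class name `BloomFilter` and its parameters are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: no false negatives, tunable false positives."""

    def __init__(self, capacity: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m bits and k hash functions for n = capacity items.
        self.m = max(1, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every one of the k bits is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(capacity=1000, fp_rate=0.01)
bf.add("apple")
print("apple" in bf)
```

A `False` answer is always correct; a `True` answer may be a false positive, at roughly the configured rate once `capacity` items have been added. Items cannot be removed, which is one motivation for the quotient filters also listed in the table of contents.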
Technologies and Applications for Big Data Value
This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks, which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part, "Technologies and Methods", contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, "Processes and Applications", details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses and leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, machine learning, and AI; second, practitioners and industry experts engaged in data-driven systems and software design and deployment projects who are interested in employing these advanced methods to address real-world problems.
SQL Server Advanced Troubleshooting and Performance Tuning
This practical book provides a comprehensive overview of troubleshooting and performance tuning best practices for Microsoft SQL Server. Database engineers, including database developers and administrators, will learn how to identify performance issues, troubleshoot the system in a holistic fashion, and properly prioritize tuning efforts to attain the best possible system performance. Author Dmitri Korotkevitch, Microsoft Data Platform MVP and Microsoft Certified Master (MCM), explains the interdependencies between SQL Server database components. You'll learn how to quickly diagnose your system and discover the root cause of any issue. Techniques in this book are compatible with all versions of SQL Server and cover both on-premises and cloud-based SQL Server installations. With this book, you will: discover how performance issues present themselves in SQL Server; learn about SQL Server diagnostic tools, methods, and technologies; perform health checks on SQL Server installations; learn the dependencies between SQL Server components; tune SQL Server to improve performance and reduce bottlenecks; detect poorly optimized queries and inefficiencies in query execution plans; find inefficient indexes and common database design issues; and use these techniques with Microsoft Azure SQL databases, Azure SQL Managed Instances, and Amazon RDS for SQL Server.
Research in Computational Molecular Biology
This book constitutes the proceedings of the 26th Annual Conference on Research in Computational Molecular Biology, RECOMB 2022, held in San Diego, CA, USA in May 2022. The 17 regular and 23 short papers presented were carefully reviewed and selected from 188 submissions. The papers report on original research in all areas of computational molecular biology and bioinformatics.
Big Data Privacy and Security in Smart Cities
This book highlights recent advances in smart city technologies, with a focus on new technologies such as biometrics, blockchains, data encryption, data mining, machine learning, deep learning, cloud security, and mobile security. During the past five years, digital cities have been emerging as a technological reality that will come to dominate people's everyday lives in both developed and developing countries. In particular, with the big data generated by smart cities, privacy and security have become a widespread concern because of their relevance and sensitivity in cybersecurity, healthcare, medical services, e-commerce, e-governance, mobile banking, e-finance, digital twins, and so on. These new topics have emerged with the era of smart cities and are mostly associated with the public sector, which is vital to people's modern lives. This volume summarizes recent advances in addressing the challenges of big data privacy and security in smart cities and points out future research directions for this challenging new topic.
Integer Programming and Combinatorial Optimization
This book constitutes the refereed proceedings of the 23rd International Conference on Integer Programming and Combinatorial Optimization, IPCO 2022, held in Eindhoven, The Netherlands, in June 2022. The 33 full papers presented were carefully reviewed and selected from 93 submissions. IPCO is under the auspices of the Mathematical Optimization Society, and it is an important forum for presenting the latest results of theory and practice of the various aspects of discrete optimization.
Web and Wireless Geographical Information Systems
This book constitutes the refereed proceedings of the 18th International Symposium on Web and Wireless Geographical Information Systems, W2GIS 2022, held in Konstanz, Germany, in April 2022. The 7 full papers presented together with 6 short papers in the volume were carefully reviewed and selected from 16 submissions. The papers cover topics that range from mobile GIS and Location-Based Services to Spatial Information Retrieval and Wireless Sensor Networks.
Distributed, Ambient and Pervasive Interactions. Smart Living, Learning, Well-Being and Health, Art and Creativity
The two-volume set LNCS 13325 and 13326 constitutes the refereed proceedings of the 10th International Conference on Distributed, Ambient and Pervasive Interactions, DAPI 2022, held as part of the 24th International Conference on Human-Computer Interaction, HCI International 2022, which took place during June-July 2022. The conference was held virtually due to the COVID-19 pandemic. The 58 papers of DAPI 2022 are organized in the following topical sections: Part I: User Experience and Interaction Design for Smart Ecosystems; Smart Cities, Smart Islands, and Intelligent Urban Living; Smart Artifacts in Smart Environments; and Opportunities and Challenges for the Near Future Smart Environments. Part II: Smart Living in Pervasive IoT Ecosystems; Distributed, Ambient, and Pervasive Education and Learning; Distributed, Ambient, and Pervasive Well-being and Healthcare; and Smart Creativity and Art.
Database Systems for Advanced Applications
The three-volume set LNCS 13245, 13246 and 13247 constitutes the proceedings of the 27th International Conference on Database Systems for Advanced Applications, DASFAA 2022, held online in April 2022. A total of 72 full papers and 76 short papers are presented in this three-volume set; they were carefully reviewed and selected from 543 submissions. Additionally, 13 industrial papers, 9 demo papers, and 2 PhD consortium papers are included. The conference was planned to take place in Hyderabad, India, but it was held virtually due to the COVID-19 pandemic.
Database Principles and Technologies - Based on Huawei Gaussdb
This open access book contains eight chapters that deal with database technologies, including the development history of databases, database fundamentals, an introduction to SQL syntax, the classification of SQL syntax, database security fundamentals, the database development environment, database design fundamentals, and the application of Huawei's cloud database product GaussDB. This book can be used as a textbook for database courses in colleges and universities, and is also suitable as a reference book for the HCIA-GaussDB V1.5 certification examination. The Huawei GaussDB (for MySQL) used in the book is a Huawei cloud-based, high-performance, highly applicable relational database that fully supports the syntax and functionality of the open source database MySQL. All the experiments in this book can be run on this database platform. As the world's leading provider of ICT (information and communication technology) infrastructure and smart terminals, Huawei offers products ranging from digital data communication, cyber security, wireless technology, data storage, cloud computing, and smart computing to artificial intelligence.
Data Spaces
This open access book aims to educate data space designers about what is required to create a successful data space. It explores cutting-edge theory, technologies, methodologies, and best practices for data spaces for both industrial and personal data and provides the reader with a basis for understanding the design, deployment, and future directions of data spaces. The book captures the early lessons and experience in creating data spaces. It arranges these contributions into three parts covering design, deployment, and future directions respectively. The first part explores the design space of data spaces. Its chapters detail the organisational design for data spaces, data platforms, data governance, federated learning, personal data sharing, data marketplaces, and hybrid artificial intelligence for data spaces. The second part describes the use of data spaces within real-world deployments. Its chapters are co-authored with industry experts and include case studies of data spaces in sectors including Industry 4.0, food safety, FinTech, health care, and energy. The third and final part details future directions for data spaces, including challenges and opportunities for common European data spaces and privacy-preserving techniques for trustworthy data sharing. The book is of interest to two primary audiences: first, researchers interested in data management and data sharing, and second, practitioners and industry experts engaged in data-driven systems where the sharing and exchange of data within an ecosystem are critical.
Distributed, Ambient and Pervasive Interactions. Smart Environments, Ecosystems, and Cities
The two-volume set LNCS 13325 and 13326 constitutes the refereed proceedings of the 10th International Conference on Distributed, Ambient and Pervasive Interactions, DAPI 2022, held as part of the 24th International Conference on Human-Computer Interaction, HCI International 2022, which took place during June-July 2022. The conference was held virtually due to the COVID-19 pandemic. The 58 papers of DAPI 2022 are organized in the following topical sections: Part I: User Experience and Interaction Design for Smart Ecosystems; Smart Cities, Smart Islands, and Intelligent Urban Living; Smart Artifacts in Smart Environments; and Opportunities and Challenges for the Near Future Smart Environments. Part II: Smart Living in Pervasive IoT Ecosystems; Distributed, Ambient, and Pervasive Education and Learning; Distributed, Ambient, and Pervasive Well-being and Healthcare; and Smart Creativity and Art.
World Yearbook of Education 2021
Providing a comprehensive introduction to the topic of accountability and datafication in the governance of education, the World Yearbook of Education 2021 considers global policy dynamics and policy enactment processes. Chapters pay particular attention to the role of international organizations and the private sector in the promotion of performance-based accountability (PBA) in different educational settings and at multiple policy scales. Organized into three sections, chapters cover: the global/local construction of accountability and datafication; global discourse and national translations of performance-based accountability policies; and enactments and effects of accountability and datafication, including controversies and critical issues. With carefully chosen international contributions from around the globe, the World Yearbook of Education 2021 is ideal reading for anyone interested in the future of accountability and datafication in the governance of education.
Distributed Graph Coloring
The focus of this monograph is on symmetry-breaking problems in the message-passing model of distributed computing. In this model a communication network is represented by an n-vertex graph G = (V, E), whose vertices host autonomous processors. The processors communicate over the edges of G in discrete rounds. The goal is to devise algorithms that use as few rounds as possible. A typical symmetry-breaking problem is the problem of graph coloring. Denote by Δ the maximum degree of G. While coloring G with Δ + 1 colors is trivial in the centralized setting, the problem becomes much more challenging in the distributed one. One can also compromise on the number of colors, if this allows for more efficient algorithms. Other typical symmetry-breaking problems are the problems of computing a maximal independent set (MIS) and a maximal matching (MM). The study of these problems dates back to the very early days of distributed computing. The founding fathers of distributed computing laid firm foundations for the area of distributed symmetry breaking already in the eighties. In particular, they showed that all these problems can be solved in randomized logarithmic time. Also, Linial showed that an O(Δ²)-coloring can be computed very efficiently deterministically. However, fundamental questions were left open for decades. In particular, it is not known whether the MIS or the (Δ + 1)-coloring problem can be solved in deterministic polylogarithmic time. Moreover, until recently it was not known whether one can color a graph with significantly fewer than Δ² colors in deterministic polylogarithmic time. Additionally, it was open (and is still open to some extent) whether one can have sublogarithmic randomized algorithms for the symmetry-breaking problems. Recently, significant progress was achieved in the study of these questions. More efficient deterministic and randomized (Δ + 1)-coloring algorithms were achieved. Deterministic Δ^(1+o(1))-coloring algorithms with polylogarithmic running time were devised.
Improved (and often sublogarithmic-time) randomized algorithms were devised. Drastically improved lower bounds were given. Wide families of graphs in which these problems are solvable much faster than on general graphs were identified. The objective of our monograph is to cover most of these developments, and as a result to provide a treatise on theoretical foundations of distributed symmetry breaking in the message-passing model. We hope that our monograph will stimulate further progress in this exciting area.
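The synchronous message-passing model described above can be simulated sequentially to see how randomization breaks symmetry. The following Python sketch implements a simple textbook-style randomized (Δ + 1)-coloring: in each round, every uncolored vertex proposes a random color not already used by a colored neighbor, and keeps it only if no uncolored neighbor proposed the same color. It is illustrative, not necessarily an algorithm from the monograph; the function name `randomized_coloring` and the adjacency-dictionary representation are assumptions.

```python
import random
from typing import Dict, List, Optional


def randomized_coloring(adj: Dict[int, List[int]], seed: int = 0,
                        max_rounds: int = 1000) -> Dict[int, Optional[int]]:
    """Simulate randomized (Delta + 1)-coloring in synchronous rounds."""
    rng = random.Random(seed)
    delta = max((len(nbrs) for nbrs in adj.values()), default=0)
    palette = range(delta + 1)
    color: Dict[int, Optional[int]] = {v: None for v in adj}
    for _ in range(max_rounds):
        uncolored = [v for v in adj if color[v] is None]
        if not uncolored:
            break
        # Round, phase 1: each uncolored vertex proposes a color that no
        # already-colored neighbor holds (always possible: deg(v) <= Delta).
        proposal: Dict[int, int] = {}
        for v in uncolored:
            taken = {color[u] for u in adj[v] if color[u] is not None}
            proposal[v] = rng.choice([c for c in palette if c not in taken])
        # Round, phase 2: keep the proposal only if no uncolored neighbor
        # proposed the same color in this round (decisions use proposals only).
        for v in uncolored:
            if all(proposal.get(u) != proposal[v] for u in adj[v]):
                color[v] = proposal[v]
    return color


cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(randomized_coloring(cycle, seed=1))
```

Each uncolored vertex succeeds with constant probability per round, which is why procedures of this kind finish in O(log n) rounds with high probability; the deterministic questions discussed above are much harder.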
State-Space Control Systems
These days, nearly all engineering problems are solved with the aid of suitable computer packages. This book shows how MATLAB/Simulink can be used to solve state-space control problems. The book assumes that you are familiar with the theory and concepts of state-space control, i.e., that you have taken or are taking a course on state-space control systems and are reading this book in order to learn how to solve state-space control problems with the aid of MATLAB/Simulink. The book is composed of three chapters. Chapter 1 shows how a state-space mathematical model can be entered into the MATLAB/Simulink environment. Chapter 2 shows how a nonlinear system can be linearized around the desired operating point with the aid of tools provided by MATLAB/Simulink. Finally, Chapter 3 shows how a state-space controller can be designed with the aid of MATLAB and tested with Simulink. The book will be useful for students and practicing engineers who want to design a state-space control system.
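The linearization step of Chapter 2 computes the Jacobians A = ∂f/∂x and B = ∂f/∂u of a nonlinear model x' = f(x, u) at an operating point. The following is a hedged Python sketch of that idea using central finite differences; it is not the book's MATLAB/Simulink workflow, and the damped pendulum model, its parameter values, and the names `pendulum` and `linearize` are hypothetical examples.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def pendulum(x: Sequence[float], u: Sequence[float], g: float = 9.81,
             length: float = 1.0, mass: float = 1.0, damping: float = 0.1) -> Vector:
    """Hypothetical plant: damped pendulum, state x = [angle, angular velocity]."""
    theta, omega = x
    inertia = mass * length ** 2
    return [omega,
            -(g / length) * math.sin(theta) - damping / inertia * omega + u[0] / inertia]


def linearize(f: Callable[[Sequence[float], Sequence[float]], Vector],
              x0: Sequence[float], u0: Sequence[float],
              eps: float = 1e-6) -> Tuple[Matrix, Matrix]:
    """Return (A, B) = (df/dx, df/du) at (x0, u0) via central differences."""
    n, p = len(x0), len(u0)
    A: Matrix = [[0.0] * n for _ in range(n)]
    B: Matrix = [[0.0] * p for _ in range(n)]
    for j in range(n):
        xp, xm = list(x0), list(x0)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp, u0), f(xm, u0)
        for i in range(n):
            A[i][j] = (fp[i] - fm[i]) / (2 * eps)
    for j in range(p):
        up, um = list(u0), list(u0)
        up[j] += eps
        um[j] -= eps
        fp, fm = f(x0, up), f(x0, um)
        for i in range(n):
            B[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return A, B


A, B = linearize(pendulum, [math.pi, 0.0], [0.0])
```

At the upright operating point θ = π, the result is approximately A = [[0, 1], [9.81, -0.1]] and B = [[0], [1]]; the positive g/l entry reflects the open-loop instability that a state-space controller would then have to remove.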
Advanced Information Systems Engineering
This book constitutes the refereed proceedings of the 34th International Conference on Advanced Information Systems Engineering, CAiSE 2022, which was held in Leuven, Belgium, during June 6-10, 2022. The 31 full papers included in these proceedings were selected from 203 submissions. They were organized in topical sections as follows: Process mining; sustainable and explainable applications; tools and methods to support research and design; process modeling; natural language processing techniques in IS engineering; process monitoring and simulation; graph and network models; model analysis and comprehension; recommender systems; conceptual models, metamodels and taxonomies; and services engineering and digitalization.
Network Topology and Fault-Tolerant Consensus
As the structure of contemporary communication networks grows more complex, practical networked distributed systems become prone to component failures. Fault-tolerant consensus in message-passing systems allows participants in the system to agree on a common value despite the malfunction or misbehavior of some components. It is a task of fundamental importance for distributed computing, due to its numerous applications. We summarize studies on the topological conditions that determine the feasibility of consensus, mainly focusing on directed networks and the case of restricted topology knowledge at each participant. Recently, significant efforts have been devoted to fully characterizing the underlying communication networks in which variations of fault-tolerant consensus can be achieved. Although analogous topological conditions for undirected networks of known topology were derived shortly after the problem was introduced, extending them to directed networks has proven to be a highly non-trivial task. Moreover, global knowledge restrictions, inherent in modern large-scale networks, require more elaborate arguments concerning the locality of distributed computations. In this work, we present the techniques and ideas used to resolve these issues. Recent studies indicate a number of parameters that affect the topological conditions under which consensus can be achieved, namely, the fault model, the degree of system synchrony (synchronous vs. asynchronous), the type of agreement (exact vs. approximate), the level of topology knowledge, and the algorithm class used (general vs. iterative). We outline the feasibility and impossibility results for various combinations of the above parameters, extensively illustrating the relation between network topology and consensus.
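To make "approximate agreement" with an iterative algorithm concrete, the following Python sketch simulates the simplest setting only: a synchronous complete network of n ≥ 3f + 1 nodes, f of them Byzantine. Each correct node collects all values (its own included), discards the f smallest and f largest, and adopts the mean of the rest. This trimmed-mean update is an illustration of the iterative class, not one of the directed-network algorithms the monograph surveys; the names `trimmed_mean`, `approximate_consensus`, and the `byzantine` callback are hypothetical.

```python
import random
from typing import Callable, List


def trimmed_mean(values: List[float], f: int) -> float:
    """Drop the f smallest and f largest values, then average the rest."""
    kept = sorted(values)[f:len(values) - f]
    return sum(kept) / len(kept)


def approximate_consensus(initial: List[float], f: int, rounds: int,
                          byzantine: Callable[[int, int], float]) -> List[float]:
    """Iterative approximate agreement on a complete network of n >= 3f + 1 nodes.

    `initial` holds the values of the n - f correct nodes. The f faulty nodes
    are modelled by byzantine(round, receiver), which may send a different
    arbitrary value to each receiver.
    """
    n = len(initial) + f
    if n < 3 * f + 1:
        raise ValueError("need n >= 3f + 1")
    values = list(initial)
    for r in range(rounds):
        new_values = []
        for i in range(len(values)):
            # Node i sees every correct value (its own included) plus f faulty ones.
            received = values + [byzantine(r, i) for _ in range(f)]
            new_values.append(trimmed_mean(received, f))
        values = new_values
    return values


rng = random.Random(0)
init = [rng.uniform(0, 100) for _ in range(7)]
out = approximate_consensus(init, f=3, rounds=100,
                            byzantine=lambda r, i: rng.uniform(-1e6, 1e6))
```

After trimming, every surviving value lies within the range of the correct values, so outputs never leave that range, and the spread of correct values shrinks to at most f/(n - 2f) of its previous size each round, which is below 1 exactly when n ≥ 3f + 1.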
Distributed Computing Pearls
Computers and computer networks are among the most incredible inventions of the 20th century, playing an ever-expanding role in our daily lives by enabling complex human activities in areas such as entertainment, education, and commerce. One of the most challenging problems in computer science for the 21st century is to improve the design of distributed systems where computing devices have to work together as a team to achieve common goals. In this book, I have tried to gently introduce the general reader to some of the most fundamental issues and classical results of computer science underlying the design of algorithms for distributed systems, so that the reader can get a feel for the nature of this exciting and fascinating field called distributed computing. The book will appeal to the educated layperson and requires no computer-related background. I strongly suspect that most computer-knowledgeable readers will also be able to learn something new.
Introduction to Distributed Self-Stabilizing Algorithms
This book aims to be a comprehensive and pedagogical introduction to the concept of self-stabilization, introduced by Edsger Wybe Dijkstra in 1973. Self-stabilization characterizes the ability of a distributed algorithm to converge within finite time to a configuration from which its behavior is correct (i.e., satisfies a given specification), regardless of the arbitrary initial configuration of the system. This arbitrary initial configuration may be the result of a finite number of transient faults. Hence, self-stabilization is considered a versatile non-masking fault-tolerance approach, since it recovers from the effect of any finite number of such faults in a unified manner. Another major benefit of such an automatic recovery method comes from the difficulty of resetting malfunctioning devices in a large-scale (and so, geographically spread) distributed system (the Internet, peer-to-peer networks, and delay-tolerant networks are examples of such distributed systems). Furthermore, self-stabilization is usually recognized as a lightweight way to achieve fault tolerance compared to other classical fault-tolerance approaches. Indeed, the overhead, both in terms of time and space, of state-of-the-art self-stabilizing algorithms is commonly small. This makes self-stabilization very attractive for distributed systems equipped with processes that have low computational and memory capabilities, such as wireless sensor networks. After more than 40 years of existence, self-stabilization is now sufficiently established as an important field of research in theoretical distributed computing to justify its teaching in advanced research-oriented graduate courses.
This book is an introductory course, which consists of the formal definition of self-stabilization and its related concepts, followed by an in-depth review and study of classical (simple) algorithms, commonly used proof schemes and design patterns, as well as major results from the self-stabilizing community. As is customary in the self-stabilizing area, in this book we focus on the proof of correctness and the analytical complexity of the studied distributed self-stabilizing algorithms. Finally, we underline that most of the algorithms studied in this book are dedicated to the high-level atomic-state model, which is the most commonly used computational model in the self-stabilizing area. However, in the last chapter, we present general techniques to achieve self-stabilization in the low-level message-passing model, as well as example algorithms.
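Dijkstra's K-state token ring, from his original self-stabilization paper, is the standard first example of the convergence and closure properties described above. The following Python sketch simulates it in the atomic-state style: machine 0 is privileged when its state equals its left neighbor's (machine n - 1) and then increments modulo K; every other machine is privileged when its state differs from its left neighbor's and then copies it. The sketch assumes K ≥ n and a central daemon that activates one privileged machine per step, chosen at random; the function names `privileged`, `step`, and `run` are hypothetical.

```python
import random
from typing import List


def privileged(x: List[int]) -> List[int]:
    """Indexes of machines holding a privilege (an enabled move)."""
    n = len(x)
    return [i for i in range(n)
            if (i == 0 and x[0] == x[n - 1]) or (i > 0 and x[i] != x[i - 1])]


def step(x: List[int], i: int, k: int) -> None:
    """Execute the move of privileged machine i in place."""
    if i == 0:
        x[0] = (x[0] + 1) % k  # bottom machine increments its counter
    else:
        x[i] = x[i - 1]        # other machines copy their left neighbour


def run(x: List[int], k: int, steps: int, seed: int = 0) -> List[int]:
    """Central daemon: each step, one privileged machine chosen at random moves."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(steps):
        # At least one machine is always privileged, so choice() never fails.
        step(x, rng.choice(privileged(x)), k)
    return x


final = run([3, 1, 4, 1, 5, 2, 6, 0], k=8, steps=2000)
print(privileged(final))
```

Starting from an arbitrary configuration, the ring converges to a legitimate configuration with exactly one privilege (the token), and every subsequent move preserves that property, which are the convergence and closure halves of self-stabilization.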