Combinatorial Optimization
This book constitutes thoroughly refereed and revised selected papers from the 7th International Symposium on Combinatorial Optimization, ISCO 2022, which was held online during May 18-20, 2022. The 24 full papers included in this book were carefully reviewed and selected from 50 submissions. They were organized in topical sections as follows: polyhedra and algorithms; polyhedra and combinatorics; non-linear optimization; game theory; graphs and trees; cutting and packing; applications; and approximation algorithms.
Chinese Computational Linguistics
This book constitutes the proceedings of the 21st China National Conference on Computational Linguistics, CCL 2022, held in Nanchang, China, in October 2022. The 22 full English-language papers in this volume were carefully reviewed and selected from 293 Chinese and English submissions. The conference papers are categorized into the following topical sub-headings: Linguistics and Cognitive Science; Fundamental Theory and Methods of Computational Linguistics; Information Retrieval, Dialogue and Question Answering; Text Generation and Summarization; Knowledge Graph and Information Extraction; Machine Translation and Multilingual Information Processing; Minority Language Information Processing; Language Resource and Evaluation; NLP Applications.
Data Science Concepts and Techniques with Applications
This textbook comprehensively covers both fundamental and advanced topics related to data science. Data science is an umbrella term that encompasses data analytics, data mining, machine learning, and several other related disciplines. The chapters of this book are organized into three parts. The first part (chapters 1 to 3) is a general introduction to data science. Starting from the basic concepts, the book highlights the types of data, their use, their importance, and the issues normally faced in data analytics, followed by a presentation of a wide range of applications and widely used techniques in data science. The second part, which has been updated and considerably extended compared to the first edition, is devoted to various techniques and tools applied in data science. Its chapters 4 to 10 detail data pre-processing, classification, clustering, text mining, deep learning, frequent pattern mining, and regression analysis. Finally, the third part (chapters 11 and 12) presents a brief introduction to Python and R, the two main data science programming languages, and shows, in a completely new chapter, practical data science in WEKA (Waikato Environment for Knowledge Analysis), an open-source tool for performing different machine learning and data mining tasks. An appendix explaining the basic mathematical concepts of data science completes the book. This textbook is suitable for advanced undergraduate and graduate students as well as for industrial practitioners who carry out research in data science. Both groups will benefit not only from the comprehensive presentation of important topics, but also from the many application examples and the comprehensive list of further readings, which point to additional publications providing more in-depth research results or more detailed descriptions of related topics. "This book delivers a systematic, carefully thoughtful material on Data Science."
from the Foreword by Witold Pedrycz, U Alberta, Canada.
Composite NUV Priors and Applications
Normal with unknown variance (NUV) priors are a central idea of sparse Bayesian learning and allow variational representations of non-Gaussian priors. More specifically, such variational representations can be seen as parameterized Gaussians, wherein the parameters are generally unknown. The advantage is apparent: for fixed parameters, NUV priors are Gaussian, and hence computationally compatible with Gaussian models. Moreover, working with (linear-)Gaussian models is particularly attractive since the Gaussian distribution is closed under affine transformations, marginalization, and conditioning. Interestingly, the variational representation proves to be universal rather than restrictive: many common sparsity-promoting priors (among them, in particular, the Laplace prior) can be represented in this manner. In estimation problems, parameters or variables of the underlying model are often subject to constraints (e.g., discrete-level constraints). Such constraints cannot adequately be represented by linear-Gaussian models and generally require special treatment. To handle such constraints within a linear-Gaussian setting, we extend the idea of NUV priors beyond its original use for sparsity. In particular, we study compositions of existing NUV priors, referred to as composite NUV priors, and show that many commonly used model constraints can be represented in this way.
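As a small illustration of the variational idea in the description above (a sketch based on a standard textbook identity, not necessarily the formulation used in the book): the Laplace-type penalty |x| can be written as the minimum over s > 0 of x^2/(2s) + s/2, so for any fixed s the penalty is quadratic in x, i.e. Gaussian-like, and the bound is tight at s = |x|. The function names and the grid of candidate values below are hypothetical, chosen only for this example.

```python
def quadratic_bound(x: float, s: float) -> float:
    """Quadratic-in-x upper bound on |x| for a fixed variance-like parameter s > 0."""
    return x * x / (2.0 * s) + s / 2.0


def tightest_parameter(x: float) -> float:
    """Value of s at which the bound equals |x| (valid for x != 0)."""
    return abs(x)


x = 3.0
# Hypothetical grid of candidate parameters s: 0.5, 0.75, ..., 10.25
grid = [0.5 + 0.25 * k for k in range(40)]
values = [quadratic_bound(x, s) for s in grid]

# Bound evaluated at the tightest parameter; it coincides with |x|.
tight = quadratic_bound(x, tightest_parameter(x))
print(tight, min(values))
```

Every value on the grid is at least |x|, and the smallest one is reached where s equals |x|, which is the sense in which a non-Gaussian prior is represented by a Gaussian with an unknown, optimized variance.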
New Trends in Database and Information Systems
This book constitutes the proceedings of the 26th European Conference on Advances in Databases and Information Systems, ADBIS 2022, held in Turin, Italy, in September 2022. The 29 short papers presented were carefully reviewed and selected from 90 submissions. The selected short papers are organized in the following sections: data understanding, modeling and visualization; fairness in data processing; data management pipeline, information and process retrieval; data access optimization; data pre-processing and cleaning; data science and machine learning. Further, papers from the following workshops and satellite events are provided in the volume: DOING: 3rd Workshop on Intelligent Data - From Data to Knowledge; K-GALS: 1st Workshop on Knowledge Graphs Analysis on a Large Scale; MADEISD: 4th Workshop on Modern Approaches in Data Engineering and Information System Design; MegaData: 2nd Workshop on Advanced Data Systems Management, Engineering, and Analytics; SWODCH: 2nd Workshop on Semantic Web and Ontology Design for Cultural Heritage; Doctoral Consortium.
Recent Trends in Analysis of Images, Social Networks and Texts
This book constitutes revised selected papers of the 10th International Conference on Analysis of Images, Social Networks and Texts, AIST 2021, held in Tbilisi, Georgia, in December 2021. Due to the COVID-19 pandemic the conference was held in hybrid mode. The 17 full papers were carefully reviewed and selected from 118 submissions, out of which 92 were sent to peer review. The papers are organized in topical sections on natural language processing; computer vision; data analysis and machine learning; social network analysis; theoretical machine learning and optimisation.
Data Science and Analytics for SMEs
Master the tricks and techniques of business analytics consulting, specifically applicable to small-to-medium businesses (SMEs). Written to help you hone your business analytics skills, this book applies data science techniques to help solve problems and improve many aspects of a business's operations. SMEs are looking for ways to use data science and analytics, and this need is becoming increasingly pressing with the ongoing digital revolution. The topics covered in the book will help provide the knowledge needed for implementing data science in small businesses. The demand from small businesses for data analytics is growing alongside the number of freelance data science consulting opportunities; hence this book provides insight on how to navigate this new terrain. This book uses a do-it-yourself approach to analytics and introduces tools that are easily available online and are non-programming based. Data science will allow SMEs to understand their customer loyalty, market segmentation, sales and revenue increase, etc. more clearly. Data Science and Analytics for SMEs is particularly focused on small businesses and explores the analytics and data that can help them succeed further in their business. What You'll Learn: Create and measure the success of your analytics project; start your business analytics consulting career; use solutions taught in the book in practical use cases and problems. Who This Book Is For: Business analytics enthusiasts who are not particularly programming inclined, small business owners and data science consultants, data science and business students, and SME (small-to-medium enterprise) analysts.
Enterprise Systems Architecture
Enhance your technical and business skills to better manage your organization's technology ecosystem. This book aims to explain how to align the technology landscape to service your company's business operating model. The book begins by exploring different architectural approaches before taking a deep dive into multiple layers of the architectural stack and the methodology of each component. You'll also learn about the many products delivered by enterprise architecture. To complete the book, author Daljit Banger delves into the various roles and responsibilities of an enterprise architect. After completing Enterprise Systems Architecture, you will understand how to develop an ICT (Information Communication Technology) strategy to meet the needs of your organization. What You Will Learn: Gain a complete understanding of enterprise architecture; conceptualize the enterprise ecosystem using the EsA canvas; master the products and services of an enterprise architecture function. Who This Book Is For: Architects (Enterprise, Solution, or Technical), CTOs, Business Analysts, or any stakeholder in delivering technology services to their organization.
Algebraic Approach to Data Processing
The book explores a new general approach to selecting and designing data processing techniques. Symmetry and invariance ideas behind this algebraic approach have been successful in physics, where many new theories are formulated in symmetry terms. The book explains this approach and expands it to new application areas ranging from engineering, medicine, and education to social sciences. In many cases, this approach leads to optimal techniques and optimal solutions. That the same data processing techniques help us better analyze wooden structures, lung dysfunctions, and deep learning algorithms is a good indication that these techniques can be used in many other applications as well. The book is recommended to researchers and practitioners who need to select a data processing technique, or who want to design a new technique when the existing techniques do not work. It is also recommended to students who want to learn state-of-the-art data processing.
Computational Intelligence in Data Science
This book constitutes the refereed post-conference proceedings of the Fifth IFIP TC 12 International Conference on Computational Intelligence in Data Science, ICCIDS 2022, held virtually, in March 2022. The 28 revised full papers presented were carefully reviewed and selected from 96 submissions. The papers cover topics such as computational intelligence for text analysis; computational intelligence for image and video analysis; blockchain and data science.
Blockchain Foundations and Applications
This monograph provides a comprehensive and rigorous exposition of the basic concepts and most important modern research results concerning blockchain and its applications. The book includes the required cryptographic fundamentals underpinning blockchain technology, since understanding the concepts of cryptography involved in the design of blockchain is necessary for mastering the security guarantees furnished by blockchain. It also contains an introduction to cryptographic primitives, and separate chapters on bitcoin, ethereum and smart contracts, public blockchain, private blockchain, cryptocurrencies, and blockchain applications. This volume is of great interest to active researchers who are keen to develop novel applications of blockchain in their field of investigation. Further, it is also beneficial for industry practitioners as well as undergraduate students in computing and information technology.
Cooperative Information Systems
This volume, LNCS 13591, constitutes the proceedings of the International Conference on Cooperative Information Systems, CoopIS 2022, collocated with the Enterprise Design, Operations and Computing conference, EDOC 2022, in October 2022 in Bozen-Bolzano, Italy. The 15 regular papers presented together with 5 research in progress papers were carefully reviewed and selected from 68 submissions. The conference focuses on technical, economical, and societal aspects of distributed information systems at scale. This 28th edition of CoopIS was collocated with the 26th edition of EDOC, and its guiding theme was "Information Systems in a Digital World".
Optimal Surface Fitting of Point Clouds Using Local Refinement
Introduction.- Locally Refined Splines.- Adaptive Surface Fitting with Local Refinement: LR B-spline Surfaces.- A Statistical Criterion to Judge the Goodness of Fit of LR B-splines Surface Approximation.- LR B-splines for Representation of Terrain and Seabed: Data Fusion, Outliers, and Voids.- LR B-spline Surfaces and Volumes for Deformation Analysis of Terrain Data.- Conclusion.
Computational Advances in Bio and Medical Sciences
This book constitutes revised selected papers from the refereed proceedings of the 11th International Conference on Computational Advances in Bio and Medical Sciences, ICCABS 2021, held as a virtual event during December 16-18, 2021. The 13 full papers included in this book were carefully reviewed and selected from 17 submissions. They were organized in topical sections as follows: computational advances in bio and medical sciences; and computational advances in molecular epidemiology.
R 4 Data Science Quick Reference
In this handy, quick reference book you'll be introduced to several R data science packages, with examples of how to use each of them. All concepts will be covered concisely, with many illustrative examples using the following APIs: readr, tibble, forcats, lubridate, stringr, tidyr, magrittr, dplyr, purrr, ggplot2, modelr, and more. With R 4 Data Science Quick Reference, you'll have the code, APIs, and insights to write data science-based applications in the R programming language. You'll also be able to carry out data analysis. All source code used in the book is freely available on GitHub. What You'll Learn: Implement applicable R 4 programming language specification features; import data with readr; work with categories using forcats, time and dates with lubridate, and strings with stringr; format data using tidyr and then transform that data using magrittr and dplyr; write functions with R for data science, data mining, and analytics-based applications; visualize data with ggplot2 and fit data to models using modelr. Who This Book Is For: Programmers new to R's data science, data mining, and analytics packages. Some prior coding experience with R in general is recommended.
Co-Creating for Context in the Transfer and Diffusion of IT
This volume, IFIP AICT 660, constitutes the refereed proceedings of the IFIP WG 8.6 International Working Conference "Co-creating for Context in Prospective Transfer and Diffusion of IT" on Transfer and Diffusion of IT, TDIT 2022, held in Maynooth, Ireland, during June 15-16, 2022. The 19 full papers and 10 short papers presented were carefully reviewed and selected from 60 submissions. The papers focus on the re-imagination of diffusion and adoption of emerging technologies. They are organized in the following parts:
Transactions on Large-Scale Data- and Knowledge-Centered Systems LII
The LNCS journal Transactions on Large-Scale Data and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. An increase in the demand for resource sharing (e.g., computing resources, services, metadata, data sources) across different sites connected through networks has led to an evolution of data- and knowledge-management systems from centralized systems to decentralized systems enabling large-scale distributed applications providing high scalability. This, the 52nd issue of Transactions on Large-Scale Data and Knowledge-Centered Systems, contains 6 fully revised selected regular papers.
Smart Cities, Green Technologies, and Intelligent Transport Systems
This book includes extended and revised selected papers from the 10th International Conference on Smart Cities and Green ICT Systems, SMARTGREENS 2021, and the 7th International Conference on Vehicle Technology and Intelligent Transport Systems, VEHITS 2021, held as a virtual event during April 28-30, 2021. The conference was held virtually due to the COVID-19 crisis. The 22 full papers included in this book were carefully reviewed and selected from 140 submissions. The papers present research on advances and applications in the fields of smart cities, electric vehicles, sustainable computing and communications, energy aware systems and technologies, intelligent vehicle technologies, intelligent transport systems and infrastructure, and connected vehicles.
Business Intelligence with Databricks SQL
Master critical skills needed to deploy and use Databricks SQL and elevate your BI from the warehouse to the lakehouse with confidence. Key Features: Learn about business intelligence on the lakehouse with features and functions of Databricks SQL; make the most of Databricks SQL by getting to grips with the enablers of its data warehousing capabilities; a unique approach to teaching concepts and techniques with follow-along scenarios on real datasets. Book Description: In this new era of data platform system design, data lakes and data warehouses are giving way to the lakehouse, a new type of data platform system that aims to unify all data analytics into a single platform. Databricks, with its Databricks SQL product suite, is the hottest lakehouse platform out there, harnessing the power of Apache Spark(TM), Delta Lake, and other innovations to enable data warehousing capabilities on the lakehouse with data lake economics. This book is a comprehensive hands-on guide that helps you explore all the advanced features, use cases, and technology components of Databricks SQL. You'll start with the lakehouse architecture fundamentals and understand how Databricks SQL fits into it. The book then shows you how to use the platform, from exploring data, executing queries, building reports, and using dashboards through to learning the administrative aspects of the lakehouse: data security, governance, and management of the computational power of the lakehouse. You'll also delve into the core technology enablers of Databricks SQL: Delta Lake and Photon.
Finally, you'll get hands-on with advanced SQL commands for ingesting data and maintaining the lakehouse. By the end of this book, you'll have mastered Databricks SQL and be able to deploy and deliver fast, scalable business intelligence on the lakehouse. What You Will Learn: Understand how Databricks SQL fits into the Databricks Lakehouse Platform; perform everyday analytics with Databricks SQL Workbench and business intelligence tools; organize and catalog your data assets; program the data security model to protect and govern your data; tune SQL warehouses (computing clusters) for optimal query experience; tune the Delta Lake storage format for maximum query performance; deliver extreme performance with the Photon query execution engine; implement advanced data ingestion patterns with Databricks SQL. Who this book is for: This book is for business intelligence practitioners, data warehouse administrators, and data engineers who are new to Databricks SQL and want to learn how to deliver high-quality insights unhindered by the scale of data or infrastructure. This book is also for anyone looking to study the advanced technologies that power Databricks SQL. Basic knowledge of data warehouses, SQL-based analytics, and ETL processes is recommended to effectively learn the concepts introduced in this book and appreciate the innovation behind the platform.
Developing Information Systems Accurately
This textbook shows how to develop the functional requirements of (information) systems. It emphasizes the importance of considering the complete development path of a functional requirement, i.e. not only the individual development steps but also their proper combination and their alignment. The book consists of two parts: Part I presents the underlying theory while Part II contains various illustrative case studies. Part I starts with an introduction to the topic (Chapter 1). Then it explains how to develop functional requirements that represent the conceptual dynamics of an information system (Chapters 2 and 3). Chapters 4 and 5 explain how to model the conceptual statics of an information system. Chapter 6 gives some directions for implementation. Finally, Chapter 7 explains how a 'technical manager' can organize and manage the development process. As an illustration of the theory, Part II contains three substantial case studies. The first one (Chapter 8) presents a stepwise development starting from an informal situation sketch via a simple domain model towards a precisely specified, full-fledged conceptual data model, which finally is translated to an SQL database. In the second case study (Chapter 9) the author converts the well-known non-trivial use case Process Sale from Larman into a textual System Sequence Description (SSD). For validation purposes, that textual SSD is subsequently translated into natural language and into a graphical SSD. The third case study (Chapter 10) shows the applicability of the author's approach to a control system and also illustrates the typical situation that the requirements are constantly changing during development. This book is written for (under)graduate students in software engineering or information systems who want to learn how to carry out adequate problem analysis, to make good system specifications, and/or to understand how to organize and manage an IS-development process.
It also targets practitioners who want to improve their problem analysis abilities and/or their ability to make good system specifications. To this end, it includes more than 150 explanatory figures and is accompanied by a Web site which provides additional course material such as slides, additional exercises, solutions to exercises, and the code for the figures used in the book.
Pro SQL Server 2022 Wait Statistics
Use this practical guide to analyze and troubleshoot SQL Server performance using wait statistics. You'll learn to identify precisely why your queries are running slowly. And you'll know how to measure the amount of time consumed by each bottleneck so you can focus attention on making the largest improvements first. This edition is updated to cover analysis of wait statistics current with SQL Server 2022. Whether you are new to wait statistics, or already familiar with them, this book provides a deeper understanding of how wait statistics are generated and what they mean for your SQL Server instance's performance. The book goes beyond the most common wait types into the more complex and performance-threatening wait types. You'll learn about per-query wait statistics and session-based wait statistics, and the types of problems they can help you solve. The different wait types are categorized by their area of impact, including CPU, IO, Latching, Locking, and many more. Clear examples are included to help you gain practical knowledge of why and how specific wait times increase or decrease, how they impact your SQL Server's performance, and what you can do to improve performance. After reading this book, you won't want to be without the valuable information that wait statistics provide regarding where you should be spending your limited tuning time to maximize performance and value to your business.
What You'll Learn: Understand how the SQL Server engine processes requests; identify resource bottlenecks in a running SQL Server instance; locate wait statistics information inside DMVs and Query Store; analyze the root cause of sub-optimal performance; diagnose I/O contention and locking contention; benchmark SQL Server performance; improve database performance by lowering overall wait time. Who This Book Is For: Database administrators who want to identify and resolve performance bottlenecks, those who want to learn more about how the SQL Server engine accesses and uses resources inside SQL Server, and administrators concerned with achieving, and knowing they have achieved, optimal performance.
Analysis of Images, Social Networks and Texts
This book constitutes revised selected papers from the thoroughly refereed proceedings of the 10th International Conference on Analysis of Images, Social Networks and Texts, AIST 2021, held in Tbilisi, Georgia, during December 16-18, 2021. The 20 full papers and 5 short papers included in this book were carefully reviewed and selected from 118 submissions. They were organized in topical sections as follows: Invited papers; natural language processing; computer vision; data analysis and machine learning; social network analysis; and theoretical machine learning and optimization.
Building the Snowflake Data Cloud
Implement the Snowflake Data Cloud using best practices and reap the benefits of scalability and low cost from the industry-leading, cloud-based, data warehousing platform. This book provides a detailed how-to explanation, and assumes familiarity with Snowflake core concepts and principles. It is a project-oriented book with a hands-on approach to designing, developing, and implementing your Data Cloud with security at the center. As you work through the examples, you will develop the skill, knowledge, and expertise to expand your capability by incorporating additional Snowflake features, tools, and techniques. Your Snowflake Data Cloud will be fit for purpose, extensible, and at the forefront of Direct Share, Data Exchange, and Snowflake Marketplace. Building the Snowflake Data Cloud helps you transform your organization so that it can monetize the value locked up within your data. As the digital economy takes hold, with data volume, velocity, and variety growing at exponential rates, you need tools and techniques to quickly categorize, collate, summarize, and aggregate data. You also need the means to seamlessly distribute data to release its value. This book shows how Snowflake provides all these things and how to use them to your advantage. The book helps you succeed by delivering faster than you can deliver with legacy products and techniques. You will learn how to leverage what you already know, and what you don't, all applied in a Snowflake Data Cloud context. After reading this book, you will discover and embrace the future where the Data Cloud is central. You will be able to position your organization to take advantage by identifying, adopting, and preparing your tooling for the coming wave of opportunity around sharing and monetizing valuable corporate data.
What You Will Learn: Understand why Data Cloud is important to the success of your organization; up-skill and adopt Snowflake, leveraging the benefits of cloud platforms; articulate the Snowflake Marketplace and identify opportunities to monetize data; identify tools and techniques to accelerate integration with Data Cloud; manage data consumption by monitoring and controlling access to datasets; develop data load and transform capabilities for use in future projects. Who This Book Is For: Solution architects seeking implementation patterns to integrate with a Data Cloud; data warehouse developers looking for tips, tools, and techniques to rapidly deliver data pipelines; sales managers who want to monetize their datasets and understand the opportunities that Data Cloud presents; and anyone who wishes to unlock value contained within their data silos.
Databases Theory and Applications
This book constitutes the refereed proceedings of the 33rd International Conference on Databases Theory and Applications, ADC 2022, held in Sydney, Australia, in September 2022. The conference is co-located with the 48th International Conference on Very Large Data Bases, VLDB 2022. The 9 full papers presented together with 8 short papers were carefully reviewed and selected from 36 submissions. ADC focuses on database systems, data-driven applications, and data analytics.
Advances in Databases and Information Systems
This book constitutes the proceedings of the 26th European Conference on Advances in Databases and Information Systems, ADBIS 2022, held in Turin, Italy, in September 2022. The 23 full papers presented together with 5 keynote and tutorial papers were carefully reviewed and selected from 90 submissions. The papers are organized in the following topical sections: keynote talk and tutorials; graph processing; time series and data streams; on line analytical processing; advanced querying; performance; machine learning; data science methods.
Mastering MongoDB 6.x - Third Edition
Design and build solutions with the most powerful document database, MongoDB. Key Features: Learn from the experts about every new feature in MongoDB 6 and 5; develop applications and administer clusters using MongoDB on premise or in the cloud; explore code-rich case studies showcasing MongoDB's major features followed by best practices. Book Description: MongoDB is a leading non-relational database. This book covers all the major features of MongoDB including the latest version 6. MongoDB 6.x adds many new features and expands on existing ones such as aggregation, indexing, replication, sharding and MongoDB Atlas tools. Some of the MongoDB Atlas tools that you will master include Atlas dedicated clusters and Serverless, Atlas Search, Charts, Realm Application Services/Sync, Compass, Cloud Manager and Data Lake. By getting hands-on working with code using realistic use cases, you will master the art of modeling, shaping and querying your data and become the MongoDB oracle for the business. You will focus on broadly used and niche areas such as optimizing queries, configuring large-scale clusters, configuring your cluster for high performance and availability and many more.
Later, you will become proficient in auditing, monitoring, and securing your clusters using a structured and organized approach. By the end of this book, you will have grasped all the practical understanding needed to design, develop, administer and scale MongoDB-based database applications both on premises and on the cloud. What You Will Learn: Understand data modeling and schema design, including smart indexing; master querying data using aggregation; use distributed transactions, replication and sharding for better results; administer your database using backups and monitoring tools; secure your cluster with the best checklists and advice; master MongoDB Atlas, Search, Charts, Serverless, Realm, Compass, Cloud Manager and other tools offered in the cloud or on premises; integrate MongoDB with other big data sources; design and deploy MongoDB in mobile, IoT and serverless environments. Who this book is for: This book is for MongoDB developers and database administrators who want to learn how to model their data using MongoDB in depth, for both greenfield and existing projects. An understanding of MongoDB, shell command skills and basic database design concepts is required to get the most out of this book.
Service-Oriented Computing - ICSOC 2021 Workshops
This book constitutes the selected papers from the scientific satellite events held in conjunction with the 19th International Conference on Service-Oriented Computing, ICSOC 2021. The conference was held in Dubai, United Arab Emirates, in November 2021. This year, these satellite events were organized around three main tracks: a workshop track, a demonstration track, and a tutorials track. The ICSOC 2021 workshop track consisted of the following three workshops covering a wide range of topics that fall into the general area of service computing: International Workshop on Artificial Intelligence for IT Operations (AIOps); 3rd Workshop on Smart Data Integration and Processing (STRAPS 2021); International Workshop on AI-enabled Process Automation (AI-PA 2021).
Data Quality Engineering in Financial Services
Data quality will either make you or break you in the financial services industry. Missing prices, wrong market values, trading violations, client performance restatements, and incorrect regulatory filings can all lead to harsh penalties, lost clients, and financial disaster. This practical guide provides data analysts, data scientists, and data practitioners in financial services firms with the framework to apply manufacturing principles to financial data management, understand data dimensions, and engineer precise data quality tolerances at the datum level and integrate them into data processing pipelines. You'll get invaluable advice on how to: evaluate data dimensions and how they apply to different data types and use cases; determine data quality tolerances for your data quality specification; choose the points along the data processing pipeline where data quality should be assessed and measured; apply tailored data governance frameworks within a business or technical function or across an organization; precisely align data with applications and data processing pipelines; and more.
How To Gather And Use Data For Business Analysis
These days the business world is full of talk about data science, big data, and how data analysis can transform your business. And it's absolutely true that collecting the right information, in the right way, analyzing that information, and then using it effectively to manage your business can give your business a competitive edge. But most businesses don't need to go so far as big data and data science. They just need to understand and implement some basic steps for gathering the right information about their business and using it effectively. Leveraging over twenty years of experience using data for business, M.L. Humphrey will walk you through what you need to know to help improve your bottom line today. So don't wait. Get started now. This title was previously published as Data Principles for Beginners.
Database Systems
This textbook is ideally suited for an undergraduate course in database systems. The discipline of database systems design and management is discussed within the context of software engineering. The student is made to understand from the outset that a database is a mission-critical component of a software system.
Psychology, Learning, Technology
This open access book constitutes the refereed proceedings of the 1st International Workshop on Psychology, Learning, Technology, PLT 2022, held in Foggia, Italy, in January 2022. The 8 full papers presented here were carefully reviewed and selected from 23 submissions. In addition, one invited paper is included. The Psychology, Learning, and Technology Conference (PLT 2022) aims to explore learning paths that incorporate digital technologies in innovative and transformative ways, as well as the improvement of psychological and relational life. The conference includes topics on the methodology of applying ICT tools in psychology and education: from blended learning to the application of artificial intelligence in education; from teaching, learning, and assessment strategies and practices to the new frontiers of Human-Computer Interaction.
Experimental IR Meets Multilinguality, Multimodality, and Interaction
This book constitutes the refereed proceedings of the 13th International Conference of the CLEF Association, CLEF 2022, held in Bologna, Italy, in September 2022. The conference has a clear focus on experimental information retrieval with special attention to the challenges of multimodality, multilinguality, and interactive search ranging from unstructured to semi-structured and structured data. The 7 full papers presented together with 3 short papers in this volume were carefully reviewed and selected from 14 submissions. This year, the contributions addressed the following challenges: authorship attribution, fake news detection and news tracking, noise detection in automatically transferred relevance judgments, impact of online education on children's conversational search behavior, analysis of multi-modal social media content, knowledge graphs for sensitivity identification, a fusion of deep learning and logic rules for sentiment analysis, medical concept normalization and domain-specific information extraction. In addition, the volume presents 7 "best of the labs" papers, which were reviewed as full paper submissions with the same review criteria. 14 lab overview papers were accepted and represent scientific challenges based on new datasets and real-world problems in multimodal and multilingual information access.
Graph-Theoretic Concepts in Computer Science
This volume, LNCS 13453, constitutes the thoroughly refereed proceedings of the 48th International Workshop on Graph-Theoretic Concepts in Computer Science, WG 2022. The 32 full papers presented in this volume were carefully reviewed and selected from a total of 96 submissions. The WG 2022 workshop aims to merge theory and practice by demonstrating how concepts from graph theory can be applied to various areas in computer science, or by extracting new graph-theoretic problems from applications.
Integrating Data
Overcome the challenges, appreciate the varieties, and apply the process of data integration. Learn all about data integration and become a data integration hero instead of following the masses and running in the opposite direction at the mere mention of the word "integration". Understand why organizations avoid data integration and often wind up with spider web environments containing siloed applications instead of an enterprise database which excites analysts and data scientists. Distinguish the different types of integration: database, attribute, key, index, encoding, measurement, format, definition, KPI, calculations, summarization, selection criteria, data exclusion, lineage, and timing. Apply identification, equivocation, and physical conversion levels of integration for both structured and textual data. Leverage deidentification, proximity analysis, alternate spelling, stop word resolution, homographic resolution, stemming, taxonomical resolution, inline contextualization, classification, and acronym resolution. Learn how to combine structured and textual data in the context of three levels of interaction. Follow the steps of scope, model, and map in integrating structured data. Follow the steps of scope, connect taxonomies, ingest raw text, and determine analytical processes in integrating textual data. Apply integration best practices, including identifying integration roles, developing a reusable data integration process, and documenting the integration benefits. Compare taxonomies with data models. Know how data integration helps data science. To reinforce all of the concepts within the book, we include a detailed case study on data integration.
Algorithmic Aspects in Information and Management
This book constitutes the proceedings of the 16th International Conference on Algorithmic Aspects in Information and Management, AAIM 2022, which was held online during August 13-14, 2022. The conference was originally planned to take place in Guangzhou, China, but changed to a virtual event due to the COVID-19 pandemic. The 41 regular papers included in this book were carefully reviewed and selected from 59 submissions.
Applied Meta-Analysis with R and Stata
In biostatistical research and courses, practitioners and students often lack a thorough understanding of how to apply statistical methods to synthesize biomedical and clinical trial data. Filling this knowledge gap, this book shows how to implement statistical meta-analysis methods on real data using R and Stata.
Main Memory Management on Relational Database Systems
This book provides basic knowledge about main memory management in relational databases as it is needed to support large-scale applications processed completely in memory. In business operations, real-time predictability and high speed are a must. Hence every opportunity must be exploited to improve performance, including reducing dependency on the hard disk, adding more memory to make more data resident in memory, and even deploying an in-memory system where all data can be kept in memory. The book provides one chapter for each of the main related topics, i.e. the memory system, memory management, virtual memory, and databases and their memory systems, and it is complemented by a short survey of six commercial systems: TimesTen, MySQL, VoltDB, Hekaton, HyPer/ScyPer, and SAP HANA.
Python for Data Analysis
Get the definitive handbook for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.10 and pandas 1.4, the third edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You'll learn the latest versions of pandas, NumPy, and Jupyter in the process. Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It's ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub. Use the Jupyter notebook and IPython shell for exploratory computing. Learn basic and advanced features in NumPy. Get started with data analysis tools in the pandas library. Use flexible tools to load, clean, transform, merge, and reshape data. Create informative visualizations with matplotlib. Apply the pandas groupby facility to slice, dice, and summarize datasets. Analyze and manipulate regular and irregular time series data. Learn how to solve real-world data analysis problems with thorough, detailed examples.
Snowflake: The Definitive Guide
Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you're an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you. You'll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you'll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily. You'll be able to: efficiently capture, store, and process large amounts of data at an amazing speed; ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes; use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs; securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace.
Knowledge and Systems Sciences
This book constitutes the refereed proceedings of the 21st International Symposium on Knowledge and Systems Sciences, KSS 2022, held in Beijing, China, in June 2022. The 14 revised full papers and 3 short papers presented were carefully reviewed and selected from 51 submissions. The papers are organized in topical sections on data mining and machine learning; model-based systems engineering; complex systems modeling and knowledge technologies.
Innovation Practices for Digital Transformation in the Global South
This book is a collection of chapters from the IFIP working groups 13.8 and 9.4. The 10 papers included present experiences and research on the topic of digital transformation and innovation practices in the global south. The topics span from digital transformation initiatives to novel innovative technological developments, practices and applications of marginalised people in the global south.
Privacy in Statistical Databases
This book constitutes the refereed proceedings of the International Conference on Privacy in Statistical Databases, PSD 2022, held in Paris, France, during September 21-23, 2022. The 25 papers presented in this volume were carefully reviewed and selected from 45 submissions. They were organized in topical sections as follows: Privacy models; tabular data; disclosure risk assessment and record linkage; privacy-preserving protocols; unstructured and mobility data; synthetic data; machine learning and privacy; and case studies.
Data Analytics and Management in Data Intensive Domains
This book constitutes the post-conference proceedings of the 23rd International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2021, held in Moscow, Russia, in October 2021 (the conference was held virtually due to the COVID-19 pandemic). The 16 revised full papers were carefully reviewed and selected from 61 submissions. The papers are organized in the following topical sections: problem solving infrastructures, experiment organization, and machine learning applications; data analysis in astronomy; data analysis in material and earth sciences; information extraction from text.
Database and Expert Systems Applications - DEXA 2022 Workshops
This volume constitutes the refereed proceedings of the workshops held at the 33rd International Conference on Database and Expert Systems Applications, DEXA 2022, held in Vienna, Austria, in August 2022: the 6th International Workshop on Cyber-Security and Functional Safety in Cyber-Physical Systems (IWCFS 2022); 4th International Workshop on Machine Learning and Knowledge Graphs (MLKgraphs 2022); 2nd International Workshop on Time Ordered Data (ProTime2022); 2nd International Workshop on AI System Engineering: Math, Modelling and Software (AISys2022); 1st International Workshop on Distributed Ledgers and Related Technologies (DLRT2022); 1st International Workshop on Applied Research, Technology Transfer and Knowledge Exchange in Software and Data Science (ARTE2022). The 40 papers were thoroughly reviewed and selected from 62 submissions, and discuss a range of topics including knowledge discovery, biological data, cyber security, cyber-physical systems, machine learning, knowledge graphs, information retrieval, databases, and artificial intelligence.
Simplifying Data Engineering and Analytics with Delta
Explore how Delta brings reliability, performance, and governance to your data lake and all the AI and BI use cases built on top of it. Key Features: Learn Delta's core concepts and features as well as what makes it a perfect match for data engineering and analysis. Solve business challenges of different industry verticals using a scenario-based approach. Make optimal choices by understanding the various tradeoffs provided by Delta. Book Description: Delta helps you generate reliable insights at scale and simplifies architecture around data pipelines, allowing you to focus primarily on refining the use cases being worked on. This is especially important when you consider that existing architecture is frequently reused for new use cases. In this book, you'll learn about the principles of distributed computing, data modeling techniques, and big data design patterns and templates that help solve end-to-end data flow problems for common scenarios and are reusable across use cases and industry verticals. You'll also learn how to recover from errors and the best practices around handling structured, semi-structured, and unstructured data using Delta.
After that, you'll get to grips with features such as ACID transactions on big data, disciplined schema evolution, time travel to help rewind a dataset to a different time or version, and unified batch and streaming capabilities that will help you build agile and robust data products. By the end of this Delta book, you'll be able to use Delta as the foundational block for creating analytics-ready data that fuels all AI/BI use cases. What You Will Learn: Explore the key challenges of traditional data lakes. Appreciate the unique features of Delta that come out of the box. Address reliability, performance, and governance concerns using Delta. Analyze the open data format for an extensible and pluggable architecture. Handle multiple use cases to support BI, AI, streaming, and data discovery. Discover how common data and machine learning design patterns are executed on Delta. Build and deploy data and machine learning pipelines at scale using Delta. Who this book is for: Data engineers, data scientists, ML practitioners, BI analysts, or anyone in the data domain working with big data will be able to put their knowledge to work with this practical guide to executing pipelines and supporting diverse use cases using the Delta protocol. Basic knowledge of SQL, Python programming, and Spark is required to get the most out of this book.