Mastering PostgreSQL 17 - Sixth Edition
Learn advanced PostgreSQL techniques vital for everyday operation and security with a focus on PostgreSQL 17, its new features, and evolving real-world applications.
Key Features:
- Optimize queries and performance for PostgreSQL installations
- Secure databases with advanced access controls and encryption
- Master replication, backups, and disaster recovery strategies
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description: Starting with new features introduced in PostgreSQL 17, the sixth edition of this book provides comprehensive insights into advanced database management, helping you elevate your PostgreSQL skills to an expert level. Written by Hans-Jürgen Schönig, a PostgreSQL expert with over 25 years of experience and the CEO of CYBERTEC PostgreSQL International GmbH, this guide distills real-world expertise from supporting countless global customers. It guides you through crucial aspects of professional database management, including performance tuning, replication, indexing, and security strategies.
You'll learn how to handle complex queries, optimize execution plans, and enhance user interactions with advanced SQL features such as window functions and JSON support. Hans equips you with practical approaches for managing database locks, transactions, and stored procedures to ensure peak performance. With real-world examples and expert solutions, you'll also explore replication techniques for high availability, along with troubleshooting methods to detect and resolve bottlenecks, preparing you to tackle everyday challenges in database administration.
By the end of the book, you'll be ready to deploy, secure, and maintain PostgreSQL databases efficiently in production environments.
What You Will Learn:
- Deploy and manage PostgreSQL in production environments
- Improve database throughput and ensure speedy responses from PostgreSQL
- Utilize indexes, partitions, and full-text search
- Handle transactions, locking, and concurrency issues
- Secure PostgreSQL with encryption and access controls
- Implement replication for high availability
- Get to grips with handling redundancy and data safety
- Fix the most common issues and real-world problems faced by PostgreSQL users
Who this book is for: This book is for database administrators, PostgreSQL developers, and IT professionals who want to implement advanced functionalities and tackle complex administrative tasks using PostgreSQL 17. A foundational understanding of PostgreSQL and core database concepts is essential, along with familiarity with SQL. Prior experience in database administration will enhance your ability to leverage the advanced techniques presented in this book.
Table of Contents:
- What is New in PostgreSQL 17
- Understanding Transactions and Locking
- Making Use of Indexes
- Handling Advanced SQL
- Log Files and System Statistics
- Optimizing Queries for Good Performance
- Writing Stored Procedures
- Managing PostgreSQL Security
- Handling Backup and Recovery
- Making Sense of Backups and Replication
- Deciding on Useful Extensions
- Troubleshooting PostgreSQL
- Migrating to PostgreSQL
Cracking the Data Code
Why do we continue to struggle with data? With all the powerful tools we have in processing power, data tools, and computer programming, we still search for some elusive truth to pervasive problems. AI hallucinates, 'good data' that we started with is suddenly unintelligible, and systems that should talk to each other seamlessly continually experience errors and need correction.
What we fail to incorporate into our data world is the fact that data is language, and entwined in that language is its own code that does not get captured in databases, APIs, LLMs and the systems we use day in and day out. So, how can we crack this data code?
By stepping back, we can bring the tools that already exist in applied linguistics, used to crack the human language code, into how we tackle the data code challenge. Just because we call it data doesn't mean that it doesn't suffer from bias or the need for context. But by recognizing these linguistic challenges, and infusing that inside the data, we can create data code that can be cracked: data that tells us its biases, context, and purpose, and for whom that data is actually useful and for whom it is not.
If you are interested in data, and why understanding language and jargon can help you crack the data code, this book is for you. If you've had a data challenge and have struggled to find a way to understand it, the practical foundational principles inside can help you frame your problem in a different way and, in doing so, help you crack the data code.
Sustainability and Empowerment in the Context of Digital Libraries
The two-volume set LNCS 15493 and LNCS 15494 constitutes the refereed proceedings of the 26th International Conference on Asia-Pacific Digital Libraries, ICADL 2024, held in Bandar Sunway, Malaysia, during December 4-6, 2024. The 19 full papers, 10 short papers, 7 posters and 2 practice papers presented were carefully reviewed and selected from 110 submissions. These papers are included in both volumes of the proceedings, grouped according to the following topics: Cultural Data Analysis, Design & Evaluation, Generative AI & Digital Libraries, Information Retrieval, Information Seeking & Use (Part I) and Knowledge Extraction, Scholarly Information Processing, and Social Media Analytics in Part II.
DuckDB: Up and Running
DuckDB, an open source in-process database created for OLAP workloads, provides key advantages over more mainstream OLAP solutions: It's embeddable and optimized for analytics. It also integrates well with Python and is compatible with SQL, giving you the performance and flexibility of SQL right within your Python environment. This handy guide shows you how to get started with this versatile and powerful tool. Author Wei-Meng Lee takes developers and data professionals through DuckDB's primary features and functions, best practices, and practical examples of how you can use DuckDB for a variety of data analytics tasks. You'll also dive into specific topics, including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL.
- Understand the purpose of DuckDB and its main functions
- Conduct data analytics tasks using DuckDB
- Integrate DuckDB with pandas, Polars, and JupySQL
- Use DuckDB to query your data
- Perform spatial analytics using DuckDB's spatial extension
- Work with a diverse range of data including Parquet, CSV, and JSON
Visual Analytics Using Tableau
DESCRIPTION
Tableau is one of the leading business intelligence and data visualization tools that fulfill almost all the requirements for getting insights from huge data and solving complex business queries using simple and complex visualizations. This book covers all the features supported by Tableau, from basics to advanced.
Master data storytelling with Tableau by learning to connect, clean, and analyze data from various sources. This book covers essential chart types like bar, line, and pie charts while introducing advanced features like filters, LOD expressions, and dual-axis charts. Create interactive dashboards by combining visualizations, adding controls, and customizing designs to engage your audience. Use storytelling techniques to present insights effectively. With advanced visualizations like combo and Gantt charts, this guide equips you with the skills to communicate data clearly and make informed, data-driven decisions.
This book begins with very basic information that even a beginner can understand. Gradually, the book covers intermediate and advanced features of Tableau, so it can help readers of all levels become experts in Tableau.
WHAT YOU WILL LEARN
● Understand different types of data sources and how to connect them.
● Learn techniques for cleaning and preparing data for visualization.
● Perform calculations, aggregations, and level of detail (LOD) expressions.
● Create both simple and advanced visualizations to present data.
● Design visually engaging dashboards and storyboards to answer business questions effectively.
WHO THIS BOOK IS FOR
This book is for students and professionals who want to learn and have a rewarding career in data visualization using Tableau. It is also for anyone who wants to become a data analyst, Tableau developer, business analyst, etc.
Graph Data Analytics
DESCRIPTION
For most modern-day data, graph data models are proving to be advantageous since they facilitate a diverse range of data analyses. This has spiked the interest in and usage of graph databases, especially Neo4j. We study Neo4j and Cypher along with various plugins that augment database capabilities in terms of data types or facilitate applications in data science and machine learning, such as Graph Data Science (GDS).
A significant portion of the book is focused on discussing the structure and usage of graph algorithms. Readers will gain insights into well-known algorithms like shortest path, PageRank, or Label Propagation among others, and how one can apply these algorithms in real-world scenarios within a Neo4j graph.
Once readers become acquainted with the various algorithms applicable to graph analysis, we transition to data science problems. Here, we explore how a graph's structure and algorithms can enhance predictive modeling, prediction of connections in the graph, etc. In conclusion, we demonstrate that beyond its prowess in data analysis, Neo4j can be tweaked in a production setup to handle large data sets and queries at scale, allowing more complex and sophisticated analyses to come to life.
KEY FEATURES
● Utilizing graphs to improve search and recommendations on graph data models.
● Understand GDS and Neo4j graph algorithms including cluster detection, link prediction, and centrality.
● Complex problem-solving for predicting connections, application in ML pipelines and GNNs using graphs.
WHAT YOU WILL LEARN
● Understand Neo4j graphs and how to effectively query them with Cypher.
● Learn to employ graphs for effective search and recommendations around graph data.
● Work with graph algorithms to solve problems like finding paths, centrality metrics, and detection of communities and clusters.
● Explore Neo4j's GDS library through practical examples.
● Integrate machine learning with Neo4j graphs, covering data prep, feature extraction, and model training.
WHO THIS BOOK IS FOR
The book is intended to serve as a reference for data scientists, business analysts, graph enthusiasts, and database developers and administrators who work or intend to work on extracting critical insights from graph-based data stores.
The Stories Behind the Numbers
Unlock the Power of Data to Make Smarter Decisions
In Data-Driven Decisions: Making Better Decisions Using Insights, Roy Okonkwo demystifies the art of transforming raw numbers into meaningful narratives. Blending practical strategies with ethical considerations, this book empowers readers to leverage data for impactful decision-making in both personal and professional contexts. With accessible explanations of key concepts like predictive analytics, storytelling with data, and AI, Okonkwo offers tools to navigate the complexities of a data-driven world. Whether you're a novice or a seasoned professional, this book is your guide to making informed, ethical, and effective decisions.
The Semantic Web - ISWC 2024
This three-volume set constitutes the proceedings of the 23rd International Semantic Web Conference, ISWC 2024, held in Hanover, MD, USA, during November 11-15, 2024. The 44 full papers presented in these proceedings were carefully reviewed and selected from 155 submissions. This conference focuses on research on the Semantic Web, including benchmarks, knowledge graphs, tools and vocabularies.
The Semantic Web - ISWC 2024
This three-volume set constitutes the proceedings of the 23rd International Semantic Web Conference, ISWC 2024, held in Hanover, MD, USA, during November 11-15, 2024. The 44 full papers presented in these proceedings were carefully reviewed and selected from 155 submissions. This conference focuses on research on the Semantic Web, including benchmarks, knowledge graphs, tools and vocabularies. Chapters 10 and 11 are available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.
Biostatistics with Python
Learn how to utilize biostatistics with Python for excelling in research and biomedical professions with practical exemplar projects.
Key Features:
- Bridge the gap between biostatistics and life sciences with Python
- Work with practical exercises for real-world data analysis in biology and medicine
- Access a portfolio of exemplar projects in the domains of biomedicine, biotechnology, and biology
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description: This book leverages the author's decade-long experience in biostatistics and data science to simplify the practical use of biostatistics with Python. The chapters show you how to clean and describe your data effectively, setting a solid foundation for accurate analysis and proficiency in biostatistical inference to help you draw meaningful conclusions from your data through hypothesis testing and effect size analysis.
The book walks you through predictive modeling to harness the power of Python to create robust predictive analytics that can drive your research and professional projects forward. You'll explore clinical biostatistics, learn how to design studies, conduct survival analysis, and synthesize evidence from multiple studies with meta-analysis, skills that are crucial for making informed decisions based on comprehensive data reviews. The concluding chapters will enhance your ability to analyze biological variables, enabling you to perform detailed and accurate data analysis for biological research. This book's unique blend of biostatistics and Python helps you find practical solutions that make complex concepts easy to grasp and apply.
By the end of this biostatistics book, you'll have moved from theoretical knowledge to practical experience, allowing you to perform biostatistical analysis confidently and accurately.
What You Will Learn:
- Get to grips with the basics of biostatistics and Python programming
- Clean and describe data using Python
- Familiarize yourself with hypothesis testing and effect size analysis
- Explore predictive modeling in biostatistics
- Understand clinical study design and survival analysis
- Gain a clear understanding of the meta-analysis of clinical research data
- Analyze biological variables with Python
- Discover practical data analysis for biological research
Who this book is for: This book is for life science professionals, researchers, biomedical professionals, and aspiring biostatisticians who want to integrate biostatistics into their work or research. A basic understanding of life sciences, biology, or medicine is recommended to fully benefit from this book.
Table of Contents:
- Introduction to Biostatistics
- Getting Started with Python for Biostatistics
- Exercise 1 - Cleaning and Describing Data Using Python
- Part 1 Exemplar Project - Load, Clean, and Describe Diabetes Data in Python
- Introduction to Python for Biostatistics
- Biostatistical Inference Using Hypothesis Tests and Effect Sizes
- Predictive Biostatistics Using Python
- Part 2 Exercise - T-Test, ANOVA, and Linear and Logistic Regression
- Biostatistical Inference and Predictive Analytics Using Cardiovascular Study Data
- Clinical Study Design
- Survival Analysis in Biomedical Research
- Meta-Analysis - Synthesizing Evidence from Multiple Studies
- Survival Predictive Analysis and Meta-Analysis Practice
- Part 3 Exemplar Project - Meta-Analysis of Survival Data in Clinical Research
- Understanding Biological Variables
- Data Analysis Frameworks and Performance for Life Sciences Research
- Part 4 Exercise - Performing Statistics for Biology Studies in Python
CI/CD Design Patterns
Learn CI/CD design patterns directly from industry leaders in this easy-to-follow guide, offering immediately applicable solutions for sustainable CI/CD adoption.
Key Features:
- Simplify CI/CD adoption with the help of practical examples, case studies, and best practices
- Deploy market-ready solutions by implementing key components like pipelines, infrastructure, and release capabilities
- Explore advanced design patterns, including integration with machine learning and generative AI
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description: The fast-changing world of software development demands robust CI/CD solutions that go beyond traditional methods to address the complexities of modern pipelines. This practical guide presents proven design patterns to streamline your CI/CD processes, tackling pain points often overlooked by other resources. This book introduces continuous delivery design patterns to help practitioners and engineering teams design, adopt, and implement CI/CD. Drawing from decades of combined industry experience, the expert author team (including DevOps and cloud leader Garima Bajpai, industry expert Michel Schildmeijer, CI/CD framework creator Pawel Piwosz, and open source advocate Muktesh Mishra) provides invaluable insights from leading voices in the industry.
The book lays a solid foundation by starting with the importance of CI/CD design patterns, components, and principles. You'll learn strategies for scaling CI/CD with a focus on performance, security, measurements, and pipeline auditability, along with infrastructure and release automation. The book also covers advanced design patterns that integrate machine learning, generative AI, and near real-time CI/CD processes.
By the end of this book, you'll have a deep understanding of continuous delivery design patterns, a solid foundation for audits and controls, and be able to mitigate risks associated with the rapid integration of modern technology into the SDLC.
What You Will Learn:
- Use and manage CI/CD patterns effectively to design and implement CI/CD
- Understand types of CI/CD design patterns and components
- Discover relationships and interactions between tools within CI/CD
- Implement well-tested development/design paradigms
- Explore anti-patterns for CI/CD design pattern deployments
- Master the taxonomy of assessment and audits for CI/CD design patterns
- Get to grips with automation techniques for seamless CI/CD workflows
- Gain insights into scaling CI/CD processes for large projects
Who this book is for: This design patterns book is for senior software developers, software architects, SRE architects, DevOps architects, cloud architects and platform engineering teams looking to speed up the development process by adopting well-tested, proven development/design paradigms for continuous delivery and its adoption. You are expected to have a basic understanding of CI/CD concepts and be familiar with the cloud ecosystem, DevOps principles, and CI/CD pipelines.
Table of Contents:
- Foundations of CI/CD Design Patterns
- Understanding Types of CI/CD Design Patterns and Their Components
- Advancing on CI/CD Design Patterns - from Testing to Deployment
- Business Outcome Alignment with CI/CD Design Patterns
- Exploring Structural CI/CD Design Patterns
- Deployment Strategies for Structural Design Patterns for CI/CD
- Understanding Behavioral Design Patterns for CI/CD
- Domain-Driven Design Patterns for Regulated Sectors
- Applying Creational CI/CD Design Patterns
- Understanding Deployment Strategies - Creational CI/CD with Cloud Providers
(N.B. Please use the Read Sample option to see further chapters)
Business 101 for the Data Professional
This new book from bestselling author Jordan Morrow empowers data professionals to work and operate more effectively in an organizational setting, equipping them with key business knowledge and skills. It is vital for data professionals to understand the business needs and outcomes of the organizations they work for and collaborate effectively with non-technical colleagues. Business 101 for the Data Professional is the definitive guide for data professionals looking to upskill their organizational effectiveness and enhance their career prospects. From business strategy to different business areas such as product, marketing, sales and operations to data monetization and value, the book explains how these contribute to the business, and, crucially, the role that data plays in supporting them. Business 101 for the Data Professional explores how to navigate key challenges and pitfalls of data in business, such as bias, misuse of data and the balancing of data and technical debt. It shows how to build networking, influencing and relationship building skills and outlines the key principles of strong communication and data storytelling, explaining how these can be used to engage effectively with internal and external stakeholders such as clients. It is supported by examples, summaries of key learnings, and exercises at the end of each chapter to help readers detail their progress and map out their goals.
Managing Data as a Product
Learn everything you need to know to manage data as a product and shift toward a more modular and decentralized socio-technical data architecture to deliver business value in an incremental, measurable, and sustainable way.
Key Features:
- Leverage data-as-product to unlock the modular platform potential and fix flaws in traditional monolithic architectures
- Learn how to identify, implement, and operate data products throughout their life cycle
- Design and execute a forward-thinking strategy to turn your data products into organizational assets
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description: Traditional monolithic data platforms struggle with scalability and burden central data teams with excessive cognitive load, leading to challenges in managing technological debt. As maintenance costs escalate, these platforms lose their ability to provide sustained value over time. With two decades of hands-on experience implementing data solutions and his pioneering work in the Open Data Mesh Initiative, Andrea Gioia brings practical insights and proven strategies for transforming how organizations manage their data assets.
Managing Data as a Product introduces a modular and distributed approach to data platform development, centered on the concept of data products. In this book, you'll explore the rationale behind this shift, understand the core features and structure of data products, and learn how to identify, develop, and operate them in a production environment. The book guides you through designing and implementing an incremental, value-driven strategy for adopting data product-centered architectures, including strategies for securing buy-in from stakeholders. Additionally, it explores data modeling in distributed environments, emphasizing its crucial role in fully leveraging modern generative AI solutions.
By the end of this book, you'll have gained a comprehensive understanding of product-centric data architecture and the essential steps needed to adopt this modern approach to data management.
What You Will Learn:
- Overcome the challenges in scaling monolithic data platforms, including cognitive load, tech debt, and maintenance costs
- Discover the benefits of adopting a data-as-a-product approach for scalability and sustainability
- Navigate the complete data product lifecycle, from inception to decommissioning
- Automate data product lifecycle management using a self-serve platform
- Implement an incremental, value-driven strategy for transitioning to data-product-centric architectures
- Optimize data modeling in distributed environments to enhance GenAI-based use cases
Who this book is for: If you're an experienced data engineer, data leader, architect, or practitioner committed to reimagining your data architecture and designing one that enables your organization to get the most value from your data in a sustainable and scalable way, this book is for you. Whether you're a staff engineer, product manager, or a software engineering leader or executive, you'll find this book useful. Familiarity with basic data engineering principles and practices is assumed.
Table of Contents:
- From Data as a Byproduct to Data as a Product
- Data Products
- Data Product-Centered Architectures
- Identifying Data Products and Prioritizing Developments
- Designing and Implementing Data Products
- Operating Data Products in Production
- Automating Data Product Lifecycle Management
(N.B. Please use the Read Sample option to see further chapters)
Intelligent Data Engineering and Automated Learning - IDEAL 2024
This two-volume set, LNCS 15346 and LNCS 15347, constitutes the proceedings of the 25th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2024, held in Valencia, Spain, during November 20-22, 2024. The 86 full papers and 6 short papers presented in this book were carefully reviewed and selected from 130 submissions. IDEAL 2024 focuses on Big Data Analytics and Privacy, Machine Learning & Deep Learning for Real-World Applications, Data Mining and Pattern Recognition, Information Retrieval and Management, Bio and Neuro-Informatics, and Hybrid Intelligent Systems and Agents.
The Data Science Handbook
Practical, accessible guide to becoming a data scientist, updated to include the latest advances in data science and related fields. Becoming a data scientist is hard. The job focuses on mathematical tools, but also demands fluency with software engineering, understanding of a business situation, and deep understanding of the data itself. This book provides a crash course in data science, combining all the necessary skills into a unified discipline. The focus of The Data Science Handbook is on practical applications and the ability to solve real problems, rather than theoretical formalisms that are rarely needed in practice. Among its key points are:
- An emphasis on software engineering and coding skills, which play a significant role in most real data science problems.
- Extensive sample code, detailed discussions of important libraries, and a solid grounding in core concepts from computer science (computer architecture, runtime complexity, programming paradigms, etc.)
- A broad overview of important mathematical tools, including classical techniques in statistics, stochastic modeling, regression, numerical optimization, and more.
- Extensive tips about the practical realities of working as a data scientist, including understanding related job functions, project life cycles, and the varying roles of data science in an organization.
- Exactly the right amount of theory. A solid conceptual foundation is required for fitting the right model to a business problem, understanding a tool's limitations, and reasoning about discoveries.
Data science is a quickly evolving field, and the 2nd edition has been updated to reflect the latest developments, including the revolution in AI that has come from Large Language Models and the growth of ML Engineering as its own discipline.
Much of data science has become a skillset that anybody can have, making this book not only for aspiring data scientists, but also for professionals in other fields who want to use analytics as a force multiplier in their organization.
Knowledge and Systems Sciences
This book constitutes the refereed proceedings of the 23rd International Symposium on Knowledge and Systems Sciences, KSS 2024, held in Hobart, Tasmania, Australia, during November 16-17, 2024. The 23 full papers presented in this book were carefully reviewed and selected from 50 submissions. They are organized in the following topical sections: Complex networks and modeling; Opinion dynamics; Knowledge technologies and systems engineering; Knowledge management.
New Trends in Database and Information Systems
This book contains the short papers, Doctoral Consortium papers, and workshop papers presented in conjunction with the 28th European Conference on New Trends in Databases and Information Systems, ADBIS 2024, which took place in Bayonne, France, during August 28-31, 2024. The total of 28 full papers and 7 short papers presented in this book were carefully reviewed and selected from 103 submissions. They are organized in the following topical sections: Doctoral Consortium; 5th Workshop on Intelligent Data - From Data to Knowledge (DOING 2024); 3rd Workshop on Knowledge Graphs Analysis on a Large Scale (K-GALS 2024); 6th Workshop on Modern Approaches in Data Engineering and Information System Design (MADEISD 2024); 3rd Workshop on Personalization and Recommender Systems (PERS 2024); Access methods and query processing; discovery and data analysis; Machine Learning; large language models; and tutorials.
Data Management
This guide illuminates the intricate relationship between data management, computer architecture, and system software. It traces the evolution of computing to today's data-centric focus and underscores the importance of hardware-software co-design in achieving efficient data processing systems with high throughput and low latency. The thorough coverage includes topics such as logical data formats, memory architecture, GPU programming, and the innovative use of ray tracing in computational tasks. Special emphasis is placed on minimizing data movement within memory hierarchies and optimizing data storage and retrieval. Tailored for professionals and students in computer science, this book combines theoretical foundations with practical applications, making it an indispensable resource for anyone wanting to master the synergies between data management and computing infrastructure.
Big Data - BigData 2024
This book constitutes the refereed proceedings of the 13th International Conference on Big Data, BigData 2024, held as part of the Services Conference Federation, SCF 2024, in Bangkok, Thailand, during November 16-19, 2024. The 8 full papers and 1 short paper included in this book were carefully reviewed and selected from 21 submissions. They focus on various topics within the field of Data-based services such as Big Data Architecture, Big Data Modeling, Big Data As A Service, Big Data for Vertical Industries (Government, Healthcare, etc.), Big Data Analytics, Big Data Toolkits, Big Data Open Platforms, Economic Analysis, Big Data for Enterprise Transformation, Big Data in Business Performance Management, Big Data for Business Model Innovations and Analytics, Big Data in Enterprise Management Models and Practices, Big Data in Government Management Models and Practices, and Big Data in Smart Planet Solutions. The papers have been organized under the following topical sections: Research track; Application track; and Short paper track.
Applying Color Theory to Digital Media and Visualization
Applying Color Theory to Digital Media and Visualization provides an overview of the application of color theory concepts to digital media and visualization. It highlights specific color concepts such as color harmony and data color schemes. Examples of generative AI solutions for color scheme suggestion are provided. The usage of these concepts is shown with actual online and mobile tools. Color deficiencies are reviewed, and color tools for examining how a specific color map design will look to someone with the deficiency are discussed. A five-stage colorization process is defined and applied to case study examples. Features: Presents color theory and data color concepts that can be applied to digital media and visualization problems over and over again; offers a comprehensive review of the historical progression of color models; demonstrates actual case study implementations of color analysis tools; provides an overview of color theory and harmony analytics in terms of online and mobile analysis tools; teaches the color theory language to use in interacting with color management professionals. Unlike many books on color, which examine artists' use of color, color management or color science, this book applies fundamental color concepts to digital media and visualization solutions, and the new edition includes generative AI solutions for color suggestion. A video summary of the chapters by the author can also be found here: https://www.youtube.com/watch?v=aGDXyTd1UWk. This is the ideal book for digital media and visualization content creators and developers.
Data Science
This book covers the topic of data science in a comprehensive manner and synthesizes both fundamental and advanced topics of a research area that has now reached its maturity. The book starts with the basic concepts of data science. It highlights the types of data and their use and importance, followed by a discussion on a wide range of applications of data science and widely used techniques in data science. Key Features: Provides an internationally respected collection of scientific research methods, technologies and applications in the area of data science; presents predictive outcomes by applying data science techniques to real-life applications; provides readers with the tools, techniques and cases required to excel with modern artificial intelligence methods; gives the reader a variety of intelligent applications that can be designed using data science and its allied fields. The book is aimed primarily at advanced undergraduates and graduates studying machine learning and data science. Researchers and professionals will also find this book useful.
Big Data
Discover the power of big data and learn how to harness its potential in this comprehensive guide. From understanding its impact on businesses and society to tackling the challenges of data privacy and security, this book covers everything you need to know to navigate the world of big data. Packed with real-world examples, practical tips, and insights into emerging trends, this is a must-read for anyone looking to unlock the value of big data.
Machine Learning and Metaheuristic Computation
Learn to bridge the gap between machine learning and metaheuristic methods to solve problems in optimization approaches. Few areas of technology have greater potential to revolutionize the globe than artificial intelligence. Two key areas of artificial intelligence, machine learning and metaheuristic computation, have an enormous range of individual and combined applications in computer science and technology. To date, these two complementary paradigms have not always been treated together, despite the potential of a combined approach which maximizes the utility and minimizes the drawbacks of both. Machine Learning and Metaheuristic Computation offers an introduction to both of these approaches and their joint applications. Both a reference text and a course, it is built around the popular Python programming language to maximize utility. It guides the reader gradually from an initial understanding of these crucial methods to an advanced understanding of cutting-edge artificial intelligence tools. The text also provides: treatment suitable for readers with only basic mathematical training; detailed discussion of topics including dimensionality reduction, clustering methods, differential evolution, and more; a rigorous but accessible vision of machine learning algorithms and the most popular approaches of metaheuristic optimization. Machine Learning and Metaheuristic Computation is ideal for students, researchers, and professionals looking to combine these vital methods to solve problems in optimization approaches.
Data Science
This three-volume set CCIS 2213-2215 constitutes the refereed proceedings of the 10th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2024, held in Macau, China, during September 27-30, 2024. The 74 full papers and 3 short papers presented in these three volumes were carefully reviewed and selected from 249 submissions. The papers are organized in the following topical sections: Part I: Novel methods or tools used in big data and its applications; applications of data science. Part II: Education research, methods and materials for data science and engine; data security and privacy; big data mining and knowledge management. Part III: Infrastructure for data science; social media and recommendation system; multimedia data management and analysis.
Subject-Oriented Business Process Management. Models for Designing Digital Transformations
This book constitutes the refereed post proceedings of the 15th International Conference on Subject-Oriented Business Process Management, S-BPM ONE 2024, held in Weiden, Germany, during May 21-22, 2024. The 14 full papers and 8 short papers included in this book were carefully reviewed and selected from 30 submissions. These papers have been organized in the following topical sections: Processes And Data; Subject-Oriented Modeling, Philosophy, and Technology; Processes and Sustainability; Good Process Practices.
Data Quality Management in the Data Age
This book addresses data quality management for data markets, including foundational quality issues in modern data science. By clarifying the concept of data quality, its impact on real-world applications, and the challenges stemming from poor data quality, it will equip data scientists and engineers with advanced skills in data quality management, with a particular focus on applications within data markets. This will help them create an environment that encourages potential data sellers with high-quality data to join the market, ultimately leading to an improvement in overall data quality. High-quality data, as a novel factor of production, has assumed a pivotal role in driving digital economic development. The acquisition of such data is particularly important for contemporary decision-making models. Data markets facilitate the procurement of high-quality data and thereby enhance the data supply. Consequently, potential data sellers with high-quality data are incentivized to enter the market, an aspect that is particularly relevant in data-scarce domains such as personalized medicine and services. Data scientists have a pivotal role to play in both the intellectual vitality and the practical utility of high-quality data. Moreover, data quality control presents opportunities for data scientists to engage with less structured or ambiguous problems. The book will foster fruitful discussions on the contributions that various scientists and engineers can make to data quality and the further evolution of data markets.
Delta Lake: The Definitive Guide
Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques. Authors Denny Lee, Tristen Wentling, Scott Haines, and Prashanth Babu (with contributions from Delta Lake maintainer R. Tyler Croy) share expert insights on all things Delta Lake, including how to run batch and streaming jobs concurrently and accelerate the usability of your data. You'll also uncover how ACID transactions bring reliability to data lakehouses at scale. This book helps you: understand key data reliability challenges and how Delta Lake solves them; explain the critical role of Delta transaction logs as a single source of truth; learn the Delta Lake ecosystem with technologies like Apache Flink, Kafka, and Trino; architect data lakehouses with the medallion architecture; optimize Delta Lake performance with features like deletion vectors and liquid clustering.
Pandas Cookbook - Third Edition
From fundamental techniques to advanced strategies for handling big data, visualization, and more, this book equips you with skills to excel in real-world data analysis projects. Key Features: Targets features in pandas 2.x and beyond; practical, easy-to-implement recipes for quick solutions to common data problems using pandas; master the fundamentals of pandas to quickly begin exploring any dataset. Book Description: The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands as one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through situations that you are highly likely to encounter. With this latest edition, unlock the full potential of pandas 2.x onwards. Whether you're a beginner or an experienced data analyst, this book offers a wealth of practical recipes to help you excel in your data analysis projects. This cookbook covers everything from fundamental data manipulation tasks to advanced techniques for handling big data, visualization, and more. Each recipe is designed to address common real-world challenges, providing clear explanations and step-by-step instructions to guide you through the process. Explore cutting-edge topics such as idiomatic pandas coding, efficient handling of large datasets, and advanced data visualization techniques.
Whether you're looking to sharpen or expand your skills, the "Pandas Cookbook" is your essential companion for mastering data analysis and manipulation with pandas 2.x and beyond. What You Will Learn: The pandas type system and how to best navigate it; import/export DataFrames to/from common data formats; data exploration in pandas through dozens of practice problems; grouping, aggregation, transformation, reshaping, and filtering data; merge data from different sources through pandas SQL-like operations; leverage the robust pandas time series functionality in advanced analyses; scale pandas operations to get the most out of your system; the large ecosystem that pandas can coordinate with and supplement. Who this book is for: This book is for Python developers, data scientists, engineers, and analysts. pandas is the ideal tool for manipulating structured data with Python, and this book provides ample instruction and examples. Not only does it cover the basics required to be proficient, but it also goes into the details of idiomatic pandas. Table of Contents: Pandas Foundations; Selection / Indexing; Pandas data types; Pandas Input/Output; Algorithms and how to apply them; Visualization; Reshaping Dataframes; Groupby; Temporal Data Types and Algorithms; Exploratory Data Analysis; The pandas ecosystem
Building Modern Data Applications Using Databricks Lakehouse
Get up to speed with the Databricks Data Intelligence Platform to build and scale modern data applications, leveraging the latest advancements in data engineering. Key Features: Learn how to work with real-time data using Delta Live Tables; unlock insights into the performance of data pipelines using Delta Live Tables; apply your knowledge to Unity Catalog for robust data security and governance; purchase of the print or Kindle book includes a free PDF eBook. Book Description: With so many tools to choose from in today's data engineering development stack, along with growing operational complexity, data engineers are often overwhelmed, spending less time gleaning value from their data and more time maintaining complex data pipelines. Guided by a lead specialist solutions architect at Databricks with 10+ years of experience in data and AI, this book shows you how the Delta Live Tables framework simplifies data pipeline development by allowing you to focus on defining input data sources, transformation logic, and output table destinations. This book gives you an overview of the Delta Lake format, the Databricks Data Intelligence Platform, and the Delta Live Tables framework. It teaches you how to apply data transformations by implementing the Databricks medallion architecture and continuously monitor the data quality of your pipelines. You'll learn how to handle incoming data using the Databricks Auto Loader feature and automate real-time data processing using Databricks workflows.
You'll master how to recover from runtime errors automatically. By the end of this book, you'll be able to build a real-time data pipeline from scratch using Delta Live Tables, leverage CI/CD tools to deploy data pipeline changes automatically across deployment environments, and monitor, control, and optimize cloud costs. What You Will Learn: Deploy near-real-time data pipelines in Databricks using Delta Live Tables; orchestrate data pipelines using Databricks workflows; implement data validation policies and monitor/quarantine bad data; apply slowly changing dimension (SCD) Type 1 and Type 2 data to lakehouse tables; secure data access across different groups and users using Unity Catalog; automate continuous data pipeline deployment by integrating Git with build tools such as Terraform and Databricks Asset Bundles. Who this book is for: This book is for data engineers looking to streamline data ingestion, transformation, and orchestration tasks. Data analysts responsible for managing and processing lakehouse data for analysis, reporting, and visualization will also find this book beneficial. Additionally, DataOps/DevOps engineers will find this book helpful for automating the testing and deployment of data pipelines, optimizing table tasks, and tracking data lineage within the lakehouse. Beginner-level knowledge of Apache Spark and Python is needed to make the most out of this book. Table of Contents: An Introduction to Delta Live Tables; Applying Data Transformations Using Delta Live Tables; Managing Data Quality Using Delta Live Tables; Scaling DLT Pipelines; Mastering Data Governance in the Lakehouse with Unity Catalog; Managing Data Locations in Unity Catalog; Viewing Data Lineage Using Unity Catalog; Deploying, Maintaining, and Administrating DLT Pipelines Using Terraform; Leveraging Databricks Asset Bundles to Streamline Data Pipeline Deployment; Monitoring Data Pipelines in Production
Data Engineering in Medical Imaging
This book constitutes the proceedings of the Second MICCAI Workshop on Data Engineering in Medical Imaging, DEMI 2024, held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024, in Marrakesh, Morocco, on October 10, 2024. The 18 papers presented in this book were carefully reviewed and selected. These papers focus on the application of various data engineering techniques in the field of medical imaging.
Data Engineering
Introduction to Data Engineering: Keep That Shit Flowing is a comprehensive guide that covers the fundamentals, best practices, and advanced topics of data engineering. From building reliable pipelines to handling big data and ensuring data security, this book equips readers with the knowledge and tools needed to excel in the field of data engineering. With real-world case studies and practical techniques, it empowers data professionals to master the art of designing and managing data pipelines, making it an essential resource for anyone looking to thrive in the rapidly evolving world of data engineering.
Elements of Data Science
Elements of Data Science is an introduction to the practical skills of working with data, written for people with no programming experience. Concepts are explained clearly and concisely, and exercises in each chapter demonstrate the real-world use of each feature. * Step-by-Step Approach: Learn how to execute a data science project from start to finish, formulating questions, visualizing data, applying statistical methods, and communicating results. * Practical Python Programming: This book starts with basic Python concepts and builds up to advanced data processing and analysis techniques. * Interactive Learning: Jupyter notebooks are available for each chapter, so readers can follow along, run experiments, and build understanding through hands-on exercises. * Solid Foundation: Explore fundamental concepts such as exploratory data analysis, statistical inference, regression analysis, classification algorithms, and more, all through the lens of real-world case studies. Whether you're a student, a professional, or simply curious about the power of data, Elements of Data Science provides essential tools for finding insights in data.
Computer and Communication Engineering
This book constitutes the proceedings of the 4th International Conference on Computer and Communication Engineering, CCCE 2024, which took place in Oslo, Norway, during May 24-26, 2024. The 19 full papers included in this book were carefully reviewed and selected from 47 submissions. They are organized in topical sections as follows: Intelligent image analysis and multimedia technology; information network and security; digital communication and information systems; and design and implementation of modern information management systems.
Mastering OpenTelemetry and Observability
Discover the power of open source observability for your enterprise environment. In Mastering Observability and OpenTelemetry: Enhancing Application and Infrastructure Performance and Avoiding Outages, accomplished engineering leader and open source contributor Steve Flanders unlocks the secrets of enterprise application observability with a comprehensive guide to OpenTelemetry (OTel). Explore how OTel transforms observability, providing a robust toolkit for capturing and analyzing telemetry data across your environment. You will learn how OTel delivers unmatched flexibility, extensibility, and vendor neutrality, freeing you from vendor lock-in and enabling data sovereignty and portability. You will also discover: Comprehensive coverage of observability issues and technology: Dive deep into the world of observability and gain a comprehensive understanding of observability fundamentals with practical insights and real-world use cases. Practical guidance: From instrumentation techniques to advanced tracing strategies, gain the skills needed to create highly observable systems. Learn how to deploy and configure OTel, even in challenging brownfield environments, with step-by-step instructions and hands-on exercises. An opportunity for community contributions and communication: Join the OTel community, including end-users, vendors, and cloud providers, and shape the future of observability while connecting with experts and peers. Whether you are a novice or a seasoned professional, Mastering Observability and OpenTelemetry is your roadmap to troubleshooting availability and performance problems by learning to detect anomalies, interpret data, and proactively optimize performance in your enterprise environment. Embark on your journey to observability mastery today!