ETL with Azure Cookbook
Explore the latest Azure ETL techniques both on-premises and in the cloud using Azure services such as SQL Server Integration Services (SSIS), Azure Data Factory, and Azure Databricks. Key Features: understand the key components of an ETL solution using Azure Integration Services; discover the common and not-so-common challenges faced while creating modern and scalable ETL solutions; program and extend your packages to develop efficient data integration and data transformation solutions. Book Description: ETL is one of the most common and tedious procedures for moving and processing data from one database to another. With the help of this book, you will be able to speed up the process by designing effective ETL solutions using the Azure services available for handling and transforming any data to suit your requirements. With this cookbook, you'll become well versed in all the features of SQL Server Integration Services (SSIS) to perform data migration and ETL tasks that integrate with Azure. You'll learn how to transform data in Azure and understand how legacy systems perform ETL on-premises using SSIS. Later chapters will get you up to speed with connecting and retrieving data from SQL Server 2019 Big Data Clusters, and even show you how to extend and customize the SSIS toolbox using custom-developed tasks and transforms. This ETL book also contains practical recipes for moving and transforming data with Azure services, such as Data Factory and Azure Databricks, and lets you explore various options for migrating SSIS packages to Azure. Toward the end, you'll find out how to profile data in the cloud and automate service creation with Business Intelligence Markup Language (BIML). By the end of this book, you'll have developed the skills you need to create and automate ETL solutions on-premises as well as in Azure. What you will learn: explore ETL and how it is different from ELT; move and transform various data sources with Azure ETL and ELT services; use SSIS 2019 with Azure HDInsight clusters; discover how to query SQL Server 2019 Big Data Clusters hosted in Azure; migrate SSIS solutions to Azure and solve the key challenges associated with it; understand why data profiling is crucial and how to implement it in Azure Databricks; get to grips with BIML and learn how it applies to SSIS and Azure Data Factory solutions. Who this book is for: this book is for data warehouse architects, ETL developers, or anyone who wants to build scalable ETL applications in Azure. Those looking to extend their existing on-premises ETL applications to use big data and a variety of Azure services, or interested in migrating existing on-premises solutions to the Azure cloud platform, will also find the book useful. Familiarity with SQL Server services is necessary to get the most out of this book.
Learn Quantum Computing with Python and IBM Quantum Experience
A step-by-step guide to learning the implementation and associated methodologies of quantum computing with the help of IBM Quantum Experience, Qiskit, and Python that will have you up and running and productive in no time. Key Features: determine the difference between classical computers and quantum computers; understand quantum computational principles such as superposition and entanglement and how they are leveraged on IBM Quantum Experience systems; run your own quantum experiments and applications by integrating with Qiskit. Book Description: IBM Quantum Experience is a platform that enables developers to learn the basics of quantum computing by allowing them to run experiments on a quantum computing simulator and a real quantum computer. This book will explain the basic principles of quantum mechanics, the principles involved in quantum computing, and the implementation of quantum algorithms and experiments on IBM's quantum processors. You will start working with simple programs that illustrate quantum computing principles and slowly work your way up to more complex programs and algorithms that leverage quantum computing. As you build on your knowledge, you'll understand the functionality of IBM Quantum Experience and the various resources it offers. Furthermore, you'll learn not only the differences between the various quantum computers but also the various simulators available. Later, you'll explore the basics of quantum computing, quantum volume, and a few basic algorithms, all while optimally using the resources available on IBM Quantum Experience. By the end of this book, you'll be able to build quantum programs on your own and will have gained practical quantum computing skills that you can apply to your business. What you will learn: explore quantum computational principles such as superposition and quantum entanglement; become familiar with the contents and layout of the IBM Quantum Experience; understand quantum gates and how they operate on qubits; discover the quantum information science kit (Qiskit) and its elements such as Terra and Aer; get to grips with quantum algorithms such as the Bell state, Deutsch-Jozsa, Grover's algorithm, and Shor's algorithm; learn how to create and visualize a quantum circuit. Who this book is for: this book is for Python developers who are looking to learn quantum computing and put their knowledge to use in practical situations with the help of IBM Quantum Experience. Some background in computer science and high-school-level physics and math is required.
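As a flavor of the kind of program the book works toward, here is a minimal, hedged Qiskit sketch (not taken from the book) that prepares and measures a Bell state on the local simulator; it assumes a Qiskit installation in which the bundled Aer qasm simulator and the execute helper are available.

```python
# Minimal Bell-state sketch (illustrative only, not code from the book).
from qiskit import QuantumCircuit, Aer, execute

qc = QuantumCircuit(2, 2)    # two qubits, two classical bits
qc.h(0)                      # put qubit 0 into superposition
qc.cx(0, 1)                  # entangle qubit 0 with qubit 1
qc.measure([0, 1], [0, 1])   # measure both qubits

backend = Aer.get_backend("qasm_simulator")
counts = execute(qc, backend, shots=1024).result().get_counts()
print(counts)  # expect roughly half '00' and half '11'
```

On real IBM Quantum Experience hardware, essentially only the backend selection changes.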
Mastering KVM Virtualization - Second Edition
Learn how to configure, automate, orchestrate, troubleshoot, and monitor KVM-based environments capable of scaling to private and hybrid cloud models. Key Features: gain expert insights into Linux virtualization and the KVM ecosystem with this comprehensive guide; learn to use various Linux tools such as QEMU, oVirt, libvirt, Cloud-Init, and Cloudbase-Init; scale, monitor, and troubleshoot your VMs on various platforms, including OpenStack and AWS. Book Description: Kernel-based Virtual Machine (KVM) enables you to virtualize your data center by transforming your Linux operating system into a powerful hypervisor that allows you to manage multiple operating systems with minimal fuss. With this book, you'll gain insights into configuring, troubleshooting, and fixing bugs in KVM virtualization and related software. This second edition of Mastering KVM Virtualization is updated to cover the latest developments in the core KVM components, libvirt and QEMU. Starting with the basics of Linux virtualization, you'll explore VM lifecycle management and migration techniques. You'll then learn how to use the SPICE and VNC protocols while creating VMs and discover best practices for using snapshots. As you progress, you'll integrate third-party tools with Ansible for automation and orchestration. You'll also learn to scale out and monitor your environments, working with oVirt, OpenStack, Eucalyptus, AWS, and the ELK stack. Throughout the book, you'll find out more about tools such as Cloud-Init and Cloudbase-Init. Finally, you'll be taken through performance tuning and troubleshooting guidelines for KVM-based virtual machines and the hypervisor. By the end of this book, you'll be well versed in KVM virtualization and the tools and technologies needed to build and manage diverse virtualization environments. What you will learn: implement KVM virtualization using libvirt and oVirt; delve into KVM storage and networking; understand snapshots, templates, and live migration features; get to grips with managing, scaling, and optimizing the KVM ecosystem; discover how to tune and optimize KVM virtualization hosts; adopt best practices for KVM platform troubleshooting. Who this book is for: if you are a systems administrator, DevOps practitioner, or developer with Linux experience looking to sharpen your open-source virtualization skills, this virtualization book is for you. Prior understanding of the Linux command line and virtualization is required before getting started with this book.
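Since the book leans heavily on libvirt, the following hedged sketch shows the general shape of working with it programmatically via the libvirt-python bindings; the connection URI is an assumption for a local QEMU/KVM host, and error handling is omitted.

```python
# Illustrative libvirt-python sketch (not code from the book): list defined VMs
# on a local QEMU/KVM host and report whether each one is running.
import libvirt

conn = libvirt.open("qemu:///system")   # hypothetical local hypervisor URI
for dom in conn.listAllDomains():       # enumerate all defined domains (VMs)
    state = "running" if dom.isActive() else "shut off"
    print(f"{dom.name()}: {state}")
conn.close()
```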
Representation Learning for Natural Language Processing
This open access book provides an overview of the recent advances in representation learning theory, algorithms and applications for natural language processing (NLP). It is divided into three parts. Part I presents the representation learning techniques for multiple language entries, including words, phrases, sentences and documents. Part II then introduces the representation techniques for those objects that are closely related to NLP, including entity-based world knowledge, sememe-based linguistic knowledge, networks, and cross-modal entries. Lastly, Part III provides open resource tools for representation learning techniques, and discusses the remaining challenges and future research directions. The theories and algorithms of representation learning presented can also benefit other related domains such as machine learning, social network analysis, semantic Web, information retrieval, data mining and computational biology. This book is intended for advanced undergraduate and graduate students, post-doctoral fellows, researchers, lecturers, and industrial engineers, as well as anyone interested in representation learning and natural language processing.
Data Teams
Learn how to run successful big data projects, how to resource your teams, and how the teams should work with each other to be cost-effective. This book introduces the three teams necessary for successful projects, and what each team does. Most organizations fail with big data projects, and the failure is almost always blamed on the technologies used. To be successful, organizations need to focus on both technology and management. Making use of data is a team sport. It takes different kinds of people with different skill sets all working together to get things done. In all but the smallest projects, people should be organized into multiple teams to reduce project failure and underperformance. This book focuses on management. A few years ago, there was little to nothing written or talked about on the management of big data projects or teams. Data Teams shows why management failures are at the root of so many project failures and how to proactively prevent such failures with your project. What You Will Learn: discover the three teams that you will need to be successful with big data; understand what a data scientist is and what a data science team does; understand what a data engineer is and what a data engineering team does; understand what an operations engineer is and what an operations team does; know how the teams and titles differ and why you need all three teams; recognize the role that the business plays in working with data teams and how the rest of the organization contributes to successful data projects. Who This Book Is For: management at all levels, including those who possess some technical ability and are about to embark on a big data project or have already started one. It will be especially helpful for those whose projects may be stuck and who do not know why, or who attended a conference or read about big data and are beginning their due diligence on what it will take to put a project in place. This book is also pertinent for leads or technical architects who are on a team tasked by the business to figure out what it will take to start a project, are in a project that is stuck, or need to determine whether there are non-technical problems affecting their project.
Big Data - BigData 2020
This book constitutes the proceedings of the 9th International Conference on Big Data, BigData 2020, held as part of SCF 2020, during September 18-20, 2020. The conference was planned to take place in Honolulu, HI, USA and was changed to a virtual format due to the COVID-19 pandemic. The 16 full and 3 short papers presented were carefully reviewed and selected from 52 submissions. The topics covered are Big Data Architecture, Big Data Modeling, Big Data As A Service, Big Data for Vertical Industries (Government, Healthcare, etc.), Big Data Analytics, Big Data Toolkits, Big Data Open Platforms, Economic Analysis, Big Data for Enterprise Transformation, Big Data in Business Performance Management, Big Data for Business Model Innovations and Analytics, Big Data in Enterprise Management Models and Practices, Big Data in Government Management Models and Practices, and Big Data in Smart Planet Solutions.
Nature-Inspired Computation in Data Mining and Machine Learning
Adaptive Improved Flower Pollination Algorithm for Global Optimization.- Algorithms for Optimization and Machine Learning over Cloud.- Implementation of Machine Learning and Data Mining to Improve Cybersecurity and Limit Vulnerabilities to Cyber Attacks.- Comparative analysis of different classifiers on crisis-related tweets: An elaborate study.- An Improved Extreme Learning Machine Tuning by Flower Pollination Algorithm.- Prospects of Machine and Deep Learning in Analysis of Vital Signs for the Improvement of Healthcare Services.
Database and Expert Systems Applications
This volume constitutes the refereed proceedings of the three workshops held at the 31st International Conference on Database and Expert Systems Applications, DEXA 2020, held in September 2020: the 11th International Workshop on Biological Knowledge Discovery from Data, BIOKDD 2020; the 4th International Workshop on Cyber-Security and Functional Safety in Cyber-Physical Systems, IWCFS 2020; and the 2nd International Workshop on Machine Learning and Knowledge Graphs, MLKgraphs 2020. Due to the COVID-19 pandemic, the conference and workshops were held virtually. The 10 papers were thoroughly reviewed and selected from 15 submissions, and discuss a range of topics including: knowledge discovery, biological data, cyber security, cyber-physical systems, machine learning, knowledge graphs, information retrieval, databases, and artificial intelligence.
Artificial Intelligence for Knowledge Management
This book features a selection of extended papers presented at the 5th IFIP WG 12.6 International Workshop on Artificial Intelligence for Knowledge Management, AI4KM 2017, held in Melbourne, VIC, Australia, in August 2017, in the framework of the International Joint Conference on Artificial Intelligence, IJCAI 2017. The 11 revised and extended papers were carefully reviewed and selected for inclusion in this volume. They present new research and innovative aspects in the field of knowledge management such as machine learning, knowledge models, KM and Web, knowledge capturing and learning, and KM and AI intersections.
Real-Time Linked Dataspaces
1 Real-time Linked Dataspaces: A Data Platform for Intelligent Systems within Internet of Things-based Smart Environments.- 2 Enabling Knowledge Flows in an Intelligent Systems Data Ecosystem.- 3 Dataspaces: Fundamentals, Principles, and Techniques.- 4 Fundamentals of Real-time Linked Dataspaces.- 5 Data Support Services for Real-time Linked Dataspaces.- 6 Catalog and Entity Management Service for Internet of Things-based Smart Environments.- 7 Querying and Searching Heterogeneous Knowledge Graphs in Real-time Linked Dataspaces.- 8 Enhancing the Discovery of Internet of Things-based Data Services in Real-time Linked Dataspaces.- 9 Human-in-the-Loop Tasks for Data Management, Citizen Sensing, and Actuation in Smart Environments.- 10 Stream and Event Processing Services for Real-time Linked Dataspaces.- 11 Quality of Service-Aware Complex Event Service Composition in Real-time Linked Dataspaces.- 12 Dissemination of Internet of Things Streams in a Real-time Linked Dataspace.- 13 Approximate Semantic Event Processing in Real-time Linked Dataspaces.- 14 Enabling Intelligent Systems, Applications, and Analytics for Smart Environments using Real-time Linked Dataspaces.- 15 Autonomic Source Selection for Real-time Predictive Analytics using the Internet of Things and Open Data.- 16 Building Internet of Things-enabled Digital Twins and Intelligent Applications using a Real-time Linked Dataspace.- 17 A Model for Internet of Things Enhanced User Experience in Smart Environments.- 18 Future Research Directions for Dataspaces, Data Ecosystems, and Intelligent Systems.
Scalable Uncertainty Management
This book constitutes the refereed proceedings of the 14th International Conference on Scalable Uncertainty Management, SUM 2020, which was held in Bozen-Bolzano, Italy, in September 2020. The 12 full and 7 short papers presented in this volume were carefully reviewed and selected from 30 submissions. In addition, the book contains 2 abstracts of invited talks, 2 tutorial papers, and 2 PhD track papers. The conference aims to gather researchers with a common interest in managing and analyzing imperfect information from a wide range of fields, such as artificial intelligence and machine learning, databases, information retrieval and data mining, the semantic web, and risk analysis. Due to the COVID-19 pandemic, SUM 2020 was held as a virtual event.
Discrete Mathematics and Graph Theory
This textbook can serve as a comprehensive manual of discrete mathematics and graph theory for non-Computer Science majors, and as a reference and study aid for professionals and researchers who have not taken any discrete math course before. It can also be used as a reference book for a course on Discrete Mathematics in Computer Science or Mathematics curricula. The study of discrete mathematics is one of the first courses on curricula in various disciplines such as Computer Science, Mathematics and Engineering education practices. Graphs are key data structures used to represent networks, chemical structures, games, etc., and are increasingly used in applications such as bioinformatics and the Internet. Graph theory has gone through an unprecedented growth in the last few decades, both in terms of theory and implementations; hence it deserves a thorough treatment, which is not adequately found in other contemporary books on discrete mathematics. About 40% of this textbook is therefore devoted to graph theory. The text follows an algorithmic approach for discrete mathematics and graph problems where applicable, to reinforce learning and to show how to implement the concepts in real-world applications.
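To illustrate what an algorithmic approach to graph problems looks like in practice, here is a small, hedged Python sketch of breadth-first search over an adjacency list; the book itself presents algorithms as pseudocode, and the example graph below is invented for illustration.

```python
# Illustrative Python (not the book's pseudocode): breadth-first search over a
# small adjacency-list graph, a typical algorithmic building block.
from collections import deque

def bfs(adj, start):
    """Return the set of vertices reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

print(bfs({"a": ["b", "c"], "b": ["d"], "c": [], "d": []}, "a"))  # {'a', 'b', 'c', 'd'}
```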
Quantum Computing Solutions
Know how to use quantum computing solutions involving artificial intelligence (AI) algorithms and applications across different disciplines. Quantum solutions involve building quantum algorithms that improve computational tasks within quantum computing, AI, data science, and machine learning. As opposed to quantum computer innovation, quantum solutions offer automation, cost reduction, and other efficiencies to the problems they tackle. Starting with the basics, this book covers subsystems and properties as well as the information processing network before covering quantum simulators. Solutions such as the Traveling Salesman Problem, quantum cryptography, scheduling, and cybersecurity are discussed in step-by-step detail. The book presents code samples based on real-life problems in a variety of industries, such as risk assessment and fraud detection in banking. In pharma, you will look at drug discovery and protein-folding solutions. Supply chain optimization and purchasing solutions are presented in the manufacturing domain. In the area of utilities, energy distribution and optimization problems and solutions are explained. Advertising scheduling and revenue optimization solutions are included from media and technology verticals. What You Will Learn: understand the mathematics behind quantum computing; know the solution benefits, such as automation, cost reduction, and efficiencies; be familiar with the quantum subsystems and properties, including states, protocols, operations, and transformations; be aware of the quantum classification algorithms: classifiers, and support and sparse support vector machines; use AI algorithms, including probability, walks, search, deep learning, and parallelism. Who This Book Is For: developers in Python and other languages interested in quantum solutions. The secondary audience includes IT professionals and academia in mathematics and physics. A tertiary audience is those in industry verticals such as manufacturing, banking, and pharma.
Towards Interoperable Research Infrastructures for Environmental and Earth Sciences
This open access book summarises the latest developments on data management in the EU H2020 ENVRIplus project, which brought together more than 20 environmental and Earth science research infrastructures into a single community. It provides readers with a systematic overview of the common challenges faced by research infrastructures and how a 'reference model guided' engineering approach can be used to achieve greater interoperability among such infrastructures in the environmental and earth sciences. The 20 contributions in this book are structured in 5 parts on the design, development, deployment, operation and use of research infrastructures. Part one provides an overview of the state of the art of research infrastructure and relevant e-Infrastructure technologies, part two discusses the reference model guided engineering approach, the third part presents the software and tools developed for common data management challenges, the fourth part demonstrates the software via several use cases, and the last part discusses the sustainability and future directions.
Docker for Developers
Learn how to deploy and test Linux-based Docker containers with the help of real-world use cases. Key Features: understand how to make a deployment workflow run smoothly with Docker containers; learn Docker and DevOps concepts such as continuous integration and continuous deployment (CI/CD); gain insights into using various Docker tools and libraries. Book Description: Docker is the de facto standard for containerizing apps, and with an increasing number of software projects migrating to containers, it is crucial for engineers and DevOps teams to understand how to build, deploy, and secure Docker environments effectively. Docker for Developers will help you understand Docker containers from scratch while taking you through best practices and showing you how to address security concerns. Starting with an introduction to Docker, you'll learn how to use containers and VirtualBox for development. You'll explore how containers work and develop projects within them after you've explored different ways to deploy and run containers. The book will also show you how to use Docker containers in production in both single-host setups and in clusters, and deploy them using Jenkins, Kubernetes, and Spinnaker. As you advance, you'll get to grips with monitoring, securing, and scaling Docker using tools such as Prometheus and Grafana. Later, you'll be able to deploy Docker containers to a variety of environments, including the cloud-native Amazon Elastic Kubernetes Service (Amazon EKS), before finally delving into Docker security concepts and best practices. By the end of the Docker book, you'll be able to not only work in a container-driven environment confidently but also use Docker for both new and existing projects. What you will learn: get up to speed with creating containers and understand how they work; package and deploy your containers to a variety of platforms; work with containers in the cloud and on the Kubernetes platform; deploy and then monitor the health and logs of running containers; explore best practices for working with containers from a security perspective; become familiar with scanning containers and using third-party security tools and libraries. Who this book is for: if you're a software engineer new to containerization or a DevOps engineer responsible for deploying Docker containers in the cloud and building DevOps pipelines for container-based projects, you'll find this book useful. This Docker containers book is also a handy reference guide for anyone working with a Docker-based DevOps ecosystem or interested in understanding the security implications and best practices for working in container-driven environments.
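For a taste of driving Docker from code rather than the command line, here is a hedged sketch using the Docker SDK for Python (the docker package); it assumes a local Docker daemon and that the alpine image can be pulled, and it is an illustration rather than an example from the book.

```python
# Illustrative Docker SDK for Python sketch: run a throwaway container and list running ones.
import docker

client = docker.from_env()                    # talk to the local Docker daemon
output = client.containers.run(
    "alpine:3.12",                            # small base image (tag is an assumption)
    ["echo", "hello from a container"],
    remove=True,                              # clean the container up afterwards
)
print(output.decode().strip())

for container in client.containers.list():    # inspect currently running containers
    print(container.name, container.status)
```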
The Self-Service Data Roadmap
Data-driven insights are a key competitive advantage for any industry today, but deriving insights from raw data can still take days or weeks. Most organizations can't scale data science teams fast enough to keep up with the growing amounts of data to transform. What's the answer? Self-service data. With this practical book, data engineers, data scientists, and team managers will learn how to build a self-service data science platform that helps anyone in your organization extract insights from data. Sandeep Uttamchandani provides a scorecard to track and address bottlenecks that slow down time to insight across data discovery, transformation, processing, and production. This book bridges the gap between data scientists bottlenecked by engineering realities and data engineers unclear about ways to make self-service work. Build a self-service portal to support data discovery, quality, lineage, and governance; select the best approach for each self-service capability using open source cloud technologies; tailor self-service for the people, processes, and technology maturity of your data platform; implement capabilities to democratize data and reduce time to insight; and scale your self-service portal to support a large number of users within your organization.
Guide to Intelligent Data Science: How to Intelligently Make Use of Real Data
Making use of data is no longer a niche project but central to almost every project. With access to massive compute resources and vast amounts of data, it seems at least in principle possible to solve any problem. However, successful data science projects result from the intelligent application of: human intuition in combination with computational power; sound background knowledge with computer-aided modelling; and critical reflection of the obtained insights and results. Substantially updating the previous edition, then entitled Guide to Intelligent Data Analysis, this core textbook continues to provide a hands-on instructional approach to many data science techniques, and explains how these are used to solve real-world problems. The work balances the practical aspects of applying and using data science techniques with the theoretical and algorithmic underpinnings from mathematics and statistics. Major updates on techniques and subject coverage (including deep learning) are included. Topics and features: guides the reader through the process of data science, following the interdependent steps of project understanding, data understanding, data blending and transformation, modeling, as well as deployment and monitoring; includes numerous examples using the open source KNIME Analytics Platform, together with an introductory appendix; provides a review of the basics of classical statistics that support and justify many data analysis methods, and a glossary of statistical terms; integrates illustrations and case-study-style examples to support pedagogical exposition; supplies further tools and information at an associated website. This practical and systematic textbook/reference is a "need-to-have" tool for graduate and advanced undergraduate students and essential reading for all professionals who face data science problems. Moreover, it is a "need to use, need to keep" resource following one's exploration of the subject.
Data Analytics for Pandemics
Epidemic trend analysis, timeline progression, prediction, and recommendation are critical for initiating effective public health control strategies, and AI and data analytics play an important role on the epidemiological, diagnostic, and clinical fronts. The focus of this book is data analytics for COVID-19, including an overview of COVID-19 as an epidemic/pandemic, data processing, and knowledge extraction. Data sources, storage, and platforms are discussed, along with data models and their performance, and different big data techniques, tools, and technologies. This book also addresses the challenges in applying analytics to pandemic scenarios, case studies, and control strategies. Aimed at data analysts, epidemiologists, and associated researchers, this book: discusses challenges of AI models for big data analytics in pandemic scenarios; explains how different big data analytics techniques can be implemented; provides a set of recommendations to minimize the infection rate of COVID-19; summarizes various techniques of data processing and knowledge extraction; enables users to understand big data analytics techniques required for prediction purposes.
New Trends in Databases and Information Systems
This book constitutes the thoroughly refereed short papers of the 24th European Conference on Advances in Databases and Information Systems, ADBIS 2020, held in August 2020. ADBIS 2020 was to be held in Lyon, France; however, due to the COVID-19 pandemic the conference was held in an online format. The 18 presented short research papers were carefully reviewed and selected from 69 submissions. The papers are organized in the following sections: data access and database performance; machine learning; data processing; semantic web; data analytics.
Challenges in Social Network Research
The book includes both invited and contributed chapters dealing with advanced methods and theoretical development for the analysis of social networks and applications in numerous disciplines. Some authors explore new trends related to network measures, multilevel networks and clustering on networks, while other contributions deepen the relationship among statistical methods for data mining and social network analysis. Along with the new methodological developments, the book offers interesting applications to a wide set of fields, ranging from organizational and economic studies, collaboration and innovation, to the less usual field of poetry. In addition, the case studies are related to local contexts, showing how substantive reasoning is fundamental in social network analysis. The list of authors includes both top scholars in the field of social networks and promising young researchers. All chapters passed a double-blind review process overseen by the guest editors. This edited volume will appeal to students, researchers and professionals.
Digital Forensic Education
In this book, the editors explain how students enrolled in two digital forensic courses at their institution are exposed to experiential learning opportunities, where the students acquire the knowledge and skills of the subject matter while also learning how to adapt to the ever-changing digital forensic landscape. Their findings (e.g., forensic examination of different IoT devices) are also presented in the book. Digital forensics is a topic of increasing importance as our society becomes "smarter", with more of the "things" around us being internet- and inter-connected (e.g., Internet of Things (IoT) and smart home devices); hence the increasing likelihood that we will need to acquire data from these things in a forensically sound manner. This book is of interest to both digital forensic educators and digital forensic practitioners, as well as students seeking to learn about digital forensics.
Digital Transformation of Collaboration
This proceedings volume focuses on the emerging concept of Collaborative Innovation Networks (COINs). COINs are at the core of collaborative knowledge networks: distributed communities taking advantage of wide connectivity and the support of communication technologies, spanning beyond the organizational perimeter of companies on a global scale. The book presents the refereed conference papers from the 7th International Conference on COINs, held October 8-9, 2019, in Warsaw, Poland. It includes papers for both application areas of COINs: (1) optimizing organizational creativity and performance, and (2) discovering and predicting new trends by identifying COINs on the Web through online social media analysis. Papers at COINs19 combine a wide range of interdisciplinary fields such as social network analysis, group dynamics, design and visualization, information systems, and the psychology and sociality of collaboration, as well as intercultural analysis through the lens of online social media. They cover the most recent advances in areas ranging from leadership and collaboration, trend prediction, and data mining to social competence and Internet communication.
Broad Learning Through Fusions
This book offers a clear and comprehensive introduction to broad learning, one of the novel learning problems studied in data mining and machine learning. Broad learning aims at fusing multiple large-scale information sources of diverse varieties together, and carrying out synergistic data mining tasks across these fused sources in one unified analytic. This book takes online social networks as an application example to introduce the latest alignment and knowledge discovery algorithms. Besides the overview of broad learning, machine learning and social network basics, specific topics covered in this book include network alignment, link prediction, community detection, information diffusion, viral marketing, and network embedding.
Smart Data Discovery Using SAS Viya
Gain Powerful Insights with SAS Viya! Whether you are an executive, departmental decision maker, or analyst, the need to leverage data and analytical techniques in order to make critical business decisions is now crucial to every part of an organization. Smart Data Discovery with SAS Viya: Powerful Techniques for Deeper Insights provides you with the necessary knowledge and skills to conduct a smart discovery process and empowers you to ask more complex questions using your data. The book highlights the key components of a smart data discovery process utilizing advanced machine learning techniques and powerful capabilities from SAS Viya, and finally brings it all together using real examples and applications. With its step-by-step approach and integrated examples, the book provides a relevant and practical guide to insight discovery that goes beyond traditional charts and graphs. By showcasing the powerful visual modeling capabilities of SAS Viya, it also opens up the world of advanced analytics and machine learning techniques to a much broader set of audiences.
Hands-On Graph Analytics with Neo4j
Discover how to use Neo4j to identify relationships within complex and large graph datasets using graph modeling, graph algorithms, and machine learning. Key Features: get up and running with graph analytics with the help of real-world examples; explore various use cases such as fraud detection, graph-based search, and recommendation systems; get to grips with the Graph Data Science library with the help of examples, and use Neo4j in the cloud for effective application scaling. Book Description: Neo4j is a graph database that includes plugins to run complex graph algorithms. The book starts with an introduction to the basics of graph analytics, the Cypher query language, and graph architecture components, and helps you to understand why enterprises have started to adopt graph analytics within their organizations. You'll find out how to implement Neo4j algorithms and techniques and explore various graph analytics methods to reveal complex relationships in your data. You'll be able to implement graph analytics catering to different domains such as fraud detection, graph-based search, recommendation systems, social networking, and data management. You'll also learn how to store data in graph databases and extract valuable insights from it. As you become well-versed with the techniques, you'll discover graph machine learning in order to address simple to complex challenges using Neo4j. You will also understand how to use graph data in a machine learning model in order to make predictions based on your data. Finally, you'll get to grips with structuring a web application for production using Neo4j. By the end of this book, you'll not only be able to harness the power of graphs to handle a broad range of problem areas, but you'll also have learned how to use Neo4j efficiently to identify complex relationships in your data. What you will learn: become well-versed with Neo4j graph database building blocks, nodes, and relationships; discover how to create, update, and delete nodes and relationships using Cypher querying; use graphs to improve web search and recommendations; understand graph algorithms such as pathfinding, spatial search, centrality, and community detection; find out the different steps to integrate graphs into a normal machine learning pipeline; formulate a link prediction problem in the context of machine learning; implement graph embedding algorithms such as DeepWalk, and use them in Neo4j graphs. Who this book is for: this book is for data analysts, business analysts, graph analysts, and database developers looking to store and process graph data to reveal key data insights. It will also appeal to data scientists who want to build intelligent graph applications catering to different domains. Some experience with Neo4j is required.
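As a hedged taste of the create-and-read pattern the book teaches with Cypher, the sketch below uses the official Neo4j Python driver; the bolt URI, credentials, labels, and property names are illustrative assumptions rather than values from the book.

```python
# Illustrative sketch with the official Neo4j Python driver (not code from the book).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Create two nodes and a relationship with a parameterized Cypher statement.
    session.run(
        "MERGE (a:Person {name: $a}) "
        "MERGE (b:Person {name: $b}) "
        "MERGE (a)-[:KNOWS]->(b)",
        a="Alice", b="Bob",
    )
    # Read the relationship back.
    result = session.run("MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name AS a, b.name AS b")
    for record in result:
        print(record["a"], "knows", record["b"])

driver.close()
```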
Data Analysis for Business Decisions
This laboratory manual is intended for business analysts who wish to increase their skills in the use of statistical analysis to support business decisions. Most of the case studies use Excel, today's most common analysis tool. They range from the most basic descriptive analytical techniques to more advanced techniques such as linear regression and forecasting. Advanced projects cover inferential statistics for continuous variables (t-Test) and categorical variables (chi-square), as well as A/B testing. The manual ends with techniques to deal with the analysis of text data and tools to manage the analysis of large data sets (Big Data) using Excel. Includes companion files with solution spreadsheets, sample files, data sets, etc. from the book. Features: teaches the statistical analysis skills needed to support business decisions; provides projects ranging from the most basic descriptive analytical techniques to more advanced techniques such as linear regression, forecasting, inferential statistics, and analyzing big data sets; includes companion files with solution spreadsheets, sample files, data sets, etc. used in the book's case studies. The companion files are available online by emailing the publisher with proof of purchase at info@merclearning.com.
Learning Spark: Lightning-Fast Data Analytics
Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to: learn the Python, SQL, Scala, or Java high-level Structured APIs; understand Spark operations and the SQL engine; inspect, tune, and debug Spark operations with Spark configurations and the Spark UI; connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka; perform analytics on batch and streaming data using Structured Streaming; build reliable data pipelines with open source Delta Lake and Spark; develop machine learning pipelines with MLlib and productionize models using MLflow.
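A minimal PySpark sketch in the spirit of the book's Structured APIs follows; the file path and column names are illustrative assumptions, not examples taken from the book.

```python
# Illustrative PySpark sketch: read a CSV file and compute a simple aggregation.
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg

spark = SparkSession.builder.appName("LearningSparkSketch").getOrCreate()

df = spark.read.csv("data/flights.csv", header=True, inferSchema=True)  # hypothetical file
(df.groupBy("origin")                       # hypothetical column
   .agg(avg("delay").alias("avg_delay"))    # hypothetical column
   .orderBy("avg_delay", ascending=False)
   .show(10))

spark.stop()
```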
MongoDB Topology Design
Create a world-class MongoDB cluster that is scalable, reliable, and secure. Comply with mission-critical regulatory regimes such as the European Union's General Data Protection Regulation (GDPR). Whether you are thinking of migrating to MongoDB or need to meet legal requirements for an existing self-managed cluster, this book has you covered. It begins with the basics of replication and sharding, and quickly scales up to cover everything you need to know to control your data and keep it safe from unexpected data loss or downtime. This book covers best practices for stable MongoDB deployments. For example, a well-designed MongoDB cluster should have no single point of failure. The book covers common use cases when only one or two data centers are available. It goes into detail about creating geopolitical sharding configurations to comply with the most stringent data protection regulations. The book also covers different tools and approaches for automating and monitoring a cluster with Kubernetes, Docker, and popular cloud provider containers. What You Will Learn: get started with the basics of MongoDB clusters; protect and monitor a MongoDB deployment; deepen your expertise around replication and sharding; keep effective backups and plan ahead for disaster recovery; recognize and avoid problems that can occur in distributed databases; build optimal MongoDB deployments within hardware and data center limitations. Who This Book Is For: solutions architects, DevOps architects and engineers, automation and cloud engineers, and database administrators who are new to MongoDB and distributed databases or who need to scale up simple deployments. This book is a complete guide to planning a deployment for optimal resilience, performance, and scaling, and covers all the details required to meet the new set of data protection regulations such as the GDPR. This book is particularly relevant for large global organizations such as financial and medical institutions, as well as government departments that need to control data in the whole stack and are prohibited from using managed cloud services.
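To make the replication vocabulary concrete, here is a hedged PyMongo sketch of connecting to a three-member replica set and writing with a majority write concern; the hostnames, replica-set name, database, and collection are illustrative assumptions, not values from the book.

```python
# Illustrative PyMongo sketch (not code from the book): connect to a replica set and
# perform a write that must be acknowledged by a majority of members.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient(
    "mongodb://node1.example.com:27017,node2.example.com:27017,node3.example.com:27017",
    replicaSet="rs0",
)
items = client.get_database("inventory").get_collection(
    "items", write_concern=WriteConcern(w="majority")
)
items.insert_one({"sku": "abc-123", "qty": 10})   # survives the loss of a single member
print("primary:", client.primary, "secondaries:", client.secondaries)
```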
Linear Algebra for Computational Sciences and Engineering
This book presents the main concepts of linear algebra from the viewpoint of applied scientists such as computer scientists and engineers, without compromising on mathematical rigor. Based on the idea that computational scientists and engineers need, in both research and professional life, an understanding of theoretical concepts of mathematics in order to be able to propose research advances and innovative solutions, every concept is thoroughly introduced and is accompanied by its informal interpretation. Furthermore, most of the theorems included are first rigorously proved and then shown in practice by a numerical example. When appropriate, topics are presented also by means of pseudocodes, thus highlighting the computer implementation of algebraic theory.It is structured to be accessible to everybody, from students of pure mathematics who are approaching algebra for the first time to researchers and graduate students in applied sciences who need a theoretical manual of algebra to successfully perform their research. Most importantly, this book is designed to be ideal for both theoretical and practical minds and to offer to both alternative and complementary perspectives to study and understand linear algebra.
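As a small bridge between the book's theory and its computational viewpoint, the following hedged NumPy sketch (not taken from the book, which uses pseudocode and worked examples) solves a 2x2 linear system numerically and verifies the result.

```python
# Solve Ax = b for a small system and check the residual.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)       # exact solution of the 2x2 system
print(x)                        # [0.8 1.4]
print(np.allclose(A @ x, b))    # True: the residual is numerically zero
```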
Smart Transportation Systems 2019
An Algorithm for Reducing Vehicles' Stop Behind the Bus Pre-Signals.- Estimation Models for the Safety Level of Indoor Space Pedestrian Flows.- Station-Level Hourly Bike Demand Prediction for Dynamic Repositioning in Bike Sharing Systems.- Study of Data-Driven Methods for Vessel Anomaly Detection Based on AIS Data.- Estimation Method of Saturation Flow Rate for Shared Left-Turn Lane at Signalized Intersection, Part I: Methodology.- Estimation Method of Saturation Flow Rate for Shared Left-Turn Lane at Signalized Intersection, Part II: Case Study.- Safety on Italian Highways, Impacts of the Highway Chauffeur System.- Restorable Robustness Considering Carbon Tax in Weekly Berth and Quay Crane Planning.
Data Science
This two volume set (CCIS 1257 and 1258) constitutes the refereed proceedings of the 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020 held in Taiyuan, China, in September 2020. The 98 papers presented in these two volumes were carefully reviewed and selected from 392 submissions. The papers are organized in topical sections: database, machine learning, network, graphic images, system, natural language processing, security, algorithm, application, and education.
Data Analysis in Bi-Partial Perspective: Clustering and Beyond
This book presents the bi-partial approach to data analysis, which is both uniquely general and enables the development of techniques for many data analysis problems, including related models and algorithms. It is based on adequate representation of the essential clustering problem: to group together the similar, and to separate the dissimilar. This leads to a general objective function and subsequently to a broad class of concrete implementations. Using this basis, a suboptimising procedure can be developed, together with a variety of implementations. This procedure has a striking affinity with the classical hierarchical merger algorithms, while also incorporating the stopping rule, based on the objective function. The approach resolves the cluster number issue, as the solutions obtained include both the content and the number of clusters. Further, it is demonstrated how the bi-partial principle can be effectively applied to a wide variety of problems in data analysis. The book offers a valuable resource for all data scientists who wish to broaden their perspective on basic approaches and essential problems, and to thus find answers to questions that are often overlooked or have yet to be solved convincingly. It is also intended for graduate students in the computer and data sciences, and will complement their knowledge and skills with fresh insights on problems that are otherwise treated in the standard "academic" manner.
Evolutionary Decision Trees in Large-Scale Data Mining
Evolutionary computation.- Decision trees in data mining.- Parallel and distributed computation.- Global induction of univariate trees.- Oblique and mixed decision trees.- Cost-sensitive tree induction.- Multi-test decision trees for gene expression data.- Parallel computations for evolutionary induction.
Concepts and Methods for a Librarian of the Web
The World Wide Web can be considered a huge library that in consequence needs a capable librarian responsible for the classification and retrieval of documents as well as the mediation between library resources and users. Based on this idea, the concept of the "Librarian of the Web" is introduced, which comprises novel, librarian-inspired methods and technical solutions to decentrally search for text documents in the web using peer-to-peer technology. The concept's implementation in the form of an interactive peer-to-peer client, called "WebEngine", is elaborated on in detail. This software extends and interconnects common web servers, creating a fully integrated, decentralised and self-organising web search system on top of the existing web structure. Thus, the web is turned into its own powerful search engine without the need for any central authority. This book is intended for researchers and practitioners having a solid background in the fields of Information Retrieval and Web Mining.
Advances in the Theory of Probabilistic and Fuzzy Data Scientific Methods with Applications
This book focuses on the advanced soft computational and probabilistic methods that the authors have published over the past few years. It describes theoretical results and applications, and discusses how various uncertainty measures - probability, plausibility and belief measures - can be treated in a unified way. It also examines approximations of four notable probability distributions (Weibull, exponential, logistic and normal) using a unified probability distribution function, and presents a fuzzy arithmetic-based time series model that provides an easy-to-use forecasting technique. Lastly, it proposes flexible fuzzy numbers for Likert scale-based evaluations. Featuring methods that can be successfully applied in a variety of areas, including engineering, economics, biology and the medical sciences, the book offers useful guidelines for practitioners and researchers.
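For reference, the standard two-parameter Weibull density that such approximations target is, for shape $k > 0$, scale $\lambda > 0$, and $x \ge 0$,

\[
f(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}},
\]

and setting $k = 1$ recovers the exponential density $\frac{1}{\lambda} e^{-x/\lambda}$. The unified probability distribution function proposed by the authors is their own contribution and is not reproduced here.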
Business Process Management
Business process management is usually treated from two different perspectives: business administration and computer science. While business administration professionals tend to consider information technology as a subordinate aspect of business process management for experts to handle, computer science professionals often consider business goals and organizational regulations as terms that do not deserve much thought but require the appropriate level of abstraction. Matthias Weske argues that all communities involved need to have a common understanding of the different aspects of business process management. To this end, he details the complete business process lifecycle from the modeling phase to process enactment and improvement, taking into account all the different stakeholders involved. After starting with a presentation of general foundations and abstraction models, he explains concepts like process orchestrations and choreographies, as well as process properties and data dependencies. Finally, he presents both traditional and advanced business process management architectures, covering, for example, workflow management systems, service-oriented architectures, and data-driven approaches. In addition, he shows how standards like WfMC, SOAP, WSDL, and BPEL fit into the picture. This textbook is ideally suited for classes on business process management, information systems architecture, and workflow management. This 3rd edition contains a new chapter on business decision modelling, covering the Decision Model and Notation (DMN) standard; the chapter on process choreographies has been streamlined, and numerous clarifications have been made throughout the book. The accompanying website www.bpm-book.com contains further information and additional teaching material.
Large-Scale Disk Failure Prediction: PAKDD 2020 Competition and Workshop, AI Ops 2020, February 7 - May 15, 2020, Revised Selected Papers
This book constitutes the thoroughly refereed post-competition proceedings of the AI Ops Competition on Large-Scale Disk Failure Prediction, conducted between February 7 and May 15, 2020 on the Alibaba Cloud Tianchi Platform. A dedicated workshop, featuring the best performing teams of the competition, was held at the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2020, in Singapore, in May 2020. Due to the COVID-19 pandemic, the workshop was hosted online. This book includes 13 selected contributions: an introduction to the dataset, the selected approaches of the competing teams, and a competition summary describing the competition task, practical challenges, evaluation metrics, etc.
97 Things about Ethics Everyone in Data Science Should Know
Most of the high-profile cases of real or perceived unethical activity in data science aren't matters of bad intent. Rather, they occur because the ethics simply aren't thought through well enough. Being ethical takes constant diligence, and in many situations identifying the right choice can be difficult. In this in-depth book, contributors from top companies in technology, finance, and other industries share experiences and lessons learned from collecting, managing, and analyzing data ethically. Data science professionals, managers, and tech leaders will gain a better understanding of ethics through powerful, real-world best practices. Articles include: Ethics Is Not a Binary Concept (Jim Wilson); How to Approach Ethical Transparency (Rado Kotorov); Unbiased & Fair (Doug Hague); Rules and Rationality (Christof Wolf Brenner); The Truth About AI (Cassie Kozyrkov); Cautionary Ethics Tales (Sherrill Hayes); Fairness in the Age of Algorithms (Anna Jacobson); The Ethical Data Storyteller (Brent Dykes); Introducing Ethics: the Fully AI-Driven Cloud-Based Ethics Solution (Brian O'Neill); Be Careful with Decisions of the Heart (Hugh Watson); Understanding Passive Versus Proactive Ethics (Bill Schmarzo).
Computer Information Systems and Industrial Management
This book constitutes the proceedings of the 19th International Conference on Computer Information Systems and Industrial Management Applications, CISIM 2020, held in Bialystok, Poland, in October 2020. Due to the COVID-19 pandemic, the conference was postponed to October 2020. The 40 full papers presented together with 5 abstracts of keynotes were carefully reviewed and selected from 62 submissions. The main topics covered by the chapters in this book are biometrics, security systems, multimedia, classification and clustering, and industrial management. Besides these, the reader will find interesting papers on computer information systems as applied to wireless networks, computer graphics, and intelligent systems. The papers are organized in the following topical sections: biometrics and pattern recognition applications; computer information systems and security; industrial management and other applications; machine learning and high performance computing; modelling and optimization.
Complex Event Processing
An important task for the IT of the networked world is the automated evaluation and processing of information that is relevant to an application and is sent over the network. With Complex Event Processing (CEP), large volumes of time-stamped data of the most varied kinds can be analyzed and further processed in near real time. The basic approach of CEP corresponds to human decision-making in the processes of everyday life and represents an extension of familiar data analytics methods such as data mining, statistical analysis, and rule-based knowledge processing. Typical application areas are big data systems, the Internet of Things, and Industry 4.0.
Become ITIL® 4 Foundation Certified in 7 Days
Use this guide book, in its fully updated second edition, to study for the ITIL 4 Foundation certification exam. Know the latest ITIL framework and DevOps concepts. The book will take you through the new ITIL framework and the nuances of the DevOps methodology. The book follows the topics included in the foundation certification exam syllabus and includes new sections on ITIL's guiding principles, the service value chain, and the four dimensions of service management. Also included are the concepts, processes, and philosophies used in DevOps programs and projects. ITIL and DevOps concepts are explained with relevant examples. By the time you finish this book, you will have a complete understanding of ITIL 4 and will be ready to take the ITIL 4 Foundation certification exam. You will know the DevOps methodology and how ITIL reinforces the philosophy of shared responsibility and collaboration. Over the course of a week, even while working your day job, you will be prepared to take the exam. What You Will Learn: know the basics of ITIL as you prepare for the ITIL Foundation certification exam; understand ITIL through examples; be aware of ITIL's relevance to DevOps and DevOps concepts. Who This Book Is For: professionals from the IT services industry.
Multimedia Technology and Enhanced Learning
This two-volume book constitutes the refereed proceedings of the Second International Conference on Multimedia Technology and Enhanced Learning, ICMTEL 2020, held in Leicester, United Kingdom, in April 2020. Due to the COVID-19 pandemic, all papers were presented via YouTube Live. The 83 revised full papers have been selected from 158 submissions. They describe new learning technologies, ranging from the smart school and smart class to smart learning at home, which have been developed from new technologies such as machine learning, multimedia, and the Internet of Things.
Data Analytics For Beginners
Have you ever been asked to analyze data in your job but not understood what you were doing? Now is the time to change that! In this book, you are going to learn: the risks of data analysis; the benefits of data analysis; terms you are going to use; and so much more. So, now is the time to dive in and begin to advance your knowledge of data and how you are going to use it. In the end, you will be able to use it to be more efficient in your job.
A Practical Guide to Database Design
Fully updated and expanded from the previous edition, A Practical Guide to Database Design, Second Edition is intended for those involved in the design or development of a database system or application. It begins by illustrating how to develop a Third Normal Form data model where data is placed "where it belongs". The reader is taken step-by-step through the Normalization process, first using a simple and then a more complex set of data requirements. Next, usage analysis for each Logical Data Model is reviewed and a Physical Data Model is produced that will satisfy user performance requirements. Finally, each Physical Data Model is used as input to create databases using both Microsoft Access and SQL Server. The book next shows how to use an industry-leading data modeling tool to define and manage logical and physical data models, and how to create Data Definition Language statements to create or update a database running in SQL Server, Oracle, or another type of DBMS. One chapter is devoted to illustrating how Microsoft Access can be used to create user interfaces to review and update underlying tables in that database as well as tables residing in SQL Server or Oracle. For users involved with Cyber activity or support, one chapter illustrates how to extract records of interest from a log file using Perl, then shows how to load these extracted records into one or more SQL Server "tracking" tables, adding status flags for analysts to use when reviewing activity of interest. These status flags are used to flag/mark collected records as "Reviewed", "Pending" (currently being analyzed), and "Resolved". The last chapter then shows how to build a web-based GUI using PHP to query these tracking tables and allow an analyst to review new activity, flag items that need to be investigated, and finally flag items that have been investigated and resolved. Note that the book has complete code/scripts for both Perl and the PHP GUI.
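The tracking-table idea is easy to prototype outside SQL Server; the hedged sketch below uses Python's built-in sqlite3 module instead of the Perl and SQL Server stack the book actually uses, and the table and column names are invented for illustration.

```python
# Illustrative Python + SQLite sketch of a "tracking table with status flags"
# (the book's own workflow uses Perl for extraction and SQL Server for storage).
import sqlite3

conn = sqlite3.connect("tracking.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS activity_tracking (
        id          INTEGER PRIMARY KEY,
        source_ip   TEXT,
        event_time  TEXT,
        detail      TEXT,
        status      TEXT DEFAULT 'Pending'   -- 'Reviewed', 'Pending', or 'Resolved'
    )
""")
conn.execute(
    "INSERT INTO activity_tracking (source_ip, event_time, detail) VALUES (?, ?, ?)",
    ("203.0.113.7", "2020-09-01T12:34:56", "suspicious login attempt"),
)
conn.execute("UPDATE activity_tracking SET status = 'Reviewed' WHERE id = 1")
conn.commit()
print(conn.execute("SELECT id, source_ip, status FROM activity_tracking").fetchall())
```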
Social Networks with Rich Edge Semantics
Social Networks with Rich Edge Semantics introduces a new mechanism for representing social networks in which pairwise relationships can be drawn from a range of realistic possibilities, including different types of relationships, different strengths in the directions of a pair, positive and negative relationships, and relationships whose intensities change with time. For each possibility, the book shows how to model the social network using spectral embedding. It also shows how to compose the techniques so that multiple edge semantics can be modeled together, and the modeling techniques are then applied to a range of datasets. Features: introduces the reader to difficulties with current social network analysis and the need for richer representations of relationships among nodes, including accounting for intensity, direction, type, positive/negative, and changing intensities over time; presents a novel mechanism to allow social networks with qualitatively different kinds of relationships to be described and analyzed; includes extensions to the important technique of spectral embedding, shows that they are mathematically well motivated, and proves that their results are appropriate; shows how to exploit embeddings to understand structures within social networks, including subgroups, positional significance, link or edge prediction, consistency of role in different contexts, and net flow of properties through a node; illustrates the use of the approach on real-world problems involving online social networks, criminal and drug-smuggling networks, and networks where the nodes are themselves groups. Suitable for researchers and students in social network research, data science, statistical learning, and related areas, this book will help to provide a deeper understanding of real-world social networks.
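For readers new to the underlying machinery, here is a hedged NumPy sketch of plain spectral embedding on a tiny unsigned, single-relation graph; the book's subject is precisely how to extend this kind of embedding to typed, directed, signed, and time-varying edges, which this toy example does not attempt.

```python
# Plain spectral embedding via the unnormalised graph Laplacian (illustrative only).
import numpy as np

# Adjacency matrix of a small undirected graph: a triangle {0,1,2} joined to a tail {3,4}.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))          # degree matrix
L = D - A                           # unnormalised graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)

# The eigenvector for the smallest non-zero eigenvalue (the Fiedler vector)
# roughly separates the triangle {0,1,2} from the tail {3,4} by sign.
fiedler = eigvecs[:, 1]
print(np.round(fiedler, 3))
```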
Distributed Tracing in Practice: Instrumenting, Analyzing, and Debugging Microservices
Since most applications today are distributed in some fashion, monitoring their health and performance requires a new approach. Enter distributed tracing, a method of profiling and monitoring distributed applications, particularly those that use microservice architectures. There's just one problem: distributed tracing can be hard. But it doesn't have to be. With this guide, you'll learn what distributed tracing is and how to use it to understand the performance and operation of your software. Key players at LightStep and other organizations walk you through instrumenting your code for tracing, collecting the data that your instrumentation produces, and turning it into useful operational insights. If you want to implement distributed tracing, this book tells you what you need to know. You'll learn: the pieces of a distributed tracing deployment: instrumentation, data collection, and analysis; best practices for instrumentation: methods for generating trace data from your services; how to deal with (or avoid) overhead using sampling and other techniques; how to use distributed tracing to improve baseline performance and to mitigate regressions quickly; where distributed tracing is headed in the future.
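One widely used vendor-neutral instrumentation API is OpenTracing, so the hedged Python sketch below shows the general shape of instrumenting a request handler with it; this is not necessarily the API the book standardizes on, the span and tag names are illustrative, and the default no-op global tracer means the code runs as-is until a concrete tracer from a tracing backend is registered.

```python
# Illustrative OpenTracing instrumentation sketch (not code from the book).
from opentracing import global_tracer
from opentracing.ext import tags

def handle_request(user_id):
    tracer = global_tracer()  # no-op tracer unless a real one has been registered
    with tracer.start_active_span("handle_request") as scope:
        scope.span.set_tag(tags.SPAN_KIND, tags.SPAN_KIND_RPC_SERVER)
        scope.span.set_tag("user.id", user_id)
        # ... call downstream services here; their child spans would attach to
        # this span through the active scope ...
        scope.span.log_kv({"event": "request_finished"})

handle_request("demo-user")
```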