Big Data Analysis with Python

Updated: 2021-06-11 13:46:55

Cover

Copyright Page

Preface

Chapter 1 The Python Data Science Stack

Introduction

Python Libraries and Packages

Using Pandas

Data Type Conversion

Aggregation and Grouping

Exporting Data from Pandas

Visualization with Pandas

Summary

Chapter 2 Statistical Visualizations

Introduction

Types of Graphs and When to Use Them

Components of a Graph

Seaborn

Which Tool Should Be Used?

Types of Graphs

Pandas DataFrames and Grouped Data

Changing Plot Design: Modifying Graph Components

Exporting Graphs

Summary

Chapter 3 Working with Big Data Frameworks

Introduction

Hadoop

Spark

Writing Parquet Files

Handling Unstructured Data

Summary

Chapter 4 Diving Deeper with Spark

Introduction

Getting Started with Spark DataFrames

Writing Output from Spark DataFrames

Exploring Spark DataFrames

Data Manipulation with Spark DataFrames

Graphs in Spark

Summary

Chapter 5 Handling Missing Values and Correlation Analysis

Introduction

Setting up the Jupyter Notebook

Missing Values

Handling Missing Values in Spark DataFrames

Correlation

Summary

Chapter 6 Exploratory Data Analysis

Introduction

Defining a Business Problem

Translating a Business Problem into Measurable Metrics and Exploratory Data Analysis (EDA)

Structured Approach to the Data Science Project Life Cycle

Summary

Chapter 7 Reproducibility in Big Data Analysis

Introduction

Reproducibility with Jupyter Notebooks

Gathering Data in a Reproducible Way

Code Practices and Standards

Avoiding Repetition

Summary

Chapter 8 Creating a Full Analysis Report

Introduction

Reading Data in Spark from Different Data Sources

SQL Operations on a Spark DataFrame

Generating Statistical Measurements

Summary

Appendix

Chapter 01: The Python Data Science Stack

Chapter 02: Statistical Visualizations Using Matplotlib and Seaborn

Chapter 03: Working with Big Data Frameworks

Chapter 04: Diving Deeper with Spark

Chapter 05: Missing Value Handling and Correlation Analysis in Spark

Chapter 06: Business Process Definition and Exploratory Data Analysis

Chapter 07: Reproducibility in Big Data Analysis

Chapter 08: Creating a Full Analysis Report
