Last updated: 2021-06-11 13:46:55
Cover
Copyright Page
Preface
Chapter 1 The Python Data Science Stack
Introduction
Python Libraries and Packages
Using Pandas
Data Type Conversion
Aggregation and Grouping
Exporting Data from Pandas
Visualization with Pandas
Summary
Chapter 2 Statistical Visualizations
Types of Graphs and When to Use Them
Components of a Graph
Seaborn
Which Tool Should Be Used?
Types of Graphs
Pandas DataFrames and Grouped Data
Changing Plot Design: Modifying Graph Components
Exporting Graphs
Chapter 3 Working with Big Data Frameworks
Hadoop
Spark
Writing Parquet Files
Handling Unstructured Data
Chapter 4 Diving Deeper with Spark
Getting Started with Spark DataFrames
Writing Output from Spark DataFrames
Exploring Spark DataFrames
Data Manipulation with Spark DataFrames
Graphs in Spark
Chapter 5 Handling Missing Values and Correlation Analysis
Setting up the Jupyter Notebook
Missing Values
Handling Missing Values in Spark DataFrames
Correlation
Chapter 6 Exploratory Data Analysis
Defining a Business Problem
Translating a Business Problem into Measurable Metrics and Exploratory Data Analysis (EDA)
Structured Approach to the Data Science Project Life Cycle
Chapter 7 Reproducibility in Big Data Analysis
Reproducibility with Jupyter Notebooks
Gathering Data in a Reproducible Way
Code Practices and Standards
Avoiding Repetition
Chapter 8 Creating a Full Analysis Report
Reading Data in Spark from Different Data Sources
SQL Operations on a Spark DataFrame
Generating Statistical Measurements
Appendix
Chapter 01: The Python Data Science Stack
Chapter 02: Statistical Visualizations Using Matplotlib and Seaborn
Chapter 03: Working with Big Data Frameworks
Chapter 04: Diving Deeper with Spark
Chapter 05: Missing Value Handling and Correlation Analysis in Spark
Chapter 06: Business Process Definition and Exploratory Data Analysis
Chapter 07: Reproducibility in Big Data Analysis
Chapter 08: Creating a Full Analysis Report