Ggplot Python


Ggplot is a Python implementation of the grammar of graphics. It is not intended to be a feature-for-feature port of ggplot2 for R, though there is much greatness in ggplot2 and the Python world could stand to benefit from it. The ggplot2 package is part of the tidyverse, an ecosystem of packages designed with common APIs and a shared philosophy (learn more at tidyverse.org). It was developed by Hadley Wickham, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington and others.

In this R graphics tutorial, we present a gallery of ggplot themes. You’ll learn how to change the default ggplot theme by using the list of standard themes available in the ggplot2 R package. Our selection of the best ggplot themes for professional publications or presentations includes theme_classic(), theme_minimal() and theme_bw(). Another well-known theme is the dark theme, theme_dark().

Introduction

Python has a number of powerful plotting libraries to choose from. One of the oldest and most popular is matplotlib, which forms the foundation for many other Python plotting libraries. For this exercise we are going to use plotnine, a Python implementation of the Grammar of Graphics inspired by the interface of the ggplot2 package from R. plotnine (and its R cousin ggplot2) is a very nice way to create publication-quality plots.

The Grammar of Graphics

A statistical graphic is a mapping from data to the aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).

Faceting can be used to generate the same plot for different subsets of the dataset.

These are the basic building blocks according to the grammar of graphics:

  • data: the data, plus a set of aesthetic mappings describing how variables map to visual properties.
  • geom: geometric objects, which represent what you actually see on the plot: points, lines, polygons, etc.
  • stats: statistical transformations, which summarise the data in many useful ways.
  • scale: scales map values in the data space to values in an aesthetic space.
  • coord: a coordinate system, which describes how data coordinates are mapped to the plane of the graphic.
  • facet: a faceting specification, which describes how to break up the data into subsets and plot each subset individually.

Let's explore these in detail.

First, install the pandas and plotnine packages to ensure they are available.
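You normally run the installation from a shell, for example with `pip install pandas plotnine` (this assumes a pip-based setup; conda users would use their own package manager). The standard-library sketch below checks that both packages can be imported before you continue. The helper name `missing_packages` is hypothetical.

```python
# Install from a shell first (assumes pip is available):
#   pip install pandas plotnine
import importlib.util


def missing_packages(names=("pandas", "plotnine")):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Please install:", " ".join(missing))
else:
    print("pandas and plotnine are available")
```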


Let's set up our working environment with the necessary libraries and load our CSV file into a data frame called survs_df.

To produce a plot with the ggplot class from plotnine, we must provide three things:

  1. A data frame containing our data.
  2. How the columns of the data frame can be translated into positions, colors, sizes, and shapes of graphical elements ('aesthetics').
  3. The actual graphical elements to display ('geometric objects').

Introduction to plotting

Let's see if we can also include information about species and year.

Notice that we've dropped the x= and y= keywords: the first and second positional arguments of aes() are implicitly x and y.

We can make a simple counting plot to see, for example, how many observations (data points) we have for each year.

Let's now also color by species to see how many observations we have per species in a given year.

Challenges

  1. Produce a plot comparing the number of observations for each species at each site. The plot should have site_id on the x axis, ideally as categorical data. (HINT: You can convert a column in a DataFrame df to the 'category' type using: df['some_col_name'] = df['some_col_name'].astype('category'))

  2. Create a boxplot of hindfoot_length across different species (species_id column). (HINT: There's a list of geoms available for plotnine in the docs. Instead of geom_bar, which one should you use?)

More geom types

Why are we not seeing multiple boxplots, one for each year? This is because the year variable is continuous in our data frame, but for this purpose we want it to be categorical.

You'll notice the x-axis labels overlap. To rotate them 90 degrees so they look less cluttered, we can apply a theme. We will revisit themes later.


To save some typing, let's define this x-axis label rotating theme as a short variable name that we can reuse:

To save an image for later:

Challenges


  1. Can you log2-transform weight and plot a 'normalised' boxplot? Hint: use the np.log2() function and name the new column weight_log.

  2. Does a log2 transform make this data visualisation better?

Faceting

ggplot has a special technique called faceting that allows you to split one plot into multiple plots based on a factor included in the dataset. We will use it to make one time-series plot for each species.

The two faceted plots above are probably easier to interpret using the weight_log column we created. Give it a try!

The 'Layered Grammar of Graphics'

Theming

plotnine allows pre-defined 'themes' to be applied to control the overall, non-data look of the plot (backgrounds, grid lines, fonts).

A list of available themes you may want to experiment with is here: https://plotnine.readthedocs.io/en/stable/api.html#themes

Extra bits 1


Let's try to bin years into decades. This is crude, but it might give simpler images to look at.


Extra bits 2

This is a different way to look at your data.

Learning ggplot2

If you are new to ggplot2 you are better off starting with a systematic introduction, rather than trying to learn from reading individual documentation pages. Currently, there are three good places to start:

  1. The Data Visualisation and Graphics for communication chapters in R for Data Science. R for Data Science is designed to give you a comprehensive introduction to the tidyverse, and these two chapters will get you up to speed with the essentials of ggplot2 as quickly as possible.

  2. If you’d like to take an online course, try Data Visualization in R With ggplot2 by Kara Woo.

  3. If you want to dive into making common graphics as quickly as possible, I recommend The R Graphics Cookbook by Winston Chang. It provides a set of recipes to solve common graphics problems.

If you’ve mastered the basics and want to learn more, read ggplot2: Elegant Graphics for Data Analysis. It describes the theoretical underpinnings of ggplot2 and shows you how all the pieces fit together, which will help you create new types of graphics tailored to your needs. The book is not available for free, but you can find its complete source at https://github.com/hadley/ggplot2-book.