6.2. Creating Visualizations with Matplotlib

Matplotlib is a “Python 2D plotting library” for creating a wide range of data visualizations. It is very powerful, and that means it can get complicated. Here, we will stick to a few specific tools it provides in order to limit the complexity.


First, some imports.

The first import brings in pyplot, the one piece of Matplotlib that we will use, and names it plt. Giving it a short name like this is common practice because it is faster to type (we’ll be using it, and thus typing it, a lot). This will be our tool for creating and modifying plots.

import matplotlib.pyplot as plt

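We’ll also run the following lines of code to make the charts look a bit nicer than the defaults in Jupyter. They render charts as SVG images, set the default figure size (width and height, in inches), and set the resolution (dots per inch). Exactly how these work isn’t terribly important, but if you like you can try changing the numbers and regenerating the plots below to see the effects.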

# For slightly nicer charts
%config InlineBackend.figure_formats = ['svg']
plt.rcParams['figure.figsize'] = [10, 5]
plt.rcParams['figure.dpi'] = 150
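
If you would rather not change the settings for the whole notebook, one option is plt.rc_context(), which applies settings only inside a with block. The following is a minimal sketch of that idea; the sizes used are arbitrary example values, not recommendations.

```python
import matplotlib.pyplot as plt

# Remember the current default figure size (width, height in inches).
default_size = list(plt.rcParams['figure.figsize'])

# Temporarily override the settings for figures created inside this block.
with plt.rc_context({'figure.figsize': [4, 3], 'figure.dpi': 100}):
    fig = plt.figure()
    small_size = list(fig.get_size_inches())
    plt.close(fig)

# Outside the block, the earlier settings are in effect again.
restored_size = list(plt.rcParams['figure.figsize'])

print(small_size)
print(restored_size == default_size)
```

Assigning to plt.rcParams directly, as in the cell above, is simpler when every chart in the notebook should share the same look.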

Then, just for generating some example data in this notebook, we import our old friend the random module from Python’s standard library.

import random

6.2.1. Example data for use in charts

The following lists are example data for use in a few demonstration charts.

categories = ['dogs', 'cats', 'narwhals', 'cows', 'koalas']

years = ['2012', '2013', '2014', '2015', '2016']

values = [4, 8, 16, 32, 64]

# Randomly choose 5 distinct values from 0 to 99:
values_rand = random.sample(range(100), 5)

# Make two lists, each with 1000 elements sampled from a random
# gaussian distribution (aka "normal" distribution) with a mean of 50
# and a standard deviation of 50:
values_rand_long = []
values_rand_long_2 = []
for i in range(1000):
    values_rand_long.append(random.gauss(50, 50))
    values_rand_long_2.append(random.gauss(50, 50))
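
Because this data is random, the charts below will look a little different every time the notebook runs. If you want the same data on every run, one approach is to seed the random number generator first. This is a small sketch of that idea; the seed value 42 and the names sample and repeat are arbitrary choices made for this example.

```python
import random
import statistics

# Assumption: any fixed integer works as a seed; 42 is arbitrary.
random.seed(42)

# random.gauss(mu, sigma): the first argument is the mean,
# the second is the standard deviation.
sample = [random.gauss(50, 50) for _ in range(1000)]

# The sample mean and standard deviation should both land near 50.
sample_mean = statistics.mean(sample)
sample_stdev = statistics.stdev(sample)
print(round(sample_mean, 1), round(sample_stdev, 1))

# Re-seeding with the same value reproduces exactly the same numbers.
random.seed(42)
repeat = [random.gauss(50, 50) for _ in range(1000)]
print(sample == repeat)  # prints True
```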

6.2.2. Line chart

First, we will use Matplotlib (via the plt name we made above when we imported it) to make a line chart.

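Note that every line in the following cell uses dot notation to call a method in the plt object. (Strictly speaking, plt is a module, so these are functions rather than methods, but the dot notation works the same way.)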

The methods used here are:

- plt.plot() - Add a “data series” (one line) to the chart, given as two arguments: a list of x values and a list of y values of the same length. Call this multiple times to add multiple data series.
- plt.xlabel() - Add a label to the x-axis.
- plt.ylabel() - Add a label to the y-axis.
- plt.legend() - Add a legend to the chart. The argument should be a list of strings, with one string for each data series to give it a label.
- plt.title() - Add a title to the chart.
- plt.show() - Show (display) the chart that has been generated at this point.

plt.plot(years, values)
plt.plot(years, values_rand)
plt.xlabel("Year")
plt.ylabel("Value")
plt.legend(["Values", "Random Values"])
plt.title('A line chart')
plt.show()
[Figure: the line chart produced by the code above, titled "A line chart", with Year on the x-axis and Value on the y-axis.]
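
As a variation on the cell above, each data series can carry its own label through the label= argument to plt.plot(), in which case plt.legend() can be called with no arguments. The sketch below assumes a made-up second series called doubled; the marker= and linestyle= arguments are optional styling that the original example does not use.

```python
import matplotlib.pyplot as plt

years = ['2012', '2013', '2014', '2015', '2016']
values = [4, 8, 16, 32, 64]
doubled = [v * 2 for v in values]  # hypothetical second series for this example

# Attach a label to each series as it is added to the chart.
plt.plot(years, values, label="Values", marker="o")
plt.plot(years, doubled, label="Doubled values", linestyle="--")
plt.xlabel("Year")
plt.ylabel("Value")

# With no arguments, the legend uses the labels given above.
legend = plt.legend()
plt.title("A line chart with per-series labels")
plt.show()
```

Keeping the label next to the data it describes means the legend stays correct even if the plt.plot() calls are later reordered.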

6.2.3. Bar chart

Next is the bar chart. Here, we use plt.bar() instead of plt.plot(), and for clarity, we have specified the names of its two arguments, writing them as x=[...] and height=[...]. You can leave off the x= and height= parts, and it will still work as long as the two lists stay in the same order, but the code is often easier to read and understand when the names are written explicitly like this.

The methods plt.xlabel(), plt.ylabel(), plt.title(), and plt.show() are used again here.

plt.bar(x=categories, height=values_rand)
plt.xlabel("Animals")
plt.ylabel("Value")
plt.title("A bar chart")
plt.show()
[Figure: the bar chart produced by the code above, titled "A bar chart", with Animals on the x-axis and Value on the y-axis.]
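
To illustrate the point about argument names, here is a small sketch that draws the same bars twice, once with keyword arguments and once with positional arguments. The counts list holds made-up fixed values so the result is the same on every run, and plt.figure() is used only to start a separate chart for each call.

```python
import matplotlib.pyplot as plt

categories = ['dogs', 'cats', 'narwhals', 'cows', 'koalas']
counts = [12, 30, 3, 18, 7]  # hypothetical fixed values for this example

# Version 1: arguments named explicitly.
plt.figure()
bars_named = plt.bar(x=categories, height=counts)
plt.title("Keyword arguments")

# Version 2: the same call with the names left off.
# The order matters here: x values first, then heights.
plt.figure()
bars_positional = plt.bar(categories, counts)
plt.title("Positional arguments")

plt.show()

# Both calls produce bars with identical heights.
heights_named = [bar.get_height() for bar in bars_named]
heights_positional = [bar.get_height() for bar in bars_positional]
print(heights_named == heights_positional)  # prints True
```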

Or, we can create a horizontal bar chart. This might work better for visualizing some data sets.

Note that we’ve changed plt.bar() to plt.barh(), which takes y and width arguments instead of x and height. We also swapped the x- and y-axis labels.

plt.barh(y=categories, width=values_rand)
plt.xlabel("Value")
plt.ylabel("Animals")
plt.title("A horizontal bar chart")
plt.show()
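[Figure: the horizontal bar chart produced by the code above, titled "A horizontal bar chart", with Value on the x-axis and Animals on the y-axis.]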
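
One thing to be aware of is that plt.barh() draws the first category at the bottom of the chart. If you would prefer the first category at the top, one option is to flip the y-axis. This sketch assumes made-up fixed values in counts, and it uses plt.gca() and invert_yaxis(), neither of which appears in the original example.

```python
import matplotlib.pyplot as plt

categories = ['dogs', 'cats', 'narwhals', 'cows', 'koalas']
counts = [12, 30, 3, 18, 7]  # hypothetical fixed values for this example

bars = plt.barh(y=categories, width=counts)

# plt.gca() returns the current axes, i.e. the chart being drawn.
ax = plt.gca()
# Flip the y-axis so that 'dogs' appears at the top instead of the bottom.
ax.invert_yaxis()

plt.xlabel("Value")
plt.ylabel("Animals")
plt.title("A horizontal bar chart, first category on top")
plt.show()
```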

6.2.4. Histogram

Matplotlib can make a histogram from any list of numbers. We’ll use the long list of random values generated before, values_rand_long.

We use the plt.hist() method here, which can take a list of numbers as its argument x. It will automatically create “bins” (10 equal-width bins by default, spanning the range of the data) and count the number of values that fall in each bin to create the plot.

plt.hist(x=values_rand_long)
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("A histogram")
plt.show()
[Figure: the histogram produced by the code above, titled "A histogram", with Values on the x-axis and Frequency on the y-axis.]
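
To see the binning at work, it can help to look at what plt.hist() returns: the count for each bin, the bin edges, and the drawn bars. The sketch below uses freshly generated data with an arbitrary seed so that it runs on its own, and it assumes Matplotlib's default number of bins has not been changed.

```python
import random
import matplotlib.pyplot as plt

random.seed(0)  # arbitrary seed, only to make this example repeatable
data = [random.gauss(50, 50) for _ in range(1000)]

# plt.hist() returns the per-bin counts, the bin edges, and the bar objects.
counts, bin_edges, patches = plt.hist(x=data)
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("A histogram with default bins")
plt.show()

print(len(counts))        # number of bins
print(len(bin_edges))     # one more edge than there are bins
print(int(counts.sum()))  # every value lands in exactly one bin
```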

We can control the number of bins using the bins= argument to the plt.hist() method:

plt.hist(x=values_rand_long, bins=50)
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("A histogram with more bins")
plt.show()
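[Figure: the histogram produced by the code above, titled "A histogram with more bins", with Values on the x-axis and Frequency on the y-axis.]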
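
The bins= argument can also be a list of bin edges instead of a single number, which is useful when the bins should fall on particular boundaries. Here is a minimal sketch with a tiny made-up data set, small enough to check the counts by hand.

```python
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical values for this example

# Three bins: 0 up to 2, 2 up to 4, and 4 up to 6.
# A value equal to an inner edge is counted in the bin that starts there.
counts, bin_edges, patches = plt.hist(x=data, bins=[0, 2, 4, 6])
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("A histogram with explicit bin edges")
plt.show()

print(counts.tolist())  # prints [1.0, 5.0, 3.0]
```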

6.2.5. Scatter plot

To create a scatter plot, we use the plt.scatter() method and provide lists for its x and y arguments containing the x and y values, respectively, for each point in the chart:

plt.scatter(x=values_rand_long, y=values_rand_long_2)
plt.xlabel("Values 1")
plt.ylabel("Values 2")
plt.title("A scatter plot")
plt.show()
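[Figure: the scatter plot produced by the code above, titled "A scatter plot", with Values 1 on the x-axis and Values 2 on the y-axis.]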
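
The two lists are matched up by position: the first x value goes with the first y value, and so on, so both lists need to be the same length. A tiny sketch with three made-up points shows the pairing.

```python
import matplotlib.pyplot as plt

xs = [1, 2, 3]     # hypothetical x values
ys = [10, 20, 15]  # hypothetical y values, one per x value

# Each point is drawn at (xs[i], ys[i]); zip() shows the same pairing.
pairs = list(zip(xs, ys))
print(pairs)  # prints [(1, 10), (2, 20), (3, 15)]

points = plt.scatter(x=xs, y=ys)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Three points")
plt.show()
```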

We might want to make the individual markers somewhat transparent to better see what is going on in a region with a lot of overlapping markers. The alpha=0.25 argument makes each marker 25% opaque (thus 75% transparent). It can range from 0.0 (fully transparent) to 1.0 (fully opaque).

plt.scatter(x=values_rand_long, y=values_rand_long_2, alpha=0.25)
plt.xlabel("Values 1")
plt.ylabel("Values 2")
plt.title("A scatter plot with semi-transparent markers")
plt.show()
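[Figure: the scatter plot produced by the code above, titled "A scatter plot with semi-transparent markers", with Values 1 on the x-axis and Values 2 on the y-axis.]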
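
Another way to reduce overlap, which can be combined with alpha=, is to shrink the markers using the s= argument of plt.scatter(), which sets the marker size (as an area, in points squared). The sketch below generates its own data with an arbitrary seed; the value s=10 is just an example chosen to be smaller than the default.

```python
import random
import matplotlib.pyplot as plt

random.seed(1)  # arbitrary seed, only to make this example repeatable
xs = [random.gauss(50, 50) for _ in range(1000)]
ys = [random.gauss(50, 50) for _ in range(1000)]

# Smaller, semi-transparent markers make dense regions easier to read.
points = plt.scatter(x=xs, y=ys, alpha=0.25, s=10)
plt.xlabel("Values 1")
plt.ylabel("Values 2")
plt.title("A scatter plot with small, semi-transparent markers")
plt.show()
```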