6.4. Series: Examples
I’ve grabbed some data on the estimated 2019 population of each U.S. state (plus DC) from this website.
We are going to put this data into a Series and then work with it using the tools introduced in this reading.
6.4.1. Creating a Series
# import pandas
import pandas as pd
# lists of state names and populations
state_list = ['California', 'Texas', 'Florida', 'New York', 'Illinois',
'Pennsylvania', 'Ohio', 'Georgia', 'North Carolina', 'Michigan',
'New Jersey', 'Virginia', 'Washington', 'Arizona', 'Massachusetts',
'Tennessee', 'Indiana', 'Missouri', 'Maryland', 'Wisconsin',
'Colorado', 'Minnesota', 'South Carolina', 'Alabama', 'Louisiana',
'Kentucky', 'Oregon', 'Oklahoma', 'Connecticut', 'Utah', 'Iowa',
'Nevada', 'Arkansas', 'Mississippi', 'Kansas', 'New Mexico',
'Nebraska', 'West Virginia', 'Idaho', 'Hawaii', 'New Hampshire',
'Maine', 'Montana', 'Rhode Island', 'Delaware', 'South Dakota',
'North Dakota', 'Alaska', 'DC', 'Vermont', 'Wyoming']
population_list = [39512223, 28995881, 21477737, 19453561, 12671821, 12801989,
11689100, 10617423, 10488084, 9986857, 8882190, 8535519,
7614893, 7278717, 6949503, 6833174, 6732219, 6137428,
6045680, 5822434, 5758736, 5639632, 5148714, 4903185,
4648794, 4467673, 4217737, 3956971, 3565287, 3205958,
3155070, 3080156, 3017825, 2976149, 2913314, 2096829,
1934408, 1792147, 1787065, 1415872, 1359711, 1344212,
1068778, 1059361, 973764, 884659, 762062, 731545, 705749,
623989, 578759,]
# create a series from a list of values and a list of labels
state_series = pd.Series(data=population_list,
index=state_list,
dtype='int64')
type(state_series)
Output:
pandas.core.series.Series
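As a side note, the same kind of Series can also be built from a dictionary instead of two parallel lists. The sketch below uses only the first three states from the lists above; the names `population_dict`, `small_series`, and `same_series` are illustrative and do not appear elsewhere in this reading.

```python
import pandas as pd

# a small subset of the populations listed above, stored as a dict
population_dict = {'California': 39512223,
                   'Texas': 28995881,
                   'Florida': 21477737}

# dict keys become the index labels, dict values become the data
small_series = pd.Series(population_dict, dtype='int64')

# the two-list construction used in the cell above, for comparison
same_series = pd.Series(data=list(population_dict.values()),
                        index=list(population_dict.keys()),
                        dtype='int64')

# .equals() compares both the values and the index labels
print(small_series.equals(same_series))
```

A dictionary keeps each label next to its value, which can make it harder to accidentally misalign a name and a population than with two separate lists.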
6.4.2. Examining a Series
# examine with head
state_series.head()
Output:
California 39512223
Texas 28995881
Florida 21477737
New York 19453561
Illinois 12671821
dtype: int64
# examine with tail
state_series.tail()
Output:
North Dakota 762062
Alaska 731545
DC 705749
Vermont 623989
Wyoming 578759
dtype: int64
# examine values
state_series.values
Output:
array([39512223, 28995881, 21477737, 19453561, 12671821, 12801989,
11689100, 10617423, 10488084, 9986857, 8882190, 8535519,
7614893, 7278717, 6949503, 6833174, 6732219, 6137428,
6045680, 5822434, 5758736, 5639632, 5148714, 4903185,
4648794, 4467673, 4217737, 3956971, 3565287, 3205958,
3155070, 3080156, 3017825, 2976149, 2913314, 2096829,
1934408, 1792147, 1787065, 1415872, 1359711, 1344212,
1068778, 1059361, 973764, 884659, 762062, 731545,
705749, 623989, 578759], dtype=int64)
# examine index
state_series.index
Output:
Index(['California', 'Texas', 'Florida', 'New York', 'Illinois',
'Pennsylvania', 'Ohio', 'Georgia', 'North Carolina', 'Michigan',
'New Jersey', 'Virginia', 'Washington', 'Arizona', 'Massachusetts',
'Tennessee', 'Indiana', 'Missouri', 'Maryland', 'Wisconsin', 'Colorado',
'Minnesota', 'South Carolina', 'Alabama', 'Louisiana', 'Kentucky',
'Oregon', 'Oklahoma', 'Connecticut', 'Utah', 'Iowa', 'Nevada',
'Arkansas', 'Mississippi', 'Kansas', 'New Mexico', 'Nebraska',
'West Virginia', 'Idaho', 'Hawaii', 'New Hampshire', 'Maine', 'Montana',
'Rhode Island', 'Delaware', 'South Dakota', 'North Dakota', 'Alaska',
'DC', 'Vermont', 'Wyoming'],
dtype='object')
6.4.3. Selecting Data in a Series
# indexing by label
state_series.loc['Hawaii']
Output:
1415872
# slicing by label, note! includes stop value
state_series.loc['Illinois':'Indiana']
Output:
Illinois 12671821
Pennsylvania 12801989
Ohio 11689100
Georgia 10617423
North Carolina 10488084
Michigan 9986857
New Jersey 8882190
Virginia 8535519
Washington 7614893
Arizona 7278717
Massachusetts 6949503
Tennessee 6833174
Indiana 6732219
dtype: int64
# indexing by location
state_series.iloc[4]
Output:
12671821
# slicing by location, note! does not include stop value
state_series.iloc[1:4]
Output:
Texas 28995881
Florida 21477737
New York 19453561
dtype: int64
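The difference between the two kinds of slicing is easy to miss, so here is a minimal side-by-side sketch on a five-state subset of the data above. The variable names `s`, `by_label`, and `by_position` are made up for this illustration.

```python
import pandas as pd

# the first five states from the lists above
s = pd.Series([39512223, 28995881, 21477737, 19453561, 12671821],
              index=['California', 'Texas', 'Florida', 'New York', 'Illinois'])

# label slice: the stop label 'New York' IS included
by_label = s.loc['Texas':'New York']

# position slice: the stop position 3 ('New York') is NOT included
by_position = s.iloc[1:3]

print(list(by_label.index))     # three labels, ending with 'New York'
print(list(by_position.index))  # two labels, ending with 'Florida'
```

Both slices start at Texas, but only the label-based one reaches New York.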
6.4.4. Sorting a Series
# sort by index
state_series.sort_index().head()
Output:
Alabama 4903185
Alaska 731545
Arizona 7278717
Arkansas 3017825
California 39512223
dtype: int64
# sort by values
state_series.sort_values().head()
Output:
Wyoming 578759
Vermont 623989
DC 705749
Alaska 731545
North Dakota 762062
dtype: int64
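Both sorting methods sort in ascending order by default and return a new Series rather than changing the original. The following sketch, on three states from the data above, shows descending sorts with `ascending=False` and the related `nlargest` method; the variable names are illustrative only.

```python
import pandas as pd

# three states from the lists above, in their original order
s = pd.Series([12671821, 12801989, 11689100],
              index=['Illinois', 'Pennsylvania', 'Ohio'])

# largest population first
largest_first = s.sort_values(ascending=False)

# labels in reverse alphabetical order
reverse_alpha = s.sort_index(ascending=False)

# nlargest(n) returns only the n largest values, largest first
top_two = s.nlargest(2)

print(largest_first)
print(top_two)

# the original series keeps its original order
print(list(s.index))
```

To keep a sorted result, assign it to a name, as the program in 6.4.6 does.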
6.4.5. Basic Operations with a Series
# use .sum() to get the U.S. total population
total_population = state_series.sum()
total_population
Output:
328300544
# create a new series with each state's share of the total population (as a decimal)
state_series_percent = state_series / total_population
state_series_percent.head()
Output:
California 0.120354
Texas 0.088321
Florida 0.065421
New York 0.059255
Illinois 0.038598
dtype: float64
# convert from decimal to percent and round
state_series_percent = state_series_percent * 100
state_series_percent = state_series_percent.round(2)
state_series_percent.head()
Output:
California 12.04
Texas 8.83
Florida 6.54
New York 5.93
Illinois 3.86
dtype: float64
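Because arithmetic on a Series is applied to every element and each step returns a new Series, the divide, multiply, and round steps can also be chained into one expression. This is a small sketch on three states; the total is copied from the `.sum()` output above rather than recomputed, and the names `s`, `total`, and `percent` are illustrative.

```python
import pandas as pd

# the three largest states from the lists above
s = pd.Series([39512223, 28995881, 21477737],
              index=['California', 'Texas', 'Florida'])

# assumption: U.S. total copied from the .sum() output above
total = 328300544

# divide, scale to percent, and round in a single chained expression
percent = (s / total * 100).round(2)

print(percent)
```

The labels carry through each operation unchanged, so the result is still indexed by state name.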
6.4.6. Small Program with Series
# a little program to find the number of large states that account
# for more than 50% of the US population
# sort the values in the series to make sure largest states are at the top
state_series_percent = state_series_percent.sort_values(ascending=False)
# create an empty list that we will add states to
state_50percent_list = []
# initialize a sum that we will add to
percent_sum = 0
# iterate through the labels in the index
for state in state_series_percent.index:
    # add the current state name to the list
    state_50percent_list.append(state)
    # add the current state percent to the sum
    percent_sum = percent_sum + state_series_percent[state]
    # if the sum of the percentages reaches or exceeds 50%, stop the loop
    if percent_sum >= 50:
        break
print(f'The top {len(state_50percent_list)} U.S. States by population account \
for more than 50% of the U.S. population.')
print('These states include:\n')
for state in state_50percent_list:
    print(f'{state} with a population of {state_series[state]}.')
Output:
The top 9 U.S. States by population account for more than 50% of the U.S. population.
These states include:
California with a population of 39512223.
Texas with a population of 28995881.
Florida with a population of 21477737.
New York with a population of 19453561.
Pennsylvania with a population of 12801989.
Illinois with a population of 12671821.
Ohio with a population of 11689100.
Georgia with a population of 10617423.
North Carolina with a population of 10488084.
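The loop above can also be written without explicit iteration by using a running total. The sketch below is one possible alternative, not the approach taken in this reading: it uses `cumsum()`, works on unrounded percentages, and includes only the ten largest states so that it stays short. The total is copied from the `.sum()` output in 6.4.5, and the names `top_states`, `running_total`, and `n_states` are illustrative.

```python
import pandas as pd

# the ten largest states from the lists above
top_states = pd.Series(
    [39512223, 28995881, 21477737, 19453561, 12671821,
     12801989, 11689100, 10617423, 10488084, 9986857],
    index=['California', 'Texas', 'Florida', 'New York', 'Illinois',
           'Pennsylvania', 'Ohio', 'Georgia', 'North Carolina', 'Michigan'])

# assumption: U.S. total copied from the .sum() output in 6.4.5
total_population = 328300544

# percent of the U.S. population, largest states first
percent = (top_states / total_population * 100).sort_values(ascending=False)

# running total of the percentages, in sorted order
running_total = percent.cumsum()

# count the states still below 50%, then add the one that crosses the line
# (this assumes the running total does reach 50% within the data given)
n_states = int((running_total < 50).sum()) + 1

print(n_states)
print(list(percent.index[:n_states]))
```

On this data the count matches the loop's result of 9 states, ending with North Carolina.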