6.4. Series: Examples

I’ve grabbed the estimated 2019 population of each U.S. state, plus the District of Columbia, from this website.

We will put this data into a Series and then work with it using the tools introduced in this reading.

6.4.1. Creating a Series

# import pandas
import pandas as pd

# lists of state names and populations
state_list = ['California', 'Texas', 'Florida', 'New York', 'Illinois',
              'Pennsylvania', 'Ohio', 'Georgia', 'North Carolina', 'Michigan',
              'New Jersey', 'Virginia', 'Washington', 'Arizona', 'Massachusetts',
              'Tennessee', 'Indiana', 'Missouri', 'Maryland', 'Wisconsin',
              'Colorado', 'Minnesota', 'South Carolina', 'Alabama', 'Louisiana',
              'Kentucky', 'Oregon', 'Oklahoma', 'Connecticut', 'Utah', 'Iowa',
              'Nevada', 'Arkansas', 'Mississippi', 'Kansas', 'New Mexico',
              'Nebraska', 'West Virginia', 'Idaho', 'Hawaii', 'New Hampshire',
              'Maine', 'Montana', 'Rhode Island', 'Delaware', 'South Dakota',
              'North Dakota', 'Alaska', 'DC', 'Vermont', 'Wyoming']

population_list = [39512223, 28995881, 21477737, 19453561, 12671821, 12801989,
                   11689100, 10617423, 10488084, 9986857, 8882190, 8535519,
                   7614893, 7278717, 6949503, 6833174, 6732219, 6137428,
                   6045680, 5822434, 5758736, 5639632, 5148714, 4903185,
                   4648794, 4467673, 4217737, 3956971, 3565287, 3205958,
                   3155070, 3080156, 3017825, 2976149, 2913314, 2096829,
                   1934408, 1792147, 1787065, 1415872, 1359711, 1344212,
                   1068778, 1059361, 973764, 884659, 762062, 731545, 705749,
                   623989, 578759]

# create a series from a list of values and a list of labels
state_series = pd.Series(data=population_list,
                         index=state_list,
                         dtype='int64')

type(state_series)

Output:

pandas.core.series.Series
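
As a side note, a Series can also be built from a dictionary, where the keys become the index labels. The sketch below uses only three of the states from above, and the names `population_dict` and `small_series` are made up for this illustration.

```python
import pandas as pd

# a small subset of the 2019 estimates used above (illustrative only)
population_dict = {'California': 39512223,
                   'Texas': 28995881,
                   'Florida': 21477737}

# dictionary keys become the index labels, dictionary values become the data
small_series = pd.Series(population_dict)

print(small_series)
```

Two parallel lists, as used above, make sense when the labels and values arrive separately. A dictionary keeps each label next to its value, which makes mismatched lengths impossible.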

6.4.2. Examining a Series

# examine with head
state_series.head()

Output:

California    39512223
Texas         28995881
Florida       21477737
New York      19453561
Illinois      12671821
dtype: int64

# examine with tail
state_series.tail()

Output:

North Dakota    762062
Alaska          731545
DC              705749
Vermont         623989
Wyoming         578759
dtype: int64

# examine values
state_series.values

Output:

array([39512223, 28995881, 21477737, 19453561, 12671821, 12801989,
       11689100, 10617423, 10488084,  9986857,  8882190,  8535519,
        7614893,  7278717,  6949503,  6833174,  6732219,  6137428,
        6045680,  5822434,  5758736,  5639632,  5148714,  4903185,
        4648794,  4467673,  4217737,  3956971,  3565287,  3205958,
        3155070,  3080156,  3017825,  2976149,  2913314,  2096829,
        1934408,  1792147,  1787065,  1415872,  1359711,  1344212,
        1068778,  1059361,   973764,   884659,   762062,   731545,
         705749,   623989,   578759], dtype=int64)

# examine index
state_series.index

Output:

Index(['California', 'Texas', 'Florida', 'New York', 'Illinois',
       'Pennsylvania', 'Ohio', 'Georgia', 'North Carolina', 'Michigan',
       'New Jersey', 'Virginia', 'Washington', 'Arizona', 'Massachusetts',
       'Tennessee', 'Indiana', 'Missouri', 'Maryland', 'Wisconsin', 'Colorado',
       'Minnesota', 'South Carolina', 'Alabama', 'Louisiana', 'Kentucky',
       'Oregon', 'Oklahoma', 'Connecticut', 'Utah', 'Iowa', 'Nevada',
       'Arkansas', 'Mississippi', 'Kansas', 'New Mexico', 'Nebraska',
       'West Virginia', 'Idaho', 'Hawaii', 'New Hampshire', 'Maine', 'Montana',
       'Rhode Island', 'Delaware', 'South Dakota', 'North Dakota', 'Alaska',
       'DC', 'Vermont', 'Wyoming'],
      dtype='object')

6.4.3. Selecting Data in a Series

# indexing by label
state_series.loc['Hawaii']

Output:

1415872

# slicing by label (note: the stop label is included)
state_series.loc['Illinois':'Indiana']

Output:

Illinois          12671821
Pennsylvania      12801989
Ohio              11689100
Georgia           10617423
North Carolina    10488084
Michigan           9986857
New Jersey         8882190
Virginia           8535519
Washington         7614893
Arizona            7278717
Massachusetts      6949503
Tennessee          6833174
Indiana            6732219
dtype: int64

# indexing by location
state_series.iloc[4]

Output:

12671821

# slicing by location (note: the stop position is not included)
state_series.iloc[1:4]

Output:

Texas       28995881
Florida     21477737
New York    19453561
dtype: int64
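
The difference between the two kinds of slicing is easy to miss, so here is a minimal side-by-side sketch on a four-state subset of the data. The names `s`, `by_label`, and `by_position` are chosen just for this example.

```python
import pandas as pd

# four of the states from above, in the same order (illustrative subset)
s = pd.Series([39512223, 28995881, 21477737, 19453561],
              index=['California', 'Texas', 'Florida', 'New York'])

# label-based slice: the stop label 'Florida' IS included, so two rows come back
by_label = s.loc['Texas':'Florida']

# position-based slice: the stop position 2 is NOT included, so one row comes back
by_position = s.iloc[1:2]

print(len(by_label), len(by_position))
```

Both slices start at Texas, but only the label-based one reaches Florida.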

6.4.4. Sorting a Series

# sort by index
state_series.sort_index().head()

Output:

Alabama        4903185
Alaska          731545
Arizona        7278717
Arkansas       3017825
California    39512223
dtype: int64

# sort by values
state_series.sort_values().head()

Output:

Wyoming         578759
Vermont         623989
DC              705749
Alaska          731545
North Dakota    762062
dtype: int64
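
By default `sort_values()` sorts in ascending order, which is why the smallest states appear first. To get the largest values instead, one option is to pass `ascending=False`; another is the `nlargest()` method. The sketch below compares the two on a small subset of the data, with variable names made up for the example.

```python
import pandas as pd

# a small, deliberately unsorted subset of the data (illustrative only)
s = pd.Series([578759, 39512223, 731545, 28995881, 21477737],
              index=['Wyoming', 'California', 'Alaska', 'Texas', 'Florida'])

# option 1: sort in descending order, then take the first three rows
top3_sorted = s.sort_values(ascending=False).head(3)

# option 2: ask for the three largest values directly
top3_direct = s.nlargest(3)

print(top3_sorted)
print(top3_direct)
```

Neither call changes `s` itself; each returns a new Series.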

6.4.5. Basic Operations with a Series

# use .sum() to get the U.S. total population
total_population = state_series.sum()
total_population

Output:

328300544

# create a new series with the percent of pop. for each state
state_series_percent = state_series / total_population
state_series_percent.head()

Output:

California    0.120354
Texas         0.088321
Florida       0.065421
New York      0.059255
Illinois      0.038598
dtype: float64

# convert from decimal to percent and round
state_series_percent = state_series_percent * 100
state_series_percent = state_series_percent.round(2)
state_series_percent.head()

Output:

California    12.04
Texas          8.83
Florida        6.54
New York       5.93
Illinois       3.86
dtype: float64
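
The steps above apply each arithmetic operation to every element of the Series at once. The sketch below repeats them on a three-state subset, so the shares are relative to that subset's total and not to the full U.S. population. It also checks what the sums look like before and after rounding. The names `share` and `percent` are made up for this example.

```python
import pandas as pd

# three states only, so percentages are shares of this subset's total
s = pd.Series([39512223, 28995881, 21477737],
              index=['California', 'Texas', 'Florida'])

# dividing a Series by a scalar divides every element
share = s / s.sum()

# scale to percent and round in one chained expression
percent = (share * 100).round(2)

# unrounded shares sum to 1 (up to floating-point error);
# rounded percentages can differ from 100 by a small amount
print(share.sum())
print(percent.sum())
```

Rounding is best left until the last step for this reason: totals computed from rounded values can drift slightly from the exact ones.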

6.4.6. Small Program with Series

# a little program to find the most populous states that together
# account for more than 50% of the U.S. population

# sort the values in the series to make sure largest states are at the top
state_series_percent = state_series_percent.sort_values(ascending=False)

# create an empty list that we will add states to
state_50percent_list = []

# initialize a sum that we will add to
percent_sum = 0

# iterate through the values in the index
for state in state_series_percent.index:

  # add the current state name to the list
  state_50percent_list.append(state)

  # add the current state percent to the sum
  percent_sum = percent_sum + state_series_percent.loc[state]

  # once the running total exceeds 50%, stop the loop
  if percent_sum > 50:
    break


print(f'The top {len(state_50percent_list)} U.S. States by population account '
      'for more than 50% of the U.S. population.')

print('These states include:\n')
for state in state_50percent_list:
  print(f'{state} with a population of {state_series[state]}.')

Output:

The top 9 U.S. States by population account for more than 50% of the U.S. population.
These states include:

California with a population of 39512223.
Texas with a population of 28995881.
Florida with a population of 21477737.
New York with a population of 19453561.
Pennsylvania with a population of 12801989.
Illinois with a population of 12671821.
Ohio with a population of 11689100.
Georgia with a population of 10617423.
North Carolina with a population of 10488084.
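
The loop above can also be written without explicit iteration by using a cumulative sum. What follows is a sketch of that alternative, not the approach used in this reading. The function name `top_labels_exceeding` and the four-entry `demo` Series are hypothetical, and the function works on unrounded shares instead of rounded percentages.

```python
import pandas as pd

def top_labels_exceeding(series, threshold=0.5):
    """Return the largest entries whose combined share first exceeds threshold."""
    # largest values first
    ordered = series.sort_values(ascending=False)

    # running share of the total after each entry
    cumulative_share = ordered.cumsum() / ordered.sum()

    # entries still at or below the threshold, plus the one that crosses it
    n_needed = int((cumulative_share <= threshold).sum()) + 1

    return ordered.iloc[:n_needed]

# hypothetical data: shares are 40%, 30%, 20%, 10%
demo = pd.Series([30, 10, 40, 20], index=['B', 'D', 'A', 'C'])

result = top_labels_exceeding(demo)
print(list(result.index))
```

Here A alone covers 40% and A plus B covers 70%, so two entries are returned. Applied to `state_series`, this approach should select the same nine states as the loop.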