6.4. Series: Examples

I’ve grabbed the estimated 2019 population of each U.S. state (plus the District of Columbia) from this website.

We are going to put this data into a Series and then work with it using the tools introduced in this reading.

6.4.1. Creating a Series

# import pandas
import pandas as pd

# lists of state names and populations
state_list = ['California', 'Texas', 'Florida', 'New York', 'Illinois',
              'Pennsylvania', 'Ohio', 'Georgia', 'North Carolina', 'Michigan',
              'New Jersey', 'Virginia', 'Washington', 'Arizona', 'Massachusetts',
              'Tennessee', 'Indiana', 'Missouri', 'Maryland', 'Wisconsin',
              'Colorado', 'Minnesota', 'South Carolina', 'Alabama', 'Louisiana',
              'Kentucky', 'Oregon', 'Oklahoma', 'Connecticut', 'Utah', 'Iowa',
              'Nevada', 'Arkansas', 'Mississippi', 'Kansas', 'New Mexico',
              'Nebraska', 'West Virginia', 'Idaho', 'Hawaii', 'New Hampshire',
              'Maine', 'Montana', 'Rhode Island', 'Delaware', 'South Dakota',
              'North Dakota', 'Alaska', 'DC', 'Vermont', 'Wyoming']

population_list = [39512223, 28995881, 21477737, 19453561, 12671821, 12801989,
                   11689100, 10617423, 10488084, 9986857, 8882190, 8535519,
                   7614893, 7278717, 6949503, 6833174, 6732219, 6137428,
                   6045680, 5822434, 5758736, 5639632, 5148714, 4903185,
                   4648794, 4467673, 4217737, 3956971, 3565287, 3205958,
                   3155070, 3080156, 3017825, 2976149, 2913314, 2096829,
                   1934408, 1792147, 1787065, 1415872, 1359711, 1344212,
                   1068778, 1059361, 973764, 884659, 762062, 731545, 705749,
                   623989, 578759]

# create a series from a list of values and a list of labels
state_series = pd.Series(data=population_list,
                         index=state_list,
                         dtype='int64')

type(state_series)
pandas.core.series.Series
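
As an aside, a Series can also be built from a dictionary, which keeps each label next to its value. The sketch below uses only three of the states from the lists above; the names `population_dict` and `small_series` are made up for this illustration.

```python
import pandas as pd

# a small subset of the 2019 estimates above, keyed by state name
population_dict = {'California': 39512223,
                   'Texas': 28995881,
                   'Florida': 21477737}

# dictionary keys become the index labels, dictionary values become the data
small_series = pd.Series(population_dict, dtype='int64')

print(small_series.index.tolist())  # ['California', 'Texas', 'Florida']
print(small_series['Texas'])        # 28995881
```

With two separate lists, as in the cell above, the lists must be the same length so that every value gets exactly one label.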

6.4.2. Examining a Series

# examine with head
state_series.head()
California    39512223
Texas         28995881
Florida       21477737
New York      19453561
Illinois      12671821
dtype: int64
# examine with tail
state_series.tail()
North Dakota    762062
Alaska          731545
DC              705749
Vermont         623989
Wyoming         578759
dtype: int64
# examine values
state_series.values
array([39512223, 28995881, 21477737, 19453561, 12671821, 12801989,
       11689100, 10617423, 10488084,  9986857,  8882190,  8535519,
        7614893,  7278717,  6949503,  6833174,  6732219,  6137428,
        6045680,  5822434,  5758736,  5639632,  5148714,  4903185,
        4648794,  4467673,  4217737,  3956971,  3565287,  3205958,
        3155070,  3080156,  3017825,  2976149,  2913314,  2096829,
        1934408,  1792147,  1787065,  1415872,  1359711,  1344212,
        1068778,  1059361,   973764,   884659,   762062,   731545,
         705749,   623989,   578759])
# examine index
state_series.index
Index(['California', 'Texas', 'Florida', 'New York', 'Illinois',
       'Pennsylvania', 'Ohio', 'Georgia', 'North Carolina', 'Michigan',
       'New Jersey', 'Virginia', 'Washington', 'Arizona', 'Massachusetts',
       'Tennessee', 'Indiana', 'Missouri', 'Maryland', 'Wisconsin', 'Colorado',
       'Minnesota', 'South Carolina', 'Alabama', 'Louisiana', 'Kentucky',
       'Oregon', 'Oklahoma', 'Connecticut', 'Utah', 'Iowa', 'Nevada',
       'Arkansas', 'Mississippi', 'Kansas', 'New Mexico', 'Nebraska',
       'West Virginia', 'Idaho', 'Hawaii', 'New Hampshire', 'Maine', 'Montana',
       'Rhode Island', 'Delaware', 'South Dakota', 'North Dakota', 'Alaska',
       'DC', 'Vermont', 'Wyoming'],
      dtype='object')
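
A few other attributes can be handy for a quick look at a Series. This is a minimal sketch on a four-state subset of the data above (`small_series` is a name made up for the illustration), not a complete tour.

```python
import pandas as pd

# four of the states from the data above
small_series = pd.Series([39512223, 28995881, 21477737, 19453561],
                         index=['California', 'Texas', 'Florida', 'New York'],
                         dtype='int64')

print(len(small_series))    # 4
print(small_series.shape)   # (4,)
print(small_series.dtype)   # int64

# head() and tail() accept the number of rows to show
first_two = small_series.head(2)
print(first_two.index.tolist())  # ['California', 'Texas']

# to_numpy() returns the values as a NumPy array, much like .values
values_array = small_series.to_numpy()

# describe() prints summary statistics (count, mean, min, max, ...)
print(small_series.describe())
```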

6.4.3. Selecting Data in a Series

# indexing by label
state_series.loc['Hawaii']
1415872
# slicing by label; note that the stop label IS included
state_series.loc['Illinois':'Indiana']
Illinois          12671821
Pennsylvania      12801989
Ohio              11689100
Georgia           10617423
North Carolina    10488084
Michigan           9986857
New Jersey         8882190
Virginia           8535519
Washington         7614893
Arizona            7278717
Massachusetts      6949503
Tennessee          6833174
Indiana            6732219
dtype: int64
# indexing by location
state_series.iloc[4]
12671821
# slicing by location; note that the stop position is NOT included
state_series.iloc[1:4]
Texas       28995881
Florida     21477737
New York    19453561
dtype: int64
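
The difference between the two kinds of slices is easy to miss, so here is a side-by-side sketch on the first five states from the data above. The variable names are made up for the illustration.

```python
import pandas as pd

s = pd.Series([39512223, 28995881, 21477737, 19453561, 12671821],
              index=['California', 'Texas', 'Florida', 'New York', 'Illinois'])

# label slice: the stop label 'New York' is part of the result
by_label = s.loc['Texas':'New York']
print(by_label.index.tolist())     # ['Texas', 'Florida', 'New York']

# position slice: stops before position 3, like a Python list slice
by_position = s.iloc[1:3]
print(by_position.index.tolist())  # ['Texas', 'Florida']

# a list of labels selects those entries in the order given
by_list = s.loc[['Florida', 'California']]
print(by_list.tolist())            # [21477737, 39512223]
```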

6.4.4. Sorting a Series

# sort by index
state_series.sort_index().head()
Alabama        4903185
Alaska          731545
Arizona        7278717
Arkansas       3017825
California    39512223
dtype: int64
# sort by values
state_series.sort_values().head()
Wyoming         578759
Vermont         623989
DC              705749
Alaska          731545
North Dakota    762062
dtype: int64
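
Two details are worth a quick sketch: `sort_values()` sorts from smallest to largest unless `ascending=False` is passed, and both sort methods return a new Series rather than changing the original. The three-state Series below is a deliberately unsorted subset of the data above.

```python
import pandas as pd

s = pd.Series([28995881, 39512223, 21477737],
              index=['Texas', 'California', 'Florida'])

# largest values first
largest_first = s.sort_values(ascending=False)
print(largest_first.index.tolist())  # ['California', 'Texas', 'Florida']

# the original series keeps its order unless the result is assigned back
print(s.index.tolist())              # ['Texas', 'California', 'Florida']

# sort_index() orders by label instead
by_name = s.sort_index()
print(by_name.index.tolist())        # ['California', 'Florida', 'Texas']
```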

6.4.5. Basic Operations with a Series

# use .sum() to get the U.S. total population
total_population = state_series.sum()
total_population
328300544
# create a new series with each state's share of the total population
# (a fraction between 0 and 1; it is converted to a percent below)
state_series_percent = state_series / total_population
state_series_percent.head()
California    0.120354
Texas         0.088321
Florida       0.065421
New York      0.059255
Illinois      0.038598
dtype: float64
# convert from a fraction to a percent and round to two decimal places
state_series_100 = state_series_percent * 100
state_series_rnd = state_series_100.round(2)
state_series_rnd.head()
California    12.04
Texas          8.83
Florida        6.54
New York       5.93
Illinois       3.86
dtype: float64
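
The same steps can be chained into a single expression, because each arithmetic operation is applied to every element and returns a new Series. The sketch below uses the three largest states and takes the total of 328300544 from the `.sum()` output above; the variable names are made up for the illustration.

```python
import pandas as pd

s = pd.Series([39512223, 28995881, 21477737],
              index=['California', 'Texas', 'Florida'])

# total taken from the state_series.sum() output shown above
total_population = 328300544

# divide, scale to a percent, and round in one chained expression
percent = (s / total_population * 100).round(2)
print(percent.tolist())  # [12.04, 8.83, 6.54]

# max() gives the largest value, idxmax() gives its label
print(s.max())     # 39512223
print(s.idxmax())  # California
```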

6.4.6. Small Program with Series

# a little program to find the number of large states that account
# for more than 50% of the US population

# sort_values() returns a new series, so the result must be assigned
state_series_sorted = state_series_rnd.sort_values(ascending=False)
state_50percent_list = []
percent_sum = 0

# add states from largest to smallest until the running total reaches 50%
for state in state_series_sorted.index:
    state_50percent_list.append(state)
    percent_sum = percent_sum + state_series_sorted[state]
    if percent_sum >= 50:
        break

print(f'The top {len(state_50percent_list)} U.S. States by population account '
      f'for more than 50% of the U.S. population.')

print('These states include:\n')
for state in state_50percent_list:
    print(f'{state} with a population of {state_series[state]}.')
The top 9 U.S. States by population account for more than 50% of the U.S. population.
These states include:

California with a population of 39512223.
Texas with a population of 28995881.
Florida with a population of 21477737.
New York with a population of 19453561.
Pennsylvania with a population of 12801989.
Illinois with a population of 12671821.
Ohio with a population of 11689100.
Georgia with a population of 10617423.
North Carolina with a population of 10488084.
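
For comparison, the same question can be answered without an explicit loop by using a running total. This is one possible sketch, not the only way to do it. It uses the ten largest states from the data above and the total of 328300544 from the earlier `.sum()` output; the names `top_ten`, `running_total`, `n_states`, and `majority_states` are made up for the illustration.

```python
import pandas as pd

# the ten largest states from the data above
top_ten = pd.Series({'California': 39512223, 'Texas': 28995881,
                     'Florida': 21477737, 'New York': 19453561,
                     'Illinois': 12671821, 'Pennsylvania': 12801989,
                     'Ohio': 11689100, 'Georgia': 10617423,
                     'North Carolina': 10488084, 'Michigan': 9986857})

# total taken from the state_series.sum() output shown earlier
total_population = 328300544

# percent of the U.S. population for each state, largest first
sorted_percent = (top_ten / total_population * 100).sort_values(ascending=False)

# cumsum() gives the running total of the sorted percentages
running_total = sorted_percent.cumsum()

# count the running totals still below 50, then add the state that crosses it
# (this assumes the 50% mark is reached somewhere within the series)
n_states = int((running_total < 50).sum()) + 1
majority_states = sorted_percent.index[:n_states].tolist()

print(n_states)  # 9
print(majority_states)
```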