6.3. Series: Updating and Sorting

6.3.1. Updating Values and Labels

6.3.1.1. Updating Values

Like a list, a series is mutable, so its values can be updated by indexing with .loc (by label) or .iloc (by position).

# import pandas
import pandas as pd

# create fruit weight series from lists
fruit_name_list = ['apple', 'banana', 'cherry', 'dates', 'elderberry']
fruit_weight_list = [180, 120, 15, 45, 75]
my_fruit_series = pd.Series(data=fruit_weight_list, index=fruit_name_list)

# update a value based on a label
my_fruit_series.loc['cherry'] = 25
my_fruit_series
apple         180
banana        120
cherry         25
dates          45
elderberry     75
dtype: int64

# update a value based on a position
my_fruit_series.iloc[2] = 225
my_fruit_series
apple         180
banana        120
cherry        225
dates          45
elderberry     75
dtype: int64

You can also update multiple values simultaneously.

# update multiple values based on label
my_fruit_series.loc['apple':'cherry'] = [200, 150, 35]
my_fruit_series
apple         200
banana        150
cherry         35
dates          45
elderberry     75
dtype: int64

# update multiple values based on position
my_fruit_series.iloc[2:] = [300, 400, 500]
my_fruit_series
apple         200
banana        150
cherry        300
dates         400
elderberry    500
dtype: int64
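
One detail in the two slices above is easy to miss: a label slice with .loc includes both endpoints, while a positional slice with .iloc excludes the stop position, just like a list slice. The short sketch below illustrates the difference; the variable names (weights, by_label, by_position) are made up for this example.

```python
import pandas as pd

# hypothetical example series, same data as the start of this section
weights = pd.Series([180, 120, 15, 45, 75],
                    index=['apple', 'banana', 'cherry', 'dates', 'elderberry'])

# label slice: both 'apple' and 'cherry' are included, so three values are selected
by_label = weights.loc['apple':'cherry']

# positional slice: position 2 is excluded, so only two values are selected
by_position = weights.iloc[0:2]

print(len(by_label))      # 3
print(len(by_position))   # 2
```

This is why the list assigned to a slice has to have as many items as the slice actually selects.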

Since multiple values can share the same label in the index, all of the values that share a label can be updated with a single assignment.

# construct a series with repeated labels in the index
fruit_name_list = ['apple', 'apple', 'apple', 'banana', 'banana']
fruit_weight_list = [180, 120, 15, 650, 450]

my_fruit_series = pd.Series(data=fruit_weight_list, index=fruit_name_list)
my_fruit_series
apple     180
apple     120
apple      15
banana    650
banana    450
dtype: int64

# updating multiple values with a single assignment
my_fruit_series.loc['apple'] = 333
my_fruit_series
apple     333
apple     333
apple     333
banana    650
banana    450
dtype: int64

Note: setting or ‘updating’ values with .loc or .iloc, as we have in this section, changes the series directly. We do not need to make a copy of the series, or reassign it to a variable, to keep the change.
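
Because the change happens directly on the series, any other variable name that refers to the same series will show the change too. The sketch below is a small illustration under that assumption; the names original, alias, and backup are hypothetical, and .copy() is used only to show how to keep an independent copy of the old values.

```python
import pandas as pd

original = pd.Series([180, 120, 15], index=['apple', 'banana', 'cherry'])

alias = original           # a second name for the same series object
backup = original.copy()   # an independent copy of the data

# update the series directly
original.loc['cherry'] = 25

print(alias.loc['cherry'])    # 25, the alias sees the update
print(backup.loc['cherry'])   # 15, the copy keeps the old value
```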

6.3.1.2. Updating Index Labels

The series index object is immutable, so we cannot use indexing to change a single label within the index. However, we can replace the whole index by assigning a list of new labels to it. The list must contain one label for each value in the series.

my_fruit_series.index
Index(['apple', 'apple', 'apple', 'banana', 'banana'], dtype='object')

my_fruit_series.index = ['apple', 'banana', 'cherry', 'dates', 'elderberry']
my_fruit_series.index
Index(['apple', 'banana', 'cherry', 'dates', 'elderberry'], dtype='object')
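
To see what happens when we do try to change a single label, here is a minimal sketch. It also uses the series method .rename(), which is not covered in this section; it is included only as one possible way to change a single label, and it returns a new series rather than changing the original. The names s, renamed, and assignment_failed are hypothetical.

```python
import pandas as pd

s = pd.Series([333, 650], index=['apple', 'banana'])

# trying to assign to one position of the index raises a TypeError
assignment_failed = False
try:
    s.index[0] = 'apricot'
except TypeError as err:
    assignment_failed = True
    print('TypeError:', err)

# .rename() with a dictionary maps old labels to new labels
# and returns a new series; s itself is not changed
renamed = s.rename({'apple': 'apricot'})

print(list(renamed.index))   # ['apricot', 'banana']
print(list(s.index))         # ['apple', 'banana']
```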

6.3.2. Basic Operations with Series

We can apply an operation to every value in a series at once, without writing a for loop. The examples below use mathematical operators and string concatenation.

6.3.2.1. Basic Math Operations

If we had a list full of integers and we were interested in adding five to each value in the list, we would need to write a for loop to loop over the list and update each value. This sort of operation is much simpler with a series.

my_fruit_series
apple         333
banana        333
cherry        333
dates         650
elderberry    450
dtype: int64

my_fruit_series + 5
apple         338
banana        338
cherry        338
dates         655
elderberry    455
dtype: int64

my_fruit_series - 5
apple         328
banana        328
cherry        328
dates         645
elderberry    445
dtype: int64

my_fruit_series * 5
apple         1665
banana        1665
cherry        1665
dates         3250
elderberry    2250
dtype: int64

my_fruit_series / 5
apple          66.6
banana         66.6
cherry         66.6
dates         130.0
elderberry     90.0
dtype: float64
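
For comparison, here is a sketch of the same "add five to everything" task done first with a list and a for loop, and then with a series. The variable names are made up for this example.

```python
import pandas as pd

# with a list, we loop over the positions and update each value
weights_list = [333, 333, 333, 650, 450]
for i in range(len(weights_list)):
    weights_list[i] = weights_list[i] + 5

# with a series, one expression applies the operation to every value
weights_series = pd.Series([333, 333, 333, 650, 450])
shifted = weights_series + 5

print(weights_list)       # [338, 338, 338, 655, 455]
print(shifted.tolist())   # [338, 338, 338, 655, 455]
```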

Important note! None of the previous operations changed the underlying series. Each result displayed is a new series, which we have been calling a ‘view’ (see earlier discussion), and it is not retained unless we assign it to a variable. If we want to make a lasting change to the series, we have to perform the operation and assign the result to a variable.

# this computes a new series with five added to each value,
# and then reassigns the result to the original variable name
my_fruit_series = my_fruit_series + 5
my_fruit_series
apple         338
banana        338
cherry        338
dates         655
elderberry    455
dtype: int64

6.3.2.2. Basic String Operations

# creates a series with strings as values
fruit_name_list = ['banana', 'banana', 'apple', 'apple', 'apple']
fruit_weight_list = [180, 120, 15, 650, 450]

my_fruit_series = pd.Series(data=fruit_name_list, index=fruit_weight_list)
my_fruit_series
180    banana
120    banana
15      apple
650     apple
450     apple
dtype: object

# creating a new series using string concatenation (the result is not saved)
my_fruit_series + ' is a fruit!'
180    banana is a fruit!
120    banana is a fruit!
15      apple is a fruit!
650     apple is a fruit!
450     apple is a fruit!
dtype: object

As we saw in the previous section with the basic math operations, performing the operation does not change the underlying series. If we want to retain the change, we have to reassign the result over the original.

# this doesn't change anything
my_fruit_series + ' is a fruit!'
my_fruit_series
180    banana
120    banana
15      apple
650     apple
450     apple
dtype: object

# but this does
my_fruit_series = my_fruit_series + ' is a fruit!'
my_fruit_series
180    banana is a fruit!
120    banana is a fruit!
15      apple is a fruit!
650     apple is a fruit!
450     apple is a fruit!
dtype: object
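
Concatenation with + is the only string operation shown in this section. As a hedged aside, pandas also provides string methods through the .str accessor, such as .str.upper(). The sketch below assumes a small hypothetical series named fruits and shows that both kinds of operation return a new series and leave the original alone.

```python
import pandas as pd

fruits = pd.Series(['banana', 'apple'], index=[180, 15])

# string concatenation is applied to every value
labelled = fruits + ' is a fruit!'

# the .str accessor applies a string method to every value
shouted = fruits.str.upper()

print(labelled.tolist())   # ['banana is a fruit!', 'apple is a fruit!']
print(shouted.tolist())    # ['BANANA', 'APPLE']
print(fruits.tolist())     # ['banana', 'apple'], the original is unchanged
```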

6.3.3. Sorting Series by Index and Value

When we have wanted to sort a Python list, we have relied on the .sort() list method, which sorts the list ‘in place’ and returns None. Sorting a series works quite differently. To begin with, when we are working with a series, we can sort on either the index or the values.

To sort on the index, we use .sort_index().

# sort on the index
my_fruit_series.sort_index()
15      apple is a fruit!
120    banana is a fruit!
180    banana is a fruit!
450     apple is a fruit!
650     apple is a fruit!
dtype: object

To sort on the values, we use .sort_values().

# sorting on the values
my_fruit_series.sort_values()
15      apple is a fruit!
650     apple is a fruit!
450     apple is a fruit!
180    banana is a fruit!
120    banana is a fruit!
dtype: object

Unlike sorting a list, sorting a series is not done ‘in place’. The previous two code cells show the series sorted by index and then sorted by values, but in both cases we only created a ‘view’. We did not assign the result to a variable, so the original series remains in its original order.

# the actual series object was not changed
my_fruit_series
180    banana is a fruit!
120    banana is a fruit!
15      apple is a fruit!
650     apple is a fruit!
450     apple is a fruit!
dtype: object
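
The contrast with lists can be sketched side by side. The example below uses made-up names (numbers_list, numbers_series) and small numeric data so the result is easy to check by eye.

```python
import pandas as pd

# list.sort() sorts in place and returns None
numbers_list = [3, 1, 2]
result = numbers_list.sort()
print(result)         # None
print(numbers_list)   # [1, 2, 3]

# .sort_values() returns a new sorted series and leaves the original alone
numbers_series = pd.Series([3, 1, 2])
sorted_series = numbers_series.sort_values()
print(sorted_series.tolist())    # [1, 2, 3]
print(numbers_series.tolist())   # [3, 1, 2]
```

Note that the sorted series keeps each value's original index label, so its index is no longer in order.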

If we want to retain the sorted series, we can reassign the result over the original series.

my_fruit_series = my_fruit_series.sort_values()
my_fruit_series
15      apple is a fruit!
650     apple is a fruit!
450     apple is a fruit!
180    banana is a fruit!
120    banana is a fruit!
dtype: object

my_fruit_series = my_fruit_series.sort_index()
my_fruit_series
15      apple is a fruit!
120    banana is a fruit!
180    banana is a fruit!
450     apple is a fruit!
650     apple is a fruit!
dtype: object

Pandas does actually allow for ‘in place’ sorting. If you want to sort a series ‘in place’, perhaps because your series is very large and you don’t want two copies in memory at the same time, sorting ‘in place’ can be done using an optional argument with both types of sort that we have covered. However, to keep things simple for now, I do not want you to use the ‘in place’ sorting argument in this class.

Similar to the list method .sort(), both .sort_index() and .sort_values() take an optional argument that controls the direction of the sort. For the series methods, the argument is ascending, which defaults to True.

my_fruit_series = my_fruit_series.sort_index(ascending=False)
my_fruit_series
650     apple is a fruit!
450     apple is a fruit!
180    banana is a fruit!
120    banana is a fruit!
15      apple is a fruit!
dtype: object
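
The same argument works with .sort_values(). The sketch below uses a hypothetical numeric series and also shows the list equivalent, where the argument is named reverse rather than ascending.

```python
import pandas as pd

weights = pd.Series([120, 180, 15], index=['apple', 'banana', 'cherry'])

# largest value first
descending_values = weights.sort_values(ascending=False)
print(descending_values.tolist())        # [180, 120, 15]
print(list(descending_values.index))     # ['banana', 'apple', 'cherry']

# index labels in reverse alphabetical order
descending_index = weights.sort_index(ascending=False)
print(list(descending_index.index))      # ['cherry', 'banana', 'apple']

# the list method uses reverse=True for the same effect
weights_list = [120, 180, 15]
weights_list.sort(reverse=True)
print(weights_list)                      # [180, 120, 15]
```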