6.3. Series: Updating and Sorting
6.3.1. Updating Values and Labels
6.3.1.1. Updating Values
Like a list, a series is mutable, so values can be updated using indexing with .loc or .iloc.
# import pandas
import pandas as pd
# create fruit weight series from lists
fruit_name_list = ['apple', 'banana', 'cherry', 'dates', 'elderberry']
fruit_weight_list = [180, 120, 15, 45, 75]
my_fruit_series = pd.Series(data=fruit_weight_list, index=fruit_name_list)
# update a value based on a label
my_fruit_series.loc['cherry'] = 25
my_fruit_series
Output:
apple 180
banana 120
cherry 25
dates 45
elderberry 75
dtype: int64
# update a value based on a position
my_fruit_series.iloc[2] = 225
my_fruit_series
Output:
apple 180
banana 120
cherry 225
dates 45
elderberry 75
dtype: int64
You can also update multiple values simultaneously.
# update multiple values based on label
my_fruit_series.loc['apple':'cherry'] = [200, 150, 35]
my_fruit_series
Output:
apple 200
banana 150
cherry 35
dates 45
elderberry 75
dtype: int64
# update multiple values based on position
my_fruit_series.iloc[2:] = [300, 400, 500]
my_fruit_series
Output:
apple 200
banana 150
cherry 300
dates 400
elderberry 500
dtype: int64
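One detail in the two slice updates above is easy to miss: a label slice with .loc includes its end label, while a position slice with .iloc stops before its end position. The sketch below illustrates this with a separate, hypothetical series named weights (not part of the original notebook), and also shows that a single value assigned to a slice is applied to every selected element.

```python
import pandas as pd

# hypothetical example series, independent of my_fruit_series
weights = pd.Series(data=[180, 120, 15, 45, 75],
                    index=['apple', 'banana', 'cherry', 'dates', 'elderberry'])

# label slice: 'apple' through 'cherry', INCLUDING 'cherry' (3 values)
by_label = weights.loc['apple':'cherry']

# position slice: positions 0 and 1 only, position 2 is excluded (2 values)
by_position = weights.iloc[0:2]

# a single value assigned to a slice is applied to every selected element
weights.iloc[3:] = 0
```

This is why the label-based update above needed three new values for 'apple':'cherry'.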
Since multiple values can share the same label in the index, all of the values that share a label can be updated with a single assignment.
# constructs a series with repeated values in index
fruit_name_list = ['apple', 'apple', 'apple', 'banana', 'banana']
fruit_weight_list = [180, 120, 15, 650, 450]
my_fruit_series = pd.Series(data = fruit_weight_list, index = fruit_name_list)
my_fruit_series
Output:
apple 180
apple 120
apple 15
banana 650
banana 450
dtype: int64
# updating multiple values with a single assignment
my_fruit_series.loc['apple'] = 333
my_fruit_series
Output:
apple 333
apple 333
apple 333
banana 650
banana 450
dtype: int64
Note: setting or ‘updating’ values with .loc or .iloc, as we have in this section, changes the series directly. We do not need to make a copy of the series to make the change.
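A consequence of this direct change is that every name that refers to the same series sees the update. The following is a minimal sketch, using hypothetical names (original, alias, backup) that do not appear in the notebook, of how .copy() can be used when we want to keep the earlier values.

```python
import pandas as pd

original = pd.Series(data=[180, 120, 15], index=['apple', 'banana', 'cherry'])

# a second name for the SAME series object (no copy is made)
alias = original

# an independent copy of the data
backup = original.copy()

# update the series directly through .loc
original.loc['cherry'] = 25

print(alias.loc['cherry'])   # 25, alias is the same object as original
print(backup.loc['cherry'])  # 15, the copy is unaffected
```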
6.3.1.2. Updating Index Labels
The series index object is not mutable, so we cannot assign to a single position in the index to update one label. However, we can replace the whole index by assigning a list of new labels to it. The list must have the same length as the series.
my_fruit_series.index
Output:
Index(['apple', 'apple', 'apple', 'banana', 'banana'], dtype='object')
my_fruit_series.index = ['apple', 'banana', 'cherry', 'dates', 'elderberry']
my_fruit_series.index
Output:
Index(['apple', 'banana', 'cherry', 'dates', 'elderberry'], dtype='object')
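To make the immutability concrete, the sketch below tries to change one label and catches the resulting error. It then shows one possible alternative for changing a single label, the .rename() method, which is not covered in this section and is included here only as an illustration. The series name weights is hypothetical.

```python
import pandas as pd

weights = pd.Series(data=[180, 120, 15], index=['apple', 'banana', 'cherry'])

# assigning to one position of the index raises a TypeError
try:
    weights.index[0] = 'apricot'
except TypeError as err:
    print('cannot update a single label:', err)

# .rename() with a dictionary returns a NEW series with the label replaced;
# the result must be assigned to a variable to be kept
renamed = weights.rename({'apple': 'apricot'})
```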
6.3.2. Basic Operations with Series
We can apply an operation to all the values in a series (or just display the result) without using a for loop. We have some examples with mathematical operators and with string operations below.
6.3.2.1. Basic Math Operations
If we had a list full of integers and we wanted to add five to each value in the list, we would need to write a for loop to visit each element and update it. This sort of operation is much simpler with a series.
my_fruit_series
Output:
apple 333
banana 333
cherry 333
dates 650
elderberry 450
dtype: int64
my_fruit_series + 5
Output:
apple 338
banana 338
cherry 338
dates 655
elderberry 455
dtype: int64
my_fruit_series - 5
Output:
apple 328
banana 328
cherry 328
dates 645
elderberry 445
dtype: int64
my_fruit_series * 5
Output:
apple 1665
banana 1665
cherry 1665
dates 3250
elderberry 2250
dtype: int64
my_fruit_series / 5
Output:
apple 66.6
banana 66.6
cherry 66.6
dates 130.0
elderberry 90.0
dtype: float64
Important note! In all the previous examples of operations, we have not actually made any change to the underlying series. Each operation returns a new series, and the notebook simply displays it (the earlier discussion calls this displayed result a ‘view’). The result is not retained unless we assign it to a variable. If we want to make a lasting alteration of the series, we have to perform the operation and assign the result to a variable.
# this makes a copy, adds five,
# and then reassigns the result to the original variable name
my_fruit_series = my_fruit_series + 5
my_fruit_series
Output:
apple 338
banana 338
cherry 338
dates 655
elderberry 455
dtype: int64
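For comparison, here is a short sketch of the list approach described at the start of this section next to the series approach. The names weight_list and weight_series are hypothetical and use a smaller set of values than the notebook.

```python
import pandas as pd

weight_list = [180, 120, 15]

# with a list, each element has to be visited and updated explicitly
for i in range(len(weight_list)):
    weight_list[i] = weight_list[i] + 5

# with a series, the operator is applied to every value at once
weight_series = pd.Series(data=[180, 120, 15],
                          index=['apple', 'banana', 'cherry'])
weight_series = weight_series + 5
```

Both end up holding 185, 125, and 20, but the series version needs no loop.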
6.3.2.2. Basic String Operations
# creates a series with strings as values
fruit_name_list = ['banana', 'banana', 'apple', 'apple', 'apple']
fruit_weight_list = [180, 120, 15, 650, 450]
my_fruit_series = pd.Series(data=fruit_name_list, index=fruit_weight_list)
my_fruit_series
Output:
180 banana
120 banana
15 apple
650 apple
450 apple
dtype: object
# displaying a new series built with string concatenation
my_fruit_series + ' is a fruit!'
Output:
180 banana is a fruit!
120 banana is a fruit!
15 apple is a fruit!
650 apple is a fruit!
450 apple is a fruit!
dtype: object
As we saw in the previous section with the basic operations, performing the operations does not change the underlying series. If we want to retain the change we have to reassign the result over the original.
# this doesn't change anything
my_fruit_series + ' is a fruit!'
my_fruit_series
Output:
180 banana
120 banana
15 apple
650 apple
450 apple
dtype: object
# but this does
my_fruit_series = my_fruit_series + ' is a fruit!'
my_fruit_series
Output:
180 banana is a fruit!
120 banana is a fruit!
15 apple is a fruit!
650 apple is a fruit!
450 apple is a fruit!
dtype: object
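Concatenation with + is only one string operation. As a hedged aside that goes slightly beyond this section, pandas also exposes string methods on a series through the .str accessor. The sketch below uses a hypothetical series called names. As with the examples above, each call returns a new series and leaves the original unchanged.

```python
import pandas as pd

names = pd.Series(data=['banana', 'apple'], index=[180, 15])

# apply a string method to every value at once
upper_names = names.str.upper()

# string methods that return True/False give a series of booleans
starts_with_a = names.str.startswith('a')
```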
6.3.3. Sorting Series by Index and Value
When we have wanted to sort a Python list, we have relied on the .sort() list method, which works by sorting the list ‘in place’ and returning None. Sorting a series works quite differently, for a few reasons. First, when we are working with a series, we can sort on either the index or the values.
To sort on the index, we use .sort_index().
# sort on the index
my_fruit_series.sort_index()
Output:
15 apple is a fruit!
120 banana is a fruit!
180 banana is a fruit!
450 apple is a fruit!
650 apple is a fruit!
dtype: object
To sort on the values, we use .sort_values().
# sorting on the values
my_fruit_series.sort_values()
Output:
15 apple is a fruit!
650 apple is a fruit!
450 apple is a fruit!
180 banana is a fruit!
120 banana is a fruit!
dtype: object
Unlike sorting a list, when you sort a series, the sorting is not done ‘in place’. The previous two code cells show the series sorted by index and then sorted by values, but in both cases we only displayed a new, sorted series. We did not assign the result to a variable, so the original series remains in its original order.
# the actual series object was not changed
my_fruit_series
Output:
180 banana is a fruit!
120 banana is a fruit!
15 apple is a fruit!
650 apple is a fruit!
450 apple is a fruit!
dtype: object
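The contrast with list sorting can be seen side by side in the sketch below, which uses hypothetical names (weight_list, weight_series) and a small set of numbers rather than the notebook's series.

```python
import pandas as pd

weight_list = [180, 15, 120]
result = weight_list.sort()      # sorts the list itself
print(result)                    # None
print(weight_list)               # [15, 120, 180]

weight_series = pd.Series(data=[180, 15, 120],
                          index=['apple', 'cherry', 'banana'])
sorted_series = weight_series.sort_values()   # returns a new series
print(weight_series.tolist())    # [180, 15, 120], original order kept
print(sorted_series.tolist())    # [15, 120, 180]
```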
If we want to retain the sorted series, we can reassign the result over the original series.
my_fruit_series = my_fruit_series.sort_values()
my_fruit_series
Output:
15 apple is a fruit!
650 apple is a fruit!
450 apple is a fruit!
180 banana is a fruit!
120 banana is a fruit!
dtype: object
my_fruit_series = my_fruit_series.sort_index()
my_fruit_series
Output:
15 apple is a fruit!
120 banana is a fruit!
180 banana is a fruit!
450 apple is a fruit!
650 apple is a fruit!
dtype: object
Pandas does actually allow for ‘in place’ sorting. If you want to sort a series ‘in place’, perhaps because your series is very large and you don’t want two copies in memory at the same time, sorting ‘in place’ can be done using an optional argument with both types of sort that we have covered. However, to keep things simple for now, I do not want you to use the ‘in place’ sorting argument in this class.
Similar to .sort(), both .sort_index() and .sort_values() have an optional argument to control the direction of the sorting. For a list the argument is reverse; for a series it is ascending.
my_fruit_series = my_fruit_series.sort_index(ascending=False)
my_fruit_series
Output:
650 apple is a fruit!
450 apple is a fruit!
180 banana is a fruit!
120 banana is a fruit!
15 apple is a fruit!
dtype: object
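The same argument works when sorting on the values. The following sketch, again with hypothetical names and numbers, puts the list and series versions of a largest-first sort next to each other.

```python
import pandas as pd

# list: reverse=True sorts from largest to smallest, in place
weight_list = [180, 15, 120]
weight_list.sort(reverse=True)

# series: ascending=False does the same, but returns a new series
weight_series = pd.Series(data=[180, 15, 120],
                          index=['apple', 'cherry', 'banana'])
largest_first = weight_series.sort_values(ascending=False)
```

Note that each label stays attached to its value, so the index of largest_first is reordered along with the values.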