5.1. Strings
We've been using strings from the beginning, so you already have a general idea of what they are and know how to do a few things with them. Here, we'll go into more detail and show you more ways to use and manipulate strings.
5.1.1. A String is a Sequence
The first detail to add is that a string is a sequence. And where have we seen "sequence" before? Earlier, we saw that a sequence can be given to a for loop, and the loop will iterate over each of the elements in the given sequence.
If a for loop accepts any sequence, and if a string is a sequence, then something like the following should work:
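A minimal sketch of such a loop (the string 'banana' and the variable names are illustrative choices):

```python
# Iterate over a string directly; each item the loop receives is one character.
fruit = 'banana'
for char in fruit:
    print(char)
```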
Try it out. What can we learn from that?
We see that it prints out each individual character in the string. That is, each time the for loop gets a new item from the string, it gets a single character. Therefore, we can see that a string is a sequence of characters.
There are many other tools we can use with sequences, and we'll go over several below. All of these apply to any sequence, not just strings. Recall that lists are sequences as well. Most of the following tools and patterns work with lists just like they do with strings. We'll focus on strings here and get to lists shortly.
5.1.1.1. Indexing
In addition to iterating over a sequence in a for loop, there are other things we can do with sequences. One of the most important is called indexing. Indexing is a tool that lets us get a single element out of a sequence. In the case of a string, it lets us get a single character out of the string.
To perform indexing, we use the [] or "bracket" operator with an integer:
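A short sketch of indexing, assuming a variable named fruit that holds the string 'banana':

```python
fruit = 'banana'
letter = fruit[1]   # the bracket operator with the integer index 1
print(letter)
```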
The second statement extracts the character at index position 1 from the fruit variable and assigns it to the letter variable. The expression in the brackets is called an index. The index indicates which character in the sequence you want (hence the name).
But why did it print "a" and not "b"? For most people, the first letter of "banana" is "b," not "a." But in Python and most other programming languages, an index is an offset from the beginning of the string, and the offset of the first letter is zero.
So "b" is the letter "at index [or position] 0" of "banana," "a" is the letter "at index 1," and "n" is the letter "at index 2."
You can use any expression, including variables and operators, as an index, but the value of the index has to be an integer. Otherwise you get:
>>> letter = fruit[1.5]
TypeError: string indices must be integers
5.1.1.2. Using len() with Strings
Recall the len() built-in function. We can now see that it always returns the number of elements in a sequence. If the sequence we give it is a string, we get back the number of characters in the string.
To get the last letter of a string, you might be tempted to try something like this:
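A sketch of that tempting attempt, again assuming fruit holds 'banana'. The try/except wrapper is added here only so the sketch runs to completion and shows the error instead of stopping:

```python
fruit = 'banana'
length = len(fruit)
print(length)

try:
    last = fruit[length]   # the index equals the length, one past the end
except IndexError as err:
    print('IndexError:', err)
```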
The reason for the IndexError is that there is no letter in "banana" with the index 6. Since we started counting at zero, the six letters are numbered 0 to 5. To get the last character, you have to subtract 1 from length:
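For instance, a sketch that assumes fruit holds 'banana' and length holds its length:

```python
fruit = 'banana'
length = len(fruit)
last = fruit[length - 1]   # the last valid index is length - 1
print(last)
```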
Alternatively, you can use negative indices, which count backward from the end of the string. The expression fruit[-1] yields the last letter, fruit[-2] yields the second to last, and so on.
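A brief sketch of negative indices, assuming fruit holds 'banana':

```python
fruit = 'banana'
print(fruit[-1])   # last character
print(fruit[-2])   # second-to-last character

# A negative index -k refers to the same character as len(fruit) - k.
print(fruit[-1] == fruit[len(fruit) - 1])
```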
5.1.1.3. Traversal Through a String with a Loop
A lot of computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a traversal. We've seen above that we can accomplish this with a for loop, using a string as its sequence. Another way to write a traversal is with a while loop:
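A sketch of such a traversal, assuming the variable names fruit, index, and letter:

```python
fruit = 'banana'
index = 0
while index < len(fruit):
    letter = fruit[index]   # get the character at the current index
    print(letter)
    index = index + 1       # move on to the next index
```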
This loop traverses the string and displays each letter on a line by itself. The loop condition is index < len(fruit), which can be read as "as long as index is still a valid index of fruit," because all valid indexes are less than the length of the string. So when index is equal to the length of the string, the condition is false, and the loop stops executing.
With each value for index counting up from 0, the body of the loop uses indexing to get the character at that index from the string, and it prints it out.
Check your understanding
Write a while loop that starts at the last character in the string and works its way backwards to the first character in the string, printing each letter on a separate line.
5.1.1.4. Slicing
If we want a portion of a string, rather than a single character, we can use slicing. A segment of a string is called a slice. Selecting a slice is similar to selecting a character:
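A short sketch of slicing, using 'Monty Python' as an illustrative string:

```python
s = 'Monty Python'
first = s[0:5]     # characters at indexes 0 through 4
second = s[6:12]   # characters at indexes 6 through 11
print(first)
print(second)
```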
To perform slicing, place a : inside the [] brackets with an index written before and after it. The operator returns the portion of the string from the first index up to but not including the second index.
If you omit the first index (before the colon), the slice starts at the beginning of the string. If you omit the second index, the slice goes to the end of the string:
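For instance, a sketch assuming fruit holds 'banana':

```python
fruit = 'banana'
start = fruit[:3]   # first index omitted: from the beginning up to index 3
end = fruit[3:]     # second index omitted: from index 3 to the end
print(start)
print(end)
```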
5.1.1.5. Strings are Immutable
It is tempting to use the indexing operator on the left side of an assignment, with the intention of changing a character in a string. For example:
>>> greeting = 'Hello, world!'
>>> greeting[0] = 'J'
TypeError: 'str' object does not support item assignment
The "object" in this case is the string and the "item" is the character you tried to assign. For now, an object is the same thing as a value, but we will refine that definition later. An item is one of the values in a sequence.
The reason for the error is that strings are immutable, which means you can't change an existing string. The best you can do is create a new string that is a variation on the original:
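A sketch of building such a variation; the name new_greeting is an illustrative choice:

```python
greeting = 'Hello, world!'
new_greeting = 'J' + greeting[1:]   # new first letter plus the rest of the original
print(new_greeting)
print(greeting)                     # the original string is unchanged
```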
This example concatenates a new first letter onto a slice of greeting. It has no effect on the original string.
5.1.1.6. Looping and Counting
The following program counts the number of times the letter "a" appears in a string:
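A minimal sketch of such a program, assuming the string to search is stored in a variable named word:

```python
word = 'banana'
count = 0                   # start the counter at zero
for letter in word:
    if letter == 'a':
        count = count + 1   # increment each time an 'a' is found
print(count)
```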
This program demonstrates another pattern of computation called a counter. The variable count is initialized to 0 and then incremented each time an "a" is found. When the loop exits, count contains the result: the total number of a's. We used this pattern back in the word count example program.
5.1.1.7. The in Operator
The word in is a Boolean operator that takes two strings and returns True if the first appears as a substring in the second:
>>> 'a' in 'banana'
True
>>> 'seed' in 'banana'
False
The in operator is commonly used in conditionals, as demonstrated in the following example:
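One possible sketch of in used in a conditional; the strings and the message variable are illustrative assumptions:

```python
fruit = 'banana'
if 'nan' in fruit:
    message = 'Found it!'
else:
    message = 'Not found.'
print(message)
```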
5.1.2. String Comparison
The comparison operators work on strings. To see if two strings are equal:
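A small sketch of an equality test, with illustrative variable names and messages:

```python
word = 'banana'
if word == 'banana':
    result = 'All right, bananas.'
else:
    result = 'Not bananas.'
print(result)
```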
Other comparison operations are useful for putting words in alphabetical order:
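A sketch of such a comparison. The helper compare_to_banana is a hypothetical name, and the word is passed in as an argument here rather than typed in by the user:

```python
def compare_to_banana(word):
    """Return a message saying where word falls relative to 'banana'."""
    if word < 'banana':
        return 'Your word, ' + word + ', comes before banana.'
    elif word > 'banana':
        return 'Your word, ' + word + ', comes after banana.'
    else:
        return 'All right, bananas.'

print(compare_to_banana('apple'))
print(compare_to_banana('cherry'))
```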
Python does not handle uppercase and lowercase letters the same way that people do. All the uppercase letters come before all the lowercase letters, so if you enter "Pineapple," for example:
Your word, Pineapple, comes before banana.
A common way to address this problem is to convert strings to a standard format, such as all lowercase, before performing the comparison. The next section includes a way to do that.
5.1.3. string Objects and Methods
Strings in Python can do a lot more than just hold a sequence of characters. Strings are an example of Python objects.
Definition
An object contains both data and methods, which are functions that are built into the object and can modify or perform operations on it.
As another way of putting it, objects "know things" and "can do things":
Objects "know things": an object holds data.
Objects "can do things": an object contains code (the methods).
In the case of a string object, the object's data is the characters of the string itself. And there are a few ways to learn about what methods (code) it contains.
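Python has a function called dir() that lists the methods available in an object, and a function called type() that shows the type of an object. The dir() output shown here is abridged: names that begin with a double underscore are omitted, and the exact list varies with the Python version.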
>>> stuff = 'Hello world'
>>> type(stuff)
<class 'str'>
>>> dir(stuff)
['capitalize', 'casefold', 'center', 'count', 'encode',
'endswith', 'expandtabs', 'find', 'format', 'format_map',
'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit',
'isidentifier', 'islower', 'isnumeric', 'isprintable',
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',
'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip',
'split', 'splitlines', 'startswith', 'strip', 'swapcase',
'title', 'translate', 'upper', 'zfill']
While the dir() function lists the methods, a better source of documentation for string methods is the official Python documentation: https://docs.python.org/3/library/stdtypes.html#string-methods.
Note
The official Python documentation uses a syntax that might be confusing. For example, in find(sub[, start[, end]]), the brackets indicate optional arguments. So sub is required, but start is optional, and if you include start, then end is optional.
Methods, like any other function, can be called to execute them. Calling a method is similar to calling a function (it takes arguments and can return a value), but to access a method within an object, we use dot notation just like when accessing functions within modules.
For example, the method upper() takes a string and returns a new string with all uppercase letters:
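A sketch of calling upper(), assuming a variable named word; the name new_word is illustrative:

```python
word = 'banana'
new_word = word.upper()   # dot notation: the string, a dot, then the method name
print(new_word)
print(word)               # the original string is unchanged
```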
This form of dot notation specifies the name of the method, upper(), and the name of the string to apply the method to, word. The parentheses are empty because this method takes no arguments.
The find() string method searches for the position of one string within another:
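A sketch of find(), assuming word holds 'banana':

```python
word = 'banana'
index = word.find('a')   # index of the first occurrence of 'a'
print(index)
print(word.find('na'))   # find() also works with longer substrings
```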
In this example, we invoke find() on word and pass in the string we are looking for as an argument.
The find() method can optionally take a second argument: the index where it should start searching.
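For instance, a sketch assuming word holds 'banana'; the particular search strings and start positions are illustrative:

```python
word = 'banana'
from_start = word.find('na')      # search from the beginning
from_three = word.find('na', 3)   # start searching at index 3
not_found = word.find('nan', 4)   # start searching at index 4
print(from_start)
print(from_three)
print(not_found)
```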
The final call to find() in that example returns -1 to indicate the search string was not found. The string 'nan' is present in 'banana', but the second argument started the search at index 4, beyond where 'nan' starts.
One common task is to remove white space (spaces, tabs, or newlines) from the beginning and end of a string using the strip() method:
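A sketch of strip(); the sample text and variable names are illustrative:

```python
line = '  Here we go  \n'
stripped = line.strip()   # returns a new string without the surrounding white space
print(repr(line))
print(repr(stripped))
```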
Some methods such as startswith() return Boolean values.
Note that startswith() requires case to match, so sometimes we take a line and map it all to lowercase before we do any checking using the lower() method.
>>> line = 'Have a nice day'
>>> line.startswith('h')
False
>>> line.lower()
'have a nice day'
>>> line.lower().startswith('h')
True
Check your understanding
There is a string method called count() that counts the occurrences of one string within another. Read about this method in the Python string method documentation and write a short program that uses count() to count the number of times the letter "a" occurs in a string the user types in.
5.1.4. Parsing Strings
Often, we want to look into a string and find a substring. For example, if we were presented a series of lines formatted as follows:
From stephen.marquard@uct.ac.za  Sat Jan  5 09:14:16 2008
and we wanted to pull out only the second half of the address (i.e., uct.ac.za) from each line, we could do this by using the find() method and string slicing.
First, we will find the position of the at-sign in the string. Then we will find the position of the first space after the at-sign. And then we will use string slicing to extract the portion of the string which we are looking for.
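These steps can be sketched as follows; the variable names data, atpos, sppos, and host are assumptions made for illustration:

```python
data = 'From stephen.marquard@uct.ac.za  Sat Jan  5 09:14:16 2008'

atpos = data.find('@')          # position of the at-sign
sppos = data.find(' ', atpos)   # first space at or after the at-sign
host = data[atpos + 1:sppos]    # from one beyond the at-sign up to the space

print(atpos)
print(sppos)
print(host)
```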
We use the optional arguments for the find() method that allow us to specify the position in the string where we want find() to start searching. When we slice, we extract the characters from "one beyond the at-sign" up to but not including the index of the next space character.
The documentation for the find method describes the optional arguments.
5.1.5. String Formatting with F-Strings
When printing the results of some computation, we very frequently want to place data (such as in a variable) into a string of text. "F-Strings" give us a good way to accomplish that. They allow us to construct strings, replacing parts of the strings with the data stored in variables or calculated in expressions. Let's look at an example:
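A minimal sketch of an f-string; the variables name and age and their values are illustrative assumptions:

```python
name = 'Alice'
age = 30
message = f"{name} is {age} years old."   # each { } is replaced by the variable's value
print(message)
```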
The syntax might look a little strange. The f before the string literal begins may look out of place to you, but it's an important part of the syntax. That string prefix tells Python the string that follows should be treated in a particular way; in this case, it is a format string.
A format string should contain one or more placeholders, written as { } (known as "curly braces") with some variable or expression written inside. Each { } placeholder will be replaced with the value of whatever is written inside. So you can see in the example above how each placeholder ends up replaced with the value of the specified variable.
A placeholder can optionally contain additional information inside the { } curly braces that specifies how the value included there should be formatted. There are many options for controlling how the string is formatted, but the more commonly used options control how floating point values are printed and allow for aligning values in columns. The following example demonstrates both.
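A sketch that uses both kinds of option. The variable names bigger and smaller match the discussion; the loop and the values are assumptions made for illustration:

```python
for n in [1, 3, 12, 250]:
    bigger = n * 10    # an integer, printed right-aligned in a field of width 5
    smaller = 1 / n    # a float, printed with 3 digits after the decimal point
    line = f"{bigger:>5} {smaller:5.3f}"
    print(line)
```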
In the f-string here, f"{bigger:>5} {smaller:5.3f}", the first placeholder's format is :>5. It includes a : to start the formatting options, the > makes the value "right-aligned," and the 5 sets the minimum number of characters the value occupies. So it uses at least 5 characters, and it places the value on the right-hand side of that space. The second placeholder's format, :5.3f, again uses at least 5 characters, and the .3f formats the value as a floating point number with 3 digits after the decimal point, regardless of the value itself. By default, strings are left-aligned and numbers are right-aligned.
Try changing some of the values in the placeholders to see how it affects the formatting.
There are many more options for controlling what is included in the string and how it is formatted. See fstring.help for a guide to f-strings that shows many of the possibilities.
Using f-strings is often easier than building strings by concatenating different pieces, provides more control than passing multiple arguments to a plain print() call, and makes for clean, readable code. For example:
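A sketch comparing the three approaches; the variable names and the sentence being built are illustrative:

```python
name = 'banana'
count = 3

# Concatenation: numbers must be converted with str() first.
by_concatenation = 'The word ' + name + ' has ' + str(count) + " a's."
print(by_concatenation)

# Multiple arguments to print(): a space is inserted between the pieces.
print('The word', name, 'has', count, "a's.")

# F-string: the values are placed directly inside the text.
by_fstring = f"The word {name} has {count} a's."
print(by_fstring)
```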