2.1. Values and Data Types
Computers, and the programs that direct them, operate on data. Data is information. When the example program from the previous chapter reads in text from a file, that text is data. Each word it extracts from that text is a piece of data. The number it calculates, counting the number of words starting with a given letter, is data. That letter itself is data.
Storing and manipulating data is at the heart of everything a computer can do. In order to write programs, then, we will have to learn how to tell the computer to store and manipulate data. We’ll start with individual pieces of data.
In programming, we use the word value, rather than data or datum, to talk about a specific piece of information, like a word or a number, that a program works with. A few values we have seen so far are 0, 1, and "Hello, World!".
There are many kinds of data. You might already have noticed that 1 is a very different kind of thing from "Hello, World!". Therefore, values are classified into different data types.
'Cat' is text information. The text data type is called string, because text is a string, or sequence, of characters. In Python, strings are always enclosed in quotation marks, like this: 'Cat', or this: "Cat".
93 is numeric information. There are a few different numeric data types. This particular value is an example of the integer data type.
A value’s data type controls what the computer can do with that value. Because 'Cat' is a string and 93 is an integer, certain kinds of actions (what we will call ‘operations’) make sense with one but not with the other. For example, if we try to divide two strings we get an error.
On the other hand, dividing two integers is an action the computer can perform:
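As a minimal sketch of both cases: the integer division comes first, followed by the string division that fails. The `try`/`except` wrapper is an addition for illustration only, so the error is reported without stopping the program, and the names `quotient` and `division_error` are hypothetical.

```python
# Dividing two integers is a valid operation; the / operator produces a float.
quotient = 93 / 3
print(quotient)

# Dividing two strings is not a defined operation, so Python raises a TypeError.
try:
    'Cat' / 'Dog'
except TypeError as error:
    division_error = error
    print("TypeError:", error)
```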
If you are not sure what the type of a particular value is, the type() function can tell you:
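A short sketch using values from this section; the variable names are only illustrative.

```python
# Ask Python for the type of a few values.
cat_type = type('Cat')        # text in quotation marks
number_type = type(93)        # a whole number
decimal_type = type(123.45)   # a number with a decimal point

print(cat_type)      # <class 'str'>
print(number_type)   # <class 'int'>
print(decimal_type)  # <class 'float'>
```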
Not surprisingly, strings belong to the type str and integers belong to the type int. Less obviously, numbers with a decimal point belong to a type called float, because these numbers are stored in the computer in a format called floating point.
What about values like "17" and "123.45"? They look like numbers, but they are in quotation marks like strings.
They’re strings! It’s important to understand and remember that "17" and 17 are very different things to Python.
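One way to see the difference, sketched with type() and an equality check. The `==` comparison and the variable names are additions for illustration.

```python
text_type = type("17")    # quotation marks make this a string
number_type = type(17)    # no quotation marks, so this is an integer
print(text_type)          # <class 'str'>
print(number_type)        # <class 'int'>

# Python does not treat the string and the integer as equal.
are_equal = ("17" == 17)
print(are_equal)          # False
```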
2.1.1. Strings
We have used two different kinds of quotation marks to create strings: single ' and double ". Python will take whatever follows a quotation mark as the contents of a string up until it finds a matching quotation mark. Strings enclosed in one kind of quote symbol can contain the other kind. For example, single quotation marks ' can be wrapped in double ", and double " can be wrapped in single.
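A minimal sketch of each case; the example sentences and variable names are made up for illustration.

```python
# A single quotation mark inside a string delimited by double quotation marks.
single_in_double = "It's a string."

# Double quotation marks inside a string delimited by single quotation marks.
double_in_single = 'She said, "Hello!"'

print(single_in_double)
print(double_in_single)
```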
What do you think will happen if a string contains a quotation mark of the same kind that encloses it?
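One way to try this safely is sketched below. The two problem lines are hypothetical examples, stored as text and handed to the built-in compile() function, which checks whether the text is valid Python without running it. The `try`/`except` wrapper and the names `broken_lines` and `results` are additions for illustration.

```python
# Two hypothetical lines of code, held as text so they can be checked safely.
broken_lines = [
    "print('Euler's Identity')",      # single quote inside single quotes
    'print("She said, "Hello!"")',    # double quotes inside double quotes
]

results = []
for line in broken_lines:
    try:
        compile(line, "<example>", "exec")   # check the syntax only
        results.append("ok")
    except SyntaxError:
        results.append("SyntaxError")
    print(line, "->", results[-1])
```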
These produce syntax errors because the quotation mark that we want to be inside the string actually ends the string, and then the rest of the line is invalid Python syntax. See if you can get the code above to work by changing the type of quotation marks used.
There is another way to fix this issue. To include a quote character that is the same as the one used to start and end the string, the character can be escaped by putting a backslash \ in front of it, as in "The string \"four\" is four characters long.".
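A brief sketch of escaping with both kinds of quotation mark. The second string is an illustrative example, and the variable names are hypothetical.

```python
# Escaped double quotation marks inside a double-quoted string.
sentence = "The string \"four\" is four characters long."
print(sentence)

# An escaped single quotation mark inside a single-quoted string.
possessive = 'Euler\'s Identity'
print(possessive)
```

The backslashes are part of the code, not part of the string: they do not appear when the strings are printed.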
Escaping with a backslash is not limited to quotation marks. It is used in many situations where we want Python to read a character as text.
And by the way: since strings are sequences of characters, and emoji are just sequences of characters…
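A short sketch with an emoji string, assuming your editor and terminal can display Unicode characters. The variable name is illustrative.

```python
# Emoji are characters too, so they can appear inside a string.
emoji_string = '👁️❤️🐍'
print(emoji_string)
print(type(emoji_string))   # <class 'str'>
```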
2.1.2. Numbers
When you type a large integer, you might be tempted to use commas between groups of three digits, as in 1,000,000. This is not a valid integer in Python, but it is valid syntax:
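A minimal sketch of the experiment: the first line uses commas the way we would in everyday writing, and the following lines write the same number with no separators. The name `million` is only illustrative.

```python
# Commas between the digit groups: Python sees three separate values.
print(1,000,000)   # prints: 1 0 0

# No separators: a single integer.
million = 1000000
print(million)     # prints: 1000000
```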
Well, that’s not what we expected at all! Python interprets 1,000,000 as three comma-separated integers, which it prints with spaces between them.
Note
The print() function will print as many different values as you give it, as long as they are separated by commas. The values will be separated by spaces in the output.
For example:
>>> print("Hello, World!", 1, 2, 123.45)
Hello, World! 1 2 123.45
This is the first example we have seen of a semantic error: the code is syntactically valid and runs without producing an error message, but it doesn’t do what we thought or wanted it to do. In this case, Python’s rule about what commas mean doesn’t exactly match what we might assume about them based on using commas in everyday writing.
Caution
Programming languages are formal languages with strict, precise rules about what is valid code and what that code means. The computer will do exactly what you tell it to do… so be careful about what you tell it to do!
2.1.3. Type Conversion Functions
Often data is in one form and we need it in another. For example, if a data set is stored in a text format, every value will be stored as a string even if it is really numeric data. Python provides a few type conversion functions that will attempt to convert data from one type into another. Each of the three data types we’ve seen so far has a matching function that converts into that type:
int()
float()
str()
The int() function can convert a floating point number or a string into an int. When given a floating point number, it discards the decimal portion of the number, a process called truncation towards zero on the number line. For example:
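A brief sketch with a few illustrative values; the variable names are hypothetical.

```python
small = int(3.14)        # the decimal portion is discarded
large = int(3.9999)      # still 3: int() truncates, it does not round
negative = int(-3.999)   # truncation moves toward zero, so this is -3
from_text = int("2345")  # a string of digits becomes an integer

print(small)      # 3
print(large)      # 3
print(negative)   # -3
print(from_text)  # 2345
```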
Python won’t always succeed in converting from one data type to another.
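As a sketch of a conversion that fails, using a made-up string. The `try`/`except` wrapper is an addition for illustration so that the error is reported without stopping the program, and `conversion_error` is a hypothetical name.

```python
# "23 bottles" contains more than just an integer, so int() cannot convert it.
try:
    int("23 bottles")
except ValueError as error:
    conversion_error = error
    print("ValueError:", error)
```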
The error shows that a string given to int() has to be a syntactically valid integer. Anything else will cause an error.
The float() function converts an integer, float, or syntactically valid string into a float.
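A short sketch with illustrative values; the variable names are hypothetical.

```python
from_int = float(17)          # an integer gains a decimal point
from_text = float("123.45")   # a string that looks like a number
from_float = float(93.0)      # a float stays a float

print(from_int)     # 17.0
print(from_text)    # 123.45
print(from_float)   # 93.0
```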
And finally, str() can convert just about anything into a string. The applications of this are a bit less common, but it’s worth remembering it exists.
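A minimal sketch with illustrative values. Note that printing a string does not show its quotation marks, so type() is used to confirm the result.

```python
text_from_int = str(17)
text_from_float = str(123.45)

print(text_from_int)         # 17
print(text_from_float)       # 123.45
print(type(text_from_int))   # <class 'str'>
```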
Check your understanding
Q-1: For each value, write its type - int, float, or str - to the right.
1234:
12.34:
"1234":
'12.34':
"Hello, 1234!":
Q-2: Which of the following are valid strings in Python? (Mark all that are correct.)
- 'Average'
  - Nothing wrong with this one.
- '"Cheese!", she exclaimed.'
  - Strings can contain quotation marks that aren't the same as the marks delimiting (surrounding) the string.
- 'Euler's Identity'
  - Strings cannot contain quotation marks that are the same as the marks delimiting (surrounding) the string unless they are escaped (see above).
- '👁️❤️🐍'
  - Emoji (or more broadly, Unicode characters) are allowed.
- "Hello, World!"
  - A classic string.
Q-3: For each type conversion function call, write the value it will produce to the right.
int(1234):
int(8.8):
float("1234"):
float(42.42):