11.1. Objects (Refresher)

Before we talk about classes, it’s worth summarizing and refreshing everything we know about objects so far.

11.1.1. String Objects

We first introduced the term “object” and the concept when discussing Strings, and String objects were the first kind of object we discussed. A few key points repeated from that chapter:

Definition

An object contains both data and methods, which are functions that are built into the object and can modify or perform operations on it.

As another way of putting it, objects “know things” and “can do things”:

  • Objects “know things”: an object holds data.

  • Objects “can do things”: an object contains code (the methods).

In the case of a string object, the object’s data is the characters of the string itself. And there are a few ways to learn about what methods (code) it contains. For any object, the built-in Python function dir() will list the methods available in an object.

>>> mystring = 'Hello world'
>>> dir(mystring)
['capitalize', 'casefold', 'center', 'count', 'encode',
'endswith', 'expandtabs', 'find', 'format', 'format_map',
'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit',
'isidentifier', 'islower', 'isnumeric', 'isprintable',
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',
'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip',
'split', 'splitlines', 'startswith', 'strip', 'swapcase',
'title', 'translate', 'upper', 'zfill']

Methods, like any other function, can be called to execute them. Calling a method is similar to calling a function (it takes arguments and can return a value), but to access a method within an object, we use dot notation just like when accessing functions within modules.

For example, the method upper() takes a string and returns a new string with all uppercase letters:

This form of dot notation specifies the name of the method, upper(), and the name of the object to apply the method to, word. The parentheses are empty because this method takes no arguments.

See the rest of the section on String objects for more examples.

11.1.2. Objects in Pandas

In Pandas, the most important type of object we have seen and used is the DataFrame. DataFrames contain a wide range of methods, and we have explored some of those in the earlier chapter. If we use the dir() method to list all of a DataFrame’s methods, we find a long list:

>>> dir(df)
['T', 'abs', 'add', 'add_prefix', 'add_suffix', 'agg', 'aggregate', 'align',
'all', 'any', 'append', 'apply', 'applymap', 'as_matrix', 'asfreq', 'asof',
'assign', 'astype', 'at', 'at_time', 'axes', 'between_time', 'bfill',
'bool', 'boxplot', 'clip', 'clip_lower', 'clip_upper', 'columns', 'combine',
'combine_first', 'compound', 'copy', 'corr', 'corrwith', 'count', 'cov',
'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'div',
'divide', 'dot', 'drop', 'drop_duplicates', 'dropna', 'dtypes',
'duplicated', 'empty', 'eq', 'equals', 'eval', 'ewm', 'expanding', 'ffill',
'fillna', 'filter', 'first', 'first_valid_index', 'floordiv', 'from_dict',
'from_records', 'ftypes', 'ge', 'get', 'get_dtype_counts',
'get_ftype_counts', 'get_values', 'groupby', 'gt', 'head', 'hist', 'iat',
'idxmax', 'idxmin', 'iloc', 'index', 'infer_objects', 'info', 'insert',
'interpolate', 'isin', 'isna', 'isnull', 'items', 'iteritems', 'iterrows',
'itertuples', 'ix', 'join', 'keys', 'kurt', 'kurtosis', 'last',
'last_valid_index', 'le', 'loc', 'lookup', 'lt', 'mad', 'mask', 'max',
'mean', 'median', 'melt', 'memory_usage', 'merge', 'min', 'mod', 'mode',
'mul', 'multiply', 'ndim', 'ne', 'nlargest', 'notna', 'notnull',
'nsmallest', 'nunique', 'pct_change', 'pipe', 'pivot', 'pivot_table',
'plot', 'pop', 'pow', 'prod', 'product', 'quantile', 'query', 'radd',
'rank', 'rdiv', 'reindex', 'reindex_axis', 'reindex_like', 'rename',
'rename_axis', 'reorder_levels', 'replace', 'resample', 'reset_index',
'rfloordiv', 'rmod', 'rmul', 'rolling', 'round', 'rpow', 'rsub', 'rtruediv',
'sample', 'select', 'select_dtypes', 'sem', 'set_axis', 'set_index',
'shape', 'shift', 'size', 'skew', 'slice_shift', 'sort_index',
'sort_values', 'squeeze', 'stack', 'std', 'style', 'sub', 'subtract', 'sum',
'swapaxes', 'swaplevel', 'tail', 'take', 'to_clipboard', 'to_csv',
'to_dense', 'to_dict', 'to_excel', 'to_feather', 'to_gbq', 'to_hdf',
'to_html', 'to_json', 'to_latex', 'to_msgpack', 'to_panel', 'to_parquet',
'to_period', 'to_pickle', 'to_records', 'to_sparse', 'to_sql', 'to_stata',
'to_string', 'to_timestamp', 'to_xarray', 'transform', 'transpose',
'truediv', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unstack',
'update', 'values', 'var', 'where', 'xs']

We have only explored a few of these. Clearly there is a great deal more functionality inside a DataFrame than we have covered so far.

DataFrames also introduce one new aspect of objects that we glossed over before: attributes.

Above, we said that objects “know things” and “can do things.” Methods are what an object can “do,” and attributes are what an object “knows.” In other words, an attribute is data stored inside an object and given a name, just like a method is code stored inside an object and given a name. And just as methods can be accessed via dot notation, so too can attributes.

One example we’ve seen is df.shape. The .shape attribute of a DataFrame contains the row and column counts for that DataFrame. Notice that we access it via dot notation (naming the attribute shape we want to access inside the object df) but that the name is not followed by parentheses. This is what differentiates an attribute from a method, and .shape is not a method.

Common Error

If you attempt to call an attribute, by putting parentheses after the name, you will encounter a TypeError exception. It will tell you that the attribute, whatever type it is, is not “callable.” Methods and functions can be called and thus are “callable.” If you see this error message, it should indicate to you that the name you are accessing is not a method but rather is an attribute that can be accessed without the parentheses.

For example, if df is an empty DataFrame (with 0 rows and 0 columns), attempting to call df.shape() results in an error:

>>> df.shape()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not callable

While accessing it as an attribute succeeds:

>>> df.shape
(0, 0)