How to work with the Python list data type

Python comes with a collection of built-in data types that make common data-wrangling operations easy. Among them is the list, a simple but versatile collection type. With a Python list, you can group Python objects together in a one-dimensional row that allows objects to be accessed by position, added, removed, sorted, and subdivided.

Python list basics

Defining a list in Python is easy—just use the bracket syntax to indicate items in a list, like this:


list_of_ints = [1, 2, 3]

Items in a list do not have to all be the same type; they can be any Python object. In the list below, assume TWO is a name defined elsewhere (say, a variable) and Three is a function:


list_of_objects = ["One", TWO, Three, {"Four":4}, None]

Note that having mixed objects in a list can have implications for sorting the list. We’ll go into that later.

The biggest reason to use a list is to be able to find objects by their position in the list. To do this, you use Python’s index notation: a number in brackets, starting at 0, that indicates the position of the item in the list.

In the list_of_ints example, list_of_ints[0] yields 1, list_of_ints[1] yields 2, and list_of_objects[4] would be the None object.

Python list indexing

If you use a positive integer for the index, the integer indicates the position of the item to look for. But if you use a negative integer, then the integer indicates the position starting from the end of the list. For example, using an index of -1 is a handy way to grab the last item from a list no matter the size of the list.

In that case, list_of_ints[-1] yields 3, and list_of_objects[-1] yields None.

You can also use an integer variable as your index. If x=0, list_of_ints[x] yields 1, and so on.

If you try to index beyond a list's boundaries you'll trigger an IndexError exception.
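
The indexing rules above can be sketched in a few lines. This is a minimal illustration that redefines list_of_ints so it runs on its own; the names first_item, last_item, item_at_x, and out_of_range_caught are introduced here only for the example.

```python
list_of_ints = [1, 2, 3]

first_item = list_of_ints[0]    # positive index counts from the start
last_item = list_of_ints[-1]    # negative index counts from the end

x = 0
item_at_x = list_of_ints[x]     # an integer variable works as an index, too

# Indexing past the end of the list raises IndexError
try:
    list_of_ints[3]
    out_of_range_caught = False
except IndexError:
    out_of_range_caught = True

print(first_item, last_item, item_at_x, out_of_range_caught)
```

Catching IndexError like this is mostly useful when the index comes from outside your control; when you only need the last item, an index of -1 avoids computing the length at all.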

Adding and removing Python list items

Python has several ways you can add or remove items from a list:

.append() inserts an item at the end of the list. For example, list_of_ints.append(4) would turn list_of_ints into the list [1,2,3,4]. Appends are fast and efficient; it takes about the same amount of time to append one item to a list no matter how long the list is.
.extend() takes the contents of some iterable—such as another list—and adds each item from the iterable to the list as a separate item. This is useful if you want to quickly insert the contents of a list item-by-item into another list. (If you try to .append() one list to another, the entire list gets appended as a single object, rather than item-by-item.)
.pop() removes and returns the last item from the list. If we ran x = list_of_ints.pop() on the original list_of_ints, x would contain the value 3. (You don’t have to assign the result of .pop() to a variable if you don’t need it.) .pop() operations are also fast and efficient.
.insert() inserts an item at some arbitrary position in the list. For example, list_of_ints.insert(0,10) would turn list_of_ints into [10,1,2,3]. Note that the closer you insert to the front of the list, the slower this operation will be, though you won’t see much of a slowdown unless your list has many thousands of elements or you’re doing the inserts in a tight loop.
.pop(x) removes the item at the index x. So list_of_ints.pop(0) would remove the item at index 0. Again, the closer you are to the front of the list, the slower this operation can be.
.remove(item) removes an item from a list, but not based on its index. Rather, .remove() removes the first occurrence of the object you specify, searching from the start of the list. For [3,7,7,9,8].remove(7), the first 7 would be removed, resulting in the list [3,7,9,8]. This operation, too, can slow down for a large list, since in the worst case it has to traverse the entire list to find the item.
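
The methods above can be tried together in one short run. This is a sketch that starts from a fresh list_of_ints; the names nested, last, first, and with_duplicates are introduced only for illustration.

```python
list_of_ints = [1, 2, 3]

list_of_ints.append(4)         # [1, 2, 3, 4]
list_of_ints.extend([5, 6])    # [1, 2, 3, 4, 5, 6]

nested = [1, 2, 3]
nested.append([4, 5])          # [1, 2, 3, [4, 5]]: the whole list is one item

last = list_of_ints.pop()      # removes and returns 6
first = list_of_ints.pop(0)    # removes and returns 1
list_of_ints.insert(0, 10)     # [10, 2, 3, 4, 5]

with_duplicates = [3, 7, 7, 9, 8]
with_duplicates.remove(7)      # [3, 7, 9, 8]: only the first 7 is removed

print(list_of_ints, nested, last, first, with_duplicates)
```

Note the difference between nested and list_of_ints: .append() adds its argument as a single item, while .extend() unpacks it.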

Slicing a Python list

Lists can be divided up into new lists, a process called slicing. Python’s slice syntax lets you specify which part of a list to carve off and how to manipulate the carved-off portion.

You saw above how to use the bracket notation to get a single item from a list: my_list[2], for example. Slices use a variant of the same index notation (and follow the same indexing rules): list_object[start:stop:step].

Note the following:

start is the position in the list to start the slice.
stop is the position in the list where we stop slicing. In other words, that position and everything after it is omitted.
step is an optional “every nth element” indicator for the slice. By default this is 1, so the slice retains every element from the list it’s slicing from. Set step to 2, and you’ll select every second element, and so on.

Here are some examples. Consider this list:


slice_list = [1,2,3,4,5,6,7,8,9,0]
slice_list[0:5]  # yields [1, 2, 3, 4, 5]

Note that the slice stops before index 5: the last item included is the one at index 4.


slice_list[0:5:2]  # yields [1, 3, 5]

If you omit a particular slice index, Python assumes a default. Leave off the start index, and Python assumes the start of the list:


slice_list[:5]  # yields [1, 2, 3, 4, 5]

Leave off the stop index, and Python assumes the end of the list:


slice_list[4:]  # yields [5, 6, 7, 8, 9, 0]

The step element can also be negative. This lets us take slices that are reversed copies of the original:


slice_list[::-1]  # yields [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]

Note that you can also slice part of a list in reverse by combining a negative step with start and stop indexes that run backwards, not forwards:


slice_list[5:2:-1]  # yields [6, 5, 4]

Slicing and shallow copies

Also keep in mind that slices of lists are shallow copies of the original list. The original list remains unchanged. The new list holds references to the same objects as the old list, not copies of those objects.

For instance, if you have a class instance in a list and you make a slice containing that object, a distinct new class instance isn't created—the slice just now contains a different reference to the same class instance.
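
A small experiment makes the shallow-copy behavior concrete. This is a sketch only: the Counter class and all variable names here are hypothetical and exist just for the illustration.

```python
class Counter:
    """Hypothetical class used only for this illustration."""
    def __init__(self):
        self.count = 0

original = [Counter(), Counter(), Counter()]
sliced = original[0:2]

# The slice is a new list object...
is_new_list = sliced is not original
# ...but its elements are the very same objects as in the original.
same_object = sliced[0] is original[0]

# Mutating a shared object is visible through both lists.
sliced[0].count += 1
count_via_original = original[0].count

# Rebinding a slot in the slice does not touch the original list.
sliced[1] = None
original_unchanged = original[1] is not None

print(is_new_list, same_object, count_via_original, original_unchanged)
```

If you need fully independent copies of the objects as well, the standard library's copy.deepcopy() is the usual tool, at the cost of copying everything the list refers to.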

Slicing and out-of-bounds indexes

If you try to make a slice that's bigger than the item you're slicing—an "out of bounds" slice—you will not get an IndexError, but you will only get as much as the sliced item actually has. For instance:

[1,2,3][:10]

would yield [1,2,3]. This allows you to make slices without worrying too much about constraining the boundaries of the slice to the thing you're slicing.

Sorting a Python list

Python provides two ways to sort lists. You can generate a new, sorted list from the old one, or you can sort an existing list in-place. These options have different behaviors and different usage scenarios.

To create a new, sorted list, use the sorted() function on the old list:


new_list = sorted(old_list)

This will sort the contents of the list using Python’s default sorting methods. For strings, the default is lexical order; for numbers, it’s ascending values.

If you want to sort a list in reverse, pass the reverse parameter:


new_list = sorted(old_list, reverse=True)

The other way to sort, in-place sorting, performs the sort operation directly on the original list. To do this, use the list’s .sort() method:


old_list.sort()

.sort() also takes reverse as a parameter, allowing you to sort in reverse.
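
The difference between the two approaches is easiest to see side by side. The sketch below uses a small made-up list; the names snapshot, descending, and result are introduced only for illustration.

```python
old_list = [3, 1, 2]

new_list = sorted(old_list)                   # builds a new, sorted list
descending = sorted(old_list, reverse=True)   # new list, largest first
snapshot = list(old_list)                     # old_list is still [3, 1, 2] here

result = old_list.sort()                      # sorts in place and returns None

print(new_list, descending, snapshot, old_list, result)
```

Because .sort() returns None, writing old_list = old_list.sort() would throw the list away; call the method on its own line instead.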

Note that the contents of the list need to be consistent for sorting to work. For instance, you can’t sort a mix of integers and strings, but you can sort a list that is all integers or all strings. Otherwise you’ll get a TypeError in the sort operation.
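
Here is a brief sketch of that failure mode, using made-up lists. It also shows that integers and floats, which Python can compare with each other, sort together without trouble.

```python
mixed = [3, "1", 2]

# Comparing an int with a str is not supported, so the sort fails
try:
    sorted(mixed)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

all_ints = sorted([3, 1, 2])
all_strings = sorted(["pear", "Apple", "banana"])   # uppercase sorts before lowercase
ints_and_floats = sorted([2, 1.5, 3])               # these types are comparable

print(error_name, all_ints, all_strings, ints_and_floats)
```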

Both sorted() and .sort() also take a key parameter. The key parameter lets you provide a function that can be used to perform a custom sorting operation. When the list is sorted, each element is passed to the key function, and the resulting value is used for sorting. For instance, if we had a mix of integers and strings, and we wanted to sort them, we could use key, like this:


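mixed_list = [1,"2",3,"4", None]

def sort_mixed(item):
    # Use the item's integer value as its sort key
    try:
        return int(item)
    # int("abc") raises ValueError; int(None) raises TypeError
    except (ValueError, TypeError):
        return 0

sorted_list = sorted(mixed_list, key=sort_mixed)
print(sorted_list)  # [None, 1, '2', 3, '4']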

Note that this code wouldn’t convert each element of the list into an integer! Rather, it would use the integer value of each item as its sort value. Also note how we use a try/except block to trap any values that don’t translate cleanly into an integer, and return 0 for them by default.

Multidimensional list objects

Lists are by nature one-dimensional; they store everything in a single, flat row. But since lists can contain any type of object, including other lists, this makes it possible to create multidimensional lists.

Here's an example of a two-dimensional list:


two_dimensional_list = [
    [0,1,2],
    [3,4,5]
]

The outermost list, the first dimension, has two elements; the inner lists, the second dimension, have three elements each.

If you wanted to access the lists within, you'd use a stacked indexing syntax like this:


two_dimensional_list[0][2]

This would give you the first element in the outer list—the list of [0,1,2]—and then the third element from that—the 2.

Note that Python doesn't enforce any kind of dimensionality on objects like this. You could have a list of lists where each sublist is a totally different length, but you'd need to ensure you didn't generate an IndexError by using indexes that didn't match the object in question.
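
One way to work safely with sublists of different lengths is sketched below. The list ragged and the helper get_cell are hypothetical names made up for this example, not anything built into Python.

```python
ragged = [
    [0, 1, 2],
    [3],
    [4, 5],
]

# Iterating over rows directly avoids index arithmetic altogether.
flattened = [value for row in ragged for value in row]

def get_cell(grid, row, col, default=None):
    """Hypothetical helper: return grid[row][col], or default if out of range."""
    try:
        return grid[row][col]
    except IndexError:
        return default

present = get_cell(ragged, 0, 2)   # row 0 has a third element
missing = get_cell(ragged, 1, 2)   # row 1 has only one element

print(flattened, present, missing)
```

Looping over the rows themselves, as in the comprehension, is usually the simpler choice; a helper like get_cell only earns its keep when you genuinely need random access by row and column.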

Python lists are not arrays

One important thing to know about lists in Python is that they aren’t “arrays.” Other languages, like C, have one-dimensional or multidimensional constructions called arrays that accept values of a single type. Lists are heterogeneous; they can accept objects of any type.

What’s more, there is a separate array type in Python, provided by the array module in the standard library. The Python array is designed to emulate the behavior of an array in C, and it’s meant chiefly to allow Python to work with C arrays. The array type is useful in those cases, but in almost every pure-Python case you’ll want to use lists. For everyday work that would normally use a list, there's no performance advantage to using arrays instead.
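
To make the contrast concrete, here is a short sketch using the standard library's array module. The variable names are illustrative only.

```python
from array import array

int_array = array("i", [1, 2, 3])   # "i" is the typecode for signed C ints
int_array.append(4)

# An array only accepts values that match its typecode
try:
    int_array.append("five")
    rejected = False
except TypeError:
    rejected = True

# A list built from the same values accepts anything
as_list = int_array.tolist()
as_list.append("five")

print(int_array, rejected, as_list)
```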

When to use Python lists (and when not to)

So, when are Python lists most useful? A list is best when:

You need to find things quickly by their position in a collection. Accessing any position in a list takes the same amount of time, so there is no performance penalty for looking up even the millionth item in a list.
You’re adding to and removing from the collection mainly at the end, in the manner of a stack. Again, these operations take the same amount of time regardless of the length of the list.

A Python list is less suitable when:

You want to find an item in a list but you don’t know its position. You can do this with the .index() method. For instance, you could use list_of_ints.index(1) to find the index of the first occurrence of the number 1 in list_of_ints. Speed should not be an issue if your list is only a few items long, but for lists thousands of items long, Python may have to search the entire list. For a scenario like this, use a dictionary, where each item can be found using a key, and where the lookup time will be the same for each value.
You want to add or remove items from any position but the end. Each time you do this, Python must move every item after the added or removed item. The longer the list, the greater the performance issue this becomes. Python’s deque object, found in the collections module, is a better fit if you want to add or remove objects freely from either the start or the end of the collection.
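
The two alternatives mentioned above might look like this in practice. This is a sketch under assumptions: the names list and the position_by_name dictionary are made up for the example, and the dictionary approach assumes the items are hashable and unique.

```python
from collections import deque

names = ["ada", "grace", "linus"]

# list.index() scans from the start until it finds a match
position = names.index("grace")

# A dictionary keyed by item finds the same position by key lookup
position_by_name = {name: i for i, name in enumerate(names)}
fast_position = position_by_name["grace"]

# A deque supports efficient adds and removes at both ends
queue = deque(names)
queue.appendleft("alan")
first = queue.popleft()    # removes and returns "alan"
queue.append("guido")
last = queue.pop()         # removes and returns "guido"

print(position, fast_position, first, last, list(queue))
```

The trade-off with the dictionary is that it has to be kept in sync if the list changes; the trade-off with a deque is that access to items in the middle is slower than with a list.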