Python Basic Data Structures

Array

Python's list is internally implemented as an array. Access an element by an index is O(1).

Indexing

Negative indices are also allowed. A handy way to access the last element is:

1a = [1, 2, 3]
2print(a[-1])  # print out 3

Subarray slicing indices are left-inclusive and right-exclusive. For example

1a = [0, 10, 20, 30, 40, 50]
2subarray = a[2:5]  # element from index 2 to index 4 (cuz right exclusive, 5-1=4)
3# subarray = [20, 30, 40]

This is very important. Mishandling the right-exclusivity is a common source of error in coding interview problems such as two-pointer problems.

There's also a shorthand for slicing from element 0. For example:

1a = [0, 10, 20, 30, 40, 50]
2subarray = a[:2]
3# subarray = [0, 10]

and slicing until

1a = [0, 10, 20, 30, 40, 50]
2subarray = a[2:]
3# subarray = [20, 30, 40, 50]

A note on right-exclusivity. Array and list indices are left inclusive and right exclusive in most languages including Python and Java. The reason for this design is up for speculation. According to a stackover flow post, Python inventor Guido Van Rossum was quoted as saying "I was swayed by the elegance of half-open intervals. Especially the invariant that when two slices are adjacent, the first slice's end index is the second slice's start index is just too beautiful to ignore. For example, suppose you split a string into three parts at indices i and j -- the parts would be a[:i], a[i:j], and a[j:]." For our practical purposes in solving coding problems, we often have to deal with edge cases such as an empty list. And this notation becomes convenient because to represent an empty array we can write a[i:i] instead of the more awkward a[i:i-1].

In general, to iterate through a list, for loop is easier to reason than while since there's no condition to manage that could skip elements. You can also use enumerate to get both index and element at the same time.

1nums = [0, 10, 20, 30, 40, 50]
2for i, num in enumerate(nums):
3    print(i, num)
4
5# 0 0
6# 1 10
7# 2 20
8# 3 30
9# 4 40
10# 5 50

Linked List

Python doesn't have a built-in linked list. Normally at an interview, you'll be given a definition like this

1class LinkedListNode:
2    def __init__(self, val, next=None):
3        self.val = val
4        self.next = next
  • append to end is O(1).
  • finding an element is O(N).

Besides problems that specifically ask for linked lists, you don't normally define and use linked list. If you need a list with O(1) append you can use the built-in list, and if you want O(1) for both prepend and append you can use the built-in deque.

Stack

Python's list also doubles as a stack. For a list l = []

  • push: append, O(1)
  • pop: pop,O(1)
  • size: len(l), O(1)
  • top: l[-1], O(1)

Stack is arguably the most magical data structure in computer science. In fact, with two stacks and a finite-state machine you can build a Turing Machine that can solve any problem a computer can solve.

Recursion and function calls are implemented with stack behind the scene. We'll discuss this in the recursion review module.

Queue

Python's collections.deque is a double-ended queue, which means you can push and pop an element from either end in constant time. In the coding interviews, this is what we normally use when we need a queue. For a deque q = deque()

  • enqueue: q.append, O(1)
  • dequeue: q.popleft(), O(1). Note that it's pop*left*. pop is also supported but it's for getting element at the end of the double-ended queue.
  • size: len(q), O(1)

In coding interviews, we see queues most often in breadth-first search. We'll also cover monotonic deque where elements are sorted inside the deque that is useful in solving some advanced coding problems.

Hash Table

Python's dict is an implementation of the hash table. For a dictionary d = {},

  • get using a key: d[key], O(1)
  • set a key, value: d[key] = value, O(1)
  • remove a key: del d[key], O(1)

It's worth mentioning that these are average case time complexity. A hash table's worst time complexity is actually O(N) due to hash collision and other things. For the vast majority of the cases and certainly most coding interviews, the assumption of constant time lookup/insert/delete is valid.

Use a hash table if you want to create a mapping from A to B. Many starter interview problems can be solved with a hash table.

Python dict has a convenient subclass Counter for counting hashable objects. When you pass an iterable object, such as a list or a string, to Counter(), it will return a new dict with elements as keys and their counts as values.

Counter