Python Basic Data Structures
Array
Python's list
is internally implemented as an array. Accessing an element by an index is O(1)
.
Indexing
Negative indices are also allowed. A handy way to access the last element is by using the index -1:
a = [1, 2, 3]
print(a[-1]) # print out 3
Subarray slicing indices are left-inclusive and right-exclusive. For example,
a = [0, 10, 20, 30, 40, 50] subarray = a[2:5] # element from index 2 to index 4 (cuz right exclusive, 5-1=4) # subarray = [20, 30, 40]
This is very important. Mishandling the right-exclusivity is a common source of error in coding interview problems such as two-pointer problems. This is called a off-by-one error. To learn more about this, watch this video:
There's also a shorthand for slicing from element 0. For example:
a = [0, 10, 20, 30, 40, 50] subarray = a[:2] # subarray = [0, 10]
and slicing to the end,
a = [0, 10, 20, 30, 40, 50] subarray = a[2:] # subarray = [20, 30, 40, 50]
A note on right-exclusivity. Array and list indices are left inclusive and right exclusive in most languages including Python and Java. The reason for this design is up for speculation. According to a stackover flow post, Python inventor Guido Van Rossum was quoted as saying "I was swayed by the elegance of half-open intervals. Especially the invariant that when two slices are adjacent, the first slice's end index is the second slice's start index is just too beautiful to ignore. For example,, suppose you split a string into three parts at indices i and j -- the parts would be
a[:i]
,a[i:j]
, anda[j:]
." For our practical purposes in solving coding problems, we often have to deal with edge cases such as an empty list. And this notation becomes convenient because to represent an empty array we can writea[i:i]
instead of the more awkwarda[i:i-1]
.
In general, to iterate through a list, for
loop is easier to reason than while
since there's no condition to manage that could skip elements. You can also use enumerate
to get both the index and element at the same time.
nums = [0, 10, 20, 30, 40, 50]
for i, num in enumerate(nums):
print(i, num)
# 0 0
# 1 10
# 2 20
# 3 30
# 4 40
# 5 50
Linked List
Python does not have a built-in linked list. Normally, in an interview, you would be given a definition like this.
class LinkedListNode:
def __init__(self, val, next=None):
self.val = val
self.next = next
- append to end is
O(N)
. - finding an element is
O(N)
.
Linked lists are less commonly used in Python compared to other data structures. Besides problems that specifically ask for linked lists, you don't normally define and use linked list. There are a handful of common interview problems on linked lists, such as reversing a linked list, merging two linked lists, getting the middle of a linked list, detecting cycle in a linked list.
Note that Python's built-in list
is not a linked list, but rather a dynamic array which we covered above.
Stack
Python's list
also doubles as a stack. For a list l = []
- push:
append
,O(1)
- pop:
pop
,O(1)
- size:
len(l)
,O(1)
- top:
l[-1]
,O(1)
A stack is arguably the most magical data structure in computer science. In fact, with two stacks and a finite-state machine you can build a Turing Machine that can solve any problem a computer can solve.
Recursion and function calls are implemented with a stack behind the scenes. We'll discuss this in the recursion review module.
Queue
Python's collections.deque
is a double-ended queue, which means you can push and pop an element from either end in constant time. In the coding interviews, this is what we normally use when we need a queue. For a deque q = deque()
- enqueue:
q.append
,O(1)
- dequeue:
q.popleft()
,O(1)
. Note that it'spop*left*
.pop
is also supported but it's for getting element at the end of the double-ended queue. - peek:
q[0]
,O(1)
. Accesses the first element of the queue just like an array. - size:
len(q)
,O(1)
In coding interviews, we see queues most often in breadth-first search. We'll also cover the monotonic deque, where elements are sorted inside the deque, which is useful in solving some advanced coding problems.
Hash Table
Python's dict
is an implementation of a hash table. For a dictionary d = {}
,
- get using a key:
d[key]
,O(1)
, gives KeyError if key isn't in the dictionary - set a key, value:
d[key] = value
,O(1)
- remove a key:
del d[key]
,O(1)
- size:
len(d)
,O(1)
, returns the number of items in the dictionary.
It's worth mentioning that these are average-case time complexities. A hash table's worst time complexity is actually O(N)
due to hash collision and others. For the vast majority of the cases and certainly most coding interviews, the assumption of constant time lookup/insert/delete is valid.
Use a hash table if you want to create a mapping from A to B. Many starter interview problems can be solved with a hash table.
Python dict
has a convenient subclass Counter
for counting hashable objects. When you pass an iterable object, such as a list or a string, to Counter()
, it will return a new dict
with elements as keys and their counts as values.