Treemap Data Structure in Python

What is a TreeMap?

A self-balancing binary search tree. In modern languages, this is normally implemented as a Red-black tree.

TreeMap Time Complexity

OperationAverageWorst Case
SearchO(log(n))O(log(n))
InsertAmortized O(1)O(log(n))
InsertAmortized O(1)O(log(n))

Java TreeMap

Java has collection library support (java.util.TreeMap) for TreeMap.

C++ TreeMap

C++'s std::map is a red-black tree TreeMap Implementation. Note that name is rather tricky here. std::map is TreeMap(ordered) and the more common hashmap is actually std::unordered_map.

Does Python Have TreeMap?

Not in the standard library. There are third-party libraries such as http://www.grantjenks.com/docs/sortedcontainers/

Do you need TreeMap for interviews?

TreeMap comes in handy when you need

  1. fast random access by key
  2. maintaining order of the key, as well as
  3. updating the key order quickly

If the requirements are only 1) and 2), we can get away with a hashmap and a list/array and that's exactly what most interview problems ask.

Here's an example:

Time Based Key-Value Store

The problem asks us to implement a data structure that maps one (key, timestamp) to one value and being able to look up using key and timestamp. We can solve this using a hashmap that stores key to (timestamp, value) pairs. When we receive a new data point, we append the (timestamp, value) pair. Because timestamps are strictly increasing, the list is always sorted and we can use binary search to answer the get query quickly.

from collections import defaultdict class TimeMap: def __init__(self): self.memo = defaultdict(list) def set(self, key: str, value: str, timestamp: int) -> None: # store a timestamp, value pair in the list self.memo[key].append([timestamp, value]) def get(self, key: str, timestamp: int) -> str: # binary search on timestamp in the list index = bisect.bisect(self.memo[key], [timestamp+1]) -1 if index < 0: return "" return self.memo[key][index][1]

This gives a time complexity of O(1) set and O(log(n)) get.

It's important to note that this problem has two unique conditions

  1. timestamps are strictly increasing
  2. we don't need to implement delete operation

If the timestamps are not strictly increasing, then to keep the (timestamp, value) pairs sorted, we need to find and insert into the position a new data point belongs to. This requires moving each element after the insertion point by one position which is an O(n) operation. Similarly, if we need to delete a timestamp, we need find, delete and move which is also an O(n) operation.

In this case, if python had TreeMap, we could have used it instead of the hashmap + binary search and achieve O(log(n)) for both get and set.

Other problems that are best solved by TreeMap but can also be solved using alternative methods:

Range Query and Segment Tree

If the problem involves range queries, we can implement a segment tree, which is much simpler than the red-black tree that backs TreeMap.

A segment Tree is a special data structure that allows quick range queries. It's not nearly as hard to implement correctly as red-black tree and is something you could do in an interview. It's a great option when you need to do range query and do not have access to builtin treemap. A quintessential question is Range Max.

In summary,

  • Python does not have builtin TreeMap
  • For interviews, we can normal solve TreeMap problems using hashmap + binary search
  • Use segment tree if the problem involves range query.

Treeset in Python

Now that we have covered Treemap, let's talk about Treeset.

What is a TreeSet?

A TreeSet is a set-based data structure that provides logarithmic time performance for common operations like add, remove, and contains. This efficiency arises from the fact that, under the hood, a TreeSet is typically backed by a balanced binary search tree, such as a Red-Black Tree.

Unlike regular sets, which don't guarantee any specific order of elements, TreeSet guarantees the elements to be in ascending order, according to their natural ordering or based on a provided comparator.

Does Python Have TreeSet?

Python's standard library does not include a TreeSet implementation out-of-the-box. Python developers typically leverage the built-in set for most use-cases or use sorted containers, like sortedcontainers.SortedSet, for situations where sorted order is essential.

However, the idea behind TreeSet can be found in many programming languages, with Java being the most notable proponent of the TreeSet data structure in its standard library.

LeetCode Problems involving TreeSet or Set-Based Operations:

  • Contains Duplicate III

    • This problem asks to check if there are two distinct indices i and j such that nums[i] and nums[j] are at most t apart and i and j are at most k apart. A TreeSet can help maintain a window of k numbers sorted, making it easy to check the conditions.
  • Max Sum of Rectangle No Larger Than K

    • This problem requires the use of a set-based structure to efficiently calculate the sum of sub-rectangles.
  • Data Stream as Disjoint Intervals

    • For this problem, one needs to efficiently handle incoming data points, and a TreeSet or a sorted set structure can help in merging overlapping intervals.

And that's a wrap for Python treemap and treeset. Check out the other pattens you probably should know for your next interview.