1649. Create Sorted Array through Instructions


Problem Description

The problem presents a scenario where we are given an integer array instructions and asked to build a sorted array using those instructions. Starting with an empty array nums, we insert each element from instructions from left to right. The key element is to calculate the cost associated with each insertion. This cost is computed by determining the minimum of two quantities: the count of elements currently in nums that are strictly less than the element being inserted, and the count of elements currently in nums that are strictly greater than the element being inserted.

After an element is inserted, if there are already duplicates of that element in nums, the insertion point can be before or after the existing ones, which doesn't affect the calculation of the cost.

The output is the sum of all individual costs (modulus 10^9 + 7 to handle large numbers) for inserting the entire instructions array into nums.

The challenge of the problem is to efficiently handle the cost calculations and insertions, as straightforward methods may not suffice for large datasets due to time complexity constraints.

Intuition

To arrive at the solution approach, one should recognize that the operations of insertion in nums and computing the count of elements less than or greater than a given value are crucial points where time efficiency matters. A naive iteration for each element would lead to a time complexity of O(n^2), which is not efficient.

The intuition is to use a Binary Indexed Tree (BIT) or a Segment Tree to efficiently query and update counts while maintaining a cumulative frequency of elements up to a certain value. A Binary Indexed Tree offers a more memory-efficient way to achieve this compared to a traditional Segment Tree, and both have a time complexity of O(log n) for updates and queries.

A BIT is a data structure that supports two operations efficiently on an array of numbers: updating the value of an element and computing the prefix sum of elements up to a certain index.

  1. When we insert an element x, we update the BIT to reflect that there is now one more occurrence of x.
  2. To find the cost of inserting x, we query the BIT:
    • To find the number of elements strictly less than x, we get a prefix sum query of x - 1, since this will give us the total count of elements below x.
    • To find the number of elements strictly greater than x, we subtract from the total elements inserted so far, the prefix sum of x (which includes x itself and all numbers below it).

With this approach, we can incrementally build our sorted array nums and calculate the insertion cost for each element efficiently.

The solution's implementation defines a Binary Indexed Tree class, with update and query methods. As we iterate through instructions, for each element, we use the BIT to calculate the cost of insertion, update the cumulative frequency of elements, and keep a running total of the cost modulo 10^9 + 7.

This approach ensures efficient handling of the required operations to calculate the cost with a time complexity of O(n log m), where n is the length of the instructions list and m is the range of numbers in instructions.

Solution Approach

The given solution makes use of a Binary Indexed Tree (BIT), which is also known as a Fenwick Tree. This data structure is particularly useful for solving problems that involve frequent updates and queries on prefix sums, which is the situation we encounter when determining the insertion cost for the instructions array.

Binary Indexed Tree

The Binary Indexed Tree is constructed to store the frequencies of numbers up to the maximum value present in the instructions array. Its size is initialized based on the maximum element to ensure that all elements can be accounted for.

Update Operation

To update the BIT, increment the frequency count of a number, tree.update(x, 1) is called. This affects all the subsequent elements that contain x in their binary representation sum path. Here's how the update method works:

  • Starting from our index, denoted by x, increment the corresponding BIT bucket by an update value (which, in this case, is 1 for each insertion).
  • The tree's buckets are designed such that each bucket holds a cumulative frequency of a range of indexes, determined by the least significant bit of the bucket's index.
  • After updating the current bucket, the next index to update is found by incrementing the current index x with the value x & -x (x AND the 2's complement of x), which gives the least significant bit that represents how much you can jump to the next affected bucket in the BIT.

Query Operation

To query the BIT (for example, for a value x), tree.query(x) is called, which returns the prefix sum for the first x elements. This operation is used to determine the number of elements less than the current element that we are trying to insert. Here's how the query method works:

  • Start from the index x, and keep adding the values at the indexed buckets to a running sum.
  • To move to the next index to be checked, we subtract from the current index x the value x & -x to step backwards through the BIT.
  • The process repeats until we reach the start of the array (index zero), by which point s contains the cumulative count of frequencies for indexes up to x.

The BIT's update and query operations optimize the calculation of the insertion cost as they allow for both the insertion and the count of elements less than or greater than a given value to be computed in O(log n) time complexity.

Integration in the Solution

In the context of the problem at hand, the BIT is used to keep track of the number of times each number in the instructions has been inserted into nums.

As each element in instructions is processed:

  1. We calculate the cost of insertion as the minimum of all numbers strictly less than (tree.query(x - 1)) and strictly greater than it (i - tree.query(x)).
  2. We update the BIT with the current number to reflect its insertion by incrementing its frequency in tree.update(x, 1).
  3. We accumulate the cost after each insertion operation. To prevent integer overflow, we take modulo 10^9 + 7 of the result.

By continuously updating the BIT and querying it for each element's associated cost, we arrive at the total insertion cost for the entire instructions array with the desired time efficiency.

💪
Level Up Your
Algo Skills

Example Walkthrough

Let's illustrate the solution approach using a small example with the array of integers instructions given as [1, 5, 6, 2].

When we start, our BIT is all zeroes, since no elements have been inserted yet.

  1. We insert the first element 1. There are no elements in the bit, so the cost is 0. Our BIT is updated to 1 at the index representing the number 1.

    Current BIT state (index: count): [0: 0, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0]

    Total cost: 0

  2. Inserting 5: The cost is the minimum of the count of numbers less than 5 (tree.query(4) which is 1) and the count of numbers greater than 5 (0 as we only have 1 and 5 so far). The cost is 0 since that's the minimum.

    Update the BIT for 5: [0: 0, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0]

    Total cost: 0 + 0 = 0

  3. Inserting 6: The cost is the minimum of the count of numbers less than 6 (tree.query(5) which is the sum of counts for 1 and 5, hence 2) and the count of numbers greater than 6 (0).

    Update the BIT for 6: [0: 0, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1]

    Total cost: 0 + 0 = 0

  4. Inserting 2: We calculate the cost as the minimum of the count of numbers less than 2 (tree.query(1) which is 1) and the count of numbers greater than 2 (2 - the sum of counts for 1 and 5).

    So the cost for inserting 2 is 1, and we update the BIT for 2: [0: 0, 1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 1]

    Total cost: 0 + 0 + 1 = 1

The final total cost for inserting all elements is 1, and as we insert the elements one by one, we use the BIT to efficiently calculate the cost rather than querying the nums array directly which would be less efficient. The final cost is outputted as the answer, which in this case remains 1 (since it's under the modulo 10^9 + 7).

Python Solution

1from typing import List
2
3class BinaryIndexedTree:
4    def __init__(self, size: int):
5        self.size = size
6        # Initialize the tree with 0 values, +1 because index 0 is not used
7        self.tree = [0] * (size + 1)
8
9    def update(self, index: int, value: int):
10        # Updates the tree with 'value' at the position 'index'
11        while index <= self.size:
12            self.tree[index] += value
13            # Move to next element to update
14            index += index & -index
15
16    def query(self, index: int) -> int:
17        # Queries the cumulative frequency up to the index
18        result = 0
19        while index:
20            result += self.tree[index]
21            # Move to parent element
22            index -= index & -index
23        return result
24
25
26class Solution:
27    def createSortedArray(self, instructions: List[int]) -> int:
28        # Find the maximum value in the instructions list for tree size
29        max_value = max(instructions)
30        # Initialize the Binary Indexed Tree with size max_value
31        tree = BinaryIndexedTree(max_value)
32        cost_sum = 0
33        # A large number for modulo operation to ensure result fits in 32-bit integer
34        modulo = 10**9 + 7
35      
36        # Iterate over the instructions to calculate the cost
37        for i, number in enumerate(instructions):
38            # Calculate the cost as the minimum of numbers less than the current number
39            # and numbers greater than it that came before the current (i-th) position
40            cost = min(tree.query(number - 1), i - tree.query(number))
41            cost_sum += cost
42            # Update the tree with current number
43            tree.update(number, 1)
44      
45        # Return the total cost modulo 10^9 + 7 to avoid large numbers
46        return cost_sum % modulo
47

Java Solution

1class BinaryIndexedTree {
2    private int size; // Size of the array
3    private int[] tree; // The Binary Indexed Tree (also, called Fenwick Tree)
4
5    // Constructor
6    public BinaryIndexedTree(int size) {
7        this.size = size;
8        this.tree = new int[size + 1];
9    }
10
11    // Updates the tree with an integer value 'v' at position 'x'
12    public void update(int x, int v) {
13        while (x <= size) {
14            tree[x] += v; // Increment the value at index x
15            x += x & -x; // Move to the next index to update
16        }
17    }
18
19    // Queries the sum of the first 'x' elements in the tree
20    public int query(int x) {
21        int sum = 0;
22        while (x > 0) {
23            sum += tree[x]; // Add the value at index x to the sum
24            x -= x & -x; // Move to the previous index to continue querying
25        }
26        return sum;
27    }
28}
29
30class Solution {
31    // Method to create a sorted array and return the minimum cost to do so
32    public int createSortedArray(int[] instructions) {
33        int maxValue = 0;
34        // Find the maximum value in instructions to determine the size of BinaryIndexedTree
35        for (int x : instructions) {
36            maxValue = Math.max(maxValue, x);
37        }
38        BinaryIndexedTree tree = new BinaryIndexedTree(maxValue);
39      
40        int totalCost = 0; // Initialize the total cost of insertion operations
41        final int mod = (int) 1e9 + 7; // The modulus value to ensure the result is within the bounds of an integer
42        for (int i = 0; i < instructions.length; ++i) {
43            int x = instructions[i];
44            // Cost is the minimum of the number of elements smaller than 'x' and the
45            // number of elements greater than 'x' that are already in the sorted array
46            int cost = Math.min(tree.query(x - 1), i - tree.query(x));
47            totalCost = (totalCost + cost) % mod; // Update the total cost while keeping it within the int bounds
48            tree.update(x, 1); // Insert 'x' by updating the tree
49        }
50        return totalCost; // Return the computed total cost
51    }
52}
53

C++ Solution

1#include <vector>
2#include <algorithm>
3
4using std::vector;
5using std::min;
6using std::max_element;
7
8// BinaryIndexedTree (also known as a Fenwick Tree) is a data structure that provides efficient methods for cumulative frequency queries and updates.
9class BinaryIndexedTree {
10public:
11    // Constructor to initialize a BinaryIndexedTree of size n.
12    explicit BinaryIndexedTree(int size)
13        : size_(size)
14        , tree_(size + 1, 0) {} // tree_ array holds the binary indexed tree with an extra element for simplicity.
15  
16    // Function to update the tree with a value at a given index.
17    void update(int index, int delta) {
18        while (index <= size_) {
19            tree_[index] += delta;
20            index += index & -index; // Climb up the tree by adding the least significant bit (LSB).
21        }
22    }
23  
24    // Query the cumulative frequency up to and including index.
25    int query(int index) {
26        int sum = 0;
27        while (index > 0) {
28            sum += tree_[index];
29            index -= index & -index; // Move to parent by subtracting the least significant bit (LSB).
30        }
31        return sum;
32    }
33
34private:
35    int size_;              // Represents the size of the input array for which the tree is built.
36    vector<int> tree_;      // The representation of the Binary Indexed Tree.
37};
38
39class Solution {
40public:
41    // Function to create a sorted array and calculate the cost of sorting.
42    int createSortedArray(vector<int>& instructions) {
43        int maxElement = *max_element(instructions.begin(), instructions.end()); // Find the maximum element in the instructions.
44        BinaryIndexedTree tree(maxElement); // Initialize a BinaryIndexedTree with size equal to the max element.
45        const int mod = 1e9 + 7; // Modulo value for output to prevent integer overflow.
46        int cost = 0; // Variable to store the total cost of inserting elements.
47      
48        for (int i = 0; i < instructions.size(); ++i) {
49            int num = instructions[i]; // Current instruction/element.
50          
51            // Calculate the cost of the current instruction:
52            // It is the minimum of all elements less than the current
53            // or all elements greater than the current that have already been processed.
54            int currentCost = min(tree.query(num - 1), i - tree.query(num));
55          
56            cost = (cost + currentCost) % mod; // Update the cost with modulo operation to keep it within bounds.
57            tree.update(num, 1); // Update the BinaryIndexedTree after processing an element.
58        }
59      
60        return cost; // Return the total cost of creating the sorted array.
61    }
62};
63

Typescript Solution

1// Define the size of the Binary Indexed Tree (BIT) and initialize the array.
2let size: number;
3let bitArray: number[];
4
5// Initialize the BIT with the given size.
6function initializeBIT(n: number): void {
7    size = n;
8    bitArray = new Array(n + 1).fill(0);
9}
10
11// Update the BIT at index 'index' with value 'value'.
12// This ultimately represents adding 'value' to all elements in the original array represented by the BIT at 'index' and beyond.
13function updateBIT(index: number, value: number): void {
14    while (index <= size) {
15        bitArray[index] += value;
16        index += index & -index;
17    }
18}
19
20// Query the BIT up to index 'index' to get the prefix sum of the original array.
21// This gives us the sum of elements from the start of the original array up to 'index'.
22function queryBIT(index: number): number {
23    let sum = 0;
24    while (index > 0) {
25        sum += bitArray[index];
26        index -= index & -index;
27    }
28    return sum;
29}
30
31// Function to create a sorted array following the instructions and calculate the cost.
32// 'instructions' is an array where 'instructions[i]' is the number to be added at step 'i'.
33// The cost is calculated as the minimum between the number of elements already in the array that are less than the current number
34// and the number of elements that are greater than the current number.
35function createSortedArray(instructions: number[]): number {
36    // Maximum number from the instructions to define the size of the BIT.
37    const maxElement = Math.max(...instructions);
38
39    // Initialize the BIT with the maximum element.
40    initializeBIT(maxElement);
41    let totalCost = 0;
42    const mod = 10 ** 9 + 7; // Modulus to ensure the result stays within integer limits after each addition.
43
44    // Process each instruction and calculate cost.
45    instructions.forEach((value, index) => {
46        // Calculate the cost of placing value into its correct position.
47        const cost = Math.min(queryBIT(value - 1), index - queryBIT(value));
48        // Update total cost, ensuring it does not exceed the modulus.
49        totalCost = (totalCost + cost) % mod;
50        // Update BIT to include the current value.
51        updateBIT(value, 1);
52    });
53
54    // Return the total cost after all instructions have been processed.
55    return totalCost;
56}
57

Time and Space Complexity

The provided code defines a BinaryIndexedTree (also known as a Fenwick Tree) and uses it in a Solution class to determine the cost of creating a sorted array from the given instructions.

Time Complexity

The main operations involved are update and query operations on the Binary Indexed Tree.

  • The update method has a time complexity of O(log n), where n is the size of the tree (or the maximum number in the instructions). This is because in the worst case, updating a single element requires updating all the tree levels that the element contributes to, and since a Binary Indexed Tree is a binary tree, there are O(log n) levels.

  • Similarly, the query method also has a time complexity of O(log n) for the same reasoning as the update method.

These operations are performed for every element in instructions, leading to a total time complexity of O(m * log n), where m is the number of elements in instructions and n is the highest value in instructions.

Space Complexity

The space complexity is determined by the storage requirements of the Binary Indexed Tree.

  • The Binary Indexed Tree c array has a size of n + 1, where n is the maximum number in instructions. Thus, the space complexity is O(n).

So overall the time complexity of the provided solution is O(m * log n) and the space complexity is O(n).

😈
Become an
Algo Monster

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫