Facebook Pixel

1722. Minimize Hamming Distance After Swap Operations

Problem Description

You have two integer arrays source and target, both with the same length n. You're also given an array allowedSwaps where each element allowedSwaps[i] = [a_i, b_i] represents that you can swap the elements at positions a_i and b_i in the source array (using 0-based indexing). You can perform these swaps multiple times and in any order you want.

The Hamming distance between two arrays of equal length is the count of positions where the elements differ. In other words, it's the number of indices i where source[i] != target[i].

Your task is to find the minimum Hamming distance between source and target after performing any number of allowed swap operations on the source array.

For example, if you can swap elements at indices that are connected through the allowed swaps, you want to rearrange the source array optimally to match as many positions with target as possible. The positions that still don't match after optimal swapping contribute to the Hamming distance.

The key insight is that through the allowed swaps, certain indices become interconnected - if you can swap index a with b, and b with c, then elements at positions a, b, and c can be rearranged among themselves in any order. This forms groups of indices where elements can be freely rearranged within each group.

Flowchart Walkthrough

First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:

Is it a graph?

  • Yes: The problem can be modeled as a graph where each index is a node, and each allowed swap [a_i, b_i] creates an edge between nodes a_i and b_i. The swaps create connections between indices.

Is it a tree?

  • No: The graph formed by allowed swaps is not necessarily a tree. It can have cycles (e.g., if we can swap indices 0↔1, 1↔2, and 2↔0, this forms a cycle) and may have multiple disconnected components.

Is the problem related to directed acyclic graphs?

  • No: The swap relationships are bidirectional (if we can swap a with b, we can also swap b with a), so this is an undirected graph, not a DAG.

Is the problem related to shortest paths?

  • No: We're not trying to find the shortest path between nodes. Instead, we need to identify which indices belong to the same connected component.

Does the problem involve connectivity?

  • Yes: The core insight is that indices connected through a chain of allowed swaps form connected components. Within each component, elements can be rearranged freely among those positions.

Disjoint Set Union

  • Yes: DSU (Union-Find) is the perfect data structure for this problem. It efficiently groups indices into connected components based on the allowed swaps.

Conclusion: The flowchart leads us to use Disjoint Set Union (DSU) to identify connected components. While DSU isn't strictly DFS, it solves the same connectivity problem that DFS would solve (finding connected components), but more efficiently for this use case. We could alternatively use DFS to explore each connected component, but DSU provides a cleaner solution for grouping indices that can exchange elements through swaps.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is to recognize that the allowed swaps create groups of positions where elements can move freely among themselves. If we can swap position i with j, and position j with k, then through a series of swaps, we can move any element from positions i, j, or k to any of these three positions.

Think of it this way: imagine you have three cups at positions 0, 1, and 2. If you can swap cups at positions 0↔1 and 1↔2, then you can arrange the three cups in any order across these three positions. This forms a "connected component" where the elements are interchangeable.

Once we identify these connected components, the problem becomes much simpler. For each component:

  1. Collect all the elements from source at positions in this component
  2. Collect all the elements from target at positions in this component
  3. Match as many elements as possible between these two collections

For example, if a component has positions [0, 2, 3]:

  • From source: we might have elements [1, 5, 3] at these positions
  • From target: we might have elements [3, 5, 2] at these positions

Since we can rearrange source elements freely within the component, we can match 3 with 3 and 5 with 5. The element 1 from source can't match with 2 from target, contributing 1 to the Hamming distance.

The mathematical approach is to count the frequency of each element in both arrays within each component. If source has three 5's in a component but target only needs two 5's, we can match two of them. The unmatched elements contribute to the minimum Hamming distance.

This transforms the problem from "finding optimal swaps" to "finding connected components and matching elements within each component" - a much more tractable problem that we can solve efficiently using Union-Find to identify components and hash tables to count element frequencies.

Learn more about Depth-First Search and Union Find patterns.

Solution Approach

The solution uses Union-Find (Disjoint Set Union) to group indices into connected components, followed by frequency counting within each component.

Step 1: Initialize Union-Find Structure

n = len(source)
p = list(range(n))

We create a parent array p where initially each index is its own parent (each index forms its own component).

Step 2: Build Connected Components

for a, b in allowedSwaps:
    p[find(a)] = find(b)

For each allowed swap [a, b], we union the two indices by connecting their root parents. The find function with path compression ensures efficient lookups:

def find(x: int) -> int:
    if p[x] != x:
        p[x] = find(p[x])  # Path compression
    return p[x]

Step 3: Count Source Elements by Component

cnt = defaultdict(Counter)
for i, x in enumerate(source):
    j = find(i)
    cnt[j][x] += 1

We use a two-dimensional hash table cnt where:

  • First dimension: component ID (root parent)
  • Second dimension: element value frequency counter

For each position i in source, we find its component ID j and increment the count of element source[i] in that component.

Step 4: Match with Target Elements

ans = 0
for i, x in enumerate(target):
    j = find(i)
    cnt[j][x] -= 1
    ans += cnt[j][x] < 0

For each position i in target:

  • Find its component ID j
  • Decrement the count of element target[i] in that component
  • If the count goes negative, it means we don't have enough of this element in source to match, so we increment the answer

The clever part is that we're essentially doing a "greedy matching" - for each element needed in target, we try to use an available element from source within the same component. If cnt[j][x] becomes negative, it means we've exhausted all available copies of element x in component j, and this position contributes to the Hamming distance.

Time Complexity: O(n * α(n) + m) where n is array length, m is the number of allowed swaps, and α is the inverse Ackermann function (practically constant).

Space Complexity: O(n) for the Union-Find structure and the frequency counters.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to understand the solution approach.

Given:

  • source = [1, 2, 3, 4]
  • target = [2, 1, 4, 5]
  • allowedSwaps = [[0, 1], [2, 3]]

Step 1: Initialize Union-Find

p = [0, 1, 2, 3]  # Each index is its own parent initially

Step 2: Build Connected Components

Processing [0, 1]:

  • find(0) = 0, find(1) = 1
  • Set p[1] = 0 (connect index 1 to index 0)
  • Now p = [0, 0, 2, 3]

Processing [2, 3]:

  • find(2) = 2, find(3) = 3
  • Set p[3] = 2 (connect index 3 to index 2)
  • Now p = [0, 0, 2, 2]

We have two components:

  • Component 0: indices {0, 1}
  • Component 2: indices {2, 3}

Step 3: Count Source Elements by Component

Processing source array [1, 2, 3, 4]:

  • Index 0: source[0] = 1, find(0) = 0cnt[0][1] = 1
  • Index 1: source[1] = 2, find(1) = 0cnt[0][2] = 1
  • Index 2: source[2] = 3, find(2) = 2cnt[2][3] = 1
  • Index 3: source[3] = 4, find(3) = 2cnt[2][4] = 1

Current state:

cnt[0] = {1: 1, 2: 1}  # Component 0 has one 1 and one 2
cnt[2] = {3: 1, 4: 1}  # Component 2 has one 3 and one 4

Step 4: Match with Target Elements

Processing target array [2, 1, 4, 5]:

  • Index 0: target[0] = 2, find(0) = 0

    • cnt[0][2] = 1 - 1 = 0 (we have a 2 to match!)
    • 0 >= 0, so no increment to answer
  • Index 1: target[1] = 1, find(1) = 0

    • cnt[0][1] = 1 - 1 = 0 (we have a 1 to match!)
    • 0 >= 0, so no increment to answer
  • Index 2: target[2] = 4, find(2) = 2

    • cnt[2][4] = 1 - 1 = 0 (we have a 4 to match!)
    • 0 >= 0, so no increment to answer
  • Index 3: target[3] = 5, find(3) = 2

    • cnt[2][5] = 0 - 1 = -1 (we don't have a 5 in this component!)
    • -1 < 0, so increment answer: ans = 1

Result: Minimum Hamming distance = 1

Why this works:

  • In component 0 (indices 0,1), we have elements {1,2} in source and need {2,1} in target. Perfect match possible through swapping!
  • In component 2 (indices 2,3), we have elements {3,4} in source but need {4,5} in target. We can match the 4, but we have a 3 where we need a 5, creating one mismatch.

The algorithm efficiently identifies that we can optimally rearrange to match everything except position 3, giving us the minimum Hamming distance of 1.

Solution Implementation

1from typing import List
2from collections import defaultdict, Counter
3
4class Solution:
5    def minimumHammingDistance(
6        self, source: List[int], target: List[int], allowedSwaps: List[List[int]]
7    ) -> int:
8        """
9        Calculate the minimum Hamming distance between source and target arrays
10        after performing allowed swaps.
11      
12        Args:
13            source: The source array
14            target: The target array to match
15            allowedSwaps: List of index pairs that can be swapped
16          
17        Returns:
18            The minimum Hamming distance achievable
19        """
20      
21        def find_root(index: int) -> int:
22            """
23            Find the root parent of an index using path compression.
24          
25            Args:
26                index: The index to find the root for
27              
28            Returns:
29                The root parent index
30            """
31            if parent[index] != index:
32                # Path compression: directly connect to root
33                parent[index] = find_root(parent[index])
34            return parent[index]
35      
36        # Initialize Union-Find structure
37        array_length = len(source)
38        parent = list(range(array_length))  # Each element is initially its own parent
39      
40        # Union operation: Connect all swappable indices into groups
41        for index_a, index_b in allowedSwaps:
42            # Connect the two indices by making them share the same root
43            parent[find_root(index_a)] = find_root(index_b)
44      
45        # Count frequency of values in each connected component for source array
46        component_value_counts = defaultdict(Counter)
47        for index, value in enumerate(source):
48            root = find_root(index)
49            component_value_counts[root][value] += 1
50      
51        # Calculate minimum Hamming distance
52        hamming_distance = 0
53        for index, value in enumerate(target):
54            root = find_root(index)
55            # Decrement the count of this value in its component
56            component_value_counts[root][value] -= 1
57            # If count goes negative, this position contributes to Hamming distance
58            if component_value_counts[root][value] < 0:
59                hamming_distance += 1
60      
61        return hamming_distance
62
1class Solution {
2    // Parent array for Union-Find (Disjoint Set Union) data structure
3    private int[] parent;
4
5    public int minimumHammingDistance(int[] source, int[] target, int[][] allowedSwaps) {
6        int n = source.length;
7      
8        // Initialize Union-Find: each element is its own parent initially
9        parent = new int[n];
10        for (int i = 0; i < n; i++) {
11            parent[i] = i;
12        }
13      
14        // Union operation: connect indices that can be swapped
15        // All indices in the same connected component can be freely rearranged
16        for (int[] swap : allowedSwaps) {
17            int index1 = swap[0];
18            int index2 = swap[1];
19            parent[find(index1)] = find(index2);
20        }
21      
22        // Map to store frequency of values in each connected component
23        // Key: root of component, Value: map of (value -> frequency) in source array
24        Map<Integer, Map<Integer, Integer>> componentFrequency = new HashMap<>();
25      
26        // Count frequency of each value in source array for each component
27        for (int i = 0; i < n; i++) {
28            int root = find(i);
29            componentFrequency
30                .computeIfAbsent(root, k -> new HashMap<>())
31                .merge(source[i], 1, Integer::sum);
32        }
33      
34        // Calculate minimum Hamming distance
35        int hammingDistance = 0;
36      
37        // For each position in target array, check if the value can be matched
38        // by swapping within the same connected component
39        for (int i = 0; i < n; i++) {
40            int root = find(i);
41            Map<Integer, Integer> frequencyMap = componentFrequency.get(root);
42          
43            // Decrement frequency of target[i] in the component
44            // If frequency becomes negative, it means this value cannot be matched
45            if (frequencyMap.merge(target[i], -1, Integer::sum) < 0) {
46                hammingDistance++;
47            }
48        }
49      
50        return hammingDistance;
51    }
52
53    // Find operation with path compression for Union-Find
54    private int find(int x) {
55        if (parent[x] != x) {
56            // Path compression: make all nodes point directly to root
57            parent[x] = find(parent[x]);
58        }
59        return parent[x];
60    }
61}
62
1class Solution {
2public:
3    int minimumHammingDistance(vector<int>& source, vector<int>& target, vector<vector<int>>& allowedSwaps) {
4        int n = source.size();
5      
6        // Initialize parent array for Union-Find (Disjoint Set Union)
7        vector<int> parent(n);
8        iota(parent.begin(), parent.end(), 0); // Initialize each element as its own parent
9      
10        // Find function with path compression for Union-Find
11        function<int(int)> findRoot = [&](int x) {
12            if (x == parent[x]) {
13                return x;
14            }
15            // Path compression: make every node point directly to root
16            return parent[x] = findRoot(parent[x]);
17        };
18      
19        // Union operation: merge connected components based on allowed swaps
20        for (auto& swap : allowedSwaps) {
21            int root1 = findRoot(swap[0]);
22            int root2 = findRoot(swap[1]);
23            parent[root1] = root2; // Union the two components
24        }
25      
26        // Count frequency of each value in source array for each connected component
27        // Key: component root, Value: map of (value -> frequency)
28        unordered_map<int, unordered_map<int, int>> componentValueCount;
29        for (int i = 0; i < n; ++i) {
30            int root = findRoot(i);
31            componentValueCount[root][source[i]]++;
32        }
33      
34        // Calculate minimum Hamming distance
35        int hammingDistance = 0;
36        for (int i = 0; i < n; ++i) {
37            int root = findRoot(i);
38            // Decrement the count of target[i] in the component
39            // If count becomes negative, it means we don't have enough of this value
40            // in the component to match the target
41            if (--componentValueCount[root][target[i]] < 0) {
42                hammingDistance++;
43            }
44        }
45      
46        return hammingDistance;
47    }
48};
49
1/**
2 * Calculates the minimum Hamming distance between source and target arrays
3 * after performing allowed swaps using Union-Find algorithm
4 * @param source - The source array of integers
5 * @param target - The target array of integers
6 * @param allowedSwaps - Array of pairs representing indices that can be swapped
7 * @returns The minimum Hamming distance achievable
8 */
9function minimumHammingDistance(
10    source: number[],
11    target: number[],
12    allowedSwaps: number[][],
13): number {
14    const arrayLength: number = source.length;
15  
16    // Initialize parent array for Union-Find data structure
17    // Each element initially points to itself as parent
18    const parent: number[] = Array.from({ length: arrayLength }, (_, index) => index);
19  
20    /**
21     * Find operation with path compression for Union-Find
22     * Finds the root parent of a given element
23     * @param element - The element to find root parent for
24     * @returns The root parent of the element
25     */
26    const findRoot = (element: number): number => {
27        // Path compression: make every node point directly to root
28        if (parent[element] !== element) {
29            parent[element] = findRoot(parent[element]);
30        }
31        return parent[element];
32    };
33  
34    // Union operation: connect all swappable indices into connected components
35    for (const [indexA, indexB] of allowedSwaps) {
36        // Connect the roots of both indices
37        parent[findRoot(indexA)] = findRoot(indexB);
38    }
39  
40    // Map to store frequency of values in each connected component
41    // Key: root parent, Value: Map of (value -> frequency)
42    const componentValueFrequency: Map<number, Map<number, number>> = new Map();
43  
44    // Count frequency of source values in each connected component
45    for (let i = 0; i < arrayLength; i++) {
46        const rootParent = findRoot(i);
47      
48        // Initialize frequency map for this component if not exists
49        if (!componentValueFrequency.has(rootParent)) {
50            componentValueFrequency.set(rootParent, new Map());
51        }
52      
53        const frequencyMap = componentValueFrequency.get(rootParent)!;
54        // Increment frequency count for source value at index i
55        frequencyMap.set(source[i], (frequencyMap.get(source[i]) ?? 0) + 1);
56    }
57  
58    let minimumDistance = 0;
59  
60    // Check target values and calculate mismatches
61    for (let i = 0; i < arrayLength; i++) {
62        const rootParent = findRoot(i);
63        const frequencyMap = componentValueFrequency.get(rootParent)!;
64      
65        // Decrement frequency count for target value at index i
66        frequencyMap.set(target[i], (frequencyMap.get(target[i]) ?? 0) - 1);
67      
68        // If frequency becomes negative, it means this target value
69        // doesn't have a matching source value in the same component
70        if (frequencyMap.get(target[i])! < 0) {
71            minimumDistance++;
72        }
73    }
74  
75    return minimumDistance;
76}
77

Time and Space Complexity

Time Complexity: O(n × α(n)) where n is the length of the arrays and α(n) is the inverse Ackermann function.

  • The Union-Find operations (find and union) take O(α(n)) amortized time per operation due to path compression
  • Processing allowedSwaps: For each swap pair [a, b], we perform two find operations and one union operation, resulting in O(m × α(n)) where m is the number of allowed swaps
  • First loop through source array: O(n × α(n)) as we call find for each element and update the counter
  • Second loop through target array: O(n × α(n)) as we call find for each element and update the counter
  • Overall: O((m + n) × α(n)), which simplifies to O(n × α(n)) when m ≤ n

Space Complexity: O(n)

  • Parent array p: O(n) space
  • Counter dictionary cnt: In the worst case, stores up to n elements distributed across different components, resulting in O(n) space
  • Other variables use O(1) space
  • Total space complexity: O(n)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

Pitfall 1: Incorrect Union-Find Implementation Without Path Compression

Problem: A common mistake is implementing the find function without path compression or implementing it incorrectly:

# Incorrect - No path compression
def find(x):
    while p[x] != x:
        x = p[x]
    return x

# Incorrect - Wrong path compression
def find(x):
    if p[x] != x:
        return find(p[x])  # Doesn't update p[x]
    return p[x]

Without proper path compression, the Union-Find operations can degrade to O(n) time complexity in the worst case, making the overall solution inefficient for large inputs.

Solution: Always implement path compression by updating the parent pointer during the recursive call:

def find(x):
    if p[x] != x:
        p[x] = find(p[x])  # Update parent to root directly
    return p[x]

Pitfall 2: Misunderstanding the Matching Logic

Problem: Some might try to match elements greedily position by position without considering the entire component:

# Incorrect approach
ans = 0
for i in range(n):
    root = find(i)
    if source[i] != target[i]:
        # Try to find target[i] in the same component
        # This doesn't guarantee optimal matching!
        ans += 1

This approach fails because it doesn't account for the fact that we need to optimally match ALL elements within a component simultaneously.

Solution: Use the frequency counting approach where we track available elements in each component and consume them as needed:

# Correct approach - tracks availability across the entire component
for i, x in enumerate(target):
    j = find(i)
    cnt[j][x] -= 1
    ans += cnt[j][x] < 0  # Only count if we've exhausted this value

Pitfall 3: Not Handling Empty allowedSwaps Array

Problem: If allowedSwaps is empty, each index is its own component. Some implementations might not handle this edge case properly, especially if they assume swaps always exist.

Solution: The Union-Find initialization naturally handles this case:

parent = list(range(n))  # Each index starts as its own component
# If allowedSwaps is empty, the loop doesn't execute
for a, b in allowedSwaps:
    parent[find(a)] = find(b)

Pitfall 4: Using Wrong Data Structure for Frequency Counting

Problem: Using a simple dictionary instead of defaultdict(Counter) can lead to KeyError exceptions:

# Incorrect - prone to KeyError
cnt = {}
for i, x in enumerate(source):
    j = find(i)
    if j not in cnt:
        cnt[j] = {}
    if x not in cnt[j]:
        cnt[j][x] = 0
    cnt[j][x] += 1

Solution: Use defaultdict(Counter) for cleaner and safer code:

cnt = defaultdict(Counter)
for i, x in enumerate(source):
    j = find(i)
    cnt[j][x] += 1  # Automatically handles missing keys

Pitfall 5: Confusing Component ID with Array Index

Problem: Some might incorrectly use the array index directly instead of finding its component root:

# Incorrect - uses index directly instead of component root
for i, x in enumerate(source):
    cnt[i][x] += 1  # Should be cnt[find(i)][x] += 1

This completely breaks the component grouping logic.

Solution: Always use find(i) to get the component ID:

for i, x in enumerate(source):
    j = find(i)  # Get component root first
    cnt[j][x] += 1
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Consider the classic dynamic programming of fibonacci numbers, what is the recurrence relation?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More