1722. Minimize Hamming Distance After Swap Operations
Problem Description
You have two integer arrays source and target, both with the same length n. You're also given an array allowedSwaps where each element allowedSwaps[i] = [a_i, b_i] represents that you can swap the elements at positions a_i and b_i in the source array (using 0-based indexing). You can perform these swaps multiple times and in any order you want.
The Hamming distance between two arrays of equal length is the count of positions where the elements differ. In other words, it's the number of indices i where source[i] != target[i].
Your task is to find the minimum Hamming distance between source and target after performing any number of allowed swap operations on the source array.
Because allowed swaps can be chained through connected indices, you want to rearrange the source array to match as many positions with target as possible. The positions that still don't match after optimal swapping make up the Hamming distance.
The key insight is that through the allowed swaps, certain indices become interconnected: if you can swap index a with b, and b with c, then the elements at positions a, b, and c can be rearranged among themselves in any order. This forms groups of indices where elements can be freely rearranged within each group.
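As a tiny concrete illustration of chaining (a sketch added here for clarity, not part of the original statement), suppose source = [3, 1, 2] and the allowed swaps are [0, 1] and [1, 2]. Two allowed swaps already move every element where it needs to go:

source = [3, 1, 2]
# Allowed swap (0, 1): [3, 1, 2] -> [1, 3, 2]
source[0], source[1] = source[1], source[0]
# Allowed swap (1, 2): [1, 3, 2] -> [1, 2, 3]
source[1], source[2] = source[2], source[1]
print(source)  # [1, 2, 3] - the 3 travelled from index 0 to index 2 via index 1

Even though indices 0 and 2 are never swapped directly, the shared index 1 connects them, which is exactly the "group" idea above.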
Flowchart Walkthrough
First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:
Is it a graph?
- Yes: The problem can be modeled as a graph where each index is a node, and each allowed swap [a_i, b_i] creates an edge between nodes a_i and b_i. The swaps create connections between indices.
Is it a tree?
- No: The graph formed by allowed swaps is not necessarily a tree. It can have cycles (e.g., if we can swap indices 0↔1, 1↔2, and 2↔0, this forms a cycle) and may have multiple disconnected components.
Is the problem related to directed acyclic graphs?
- No: The swap relationships are bidirectional (if we can swap a with b, we can also swap b with a), so this is an undirected graph, not a DAG.
Is the problem related to shortest paths?
- No: We're not trying to find the shortest path between nodes. Instead, we need to identify which indices belong to the same connected component.
Does the problem involve connectivity?
- Yes: The core insight is that indices connected through a chain of allowed swaps form connected components. Within each component, elements can be rearranged freely among those positions.
Disjoint Set Union
- Yes: DSU (Union-Find) is the perfect data structure for this problem. It efficiently groups indices into connected components based on the allowed swaps.
Conclusion: The flowchart leads us to use Disjoint Set Union (DSU) to identify connected components. DSU isn't DFS, but it solves the same connectivity problem that DFS would (finding connected components) and groups the indices more directly. We could alternatively use DFS to explore each connected component, but DSU provides a cleaner way of grouping indices that can exchange elements through swaps.
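For completeness, here is a minimal sketch of that DFS alternative (illustrative only; the solution below uses Union-Find): build an adjacency list from allowedSwaps and label every index with a component id.

from collections import defaultdict
from typing import List

def component_ids(n: int, allowedSwaps: List[List[int]]) -> List[int]:
    """Label each index 0..n-1 with the id of its connected component (iterative DFS)."""
    graph = defaultdict(list)
    for a, b in allowedSwaps:
        graph[a].append(b)
        graph[b].append(a)

    comp = [-1] * n
    current_id = 0
    for start in range(n):
        if comp[start] != -1:
            continue  # Already visited as part of an earlier component
        stack = [start]
        comp[start] = current_id
        while stack:
            node = stack.pop()
            for neighbor in graph[node]:
                if comp[neighbor] == -1:
                    comp[neighbor] = current_id
                    stack.append(neighbor)
        current_id += 1
    return comp

Indices that share an id can exchange elements freely, so the frequency-matching step described later works with these ids just as well as with Union-Find roots.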
Intuition
The key insight is to recognize that the allowed swaps create groups of positions where elements can move freely among themselves. If we can swap position i with j, and position j with k, then through a series of swaps we can move any element from positions i, j, or k to any of these three positions.
Think of it this way: imagine you have three cups at positions 0, 1, and 2. If you can swap cups at positions 0↔1 and 1↔2, then you can arrange the three cups in any order across these three positions. This forms a "connected component" where the elements are interchangeable.
Once we identify these connected components, the problem becomes much simpler. For each component:
- Collect all the elements from source at positions in this component
- Collect all the elements from target at positions in this component
- Match as many elements as possible between these two collections
For example, if a component has positions [0, 2, 3]:
- From source: we might have elements [1, 5, 3] at these positions
- From target: we might have elements [3, 5, 2] at these positions
Since we can rearrange source elements freely within the component, we can match 3 with 3 and 5 with 5. The element 1 from source can't match with 2 from target, contributing 1 to the Hamming distance.
The mathematical approach is to count the frequency of each element in both arrays within each component. If source has three 5's in a component but target only needs two 5's, we can match two of them. The unmatched elements contribute to the minimum Hamming distance.
This transforms the problem from "finding optimal swaps" to "finding connected components and matching elements within each component" - a much more tractable problem that we can solve efficiently using Union-Find to identify components and hash tables to count element frequencies.
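To make the per-component matching concrete, here is a small self-contained sketch (illustrative only; the actual solution below folds this into a single pass over target): the number of mismatches in one component is its size minus the size of the multiset intersection of its source and target values.

from collections import Counter

def component_mismatches(source_values, target_values):
    """Mismatches left in one component after optimally rearranging source_values."""
    matched = sum((Counter(source_values) & Counter(target_values)).values())
    return len(target_values) - matched

print(component_mismatches([1, 5, 3], [3, 5, 2]))  # 1 -> the 3 and 5 match, the 1 vs 2 does not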
Learn more about Depth-First Search and Union Find patterns.
Solution Approach
The solution uses Union-Find (Disjoint Set Union) to group indices into connected components, followed by frequency counting within each component.
Step 1: Initialize Union-Find Structure
n = len(source)
p = list(range(n))
We create a parent array p where initially each index is its own parent (each index forms its own component).
Step 2: Build Connected Components
for a, b in allowedSwaps:
    p[find(a)] = find(b)
For each allowed swap [a, b], we union the two indices by connecting their root parents. The find function with path compression ensures efficient lookups:
def find(x: int) -> int:
if p[x] != x:
p[x] = find(p[x]) # Path compression
return p[x]
Step 3: Count Source Elements by Component
cnt = defaultdict(Counter)
for i, x in enumerate(source):
j = find(i)
cnt[j][x] += 1
We use a two-dimensional hash table cnt where:
- First dimension: component ID (root parent)
- Second dimension: element value frequency counter
For each position i in source, we find its component ID j and increment the count of element source[i] in that component.
Step 4: Match with Target Elements
ans = 0
for i, x in enumerate(target):
j = find(i)
cnt[j][x] -= 1
ans += cnt[j][x] < 0
For each position i in target:
- Find its component ID j
- Decrement the count of element target[i] in that component
- If the count goes negative, it means we don't have enough of this element in source to match, so we increment the answer
The clever part is that we're essentially doing a "greedy matching": for each element needed in target, we try to use an available element from source within the same component. If cnt[j][x] becomes negative, it means we've exhausted all available copies of element x in component j, and this position contributes to the Hamming distance.
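An equivalent way to express the same computation (a sketch for intuition only, using a hypothetical comp_of labelling such as [find(i) for i in range(n)], not the code used in the implementation) is to build one Counter per component for each array and add up the target values that source cannot cover:

from collections import Counter, defaultdict

def hamming_by_components(source, target, comp_of):
    """comp_of[i] is any stable component label for index i, e.g. find(i)."""
    src_counts = defaultdict(Counter)
    tgt_counts = defaultdict(Counter)
    for i, c in enumerate(comp_of):
        src_counts[c][source[i]] += 1
        tgt_counts[c][target[i]] += 1
    # Counter subtraction keeps only positive leftovers: target values with no source partner
    return sum(sum((tgt_counts[c] - src_counts[c]).values()) for c in tgt_counts)

Both formulations count exactly the positions whose required value has run out within its component.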
Time Complexity: O((n + m) × α(n)), where n is the array length, m is the number of allowed swaps, and α is the inverse Ackermann function (practically constant).
Space Complexity: O(n) for the Union-Find structure and the frequency counters.
Example Walkthrough
Let's walk through a concrete example to understand the solution approach.
Given:
source = [1, 2, 3, 4]
target = [2, 1, 4, 5]
allowedSwaps = [[0, 1], [2, 3]]
Step 1: Initialize Union-Find
p = [0, 1, 2, 3] # Each index is its own parent initially
Step 2: Build Connected Components
Processing [0, 1]:
- find(0) = 0, find(1) = 1
- Set p[find(0)] = find(1), i.e. p[0] = 1 (index 0's root now points to index 1)
- Now p = [1, 1, 2, 3]

Processing [2, 3]:
- find(2) = 2, find(3) = 3
- Set p[find(2)] = find(3), i.e. p[2] = 3 (index 2's root now points to index 3)
- Now p = [1, 1, 3, 3]

We have two components:
- Component rooted at index 1: indices {0, 1}
- Component rooted at index 3: indices {2, 3}
Step 3: Count Source Elements by Component
Processing source array [1, 2, 3, 4]:
- Index 0: source[0] = 1, find(0) = 1 → cnt[1][1] = 1
- Index 1: source[1] = 2, find(1) = 1 → cnt[1][2] = 1
- Index 2: source[2] = 3, find(2) = 3 → cnt[3][3] = 1
- Index 3: source[3] = 4, find(3) = 3 → cnt[3][4] = 1

Current state:
cnt[1] = {1: 1, 2: 1}  # The component rooted at 1 has one 1 and one 2
cnt[3] = {3: 1, 4: 1}  # The component rooted at 3 has one 3 and one 4
Step 4: Match with Target Elements
Processing target array [2, 1, 4, 5]:
- Index 0: target[0] = 2, find(0) = 1, cnt[1][2] = 1 - 1 = 0 (we have a 2 to match!); 0 >= 0, so no increment to the answer
- Index 1: target[1] = 1, find(1) = 1, cnt[1][1] = 1 - 1 = 0 (we have a 1 to match!); 0 >= 0, so no increment to the answer
- Index 2: target[2] = 4, find(2) = 3, cnt[3][4] = 1 - 1 = 0 (we have a 4 to match!); 0 >= 0, so no increment to the answer
- Index 3: target[3] = 5, find(3) = 3, cnt[3][5] = 0 - 1 = -1 (there is no 5 in this component!); -1 < 0, so increment the answer: ans = 1

Result: Minimum Hamming distance = 1
Why this works:
- In the component containing indices 0 and 1, source provides {1, 2} and target needs {2, 1}. A perfect match is possible through swapping!
- In the component containing indices 2 and 3, source provides {3, 4} but target needs {4, 5}. We can match the 4, but we have a 3 where we need a 5, creating one mismatch.
The algorithm efficiently identifies that we can optimally rearrange to match everything except position 3, giving us the minimum Hamming distance of 1.
Solution Implementation
from typing import List
from collections import defaultdict, Counter


class Solution:
    def minimumHammingDistance(
        self, source: List[int], target: List[int], allowedSwaps: List[List[int]]
    ) -> int:
        """
        Calculate the minimum Hamming distance between source and target arrays
        after performing allowed swaps.

        Args:
            source: The source array
            target: The target array to match
            allowedSwaps: List of index pairs that can be swapped

        Returns:
            The minimum Hamming distance achievable
        """

        def find_root(index: int) -> int:
            """
            Find the root parent of an index using path compression.

            Args:
                index: The index to find the root for

            Returns:
                The root parent index
            """
            if parent[index] != index:
                # Path compression: directly connect to root
                parent[index] = find_root(parent[index])
            return parent[index]

        # Initialize Union-Find structure
        array_length = len(source)
        parent = list(range(array_length))  # Each element is initially its own parent

        # Union operation: Connect all swappable indices into groups
        for index_a, index_b in allowedSwaps:
            # Connect the two indices by making them share the same root
            parent[find_root(index_a)] = find_root(index_b)

        # Count frequency of values in each connected component for source array
        component_value_counts = defaultdict(Counter)
        for index, value in enumerate(source):
            root = find_root(index)
            component_value_counts[root][value] += 1

        # Calculate minimum Hamming distance
        hamming_distance = 0
        for index, value in enumerate(target):
            root = find_root(index)
            # Decrement the count of this value in its component
            component_value_counts[root][value] -= 1
            # If count goes negative, this position contributes to Hamming distance
            if component_value_counts[root][value] < 0:
                hamming_distance += 1

        return hamming_distance
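A quick sanity check (not part of the original listing) that runs the Python class above on the walkthrough example:

solution = Solution()
print(solution.minimumHammingDistance(
    source=[1, 2, 3, 4],
    target=[2, 1, 4, 5],
    allowedSwaps=[[0, 1], [2, 3]],
))  # Expected output: 1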
import java.util.HashMap;
import java.util.Map;

class Solution {
    // Parent array for Union-Find (Disjoint Set Union) data structure
    private int[] parent;

    public int minimumHammingDistance(int[] source, int[] target, int[][] allowedSwaps) {
        int n = source.length;

        // Initialize Union-Find: each element is its own parent initially
        parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }

        // Union operation: connect indices that can be swapped
        // All indices in the same connected component can be freely rearranged
        for (int[] swap : allowedSwaps) {
            int index1 = swap[0];
            int index2 = swap[1];
            parent[find(index1)] = find(index2);
        }

        // Map to store frequency of values in each connected component
        // Key: root of component, Value: map of (value -> frequency) in source array
        Map<Integer, Map<Integer, Integer>> componentFrequency = new HashMap<>();

        // Count frequency of each value in source array for each component
        for (int i = 0; i < n; i++) {
            int root = find(i);
            componentFrequency
                .computeIfAbsent(root, k -> new HashMap<>())
                .merge(source[i], 1, Integer::sum);
        }

        // Calculate minimum Hamming distance
        int hammingDistance = 0;

        // For each position in target array, check if the value can be matched
        // by swapping within the same connected component
        for (int i = 0; i < n; i++) {
            int root = find(i);
            Map<Integer, Integer> frequencyMap = componentFrequency.get(root);

            // Decrement frequency of target[i] in the component
            // If frequency becomes negative, it means this value cannot be matched
            if (frequencyMap.merge(target[i], -1, Integer::sum) < 0) {
                hammingDistance++;
            }
        }

        return hammingDistance;
    }

    // Find operation with path compression for Union-Find
    private int find(int x) {
        if (parent[x] != x) {
            // Path compression: make all nodes point directly to root
            parent[x] = find(parent[x]);
        }
        return parent[x];
    }
}
#include <functional>
#include <numeric>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumHammingDistance(vector<int>& source, vector<int>& target, vector<vector<int>>& allowedSwaps) {
        int n = source.size();

        // Initialize parent array for Union-Find (Disjoint Set Union)
        vector<int> parent(n);
        iota(parent.begin(), parent.end(), 0);  // Initialize each element as its own parent

        // Find function with path compression for Union-Find
        function<int(int)> findRoot = [&](int x) {
            if (x == parent[x]) {
                return x;
            }
            // Path compression: make every node point directly to root
            return parent[x] = findRoot(parent[x]);
        };

        // Union operation: merge connected components based on allowed swaps
        for (auto& swap : allowedSwaps) {
            int root1 = findRoot(swap[0]);
            int root2 = findRoot(swap[1]);
            parent[root1] = root2;  // Union the two components
        }

        // Count frequency of each value in source array for each connected component
        // Key: component root, Value: map of (value -> frequency)
        unordered_map<int, unordered_map<int, int>> componentValueCount;
        for (int i = 0; i < n; ++i) {
            int root = findRoot(i);
            componentValueCount[root][source[i]]++;
        }

        // Calculate minimum Hamming distance
        int hammingDistance = 0;
        for (int i = 0; i < n; ++i) {
            int root = findRoot(i);
            // Decrement the count of target[i] in the component
            // If count becomes negative, it means we don't have enough of this value
            // in the component to match the target
            if (--componentValueCount[root][target[i]] < 0) {
                hammingDistance++;
            }
        }

        return hammingDistance;
    }
};
/**
 * Calculates the minimum Hamming distance between source and target arrays
 * after performing allowed swaps using Union-Find algorithm
 * @param source - The source array of integers
 * @param target - The target array of integers
 * @param allowedSwaps - Array of pairs representing indices that can be swapped
 * @returns The minimum Hamming distance achievable
 */
function minimumHammingDistance(
    source: number[],
    target: number[],
    allowedSwaps: number[][],
): number {
    const arrayLength: number = source.length;

    // Initialize parent array for Union-Find data structure
    // Each element initially points to itself as parent
    const parent: number[] = Array.from({ length: arrayLength }, (_, index) => index);

    /**
     * Find operation with path compression for Union-Find
     * Finds the root parent of a given element
     * @param element - The element to find root parent for
     * @returns The root parent of the element
     */
    const findRoot = (element: number): number => {
        // Path compression: make every node point directly to root
        if (parent[element] !== element) {
            parent[element] = findRoot(parent[element]);
        }
        return parent[element];
    };

    // Union operation: connect all swappable indices into connected components
    for (const [indexA, indexB] of allowedSwaps) {
        // Connect the roots of both indices
        parent[findRoot(indexA)] = findRoot(indexB);
    }

    // Map to store frequency of values in each connected component
    // Key: root parent, Value: Map of (value -> frequency)
    const componentValueFrequency: Map<number, Map<number, number>> = new Map();

    // Count frequency of source values in each connected component
    for (let i = 0; i < arrayLength; i++) {
        const rootParent = findRoot(i);

        // Initialize frequency map for this component if not exists
        if (!componentValueFrequency.has(rootParent)) {
            componentValueFrequency.set(rootParent, new Map());
        }

        const frequencyMap = componentValueFrequency.get(rootParent)!;
        // Increment frequency count for source value at index i
        frequencyMap.set(source[i], (frequencyMap.get(source[i]) ?? 0) + 1);
    }

    let minimumDistance = 0;

    // Check target values and calculate mismatches
    for (let i = 0; i < arrayLength; i++) {
        const rootParent = findRoot(i);
        const frequencyMap = componentValueFrequency.get(rootParent)!;

        // Decrement frequency count for target value at index i
        frequencyMap.set(target[i], (frequencyMap.get(target[i]) ?? 0) - 1);

        // If frequency becomes negative, it means this target value
        // doesn't have a matching source value in the same component
        if (frequencyMap.get(target[i])! < 0) {
            minimumDistance++;
        }
    }

    return minimumDistance;
}
Time and Space Complexity
Time Complexity: O(n × α(n)), where n is the length of the arrays and α(n) is the inverse Ackermann function.
- The Union-Find operations (find and union) take O(α(n)) amortized time per operation due to path compression
- Processing allowedSwaps: for each swap pair [a, b], we perform two find operations and one union operation, resulting in O(m × α(n)) where m is the number of allowed swaps
- First loop through the source array: O(n × α(n)), as we call find for each element and update the counter
- Second loop through the target array: O(n × α(n)), as we call find for each element and update the counter
- Overall: O((m + n) × α(n)), which simplifies to O(n × α(n)) when m ≤ n
Space Complexity: O(n)
- Parent array p: O(n) space
- Counter dictionary cnt: in the worst case, stores up to n elements distributed across different components, resulting in O(n) space
- Other variables use O(1) space
- Total space complexity: O(n)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
Pitfall 1: Incorrect Union-Find Implementation Without Path Compression
Problem: A common mistake is implementing the find function without path compression, or implementing it incorrectly:
# Incorrect - No path compression
def find(x):
while p[x] != x:
x = p[x]
return x
# Incorrect - Wrong path compression
def find(x):
if p[x] != x:
return find(p[x]) # Doesn't update p[x]
return p[x]
Without proper path compression, the Union-Find operations can degrade to O(n) time complexity in the worst case, making the overall solution inefficient for large inputs.
Solution: Always implement path compression by updating the parent pointer during the recursive call:
def find(x):
if p[x] != x:
p[x] = find(p[x]) # Update parent to root directly
return p[x]
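In Python, a very deep parent chain can also exhaust the recursion limit before compression flattens it. An iterative variant (a sketch using the same global p array as the snippets above, not part of the original solution) avoids recursion while keeping the same amortized behavior:

def find(x):
    # First pass: walk up to the root
    root = x
    while p[root] != root:
        root = p[root]
    # Second pass: point every node on the path directly at the root
    while p[x] != root:
        p[x], x = root, p[x]
    return root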
Pitfall 2: Misunderstanding the Matching Logic
Problem: Some might try to match elements greedily position by position without considering the entire component:
# Incorrect approach
ans = 0
for i in range(n):
root = find(i)
if source[i] != target[i]:
# Try to find target[i] in the same component
# This doesn't guarantee optimal matching!
ans += 1
This approach fails because it doesn't account for the fact that we need to optimally match ALL elements within a component simultaneously.
Solution: Use the frequency counting approach where we track available elements in each component and consume them as needed:
# Correct approach - tracks availability across the entire component
for i, x in enumerate(target):
j = find(i)
cnt[j][x] -= 1
ans += cnt[j][x] < 0 # Only count if we've exhausted this value
Pitfall 3: Not Handling Empty allowedSwaps Array
Problem: If allowedSwaps is empty, each index is its own component. Some implementations might not handle this edge case properly, especially if they assume swaps always exist.
Solution: The Union-Find initialization naturally handles this case:
parent = list(range(n)) # Each index starts as its own component
# If allowedSwaps is empty, the loop doesn't execute
for a, b in allowedSwaps:
parent[find(a)] = find(b)
Pitfall 4: Using Wrong Data Structure for Frequency Counting
Problem: Using a plain dictionary instead of defaultdict(Counter) means every key must be initialized by hand; forgetting one of these checks raises a KeyError, and even when done correctly the code is verbose:
# Verbose and error-prone - every key must be initialized manually
cnt = {}
for i, x in enumerate(source):
j = find(i)
if j not in cnt:
cnt[j] = {}
if x not in cnt[j]:
cnt[j][x] = 0
cnt[j][x] += 1
Solution: Use defaultdict(Counter) for cleaner and safer code:
cnt = defaultdict(Counter)
for i, x in enumerate(source):
j = find(i)
cnt[j][x] += 1 # Automatically handles missing keys
Pitfall 5: Confusing Component ID with Array Index
Problem: Some might incorrectly use the array index directly instead of finding its component root:
# Incorrect - uses index directly instead of component root
for i, x in enumerate(source):
cnt[i][x] += 1 # Should be cnt[find(i)][x] += 1
This completely breaks the component grouping logic.
Solution: Always use find(i) to get the component ID:
for i, x in enumerate(source):
j = find(i) # Get component root first
cnt[j][x] += 1
Recommended Readings
- Depth-First Search (DFS)
- Union Find / Disjoint Set Union (DSU)
- Coding Interview Patterns