477. Total Hamming Distance
Problem Description
The problem is related to the concept of Hamming distance, which is a measure of difference between two integers represented in binary. More specifically, it is the number of bit positions in which the two bits are different. The problem asks us to calculate the sum of Hamming distances between all pairs of integers in an array of integers, nums
.
To tackle this problem, we are given an array of integers and we want to find the pair-wise Hamming distances and then sum all those distances together. If we have N integers in our array, there are (N*(N-1))/2 pairs, where each pair's Hamming distance contributes to the sum.
Consider an example with smaller numbers and do this task manually. For [4, 14, 2]
, represented in binary as [0100, 1110, 0010]
, the Hamming distances between each pair of numbers would be calculated and summed up like this:
- The Hamming distance between
4 (0100)
and14 (1110)
is 3. - The Hamming distance between
4 (0100)
and2 (0010)
is 1. - The Hamming distance between
14 (1110)
and2 (0010)
is 2.
The sum of all the Hamming distances is 3 + 1 + 2 = 6.
Intuition
The brute-force approach to calculating the sum of all Hamming distances between pairs in the array nums
would be to compare every pair and count the bits that are different. However, calculating pairwise Hamming distances for large arrays would be an inefficient solution with a time complexity of O(n^2 * k), where n is the number of integers in nums
and k is the number of bits.
A clever way to tackle this problem is to count, for each bit position, the number of integers in the array that have a '1' in that position and the number that have a '0'. This is based on the concept that the Hamming distance for a particular bit position across all pairs is simply the number of times '1' occurs at that position multiplied by the number of times '0' occurs at the same position, because a difference occurs only if one is 1 and the other is 0.
For instance, if at the ith bit position, we have a
numbers with '1' and b
numbers with '0', the total Hamming distance contributed from the ith bit position would be a * b
. This is true because each of the a
numbers with '1' will have a Hamming distance of 1 with each of the b
numbers with '0'. Therefore, by doing this for all bit positions (0 to 30 for 31-bit integers, since the problem mention doesn't include negative numbers that would need the 32nd bit for the sign) and summing up the results, we get the total Hamming distance for all pairs.
This method is more efficient because it only requires us to pass through all the numbers for each bit position, making it O(n*k), which is significantly faster than the brute-force method, especially for a large number of integers in nums
.
The given solution uses this efficient method to calculate the sum of all Hamming distances in the array nums
using a loop that iterates over each bit position, followed by another loop to count occurrences of '1's and '0's, and then combining those counts to calculate the contribution of each bit position to the total Hamming distance.
Learn more about Math patterns.
Solution Approach
The implementation of the solution involves a bit manipulation technique and an understanding of combinatorics.
Here are the steps of the algorithm used in the provided Python code:
- Initialize
ans
to 0. This variable will hold the sum of Hamming distances. - Use a loop to iterate over each bit position. In this case, we loop from 0 to 30, inclusive, because an integer is 32 bits in size and we assume we're dealing with positive numbers only (no need for the sign bit).
- Inside this loop, initialize two counters
a
andb
to 0. Countera
will keep track of the number of 1's andb
of the number of 0's for the current bit position across all numbers innums
. - Inner loop through each number
v
innums
:- Shift
v
right byi
bits and perform a bitwise AND with1
((v >> i) & 1
) to isolate the bit at the current bit position. - If the result is 1, increment counter
a
(since this number contributes a 1 to the current bit position). - If the result is 0, increment counter
b
(since this number contributes a 0 to the current bit position).
- Shift
- Outside the inner loop but still inside the first loop, multiply
a
andb
and add the result toans
. This is based on the observation that each pair of numbers contributes 1 to the Hamming distance sum for this bit position if one of them has a 1 and the other has a 0. The total contribution from this bit position is therefore the product of the numbers of 1's and 0's. - After completing the loops,
ans
contains the sum of the Hamming distances, and we return this value.
In terms of data structures, the algorithm uses a list to store the input numbers and two integers as counters. There are no complex data structures required. The primary pattern used is bit manipulation, specifically shifting and masking to access individual bits of integers. The algorithm's time complexity is O(n*k), where n is the number of integers in the input array and k is the number of bits we are considering (31 in this case).
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate the solution approach using the array [4, 14, 2]
. Follow along with the steps of the provided algorithm.
Step 1: Initialize ans
to 0.
Step 2: We will iterate over each bit position from 0 to 30. For simplicity, let's assume we only deal with 4-bit binary representations since the highest number here is 14. So, we loop from 0 to 3.
Step 3 & 4: Let's start with the least significant bit (LSB), i.e., bit position 0.
- Initialize counters
a
andb
to 0 for this bit position. - Inner loop over the numbers in
nums
:- For
4 (0100)
:(4 >> 0) & 1
equals0
, so we incrementb
. - For
14 (1110)
:(14 >> 0) & 1
equals0
, so we incrementb
. - For
2 (0010)
:(2 >> 0) & 1
equals0
, so we incrementb
.
- For
- After loop,
a
is0
andb
is3
for this bit position.
Step 5: Multiply a
and b
and add the result to ans
.
a * b
is0
because there are no1
s in this bit position across all numbers. So,ans
remains0
.
Step 6: Continue this process for the next bit positions (1, 2, 3).
-
Bit position 1:
a
would be 2 (4
and2
have 1 in this bit).b
would be 1 (14
has 0 in this bit).a * b
gives us2 * 1 = 2
, soans
updates to2
.
-
Bit position 2:
a
would be 1 (14
has 1 in this bit).b
would be 2 (4
and2
have 0 in this bit).a * b
gives us1 * 2 = 2
, soans
updates to4
.
-
Bit position 3:
a
would be 1 (14
has 1 in this bit).b
would be 2 (4
and2
have 0 in this bit).a * b
gives us1 * 2 = 2
, soans
updates to6
.
Step 7: After completing the loops for all bit positions, ans
is 6
, which is the sum of the Hamming distances for all pairs in the array [4, 14, 2]
.
Result: The final sum of Hamming distances computed is 6
. This result corresponds to adding the pairwise Hamming distances: 3
(from 4
and 14
), 1
(from 4
and 2
), and 2
(from 14
and 2
), as was calculated manually at the beginning of the problem description.
Solution Implementation
1class Solution:
2 def totalHammingDistance(self, nums: List[int]) -> int:
3 total_distance = 0 # Initialize total Hamming distance
4
5 # Loop over each bit position (0 to 30, 31 bits for signed 32-bit integer)
6 for bit_position in range(31):
7 count_one = count_zero = 0 # Initialize counts for 1s and 0s in the current bit position
8
9 # Iterate over each number in the input list
10 for num in nums:
11 # Right shift the number by the bit position and get the least significant bit
12 bit = (num >> bit_position) & 1
13
14 # Increment the count of ones or zeros based on the least significant bit
15 if bit:
16 count_one += 1
17 else:
18 count_zero += 1
19
20 # Update the total distance by adding the product of one's and zero's counts
21 # Each pair contributes one to the Hamming distance if one bit is 0 and the other is 1
22 total_distance += count_one * count_zero
23
24 return total_distance # Return the computed total Hamming distance
25
1class Solution {
2 public int totalHammingDistance(int[] nums) {
3 int totalDistance = 0; // Initialize a variable to hold the total Hamming distance.
4
5 // Iterate over each bit position (0 to 30 since the problem statement implies 32-bit integers and excludes the sign bit).
6 for (int i = 0; i < 31; ++i) {
7 int countOnes = 0; // Counter for the number of 1s at the ith bit position across all numbers.
8 int countZeros = 0; // Counter for the number of 0s at the ith bit position across all numbers.
9
10 // Check each number in the array for the ith bit.
11 for (int num : nums) {
12 // Shift the number i bits to the right and check if the least significant bit is 1.
13 int bit = (num >> i) & 1;
14
15 // Increment the respective counter based on the bit value (1 or 0).
16 if (bit == 1) {
17 countOnes++;
18 } else {
19 countZeros++;
20 }
21 }
22
23 // The Hamming distance for the ith bit is the product of the number of 1s and 0s.
24 // Each pair of different bits (one 1 and one 0) at the ith position contributes to one Hamming distance.
25 totalDistance += countOnes * countZeros;
26 }
27
28 return totalDistance; // Return the sum of the Hamming distances across all bit positions.
29 }
30}
31
1#include <vector>
2
3class Solution {
4public:
5 int totalHammingDistance(std::vector<int>& nums) {
6 int totalDistance = 0; // This will accumulate the total Hamming distance.
7
8 // Loop over each bit position (integer in C++ is typically 32 bits, but the problem can be assuming 31-bit integers).
9 for (int bitPosition = 0; bitPosition < 31; ++bitPosition) {
10 int countOnes = 0; // Number of elements with the current bit set to 1.
11 int countZeros = 0; // Number of elements with the current bit set to 0.
12
13 // Iterate over all the numbers in the array.
14 for (int num : nums) {
15 // Isolate the bit at the current position 'bitPosition'.
16 int bitValue = (num >> bitPosition) & 1;
17
18 // Increment the count of ones or zeros based on the bit value.
19 if (bitValue == 1) {
20 countOnes++;
21 } else {
22 countZeros++;
23 }
24 }
25
26 // The Hamming distance contributed by the current bit position is
27 // the product of the count of ones and the count of zeros,
28 // since each 1 can form a pair with each 0 resulting in a difference.
29 totalDistance += countOnes * countZeros;
30 }
31 return totalDistance; // Return the computed total Hamming distance.
32 }
33};
34
1// This function calculates the total Hamming distance between all pairs of numbers in an array.
2function totalHammingDistance(nums: number[]): number {
3 let totalDistance = 0; // This will accumulate the total Hamming distance.
4
5 // Loop over each bit position (integer in TypeScript is typically 32 bits).
6 for (let bitPosition = 0; bitPosition < 31; ++bitPosition) {
7 let countOnes = 0; // Number of elements with the current bit set to 1.
8 let countZeros = 0; // Number of elements with the current bit set to 0.
9
10 // Iterate over all the numbers in the array.
11 for (const num of nums) {
12 // Isolate the bit at the current position 'bitPosition'.
13 const bitValue = (num >> bitPosition) & 1;
14
15 // Increment the count of ones or zeros based on the bit value.
16 if (bitValue === 1) {
17 countOnes++;
18 } else {
19 countZeros++;
20 }
21 }
22
23 // The Hamming distance contributed by the current bit position is
24 // the product of the count of ones and the count of zeros,
25 // since each 1 can form a pair with each 0 resulting in a difference.
26 totalDistance += countOnes * countZeros;
27 }
28
29 // Return the computed total Hamming distance.
30 return totalDistance;
31}
32
Time and Space Complexity
The provided Python code calculates the total Hamming distance between all pairs of integers in an input array. Each integer is represented in 32-bit binary form (assuming a maximum of 31 usable bits due to the problem constraints). Here's the analysis of the complexities:
Time Complexity:
To analyze the time complexity, we observe that the code consists of two nested loops:
- The outer loop runs a fixed 31 times, corresponding to the number of bits in a 32-bit integer (minus the sign bit).
- The inner loop iterates through every element in the input list
nums
.
Since the outer loop is constant, we are mostly concerned with the inner loop, which has a runtime proportional to n
, where n
is the length of nums
. Iterating through nums
happens once for each bit, so the overall time complexity is O(n * 31), which simplifies to O(n)
.
Thus, the time complexity of the code is O(n)
.
Space Complexity:
The space complexity is determined by the amount of extra space used by the program, which does not grow with the input. The variables ans
, a
, b
, and t
use a fixed amount of space regardless of the input size. No additional data structures that scale with the input size are being used.
Hence, the space complexity of the code is O(1)
.
Learn more about how to find time and space complexity quickly using problem constraints.
In a binary min heap, the minimum element can be found in:
Recommended Readings
Math for Technical Interviews How much math do I need to know for technical interviews The short answer is about high school level math Computer science is often associated with math and some universities even place their computer science department under the math faculty However the reality is that you
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.