Facebook Pixel

3805. Count Caesar Cipher Pairs

MediumArrayHash TableMathStringCounting
LeetCode ↗

Problem Description

You are given an array words containing n strings. Every string in this array has the same length m and is made up of only lowercase English letters.

We say two strings s and t are similar if we can make them equal by applying the following operation any number of times (including zero times):

  • Pick either s or t.
  • Replace every letter in the chosen string with the next letter in the alphabet, wrapping around cyclically. After 'z', the next letter is 'a'.

The key idea here is that the operation shifts all characters of a string by the same amount at once. So two strings are similar if one can be obtained from the other by shifting every character by some common offset (cyclically).

Your task is to count the number of index pairs (i, j) that satisfy both of these conditions:

  • i < j
  • words[i] and words[j] are similar.

Return an integer representing the total number of such pairs.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

How We Pick the Algorithm

Why Hash Table / Counting?

This problem maps to Hash Table / Counting through a short path in the full flowchart.

Linkedlist?noFastlookup orcounting?yesHash Table /Counting

Storing seen values in a hash table provides constant-time lookup and counting.

Open in Flowchart

Intuition

The operation always shifts every character of a string by the same amount at the same time. This means that no matter how many times we apply the operation, the relative differences between consecutive characters of a string never change. Only the absolute position of all characters changes together.

For example, if we have the string "abc", applying the operation once gives "bcd", then "cde", and so on. Notice that in all of these, the gap from the first character to the second is always 1, and from the second to the third is always 1. These gaps stay fixed forever.

This gives us a powerful insight: two strings are similar if and only if they have the same pattern of differences between characters. The actual starting letter does not matter, because we can always shift one string until its starting letter matches the other.

So the trick is to convert each string into a canonical form, a single representative form that is identical for all strings similar to each other. The natural way to do this is to "anchor" the first character to a fixed letter (here we choose 'z'), and then shift all the other characters by the same offset k = 'z' - s[0]. After this transformation, any two similar strings collapse into exactly the same string.

Once every string is reduced to its canonical form, the problem becomes simple counting. We use a hash table to count how many strings map to each canonical form. If a particular form appears v times, then any two of those strings form a valid similar pair. The number of such pairs is the combination C(v, 2) = v * (v - 1) / 2. Summing this value over all distinct forms gives the final answer.

Pattern Learn more about Math patterns.

Solution Approach

We use String Transformation + Counting to solve this problem efficiently.

Data structure used: a hash table cnt (implemented with defaultdict(int)) that maps each canonical form of a string to the number of times it appears.

Step-by-step walkthrough:

  1. Iterate over every string s in words. For each one, we build its canonical form so that all strings similar to it become identical.

  2. Compute the offset. We decide to anchor the first character to 'z'. The amount we need to shift the first character is k = ord('z') - ord(t[0]). This same offset k must be applied to every character, because the operation always shifts all characters by the same value.

  3. Shift the remaining characters. For each index i from 1 to len(t) - 1, we apply the offset cyclically:

    t[i] = chr((ord(t[i]) - ord('a') + k) % 26 + ord('a'))

    Here we map the character into the range 0..25, add the offset k, take modulo 26 to wrap around the alphabet, and convert it back to a letter.

  4. Anchor the first character. Since the first character was always meant to become 'z', we directly set t[0] = 'z'. This avoids recomputing it and guarantees the anchor is consistent.

  5. Record the canonical form. We join the list back into a string with "".join(t) and increment its count in the hash table: cnt["".join(t)] += 1.

  6. Count the pairs. After processing all strings, each distinct canonical form with count v contributes v * (v - 1) / 2 pairs, since any two of those v strings are similar to each other. We sum this over all values in the hash table:

    sum(v * (v - 1) // 2 for v in cnt.values())

Complexity analysis:

  • Time complexity: O(n * m), where n is the number of strings and m is the length of each string. We process each character of every string exactly once when building the canonical forms.
  • Space complexity: O(n * m), used by the hash table to store the canonical form of each string.

Example Walkthrough

Let's trace through a small example to see how the solution approach works.

Input: words = ["abc", "bcd", "xyz", "acb"]

We want to count pairs (i, j) with i < j where the two strings are similar.

We'll process each string by anchoring its first character to 'z', then shifting every other character by the same offset k.


String 0: "abc"

  • First character is 'a'. Offset: k = ord('z') - ord('a') = 25 - 0 = 25.
  • Shift index 1 ('b'): (1 + 25) % 26 = 26 % 26 = 0'a'.
  • Shift index 2 ('c'): (2 + 25) % 26 = 27 % 26 = 1'b'.
  • Anchor first character to 'z'.
  • Canonical form: "zab"cnt = {"zab": 1}

String 1: "bcd"

  • First character is 'b'. Offset: k = 25 - 1 = 24.
  • Shift index 1 ('c'): (2 + 24) % 26 = 26 % 26 = 0'a'.
  • Shift index 2 ('d'): (3 + 24) % 26 = 27 % 26 = 1'b'.
  • Anchor first character to 'z'.
  • Canonical form: "zab"cnt = {"zab": 2}

Notice "abc" and "bcd" collapse to the same form "zab". That makes sense: shifting "abc" once gives "bcd", so they are similar.


String 2: "xyz"

  • First character is 'x'. Offset: k = 25 - 23 = 2.
  • Shift index 1 ('y'): (24 + 2) % 26 = 26 % 26 = 0'a'.
  • Shift index 2 ('z'): (25 + 2) % 26 = 27 % 26 = 1'b'.
  • Anchor first character to 'z'.
  • Canonical form: "zab"cnt = {"zab": 3}

"xyz" also has consecutive +1 gaps, so it joins the same group.


String 3: "acb"

  • First character is 'a'. Offset: k = 25 - 0 = 25.
  • Shift index 1 ('c'): (2 + 25) % 26 = 27 % 26 = 1'b'.
  • Shift index 2 ('b'): (1 + 25) % 26 = 26 % 26 = 0'a'.
  • Anchor first character to 'z'.
  • Canonical form: "zba"cnt = {"zab": 3, "zba": 1}

The gaps here are different (+2, then -1), so it lands in its own group.


Counting the pairs

Now we apply C(v, 2) = v * (v - 1) / 2 to each group:

Canonical formCount vPairs v(v-1)/2
"zab"33 * 2 / 2 = 3
"zba"11 * 0 / 2 = 0

Total = 3 + 0 = 3


Verifying the answer

The group {"abc", "bcd", "xyz"} at indices 0, 1, 2 produces these pairs:

  • (0, 1)"abc" & "bcd"
  • (0, 2)"abc" & "xyz"
  • (1, 2)"bcd" & "xyz"

That's exactly 3 pairs, matching our computed result. The string "acb" is similar to none of the others, so it contributes nothing — consistent with its singleton group.

Solution Implementation

1from collections import defaultdict
2from typing import List
3
4
5class Solution:
6    def countPairs(self, words: List[str]) -> int:
7        # Map each word's canonical (shift-normalized) form to its frequency.
8        canonical_count: defaultdict[str, int] = defaultdict(int)
9
10        for word in words:
11            chars = list(word)
12
13            # Offset needed to shift the first character up to 'z'.
14            # This normalizes every word relative to its first letter, so that
15            # words equal under a uniform cyclic alphabet shift collapse together.
16            shift = ord("z") - ord(chars[0])
17
18            # Apply the same cyclic shift to the remaining characters.
19            for i in range(1, len(chars)):
20                shifted_value = (ord(chars[i]) - ord("a") + shift) % 26
21                chars[i] = chr(shifted_value + ord("a"))
22
23            # Anchor the first character to 'z' as the fixed reference point.
24            chars[0] = "z"
25
26            # Accumulate the count for this canonical form.
27            canonical_count["".join(chars)] += 1
28
29        # Within each group of size v, the number of valid pairs is C(v, 2).
30        return sum(group_size * (group_size - 1) // 2 for group_size in canonical_count.values())
31
1class Solution {
2    public long countPairs(String[] words) {
3        // Map to store the count of each normalized word form
4        Map<String, Integer> countMap = new HashMap<>();
5
6        // Total number of matching pairs
7        long answer = 0;
8
9        // Normalize every word and accumulate its occurrence count
10        for (String word : words) {
11            char[] chars = word.toCharArray();
12
13            // Compute the shift that maps the first character to 'z'
14            int shift = 'z' - chars[0];
15
16            // Apply the same circular shift to all remaining characters,
17            // so that words differing only by a uniform shift collapse
18            // to the same canonical (normalized) representation
19            for (int i = 1; i < chars.length; i++) {
20                chars[i] = (char) ('a' + (chars[i] - 'a' + shift) % 26);
21            }
22
23            // Fix the first character to 'z' as the anchor of the normalized form
24            chars[0] = 'z';
25
26            // Increase the count for this normalized word
27            countMap.merge(new String(chars), 1, Integer::sum);
28        }
29
30        // For each group of identical normalized words of size v,
31        // the number of pairs is C(v, 2) = v * (v - 1) / 2
32        for (int value : countMap.values()) {
33            answer += 1L * value * (value - 1) / 2;
34        }
35
36        return answer;
37    }
38}
39
1class Solution {
2public:
3    long long countPairs(vector<string>& words) {
4        // Map from a normalized string form to the number of words sharing it
5        unordered_map<string, int> groupCount;
6        long long ans = 0;
7
8        for (auto& word : words) {
9            string normalized = word;
10
11            // Compute the shift that maps the first character to 'z'
12            int shift = 'z' - normalized[0];
13
14            // Apply the same circular shift to every subsequent character,
15            // preserving the relative pattern between characters
16            for (int i = 1; i < static_cast<int>(normalized.size()); i++) {
17                normalized[i] = 'a' + (normalized[i] - 'a' + shift) % 26;
18            }
19
20            // Anchor the first character to 'z' for all words
21            normalized[0] = 'z';
22
23            // Group words by their normalized representation
24            groupCount[normalized]++;
25        }
26
27        // For each group of size v, the number of valid pairs is v * (v - 1) / 2
28        for (auto& [key, count] : groupCount) {
29            ans += 1LL * count * (count - 1) / 2;
30        }
31
32        return ans;
33    }
34};
35
1/**
2 * Counts the number of pairs of words that share the same "shift pattern".
3 *
4 * Two words are considered a matching pair if one can be transformed into the
5 * other by applying a single uniform Caesar shift (mod 26) to every character.
6 * To detect this, each word is normalized: a shift `k` is chosen so the first
7 * character maps to 'z', and the same shift is applied to the remaining
8 * characters. Words producing the same normalized key belong to the same group.
9 *
10 * @param words - The list of input words.
11 * @returns The number of valid pairs.
12 */
13function countPairs(words: string[]): number {
14    // Maps each normalized key to the count of words producing that key.
15    const keyCount = new Map<string, number>();
16    let answer = 0;
17
18    for (const word of words) {
19        const chars = word.split('');
20
21        // Shift amount needed to move the first character to 'z'.
22        const shift = 'z'.charCodeAt(0) - chars[0].charCodeAt(0);
23
24        // Apply the same shift (mod 26) to every character after the first.
25        for (let i = 1; i < chars.length; i++) {
26            const shifted = (chars[i].charCodeAt(0) - 97 + shift) % 26;
27            chars[i] = String.fromCharCode(97 + shifted);
28        }
29
30        // The first character always normalizes to 'z'.
31        chars[0] = 'z';
32
33        // Build the normalized key and increment its occurrence count.
34        const key = chars.join('');
35        keyCount.set(key, (keyCount.get(key) || 0) + 1);
36    }
37
38    // For each group of size v, there are C(v, 2) = v * (v - 1) / 2 pairs.
39    for (const count of keyCount.values()) {
40        answer += (count * (count - 1)) / 2;
41    }
42
43    return answer;
44}
45

Time and Space Complexity

  • Time Complexity: O(n × m), where n is the length of the string array words, and m is the maximum length of the strings. For each of the n strings, we iterate over all its characters to construct the normalized form t (the loop for i in range(1, len(t)) and the "".join(t) operation both cost O(m)). Computing the dictionary key and updating the count via cnt[...] += 1 requires hashing the string of length m, which also costs O(m). The final summation sum(v * (v - 1) // 2 for v in cnt.values()) iterates over at most n entries, costing O(n). Therefore the overall time complexity is O(n × m).

  • Space Complexity: O(n × m). The dictionary cnt may store up to n distinct keys, each being a normalized string of length up to m, requiring O(n × m) space in the worst case. The temporary list t and the joined string occupy O(m) space per iteration, which is dominated by the dictionary. Hence the overall space complexity is O(n × m).

Pattern Learn more about how to find time and space complexity quickly.

Common Pitfalls

Pitfall: Computing the shift offset based on the wrong anchor or mixing up the shift direction

The most frequent mistake in this solution is inconsistently choosing the anchor, which breaks the whole normalization scheme. The canonical-form approach only works if every word is normalized using the same rule. A subtle bug appears when you anchor the first character to 'z' but then forget that the offset shift must be the same value applied to all characters, including the first.

Consider this buggy variation:

for word in words:
    chars = list(word)
    shift = ord("z") - ord(chars[0])
    # BUG: applying shift starting from index 0 but using a different formula,
    #      or computing shift relative to 'a' instead of the first char
    for i in range(len(chars)):
        shifted_value = (ord(chars[i]) - ord("a") + shift) % 26
        chars[i] = chr(shifted_value + ord("a"))
    canonical_count["".join(chars)] += 1

This actually works and is arguably cleaner — but the danger is when people mix the two styles: they compute the loop over range(1, len) (skipping index 0) and also forget to set chars[0] = "z". In that case index 0 keeps its original (un-shifted) value, so two genuinely similar words map to different canonical forms and pairs are undercounted.

Why it happens

The original code splits the work into two parts:

  1. Loop shifts indices 1..m-1.
  2. Index 0 is set directly with chars[0] = "z".

If a refactor drops step 2 (a very common edit), the first character is left untouched and the canonical form is corrupted.

Concrete failing example

Take words = ["ab", "bc"]. These are similar (shift "ab" by 1 → "bc").

  • Correct: both normalize to "zb"-style anchored form → 1 pair.
  • Buggy (missing chars[0] = "z"): "ab"chars[0]='a', and "bc"chars[0]='b', giving different canonical strings → 0 pairs (wrong).

Solution

Make the normalization a single, self-consistent transformation so there is no separate step that can be forgotten. Shift all characters (including index 0) with one uniform loop:

from collections import defaultdict
from typing import List


class Solution:
    def countPairs(self, words: List[str]) -> int:
        canonical_count: defaultdict[str, int] = defaultdict(int)

        for word in words:
            # Offset that maps the first character to 'a' (any fixed anchor works,
            # as long as it is identical for every word).
            shift = (ord("a") - ord(word[0])) % 26

            # One uniform loop over ALL characters — no special-casing index 0.
            canonical = "".join(
                chr((ord(ch) - ord("a") + shift) % 26 + ord("a"))
                for ch in word
            )

            canonical_count[canonical] += 1

        return sum(v * (v - 1) // 2 for v in canonical_count.values())

Why this is safer

  • No special case: Index 0 is handled by the same formula as every other character, so there is no separate assignment to forget.
  • Uniform offset: shift is computed once and applied to all positions, exactly mirroring the problem's "shift every letter by the same amount" rule.
  • Anchor choice is irrelevant: Whether you anchor the first char to 'a' or 'z', the grouping is identical — what matters is that the relative differences between characters are preserved, and the same anchor is used for every word.

Related pitfall: Integer overflow on the pair count

In languages like C++ or Java, v * (v - 1) / 2 can overflow a 32-bit integer when many identical words exist (e.g., v ≈ 10^5 gives ~5 * 10^9). Always use a 64-bit type (long/long long) for the accumulator. In Python this is a non-issue since integers are arbitrary precision, but it is worth flagging when porting the solution.

Ready to land your dream job?

Unlock your dream job with a 5-minute quiz for a personalized study roadmap!

Get My Roadmap
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Get a Personalized Study Roadmap:

What's the output of running the following function using input 56?

1KEYBOARD = {
2    '2': 'abc',
3    '3': 'def',
4    '4': 'ghi',
5    '5': 'jkl',
6    '6': 'mno',
7    '7': 'pqrs',
8    '8': 'tuv',
9    '9': 'wxyz',
10}
11
12def letter_combinations_of_phone_number(digits):
13    def dfs(path, res):
14        if len(path) == len(digits):
15            res.append(''.join(path))
16            return
17
18        next_number = digits[len(path)]
19        for letter in KEYBOARD[next_number]:
20            path.append(letter)
21            dfs(path, res)
22            path.pop()
23
24    res = []
25    dfs([], res)
26    return res
27
1private static final Map<Character, char[]> KEYBOARD = Map.of(
2    '2', "abc".toCharArray(),
3    '3', "def".toCharArray(),
4    '4', "ghi".toCharArray(),
5    '5', "jkl".toCharArray(),
6    '6', "mno".toCharArray(),
7    '7', "pqrs".toCharArray(),
8    '8', "tuv".toCharArray(),
9    '9', "wxyz".toCharArray()
10);
11
12public static List<String> letterCombinationsOfPhoneNumber(String digits) {
13    List<String> res = new ArrayList<>();
14    dfs(new StringBuilder(), res, digits.toCharArray());
15    return res;
16}
17
18private static void dfs(StringBuilder path, List<String> res, char[] digits) {
19    if (path.length() == digits.length) {
20        res.add(path.toString());
21        return;
22    }
23    char next_digit = digits[path.length()];
24    for (char letter : KEYBOARD.get(next_digit)) {
25        path.append(letter);
26        dfs(path, res, digits);
27        path.deleteCharAt(path.length() - 1);
28    }
29}
30
1const KEYBOARD = {
2    '2': 'abc',
3    '3': 'def',
4    '4': 'ghi',
5    '5': 'jkl',
6    '6': 'mno',
7    '7': 'pqrs',
8    '8': 'tuv',
9    '9': 'wxyz',
10}
11
12function letter_combinations_of_phone_number(digits) {
13    let res = [];
14    dfs(digits, [], res);
15    return res;
16}
17
18function dfs(digits, path, res) {
19    if (path.length === digits.length) {
20        res.push(path.join(''));
21        return;
22    }
23    let next_number = digits.charAt(path.length);
24    for (let letter of KEYBOARD[next_number]) {
25        path.push(letter);
26        dfs(digits, path, res);
27        path.pop();
28    }
29}
30

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More