Facebook Pixel

271. Encode and Decode Strings 🔒

MediumDesignArrayString
Leetcode Link

Problem Description

This problem asks you to design an encoding and decoding system for a list of strings. You need to implement two functions:

  1. encode: Takes a list of strings and converts it into a single string
  2. decode: Takes the encoded string and converts it back to the original list of strings

The key challenge is that the strings in the list can contain any characters, including special characters, spaces, or even characters that might typically be used as delimiters. Your encoding scheme must be able to handle all possible string contents and accurately reconstruct the original list.

Example workflow:

  • Machine 1 has a list of strings like ["Hello", "World"]
  • It encodes this list into a single string using your encode function
  • This encoded string is sent over the network to Machine 2
  • Machine 2 uses your decode function to reconstruct the exact original list ["Hello", "World"]

Solution Approach:

The solution uses a length-prefixing technique:

Encoding Process:

  • For each string in the list, first write its length as a fixed 4-digit number (padded with leading zeros if necessary)
  • Then append the actual string content
  • Concatenate all these encoded pieces together

For example, ["Hello", "World"] becomes "0005Hello0005World" where:

  • "0005" indicates the next string has length 5
  • "Hello" is the first string
  • "0005" indicates the next string has length 5
  • "World" is the second string

Decoding Process:

  • Read the first 4 characters to get the length of the next string
  • Extract that many characters as one string from the original list
  • Repeat until the entire encoded string is processed

This approach works because:

  • The length prefix tells us exactly how many characters belong to each string
  • We don't need to worry about delimiter conflicts since we're using length information rather than special separator characters
  • The fixed 4-digit format ensures we always know where each length value ends
Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The main challenge in this problem is finding a way to combine multiple strings into one while preserving the boundaries between them. Let's think through why common approaches might fail:

Why simple delimiters don't work: If we try to use a delimiter like "," or "|" to separate strings, what happens when the original strings contain these delimiters? For example, if we encode ["hello,world", "test"] using commas, we'd get "hello,world,test". When decoding, we can't tell if this should be ["hello", "world", "test"] or ["hello,world", "test"].

Why escape sequences are complex: We could use escape sequences (like replacing "," with "\," in the strings), but this requires careful handling of the escape character itself and makes the implementation more error-prone.

The key insight - use metadata: Instead of trying to find a "safe" delimiter that won't appear in the strings, we can store metadata about each string. The most useful metadata is the string's length. If we know a string is exactly 5 characters long, we can read exactly 5 characters without worrying about what those characters are.

Why fixed-width length encoding: We encode the length as a fixed 4-digit number because:

  • If we used variable-length numbers, we'd face the same delimiter problem (how do we know where the length ends and the string begins?)
  • With 4 digits, we can handle strings up to length 9999, which is sufficient for most practical cases
  • We always know to read exactly 4 characters to get the length, then read exactly that many more characters to get the string

This approach elegantly sidesteps the delimiter problem entirely. Each string is self-contained with its length prefix, and we can decode the entire list by repeatedly reading length-value pairs until we've consumed the entire encoded string.

Solution Approach

Let's walk through the implementation step by step:

Encoding Algorithm

The encoding process transforms a list of strings into a single string by prefixing each string with its length:

  1. Initialize an empty list ans to collect encoded parts
  2. For each string s in the input list:
    • Calculate the length of the string using len(s)
    • Format this length as a 4-digit string using "{:4}".format(len(s))
      • This creates a string with exactly 4 characters, padding with spaces if needed
      • For example: length 5 becomes " 5", length 123 becomes " 123"
    • Concatenate the 4-digit length with the actual string content
    • Append this to the ans list
  3. Join all parts using "".join(ans) to create the final encoded string

Example encoding process:

  • Input: ["Hello", "World"]
  • Process string "Hello": length is 5 → " 5Hello"
  • Process string "World": length is 5 → " 5World"
  • Final encoded string: " 5Hello 5World"

Decoding Algorithm

The decoding process extracts strings from the encoded format by reading length prefixes:

  1. Initialize variables:

    • ans = empty list to store decoded strings
    • i = 0 (current position in the encoded string)
    • n = length of the encoded string
  2. While we haven't processed the entire string (i < n):

    • Read the length prefix: Extract 4 characters starting at position i using s[i : i + 4]
    • Convert to integer: Parse these 4 characters as an integer to get the string length
    • Move pointer: Advance i by 4 to skip past the length prefix
    • Extract the string: Read size characters starting at position i using s[i : i + size]
    • Store the string: Append the extracted string to ans
    • Move pointer: Advance i by size to move to the next encoded string
  3. Return the list of decoded strings

Example decoding process:

  • Input: " 5Hello 5World"
  • Read first 4 chars: " 5" → length = 5
  • Read next 5 chars: "Hello" → first string
  • Read next 4 chars: " 5" → length = 5
  • Read next 5 chars: "World" → second string
  • Result: ["Hello", "World"]

Key Design Decisions

  • Fixed-width length encoding (4 digits): Ensures we always know exactly how many characters represent the length
  • No delimiter characters: Completely avoids the problem of strings containing delimiter characters
  • Sequential processing: Both encoding and decoding process strings in order, maintaining the original sequence
  • String slicing: Python's slice notation s[i:j] efficiently extracts substrings without additional memory allocation for intermediate results

This approach has O(n) time complexity for both encoding and decoding, where n is the total length of all strings combined.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through encoding and decoding the list ["Hi", "I'm", "Sam!"]:

Encoding Process:

  1. Start with empty result: ""

  2. Process "Hi" (length = 2):

    • Format length as 4 digits: " 2" (2 with 3 leading spaces)
    • Append string: " 2Hi"
    • Result so far: " 2Hi"
  3. Process "I'm" (length = 3):

    • Format length as 4 digits: " 3"
    • Append string: " 3I'm"
    • Result so far: " 2Hi 3I'm"
  4. Process "Sam!" (length = 4):

    • Format length as 4 digits: " 4"
    • Append string: " 4Sam!"
    • Final encoded result: " 2Hi 3I'm 4Sam!"

Decoding Process:

Starting with encoded string: " 2Hi 3I'm 4Sam!" (total length = 18)

  1. Position i = 0:

    • Read 4 chars at position 0-3: " 2" → length = 2
    • Move to position 4
    • Read 2 chars at position 4-5: "Hi"
    • Add "Hi" to result list
    • Move to position 6
  2. Position i = 6:

    • Read 4 chars at position 6-9: " 3" → length = 3
    • Move to position 10
    • Read 3 chars at position 10-12: "I'm"
    • Add "I'm" to result list
    • Move to position 13
  3. Position i = 13:

    • Read 4 chars at position 13-16: " 4" → length = 4
    • Move to position 17
    • Read 4 chars at position 17-20: "Sam!"
    • Add "Sam!" to result list
    • Move to position 21
  4. Position i = 21 ≥ 18 (string length), so stop

Final decoded result: ["Hi", "I'm", "Sam!"]

Notice how the special characters (apostrophe and exclamation mark) are handled naturally without any escaping needed, since we rely on length prefixes rather than delimiters.

Solution Implementation

1from typing import List
2
3
4class Codec:
5    def encode(self, strs: List[str]) -> str:
6        """Encodes a list of strings to a single string.
7      
8        Args:
9            strs: List of strings to encode
10          
11        Returns:
12            A single encoded string containing all input strings
13        """
14        encoded_parts = []
15      
16        # For each string, prepend its length as a 4-character fixed-width field
17        for string in strs:
18            # Format length as 4 digits with leading spaces if needed
19            length_prefix = f"{len(string):4}"
20            # Append length prefix followed by the actual string
21            encoded_parts.append(length_prefix + string)
22      
23        # Join all encoded parts into a single string
24        return "".join(encoded_parts)
25
26    def decode(self, s: str) -> List[str]:
27        """Decodes a single string to a list of strings.
28      
29        Args:
30            s: The encoded string to decode
31          
32        Returns:
33            List of decoded strings
34        """
35        decoded_strings = []
36        index = 0
37        total_length = len(s)
38      
39        # Process the encoded string by reading length prefixes and extracting strings
40        while index < total_length:
41            # Read the 4-character length prefix
42            string_length = int(s[index:index + 4])
43            index += 4
44          
45            # Extract the string of the specified length
46            decoded_string = s[index:index + string_length]
47            decoded_strings.append(decoded_string)
48            index += string_length
49      
50        return decoded_strings
51
52
53# Your Codec object will be instantiated and called as such:
54# codec = Codec()
55# codec.decode(codec.encode(strs))
56
1public class Codec {
2  
3    /**
4     * Encodes a list of strings to a single string.
5     * Format: For each string, append "length#string"
6     * Example: ["abc", "def"] -> "3#abc3#def"
7     * 
8     * @param strs List of strings to encode
9     * @return Encoded single string
10     */
11    public String encode(List<String> strs) {
12        StringBuilder encodedString = new StringBuilder();
13      
14        // For each string, append its length followed by a delimiter and the string itself
15        for (String str : strs) {
16            encodedString.append(str.length())
17                         .append('#')
18                         .append(str);
19        }
20      
21        return encodedString.toString();
22    }
23  
24    /**
25     * Decodes a single string to a list of strings.
26     * Parses the format "length#string" for each encoded string
27     * 
28     * @param s Encoded string to decode
29     * @return List of decoded strings
30     */
31    public List<String> decode(String s) {
32        List<String> decodedStrings = new ArrayList<>();
33        int currentIndex = 0;
34        int totalLength = s.length();
35      
36        // Process the encoded string until we reach the end
37        while (currentIndex < totalLength) {
38            // Find the delimiter '#' to extract the length
39            int delimiterIndex = s.indexOf('#', currentIndex);
40          
41            // Parse the length value
42            int stringLength = Integer.parseInt(s.substring(currentIndex, delimiterIndex));
43          
44            // Extract the string based on its length
45            int stringStartIndex = delimiterIndex + 1;
46            int stringEndIndex = stringStartIndex + stringLength;
47            decodedStrings.add(s.substring(stringStartIndex, stringEndIndex));
48          
49            // Move to the next encoded string
50            currentIndex = stringEndIndex;
51        }
52      
53        return decodedStrings;
54    }
55}
56```
57
58## Solution 2: Keeping closer to original approach (limited to strings of length 65,535)
59
60```java
61public class Codec {
62  
63    /**
64     * Encodes a list of strings to a single string.
65     * Format: For each string, prepend its length as a character
66     * Note: This approach limits string length to 65,535 characters
67     * 
68     * @param strs List of strings to encode
69     * @return Encoded single string
70     */
71    public String encode(List<String> strs) {
72        StringBuilder encodedString = new StringBuilder();
73      
74        // Iterate through each string in the list
75        for (String str : strs) {
76            // Cast length to char (stores as 2-byte character)
77            // WARNING: This limits string length to Character.MAX_VALUE (65,535)
78            encodedString.append((char) str.length());
79          
80            // Append the actual string content
81            encodedString.append(str);
82        }
83      
84        return encodedString.toString();
85    }
86  
87    /**
88     * Decodes a single string to a list of strings.
89     * Reads the length from the first character, then extracts the string
90     * 
91     * @param s Encoded string to decode
92     * @return List of decoded strings
93     */
94    public List<String> decode(String s) {
95        List<String> decodedStrings = new ArrayList<>();
96        int currentIndex = 0;
97        int totalLength = s.length();
98      
99        // Process the encoded string until we reach the end
100        while (currentIndex < totalLength) {
101            // Read the length stored as a character
102            int stringLength = s.charAt(currentIndex);
103            currentIndex++;
104          
105            // Extract the string of the specified length
106            String extractedString = s.substring(currentIndex, currentIndex + stringLength);
107            decodedStrings.add(extractedString);
108          
109            // Move the index forward by the length of the extracted string
110            currentIndex += stringLength;
111        }
112      
113        return decodedStrings;
114    }
115}
116
1class Codec {
2public:
3    /**
4     * Encodes a list of strings to a single string.
5     * Format: [length_bytes][string_content][length_bytes][string_content]...
6     * Each string is prefixed with its length stored as raw bytes (4 bytes for int)
7     * 
8     * @param strs Vector of strings to encode
9     * @return Encoded single string
10     */
11    string encode(vector<string>& strs) {
12        string encoded_result;
13      
14        for (const string& str : strs) {
15            // Get the length of current string
16            int length = str.size();
17          
18            // Append the length as raw bytes (4 bytes for int)
19            encoded_result.append(reinterpret_cast<const char*>(&length), sizeof(int));
20          
21            // Append the actual string content
22            encoded_result.append(str);
23        }
24      
25        return encoded_result;
26    }
27
28    /**
29     * Decodes a single string back to a list of strings.
30     * Reads the encoded format: [length_bytes][string_content]...
31     * 
32     * @param s Encoded string to decode
33     * @return Vector of decoded strings
34     */
35    vector<string> decode(string s) {
36        vector<string> decoded_strings;
37        int current_pos = 0;
38        int total_length = s.size();
39      
40        while (current_pos < total_length) {
41            // Read the length of the next string (4 bytes)
42            int string_length = 0;
43            memcpy(&string_length, s.data() + current_pos, sizeof(int));
44            current_pos += sizeof(int);
45          
46            // Extract the string content based on the length read
47            decoded_strings.push_back(s.substr(current_pos, string_length));
48            current_pos += string_length;
49        }
50      
51        return decoded_strings;
52    }
53};
54
55// Your Codec object will be instantiated and called as such:
56// Codec codec;
57// codec.decode(codec.encode(strs));
58
1/**
2 * Encodes a list of strings to a single string.
3 * Format: [length_as_4chars][string_content][length_as_4chars][string_content]...
4 * Each string is prefixed with its length encoded as 4 characters
5 * 
6 * @param strs Array of strings to encode
7 * @return Encoded single string
8 */
9function encode(strs: string[]): string {
10    let encodedResult = '';
11  
12    for (const str of strs) {
13        // Get the length of current string
14        const length = str.length;
15      
16        // Convert length to 4-character string representation
17        // Using 4 characters to store the length value
18        const lengthChars = String.fromCharCode(
19            (length >> 24) & 0xFF,
20            (length >> 16) & 0xFF,
21            (length >> 8) & 0xFF,
22            length & 0xFF
23        );
24      
25        // Append the length as 4 characters
26        encodedResult += lengthChars;
27      
28        // Append the actual string content
29        encodedResult += str;
30    }
31  
32    return encodedResult;
33}
34
35/**
36 * Decodes a single string back to a list of strings.
37 * Reads the encoded format: [length_as_4chars][string_content]...
38 * 
39 * @param s Encoded string to decode
40 * @return Array of decoded strings
41 */
42function decode(s: string): string[] {
43    const decodedStrings: string[] = [];
44    let currentPos = 0;
45    const totalLength = s.length;
46  
47    while (currentPos < totalLength) {
48        // Read the length of the next string (4 characters)
49        const stringLength = 
50            (s.charCodeAt(currentPos) << 24) |
51            (s.charCodeAt(currentPos + 1) << 16) |
52            (s.charCodeAt(currentPos + 2) << 8) |
53            s.charCodeAt(currentPos + 3);
54        currentPos += 4;
55      
56        // Extract the string content based on the length read
57        decodedStrings.push(s.substring(currentPos, currentPos + stringLength));
58        currentPos += stringLength;
59    }
60  
61    return decodedStrings;
62}
63

Time and Space Complexity

Time Complexity: O(n) where n is the total length of all strings combined.

  • For encode(): We iterate through each string once and perform constant-time operations (formatting length and concatenation to list). The string join operation at the end is O(n) where n is the total length of all strings.

  • For decode(): We traverse the encoded string once from start to end. Each character is visited exactly once, and substring extraction s[i : i + size] takes O(size) time. The total time across all substrings is O(n).

Space Complexity: O(n) where n is the total length of all strings.

  • For encode(): The ans list stores formatted strings that together have length proportional to the input (4 extra characters per string for length encoding plus the original strings). The final joined string also takes O(n) space.

  • For decode(): The ans list stores all decoded strings which together have the same total length as the original input, taking O(n) space.

Common Pitfalls

1. String Length Exceeding Fixed Width

The most critical pitfall in this implementation is that it assumes all strings have a length that can be represented in 4 digits (maximum 9999 characters). If a string has 10,000 or more characters, the formatting f"{len(string):4}" will exceed the 4-character width, breaking the decoding logic.

Example of the problem:

codec = Codec()
long_string = "a" * 10000  # String with 10,000 characters
encoded = codec.encode([long_string])
# The length "10000" takes 5 characters, not 4
# This breaks the fixed-width assumption
decoded = codec.decode(encoded)  # Will fail or produce incorrect results

Solution: Use a more robust length encoding scheme:

def encode(self, strs: List[str]) -> str:
    encoded_parts = []
    for string in strs:
        # Use delimiter with escape sequence
        # Format: "length:string" where : is a delimiter
        length_str = str(len(string))
        # Include the length of the length string itself
        encoded_parts.append(f"{len(length_str)}:{length_str}{string}")
    return "".join(encoded_parts)

def decode(self, s: str) -> List[str]:
    decoded_strings = []
    i = 0
    while i < len(s):
        # Read length of length
        colon_pos = s.index(':', i)
        len_of_len = int(s[i:colon_pos])
        i = colon_pos + 1
        # Read actual length
        string_length = int(s[i:i + len_of_len])
        i += len_of_len
        # Read string
        decoded_strings.append(s[i:i + string_length])
        i += string_length
    return decoded_strings

2. Alternative: Using Non-Printable Delimiter

Another approach that avoids the length limitation:

def encode(self, strs: List[str]) -> str:
    # Use a non-printable character as delimiter
    # Format: "length\x00string" for each string
    result = []
    for string in strs:
        result.append(f"{len(string)}\x00{string}")
    return "".join(result)

def decode(self, s: str) -> List[str]:
    if not s:
        return []
  
    decoded_strings = []
    i = 0
    while i < len(s):
        # Find the null character delimiter
        null_pos = s.index('\x00', i)
        length = int(s[i:null_pos])
        i = null_pos + 1
        decoded_strings.append(s[i:i + length])
        i += length
    return decoded_strings

3. Most Robust: Variable-Length Encoding

The most flexible solution uses a format like "length#string":

def encode(self, strs: List[str]) -> str:
    return ''.join(f"{len(s)}#{s}" for s in strs)

def decode(self, s: str) -> List[str]:
    decoded = []
    i = 0
    while i < len(s):
        # Find the delimiter '#'
        delimiter_pos = s.index('#', i)
        length = int(s[i:delimiter_pos])
        i = delimiter_pos + 1
        decoded.append(s[i:i + length])
        i += length
    return decoded

This approach:

  • Handles strings of any length
  • Uses '#' as a delimiter between length and content
  • The length prefix tells us exactly how many characters to read, so '#' appearing in the string content doesn't matter
Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

Which algorithm is best for finding the shortest distance between two points in an unweighted graph?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More