271. Encode and Decode Strings 🔒
Problem Description
This problem asks you to design an encoding and decoding system for a list of strings. You need to implement two functions:
encode
: Takes a list of strings and converts it into a single stringdecode
: Takes the encoded string and converts it back to the original list of strings
The key challenge is that the strings in the list can contain any characters, including special characters, spaces, or even characters that might typically be used as delimiters. Your encoding scheme must be able to handle all possible string contents and accurately reconstruct the original list.
Example workflow:
- Machine 1 has a list of strings like
["Hello", "World"]
- It encodes this list into a single string using your
encode
function - This encoded string is sent over the network to Machine 2
- Machine 2 uses your
decode
function to reconstruct the exact original list["Hello", "World"]
Solution Approach:
The solution uses a length-prefixing technique:
Encoding Process:
- For each string in the list, first write its length as a fixed 4-digit number (padded with leading zeros if necessary)
- Then append the actual string content
- Concatenate all these encoded pieces together
For example, ["Hello", "World"]
becomes "0005Hello0005World"
where:
"0005"
indicates the next string has length 5"Hello"
is the first string"0005"
indicates the next string has length 5"World"
is the second string
Decoding Process:
- Read the first 4 characters to get the length of the next string
- Extract that many characters as one string from the original list
- Repeat until the entire encoded string is processed
This approach works because:
- The length prefix tells us exactly how many characters belong to each string
- We don't need to worry about delimiter conflicts since we're using length information rather than special separator characters
- The fixed 4-digit format ensures we always know where each length value ends
Intuition
The main challenge in this problem is finding a way to combine multiple strings into one while preserving the boundaries between them. Let's think through why common approaches might fail:
Why simple delimiters don't work:
If we try to use a delimiter like ","
or "|"
to separate strings, what happens when the original strings contain these delimiters? For example, if we encode ["hello,world", "test"]
using commas, we'd get "hello,world,test"
. When decoding, we can't tell if this should be ["hello", "world", "test"]
or ["hello,world", "test"]
.
Why escape sequences are complex:
We could use escape sequences (like replacing ","
with "\,"
in the strings), but this requires careful handling of the escape character itself and makes the implementation more error-prone.
The key insight - use metadata: Instead of trying to find a "safe" delimiter that won't appear in the strings, we can store metadata about each string. The most useful metadata is the string's length. If we know a string is exactly 5 characters long, we can read exactly 5 characters without worrying about what those characters are.
Why fixed-width length encoding: We encode the length as a fixed 4-digit number because:
- If we used variable-length numbers, we'd face the same delimiter problem (how do we know where the length ends and the string begins?)
- With 4 digits, we can handle strings up to length 9999, which is sufficient for most practical cases
- We always know to read exactly 4 characters to get the length, then read exactly that many more characters to get the string
This approach elegantly sidesteps the delimiter problem entirely. Each string is self-contained with its length prefix, and we can decode the entire list by repeatedly reading length-value pairs until we've consumed the entire encoded string.
Solution Approach
Let's walk through the implementation step by step:
Encoding Algorithm
The encoding process transforms a list of strings into a single string by prefixing each string with its length:
- Initialize an empty list
ans
to collect encoded parts - For each string
s
in the input list:- Calculate the length of the string using
len(s)
- Format this length as a 4-digit string using
"{:4}".format(len(s))
- This creates a string with exactly 4 characters, padding with spaces if needed
- For example: length 5 becomes
" 5"
, length 123 becomes" 123"
- Concatenate the 4-digit length with the actual string content
- Append this to the
ans
list
- Calculate the length of the string using
- Join all parts using
"".join(ans)
to create the final encoded string
Example encoding process:
- Input:
["Hello", "World"]
- Process string "Hello": length is 5 →
" 5Hello"
- Process string "World": length is 5 →
" 5World"
- Final encoded string:
" 5Hello 5World"
Decoding Algorithm
The decoding process extracts strings from the encoded format by reading length prefixes:
-
Initialize variables:
ans
= empty list to store decoded stringsi
= 0 (current position in the encoded string)n
= length of the encoded string
-
While we haven't processed the entire string (
i < n
):- Read the length prefix: Extract 4 characters starting at position
i
usings[i : i + 4]
- Convert to integer: Parse these 4 characters as an integer to get the string length
- Move pointer: Advance
i
by 4 to skip past the length prefix - Extract the string: Read
size
characters starting at positioni
usings[i : i + size]
- Store the string: Append the extracted string to
ans
- Move pointer: Advance
i
bysize
to move to the next encoded string
- Read the length prefix: Extract 4 characters starting at position
-
Return the list of decoded strings
Example decoding process:
- Input:
" 5Hello 5World"
- Read first 4 chars:
" 5"
→ length = 5 - Read next 5 chars:
"Hello"
→ first string - Read next 4 chars:
" 5"
→ length = 5 - Read next 5 chars:
"World"
→ second string - Result:
["Hello", "World"]
Key Design Decisions
- Fixed-width length encoding (4 digits): Ensures we always know exactly how many characters represent the length
- No delimiter characters: Completely avoids the problem of strings containing delimiter characters
- Sequential processing: Both encoding and decoding process strings in order, maintaining the original sequence
- String slicing: Python's slice notation
s[i:j]
efficiently extracts substrings without additional memory allocation for intermediate results
This approach has O(n)
time complexity for both encoding and decoding, where n
is the total length of all strings combined.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through encoding and decoding the list ["Hi", "I'm", "Sam!"]
:
Encoding Process:
-
Start with empty result:
""
-
Process "Hi" (length = 2):
- Format length as 4 digits:
" 2"
(2 with 3 leading spaces) - Append string:
" 2Hi"
- Result so far:
" 2Hi"
- Format length as 4 digits:
-
Process "I'm" (length = 3):
- Format length as 4 digits:
" 3"
- Append string:
" 3I'm"
- Result so far:
" 2Hi 3I'm"
- Format length as 4 digits:
-
Process "Sam!" (length = 4):
- Format length as 4 digits:
" 4"
- Append string:
" 4Sam!"
- Final encoded result:
" 2Hi 3I'm 4Sam!"
- Format length as 4 digits:
Decoding Process:
Starting with encoded string: " 2Hi 3I'm 4Sam!"
(total length = 18)
-
Position i = 0:
- Read 4 chars at position 0-3:
" 2"
→ length = 2 - Move to position 4
- Read 2 chars at position 4-5:
"Hi"
- Add "Hi" to result list
- Move to position 6
- Read 4 chars at position 0-3:
-
Position i = 6:
- Read 4 chars at position 6-9:
" 3"
→ length = 3 - Move to position 10
- Read 3 chars at position 10-12:
"I'm"
- Add "I'm" to result list
- Move to position 13
- Read 4 chars at position 6-9:
-
Position i = 13:
- Read 4 chars at position 13-16:
" 4"
→ length = 4 - Move to position 17
- Read 4 chars at position 17-20:
"Sam!"
- Add "Sam!" to result list
- Move to position 21
- Read 4 chars at position 13-16:
-
Position i = 21 ≥ 18 (string length), so stop
Final decoded result: ["Hi", "I'm", "Sam!"]
Notice how the special characters (apostrophe and exclamation mark) are handled naturally without any escaping needed, since we rely on length prefixes rather than delimiters.
Solution Implementation
1from typing import List
2
3
4class Codec:
5 def encode(self, strs: List[str]) -> str:
6 """Encodes a list of strings to a single string.
7
8 Args:
9 strs: List of strings to encode
10
11 Returns:
12 A single encoded string containing all input strings
13 """
14 encoded_parts = []
15
16 # For each string, prepend its length as a 4-character fixed-width field
17 for string in strs:
18 # Format length as 4 digits with leading spaces if needed
19 length_prefix = f"{len(string):4}"
20 # Append length prefix followed by the actual string
21 encoded_parts.append(length_prefix + string)
22
23 # Join all encoded parts into a single string
24 return "".join(encoded_parts)
25
26 def decode(self, s: str) -> List[str]:
27 """Decodes a single string to a list of strings.
28
29 Args:
30 s: The encoded string to decode
31
32 Returns:
33 List of decoded strings
34 """
35 decoded_strings = []
36 index = 0
37 total_length = len(s)
38
39 # Process the encoded string by reading length prefixes and extracting strings
40 while index < total_length:
41 # Read the 4-character length prefix
42 string_length = int(s[index:index + 4])
43 index += 4
44
45 # Extract the string of the specified length
46 decoded_string = s[index:index + string_length]
47 decoded_strings.append(decoded_string)
48 index += string_length
49
50 return decoded_strings
51
52
53# Your Codec object will be instantiated and called as such:
54# codec = Codec()
55# codec.decode(codec.encode(strs))
56
1public class Codec {
2
3 /**
4 * Encodes a list of strings to a single string.
5 * Format: For each string, append "length#string"
6 * Example: ["abc", "def"] -> "3#abc3#def"
7 *
8 * @param strs List of strings to encode
9 * @return Encoded single string
10 */
11 public String encode(List<String> strs) {
12 StringBuilder encodedString = new StringBuilder();
13
14 // For each string, append its length followed by a delimiter and the string itself
15 for (String str : strs) {
16 encodedString.append(str.length())
17 .append('#')
18 .append(str);
19 }
20
21 return encodedString.toString();
22 }
23
24 /**
25 * Decodes a single string to a list of strings.
26 * Parses the format "length#string" for each encoded string
27 *
28 * @param s Encoded string to decode
29 * @return List of decoded strings
30 */
31 public List<String> decode(String s) {
32 List<String> decodedStrings = new ArrayList<>();
33 int currentIndex = 0;
34 int totalLength = s.length();
35
36 // Process the encoded string until we reach the end
37 while (currentIndex < totalLength) {
38 // Find the delimiter '#' to extract the length
39 int delimiterIndex = s.indexOf('#', currentIndex);
40
41 // Parse the length value
42 int stringLength = Integer.parseInt(s.substring(currentIndex, delimiterIndex));
43
44 // Extract the string based on its length
45 int stringStartIndex = delimiterIndex + 1;
46 int stringEndIndex = stringStartIndex + stringLength;
47 decodedStrings.add(s.substring(stringStartIndex, stringEndIndex));
48
49 // Move to the next encoded string
50 currentIndex = stringEndIndex;
51 }
52
53 return decodedStrings;
54 }
55}
56```
57
58## Solution 2: Keeping closer to original approach (limited to strings of length 65,535)
59
60```java
61public class Codec {
62
63 /**
64 * Encodes a list of strings to a single string.
65 * Format: For each string, prepend its length as a character
66 * Note: This approach limits string length to 65,535 characters
67 *
68 * @param strs List of strings to encode
69 * @return Encoded single string
70 */
71 public String encode(List<String> strs) {
72 StringBuilder encodedString = new StringBuilder();
73
74 // Iterate through each string in the list
75 for (String str : strs) {
76 // Cast length to char (stores as 2-byte character)
77 // WARNING: This limits string length to Character.MAX_VALUE (65,535)
78 encodedString.append((char) str.length());
79
80 // Append the actual string content
81 encodedString.append(str);
82 }
83
84 return encodedString.toString();
85 }
86
87 /**
88 * Decodes a single string to a list of strings.
89 * Reads the length from the first character, then extracts the string
90 *
91 * @param s Encoded string to decode
92 * @return List of decoded strings
93 */
94 public List<String> decode(String s) {
95 List<String> decodedStrings = new ArrayList<>();
96 int currentIndex = 0;
97 int totalLength = s.length();
98
99 // Process the encoded string until we reach the end
100 while (currentIndex < totalLength) {
101 // Read the length stored as a character
102 int stringLength = s.charAt(currentIndex);
103 currentIndex++;
104
105 // Extract the string of the specified length
106 String extractedString = s.substring(currentIndex, currentIndex + stringLength);
107 decodedStrings.add(extractedString);
108
109 // Move the index forward by the length of the extracted string
110 currentIndex += stringLength;
111 }
112
113 return decodedStrings;
114 }
115}
116
1class Codec {
2public:
3 /**
4 * Encodes a list of strings to a single string.
5 * Format: [length_bytes][string_content][length_bytes][string_content]...
6 * Each string is prefixed with its length stored as raw bytes (4 bytes for int)
7 *
8 * @param strs Vector of strings to encode
9 * @return Encoded single string
10 */
11 string encode(vector<string>& strs) {
12 string encoded_result;
13
14 for (const string& str : strs) {
15 // Get the length of current string
16 int length = str.size();
17
18 // Append the length as raw bytes (4 bytes for int)
19 encoded_result.append(reinterpret_cast<const char*>(&length), sizeof(int));
20
21 // Append the actual string content
22 encoded_result.append(str);
23 }
24
25 return encoded_result;
26 }
27
28 /**
29 * Decodes a single string back to a list of strings.
30 * Reads the encoded format: [length_bytes][string_content]...
31 *
32 * @param s Encoded string to decode
33 * @return Vector of decoded strings
34 */
35 vector<string> decode(string s) {
36 vector<string> decoded_strings;
37 int current_pos = 0;
38 int total_length = s.size();
39
40 while (current_pos < total_length) {
41 // Read the length of the next string (4 bytes)
42 int string_length = 0;
43 memcpy(&string_length, s.data() + current_pos, sizeof(int));
44 current_pos += sizeof(int);
45
46 // Extract the string content based on the length read
47 decoded_strings.push_back(s.substr(current_pos, string_length));
48 current_pos += string_length;
49 }
50
51 return decoded_strings;
52 }
53};
54
55// Your Codec object will be instantiated and called as such:
56// Codec codec;
57// codec.decode(codec.encode(strs));
58
1/**
2 * Encodes a list of strings to a single string.
3 * Format: [length_as_4chars][string_content][length_as_4chars][string_content]...
4 * Each string is prefixed with its length encoded as 4 characters
5 *
6 * @param strs Array of strings to encode
7 * @return Encoded single string
8 */
9function encode(strs: string[]): string {
10 let encodedResult = '';
11
12 for (const str of strs) {
13 // Get the length of current string
14 const length = str.length;
15
16 // Convert length to 4-character string representation
17 // Using 4 characters to store the length value
18 const lengthChars = String.fromCharCode(
19 (length >> 24) & 0xFF,
20 (length >> 16) & 0xFF,
21 (length >> 8) & 0xFF,
22 length & 0xFF
23 );
24
25 // Append the length as 4 characters
26 encodedResult += lengthChars;
27
28 // Append the actual string content
29 encodedResult += str;
30 }
31
32 return encodedResult;
33}
34
35/**
36 * Decodes a single string back to a list of strings.
37 * Reads the encoded format: [length_as_4chars][string_content]...
38 *
39 * @param s Encoded string to decode
40 * @return Array of decoded strings
41 */
42function decode(s: string): string[] {
43 const decodedStrings: string[] = [];
44 let currentPos = 0;
45 const totalLength = s.length;
46
47 while (currentPos < totalLength) {
48 // Read the length of the next string (4 characters)
49 const stringLength =
50 (s.charCodeAt(currentPos) << 24) |
51 (s.charCodeAt(currentPos + 1) << 16) |
52 (s.charCodeAt(currentPos + 2) << 8) |
53 s.charCodeAt(currentPos + 3);
54 currentPos += 4;
55
56 // Extract the string content based on the length read
57 decodedStrings.push(s.substring(currentPos, currentPos + stringLength));
58 currentPos += stringLength;
59 }
60
61 return decodedStrings;
62}
63
Time and Space Complexity
Time Complexity: O(n)
where n
is the total length of all strings combined.
-
For
encode()
: We iterate through each string once and perform constant-time operations (formatting length and concatenation to list). The string join operation at the end isO(n)
wheren
is the total length of all strings. -
For
decode()
: We traverse the encoded string once from start to end. Each character is visited exactly once, and substring extractions[i : i + size]
takesO(size)
time. The total time across all substrings isO(n)
.
Space Complexity: O(n)
where n
is the total length of all strings.
-
For
encode()
: Theans
list stores formatted strings that together have length proportional to the input (4 extra characters per string for length encoding plus the original strings). The final joined string also takesO(n)
space. -
For
decode()
: Theans
list stores all decoded strings which together have the same total length as the original input, takingO(n)
space.
Common Pitfalls
1. String Length Exceeding Fixed Width
The most critical pitfall in this implementation is that it assumes all strings have a length that can be represented in 4 digits (maximum 9999 characters). If a string has 10,000 or more characters, the formatting f"{len(string):4}"
will exceed the 4-character width, breaking the decoding logic.
Example of the problem:
codec = Codec() long_string = "a" * 10000 # String with 10,000 characters encoded = codec.encode([long_string]) # The length "10000" takes 5 characters, not 4 # This breaks the fixed-width assumption decoded = codec.decode(encoded) # Will fail or produce incorrect results
Solution: Use a more robust length encoding scheme:
def encode(self, strs: List[str]) -> str:
encoded_parts = []
for string in strs:
# Use delimiter with escape sequence
# Format: "length:string" where : is a delimiter
length_str = str(len(string))
# Include the length of the length string itself
encoded_parts.append(f"{len(length_str)}:{length_str}{string}")
return "".join(encoded_parts)
def decode(self, s: str) -> List[str]:
decoded_strings = []
i = 0
while i < len(s):
# Read length of length
colon_pos = s.index(':', i)
len_of_len = int(s[i:colon_pos])
i = colon_pos + 1
# Read actual length
string_length = int(s[i:i + len_of_len])
i += len_of_len
# Read string
decoded_strings.append(s[i:i + string_length])
i += string_length
return decoded_strings
2. Alternative: Using Non-Printable Delimiter
Another approach that avoids the length limitation:
def encode(self, strs: List[str]) -> str:
# Use a non-printable character as delimiter
# Format: "length\x00string" for each string
result = []
for string in strs:
result.append(f"{len(string)}\x00{string}")
return "".join(result)
def decode(self, s: str) -> List[str]:
if not s:
return []
decoded_strings = []
i = 0
while i < len(s):
# Find the null character delimiter
null_pos = s.index('\x00', i)
length = int(s[i:null_pos])
i = null_pos + 1
decoded_strings.append(s[i:i + length])
i += length
return decoded_strings
3. Most Robust: Variable-Length Encoding
The most flexible solution uses a format like "length#string":
def encode(self, strs: List[str]) -> str:
return ''.join(f"{len(s)}#{s}" for s in strs)
def decode(self, s: str) -> List[str]:
decoded = []
i = 0
while i < len(s):
# Find the delimiter '#'
delimiter_pos = s.index('#', i)
length = int(s[i:delimiter_pos])
i = delimiter_pos + 1
decoded.append(s[i:i + length])
i += length
return decoded
This approach:
- Handles strings of any length
- Uses '#' as a delimiter between length and content
- The length prefix tells us exactly how many characters to read, so '#' appearing in the string content doesn't matter
Which algorithm is best for finding the shortest distance between two points in an unweighted graph?
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!