937. Reorder Data in Log Files
Problem Description
You have an array of logs where each log is a string containing words separated by spaces. The first word in each log is an identifier, and the remaining words form the content.
There are two types of logs:
- Letter-logs: After the identifier, all words contain only lowercase English letters
- Digit-logs: After the identifier, all words contain only digits
Your task is to reorder these logs according to these rules:
- All letter-logs must appear before all digit-logs
- Letter-logs should be sorted lexicographically by their content (everything after the identifier). If two letter-logs have identical content, sort them by their identifiers
- Digit-logs should keep their original relative order from the input
For example, if you have logs like ["a1 9 2 3 1", "g1 act car", "zo4 4 7", "ab1 off key dog"]
, the letter-logs are "g1 act car"
and "ab1 off key dog"
, while the digit-logs are "a1 9 2 3 1"
and "zo4 4 7"
.
The solution uses a custom sorting key function that:
- Assigns letter-logs a tuple
(0, content, identifier)
to ensure they come first and are sorted by content then identifier - Assigns digit-logs a tuple
(1,)
to ensure they come after letter-logs and maintain their original order
The split(" ", 1)
method splits each log into exactly two parts: the identifier and everything else. The sorting algorithm then handles the rest based on the custom key.
Intuition
The key insight is recognizing that we need different sorting behaviors for different types of logs. Letter-logs need complex sorting rules while digit-logs just need to maintain their original order. This suggests we can leverage Python's stable sort property along with a custom comparison key.
When we think about how to handle these different requirements in a single sort operation, we realize we can use tuple comparison. Python compares tuples element by element from left to right, which gives us a powerful way to encode multiple sorting criteria.
The first challenge is separating letter-logs from digit-logs while ensuring letter-logs always come before digit-logs. We can achieve this by assigning different first elements in our sorting tuples - 0
for letter-logs and 1
for digit-logs. Since 0 < 1
, all letter-logs will naturally sort before digit-logs.
For letter-logs, we need to sort by content first, then by identifier if contents match. This maps perfectly to a tuple structure: (0, content, identifier)
. When Python compares two letter-log tuples, it first sees they both start with 0
, then compares the content strings lexicographically, and only if those are equal does it compare the identifiers.
For digit-logs, we want to preserve their original order. Python's sort is stable, meaning elements that compare equal maintain their original relative positions. By giving all digit-logs the same sorting key (1,)
, they all compare as equal to each other (after the first element), so they maintain their original order while being placed after all letter-logs.
The elegance of this approach is that we encode all our sorting requirements into a single key function, letting Python's built-in sort handle all the complexity. The split(" ", 1)
cleverly separates the identifier from the rest of the content in one operation, and checking rest[0].isalpha()
quickly determines the log type.
Learn more about Sorting patterns.
Solution Approach
The solution implements a custom sorting approach using Python's sorted()
function with a key function that categorizes and processes each log appropriately.
Implementation Steps:
-
Define the Key Function: Create a function
f(log)
that takes a log string and returns a sorting key tuple. -
Split Each Log: Use
log.split(" ", 1)
to separate each log into two parts:id_
: the identifier (first word)rest
: everything after the identifier (remaining content)
The parameter
1
in split ensures we only split at the first space, keeping the rest of the content together. -
Determine Log Type: Check if the first character of
rest
is alphabetic usingrest[0].isalpha()
:- If
True
, it's a letter-log - If
False
, it's a digit-log
- If
-
Return Appropriate Sorting Key:
- For letter-logs: Return tuple
(0, rest, id_)
0
ensures letter-logs come firstrest
provides primary sorting by contentid_
provides secondary sorting by identifier when contents are equal
- For digit-logs: Return tuple
(1,)
1
ensures digit-logs come after letter-logs- Single-element tuple makes all digit-logs equal in comparison, preserving original order due to stable sort
- For letter-logs: Return tuple
-
Apply the Sort: Use
sorted(logs, key=f)
to sort the entire logs array based on the custom key function.
The beauty of this approach is that it leverages Python's tuple comparison rules and stable sorting property to handle all requirements in a single pass. The time complexity is O(n log n)
where n
is the number of logs, dominated by the sorting operation. The space complexity is O(n)
for storing the sorted result and the temporary keys during sorting.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example with the logs: ["b1 car dog", "a2 5 7", "g3 cat", "a1 9 4"]
Step 1: Identify log types
"b1 car dog"
β split intoid_="b1"
,rest="car dog"
β"c".isalpha()
= True β letter-log"a2 5 7"
β split intoid_="a2"
,rest="5 7"
β"5".isalpha()
= False β digit-log"g3 cat"
β split intoid_="g3"
,rest="cat"
β"c".isalpha()
= True β letter-log"a1 9 4"
β split intoid_="a1"
,rest="9 4"
β"9".isalpha()
= False β digit-log
Step 2: Generate sorting keys
"b1 car dog"
β(0, "car dog", "b1")
"a2 5 7"
β(1,)
"g3 cat"
β(0, "cat", "g3")
"a1 9 4"
β(1,)
Step 3: Sort by keys When comparing tuples, Python compares element by element from left to right:
- Compare
(0, "car dog", "b1")
vs(1,)
: First elements are 0 vs 1, so letter-log comes first - Compare
(0, "cat", "g3")
vs(1,)
: First elements are 0 vs 1, so letter-log comes first - Compare
(0, "car dog", "b1")
vs(0, "cat", "g3")
: First elements equal (both 0), compare second elements "car dog" vs "cat" β "car dog" > "cat" lexicographically - Both digit-logs have key
(1,)
so they're equal and maintain original order
Step 4: Final sorted order
"g3 cat"
(key:(0, "cat", "g3")
)"b1 car dog"
(key:(0, "car dog", "b1")
)"a2 5 7"
(key:(1,)
) - first digit-log in original order"a1 9 4"
(key:(1,)
) - second digit-log in original order
Result: ["g3 cat", "b1 car dog", "a2 5 7", "a1 9 4"]
The letter-logs are sorted by content ("cat" before "car dog"), and the digit-logs maintain their original relative order.
Solution Implementation
1class Solution:
2 def reorderLogFiles(self, logs: List[str]) -> List[str]:
3 """
4 Reorder log files according to the following rules:
5 1. Letter-logs come before digit-logs
6 2. Letter-logs are sorted lexicographically by content, then by identifier if content is the same
7 3. Digit-logs maintain their relative order
8
9 Args:
10 logs: List of log strings, each with format "identifier content"
11
12 Returns:
13 Reordered list of logs
14 """
15
16 def get_sort_key(log: str) -> tuple:
17 """
18 Generate a sorting key for each log entry.
19
20 For letter-logs: returns (0, content, identifier) for proper sorting
21 For digit-logs: returns (1,) to maintain relative order
22
23 Args:
24 log: A single log string
25
26 Returns:
27 Tuple used as sorting key
28 """
29 # Split log into identifier and content (split only on first space)
30 identifier, content = log.split(" ", 1)
31
32 # Check if this is a letter-log (first character of content is alphabetic)
33 if content[0].isalpha():
34 # Letter-log: sort by (0, content, identifier)
35 # 0 ensures letter-logs come first
36 return (0, content, identifier)
37 else:
38 # Digit-log: return (1,) to place after letter-logs
39 # Single element tuple maintains relative order among digit-logs
40 return (1,)
41
42 # Sort logs using the custom key function
43 return sorted(logs, key=get_sort_key)
44
1class Solution {
2 public String[] reorderLogFiles(String[] logs) {
3 // Sort the logs array using a custom comparator
4 Arrays.sort(logs, (firstLog, secondLog) -> {
5 // Split each log into identifier and content (limit to 2 parts)
6 // Format: "identifier content content..."
7 String[] firstLogParts = firstLog.split(" ", 2);
8 String[] secondLogParts = secondLog.split(" ", 2);
9
10 // Check if each log is a letter-log by examining the first character of content
11 boolean isFirstLogLetter = Character.isLetter(firstLogParts[1].charAt(0));
12 boolean isSecondLogLetter = Character.isLetter(secondLogParts[1].charAt(0));
13
14 // Case 1: Both logs are letter-logs
15 if (isFirstLogLetter && isSecondLogLetter) {
16 // Compare the content parts lexicographically
17 int contentComparison = firstLogParts[1].compareTo(secondLogParts[1]);
18
19 // If contents are different, return the comparison result
20 if (contentComparison != 0) {
21 return contentComparison;
22 }
23
24 // If contents are identical, compare identifiers as tiebreaker
25 return firstLogParts[0].compareTo(secondLogParts[0]);
26 }
27
28 // Case 2: Mixed log types or both digit-logs
29 // Letter-logs come before digit-logs (-1 means first comes before second)
30 // If both are digit-logs, maintain original order (return 0)
31 return isFirstLogLetter ? -1 : (isSecondLogLetter ? 1 : 0);
32 });
33
34 return logs;
35 }
36}
37
1class Solution {
2public:
3 vector<string> reorderLogFiles(vector<string>& logs) {
4 // Use stable_sort to maintain relative order of equal elements
5 stable_sort(logs.begin(), logs.end(), [](const string& log1, const string& log2) {
6 // Find the position of first space to separate identifier and content
7 int spacePos1 = log1.find(' ');
8 int spacePos2 = log2.find(' ');
9
10 // Extract log identifiers (everything before first space)
11 string identifier1 = log1.substr(0, spacePos1);
12 string identifier2 = log2.substr(0, spacePos2);
13
14 // Extract log contents (everything after first space)
15 string content1 = log1.substr(spacePos1 + 1);
16 string content2 = log2.substr(spacePos2 + 1);
17
18 // Check if logs are letter-logs (first character of content is alphabetic)
19 bool isLetterLog1 = isalpha(content1[0]);
20 bool isLetterLog2 = isalpha(content2[0]);
21
22 // Both logs are letter-logs
23 if (isLetterLog1 && isLetterLog2) {
24 // Sort by content lexicographically
25 if (content1 != content2) {
26 return content1 < content2;
27 }
28 // If contents are identical, sort by identifier
29 return identifier1 < identifier2;
30 }
31
32 // Mixed case: letter-logs come before digit-logs
33 // isLetterLog1 > isLetterLog2 ensures:
34 // - If log1 is letter and log2 is digit: true (letter comes first)
35 // - If log1 is digit and log2 is letter: false (letter comes first)
36 // - If both are digit-logs: false (maintain original order)
37 return isLetterLog1 > isLetterLog2;
38 });
39
40 return logs;
41 }
42};
43
1/**
2 * Reorders log files according to specific rules:
3 * - Letter-logs come before digit-logs
4 * - Letter-logs are sorted lexicographically by content, then by identifier if content is the same
5 * - Digit-logs maintain their original relative order
6 *
7 * @param logs - Array of log strings in format "identifier content"
8 * @returns Reordered array of log strings
9 */
10function reorderLogFiles(logs: string[]): string[] {
11 return logs.sort((log1: string, log2: string): number => {
12 // Split each log into identifier and content parts
13 // Using regex to split at first space and capture everything after
14 const [identifier1, content1]: string[] = log1.split(/ (.+)/);
15 const [identifier2, content2]: string[] = log2.split(/ (.+)/);
16
17 // Check if the first character of content is a letter (not a digit)
18 const isLetterLog1: boolean = isNaN(Number(content1[0]));
19 const isLetterLog2: boolean = isNaN(Number(content2[0]));
20
21 // Both logs are letter-logs
22 if (isLetterLog1 && isLetterLog2) {
23 // Compare contents lexicographically
24 const contentComparison: number = content1.localeCompare(content2);
25
26 // If contents are different, sort by content
27 if (contentComparison !== 0) {
28 return contentComparison;
29 }
30
31 // If contents are the same, sort by identifier
32 return identifier1.localeCompare(identifier2);
33 }
34
35 // Handle mixed log types or both digit-logs
36 // Letter-logs come before digit-logs (-1)
37 // Digit-logs maintain original order (0)
38 return isLetterLog1 ? -1 : isLetterLog2 ? 1 : 0;
39 });
40}
41
Time and Space Complexity
Time Complexity: O(n Γ log n Γ m)
The code uses Python's sorted()
function which has a time complexity of O(n Γ log n)
for sorting n
elements. However, for each comparison during sorting, the key function f()
is called, which performs:
log.split(" ", 1)
:O(m)
wherem
is the average length of each log string- String slicing and character checking:
O(1)
Since the sorting algorithm makes O(n Γ log n)
comparisons and each comparison involves O(m)
work for string operations, the total time complexity is O(n Γ log n Γ m)
.
Note: The reference answer simplifies this to O(n Γ log n)
by either assuming m
is constant or considering it negligible compared to the sorting complexity.
Space Complexity: O(n Γ m)
The space complexity includes:
- The sorted list returned by
sorted()
:O(n)
references to the original log strings - The key function creates tuples for each log during sorting:
O(n)
tuples - Each tuple contains string references from
split()
operation:O(m)
for the "rest" part of each log - Total auxiliary space:
O(n Γ m)
The reference answer states O(n)
, which considers only the number of logs without accounting for the string lengths stored in the sorting keys.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Incorrect Split Limit
One of the most common mistakes is using log.split(" ")
without the limit parameter, which would split the entire log into multiple parts instead of just separating the identifier from the content.
Incorrect:
parts = log.split(" ") # This splits into ALL words identifier = parts[0] content = " ".join(parts[1:]) # Need to rejoin, inefficient
Correct:
identifier, content = log.split(" ", 1) # Splits only at first space
2. Wrong Content Type Detection
Another pitfall is incorrectly determining whether a log is a letter-log or digit-log by checking the wrong part of the string or using inappropriate methods.
Incorrect:
# Checking the identifier instead of content if identifier[0].isalpha(): # Wrong! Should check content # Or checking if entire content is alphabetic if content.isalpha(): # Wrong! Spaces between words would fail
Correct:
if content[0].isalpha(): # Check first character of content only
3. Improper Sorting Key for Digit-logs
Using an incomplete tuple for digit-logs can break the stable sort property or cause comparison errors.
Incorrect:
# Using index might not preserve stable sort in all implementations return (1, index) # Or returning a non-tuple return 1 # Not a tuple, might cause comparison issues
Correct:
return (1,) # Single-element tuple ensures proper comparison
4. Edge Case: Empty Logs or Missing Content
Not handling edge cases where logs might be malformed or have minimal content.
Solution:
def get_sort_key(log: str) -> tuple:
# Handle potential edge cases
if " " not in log:
# Log has no content, treat as digit-log to maintain order
return (1,)
identifier, content = log.split(" ", 1)
if not content: # Empty content after identifier
return (1,)
if content[0].isalpha():
return (0, content, identifier)
else:
return (1,)
5. Mutating Original List
Using list.sort()
instead of sorted()
when the original list should be preserved.
Incorrect (if original should be preserved):
logs.sort(key=get_sort_key) # Modifies original list return logs
Correct:
return sorted(logs, key=get_sort_key) # Returns new sorted list
Which of the following is a min heap?
Recommended Readings
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!