Facebook Pixel

937. Reorder Data in Log Files

Problem Description

You have an array of logs where each log is a string containing words separated by spaces. The first word in each log is an identifier, and the remaining words form the content.

There are two types of logs:

  • Letter-logs: After the identifier, all words contain only lowercase English letters
  • Digit-logs: After the identifier, all words contain only digits

Your task is to reorder these logs according to these rules:

  1. All letter-logs must appear before all digit-logs
  2. Letter-logs should be sorted lexicographically by their content (everything after the identifier). If two letter-logs have identical content, sort them by their identifiers
  3. Digit-logs should keep their original relative order from the input

For example, if you have logs like ["a1 9 2 3 1", "g1 act car", "zo4 4 7", "ab1 off key dog"], the letter-logs are "g1 act car" and "ab1 off key dog", while the digit-logs are "a1 9 2 3 1" and "zo4 4 7".

The solution uses a custom sorting key function that:

  • Assigns letter-logs a tuple (0, content, identifier) to ensure they come first and are sorted by content then identifier
  • Assigns digit-logs a tuple (1,) to ensure they come after letter-logs and maintain their original order

The split(" ", 1) method splits each log into exactly two parts: the identifier and everything else. The sorting algorithm then handles the rest based on the custom key.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is recognizing that we need different sorting behaviors for different types of logs. Letter-logs need complex sorting rules while digit-logs just need to maintain their original order. This suggests we can leverage Python's stable sort property along with a custom comparison key.

When we think about how to handle these different requirements in a single sort operation, we realize we can use tuple comparison. Python compares tuples element by element from left to right, which gives us a powerful way to encode multiple sorting criteria.

The first challenge is separating letter-logs from digit-logs while ensuring letter-logs always come before digit-logs. We can achieve this by assigning different first elements in our sorting tuples - 0 for letter-logs and 1 for digit-logs. Since 0 < 1, all letter-logs will naturally sort before digit-logs.

For letter-logs, we need to sort by content first, then by identifier if contents match. This maps perfectly to a tuple structure: (0, content, identifier). When Python compares two letter-log tuples, it first sees they both start with 0, then compares the content strings lexicographically, and only if those are equal does it compare the identifiers.

For digit-logs, we want to preserve their original order. Python's sort is stable, meaning elements that compare equal maintain their original relative positions. By giving all digit-logs the same sorting key (1,), they all compare as equal to each other (after the first element), so they maintain their original order while being placed after all letter-logs.

The elegance of this approach is that we encode all our sorting requirements into a single key function, letting Python's built-in sort handle all the complexity. The split(" ", 1) cleverly separates the identifier from the rest of the content in one operation, and checking rest[0].isalpha() quickly determines the log type.

Learn more about Sorting patterns.

Solution Approach

The solution implements a custom sorting approach using Python's sorted() function with a key function that categorizes and processes each log appropriately.

Implementation Steps:

  1. Define the Key Function: Create a function f(log) that takes a log string and returns a sorting key tuple.

  2. Split Each Log: Use log.split(" ", 1) to separate each log into two parts:

    • id_: the identifier (first word)
    • rest: everything after the identifier (remaining content)

    The parameter 1 in split ensures we only split at the first space, keeping the rest of the content together.

  3. Determine Log Type: Check if the first character of rest is alphabetic using rest[0].isalpha():

    • If True, it's a letter-log
    • If False, it's a digit-log
  4. Return Appropriate Sorting Key:

    • For letter-logs: Return tuple (0, rest, id_)
      • 0 ensures letter-logs come first
      • rest provides primary sorting by content
      • id_ provides secondary sorting by identifier when contents are equal
    • For digit-logs: Return tuple (1,)
      • 1 ensures digit-logs come after letter-logs
      • Single-element tuple makes all digit-logs equal in comparison, preserving original order due to stable sort
  5. Apply the Sort: Use sorted(logs, key=f) to sort the entire logs array based on the custom key function.

The beauty of this approach is that it leverages Python's tuple comparison rules and stable sorting property to handle all requirements in a single pass. The time complexity is O(n log n) where n is the number of logs, dominated by the sorting operation. The space complexity is O(n) for storing the sorted result and the temporary keys during sorting.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example with the logs: ["b1 car dog", "a2 5 7", "g3 cat", "a1 9 4"]

Step 1: Identify log types

  • "b1 car dog" β†’ split into id_="b1", rest="car dog" β†’ "c".isalpha() = True β†’ letter-log
  • "a2 5 7" β†’ split into id_="a2", rest="5 7" β†’ "5".isalpha() = False β†’ digit-log
  • "g3 cat" β†’ split into id_="g3", rest="cat" β†’ "c".isalpha() = True β†’ letter-log
  • "a1 9 4" β†’ split into id_="a1", rest="9 4" β†’ "9".isalpha() = False β†’ digit-log

Step 2: Generate sorting keys

  • "b1 car dog" β†’ (0, "car dog", "b1")
  • "a2 5 7" β†’ (1,)
  • "g3 cat" β†’ (0, "cat", "g3")
  • "a1 9 4" β†’ (1,)

Step 3: Sort by keys When comparing tuples, Python compares element by element from left to right:

  1. Compare (0, "car dog", "b1") vs (1,): First elements are 0 vs 1, so letter-log comes first
  2. Compare (0, "cat", "g3") vs (1,): First elements are 0 vs 1, so letter-log comes first
  3. Compare (0, "car dog", "b1") vs (0, "cat", "g3"): First elements equal (both 0), compare second elements "car dog" vs "cat" β†’ "car dog" > "cat" lexicographically
  4. Both digit-logs have key (1,) so they're equal and maintain original order

Step 4: Final sorted order

  1. "g3 cat" (key: (0, "cat", "g3"))
  2. "b1 car dog" (key: (0, "car dog", "b1"))
  3. "a2 5 7" (key: (1,)) - first digit-log in original order
  4. "a1 9 4" (key: (1,)) - second digit-log in original order

Result: ["g3 cat", "b1 car dog", "a2 5 7", "a1 9 4"]

The letter-logs are sorted by content ("cat" before "car dog"), and the digit-logs maintain their original relative order.

Solution Implementation

1class Solution:
2    def reorderLogFiles(self, logs: List[str]) -> List[str]:
3        """
4        Reorder log files according to the following rules:
5        1. Letter-logs come before digit-logs
6        2. Letter-logs are sorted lexicographically by content, then by identifier if content is the same
7        3. Digit-logs maintain their relative order
8      
9        Args:
10            logs: List of log strings, each with format "identifier content"
11      
12        Returns:
13            Reordered list of logs
14        """
15      
16        def get_sort_key(log: str) -> tuple:
17            """
18            Generate a sorting key for each log entry.
19          
20            For letter-logs: returns (0, content, identifier) for proper sorting
21            For digit-logs: returns (1,) to maintain relative order
22          
23            Args:
24                log: A single log string
25          
26            Returns:
27                Tuple used as sorting key
28            """
29            # Split log into identifier and content (split only on first space)
30            identifier, content = log.split(" ", 1)
31          
32            # Check if this is a letter-log (first character of content is alphabetic)
33            if content[0].isalpha():
34                # Letter-log: sort by (0, content, identifier)
35                # 0 ensures letter-logs come first
36                return (0, content, identifier)
37            else:
38                # Digit-log: return (1,) to place after letter-logs
39                # Single element tuple maintains relative order among digit-logs
40                return (1,)
41      
42        # Sort logs using the custom key function
43        return sorted(logs, key=get_sort_key)
44
1class Solution {
2    public String[] reorderLogFiles(String[] logs) {
3        // Sort the logs array using a custom comparator
4        Arrays.sort(logs, (firstLog, secondLog) -> {
5            // Split each log into identifier and content (limit to 2 parts)
6            // Format: "identifier content content..."
7            String[] firstLogParts = firstLog.split(" ", 2);
8            String[] secondLogParts = secondLog.split(" ", 2);
9          
10            // Check if each log is a letter-log by examining the first character of content
11            boolean isFirstLogLetter = Character.isLetter(firstLogParts[1].charAt(0));
12            boolean isSecondLogLetter = Character.isLetter(secondLogParts[1].charAt(0));
13          
14            // Case 1: Both logs are letter-logs
15            if (isFirstLogLetter && isSecondLogLetter) {
16                // Compare the content parts lexicographically
17                int contentComparison = firstLogParts[1].compareTo(secondLogParts[1]);
18              
19                // If contents are different, return the comparison result
20                if (contentComparison != 0) {
21                    return contentComparison;
22                }
23              
24                // If contents are identical, compare identifiers as tiebreaker
25                return firstLogParts[0].compareTo(secondLogParts[0]);
26            }
27          
28            // Case 2: Mixed log types or both digit-logs
29            // Letter-logs come before digit-logs (-1 means first comes before second)
30            // If both are digit-logs, maintain original order (return 0)
31            return isFirstLogLetter ? -1 : (isSecondLogLetter ? 1 : 0);
32        });
33      
34        return logs;
35    }
36}
37
1class Solution {
2public:
3    vector<string> reorderLogFiles(vector<string>& logs) {
4        // Use stable_sort to maintain relative order of equal elements
5        stable_sort(logs.begin(), logs.end(), [](const string& log1, const string& log2) {
6            // Find the position of first space to separate identifier and content
7            int spacePos1 = log1.find(' ');
8            int spacePos2 = log2.find(' ');
9          
10            // Extract log identifiers (everything before first space)
11            string identifier1 = log1.substr(0, spacePos1);
12            string identifier2 = log2.substr(0, spacePos2);
13          
14            // Extract log contents (everything after first space)
15            string content1 = log1.substr(spacePos1 + 1);
16            string content2 = log2.substr(spacePos2 + 1);
17          
18            // Check if logs are letter-logs (first character of content is alphabetic)
19            bool isLetterLog1 = isalpha(content1[0]);
20            bool isLetterLog2 = isalpha(content2[0]);
21          
22            // Both logs are letter-logs
23            if (isLetterLog1 && isLetterLog2) {
24                // Sort by content lexicographically
25                if (content1 != content2) {
26                    return content1 < content2;
27                }
28                // If contents are identical, sort by identifier
29                return identifier1 < identifier2;
30            }
31          
32            // Mixed case: letter-logs come before digit-logs
33            // isLetterLog1 > isLetterLog2 ensures:
34            // - If log1 is letter and log2 is digit: true (letter comes first)
35            // - If log1 is digit and log2 is letter: false (letter comes first)
36            // - If both are digit-logs: false (maintain original order)
37            return isLetterLog1 > isLetterLog2;
38        });
39      
40        return logs;
41    }
42};
43
1/**
2 * Reorders log files according to specific rules:
3 * - Letter-logs come before digit-logs
4 * - Letter-logs are sorted lexicographically by content, then by identifier if content is the same
5 * - Digit-logs maintain their original relative order
6 * 
7 * @param logs - Array of log strings in format "identifier content"
8 * @returns Reordered array of log strings
9 */
10function reorderLogFiles(logs: string[]): string[] {
11    return logs.sort((log1: string, log2: string): number => {
12        // Split each log into identifier and content parts
13        // Using regex to split at first space and capture everything after
14        const [identifier1, content1]: string[] = log1.split(/ (.+)/);
15        const [identifier2, content2]: string[] = log2.split(/ (.+)/);
16
17        // Check if the first character of content is a letter (not a digit)
18        const isLetterLog1: boolean = isNaN(Number(content1[0]));
19        const isLetterLog2: boolean = isNaN(Number(content2[0]));
20
21        // Both logs are letter-logs
22        if (isLetterLog1 && isLetterLog2) {
23            // Compare contents lexicographically
24            const contentComparison: number = content1.localeCompare(content2);
25          
26            // If contents are different, sort by content
27            if (contentComparison !== 0) {
28                return contentComparison;
29            }
30          
31            // If contents are the same, sort by identifier
32            return identifier1.localeCompare(identifier2);
33        }
34
35        // Handle mixed log types or both digit-logs
36        // Letter-logs come before digit-logs (-1)
37        // Digit-logs maintain original order (0)
38        return isLetterLog1 ? -1 : isLetterLog2 ? 1 : 0;
39    });
40}
41

Time and Space Complexity

Time Complexity: O(n Γ— log n Γ— m)

The code uses Python's sorted() function which has a time complexity of O(n Γ— log n) for sorting n elements. However, for each comparison during sorting, the key function f() is called, which performs:

  • log.split(" ", 1): O(m) where m is the average length of each log string
  • String slicing and character checking: O(1)

Since the sorting algorithm makes O(n Γ— log n) comparisons and each comparison involves O(m) work for string operations, the total time complexity is O(n Γ— log n Γ— m).

Note: The reference answer simplifies this to O(n Γ— log n) by either assuming m is constant or considering it negligible compared to the sorting complexity.

Space Complexity: O(n Γ— m)

The space complexity includes:

  • The sorted list returned by sorted(): O(n) references to the original log strings
  • The key function creates tuples for each log during sorting: O(n) tuples
  • Each tuple contains string references from split() operation: O(m) for the "rest" part of each log
  • Total auxiliary space: O(n Γ— m)

The reference answer states O(n), which considers only the number of logs without accounting for the string lengths stored in the sorting keys.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Split Limit

One of the most common mistakes is using log.split(" ") without the limit parameter, which would split the entire log into multiple parts instead of just separating the identifier from the content.

Incorrect:

parts = log.split(" ")  # This splits into ALL words
identifier = parts[0]
content = " ".join(parts[1:])  # Need to rejoin, inefficient

Correct:

identifier, content = log.split(" ", 1)  # Splits only at first space

2. Wrong Content Type Detection

Another pitfall is incorrectly determining whether a log is a letter-log or digit-log by checking the wrong part of the string or using inappropriate methods.

Incorrect:

# Checking the identifier instead of content
if identifier[0].isalpha():  # Wrong! Should check content

# Or checking if entire content is alphabetic
if content.isalpha():  # Wrong! Spaces between words would fail

Correct:

if content[0].isalpha():  # Check first character of content only

3. Improper Sorting Key for Digit-logs

Using an incomplete tuple for digit-logs can break the stable sort property or cause comparison errors.

Incorrect:

# Using index might not preserve stable sort in all implementations
return (1, index)  

# Or returning a non-tuple
return 1  # Not a tuple, might cause comparison issues

Correct:

return (1,)  # Single-element tuple ensures proper comparison

4. Edge Case: Empty Logs or Missing Content

Not handling edge cases where logs might be malformed or have minimal content.

Solution:

def get_sort_key(log: str) -> tuple:
    # Handle potential edge cases
    if " " not in log:
        # Log has no content, treat as digit-log to maintain order
        return (1,)
  
    identifier, content = log.split(" ", 1)
  
    if not content:  # Empty content after identifier
        return (1,)
  
    if content[0].isalpha():
        return (0, content, identifier)
    else:
        return (1,)

5. Mutating Original List

Using list.sort() instead of sorted() when the original list should be preserved.

Incorrect (if original should be preserved):

logs.sort(key=get_sort_key)  # Modifies original list
return logs

Correct:

return sorted(logs, key=get_sort_key)  # Returns new sorted list
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which of the following is a min heap?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More