3374. First Letter Capitalization II

Hard · Database

Problem Description

This problem asks you to transform text content by applying specific capitalization rules. Given a table user_content with columns content_id and content_text, you need to process each text entry according to these rules:

  1. Basic Capitalization: Convert the first letter of each word to uppercase and all remaining letters to lowercase. For example, "hello WORLD" becomes "Hello World".

  2. Hyphenated Words: Words containing hyphens require special handling. Each part separated by a hyphen should be capitalized independently. For instance:

    • "quick-brown" becomes "Quick-Brown"
    • "modern-day" becomes "Modern-Day"
    • "FRONT-end" becomes "Front-End"
  3. Preserve Formatting: All other characters, spacing, and punctuation should remain unchanged.

The solution uses a string processing approach that:

  • Splits the text into words by spaces
  • For each word, checks if it contains a hyphen
  • If hyphenated, splits by hyphen, capitalizes each part, then rejoins with hyphen
  • If not hyphenated, simply capitalizes the word normally
  • Joins all processed words back together with spaces

The capitalize() method in Python is particularly useful here: it converts the first character to uppercase and the rest to lowercase in one call, which covers the basic capitalization rule for any word that starts with a letter.
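As a quick illustration of that behavior, the sketch below applies capitalize() to a few made-up inputs and then to the "hello WORLD" example from rule 1. The variable names are only for illustration.

```python
# str.capitalize(): first character uppercased, every other character lowercased.
samples = ["HELLO", "hello", "HeLLo", "wORLD"]
results = [s.capitalize() for s in samples]
print(results)  # ['Hello', 'Hello', 'Hello', 'World']

# Applying it word by word reproduces the example from rule 1.
title = " ".join(word.capitalize() for word in "hello WORLD".split(" "))
print(title)  # Hello World
```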

The output should include the original text (renamed as original_text), the converted text (as converted_text), and the content_id, demonstrating the transformation applied to each row of text content.

Intuition

The key insight is recognizing that this is a text processing problem with two distinct levels of splitting and transformation. When we look at the examples, we notice a pattern: regular words follow standard title case rules, but hyphenated words need special treatment where each hyphen-separated part is capitalized independently.

The natural approach is to break down the problem hierarchically:

  1. First split by spaces to get individual words
  2. Then check each word for hyphens and handle accordingly

Why this two-level approach? Consider the text "the QUICK-brown fox". If we tried to handle everything at once, we'd struggle to distinguish between spaces (word separators) and hyphens (sub-word separators). By processing in layers, we maintain the structure of the text.
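The layering can be made visible by keeping the intermediate structures around. This is a minimal sketch using the same sample text; the names words, nested, and rebuilt are chosen here for illustration.

```python
text = "the QUICK-brown fox"

# Level 1: spaces separate words.
words = text.split(" ")

# Level 2: hyphens separate the parts inside each word.
nested = [word.split("-") for word in words]
print(nested)  # [['the'], ['QUICK', 'brown'], ['fox']]

# Capitalize at the innermost level, then rebuild from the inside out.
rebuilt = " ".join("-".join(part.capitalize() for part in parts) for parts in nested)
print(rebuilt)  # The Quick-Brown Fox
```

Because each level is joined with the same separator it was split on, the structure of the original text is restored exactly.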

The beauty of Python's capitalize() method is that it handles both uppercase and lowercase conversions in one operation - it doesn't matter if the input is "HELLO", "hello", or "HeLLo", it always produces "Hello". This eliminates the need for complex case checking logic.

For hyphenated words, we apply the same capitalize() logic to each part. The pattern "-".join([part.capitalize() for part in word.split("-")]) elegantly handles any number of hyphen-separated parts. Even if a word has multiple hyphens like "front-end-development", each part gets capitalized correctly.

The conditional expression if "-" in word acts as our decision point - it's a simple check that routes each word to the appropriate processing path. This branching logic, combined with list comprehension, allows us to process the entire text in a single pass, making the solution both readable and efficient.
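One detail worth checking is what the branch actually buys. The sketch below compares the branching version with an unconditional one; both helper names are hypothetical and exist only for this comparison.

```python
def with_branch(word: str) -> str:
    """Route hyphenated and plain words through separate paths."""
    if "-" in word:
        return "-".join([part.capitalize() for part in word.split("-")])
    return word.capitalize()


def without_branch(word: str) -> str:
    """split("-") on a hyphen-free word returns [word], so one path covers both cases."""
    return "-".join(part.capitalize() for part in word.split("-"))


samples = ["fox", "FRONT-end", "front-end-development", ""]
for sample in samples:
    assert with_branch(sample) == without_branch(sample)

print([with_branch(s) for s in samples])
# ['Fox', 'Front-End', 'Front-End-Development', '']
```

The two versions agree on these inputs, so the explicit check is mainly a readability choice that makes the special handling of hyphens obvious to the reader.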

Solution Approach

The solution implements a text transformation pipeline using pandas DataFrame operations and custom string processing logic.

Step 1: Define the Text Conversion Function

We create a nested function convert_text that processes each text entry:

def convert_text(text: str) -> str:
    return " ".join(
        (
            "-".join([part.capitalize() for part in word.split("-")])
            if "-" in word
            else word.capitalize()
        )
        for word in text.split(" ")
    )

This function works through several layers:

  • text.split(" ") breaks the input text into individual words
  • For each word, we check if it contains a hyphen using if "-" in word
  • If hyphenated: word.split("-") splits it into parts, part.capitalize() capitalizes each part, and "-".join() reassembles them
  • If not hyphenated: word.capitalize() directly applies title case
  • Finally, " ".join() combines all processed words back into a single string

Step 2: Apply the Transformation

We use pandas' apply() method to run our conversion function on each row:

user_content["converted_text"] = user_content["content_text"].apply(convert_text)

This creates a new column converted_text with the transformed text while preserving the original data.

Step 3: Format the Output

The final step renames and reorders columns to match the expected output format:

return user_content.rename(columns={"content_text": "original_text"})[
    ["content_id", "original_text", "converted_text"]
]
  • rename(columns={"content_text": "original_text"}) changes the column name to match requirements
  • Column selection [["content_id", "original_text", "converted_text"]] ensures the correct order
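Putting the three steps together on a small frame may help. The rows below are made up for illustration, and this sketch uses assign() instead of in-place column assignment so that the input frame is not modified; that is a stylistic variation, not something the problem requires.

```python
import pandas as pd


def convert_text(text: str) -> str:
    # Same logic as Step 1, written without the explicit hyphen check.
    return " ".join(
        "-".join(part.capitalize() for part in word.split("-"))
        for word in text.split(" ")
    )


# Hypothetical sample rows, not taken from the problem statement.
user_content = pd.DataFrame(
    {
        "content_id": [1, 2],
        "content_text": ["hello WORLD of SQL", "the QUICK-brown fox"],
    }
)

# Step 2: add the converted column (assign returns a new DataFrame).
with_converted = user_content.assign(
    converted_text=user_content["content_text"].apply(convert_text)
)

# Step 3: rename and order the columns.
renamed = with_converted.rename(columns={"content_text": "original_text"})
result = renamed[["content_id", "original_text", "converted_text"]]
print(result)
```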

Algorithm Complexity:

  • Time: O(n × m), where n is the number of rows and m is the average number of characters per text entry
  • Space: O(n × m) for storing the converted text column

The solution leverages Python's built-in string methods and list comprehensions for clean, efficient text processing, while pandas handles the DataFrame operations seamlessly.

Example Walkthrough

Let's walk through a concrete example to illustrate how the solution processes text with both regular and hyphenated words.

Input:

content_id: 1
content_text: "the QUICK-brown fox JUMPS-OVER the lazy-DOG"

Step-by-Step Processing:

  1. Split by spaces: First, we break the text into individual words:

    ["the", "QUICK-brown", "fox", "JUMPS-OVER", "the", "lazy-DOG"]
  2. Process each word:

    • "the" → No hyphen detected → Apply capitalize() → "The"

    • "QUICK-brown" → Hyphen detected → Split by "-": ["QUICK", "brown"]

      • "QUICK" → capitalize() → "Quick"
      • "brown" → capitalize() → "Brown"
      • Join with hyphen → "Quick-Brown"
    • "fox" → No hyphen → Apply capitalize() → "Fox"

    • "JUMPS-OVER" → Hyphen detected → Split by "-": ["JUMPS", "OVER"]

      • "JUMPS" → capitalize() → "Jumps"
      • "OVER" → capitalize() → "Over"
      • Join with hyphen → "Jumps-Over"
    • "the" → No hyphen → Apply capitalize() → "The"

    • "lazy-DOG" → Hyphen detected → Split by "-": ["lazy", "DOG"]

      • "lazy" → capitalize() → "Lazy"
      • "DOG" → capitalize() → "Dog"
      • Join with hyphen → "Lazy-Dog"
  3. Join all processed words: Combine with spaces:

    "The Quick-Brown Fox Jumps-Over The Lazy-Dog"

Final Output:

content_id: 1
original_text: "the QUICK-brown fox JUMPS-OVER the lazy-DOG"
converted_text: "The Quick-Brown Fox Jumps-Over The Lazy-Dog"

Notice how the capitalize() method handles both uppercase and lowercase inputs uniformly - "QUICK" becomes "Quick" just as "lazy" becomes "Lazy". The hyphen acts as a delimiter that triggers separate capitalization for each part, while spaces are preserved as word boundaries.
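The walkthrough can be reproduced with a short trace. This is a sketch that records each word, its hyphen-separated parts, and the converted result; the steps list is introduced here only to make the intermediate values inspectable.

```python
text = "the QUICK-brown fox JUMPS-OVER the lazy-DOG"

steps = []
for word in text.split(" "):
    parts = word.split("-")  # a word without a hyphen yields a single part
    converted = "-".join(part.capitalize() for part in parts)
    steps.append((word, parts, converted))
    print(f"{word!r} -> {parts} -> {converted!r}")

final = " ".join(converted for _, _, converted in steps)
print(final)  # The Quick-Brown Fox Jumps-Over The Lazy-Dog
```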

Solution Implementation

Python

import pandas as pd


def capitalize_content(user_content: pd.DataFrame) -> pd.DataFrame:
    """
    Capitalizes the first letter of each word in content text, including hyphenated words.

    Args:
        user_content: DataFrame containing content_text column to be capitalized

    Returns:
        DataFrame with original_text and converted_text columns
    """

    def convert_text(text: str) -> str:
        """
        Helper function to capitalize each word in a text string.
        Handles both regular words and hyphenated words.

        Args:
            text: Input string to be converted

        Returns:
            String with all words capitalized
        """
        # Split text into words by spaces
        words = text.split(" ")
        capitalized_words = []

        for word in words:
            # Check if word contains hyphen
            if "-" in word:
                # Split by hyphen, capitalize each part, then rejoin
                hyphenated_parts = word.split("-")
                capitalized_parts = [part.capitalize() for part in hyphenated_parts]
                capitalized_word = "-".join(capitalized_parts)
            else:
                # Simply capitalize regular word
                capitalized_word = word.capitalize()

            capitalized_words.append(capitalized_word)

        # Join all capitalized words back with spaces
        return " ".join(capitalized_words)

    # Apply the conversion function to the content_text column
    user_content["converted_text"] = user_content["content_text"].apply(convert_text)

    # Rename content_text to original_text for clarity
    user_content = user_content.rename(columns={"content_text": "original_text"})

    # Return only the required columns in specified order
    return user_content[["content_id", "original_text", "converted_text"]]

Java

import java.util.ArrayList;
import java.util.List;

public class ContentCapitalizer {

    // Plain row types stand in for the DataFrame of the Python version
    // (records require Java 16 or later).
    public record ContentRow(int contentId, String contentText) {}

    public record ConvertedRow(int contentId, String originalText, String convertedText) {}

    /**
     * Capitalizes the first letter of each word in content text, including hyphenated words.
     *
     * @param userContent rows containing the content_text values to be capitalized
     * @return rows with content_id, original_text and converted_text
     */
    public List<ConvertedRow> capitalizeContent(List<ContentRow> userContent) {
        List<ConvertedRow> result = new ArrayList<>();
        for (ContentRow row : userContent) {
            result.add(new ConvertedRow(
                row.contentId(), row.contentText(), convertText(row.contentText())));
        }
        return result;
    }

    /**
     * Helper function to capitalize each word in a text string.
     * Handles both regular words and hyphenated words.
     *
     * @param text Input string to be converted
     * @return String with all words capitalized
     */
    private String convertText(String text) {
        // Split text into words by spaces; the -1 limit keeps trailing empty
        // strings so that the original spacing survives the join
        String[] words = text.split(" ", -1);
        List<String> capitalizedWords = new ArrayList<>();

        for (String word : words) {
            String capitalizedWord;

            // Check if word contains hyphen
            if (word.contains("-")) {
                // Split by hyphen (keeping empty parts), capitalize each part, then rejoin
                String[] hyphenatedParts = word.split("-", -1);
                List<String> capitalizedParts = new ArrayList<>();

                for (String part : hyphenatedParts) {
                    capitalizedParts.add(capitalize(part));
                }

                capitalizedWord = String.join("-", capitalizedParts);
            } else {
                // Simply capitalize regular word
                capitalizedWord = capitalize(word);
            }

            capitalizedWords.add(capitalizedWord);
        }

        // Join all capitalized words back with spaces
        return String.join(" ", capitalizedWords);
    }

    /**
     * Capitalizes the first letter of a string and lowercases the rest.
     *
     * @param str Input string to capitalize
     * @return Capitalized string
     */
    private String capitalize(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        return str.substring(0, 1).toUpperCase() + str.substring(1).toLowerCase();
    }
}

C++

#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct DataFrame {
    std::vector<int> content_id;
    std::vector<std::string> content_text;
    std::vector<std::string> original_text;
    std::vector<std::string> converted_text;
};

class Solution {
public:
    /**
     * Capitalizes the first letter of each word in content text, including hyphenated words.
     *
     * @param user_content DataFrame containing content_text column to be capitalized
     * @return DataFrame with original_text and converted_text columns
     */
    DataFrame capitalize_content(const DataFrame& user_content) {
        DataFrame result;

        // Copy content_id to result
        result.content_id = user_content.content_id;

        // Process each content_text entry
        for (const std::string& text : user_content.content_text) {
            // Store original text
            result.original_text.push_back(text);

            // Convert and store the capitalized text
            result.converted_text.push_back(convert_text(text));
        }

        return result;
    }

private:
    // Splits on a single delimiter and keeps empty pieces, so repeated
    // spaces and leading or trailing hyphens survive the round trip.
    static std::vector<std::string> split(const std::string& s, char delim) {
        std::vector<std::string> parts;
        std::string::size_type start = 0;
        std::string::size_type pos;
        while ((pos = s.find(delim, start)) != std::string::npos) {
            parts.push_back(s.substr(start, pos - start));
            start = pos + 1;
        }
        parts.push_back(s.substr(start));
        return parts;
    }

    // Joins the pieces with the delimiter between consecutive elements.
    static std::string join(const std::vector<std::string>& parts, char delim) {
        std::string result;
        for (std::size_t i = 0; i < parts.size(); ++i) {
            if (i > 0) {
                result += delim;
            }
            result += parts[i];
        }
        return result;
    }

    /**
     * Helper function to capitalize each word in a text string.
     * Handles both regular words and hyphenated words.
     *
     * @param text Input string to be converted
     * @return String with all words capitalized
     */
    std::string convert_text(const std::string& text) {
        // Split text into words by spaces
        std::vector<std::string> words = split(text, ' ');

        for (std::string& word : words) {
            // Split by hyphen, capitalize each part, then rejoin;
            // a word without a hyphen yields exactly one part
            std::vector<std::string> parts = split(word, '-');
            for (std::string& part : parts) {
                part = capitalize_word(part);
            }
            word = join(parts, '-');
        }

        // Join all capitalized words back with spaces
        return join(words, ' ');
    }

    /**
     * Capitalizes the first letter of a word and lowercases the rest.
     *
     * @param word Word to capitalize
     * @return Capitalized word
     */
    std::string capitalize_word(const std::string& word) {
        std::string result = word;

        for (std::size_t i = 0; i < result.length(); ++i) {
            // Cast to unsigned char first: passing a negative char to
            // std::toupper or std::tolower is undefined behavior
            unsigned char c = static_cast<unsigned char>(result[i]);
            result[i] = static_cast<char>(i == 0 ? std::toupper(c) : std::tolower(c));
        }

        return result;
    }
};

TypeScript

// Plain row types stand in for the DataFrame used by the Python version.
interface ContentRow {
    content_id: number;
    content_text: string;
}

interface ConvertedRow {
    content_id: number;
    original_text: string;
    converted_text: string;
}

/**
 * Capitalizes the first letter of each word in content text, including hyphenated words.
 *
 * @param userContent - Rows containing the content_text values to be capitalized
 * @returns Rows with content_id, original_text and converted_text
 */
function capitalize_content(userContent: ContentRow[]): ConvertedRow[] {

    /** Uppercases the first character of a part and lowercases the rest. */
    function capitalize(part: string): string {
        return part.charAt(0).toUpperCase() + part.slice(1).toLowerCase();
    }

    /**
     * Helper function to capitalize each word in a text string.
     * Handles both regular words and hyphenated words.
     *
     * @param text - Input string to be converted
     * @returns String with all words capitalized
     */
    function convertText(text: string): string {
        // Split text into words by spaces
        const words: string[] = text.split(" ");
        const capitalizedWords: string[] = [];

        // Process each word
        for (const word of words) {
            let capitalizedWord: string;

            // Check if word contains hyphen
            if (word.includes("-")) {
                // Split by hyphen, capitalize each part, then rejoin
                capitalizedWord = word.split("-").map(capitalize).join("-");
            } else {
                // Simply capitalize regular word
                capitalizedWord = capitalize(word);
            }

            capitalizedWords.push(capitalizedWord);
        }

        // Join all capitalized words back with spaces
        return capitalizedWords.join(" ");
    }

    // Build the output rows with the required fields in the specified order
    return userContent.map(row => ({
        content_id: row.content_id,
        original_text: row.content_text,
        converted_text: convertText(row.content_text),
    }));
}


Time and Space Complexity

Time Complexity: O(n * m * k)

Where:

  • n is the number of rows in the DataFrame
  • m is the average number of words per content_text entry
  • k is the average length of each word

The analysis breaks down as follows:

  • Iterating through each row in the DataFrame: O(n)
  • For each row, the convert_text function:
    • Splits the text by spaces: O(length of text)
    • For each word in the split result: O(m)
      • Checks if "-" is in the word: O(k)
      • If hyphenated, splits by "-": O(k) and capitalizes each part: O(k) in total, since the parts together contain at most k characters
      • If not hyphenated, capitalizes the word: O(k)
      • Joins the parts back with "-": O(k)
    • Joins all words back with spaces: O(m * k)
  • DataFrame operations (rename, column selection): O(n)

The dominant operation is processing each word in each row, resulting in O(n * m * k).

Space Complexity: O(n * m * k)

The space breakdown:

  • Input DataFrame storage: O(n * m * k) for the original content_text
  • The converted_text column: O(n * m * k) for storing the capitalized version
  • Temporary space during processing:
    • Split results for each text: O(m * k)
    • List comprehension results: O(m * k)
  • Output DataFrame with three columns: O(n * m * k)

The space complexity is dominated by storing the original and converted text columns in the DataFrame, giving us O(n * m * k).

Common Pitfalls

1. Multiple Consecutive Spaces or Leading/Trailing Spaces

A common worry is that split(" ") mishandles multiple consecutive spaces because it produces empty strings in the resulting list. In this solution that is harmless: capitalize() returns an empty string unchanged, and " ".join() puts back exactly one space per separator, so the original spacing is reproduced.

Example:

  • Input: "hello  world" (two spaces)
  • split(" ") result: ['hello', '', 'world']
  • Output: "Hello  World" (both spaces kept)

The real pitfall is the opposite change. Replacing split(" ") with split() collapses every run of whitespace into a single separator and drops leading and trailing whitespace, which conflicts with the Preserve Formatting rule. Use it only when normalized spacing is actually wanted:

def convert_text(text: str) -> str:
    words = text.split()  # Collapses whitespace runs and trims both ends
    capitalized_words = []

    for word in words:
        if "-" in word:
            parts = word.split("-")
            capitalized_word = "-".join(part.capitalize() for part in parts)
        else:
            capitalized_word = word.capitalize()
        capitalized_words.append(capitalized_word)

    return " ".join(capitalized_words)
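To see concretely how the two splitting strategies differ, the sketch below runs both on a made-up string with leading, doubled, and trailing spaces.

```python
text = "  hello  world "  # leading, doubled, and trailing spaces

# split(" ") keeps an empty string for every extra space.
by_space = text.split(" ")
print(by_space)  # ['', '', 'hello', '', 'world', '']

# split() with no argument drops them all.
by_whitespace = text.split()
print(by_whitespace)  # ['hello', 'world']

preserved = " ".join(word.capitalize() for word in by_space)
normalized = " ".join(word.capitalize() for word in by_whitespace)
print(repr(preserved))   # '  Hello  World '
print(repr(normalized))  # 'Hello World'
```

Which result is correct depends on whether the spacing of the input must be kept or normalized.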

2. Empty Hyphen Parts

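Words with consecutive hyphens or hyphens at the start/end (like "--word", "word-", or "pre--fix") produce empty strings when split by hyphen. In Python this does not fail: capitalize() on an empty string returns an empty string, and "-".join() restores every hyphen. Ports to other languages need more care. For example, Java's String.split("-") discards trailing empty strings, so "word-" loses its hyphen unless a negative limit is passed.

Example:

  • Input: "front--end" (double hyphen)
  • Split result: ['front', '', 'end']
  • capitalize() on the empty string returns an empty string, so the result is "Front--End"

Solution: Keep the empty parts in place so that the hyphen structure survives the join. The explicit guard below is equivalent to calling capitalize() on every part, but it makes the intent visible: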

if "-" in word:
    parts = word.split("-")
    # Only capitalize non-empty parts
    capitalized_parts = [part.capitalize() if part else "" for part in parts]
    capitalized_word = "-".join(capitalized_parts)
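A quick check of the edge cases named above, as a minimal sketch; capitalize_parts is a hypothetical helper name, and the expected values follow from how str.split and str.join treat empty strings.

```python
def capitalize_parts(word: str) -> str:
    """Capitalize each hyphen-separated part, keeping empty parts in place."""
    return "-".join(part.capitalize() for part in word.split("-"))


cases = {
    "front--end": "Front--End",
    "--word": "--Word",
    "word-": "Word-",
    "pre--fix": "Pre--Fix",
    "-": "-",
}

for word, expected in cases.items():
    actual = capitalize_parts(word)
    print(f"{word!r} -> {actual!r}")
    assert actual == expected
```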

3. Special Characters and Numbers

Words starting with numbers or special characters might not behave as expected with capitalize(). For instance, "3rd" remains "3rd" after capitalize(), not "3Rd".

Example Problem:

  • Input: "123abc"
  • Output: "123abc" (no change, first character is a digit)

Solution: If you need to capitalize the first letter regardless of position:

def smart_capitalize(word):
    if not word:
        return word
    # Find the first letter and capitalize it
    for i, char in enumerate(word):
        if char.isalpha():
            return word[:i] + char.upper() + word[i+1:].lower()
    return word.lower()  # No letters found
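For comparison, the sketch below runs capitalize() and a copy of smart_capitalize side by side on a few made-up inputs. Whether the second behavior is wanted depends on the expected output, so treat it as an option rather than a fix.

```python
def smart_capitalize(word: str) -> str:
    """Uppercase the first alphabetic character and lowercase what follows it."""
    for i, char in enumerate(word):
        if char.isalpha():
            return word[:i] + char.upper() + word[i + 1:].lower()
    return word.lower()  # no letters found


samples = ["3rd", "123abc", "(hello)", "!!!"]
builtin = [s.capitalize() for s in samples]
smart = [smart_capitalize(s) for s in samples]

print(builtin)  # ['3rd', '123abc', '(hello)', '!!!']
print(smart)    # ['3Rd', '123Abc', '(Hello)', '!!!']
```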

4. Unicode and Non-ASCII Characters

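Text in other languages raises the question of whether accented or non-ASCII letters are capitalized correctly.

Example Problem:

  • Input: "café" or "über"
  • A byte-oriented or ASCII-only routine (such as the C++ version above, which works one char at a time) leaves the accented first letter of "über" unchanged

Solution: Python's str methods are Unicode-aware and do not consult the process locale, so calling locale.setlocale() does not change their results, and capitalize() already handles accented letters such as those above. Language-specific rules (for example the Turkish dotted and dotless i) need a dedicated library rather than a locale setting. If you want to control the two halves separately, a manual equivalent looks like this:

def manual_capitalize(word):
    return word[0].upper() + word[1:].lower() if word else word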
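A short check of how Python's built-in methods treat accented input, as a sketch with made-up sample words:

```python
samples = ["café", "über", "ÉCOLE", "élan-VITAL"]

converted = [
    "-".join(part.capitalize() for part in word.split("-"))
    for word in samples
]
print(converted)  # ['Café', 'Über', 'École', 'Élan-Vital']

# Case mapping can change the length of a string: ß uppercases to SS.
shout = "straße".upper()
print(shout)  # STRASSE
```

The last line is a reminder that code which assumes one output character per input character can break on non-ASCII text.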

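5. Performance with Very Long Texts

Creating multiple intermediate lists and strings can be memory-inefficient for very long texts.

Solution: Use generator expressions to avoid building the named intermediate lists. The saving is modest, because str.join() in CPython still materializes its argument internally, but the code is also more compact. Keep split(" ") so that the original spacing is preserved:

def convert_text(text: str) -> str:
    return " ".join(
        "-".join(part.capitalize() for part in word.split("-"))
        if "-" in word else word.capitalize()
        for word in text.split(" ")
    )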
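If the intermediate lists are the concern, one alternative (not part of the original solution) is a single regular-expression substitution that capitalizes every maximal run of characters that are neither a space nor a hyphen. The sketch below is a swapped-in technique, and convert_text_regex is a hypothetical name; it is compared against the split-based version on a few made-up inputs.

```python
import re

# A token is a maximal run of characters that are neither a space nor a hyphen.
TOKEN = re.compile(r"[^ -]+")


def convert_text_regex(text: str) -> str:
    """Capitalize each token in place; separators are never touched."""
    return TOKEN.sub(lambda match: match.group(0).capitalize(), text)


def convert_text_split(text: str) -> str:
    """Split-based reference version for comparison."""
    return " ".join(
        "-".join(part.capitalize() for part in word.split("-"))
        for word in text.split(" ")
    )


samples = ["the  QUICK-brown fox", " lazy--DOG- ", "", "3rd-PARTY api"]
for sample in samples:
    assert convert_text_regex(sample) == convert_text_split(sample)

print(convert_text_regex("the  QUICK-brown fox"))  # The  Quick-Brown Fox
```

Whether this is faster in practice would need measuring; its main appeal is that spaces and hyphens pass through unchanged by construction.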