722. Remove Comments
Problem Description
You need to remove all comments from a C++ program. The program is given as an array of strings source
, where each string represents one line of the source code.
There are two types of comments in C++:
-
Line comments: Starting with
"//"
, everything from this point to the end of the current line should be ignored. -
Block comments: Starting with
"/*"
and ending with"*/"
, all characters between these markers should be ignored, even if they span multiple lines. Note that"/*/"
does not end a block comment because the ending would overlap with the beginning.
Important rules:
- The first comment marker encountered takes precedence. If
"//"
appears inside a block comment, it's ignored. Similarly, if"/*"
appears inside any comment, it's also ignored. - After removing comments, empty lines should not be included in the output.
- Every string in the result must be non-empty.
- Block comments can span multiple lines and effectively join parts of different lines together.
- Every
"/*"
is guaranteed to have a matching"*/"
.
The algorithm processes the source code line by line, maintaining a flag blockComment
to track whether we're currently inside a block comment. For each line:
- If inside a block comment, look for
"*/"
to end it - If not in a block comment, check for
"/*"
to start one or"//"
to ignore the rest of the line - Valid characters outside comments are collected and form the output lines
The solution returns the source code with all comments removed, preserving the original format but excluding empty lines.
Intuition
The key insight is that we need to process the source code character by character while maintaining state about whether we're currently inside a block comment. This is necessary because block comments can span multiple lines, so we can't process each line independently.
Think of it like reading through the code with a highlighter - we need to know at any point whether we should be "highlighting" (ignoring) text or not. The challenge is that comment markers have different behaviors:
"/*"
starts a highlighting region that continues until we find"*/"
"//"
only highlights until the end of the current line- Once we're highlighting (in a comment), other comment markers don't matter
This naturally leads to a state machine approach where we track whether we're in a block comment (blockComment
flag). As we scan through each character:
- If we're in a block comment, we only care about finding
"*/"
to exit - If we're not in a block comment, we need to check for both
"/*"
and"//"
The tricky part is handling how block comments can merge lines. When a block comment spans multiple lines like:
line1 /* comment starts comment continues */ line2
The result should be "line1 line2"
on a single line. This is why we use a temporary buffer t
to accumulate valid characters. We only create a new output line when:
- We finish processing a source line
- We're not currently in a block comment
- We have accumulated some valid characters
By processing character by character and maintaining these states, we can correctly handle all the edge cases of nested appearances and multi-line comments.
Solution Approach
We implement a character-by-character scanning approach with state tracking. Here's how the solution works:
Data Structures Used:
ans
: List to store the final result (lines with comments removed)t
: Temporary list to accumulate valid characters for the current line being builtblock_comment
: Boolean flag to track if we're currently inside a block comment
Algorithm Steps:
-
Initialize State: Start with
block_comment = False
and an empty temporary buffert
. -
Process Each Line: For each line
s
in the source:- Set index
i = 0
and get line lengthm = len(s)
- Process characters one by one using a while loop
- Set index
-
Character Processing Logic:
Case 1: Inside Block Comment (
block_comment == True
):- Check if current and next characters form
"*/"
- If yes: Set
block_comment = False
and skip both characters (i += 1
extra) - If no: Just move to next character (ignore current character)
Case 2: Outside Block Comment (
block_comment == False
):- Check if current and next characters form
"/*"
:- If yes: Set
block_comment = True
and skip both characters
- If yes: Set
- Check if current and next characters form
"//"
:- If yes: Break from current line (rest of line is commented)
- Otherwise:
- Current character is valid, append to
t
- Current character is valid, append to
- Check if current and next characters form
-
Line Completion: After processing all characters in a line:
- If
block_comment == False
andt
is not empty:- Join characters in
t
to form a string and add toans
- Clear
t
for next line
- Join characters in
- If still in block comment or
t
is empty, continue to next line without outputting
- If
-
Return Result: Return
ans
containing all non-empty lines after comment removal
Key Implementation Details:
- We check
i + 1 < m
before accessing two-character sequences to avoid index out of bounds - The
i += 1
at the end of the main loop moves to the next character - When we find a two-character comment marker, we do an additional
i += 1
to skip both characters - The temporary buffer
t
allows us to build lines that may span multiple source lines (when block comments join them)
This approach handles all edge cases correctly:
- Multi-line block comments that join parts of different lines
- Nested comment markers that should be ignored
- Empty lines after comment removal that shouldn't appear in output
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate how the solution handles different comment types:
Input:
source = ["int main() {", " /* block comment", " continues */ int x = 5;", " // line comment", " return 0;", "}"]
Processing Step by Step:
Line 0: "int main() {"
block_comment = False
,t = []
- Scan each character: No comment markers found
- Add all characters to
t
:['i','n','t',' ','m','a','i','n','(',')',' ','{']
- End of line: Not in block comment and
t
is not empty - Output:
"int main() {"
added toans
- Clear
t
for next line
Line 1: " /* block comment"
block_comment = False
,t = []
- Characters
' ',' '
added tot
- At index 2-3: Found
"/*"
→ Setblock_comment = True
, skip both characters - Continue scanning: remaining characters are ignored (we're in block comment)
- End of line: Still in block comment, so don't output anything
- Keep
t = [' ',' ']
for potential joining with next line
Line 2: " continues */ int x = 5;"
block_comment = True
,t = [' ',' ']
(from previous line)- Scan for
"*/"
: Found at index 12-13 - Set
block_comment = False
, skip the"*/"
- Continue from index 14: Add
' ','i','n','t',' ','x',' ','=',' ','5',';'
tot
- End of line: Not in block comment and
t
is not empty - Output:
" int x = 5;"
added toans
(notice the two spaces from line 1 are preserved) - Clear
t
Line 3: " // line comment"
block_comment = False
,t = []
- Characters
' ',' '
added tot
- At index 2-3: Found
"//"
→ Break (rest of line is commented) - End of line: Not in block comment but
t
only has spaces - Output:
" "
added toans
- Clear
t
Line 4: " return 0;"
block_comment = False
,t = []
- Scan each character: No comment markers found
- Add all characters to
t
- End of line: Output
" return 0;"
added toans
Line 5: "}"
block_comment = False
,t = []
- Single character
'}'
added tot
- End of line: Output
"}"
added toans
Final Result:
ans = ["int main() {", " int x = 5;", " ", " return 0;", "}"]
This example demonstrates:
- How block comments spanning multiple lines join text from different lines
- How line comments cause the rest of the line to be ignored
- How the state (
block_comment
flag) carries across lines - How the temporary buffer
t
accumulates characters until a valid line is complete
Solution Implementation
1class Solution:
2 def removeComments(self, source: List[str]) -> List[str]:
3 """
4 Remove all comments from source code.
5 Handles both single-line comments (//) and multi-line comments (/* */).
6
7 Args:
8 source: List of strings representing lines of source code
9
10 Returns:
11 List of strings with all comments removed
12 """
13 result = []
14 current_line = []
15 in_block_comment = False
16
17 for line in source:
18 index = 0
19 line_length = len(line)
20
21 while index < line_length:
22 if in_block_comment:
23 # Currently inside a block comment, look for closing */
24 if index + 1 < line_length and line[index:index + 2] == "*/":
25 in_block_comment = False
26 index += 1 # Skip the closing */ by advancing one extra position
27 else:
28 # Not in a block comment, check for comment starts
29 if index + 1 < line_length and line[index:index + 2] == "/*":
30 # Start of block comment
31 in_block_comment = True
32 index += 1 # Skip the opening /* by advancing one extra position
33 elif index + 1 < line_length and line[index:index + 2] == "//":
34 # Single-line comment starts, ignore rest of the line
35 break
36 else:
37 # Regular character, add to current line
38 current_line.append(line[index])
39
40 index += 1
41
42 # If not in a block comment and we have accumulated characters,
43 # add the line to results
44 if not in_block_comment and current_line:
45 result.append("".join(current_line))
46 current_line.clear()
47
48 return result
49
1class Solution {
2 public List<String> removeComments(String[] source) {
3 // List to store the final result after removing comments
4 List<String> result = new ArrayList<>();
5
6 // StringBuilder to accumulate characters for the current line
7 StringBuilder currentLine = new StringBuilder();
8
9 // Flag to track if we're currently inside a block comment
10 boolean insideBlockComment = false;
11
12 // Process each line in the source code
13 for (String line : source) {
14 int lineLength = line.length();
15
16 // Process each character in the current line
17 for (int i = 0; i < lineLength; i++) {
18
19 if (insideBlockComment) {
20 // Check for end of block comment "*/"
21 if (i + 1 < lineLength && line.charAt(i) == '*' && line.charAt(i + 1) == '/') {
22 insideBlockComment = false;
23 i++; // Skip the next character '/'
24 }
25 // Otherwise, skip this character since we're inside a block comment
26
27 } else {
28 // Check for start of block comment "/*"
29 if (i + 1 < lineLength && line.charAt(i) == '/' && line.charAt(i + 1) == '*') {
30 insideBlockComment = true;
31 i++; // Skip the next character '*'
32
33 // Check for line comment "//"
34 } else if (i + 1 < lineLength && line.charAt(i) == '/' && line.charAt(i + 1) == '/') {
35 break; // Ignore the rest of the line
36
37 // Regular character, add to current line
38 } else {
39 currentLine.append(line.charAt(i));
40 }
41 }
42 }
43
44 // If we're not inside a block comment and have accumulated characters,
45 // add the line to result and reset the StringBuilder
46 if (!insideBlockComment && currentLine.length() > 0) {
47 result.add(currentLine.toString());
48 currentLine.setLength(0); // Clear the StringBuilder for next line
49 }
50 }
51
52 return result;
53 }
54}
55
1class Solution {
2public:
3 vector<string> removeComments(vector<string>& source) {
4 vector<string> result;
5 string currentLine;
6 bool inBlockComment = false;
7
8 // Process each line in the source code
9 for (const auto& line : source) {
10 int lineLength = line.size();
11
12 // Process each character in the current line
13 for (int i = 0; i < lineLength; ++i) {
14 if (inBlockComment) {
15 // Look for end of block comment "*/"
16 if (i + 1 < lineLength && line[i] == '*' && line[i + 1] == '/') {
17 inBlockComment = false;
18 ++i; // Skip the '/' character
19 }
20 // Otherwise, skip characters inside block comment
21 } else {
22 // Check for start of block comment "/*"
23 if (i + 1 < lineLength && line[i] == '/' && line[i + 1] == '*') {
24 inBlockComment = true;
25 ++i; // Skip the '*' character
26 }
27 // Check for line comment "//"
28 else if (i + 1 < lineLength && line[i] == '/' && line[i + 1] == '/') {
29 break; // Ignore rest of the line
30 }
31 // Regular character, add to current line
32 else {
33 currentLine.push_back(line[i]);
34 }
35 }
36 }
37
38 // If not in block comment and current line has content, add to result
39 if (!inBlockComment && !currentLine.empty()) {
40 result.emplace_back(currentLine);
41 currentLine.clear();
42 }
43 }
44
45 return result;
46 }
47};
48
1/**
2 * Removes comments from source code lines
3 * Handles both single-line (//) and multi-line (/* */) comments
4 * @param source - Array of source code lines
5 * @returns Array of source code lines with comments removed
6 */
7function removeComments(source: string[]): string[] {
8 // Result array to store processed lines
9 const result: string[] = [];
10
11 // Buffer to accumulate characters for the current line being processed
12 const currentLineBuffer: string[] = [];
13
14 // Flag to track if we're currently inside a block comment
15 let isInsideBlockComment = false;
16
17 // Process each line of source code
18 for (const line of source) {
19 const lineLength = line.length;
20
21 // Process each character in the current line
22 for (let charIndex = 0; charIndex < lineLength; charIndex++) {
23 if (isInsideBlockComment) {
24 // Check for end of block comment (*/)
25 if (charIndex + 1 < lineLength && line.slice(charIndex, charIndex + 2) === '*/') {
26 isInsideBlockComment = false;
27 charIndex++; // Skip the next character as it's part of */
28 }
29 // Skip all other characters while inside block comment
30 } else {
31 // Check for start of block comment (/*)
32 if (charIndex + 1 < lineLength && line.slice(charIndex, charIndex + 2) === '/*') {
33 isInsideBlockComment = true;
34 charIndex++; // Skip the next character as it's part of /*
35 }
36 // Check for single-line comment (//)
37 else if (charIndex + 1 < lineLength && line.slice(charIndex, charIndex + 2) === '//') {
38 break; // Skip the rest of the line
39 }
40 // Regular character - add to buffer
41 else {
42 currentLineBuffer.push(line[charIndex]);
43 }
44 }
45 }
46
47 // If not inside a block comment and buffer has content, add line to result
48 if (!isInsideBlockComment && currentLineBuffer.length > 0) {
49 result.push(currentLineBuffer.join(''));
50 currentLineBuffer.length = 0; // Clear the buffer for next line
51 }
52 }
53
54 return result;
55}
56
Time and Space Complexity
The time complexity is O(L)
, where L
is the total length of all source code strings combined. The algorithm processes each character in the source code exactly once. Even though there's a nested loop structure (outer loop iterating through lines and inner while loop iterating through characters in each line), each character is visited only once during the entire execution. The operations performed for each character (checking for comment markers, appending to list) are all O(1)
operations.
The space complexity is O(L)
, where L
is the total length of the source code. In the worst case, when there are no comments to remove, the algorithm needs to store all the original characters. The temporary list t
can hold up to L
characters across all iterations before being joined and added to the answer list. The answer list ans
also stores the processed strings, which in the worst case could be nearly as long as the original input. Therefore, the total space used is proportional to the input size.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
Pitfall 1: Incorrect Handling of Comment Marker Boundaries
The Problem:
A common mistake is checking for comment markers without verifying that we have enough characters remaining in the line. This leads to index out of bounds errors when a line ends with a single /
or *
.
Incorrect Implementation:
# WRONG - will crash if line ends with single '/' if line[index] == '/' and line[index + 1] == '/': break
The Solution:
Always check that index + 1 < line_length
before accessing two-character sequences:
# CORRECT - safely checks bounds first if index + 1 < line_length and line[index:index + 2] == "//": break
Pitfall 2: Forgetting to Skip Both Characters of Comment Markers
The Problem:
When encountering a two-character comment marker (/*
, */
, or //
), developers often forget to skip both characters, leading to incorrect parsing where the second character of the marker gets treated as regular code.
Incorrect Implementation:
# WRONG - only increments index once at the end of loop if index + 1 < line_length and line[index:index + 2] == "/*": in_block_comment = True # Missing: index += 1
The Solution:
Add an extra index += 1
when processing two-character markers:
# CORRECT - skips both characters of the comment marker if index + 1 < line_length and line[index:index + 2] == "/*": in_block_comment = True index += 1 # Extra increment to skip both characters
Pitfall 3: Creating New Line Buffer for Each Source Line
The Problem: Creating a new line buffer for each source line breaks the ability to join parts of different lines when a block comment spans multiple lines. This results in incorrect output when code continues after a multi-line block comment.
Incorrect Implementation:
# WRONG - resets buffer each iteration for line in source: current_line = [] # Creates new buffer each time # ... processing ...
Example that breaks:
Input: ["int x = 1; /* comment", "continues */ int y = 2;"] Wrong: ["int x = 1; ", "int y = 2;"] Right: ["int x = 1; int y = 2;"]
The Solution: Maintain the buffer across iterations and only clear it after adding a complete line to results:
# CORRECT - maintains buffer across lines current_line = [] # Created once outside the loop for line in source: # ... processing ... if not in_block_comment and current_line: result.append("".join(current_line)) current_line.clear() # Only clear after outputting
Pitfall 4: Outputting Empty Lines
The Problem: After removing comments, some lines may become empty, but the problem requires excluding empty lines from the output. Forgetting to check if the accumulated buffer is non-empty before adding to results violates this requirement.
Incorrect Implementation:
# WRONG - might add empty strings if not in_block_comment: result.append("".join(current_line)) # No check if current_line is empty
The Solution: Always check that the buffer contains characters before adding to results:
# CORRECT - only adds non-empty lines if not in_block_comment and current_line: result.append("".join(current_line)) current_line.clear()
The three-steps of Depth First Search are:
- Identify states;
- Draw the state-space tree;
- DFS on the state-space tree.
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!