722. Remove Comments
Problem Description
The problem requires us to write a function that removes comments from a given C++ program. The source code of the program is provided as an array of strings where each string represents a single line of code. The objective is to eliminate two types of comments from the code:
- Line comments, which start with
//
and continue until the end of the line. - Block comments, which start with
/*
and end with*/
. Block comments can span multiple lines and the block is closed only when a non-overlapping*/
is found.
A few important points to consider while solving this problem:
- The first occurrence of either type of comment takes precedence.
- Comments within comments are ignored (e.g.,
//
within/*...*/
or/*
within another/*...*/
or//
). - Lines that become empty after removal of comments should not be included in the output.
- Control characters, single or double quotes will not interfere with the comments.
- It is guaranteed that all block comments will be closed.
After processing the source code and removing the comments, the function should return the processed source code as an array of strings, with each string representing a non-empty line of the processed code.
Intuition
To solve this problem, we iterate through each line of the source code and use markers to identify the presence of a block comment. If we are inside a block comment, we check for the closing */
sequence. If found, we ignore everything up to this point and mark that we've exited the block comment. If we are not in a block comment, we check for the start of a block comment /*
or a line comment //
. Upon finding a //
, we can ignore the rest of the line. For a /*
, we set the marker indicating we are inside a block comment, and continue scanning from there. The key is to process characters based on the current context (inside a block comment or not).
Here's the approach broken down:
- Initialize a list
ans
to collect the processed lines and a temporary listt
that will hold the characters for the current line being processed. - Use a boolean
block_comment
to track whether we are inside a block comment. - Iterate through each line in the source code.
- For each line, iterate through all characters in the line.
- If we're inside a block comment (
block_comment
isTrue
), we look for the closing*/
. Once found, we setblock_comment
toFalse
and continue scanning the rest of the line. - If we're not inside a block comment, we look for either
/*
to start a block comment,//
to skip the rest of the line, or any other character to add to our temporary listt
. - After we've processed a line, if we are not in a block comment and
t
is not empty, joint
into a string and add it toans
. Then, cleart
for the next line of code. - Once all lines are processed, return
ans
.
By following this approach, we systematically scan through the source code, correctly handling both line and block comments, and accumulating only the non-commented parts of the code.
Solution Approach
The Reference Solution Approach involves using a simple state machine with two states: inside a block comment, and outside of a block comment. The implementation leverages basic string manipulation techniques and a simple control flow to parse the lines of code and remove comments.
Here is a walk-through of the implementation step-by-step:
-
Initialize Tracking Variables: We create a list
ans
to hold our output strings, and a listt
to accumulate the characters for the current line. In addition, we have a boolean flagblock_comment
to keep track of whether we are currently inside a block comment. -
Iterate Over Each Line: We loop over each line
s
in the inputsource
. For each line, we process characters sequentially. -
Iterate Over Characters in Line: Inside the line, we use a
while
loop to iterate over each character by indexi
. The indexi
is increased when processing characters, and if certain conditions are met (like the end of a block comment), it may be increased by additional amounts to skip over comment markers. -
Process Inside Block Comment: If we're inside a block comment (when
block_comment
isTrue
), we check if the current and next characters form the*/
sequence. If they do, we setblock_comment
toFalse
and skip the*/
by incrementingi
by 2. Otherwise, we move on to the next character. -
Process Outside Block Comment: When we're not in a block comment, we look for either the start of a new block comment
/*
or a line comment//
. If we find/*
, we setblock_comment
toTrue
and skip the characters/*
by incrementingi
. If we find//
, we break out of the loop as the rest of the line is a comment and should be ignored. -
Accumulate Non-Comment Characters: If the character doesn't signal the beginning of a comment, we append it to the temporary list
t
. -
Handle the End of Line: Once we reach the end of a line, if we are not inside a block comment and
t
is not empty, we convertt
into a string and add it toans
. We then cleart
to start processing the next line. -
Return the Result: After processing all lines,
ans
will contain the source code with all comments removed. We returnans
.
By using this approach, we avoid the complexities of regex parsing or other more sophisticated text processing libraries and directly implement a context-sensitive parser. The focus is on correctly adjusting the state (block_comment
) and efficiently building the output while iterating through the source code.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's illustrate the solution approach with a simple example. Suppose our source
input is the following array of strings, representing a C++ program:
[ "int main() {", // Line 1 " /* This is a", // Line 2 " multiline comment */", // Line 3 " int a = 0; // variable", // Line 4 " cout << a;", // Line 5 "}" ]
Following the Reference Solution Approach, this is how the function will work:
-
Initialize Tracking Variables:
ans
is initialized to an empty list[]
.t
is initialized to an empty list[]
.block_comment
is initialized toFalse
.
-
Iterate Over Each Line:
- We start with the first line
"int main() {"
. Since there are no comments, each character is added tot
.
- We start with the first line
-
Iterate Over Characters in Line:
- Moving to Line 2, we begin to iterate over the characters. Upon encountering
/*
, we recognize the start of a block comment and setblock_comment
toTrue
.
- Moving to Line 2, we begin to iterate over the characters. Upon encountering
-
Process Inside Block Comment:
- We continue to Line 3 while still inside the block comment. We find the ending
*/
sequence and setblock_comment
back toFalse
after incrementing the index to skip it.
- We continue to Line 3 while still inside the block comment. We find the ending
-
Process Outside Block Comment:
- On Line 4, we are not in a block comment. We process characters until the line comment
//
is found. At this point, we skip the rest of the line.
- On Line 4, we are not in a block comment. We process characters until the line comment
-
Accumulate Non-Comment Characters:
- For lines 5 and 6, since there are no comments, all characters go into
t
.
- For lines 5 and 6, since there are no comments, all characters go into
-
Handle the End of Line:
- At the end of each line (except for lines 2 and 3, which are inside the block comment), we check
t
:- After Line 1,
t
contains"int main() {"
. We add it toans
and cleart
. - After Line 4,
t
contains" int a = 0; "
. This is added toans
. - After Line 5,
t
contains" cout << a;"
. This too is added toans
. - Line 6 is just the closing brace, so it's added similarly.
- After Line 1,
- At the end of each line (except for lines 2 and 3, which are inside the block comment), we check
-
Return the Result:
- After all lines are processed, we have
ans
as["int main() {", " int a = 0; ", " cout << a;", "}"]
. - We return
ans
.
- After all lines are processed, we have
Here, we've successfully removed all comments (both line and block) from the C++ source code. The code now accurately reflects the logical part of the program without any comments.
Solution Implementation
1class Solution:
2 def removeComments(self, source: List[str]) -> List[str]:
3 final_output = [] # List to store the output after removing comments
4 current_line = [] # Buffer to store characters of the current line being processed
5 in_block_comment = False # Flag to indicate if we are inside a block comment or not
6
7 for line in source:
8 index, line_length = 0, len(line)
9
10 while index < line_length:
11 if in_block_comment:
12 # Check if the block comment ends on this line
13 if index + 1 < line_length and line[index:index + 2] == "*/":
14 in_block_comment = False # End the block comment
15 index += 1 # Skip the ending characters of the block comment
16 else:
17 # Check for the start of a block comment
18 if index + 1 < line_length and line[index:index + 2] == "/*":
19 in_block_comment = True # Start the block comment
20 index += 1 # Skip the starting characters of the block comment
21 # Check for the start of a line comment
22 elif index + 1 < line_length and line[index:index + 2] == "//":
23 break # Ignore the rest of the line
24 else:
25 current_line.append(line[index]) # Add the current character to the line buffer
26 index += 1 # Move to the next character
27
28 # If not in a block comment and the line buffer contains characters, add it to the result
29 if not in_block_comment and current_line:
30 final_output.append("".join(current_line))
31 current_line.clear() # Clear the line buffer for the next input line
32
33 return final_output
34
1class Solution {
2 public List<String> removeComments(String[] source) {
3 // A list to hold the final result without comments.
4 List<String> result = new ArrayList<>();
5
6 // StringBuilder to construct strings without comments.
7 StringBuilder lineWithoutComments = new StringBuilder();
8
9 // A flag to indicate if we are inside a block comment.
10 boolean inBlockComment = false;
11
12 // Iterate over each line in the source code array.
13 for (String line : source) {
14 int lineLength = line.length();
15 for (int i = 0; i < lineLength; ++i) {
16 // If in a block comment, look for the end of the block comment.
17 if (inBlockComment) {
18 if (i + 1 < lineLength && line.charAt(i) == '*' && line.charAt(i + 1) == '/') {
19 inBlockComment = false;
20 i++; // Skip the '/' character of the end block comment.
21 }
22 } else {
23 // If not in a block comment, look for the start of any comment.
24 if (i + 1 < lineLength && line.charAt(i) == '/' && line.charAt(i + 1) == '*') {
25 inBlockComment = true;
26 i++; // Skip the '*' character of the start block comment.
27 } else if (i + 1 < lineLength && line.charAt(i) == '/' && line.charAt(i + 1) == '/') {
28 // Found a line comment, break out of the loop to ignore the rest of the line.
29 break;
30 } else {
31 // Character is not part of a comment; append it to the current line.
32 lineWithoutComments.append(line.charAt(i));
33 }
34 }
35 }
36 // If the current line is completed and we are not in a block comment, add the line to the result.
37 if (!inBlockComment && lineWithoutComments.length() > 0) {
38 result.add(lineWithoutComments.toString());
39 // Reset the StringBuilder for the next line.
40 lineWithoutComments.setLength(0);
41 }
42 }
43 // Return the list of lines without comments.
44 return result;
45 }
46}
47
1#include <vector>
2#include <string>
3
4class Solution {
5public:
6 std::vector<std::string> removeComments(std::vector<std::string>& source) {
7 std::vector<std::string> cleanedSource; // This will hold the final result without comments.
8 std::string lineBuffer; // Buffer to construct a new string without comments for each line.
9 bool inBlockComment = false; // State variable to check if currently inside a block comment or not.
10
11 // Iterate over each line in the source code.
12 for (const auto& line : source) {
13 int length = line.size(); // Length of the current line
14
15 // Iterate over each character in the line.
16 for (int i = 0; i < length; ++i) {
17 if (inBlockComment) {
18 // If currently inside a block comment, look for the end of the block comment.
19 if (i + 1 < length && line[i] == '*' && line[i + 1] == '/') {
20 inBlockComment = false; // Found the end of block comment.
21 ++i; // Skip the '/' of the block comment end.
22 }
23 } else {
24 // If not in a block comment, check for the start of any comment.
25 if (i + 1 < length && line[i] == '/' && line[i + 1] == '*') {
26 inBlockComment = true; // Found the start of a block comment.
27 ++i; // Skip the '*' of the block comment start.
28 } else if (i + 1 < length && line[i] == '/' && line[i + 1] == '/') {
29 // Found the start of a line comment, ignore the rest of the line.
30 break;
31 } else {
32 // This character is not part of a comment, add it to the buffer.
33 lineBuffer.push_back(line[i]);
34 }
35 }
36 }
37
38 // If we're at the end of a line and we're not in a block comment and the buffer is not empty,
39 // then we have a line without comments to add to the result.
40 if (!inBlockComment && !lineBuffer.empty()) {
41 cleanedSource.emplace_back(lineBuffer);
42 lineBuffer.clear(); // Clear the buffer for the next line of source code.
43 }
44 }
45
46 // Return the source code with comments removed.
47 return cleanedSource;
48 }
49};
50
1function removeComments(source: string[]): string[] {
2 const result: string[] = []; // Array to store the results
3 const temp: string[] = []; // Temporary array to collect non-comment characters
4 let isBlockCommentActive = false; // Flag to track block comment status
5
6 // Iterate through each line of the source code
7 for (const line of source) {
8 const lineLength = line.length; // Store current line length
9
10 // Iterate through characters of the line
11 for (let i = 0; i < lineLength; ++i) {
12 if (isBlockCommentActive) {
13 // Check if the end of block comment is reached
14 if (i + 1 < lineLength && line.slice(i, i + 2) === '*/') {
15 isBlockCommentActive = false; // Turn off block comment flag
16 i++; // Skip next character as it's part of the comment end
17 }
18 } else {
19 // Check for start of a block comment
20 if (i + 1 < lineLength && line.slice(i, i + 2) === '/*') {
21 isBlockCommentActive = true; // Turn on block comment flag
22 i++; // Skip next character as it's part of the comment start
23 } else if (i + 1 < lineLength && line.slice(i, i + 2) === '//') {
24 // If line comment is found, ignore the rest of the line
25 break;
26 } else {
27 // If no comment is found, add the character to the temp array
28 temp.push(line[i]);
29 }
30 }
31 }
32
33 // If not in a block comment and temp array has collected characters
34 if (!isBlockCommentActive && temp.length > 0) {
35 result.push(temp.join('')); // Join characters and add to result array
36 temp.length = 0; // Clear temporary array for next line
37 }
38 }
39 return result; // Return the array with comments removed
40}
41
Time and Space Complexity
The provided Python code defines a method to remove both line comments (starting with //
) and block comments (enclosed between /*
and */
) from a list of source code strings.
Time Complexity:
The time complexity of the function is primarily determined by the number of characters in the input source
array and the operations performed for each character. Here's the breakdown:
- The function loops through each string
s
in thesource
array. - For each string, a nested loop scans each character to identify and handle comments, with each iteration checking pairs of characters for comment patterns (
//
or/*
,*/
).
Considering n
to be the total number of characters across all the strings in the source
array:
- The worst-case scenario occurs when there are no comments or only at the end, so we check every character pair, resulting in O(n) time complexity.
- Each character is assessed at most twice (for opening and closing block comments), maintaining the linear relationship.
Hence, the overall time complexity of the function is O(n)
.
Space Complexity:
For space complexity, we consider any additional memory used by the algorithm besides the input itself:
- A list
t
is maintained to construct the strings after removing comments, which in the worst case, will contain all characters that are not within comments. In the worst case, this is linear to the input, adding up toO(n)
. - The variable
block_comment
uses a constant amount of space, contributing an additional O(1) complexity. - The list
ans
is used to store the final output strings. In the worst-case scenario where no comments are present, it will contain a copy of all the individual strings from the source, resulting inO(n)
space.
Therefore, the total space complexity is O(n)
as the function stores intermediate and final results proportional to the input size.
Learn more about how to find time and space complexity quickly using problem constraints.
What data structure does Breadth-first search typically uses to store intermediate states?
Recommended Readings
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!