A clear explanation of removing line comments and block comments from source code using a state machine.
Problem Restatement
We are given a C++ program as a list of strings:
sourceEach string is one line of code.
We need to remove two kinds of comments:
| Comment Type | Meaning |
|---|---|
Line comment // | Ignore // and everything after it on the same line |
Block comment /* ... */ | Ignore everything from /* until the next non-overlapping */ |
After removing comments, return the remaining source code as a list of non-empty strings.
If a line becomes empty after comment removal, do not include it in the output.
Block comments can span multiple lines, so parts of different original lines may join together. The first effective comment marker takes precedence: // inside a block comment is ignored, and /* inside a line comment or block comment is ignored. The problem also guarantees that every opened block comment is eventually closed.
Input and Output
| Item | Meaning |
|---|---|
| Input | source, a list of source-code lines |
| Output | Source-code lines after removing comments |
| Line comment | Starts with // and ends at the current line |
| Block comment | Starts with /* and ends at the next non-overlapping */ |
| Empty output lines | Must be omitted |
| Strings and quotes | No quote-related cases interfere with comments |
The function shape is:
class Solution:
def removeComments(self, source: list[str]) -> list[str]:
...Examples
Example 1:
source = [
"/*Test program */",
"int main()",
"{ ",
" // variable declaration ",
"int a, b, c;",
"/* This is a test",
" multiline ",
" comment for ",
" testing */",
"a = b + c;",
"}"
]Output:
[
"int main()",
"{ ",
" ",
"int a, b, c;",
"a = b + c;",
"}"
]The line comment removes everything after //, but the spaces before it remain.
The multi-line block comment removes several full lines.
Example 2:
source = ["a/*comment", "line", "more_comment*/b"]The block comment starts in the first line and ends in the third line.
The remaining code is:
"ab"Output:
["ab"]First Thought: Remove Markers Line by Line
A tempting approach is to process each line independently.
For each line:
- Find
//. - Find
/*. - Cut the line accordingly.
This fails because block comments can span multiple lines.
For example:
["a/*comment", "line", "more_comment*/b"]The output should be:
["ab"]So we need state that persists across lines.
Key Insight
Use a state machine with one boolean variable:
in_blockThis tells us whether we are currently inside a block comment.
When in_block is False:
| Pattern | Action |
|---|---|
// | Stop reading the current line |
/* | Enter block comment |
| Other character | Append it to the current output buffer |
When in_block is True:
| Pattern | Action |
|---|---|
*/ | Leave block comment |
| Anything else | Ignore it |
The output buffer should continue across lines while inside a block comment, because newlines inside block comments are removed.
Algorithm
Use:
result = []
in_block = False
buffer = []For each line in source:
- Set
i = 0. - If we are not in a block comment at the start of the line, start with a fresh
buffer. - Scan characters from left to right.
- If
in_blockisTrue:- If the next two characters are
*/, setin_block = Falseand skip both characters. - Otherwise skip the current character.
- If the next two characters are
- If
in_blockisFalse:- If the next two characters are
//, stop scanning this line. - If the next two characters are
/*, setin_block = Trueand skip both characters. - Otherwise append the current character to
buffer.
- If the next two characters are
- After finishing the line, if not inside a block comment and
bufferis non-empty, append it toresult.
Correctness
The algorithm scans the source in reading order: line by line, left to right.
When it is outside a block comment, it treats // as the start of a line comment and ignores the rest of that line. This matches the required behavior of line comments.
When it is outside a block comment and sees /*, it enters block-comment mode and ignores all later characters until a matching non-overlapping */ is found. This matches the required behavior of block comments.
When it is inside a block comment, it ignores all characters except the first following */. Therefore any // or /* inside the block comment has no effect, as required.
The algorithm only appends characters when they are outside every comment. Therefore no comment text appears in the output.
If a block comment spans multiple lines, the buffer is not emitted until the block comment ends. This correctly removes the implicit newlines inside the block comment and joins code before and after the block.
Finally, the algorithm appends a line only when the buffer is non-empty and we are not inside a block comment. Therefore every returned string is non-empty, and every kept character appears in the correct order.
Complexity
Let N be the total number of characters in all source lines.
| Metric | Value | Why |
|---|---|---|
| Time | O(N) | Each character is scanned at most once |
| Space | O(N) | The output may contain most of the input characters |
Implementation
class Solution:
def removeComments(self, source: list[str]) -> list[str]:
result = []
in_block = False
buffer = []
for line in source:
i = 0
if not in_block:
buffer = []
while i < len(line):
two = line[i:i + 2]
if in_block:
if two == "*/":
in_block = False
i += 2
else:
i += 1
else:
if two == "//":
break
elif two == "/*":
in_block = True
i += 2
else:
buffer.append(line[i])
i += 1
if not in_block and buffer:
result.append("".join(buffer))
return resultCode Explanation
The result stores final non-empty lines:
result = []The state variable tracks whether we are inside a block comment:
in_block = FalseThe buffer stores the current output line:
buffer = []At the start of a new source line, we reset the buffer only if we are not inside a block comment:
if not in_block:
buffer = []If we are inside a block comment, the same buffer must continue, because code before the block and code after the block may join.
This slice checks the next two characters:
two = line[i:i + 2]Inside a block comment, only */ matters:
if two == "*/":
in_block = False
i += 2Everything else is ignored:
else:
i += 1Outside a block comment, // ends the current line:
if two == "//":
breakOutside a block comment, /* starts a block comment:
elif two == "/*":
in_block = True
i += 2Normal characters are appended:
else:
buffer.append(line[i])
i += 1After a line finishes, append only if we are not inside a block comment and the buffer has content:
if not in_block and buffer:
result.append("".join(buffer))Example Walkthrough
Use:
source = ["a/*comment", "line", "more_comment*/b"]Start outside a block comment.
Line 1:
"a/*comment"Append "a".
Then see "/*" and enter block-comment mode.
The buffer is:
["a"]Do not output yet because the block comment is still open.
Line 2:
"line"We are inside a block comment, so all characters are ignored.
Line 3:
"more_comment*/b"Ignore characters until "*/".
Exit block-comment mode.
Then append "b".
Now the buffer is:
["a", "b"]Output:
["ab"]Testing
def test_remove_comments():
s = Solution()
source = [
"/*Test program */",
"int main()",
"{ ",
" // variable declaration ",
"int a, b, c;",
"/* This is a test",
" multiline ",
" comment for ",
" testing */",
"a = b + c;",
"}",
]
expected = [
"int main()",
"{ ",
" ",
"int a, b, c;",
"a = b + c;",
"}",
]
assert s.removeComments(source) == expected
assert s.removeComments(
["a/*comment", "line", "more_comment*/b"]
) == ["ab"]
assert s.removeComments(
["int a = 1; // initialize a"]
) == ["int a = 1; "]
assert s.removeComments(
["code/* block */more"]
) == ["codemore"]
assert s.removeComments(
["/* whole line */"]
) == []
assert s.removeComments(
["a/*/b*/c"]
) == ["ac"]
print("all tests passed")
test_remove_comments()Test coverage:
| Test | Why |
|---|---|
| Official-style multi-line example | Confirms line and block comments together |
| Block comment across lines | Confirms newline deletion and joining |
| Line comment | Confirms rest of line is ignored |
| Inline block comment | Confirms code before and after joins |
| Whole-line block comment | Confirms empty line is omitted |
| Non-overlapping close marker | Confirms /*/ does not close the block immediately |