LeetCode 722: Remove Comments

A clear explanation of removing line comments and block comments from source code using a state machine.

Problem Restatement

We are given a C++ program as a list of strings:

source

Each string is one line of code.

We need to remove two kinds of comments:

| Comment Type | Meaning |
| --- | --- |
| Line comment // | Ignore // and everything after it on the same line |
| Block comment /* ... */ | Ignore everything from /* until the next non-overlapping */ |

After removing comments, return the remaining source code as a list of non-empty strings.

If a line becomes empty after comment removal, do not include it in the output.

Block comments can span multiple lines, so parts of different original lines may join together. The first effective comment marker takes precedence: // inside a block comment is ignored, and /* inside a line comment or block comment is ignored. The problem also guarantees that every opened block comment is eventually closed.

Input and Output

| Item | Meaning |
| --- | --- |
| Input | source, a list of source-code lines |
| Output | Source-code lines after removing comments |
| Line comment | Starts with // and ends at the current line |
| Block comment | Starts with /* and ends at the next non-overlapping */ |
| Empty output lines | Must be omitted |
| Strings and quotes | No quote-related cases interfere with comments |

The function shape is:

class Solution:
    def removeComments(self, source: list[str]) -> list[str]:
        ...

Examples

Example 1:

source = [
    "/*Test program */",
    "int main()",
    "{ ",
    "  // variable declaration ",
    "int a, b, c;",
    "/* This is a test",
    "   multiline  ",
    "   comment for ",
    "   testing */",
    "a = b + c;",
    "}"
]

Output:

[
    "int main()",
    "{ ",
    "  ",
    "int a, b, c;",
    "a = b + c;",
    "}"
]

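The line comment removes // and everything after it, but the two spaces before it remain, so that line is kept as "  ".

The block comment on the first line and the multi-line block comment each leave nothing behind, so those lines are omitted.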

Example 2:

source = ["a/*comment", "line", "more_comment*/b"]

The block comment starts in the first line and ends in the third line.

The remaining code is:

"ab"

Output:

["ab"]

First Thought: Remove Markers Line by Line

A tempting approach is to process each line independently.

For each line:

  1. Find //.
  2. Find /*.
  3. Cut the line accordingly.

This fails because block comments can span multiple lines.

For example:

["a/*comment", "line", "more_comment*/b"]

The output should be:

["ab"]

So we need state that persists across lines.
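To make the failure concrete, here is a minimal sketch of the per-line idea. The helper name strip_per_line is hypothetical and exists only for illustration; it cuts each line at the earliest comment marker it finds and never looks at neighboring lines.

```python
def strip_per_line(source: list[str]) -> list[str]:
    """Naive attempt (hypothetical helper): treat every line independently."""
    result = []
    for line in source:
        # Positions of the two opening markers on this line, if present.
        cuts = [pos for pos in (line.find("//"), line.find("/*")) if pos != -1]
        # Keep only the text before the earliest marker.
        kept = line[:min(cuts)] if cuts else line
        if kept:
            result.append(kept)
    return result


print(strip_per_line(["a/*comment", "line", "more_comment*/b"]))
# prints ['a', 'line', 'more_comment*/b'] instead of ['ab']
```

The second and third lines contain no opening marker, so the naive version keeps comment text and never joins "a" with "b".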

Key Insight

Use a state machine with one boolean variable:

in_block

This tells us whether we are currently inside a block comment.

When in_block is False:

| Pattern | Action |
| --- | --- |
| // | Stop reading the current line |
| /* | Enter block comment |
| Other character | Append it to the current output buffer |

When in_block is True:

| Pattern | Action |
| --- | --- |
| */ | Leave block comment |
| Anything else | Ignore it |

The output buffer should continue across lines while inside a block comment, because newlines inside block comments are removed.
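One way to read the two tables is as a single transition function. The sketch below is illustrative only: the function name step and its return shape (new state, next index, emitted character) are assumptions made for this example, not part of the problem or of the final solution.

```python
from typing import Optional


def step(in_block: bool, line: str, i: int) -> tuple[bool, int, Optional[str]]:
    """Apply one transition at position i (hypothetical helper).

    Returns (new_in_block, next_i, emitted_char_or_None).
    """
    two = line[i:i + 2]
    if in_block:
        if two == "*/":
            return False, i + 2, None      # leave the block comment
        return True, i + 1, None           # ignore comment text
    if two == "//":
        return False, len(line), None      # skip the rest of the line
    if two == "/*":
        return True, i + 2, None           # enter a block comment
    return False, i + 1, line[i]           # ordinary character: emit it


# Drive the transitions over a single line.
state, i, out = False, 0, []
line = "ab/*c*/d//e"
while i < len(line):
    state, i, ch = step(state, line, i)
    if ch is not None:
        out.append(ch)

print("".join(out))  # prints abd
```

Keeping the state as a plain boolean is enough here because a line comment never outlives its line, so only the block-comment state has to persist.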

Algorithm

Use:

result = []
in_block = False
buffer = []

For each line in source:

  1. Set i = 0.
  2. If we are not in a block comment at the start of the line, start with a fresh buffer.
  3. Scan characters from left to right, applying step 4 or step 5 at each position i until the line ends.
  4. If in_block is True:
    • If the next two characters are */, set in_block = False and skip both characters.
    • Otherwise skip the current character.
  5. If in_block is False:
    • If the next two characters are //, stop scanning this line.
    • If the next two characters are /*, set in_block = True and skip both characters.
    • Otherwise append the current character to buffer.
  6. After finishing the line, if not inside a block comment and buffer is non-empty, join buffer into a string and append it to result.

Correctness

The algorithm scans the source in reading order: line by line, left to right.

When it is outside a block comment, it treats // as the start of a line comment and ignores the rest of that line. This matches the required behavior of line comments.

When it is outside a block comment and sees /*, it enters block-comment mode and ignores all later characters until a matching non-overlapping */ is found. This matches the required behavior of block comments.

When it is inside a block comment, it ignores all characters except the first following */. Therefore any // or /* inside the block comment has no effect, as required.

The algorithm only appends characters when they are outside every comment. Therefore no comment text appears in the output.

If a block comment spans multiple lines, the buffer is not emitted until the block comment ends. This correctly removes the implicit newlines inside the block comment and joins code before and after the block.

Finally, the algorithm appends a line only when the buffer is non-empty and we are not inside a block comment. Therefore every returned string is non-empty, and every kept character appears in the correct order.
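The argument above can also be spot-checked against an independent reference. The sketch below is a different technique, not the state machine: it joins the lines with newlines and removes comments with a regular expression. The name remove_comments_regex is hypothetical. Because the regex engine scans left to right and takes the first match at each position, it should respect the same precedence rules, which makes it a reasonable cross-check for small inputs.

```python
import re


def remove_comments_regex(source: list[str]) -> list[str]:
    """Reference implementation via regex (hypothetical helper)."""
    text = "\n".join(source)
    # "//.*" stops at the newline because "." does not match "\n".
    # "/\*[\s\S]*?\*/" lazily matches up to the first closing "*/",
    # including any newlines inside the block comment.
    stripped = re.sub(r"//.*|/\*[\s\S]*?\*/", "", text)
    # Drop lines that became empty.
    return [line for line in stripped.split("\n") if line]


print(remove_comments_regex(["a/*comment", "line", "more_comment*/b"]))
# prints ['ab']
print(remove_comments_regex(["a/*/b*/c"]))
# prints ['ac']
```

Running both implementations on the same inputs and comparing the results is a cheap way to catch a mistake in either one.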

Complexity

Let N be the total number of characters in all source lines.

| Metric | Value | Why |
| --- | --- | --- |
| Time | O(N) | Each character is visited once, with a constant two-character lookahead |
| Space | O(N) | The output may contain most of the input characters |

Implementation

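class Solution:
    def removeComments(self, source: list[str]) -> list[str]:
        result = []
        in_block = False  # True while inside an unclosed /* ... */
        buffer = []       # characters of the output line being built

        for line in source:
            i = 0

            # Keep the buffer if a block comment is still open, so code
            # before and after the comment ends up on one output line.
            if not in_block:
                buffer = []

            while i < len(line):
                two = line[i:i + 2]  # two-character lookahead

                if in_block:
                    if two == "*/":
                        in_block = False
                        i += 2
                    else:
                        i += 1
                else:
                    if two == "//":
                        break  # the rest of the line is a comment
                    elif two == "/*":
                        in_block = True
                        i += 2
                    else:
                        buffer.append(line[i])
                        i += 1

            # Emit only complete, non-empty lines.
            if not in_block and buffer:
                result.append("".join(buffer))

        return result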

Code Explanation

The result stores final non-empty lines:

result = []

The state variable tracks whether we are inside a block comment:

in_block = False

The buffer stores the current output line:

buffer = []

At the start of a new source line, we reset the buffer only if we are not inside a block comment:

if not in_block:
    buffer = []

If we are inside a block comment, the same buffer must continue, because code before the block and code after the block may join.

This slice checks the next two characters:

two = line[i:i + 2]

Inside a block comment, only */ matters:

if two == "*/":
    in_block = False
    i += 2

Everything else is ignored:

else:
    i += 1

Outside a block comment, // ends the current line:

if two == "//":
    break

Outside a block comment, /* starts a block comment:

elif two == "/*":
    in_block = True
    i += 2

Normal characters are appended:

else:
    buffer.append(line[i])
    i += 1

After a line finishes, append only if we are not inside a block comment and the buffer has content:

if not in_block and buffer:
    result.append("".join(buffer))

Example Walkthrough

Use:

source = ["a/*comment", "line", "more_comment*/b"]

Start outside a block comment.

Line 1:

"a/*comment"

Append "a".

Then see "/*" and enter block-comment mode.

The buffer is:

["a"]

Do not output yet because the block comment is still open.

Line 2:

"line"

We are inside a block comment, so all characters are ignored.

Line 3:

"more_comment*/b"

Ignore characters until "*/".

Exit block-comment mode.

Then append "b".

Now the buffer is:

["a", "b"]

Output:

["ab"]
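The walkthrough can be reproduced mechanically. The following sketch repeats the same scanning loop but also records a snapshot after each source line; the function name trace_remove_comments and the snapshot tuple layout are assumptions made for this illustration.

```python
def trace_remove_comments(source: list[str]):
    """Same scan as the solution, plus per-line snapshots (hypothetical helper)."""
    result, buffer, in_block = [], [], False
    snapshots = []  # (source line, buffer contents, in_block) after each line

    for line in source:
        i = 0
        if not in_block:
            buffer = []
        while i < len(line):
            two = line[i:i + 2]
            if in_block:
                if two == "*/":
                    in_block = False
                    i += 2
                else:
                    i += 1
            elif two == "//":
                break
            elif two == "/*":
                in_block = True
                i += 2
            else:
                buffer.append(line[i])
                i += 1
        snapshots.append((line, "".join(buffer), in_block))
        if not in_block and buffer:
            result.append("".join(buffer))

    return result, snapshots


result, snapshots = trace_remove_comments(["a/*comment", "line", "more_comment*/b"])
for line, buf, open_block in snapshots:
    print(repr(line), repr(buf), open_block)
# prints:
# 'a/*comment' 'a' True
# 'line' 'a' True
# 'more_comment*/b' 'ab' False
print(result)  # prints ['ab']
```

The buffer stays at "a" across the first two lines and is only emitted once the block comment closes on the third.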

Testing

def test_remove_comments():
    s = Solution()

    source = [
        "/*Test program */",
        "int main()",
        "{ ",
        "  // variable declaration ",
        "int a, b, c;",
        "/* This is a test",
        "   multiline  ",
        "   comment for ",
        "   testing */",
        "a = b + c;",
        "}",
    ]

    expected = [
        "int main()",
        "{ ",
        "  ",
        "int a, b, c;",
        "a = b + c;",
        "}",
    ]

    assert s.removeComments(source) == expected

    assert s.removeComments(
        ["a/*comment", "line", "more_comment*/b"]
    ) == ["ab"]

    assert s.removeComments(
        ["int a = 1; // initialize a"]
    ) == ["int a = 1; "]

    assert s.removeComments(
        ["code/* block */more"]
    ) == ["codemore"]

    assert s.removeComments(
        ["/* whole line */"]
    ) == []

    assert s.removeComments(
        ["a/*/b*/c"]
    ) == ["ac"]

    print("all tests passed")

test_remove_comments()

Test coverage:

| Test | Why |
| --- | --- |
| Official-style multi-line example | Confirms line and block comments together |
| Block comment across lines | Confirms newline deletion and joining |
| Line comment | Confirms rest of line is ignored |
| Inline block comment | Confirms code before and after joins |
| Whole-line block comment | Confirms empty line is omitted |
| Non-overlapping close marker | Confirms /*/ does not close the block immediately |