Skip to content

LeetCode 726: Number of Atoms

Parse a chemical formula with nested parentheses, atom names, and multipliers using recursive descent.

Problem Restatement

We are given a valid chemical formula string.

The formula contains atom names, numbers, and parentheses.

We need to count how many times each atom appears, then return the result as a string sorted by atom name.

An atom name starts with one uppercase letter. It may be followed by lowercase letters.

Examples:

"H"
"Mg"
"Ca"

A number after an atom means the atom appears that many times.

"H2"   # H appears 2 times
"O"    # O appears 1 time

Parentheses group a formula. A number after the closing parenthesis multiplies everything inside the group.

"(OH)2"

This means:

O appears 2 times
H appears 2 times

The official problem asks us to return atom counts in lexicographic order. Atom counts equal to 1 are omitted in the output.

Input and Output

ItemMeaning
InputA string formula
OutputA string containing atom names and counts
Formula validityThe input formula is valid
Sort orderAtom names must be sorted lexicographically
Count formattingOmit the number when the count is 1

Example function shape:

def countOfAtoms(formula: str) -> str:
    ...

Examples

Example 1

formula = "H2O"

The atoms are:

AtomCount
H2
O1

The result is:

"H2O"

We write H2 because hydrogen appears twice.

We write O, not O1, because counts of 1 are omitted.

Example 2

formula = "Mg(OH)2"

The group (OH) is multiplied by 2.

So:

Mg = 1
O = 2
H = 2

After sorting atom names alphabetically:

H, Mg, O

The result is:

"H2MgO2"

Example 3

formula = "K4(ON(SO3)2)2"

Work from the inside:

(SO3)2

means:

S = 2
O = 6

Then:

ON(SO3)2

means:

O = 1 + 6 = 7
N = 1
S = 2

Then the outer 2 multiplies the whole group:

O = 14
N = 2
S = 4

Add K4:

K = 4
N = 2
O = 14
S = 4

Sorted output:

"K4N2O14S4"

First Thought: Manual Scanning

A simple formula like this is easy:

"H2O"

We scan from left to right.

When we see an uppercase letter, we parse the full atom name.

Then we parse the number after it.

If there is no number, the count is 1.

This works for formulas without parentheses.

But parentheses create nested scopes.

For example:

"Mg(OH)2"

The 2 after ) applies to both O and H.

For nested parentheses, a multiplier may apply to a whole inner formula, not only one atom.

So we need a parser that understands groups.

Key Insight

A chemical formula has recursive structure.

A formula can contain:

  1. An atom, possibly followed by a number.
  2. A parenthesized formula, possibly followed by a number.
  3. Several formulas concatenated together.

That means recursive descent fits naturally.

When we see (, we recursively parse the formula inside the parentheses.

When we see ), the current recursive call ends.

After the ), we parse the multiplier and apply it to every atom count inside that group.

Each recursive call returns a map:

atom -> count

For example, parsing:

"OH"

returns:

{"O": 1, "H": 1}

Parsing:

"(OH)2"

returns:

{"O": 2, "H": 2}

Algorithm

Use an index i that moves through the formula string.

Define a helper function parse().

parse() reads the formula starting at the current index and returns atom counts for the current group.

Inside parse():

  1. Create an empty counter.
  2. While i is inside the string:
    • If formula[i] == "(", skip it and recursively parse the inner group.
    • If formula[i] == ")", skip it, read the number after it, multiply the current group, and return.
    • Otherwise, parse an atom name and its optional number.
  3. Return the counter.

Parsing an atom:

  1. The first character is uppercase.
  2. Consume following lowercase letters.
  3. Read the optional number.
  4. Add that count to the current counter.

Parsing a number:

  1. Consume consecutive digits.
  2. If no digits are found, return 1.
  3. Otherwise, convert the digits to an integer.

Correctness

The parser processes the formula according to its grammar.

When the parser sees an atom, it reads the complete atom name and the number directly following it. If there is no number, it uses count 1. Therefore, every atom outside parentheses is counted with the correct local count.

When the parser sees (, it starts a recursive parse for the nested group. That recursive call counts exactly the atoms inside the matching parentheses. When it reaches ), it reads the multiplier after the closing parenthesis and multiplies every atom count in that group. Therefore, grouped formulas are counted with the correct multiplier.

Nested groups are handled by the same rule. An inner group returns its already multiplied counts to the outer group. The outer group may then apply another multiplier. This matches the meaning of nested chemical formulas.

The top-level call parses the whole string. Since the input formula is valid, every atom and group belongs to exactly one parse scope. The returned counter therefore contains the total count for every atom.

Finally, sorting the atom names and omitting counts equal to 1 produces the required output format.

Complexity

Let n be the length of formula, and let k be the number of distinct atom names.

MetricValueWhy
TimeO(n + k log k)Parsing scans the string once, then atom names are sorted
SpaceO(n + k)Recursion depth can be O(n), and the counter stores up to k atoms

The parsing work is linear because each character is consumed once.

The final sort costs O(k log k).

Implementation

from collections import defaultdict

class Solution:
    def countOfAtoms(self, formula: str) -> str:
        n = len(formula)
        i = 0

        def read_number() -> int:
            nonlocal i

            start = i
            while i < n and formula[i].isdigit():
                i += 1

            if start == i:
                return 1

            return int(formula[start:i])

        def read_atom() -> str:
            nonlocal i

            start = i
            i += 1

            while i < n and formula[i].islower():
                i += 1

            return formula[start:i]

        def parse() -> dict[str, int]:
            nonlocal i

            counts = defaultdict(int)

            while i < n:
                ch = formula[i]

                if ch == "(":
                    i += 1
                    inner = parse()

                    for atom, count in inner.items():
                        counts[atom] += count

                elif ch == ")":
                    i += 1
                    factor = read_number()

                    for atom in list(counts.keys()):
                        counts[atom] *= factor

                    return counts

                else:
                    atom = read_atom()
                    count = read_number()
                    counts[atom] += count

            return counts

        counts = parse()

        answer = []
        for atom in sorted(counts):
            answer.append(atom)

            if counts[atom] > 1:
                answer.append(str(counts[atom]))

        return "".join(answer)

Code Explanation

We keep one shared index:

i = 0

The helper functions mutate this index as they consume characters.

read_number() reads a sequence of digits:

def read_number() -> int:
    nonlocal i

    start = i
    while i < n and formula[i].isdigit():
        i += 1

    if start == i:
        return 1

    return int(formula[start:i])

If no digits exist at the current position, the multiplier is 1.

read_atom() reads the atom name:

def read_atom() -> str:
    nonlocal i

    start = i
    i += 1

    while i < n and formula[i].islower():
        i += 1

    return formula[start:i]

The first character is uppercase. Then we consume all lowercase letters that belong to the same atom name.

The main parser is parse().

When it sees an opening parenthesis:

if ch == "(":
    i += 1
    inner = parse()

It recursively parses the group inside.

The recursive call handles the matching closing parenthesis.

After the inner group returns, we merge its atom counts into the current scope:

for atom, count in inner.items():
    counts[atom] += count

When parse() sees a closing parenthesis:

elif ch == ")":
    i += 1
    factor = read_number()

It reads the multiplier after the group.

Then it multiplies all counts in the current scope:

for atom in list(counts.keys()):
    counts[atom] *= factor

Then it returns to the caller.

When the parser sees an atom, it reads the atom and its count:

atom = read_atom()
count = read_number()
counts[atom] += count

At the end, we sort atom names and build the output:

for atom in sorted(counts):
    answer.append(atom)

    if counts[atom] > 1:
        answer.append(str(counts[atom]))

Counts equal to 1 are not printed.

Testing

def run_tests():
    s = Solution()

    assert s.countOfAtoms("H2O") == "H2O"
    assert s.countOfAtoms("Mg(OH)2") == "H2MgO2"
    assert s.countOfAtoms("K4(ON(SO3)2)2") == "K4N2O14S4"

    assert s.countOfAtoms("Be32") == "Be32"
    assert s.countOfAtoms("H") == "H"
    assert s.countOfAtoms("(H)") == "H"
    assert s.countOfAtoms("(H)2") == "H2"
    assert s.countOfAtoms("((H)2O)3") == "H6O3"
    assert s.countOfAtoms("NaCl") == "ClNa"

    print("all tests passed")

run_tests()
TestWhy
"H2O"Basic atom counts
"Mg(OH)2"One parenthesized group
"K4(ON(SO3)2)2"Nested parentheses
"Be32"Multi-letter atom and multi-digit count
"H"Single atom with implicit count
"(H)"Group with implicit multiplier
"((H)2O)3"Nested group multiplier
"NaCl"Lexicographic output order