Skip to content

LeetCode 193: Valid Phone Numbers

A clear explanation of the Valid Phone Numbers shell problem using grep and regular expressions.

Problem Restatement

We are given a text file named file.txt.

Each line contains one phone number candidate.

We need to print only the valid phone numbers.

A valid phone number must match one of these two formats:

xxx-xxx-xxxx

or:

(xxx) xxx-xxxx

Here, each x is a digit.

For example:

987-123-4567
123 456 7890
(123) 456-7890

The valid lines are:

987-123-4567
(123) 456-7890

Input and Output

ItemMeaning
InputA file named file.txt
OutputAll valid phone number lines
Format 1xxx-xxx-xxxx
Format 2(xxx) xxx-xxxx
LanguageBash shell script

Examples

Input file:

987-123-4567
123 456 7890
(123) 456-7890

Output:

987-123-4567
(123) 456-7890

The first line is valid because it has three digits, a dash, three digits, a dash, then four digits.

The second line is invalid because it uses spaces instead of the required dash format.

The third line is valid because it has the area code inside parentheses, followed by a space, then three digits, a dash, and four digits.

First Thought: Match the Valid Pattern

This is a pattern matching problem.

We do not need to transform the file. We only need to print lines that match one of two exact regular expressions.

The two valid formats are:

[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]

and:

([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]

In regular expression form, we can write repeated digits more compactly as:

[0-9]{3}-[0-9]{3}-[0-9]{4}

and:

\([0-9]{3}\) [0-9]{3}-[0-9]{4}

The parentheses must be escaped because parentheses have special meaning in extended regular expressions.

Key Insight

We must match the whole line, not just part of the line.

That is why we use:

^

for the beginning of the line, and:

$

for the end of the line.

Without these anchors, a line like this could be incorrectly accepted:

abc 987-123-4567 xyz

It contains a valid-looking phone number inside the line, but the whole line is not a valid phone number.

Algorithm

Use grep with extended regular expressions.

The full pattern is:

^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$

This means:

PartMeaning
^Start of the line
(Start of a regex group
$[0-9]{3}$Match (xxx)
``
[0-9]{3}-Match xxx-
)End of the regex group
[0-9]{3}Match the next three digits
-Match a dash
[0-9]{4}Match the final four digits
$End of the line

So the first group chooses between the two allowed prefixes:

(xxx) 

or:

xxx-

Then the rest is shared:

xxx-xxxx

Correctness

The regular expression starts with ^, so matching must begin at the first character of the line.

The first group matches exactly one of the two valid prefixes: either an area code in parentheses followed by one space, or three digits followed by a dash.

After that, the expression requires three digits, one dash, and four digits.

The expression ends with $, so no extra characters may appear after the final digit.

Therefore, a line is printed if and only if the whole line has one of the two valid phone number formats.

Complexity

Let n be the number of lines in the file, and let m be the maximum line length.

MetricValueWhy
TimeO(nm)grep checks each line against the pattern
SpaceO(1)The command streams the file line by line

Implementation

# Read from the file file.txt and output all valid phone numbers to stdout.
grep -E '^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$' file.txt

Code Explanation

The command uses grep -E:

grep -E

The -E option enables extended regular expressions. This lets us use {3}, {4}, grouping, and | more cleanly.

The first valid prefix is:

\([0-9]{3}\) 

This matches an opening parenthesis, three digits, a closing parenthesis, and one space.

The second valid prefix is:

[0-9]{3}-

This matches three digits followed by a dash.

The shared suffix is:

[0-9]{3}-[0-9]{4}

This matches three digits, a dash, and four digits.

The anchors:

^

and:

$

make sure the whole line matches the phone number format.

Testing

Create a sample file:

cat > file.txt << 'EOF'
987-123-4567
123 456 7890
(123) 456-7890
EOF

Run the solution:

grep -E '^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$' file.txt

Expected output:

987-123-4567
(123) 456-7890

Test invalid extra characters:

cat > file.txt << 'EOF'
abc 987-123-4567
987-123-4567 xyz
987-123-4567
EOF

Expected output:

987-123-4567

Test both valid formats:

cat > file.txt << 'EOF'
111-222-3333
(111) 222-3333
111 222 3333
(111)-222-3333
EOF

Expected output:

111-222-3333
(111) 222-3333