LeetCode 955: Delete Columns to Make Sorted II

Problem Restatement

We are given an array strs of strings.

All strings have the same length.

We may delete any set of column indices. When we delete a column, we delete that character position from every string.

After deleting some columns, the remaining strings must be in lexicographic order:

strs[0] <= strs[1] <= strs[2] <= ... <= strs[n - 1]

Return the minimum number of deleted columns.

The official constraints are:

Constraint	Value
`1 <= strs.length <= 100`	Number of strings
`1 <= strs[i].length <= 100`	Length of each string
Characters	Lowercase English letters

Source: LeetCode problem statement.

Input and Output

Item	Meaning
Input	Array `strs` of equal-length lowercase strings
Output	Minimum number of columns to delete
Operation	Delete one character position from every string

Example function shape:

def minDeletionSize(strs: list[str]) -> int:
    ...

Examples

Example 1:

strs = ["ca", "bb", "ac"]

Output:

If we delete column 0, the strings become:

["a", "b", "c"]

Now the array is sorted.

Example 2:

strs = ["xc", "yb", "za"]

Output:

The array is already sorted lexicographically:

"xc" <= "yb" <= "za"

We do not need to delete any column.

Example 3:

strs = ["zyx", "wvu", "tsr"]

Output:

Every column creates an ordering problem, so all columns must be deleted.

First Thought: Try All Column Subsets

A direct approach is to try every subset of columns.

For each subset, delete those columns and check whether the resulting rows are sorted.

But if each string has length m, there are:

2^m

possible subsets.

Since m can be 100, this is far too large.

We need a greedy method.

Key Insight

Lexicographic order is decided from left to right.

Once two adjacent rows are already ordered by an earlier kept column, later columns cannot break that pair.

For example:

row1 = "a..."
row2 = "b..."

Since a < b, we already know:

row1 < row2

No later character matters for this pair.

So we only need to worry about adjacent row pairs that are still tied by all kept columns so far.

We track these pairs with an array:

sorted_pair[i] = True

meaning:

strs[i] is already strictly less than strs[i + 1]

using previous kept columns.

Algorithm

Let:

n = len(strs)
m = len(strs[0])

Create:

sorted_pair = [False] * (n - 1)

Then scan columns from left to right.

For each column col:

Check whether keeping this column would break any unresolved adjacent pair.
A pair is unresolved if sorted_pair[row] == False.
If for any unresolved pair:

strs[row][col] > strs[row + 1][col]

then this column must be deleted.

If the column is deleted, do not update sorted_pair.
If the column is kept, mark newly ordered pairs:

strs[row][col] < strs[row + 1][col]

as resolved.

The answer is the number of deleted columns.

Why Deleting a Bad Column Is Safe

Suppose a column causes this for an unresolved pair:

strs[i][col] > strs[i + 1][col]

Unresolved means all previously kept columns were equal for these two rows.

So if we keep this column, the first differing character between the two rows becomes this bad column.

That would force:

strs[i] > strs[i + 1]

which violates sorted order.

Therefore, this column cannot be kept in any valid solution extending the previous kept columns. We must delete it.

Correctness

We prove that the algorithm deletes the minimum number of columns.

The algorithm processes columns from left to right, the same order used by lexicographic comparison.

For each adjacent pair of rows, once an earlier kept column makes the upper row strictly smaller than the lower row, that pair is permanently safe. Later columns cannot change their relative order.

Now consider a column that the algorithm deletes. It deletes the column only when there exists an unresolved adjacent pair i such that:

strs[i][col] > strs[i + 1][col]

Because the pair is unresolved, all previously kept columns are equal for these two rows. If this column were kept, it would be the first differing kept column for this pair, making the upper row larger than the lower row. That would make the final array unsorted.

So every deleted column is necessary.

For a column that the algorithm keeps, it does not create any inversion among unresolved adjacent pairs. Already resolved pairs are safe by earlier columns. Therefore, keeping the column preserves the possibility of a sorted final array.

After all columns are processed, every adjacent pair is either already resolved by some kept column, or all kept characters are equal between the two rows. In both cases, the upper row is less than or equal to the lower row.

Thus the final array is sorted, and since every deleted column was necessary, the number of deletions is minimal.

Complexity

Let:

n = len(strs)
m = len(strs[0])

Metric	Value	Why
Time	`O(n * m)`	Each column checks adjacent row pairs
Space	`O(n)`	The `sorted_pair` array stores state for adjacent pairs

Implementation

class Solution:
    def minDeletionSize(self, strs: list[str]) -> int:
        n = len(strs)
        m = len(strs[0])

        sorted_pair = [False] * (n - 1)
        deletions = 0

        for col in range(m):
            must_delete = False

            for row in range(n - 1):
                if (
                    not sorted_pair[row]
                    and strs[row][col] > strs[row + 1][col]
                ):
                    must_delete = True
                    break

            if must_delete:
                deletions += 1
                continue

            for row in range(n - 1):
                if strs[row][col] < strs[row + 1][col]:
                    sorted_pair[row] = True

        return deletions

Code Explanation

We store whether each adjacent pair is already safely ordered:

sorted_pair = [False] * (n - 1)

For example, sorted_pair[2] describes the pair:

strs[2], strs[3]

For each column:

for col in range(m):

we first check whether this column would create an invalid ordering.

if (
    not sorted_pair[row]
    and strs[row][col] > strs[row + 1][col]
):

This condition means:

This adjacent pair is still tied by previous kept columns.
The current column would make the upper row larger.

So this column must be deleted.

If the column is deleted:

deletions += 1
continue

we skip the update step, because deleted columns cannot help sort any pair.

If the column is kept, we update resolved pairs:

if strs[row][col] < strs[row + 1][col]:
    sorted_pair[row] = True

Once a pair is resolved, it remains resolved.

Finally:

return deletions

returns the minimum number of deleted columns.

Testing

def run_tests():
    s = Solution()

    assert s.minDeletionSize(["ca", "bb", "ac"]) == 1
    assert s.minDeletionSize(["xc", "yb", "za"]) == 0
    assert s.minDeletionSize(["zyx", "wvu", "tsr"]) == 3

    assert s.minDeletionSize(["abc"]) == 0
    assert s.minDeletionSize(["aa", "aa"]) == 0
    assert s.minDeletionSize(["ba", "aa"]) == 1
    assert s.minDeletionSize(["abx", "agz", "bgc", "bfc"]) == 1

    print("all tests passed")

run_tests()

Test meaning:

Test	Why
`["ca","bb","ac"]`	Basic deletion example
`["xc","yb","za"]`	Already sorted
`["zyx","wvu","tsr"]`	Every column must be deleted
Single string	Always sorted
Equal strings	Nondecreasing order allows equality
`["ba","aa"]`	Bad first column must be deleted
Mixed resolved and unresolved pairs	Checks greedy state tracking