A detailed explanation of finding the median of two sorted arrays using binary search over partitions.
Problem Restatement
We are given two sorted arrays:
nums1
nums2Their sizes are m and n.
We need to return the median of the two arrays as if both arrays were merged into one sorted array.
The required time complexity is:
O(log(m + n))So we should not solve this by fully merging the arrays. The LeetCode statement requires a logarithmic-time solution.
Input and Output
| Item | Meaning |
|---|---|
| Input | Two sorted integer arrays nums1 and nums2 |
| Output | The median of all values from both arrays |
| Required time | O(log(m + n)) |
| Important detail | The arrays are already sorted |
Example function shape:
def findMedianSortedArrays(nums1: list[int], nums2: list[int]) -> float:
...Examples
Example 1:
nums1 = [1, 3]
nums2 = [2]The merged array would be:
[1, 2, 3]There are three elements, so the median is the middle element:
2Output:
2.00000Example 2:
nums1 = [1, 2]
nums2 = [3, 4]The merged array would be:
[1, 2, 3, 4]There are four elements, so the median is the average of the two middle elements:
(2 + 3) / 2 = 2.5Output:
2.50000First Thought: Merge Both Arrays
Because both arrays are sorted, the simplest idea is to merge them like the merge step in merge sort.
After merging, the problem becomes easy.
If the total length is odd, return the middle value.
If the total length is even, return the average of the two middle values.
Code:
class Solution:
def findMedianSortedArrays(
self,
nums1: list[int],
nums2: list[int]
) -> float:
merged = []
i = 0
j = 0
while i < len(nums1) and j < len(nums2):
if nums1[i] <= nums2[j]:
merged.append(nums1[i])
i += 1
else:
merged.append(nums2[j])
j += 1
while i < len(nums1):
merged.append(nums1[i])
i += 1
while j < len(nums2):
merged.append(nums2[j])
j += 1
total = len(merged)
mid = total // 2
if total % 2 == 1:
return float(merged[mid])
return (merged[mid - 1] + merged[mid]) / 2Problem With Merging
This solution is correct, but it takes linear time.
| Metric | Value |
|---|---|
| Time | O(m + n) |
| Space | O(m + n) |
The problem explicitly asks for O(log(m + n)) time, so full merging does not satisfy the requirement.
Key Insight
The median splits a sorted array into two halves.
For example:
[1, 2, 3, 4, 5, 6]The left half is:
[1, 2, 3]The right half is:
[4, 5, 6]For the split to be correct:
- The left half must contain the correct number of elements.
- Every value in the left half must be less than or equal to every value in the right half.
We do not need to build the merged array.
We only need to find where to cut nums1 and where to cut nums2.
The cuts form a partition.
nums1: left part | right part
nums2: left part | right partTogether, both left parts should contain half of the total elements.
Together, both right parts should contain the rest.
Partition Idea
Assume we cut nums1 at index i.
That means:
nums1[0:i] goes to the left half
nums1[i:] goes to the right halfThen we must cut nums2 at index j so that the total left side has the correct size.
Let:
total = len(nums1) + len(nums2)
half = (total + 1) // 2Then:
j = half - iNow we only need to check whether the partition is valid.
The boundary values are:
| Name | Meaning |
|---|---|
A_left | Biggest value on the left side of nums1 |
A_right | Smallest value on the right side of nums1 |
B_left | Biggest value on the left side of nums2 |
B_right | Smallest value on the right side of nums2 |
The partition is valid when:
A_left <= B_right
B_left <= A_rightIf both are true, all values on the combined left side are less than or equal to all values on the combined right side.
Why Binary Search Works
We binary search the cut position i in the smaller array.
If:
A_left > B_rightthen we took too many elements from nums1.
The cut in nums1 is too far right.
So we move left.
If:
B_left > A_rightthen we took too few elements from nums1.
The cut in nums1 is too far left.
So we move right.
Because each step discards half of the possible cut positions, the search is logarithmic.
Visual Walkthrough
Use:
nums1 = [1, 3]
nums2 = [2]We always binary search the smaller array, so swap them:
A = [2]
B = [1, 3]Total length:
total = 3
half = 2Try cutting A after one element:
A: [2] | []
B: [1] | [3]Boundary values:
A_left = 2
A_right = infinity
B_left = 1
B_right = 3Check:
A_left <= B_right means 2 <= 3 true
B_left <= A_right means 1 <= infinity trueSo the partition is valid.
Because total length is odd, the median is the largest value on the left side:
max(2, 1) = 2Algorithm
Let A be the smaller array and B be the larger array.
- Set
total = len(A) + len(B). - Set
half = (total + 1) // 2. - Binary search the cut position
iinA. - Compute
j = half - i. - Read the four boundary values:
A_leftA_rightB_leftB_right
- If the partition is valid, compute the median.
- If
A_left > B_right, move the search left. - Otherwise, move the search right.
Correctness
The algorithm searches for a partition where the combined left side has exactly half elements.
The value of j is chosen from half - i, so every tested partition has the correct left-side size.
The partition condition checks the only two cross-array cases that can break sorted order:
A_left <= B_right
B_left <= A_rightInside each array, order is already guaranteed because both arrays are sorted.
So when both inequalities hold, every element on the combined left side is less than or equal to every element on the combined right side.
At that point:
If the total length is odd, the median is the largest value on the left side.
If the total length is even, the median is the average of:
largest value on the left side
smallest value on the right sideThe binary search update is also correct.
When A_left > B_right, the cut in A is too far right, so the algorithm moves left.
When B_left > A_right, the cut in A is too far left, so the algorithm moves right.
Therefore the algorithm finds the correct partition and returns the correct median.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(log(min(m, n))) | Binary search runs on the smaller array |
| Space | O(1) | Only a few variables are stored |
This satisfies the required O(log(m + n)) time bound.
Implementation
class Solution:
def findMedianSortedArrays(
self,
nums1: list[int],
nums2: list[int]
) -> float:
A = nums1
B = nums2
if len(A) > len(B):
A, B = B, A
m = len(A)
n = len(B)
total = m + n
half = (total + 1) // 2
left = 0
right = m
while left <= right:
i = (left + right) // 2
j = half - i
A_left = A[i - 1] if i > 0 else float("-inf")
A_right = A[i] if i < m else float("inf")
B_left = B[j - 1] if j > 0 else float("-inf")
B_right = B[j] if j < n else float("inf")
if A_left <= B_right and B_left <= A_right:
if total % 2 == 1:
return float(max(A_left, B_left))
left_max = max(A_left, B_left)
right_min = min(A_right, B_right)
return (left_max + right_min) / 2
if A_left > B_right:
right = i - 1
else:
left = i + 1
return 0.0Code Explanation
We first make sure A is the smaller array:
if len(A) > len(B):
A, B = B, AThis keeps the binary search range small and avoids invalid partition ranges.
Then:
total = m + n
half = (total + 1) // 2half is the number of elements that should appear on the combined left side.
The binary search range is:
left = 0
right = mA cut at 0 means no elements from A go left.
A cut at m means all elements from A go left.
For each cut:
i = (left + right) // 2
j = half - ii cuts A.
j cuts B.
Then we read the boundary values:
A_left = A[i - 1] if i > 0 else float("-inf")
A_right = A[i] if i < m else float("inf")The infinities handle empty sides.
If the cut is before the first element, there is no left value, so we use negative infinity.
If the cut is after the last element, there is no right value, so we use positive infinity.
The same idea applies to B.
The partition is valid when:
if A_left <= B_right and B_left <= A_right:For odd total length:
return float(max(A_left, B_left))For even total length:
return (max(A_left, B_left) + min(A_right, B_right)) / 2If A_left is too large:
right = i - 1Otherwise:
left = i + 1Testing
def run_tests():
s = Solution()
assert s.findMedianSortedArrays([1, 3], [2]) == 2.0
assert s.findMedianSortedArrays([1, 2], [3, 4]) == 2.5
assert s.findMedianSortedArrays([], [1]) == 1.0
assert s.findMedianSortedArrays([2], []) == 2.0
assert s.findMedianSortedArrays([0, 0], [0, 0]) == 0.0
assert s.findMedianSortedArrays([1, 2, 3], [4, 5, 6, 7]) == 4.0
assert s.findMedianSortedArrays([1, 4, 7], [2, 3, 5, 6]) == 4.0
assert s.findMedianSortedArrays([-5, -3, -1], [-2]) == -2.5
print("all tests passed")
run_tests()| Test | Why |
|---|---|
[1, 3], [2] | Standard odd-length example |
[1, 2], [3, 4] | Standard even-length example |
[], [1] | One array is empty |
[2], [] | Other array is empty |
[0, 0], [0, 0] | Duplicate values |
[1, 2, 3], [4, 5, 6, 7] | Arrays do not overlap |
[1, 4, 7], [2, 3, 5, 6] | Interleaved arrays |
| Negative values | Median can be negative or fractional |