Project Euler Problem 352
Each one of the 25 sheep in a flock must be tested for a rare virus, known to affect 2 of the sheep population.
Solution
Answer: 378563.260589
Let $p$ be the infection probability for each sheep, and let
$$q = 1-p.$$
We must compute the optimal expected number of tests $T(s,p)$ for $s=10000$, summed over
$$p=0.01,0.02,\dots,0.50.$$
The key difficulty is that after a pooled test comes back positive, we gain information: we know that at least one sheep in that pool is infected.
This conditional information changes the optimal continuation strategy.
1. Mathematical analysis
We define two DP functions.
A. $T(n)$
$T(n)$ = minimum expected number of tests needed to completely classify $n$ sheep when each sheep is independently infected with probability $p$.
B. $C(n)$
$C(n)$ = minimum expected number of additional tests needed to classify $n$ sheep given that we already know at least one is infected.
Clearly,
$$C(1)=0$$
because if one sheep is known to belong to a positive pool, it must be infected.
Also,
$$T(0)=0,\qquad T(1)=1.$$
2. Recurrence for $T(n)$
Suppose the first action is:
- choose a group of size $m$,
- perform one pooled test on those $m$ sheep.
Case 1: pooled test negative
Probability:
$$q^m.$$
Then all $m$ sheep are clean, and we still must process the remaining $n-m$ sheep:
$$T(n-m).$$
Case 2: pooled test positive
Probability:
$$1-q^m.$$
Now we must fully resolve those $m$ sheep knowing at least one is infected:
$$C(m),$$
and then still process the remaining $n-m$:
$$T(n-m).$$
Including the pooled test itself:
$$T(n) = \min_{1\le m\le n} \left[ 1 + T(n-m) + (1-q^m)C(m) \right].$$
3. Recurrence for $C(n)$
Now assume:
- we already know among these $n$ sheep,
- at least one is infected.
We split off a subgroup of size $m$ and test it.
Conditional probabilities
The probability the tested subgroup is negative while the whole group is positive:
$$\frac{q^m-q^n}{1-q^n}.$$
Why?
- subgroup negative: $q^m$,
- all $n$ negative: $q^n$,
- subtract impossible overlap.
Similarly, the probability the subgroup test is positive:
$$\frac{1-q^m}{1-q^n}.$$
If subgroup negative
Then:
- subgroup is fully resolved,
- the remaining $n-m$ sheep still contain at least one infected sheep.
Cost:
$$C(n-m).$$
If subgroup positive
Then:
- subgroup requires $C(m)$,
- the remaining sheep are now unconditional again, costing $T(n-m)$.
Including the test itself:
$$C(n) = \min_{1\le m<n} \left[ 1 + \frac{q^m-q^n}{1-q^n}C(n-m) + \frac{1-q^m}{1-q^n}\left(C(m)+T(n-m)\right) \right].$$
These two recurrences completely determine the optimum.
4. Python implementation
from math import inf
def optimal_tests(p, nmax):
q = 1.0 - p
# T[n] = optimal expected tests for n sheep
T = [0.0] * (nmax + 1)
# C[n] = optimal expected additional tests
# given at least one infected among n sheep
C = [0.0] * (nmax + 1)
T[1] = 1.0
C[1] = 0.0
# Empirically the optimal split sizes stay small.
# This bound is safely above all optimal values needed.
LIMIT = min(nmax, int(2 / p) + 50)
for n in range(2, nmax + 1):
# ---- Compute C[n] ----
denom = 1.0 - q ** n
best = inf
upper = min(n - 1, LIMIT)
for m in range(1, upper + 1):
prob_negative = (q ** m - q ** n) / denom
prob_positive = (1.0 - q ** m) / denom
value = (
1.0
+ prob_negative * C[n - m]
+ prob_positive * (C[m] + T[n - m])
)
if value < best:
best = value
C[n] = best
# ---- Compute T[n] ----
best = inf
upper = min(n, LIMIT)
for m in range(1, upper + 1):
value = (
1.0
+ T[n - m]
+ (1.0 - q ** m) * C[m]
)
if value < best:
best = value
T[n] = best
return T[nmax]
# Verify examples
print(round(optimal_tests(0.02, 25), 6))
print(round(optimal_tests(0.10, 25), 6))
# Final computation
answer = 0.0
for k in range(1, 51):
p = k / 100.0
answer += optimal_tests(p, 10000)
print(f"{answer:.6f}")
5. Code walkthrough
Arrays
T = [0.0] * (nmax + 1)
C = [0.0] * (nmax + 1)
Store DP values.
Computing $C[n]$
prob_negative = (q ** m - q ** n) / denom
prob_positive = (1.0 - q ** m) / denom
These are the conditional probabilities given that the whole group already contains an infection.
Computing $T[n]$
value = (
1.0
+ T[n - m]
+ (1.0 - q ** m) * C[m]
)
This implements exactly:
$$1 + T(n-m) + (1-q^m)C(m).$$
6. Verification against the examples
The program reproduces the values given in the statement:
$$T(25,0.02)=4.155452$$
and
$$T(25,0.10)=12.702124.$$
So the DP is correct.
7. Final computation
Summing
$$T(10000,p) \quad\text{for}\quad p=0.01,0.02,\dots,0.50$$
gives:
$$378563.26058948983$$
Rounded to six decimal places:
$$378563.260589$$
Answer: 378563.260589