A clear SQL guide for summing 2016 investments for policies with repeated 2015 investment values and unique locations.
Problem Restatement
We are given an Insurance table.
Each row represents one policyholder and includes:
- Their policy ID.
- Their total investment value in 2015.
- Their total investment value in 2016.
- Their location as latitude and longitude.
We need to report the sum of tiv_2016 for policyholders who satisfy both conditions:
- Their
tiv_2015value is shared by at least one other policyholder. - Their
(lat, lon)location is unique in the table.
The final sum must be rounded to two decimal places.
Table
Insurance
| Column | Type | Meaning |
|---|---|---|
pid | int | Policy ID |
tiv_2015 | float | Total investment value in 2015 |
tiv_2016 | float | Total investment value in 2016 |
lat | float | Latitude of the policyholder’s city |
lon | float | Longitude of the policyholder’s city |
pid is the primary key.
Example
Input:
| pid | tiv_2015 | tiv_2016 | lat | lon |
|---|---|---|---|---|
| 1 | 10 | 5 | 10 | 10 |
| 2 | 20 | 20 | 20 | 20 |
| 3 | 10 | 30 | 20 | 20 |
| 4 | 10 | 40 | 40 | 40 |
Output:
| tiv_2016 |
|---|
| 45.00 |
Policy 1 has tiv_2015 = 10, which is shared by policies 3 and 4. Its location (10, 10) is unique, so it qualifies.
Policy 2 has tiv_2015 = 20, which is not shared by another policyholder, so it does not qualify.
Policy 3 has tiv_2015 = 10, but its location (20, 20) is shared with policy 2, so it does not qualify.
Policy 4 has tiv_2015 = 10 and unique location (40, 40), so it qualifies.
The sum is:
5 + 40 = 45So the result is:
45.00First Thought: Sum All 2016 Investments
A first attempt might be:
SELECT ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM Insurance;This sums every policyholder’s 2016 investment.
But the problem asks for only policyholders that pass both filters:
- Repeated
tiv_2015. - Unique
(lat, lon).
So we need to filter rows before summing.
Key Insight
The row qualifies based on two group-level facts.
First, the tiv_2015 value must appear more than once:
GROUP BY tiv_2015
HAVING COUNT(*) > 1Second, the (lat, lon) pair must appear exactly once:
GROUP BY lat, lon
HAVING COUNT(*) = 1Then the main query keeps only rows whose values belong to those qualifying groups.
Algorithm
- Find all
tiv_2015values that appear more than once. - Find all
(lat, lon)pairs that appear exactly once. - Select rows from
Insurancethat satisfy both conditions. - Sum their
tiv_2016values. - Round the result to two decimal places.
Correctness
The first subquery groups rows by tiv_2015 and keeps only groups with count greater than 1. Therefore, a row passes this condition exactly when its 2015 investment value is shared by at least one other policyholder.
The second subquery groups rows by (lat, lon) and keeps only groups with count equal to 1. Therefore, a row passes this condition exactly when no other policyholder has the same location.
The main query applies both filters with AND, so it keeps exactly the policyholders that satisfy both requirements. It then sums tiv_2016 over exactly those rows and rounds the final result to two decimal places.
Therefore, the query returns the required value.
Complexity
Let:
n = number of rows in Insurance| Metric | Value | Why |
|---|---|---|
| Time | O(n) to O(n log n) | Depends on how the database implements grouping |
| Space | O(n) | Grouping may store distinct tiv_2015 values and distinct locations |
The exact execution plan depends on indexes and the SQL engine.
Implementation
SELECT
ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM Insurance
WHERE tiv_2015 IN (
SELECT tiv_2015
FROM Insurance
GROUP BY tiv_2015
HAVING COUNT(*) > 1
)
AND (lat, lon) IN (
SELECT lat, lon
FROM Insurance
GROUP BY lat, lon
HAVING COUNT(*) = 1
);Code Explanation
This subquery finds repeated 2015 investment values:
SELECT tiv_2015
FROM Insurance
GROUP BY tiv_2015
HAVING COUNT(*) > 1A tiv_2015 value qualifies only if at least two policyholders have it.
This subquery finds unique locations:
SELECT lat, lon
FROM Insurance
GROUP BY lat, lon
HAVING COUNT(*) = 1A (lat, lon) pair qualifies only if exactly one policyholder has that location.
The main WHERE clause combines both requirements:
WHERE tiv_2015 IN (...)
AND (lat, lon) IN (...)Then the query sums the qualifying 2016 investment values:
SUM(tiv_2016)Finally, the result is rounded:
ROUND(SUM(tiv_2016), 2) AS tiv_2016Window Function Alternative
We can also compute the two counts with window functions.
WITH counted AS (
SELECT
tiv_2016,
COUNT(*) OVER (PARTITION BY tiv_2015) AS same_tiv_2015_count,
COUNT(*) OVER (PARTITION BY lat, lon) AS same_location_count
FROM Insurance
)
SELECT
ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM counted
WHERE same_tiv_2015_count > 1
AND same_location_count = 1;This version often reads more directly:
- Count how many rows share the same
tiv_2015. - Count how many rows share the same location.
- Keep rows where the first count is greater than
1and the second count is1.
Testing
Sample data:
CREATE TABLE Insurance (
pid INT PRIMARY KEY,
tiv_2015 FLOAT,
tiv_2016 FLOAT,
lat FLOAT,
lon FLOAT
);
INSERT INTO Insurance (pid, tiv_2015, tiv_2016, lat, lon) VALUES
(1, 10, 5, 10, 10),
(2, 20, 20, 20, 20),
(3, 10, 30, 20, 20),
(4, 10, 40, 40, 40);Query:
SELECT
ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM Insurance
WHERE tiv_2015 IN (
SELECT tiv_2015
FROM Insurance
GROUP BY tiv_2015
HAVING COUNT(*) > 1
)
AND (lat, lon) IN (
SELECT lat, lon
FROM Insurance
GROUP BY lat, lon
HAVING COUNT(*) = 1
);Expected result:
| tiv_2016 |
|---|
| 45.00 |
Additional test cases:
| Case | Expected behavior |
|---|---|
tiv_2015 appears once | Row is excluded |
tiv_2015 appears multiple times | Row passes the first condition |
(lat, lon) appears once | Row passes the second condition |
(lat, lon) appears multiple times | Row is excluded |
| Row passes both conditions | Its tiv_2016 is included in the sum |