A clear explanation of designing a randomized multiset with average O(1) insert, remove, and getRandom operations.
Problem Restatement
We need to implement a data structure called RandomizedCollection.
It behaves like a multiset, so duplicate values are allowed.
It must support three operations in average O(1) time:
| Operation | Meaning |
|---|---|
insert(val) | Insert one occurrence of val |
remove(val) | Remove one occurrence of val if present |
getRandom() | Return a random element from the current collection |
For getRandom(), every stored occurrence has equal probability. So if the collection contains:
[1, 1, 2]then 1 should be returned with probability 2 / 3, and 2 should be returned with probability 1 / 3.
insert(val) returns True only when val was not already present before this insertion. Otherwise, it returns False.
remove(val) returns True if one occurrence was removed. Otherwise, it returns False.
Input and Output
| Method | Input | Output |
|---|---|---|
RandomizedCollection() | None | Initializes the collection |
insert(val) | Integer val | True if first occurrence, otherwise False |
remove(val) | Integer val | True if one occurrence was removed, otherwise False |
getRandom() | None | A random current element |
Example class shape:
class RandomizedCollection:
def __init__(self):
...
def insert(self, val: int) -> bool:
...
def remove(self, val: int) -> bool:
...
def getRandom(self) -> int:
...Examples
Create an empty collection:
collection = RandomizedCollection()Insert 1:
collection.insert(1)This returns:
Truebecause 1 was not already present.
Insert another 1:
collection.insert(1)This returns:
Falsebecause 1 was already present.
Now the collection is:
[1, 1]Insert 2:
collection.insert(2)This returns:
TrueNow the collection is:
[1, 1, 2]Call:
collection.getRandom()The result should follow these probabilities:
| Value | Probability |
|---|---|
1 | 2 / 3 |
2 | 1 / 3 |
Remove one 1:
collection.remove(1)This returns:
TrueNow the collection contains one 1 and one 2.
First Thought: Reuse RandomizedSet
LeetCode 380 used:
| Structure | Meaning |
|---|---|
| List | Store all values for random access |
| Hash map | Map each value to its index |
That works when values are unique.
For this problem, duplicates are allowed. A value may appear at many indices.
So this map is no longer enough:
value -> indexWe need:
value -> set of indicesFor example, if the array is:
values = [1, 1, 2]then the index map should be:
{
1: {0, 1},
2: {2},
}This lets us remove one occurrence of a value in average O(1) time.
Key Insight
Use two structures:
| Structure | Meaning |
|---|---|
values | A list containing every occurrence |
indices | A map from value to the set of positions where it appears |
The list allows getRandom() in O(1) by choosing a random index.
The map of sets allows insert() and remove() to find and update occurrences in average O(1).
The hard operation is remove(val).
As in LeetCode 380, we avoid deleting from the middle of the list. We remove by swapping the target occurrence with the last occurrence, then popping the end.
Swap and Pop With Duplicates
Suppose:
values = [1, 1, 2, 3]
indices = {
1: {0, 1},
2: {2},
3: {3},
}Now remove one occurrence of 1.
Pick one index of 1, for example:
remove_idx = 0The last value is:
last_val = 3
last_idx = 3Move 3 into index 0:
values = [3, 1, 2, 3]Now update the index sets:
indices[1].remove(0)
indices[3].remove(3)
indices[3].add(0)Then pop the last element:
values = [3, 1, 2]Now the structures are correct:
indices = {
1: {1},
2: {2},
3: {0},
}Algorithm
Initialize:
values = []
indices = {}For insert(val):
- Check whether
valis already present. - Append
valtovalues. - Add the new index to
indices[val]. - Return
Trueifvalwas absent before insertion, otherwiseFalse.
For remove(val):
- If
valis absent, returnFalse. - Take one index from
indices[val]. - Read the last value from
values. - Move the last value into the removed index.
- Update the index set for the last value.
- Pop the last element from
values. - If
indices[val]becomes empty, delete it from the map. - Return
True.
For getRandom():
- Pick a random element from
values. - Return it.
Correctness
The list values stores every current occurrence in the collection.
The map indices stores exactly the list positions for each value.
When insert(val) is called, the algorithm appends one new occurrence to values and records its new index in indices[val]. Therefore the list and map both include the new occurrence.
When remove(val) is called and val is absent, there is no occurrence to remove, so returning False is correct.
When remove(val) is called and val is present, the algorithm removes exactly one index from indices[val]. This represents removing one occurrence. To keep list deletion constant time, it moves the last list element into the removed index and updates that moved value’s index set. Then it pops the last list slot.
After the swap and pop, every remaining occurrence still appears exactly once in values, and every index set in indices points to the correct positions.
If a value has no remaining indices, deleting the key keeps the map consistent with the collection.
getRandom() chooses uniformly among indices in values. Since each occurrence has exactly one slot in values, each occurrence has equal probability. Therefore values with more occurrences are returned with proportionally higher probability.
Complexity
Let n be the number of stored occurrences.
| Operation | Time | Why |
|---|---|---|
insert(val) | O(1) average | List append and hash set insert |
remove(val) | O(1) average | Hash set pop, index updates, list pop |
getRandom() | O(1) | Random list access |
| Space | O(n) | Store every occurrence and its index |
Implementation
import random
from collections import defaultdict
class RandomizedCollection:
def __init__(self):
self.values = []
self.indices = defaultdict(set)
def insert(self, val: int) -> bool:
was_absent = len(self.indices[val]) == 0
self.indices[val].add(len(self.values))
self.values.append(val)
return was_absent
def remove(self, val: int) -> bool:
if len(self.indices[val]) == 0:
return False
remove_idx = self.indices[val].pop()
last_val = self.values[-1]
last_idx = len(self.values) - 1
self.values[remove_idx] = last_val
self.indices[last_val].add(remove_idx)
self.indices[last_val].discard(last_idx)
self.values.pop()
if len(self.indices[val]) == 0:
del self.indices[val]
return True
def getRandom(self) -> int:
return random.choice(self.values)Code Explanation
We store all occurrences in a list:
self.values = []We map each value to all indices where it appears:
self.indices = defaultdict(set)In insert, we check whether this value had no occurrences before:
was_absent = len(self.indices[val]) == 0Then we add the new index:
self.indices[val].add(len(self.values))
self.values.append(val)For removal, we first check whether the value exists:
if len(self.indices[val]) == 0:
return FalseThen we remove one stored index for that value:
remove_idx = self.indices[val].pop()We read the last value and last index:
last_val = self.values[-1]
last_idx = len(self.values) - 1Move the last value into the removed index:
self.values[remove_idx] = last_valThen update the moved value’s index set:
self.indices[last_val].add(remove_idx)
self.indices[last_val].discard(last_idx)Using discard is safe even when remove_idx == last_idx.
Finally, remove the last list slot:
self.values.pop()If the removed value no longer has occurrences, delete its key:
if len(self.indices[val]) == 0:
del self.indices[val]For random selection:
return random.choice(self.values)Each list slot is equally likely, and duplicates occupy multiple slots.
Testing
def test_randomized_collection():
c = RandomizedCollection()
assert c.insert(1) is True
assert c.insert(1) is False
assert c.insert(2) is True
assert sorted(c.values) == [1, 1, 2]
assert c.remove(1) is True
assert sorted(c.values) == [1, 2]
assert c.remove(1) is True
assert sorted(c.values) == [2]
assert c.remove(1) is False
assert c.getRandom() == 2
assert c.insert(2) is False
assert c.insert(3) is True
assert sorted(c.values) == [2, 2, 3]
assert c.remove(2) is True
assert sorted(c.values) in ([2, 3], [2, 3])
assert c.getRandom() in {2, 3}
print("all tests passed")
test_randomized_collection()Test meaning:
| Test | Why |
|---|---|
Insert first 1 | First occurrence returns True |
Insert duplicate 1 | Duplicate insertion returns False |
Remove 1 once | Only one occurrence is removed |
Remove 1 twice | Second occurrence can still be removed |
Remove missing 1 | Missing value returns False |
| Single remaining value | getRandom() must return that value |
Duplicate 2 | Confirms repeated values remain valid |
| Remove one duplicate | Confirms only one occurrence is deleted |