A clear explanation of designing a randomized set with average O(1) insert, remove, and getRandom operations.
Problem Restatement
We need to implement a data structure called RandomizedSet.
It supports three operations:
| Operation | Meaning |
|---|---|
insert(val) | Insert val if it does not already exist |
remove(val) | Remove val if it exists |
getRandom() | Return a random element from the current set |
Each operation must run in average O(1) time.
For getRandom(), every element currently in the set must have the same probability of being returned. The problem guarantees that getRandom() is only called when the set has at least one element. The constraints allow at most 2 * 10^5 calls, and values fit in signed 32-bit integer range.
Input and Output
| Method | Input | Output |
|---|---|---|
RandomizedSet() | None | Initializes the data structure |
insert(val) | Integer val | True if inserted, False if already present |
remove(val) | Integer val | True if removed, False if absent |
getRandom() | None | A uniformly random current element |
Example class shape:
class RandomizedSet:
def __init__(self):
...
def insert(self, val: int) -> bool:
...
def remove(self, val: int) -> bool:
...
def getRandom(self) -> int:
...Examples
Example:
randomizedSet = RandomizedSet()Insert 1:
randomizedSet.insert(1)This returns:
Truebecause 1 was not already present.
Remove 2:
randomizedSet.remove(2)This returns:
Falsebecause 2 does not exist.
Insert 2:
randomizedSet.insert(2)This returns:
TrueNow the set contains:
{1, 2}Call:
randomizedSet.getRandom()This should return either 1 or 2, each with probability 1 / 2.
Remove 1:
randomizedSet.remove(1)This returns:
TrueNow the set contains only:
{2}Insert 2 again:
randomizedSet.insert(2)This returns:
Falsebecause 2 is already present.
Now:
randomizedSet.getRandom()must return:
2First Thought: Use a Hash Set
A hash set gives fast insert and remove.
values = set()Then:
insert(val) -> O(1) average
remove(val) -> O(1) averageBut getRandom() is the problem.
A hash set does not support direct random indexing. To choose a random element uniformly, we would need to convert it to a list or iterate through it, which costs O(n).
So a hash set alone is not enough.
Key Insight
We need two abilities at the same time:
| Need | Good structure |
|---|---|
| Check whether a value exists | Hash map |
| Pick a random element by index | Array |
An array lets us do this in O(1):
values[random_index]A hash map lets us find the position of a value in O(1) average time:
index[val]So we store both:
| Structure | Meaning |
|---|---|
values | List of all current values |
pos | Map from value to its index in values |
The only hard part is removal.
Removing from the middle of a Python list costs O(n) because elements after it must shift left.
To avoid that, we use swap and pop.
Swap and Pop Removal
Suppose the list is:
values = [10, 20, 30, 40]and we want to remove 20.
The index of 20 is 1.
Removing directly from index 1 would shift 30 and 40.
Instead:
- Move the last value
40into index1. - Update
pos[40]to1. - Pop the last element.
The list changes like this:
[10, 20, 30, 40]
[10, 40, 30, 40]
[10, 40, 30]Now 20 is gone, and we removed from the end of the list.
Popping from the end is O(1).
Algorithm
Initialize:
values = []
pos = {}For insert(val):
- If
valexists inpos, returnFalse. - Store its index as the current list length.
- Append
valtovalues. - Return
True.
For remove(val):
- If
valdoes not exist inpos, returnFalse. - Get its index.
- Get the last value in
values. - Move the last value into the removed value’s index.
- Update the last value’s index in
pos. - Pop the last list element.
- Delete
valfrompos. - Return
True.
For getRandom():
- Choose a random element from
values. - Return it.
Correctness
The list values stores exactly the elements currently in the set.
The map pos stores the current index of every value in values.
When insert(val) succeeds, val was not present before. The algorithm appends it to values and records its index in pos, so both structures remain consistent.
When insert(val) fails, val already exists, so leaving the structures unchanged is correct.
When remove(val) fails, val is absent, so leaving the structures unchanged is correct.
When remove(val) succeeds, the algorithm removes exactly val. If val is not already the last element, the last element is moved into val’s old position, and its index is updated in pos. Then the duplicate last slot is popped. Finally, val is removed from pos.
After this process, all remaining values still appear exactly once in values, and every value has the correct index in pos.
getRandom() chooses uniformly from the list values. Since each current set element appears exactly once in values, every element has equal probability of being returned.
Therefore all operations satisfy the required behavior.
Complexity
Let n be the number of current elements.
| Operation | Time | Why |
|---|---|---|
insert(val) | O(1) average | Hash map lookup and list append |
remove(val) | O(1) average | Hash map lookup, swap, pop |
getRandom() | O(1) | Random list access |
| Space | O(n) | Store each value in the list and map |
Implementation
import random
class RandomizedSet:
def __init__(self):
self.values = []
self.pos = {}
def insert(self, val: int) -> bool:
if val in self.pos:
return False
self.pos[val] = len(self.values)
self.values.append(val)
return True
def remove(self, val: int) -> bool:
if val not in self.pos:
return False
idx = self.pos[val]
last = self.values[-1]
self.values[idx] = last
self.pos[last] = idx
self.values.pop()
del self.pos[val]
return True
def getRandom(self) -> int:
return random.choice(self.values)Code Explanation
The constructor creates the two structures:
self.values = []
self.pos = {}self.values supports random access.
self.pos maps each value to its index inside self.values.
For insertion, we first reject duplicates:
if val in self.pos:
return FalseThen we record the index and append the value:
self.pos[val] = len(self.values)
self.values.append(val)For removal, we first reject missing values:
if val not in self.pos:
return FalseThen we find the index of the value being removed:
idx = self.pos[val]We also read the last value:
last = self.values[-1]Move the last value into the removed value’s slot:
self.values[idx] = last
self.pos[last] = idxThen remove the old last slot and delete val from the map:
self.values.pop()
del self.pos[val]This also works when val is already the last value. In that case, the assignment writes the value to the same place, then the pop removes it, then the map entry is deleted.
For random selection:
return random.choice(self.values)Since each value appears once in the list, the probability is uniform.
Testing
def test_randomized_set():
rs = RandomizedSet()
assert rs.insert(1) is True
assert rs.remove(2) is False
assert rs.insert(2) is True
assert rs.getRandom() in {1, 2}
assert rs.remove(1) is True
assert rs.insert(2) is False
assert rs.getRandom() == 2
assert rs.remove(2) is True
assert rs.insert(-10) is True
assert rs.insert(100) is True
assert rs.remove(-10) is True
assert rs.getRandom() == 100
print("all tests passed")
test_randomized_set()Test meaning:
| Test | Why |
|---|---|
| Insert new value | Should return True |
| Remove missing value | Should return False |
| Insert duplicate | Should return False |
getRandom() with two values | Must return one current value |
| Remove one value | Remaining value should still work |
| Remove last value | Checks swap and pop when removing the end |
| Negative value | Confirms full integer range behavior |