A clear explanation of implementing a simplified Twitter using hash maps, sets, timestamps, and a heap.
Problem Restatement
We need to design a simplified Twitter system.
The system supports four operations:
| Method | Meaning |
|---|---|
postTweet(userId, tweetId) | User userId posts a tweet with ID tweetId |
getNewsFeed(userId) | Return up to 10 most recent tweet IDs visible to userId |
follow(followerId, followeeId) | followerId starts following followeeId |
unfollow(followerId, followeeId) | followerId stops following followeeId |
A user’s news feed contains:
- Tweets posted by the user themself.
- Tweets posted by users they follow.
The feed must be ordered from most recent to least recent.
Each tweetId is unique. The problem asks for at most the 10 most recent tweet IDs in the feed.
Input and Output
| Item | Meaning |
|---|---|
| Constructor | Twitter() initializes the object |
| Input users | Integer user IDs |
| Input tweets | Integer tweet IDs |
| Output feed | List of up to 10 tweet IDs |
| Ordering | Most recent first |
Example method shape:
class Twitter:
def __init__(self):
...
def postTweet(self, userId: int, tweetId: int) -> None:
...
def getNewsFeed(self, userId: int) -> list[int]:
...
def follow(self, followerId: int, followeeId: int) -> None:
...
def unfollow(self, followerId: int, followeeId: int) -> None:
...Examples
Example calls:
twitter = Twitter()
twitter.postTweet(1, 5)
twitter.getNewsFeed(1)
twitter.follow(1, 2)
twitter.postTweet(2, 6)
twitter.getNewsFeed(1)
twitter.unfollow(1, 2)
twitter.getNewsFeed(1)Outputs:
[5]
[6, 5]
[5]Step by step:
| Operation | Result |
|---|---|
User 1 posts tweet 5 | User 1 feed is [5] |
User 1 follows user 2 | User 1 can now see user 2 tweets |
User 2 posts tweet 6 | User 1 feed is [6, 5] |
User 1 unfollows user 2 | User 1 feed returns to [5] |
First Thought: Collect and Sort
A simple design is:
- Store every tweet with a timestamp.
- Store follow relationships.
- For
getNewsFeed(userId), collect all tweets from the user and followees. - Sort them by timestamp.
- Return the first 10 tweet IDs.
This is correct, but it can be slow if users have many tweets.
For example, if a user follows many active users, sorting all visible tweets every time is wasteful because we only need the 10 newest tweets.
Key Insight
Each user’s tweets are naturally ordered by posting time.
So instead of sorting all visible tweets, we can merge the recent tweet lists from:
user themself + all followeesThis is a classic top-k merge problem.
We use:
| Data | Structure |
|---|---|
| User tweets | Hash map: userId -> list of tweets |
| Follow graph | Hash map: followerId -> set of followeeIds |
| Recency order | Global increasing timestamp |
| Feed merge | Heap |
Each tweet can be stored as:
(timestamp, tweetId)When returning the feed, we use a heap to always pick the most recent available tweet.
Python has a min-heap, so we store negative timestamps to simulate a max-heap.
Algorithm
Maintain:
self.time
self.tweets
self.followingWhere:
| Variable | Meaning |
|---|---|
self.time | Increasing timestamp |
self.tweets[userId] | List of tweets posted by this user |
self.following[userId] | Set of users this user follows |
For postTweet(userId, tweetId):
- Append
(time, tweetId)totweets[userId]. - Increase
time.
For follow(followerId, followeeId):
- Add
followeeIdtofollowing[followerId].
For unfollow(followerId, followeeId):
- Remove
followeeIdfromfollowing[followerId]if it exists. - Do not let this break the user’s own feed. We will always include the user themself during
getNewsFeed.
For getNewsFeed(userId):
- Build the set of visible users.
- Add
userIdto that set. - For each visible user, push their newest tweet into the heap.
- Pop the newest tweet from the heap.
- After popping a tweet from one user, push that user’s next older tweet.
- Stop after 10 tweets or when the heap is empty.
Correctness
Each tweet receives a strictly increasing timestamp when it is posted. Therefore, larger timestamps represent newer tweets.
For a given user, their tweet list is stored in posting order. The newest tweet is at the end of the list.
In getNewsFeed, the heap initially contains the newest tweet from every visible user. The newest tweet in the entire feed must be among these candidates, because every visible user’s remaining tweets are no newer than their own newest tweet.
When the heap pops one tweet, the algorithm appends that tweet ID to the answer. Then it inserts the next older tweet from the same user. This preserves the same invariant: the heap contains the newest not-yet-returned tweet from every visible user that still has tweets left.
By repeating this process, the algorithm returns tweets from most recent to least recent.
The loop stops after 10 tweets, so the method returns at most 10 tweet IDs. If fewer than 10 tweets exist, the heap becomes empty and all visible tweets are returned.
Therefore, getNewsFeed returns exactly the required feed.
Complexity
Let:
| Symbol | Meaning |
|---|---|
F | Number of users visible in the feed: user themself plus followees |
K | Number of tweets returned, at most 10 |
| Operation | Time | Space |
|---|---|---|
postTweet | O(1) | O(1) extra |
follow | O(1) average | O(1) extra |
unfollow | O(1) average | O(1) extra |
getNewsFeed | O(F + K log F) | O(F) |
Since K <= 10, the feed operation is efficient even when users have many old tweets.
Implementation
from collections import defaultdict
import heapq
class Twitter:
def __init__(self):
self.time = 0
self.tweets = defaultdict(list)
self.following = defaultdict(set)
def postTweet(self, userId: int, tweetId: int) -> None:
self.tweets[userId].append((self.time, tweetId))
self.time += 1
def getNewsFeed(self, userId: int) -> list[int]:
users = set(self.following[userId])
users.add(userId)
heap = []
for uid in users:
user_tweets = self.tweets[uid]
if user_tweets:
index = len(user_tweets) - 1
time, tweet_id = user_tweets[index]
heapq.heappush(heap, (-time, tweet_id, uid, index))
feed = []
while heap and len(feed) < 10:
neg_time, tweet_id, uid, index = heapq.heappop(heap)
feed.append(tweet_id)
next_index = index - 1
if next_index >= 0:
next_time, next_tweet_id = self.tweets[uid][next_index]
heapq.heappush(
heap,
(-next_time, next_tweet_id, uid, next_index),
)
return feed
def follow(self, followerId: int, followeeId: int) -> None:
if followerId != followeeId:
self.following[followerId].add(followeeId)
def unfollow(self, followerId: int, followeeId: int) -> None:
self.following[followerId].discard(followeeId)Code Explanation
The timestamp starts at 0:
self.time = 0Each posted tweet is stored with the current timestamp:
self.tweets[userId].append((self.time, tweetId))
self.time += 1The timestamp gives us global ordering across all users.
Follow relationships are stored as a set:
self.following = defaultdict(set)Using a set means duplicate follows do not create duplicate feed entries.
In getNewsFeed, we include both followees and the user themself:
users = set(self.following[userId])
users.add(userId)Then we add the newest tweet from each visible user into the heap:
index = len(user_tweets) - 1
time, tweet_id = user_tweets[index]
heapq.heappush(heap, (-time, tweet_id, uid, index))The heap stores:
(-time, tweet_id, uid, index)We use -time because Python’s heap is a min-heap. A larger timestamp becomes a smaller negative number, so newer tweets come out first.
When we pop a tweet, we add it to the feed:
neg_time, tweet_id, uid, index = heapq.heappop(heap)
feed.append(tweet_id)Then we push the next older tweet from the same user:
next_index = index - 1This is the same merge idea used to merge sorted lists.
For follow, we avoid self-follow:
if followerId != followeeId:
self.following[followerId].add(followeeId)For unfollow, discard is safe even if the follow relationship does not exist:
self.following[followerId].discard(followeeId)Testing
def run_tests():
twitter = Twitter()
twitter.postTweet(1, 5)
assert twitter.getNewsFeed(1) == [5]
twitter.follow(1, 2)
twitter.postTweet(2, 6)
assert twitter.getNewsFeed(1) == [6, 5]
twitter.unfollow(1, 2)
assert twitter.getNewsFeed(1) == [5]
twitter = Twitter()
for tweet_id in range(20):
twitter.postTweet(1, tweet_id)
assert twitter.getNewsFeed(1) == [19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
twitter = Twitter()
twitter.postTweet(1, 101)
twitter.postTweet(2, 201)
twitter.postTweet(1, 102)
twitter.postTweet(2, 202)
twitter.follow(3, 1)
twitter.follow(3, 2)
assert twitter.getNewsFeed(3) == [202, 102, 201, 101]
twitter.unfollow(3, 1)
assert twitter.getNewsFeed(3) == [202, 201]
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
| Single user post | Checks basic tweet storage |
| Follow then feed | Checks followee tweets appear |
| Unfollow then feed | Checks followee tweets disappear |
| More than 10 tweets | Checks feed limit |
| Multiple users | Checks global recency ordering |