This guide dissects a real mock Google interview step by step — the problem, the solutions, the mistakes, and the interviewer dynamics — so you can learn exactly what to do (and what to avoid).

1. The Problem: RandomizedSet

1.1. What You’re Asked to Build

Design a class RandomizedSet that supports three operations, each in O(1) average time:

Operation Description Constraint

insert(val)

Insert an integer. Ignore if already present. Return True if inserted, False if duplicate.

O(1)

remove(val)

Remove an integer. Return True if removed, False if not found.

O(1)

get_random()

Return any stored integer, each with equal probability.

O(1)

This is a real LeetCode problem (#380). Google frequently uses it because it requires combining two data structures in a non-obvious way.


2. The Progression of Solutions

Understanding why simpler solutions fail teaches you more than memorizing the final answer.

2.1. Approach 1 — Naive: Just a Set

The candidate’s first instinct: use a Python set.

import random

class RandomizedSet:
    def __init__(self):
        self.data = set()

    def insert(self, val: int) -> bool:
        if val in self.data:
            return False
        self.data.add(val)
        return True

    def remove(self, val: int) -> bool:
        if val not in self.data:
            return False
        self.data.remove(val)
        return True

    def get_random(self) -> int:
        # THIS IS THE PROBLEM
        return random.choice(list(self.data))  (1)
1 list(self.data) copies the entire set into a list — that’s O(N) time and O(N) space.

2.1.1. Why the interviewer pushed back

random.choice() requires an indexable sequence. Sets are not indexable. Converting to a list costs O(N). At 10,000 elements, get_random is 10,000× slower than it needs to be.

Interview lesson: The interviewer’s job is partly to steer you toward discovering the O(1) constraint yourself. When asked "can we do better?", don’t just say yes — immediately start reasoning out loud about why the current approach is suboptimal and what property is missing.


2.2. Approach 2 — Correct: HashMap + Dynamic Array

The key insight: indexing into an array is O(1). If you can map each value to its array index, you can do all three operations in O(1).

2.2.1. The Data Structure

# val -> index in the array
val_to_idx: dict[int, int] = {}

# the actual stored values
vals: list[int] = []

2.2.2. Insert — O(1)

def insert(self, val: int) -> bool:
    if val in self.val_to_idx:
        return False
    self.vals.append(val)             (1)
    self.val_to_idx[val] = len(self.vals) - 1  (2)
    return True
1 Append to end of list — amortized O(1).
2 Record the index so we can find it instantly later.

2.2.3. get_random — O(1)

def get_random(self) -> int:
    return random.choice(self.vals)   (1)
1 random.choice on a list is O(1) because it picks a random index directly.

2.2.4. Remove — The Tricky Part

Removing from the middle of a list is O(N) because everything shifts. The interviewer’s hint was: "Is there any case where removing from an array is constant time?"

The answer: yes — removing the last element is O(1).

The trick: swap the element to delete with the last element, then pop.

def remove(self, val: int) -> bool:
    if val not in self.val_to_idx:
        return False

    idx = self.val_to_idx[val]        (1)
    last_val = self.vals[-1]          (2)

    # Overwrite the target slot with the last value
    self.vals[idx] = last_val         (3)
    self.val_to_idx[last_val] = idx   (4)

    # Remove the now-duplicate last element
    self.vals.pop()                   (5)
    del self.val_to_idx[val]          (6)

    return True
1 Find the index of the value to remove in O(1).
2 Grab the last element — we’ll move it to fill the gap.
3 Place the last value into the vacated slot.
4 Update the hashmap to reflect the last value’s new position.
5 Pop the end of the list — O(1).
6 Delete the removed value from the hashmap.
There is a subtle bug lurking here. What if val is already the last element? In that case, last_val == val, and step 4 would overwrite the hashmap entry you just deleted in step 6 — or you’d delete the entry you just wrote. See the full implementation below for the fix.

2.2.5. Full Correct Implementation

import random

class RandomizedSet:
    def __init__(self):
        self.val_to_idx: dict[int, int] = {}
        self.vals: list[int] = []

    def insert(self, val: int) -> bool:
        if val in self.val_to_idx:
            return False
        self.vals.append(val)
        self.val_to_idx[val] = len(self.vals) - 1
        return True

    def remove(self, val: int) -> bool:
        if val not in self.val_to_idx:
            return False

        idx = self.val_to_idx[val]
        last_val = self.vals[-1]

        # Swap target with last element
        self.vals[idx] = last_val
        self.val_to_idx[last_val] = idx

        # Remove last element
        self.vals.pop()
        del self.val_to_idx[val]

        # Edge case: val was already the last element.
        # The two lines above just deleted val from the map,
        # but also re-inserted it (last_val == val). Fix:
        # The delete must happen AFTER the map update only if val != last_val.
        # Correct ordering already handles this — del happens after update,
        # so if val == last_val, we correctly remove the single entry.

        return True

    def get_random(self) -> int:
        return random.choice(self.vals)


# ----- Verification -----
rs = RandomizedSet()
assert rs.insert(1) == True
assert rs.insert(1) == False   # duplicate
assert rs.insert(2) == True
assert rs.insert(3) == True
print("vals:", rs.vals)        # [1, 2, 3]

rs.remove(2)
print("after remove(2):", rs.vals)   # [1, 3] — order may vary

rs.remove(1)
print("after remove(1):", rs.vals)   # [3]

# get_random on a single element
assert rs.get_random() == 3

# Remove the last element (edge case)
rs.remove(3)
print("after remove(3):", rs.vals)   # []
print("All assertions passed.")
vals: [1, 2, 3]
after remove(2): [1, 3]
after remove(1): [3]
after remove(3): []
All assertions passed.

3. Complexity Analysis

Always state this before you code. The interviewer noted this as a strength.

Operation Time Space Why

insert

O(1) amortized

O(1)

List append + dict set

remove

O(1)

O(1)

Swap + pop + dict ops

get_random

O(1)

O(1)

Random index into list

Overall space

O(N)

N = number of stored elements


4. Follow-up 1: Support Duplicate Values

4.1. What changes

The problem now allows duplicate inserts. get_random must return each value proportionally — if 1 is inserted 3 times and 2 once, 1 should appear 75% of the time.

4.2. The New Data Structure

Instead of mapping val → single index, map val → set of all indices where that value appears.

import random
from collections import defaultdict

class RandomizedSetWithDuplicates:
    def __init__(self):
        # val -> set of indices in self.vals
        self.val_to_indices: dict[int, set[int]] = defaultdict(set)
        self.vals: list[int] = []

    def insert(self, val: int) -> bool:
        self.vals.append(val)
        self.val_to_indices[val].add(len(self.vals) - 1)
        return True                          (1)

    def remove(self, val: int) -> bool:
        if val not in self.val_to_indices or not self.val_to_indices[val]:
            return False

        # Pick any index where val lives (use pop for O(1))
        idx = next(iter(self.val_to_indices[val]))  (2)
        last_val = self.vals[-1]

        # Move last element into the vacated slot
        self.vals[idx] = last_val
        self.val_to_indices[last_val].add(idx)
        self.val_to_indices[last_val].discard(len(self.vals) - 1)  (3)

        self.val_to_indices[val].discard(idx)

        self.vals.pop()

        if not self.val_to_indices[val]:
            del self.val_to_indices[val]

        return True

    def get_random(self) -> int:
        return random.choice(self.vals)      (4)


# ----- Verification -----
rs = RandomizedSetWithDuplicates()
rs.insert(1)
rs.insert(1)
rs.insert(1)
rs.insert(2)
print("vals:", rs.vals)  # [1, 1, 1, 2]

# Simulate probability
counts = {1: 0, 2: 0}
for _ in range(10000):
    counts[rs.get_random()] += 1
print(f"1 appeared {counts[1]/100:.1f}% (expected ~75%)")
print(f"2 appeared {counts[2]/100:.1f}% (expected ~25%)")

rs.remove(1)
rs.remove(1)
rs.remove(1)
assert rs.get_random() == 2
print("Remove duplicates: OK")
vals: [1, 1, 1, 2]
1 appeared 74.8% (expected ~75%)
2 appeared 25.2% (expected ~25%)
Remove duplicates: OK
1 Always returns True since duplicates are allowed.
2 next(iter(set)) is O(1) — just grab any index.
3 Crucially: remove the last position from the set before adding the new one; otherwise if last_val == val you’d corrupt the index set.
4 Because duplicates fill multiple slots in vals, the distribution is automatically correct.

The candidate struggled here. The key insight to communicate first is: "the list already encodes frequency — if a value appears 3 times, it occupies 3 slots, so random.choice gives it 3/N probability automatically. I just need the index set to stay correct."


5. Follow-up 2: Thread Safety

5.1. What the interviewer is testing

This is less about memorizing locks and more about demonstrating that you understand:

  1. Which operations are not atomic in Python.

  2. What a race condition looks like in this specific class.

  3. How to apply a mutex correctly without over-engineering.

5.2. Why the current code is NOT thread-safe

Even though Python has the GIL (Global Interpreter Lock), the GIL only protects individual bytecodes — it does not protect compound multi-step operations.

remove performs multiple steps that must be atomic as a group:

# Thread A and Thread B both call remove(val) simultaneously
idx = self.val_to_idx[val]     # Thread A reads idx = 2
# -- context switch --
# Thread B also removes val, changes the list, corrupts idx = 2
self.vals[idx] = last_val      # Thread A writes to wrong slot — CORRUPTION

insert has the same issue: append + dict update is not atomic.

5.3. Thread-safe Implementation

import random
import threading

class ThreadSafeRandomizedSet:
    def __init__(self):
        self.val_to_idx: dict[int, int] = {}
        self.vals: list[int] = []
        self._lock = threading.Lock()          (1)

    def insert(self, val: int) -> bool:
        with self._lock:                        (2)
            if val in self.val_to_idx:
                return False
            self.vals.append(val)
            self.val_to_idx[val] = len(self.vals) - 1
            return True

    def remove(self, val: int) -> bool:
        with self._lock:
            if val not in self.val_to_idx:
                return False
            idx = self.val_to_idx[val]
            last_val = self.vals[-1]
            self.vals[idx] = last_val
            self.val_to_idx[last_val] = idx
            self.vals.pop()
            del self.val_to_idx[val]
            return True

    def get_random(self) -> int:
        with self._lock:                        (3)
            return random.choice(self.vals)


# ----- Stress test: concurrent inserts -----
rs = ThreadSafeRandomizedSet()

def insert_many(start, count):
    for i in range(start, start + count):
        rs.insert(i)

threads = [threading.Thread(target=insert_many, args=(i * 100, 100))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Inserted 1000 values concurrently. Stored: {len(rs.vals)}")
assert len(rs.vals) == 1000, "Race condition detected!"
print("Thread safety verified.")
Inserted 1000 values concurrently. Stored: 1000
Thread safety verified.
1 A single threading.Lock() — one mutex is sufficient for all three methods.
2 with self._lock automatically acquires on entry and releases on exit, even if an exception is raised.
3 get_random also needs the lock: the list could be modified mid-read by another thread.

What to say in the interview: "I’d wrap each public method with a threading.Lock using a context manager. This ensures only one thread can modify or read shared state at a time. The trade-off is reduced concurrency — reads block each other — but correctness requires it. If read throughput is a bottleneck, we could explore a RWLock pattern, but that’s an optimization I’d add only if profiling shows it’s needed."


6. Interview Behavior Analysis

6.1. What the Candidate Did Well

Behavior Why It Works

Asked clarifying questions upfront

Prevented building the wrong solution. Key questions: data types? operation frequency? duplicates allowed?

Stated time complexity before coding

Shows structured thinking. Interviewers care more about reasoning than implementation speed.

Walked through test cases explicitly

Writing state changes on the "whiteboard" (mentally tracking vals and val_to_idx) catches bugs before they’re in code.

Recovered from minor mistakes

Self-correction during walkthroughs demonstrates debugging ability, which is as important as writing clean code.

Accepted the interviewer’s hint

When asked "is there a case where array removal is O(1)?", the candidate used it correctly rather than ignoring it.

6.2. What the Candidate Could Have Improved

Area Observed Issue Better Approach

Initial proposal

Proposed set without acknowledging get_random limitation

When proposing any data structure, immediately reason about all three operations: "A set gives O(1) insert/remove, but get_random would be O(N) because I’d need to convert to a list."

Transition to optimal

Needed explicit prompting to move from set to array+hashmap

After stating the O(N) bottleneck yourself, say: "To get O(1) get_random, I need a data structure I can index into. If I use a list and maintain a hashmap of val→index, I can solve all three in O(1)."

Edge case awareness

The val == last_val edge case wasn’t caught proactively

Before coding remove(), say: "Edge case: if I’m removing the last element, the swap is a no-op and I should handle that." Then code it.

Follow-up on duplicates

Took time to articulate the approach

The instant answer should be: "The list already encodes frequency naturally. I just need val_to_indices to be a set instead of a single int."

6.3. What the Interviewer Did Well

Behavior Why It’s Good Interviewing

Gave a Socratic hint for remove()

"Is there a case where array removal is O(1)?" — doesn’t give the answer, guides reasoning. Good interviewers teach while testing.

Confirmed the O(1) constraint explicitly

Rather than letting the candidate guess the goal, stated it clearly at 13:10. Reduces wasted effort.

Introduced follow-ups in sequence

Duplicates → Thread safety. Each follow-up builds on the prior solution. This is intentional: it tests adaptability, not memorization.

6.4. What the Interviewer Could Have Improved

Area Issue

Thread safety question

"Is this code thread safe?" with no context is vague. A stronger question: "What happens if two threads call remove() simultaneously? Walk me through the race condition." This forces deeper reasoning rather than a surface-level "add a lock" answer.

Duplicate follow-up setup

The transition to duplicates at 29:55 could have been framed as: "Now assume the spec changes — how does your design need to evolve?" This tests architectural thinking, not just patching.


7. The Meta-Framework: How to Structure Any Google Interview

7.1. Phase 1 — Clarify (2–4 minutes)

Before writing a single line, ask:

clarifying_questions = [
    "What are the input types? (int, str, float?)",
    "What's the scale? (10 ops? 10M ops?)",
    "Can inputs be duplicates / null / negative?",
    "Are all operations equally frequent, or is one dominant?",
    "What should I return on invalid input?",
]
# Ask all that are relevant. Then CONFIRM your understanding:
# "So to restate: I need to build X that does Y, optimized for Z. Correct?"

7.2. Phase 2 — Think Aloud Before Coding (3–5 minutes)

def think_aloud_template():
    # 1. Naive approach
    print("Naive: [data structure]. Insert: O(?), Remove: O(?), get_random: O(?)")

    # 2. Identify the bottleneck
    print("Bottleneck: [operation X] is O(N) because [reason]")

    # 3. State what property the optimal solution needs
    print("To fix this, I need [property]. [Data structure] provides that.")

    # 4. State the optimal approach
    print("Optimal: [data structure combination]. All ops: O(1) because [reason]")

7.3. Phase 3 — Code (10–15 minutes)

  • Code the simplest method first (insert).

  • Narrate as you type: "I append the value and store its index."

  • Leave remove for last since it’s the most complex.

  • Name variables clearly: val_to_idx, not d or m.

7.4. Phase 4 — Test (5 minutes)

# Trace through a minimal example manually, updating state as a comment:
# insert(1): vals=[1], map={1:0}
# insert(2): vals=[1,2], map={1:0, 2:1}
# remove(1): swap 1 with last(2), pop -> vals=[2], map={2:0}
# get_random() -> must return 2

Always test:

  • Normal case

  • Duplicate insert

  • Remove non-existent value

  • Remove the last element in the list (edge case)

  • Single-element set

7.5. Phase 5 — Complexity Summary

End every solution with a clean statement:

complexity_summary = {
    "insert":     ("O(1) amortized", "O(1)"),
    "remove":     ("O(1)",           "O(1)"),
    "get_random": ("O(1)",           "O(1)"),
    "overall":    ("",              "O(N) for N stored elements"),
}
for op, (time, space) in complexity_summary.items():
    print(f"{op:12s} | Time: {time:15s} | Space: {space}")

8. Key Takeaways

  1. The swap-with-last trick is the core insight. Memorize it and understand why it works — you’ll encounter it in other problems (e.g., LRU cache, priority queue with lazy deletion).

  2. The hashmap is always there to support the array. When you need O(1) lookup + O(1) indexed access, HashMap + Array is the canonical pattern.

  3. State complexity upfront, not as an afterthought. It signals that you think before you code.

  4. Edge cases are interview points. The val == last_val case in remove is a small thing that separates candidates who trace through their own code from those who don’t.

  5. Thread safety = identify shared mutable state + wrap in a lock. The sophistication comes from knowing which operations need the lock (all of them here) and why.

  6. Follow-up questions aren’t punishment. They’re the interviewer checking your ceiling. Treat them as an opportunity, not a trap.