Exam Question Bank
50 questions covering all DSA-AS topics — Consensus, Storage, Processing, AI Systems
Showing 50 of 50 questions
State the FLP impossibility result. What assumptions does it make and what does it prove? Does this mean consensus is impossible in practice?
Compare 2PC and Paxos for achieving distributed consensus. What failure modes does each handle differently?
In Paxos, why must a proposer in Phase 2 use the value with the highest ballot number it received in Phase 1 responses, rather than its own preferred value?
What is the minimum number of replicas needed to tolerate f crash failures in Paxos/Raft?
Explain why Paxos cannot guarantee liveness. Give a concrete scenario where Paxos loops indefinitely.
Describe the Raft leader election process. What prevents two leaders from being elected in the same term?
How does Raft's log replication ensure that committed entries are never lost, even after leader failures?
What is the difference between crash fault tolerance (CFT) and Byzantine fault tolerance (BFT)?
Describe the PBFT normal operation protocol. Why does it require 3f+1 replicas to tolerate f Byzantine failures, while Raft requires only 2f+1 for crash failures?
Why does Bitcoin use double SHA-256 (SHA256(SHA256(x))) rather than a single hash?
Explain proof-of-work as a solution to Byzantine consensus. What are its fault tolerance assumptions? Under what attack does it fail?
What is the nothing-at-stake problem in Proof-of-Stake? How does slashing address it?
What is a Merkle tree? How is it used in the Bitcoin block structure?
Compare PoW and PoS: energy usage, attack cost, finality guarantees.
Explain Bigtable's 3-level tablet location hierarchy. Why exactly 3 levels?
Trace a complete read request in Bigtable from client to data, including all cache miss scenarios.
What is a column family in Bigtable? What is a column qualifier? Give an example.
What happens when a tablet server fails in Bigtable? Walk through the recovery process.
Compare minor compaction, merging compaction, and major compaction in Bigtable. Why are they necessary?
Why does data not flow through the Bigtable master? What are the implications for scalability?
How does Bigtable use Chubby? What would happen if Chubby became unavailable?
Why does Bigtable store rows sorted by row key? How does this affect query design?
Design a Bigtable schema for storing web crawl data. Justify your choice of row key and column families.
Explain consistent hashing. Why does adding/removing a node only affect O(K/N) keys (where K = number of keys, N = number of nodes)?
What is the purpose of virtual nodes in Dynamo? How do they improve load distribution in a heterogeneous cluster?
Dynamo uses W+R > N for quorum. If N=3, W=2, R=2, is strong consistency guaranteed? What failure scenario breaks this?
Describe hinted handoff. When does it fail to provide durability?
Explain how vector clocks work in Dynamo. When does Dynamo return multiple values to the client, and what is the client expected to do?
Why does Dynamo use MD5 rather than a cryptographic hash like SHA-256 for key placement?
Compare Bigtable and Dynamo along these dimensions: data model, consistency model, CAP positioning, partitioning strategy, conflict resolution.
What is an RDD? What are its key properties (immutability, partitioning, lineage)?
What is the difference between a transformation and an action in Spark? Give two examples of each.
How does Spark use lineage for fault tolerance? Compare this to replication.
Why is reduceByKey more efficient than groupByKey followed by a reduce? Explain in terms of network traffic.
What is a D-Stream in Spark Streaming? How does it differ from a traditional stream processing system?
What is the minimum latency achievable with D-Streams? What determines it?
Explain the difference between transformation lineage recovery and checkpoint recovery in D-Streams. When is each used?
How does Flink treat batch processing differently from Spark? What is the philosophical difference?
What is a watermark in Flink? Why is it necessary for correct event-time windowing?
Compare tumbling windows and sliding windows in Flink. Give a use case for each.
Explain Flink's barrier-based checkpointing mechanism. How does it achieve exactly-once semantics without stopping the stream?
What is the trade-off between allowed lateness (Δ in watermarks) and result correctness?
What is the difference between a Ray task and a Ray actor? Give an example use case for each.
Explain Ray's Global Control Store (GCS). What information does it store? Why is decoupling control state from the scheduler important?
How does Ray handle fault tolerance for stateless tasks vs stateful actors differently?
What is the parameter server pattern for distributed training? What are its limitations at scale?
What is AllReduce? How does ring-AllReduce solve the parameter server bottleneck?
What problem does SkyPilot solve? How does it relate to Ray?
Describe the SkyPilot optimizer. What factors does it consider when placing a task across clouds?
What is data gravity? Why does it complicate multi-cloud optimization?