Exam Question Bank

50 questions covering all DSA-AS topics — Consensus, Storage, Processing, AI Systems

Questions per topic: Consensus 14, Bigtable 9, Dynamo 7, Spark 7, Flink 5, AI Systems 8

Consensus | Fundamentals / FLP | Hard | Q1

State the FLP impossibility result. What assumptions does it make and what does it prove? Does this mean consensus is impossible in practice?

Consensus | 2PC vs Paxos | Medium | Q2

Compare 2PC and Paxos for achieving distributed consensus. What failure modes does each handle differently?

Consensus | Paxos | Medium | Q3

In Paxos, why must a proposer in Phase 2 use the value with the highest ballot number it received in Phase 1 responses, rather than its own preferred value?

Consensus | Paxos / Raft: Quorums | Easy | Q4

What is the minimum number of replicas needed to tolerate f crash failures in Paxos/Raft?

Consensus | Paxos: Liveness | Hard | Q5

Explain why Paxos cannot guarantee liveness. Give a concrete scenario where Paxos loops indefinitely.

Consensus | Raft: Leader Election | Medium | Q6

Describe the Raft leader election process. What prevents two leaders from being elected in the same term?

Consensus | Raft: Log Replication | Medium | Q7

How does Raft's log replication ensure that committed entries are never lost, even after leader failures?

Consensus | CFT vs BFT | Easy | Q8

What is the difference between crash fault tolerance (CFT) and Byzantine fault tolerance (BFT)?

Consensus | PBFT | Hard | Q9

Describe the PBFT normal operation protocol. Why does it require 3f+1 replicas to tolerate f Byzantine failures, while Raft requires only 2f+1 for crash failures?
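
The two replica bounds quoted in this question can be tabulated side by side. The sketch below only restates the 2f+1 and 3f+1 formulas from the question; the function names are illustrative and not taken from any protocol implementation.

```python
def crash_tolerant_replicas(f: int) -> int:
    """Replicas needed for f crash faults, using the 2f+1 bound quoted for Raft."""
    return 2 * f + 1


def byzantine_tolerant_replicas(f: int) -> int:
    """Replicas needed for f Byzantine faults, using the 3f+1 bound quoted for PBFT."""
    return 3 * f + 1


# Print both bounds for a few small values of f.
for f in range(1, 4):
    print(f"f={f}: crash={crash_tolerant_replicas(f)}, "
          f"byzantine={byzantine_tolerant_replicas(f)}")
```

The gap between the two columns grows by one replica per tolerated fault, which is the difference the question asks you to justify.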

Consensus | Bitcoin / Hashing | Medium | Q10

Why does Bitcoin use double SHA-256 (SHA256(SHA256(x))) rather than a single hash?
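
To make the notation SHA256(SHA256(x)) concrete, here is a minimal sketch using Python's standard `hashlib`. The helper name `double_sha256` is an assumption chosen for illustration; the key detail is that the second hash is applied to the raw 32-byte digest of the first, not to its hex string.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice: the inner digest (raw bytes) is the outer input."""
    inner = hashlib.sha256(data).digest()
    return hashlib.sha256(inner).digest()


payload = b"example payload"  # arbitrary illustrative input
single = hashlib.sha256(payload).digest()
double = double_sha256(payload)

# Both digests are 32 bytes long, but they are different values.
print(len(single), len(double), single != double)
```

This shows only what the construction computes; why it was chosen is what the question asks.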

Consensus | Proof of Work | Hard | Q11

Explain proof-of-work as a solution to Byzantine consensus. What are its fault tolerance assumptions? Under what attack does it fail?

Consensus | Proof of Stake | Medium | Q12

What is the nothing-at-stake problem in Proof-of-Stake? How does slashing address it?

Consensus | Bitcoin / Merkle Trees | Easy | Q13

What is a Merkle tree? How is it used in the Bitcoin block structure?

Consensus | PoW vs PoS | Medium | Q14

Compare PoW and PoS: energy usage, attack cost, finality guarantees.

Storage/Bigtable | Tablet Location Hierarchy | Medium | Q15

Explain Bigtable's 3-level tablet location hierarchy. Why exactly 3 levels?

Storage/Bigtable | Read Path | Hard | Q16

Trace a complete read request in Bigtable from client to data, including all cache miss scenarios.

Storage/Bigtable | Data Model | Easy | Q17

What is a column family in Bigtable? What is a column qualifier? Give an example.

Storage/Bigtable | Fault Tolerance | Medium | Q18

What happens when a tablet server fails in Bigtable? Walk through the recovery process.

Storage/Bigtable | Compaction | Hard | Q19

Compare minor compaction, merging compaction, and major compaction in Bigtable. Why are they necessary?

Storage/Bigtable | Architecture / Scalability | Medium | Q20

Why does data not flow through the Bigtable master? What are the implications for scalability?

Storage/Bigtable | Chubby Dependency | Medium | Q21

How does Bigtable use Chubby? What would happen if Chubby became unavailable?

Storage/Bigtable | Row Key Design | Easy | Q22

Why does Bigtable store rows sorted by row key? How does this affect query design?

Storage/Bigtable | Schema Design | Hard | Q23

Design a Bigtable schema for storing web crawl data. Justify your choice of row key and column families.

Storage/Dynamo | Consistent Hashing | Medium | Q24

Explain consistent hashing. Why does adding/removing a node only affect O(K/N) keys (where K = number of keys, N = number of nodes)?
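
The O(K/N) claim in this question can be checked empirically with a small simulation. The sketch below is a simplified hash ring under stated assumptions: one ring position per node (no virtual nodes), MD5 for placement, and made-up key and node names. It is not Dynamo's actual implementation.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node or key name to a position on the ring via MD5."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")


def build_ring(nodes):
    """Return (position, node) pairs sorted clockwise around the ring."""
    return sorted((ring_position(n), n) for n in nodes)


def owner(key: str, ring) -> str:
    """A key belongs to the first node at or after its position, wrapping around."""
    positions = [pos for pos, _ in ring]
    index = bisect.bisect_left(positions, ring_position(key)) % len(ring)
    return ring[index][1]


def moved_keys(keys, old_nodes, new_node):
    """List (key, old_owner, new_owner) for keys whose owner changes on a join."""
    old_ring = build_ring(old_nodes)
    new_ring = build_ring(list(old_nodes) + [new_node])
    changes = []
    for key in keys:
        before, after = owner(key, old_ring), owner(key, new_ring)
        if before != after:
            changes.append((key, before, after))
    return changes


keys = [f"key-{i}" for i in range(10_000)]   # hypothetical keys
nodes = [f"node-{i}" for i in range(10)]     # hypothetical nodes
moved = moved_keys(keys, nodes, "node-10")
print(f"{len(moved)} of {len(keys)} keys moved, all to node-10")
```

Every key that moves lands on the joining node, because only the arc just before its position changes hands. With a single position per node the exact count depends on where the new node lands, so individual runs can sit well above or below K/N.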

Storage/Dynamo | Virtual Nodes | Medium | Q25

What is the purpose of virtual nodes in Dynamo? How do they improve load distribution in a heterogeneous cluster?

Storage/Dynamo | Quorum / Sloppy Quorum | Hard | Q26

Dynamo uses W+R > N for quorum. If N=3, W=2, R=2, is strong consistency guaranteed? What failure scenario breaks this?
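
The arithmetic behind W+R > N can be verified by brute force: enumerate every possible write set and read set over a fixed group of N replicas and check that they share at least one member. This sketch assumes a static, fault-free replica set, so it deliberately leaves open the failure scenario the question asks about. The function name is illustrative.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every w-sized write set intersects every r-sized read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(quorums_always_overlap(3, 2, 2))  # W+R = 4 > 3
print(quorums_always_overlap(3, 1, 2))  # W+R = 3, not greater than N
```

The first call reports overlap for the configuration in the question; the second shows a configuration where a write to one replica and a read from the other two can miss each other.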

Storage/Dynamo | Hinted Handoff | Medium | Q27

Describe hinted handoff. When does it fail to provide durability?

Storage/Dynamo | Vector Clocks | Hard | Q28

Explain how vector clocks work in Dynamo. When does Dynamo return multiple values to the client, and what is the client expected to do?

Storage/Dynamo | Hashing | Medium | Q29

Why does Dynamo use MD5 rather than a stronger cryptographic hash such as SHA-256 for key placement?

Storage/Dynamo | Bigtable vs Dynamo | Hard | Q30

Compare Bigtable and Dynamo along these dimensions: data model, consistency model, CAP positioning, partitioning strategy, conflict resolution.

Processing/Spark | RDD Model | Medium | Q31

What is an RDD? What are its key properties (immutability, partitioning, lineage)?

Processing/Spark | Transformations vs Actions | Easy | Q32

What is the difference between a transformation and an action in Spark? Give two examples of each.

Processing/Spark | Fault Tolerance | Medium | Q33

How does Spark use lineage for fault tolerance? Compare this to replication.

Processing/Spark | Shuffle Optimization | Hard | Q34

Why is reduceByKey more efficient than groupByKey followed by a reduce? Explain in terms of network traffic.

Processing/Spark | Spark Streaming / D-Streams | Medium | Q35

What is a D-Stream in Spark Streaming? How does it differ from a traditional stream processing system?

Processing/Spark | D-Stream Latency | Medium | Q36

What is the minimum latency achievable with D-Streams? What determines it?

Processing/Spark | D-Stream Fault Recovery | Hard | Q37

Explain the difference between transformation lineage recovery and checkpoint recovery in D-Streams. When is each used?

Processing/Flink | Batch vs. Stream | Medium | Q38

How does Flink treat batch processing differently from Spark? What is the philosophical difference?

Processing/Flink | Watermarks | Hard | Q39

What is a watermark in Flink? Why is it necessary for correct event-time windowing?

Processing/Flink | Window Types | Medium | Q40

Compare tumbling windows and sliding windows in Flink. Give a use case for each.

Processing/Flink | Checkpointing / Exactly-Once | Hard | Q41

Explain Flink's barrier-based checkpointing mechanism. How does it achieve exactly-once semantics without stopping the stream?

Processing/Flink | Watermarks: Lateness Trade-off | Medium | Q42

What is the trade-off between allowed lateness (Δ in watermarks) and result correctness?

AI Systems | Ray: Tasks vs Actors | Medium | Q43

What is the difference between a Ray task and a Ray actor? Give an example use case for each.

AI Systems | Ray: GCS | Hard | Q44

Explain Ray's Global Control Store (GCS). What information does it store? Why is decoupling control state from the scheduler important?

AI Systems | Ray: Fault Tolerance | Medium | Q45

How does Ray handle fault tolerance for stateless tasks vs stateful actors differently?

AI Systems | Parameter Server | Medium | Q46

What is the parameter server pattern for distributed training? What are its limitations at scale?

AI Systems | AllReduce | Easy | Q47

What is AllReduce? How does ring-AllReduce solve the parameter server bottleneck?

AI Systems | SkyPilot | Medium | Q48

What problem does SkyPilot solve? How does it relate to Ray?

AI Systems | SkyPilot Optimizer | Hard | Q49

Describe the SkyPilot optimizer. What factors does it consider when placing a task across clouds?

AI Systems | Data Gravity | Medium | Q50

What is data gravity? Why does it complicate multi-cloud optimization?
