Rahul Deshpande·SoC Architect, 12 years in chip design·6 May 2026·16 min read

RTL Design Interview Questions: 30 Real Questions from India 2026

TL;DR. RTL design interviews in India in 2026 still come down to the same five buckets: combinational and sequential basics, FSM and microarchitecture design, SystemVerilog details, synthesis and timing intuition, and debug. The 30 questions below are the ones I see most often when interviewing for RTL roles at Intel, NVIDIA, Qualcomm, AMD, Cadence, and Marvell. Most candidates stumble on the same three: clock domain crossing without proper metastability handling, parameterised modules without a way to verify them, and FSMs that look right on paper but break under reset gating.

I have run over 200 RTL panels in the last six years across Bangalore, Hyderabad, and Noida hiring centres. The bar has risen since 2020: the talent pool has grown, and the standard rose with it. A 3-5 year RTL engineer in 2026 is expected to write clean SystemVerilog, reason about synthesis trade-offs, debug a failing test without hand-holding, and explain decisions in their own words.

How RTL design interviews are structured in 2026

The pattern is almost identical across product companies. A 3-5 year engineer should expect 4-5 rounds spread over 1-2 weeks.

  1. Recruiter screen (15 min). Resume walk-through, expected CTC, notice period.
  2. Technical screen (45-60 min). Foundational RTL questions, often on a shared doc or HackerRank. Live coding of one small SystemVerilog module is common.
  3. Onsite RTL deep dive (60-75 min). Microarchitecture exercise. You design a small block on a whiteboard, walk through corner cases, write the RTL.
  4. Onsite synthesis and timing (45-60 min). Read RTL, find the timing problem, fix it. Or explain false paths, multi-cycle paths, and CDC.
  5. Hiring manager and culture (30-45 min). Past project deep dive. Why did you make the trade-offs you made? What broke and how did you fix it?

For freshers, there is usually one extra written round of 15-20 multiple-choice questions on Verilog and digital fundamentals.

Combinational and sequential basics (Q1-Q7)

Q1. What is the difference between blocking and non-blocking assignments?

Blocking assignments (=) execute sequentially within a procedural block: each one completes before the next statement runs. Non-blocking assignments (<=) evaluate the right-hand side when the statement executes and defer the left-hand-side update to the end of the time step. The rule that survives every interview: use blocking inside always_comb and non-blocking inside always_ff. Mixing them is the single most common cause of simulation races and simulation/synthesis mismatches.
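A minimal sketch of why the rule matters, using a hypothetical two-stage shift register (the module and signal names are illustrative only). With non-blocking assignments each flop samples the value its input had before the edge, so data moves one stage per clock.

```systemverilog
module shift2 (
  input  logic clk,
  input  logic d,
  output logic q1,
  output logic q2
);
  // Non-blocking: both right-hand sides are evaluated before either
  // left-hand side updates, so q2 receives the OLD value of q1.
  always_ff @(posedge clk) begin
    q1 <= d;
    q2 <= q1;
  end
  // Had this been written with blocking assignments (q1 = d; q2 = q1;),
  // q2 would take the NEW q1 on the same edge, so both flops would
  // capture d and the second pipeline stage would disappear.
endmodule
```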

Q2. What is a latch and why do synthesis tools warn about inferred latches?

A latch is a level-sensitive memory element: it passes data through when the enable is high and holds the value when the enable is low. Synthesis tools warn about inferred latches because they almost always indicate a coding bug: a combinational block that does not assign every output on every path, or a missing default in a case statement. Inferred latches complicate static timing analysis and can create hard-to-predict hold-time issues. Fix by using always_comb with default assignments at the top of the block.
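The default-assignment fix can be sketched as follows. The decoder itself is a made-up example; the point is only that the first statement in the block assigns the output on every path.

```systemverilog
module sel_decode (
  input  logic [1:0] sel,
  output logic [3:0] onehot
);
  always_comb begin
    onehot = 4'b0000;            // default: every path assigns the output
    case (sel)
      2'd0: onehot = 4'b0001;
      2'd1: onehot = 4'b0010;
      2'd2: onehot = 4'b0100;
      // sel == 2'd3 is deliberately not listed. It falls back to the
      // default above; without that default, this gap would infer a latch.
      default: ;
    endcase
  end
endmodule
```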

Q3. Explain setup time, hold time, and what happens when each is violated.

Setup time is the window before the active clock edge during which data must remain stable. Hold time is the window after the edge. A setup violation means data arrived too late, so the flop may capture the old value or go metastable (a functional failure on the next cycle). A hold violation means data changed too soon after the edge, so the flop may capture next-cycle data (a functional failure that masquerades as a glitch). Setup is fixed by reducing combinational delay or increasing the clock period. Hold is fixed by adding delay, usually buffers, on the data path; slowing the clock does not help, because the hold check does not depend on the clock period.

Q4. Why are flip-flops preferred over latches in modern ASIC design?

Flip-flops give static timing tools a clean reference point. Latches are level-sensitive, which complicates time borrowing analysis and makes path verification harder. Flops also resist glitches better since they only sample on the edge. Latches are still used deliberately in time-borrowing designs (some Intel CPUs, some ARM CPUs) but as a default choice for digital ASICs, edge-triggered flops win on tooling support alone.

Q5. What is metastability and how is it mitigated?

Metastability happens when a flip-flop samples a signal that is changing during its setup or hold window. The output enters an undefined state for an indeterminate time, eventually resolving to either 0 or 1. The standard mitigation is a synchroniser: a chain of two (or three, for high-MTBF requirements) back-to-back flops clocked by the receiving domain, with no logic between them. The first flop may go metastable, but it has almost a full clock period to resolve before the second flop samples it. Each added flop adds another period of resolution time, and MTBF grows exponentially with resolution time.
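A minimal sketch of a two-flop synchroniser for a single-bit signal. The port names are assumptions for illustration, and the sketch assumes async_in is driven directly from a flop in the source domain so that it cannot glitch.

```systemverilog
module sync_2ff (
  input  logic clk_dst,    // destination-domain clock
  input  logic rst_n,      // destination-domain reset, active low
  input  logic async_in,   // single-bit signal from another clock domain
  output logic sync_out
);
  logic meta;              // first stage: the flop that may go metastable

  always_ff @(posedge clk_dst or negedge rst_n) begin
    if (!rst_n) begin
      meta     <= 1'b0;
      sync_out <= 1'b0;
    end else begin
      meta     <= async_in;  // may violate setup or hold
      sync_out <= meta;      // sampled one period later, after settling
    end
  end
endmodule
```

Downstream logic should use only sync_out, never meta.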

Q6. Design a 3-bit Gray code counter.

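The clean approach: maintain a binary counter and convert to Gray with gray = binary ^ (binary >> 1). The benefit of Gray coding is that exactly one bit changes per increment, so a synchroniser in another clock domain captures either the old value or the new one, never a mix. For that to hold, the Gray value must be registered: the binary flops change several bits at once, and an XOR fed directly from them can glitch.

module gray_counter (
  input  logic       clk,
  input  logic       rst_n,
  output logic [2:0] gray_out
);
  logic [2:0] bin_count, bin_next;
  // Next binary count; wraps naturally at 3 bits.
  assign bin_next = bin_count + 3'd1;
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      bin_count <= '0;
      gray_out  <= '0;
    end else begin
      bin_count <= bin_next;
      // Register the Gray value so the output comes straight from a flop.
      gray_out  <= bin_next ^ (bin_next >> 1);
    end
  end
endmodule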

Q7. What is the difference between synchronous and asynchronous reset? Which would you pick for a new design?

Synchronous reset is sampled on the active clock edge, so it requires the clock to be running. Asynchronous reset takes effect immediately, regardless of clock state. The right answer in 2026 for most product designs: asynchronous reset assertion, synchronous deassertion. The async assertion guarantees the design enters reset from any state, even if the clock is gated or not yet running. The sync deassertion ensures that reset removal does not violate recovery and removal time on the flops it releases, so they all leave reset on the same clock edge. This pattern is implemented with a reset synchroniser.
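A minimal sketch of such a reset synchroniser, with illustrative port names. The raw reset clears both flops asynchronously; after it is released, a constant 1 shifts through on clock edges, so the output deasserts synchronously.

```systemverilog
module reset_sync (
  input  logic clk,
  input  logic arst_n,   // raw asynchronous reset, active low
  output logic rst_n     // asserts asynchronously, deasserts synchronously
);
  logic stage1;

  always_ff @(posedge clk or negedge arst_n) begin
    if (!arst_n) begin
      stage1 <= 1'b0;    // assertion does not need a clock
      rst_n  <= 1'b0;
    end else begin
      stage1 <= 1'b1;    // a constant 1 shifts through after release
      rst_n  <= stage1;  // deasserts two clock edges later
    end
  end
endmodule
```

The rest of the design then uses rst_n as its asynchronous reset.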

FSM and microarchitecture (Q8-Q14)

Q8. Walk me through a 3-process FSM versus a 2-process FSM. Which do you use?

The 3-process FSM separates state register, next-state logic, and output logic into three always blocks. The 2-process style merges next-state and output into one combinational block. In production code, the 3-process pattern wins on readability and synthesis predictability — most synthesis tools optimise it identically to the 2-process version. The interview answer hiring managers want: "I default to 3-process for clarity, but I use 2-process when the FSM has fewer than four states and the output is trivially derived from state."

Q9. Design an FSM that detects the sequence 1011 in a serial bit stream, with overlap.

Five states: S0 (idle), S1 (saw 1), S2 (saw 10), S3 (saw 101), S4 (saw 1011, assert output). The overlap lives in the transitions out of S4: the final 1 of a match is also the first 1 of a possible next match, so S4 behaves like S1. A 0 takes it to S2 (saw 10) and a 1 takes it to S1, never straight back to S0. Likewise, a 0 in S3 (saw 1010) returns to S2 rather than S0. With overlap handled this way, the stream 1011011 contains two matches; a non-overlapping detector that restarts from S0 after each match finds only one.
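A sketch of an overlapping 1011 detector as a Moore machine in the 3-process style. The module and signal names are illustrative. In this version S4 takes the same transitions as S1, because the last 1 of a match can start the next one.

```systemverilog
module seq_det_1011 (
  input  logic clk,
  input  logic rst_n,
  input  logic din,
  output logic detected   // high for one cycle after the final 1 of 1011
);
  typedef enum logic [2:0] {S0, S1, S2, S3, S4} state_t;
  state_t state, next;

  // Process 1: state register.
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) state <= S0;
    else        state <= next;
  end

  // Process 2: next-state logic.
  always_comb begin
    next = S0;
    case (state)
      S0: next = din ? S1 : S0;
      S1: next = din ? S1 : S2;   // "11" still ends in a usable 1
      S2: next = din ? S3 : S0;
      S3: next = din ? S4 : S2;   // "1010" still ends in "10"
      S4: next = din ? S1 : S2;   // overlap: reuse the trailing 1
      default: next = S0;
    endcase
  end

  // Process 3: output logic (Moore: depends on state only).
  assign detected = (state == S4);
endmodule
```

Feeding 1011011 walks S1, S2, S3, S4, S2, S3, S4, so detected pulses twice.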

Q10. Pipeline a multiply-accumulate unit. What are the trade-offs at each pipeline depth?

A 1-stage MAC has the worst critical path but adds no pipeline latency. A 2-stage split (multiply, then accumulate) shortens the critical path to roughly the multiplier delay but adds 1 cycle of latency. A 3-stage split (partial product generation, addition tree, accumulate) allows the highest clock frequency of the three but adds 2 cycles. The deeper you pipeline, the more flops, more area, more dynamic power, more latency, and more complex control. The right depth depends on the target frequency and the consumer's latency tolerance: DSP loops can handle 3-stage pipes, control loops cannot.
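A sketch of the 2-stage split for unsigned operands. The parameter name, the valid signalling, and the 8 guard bits on the accumulator are assumptions chosen for illustration, not requirements of any particular design.

```systemverilog
module mac_2stage #(
  parameter int W = 8                // operand width (illustrative default)
) (
  input  logic           clk,
  input  logic           rst_n,
  input  logic           valid_in,
  input  logic [W-1:0]   a,
  input  logic [W-1:0]   b,
  output logic [2*W+7:0] acc         // 8 guard bits: room for 256 products
);
  logic [2*W-1:0] prod_q;            // registered product (stage 1 output)
  logic           valid_q;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      prod_q  <= '0;
      valid_q <= 1'b0;
      acc     <= '0;
    end else begin
      // Stage 1: multiply and register the product.
      valid_q <= valid_in;
      if (valid_in) prod_q <= a * b;
      // Stage 2: accumulate the registered product one cycle later.
      if (valid_q) acc <= acc + prod_q;
    end
  end
endmodule
```

The multiplier and the accumulator adder now sit in separate register-to-register paths, which is where the shorter critical path comes from.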

Q11. What is a structural hazard and how do you resolve it?

A structural hazard happens when two pipeline stages need the same hardware resource in the same cycle — for example, both decode and writeback wanting the register file in a single-port design. Resolutions: duplicate the resource (two-port register file), stall one stage until the resource is free, or schedule operations to avoid the contention.

Q12. Design a round-robin arbiter for 4 requesters.

Maintain a "last granted" pointer. On the next cycle, scan the request vector starting one position after the last grant, return the first asserted request. Implement as a priority encoder with a rotated input. The cleanest SystemVerilog version uses a barrel rotator on the request vector and a leading-one detector.
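One way to sketch the rotate-and-pick approach for 4 requesters. All names are illustrative, and the grant here is combinational in the same cycle as the request, which is one of several reasonable choices.

```systemverilog
module rr_arbiter4 (
  input  logic       clk,
  input  logic       rst_n,
  input  logic [3:0] req,
  output logic [3:0] grant        // one-hot, all zeros when nothing requests
);
  logic [1:0] last;               // index of the most recent grant
  logic [1:0] start;              // first index to examine this cycle
  logic [7:0] req_dbl, gnt_dbl;   // doubled vectors used for rotation
  logic [3:0] req_rot, gnt_rot;

  assign start   = last + 2'd1;                    // wraps from 3 back to 0
  assign req_dbl = {req, req};
  assign req_rot = req_dbl[start +: 4];            // req_rot[i] = req[(start+i) % 4]
  assign gnt_rot = req_rot & (~req_rot + 4'd1);    // isolate the lowest set bit
  assign gnt_dbl = {gnt_rot, gnt_rot};
  assign grant   = gnt_dbl[(3'd4 - start) +: 4];   // rotate back to real positions

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n)      last <= 2'd3;                 // requester 0 goes first
    // One-hot to binary encode of the granted index.
    else if (|grant) last <= {grant[3] | grant[2], grant[3] | grant[1]};
  end
endmodule
```

With req held at 4'b1010, the grant alternates between requester 1 and requester 3 on successive cycles.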

Q13. Explain the difference between a synchronous FIFO and an asynchronous FIFO.

A synchronous FIFO has read and write pointers in the same clock domain — pointer comparison is straightforward, full and empty flags are direct comparisons. An asynchronous FIFO has read and write in different domains; pointers must cross with Gray coding plus synchronisers, and full/empty calculations use the synchronised versions. Async FIFOs are the standard solution for crossing data between unrelated clock domains.

Q14. How do you handle clock domain crossing for a multi-bit data bus?

Three options. (1) Asynchronous FIFO: the safest and most common for streaming data. (2) Handshake protocol: the sender holds the data stable and raises a request; the request is synchronised across, and the receiver captures the data only after seeing the synchronised request. Works for low-throughput control. (3) Pulse synchroniser: carries a single-cycle event, not the bus itself, so it only helps as the "data valid" qualifier while the sender holds the bus stable. Never just slap a 2-flop synchroniser on each bit of a bus: the bits can resolve on different cycles and the receiver will see values that were never sent.

SystemVerilog details (Q15-Q21)

Q15. What is the difference between logic, wire, and reg in SystemVerilog?

logic is the SystemVerilog 4-state type that replaces both wire and reg for synthesisable code. The compiler infers whether it is wire-like (continuous assignment, single driver) or reg-like (procedural assignment) from context. In practice: use logic everywhere in new RTL. wire still appears for explicit multi-driver nets and tri-state buses. reg is legacy.

Q16. What is a SystemVerilog interface and when would you use one?

An interface bundles related signals and tasks into a single named connection point. Use it for buses with many signals (AXI, AHB, custom protocols) — the interface declaration lives in one place, all modules using it stay consistent. Modports define direction views: a master modport sees AWVALID as output, a slave modport sees it as input. Without interfaces, you maintain port lists across dozens of modules. With interfaces, you maintain one declaration.
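A small sketch of the idea using a hypothetical valid/ready stream rather than a full AXI channel. The interface name, signal names, and the trivial sink module are all made up for illustration.

```systemverilog
interface stream_if #(parameter int W = 32) (input logic clk);
  logic         valid;
  logic         ready;
  logic [W-1:0] data;

  // Direction views: the same signals seen from each side.
  modport master (input clk, ready, output valid, data);
  modport slave  (input clk, valid, data, output ready);
endinterface

// A sink that is always ready once it leaves reset.
module stream_sink (
  input logic     rst_n,
  stream_if.slave s        // one port instead of a list of wires
);
  always_ff @(posedge s.clk or negedge rst_n) begin
    if (!rst_n) s.ready <= 1'b0;
    else        s.ready <= 1'b1;
  end
endmodule
```

Adding a signal to the protocol then means editing stream_if once; the module port lists do not change.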

Q17. Walk me through parameterising a module. When does a parameter become a generic vs an instance-specific value?

Parameters declared with parameter can be overridden at instantiation. Parameters with localparam are constants computed inside the module. Best practice: expose data widths, depths, and protocol options as parameter; derive bookkeeping values like address widths or pointer sizes as localparam from the parameters. This prevents inconsistent overrides at instantiation.
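A sketch of that split on a small synchronous FIFO. WIDTH and DEPTH are overridable; the address width AW is a localparam derived from DEPTH, so an instance cannot set it inconsistently. The sketch assumes DEPTH is a power of two and at least 2.

```systemverilog
module sync_fifo #(
  parameter int WIDTH = 8,
  parameter int DEPTH = 16             // assumed: power of two, >= 2
) (
  input  logic             clk,
  input  logic             rst_n,
  input  logic             wr_en,
  input  logic [WIDTH-1:0] wr_data,
  input  logic             rd_en,
  output logic [WIDTH-1:0] rd_data,
  output logic             full,
  output logic             empty
);
  localparam int AW = $clog2(DEPTH);   // derived, cannot be overridden

  logic [WIDTH-1:0] mem [0:DEPTH-1];
  logic [AW:0]      wr_ptr, rd_ptr;    // extra MSB tells full from empty

  assign empty   = (wr_ptr == rd_ptr);
  assign full    = (wr_ptr[AW] != rd_ptr[AW]) &&
                   (wr_ptr[AW-1:0] == rd_ptr[AW-1:0]);
  assign rd_data = mem[rd_ptr[AW-1:0]];

  always_ff @(posedge clk) begin
    if (wr_en && !full) mem[wr_ptr[AW-1:0]] <= wr_data;
  end

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      wr_ptr <= '0;
      rd_ptr <= '0;
    end else begin
      if (wr_en && !full)  wr_ptr <= wr_ptr + 1'b1;
      if (rd_en && !empty) rd_ptr <= rd_ptr + 1'b1;
    end
  end
endmodule
```

An instance overrides only the public knobs, for example sync_fifo #(.WIDTH(32), .DEPTH(64)).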

Q18. What is the difference between always_comb, always_latch, and always_ff?

The SystemVerilog procedural blocks are intent-aware. always_comb tells the compiler to enforce combinational rules and warn on any inferred memory. always_latch declares a latch is intentional. always_ff declares edge-triggered logic and verifies the sensitivity list matches a flop. Using these instead of generic always @(*) catches bugs at compile time that previously slipped through to simulation.

Q19. Explain SystemVerilog packed and unpacked arrays.

Packed dimensions are written before the variable name and are contiguous in memory: logic [7:0][3:0] data creates an 8x4-bit array stored as 32 contiguous bits. Unpacked dimensions go after: logic [7:0] data [0:15] creates 16 separate 8-bit values. Packed arrays support bitwise operations on the whole array. Unpacked arrays are the right choice for memories.
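The two declarations side by side, in a throwaway module whose names are purely illustrative.

```systemverilog
module array_demo (
  input  logic            clk,
  input  logic            we,
  input  logic [3:0]      addr,
  input  logic [7:0]      wdata,
  input  logic [7:0][3:0] nibbles,    // packed: 8 x 4 bits, one 32-bit vector
  output logic [31:0]     flat,
  output logic [3:0]      top_nibble,
  output logic [7:0]      rdata
);
  logic [7:0] mem [0:15];             // unpacked: 16 separate 8-bit entries

  assign flat       = nibbles;        // the whole packed array acts as a vector
  assign top_nibble = nibbles[7];     // same bits as flat[31:28]
  assign rdata      = mem[addr];      // unpacked arrays are accessed per element

  always_ff @(posedge clk) begin
    if (we) mem[addr] <= wdata;
  end
endmodule
```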

Q20. What is a generate block and why is it useful?

Generate blocks let you instantiate hardware conditionally or in loops at elaboration time. The two patterns: a generate for loop for replication (an array of identical blocks parameterised by index) and a generate if for compile-time configuration (different RTL for different parameter values). Critical for writing reusable IP, because one parameterised module replaces many copies.
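Both patterns in one hypothetical module: a delay line whose STAGES parameter selects between a plain wire (conditional generate) and a replicated chain of registers (loop generate). Names are illustrative.

```systemverilog
module delay_line #(
  parameter int WIDTH  = 8,
  parameter int STAGES = 2        // 0 selects a combinational pass-through
) (
  input  logic             clk,
  input  logic [WIDTH-1:0] d,
  output logic [WIDTH-1:0] q
);
  generate
    if (STAGES == 0) begin : g_bypass
      assign q = d;               // no flops are elaborated at all
    end else begin : g_pipe
      logic [WIDTH-1:0] stage [0:STAGES-1];
      for (genvar i = 0; i < STAGES; i++) begin : g_stage
        if (i == 0) begin : g_first
          always_ff @(posedge clk) stage[0] <= d;
        end else begin : g_rest
          always_ff @(posedge clk) stage[i] <= stage[i-1];
        end
      end
      assign q = stage[STAGES-1];
    end
  endgenerate
endmodule
```

Naming each generate block (g_pipe, g_stage) keeps hierarchical paths stable in waveforms and constraints.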

Q21. How does SystemVerilog handle race conditions in scheduling?

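The simulator divides each time step into ordered regions, including Active, Inactive, NBA (non-blocking assignment), Observed, Reactive, and Postponed. Blocking assignments execute in Active. A non-blocking assignment evaluates its RHS in Active and updates its LHS in NBA. $monitor and $strobe read values in Postponed, after everything has settled. Within the Active region the order in which processes run is not defined, which is why a blocking assignment in one clocked block that is read by another clocked block on the same edge is a race, and why mixing blocking and non-blocking on the same signal causes the simulation/synthesis mismatch interviewers ask about.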

Synthesis and timing (Q22-Q26)

Q22. What is a false path and when would you declare one?

A false path is a logical path through the netlist that can never functionally activate, but the timing tool treats it as real and reports violations on it. The classic example: a static configuration register output that feeds a deeply combinational block. The configuration changes only at boot, never at speed, so the timing path through it is irrelevant. Declare false paths in SDC with set_false_path -from cfg_reg/Q -to .... Be sparing: every false path is a promise to the tool that the path never matters, and the tool will not check it again if that promise is ever broken.

Q23. What is the difference between a multi-cycle path and a false path?

A multi-cycle path is real but takes more than one clock cycle to propagate by design, for example a multi-cycle multiplier whose result is sampled three cycles later. Declare it with set_multicycle_path 3 -setup, usually paired with set_multicycle_path 2 -hold so that the hold check moves back to the launch edge instead of following the relaxed setup edge. A false path is never functionally exercised, so its timing does not matter. Multi-cycle paths still need accurate timing analysis, just over a longer window.

Q24. Read this RTL. Where is the timing bottleneck?

The interviewer shows code with a deeply nested combinational block — typically a wide arithmetic operation feeding a wide mux feeding another arithmetic. The right answer is not "add a pipeline stage" by reflex. It is: identify the longest combinational chain, propose either retiming, pipelining, or restructuring (e.g., balancing an adder tree, replacing a priority encoder with a parallel one). Show that you understand the trade-offs of each.

Q25. What is clock skew and how does it affect setup and hold?

Clock skew is the difference in clock arrival times at two flops on the same logical clock. Positive skew (capture flop sees clock later than launch flop) helps setup and hurts hold. Negative skew helps hold and hurts setup. Clock tree synthesis aims for low skew at the leaves but useful skew is sometimes deliberately added during physical design to fix timing.
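The effect of skew can be written as two inequalities. The symbols are assumptions introduced here for illustration: T_clk is the clock period, t_cq the launch flop's clock-to-Q delay, t_logic the combinational delay between the flops, t_su and t_h the capture flop's setup and hold times, and t_skew the capture clock arrival time minus the launch clock arrival time.

```latex
\begin{aligned}
\text{Setup:}\quad & t_{cq} + t_{\text{logic,max}} + t_{su} \le T_{clk} + t_{skew} \\
\text{Hold:}\quad  & t_{cq} + t_{\text{logic,min}} \ge t_{h} + t_{skew}
\end{aligned}
```

A positive t_skew loosens the setup inequality and tightens the hold one. T_clk does not appear in the hold inequality, which is why slowing the clock never fixes a hold violation.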

Q26. What is glitch power and how does RTL affect it?

A glitch is a short transient on a combinational signal caused by unequal arrival times of inputs to a gate. Glitches consume dynamic power without doing useful work. RTL can reduce glitches by balancing logic depth, avoiding unnecessary fan-out on toggling signals, and isolating combinational outputs that feed many destinations behind a flop. Modern tools estimate glitch power during synthesis but the cleanest fix lives in RTL structure.

Behavioural and debug (Q27-Q30)

Q27. You ran a test and it fails on cycle 1,247. Walk me through your debug.

The expected pattern: dump waves around the failure cycle, identify which output is wrong, trace backward through the design until you find the first cycle where state diverged from expected, identify the input or transition that caused it. Not "rerun with more verbose logging." Interviewers are looking for a structured approach that does not depend on randomness.

Q28. A test passes in simulation but fails on the FPGA. What are the likely causes?

Top three: (1) inferred latches that simulate fine but synthesise to a tool-resolved structure, (2) unintended sensitivity-list bugs that change behaviour after synthesis, (3) reset and initial-value differences: a register without a reset starts as X in simulation or is quietly initialised by the testbench, while the FPGA flop powers up to a concrete 0 or 1 that the design may not expect. Beyond RTL: clock skew, IO timing, and FPGA-specific primitive inferences that differ from the simulated model.

Q29. Talk me through a real bug you fixed in the last six months.

This is not optional. Pick something specific. Describe the symptom, your hypothesis, what you ruled out, what you ruled in, the fix, and what you would do differently. Vague answers like "I fixed a timing issue" tank interviews. "We were missing a synchroniser on the doorbell-clear signal between the host and device clock domains, causing intermittent missed interrupts under load. I added a 2-flop synchroniser and a request/acknowledge handshake. The intermittent test now passes 1000 runs clean" gets you hired.

Q30. What does "good RTL" mean to you?

The answer hiring managers respond to: RTL that synthesises predictably, simulates the same as it synthesises, is readable by someone who did not write it, and makes intent obvious. Concrete: explicit reset values, named generate blocks, parameters with sensible defaults, no magic numbers, and one module per file. Loud opinions are fine here as long as they are defensible.

The 6-week prep plan

If you have an interview lined up in 4-6 weeks, work the order below. Rushing the foundations is the most common reason candidates fail later rounds.

Week | Focus | Output
1 | Combinational and sequential basics, blocking vs non-blocking, latches | Solve 20 problems on HDLBits
2 | FSM design, sequence detectors, arbiters, FIFOs | Build 5 small modules from scratch
3 | SystemVerilog deep dive, interfaces, parameters, generate | Refactor 2 of last week's modules to use interfaces
4 | Synthesis, STA basics, false paths, multi-cycle paths, CDC | Run a synthesis pass, read the timing report
5 | Behavioural questions, project deep-dive prep, mock interview | One mock interview with a peer or senior
6 | Review, weak-spot patching, sleep | Sleep, especially the night before

What companies look for at different experience levels

The questions are similar across freshers and senior engineers, but the depth expected varies. A fresher who can write a clean FIFO, explain blocking vs non-blocking, and reason about CDC at a basic level will pass the technical bar at most product companies. A 3-5 year engineer is expected to have shipped real RTL, debugged a real failing test, and have informed opinions on coding style. A 6-10 year engineer is expected to design entire blocks alone, lead microarchitecture discussions, and mentor juniors.

Where to go from here

RTL is one of three or four entry points into chip design. If you are still deciding between RTL, verification, and physical design, read our comparison of the three career paths. If you are coming from IT services and weighing the switch, the 3-5 year career switch guide covers the realistic path. For salary expectations across specialisations and experience levels, see the India 2026 salary guide.

When you are ready to apply, browse live RTL design roles aggregated directly from chip-company career pages — no recruiters, no keyword games.

Frequently asked questions

How long should I prepare for an RTL design interview?

Four to six weeks for a 3-5 year engineer with active RTL experience. Eight to twelve weeks if you are returning to RTL after a long gap or coming from a non-design role. Less than three weeks usually shows in the depth of answers you can give to follow-up questions.

What is the most common reason RTL candidates get rejected?

Vague answers to behavioural questions about real bugs they fixed. Hiring managers look for a specific symptom, hypothesis, ruling out, fix, and learning. Generic answers like 'I fixed a timing issue' fail the bar at every product company.

Do I need to know UVM for an RTL design role?

Reading-level knowledge is enough for most pure RTL roles. Deep UVM expertise is required for verification roles, not design. That said, an RTL engineer who can debug a UVM testbench is more valuable than one who cannot.

Which language should I use during a live coding round, Verilog or SystemVerilog?

SystemVerilog. Every product company in 2026 writes new RTL in SystemVerilog. Writing legacy Verilog suggests that you have not kept current. Use logic, always_comb, always_ff, interfaces, and parameters.

What tools are tested in an RTL interview?

Simulators (VCS, Xcelium, Questa) at a basic command-line level. Lint and formal tools (SpyGlass, JasperGold) conceptually. Synthesis (DC, Genus) at the report-reading level. You are not expected to be a tool expert, but you should be able to discuss what each tool does and why.

How important are HDLBits problems for RTL interviews?

Highly useful for foundational practice — finite state machines, basic combinational blocks, simple sequential designs. They are not enough on their own. Real interview questions emphasise debugging, trade-off discussion, and microarchitecture reasoning, which HDLBits does not test directly.

What is the salary range for RTL design engineers in India in 2026?

Freshers earn 5-12 LPA at product companies, mid-level (3-5 years) earn 12-24 LPA, senior (6-10 years) earn 24-45 LPA, and staff/principal roles can reach 40-70 LPA. NVIDIA, Intel, and Qualcomm pay at the top of these ranges. Service companies pay 30-50 percent less at every level.

Is RTL design a dying field with the rise of AI and high-level synthesis?

No. AI tools accelerate certain RTL tasks (linting, structural debug, test generation) but cannot replace the core skill of microarchitecture and trade-off design. India's chip-design hiring is growing 14-18 percent year over year through 2026. RTL skills remain a baseline requirement.

Should I prepare differently for product company vs service company RTL interviews?

Yes. Product companies (Intel, NVIDIA, Qualcomm, AMD) emphasise microarchitecture, debug, and trade-off reasoning. Service companies (LTTS, Wipro VLSI, HCL) emphasise breadth across the flow and tool fluency. Both expect the foundational questions in this guide.

What is the difference between an RTL engineer and a design engineer?

Often used interchangeably. 'Design engineer' is the broader title, covering RTL, microarchitecture, and sometimes block-level integration. 'RTL engineer' is more narrowly the person writing and verifying the actual register-transfer-level code. Most product companies use 'design engineer' as the formal title and 'RTL engineer' colloquially.
