1. VLSI design flow
    o various steps
  2. Where does functional verification fit into the VLSI design flow?
  3. What skill set is required for a VLSI front-end (functional verification) engineer?
    o Text editor (P1) => Today
    o Digital design (P1)
    o Verilog (P1)
    o SV (P2)
    o Linux (P2)
    o Python (P3)
    o UVM (P3)
    o standard protocols (P1, P2, P3)
    o debug (P1, P2, P3)
  4. How the course will be organized.


  1. marketing team decides whether to go with FPGA or ASIC flow
    o 1 lakh (100,000) or fewer => FPGA
    o 1 million+ => ASIC
    o 1 lakh to 1 million => FPGA/ASIC


  1. Text editor (P1) => Today
    o nothing to do with VLSI as such
    o text editor expertise can reduce overall coding effort
    o text editor comes with keyboard shortcuts
    o example: to copy 11 lines, go to the first line to be copied and type 11yy (yank = copy)
    o then move the cursor to the point where the text should be pasted and type p
  2. what may take 1 hour to code can be finished in 10 minutes.
  3. Multiple text editors
    ====== Below 4 are not good for programming development =========
    //they don't provide shortcuts
    o notepad ==> 2 sec to load
    o wordpad
    o idle
    o microsoft word ==> 15 sec to load
    o word is good for graphics (colors, font size, 3d effect, images, tables)
    ====== Below 4 are good for programming development =========
    //they come with shortcuts
    o notepad++
    o gvim ===> Using this throughout the course
    o nedit
    o emacs
  4. gvim
    o G : Graphical
    o Vim
  5. GVIM works in two modes
    o command mode (thickbox)
    o we can do KBD shortcuts only in this mode
    o insert mode (|)
    o we can type the program or text in this mode
    o when cursor is in insert mode(|), don’t enter KBD shortcuts
  6. GVIM
    o press escape => command mode
    o to enter insert mode: type i or a or ins (various other options)
  7. How do KBD shortcuts work?
    o short notation of what we want to do.
    o delete word => dw
    o paste => p
    o change word => cw
  8. KBD shortcut classification
    o shortcuts for moving from one part of the file to other
    o jumping to specific line number
    o shortcuts for copying a code and pasting
    o shortcuts for deleting a piece of code
    o shortcuts for code replacing(substitution)
    o shortcuts for opening multiple files in same window
    o shortcuts for making code look better(indentation, numbering)
    o shortcuts for doing repetitive activities in a simple manner
    o shortcut for undoing the things(what we did earlier)
  9. insert(type) mode, command mode, how to move from one to other modes
  10. how to move cursor to the left(h), right(l), up(k), down(j), multiple lines(10j)
    l : move cursor to the right one position
    4l : move 4 positions to the right
    5k : move 5 lines up, instead of typing kkkkk
    any keyboard shortcut, with a number before => repeat the shortcut that many times
    • moving by one position => h,l,k,j
    • moving by words => 3w(right side), b(left side)
    • moving to end of the line($), beginning of the line(0)
    • how to move to 1st line(gg), how to move to last line of the file(G)
    • Select whole file text content: ggVG
      end of the line(End key or $), beginning of the line(0), next word(w), end of the word(e), previous word(b)
      Moving to specific line number (:20, enter)
  11. shortcuts for copying a code and pasting
    copy a character(yl), multiple characters(nyl)
    copy a word(yw), multiple words(nyw)
    copy a line(yy), multiple lines(nyy)
    copy the entire file content (ggVG, then y)
  12. deleting
    delete character(x), multiple characters(nx)
    delete word(dw), multiple words(ndw)
    delete line(dd), multiple lines(ndd)
    delete the entire file content (ggVG, then d)

2:35PM (IST)





  1. how to work with GVIM.
    o you become good by doing these things yourself


  1. Verilog language
    o combinational logic
    o testbench development
    o simulation, check the waveforms
  2. DFT
    o very basic level of Verilog
    Functional verification
    o need to learn Verilog to perfection


  1. why do we need Verilog language?
    o when we have C, C++, Java, ….
    o C, C++, Java
    o not meant for implementing a hardware behavior
    o hardware has requirements which are different from software
    o hardware needs concept of time
    o hardware needs concept of structure
    o hardware needs concept of state and states changing with time
    o hardware needs concept of concurrent execution
  2. Verilog
    • visualize how the hardware structure looks like
    • multiplexor (2×1)
      o list down signal names: i1(input), i0(input), sel(input), y(output)
      o we should know the functionality of multiplexor
      o if sel is 0, y will be i0
      o if sel is 1, y will be i1
  3. gate level style
    o write truth table
    o use k-maps, come up with Boolean expression
    o then implement Boolean expression

o write truth table

i1 i0 sel | y
----------+--
0  0  0   | 0
0  0  1   | 0
0  1  0   | 1
0  1  1   | 0
1  0  0   | 0
1  0  1   | 1
1  1  0   | 1
1  1  1   | 1

y = (i0 & ~sel) | (i1 & sel)
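
The gate level style above can be sketched as follows. This is a minimal example, assuming the signal names from the notes (i1, i0, sel, y); the internal wire names and the testbench are illustrative only. The testbench applies all 8 rows of the truth table and compares y with the expected value.

```verilog
// Gate level 2x1 mux: y = (i0 & ~sel) | (i1 & sel)
module mux2x1(input i0, input i1, input sel, output y);
  wire sel_n, t0, t1;          // illustrative internal names
  not g0(sel_n, sel);
  and g1(t0, i0, sel_n);
  and g2(t1, i1, sel);
  or  g3(y, t0, t1);
endmodule

// Self-checking testbench: applies all 8 input combinations
module tb;
  reg i0, i1, sel;
  wire y;
  integer k, errors;

  mux2x1 dut(.i0(i0), .i1(i1), .sel(sel), .y(y));

  initial begin
    errors = 0;
    for (k = 0; k < 8; k = k + 1) begin
      {i1, i0, sel} = k[2:0];  // same column order as the truth table
      #1;                      // let the gate outputs settle
      if (y !== (sel ? i1 : i0)) begin
        errors = errors + 1;
        $display("MISMATCH i1=%b i0=%b sel=%b y=%b", i1, i0, sel, y);
      end
    end
    if (errors == 0) $display("mux2x1: all 8 cases passed");
    $finish;
  end
endmodule
```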

  1. we develop a testbench,
    • apply inputs to the design
    • get the outputs, compare those outputs with expected values
      o if above works, then we can say, mux2x1 is working fine.
  2. what is simulation? how to run simulation?
    • simulation is the process of applying inputs to the design and observing its outputs
    • how to run simulation?
      o compile the verilog code
      o elaborate
      o wave(add signals to the waveform)
      o run the simulation
    • tools available for simulation purpose
      o ModelSim, QuestaSim => Mentor Graphics
      o VCS => Synopsys
      o ncsim, Xcelium => Cadence
      o Riviera => Aldec
      o ISE => Xilinx

use cd to go to the directory, where we need to run the simulation
o compile the verilog code
o elaborate
o wave(add signals to the waveform)
add wave
o run the simulation
run -all

  1. laptop
    if we ask you to test a laptop by typing only 'a' => can we say the laptop is working properly?
    o we need to apply multiple inputs
  2. how to install modelsim, mux code, tb, simulation flow


  1. scalar and vector
  2. scalar
    wire r;
    reg a;
  3. vector
    o declare a 9 bit vector, whose MSB is 7
    reg [7:-1] a;
    reg [7:15] a;
    o declare a 5 bit vector, whose LSB is -2
    reg [-6:-2] b;
    reg [2:-2] b;

4. Why do we need vectors?

  1. reg [31:0] addr;
    size? 32
    what value to assign? 200
    in what radix format to assign?
  1. reg [14:0] data;
    value = 350
    assign value in all 4 formats(decimal, hexa, binary, octal)
  2. 1 bit FA
  3. metastability
    o setup time and hold time
    o if there is any violation in setup or hold time => FF enters into an unknown state
    o unknown state => output can be 1 or 0 => metastable state
    o setup time
    o minimum time before the active edge of the clock for which d input must be stable
    analogy: flight boarding time
    o 45 minutes before the boarding time
    o you may catch the flight ==> logic 1
    o you may miss the flight ==> logic 0
    o hold time
    o minimum time after the active edge of the clock for which d input must remain stable
    o analogy:
    o once we get down from the flight, we have to wait for 10 minutes to collect luggage
  4. vector to vector assignment
    a = b; //example of assignment
    integer a, b;
    b = 20;
    a = b; //a? 20

vectors also work in the same manner.

  1. busA = busB;
    reg [3:0] busA;
    reg [5:0] busB;
    busA = busB;
    busA[0] = busB[0];
    busA[1] = busB[1];
    busA[2] = busB[2];
    busA[3] = busB[3];
    what happens to busB[4] & [5]? not connected.
  2. vec_a = vec_b;
    vec_a[] = vec_b[]?
  3. in case vec_b holds a 2's complement (negative) value, will the top 2 bits of vec_a be taken as 1's?
    o no
    o vector assignment is just position to position copy

reg [-2:2] vec_a; //5 bit vector
reg [-8:-6] vec_b; //3 bit vector
vec_a = vec_b;
vec_a[2] = vec_b[-6];
vec_a[1] = vec_b[-7];
vec_a[0] = vec_b[-8];
vec_a[-1] = 0;
vec_a[-2] = 0;
reg [-2:2] vec_a; //5 bit vector
reg [-8:-4] vec_b; //5 bit vector
vec_a = vec_b;
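
The position to position copy described above can be checked with a short simulation. This is a sketch that reuses the declarations from the notes (busA, busB, vec_a, vec_b); the values assigned are arbitrary examples.

```verilog
module tb;
  reg [3:0]   busA;
  reg [5:0]   busB;
  reg [-2:2]  vec_a;   // 5 bits, MSB index is -2, LSB index is 2
  reg [-8:-6] vec_b;   // 3 bits, MSB index is -8, LSB index is -6

  initial begin
    // Wider source into narrower target: upper bits of busB are dropped
    busB = 6'b10_1101;
    busA = busB;
    $display("busA  = %b", busA);    // 1101

    // Narrower source into wider target: LSB aligned copy, upper bits become 0
    vec_b = 3'b101;
    vec_a = vec_b;
    $display("vec_a = %b", vec_a);   // 00101
    $display("vec_a[2] = %b, vec_a[-2] = %b", vec_a[2], vec_a[-2]); // 1, 0
  end
endmodule
```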

  1. vec_b = 125 = 7'b1111101; //64+32+16+8+4+1 = 125
    vec_a = vec_b; //assuming reg [-3:3] vec_a, which matches the selects below
    vec_a = 125 = 7'b1111101;
    vec_a[2] = 0
    vec_a[0:3] = 4'b1101
    //vec_a[3:0] access will be wrong
    vec_a [-3:-1] = 3'b111
    vec_a [-2:3] = 6'b111101;
  2. reg [6:0] vec_a;
    vec_a[3:0] ? correct
    vec_a[0:3] ? wrong

69 to binary
reg [10:3] vec_a;
vec_a = 8’b0100_0101
vec_a[7] = 0
vec_a[10:8] = 3’b010
vec_a[5:3] = 3’b101
//vec_a[3:5] is wrong

  1. vec_b = -69
    reg [11:3] vec_a;
    69 => 9’b00100_0101
    -69 => 9’b11011_1010 + 1 (2’s complement)
    = 9’b11011_1011
    vec_a = 9’b11011_1011
    vec_a[7] = 1
    vec_a[10:8] = 3’b101
    vec_a[5:3] = 3’b011

reg [3:0] a, b, c;
a=9, b=7, c?

20Q. can we multiply two binary numbers?
o yes

21Q. can we divide two binary numbers?
o yes

  1. how can we measure time using clock?
    clock time period = 10ns
    50ns/timeperiod(10ns) = 5 (number of edges)
    o every electronic device uses same concept of measuring time.
  2. 100Mhz
    To measure 10sec?
    o Not ns
    o TP = 1/100MHz = 1/(100 * 10**6 Hz) = 10**-8 sec = 10ns

Clock freq = 1 Hz => TP = 1/Freq = 1/1 = 1sec
Clock freq = 10 Hz => TP = 1/10 = 0.1 sec
0.1 sec convert to ns => 10**8 ns
1 meter => 1000 mm

Freq = 100MHz = 10**8 Hz
TP = 1/Freq = 1/(10**8) sec = 10**-8 sec
sec to ns conversion => multiply by 10**9
TP in ns = 10**-8 * 10**9 ns = 10ns

Traffic light controller working at 10KHz
red time = 10 sec
o how many edges to count to be in red time?
o bring it to Hz => divide => sec => convert to req format
yellow time = 20 sec
green time = 50 sec

red :
– 10KHz => TP = 10**-4 sec
10KHz = 10*(10**3) Hz = 10**4 Hz
TP = 1/(10**4) sec = 10**-4 sec
– to measure 10 sec
10 sec/TP = 10 sec/10**-4 sec = 10**5 edges
yellow edges to count?
o 2 * 10**5
green edges to count?
o 5 * 10**5

Traffic light controller working at 1KHz
red time = 10 sec
o how many edges to count to be in red time?
o bring it to Hz => divide => sec => convert to req format
yellow time = 20 sec
green time = 50 sec

red :
– 1KHz =>
1KHz = 10**3 Hz
TP = 1/(10**3) sec = 10**-3 sec
– to measure 10 sec
10 sec/TP = 10 sec/10**-3 sec = 10**4 edges
yellow edges to count?
o 2 * 10**4
green edges to count?
o 5 * 10**4
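
The edge counts for the traffic light controller can be cross-checked with a small calculation. This is only a sketch: the function name edges is a hypothetical helper, and it relies on edges to count = time in sec * frequency in Hz (same as time/TP).

```verilog
module tb;
  // Hypothetical helper: number of clock edges needed to measure 'seconds'
  function integer edges(input integer freq_hz, input integer seconds);
    edges = freq_hz * seconds;   // same as seconds / TP, since TP = 1/freq
  endfunction

  initial begin
    // 10KHz clock: red = 10 sec, yellow = 20 sec, green = 50 sec
    $display("10KHz: red=%0d yellow=%0d green=%0d",
             edges(10000, 10), edges(10000, 20), edges(10000, 50));
    // 1KHz clock, same durations
    $display(" 1KHz: red=%0d yellow=%0d green=%0d",
             edges(1000, 10), edges(1000, 20), edges(1000, 50));
  end
endmodule
```

For the 10KHz clock this gives 10**5, 2 * 10**5 and 5 * 10**5 edges; for the 1KHz clock it gives 10**4, 2 * 10**4 and 5 * 10**4.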

  1. on what basis do we decide the clk freq?
    o if we want better performance, we go for high frequency clock. high frequency clock results in high power consumption.
    o if we want to reduce power consumption, we go for low frequency clock.
  2. TP=10us
    frequency in terms of MHz?
    TP = 10us = 10*(10**-6) sec = 10**-5 sec
    Freq = 1/(10**-5) Hz = 10**5 Hz
    Hz => MHz (divide by 10**6)
    Freq = 10**5/(10**6) MHz = 0.1 MHz
  3. TP=1ms
    frequency in terms of GHz? (G means 10**9)
    TP = 1ms => TP = 10**-3 sec
    Freq = 1/(10**-3) Hz = 10**3 Hz
    to convert to GHz => divide by 10**9
    Freq = 10**3/(10**9) GHz = 10**-6 GHz
  4. TP=1sec
    frequency in terms of Khz


  1. all the combinational logic verilog codes
    o half adder
    o full adder
    o multi bit full adder
    o 2×1 Mux
    o 4×1 Mux
    o 8×1 Mux
    o Decoder
    o Mux implementation using different abstraction levels
    o encoder


  1. how hardware differs from software
    o hardware has structure, software doesn’t have structure
    o hardware has concept of concurrently running processes
    o typing on KBD, running online meeting, projecting data to projector screen
    o hardware has concept of state, which changes with time
    o hardware has concept of time
  2. how verilog implements
    o concept of time
    – by counting the clock edges
    – Clock time period = 10ns
    o to measure 100ns => 100ns/10ns = 10 => we will count 10 clock edges.
    – analogy:
    o we don’t have a watch, but we want to measure 720 hours
    o count 30 sun rises => 30*24 = 720 hours
    o concept of structure
    module halfadder(input a, input b, output s, output co);
    o concept of concurrent process
    o it uses one always block to implement one process
    o to implement multiple processes, we use multiple always blocks, all of them running in parallel.
    o concept of states
    o reg [3:0] state;
  3. How Verilog differs from C language?
    o C language can't implement
    – concept of time,
    – concept of structure,
    main (input a, output b) { //not possible in C program
    – concept of concurrent process
    o this is why we use Verilog for hardware coding, not C language.
  4. EDA
    • electronic design automation
  5. How EDA tools make design flow easier?
    o EDA tools help by automating most of the VLSI design flow.
    o This automation ensures that human effort is reduced significantly.
    o User (VLSI engineer) needs to give only the top level instructions; the tool does the rest of the flow.
    o these tools are useful at every stage starting from RTL design, integration, verification, synthesis, etc
    o example:
    module dff(clk, rst, d, q); //Verilog code of DFF
    input clk, rst, d;
    output reg q;
    always @(posedge clk) begin //Behavioral style of coding
    if (rst == 1) q <= 0; //non-blocking assignment for sequential logic
    else q <= d;
    end
    endmodule
  6. Half adder
    o two parts
    o design coding
    o ports: a, b, s, co
    o there is no cin
    o TB coding
    o implement module half_adder with above 4 ports
  7. what difference does it make if we use the novopt and suppress options, and if we don't?
  1. how to run one simulation
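
The half adder described above (design coding plus TB coding, ports a, b, s, co, no cin) could look like the sketch below. The dataflow style and the testbench loop are one possible choice, not necessarily the style used in class.

```verilog
// Design: half adder with the 4 ports listed in the notes
module half_adder(input a, input b, output s, output co);
  assign s  = a ^ b;   // sum
  assign co = a & b;   // carry out
endmodule

// TB: applies all 4 input combinations and checks {co, s} against a + b
module tb;
  reg a, b;
  wire s, co;
  integer k;

  half_adder dut(.a(a), .b(b), .s(s), .co(co));

  initial begin
    for (k = 0; k < 4; k = k + 1) begin
      {a, b} = k[1:0];
      #1;
      $display("a=%b b=%b -> s=%b co=%b", a, b, s, co);
      if ({co, s} !== a + b) $display("MISMATCH at a=%b b=%b", a, b);
    end
    $finish;
  end
endmodule
```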


  1. clock generation
  2. memory
    o verilog code
    o testbench


  1. clock generation
    o why clock is important?
    if a laptop is working, it is only because there is a clock running inside it.
    if there is no clock, concept of sequential circuits is not possible.
    o used for synchronization between two connected modules
  2. laptop
    o processor subsystem
    o high frequency clock
    o Keyboard controller
    o low frequency clock
    o laptop itself can have almost 10 to 20 clocks inside it.
  3. two approaches to generate a clock
    o verilog forever code
    o crystal and PLL
  4. Developing verilog codes: 2 types
    o design code => must use synthesizable constructs only
  5. Clock using Verilog forever code.
  6. instead of opening waveform, using cursor to get frequency values, can it be done directly from the code itself?
    o TB code
  7. freq = f MHz
    freq = f * 10**6 Hz
    TP in sec = 1/(freq in Hz) = 1/(f * 10**6) sec
    TP in ns = 10**9 * 1/(f * 10**6) ns = 10**3/f = 1000/f ns
    1 sec => how many ms is it? 1000 ms (10**3)
    1 sec => how many ns is it? 10**9
  8. how to convert ns to MHz?
    time = t ns
    what is freq in MHz = 1000/t MHz
    a = 100/b => b = 100/a
  9. 200Mhz => TP=5ns
    TP/2 = 2.5ns
    if the time precision is 1ns => 2.5ns gets rounded off to 3ns
    TP/2 = 2.5ns => 3ns
    how to fix this problem?
    change the time precision
    2.5ns => stays 2.5ns (not rounded to 3ns)
  10. what is jitter?
  11. 500MHz, 5% jitter
    freq range = 475 to 525 MHz
    200MHz, 10% jitter
    freq range = 180 to 220 MHz
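
The forever based clock and the TP = 1000/f ns relation can be put together as below. This is a sketch under assumptions: the parameter name FREQ_MHZ and the variable names are illustrative, and the 1ps precision is chosen so that the 2.5ns half period is not rounded. The second initial block measures the period from the code itself instead of using the waveform cursor.

```verilog
`timescale 1ns/1ps
module tb;
  parameter real FREQ_MHZ = 200.0;   // assumed example frequency
  real tp_ns, t_prev, measured_tp;
  reg clk;

  // Clock generation: TP in ns = 1000 / f(MHz), toggle every TP/2
  initial begin
    tp_ns = 1000.0 / FREQ_MHZ;
    clk = 0;
    forever #(tp_ns / 2.0) clk = ~clk;
  end

  // Measure the period between two rising edges and convert back to MHz
  initial begin
    @(posedge clk) t_prev = $realtime;
    @(posedge clk) measured_tp = $realtime - t_prev;
    $display("TP = %0.3f ns, freq = %0.1f MHz", measured_tp, 1000.0 / measured_tp);
    $finish;
  end
endmodule
```

With the time precision changed to 1ns, the same code would toggle every 3ns instead of 2.5ns, which is the rounding problem described above.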





Generate a clock of 20Mhz frequency
Generate clock with 60% duty cycle
Convert 25KHz clock in to Time period in us(micro seconds).
200Mhz clock in TP in milli seconds(ms)
TP=50ms, what is clock frequency in GHz
Write Verilog code for 3×1 mux(4th selection case, output should hold the value)
Write a Verilog code for 3 bit full adder.
Also write testbench.


  1. Project#1:
    Title: Clock generation for user provided frequency, duty cycle and jitter
    Description: This is Verilog based project for generating the clock as per the user provided variables.
  2. Arrays
  3. $monitor is always active
    o we just need to call it only once
    o $display: prints only once per call
  4. DEPTH=100
    reg [31:0] memory [99:0];
    reg [32:1] memory [100:1];
    reg [32:1] memory [1:100];
    3200 bits


  1. concept to learn in month temperature example
    • how to generate a random real number between 25 to 35
      $urandom_range(25, 35) => only integer
  2. Mango seed

3Q. so if we run the same code on a different computer, will it have a different seed?
o no
o if you use Questasim on every laptop, all laptops will give the exact same pattern for the same code.
o Questasim uses the seed as an input to start the randomization pattern
o if the same seed is provided, the randomization pattern is the same in all runs
o if a different seed is provided, the randomization pattern is different

4Q. what does the seed value represent?
o starting point of randomization
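
The effect of the seed can be seen with $random(seed). The sketch below, with an arbitrarily chosen seed value of 5, draws two sequences starting from the same seed and checks that they match.

```verilog
module tb;
  integer seed_a, seed_b;   // two independent seed variables
  integer va, vb, k, same;

  initial begin
    seed_a = 5;             // arbitrary example seed
    seed_b = 5;             // same starting point as seed_a
    same   = 1;
    for (k = 0; k < 5; k = k + 1) begin
      va = $random(seed_a); // $random updates the seed variable on every call
      vb = $random(seed_b);
      if (va !== vb) same = 0;
      $display("%0d: va=%0d vb=%0d", k, va, vb);
    end
    $display("same pattern = %0d", same);
  end
endmodule
```

Changing seed_b to a different value is expected to give a different pattern for vb.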

  1. string

to store 10 chars, course
reg [8*10-1:0] course;

  1. module tb;

7Q. can we use a for loop for instantiation?
o answer is yes => genvar

WIDTH is required at compile time
$value$plusargs => run time concept (by this time, Design structure is already created)
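
A for loop with genvar can instantiate one module many times. The sketch below builds a WIDTH bit adder from 1 bit full adders; the module names fa_1bit and fa_nbit and the port names are assumptions for illustration. WIDTH is a parameter, so it is fixed before simulation starts.

```verilog
module fa_1bit(input a, input b, input ci, output s, output co);
  assign {co, s} = a + b + ci;
endmodule

// Parameterizable adder: WIDTH copies of fa_1bit connected as a ripple chain
module fa_nbit #(parameter WIDTH = 4)
  (input  [WIDTH-1:0] a, input [WIDTH-1:0] b, input ci,
   output [WIDTH-1:0] s, output co);
  wire [WIDTH:0] c;              // carry chain
  genvar i;
  assign c[0] = ci;
  generate
    for (i = 0; i < WIDTH; i = i + 1) begin : stage
      fa_1bit u(.a(a[i]), .b(b[i]), .ci(c[i]), .s(s[i]), .co(c[i+1]));
    end
  endgenerate
  assign co = c[WIDTH];
endmodule

module tb;
  reg  [7:0] a, b;
  reg        ci;
  wire [7:0] s;
  wire       co;

  fa_nbit #(.WIDTH(8)) dut(.a(a), .b(b), .ci(ci), .s(s), .co(co));

  initial begin
    a = 8'd200; b = 8'd100; ci = 0;
    #1;
    $display("%0d + %0d = %0d", a, b, {co, s});   // 300
    $finish;
  end
endmodule
```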



  1. Hierarchical modeling
    1 bitFA -> 1 bit FA -> 4 bit FA => parameterizable FA(genvar)
  2. when we run a simulation, there are 2 stages
    o compilation stage => complete structure gets created in this stage
    o elaboration stage => simulation process gets initiated(we can’t change the structure at this stage)
    vsim tb +WIDTH=12 //not possible
  3. parameter overriding
    parameter WIDTH=5;
    parameter DEPTH=7;
    fa_nbit #(.WIDTH(WIDTH), .DEPTH(DEPTH)) dut(….); //override by name
    fa_nbit #(WIDTH, DEPTH) dut(….); //override by position
  4. reg [3:0] a;
    b = &a;
    = a[3] & a[2] & a[1] & a[0]

b = ^a;
= a[3] ^ a[2] ^ a[1] ^ a[0] //unary reduction xor operator

^101010111 => 0

a = 6’b001100 (12)
b = a >> 2;
b = 6’b000011 (a is getting divided by 4, when we shift 2 positions)
what is the practical significance?

7. shift operator is not cyclic

  1. left shift means multiply by 2**num_shifts
     right shift means divide by 2**num_shifts

a = 6’b001100 (12)
b = a << 2;
b = 6’b110000 (48) = 12*4

100101_000000 (64*37)

100101 (1*37)

100101_100101 (65*37)

Multiplication = shift operation + binary addition
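
The shift and add idea can be tried out as below. This is a sketch: the function name shift_add_mul and the operand widths are assumptions chosen to fit the 37 * 65 example.

```verilog
module tb;
  // Hypothetical helper: multiply by adding shifted copies of x,
  // one copy for every bit of y that is 1
  function [12:0] shift_add_mul(input [5:0] x, input [6:0] y);
    integer i;
    begin
      shift_add_mul = 0;
      for (i = 0; i < 7; i = i + 1)
        if (y[i]) shift_add_mul = shift_add_mul + ({7'b0, x} << i);
    end
  endfunction

  reg [5:0] a;

  initial begin
    a = 6'b001100;                                   // 12
    $display("12 << 2 = %0d", {2'b00, a} << 2);      // 48, multiply by 4
    $display("12 >> 2 = %0d", a >> 2);               // 3, divide by 4
    // 65 = 64 + 1, so 37*65 = (37 << 6) + 37
    $display("37 * 65 = %0d", shift_add_mul(6'd37, 7'd65));  // 2405
  end
endmodule
```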



  1. 10/3 = 3 (integer division)
  2. A = 6; (logically true)
    B = -9; (logically true)
    C = x; (logically unknown)
    A && B => 1 && 1 => 1
    A || !B => 1 || 0 => 1
    C || B => x || 1 => 1

A && B = true && true = true
A || !B = true || false = true
C || B = unknown || true = x | 1 = 1 (true)

79 = 64 + 8+4+2+1 = 8’b0100_1111
54 = 32 + 16+4+2 = 8’b0011_0110
8’b0000_0110 (A&B)
8’b0111_1001 (A^B)

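  1. logical inversion (!)
    a = 4'b1010
    !a = !(true) = false = 0
    a = -9;
    !a = 0
    a = 4'b10x0;
    !a = 0; //a has at least one bit that is 1, so a is true

bitwise inversion:
a = 4'b1010
~a = 4'b0101

a = -9; (a is 10 bit)
a = 10'b11_1111_0111
~a = 10'b00_0000_1000 (8)

a = 4'b10x0;
~a = 4'b01x1; //x stays x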
  1. unary reduction operators
    vector = a
    all 0's ==> |a = 0
    all 1's ==> &a = 1
    at least one 0 => &a = 0
    at least one 1 => |a = 1
    odd number of 1's => ^a = 1
    even number of 1's => ^a = 0
  2. how the operator usage differs between vectors and arrays?
    all operations are possible with vectors.
    very limited operations are possible with arrays.
  3. We have a 16-bit register and want [7:4] to always be written as 4'b1111, irrespective of the other bit values. How can we implement this using bitwise operators?
    reg [15:0] a;
    a = $random | 16'b0000_0000_1111_0000 => this will ensure that [7:4] positions will always be 1111

same question, I want 7:4 positions to be always 0:
a = $random & 16'b1111_1111_0000_1111 => this will ensure that [7:4] positions will always be 0000

same question, I want to always invert 7:4 positions in the original value a, the remaining bits should stay the same:
a = a ^ 16'b0000_0000_1111_0000 => this will ensure that [7:4] positions will always be inverted
original a = 16'b1011_0111_1000_1110
xor pattern= 16'b0000_0000_1111_0000
result     = 16'b1011_0111_0111_1110
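
The three masking operations can be verified on the example value of a. In this sketch the variable names set_b, clr_b and inv_b are illustrative, and a fixed value is used instead of $random so the output is easy to check by hand.

```verilog
module tb;
  reg [15:0] a, set_b, clr_b, inv_b;

  initial begin
    a     = 16'b1011_0111_1000_1110;
    set_b = a | 16'b0000_0000_1111_0000;  // OR with 1  => bit forced to 1
    clr_b = a & 16'b1111_1111_0000_1111;  // AND with 0 => bit forced to 0
    inv_b = a ^ 16'b0000_0000_1111_0000;  // XOR with 1 => bit inverted

    $display("a[7:4]     = %b", a[7:4]);      // 1000
    $display("set_b[7:4] = %b", set_b[7:4]);  // 1111
    $display("clr_b[7:4] = %b", clr_b[7:4]);  // 0000
    $display("inv_b[7:4] = %b", inv_b[7:4]);  // 0111
    // All bits outside [7:4] must stay the same as in a
    if (set_b[15:8] !== a[15:8] || set_b[3:0] !== a[3:0] ||
        clr_b[15:8] !== a[15:8] || clr_b[3:0] !== a[3:0] ||
        inv_b[15:8] !== a[15:8] || inv_b[3:0] !== a[3:0])
      $display("MISMATCH outside [7:4]");
  end
endmodule
```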

  1. reg [2:0] catd;
    integer f;
    catd = {a, b, c, f};
    f = 10;
    a = 2’b11;
    b = 2’b10;
    c = 2’b01;
    f = 32'b00000 …1010;
    catd is 3 bit only, we will only take lower 3 bits of f => 010
    catd = 3'b010

reg [3:0] a, c, d; //4 bits
reg [4:0] b, e, f; //5 bits
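{a,b,c,d,e,f} = 32'h1234_5678; //convert this to binary
{a,b,c,d,e,f} = 32'b0001_0010_0011_0100_0101_0110_0111_1000
//LHS is only 27 bits (4+5+4+4+5+5), so the upper 5 bits (00010) are dropped
{a,b,c,d,e,f} = 32'b00010_0100_01101_0001_0101_10011_11000
what are the values of a to f?
a = 4'b0100, b = 5'b01101, c = 4'b0001, d = 4'b0101, e = 5'b10011
f = 5'b11000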



  1. relational operators
  2. == : logical, === : case
    != : logical, !== : case

4’b1z0x == 4’b1z0x -> x
4’b1z0x != 4’b1z0x -> x

4’b1z0x === 4’b1z0x -> 1
4’b1z0x !== 4’b1z0x -> 0
4’b1z0z !== 4’b1z0x -> 1 (case inequality)
4’b1z0z === 4’b1z0x -> 0 (case equality)
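
The difference between logical and case equality can be observed directly. A short sketch using the same operand values as above (the variable names p and q are illustrative):

```verilog
module tb;
  reg [3:0] p, q;

  initial begin
    p = 4'b1z0x;
    q = 4'b1z0x;
    // Logical (in)equality gives x when any compared bit is x or z
    $display("==  : %b", p == q);    // x
    $display("!=  : %b", p != q);    // x
    // Case (in)equality compares x and z literally, result is always 0 or 1
    $display("=== : %b", p === q);   // 1
    $display("!== : %b", p !== q);   // 0

    q = 4'b1z0z;                     // last bit is z instead of x
    $display("=== : %b", q === p);   // 0
    $display("!== : %b", q !== p);   // 1
  end
endmodule
```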

3Q. 4'b10z0 === 4'b1z00 ? //will it be 0?
o yes, bit positions 2 and 1 differ, so === gives 0

  1. always @(posedge clk) begin
    q = a & b;
    q must be a reg,
    everything else must be declared net(wire, wand, wor, tri)
  2. inout [3:0] a;
    initial begin
    a = 10; //not possible

inout [3:0] a;
reg [3:0] a_t;

assign a = a_t;
initial begin
a_t = 10; //possible

always @(clk) begin

always @(clk) begin

a = b + c;
d = 10;
e = 15;




module fa(a, b, ci, s, co);

wire co;
wire [2:0] s;
assign {co, s} = a+b+ci;
//co : scalar
//s : vector

Implement 4×1 mux using assign.
o TB for both 4×1 mux
Implement 8×1 mux using assign.
o TB for both 8×1 mux
o reuse above 4×1 TB, update => 2min
o implement $monitor
o analyse waveform for Mux behavior



  1. 4×1 mux using assign
  2. 8×1 mux using assign

assign y = s2 ? (s1 ? (s0 ? i7 : i6) : (s0 ? i5 : i4)) : (s1 ? (s0 ? i3 : i2) : (s0 ? i1 : i0));

  1. bufif0
    buffer if enable is 0
  2. 4×1 mux using gates

always @(signal1 or signal2) begin //there can be 4 combinations
– signal1 : 0 -> 1, 1 -> 0
– signal2 : 0 -> 1, 1 -> 0

  1. always can infer 3 types of logic
    o combinational logic
    o sequential logic
    o latch logic
  2. assign always infers combinational logic
  3. what does below code infer?
    3 bit FA
    always @(posedge clk) begin //this infers sequential logic => Flip flops
    {co, s} = a + b + cin; //this infers Full adder
  1. treatment is ongoing
    o even we feel sensitivity, we don’t come out and go again.
  2. Write a Verilog program for swapping two integers

integer a, b;

  1. non-blocking statements introduce concept of temporary variable.
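
Swapping two integers with non-blocking statements could look like the sketch below. Both right hand sides are sampled before either variable is updated, which is the "temporary variable" behavior mentioned above. The clock signal and the values 10 and 20 are illustrative.

```verilog
module tb;
  integer a, b;
  reg clk;

  // Both right hand sides are read first, then a and b are updated together
  always @(posedge clk) begin
    a <= b;
    b <= a;
  end

  initial begin
    a = 10; b = 20; clk = 0;
    #5 clk = 1;                        // one rising edge => one swap
    #1 $display("a=%0d b=%0d", a, b);  // a=20 b=10
    $finish;
  end
endmodule
```

With blocking statements (a = b; b = a;) both variables would end up as 20, because a is overwritten before it is read.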

Do always blocks execute sequentially?





  1. always @(in or sel) begin
    case (sel)
    2'b00: out = in[0];
    2'b01: out = in[1];
    2'b10: out = in[2];
    2'b11: out = in[3];
    endcase
    end
  2. check if 100 is prime
    2 to 99
    2 to 50 => this also not required
    2 to 33 => this also not required
    2 to 25 => this also not required
    2 to 20 => this also not required
    2 to 10 => sufficient (square root of 100)

i = 2;
while (i <= num) begin
prime_f = 1; //assume the number is prime
for (j = 2; j*j <= i; j=j+1) begin //same bound as j <= square root of i
if (i%j == 0) prime_f = 0;
end
if (prime_f == 1) begin
$display("prime number=%0d",i);
end
i = i + 1; //move to the next candidate
end



  1. case, casez, casex
  2. casez
    o if z is present in either the case expression or a case branch, those positions are ignored during comparison (those positions are don't care)
    o instead of z, we can also use ?
    ? : don't care
  3. interrupt handling requires priority handling
    I am teaching
    3 of you have questions(std2, std1, std0)
    all 3 ask at same time.
    all of you are interrupting my session.
    std2 > std1 > std0 ==> priority?? ==> can I use casez? yes
  4. what level of don't care do I want?
    z only as don't care => casez
    both x and z to be don't care => casex
    what if I want only x to be don't care, but not z?
  5. pipelining
    o how circuit logic can be divided into sequential and combinational logic.
    o how it helps improve overall efficiency of the system
  6. where is the pipelining concept used?
    all the processors
    7575 + 4784 => 10 secs
    to do 10 such additions => 100 secs
    to process one input it requires = 150ns
    to process five inputs it requires = 750ns
    can we do something to improve this => that concept is called pipelining
  7. Verilog coding for any complex design is about the ability to divide the whole logic into smaller combinational logic blocks which are separated by sequential logic.
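
The casez based priority handling from the interrupt example (std2 > std1 > std0) can be sketched as below. The module name irq_priority and the port names req, grant and valid are assumptions; bit 2 of req stands for std2, bit 1 for std1 and bit 0 for std0.

```verilog
module irq_priority(input [2:0] req, output reg [1:0] grant, output reg valid);
  always @(*) begin
    valid = 1'b1;
    casez (req)
      3'b1??: grant = 2'd2;   // std2 asks: lower requests are don't care
      3'b01?: grant = 2'd1;   // std1 asks and std2 does not
      3'b001: grant = 2'd0;   // only std0 asks
      default: begin          // nobody asks
        grant = 2'd0;
        valid = 1'b0;
      end
    endcase
  end
endmodule

module tb;
  reg  [2:0] req;
  wire [1:0] grant;
  wire       valid;
  integer    k;

  irq_priority dut(.req(req), .grant(grant), .valid(valid));

  initial begin
    for (k = 0; k < 8; k = k + 1) begin
      req = k[2:0];
      #1;
      $display("req=%b -> valid=%b grant=%0d", req, valid, grant);
    end
    $finish;
  end
endmodule
```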




  1. pipelining
    o dividing combinational logic with multiple stages of FF’s in between
    o it helps process more inputs concurrently
  2. casex, casez


  1. shift registers
  2. timescale
  3. intra and inter delay statements
  4. system tasks, functions
  5. compiler directives


  1. shift registers
    o back to back connected FF’s
    o used for shifting the data
  2. shift registers can also be used as synchronizers during clock domain crossing.
    o when a signal moves from one clock domain to another
    o it can result in metastability due to setup or hold time violations.
    o to address these metastability issues, synchronizers are used.

– catching a running bus
o hold the bus, run along with it, catch up the speed of bus, then get in to bus.
o 2 seconds
o 10 seconds

- 1st stage of the synchronizer may have 'x' value
    - x means some voltage in between, not exactly VDD or Ground
    - as we take this x (some voltage in between) through multiple stages, it gets stabilized. Then we have proper output.
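
A two stage synchronizer built from back to back FFs could be sketched as below. The module and signal names (sync_2ff, clk_dst, d_async, meta, q_sync) are assumptions. Note that RTL simulation does not model the in-between voltage; it only shows the two clock delay through the stages.

```verilog
module sync_2ff(input clk_dst, input rst, input d_async, output reg q_sync);
  reg meta;                    // 1st stage: may go metastable in real hardware
  always @(posedge clk_dst) begin
    if (rst) begin
      meta   <= 1'b0;
      q_sync <= 1'b0;
    end else begin
      meta   <= d_async;       // sample the signal from the other clock domain
      q_sync <= meta;          // 2nd stage: gives the 1st stage time to settle
    end
  end
endmodule

module tb;
  reg clk, rst, d;
  wire q;

  sync_2ff dut(.clk_dst(clk), .rst(rst), .d_async(d), .q_sync(q));

  always #5 clk = ~clk;        // destination clock, TP = 10 time units

  initial begin
    clk = 0; rst = 1; d = 0;
    #12 rst = 0;
    #7  d = 1;                 // changes between two destination clock edges
    repeat (3) @(posedge clk);
    #1 $display("q_sync = %b", q);   // input has passed through both stages
    $finish;
  end
endmodule
```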
  1. time scale
    `timescale 1ns/10ps
    1ns : time unit (timestep)
    10ps : precision

S1, S2, S3, S4
Morning walk: S1
o Morning walk is a simulation event which is scheduled in to 1st part of the timestep
o time step: 1 full day
o time step: 4 stages

evening tea: S3
o evening tea is a simulation event which is scheduled in to 3rd part of the timestep

9/March: timestep
o S1 : what all activities => complete those activities
o S2 : what all activities
o S3
o S4 => 10/March(next timestep)

  1. when we set up TBs, we have to manually give timescale values
    `timescale 1ns or 10ns or 100ns
    how much time precision to use?
    time step=1ns ==> more accuracy, simulation speed will be impacted => 1 hour to complete simulation
    time step=10ns ==> reduced accuracy, simulation speed will be improved => 45 minutes
    time step=1ps ==> lot more accuracy, simulation speed will be greatly reduced => 5 hours
  2. generate a clock of 30Mhz frequency
  3. intra and inter delay statements
  4. System task and functions
    o task and functions provided by system(~language)
    $display: from language
    $monitor: from language
    $readmemh: from language
    $readmemb: from language
    $writememb: from language
    $writememh: from language
    • 30+ system tasks are there =>


  1. user doesn't need to implement them, user just needs to know 'how to use them'
    $readmemh("image.hex", dut.mem, start_location, end_location);
    $writememh("image.hex", dut.mem, start_location, end_location);
  2. %m
    • prints the hierarchical name of the scope from which it is called
  3. write a logic to generate a random number between 200 to 300, using seed
    250 + $random(seed)%51
    $random(seed)%51 => -50 to 50
  4. module top(input real r); //not possible: a port cannot be of type real
    endmodule
    module top(input [63:0] r_vec);
    real r;
    r = $bitstoreal(r_vec); //converting to real
  5. logarithms
    o log2(128) => 7
    2**n = 128 => what is n? log output
  6. system task categories
    o display
    o simulation time related
    o file handling
    o conversion
    o memory backdoor access
    o randomization
    o log calculation
    o reading user arguments
    o simulation stop and finish
  7. $onehot0(7’b1100_000) => 0
    $onehot0(7’b1000_000) => 1
    $onehot0(7’b0000_000) => 1
  8. $onehot(7’b1100_000) => 0
    $onehot(7’b1000_000) => 1
    $onehot(7’b0000_000) => 0
  9. $countones(7’b1100_000) => 2
    $countones(7’b1000_000) => 1
    $countones(7’b0000_000) => 0
  10. I want to declare 32 bit vector
    `define BUS_32 reg [31:0]
    `BUS_32 vec1, vec2;
    //expands to: reg [31:0] vec1, vec2;

I want BUS of variable sizes?
o parameterized macro
`define BUS(WIDTH) reg [WIDTH-1:0]
`BUS(64) vec1;





  1. define a macro WIDTH with value 100 in run.do
    what command to use?
    compile top.sv with WIDTH macro = 100
    vlog top.sv +define+WIDTH=100
  2. XMR
    office room: hp_laptop, dell_laptop
    3rd floor, room_no_2: epson_projector


module mux2x1(a, b, sel, y);
input a, b; //size of inputs = 1
output y;
input sel;

assign y = sel ? a : b;
endmodule

//8 bit input multiplexor => no need to code 8 multiplexors, just change scalar to vector
module mux2x1(a, b, sel, y);
input [7:0] a, b; //size of inputs = 8
output [7:0] y;
input sel;

assign y = sel ? a : b;
endmodule

  1. reg [15:0] a = 16'h1234;
    b = ? 16'h3412 (byte swapping)
    b = {a[7:0], a[15:8]};
  2. nibble swapping
    • 4 bits
      b = {};
  3. inputs are always 'wire'
  4. synchronous reset DFF
    asynchronous reset DFF
  5. How to decide whether to use if-else or case?
    o functionally there is no difference, both of them give the same behavior


  1. memory declaration
    2KB memory, 16 bit width

2KB :
Size in bits = 2*1024*8 bits
DEPTH = SIZE/WIDTH = 2*1024*8/16 = 1024
reg [15:0] mem[1023:0];

  1. 16KB memory with WIDTH=32
    SIZE = 16*1024*8
    DEPTH = 16*1024*8/32 = 4096
    reg [31:0] mem[4095:0];
  2. homework
    Declare a memory of 1KB size, each element size of 16 bits(~WIDTH)
    convert the size in to bits
    1KB = 1024*8 bits, SIZE = 8192 bits, WIDTH=16, DEPTH=SIZE/WIDTH = 512
    reg [WIDTH-1:0] mem [DEPTH-1:0];
    reg [15:0] mem[511:0];
    Declare a memory of 256 bits size, whose depth is 64 (~DEPTH)
    reg [3:0] mem[63:0];
    Declare 16KB memory, mem width is 32
    SIZE = 16*1024*8
    WIDTH=32
    DEPTH = 16*1024*8/32 = 1024*4 = 4096
    What should be address size? 4096 locations require 12 bits(2**12 = 4096)
    Declare 1 GB memory of width of 32
    What is address port size to access this memory?
    Declare a byte address memory of 1KB size.
    What is address port size to access this memory?
  3. what all ports are required to access the memory contents?
    o clock, reset
    o addr, wdata, rdata
    o wr_rd
    o wr_rd = 1 => we are writing to memory
    o wr_rd = 0 => we are reading from memory
    o handshaking signals
    o valid=1
    o driven by TB to indicate to the memory that, I want to do transaction to you(write or read tx)
    o ready=1
    o driven by memory to indicate to the TB that, I am ready to complete transaction
    o what happens if there are no handshaking signals?
    o memory won't know when transaction is going to happen.
    o write transaction?
    o TB performing a data write into the memory location
    o read transaction?
    o TB performing a data read from a specific location in the memory
    o why is the ready signal important?
    o ready is the way for memory to tell its status.
  4. directions
    wr_rd_i=1 => write tx
    wr_rd_i=0 => read tx
  5. Handshaking concept is involved in many protocols
    o AXI, AHB, APB, OCP, Wishbone => Every protocol involves handshaking.
  6. for handshaking to happen how many components are required?
    o 2 components
    o one component gives valid=1
    o other component responds by making ready=1 => this is when handshaking completes.
  7. indentation
    o makes you efficient
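
The memory ports and the valid/ready handshake listed above could be put together as in the sketch below. Only wr_rd_i is a name taken from the notes; the other port names, the parameter defaults (2KB with 16 bit width) and the one cycle ready response are assumptions for illustration. The TB drives the ports on the falling edge so that the design samples stable values on the rising edge.

```verilog
module memory #(parameter WIDTH = 16, parameter DEPTH = 1024, parameter ADDR_WIDTH = 10)
  (input clk_i, input rst_i, input valid_i, input wr_rd_i,
   input [ADDR_WIDTH-1:0] addr_i, input [WIDTH-1:0] wdata_i,
   output reg [WIDTH-1:0] rdata_o, output reg ready_o);

  reg [WIDTH-1:0] mem [DEPTH-1:0];
  integer i;

  always @(posedge clk_i) begin
    if (rst_i) begin                       // reset: clear outputs and memory
      rdata_o <= 0;
      ready_o <= 0;
      for (i = 0; i < DEPTH; i = i + 1) mem[i] <= 0;
    end else if (valid_i) begin            // valid=1 => respond with ready=1
      ready_o <= 1;
      if (wr_rd_i) mem[addr_i] <= wdata_i; // wr_rd_i=1 => write tx
      else         rdata_o <= mem[addr_i]; // wr_rd_i=0 => read tx
    end else begin
      ready_o <= 0;
    end
  end
endmodule

module tb;
  reg clk, rst, valid, wr_rd;
  reg [9:0] addr;
  reg [15:0] wdata;
  wire [15:0] rdata;
  wire ready;

  memory dut(.clk_i(clk), .rst_i(rst), .valid_i(valid), .wr_rd_i(wr_rd),
             .addr_i(addr), .wdata_i(wdata), .rdata_o(rdata), .ready_o(ready));

  always #5 clk = ~clk;

  initial begin
    clk = 0; rst = 1; valid = 0; wr_rd = 0; addr = 0; wdata = 0;
    @(negedge clk) rst = 0;
    // write tx: example address and data
    @(negedge clk); addr = 10'h02e; wdata = 16'h4923; wr_rd = 1; valid = 1;
    @(negedge clk); valid = 0; wr_rd = 0;
    // read tx from the same location
    @(negedge clk); addr = 10'h02e; valid = 1;
    @(negedge clk); valid = 0;
    $display("rdata = %h", rdata);         // 4923
    $finish;
  end
endmodule
```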

9Q. inside a for loop we wrote 3 lines; can we do this without begin end?
o no, without begin end only the first statement is part of the loop



  1. is there a valid tx?
    valid=1, ready=1 => hence a valid tx is there
    is it write or read?
    o wr_rd_i = 1 => hence write
    if write, at what addr and what data?
    addr = 2e
    data = 4923
  2. by using parameters, we are able to reduce the coding effort.
  3. if somewhere we set a variable to '1', there must be some other place where it is set back to '0'
  4. Homework
    Write memory Verilog code instantiation with parameter overriding.
    Only instantiate memory as dut(that code only)
    Parameters: WIDTH_TB(Design: WIDTH), DEPTH_TB, ADDR_WIDTH_TB
    Implement the memory reset logic Verilog code
    rst part of the design code
    Write design logic for writing to the memory and reading from the memory
    Write TB code for writing to all the locations of memory with random data.
    What happens if we don’t override the parameters from TB in to the design
    IN complex examples, parameter can be present in 10 files.
    What are the two things that TB is meant for?
    What is the difference between $display and $monitor?
    When do we say handshaking is complete with valid and ready signals used
    Why doesn't the TB top module require any ports?
    Why are design inputs declared as reg in TB code?
    What updates are required to make the Verilog code behave like a 1KB memory with width 32?
    In waveform
    How to change a signal radix to unsigned?
    How to zoom in to waveform?
    How to move cursor to specific time?
    How to search for a value?
    Why read data is delayed by 1 clock cycle after address is issued?


  1. memory concepts
    Why is a testbench required?
    without TB, we don’t know whether memory RTL is working correctly or not.
    TB enables us to check the design behavior.
    What is a testcase? How does it differ from a testbench?
    Test case is the stimulus we are applying to the design.
    Test bench is the platform that enables test cases to be run.
    How does the user pass testcase information to the simulator? Which testcase to run?
    Test name is passed as elaboration argument
    not compile argument
    vsim work.tb +testname=
    How front door access is implemented in testbench?
    Implemented by driving the design ports
    addr, wdata, valid, wr_rd
    How back door access is implemented in testbench?
    Implemented using system tasks like $readmemh/b, $writememh/b
    What are the different system tasks used?
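
One possible way of picking the testcase inside the TB is the $value$plusargs system function. This is only a sketch: the test names test_wr_rd and test_backdoor and the 32 character limit are assumptions, not names from the class code.

```verilog
// Sketch only: test_wr_rd and test_backdoor are assumed test names.
module tb;
  reg [8*32-1:0] testname; // room for a 32 character test name

  initial begin
    // $value$plusargs returns 0 when +testname=... was not given
    if (!$value$plusargs("testname=%s", testname)) begin
      $display("No +testname given, nothing to run");
    end
    else if (testname == "test_wr_rd") begin
      $display("Running front door write/read test");
    end
    else if (testname == "test_backdoor") begin
      $display("Running back door load test");
    end
    else begin
      $display("Unknown testname: %0s", testname);
    end
    $finish;
  end
endmodule
```

With this sketch, the run command would look like: vsim work.tb +testname=test_wr_rd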

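initial begin
task fd_write(); //wrong: a task cannot be declared inside initial begin

task fd_write(); //right: declare the task at module level
initial begin //then call the task from inside initial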
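
A minimal sketch of the correct placement, assuming the memory example ports (addr, wdata, valid, wr_rd). The _i suffixes, the widths and the clock period are assumptions, and the DUT instance is left out.

```verilog
// Sketch only: signal widths and names are assumed, no DUT is instantiated.
module tb;
  reg        clk = 0;
  reg        valid_i = 0, wr_rd_i = 0;
  reg  [7:0] addr_i = 0;
  reg [15:0] wdata_i = 0;

  always #5 clk = ~clk;

  // Declared at module level, outside any initial block.
  task fd_write(input [7:0] addr, input [15:0] data);
    begin
      @(posedge clk);
      addr_i  = addr;
      wdata_i = data;
      wr_rd_i = 1;    // 1 : write
      valid_i = 1;
      @(posedge clk); // a real TB would wait here until ready=1
      valid_i = 0;
      wr_rd_i = 0;
    end
  endtask

  initial begin
    fd_write(8'h2e, 16'h4923); // the task is only called from initial
    $display("front door write driven: addr=%h data=%h", 8'h2e, 16'h4923);
    $finish;
  end
endmodule
```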

  1. where to use task and where to use function?
    function integer sum(input integer a, input integer b);
    sum = a+b;
    endfunction
  2. static, automatic
    – apartments
    o each flat: living room, balcony, kitchen, etc ==> automatic variables
    o swimming pool, gym: shared => static variables
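
A small sketch of both points, with assumed names (factorial, wait_cycles). The function returns a value in zero time and is automatic, so every recursive call gets its own copy of n (its own flat). The task is allowed to consume time.

```verilog
// Sketch only: factorial and wait_cycles are assumed names.
module tf_demo;
  // function: no delays allowed, returns one value, used inside an expression
  // automatic: each call gets its own storage, which recursion needs
  function automatic integer factorial(input integer n);
    begin
      if (n <= 1) factorial = 1;
      else        factorial = n * factorial(n - 1);
    end
  endfunction

  // task: may contain delays, returns nothing
  task wait_cycles(input integer n);
    begin
      repeat (n) #10;
    end
  endtask

  initial begin
    wait_cycles(3);
    $display("t=%0t factorial(5)=%0d", $time, factorial(5)); // t=30 factorial(5)=120
    $finish;
  end
endmodule
```

Without the automatic keyword, all calls would share one copy of n (the swimming pool), and the recursion would overwrite its own argument.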


  1. finite state machine
    o state machine with finite number of states
  2. infinite state machine
    o state machine with infinite number of states
  3. construct a house with infinite number of bricks => is it possible? No
    o infinite state machine => infinite states => infinite number of FF’s => can’t be manufactured
  4. module dff(clk, rst, d, q);
    input clk, rst, d;
    output reg q;
    always @(posedge clk) begin
    if (rst==1) q <= 0; //non-blocking assignment for sequential logic
    else q <= d;
    end
    endmodule

What is a state machine?
Difference between Mealy and Moore state machine?
What are the different encoding styles?
Which one is preferred?
What is difference between finite state machine and infinite state machine?
What is implicit and explicit state machine?
Which one is preferred?
Every sequential circuit compulsorily has a state machine.
How many states are required to implement a 10110 pattern detector using Mealy and Moore style?
Why state machine is very important in Verilog design implementation?

  1. pattern detector
    • how to draw a state machine or state diagram
    • how to implement Verilog code by referring to the state diagram
    • one hot encoding, binary encoding
    • mealy and moore state machine
    • how to write a TB for verifying state diagram
  2. coding
    70 to 80% of time spent on developing the algorithm
    20 to 30% of time spent on implementing the algorithm
  3. what is pattern detector?
    why FSM is required?
    what S_RESET, S_B, S_BB, etc indicate?

9Q. where is valid_i used in design module?

pattern_to_detect = S_BBCBB
cur_state = S_BBCB
We get a C?
what is best possible match?
cur_state = S_BBCB
when a C (car) comes => BBCBC (what we have)
– why we ignore LSB’s in above?
pattern_to_detect = BBCBB
– why we ignore MSB’s in above?

compare all 5?
– do they match? no
compare 4?
– BCBC (what we have: considering last 4 vehicles)
– this doesn’t match
– then check for matching 3
compare 3?
BBC don’t match
– try compare 2 vehicles
compare 2?
pattern = BC
pattern_to_detect = BB
are they matching? No
compare 1?
don’t match
then go to S_RESET state.
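
The compare 5, compare 4, ... compare 1 steps above can be sketched as a function. This is not the class FSM code; it assumes B (bike) = 0, C (car) = 1 with the oldest vehicle in the MSB, and best_match is an assumed name.

```verilog
// Sketch only: B = 0, C = 1, oldest vehicle in the MSB (assumed encoding).
module best_match_demo;
  parameter LEN = 5;

  // Returns how many of the most recent vehicles match the start of the
  // pattern. 0 => S_RESET, 2 => S_BB, LEN => pattern detected.
  function integer best_match(input [LEN-1:0] recent, input [LEN-1:0] pattern);
    integer k, i;
    reg ok;
    begin
      best_match = 0;
      for (k = 1; k <= LEN; k = k + 1) begin
        ok = 1;
        // compare the last k vehicles with the first k of the pattern
        for (i = 0; i < k; i = i + 1)
          if (recent[k-1-i] != pattern[LEN-1-i]) ok = 0;
        if (ok) best_match = k; // keep the longest match found so far
      end
    end
  endfunction

  initial begin
    // pattern BBCBB, cur_state S_BBCB, a C arrives: we have BBCBC
    $display("%0d", best_match(5'b00101, 5'b00100)); // 0 => S_RESET
    // pattern BBCBC, cur_state S_BBCB, a B arrives: we have BBCBB
    $display("%0d", best_match(5'b00100, 5'b00101)); // 2 => S_BB
  end
endmodule
```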

pattern_to_detect = S_BBCBC
cur_state = S_BBCB
We get a B?
cur_state = S_BBCBB
o best possible match?

  1. Industry always prefers ‘one hot’ encoding.
    • 99.9% of times
    • binary encoding can result in unwanted states
    • one-hot encoding is safe
  2. Every state machine which uses binary encoding can result in glitch conditions.
    o hence industry never uses binary encoding.

What is binary encoding?
What is one hot encoding?
What is the advantage of one-hot encoding over binary encoding?
What is the drawback of one-hot encoding?
It requires more FF’s
64 state machine
Binary encoding: 6
One-hot encoding: 64
Industry will still go with one-hot, because correct behavior is more important than saving the cost.
Explain how binary encoding can result in glitch conditions?
How many FFs are required to implement FSM with 15 states using binary and one hot encoding styles?
Industry uses one hot encoding since we are more concerned about design behaving properly compared to saving some flipflops.
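
The FF counts can be cross-checked with a small helper. A sketch, with binary_ffs as an assumed name: binary encoding needs the smallest n such that 2**n >= number of states, one-hot needs one FF per state.

```verilog
// Sketch only: binary_ffs is an assumed helper name.
module ff_count_demo;
  // Smallest n such that 2**n >= num_states
  function integer binary_ffs(input integer num_states);
    begin
      binary_ffs = 0;
      while ((1 << binary_ffs) < num_states)
        binary_ffs = binary_ffs + 1;
    end
  endfunction

  initial begin
    $display("64 states: binary=%0d one-hot=%0d", binary_ffs(64), 64); // binary=6
    $display("20 states: binary=%0d one-hot=%0d", binary_ffs(20), 20); // binary=5
    $display("37 states: binary=%0d one-hot=%0d", binary_ffs(37), 37); // binary=6
  end
endmodule
```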


  1. pending


  1. 20 states
    • How many FFs required in Binary encoding? 5
      2**1 = 2, 2**2 = 4
      2**3 = 8, 2**4 = 16
      2**5 = 32
  2. 37 states
    • How many FFs required in Binary encoding? 6
    • How many FFs required in one-hot encoding? 37
  3. what is the difference between overlapping and non-overlapping pattern detector?
    pattern_to_detect = 10110
    I get these bits in series: 10101011011000010111101101101100011100
    non-overlapping: 10101011011000010111101101101100011100
    o once last bits are used for pattern detection, they won’t be used for new pattern detection.
    o new pattern detection has to start afresh, by not considering previous bits.
    overlapping: 10101011011000010111101101101100011100
    o last set of bits that are used for earlier pattern detection can also be used for new pattern detection.
  4. why we don’t prefer binary encoding? one hot encoding?
    o either learn now, or never learn
    binary encoding:
    o when moving from one state to another, there is a possibility of intermediate states. These unwanted states can result in glitch conditions.
  5. Current pattern: BBCB
    Pattern to detect: BBCBC
    we get a bike? BBCBB
    o which is the best match? S_BB
  6. why we need valid_i from sensor?
    valid_i is used to validate the data going from sensor to the pattern detector.
    • if valid_i=1 => data_in coming to pattern detector is valid.
    • if valid_i=0 => data_in coming to pattern detector is invalid.
      why is it required?
      o if valid_i is not used
      o data_in default value = 0 => pattern detector will always assume that a bike is going, even if no vehicle is going.
      valid_i = 1 => only then pattern detector will analyse the value of Data_in.
  7. what is advantage of parameters in pattern detector coding?
    o code becomes readable and reusable for different requirements.
  8. what are the signals(ports) at sensor interface?
    o clk, rst, valid, data
  9. If pattern detector requires one process, how many always will be required?
    o 1 always
  10. when reset is applied to the design, what design should do?
    o all reg variables should be driven to reset values(most of the times it is 0)
  11. from the TB, when reset is applied, what should be done inside TB?
    • all design inputs should be driven 0

at reset: TB drives DUT inputs to 0’s
DUT drives DUT outputs to 0’s
Summary: All signals should get reset.

  1. what is the stimulus for a pattern detector?
    o Random pattern for d_in is the stimulus.
    d_in is not the stimulus.
    valid_i is not the stimulus
  2. vector assignment
    reg [2:0] a;
    reg [5:0] b;
    b = 6’b100110;
    a = b;
    a? 3’b110
  3. d_in => 1 bit
    $random: 32 bit
    lowest 1 bit for assignment
  4. always its overlap only right?
    o yes, current dynamic_pattern code is overlapping
  5. can we make it non-overlapping?
  6. non-overlapping
    o once a pattern is detected, at least for next 5 clock cycles, don’t do any comparison
    o we are still using bits from previous compared pattern.
    o how to ensure that, we don’t use those bits from previous compared pattern?
    o ignore 5 comparisons
  7. In interview, if someone asks code pattern detector?
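
One way to get both behaviors from the same code is sketched below. It is a shift-register detector, not the FSM version from the class, and OVERLAP, fresh and detect_o are assumed names. In non-overlapping mode it waits until 5 fresh bits have been collected after a detection, so no bit of the previous match is reused. The TB drives the bit stream from the overlapping example and uses valid_i as in the sensor interface.

```verilog
// Sketch only: shift-register detector with assumed names.
module pattern_det #(parameter [4:0] PATTERN = 5'b10110, parameter OVERLAP = 1) (
  input      clk_i, rst_i, valid_i, d_i,
  output reg detect_o
);
  reg  [4:0] window; // last 5 valid bits, newest in bit 0
  reg  [2:0] fresh;  // bits collected since reset or last detection (max 5)
  wire [4:0] next_window = {window[3:0], d_i};
  wire [2:0] next_fresh  = (fresh == 3'd5) ? 3'd5 : fresh + 3'd1;

  always @(posedge clk_i) begin
    if (rst_i) begin
      window <= 0; fresh <= 0; detect_o <= 0;
    end
    else begin
      detect_o <= 0;
      if (valid_i) begin // data is analysed only when valid_i=1
        window <= next_window;
        fresh  <= next_fresh;
        if (next_window == PATTERN && next_fresh == 3'd5) begin
          detect_o <= 1;
          if (!OVERLAP) fresh <= 0; // start afresh, old bits not reused
        end
      end
    end
  end
endmodule

module tb;
  reg clk = 0, rst = 1, valid = 0, d = 0;
  wire det_ov, det_nov;
  reg [37:0] stream = 38'b10101011011000010111101101101100011100;
  integer i, n_ov = 0, n_nov = 0;

  pattern_det #(.OVERLAP(1)) u_ov  (clk, rst, valid, d, det_ov);
  pattern_det #(.OVERLAP(0)) u_nov (clk, rst, valid, d, det_nov);

  always #5 clk = ~clk;

  // count detections away from the active clock edge
  always @(negedge clk) begin
    if (det_ov)  n_ov  = n_ov + 1;
    if (det_nov) n_nov = n_nov + 1;
  end

  initial begin
    repeat (2) @(negedge clk);
    rst = 0;
    for (i = 37; i >= 0; i = i - 1) begin // leftmost bit goes first
      valid = 1; d = stream[i];
      @(negedge clk);
    end
    valid = 0;
    @(negedge clk);
    $display("overlapping=%0d non-overlapping=%0d", n_ov, n_nov);
    $finish;
  end
endmodule
```

For this stream the sketch should report 5 detections in overlapping mode and 3 in non-overlapping mode.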

19Q. is the design ignoring data input during the first 5 clk cycles?

  1. FIFO


  1. FIFO is required
    o where fast producer is giving data to slow consumer.
  2. write_pointer
    o where the write data should be stored in to the FIFO buffer space
  3. read_pointer
    o where the read data should be read from in FIFO buffer space
  4. FIFO can be implemented
    o without pointers => Inefficient
    o with pointers => efficient
  5. with every write happening, increment wr_ptr by +1 once write completes
  6. with every read happening, increment rd_ptr by +1 once read completes
  7. FIFO is similar to memory
    o how do they differ?
    o FIFO access in order
    o doesn’t require ‘addr’ port
    o wr_ptr and rd_ptr will take care of address location, hence TB doesn’t need to provide this.
    o memory access can be random
  8. how wr_ptr, rd_ptr, full and empty are related to each other
    • this is all required for Verilog coding of FIFO
  9. bank just opened
    EMPTY = 1
    FULL = 0
  10. Toggle flag
    o whenever wr_ptr goes from 15->0 => wr_toggle_f should be toggled.
    wr_toggle_f = ~wr_toggle_f
    o whenever rd_ptr goes from 15->0 => rd_toggle_f should be toggled.
    rd_toggle_f = ~rd_toggle_f
  11. if 16 people came in
    wr_ptr = 0, wr_toggle_f = 1
    rd_ptr = 0, rd_toggle_f = 0
  12. 12 writes, 12 reads
    wr_ptr = 12, wr_toggle_f = 0 (roll over didn’t happen)
    rd_ptr = 12, rd_toggle_f = 0 (roll over didn’t happen)
    => Empty
  13. 12 writes happened
    15 reads happened
    o invalid scenario

40 writes happened => 2 toggles
12 reads happened => 0 toggles
– difference between writes and reads at best can be DEPTH of the FIFO
– at best, toggles can differ by ‘1’
toggle_f is a 1 bit variable

12 writes happened
9 reads happened
o possible
o neither empty, nor full

40 writes happened
40 reads happened
o possible
o wr_ptr = 8, wr_toggle_f = 0
o rd_ptr = 8, rd_toggle_f = 0
o empty or full? empty

  1. what if we don’t use pointers(wr_ptr and rd_ptr)?
    o FIFO implementation will become inefficient
    what if we don’t use full and empty signals?
    o external components don’t have indication of FIFO status. They will perform writes to a full FIFO and reads from an empty FIFO, which is wrong.
    what if we don’t use toggle flags?
    o we won’t be able to generate full and empty signals properly
    Why pointer is 4 bit only?
    o because DEPTH=16
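
The relation between wr_ptr, rd_ptr, the toggle flags, full and empty can be sketched as below. Only wr_ptr, rd_ptr, wr_toggle_f, rd_toggle_f and DEPTH come from the notes; the module name, port names and WIDTH are assumptions, and this is a single-clock FIFO.

```verilog
// Sketch only: port names and WIDTH are assumed.
module sync_fifo #(parameter WIDTH = 8, DEPTH = 16, PTR_WIDTH = 4) (
  input                  clk_i, rst_i, wr_en_i, rd_en_i,
  input      [WIDTH-1:0] wdata_i,
  output reg [WIDTH-1:0] rdata_o,
  output                 full_o, empty_o
);
  reg [WIDTH-1:0]     mem [DEPTH-1:0];
  reg [PTR_WIDTH-1:0] wr_ptr, rd_ptr;
  reg                 wr_toggle_f, rd_toggle_f;

  // pointers equal: same toggle flags => empty, different => full
  assign empty_o = (wr_ptr == rd_ptr) && (wr_toggle_f == rd_toggle_f);
  assign full_o  = (wr_ptr == rd_ptr) && (wr_toggle_f != rd_toggle_f);

  always @(posedge clk_i) begin
    if (rst_i) begin // bank just opened: empty=1, full=0
      wr_ptr <= 0; rd_ptr <= 0;
      wr_toggle_f <= 0; rd_toggle_f <= 0;
      rdata_o <= 0;
    end
    else begin
      if (wr_en_i && !full_o) begin
        mem[wr_ptr] <= wdata_i;
        if (wr_ptr == DEPTH-1) begin // roll over 15->0
          wr_ptr <= 0;
          wr_toggle_f <= ~wr_toggle_f;
        end
        else wr_ptr <= wr_ptr + 1;
      end
      if (rd_en_i && !empty_o) begin
        rdata_o <= mem[rd_ptr];
        if (rd_ptr == DEPTH-1) begin // roll over 15->0
          rd_ptr <= 0;
          rd_toggle_f <= ~rd_toggle_f;
        end
        else rd_ptr <= rd_ptr + 1;
      end
    end
  end
endmodule

module tb;
  reg clk = 0, rst = 1, wr_en = 0, rd_en = 0;
  reg  [7:0] wdata = 0;
  wire [7:0] rdata;
  wire full, empty;
  integer i;

  sync_fifo dut(clk, rst, wr_en, rd_en, wdata, rdata, full, empty);
  always #5 clk = ~clk;

  initial begin
    repeat (2) @(negedge clk);
    rst = 0;
    for (i = 0; i < 16; i = i + 1) begin
      @(negedge clk); wr_en = 1; wdata = $random;
    end
    @(negedge clk); wr_en = 0;
    $display("after 16 writes: full=%b empty=%b", full, empty); // full=1 empty=0
    for (i = 0; i < 16; i = i + 1) begin
      @(negedge clk); rd_en = 1;
    end
    @(negedge clk); rd_en = 0;
    $display("after 16 reads : full=%b empty=%b", full, empty); // full=0 empty=1
    $finish;
  end
endmodule
```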

18Q. Should empty_o be 1 when rst=1?




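always @(posedge clk) begin
if (rst) begin /* drive all reg variables to reset values, e.g. empty=1, full=0 */ end
else begin /* write and read logic */ end
end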

  1. relative velocity
    o bring both of them in to same reference, then compare



concepts learnt

  1. How we can use gray code counting pattern to avoid glitch conditions.
    o why glitch conditions happened?
    – when we move one state to other => multiple bits are changing
  2. what is the need for synchronization?
    o synchronization is not just a concept of Async FIFO
    o wherever a clock domain crossing happens (signal moves from one clock domain to another), this concept of synchronization will be required
    o how do we implement synchronization
  3. how to implement the concept wr_ptr and rd_ptr, such that we don’t need to use address port.
  4. how to implement TB for concurrent write and read scenario driving
  5. how to implement various testcases.
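
A common way to combine the first two concepts is sketched below: convert the pointer to gray code in its own clock domain, then pass it through two flip-flops in the other clock domain. All names here (ptr_sync, stage1, wr_ptr_gray_sync) are assumed, and sharing one reset between both domains is a simplification.

```verilog
// Sketch only: assumed names, shared reset is a simplification.
module ptr_sync #(parameter PTR_WIDTH = 4) (
  input                      wr_clk_i, rd_clk_i, rst_i,
  input      [PTR_WIDTH-1:0] wr_ptr_i,         // binary, write clock domain
  output reg [PTR_WIDTH-1:0] wr_ptr_gray_sync  // gray, read clock domain
);
  reg [PTR_WIDTH-1:0] wr_ptr_gray; // gray, write clock domain
  reg [PTR_WIDTH-1:0] stage1;      // first synchronizer FF

  // binary to gray: XOR the pointer with itself shifted right by one,
  // so only one bit changes per increment
  always @(posedge wr_clk_i) begin
    if (rst_i) wr_ptr_gray <= 0;
    else       wr_ptr_gray <= wr_ptr_i ^ (wr_ptr_i >> 1);
  end

  // two back to back FFs in the destination (read) clock domain
  always @(posedge rd_clk_i) begin
    if (rst_i) begin
      stage1           <= 0;
      wr_ptr_gray_sync <= 0;
    end
    else begin
      stage1           <= wr_ptr_gray;
      wr_ptr_gray_sync <= stage1;
    end
  end
endmodule
```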

Qualcomm => you will never work on Async FIFO


  1. gray code
    – 7 writes happened, no reads happened
    rd_ptr(0)
    Now let’s say a write is done
    rd_ptr remains same(0000)
    7(0111)->8(1000) => all 4 FF’s outputs will change, with some gap in between
    0111 -> { 0110 -> 0010 -> 0000 } -> 1000
  2. binary to gray code conversion
    wr_ptr_gray = {wr_ptr[3], wr_ptr[3:1] ^ wr_ptr[2:0]}; //binary to gray code conversion
    o XOR in a shifted manner
  3. bin2gray function for a 3 bit bin variable
    function [2:0] bin2gray(input [2:0] bin);
    reg [2:0] gray;
    begin
    gray[2] = bin[2];
    gray[1] = bin[2]^bin[1];
    gray[0] = bin[1]^bin[0];
    //same thing in one line: gray = {bin[2], bin[2:1]^bin[1:0]};
    bin2gray = gray;
    end
    endfunction
  4. PCIe, DDR, USB, AXI, AHB, Ethernet => 12 protocols

chip consists of communication using above protocols.


  1. Protocol full form
    APB: Advanced peripheral bus
  2. Is it on-chip protocol or peripheral protocol?
  3. Protocol based System architecture?
  4. What are the components in the architecture?
  5. Features supported by the protocol
  6. How handshaking works
  7. Timing diagrams
    • Different phases in timing diagram
    • How write and read transactions happen
  8. Signals
    Signal decoding


  1. APB
    o Quite similar to memory interface protocol
    pclk_i, prst_i, paddr_i, pwdata_i, prdata_o, pwrite_i, penable_i, pready_o, psel_i, perror_o
    o wr_rd_i => pwrite_i (1 : write, 0: read)
    o psel_i
    o during memory example
    o One master(TB) <-> one slave(memory)
    o communication is fixed.
    o One master <-> multiple slave
    o selection is required?
    – which slave we want to do transaction with.
    o perror_o => if something wrong happens, slave can indicate that error status.
  2. APB is a basic protocol
    o it is strong foundation for learning other advanced protocols
  3. USB controller
    o AXI interface(~high perf interface)
    o USB interface
  4. PCIe controller
    o AXI interface
    o PCIe interface
  5. Homework
    Why any controller requires two interfaces?
    How do we categorize protocol in to high performance or low performance?
    Based on how much data transfer happens.
    How do we categorize the protocols?
    For any protocol, what are two types of components?
  6. pwrite same as wr_rd
    pwrite = 1 => master is doing write tx to slave
    pwrite = 0 => master is doing read tx to slave

control signals
psel, penable, pready, pwrite(indicate whether tx is happening or not)
o whether tx is happening or not happening?
o if it is happening, whether it is write or read?

addr and data signals

  1. psel = 4’b0000 => possible => I don’t want to select any slave.
    psel = 4’b0100 => possible => selecting slave#2
    psel = 4’b1110 => not possible => multiple slaves can’t be selected at same time
  2. We will revise APB tomorrow
    o Interrupt controller coding and TB
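
The psel rule from the three cases above (at most one bit set) can be checked with a one-line function. A sketch, with psel_legal as an assumed name:

```verilog
// Sketch only: psel_legal is an assumed helper name.
module psel_check_demo;
  // 1 when zero or one bit of psel is set
  function psel_legal(input [3:0] psel);
    psel_legal = ((psel & (psel - 4'd1)) == 4'd0);
  endfunction

  initial begin
    $display("%b legal=%b", 4'b0000, psel_legal(4'b0000)); // 1: no slave selected
    $display("%b legal=%b", 4'b0100, psel_legal(4'b0100)); // 1: slave#2 selected
    $display("%b legal=%b", 4'b1110, psel_legal(4'b1110)); // 0: multiple slaves
  end
endmodule
```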



SPI protocol

  1. Just as APB is the simplest among all on-chip protocols,
    SPI is the simplest among all peripheral protocols.
  2. APB
    timing diagram looks like
    wait states
    APB read tx timing diagram
    o control signals: penable, pready, psel, pwrite
    o addr and data signals: paddr, pwdata, prdata, perror
    wait states = 0 cycles
  3. Homework
    o APB write tx timing diagram with 3 wait states
  4. perror
    o indicates error status, something went wrong in current tx.
    o slave is powered down and we are trying to access it => slave gives perror=1
    o slave location doesn’t exist(addr doesn’t exist), slave gives perror=1
    o perror=1 is an indication to the master that something went wrong. Definitely tx didn’t happen properly.
  5. if there are 2 slaves, what is the size psel signal?
    reg [1:0] psel;
  6. what is psel=1 is doing?
    • master is telling to the slave that, I am going to access you, keep yourself ready.
  7. SPI
    o SCLk, MOSI, MISO, CS
    o SCLk : M->S : Scalar
    o CS (~similar to PSEL) : M->S (vector)
    o if there are 5 slaves, what is CS size? 4:0
    o MOSI : M->S : Scalar
    o MISO : S->M : Scalar
    o why only 4 ports?
    o Because SPI interface comes outside the chip, we want it to be as small as possible
    o by keeping 4 ports => we are reducing the size.
  8. SPI write timing diagram
    o ADDR=7’h39
    o SPI supports 2 addressing mode: 7 bit and 9 bit modes
    o Data=8’h92
    o draw around 22 pulses of clock
    o since it is serial, all bits go one after other
    o address phase
    o LSB is driven first
    7’h39 = 7’b011_1001
    o if address phase is completed, till we start data_phase, clock should not be driven(it should be ‘1’)
    data = 8’h92
    = 8’b1001_0010
    since we are doing write tx, both addr and data will happen on MOSI(M->S)
    o if we do read
    addr phase on MOSI (address always given by master)
    data phase on MISO (during reads, data is given by slave)
  9. SPI read timing diagram
    o ADDR=7’h39
    o SPI supports 2 addressing mode: 7 bit and 9 bit modes
    o Data=8’h92 (lets say slave is giving 8’h92)
  10. how do we select which slave should be targeted?
    o address mapping
  11. Once you understand SPI controller, learning other controllers becomes easy.
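
The bit order of the SPI write timing diagram can be sketched with a small task. This is only a TB-style illustration with assumed names (spi_send, sclk, mosi) and an assumed sampling edge; CS handling and the read path on MISO are left out.

```verilog
// Sketch only: assumed names, CS and MISO are not modelled.
module spi_write_demo;
  reg sclk = 1; // clock stays at 1 outside the address and data phases
  reg mosi = 0;

  // Drives nbits of value on MOSI, LSB first, one bit per clock pulse.
  task spi_send(input [7:0] value, input integer nbits);
    integer i;
    begin
      for (i = 0; i < nbits; i = i + 1) begin
        sclk = 0;
        mosi = value[i];
        #5 sclk = 1;        // assumed: slave samples MOSI on this edge
        $write("%b", mosi); // print the bit that went out
        #5;
      end
    end
  endtask

  initial begin
    spi_send(8'h39, 7); // address phase: 7'h39 = 7'b011_1001
    #20 $write(" ");    // gap between the phases, sclk stays at 1
    spi_send(8'h92, 8); // data phase: 8'h92 = 8'b1001_0010
    $write("\n");       // prints: 1001110 01001001
    $finish;
  end
endmodule
```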




  1. what is the need for registers inside SPI controller(master)?
    o to store the addr and data, where we want to do the tx
  2. How many processes inside SPI controller?
    o two
    o one for programming the registers
    o other for implementing SPI tx


  1. notes:
  2. register programming => APB interface
    o memory coding
  3. design registers
    reg [7:0] addr_regA[7:0];
    //size of addr in SPI tx = 7 bits + 1 bit for wr/rd decision = 8 bits
    //array [7:0] : number of txs we want to do.
    //7:0 : there are 8 different address values in this array => hence we can do 8 txs.
    //once processor will load this array, it doesn’t need to do anything for 8 txs.
    o 8 questions in this lab session
    o 1st way: I will give all 8 questions, give you 2 hours time, come back after that
    o for this 2 hours, I can do some other work(processor can do some other txs in the chip)
    o 2nd way: I will give 1 question, come after 15 minutes, then give 2nd question, so on
    o I : processor
    o Student: SPI controller
    reg [7:0] data_regA[7:0];
    size of data? 8 bits
    array[7:0] => how many back to back data’s can be provided by SPI controller, without processor intervention.
    reg [7:0] ctrl_reg;
    o analogy:
    o 10th class exams: out of 15 questions, attempt any 12 questions.
    o when to start tx?
    ctrl_reg[0th bit] = 1 => when it is done, then start the transactions.
    ctrl_reg[3:1] = how many back to back txs to do.
    ctrl_reg[6:4] = what is the last tx index, when new tx request comes from where should we start the tx
    ctrl_reg[7] = interrupt generation, indication to the processor that tx is completed.

for any design, registers are very important.

3Q. how do we know there are exactly 8 transactions? is it due to spi protocol?
o nothing like that
o we can keep any number

  1. we have completed the register programming implementation
    what is pending?
    o using above addr and data, ctrl_reg information, implement SPI tx behavior
    o will this work on APB protocol or SPI protocol?
    o which clock to use? sclk
  2. SPI master can only be in any of the below 5 states.
    once we list down states => next draw the state diagram.
  3. next session
    o TB for SPI controller USB ctrl, PCIe controller, DDR controller






  1. why sclk is input?
    o two different ways of generating clock
    o using forever => can’t be done in design
    o crystal or VCO => I don’t want to keep this inside SPI controller
    o it will make SPI controller complex
    o other way
    o generate clock from outside, provide it to the design(SPI controller)
    o SPI controller will use that to generate required clock.
  2. S_ADDR, S_DATA states: clock is running
    other states: clock is not running
  3. addr_regA, data_regA, ctrl_reg


  1. pending
    o currently, everything is checked from waveforms
    o we should do most of the things in the TB code only
    o Update TB for collecting SPI bits
  2. We have only implemented SPI write txs
    o SPI read txs now
    o Write 5 locations(53,54,55,56,57)
    o read back same 5 locations
    o we should get same data.

3Q. doesnt it depend on the device we are connecting?
o SPI slave devices are very simple devices with a small memory
o hence 7 bit address is sufficient

  1. can we use $readmemh and $writememh instead of a fixed memory location?
  2. instead of 53, .. => write in to some random locations, read back from same random locations
  3. Please do not watch video and replicate the code
  4. backdoor load TB memory with some image
    just perform SPI reads to some random locations => check if data matches with image data

remaining course:

  1. Interrupt controller
  2. PISO or SIPO
  3. CRC generation => important but won’t be complex
    o how to implement binary division using Verilog
  4. I2C => video access
    o 5x more difficult
    o inout port(bidirectional port)

SV training




S0. Only understood concept, nothing is done
S1. implemented register wr/rd part of design
S2. implemented TB for writing and reading registers
S3. implemented FSM
S4. Implemented TB for printing the bits
S5. Getting the proper address and data values in TB display
S6. Able to write and read the same locations of SPI
S7. Did backdoor load to memory and SPI reads, data is matching with loaded data.

100 => 99% it won’t be of use.
why 2 interfaces?
what ports required?
which states required?
why count is required?

  1. Interrupt controller
    o lot of similarities with SPI controller
    o both of them are controllers
    o both of them have two interfaces
    o APB interface, interrupt interface (gets interrupts from peripherals)
    o both need register programming to start the functionality
    o both require state machines
    o both require 2 processes
    o one for programming registers
    o other for implementing interrupt handling behavior(~SPI tx implementation)
  2. why we are doing Interrupt controller?
    o Interrupt controller is there in every chip.
    o you get in to a project, 95% chance that, you need to work on interrupt concept.
  3. what is interrupt controller?
    o what is interrupt?
  4. why USB controller generates interrupt?
    o we connect a pendrive to laptop
    o this will result in interrupt, so that, processor knows that a pendrive is connected.
    o who generates interrupt?
    o USB controller generates interrupt
    o why is it generating? because one pendrive got connected.
  5. when admin came in to the room, he doesn’t know the priority values.
    o I tell him the priorities => processor programming the interrupt controller priority_register array.
  6. interrupt service routine?
    • ISR
    • how processor addresses the peripheral once it gets the interrupt
  7. what all states interrupt controller can be in?
    o Admin
    o S_ERROR
    o Interrupt controller
    o S_ERROR
  8. 16 people in this room
    • I ask them to randomly think of a number between 10 to 99
    • I want to know, who thought of the highest number among all 16?
      o I ask everyone to tell at same time, their number.
      o we will go one by one and gets the highest number through comparison
    • algorithm observations
      o always 1st student number is default highest value(to start with)
      o from 2nd student onwards, do the comparison of current highest value and keep updating
      o by the time, we reach last student, we have the highest number.
      o IMPORTANT: For interrupt controller, consider only peripherals, which have raised interrupt.
  9. current active interrupts
    13, 12, 10, 8, 5, 2
    the highest number has highest priority, in current programming.
    1st to be serviced: 13 (why not 15?)
    2nd to be serviced: 12 (by this time, 13 is dropped)
  10. interrupt controller is a very good project.
    o what is interrupt?
    o what is controller?
    o what is state diagram?
    o how an interrupt is serviced?
    o how processor, INTC and peripheral interface with each other, and their order of execution?
    o what is priority based handling?
  11. after completing SV training,
    o DMA controller, Ethernet MAC, Memory controller => 1.5 months
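
The "who thought of the highest number" algorithm from the interrupt controller notes above can be sketched as a loop over the peripherals. The names (priority_regA, intr_active, find_highest) and the programming where priority equals the peripheral number are assumptions for illustration.

```verilog
// Sketch only: assumed names, priority = peripheral number in this example.
module intc_priority_demo;
  parameter NUM_INTR = 16;
  reg [3:0]          priority_regA [NUM_INTR-1:0]; // programmed by the processor
  reg [NUM_INTR-1:0] intr_active;                  // 1 : peripheral raised interrupt
  reg [3:0]          highest_prio;
  integer            i, intr_to_service;

  // Go one by one, considering only peripherals which have raised interrupt.
  task find_highest;
    begin
      intr_to_service = -1; // -1 : nothing active
      highest_prio    = 0;
      for (i = 0; i < NUM_INTR; i = i + 1)
        if (intr_active[i] &&
            (intr_to_service == -1 || priority_regA[i] > highest_prio)) begin
          highest_prio    = priority_regA[i];
          intr_to_service = i;
        end
    end
  endtask

  initial begin
    for (i = 0; i < NUM_INTR; i = i + 1) priority_regA[i] = i;
    intr_active = 16'b0011_0101_0010_0100; // active: 13, 12, 10, 8, 5, 2
    find_highest;
    $display("1st to be serviced: %0d", intr_to_service); // 13 (15 is not active)
    intr_active[intr_to_service] = 0;                      // 13 is dropped
    find_highest;
    $display("2nd to be serviced: %0d", intr_to_service); // 12
  end
endmodule
```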

SPI, Interrupt controller, FIFO, pattern detector



  1. SIPO
    o PISO => Similar to SPI in many ways, hence doing SIPO
  2. CRC
  3. Dual port RAM => 2 mins training


  1. CRC
    253%7 = 1
    I will send: 2531(not 253)
    receiving side: 2531
    split in to two parts: 253, 1
    he also knows the divisor = 7
    253%7 = 1 matching with what we got => Number received properly

receiving side: 2551
255%7 = 3 not matching 1 => hence number received wrong.
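
The same send and check idea can be sketched in binary. This sketch assumes the usual CRC style of division, where each subtraction step is replaced by XOR; the function name mod2_rem, the 4 bit divisor 4'b1011 and the 8 bit data width are assumptions for illustration.

```verilog
// Sketch only: mod2_rem, the divisor 4'b1011 and the widths are assumed.
module crc_demo;
  // Remainder of an XOR based division of an 11 bit value by a 4 bit divisor.
  // The divisor must have its MSB set.
  function [2:0] mod2_rem(input [10:0] codeword, input [3:0] poly);
    reg [10:0] remainder;
    integer i;
    begin
      remainder = codeword;
      for (i = 10; i >= 3; i = i - 1)
        if (remainder[i]) remainder = remainder ^ ({7'b0, poly} << (i - 3));
      mod2_rem = remainder[2:0];
    end
  endfunction

  reg [7:0] data;
  reg [2:0] crc;

  initial begin
    data = 8'd253;
    crc  = mod2_rem({data, 3'b000}, 4'b1011); // sender: append 3 zeros, divide
    // receiver divides {data, crc}: remainder 0 => received properly
    $display("253 received: ok=%b", mod2_rem({data,   crc}, 4'b1011) == 0); // ok=1
    $display("255 received: ok=%b", mod2_rem({8'd255, crc}, 4'b1011) == 0); // ok=0
  end
endmodule
```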

  1. whatever we did above, if we do in binary calculations, remainder is called as CRC
    1 : CRC
    7 : CRC polynomial(divisor)
    253 : data (dividend)
    • we need to implement binary division
      o only one difference
  2. SIPO
    o Non-blocking mode of SIPO
  3. SIPO?
    why we need?
    o whenever data comes in to chip in serial manner, inside chip it is used in parallel manner
    o S -> P (SIPO) => DESerializer
    o P -> S (PISO) => SERializer
    o both together called => SerDes
    what are two styles of SIPO?
    o blocking mode
    o we got 8 bits of data in serial interface
    o till we complete transmitting these 8 bits on parallel interface, we can’t get next set of serial bits
    o non-blocking mode
    o receiving of serial bits and transmitting of parallel data, both can happen independently.
    what are the interfaces of SIPO?
    o serial interface for receiving serial data
    o parallel interface for transmitting parallel data
  4. How to implement non-blocking SIPO?
  5. connect FIFO wr_en_i to valid?
    o it is wrong.
    o we will have to come up with a logic
  6. homework
    o Why gray code FIFO is not working?
    o why same data is read from FIFO two times?
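
The serial to parallel part of SIPO can be sketched as a shift register with a bit counter. This covers only the basic collection of bits; the handshake on the parallel side and the FIFO needed for the non-blocking mode are not shown. Port names and the LSB-first order are assumptions.

```verilog
// Sketch only: assumed port names, LSB arrives first, no parallel handshake.
module sipo #(parameter WIDTH = 8) (
  input                  clk_i, rst_i,
  input                  s_valid_i, s_data_i, // serial interface
  output reg [WIDTH-1:0] p_data_o,            // parallel interface
  output reg             p_valid_o
);
  reg [WIDTH-1:0] shift_reg;
  reg [3:0]       count; // bits collected so far (enough for WIDTH <= 16)

  always @(posedge clk_i) begin
    if (rst_i) begin
      shift_reg <= 0; count <= 0; p_data_o <= 0; p_valid_o <= 0;
    end
    else begin
      p_valid_o <= 0; // p_valid_o is a one cycle pulse
      if (s_valid_i) begin
        // new bit enters at the MSB, so the first bit ends up in bit 0
        shift_reg <= {s_data_i, shift_reg[WIDTH-1:1]};
        if (count == WIDTH-1) begin
          p_data_o  <= {s_data_i, shift_reg[WIDTH-1:1]};
          p_valid_o <= 1;
          count     <= 0;
        end
        else count <= count + 1;
      end
    end
  end
endmodule
```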