#

SESSION#1 NOTES

#

revision:

VLSI design flow
o various steps
Where does functional verification fit into the VLSI design flow?
What skill set is required for a VLSI front-end (functional verification) engineer?
o Text editor (P1) => Today
o Digital design (P1)
o Verilog (P1)
o SV (P2)
o Linux (P2)
o Python (P3)
o UVM (P3)
o standard protocols (P1, P2, P3)
o debug (P1, P2, P3)
How the course will be organized.

Questions

marketing team decides whether to go with FPGA or ASIC flow
o 1 lakh (100,000) or fewer => FPGA
o 1 million+ => ASIC
o 1 lakh to 1 million => FPGA/ASIC

Agenda:

Text editor (P1) => Today
o nothing to do with VLSI as such
o text editor expertise can reduce overall coding effort
o text editor comes with keyboard shortcuts
o copied 11 lines => went to the line from where we want to copy, 11yy(yank=copy)
o I took my cursor to the point where I want to paste (p)
what may take 1 hour time to code, you can finish it in 10 minutes time.
Multiple text editors
====== Below 4 are not good for programming development =========
//we don’t get shortcuts
o notepad ==> 2 sec
o wordpad
o idle
o microsoft word ==> 15 sec to load
o word is good for graphics (colors, font size, 3d effect, images, tables)
====== Below 4 are good for programming development =========
//they come with shortcuts
o notepad++
o gvim ===> Using this throughout the course
o nedit
o emacs
gvim
o G : Graphical
o Vim
GVIM works in two modes
o command mode (thickbox)
o we can do KBD shortcuts only in this mode
o insert mode (|)
o we can type the program or text in this mode
o when cursor is in insert mode(|), don’t enter KBD shortcuts
GVIM
o press escape => command mode
o to enter insert mode: type i or a or ins (various other options)
How do KBD shortcuts work?
o short notation of what we want to do.
o delete word => dw
o paste => p
o change word => cw
KBD shortcut classification
o shortcuts for moving from one part of the file to other
o jumping to specific line number
o shortcuts for copying a code and pasting
o shortcuts for deleting a piece of code
o shortcuts for code replacing(substitution)
o shortcuts for opening multiple files in same window
o shortcuts for making code look better(indentation, numbering)
o shortcuts for doing repetitive activities in simple manner
o shortcut for undoing the things(what we did earlier)
insert(type) mode, command mode, how to move from one to other modes
how to move cursor to the left(h), right(l), top(k), down(j), multiple lines(10j)
l : move cursor to the right one position
4l : move 4 positions to the right
instead of typing kkkkk => 5k
any keyboard shortcut, with a number before => repeat the shortcut that many times
- moving by one space => h,l,k,j
- moving by words => 3w(right side), b(left side)
- moving to end of the line($), beginning of the line(0) =>
- moving to end of the file(G),
- how to move to 1st line(gg), how to move to last line of the file(G)
- Select whole file text content: ggVG
  end of the line($ or End key), beginning of the line(0), next word(w), end of the word(e), previous word(b)
  Moving to specific line number (:20, enter)
shortcuts for copying a code and pasting
copy a character, multiple characters
copy a word(yw), multiple words(nyw)
copy a line(yy), multiple lines(nyy)
copy the entire file content (ggVG, copy)
deleting
delete character(x), multiple characters(nx)
delete word(dw), multiple words(ndw)
delete line(dd), multiple lines(ndd)
delete the entire file content (ggVG, delete)

2:35PM (IST)

#

SESSION#2 NOTES

#

revision:

how to work with GVIM.
o you become good by doing these things yourself

agenda:

Verilog language
o combinational logic
o testbench development
o simulation, check the waveforms
DFT
o very basic level of Verilog
Functional verification
o need to learn Verilog to perfection

notes:

why do we need Verilog language?
o when we have C, C++, Java, ….
o C, C++, Java
o not meant for implementing a hardware behavior
o hardware has requirements which are different from software
o hardware needs concept of time
o hardware needs concept of structure
o hardware needs concept of state and states changing with time
o hardware needs concept of concurrent execution
Verilog
- visualize how the hardware structure looks like
- multiplexor (2×1)
  o list down signal names: i1(input), i0(input), sel(input), y(output)
  o we should know the functionality of multiplexor
  o if sel is 0, y will be i0
  o if sel is 1, y will be i1
gate level style
o write truth table
o use k-maps, come up with Boolean expression
o then implement Boolean expression

o write truth table

i1 i0 sel y

0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1

y = (i0 & ~sel) | (i1 & sel)

we develop a testbench,
- apply inputs to the design
- get the outputs, compare those outputs with expected values
  o if above works, then we can say, mux2x1 is working fine.
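The gate level mux and its testbench described above can be sketched as below. This is only a minimal sketch: the module name mux2x1 comes from these notes, while the instance names (g0 to g3), internal wires and the testbench name mux2x1_tb are assumptions made for illustration.

```verilog
// gate level 2x1 mux: y = (i0 & ~sel) | (i1 & sel)
module mux2x1(input i1, input i0, input sel, output y);
    wire sel_n, t0, t1;
    not g0(sel_n, sel);
    and g1(t0, i0, sel_n);
    and g2(t1, i1, sel);
    or  g3(y, t0, t1);
endmodule

// testbench: apply all 8 input combinations, compare with expected value
module mux2x1_tb;
    reg i1, i0, sel;
    wire y;
    integer k, errors;

    mux2x1 dut(.i1(i1), .i0(i0), .sel(sel), .y(y));

    initial begin
        errors = 0;
        for (k = 0; k < 8; k = k + 1) begin
            {i1, i0, sel} = k[2:0];   // same column order as the truth table
            #1;                       // allow the output to settle
            if (y !== (sel ? i1 : i0)) errors = errors + 1;
        end
        $display("errors = %0d", errors);
        $finish;
    end
endmodule
```

If errors stays at 0 for all 8 rows, we can say mux2x1 is working fine.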
what is simulation? how to run simulation?
- simulation is the process of applying inputs to the design and observing its outputs
- how to run simulation?
  o compile the verilog code
  o elaborate
  o wave(add signals to the waveform)
  o run the simulation
- tools available for simulation purpose
  o modelsim, questasim => mentor graphics
  o vcs => synopsys
  o ncsim, Xcelium => cadence
  o Riviera => aldec
  o ISE (ISim) => Xilinx

6.
use cd to go to the directory, where we need to run the simulation
o compile the verilog code
vlog
o elaborate
vsim
o wave(add signals to the waveform)
add wave
o run the simulation
run -all

laptop
if I ask you to test by pressing only 'a' => is the laptop working properly?
o we need to apply multiple inputs
how to install modelsim, mux code, tb, simulation flow

#

SESSION#3 NOTES

#

scalar and vector
scalar
wire r;
reg a;
vector
o declare a 9 bit vector, whose MSB is 7
reg [7:-1] a;
reg [7:15] a;
o declare a 5 bit vector, whose LSB is -2
reg [-6:-2] b;
reg [2:-2] b;

4. Why do we need vectors?

reg [31:0] addr;
size? 32
what value to assign? 200
in what radix format to assign?

decimal
addr = 200; //32'd200 same as 200
hexa
addr = 32'hC8;
binary
addr = 32'b1100_1000;

reg [14:0] data;
value = 350
assign value in all 4 formats(decimal, hexa, binary, octal)
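The exercise above can be worked out as follows (350 = 256+64+16+8+4+2). This is a small sketch; the module name radix_tb is an assumption, the vector declaration is taken from the notes.

```verilog
module radix_tb;
    reg [14:0] data;
    initial begin
        data = 15'd350;                 // decimal
        data = 15'h15E;                 // hexadecimal
        data = 15'b000_0001_0101_1110;  // binary
        data = 15'o00536;               // octal
        $display("data = %0d", data);   // all four literals hold the same value
    end
endmodule
```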
1 bit FA
metastability
o setup time and hold time
o if there is any violation in setup or hold time => FF enters an unknown state
o unknown state => output can settle to 1 or 0 => metastable state
o setup time
o minimum time before the active edge of the clock for which the d input must be stable
analogy: flight boarding time
o 45 minutes before the boarding time
o you may catch the flight ==> logic 1
o you may miss the flight ==> logic 0
o hold time
o minimum time after the active edge of the clock for which the d input must remain stable
o analogy:
o once we get down from the flight, we have to wait for 10 minutes to collect luggage
vector to vector assignment
a = b; //example of assignment
integer a, b;
b = 20;
a = b; //a? 20

vector also work in the same manner.

busA = busB;
reg [3:0] busA;
reg [5:0] busB;
busA = busB;
busA[0] = busB[0];
busA[1] = busB[1];
busA[2] = busB[2];
busA[3] = busB[3];
what happens to busB[4] & [5]? not connected.
vec_a = vec_b;
vec_a[] = vec_b[]?
in case vec_b holds a 2's complement value, will the top 2 bits of vec_a be taken as 1's?
o no
o vector assignment is just position to position copy

13.
reg [-2:2] vec_a;
5 bit vector
reg [-8:-6] vec_b;
3 bit vector
vec_a = vec_b;
vec_a[2] = vec_b[-6];
vec_a[1] = vec_b[-7];
vec_a[0] = vec_b[-8];
vec_a[-1] = 0
vec_a[-2] = 0
o
14.
reg [-2:2] vec_a; 5 bit vector
reg [-8:-4] vec_b; 5 bit vector
vec_a = vec_b;

vec_b = 125 = 7'b1111101; //64+32+16+8+4+1 = 125
vec_a = vec_b;
vec_a = 125 = 7'b1111101; //bit values below assume a 7 bit declaration: reg [-3:3] vec_a;
vec_a[2] = 0
vec_a[0:3] = 4'b1101
//vec_a[3:0] access will be wrong
vec_a [-3:-1] = 3'b111
vec_a [-2:3] = 6'b111101;
reg [6:0] vec_a;
vec_a[3:0] ? correct
vec_a[0:3] ? wrong

17.
69 to binary
64+4+1
reg [10:3] vec_a;
vec_a = 8’b0100_0101
vec_a[7] = 0
vec_a[10:8] = 3’b010
vec_a[5:3] = 3’b101
//vec_a[3:5] is wrong

vec_b = -69
reg [11:3] vec_a;
69 => 9’b00100_0101
-69 => 9’b11011_1010 + 1 (2’s complement)
= 9’b11011_1011
vec_a = 9’b11011_1011
vec_a[7] = 1
vec_a[10:8] = 3’b101
vec_a[5:3] = 3’b011

19.
reg [3:0] a, b, c;
a=9, b=7, c?

20Q. can we multiply two binary numbers?
o yes

21Q. can we divide two binary numbers?
o yes

how can we measure time using clock?
clock time period = 10ns
50ns/timeperiod(10ns) = 5 (number of edges)
o every electronic device uses same concept of measuring time.
100Mhz
To measure 10sec?
o Not ns
o TP = 1/100Mhz = 1/(100 * 10**6 Hz) = 10**-8 sec = 10ns

Clock freq = 1 Hz => TP = 1/Freq = 1/1 = 1sec
Clock freq = 10 Hz => TP = 1/10 = 0.1 sec
0.1 sec convert to ns => 10**8 ns
1 meter => 1000 mm

Freq = 100Mhz = 10**8 Hz
TP = 1/Freq = 1/(10**8) sec = 10**-8 sec
sec to ns conversion => multiply with 10**9
TP in ns = 10**-8 * 10**9 ns = 10ns

24.
Traffic light controller working at 10KHz
red time = 10 sec
o how many edges to count to be in red time?
o bring it to Hz => divide => sec => convert to req format
yellow time = 20 sec
green time = 50 sec

red :
– 10Khz => TP = 10**-4 sec
10Khz = 10*(10**3) Hz = 10**4 Hz
TP = 1/(10**4) sec = 10**-4 sec
– to measure 10 sec
10 sec/TP = 10 sec/10**-4 sec = 10**5 edges
yellow edges to count?
o 2 * 10**5
green edges to count?
o 5 * 10**5

25.
Traffic light controller working at 1KHz
red time = 10 sec
o how many edges to count to be in red time?
o bring it to Hz => divide => sec => convert to req format
yellow time = 20 sec
green time = 50 sec

red :
– 1Khz =>
1Khz = (10**3) Hz = 10**3 Hz
TP = 1/(10**3) sec = 10**-3 sec
– to measure 10 sec
10 sec/TP = 10 sec/10**-3 sec = 10**4 edges
yellow edges to count?
o 2 * 10**4
green edges to count?
o 5 * 10**4
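The edge counts for both traffic light controllers can be cross-checked with a few lines of Verilog. This is a sketch only: the function name edges and the module name edge_count_tb are assumptions; the frequencies and durations are the ones from the two examples (10KHz and 1KHz; 10, 20 and 50 sec).

```verilog
module edge_count_tb;
    // number of edges = time to measure (sec) / TP (sec) = time (sec) * freq (Hz)
    function integer edges(input integer freq_hz, input integer secs);
        begin
            edges = freq_hz * secs;
        end
    endfunction

    initial begin
        $display("10KHz: red=%0d yellow=%0d green=%0d",
                 edges(10_000, 10), edges(10_000, 20), edges(10_000, 50));
        $display("1KHz : red=%0d yellow=%0d green=%0d",
                 edges(1_000, 10), edges(1_000, 20), edges(1_000, 50));
    end
endmodule
```

For 10KHz this gives 10**5, 2 * 10**5 and 5 * 10**5 edges; for 1KHz each count is ten times smaller.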

on what basis do we decide the clk freq?
o if we want better performance, we go for high frequency clock. high frequency clock results in high power consumption.
o if we want to reduce power consumption, we go for low frequency clock.
TP=10us
frequency in terms of Mhz?
TP = 10us = 10*(10**-6) sec = 10**-5 sec
Freq = 1/(10**-5) Hz = 10**5 Hz
Hz => Mhz (divide with 10**6)
Freq = 10**5/(10**6) MHz = 0.1 Mhz
TP=1ms
frequency in terms of GHz? (G means 10**9)
TP = 1ms => TP = 10**-3 sec
Freq = 1/(10**-3) Hz = 10**3 Hz
to convert to GHz => divide with 10**9
Freq = 10**3/(10**9) GHz = 10**-6 GHz
TP=1sec
frequency in terms of Khz

#

SESSION#4 NOTES

#

all the combinational logic verilog codes
o half adder
o full adder
o multi bit full adder
o 2×1 Mux
o 4×1 Mux
o 8×1 Mux
o Decoder
o Mux implementation using different abstraction levels
o encoder

Questions:

how hardware differs from software
o hardware has structure, software doesn’t have structure
o hardware has concept of concurrently running processes
o typing on KBD, running online meeting, projecting data to projector screen
o hardware has concept of state, which changes with time
o hardware has concept of time
how verilog implements
o concept of time
– by counting the clock edges
– Clock time period = 10ns
o to measure 100ns => 100ns/10ns = 10 => we will count 10 clock edges.
– analogy:
o we don’t have a watch, but we want to measure 720 hours
o count 30 sun rises => 30*24 = 720 hours
o concept of structure
module fulladder(input a, input b, input cin, output s, output co);
endmodule
o concept of concurrent process
o it uses one always block to implement one process
o to implement multiple processes, we use multiple always blocks, all of them running in parallel.
o concept of states
o reg [3:0] state;
How Verilog differs from C language?
o C language can't implement
– concept of time,
– concept of structure,
main (input a, output b) { //not possible in C program
}
– concept of concurrent process
o because of this only, we are using Verilog for Hardware coding, not C language.
EDA
- electronic design automation
How EDA tools make design flow easier?
o EDA tools help by automating most of the VLSI design flow.
o This automation ensures that human efforts are reduced significantly.
o User(VLSI engineer) needs to give only the top level instructions, the tool handles the rest of the flow.
o these tools are useful at every stage starting from RTL design, integration, verification, synthesis, etc
o example:
module dff(input clk, input rst, input d, output reg q); //Verilog code of DFF
always @(posedge clk) begin //Behavioral style of coding
if (rst == 1) q <= 0; //non-blocking assignment for sequential logic
else q <= d;
end
endmodule
Half adder
o two parts
o design coding
o ports: a, b, s, co
o there is no cin
o TB coding
o implement module half_adder with above 4 ports
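A minimal sketch of the two parts (design and TB) is given below. The module name half_adder and the ports a, b, s, co are from the notes; the testbench name half_adder_tb and the loop variable are assumptions.

```verilog
// design: sum is XOR, carry is AND (there is no cin)
module half_adder(input a, input b, output s, output co);
    assign s  = a ^ b;
    assign co = a & b;
endmodule

// TB: apply all 4 input combinations and print the outputs
module half_adder_tb;
    reg a, b;
    wire s, co;
    integer k;

    half_adder dut(.a(a), .b(b), .s(s), .co(co));

    initial begin
        $monitor("a=%b b=%b s=%b co=%b", a, b, s, co);
        for (k = 0; k < 4; k = k + 1) begin
            {a, b} = k[1:0];
            #1;
        end
        $finish;
    end
endmodule
```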
what difference does it make if we use the -novopt and -suppress options, and if we don't?

earlier versions of modelsim required -novopt
vsim -novopt tb
o without -novopt waveform was not loading all signals
o new version of modelsim, -novopt is already there
o -novopt -suppress 12110

how to run one simulation

#

SESSION#5 NOTES

#

clock generation
memory
o verilog code
o testbench

notes:

clock generation
o why clock is important?
if laptop is working, it is only because there is a clock running inside it.
if there is no clock, concept of sequential circuits is not possible.
o used for synchronization between two connected modules
laptop
o processor subsystem
o high frequency clock
o Keyboard controller
o low frequency clock
o laptop itself can have almost 10 to 20 clocks inside it.
two approaches to generate a clock
o verilog forever code
o crystal and PLL
Developing verilog codes: 2 types
o design code => must use synthesizable constructs only
Clock using Verilog forever code.
o
instead of opening waveform, using cursor to get frequency values, can it be done directly from the code itself?
o TB code
freq = f Mhz
freq = f * 10**6 Hz
TP in sec = 1/freq in Hz = 1/(f * 10**6) sec
TP sec to ns = 10**9 * (1/(f * 10**6)) ns = 10**3/f = 1000/f ns
1sec => how many ms is it? 1000ms (10**3)
1sec => how many ns is it? 10**9
how to convert ns to Mhz?
time = t ns
what is freq in MHz = 1000/t MHz
a = 100/b => b = 100/a
200Mhz => TP=5ns
TP/2 = 2.5ns
if the time precision is 1ns => 2.5ns gets rounded off to 3ns
TP/2 = 2.5ns = 3ns
how to fix this problem?
change the time precision
2.5ns => 2.5ns only(not as 3ns)
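The clock generation and the precision fix can be sketched together as below. It also measures the frequency from the code itself, without opening the waveform. This is a sketch under assumptions: the names clk_gen_tb, FREQ_MHZ, tp_ns, t1 and t2 are made up for illustration; the formula TP(ns) = 1000/f(MHz) is the one derived above.

```verilog
`timescale 1ns/1ps  // 1ps precision, so 2.5ns is not rounded off to 3ns

module clk_gen_tb;
    parameter real FREQ_MHZ = 200.0;   // assumed user provided frequency
    real tp_ns;                        // time period in ns
    real t1, t2;                       // timestamps of two rising edges
    reg  clk;

    // clock using forever
    initial begin
        tp_ns = 1000.0 / FREQ_MHZ;     // 200MHz => TP = 5ns
        clk   = 0;
        forever #(tp_ns / 2.0) clk = ~clk;   // toggle every TP/2 = 2.5ns
    end

    // measure the frequency from two consecutive rising edges
    initial begin
        @(posedge clk) t1 = $realtime;
        @(posedge clk) t2 = $realtime;
        $display("TP = %0.2f ns, freq = %0.2f MHz", t2 - t1, 1000.0 / (t2 - t1));
        $finish;
    end
endmodule
```

With `timescale 1ns/1ns the same code would round the 2.5ns half period, which is the problem described above.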
what is jitter?
500Mhz, 5% jitter
freq range = 475 to 525MHz
200Mhz, 10% jitter
freq range = 180 to 220 Mhz

100-jitter+{$random}%(2*jitter+1); //{$random} is unsigned, result is a percentage between 100-jitter and 100+jitter

#

SESSION#6 NOTES

#

Homework:
Generate a clock of 20Mhz frequency
Generate clock with 60% duty cycle
Convert 25KHz clock in to Time period in us(micro seconds).
200Mhz clock in TP in milli seconds(ms)
TP=50ms, what is clock frequency in GHz
Write Verilog code for 3×1 mux(4th selection case, output should hold the value)
Write a Verilog code for 3 bit full adder.
Also write testbench.

Notes:

Project#1:
Title: Clock generation for user provided frequency, duty cycle and jitter
Description: This is Verilog based project for generating the clock as per the user provided variables.
Arrays
o
$monitor is always active
o we just need to call it only once
o $display: prints only once, when it is called
DEPTH=100
WIDTH=32
reg [31:0] memory [99:0];
reg [32:1] memory [100:1];
reg [32:1] memory [1:100];
3200 bits

#

SESSION#7 NOTES

#

concept to learn in month temperature example
- how to generate a random real number between 25 to 35
  $urandom_range(25, 35) => only integer
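One possible way to get a real number in this range is to randomize a scaled integer and divide it. This is only a sketch and not necessarily what was done in class: the names rand_real_tb, temp and i are assumptions, and since $urandom_range is a SystemVerilog function the file needs to be compiled as SystemVerilog.

```verilog
module rand_real_tb;
    real temp;
    integer i;

    initial begin
        for (i = 0; i < 5; i = i + 1) begin
            // integer between 2500 and 3500, scaled down to 25.00 to 35.00
            temp = $urandom_range(3500, 2500) / 100.0;
            $display("temp = %0.2f", temp);
        end
    end
endmodule
```

Dividing by 100.0 (a real constant) keeps two decimal places; dividing by 100 would do integer division and lose them.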
Mango seed

3Q. so if we run the same code from different computer then it will have different seed?
o no
o if you use Questasim in every laptop, all laptop will give exact same pattern for the same code.
o Questasim uses seed as an input to start the randomization pattern
o if same seed is provided, randomization pattern is same in all runs
o if different seed is provided, randomization pattern is different in all runs

4Q. what does the seed value represent?
o starting point of randomization

string

to store 10 chars, course
reg [8*10-1:0] course;

module tb;

7Q. can we use a for loop for instantiation?
o answer is yes => genvar

WIDTH is required at compile time
$value$plusargs => run time concept (by this time, the design structure is already created)

#

SESSION#8 NOTES

#

Hierarchical modeling
1 bitFA -> 1 bit FA -> 4 bit FA => parameterizable FA(genvar)
when we run a simulation, there are 2 stages
o compilation stage => complete structure gets created in this stage
o elaboration stage => simulation process gets initiated(we can’t change the structure at this stage)
vsim tb +WIDTH=12 //not possible
parameter overriding
parameter WIDTH=5;
parameter DEPTH=7;
fa_nbit #(.WIDTH(WIDTH), .DEPTH(DEPTH)) dut(….);
fa_nbit #(WIDTH, DEPTH) dut(….);
reg [3:0] a;
b = &a;
= a[3] & a[2] & a[1] & a[0]

b = ^a;
= a[3] ^ a[2] ^ a[1] ^ a[0] //unary reduction xor operator

5.
^101010111 => 0

6.
a = 6’b001100 (12)
b = a >> 2;
b = 6’b000011 (a is getting divided by 4, when we shift 2 positions)
what is the practical significance?

7.shift operator is not cyclic

left shift means multiply by 2**num_shifts
right shift means division by 2**num_shifts

a = 6’b001100 (12)
b = a << 2;
b = 6’b110000 (48) = 12*4

9.
100101_000000 (64*37)

100101 (1*37)

100101_100101 (65*37)

Multiplication = shift operation + binary addition
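The 37 * 65 example above (65 = 64 + 1) can be checked with a shift and an addition. A small sketch; the names shift_add_tb, a and p are assumptions.

```verilog
module shift_add_tb;
    reg [5:0]  a;   // 6'b100101 = 37
    reg [11:0] p;   // product needs 12 bits

    initial begin
        a = 6'b100101;
        // 37*65 = 37*64 + 37*1 = (37 << 6) + 37
        p = ({6'b0, a} << 6) + a;
        $display("shift+add = %0d, a*65 = %0d", p, a * 12'd65);
    end
endmodule
```

Both values come out as 2405, which is 12'b100101_100101.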

#

SESSION#9 NOTES

#

10/3 = 3 (integer division)
A = 6; (logically true)
B = -9; (logically true)
C = x; (logically unknown)

A && B = true && true = 1 && 1 = 1 (true)
A || !B = true || false = 1 || 0 = 1 (true)
C || B = unknown || true = x || 1 = 1 (true)

3.
79 = 64 + 8+4+2+1 = 8’b0100_1111
54 = 32 + 16+4+2 = 8’b0011_0110
—————-
8’b0000_0110 (A&B)
8’b0111_1001 (A^B)

logical inversion (!)
a = 4'b1010
!a = !(true) = false = 0
a = -9;
!a = 0
a = 4'b10x0;
!a = 0;

bitwise inversion:
a = 4’b1010
~a = 4’b0101

a = -9; (a is 10 bit)
a = 10'b11_1111_0111
~a = 10'b00_0000_1000 (8)

a = 4'b10x0;
~a = 4'b01x1; //x remains x after inversion

unary reduction operators
vector = a
all 0’s ==> or(a) = 0
all 1’s ==> and(a) = 1
at least one 0 => and(a) = 0
at least one 1 => or(a) = 1
odd number of 1’s => xor(a) = 1
even number of 1’s => xor(a) = 0
how the operator usage differs between vectors and arrays?
all operations are possible with vectors.
very limited operations are possible with arrays.
We have a register which is 16 bits, we want to always write [7:4] as always 4’b1111, irrespective of other bit values, how can we implement this using bitwise operators
reg [15:0] a;
a = $random | 16’b0000_0000_1111_0000 => this will ensure that [7:4] positions will always be 1111

same question, I want 7:4 positions to be always 0:
a = $random & 16’b1111_1111_0000_1111 => this will ensure that [7:4] positions will always be 0000

same question, I want to always invert 7:4 positions in original value a, remaining should be same:
a = a ^ 16'b0000_0000_1111_0000 => this will ensure that [7:4] positions will always be inverted
original a = 16’b1011_0111_1000_1110
xor pattern= 16’b0000_0000_1111_0000
—————————–
16’b1011_0111_0111_1110
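The three masking operations (set with OR, clear with AND, invert with XOR) can be tried on the same original value. A short sketch; the names mask_tb, set_b, clr_b and inv_b are assumptions, the value of a is the one used above.

```verilog
module mask_tb;
    reg [15:0] a, set_b, clr_b, inv_b;

    initial begin
        a     = 16'b1011_0111_1000_1110;
        set_b = a | 16'b0000_0000_1111_0000;   // [7:4] forced to 1111
        clr_b = a & 16'b1111_1111_0000_1111;   // [7:4] forced to 0000
        inv_b = a ^ 16'b0000_0000_1111_0000;   // [7:4] inverted, rest unchanged
        $display("set=%b", set_b);
        $display("clr=%b", clr_b);
        $display("inv=%b", inv_b);
    end
endmodule
```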

reg [2:0] catd;
integer f;
catd = {a, b, c, f};
f = 10;
a = 2’b11;
b = 2’b10;
c = 2’b01;
catd =
f = 32’b00000 …1010;
catd is 3 bit only, we will only take lower 3 bits of f => 010

9.
reg [3:0] a, c, d; //4 bits
reg [4:0] b, e, f; //5 bits
{a,b,c,d,e,f} = 32’h1234_5678; //convert this to binary
{a,b,c,d,e,f} = 32’b0001_0010_0011_0100_0101_0110_0111_1000
regroup
{a,b,c,d,e,f} = 32’b00010_0100_01101_0001_0101_10011_11000
what are the values of a to f?
f =
e
d
c
b
a
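The values can be checked by running the concatenation as it is. Note that the left hand side is only 27 bits wide (4+5+4+4+5+5), so the top 5 bits of the 32 bit value are dropped. A sketch; the module name concat_tb is an assumption.

```verilog
module concat_tb;
    reg [3:0] a, c, d;   // 4 bits
    reg [4:0] b, e, f;   // 5 bits

    initial begin
        // LHS is 27 bits: the upper 5 bits (00010) of the RHS are truncated
        {a, b, c, d, e, f} = 32'h1234_5678;
        $display("a=%b b=%b c=%b d=%b e=%b f=%b", a, b, c, d, e, f);
    end
endmodule
```

From the regrouped pattern: a=0100, b=01101, c=0001, d=0101, e=10011, f=11000.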

#

SESSION#10 NOTES

#

relational operators
== : logical, === : case
!= : logical, !== : case

4’b1z0x == 4’b1z0x -> x
4’b1z0x != 4’b1z0x -> x

4’b1z0x === 4’b1z0x -> 1
4’b1z0x !== 4’b1z0x -> 0
4’b1z0z !== 4’b1z0x -> 1 (case inequality)
4’b1z0z === 4’b1z0x -> 0 (case equality)

3Q. 10z0 === 1z00 ; ?// will it be 0?
o yes, 0 (bit [2] is 0 vs z, bit [1] is z vs 0, so the patterns differ)

always @(posedge clk) begin
q = a & b;
end
q must be a reg,
everything else must be declared net(wire, wand, wor, tri)
inout [3:0] a;
initial begin
a = 10; //not possible
end

solution:
inout [3:0] a;
reg [3:0] a_t;

assign a = a_t;
initial begin
a_t = 10; //possible
end

6.
team1
always @(clk) begin
p1;
p2;
p3;
p4;
end

team2
always @(clk) begin
p5;
p6;
p7;
p8;
end

7.
fork
begin
a = b + c;
end
d = 10;
begin
e = 15;
end
join

#

SESSION#11 NOTES

#

1.
module fa(a, b, ci, s, co);
input [2:0] a, b;
input ci;
output co;
output [2:0] s;
wire co;
wire [2:0] s;
assign {co, s} = a+b+ci;
//co : scalar
//s : vector
endmodule

2.
Implement 4×1 mux using assign.
o TB for both 4×1 mux
Implement 8×1 mux using assign.
o TB for both 8×1 mux
o reuse above 4×1 TB, update => 2min
o implement $monitor
o analyse waveform for Mux behavior

#

SESSION#12 NOTES

#

4×1 mux using assign
8×1 mux using assign

3
assign y = s2 ? (s1 ? (s0 ? i7 : i6) : (s0 ? i5 : i4)) : (s1 ? (s0 ? i3 : i2) : (s0 ? i1 : i0));

bufif0
buffer if enable is 0
4×1 mux using gates
o

6.
always @(signal1 or signal2) begin //there can be 4 combinations
..
end
– signal1 : 0 -> 1, 1 -> 0
– signal2 : 0 -> 1, 1 -> 0

always can infer 3 types of logic
o combinational logic
o sequential logic
o latch logic
assign always infers combinational logic
what does below code infer?
3 bit FA
always @(posedge clk) begin //this infers sequential logic => Flip flops
{co, s} <= a + b + cin; //this infers Full adder
end

it is a combination of full adder and sequential logic
what is sequential logic?
o Flipflops
o how many FFs are required?
o 4 FF’s (3 for sum, 1 for CO)

treatment is ongoing
o even if we feel sensitivity, we don't come out and go again.
Write a Verilog program for swapping two integers

integer a, b;

non-blocking statements introduce concept of temporary variable.
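A sketch of the swap using non-blocking statements is given below. The integers a and b are from the notes; the clock, the initial values and the module name swap_tb are assumptions for illustration.

```verilog
module swap_tb;
    integer a, b;
    reg clk;

    // both right hand sides are sampled first (like temporary variables),
    // then both left hand sides are updated
    always @(posedge clk) begin
        a <= b;
        b <= a;
    end

    initial begin
        clk = 0; a = 10; b = 20;
        #5 clk = 1;                        // one rising edge triggers the swap
        #1 $display("a=%0d b=%0d", a, b);  // a=20 b=10
        $finish;
    end
endmodule
```

With blocking statements (a = b; b = a;) both variables would end up as 20, which is why a separate temporary variable is needed in that style.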

13.
do always blocks execute sequentially?

#

SESSION#13 NOTES

#

questions:

always @(in or sel) begin
case (sel)
0:
out = in[0];
1:
out = in[1];
2:
out = in[2];
default:
out = in[3];
endcase
end
check if 100 is prime
2 to 99
2 to 50 => this also not required
2 to 33 => this also not required
2 to 25 => this also not required
2 to 20 => this also not required
2 to 10 => sufficient (square root of 100)

3.
i = 2;
while (i <= num) begin
prime_f = 1; //assume the number is prime
for (j = 2; j*j <= i; j=j+1) begin //check divisors up to square root of i
if (i%j == 0) prime_f = 0;
end
if (prime_f == 1) begin
$display("prime number=%0d",i);
end
i=i+1; //must be inside the while loop
end

#

SESSION#14 NOTES

#

case, casez, casex
casez
o if z is there either in the case expression or in a case branch, those positions will be ignored during comparison (those positions are don't care)
o instead of z, we can also use ?
? : don’t care
interrupt handling requires priority handling
interrupt?
I am teaching
3 of you have questions(std2, std1, std0)
all 3 ask at same time.
all of you are interrupting my session.
std2 > std1 > std0 ==> priority?? ==> can I use casez? yes
what level of don't care do I want?
z only as don't care => casez
both x and z to be don't care => casex
what if I want only x to be don't care, but not z?
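The student interrupt example (std2 > std1 > std0) can be sketched with casez as below. The module name intr_priority and the ports req, grant_id and valid are assumptions made for illustration.

```verilog
// req[2] = std2, req[1] = std1, req[0] = std0; std2 has the highest priority
module intr_priority(input [2:0] req, output reg [1:0] grant_id, output reg valid);
    always @(*) begin
        valid = 1'b1;
        casez (req)
            3'b1??: grant_id = 2'd2;   // std2 asked, others don't care
            3'b01?: grant_id = 2'd1;   // std1 asked, std0 don't care
            3'b001: grant_id = 2'd0;   // only std0 asked
            default: begin             // nobody asked
                grant_id = 2'd0;
                valid    = 1'b0;
            end
        endcase
    end
endmodule
```

When all 3 ask at the same time (req = 3'b111), the first branch matches and std2 is served.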
pipelining
o how circuit logic can be divided in to sequential and combinational logic.
o how it helps improve overall efficiency of the system
where pipelining concept is used?
all the processors
7575 + 4784 => 10 secs
to do 10 such additions => 100 secs
to process one input it requires = 150ns
to process five inputs it requires = 750ns
can we do something to improve => that concept is called pipelining
Verilog coding for any complex design is about the ability to divide the whole logic in to smaller combinational logics which are divided by sequential logic.

#

SESSION#15 NOTES

#

revision:

pipelining
o dividing combinational logic with multiple stages of FF’s in between
o it helps process more inputs concurrently
casex, casez

agenda:

shift registers
timescale
intra and inter delay statements
system tasks, functions
compiler directives

Notes:

shift registers
o back to back connected FF’s
o used for shifting the data
shift registers can also be used as synchronizers during clock domain crossing.
o when a signal moves from one clock domain to another
o it can result in metastability due to setup or hold time violations.
o to address these metastability issues, synchronizers are used.

analogy:
– catching a running bus
o hold the bus, run along with it, catch up the speed of bus, then get in to bus.
o 2 seconds
o 10 seconds

- 1st stage of synchronizer can have 'x' value
    - x means some voltage in between, not exactly VDD or Ground
    - as we take this x(some voltage in between) through multiple stages, it gets stabilized. Then we have proper output.

time scale
`timescale 1ns/10ps
1ns : timestep
10ps : precision

S1, S2, S3, S4
Morning walk: S1
o Morning walk is a simulation event which is scheduled in to 1st part of the timestep
o time step: 1 full day
o time step: 4 stages

evening tea: S3
o evening tea is a simulation event which is scheduled in to 3rd part of the timestep

9/March: timestep
o S1 : what all activities => complete those activities
o S2 : what all activities
o S3
o S4 => 10/March(next timestep)

Simulator also works in this manner
a = b + c; //one simulation event
a <= b + c; //one simulation event
timestep
1 month : 30 timestep(assumption: 1 day is one timestep)
1 month : 720 timestep(assumption: 1 hour is one timestep)
o every hour I have to keep looking at my day planner => I end up spending a lot of time looking in to my planner
1 month : 4 timestep(assumption: 1 week is one timestep)
o one week is not a good option.
as we reduce the timestep value => accuracy increases, but speed gets impacted
as we increase the timestep value => accuracy decreases, but speed gets increased
my daily planner:
- 15 activities to do
  1 hour as timestep: even then I have 15 activities during 24 timesteps
  1 day as timestep: I have 15 activities during 1 timestep

when we set up TBs, we have to manually give timescale values
`timescale 1ns or 10ns or 100ns
how much time precision to use?
time step=1ns ==> more accuracy, simulation speed will be impacted => 1 hour to complete simulation
time step=10ns ==> reduced accuracy, simulation speed will be improved => 45 minutes
time step=1ps ==> lot more accuracy, simulation speed will be greatly reduced => 5 hours
generate a clock of 30Mhz frequency
intra and inter delay statements
System task and functions
o task and functions provided by system(~language)
$display: from language
$monitor: from language
$readmemh: from language
$readmemb: from language
$writememb: from language
$writememh: from language
- 30+ system tasks are there =>

#

SESSION#16 NOTES

#

users don't need to implement them, they just need to know 'how to use them'
$readmemh("image.hex", dut.mem, start_location, end_location);
$writememh("image.hex", dut.mem, start_location, end_location);
%m
- prints the hierarchical name of the current scope
write a logic to generate a random number between 200 to 300, using seed
250 + $random(seed)%51
-50 to 50
module top(input real r); //real ports are not allowed in Verilog
endmodule
module top(input [63:0] r_vec);
real r;
always @(r_vec) r = $bitstoreal(r_vec); //converting bits to real
endmodule
logarithms
o log2(128) => 7
2**n = 128 => what is n? log output
system task categories
o display
o simulation time related
o file handling
o conversion
o memory backdoor access
o randomization
o log calculation
o reading user arguments
o simulation stop and finish
o
$onehot0(7’b1100_000) => 0
$onehot0(7’b1000_000) => 1
$onehot0(7’b0000_000) => 1
$onehot(7’b1100_000) => 0
$onehot(7’b1000_000) => 1
$onehot(7’b0000_000) => 0
$countones(7’b1100_000) => 2
$countones(7’b1000_000) => 1
$countones(7’b0000_000) => 0
I want to declare 32 bit vector
`define BUS_32 reg [31:0]
`BUS_32 vec1, vec2;
reg [31:0] vec1, vec2;

I want BUS of variable sizes?
o parameterized macro
`define BUS(WIDTH) reg [WIDTH-1:0]
`BUS(64) vec1;

#

SESSION#17 NOTES

#

questions:

define a macro WIDTH with value 100 in run.do
what command to use?
compile top.sv with WIDTH macro = 100
vlog top.sv +define+WIDTH=100
XMR
Analogy:
office room: hp_laptop, dell_laptop
3rd floor, room_no_2: epson_projector

building.ground_floor.office_room.hp_laptop
building.ground_floor.office_room.dell_laptop
building.3rd_floor.room_no_2.epson_projector

3.
module mux2x1(a, b, sel, y);
input a, b; //size of inputs = 1
output y;
input sel;

assign y = sel ? a : b;
endmodule

//8 bit input multiplexor => I don't need to write 8 multiplexor Verilog codes, change scalar to vector
module mux2x1(a, b, sel, y);
input [7:0] a, b; //size of inputs = 8
output [7:0] y;
input sel;

assign y = sel ? a : b;
endmodule

reg [15:0] a = 16’h1234;
b = ? 16’h3412
nibble swapping
- 4 bits
  b = {};
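The concatenation operator can do this swapping by picking the 4 bit pieces in a new order. The sketch below shows one way to get 16'h3412 from 16'h1234, and for comparison a swap of the two nibbles inside each byte. The names swap_bits_tb and c are assumptions.

```verilog
module swap_bits_tb;
    reg [15:0] a, b, c;

    initial begin
        a = 16'h1234;
        // a[15:12]=1, a[11:8]=2, a[7:4]=3, a[3:0]=4
        b = {a[7:4], a[3:0], a[15:12], a[11:8]};   // 16'h3412
        c = {a[11:8], a[15:12], a[3:0], a[7:4]};   // 16'h2143
        $display("b=%h c=%h", b, c);
    end
endmodule
```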
inputs are always ‘wire’
synchronous reset DFF
asynchronous reset DFF
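The difference between the two is only in the sensitivity list, as the sketch below shows. The module names dff_sync and dff_async are assumptions; an active high reset is assumed in both.

```verilog
// synchronous reset: rst is looked at only on the clock edge
module dff_sync(input clk, input rst, input d, output reg q);
    always @(posedge clk) begin
        if (rst) q <= 1'b0;
        else     q <= d;
    end
endmodule

// asynchronous reset: rst takes effect immediately, without waiting for clk
module dff_async(input clk, input rst, input d, output reg q);
    always @(posedge clk or posedge rst) begin
        if (rst) q <= 1'b0;
        else     q <= d;
    end
endmodule
```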
How to decide to use if else or case?
o there is no difference, both of them give same behavior

#

SESSION#18 NOTES

#

memory declaration
2KB memory, 16 bit width

2KB :
Size in bits = 2*1024*8 bits
WIDTH=16
SIZE = DEPTH * WIDTH
DEPTH = SIZE/WIDTH = 2*1024*8/16 = 1024
reg [15:0] mem[1023:0];

16KB memory with WIDTH=32
SIZE = 16*1024*8
DEPTH = 16*1024*8/32 = 4096
reg [31:0] mem[4095:0];
homework
Declare a memory of 1KB size, each element size of 16 bits(~WIDTH)
convert the size in to bits
SIZE = DEPTH * WIDTH
1KB = 1024*8 bits, SIZE = 8192 bits, WIDTH=16, DEPTH=SIZE/WIDTH = 512
reg [WIDTH-1:0] mem [DEPTH-1:0];
reg [15:0] mem[511:0];
Declare a memory of 256 bits size, whose depth is 64 (~DEPTH)
WIDTH = SIZE/DEPTH = 256/64 = 4
reg [3:0] mem[63:0];
Declare 16KB memory, mem width is 32
SIZE = 16*1024*8
WIDTH=32
DEPTH = 16*1024*8/32 = 1024*4 = 4096
What should be address size? 4096 locations require 12 bits(2**12 = 4096)
Declare 1 GB memory of width of 32
What is address port size to access this memory?
Declare a byte address memory of 1KB size.
What is address port size to access this memory?
what all ports are required to access the memory contents?
o clock, reset
o addr, wdata, rdata
o wr_rd
o wr_rd = 1 => we are writing to memory
o wr_rd = 0 => we are reading from memory
o handshaking signals
o valid=1
o driven by TB to indicate to the memory that, I want to do transaction to you(write or read tx)
o ready=1
o driven by memory to indicate to the TB that, I am ready to complete transaction
o what happens if there are no handshaking signals?
o memory won't know when transaction is going to happen.
o write transaction?
o TB performing a data write in to the memory location
o read transaction?
o TB performing a data read from a specific location in the memory
o why ready signal is important?
o ready is the way for memory to tell its status.
directions
wdata_i
rdata_o
wr_rd_i
wr_rd_i=1 => write tx
wr_rd_i=0 => read tx
valid_i
ready_o
Handshaking concept is involved in many protocols
o AXI, AHB, APB, OCP, Wishbone => Every protocol involves handshaking.
valid-ready
for handshaking to happen how many components are required?
o 2 components
o one component gives valid=1
o other component responds by making ready=1 => this is when handshaking completes.
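A memory with the valid-ready handshake can be sketched as below. This is a minimal sketch and not the reference design of the course: the port names wdata_i, rdata_o, wr_rd_i, valid_i and ready_o are from the notes, while clk_i, rst_i, addr_i, the module name memory and the default parameter values are assumptions.

```verilog
module memory #(
    parameter WIDTH      = 16,
    parameter DEPTH      = 512,
    parameter ADDR_WIDTH = 9            // 2**9 = 512 locations
)(
    input                       clk_i,
    input                       rst_i,
    input      [ADDR_WIDTH-1:0] addr_i,
    input      [WIDTH-1:0]      wdata_i,
    output reg [WIDTH-1:0]      rdata_o,
    input                       wr_rd_i,   // 1: write tx, 0: read tx
    input                       valid_i,   // driven by TB
    output reg                  ready_o    // driven by memory
);
    reg [WIDTH-1:0] mem [DEPTH-1:0];
    integer i;

    always @(posedge clk_i) begin
        if (rst_i) begin
            rdata_o <= 0;
            ready_o <= 0;
            for (i = 0; i < DEPTH; i = i + 1) mem[i] <= 0;
        end
        else if (valid_i) begin
            ready_o <= 1;                              // handshake completes
            if (wr_rd_i) mem[addr_i] <= wdata_i;       // write to memory
            else         rdata_o     <= mem[addr_i];   // read from memory
        end
        else begin
            ready_o <= 0;
        end
    end
endmodule
```

Because rdata_o is assigned inside the clocked always block, read data shows up one clock cycle after the address is issued.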
indentation
o makes you efficient

9Q. inside a for loop we wrote 3 lines, can we do this without begin end?
o no, without begin end only the first statement belongs to the loop

#

SESSION#19 NOTES

#

is there a valid tx?
valid=1, ready=1 => hence valid tx is there
is it write or read?
o wr_rd_i = 1 => hence write
if write, at what addr and what data?
addr = 2e
data = 4923
by using parameters, we are able to reduce the coding effort.
if somewhere we set some variable to '1', there must be some other place where it is set back to '0'
Homework
Write memory Verilog code instantiation with parameter overriding.
Only instantiate memory as dut(that code only)
Parameters: WIDTH_TB(Design: WIDTH), DEPTH_TB, ADDR_WIDTH_TB
Implement the memory reset logic Verilog code
rst part of the design code
Write design logic for writing to the memory and reading from the memory
Write TB code for writing to all the locations of memory with random data.
What happens if we don't override the parameters from TB in to the design?
In complex examples, a parameter can be present in 10 files.
What are the two things that TB is meant for?
What is the difference between $display and $monitor?
When do we say handshaking is complete, when valid and ready signals are used?
Why doesn't the TB top module require any ports?
Why are design inputs declared as reg in TB code?
What updates are required to make the Verilog code behave like a 1KB memory with 32 width?
In waveform
How to change a signal radix to unsigned?
How to zoom in to waveform?
How to move cursor to specific time?
How to search for a value?
Why is read data delayed by 1 clock cycle after the address is issued?

#

SESSION#20 NOTES

#

memory concepts
Why Testbench is required?
without TB, we don’t know whether memory RTL is working correctly or not.
TB enables us to check the design behavior.
What is testcase? How it differs from testbench?
Test case is the stimulus we are applying to the design.
Test bench is the platform that enables test cases to be run.
How user passes testcase information to simulator? Which testcase to run?
Test name is passed as elaboration argument
not compile argument
vsim work.tb +testname=
$value$plusargs("testname=%s",testname);
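A minimal sketch of how the TB can pick the testcase from the `+testname=` argument. The test names `test_fd_wr_rd` and `test_bd_wr_rd` and the 32-character limit are hypothetical, used only to show the pattern.

```verilog
module tb;
  reg [8*32-1:0] testname;   // room for 32 characters (8 bits per character)

  initial begin
    // $value$plusargs returns 0 when +testname=... was not given on the command line
    if (!$value$plusargs("testname=%s", testname)) begin
      $display("No +testname=<name> given, nothing to run");
    end
    else if (testname == "test_fd_wr_rd") begin
      $display("Running front door write/read test");
      // front door stimulus would be called here
    end
    else if (testname == "test_bd_wr_rd") begin
      $display("Running back door write/read test");
      // back door stimulus would be called here
    end
    else begin
      $display("Unknown testname: %0s", testname);
    end
    $finish;
  end
endmodule
```

Since the name is read at run time, the same compiled TB can be rerun with a different `+testname=` value without recompiling.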
How front door access is implemented in testbench?
Implemented by driving the design ports
addr, wdata, valid, wr_rd
How back door access is implemented in testbench?
Implemented using system tasks like $readmemh/b, $writememh/b
What are the different system tasks used?
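A small sketch of back door access, assuming a local array `mem` that stands in for the design memory (in a real TB this would be a hierarchical name such as `dut.mem`, which is an assumption here). The file name `image.hex` is also hypothetical. Note that `$writememh` comes from the SystemVerilog standard, though most simulators accept it in Verilog TBs.

```verilog
module tb_backdoor;
  reg [7:0] mem [0:15];   // stands in for the design memory
  integer i;

  initial begin
    // Fill the array and dump it to a file (back door read of the memory).
    for (i = 0; i < 16; i = i + 1) mem[i] = i * 3;
    $writememh("image.hex", mem);

    // Clear the array, then load it again from the file (back door write).
    for (i = 0; i < 16; i = i + 1) mem[i] = 8'h00;
    $readmemh("image.hex", mem);

    // No design ports were driven; check that the contents came back.
    if (mem[5] == 8'd15) $display("back door load OK: mem[5] = %0d", mem[5]);
    else                 $display("back door load FAILED: mem[5] = %0d", mem[5]);
  end
endmodule
```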

notes:
1.
initial begin
task fd_write(); //wrong: a task cannot be declared inside an initial block
endtask
end

2.
task fd_write();
begin
end
endtask
initial begin
fd_write();
end

where to use task and where to use function?
o
function integer sum(input integer a, input integer b);
begin
sum = a+b;
end
endfunction
static, automatic
analogy:
– apartments
o each flat: living room, balcony, kitchen, etc ==> automatic variables
o swimming pool, gym: shared => static variables
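The apartment analogy can be sketched in code as below. Both functions are made up for illustration: `running_total` is static (one shared copy of `total`, like the swimming pool), and `factorial` is automatic (each call gets its own copy of `n`, like each flat having its own kitchen).

```verilog
module tb_static_automatic;
  integer x;

  // Static (default): 'total' is shared and keeps its value between calls.
  function integer running_total(input integer v);
    integer total;
    begin
      if (v == 0) total = 0;          // v == 0 is used here as a "clear" request
      else        total = total + v;
      running_total = total;
    end
  endfunction

  // Automatic: every call has private storage, so recursion works.
  function automatic integer factorial(input integer n);
    begin
      if (n <= 1) factorial = 1;
      else        factorial = n * factorial(n - 1);
    end
  endfunction

  initial begin
    x = running_total(0);
    x = running_total(10);
    x = running_total(5);
    $display("running_total = %0d", x);            // prints 15
    $display("factorial(5)  = %0d", factorial(5)); // prints 120
  end
endmodule
```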

#

SESSION#21 NOTES

#

finite state machine
o state machine with finite number of states
infinite state machine
o state machine with infinite number of states
construct a house with infinite number of bricks => is it possible? No
o infinite state machine => infinite states => infinite number of FF’s => can’t be manufactured
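module dff(clk, rst, d, q);
input clk, rst, d;
output reg q;
//synchronous reset; non-blocking assignments are used for sequential logic
always @(posedge clk) begin
if (rst==1) q <= 0;
else q <= d;
end
endmodule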

5.
What is a state machine?
Difference between Mealy and Moore state machine?
What are the different encoding styles?
Which one is preferred?
What is difference between finite state machine and infinite state machine?
What is implicit and explicit state machine?
Which one is preferred?
Every sequential circuit compulsorily has a state machine.
How many states are required to implement a 10110 pattern detector using Mealy and Moore style?
Why state machine is very important in Verilog design implementation?

pattern detector
- how to draw a state machine or state diagram
- how to implement Verilog code by referring to the state diagram
- one hot encoding, binary encoding
- mealy and moore state machine
- how to write a TB for verifying state diagram
coding
70 to 80% of time spent on developing the algorithm
20 to 30% of time spent on implementing the algorithm
what is pattern detector?
why FSM is required?
what S_RESET, S_B, S_BB, etc indicate?

9Q. where is valid_i used in design module?

10.
pattern_to_detect = S_BBCBB
cur_state = S_BBCB
We get a C?
what is best possible match?
cur_state = S_BBCB
when a C (car) comes => BBCBC (what we have)
– why we ignore LSB’s in above?
pattern_to_detect = BBCBB
– why we ignore MSB’s in above?

compare all 5?
– do they match? no
compare 4?
– BCBC (what we have: considering last 4 vehicles)
– BBCB
– this doesn’t match
– then check for matching 3
compare 3?
CBC
BBC don’t match
– try compare 2 vehicles
compare 2?
pattern = BC
pattern_to_detect = BB
are they matching? No
compare 1?
don’t match
then go to S_RESET state.

11.
pattern_to_detect = S_BBCBC
cur_state = S_BBCB
We get a B?
what we have = BBCBB
o best possible match?
S_BB

Industry always prefers ‘one hot’ encoding.
- 99.9% of times
- binary encoding can result in unwanted states
- one-hot encoding is safe
Every state machine which uses binary encoding can result in glitch conditions.
o hence industry never uses binary encoding.

14.
What is binary encoding?
What is one hot encoding?
What is the advantage of one-hot encoding over binary encoding?
What is draw back of one-hot encoding?
It requires more FF’s
64 state machine
Binary encoding: 6
One-hot encoding: 64
Industry will still go with one-hot, because correct behavior is more important than saving the cost.
Explain how binary encoding can result in glitch conditions?
How many FFs are required to implement FSM with 15 states using binary and one hot encoding styles?
Industry uses one hot encoding since we are more concerned about design behaving properly compared to saving some flipflops.
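The FF counts can be cross-checked with a small sketch. It assumes a simulator that supports `$clog2` (Verilog-2005); the function and task names are made up for this example.

```verilog
module tb_ff_count;
  // Binary encoding needs ceil(log2(states)) flip-flops (at least 1).
  function integer binary_ffs(input integer states);
    binary_ffs = (states <= 2) ? 1 : $clog2(states);
  endfunction

  // One-hot encoding needs one flip-flop per state.
  task show(input integer states);
    $display("%0d states: binary = %0d FFs, one-hot = %0d FFs",
             states, binary_ffs(states), states);
  endtask

  initial begin
    show(64);   // binary = 6, one-hot = 64
    show(15);   // binary = 4, one-hot = 15
  end
endmodule
```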

15.
BCCBC

pending

overlapping
dynamic pattern detector
implicit style of coding

#

SESSION#22 NOTES

#

20 states
- How many FFs required in Binary encoding? 5
  2**1 = 2, 2**2 = 4
  2**3 = 8, 2**4 = 16
  2**5 = 32
37 states
- How many FFs required in Binary encoding? 6
- How many FFs required in one-hot encoding? 37
what is the difference between overlapping and non-overlapping pattern detector?
pattern_to_detect = 10110
I get 38 bits in series: 10101011011000010111101101101100011100
non-overlapping: 10101011011000010111101101101100011100
o once the last bits are used for a pattern detection, they won’t be used for a new pattern detection.
o new pattern detection has to start afresh, by not considering previous bits.
overlapping: 10101011011000010111101101101100011100
o the last set of bits used for the earlier pattern detection can also be used for a new pattern detection.
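The difference can be tried out with a sketch of a 10110 detector where a parameter selects overlapping or non-overlapping behavior. The module, port and state names are assumptions for illustration; the TB feeds the bit stream quoted above into both versions and counts the detections.

```verilog
module pattern_det #(parameter OVERLAP = 1) (
  input      clk_i, rst_i, valid_i, d_i,
  output reg detect_o
);
  // one-hot state encoding; each name is the part of 10110 matched so far
  localparam [4:0] S_RESET = 5'b00001, S_1    = 5'b00010, S_10 = 5'b00100,
                   S_101   = 5'b01000, S_1011 = 5'b10000;
  reg [4:0] state;

  always @(posedge clk_i) begin
    if (rst_i) begin
      state    <= S_RESET;
      detect_o <= 0;
    end
    else begin
      detect_o <= 0;
      if (valid_i) begin
        case (state)
          S_RESET: state <= d_i ? S_1    : S_RESET;
          S_1:     state <= d_i ? S_1    : S_10;
          S_10:    state <= d_i ? S_101  : S_RESET;
          S_101:   state <= d_i ? S_1011 : S_10;
          S_1011:  if (d_i) state <= S_1;
                   else begin
                     detect_o <= 1;   // 10110 seen
                     // overlapping: the trailing "10" is reused as a new start
                     state <= OVERLAP ? S_10 : S_RESET;
                   end
          default: state <= S_RESET;
        endcase
      end
    end
  end
endmodule

module tb_pattern_det;
  reg clk = 0, rst = 1, valid = 0, d = 0;
  wire det_ov, det_nov;
  integer i, n_ov = 0, n_nov = 0;
  reg [37:0] stream = 38'b10101011011000010111101101101100011100;

  pattern_det #(.OVERLAP(1)) u_ov  (.clk_i(clk), .rst_i(rst), .valid_i(valid), .d_i(d), .detect_o(det_ov));
  pattern_det #(.OVERLAP(0)) u_nov (.clk_i(clk), .rst_i(rst), .valid_i(valid), .d_i(d), .detect_o(det_nov));

  always #5 clk = ~clk;

  // count detections on the falling edge, away from the active clock edge
  always @(negedge clk) begin
    if (det_ov)  n_ov  = n_ov  + 1;
    if (det_nov) n_nov = n_nov + 1;
  end

  initial begin
    repeat (2) @(negedge clk);
    rst = 0;
    for (i = 37; i >= 0; i = i - 1) begin   // leftmost bit is sent first
      valid = 1; d = stream[i];
      @(negedge clk);
    end
    valid = 0;
    @(negedge clk);
    $display("overlapping = %0d, non-overlapping = %0d", n_ov, n_nov);  // 5 and 3
    $finish;
  end
endmodule
```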
why we don’t prefer binary encoding? one hot encoding?
o either learn now, or never learn
binary encoding:
o when moving from one state to another, there is a possibility of intermediate states. These unwanted states can result in glitch conditions.
Current pattern: BBCB
Pattern to detect: BBCBC
we get a bike? BBCBB
o which is the best match? S_BB
why we need valid_i from sensor?
valid_i is used to validate the data going from sensor to the pattern detector.
- if valid_i=1 => data_in coming to pattern detector is valid.
- if valid_i=0 => data_in coming to pattern detector is invalid.
  why is it required?
  o if valid_i is not used
  o data_in default value = 0 => pattern detector will always assume that a bike is going, even if no vehicle is going.
  valid_i = 1 => only then pattern detector will analyse the value of Data_in.
what is advantage of parameters in pattern detector coding?
o code becomes readable and reusable for different requirements.
what are the signals(ports) at sensor interface?
o clk, rst, valid, data
If pattern detector requires one process, how many always will be required?
o 1 always
when reset is applied to the design, what design should do?
o all reg variables should be driven to reset values(most of the times it is 0)
from the TB, when reset is applied, what should be done inside TB?
- all design inputs should be driven 0

at reset: TB drives DUT inputs to 0’s
DUT drives DUT outputs to 0’s
Summary: All signals should get reset.

what is the stimulus for a pattern detector?
o Random pattern for d_in is the stimulus.
d_in is not the stimulus.
valid_i is not the stimulus
vector assignment
reg [2:0] a;
reg [5:0] b;
b = 6'b100110;
a = b;
a? 3'b110 (only the lowest 3 bits of b are kept)
d_in => 1 bit
$random: 32 bit
lowest 1 bit is used for the assignment
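The truncation rule can be confirmed with a short sketch (the module name and the helper variable `r` are made up for this example):

```verilog
module tb_truncate;
  reg [2:0] a;
  reg [5:0] b;
  reg       d_in;
  integer   r;

  initial begin
    b = 6'b100110;
    a = b;                    // 6 bits into 3: b[5:3] is dropped, b[2:0] is kept
    $display("a = %b", a);    // prints a = 110

    r    = $random;           // 32-bit value
    d_in = r;                 // 32 bits into 1: only r[0] is kept
    if (d_in === r[0]) $display("d_in took bit 0 of $random");
  end
endmodule
```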
is it always overlapping only?
o yes, current dynamic_pattern code is overlapping
can we make it non-overlapping?
o
non-overlapping
o once a pattern is detected, at least for the next 5 clock cycles, don’t do any comparison
o we are still using bits from previous compared pattern.
o how to ensure that, we don’t use those bits from previous compared pattern?
o ignore 5 comparisons
In interview, if someone asks code pattern detector?

19Q. Is the design ignoring the data input for the first 5 clk cycles?
o BCBCB

FIFO
o

#

SESSION#23 NOTES

#

FIFO is required
o where fast producer is giving data to slow consumer.
write_pointer
o where the write data should be stored in the FIFO buffer space
read_pointer
o where the read data should be read from in the FIFO buffer space
FIFO can be implemented
o without pointers => inefficient
o with pointers => efficient
with every write happening, increment wr_ptr by +1 once write completes
with every read happening, increment rd_ptr by +1 once read completes
FIFO is similar to memory
o how do they differ?
o FIFO access in order
o doesn’t require ‘addr’ port
o wr_ptr and rd_ptr will take care of address location, hence TB doesn’t need to provide this.
o memory access can be random
how wr_ptr, rd_ptr, full and empty are related to each other
- this is all required for Verilog coding of FIFO
bank just opened
EMPTY = 1
FULL = 0
Toggle flag
o whenever wr_ptr goes from 15->0 => wr_toggle_f should be toggled.
wr_toggle_f = ~wr_toggle_f
o whenever rd_ptr goes from 15->0 => rd_toggle_f should be toggled.
rd_toggle_f = ~rd_toggle_f
if 16 people came in
wr_ptr = 0, wr_toggle_f = 1
rd_ptr = 0, rd_toggle_f = 0
=> Full
12 writes, 12 reads
wr_ptr = 12, wr_toggle_f = 0 (roll over didn’t happen)
rd_ptr = 12, rd_toggle_f = 0 (roll over didn’t happen)
=> Empty
12 writes happened
15 reads happened
o invalid scenario

14.
40 writes happened => 2 toggles
12 reads happened => 0 toggles
– difference between writes and reads can at most be the DEPTH of the FIFO
– at most, toggles can differ by ‘1’
toggle_f is a 1 bit variable

15.
12 writes happened
9 reads happened
o possible
o neither empty, nor full

16.
40 writes happened
40 reads happened
o possible
o wr_ptr = 8, wr_toggle_f = 0
o rd_ptr = 8, rd_toggle_f = 0
o empty or full? empty

what if we don’t use pointers(wr_ptr and rd_ptr)?
o FIFO implementation will become inefficient
what if we don’t use full and empty signals?
o external components don’t have any indication of FIFO status. They will perform writes to a full FIFO and reads from an empty FIFO, which is wrong.
what if we don’t use toggle flags?
o we won’t be able to generate full and empty signals properly
Why pointer is 4 bit only?
o because DEPTH=16
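A sketch of how the pointers and toggle flags can generate full and empty in a synchronous FIFO. Signal names follow these notes where they exist (wr_ptr, rd_ptr, wr_toggle_f, rd_toggle_f); the remaining port names and the 8-bit data width are assumptions.

```verilog
module sync_fifo #(parameter WIDTH = 8, DEPTH = 16, PTR_WIDTH = 4) (
  input                  clk_i, rst_i, wr_en_i, rd_en_i,
  input      [WIDTH-1:0] wdata_i,
  output reg [WIDTH-1:0] rdata_o,
  output                 full_o, empty_o
);
  reg [WIDTH-1:0]     mem [0:DEPTH-1];
  reg [PTR_WIDTH-1:0] wr_ptr, rd_ptr;
  reg                 wr_toggle_f, rd_toggle_f;

  // same location, same number of roll overs => nothing left to read
  assign empty_o = (wr_ptr == rd_ptr) && (wr_toggle_f == rd_toggle_f);
  // same location, writer is one roll over ahead => no space left
  assign full_o  = (wr_ptr == rd_ptr) && (wr_toggle_f != rd_toggle_f);

  always @(posedge clk_i) begin
    if (rst_i) begin
      wr_ptr <= 0; rd_ptr <= 0;
      wr_toggle_f <= 0; rd_toggle_f <= 0;
      rdata_o <= 0;
    end
    else begin
      if (wr_en_i && !full_o) begin
        mem[wr_ptr] <= wdata_i;
        if (wr_ptr == DEPTH-1) begin
          wr_ptr <= 0;
          wr_toggle_f <= ~wr_toggle_f;   // roll over
        end
        else wr_ptr <= wr_ptr + 1;
      end
      if (rd_en_i && !empty_o) begin
        rdata_o <= mem[rd_ptr];
        if (rd_ptr == DEPTH-1) begin
          rd_ptr <= 0;
          rd_toggle_f <= ~rd_toggle_f;   // roll over
        end
        else rd_ptr <= rd_ptr + 1;
      end
    end
  end
endmodule

module tb_fifo;
  reg clk = 0, rst = 1, wr_en = 0, rd_en = 0;
  reg  [7:0] wdata = 0;
  wire [7:0] rdata;
  wire full, empty;
  integer i;

  sync_fifo dut (.clk_i(clk), .rst_i(rst), .wr_en_i(wr_en), .rd_en_i(rd_en),
                 .wdata_i(wdata), .rdata_o(rdata), .full_o(full), .empty_o(empty));

  always #5 clk = ~clk;

  initial begin
    repeat (2) @(negedge clk);
    rst = 0;
    $display("after reset    : empty=%b full=%b", empty, full);  // empty=1 full=0
    for (i = 0; i < 16; i = i + 1) begin
      wr_en = 1; wdata = i;
      @(negedge clk);
    end
    wr_en = 0;
    $display("after 16 writes: empty=%b full=%b", empty, full);  // empty=0 full=1
    for (i = 0; i < 16; i = i + 1) begin
      rd_en = 1;
      @(negedge clk);
    end
    rd_en = 0;
    $display("after 16 reads : empty=%b full=%b", empty, full);  // empty=1 full=0
    $finish;
  end
endmodule
```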

18Q. Should empty_o be 1 when rst=1?

#

SESSION#24 NOTES

#

1.
always @(posedge clk) begin
if (rst) begin
end
else begin
//write
//read
end
end

relative velocity
o bring both of them in to same reference, then compare

#

SESSION#25 NOTES

#

concepts learnt

How we can use gray code counting pattern to avoid glitch conditions.
o why glitch conditions happened?
– when we move one state to other => multiple bits are changing
what is the need for synchronization?
o synchronization is not just a concept of Async FIFO
o wherever there is a clock domain crossing (a signal moves from one clock domain to another), this concept of synchronization will be required
o how do we implement synchronization
how to implement the concept wr_ptr and rd_ptr, such that we don’t need to use address port.
how to implement TB for concurrent write and read scenario driving
how to implement various testcases.

Qualcomm => you will never work on Async FIFO

Questions:

gray code
7->8
– 7 writes happened, no reads happened
wr_ptr(7)
rd_ptr(0)
Now let’s say a write is done
7(0111)->8(1000)
rd_ptr remains same(0000)
7(0111)->8(1000) => all 4 FFs’ outputs will change, with some gap in between
0111 -> { 0110 -> 0010 -> 0000 } -> 1000
binary to gray code conversion
wr_ptr_gray = {wr_ptr[3], wr_ptr[3:1] ^ wr_ptr[2:0]}; //binary to gray code conversion
o XOR in a shifted manner
bin2gray function for a 3 bit bin variable
function reg [2:0] bin2gray(input reg [2:0] bin);
reg [2:0] gray;
begin
gray[2] = bin[2];
gray[1] = bin[2]^bin[1];
gray[0] = bin[1]^bin[0];
bin2gray = gray;
end
endfunction
gray = {bin[2], bin[2:1]^bin[1:0]};
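The same shifted XOR can be written for any width as `bin ^ (bin >> 1)`. The sketch below (module and helper names are made up) converts every 4-bit value and checks that neighbouring gray codes differ in exactly one bit, which is the property that avoids the intermediate values seen in the 7->8 binary case.

```verilog
module tb_gray;
  parameter N = 4;

  // binary to gray: XOR the value with itself shifted right by one
  function [N-1:0] bin2gray(input [N-1:0] bin);
    bin2gray = bin ^ (bin >> 1);
  endfunction

  // number of 1s in a vector
  function integer ones(input [N-1:0] v);
    integer k;
    begin
      ones = 0;
      for (k = 0; k < N; k = k + 1) ones = ones + v[k];
    end
  endfunction

  integer i, errors;
  reg [N-1:0] g_prev, g_cur;

  initial begin
    errors = 0;
    for (i = 1; i < (1 << N); i = i + 1) begin
      g_prev = bin2gray(i - 1);
      g_cur  = bin2gray(i);
      if (ones(g_prev ^ g_cur) != 1) errors = errors + 1;
    end
    $display("7 -> %b, 8 -> %b", bin2gray(7), bin2gray(8));  // 0100 and 1100
    $display("errors = %0d", errors);                         // prints errors = 0
  end
endmodule
```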
PCIe, DDR, USB, AXI, AHB, Ethernet => 12 protocols

chip consists of communication using above protocols.

Protocols

Protocol full form
APB: Advanced peripheral bus
Is it on-chip protocol or peripheral protocol?
Protocol based System architecture?
What are the components in the architecture?
Features supported by the protocol
How handshaking works
Timing diagrams
- Different phases in timing diagram
- How write and read transactions happen
Signals
Signal decoding

Notes:

APB
o Quite similar to memory interface protocol
pclk_i, prst_i, paddr_i, pwdata_i, prdata_o, pwrite_i, penable_i, pready_o, psel_i, perror_o
o wr_rd_i => pwrite_i (1 : write, 0: read)
o psel_i
o during memory example
o One master(TB) <—> one slave(memory)
o communication is fixed.
o One master <—> multiple slave
o selection is required?
– which slave we want to do transaction.
o perror_o => if something wrong happens, slave can indicate that error status.
APB is a basic protocol
o it is strong foundation for learning other advanced protocols
USB controller
o AXI interface(~high perf interface)
o USB interface
PCIe controller
o AXI interface
o PCIe interface
Homework
Why any controller requires two interfaces?
How do we categorize protocol in to high performance or low performance?
Based on how much data transfer happens.
How do we categorize the protocols?
For any protocol, what are two types of components?
pwrite same as wr_rd
pwrite = 1 => master is doing write tx to slave
pwrite = 0 => master is doing read tx to slave

7
control signals
psel, penable, pready, pwrite(indicate whether tx is happening or not)
o whether tx is happening or not happening?
o if it is happening, whether it is write or read?

addr and data signals

psel = 4'b0000 => possible => I don’t want to select any slave.
psel = 4'b0100 => possible => selecting slave#2
psel = 4'b1110 => not possible => multiple slaves can’t be selected at same time
We will revise APB tomorrow
o Interrupt controller coding and TB

#

SESSION#26 NOTES

#

SPI protocol

Just as APB is the simplest among all on-chip protocols,
SPI is the simplest among all peripheral protocols.
APB
signals
timing diagram looks like
wait states
APB read tx timing diagram
o control signals: penable, pready, psel, pwrite
o addr and data signals: paddr, pwdata, prdata, perror
wait states = 0 cycles
Homework
o APB write tx timing diagram with 3 wait states
perror
o indicates error status, something went wrong in current tx.
o slave is powered down and we are trying to access it: slave gives perror=1
o slave location doesn’t exist (addr doesn’t exist): slave gives perror=1
o perror=1 is an indication to the master that something went wrong. Definitely the tx didn’t happen properly.
if there are 2 slaves, what is the size psel signal?
reg [1:0] psel;
what is psel=1 doing?
- master is telling to the slave that, I am going to access you, keep yourself ready.
SPI
o SCLK, MOSI, MISO, CS
o SCLK : M->S : Scalar
o CS (~similar to PSEL) : M->S (vector)
o if there are 5 slaves, what is CS size? 4:0
o MOSI : M->S : Scalar
o MISO : S->M : Scalar
o why only 4 ports?
o Because SPI interface comes outside the chip, we want it to be as small as possible
o by keeping 4 ports => we are reducing the size.
SPI write timing diagram
o ADDR=7'h39
o SPI supports 2 addressing mode: 7 bit and 9 bit modes
o Data=8'h92
o draw around 22 pulses of clock
o since it is serial, all bits go one after other
o address phase
o LSB is driven first
7'h39 = 7'b011_1001
o once the address phase is completed, till we start the data phase, the clock should not be driven (it should be 1)
data = 8'h92
= 8'b1001_0010
since we are doing a write tx, both addr and data will happen on MOSI (M->S)
o if we do a read
addr phase on MOSI (address always given by master)
data phase on MISO (during reads, data is given by slave)
SPI read timing diagram
o ADDR=7'h39
o SPI supports 2 addressing mode: 7 bit and 9 bit modes
o Data=8'h92 (let’s say slave is giving 8'h92)
how do we select which slave should be targeted?
o address mapping
Once you understand the SPI controller, learning other controllers becomes easy.

#

SESSION#27 NOTES

#

questions:

what is the need for registers inside SPI controller(master)?
o to store the addr and data, where we want to do the tx
How many processes inside SPI controller?
o two
o one for programming the registers
o other for implementing SPI tx

agenda:

notes:
register programming => APB interface
o memory coding
design registers
reg [7:0] addr_regA[7:0];
//size of addr in SPI tx = 7 bits + 1 bit for wr/rd decision = 8 bits
//array [7:0] : number of txs we want to do.
//7:0 : there are 8 different address values in this array => hence we can do 8 txs.
//once processor will load this array, it doesn’t need to do anything for 8 txs.
Analogy:
o 8 questions in this lab session
o 1st way: I will give all 8 questions, give you 2 hours time, come back after that
o for this 2 hours, I can do some other work(processor can do some other txs in the chip)
o 2nd way: I will give 1 question, come back after 15 minutes, then give the 2nd question, and so on
o I : processor
o Student: SPI controller
reg [7:0] data_regA[7:0];
size of data? 8 bits
array[7:0] => how many back-to-back data values can be provided by SPI controller, without processor intervention.
reg [7:0] ctrl_reg;
o analogy:
o 10th class exams: out of 15 questions, attempt any 12 questions.
o when to start tx?
ctrl_reg[0] = 1 => when it is set, start the transactions.
ctrl_reg[3:1] = how many back to back txs to do.
ctrl_reg[6:4] = what is the last tx index, when new tx request comes from where should we start the tx
ctrl_reg[7] = interrupt generation, indication to the processor that tx is completed.
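The ctrl_reg field layout can be sketched as plain bit slices. The wire names (`start`, `num_txs`, `last_index`, `intr`) and the sample value are hypothetical; how the SPI controller interprets each field is not shown here.

```verilog
module tb_ctrl_reg;
  reg [7:0] ctrl_reg;

  // field decode, following the bit positions listed in the notes
  wire       start      = ctrl_reg[0];     // start the transactions
  wire [2:0] num_txs    = ctrl_reg[3:1];   // how many back to back txs
  wire [2:0] last_index = ctrl_reg[6:4];   // last tx index
  wire       intr       = ctrl_reg[7];     // interrupt / tx completed

  initial begin
    ctrl_reg = 8'b0_010_101_1;   // sample programming value
    #1;                          // let the continuous assignments settle
    $display("start=%0d num_txs=%0d last_index=%0d intr=%0d",
             start, num_txs, last_index, intr);
    // prints start=1 num_txs=5 last_index=2 intr=0
  end
endmodule
```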

for any design, registers are very important.

this is the only medium through which processor can talk to the controller.
o processor wants to say something to SPI controller => it can’t give instructions in English, etc
o it uses APB protocol and does the programming of registers
o based on programming of registers, SPI controller understands what processor wants out of it.
o ex:
– whether I should write or read tx?
– how many txs should i do?

3Q. how do we know there are exactly 8 transactions? is it due to spi protocol?
o nothing like that
o we can keep any number

we have completed the register programming implementation
what is pending?
o using above addr and data, ctrl_reg information, implement SPI tx behavior
o will this work on APB protocol or SPI protocol?
o which clock to use? sclk
SPI master can only be in any of the below 6 states.
IDLE, ADDR, IDLE_BW_ADDR_DATA, DATA, IDLE_WITH_TXS_PENDING, ERROR
once we list down states => next draw the state diagram.
next session
o TB for SPI controller
USB ctrl, PCIe controller, DDR controller

#

SESSION#28 & 29 NOTES NOT WRITTEN

#

SESSION#30 NOTES

#

Questions:

why sclk is input?
o two different ways of generating clock
o using forever => can’t be done in design
o crystal or VCO => I don’t want to keep this inside SPI controller
o it will make SPI controller complex
o other way
o generate clock from outside, provide it to the design (SPI controller)
o SPI controller will use that to generate required clock.
S_ADDR, S_DATA states: clock is running
other states: clock is not running
addr_regA, data_regA, ctrl_reg

Agenda:

pending
o currently, everything is checked from waveforms
o we should do most of the things in the TB code only
o Update TB for collecting SPI bits
We have only implemented SPI write txs
o SPI read txs now
o Write 5 locations(53,54,55,56,57)
o read back same 5 locations
o we should get same data.

3Q. doesnt it depend on the device we are connecting?
o SPI slave devices are very simple devices with a small memory
o hence 7 bit address is sufficient

can we use $readmemh and $writememh instead of a fixed memory location?
instead of 53, .. => write in to some random locations, read back from same random locations
Please do not watch the video and replicate the code
backdoor load TB memory with some image
just perform SPI reads to some random locations => check if data matches with image data

remaining course:

Interrupt controller
PISO or SIPO
CRC generation => important but won’t be complex
o how to implement binary division using Verilog
I2C => video access
o 5x more difficult
o inout port (bidirectional port)

SV training

#

SESSION#31 NOTES

#

status:
S0. Only understood concept, nothing is done
S1. implemented register wr/rd part of design
S2. implemented TB for writing and reading registers
S3. implemented FSM
S4. Implemented TB for printing the bits
S5. Getting the proper address and data values in TB display
S6. Able to write and read the same locations of SPI
S7. Did backdoor load to memory and SPI reads, data is matching with loaded data.

100 => 99% it won’t be of use.
why 2 interfaces?
what ports required?
which states required?
why count is required?

Interrupt controller
o lot of similarities with SPI controller
o both of them are controllers
o both of them have two interfaces
o APB, Interrupt interface (gets interrupts from peripherals)
o both need register programming to start the functionality
o both require state machines
o both require 2 processes
o one for programming registers
o other for implementing interrupt handling behavior(~SPI tx implementation)
why we are doing Interrupt controller?
o Interrupt controller is there in every chip.
o you get in to a project, 95% chance that, you need to work on interrupt concept.
what is interrupt controller?
o what is interrupt?
o
why USB controller generates interrupt?
o we connect a pendrive to laptop
o this will result in interrupt, so that, processor knows that a pendrive is connected.
o who generates interrupt?
o USB controller generates interrupt
o why is it generating? because one pendrive got connected.
when admin came in to the room, he doesn’t know the priority values.
o I tell him the priorities => processor programming the interrupt controller priority_register array.
interrupt service routine?
- ISR
- how processor addresses the peripheral once it gets the interrupt
what all states interrupt controller can be in?
o Admin
o S_NO_QUESTION
o S_QUESTION
o S_QUESTION_GIVEN_TO_TRAINER_WAITING_FOR_SERVING
o S_ERROR
o Interrupt controller
o S_NO_INTERRUPT
o S_INTERRUPT
o S_WAITING_FOR_SERVICING
o S_ERROR
16 people in this room
- I ask them to randomly think of a number between 10 to 99
- I want to know, who thought of the highest number among all 16?
  o I ask everyone to tell at same time, their number.
  o we will go one by one and gets the highest number through comparison
- algorithm observations
  o always 1st student number is default highest value(to start with)
  o from 2nd student onwards, do the comparison with the current highest value and keep updating
  o by the time, we reach last student, we have the highest number.
  o IMPORTANT: For interrupt controller, consider only peripherals, which have raised interrupt.
current active interrupts
13, 12, 10, 8, 5, 2
the highest number has the highest priority, in the current programming.
1st to be serviced: 13 (why not 15?)
2nd to be serviced: 12 (by this time, 13 is dropped)
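The "highest number in the room" algorithm can be sketched as a loop over the 16 peripherals that looks only at active interrupts. The names `intr_active`, `priority_regA` and `find_highest` are assumptions; priorities are programmed so that peripheral number = priority, as in the example above.

```verilog
module tb_intc_priority;
  reg [15:0] intr_active;            // bit i = 1 => peripheral i has raised an interrupt
  reg [3:0]  priority_regA [0:15];   // programmed priority of each peripheral
  reg [3:0]  highest_prio;
  reg [3:0]  intr_to_service;
  reg        found;
  integer    i, k;

  task find_highest;
    begin
      found = 0;
      for (i = 0; i < 16; i = i + 1) begin
        // skip inactive peripherals; the first active one becomes the default highest
        if (intr_active[i] && (!found || priority_regA[i] > highest_prio)) begin
          found           = 1;
          highest_prio    = priority_regA[i];
          intr_to_service = i;
        end
      end
    end
  endtask

  initial begin
    for (k = 0; k < 16; k = k + 1) priority_regA[k] = k;
    intr_active = 16'b0011_0101_0010_0100;                  // 13, 12, 10, 8, 5, 2 are active
    find_highest;
    $display("1st to be serviced: %0d", intr_to_service);   // 13 (15 is not active)
    intr_active[intr_to_service] = 0;                       // 13 is serviced and dropped
    find_highest;
    $display("2nd to be serviced: %0d", intr_to_service);   // 12
  end
endmodule
```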
interrupt controller is a very good project.
o what is interrupt?
o what is controller?
o what is state diagram?
o how is an interrupt serviced?
o how do processor, INTC and peripheral interface with each other, and what is their order of execution?
o what is priority based handling?
after completing SV training,
o DMA controller, Ethernet MAC, Memory controller => 1.5 months

SPI, Interrupt controller, FIFO, pattern detector

#

SESSION#32 NOTES

#

SIPO
o PISO => Similar to SPI in many ways, hence doing SIPO
CRC
Dual port RAM => 2 mins training

Notes:

CRC
253%7 = 1
I will send: 2531(not 253)
receiving side: 2531
split in to two parts: 253, 1
he also knows the divisor = 7
253%7 = 1 matching with what we got => Number received properly

receiving side: 2551
255%7 = 3, not matching 1 => hence the number was received wrongly.

whatever we did above, if we do in binary calculations, remainder is called as CRC
1 : CRC
7 : CRC polynomial(divisor)
253 : data (dividend)
- we need to implement binary division
  o only one difference
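A sketch of the binary version of the 253 / 7 example. It assumes that the "one difference" from normal division is that each subtraction step becomes an XOR (the usual modulo-2 division used for CRC). The 4-bit divisor `POLY` and the 3-bit CRC width are hypothetical choices for this example.

```verilog
module tb_crc;
  localparam [3:0] POLY = 4'b1011;   // hypothetical divisor (CRC polynomial)

  // remainder of {data, 000} divided by POLY, using XOR in place of subtraction
  function [2:0] crc3(input [7:0] data);
    reg [10:0] work;
    integer k;
    begin
      work = {data, 3'b000};
      for (k = 10; k >= 3; k = k - 1)
        if (work[k]) work[k -: 4] = work[k -: 4] ^ POLY;
      crc3 = work[2:0];
    end
  endfunction

  reg [7:0] tx_data, rx_data;
  reg [2:0] tx_crc;

  initial begin
    tx_data = 8'd253;
    tx_crc  = crc3(tx_data);          // sender transmits data along with this CRC

    rx_data = 8'd253;                 // received correctly
    if (crc3(rx_data) == tx_crc) $display("received %0d: CRC matches", rx_data);
    else                         $display("received %0d: CRC mismatch", rx_data);

    rx_data = 8'd255;                 // one bit got flipped on the way
    if (crc3(rx_data) == tx_crc) $display("received %0d: CRC matches", rx_data);
    else                         $display("received %0d: CRC mismatch", rx_data);
  end
endmodule
```

As in the decimal example, the receiver recomputes the remainder on the data part and compares it with the CRC that came along: 253 gives a match, 255 gives a mismatch.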
SIPO
o Non-blocking mode of SIPO
SIPO?
why we need?
o whenever data comes in to chip in serial manner, inside chip it is used in parallel manner
o S -> P (SIPO) => DESerializer
o P -> S (PISO) => SERializer
o both together called => SerDes
what are two styles of SIPO?
o blocking mode
o we got 8 bits of data in serial interface
o till we complete transmitting these 8 bits on the parallel interface, we can’t get the next set of serial bits
o non-blocking mode
o receiving of serial bits and transmitting of parallel data, both can happen independently.
what are the interfaces of SIPO?
o serial interface for receiving serial data
o parallel interface for transmitting parallel data
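A basic sketch of the two interfaces: a shift register collects serial bits and presents them as one parallel word. Port names are assumptions, and the bits are assumed to arrive LSB first. This is only the shift-and-count core; it does not include a FIFO or any other logic for the non-blocking mode.

```verilog
module sipo #(parameter WIDTH = 8) (
  input                  clk_i, rst_i,
  input                  s_valid_i, s_data_i,   // serial interface
  output reg [WIDTH-1:0] p_data_o,              // parallel interface
  output reg             p_valid_o
);
  reg [WIDTH-1:0]       shift;
  reg [$clog2(WIDTH):0] count;

  always @(posedge clk_i) begin
    if (rst_i) begin
      shift <= 0; count <= 0; p_data_o <= 0; p_valid_o <= 0;
    end
    else begin
      p_valid_o <= 0;
      if (s_valid_i) begin
        // new bit enters at the top, so the first bit ends up in bit 0
        shift <= {s_data_i, shift[WIDTH-1:1]};
        if (count == WIDTH-1) begin
          p_data_o  <= {s_data_i, shift[WIDTH-1:1]};   // full word collected
          p_valid_o <= 1;
          count     <= 0;
        end
        else count <= count + 1;
      end
    end
  end
endmodule

module tb_sipo;
  reg clk = 0, rst = 1, s_valid = 0, s_data = 0;
  wire [7:0] p_data;
  wire       p_valid;
  reg  [7:0] tx = 8'h92;
  integer i;

  sipo dut (.clk_i(clk), .rst_i(rst), .s_valid_i(s_valid), .s_data_i(s_data),
            .p_data_o(p_data), .p_valid_o(p_valid));

  always #5 clk = ~clk;

  initial begin
    repeat (2) @(negedge clk);
    rst = 0;
    for (i = 0; i < 8; i = i + 1) begin   // send 8'h92, LSB first
      s_valid = 1; s_data = tx[i];
      @(negedge clk);
    end
    s_valid = 0;
    $display("p_valid=%b p_data=%h", p_valid, p_data);   // prints p_valid=1 p_data=92
    $finish;
  end
endmodule
```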
How to implement non-blocking SIPO?
connect FIFO wr_en_i to valid?
o it is wrong.
o we will have to come up with a logic
homework
o Why is the gray code FIFO not working?
o why is the same data read from the FIFO two times?