VLSI Backend basics course
2 months of basics
o ASIC flow : 1 or 2 days
o Digital design : 2 weeks
o CMOS, FinFET and other device fundamentals, IC fabrication : 2 weeks
o VLSI backend basic keywords & VLSI technology : weekends
o Linux : 1 week
o TCL scripting: 1 to 2 weeks
4 months of advanced
- ASIC flow, Keywords
Ex: mobile phone manufacturing
o Product will be out after 1 year or 2
o what does the consumer want out of the product?
o SOC architecture, Design specifications
o RTL Design team takes SOC architecture and design specifications as input and develops RTL code (Register transfer level code)
o RTL code : High level description of the design behavior
o Individual blocks of the SOC are developed:
o CPU SS RTL is available
o GPU SS RTL is available
o DSP SS RTL is available
o Application specific SS RTL is available
o Memory SS RTL is available
o Connecting the RTL code of all subsystems with each other is called RTL integration
analogy: Desktop (Monitor, Motherboard, hard disk…) => Assemble them to get desktop
o we need to make sure that the above integrated RTL code is working fine
o till this stage, the design is still in the form of Verilog code only
o Functional verification gives confidence that the code is working fine.
============= VLSI Front end flow end ====================
o RTL or behavioral code of the design is converted into logic gate level format (synthesis).
GLS (0-delay & timing simulations)
o Whatever RTL -> gate level conversion happened, is it proper?
o Gate level simulations
o Same as functional verification; the difference is that we use gate level code instead of RTL code.
o Design for testability
o 5nm technology : nominally the FinFET feature size is around 5nm (in recent nodes the number is a node name rather than an exact channel length)
even a small process variation or dust particle => can result in manufacturing defects (on the same
o Si wafer => we get various chips
o some of them are faulty(with manufacturing defects)
o others are all good
o 100 chips
o 5 chips : bad
o 95 chips : good
o Post silicon validation
o Tester hardware
o We apply vectors (large streams of bits, 1s & 0s) to the hardware => the output of the hardware will indicate whether the chip is good or bad.
o vectors: 1. Functional vectors (provided by functional verification engineers) 2. Test vectors (provided by DFT engineers)
o Does the DFT engineer do the testing?
o No. The DFT engineer is responsible for applying certain techniques
o SCAN, ATPG, Compression, MBIST, LBIST, JTAG, IJTAG => patterns are generated that need to be run on the chip.
o The DFT engineer uses ‘simulations’ to check that the above patterns work correctly at the RTL and gate level.
o DFT simulations : input is RTL code or gate level code
o Post silicon validation: input is Physical chip.
o in both cases, we are going to run vectors.
Physical Design
o Input to Physical design: Gate level netlist(code)
o This gate level netlist needs to be put into the die area
o at the transistor level
o every transistor requires VDD, VSS(GND)
o If my chip has 1 billion transistors => how many voltage lines should go throughout the chip : 2 billion voltage lines
o Many chips have a concept of multi voltage domains
o Different transistors work on different voltages.
Post silicon validation
o Role of the PSV engineer is to get vectors from various teams and run those vectors on the chips
o figure out which chip is good, which is bad.
o faulty chips are not used
o Yield (%) = (total chips - faulty chips) / total chips x 100
ex: 200 total, 7 are faulty => yield = 193/200 x 100 = 96.5%
final product packaging.
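The yield arithmetic above can be sketched in Python. This is only a minimal sketch; the function name yield_percent is made up for illustration and does not come from any tool.

```python
def yield_percent(total_chips: int, faulty_chips: int) -> float:
    """Percentage of good chips out of all chips tested."""
    if total_chips <= 0:
        raise ValueError("total_chips must be positive")
    good_chips = total_chips - faulty_chips
    return 100.0 * good_chips / total_chips


print(yield_percent(100, 5))  # 95.0
print(yield_percent(200, 7))  # 96.5
```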
- Multi voltage domain
o Reduces the power consumption (power is proportional to the square of the voltage)
3V -> 1.5V (power is reduced by 4 times)
o why can’t we work at 0.1V?
o 3V -> 0.1V would be a 900 times power reduction => one mobile phone charge would last for about 3 years.
o 0.1 Volts will not be sufficient to establish the channels in the Transistors => they won’t work
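The square relationship above can be checked with a few lines of Python. This is a rough sketch that assumes power scales only with the square of the supply voltage and that everything else (frequency, capacitance) stays the same; the function name is hypothetical.

```python
def power_reduction_factor(v_old: float, v_new: float) -> float:
    """How many times the power drops when the voltage goes from v_old to v_new.

    Assumes power is proportional to V squared and nothing else changes.
    """
    return (v_old / v_new) ** 2


print(power_reduction_factor(3.0, 1.5))         # 4.0
print(round(power_reduction_factor(3.0, 0.1)))  # 900
```

The second number is only a paper calculation: as noted above, transistors will not switch at 0.1V.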
- 10 years back => 3.3V
8 years back => 2.5V
5 years back => 1.8V
o if we have transistors which work at 1.8V, why do we need 3.3V transistors?
o some applications require high performance => that comes with high voltage
o some applications require only low performance, and saving power is important => that comes with low voltage
- Physical design
o voltage grids (power rings) and voltage rails (power rails) need to be used to distribute the voltage to the whole chip.
o where to keep the power rings and power rails is decided as part of power planning
o done for all the voltages in the chip.
o to connect signals from one voltage domain to signals in another voltage domain, we need to use ‘level shifters’
o we need isolation cells to separate the power domains inside a voltage domain.
o care is needed during the creation of ‘level shifters’ in the design
o out of 1000 required, even if one is missed => circuit won’t work properly
- Gate level netlist = Macros + standard cells (~logic gates)
o we are trying to place these ‘macros & standard cells’ into the die area (50 mm²)
- Physical design does Physical design flow at 2 levels
o chip level
o Uses individual block LEF(macros) and other standard cells(which don’t belong to any macro) => implement the PD flow on that.
o block level
o output of block level implementation : LEF file(~Macro)
o LEF : Library Exchange Format (~macro)
- 2 types of digital designs: combinational and sequential
o sequential designs require the concept of storage elements
o storage requires flipflops
o each flipflop requires a clock and a reset
o If my design has 1 lakh (100,000) flipflops
o how many clock lines are to be routed in total?
o 1 lakh lines
o how many reset lines are to be routed in total?
o 1 lakh lines
- Chip uses multiple clocks
o 10 to 15 clocks are used across the chip.
o it helps with optimal power consumption
o high performance circuits require a high frequency clock
o low performance circuits require a low frequency clock
- Main chip gets only 2 clocks as input
o clock generation circuitry
o clock dividers
o 2 source clocks => we generate clocks of various frequencies
these clocks are distributed to the various blocks on the chip.
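As a rough illustration of clock dividers, the sketch below derives several frequencies from one source clock by integer division. The 1000 MHz source and the divide ratios are hypothetical values chosen for the example, not numbers from a real chip.

```python
def divided_clocks_mhz(source_mhz: float, divide_ratios: list) -> dict:
    """Output frequency (MHz) for each integer divide ratio of one source clock."""
    return {ratio: source_mhz / ratio for ratio in divide_ratios}


# Hypothetical source clock and ratios, for illustration only.
clocks = divided_clocks_mhz(1000, [1, 2, 4, 8])
for ratio, freq_mhz in clocks.items():
    print(f"divide by {ratio}: {freq_mhz} MHz")
```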
- A chip will have billions of things going in to it.
o distribution of clock => Clock Tree Synthesis (CTS)
o Reset also will be distributed
o we need to connect one block on the chip with other block
12Q. Regarding clock: I didn’t get the source clock.
1GHz clock needs to go to every flipflop in the block.
o all the flipflops should receive the clock at the same time.
o Many clock distribution techniques are used so that all the flipflops get the clock at the same time.
- Clock skew
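Clock skew is the difference in clock arrival time between flipflops. A minimal sketch, assuming we already know the arrival time at each flipflop; the flipflop names and picosecond values are made up for illustration.

```python
def clock_skew_ps(arrival_times_ps: dict) -> int:
    """Worst-case skew: latest clock arrival minus earliest clock arrival."""
    times = list(arrival_times_ps.values())
    return max(times) - min(times)


# Hypothetical clock arrival times (in picoseconds) at three flipflops.
arrivals = {"ff_a": 1200, "ff_b": 1250, "ff_c": 1320}
print(clock_skew_ps(arrivals))  # 120
```

In the ideal case described above (every flipflop receives the clock at the same time) this number would be 0.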
- Metal layers
o can all the above connections be done with one layer of metal? No, multiple metal layers are used, for example:
o Clock will be given in one metal layer
o Reset will be given in 2nd metal layer
o voltage(VDD) given in 3rd metal layer
o voltage(VSS) given in 4th metal layer
o data connection and routing given in 5th metal layer
- cross talk
- we are doing every connection inside the chip
o clock distribution
o reset distribution
o voltage distribution
o connection of one block to another block
o all these are done using ‘metal’
o what is associated with metal?
o power dissipation
o voltage drop (IR drop)
o probably, the chip has billions of metal lines => each has power dissipation and voltage drop
o ask trainer: how is the heat removed from the chip? (only fans?)
o heat sink (is it inside the chip or outside the chip?)
- Physical design engineer needs to analyze IR drop(voltage drop) also
o what happens if there is a voltage drop?
o suppose the destination point requires 3.3V for proper operation
o due to the voltage drop => we may get only 2.9V => circuit may not behave properly.
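The 3.3V -> 2.9V example can be reproduced with Ohm's law (drop = I x R). This is a sketch only: the 0.2 A current and 2 ohm metal resistance are hypothetical values picked so that the drop comes to 0.4V.

```python
def voltage_at_destination(v_source: float, current_a: float, resistance_ohm: float) -> float:
    """Voltage left after the IR drop across the metal between source and destination."""
    return v_source - current_a * resistance_ohm


def is_voltage_ok(v_dest: float, v_required: float) -> bool:
    """True if the destination still sees at least the required voltage."""
    return v_dest >= v_required


# Hypothetical current and resistance, chosen to give a 0.4V drop.
v_dest = voltage_at_destination(3.3, 0.2, 2.0)
print(round(v_dest, 3))            # 2.9
print(is_voltage_ok(v_dest, 3.3))  # False
```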
- MOST IMPORTANT THING ABOUT PHYSICAL DESIGN IMPLEMENTATION
whole chip can be treated as => voltage, resistance, capacitance
o Current is the flow of electrons
o conventional current is in the opposite direction to the electron flow
o voltage is essentially ‘accumulation of the charge’
- Why RC circuit is important?
o Both resistance and capacitance impact the delay of the circuit
o how to reduce the delay?
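A simple way to see why R and C matter: for a single RC stage, the time constant is R x C, and the 10% to 90% rise time is ln(9) x R x C (about 2.2 x RC). The sketch below uses hypothetical values (1 kilo-ohm, 10 fF) and shows that halving R halves the delay; reducing C has the same effect.

```python
import math


def rc_time_constant(resistance_ohm: float, capacitance_f: float) -> float:
    """Time constant (seconds) of a single RC stage."""
    return resistance_ohm * capacitance_f


def rise_time_10_90(resistance_ohm: float, capacitance_f: float) -> float:
    """10%-90% rise time of a single RC stage: ln(9) * R * C."""
    return math.log(9) * rc_time_constant(resistance_ohm, capacitance_f)


# Hypothetical values: 1 kilo-ohm of wire/driver resistance, 10 fF of load.
r_ohm, c_f = 1_000.0, 10e-15
base = rise_time_10_90(r_ohm, c_f)
halved_r = rise_time_10_90(r_ohm / 2, c_f)
print(f"rise time: {base * 1e12:.2f} ps, with half the R: {halved_r * 1e12:.2f} ps")
```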
- We can’t manufacture chip just by doing Floorplan, powerplan, CTS, routing
o we need to do timing analysis
o chip consists of a lot of resistors and capacitors => which introduce delay
o we want to check whether these delays are within permissible limits
o rise time (signal goes from 0->1) : it should be within a reasonable limit
o fall time (signal goes from 1->0) : it should be within a reasonable limit
o all these delays come from R & C
o If metal didn’t offer any resistance (R=0 for any metal length), the chip would be ideal => no need to do timing analysis.
o in reality, the chip is not ideal => metals have resistance, there is parasitic capacitance => this introduces delay
o timing analysis is the concept of checking all these delays
o delays are of many types
o net delay (delay from metal)
o clk->Q delay
o rise delay
o fall delay
o propagation delay
o Physical design engineer extracts a file called ‘SPEF’ from the whole Physical design implementation
o SPEF : Standard Parasitic Exchange Format
o ex: 0.01 cm of copper with 0.1um width => 0.002 ohms of resistance (illustrative number)
o based on the load at the output of the gate (how many circuits it has to drive) => capacitance values
o Parasitic extraction
o For the overall circuit, what is the R, C at each point of the circuit
o a complex SOC may have billions of R’s and millions of C’s => SPEF file
o From the SPEF files, all the delays are estimated
o using these delays we do timing analysis
=> check whether design is behaving as per the timing constraints
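One common first-order way to turn extracted R and C values into a net delay is the Elmore delay. The notes do not say which method the tools use, so treat this as an assumption. The sketch models a wire as a chain of RC segments; the segment values are hypothetical and are not read from a real SPEF file.

```python
def elmore_delay_fs(segments: list) -> int:
    """Elmore delay at the far end of an RC chain.

    segments: (resistance_ohm, capacitance_ff) pairs, ordered from driver to load.
    Each capacitor is charged through all the resistance upstream of it.
    Units: ohm * fF = femtoseconds.
    """
    delay_fs = 0
    upstream_r_ohm = 0
    for r_ohm, c_ff in segments:
        upstream_r_ohm += r_ohm
        delay_fs += upstream_r_ohm * c_ff
    return delay_fs


# Hypothetical net: three wire segments, the last one also carries the gate load.
net = [(100, 1), (100, 1), (100, 2)]
print(elmore_delay_fs(net))  # 900
```

Here the result is 100x1 + 200x1 + 300x2 = 900 fs. A longer wire adds more segments, so both the R and the C terms grow.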
- Physical verification
o basic layout rule checks
o distance of one metal to other metal => can’t be smaller than 0.1nm (ex)
o we check throughout the circuit for this rule
- Physical design is still being done at Gate level only
o these gates and macros need to be converted to transistor level.
o Layout engineer does this conversion.
o Now the whole circuit is at transistor level.
o 3 dimensional positioning of everything in the circuit
o where is the VDD line (X, Y, Z)
o where is the VSS line (X, Y, Z)
o where is the metal one
o for all transistors (billion) where are the source (X, Y, Z) positions
o for all transistors (billion) where are the drain (X, Y, Z) positions
o for all transistors (billion) where are the gate (X, Y, Z) positions
o Foundry uses these inputs and series of IC fabrication steps => Finally chip is fabricated.
- ASIC flow
o what are the inputs at every stage
o what is the output of each stage
- Physical design flow
o inputs, data preparation ….
o physical verification
- net delay
o gate level netlist?
o gate: logic gates
o level: something implemented using logic gates
o netlist : list of nets
o electrical wire is called as a net
o output of any logic gate is always a net
o one type of net in verilog is ‘wire’
common net types include: wire, wand, wor, tri
o net delay
o delay caused by the wire
- fall time
o AND gate output is supposed to go from 1->0
o ideally we want this transition to be happening in 0 time
o in real circuits, there is a delay by the time AND gate output settles to 0 => delay (fall time)
- How are these delays important for a Physical design engineer?
o They are the input to Static Timing Analysis (STA)
o Static: we don’t apply any inputs (vectors) to the design
o Timing analysis: we analyze only the timing (delays) of the design
- Timing analysis is done so that the actual electronic device will not have any timing violations, which would in turn result in the chip entering a metastable state.
- what is a metastable state?
o inputs: clk, rst, d
o output: q
- what is special about Flipflop?
o it is one of the storage elements.
o Sequential circuits compulsorily require flipflops.
o Flipflops store the value from one clock cycle to next clock cycle.
- designs are all sequential circuits
ex: mobile phone chip is a sequential circuit
o next state of the mobile phone depends on the current state of the device.
o there is a need to remember what the current state of the device is
o to remember => we need storage elements => flipflops
state = HIBERNATE;
state = SLEEP;
state = ACTIVE;
o at every positive edge of clock, if reset is applied, q will become ‘0’, else q will take the value of ‘d’
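The flipflop behavior described above (at each positive clock edge, q becomes 0 if reset is applied, otherwise q takes the value of d) can be mimicked with a small Python model. This is only a behavioral sketch for understanding; the class and method names are made up, and it does not model any delays.

```python
class DFlipFlop:
    """Positive-edge D flipflop with synchronous reset (behavioral model only)."""

    def __init__(self):
        self.q = 0  # stored value; assumed to start at 0

    def posedge_clk(self, d: int, rst: int) -> int:
        """Call once per rising clock edge; returns the new value of q."""
        self.q = 0 if rst else d
        return self.q


ff = DFlipFlop()
# (d, rst) applied at four consecutive positive clock edges.
outputs = [ff.posedge_clk(d, rst) for d, rst in [(1, 1), (1, 0), (0, 0), (1, 0)]]
print(outputs)  # [0, 1, 0, 1]
```

Between clock edges q keeps its value, which is exactly the "remembering" that a sequential circuit needs.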
- I ran DFF simulation
o Dynamic analysis
- What does DFF have to do with timing analysis?
o DFF : setup time, hold time
- setup time
o analogy: setup time is similar to flight boarding time
o Ideal case: boarding time should be ‘0’
o setup time: In ideal case it should be ‘0’
o because of delay causing elements inside the DFF => it requires us to apply the D input some minimum time before +edge of clock => that minimum time is called as ‘setup time’
- what is example of state storage in mobile phone?
You turn on mobile
o screen is on and active ==> one state of the design
Don’t use it for 1 minute
what happens after that?
o screen will turn off, screen gets locked ===> one state of the design
- STA basics
- timing check concepts
- setup time
- hold time
- does an AND gate have a setup time?
no setup time for logic gates.
- setup and hold times apply only to flipflops.
o setup time:
o the minimum time before the +edge of the clock during which the D input must be stable (1 or 0) and must not change.
o how to calculate setup time?
- transmission gate
o NMOS and PMOS connected in parallel
o NMOS Gate = 1 => circuit is closed
o NMOS Gate = 0 => circuit is open
o PMOS is reverse
- 1st transmission gate
o CLK = 0 => transmission gate will be closed
1st tx_gate_delay = 5ns
1st inv_delay = 3ns
2nd inv_delay = 3ns
2nd tx_gate_delay = 5ns
clock edge = 50ns
if we change input at 45ns, what happens?
at 39ns, 1st tx_gate input is 0
by 50ns time, it has reflected at the output of 2nd inverter(input of 2nd tx_gate)
at 45ns, we have changed the 1st tx_gate input from 0->1
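The numbers above can be put into a small setup check. This sketch assumes the setup time equals the delay from the D input to the input of the 2nd transmission gate (1st tx gate + 2 inverters), which is how the 39ns -> 50ns example works out; the names are made up for illustration.

```python
# Delays taken from the example above (in ns).
TX_GATE_1_DELAY_NS = 5
INV_1_DELAY_NS = 3
INV_2_DELAY_NS = 3
CLOCK_EDGE_NS = 50

# Assumption: setup time = delay from D to the input of the 2nd tx gate.
SETUP_TIME_NS = TX_GATE_1_DELAY_NS + INV_1_DELAY_NS + INV_2_DELAY_NS


def meets_setup(change_time_ns: int, clock_edge_ns: int = CLOCK_EDGE_NS) -> bool:
    """True if a change on D has reached the 2nd tx gate by the clock edge."""
    return change_time_ns + SETUP_TIME_NS <= clock_edge_ns


print(SETUP_TIME_NS)    # 11
print(meets_setup(39))  # True
print(meets_setup(45))  # False
```

A change at 39ns arrives exactly at the 50ns clock edge, so it is captured. A change at 45ns would arrive only at 56ns, after the edge, which is a setup violation.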
- what is a buffer and what is its use in a circuit?
buffers are used in PD implementation to add delay, and also to strengthen a signal that has to drive a large load or a long wire.
- Physical design work happens at 2 levels
o chip level PnR(Physical design) implementation
o PnR : Place and Route
o Block level PnR(Physical design) implementation
- You get in to a project at Qualcomm
o Mobile SOC team
o Physical design work for Mobile SOC happens at two levels
o Block level
o Complete PD flow from Floorplan to STA closure, physical verification
o there are no timing violations w.r.t that block.
o From that PD flow, the Physical design engineer who worked on this block extracts files called LEF and DEF.
o chip level
o Chip level PD engineers get the LEF and DEF for all the sub-blocks
arm_cpuss.lef, arm_cpuss.def ==> MACRO
o chip level engineers have a lot of macros and standard cells => on which they need to do the PnR flow.
- what are factors that make PD flow complex?
o area limitation
o every chip comes with a specific area within which the complete chip netlist should fit.
o Different factors will come into the picture
o How much area we are actually utilizing for macros & standard cells
o we either reduce the metal size
o or reduce the metal spacing (can’t be done below a limit, otherwise Physical verification rules will be violated)
o Increase the space between one macro and another macro.
o pitch * number of lines
o During floorplanning, the PD engineer takes all of these into consideration.
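The "pitch * number of lines" estimate can be sketched as below. It assumes pitch means metal width plus metal spacing (the notes do not define it), and the 50nm width, 50nm spacing and 1000 lines are hypothetical values.

```python
def routing_pitch_nm(metal_width_nm: int, metal_spacing_nm: int) -> int:
    """Assumed definition: pitch = width of one metal line + spacing to the next."""
    return metal_width_nm + metal_spacing_nm


def routing_span_um(pitch_nm: int, number_of_lines: int) -> float:
    """Space needed to lay number_of_lines parallel wires side by side, in um."""
    return pitch_nm * number_of_lines / 1000.0


# Hypothetical values, for illustration only.
pitch = routing_pitch_nm(50, 50)
span = routing_span_um(pitch, 1000)
print(pitch, span)  # 100 100.0
```

Reducing either the metal width or the spacing shrinks the pitch and therefore the area needed, up to the limit set by the physical verification rules.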
o everything we do in Physical design flow, has an impact on timing analysis.
o placement of macros impacts the delay values
o placement of standard cells impacts the delay values
o based on placement, the metal line length will vary => resistance changes => delays will change
o CTS also impacts timing analysis
o if we follow one style of CTS, we will have one type of timing analysis
o Routing of connections from one macro to another macro also impacts delay values.
o AT EVERY STAGE, PD ENGINEER HAS TO MAKE SURE THERE ARE NO TIMING VIOLATIONS.
o Closing timing violations
o to close timing violations, PD engineer changes some part of circuit(adding buffer, changing Vt, etc), this can impact some other part of the circuit.
o at deep sub-micron (lower technology nodes), there are a lot of 2nd order effects
o DIBL, CMOS latch up, Hot electron effect, Pinch off, etc
o Physical design engineer has to introduce some additional things in the flow to mitigate all these problems.
o mitigation? at the end of implementation, none of the above issues should be present in any part of the chip.
o Multi voltage domain SOC’s
o Need to take care of multiple voltage rings, rails and stripes.
o all these things add to the number of metal layers.
o metal layers are stacked in the 3rd dimension => connecting them to the components at the Z=0 position becomes complex.
o Multi voltage domain requires level shifters, isolation cells
o PD engineer needs to accommodate all of them in the PnR flow
o Engineering Change Order (ECO): we have done a complex flow; at a later stage we find a bug in the design or there is a requirement for a minor design update.
o in this case, we don’t want to start the VLSI flow again from architecture level
o Why? It increases the overall time taken and increases the cost
o ECO is the concept where we try to accommodate the minor updates in the PD flow only.
o the bugs or minor updates can be 2 types
o Functional updates
o A verification engineer has found a bug in the design, which is minor and can be fixed with some changes in the netlist (logic gate level)
o This update is called as ‘Functional ECO’
o Physical design engineer would have allocated some ‘spare cells’ in the chip, which he will use for minor netlist updates.
o Timing ECO
o A verification engineer or STA engineer has found a timing violation (setup or hold), which must be fixed at netlist level.
o This update is called as ‘Timing ECO’
o may involve CTS and routing being done all over again.
o All this PnR flow has to be done while making sure that the chip meets
o IR drop analysis requirements
o All Physical verification rules should be met.
o For complex chips, Routing and CTS takes a lot of time.
o Mobile SOC routing can take 12 hours.
o if you have made a mistake, you find out only after 12 hours; you then fix it and run again, and get the result after another 12 hours.
o PD engineer needs to avoid iterations, by doing very efficient analysis at each run.
11Q. Can macros only have a square or rectangular shape? Can’t we use a triangle shape, which is the universal best shape to cover most of the area of any shape?
o Physical design implementation happens in many shapes
o Square and rectangle
- Verilog knowledge for Physical design engineer
o what is meant by Behavioral level of coding?
o what is meant by gate level netlist?
o what does Verilog module code look like?
o input port, output port, inout ports
o how the gates are connected to each other in the netlist
o write a basic verilog code?
o DFF verilog code
o How DFF Behavioral code converts to gate level netlist
module dff(clk, rst, d, q);
input clk, rst, d;
output reg q;
// synchronous reset: q changes only at the positive edge of clk
always @(posedge clk) begin
if (rst) q <= 1'b0;
else q <= d;
end
endmodule
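// one possible gate level form of the same DFF (illustrative only;
// real synthesis maps the DFF to a flipflop cell from the standard cell library)
module dff(clk, rst, d, q);
input clk, rst, d;
output q;
wire clk_n, rst_n, m_s, m_r, m_q, m_qn, s_s, s_r, qn;
not g9 (clk_n, clk);
not g10 (rst_n, rst);
// master latch: transparent while clk = 0, rst = 1 forces a 0 to be captured
nand g1 (m_s, d, rst_n, clk_n);
nand g2 (m_r, m_s, clk_n);
nand g3 (m_q, m_s, m_qn);
nand g4 (m_qn, m_r, m_q);
// slave latch: transparent while clk = 1, so q updates at the positive edge
nand g5 (s_s, m_q, clk);
nand g6 (s_r, s_s, clk);
nand g7 (q, s_s, qn);
nand g8 (qn, s_r, q);
endmodule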