soc-may2021-batch-notes

SESSION#1

SOC design and verification flow
o practical aspects
SOC verification engineer spends a lot of time on test case debug
o commonly faced testcase issues in SOC
o how to debug them
Subsystems of SOC
o Processor SS
o Peripheral SS
o Power management block
o Clock and reset block
Training centre
o to set up an SOC (ARM Cortex M3) => it takes millions of dollars to get all the licenses and set up the SOC
There are many SOC platforms
o Mobile
o server
o networking
o consumer electronics based SOC
All SOCs have many things in common
o how we come up with the architecture
o common components in every SOC
o application-specific SS or IPs, which differ between SOCs
SOC architecture : Components
o CPU sub system
o 4 Cortex A7 cores (ARM company processors)
o 2 Cortex A15 cores/processors (ARM core)
o each core has its own L1 cache
o each set of cores shares an L2 cache
o L1 and L2 Cache
o Cache is a type of memory which is built using SRAM technology
o Cache will be relatively small in size compared to main DDR memory
o if DDR memory is 4 GB, cache will be 1MB or even smaller
o why don’t we want a 4GB cache?
o implementing a 4GB cache => 4GB of SRAM memory will be very costly => it increases the cost of the chip, and power consumption will be higher
o the SOC architect intentionally keeps the cache small
o SRAM and DRAM
o SRAM is quick to access, but it requires more transistors to implement each cell of the memory => more cost
o DRAM is slower to access, but it requires fewer transistors to implement each cell of the memory => less cost
o All the main SOC applications will be stored in DDR memory or in other memories (hard disk, MMC, SD card)
o the few applications (code and data) which need to be accessed frequently by the processor are kept in a memory which is very quick to access
o these quick-access memories are called ‘cache’
o Cache is divided into 3 categories
o L1 cache (Level 1)
o Cache (SRAM memory) associated with each core of the processor sub system
o 4 for A7
o low performance, low power consumption
o 2 for A15
o high performance, high power consumption
o 6 L1 caches will be there in total
o L2 cache (Level 2)
o L3 cache (Level 3)
o cache uses a concept of spatial locality and temporal locality
o cache keeps the copy of frequently accessed locations of DDR
o if location 100 is accessed, there is a good chance that locations 101 to 120 will also be accessed => cache will keep a copy of these locations also.
o Processor may take (example numbers)
o 5ns to access an application from L1 cache
o 10ns to access an application from L2 cache
o 20ns to access an application from L3 cache
o cache keeps the copy of main memory content
o since it is very close to processor, it will be very quick to access
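A minimal Python sketch of the locality idea above. The cache geometry (direct mapped, 16 words per line, 64 lines) is an assumption chosen for illustration, not taken from any real SOC. It only shows why nearby and repeated accesses mostly hit in the cache.

```python
# Toy direct-mapped cache model; sizes are illustrative assumptions.
LINE_WORDS = 16   # words copied from DDR into the cache on one miss
NUM_LINES = 64    # number of line slots in the cache

def hit_rate(addresses):
    """Return the fraction of accesses that are served from the cache."""
    tags = [None] * NUM_LINES          # which memory line each slot holds
    hits = 0
    for addr in addresses:
        line = addr // LINE_WORDS      # memory line that contains this word
        slot = line % NUM_LINES        # direct-mapped placement
        if tags[slot] == line:
            hits += 1                  # copy already present in cache
        else:
            tags[slot] = line          # miss: bring in the whole line
    return hits / len(addresses)

# Spatial locality: consecutive locations share a cache line.
sequential = list(range(96, 128))
# Temporal locality: the same 16 locations are accessed again and again.
loop = list(range(96, 112)) * 4
# No locality: every access maps to the same slot with a different line.
scattered = [i * LINE_WORDS * NUM_LINES for i in range(10)]

for name, trace in [("sequential", sequential), ("loop", loop), ("scattered", scattered)]:
    print(name, hit_rate(trace))
```

Only the first access to each line misses in the first two traces, while the scattered trace never hits.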
ARM processors
o 3 categories
o A series
o application specific processors
o provides high performance
o M series
o embedded specific processors
o provides medium performance
o R series
o real time specific processors
o provides medium performance
o Mobile SOC
o Qualcomm SOC platform
o what is it called as: Snapdragon
o NVidia SOC platform
o what is it called as: Tegra
To run an application which requires very high performance
A15 processor will be used
To run an application which requires low performance, but where power saving is the priority
A7 processor will be used
Cortex A77
Cortex : a processor family from ARM
A : application series
77 : model number of this specific processor (core)
Snapdragon 865
4 Cortex A77 cores
4 Cortex A55 cores
Snapdragon 865
o works at a maximum frequency of 2840 MHz (2.84 GHz)
o Physical design team would have closed timing (setup and hold) at a maximum of 2840 MHz
o as frequency increases => time period (TP) reduces => it becomes difficult to do the timing closure
o why can’t we use it at 4GHz?
o Timing closure => we won’t be able to do setup timing closure at 4GHz
o 4GHz will result in a lot of timing violations => if we can’t resolve the timing violations => it will result in metastability => chip enters an unknown state of operation.
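The frequency versus time period relation above can be worked through in a few lines of Python. The path delay (0.30 ns) and setup time (0.03 ns) are made-up numbers for illustration only; real values come from the timing reports of the physical design team.

```python
def period_ns(freq_mhz):
    """Clock time period in ns for a frequency given in MHz (T = 1/f)."""
    return 1000.0 / freq_mhz

def setup_slack_ns(freq_mhz, path_delay_ns, setup_ns):
    """Positive slack: setup timing is met. Negative slack: setup violation."""
    return period_ns(freq_mhz) - (path_delay_ns + setup_ns)

# Assumed worst-case register-to-register path (illustrative values).
PATH_DELAY_NS = 0.30
SETUP_NS = 0.03

for freq in (2840, 4000):
    slack = setup_slack_ns(freq, PATH_DELAY_NS, SETUP_NS)
    status = "met" if slack >= 0 else "VIOLATED"
    print(f"{freq} MHz: period={period_ns(freq):.3f} ns, slack={slack:.3f} ns, setup {status}")
```

With these assumed delays the same path that closes at 2840 MHz has negative setup slack at 4 GHz, because the period shrinks from about 0.352 ns to 0.25 ns.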
SOC components
o Processor SS
o A7 cores
o L1 cache
o L2 cache
o A15 cores
o L1 cache
o L2 cache
o Coherency interconnect
o Coherency : refers to the concept of keeping the copies of the same data in the L1 and L2 caches consistent with each other and with the main memory contents.
o Processor specific components
o Timer
o Design specific subsystems (SS that are required because of the application)
o GPU SS
o DSP SS
o Application IP SS
o AES
o 2D Graphics
o MPEG
o Interconnect
o NOC (Network on Chip)
o Memory SS
o Memory controller
o Memory scheduler
o PHY logic of each memory
o memories
o High speed wired peripherals
o PCIe, USB, SATA, UFS
o High speed wireless peripherals
o WiFi
o GSM
o Bluetooth
o LTE
o Low speed peripherals
o SPI, I2C
o UART (wired)
o IO peripherals
o HDMI
o MIPI
o Display
o PMU
o JTAG
o Security SS
o Crypto
o RSA-PSS certification engine
Laptop SOC(server SOC)
o does it require Modem SS? No
o Mobile SOC, does it require Modem SS? Yes
How to understand SOC features?
o as a consumer, when we buy any electronic device (ex: mobile phone), what do we look for?
o the same things at a much finer level are called SOC features
o Mobile SOC, I want a mobile with
o Octa core processor SS
o USB4 port
o Display
o 16MP camera
o 5G mobile => Modem SS which is compatible with 5G
o to support these features, the SOC architect comes up with the SOC architecture, by choosing subsystems which support these features
o SS and IP which will fulfil the above feature requirements
o SOC integration
Octa core
o ornamental purpose
o in real life, even if we have Octa core processor mobile, we will never have all these cores used at same time
o selling tactic
o Just by using Quad core, we may still get the same performance
When we apply power-on reset to the SOC
o Main CPU (ARM application-specific SS) will boot up
o it will in turn program the other processor SS (GPU, DSP, Modem) to boot up
o main CPU SS will do the required programming to boot up the other processor SS (as required)
Documents required while working on SOC
o we will correlate with IP design and verification flow
o Design specification
Transaction matrix spreadsheet
o SOC consists of many subsystems and IPs connected to each other using interconnects (SNOC, PNOC, PCNOC, etc)
o Transaction matrix tells which subsystem can access which components of the SOC
o it also tells which components/SS a specific SS can’t access
o These will be required as part of verification
o if one specific access is not supported, we don’t create that scenario as part of testcase development
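A small Python sketch of how a transaction matrix can drive scenario selection. The subsystem names and the allowed/not-allowed entries are hypothetical; a real matrix comes from the project spreadsheet.

```python
# Hypothetical transaction matrix: master -> {target: is access supported?}
TRANSACTION_MATRIX = {
    "cpu_ss":   {"ddr": True, "spi_master": True,  "crypto": True},
    "gpu_ss":   {"ddr": True, "spi_master": False, "crypto": False},
    "modem_ss": {"ddr": True, "spi_master": False, "crypto": True},
}

def supported_accesses(matrix):
    """(master, target) pairs for which a testcase scenario should be created."""
    return [(m, t) for m, row in matrix.items() for t, ok in row.items() if ok]

def unsupported_accesses(matrix):
    """(master, target) pairs that must not be turned into normal scenarios."""
    return [(m, t) for m, row in matrix.items() for t, ok in row.items() if not ok]

for master, target in supported_accesses(TRANSACTION_MATRIX):
    print(f"create scenario: {master} -> {target}")
```

Keeping the matrix as data means the scenario list updates automatically when a new SOC revision changes which accesses are supported.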
interrupt mapping
o SOC has various processors
o each processor has many interrupts
o each interrupt is mapped to one specific peripheral controller/component/IP
o when this controller/component requires processor attention, it asserts the interrupt
o processor understands that this controller requires its attention => it runs the Interrupt Service Routine (ISR) to communicate with the controller.
o interrupt mapping document provides information on how the processor interrupts are mapped to various peripherals
IO sheet
o provides information on PAD muxing
o Chip has 50 pins
o some will be fixed purpose
o USB D+/D- => 2 pins for USB connection
o PCIe => 2X configuration => 4 pins (2 uplink, 2 downlink)
o DDR => 16 pins
o total fixed pins = 16+4+2 = 22 pins
o these 22 pins can’t be used for any other purpose
o remaining 28 pins used for generic requirements(requirements specific to current application)
o General purpose Input/output pins
o General purpose
o can be used for connecting any kind of peripheral
ex: SPI requires 4 pins(SCLK, MOSI, MISO, CS) => 4 GPIO pins(GPIO_0, 1,2,3 used for SPI connection)
o UART : 2pins (GPIO_4, GPIO_5) for this UART connection
o SPI : 4pins (GPIO_6, GPIO_7, 8, 9) for this SPI connection
o MMC : 8 pins (GPIO10, 11…17)
o same SOC, some other requirement => some other application
o same GPIO pins will be used for some other peripheral connection
o GPIO_0, GPIO_1, GPIO_2, GPIO_3 => camera connection
o GPIO_4, GPIO_5, 6, 7 => SPI
o GPIO8, 9 => I2C connection
o Same GPIO
o can be used for connecting various peripherals (PAD muxing)
o we have 28 GPIO pads in total
o each GPIO can be multiplexed to work for various peripheral connections => PAD muxing
o same GPIO pin can be configured to work as an input or as an output => IO
o GPIO pin
o IO sheet will provide information about
o what all peripherals can be connected to each GPIO
o what is the selection value to be used during PAD muxing
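The IO sheet lookup described above can be sketched in Python as follows. The pad names follow the notes, but the function select values and the signal assignment per pad are assumptions for illustration; the real values come from the IO sheet of the project.

```python
# Hypothetical IO sheet: pad -> {function select value: signal carried}.
IO_SHEET = {
    "GPIO_0": {0: "SPI_SCLK", 1: "CAM_D0"},
    "GPIO_1": {0: "SPI_MOSI", 1: "CAM_D1"},
    "GPIO_2": {0: "SPI_MISO", 1: "CAM_D2"},
    "GPIO_3": {0: "SPI_CS",   1: "CAM_D3"},
    "GPIO_4": {0: "UART_TX",  1: "SPI_SCLK"},
    "GPIO_5": {0: "UART_RX",  1: "SPI_MOSI"},
}

def mux_settings(io_sheet, wanted_signals):
    """Return {pad: select value} that routes every wanted signal to a free pad."""
    settings = {}
    for signal in wanted_signals:
        for pad, functions in io_sheet.items():
            if pad in settings:
                continue  # pad is already used by an earlier signal
            select = next((s for s, name in functions.items() if name == signal), None)
            if select is not None:
                settings[pad] = select
                break
        else:
            # Inner loop finished without finding a free pad for this signal.
            raise ValueError(f"no free pad can carry {signal}")
    return settings

print(mux_settings(IO_SHEET, ["SPI_SCLK", "SPI_MOSI", "SPI_MISO", "SPI_CS"]))
print(mux_settings(IO_SHEET, ["CAM_D0", "CAM_D1", "CAM_D2", "CAM_D3", "UART_TX", "UART_RX"]))
```

The returned select values are what the testcase would program into the pad mux registers; a request that needs two signals on the same pad raises an error instead of silently picking one.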
SOC development always happens in revision
o Snapdragon 866, 855, 880
o All these versions will have at least 90% of things in common
o only 10 to 15% changes from one version to the other
o 85 to 90% of testcases, TB architecture, coverage plan, regression setup will be common across multiple versions
o when we develop TB for a new version of Snapdragon, we reuse the existing TB, testplan, features, regression setup, etc
o many documents are also reused
o update these documents (testplan, verification plan) for new features added in the new version of SOC
o SOC versions may differ in just one block
o one SOC having Crypto 1.7 version => OEM(ex: LG) may want Crypto1.7 version
o one SOC having Crypto 1.9 version => OEM(ex: Samsung) may want Crypto1.9 version
o everything else remains the same across both chips
o now these two become two different SOC projects
HW programmers guide (HPG)
o SOC has a lot of subsystems, each in turn has a lot of IPs
o all these subsystems and IPs have a lot of registers
o SOC testcase development is mostly about programming these registers for the current testcase requirements
o understanding the order that needs to be followed in programming the registers
o what values to program into each register
o these are important aspects of SOC bring up for a specific feature
o test case developer (verification engineer) needs to know how to program these registers for specific feature verification
o HPG gives detailed information about how to program registers (what order to program in, what values to program, where to wait, etc)
o Verification engineer refers to HPG during testcase coding
o Design team provides this document
Memory mapping (address mapping)
o how the processor address space is mapped
o processor has 32 bit address bus
o 32’h0000_0000 to 32’hFFFF_FFFF
o this address range is distributed among various peripherals
o SWI manual and memory map document provide the address mapping for various peripherals.
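A minimal Python sketch of address decoding against a memory map. The region names and address ranges below are invented for illustration; the real ranges come from the memory map document.

```python
# Hypothetical memory map: (region name, start address, end address), 32-bit space.
MEMORY_MAP = [
    ("boot_rom", 0x0000_0000, 0x0000_FFFF),
    ("uart0",    0x4000_0000, 0x4000_0FFF),
    ("spi0",     0x4000_1000, 0x4000_1FFF),
    ("ddr",      0x8000_0000, 0xFFFF_FFFF),
]

def decode(addr):
    """Return the region that owns addr, or None if the address is unmapped."""
    if not 0 <= addr <= 0xFFFF_FFFF:
        raise ValueError("address does not fit on a 32 bit address bus")
    for name, start, end in MEMORY_MAP:
        if start <= addr <= end:
            return name
    return None

print(decode(0x4000_1004))   # falls in the spi0 range
print(decode(0x2000_0000))   # hole in the map: no peripheral responds
```

An address mapping testcase does the same thing with real transactions: it accesses one address per region and checks that the expected peripheral responds.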
frequency plan
o SOC can work in different modes
o for each mode, which component has to work at which frequency
o it is like a table, which gives complete SOC frequency information
Important aspect for any SOC verification engineer
o as soon as we get into a project, we need to collect the paths or copies of these documents
if we don’t have IO sheet (pad muxing)
o we don’t know how to program GPIO
o if we don’t program GPIO
chip requires 3 UARTs
UART0
UART1
UART2
Just by seeing PAD muxing(IO sheet)
o we can list down what all peripherals will be required
o UART : 3
o SSP : 3
o GPT_0 : CH0, CH1 to CH5
o GPT_1 : CH0, CH1 to CH5
o GPT_2 : CH0, CH1 to CH5
o GPT_3 : CH0, CH1 to CH5
o I2C0, I2C1,
o AUDIO
o JTAG : 5 (TDO, TDI, TMS, TCK, TRST)
o CONFIG
o WAKE_UP0 : GPIO_22 will be configured in Function#0 to use this pin as the chip wake up pin
ex: when a laptop is in sleep and I press any button, it just comes up
o WAKE_UP1
o LDO18 comparator
o OSC32K : 32KHz clock
o XTAL32K_IN : 32KHz clock
o USB
o QSPI
o SPI : will have only one data pin in each direction (MOSI, MISO)
o QSPI : 4 data pins
o ADC/DAC
o VOICE_P
o to use QSPI in a specific application,
o GPIO_28, 29, 30, 31, 32, 33 => all these GPIOs have some registers
o these registers need to be programmed to work in Function#0 mode
o These GPIOs will work like QSPI ports
o As per the current application (from verification engineer perspective: as per current testcase requirements)
o verification engineer needs to program GPIOs (all 50 of them) with specific function mode values, to make sure that the GPIOs work as per the current testcase requirements.
o verification engineer should also know the limitations
o ex: If 2 SSPs are used, we can’t use all the UARTs
o SSP and UART are sharing the same GPIOs
o both combined can be only 3 in total
o 3 SSP + 0 UART
o 2 SSP + 1 UART
o 0 SSP + 3 UART
o 1 SSP + 2 UART
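The sharing limitation above can be written as a small check that a testcase (or a plusarg parser) runs before configuring the pads. This is a sketch in Python; the limit of 3 shared slots is taken from the notes, the function names are made up.

```python
SHARED_SLOTS = 3  # SSP and UART share the same GPIO groups, 3 in total

def is_legal(num_ssp, num_uart, slots=SHARED_SLOTS):
    """True if the requested SSP/UART mix fits on the shared GPIOs."""
    return num_ssp >= 0 and num_uart >= 0 and num_ssp + num_uart <= slots

def full_combinations(slots=SHARED_SLOTS):
    """All (SSP, UART) mixes that use every shared slot."""
    return [(ssp, slots - ssp) for ssp in range(slots + 1)]

print(full_combinations())
print(is_legal(2, 3))   # 2 SSP + 3 UART needs 5 slots, only 3 exist
```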
SOC interfaces
o how to develop TB
o what UVC/VIP needs to be connected
o SOC interfaces are used to drive the stimulus to the SOC
o to collect the output of SOC
o We also need to know the significance of each SOC interface
SoC Performance requirements
o how much bandwidth should we get from PCIe
o what performance is required from DDR
o performance from Modem
o verification engineer needs to develop performance-targeted testcases, which measure the performance of a specific feature/component; we compare these values against the expected performance numbers.
o if performance doesn’t meet the requirements, we analyse the SOC architecture, and the architect makes updates to improve the performance.
Boot flow
o refers to SOC bring up flow
o when to apply reset, how to put SOC in a specific mode
o where to keep the boot image
o how to ensure that processor boots up and does the essential configuration of SOC components
o making sure that SOC has booted up
we need to know how SOC power architecture is implemented => UPF
o power aware simulation setup requires UPF
when we develop a testcase targeting specific feature
ex: processor accessing SPI master core, then SPI master core accessing SPI slave
o how the tx goes from processor to interconnect (System NOC, SNOC)
o how this tx further goes from SNOC to Peripheral NOC (all peripherals are connected to it, hence called PNOC)
o how this tx further goes from PNOC to SPI master core
o how this tx is further driven at the SPI interface
o how the response comes from SPI master core to the processor
what is memory preloading? how it is done in SOC?
o SOC has lot of memories
o Boot ROM
o Code RAM
o Message RAM
o DDR
o Flash
o Internal MEM(IMEM)
o Onchip Memory (OCMEM)
o If a specific application (ex: talking over phone, playing a game, etc) uses Boot ROM, Code RAM, OCMEM, and DDR
o if we don’t initialize (preload) these memories with the proper image (content), what will be there? (some junk data)
o if processor accesses memory and gets junk data, what will happen?
o application won’t work as expected, processor can enter an unknown state of operation
o It is essential to preload the memories with the required image file, so that processor gets valid data while the application is running.
o we need to preload the memories
o EX: DDR 4GB
o 4GB with 32 bit data bus size, how many locations?
o 4GB = 4 * K * K * K bytes = 4 * K * K * K * 8 bits = 32 * K * K * K bits
o DW = 32 bit width
o 1K : 1024 locations (2^10 locations)
o how many locations = (32 * K * K * K)/32 = K * K * K = 2^30 locations
o to preload 4GB memory using front door access
o assuming each location access takes : 2 clock cycles
o to preload 4GB DDR, we will need = 2 * 2^30 = 2^31 clock cycles (very large time even in SOC flow)
o this time taken to front door load a memory, increases the testcase run time
o hence we don’t do front door loading
o solution? Backdoor loading
$readmemh("image_4GB.hex", top.soc_top.mem_ss.ddr);
o $readmemh : used for backdoor write
o read: read the image file, load it into memory
o every SOC (small or big) will always have the memory backdoor loading concept
o There is a step in SOC design flow called Post silicon validation
o checking the real silicon
o there we can’t do backdoor loading (it is only supported in RTL and GLS simulation, not in the real chip)
o then how do we preload memories?
o we must use front door loading
o SOC will have TIC to preload the memories
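The front door preload arithmetic above, worked through in Python. The 2 cycles per access is the assumption stated in the notes; the 1 GHz clock used to convert cycles into simulated time is an extra assumption added here only to give a feel for the scale.

```python
def num_locations(mem_bytes, data_width_bits):
    """Number of addressable locations for a given memory size and data width."""
    return mem_bytes * 8 // data_width_bits

def frontdoor_cycles(mem_bytes, data_width_bits, cycles_per_access=2):
    """Clock cycles needed to write every location through the bus."""
    return num_locations(mem_bytes, data_width_bits) * cycles_per_access

def frontdoor_time_s(cycles, clk_hz):
    """Simulated time (seconds) for that many cycles at the given clock."""
    return cycles / clk_hz

GB = 2 ** 30
locations = num_locations(4 * GB, 32)   # 2**30 locations
cycles = frontdoor_cycles(4 * GB, 32)   # 2**31 clock cycles
print(locations == 2 ** 30, cycles == 2 ** 31)
print(frontdoor_time_s(cycles, 1e9), "seconds of simulated time at an assumed 1 GHz")
```

About 2 seconds of simulated time sounds small, but an RTL simulator advances only a tiny fraction of that per wall-clock second, which is why simulation uses backdoor loading and only real silicon relies on front door loading.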
during SOC simulations, user needs to pass lot of arguments to the simulator commands
o Makefile : makes it easy to do this.
make run +testname=… +num_of_uart=2 +run_time=10ms
CSV,
make soc_run +….
design integration
o CDC checks
o Lint checks
o UPF architecture implementation
o UPF: Unified Power format
o describes the power architecture of the SOC
o what is power architecture?
o SOC architecture is developed as
o multiple voltage domains
o each voltage domain having multiple power domains
o each power domain having multiple clock domains
o what is voltage domain? (VD)
o set of components(SS & IPs) working on a specific voltage
o where do we require voltage in a chip/SOC?
o SoC/Chip consists of transistors
o each transistor requires VDD and Ground connection
o with higher voltage, we get higher performance, but with higher power consumption
o applications that require high performance will be driven with higher voltage
o applications that require low performance (power reduction is important) will be driven with lower voltage
o hence same SOC/Chip will have different components working on different voltages
o even same component can work at multiple voltage levels
o DVFS : Dynamic Voltage and Frequency Scaling
o concept of reducing/increasing the voltage and frequency to meet the current performance and power requirements.
o if we want to get high performance out of same SS, we will run the SS with higher voltage and higher frequency
o if we want to get low power consumption with normal performance out of same SS, we will run the SS with lower voltage and lower frequency
o Dynamically we control voltage and frequency => DVFS
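Why DVFS scales voltage and frequency together can be seen from the standard dynamic power relation P = activity * C * V^2 * f. The sketch below is in Python; the two operating points and the effective capacitance are illustrative assumptions, not numbers from any data sheet.

```python
def dynamic_power(c_eff_farads, vdd_volts, freq_hz, activity=1.0):
    """Dynamic (switching) power: activity * C * V^2 * f, in watts."""
    return activity * c_eff_farads * vdd_volts ** 2 * freq_hz

C_EFF = 1e-9                                 # assumed effective switched capacitance
HIGH_PERF = {"vdd": 1.0, "freq": 2.0e9}      # assumed high performance point
LOW_POWER = {"vdd": 0.8, "freq": 1.0e9}      # assumed low power point

p_high = dynamic_power(C_EFF, HIGH_PERF["vdd"], HIGH_PERF["freq"])
p_low = dynamic_power(C_EFF, LOW_POWER["vdd"], LOW_POWER["freq"])
print(f"high: {p_high:.2f} W, low: {p_low:.2f} W, ratio: {p_low / p_high:.2f}")
```

Halving the frequency alone would halve the power; dropping the voltage from 1.0 V to 0.8 V as well brings it down to about a third, because voltage enters as a square term.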
o Why each voltage domain(VD) is divided in to multiple power domains?
o Ex: Multi media SS
o Audio
o Video
o Image processing
o Current application: playing a song(audio)
o do we need video SS to be on? No
o do we need Image processing SS to be on? No
o do we need audio SS to be on? Yes
o Multi Media SS => as a whole will be working on one voltage domain(ex: 2.5V, all the transistors in MMSS will be given with 2.5 voltage rail)
o for current application we don’t want Video_SS, Image_processing_SS to be on
o MMSS_VD
o Audio_PD
o Video_PD
o Image_processing_PD
o current application
o turn on Audio_PD
o turn off Video_PD and Image_processing_PD
o benefit of turning off?
o there is no dynamic power consumption, also no static power consumption
o dynamic power consumption: transistor switching activity(1->0 and 0->1)
o static power consumption: leakage current
o transistor will have leakage paths(S->S, D->S), these paths result in power consumption
o when we cut off whole voltage itself to a power domain(PD is completely off)
o there won’t be any dynamic or static power consumption
o what happens if there is no concept of power domain inside a voltage domain?
o all PD will get the voltage, even though they are not running, there will be static power dissipation.
o using concept of PD, we avoid this unwanted power dissipation
o why we need multiple clock domains in same PD?
o different clocks are needed to meet the requirements of various interfaces
o SPI master core
o 2 interfaces
o APB/AHB interface to connect with the processor
o SPI interface to connect with the SPI slave
o should both APB and SPI work on the same clock frequency?
o They will usually be different, these are 2 different interfaces with different protocols and different requirements
o same SPI master core, which is a single power domain, requires 2 different clocks.
SOC architecture is developed as
o multiple voltage domains
o each voltage domain having multiple power domains
o each power domain having multiple clock domains
what does it have to do with UPF?
o when the architect does the VD, PD, CD implementation
o VD are separated from each other by using level shifters
o MMSS works on 2.5V
o 0 to 1.25 => logic 0
o 1.25 to 2.5 => logic 1
o DDR SS works on 1.8V
o 0 to 0.9 => Logic 0
o 0.9 to 1.8 => Logic 1
o we can’t have signals directly moving from a 2.5V domain in to 1.8V domain
o it might result in wrong behavior
o MMSS generates a signal whose analog voltage level is 1.0V (Logic 0)
o same voltage (1.0V) when it goes into DDR SS, gets treated as Logic 1
o due to this, we can’t directly connect signals between voltage domains
o we need to use ‘Level shifters’ for all the connections between all the VD’s
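The 2.5V to 1.8V example above can be checked with a few lines of Python. The mid-rail threshold follows the notes; the proportional scaling used for the level shifter is a simplified idealization added here (a real level shifter cell regenerates a full-swing signal in the destination domain).

```python
def logic_level(voltage, vdd):
    """Logic value seen by a receiver whose threshold is at half of its VDD."""
    return 1 if voltage > vdd / 2 else 0

def level_shift(voltage, from_vdd, to_vdd):
    """Idealized level shifter: rescale the signal to the destination rail."""
    return voltage * to_vdd / from_vdd

MMSS_VDD = 2.5
DDR_VDD = 1.8
signal = 1.0   # driven by MMSS, intended as logic 0

print("intended in MMSS:     ", logic_level(signal, MMSS_VDD))
print("direct into DDR SS:   ", logic_level(signal, DDR_VDD))
print("through level shifter:", logic_level(level_shift(signal, MMSS_VDD, DDR_VDD), DDR_VDD))
```

The direct connection flips the value from 0 to 1, while the shifted signal (0.72 V in this model) is still read as logic 0 in the 1.8 V domain.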
o PD are separated from each other by using isolation cells
o MMSS
o audio_SS_PD
o video_SS_PD
o image_processing_PD
o if we give voltage to audio_SS_PD and no voltage to the other 2 PDs
o there will be some physical connection(wires) between these PD’s
o scenario: one domain with voltage in it, other domain without any voltage
o other domain without any voltage will get some impact from 1st domain
o to isolate it, we use Isolation cells between all the PD connections
o CD are implemented using Clock gating logic
o To save power, one of the important things is to remove unwanted transitions
o then there is no dynamic power consumption (due to no switching activity)
o the most important way to remove switching activity is to cut off the clock (gate it)
o power architecture requires defining
o Level shifters, Isolation cells and clock gating logic across the SOC platforms
o there will be 100’s of level shifters, Isolation cells and clock gating logic
o UPF files define these locations, what type of level shifters to use, what type of isolation cell and gating logic
o Who defines this UPF?
o RTL integration engineer will develop the UPF and check its behavior
o this UPF is an input to multiple later stages
o Functional verification
o power aware verification
o PAGLS
o Physical design engineer
o DFT engineer
when we move from IP to SOC level
o verification perspective is completely changed
o IP verification:
o check design for all possible design configuration(by doing various register programmings)
o SOC verification:
o check design integration in to the SOC
o memory/address mapping working fine?
o interrupt mapping working fine?
o power behavior(PD, VD, CD) is working fine?
o performance requirements are met?
connectivity checks
o During RTL integration, all SS’s and IPs are integrated by port connections
o connectivity checks are done to check whether all the connections are properly done.
o how is it done?
o we write an assertion for this check
property mmss_req_to_ddrss_req_prop;
@(posedge clk) (mmss_req == 1) |-> ##3 (ddrss_req == 1);
endproperty
assert property (mmss_req_to_ddrss_req_prop);
o when we run formal checks => if the above assertion fails, then that connection is not proper
o for every connection, we develop one property.
o when we run simulation (generate stimulus), we know which connections are proper and which are not
o why are connectivity checks important?
o this is the easiest way to catch design bugs
o if we miss out issues here, finding them by writing testcases, debugging, tracing will be a lot of effort.
o connectivity checks makes it very easy to catch these issues
o in SOC, lot of issues will be related to connectivity only
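Since one property is needed per connection, the properties are often generated from a connection list rather than typed by hand. Below is a Python sketch of such a generator; it emits the same property shape as the example above. The second connection and its latency are hypothetical entries added only to show the loop.

```python
# Template matching the hand-written property shown in the notes.
PROPERTY_TEMPLATE = """property {src}_to_{dst}_prop;
  @(posedge clk) ({src} == 1) |-> ##{latency} ({dst} == 1);
endproperty
assert property ({src}_to_{dst}_prop);
"""

def make_assertion(src, dst, latency):
    """Return SystemVerilog assertion text for one source -> destination connection."""
    return PROPERTY_TEMPLATE.format(src=src, dst=dst, latency=latency)

# (source signal, destination signal, expected latency in clocks)
CONNECTIONS = [
    ("mmss_req", "ddrss_req", 3),
    ("usb_irq", "gic_irq_in", 1),   # hypothetical connection
]

sva_text = "\n".join(make_assertion(*conn) for conn in CONNECTIONS)
print(sva_text)
```

The generated text can be written to a file and bound to the SOC top; when the connection spreadsheet changes, only the list needs to be updated.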
PA RTL simulation
Power aware RTL simulation
Run RTL simulation with UPF files superimposed on the RTL
o RTL with power intent imposed on it
o creating level shifters between VDs
o creating isolation cells between PDs
o this kind of simulation is called as PA RTL
PA GLS
o UPF file (level shifters, isolation cells and clock gating logic) superimposed on the gate level netlist (in PARTL it is superimposed on the RTL behavioral code)
o GLS with UPF superimposed => PAGLS
o PAGLS gives more accurate results compared to PARTL
power aware simulation?
simulation run with UPF behavior super imposed on RTL/Gate level netlist
PA Simulation will give us real issues that can be seen in real chip.
GLS
o same RTL TB, with Gate level netlist used instead of RTL => GLS

SESSION#2 (7/MAY/2021)

Questions:

Agenda:
SOC verification
- Module to SOC verification
Setting up SOC TB
SOC features
SOC testplan
o test categories
SOC testcase flow
SOC test case development: things to focus on
o Memories
o Address mapping
o SOC interconnects
o Interrupt mapping

notes:

what is vector simulations? in further sessions
evcd?
SOC is made up of Subsystems and IPs(individual blocks)
o these Subsystems and IPs are already verified at SS level or IP level
o then why do we need to verify them at SOC level?
o this is done to ensure that SS or IP has been properly integrated in to the SOC
what we mean by proper integration?
o during connectivity, RTL integration team sometimes adds synchronizers in those paths at the clock domain crossing boundaries. These connectivity checks target all those aspects.
o Register access
o Processor writes and reads all the registers of SS or IP and checks the data values (compare write value with read value)
o register reset test
o reset the SOC, perform reads to all the registers of the SS or IP
o we develop multiple register access tests for various SS or IPs
o TIC(Test interface controller) tests
o This is an interface defined as part of AMBA (Advanced Microcontroller Bus Architecture), used by ARM processors
o TIC interface
o In the general functional flow (normal applications run on a mobile), it is the processor core which executes the instructions.
o core fetches the instructions and executes them, as per the Program counter flow.
o There are times where we want to place the processor core or SOC in a test mode, where an external master will be able to do the transactions instead of the processor core
o We need to put SOC (processor core) in test mode
o TIC interface has TCLK, TREQA, TREQB, TBUS[31:0], TACK
o TBUS used as both address and data bus
o TACK is given by processor SS to acknowledge the TIC controller request for the bus
o during functional mode, processor core will have grant of the bus.
o during test mode, TIC controller will have grant of the bus.
o what are TIC testcases?
o Checking whether TIC interface is working properly or not.
o TIC is connected with GPIO
o TIC testcases check whether this integration happened properly or not.
o are we able to put SOC into test mode, where an external test controller is able to access the bus.
o Interrupt testcases
o targeted towards checking the interrupt mapping
o processor has interrupt bus(63:0 for ex), these interrupts are mapped to various peripherals
o each interrupt can be level triggered or edge triggered
o Interrupt testcases check following
o whether peripheral controller is able to trigger the interrupt?
o whether this interrupt is reaching the processor(Generic interrupt controller-GIC)?
o whether GIC is forwarding this interrupt to the processor?
o is the processor servicing this interrupt(ISR)?
o is the ISR completing as expected?
o is the interrupt getting de-asserted?
o what do we do in Interrupt testcases?
o we create a scenario by means of register programming and peripheral interface transfers, which will ensure that the interrupt is triggered by the peripheral controller (ex: USB controller, PCIe controller, SPI master core)
o Frequency corners
o among the SOC data sheets (specification), there is one document called the SOC frequency plan.
o It lists the different frequencies at which various SS or IPs of the SOC can work.
o based on the frequency plan chosen, these components can work at different frequencies
o frequency plan document tells what these values are.
o we need different plans to achieve the optimal performance and optimal power consumption.
o Frequency corner testcases: we keep the design in a specific frequency plan, check the clock frequency of various SS or IPs
o Power and voltage domain targeted tests
o Every SOC is divided in to VD, PD and CD
o SOC has a block(SS) called Power manager(PMSS or PMB)
o Responsible for turning on/off a specific VD
o turning off VD, there is no voltage supply to the complete VD
o Responsible for turning on/off a specific PD
o turning off PD, there is no voltage supply to this specific PD
o Responsible for turning on/off a specific CD
o turning off CD, gating the clock
o This turning on and off is achieved by means of the processor programming the Power manager registers.
o ex: DSS_VD_Reg
o by programming this register, we can turn on/off the Display SS VD.
o As part of these testcases, user programs the PMSS registers to keep various domains in ON/OFF state
o as part of test checking, we check whether these domains are getting turned on/off
o whether clocks are getting gated or not
o WHY THESE TESTS ARE IMPORTANT?
o If we don’t run these tests, we are not sure whether PMSS has proper control on individual VD,PD and CD
o Design might have a bug, where we are not able to place a specific SS in to off state
o this will result in lot of power wastage.
o These tests will help us figure out these kinds of issues.
o As a verification engineer what should I do?
o I should refer to the PMSS block specification, understand in which VD, PD and CD my subsystem or IP falls
o figure out what registers in PMSS need to be configured to control my VD, PD and CD
o I should develop testcases, which will turn on/turn off my VD, PD and CD
o I should check that this on/off is indeed happening.
o Tests involving processors for various use cases
o every block or SS has specific usecases
o use case: How that block gets used in real life
o ex: PCIe
o are we able to do enumeration to the PCIe endpoint from root complex
o are we able to generate memory write and memory reads to endpoint?
o are we able to place PCIe link in various speeds?
o are we able to make the PCIe controller in SOC behave like an EP and the external VIP behave like a Root complex?
o PCIe has a concept of cross link (we can interchange the roles of the connected devices)
o The same PCIe controller’s IP verification will be a thorough verification where we do all possible register configurations and check the design behavior.
o Security tests
o AXI or AHB: Secure and privileged accesses
o We can enable some secure and privileged access features for processor accessing various peripheral controllers
o whatever privileges are given to my block, are they working as expected?
o Negative/error tests
o we don’t do many of these
o we create an error scenario, check if the design responds to the error as expected
o Performance tests
o we target SOC for required bandwidth.
o PCIe is supposed to give 10Gbps BW
o we create testcases to check if we are getting this much BW at PCIe interface
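The check at the end of a performance testcase is mostly arithmetic: bytes transferred over the measured time, compared against the target. A Python sketch follows; the 10 Gbps target is from the notes, while the transfer size and the measured times are made-up sample values.

```python
def bandwidth_gbps(num_bytes, elapsed_ns):
    """Measured bandwidth in Gbit/s (1 bit per ns equals 1 Gbit/s)."""
    return num_bytes * 8 / elapsed_ns

def meets_target(num_bytes, elapsed_ns, target_gbps):
    """True if the measured bandwidth reaches the required bandwidth."""
    return bandwidth_gbps(num_bytes, elapsed_ns) >= target_gbps

PCIE_TARGET_GBPS = 10.0

# Sample measurements: 4096 bytes moved in 3000 ns and in 4000 ns.
for elapsed in (3000, 4000):
    bw = bandwidth_gbps(4096, elapsed)
    verdict = "PASS" if meets_target(4096, elapsed, PCIE_TARGET_GBPS) else "FAIL"
    print(f"{elapsed} ns: {bw:.2f} Gbps -> {verdict}")
```

In a real testcase the byte count and timestamps would come from the monitor at the PCIe interface, measured over a window long enough to exclude link training and other start-up effects.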
Module level verification to SOC level verification
o how they differ?
- 1st difference(TB level difference)
  o how IP verification TB changes when we go to SOC TB
  o at IP verification, processor is replaced with BFM and generator
  o at SOC verification, all processor, interconnect, Interrupt controller, PMSS, all these RTL blocks will be intact
  o they will not be replaced with equivalent TB components.
  o if we replace the processor core with a BFM and generator and develop testcases
  o even if a testcase passes, we don’t have confidence that this will really work, since we didn’t check with the real processor
  o hence at SOC verification, no RTL component is replaced with any TB component
- 2nd difference (in terms of feature and testcases targeted)
  o IP verification is thorough verification, where all the possible design configurations are checked.
  o SOC verification is only targeted at proper integration checks
- 3rd difference (how testcases are developed)
  o IP verification, testcase is completely developed as SV and UVM code
  o register write is done
  reg_block.ba_mask.write(status, data);
  `uvm_do_with
  tx.randomize() with {};
  o SOC verification, testcase is a combination of C code and SV/UVM code
  o Memory controller registers will be programmed by the A7 or A15 processor
  o Processor needs an instruction to do register programming
  o Processor by itself can’t do any activity, it needs one or more instructions to do this programming
  o instruction: assembly code
  ADD
  SUB
  MUL
  LD
  o We develop the required register programming as a C testcase
  o ARMCC (ARM compiler)
  o ARMCC compiles this C testcase, generates the equivalent assembly instruction code
  o this instruction code is further converted into a hex (hexadecimal image) file
  o this hex file is loaded into one of the processor-accessed memories (on-chip or off-chip)
  o ARM core fetches these instructions and executes them.
  o hence register write or read operation will happen
  o Processor related execution code needs to be developed in either C language or assembly level code (not preferred)
  o this C code is compiled, loaded into a memory
  o SUMMARY:
  o SOC testcase development involves 2 aspects
  o C part of the testcase
  o SV part of the testcase
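A minimal sketch of what the C part of a register access testcase might look like. The base address, window size and the reg_out32/reg_in32 helper names are assumptions for illustration; a small array stands in for the memory-mapped registers so the sketch can also run on a host machine. On the real SOC these helpers would dereference a volatile pointer instead.

```c
#include <stdint.h>

/* Hypothetical base address and size of a small register window. */
#define REG_BASE   0x40000000u
#define REG_WORDS  64u

/* Host-side stand-in for the memory-mapped registers (assumption). */
uint32_t fake_regs[REG_WORDS];

/* Processor write to a register address. */
void reg_out32(uint32_t addr, uint32_t data) {
    fake_regs[(addr - REG_BASE) / 4u] = data;
}

/* Processor read from a register address. */
uint32_t reg_in32(uint32_t addr) {
    return fake_regs[(addr - REG_BASE) / 4u];
}

/* Register access check: write a pattern, read it back, compare.
   Returns 1 on pass, 0 on fail. */
int reg_access_test(uint32_t addr, uint32_t pattern) {
    reg_out32(addr, pattern);
    return reg_in32(addr) == pattern;
}
```

The SV part of the testcase would run alongside this, for example to drive a VIP sequence once the C part has finished its programming.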
one combined register block
o practically speaking this concept is not possible
TB components can be divided in to 2 parts
o Stimulus generation
o Checking design outputs
SOC environment (the various files) is more complex compared to IP level environment
o requires the additional C part of the testcases
o SOC has many sub components
o has many peripheral controllers
o these peripherals require many VIPs to be integrated
o many sequence libraries
o many interfaces(physical and virtual interface)
o many testcase files
o ex:
IP verification for PCIe controller, may require in total 50 to 100 files
SOC verification for mobile SOC, may require 2000+ files across various directories
o we can’t keep all 2000+ files in same folder
for setting up SOC TB, we need to integrate various VIP at peripheral interfaces
o how to integrate a UVC in to SOC TB.
SOC is targeted to have limited number of ports(helps reduce the size of the SOC)
o how do we reduce number of pins?
o we used GPIO and do PAD muxing, hence same GPIOs can be used for various usecases.
o when we develop a testcase, we need to program these GPIO for the current testcase requirements.
Integrating UVC in to SoC TB
at the coding level, what we need to do for UVC integration?
o ex: PCie VIP from Synopsys, we want to integrate this PCIe UVC/VIP in to our SOC TB, what are the steps involved?
– Figure out what is the agent name of the PCIe UVC/VIP
– include all VIP/UVC files in to compilation file list
– Instantiate this agent in the soc_env class
o build the agent
o connect the monitor of PCIe UVC to the SOC level scoreboard
– UVC will provide PCIe_interface definition
o instantiate the interface in the top module, or in a file which is meant to hold all the physical interface instantiations
o this file is included in the top module
o doing this reduces the number of lines in the top module
– as per testcase requirements, connect the PCIe ports with the design ports
– develop SV test files which use the sequences provided by the UVC/VIP
– VIP/UVC will have configuration file
o Denali models
o we need to do the required configuration, so the PCIe VIP will behave as per our requirements
o what we mean by configuration file?
o PCIE_VIP to behave like an Endpoint
o PCIE_VIP to have x4 link
o PCIE_VIP to support 8Gbps
SOC testplan
o testplan is developed targeting a specific SS or IP
o whole SOC won’t have one testplan, it will have multiple testplans one for each SS or IP
o XLS which lists all the feature, scenario and testcases targeting these scenario, detailed flow of the test, track test status, debug analysis
o How to list SOC features?
o connectivity
o Register access
o TIC(Test interface controller) tests
o Interrupt
o Frequency corners
o Power and voltage domain targeted tests
o Clock gating tests
o use cases testcases
o Data flow
o Security tests
o Negative/error tests
o Performance tests
o SOC verification happens in multiple stages
o 3 stages
o 1st stage
o connectivity, register access, interrupt, TIC tests are passing.
o 2nd stage
o functional tests(use cases tests)
o frequency corner
o power and voltage domain tests
o security tests
o 3rd stage
o error tests
o performance tests
o coverage closure
o functional coverage
o toggle coverage
o we don’t focus on other code coverage aspects
o SOC verification progress is tracked using reviews
o HLVR
o High level verification review
o top level plan for SS or IP verification is discussed
o what features will be verified
o verification plan
o tool version, VIP versions
o MLVR
o Mid level verification review
o Have we targeted all features
o basic coverage status
o testcase and TB review
o code review
o LLVR
o Low level verification review
o final regression status
o coverage closure
o exclusion file reviews
o force file reviews
How test case flow works in SOC?
o SOC TB is the same as IP TB, with the below differences
o BFM and generator will not be there; the processor takes over their role
o Testcase is implemented as both C and SV files
o Both SOC and IP TB start with the user calling run_test()
o essentially the SOC TB is a UVM TB
o whatever testcase we are running, various phases of TB will execute.
every SOC has Clock and reset controller
o TB will give reset to reset controller block
processor instruction execution falls in to 2 aspects
o Booting the processor (BOOT ROM)
o Processor should do some basic configuration of the chip before the chip can be functional.
o As soon as boot is done, instruction execution control moves to the new memory location.
o Running the actual application (done from any RAM, Flash or external memory, ex: DDR)
o it is the C testcase file that the user has developed
in complex peripheral testcases, the peripheral component is replaced with a VIP/UVC
o as part of testcases, this VIP or slave model is supposed to give some response txs
o UVC/VIP may require the user to execute some sequences
pcie_mem_data_seq.start(top.agent_block.pcie_agent.sqr); //called as part of SV testcase
processor-peripheral handshaking
can we run the below code without the processor booting up?
pcie_mem_data_seq.start(top.agent_block.pcie_agent.sqr); //called as part of SV testcase
– it is not possible
– why?
o processor has not booted up
o basic SOC clocks are not running
o it hasn’t done any register programming
o PCIe controller register programming is not done
o PCIe controller is not in a position to respond to the PCIe TLP (packets/txs) coming from the PCIe slave model (VIP)
o testcase will not work as expected
– what is the solution?
o when should we run above sequence?
o We should wait for the processor to boot up, and for the processor to configure the power management block to power up the PCIe specific VD, PD, and CD.
o We should also wait for the processor to configure the PCIe controller registers as per the testcase requirements.
o Only then should the PCIe VIP (slave model) start the above sequence (by this point the PCIe controller is ready to accept the TLP packets)
o This requires a handshaking behavior, where we make the SV part of the testcase wait for the processor to complete register programming to a point where the PCIe controller is ready.
o So there is a need for handshaking between the processor and the PCIe TB component. SV code (here waiting on a GPIO):
while (gpio_14 != 1) begin
  @(posedge clk);
end
RAM locations used for processor and peripheral handshaking
o locations: 32'h1200_0000 to 32'h1200_0100
o Processor: REG_OUT32(0x12000000, 0x5555FFFF);
o SV part of testcase will poll 32'h1200_0000 for 32'h5555FFFF
keep SOC in TIC mode
`tic_mode(); //processor enters TIC mode
data = tic_read(32'h1200_0000);
while (data != 32'h5555FFFF) begin
  data = tic_read(32'h1200_0000);
end
//when the while loop completes, the processor to peripheral handshaking is completed
o Same way, peripheral to processor handshaking will also happen
SV: tic_write(32'h1200_0010, 32'hAAAA_BBBB);
o processor polls location 0x12000010 (in the above range) for the 0xAAAABBBB value
o whenever this data comes, handshaking is completed
o Processor can proceed further now.
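The processor side of this mailbox handshake could be sketched in C as below. The offsets and magic values follow the notes (0x5555FFFF at 0x1200_0000, 0xAAAABBBB at 0x1200_0010); the helper names, the host-side array standing in for the shared RAM, and the poll limit are assumptions added for illustration.

```c
#include <stdint.h>

/* Offsets inside the handshake RAM window that starts at 0x1200_0000. */
#define MBOX_CPU_TO_TB   0x00u   /* processor writes, SV side polls */
#define MBOX_TB_TO_CPU   0x10u   /* SV side writes, processor polls */
#define CPU_READY_MAGIC  0x5555FFFFu
#define TB_DONE_MAGIC    0xAAAABBBBu

/* Host-side stand-in for the shared RAM (assumption). */
static volatile uint32_t mbox_ram[0x100 / 4];

void mbox_write(uint32_t offset, uint32_t data) { mbox_ram[offset / 4u] = data; }
uint32_t mbox_read(uint32_t offset)             { return mbox_ram[offset / 4u]; }

/* Step 1: processor signals that register programming is done. */
void cpu_signal_ready(void) {
    mbox_write(MBOX_CPU_TO_TB, CPU_READY_MAGIC);
}

/* Step 2: processor polls for the TB reply. Gives up after max_polls
   reads so that a broken handshake fails instead of hanging.
   Returns 1 if the reply was seen, 0 on timeout. */
int cpu_wait_for_tb(uint32_t max_polls) {
    for (uint32_t i = 0; i < max_polls; i++) {
        if (mbox_read(MBOX_TB_TO_CPU) == TB_DONE_MAGIC) {
            return 1;
        }
    }
    return 0;
}
```

Bounding the poll loop is a design choice of this sketch: in a long SOC simulation a timeout with a clear failure message is usually easier to debug than a test that hangs until the simulator limit.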
Coding of TB components and testcase files
SOC has a concept of mode pins (specific to one product company, may not apply to all other SOCs)
mode_0, mode_1
0 0 => TIC mode
0 1 => functional mode
1 0 => ATPG mode
1 1 => reserved
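The mode pin table above can be captured as a small decode function. This is only a sketch: the enum and function names are hypothetical, and the encoding is the company-specific one listed in the notes.

```c
/* SOC modes selected by the two mode pins. */
typedef enum {
    MODE_TIC,
    MODE_FUNCTIONAL,
    MODE_ATPG,
    MODE_RESERVED
} soc_mode_t;

/* Decode {mode_0, mode_1} as listed in the table:
   00 => TIC, 01 => functional, 10 => ATPG, 11 => reserved. */
soc_mode_t decode_mode(unsigned mode_0, unsigned mode_1) {
    unsigned sel = ((mode_0 & 1u) << 1) | (mode_1 & 1u);
    switch (sel) {
        case 0u: return MODE_TIC;
        case 1u: return MODE_FUNCTIONAL;
        case 2u: return MODE_ATPG;
        default: return MODE_RESERVED;
    }
}
```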
DMA testcase: functionality
o copying data from one memory location to another memory location using DMA channels
DATA_RAM -> CODE_RAM
o checking: whether this data from source memory location has moved to destination memory location properly => test pass

OUT32 is an API call to write in to a memory location.

if C testcase called test_fail()
log file will report as ‘native portion of test failed’
MY_OUT32(addr, wdata); used for performing a processor write operation to the specified address location.
rdata = MY_IN32(addr); used for performing a processor read operation to the specified address location.
armcc, armasm, armlink
Boot can be 2 stages
o primary boot from Boot ROM
o 2nd boot from other memory (RAM, DDR, Flash) => main application

SESSION#3

Agenda:

Boot sequence
SOC reset
SOC clocks
Memories
Address mapping
Interconnects
Interrupt mapping
Testcase debug
Typical testcase issues
Important debug points
WLAN introduction

SESSION#4

revision:

How to develop SOC testcases
o C and SV files
o how they interface
how to debug SOC testcases
o 70% of time is spent on developing and debugging testcases
SOC testrun outputs
o transcript (session.log)
o Processor instruction trace file
o tarmac.log
o how the processor executed various instructions throughout the simulation, from time=0 to end of simulation
o list files
o PERL scripts
o waveform => .lst file
o dump all the details from waveform
how this script is relevant to SOC verification
SOC testcases involves 100’s or sometime 1000’s of write/read from the processor
processor: AXI or AHB
solution:
o open the waveform
o dump all waveform to a list file
o run a script which will tell what all txs have happened
o compare this output log file with our expected flow
SoC Architecture, understanding transaction matrix
- SOC has various master and slave components connected using AXI or AHB or APB interconnects
- transaction matrix tells which masters can access which slaves (it also tells which masters can’t access which slaves)
  o SOC verification engineer shouldn’t target the unsupported master-slave paths
  o Application developer using this hardware should also be aware of this limitation.
processor boot, SCF file
o which is the boot processor in the SOC
o SOC can have multiple processors in multiple subsystems
o among all these processors only one processor will be the boot processor
o once it boots up, it programs the required registers to boot up other processors as required
o ex:
o we are developing a testcase for GPU verification
o GPU can’t be booted at time=0
o first we need to boot up Cortex-A7 core
o Cortex-A7 will program the required registers in SOC, to boot up GPU SS
o programming PMU to power up GPUSS PD, VD
o programming Clock and reset manager to generate the required clocks for SS
o applying the reset to the GPU SS
o then GPU core starts booting up
o once it is booted up, it has its own PC(program counter), which will start executing, hence GPU application runs
ex: GPU application: Playing a video game
o SCF : Scatter file(specific to ARM)
o when a processor executes any instructions, memory is divided in to parts
o memory for executable instructions (RO)
o memory to store the data variables (RW)
a = b + c;
o scatter file indicates the location of these RO and RW spaces in RAM memory.
o scatter file is an input to the linker (armlink), which places the code and data for the processor accordingly
o as a verification engineer, why should I know this?
o We should ensure we don’t use the locations allocated in the scatter file for any other purpose.
interconnects
o verification engineer should be aware of the transaction flow path.
o user should dump only these hierarchies in to the waveform
o ex: I2C test
o Cortex M4 processor interface(AHB)
o AHB interconnect M1 interface
o AHB interconnect S5 interface
o AHB interface ports connecting to I2C
o I2C master interface
o I2C interface (which connects to I2C slave)
o in IP verification
o add wave sim:/tb/* => should not be done in SOC
DDR initialization
o DDR should be initialized properly(all its mode registers should be configured properly)
o DDR has reached a state where we can do write/reads => we should know the signal which indicates this status.
ddr_initialization(); //C library function called in the C part of the testcase
TIC interface
external component can access SOC registers through this interface
Clock domains
o for a given test what all blocks/SSs are required
o check whether all the clocks going to these blocks/SSs are enabled
Different clock mode
o SOC can work on various frequencies
XO mode, turbo mode
CDC
o when our test involves, txs going from one clock domain to other clock domain
o there is metastability at CDC points.
o designer would use multi stage synchronizers
MMU, Physical address, virtual address
MMU : Memory management unit
SOC address mapping can be viewed in two mapping styles
o physical address mapping
o virtual address mapping
MMU uses look up tables to do the PA<->VA mapping
o Cortex-M series processors won’t need MMU
During complex testcase development, we should know whether the MMU is initialized properly (the MMU load happens during boot, as part of the boot processor’s role)
WLAN based SOC
o data sheet
o features, test plan
o TB architecture
data sheet
PRCM : Power reset clock manager
o design has approx 1000 registers across various sub systems
WLAN SOC architecture
o Cortex M4 based ARM SS
o Digital components
o AHB interconnect
o Cortex M4 is connected through M0, M1, M2 interfaces
o Components for clock generation of 32Mhz, 32Khz, PLL
o DMA controller
o USB controller
o AES encryption
o Various types of memories
o ROM
o Code RAM and Data RAM
o Flash controller
o AHB2APB bridge to connect with the APB interconnects
o 2 APB interconnect
o APB0
o I2C, QSPI, SSP, I2S, ADC, DAC, ACOMP, UART, GPIO, timer, PWM
o APB1
o PINMUX, UART, SSP, I2S, WDT, I2C, PMU, RTC, 4k SRAM
o Analog components
o ADC, DAC, ACOMP
o WLAN SS
o Feroceon CPU
o SRAM/ROM, JTAG
o Voltage generation logic
o 802.11 standard MAC, Base band
o Direct conversion WLAN RF
o Security / encryption block
Mobile SOC,
o 3000+ testcases
Applications
o most of them are non-portable applications
o mobile phone, laptop: portable applications, which work on battery power
o power management is not very significant
o we don’t need very high performance
cortex M33 = cortex M4 + Security features
GPIO
o one of the external signals can be mapped to the PMU
o When GPIO in input mode => that signal goes to PMU
How IRQ is used for wakeup?

in all power modes, PMU is involved
o GPIO can be mapped to one of the PMU input signals
o How IRQ can be mapped?
o RTC : time counter, which generates a signal to PMU, which in turn wakes up system.

RTC (clock)
in AON domain
clock and control interface
o XTAL_IN (32MHz)
o XTAL32K_IN (32.7KHz)
o WAKEUP_0, WAKEUP_1
o AUDIO_CLK
o RESETn
configuration pins
CON[5], CON[4]
processor can boot from
o UART
o USB
o Flash
All the components in SOC are divided in to 2 categories
o AON components
o PMU, RTC
o other components
SOC VD 3 categories
1.1V
1.8V
2.5V
3.3V
once all voltages are distributed (driven) to the various blocks of the SOC, we apply reset to all the internal SOC blocks
SOC can work in 6 power modes
o active (0)
o idle (1)
o standby (2)
o sleep (3)
o deep sleep (4)
o shut down

WLAN doesn’t support Idle, Sleep
SOC has 16 power modes totally

state retention
o when we put a specific SS or component in to a low power state (sleep, deep sleep), we cut off its power supply. All its registers lose their state (values). When we power up the system again, the processor has to reconfigure the design to bring it back to the state of operation it was in when we moved to sleep mode, and that can take time.
o SOC provides the concept of state retention: we capture all important register values and state values, and put those values in to memory (this memory continues to have power supply). Due to this, when we wake up the system, we can reload these registers and states from the memory. The system comes back to the same state without processor reconfiguration.
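The save and restore idea behind state retention can be sketched as below. The register count and the function names are hypothetical; two arrays stand in for the registers in the switchable power domain and for the always-powered retention memory.

```c
#include <stdint.h>
#include <string.h>

#define NUM_RETAINED_REGS 4u

/* Registers that live in the power domain being switched off. */
uint32_t block_regs[NUM_RETAINED_REGS];
/* Memory that keeps its supply while the block is powered down. */
uint32_t retention_mem[NUM_RETAINED_REGS];

/* Before entering the low power state: copy registers to retention memory. */
void save_state(void) {
    memcpy(retention_mem, block_regs, sizeof block_regs);
}

/* Models the loss of register values when the supply is cut. */
void power_down(void) {
    memset(block_regs, 0, sizeof block_regs);
}

/* After wake up: reload the registers from retention memory. */
void restore_state(void) {
    memcpy(block_regs, retention_mem, sizeof retention_mem);
}
```

A retention testcase would typically program a known pattern, go through the low power entry and exit, and then read the registers back to confirm they match the pattern.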
WFI
Wait for interrupt => we are placing the processor in to low power mode. This concept uses an interrupt to wake up the processor.
o any peripheral if requires processor attention, will generate an interrupt. Processor which is in low power mode, will wake up on seeing interrupt(WFI)

PM3 mode
o all the power supplies except for AON and SRAM are turned off.
o only RTC is running
how to put system in to PM3 state:
program PMU.PWR_MODE.pwr_mode = 2’b10 ==> indication to PMU that system should be put in PM3 state
o it immediately cuts off all voltage supplies, except AON
o it gates all clocks, except RTC
o it enables SRAM, 192KB SRAM retention

Wakeup from PM2, PM3, PM4 can happen through below sources
o RTC, WAKE_UP0, WAKE_UP1, ULP_COMP
clock controller
o SFLL uses 38.4 MHz clock input and generates a 200MHz clock source
In real life chip(M4) may be booted from all these sources.
o as a verification engineer we need to develop multiple tests, with each booting from different memory sources.
We want to create a testcase where M4 boots from UART
o GPIO_16, GPIO_27 : 00
test flow:
o power up sequence
o apply the voltage
o wait for voltage
o apply the POR
o wait for reset to be completed
o PRCM by default applies reset to the processor
o processor by default boots from BOOT ROM
o Then it checks Boot_pin0 and Boot_pin1
o 2'b00 => Need to boot from UART
o Basic boot from Boot ROM would involve programming the DMA controller, or a series of transfers from UART to SRAM
o UART to SRAM transfer happens through the UART interface
o processor will start booting from a predefined location of SRAM
o UART: asynchronous means the data is not synchronized to a shared clock (no clock goes from transmitter to receiver)
o UART internally has a concept of baud rate, at which it captures the incoming data.

SESSION#5

Flash controller
o Flash differs from DDR or SRAM?
o Flash is either NAND based or NOR based
o Flash is non-volatile memory
o when we power off, we don’t lose the content
o Flash memories are relatively quick with access
o Flash memories are used for booting purpose
o since they are non-volatile, they can hold the boot image
o they can be loaded from external peripherals, from where the processor will boot.
o Flash controller?
o connects to the SOC through QSPI (Q: 4, 4 data pins)
o Flash access works on a concept of commands
flash controller programming (C testcase)
//Programming for putting flash controller in QSPI read mode
reg_field_out32(fccr, cache_en, 0x0);
reg_field_out32(fccr, cmd_type, 0xC);
//wait until bit 0 of fcsr gets set (assumed here to be the command done flag)
rdata = reg_in32(fcsr);
while ((rdata & 0x1) != 0x1) {
  rdata = reg_in32(fcsr);
}
reg_out32(fcsr, 0x1); //this clears the above field from 1->0
reg_field_out32(fccr, flash_pad_en, 0x0);
DMA controller based testcases
o C testcase
o scenario:
#define DMA_SADR2 0x44000160
o using CH2 to transfer data from Flash to SRAM
test () {
  uint32 data1, data2, wdata, rdata, i;
  //To access flash memory, we need to do the flash controller configuration
  //do the corresponding flash controller register programming
  //since we are accessing the DMA controller, we need to program the clock and reset controller for DMA clock generation
  reg_field_out32(peri_clk_en, dma_clk_en, 0x1);
  reg_field_out32(flash_system_en, flash_clk_en, 0x1);
  //DMA controller related programming
  reg_out32(DMA_SADR2, 0x1F000000); //Source address
  reg_out32(DMA_DADR2, 0x00100000); //Destination address
  wdata = 0;
  wdata |= (1024 & 0x1FFF); //[12:0] = 1024, 1KB transfer
  wdata |= (0x2 << 13);     //[14:13] = 2, INCR8
  wdata |= (0x0 << 15);     //[16:15] = 0
  wdata |= (0x1 << 29);     //[29] = 1
  wdata |= (0x1 << 30);     //[30] = 1
  reg_out32(DMA_CTRLA2, wdata);
  //reg_out32(DMA_CTRLB2, wdata); not required for this test
  reg_out32(DMA_CHL2_EN, 0x80000000);
  //poll for the interrupt for tfr completion (CH2 => bit 2)
  rdata = reg_in32(status_tfrint);
  while ((rdata & 0x4) != 0x4) { //we can also poll DMA_CHL2_EN.en, it should become 0
    rdata = reg_in32(status_tfrint);
  }
  //when we come out of the above while loop, it means that the tfr on CH2 is completed
  reg_out32(status_tfrint, 0x4);
  reg_out32(DMA_CHL2_STOP, 0x80000000);
  //data integrity checks (1KB = 256 words)
  for (i = 0; i < 256; i++) {
    data1 = in32(0x1F000000 + 4*i);
    data2 = in32(0x00100000 + 4*i);
    if (data1 != data2) {
      test_fail();
    }
  }
  test_pass();
}
//Below ISR is mapped to the UART IRQ
//below code gets executed anytime the UART interrupt is generated => that mapping is done as part of the C testcase we develop
uart_irq_isr() {
  uint32 rdata;
  rdata = in_rd32(uart_intr_src_reg);
  if ((rdata & 0x1) == 0x1) { //0th interrupt is true
    //tx has completed
    out_reg32(uart_intr_clear_reg, 0x1);
  }
  if ((rdata & 0x2) == 0x2) { //1st interrupt is true => FIFO empty read error has happened
    //Spec will tell how to handle read error scenarios
  }
  if ((rdata & 0x4) == 0x4) { //2nd interrupt is true => FIFO full write error has happened
    //Spec will tell how to handle write error scenarios
  }
}
Any chip has a concept of time (delay) only due to the clock.
o if we are working with a 1 kHz clock, TP = 1 ms => each clock edge amounts to 1 ms
o if we need to generate a pulse once every 2 seconds => we need to count 2000 clock edges; whenever we reach this count, we generate a pulse of the required duration.
o the component which does this kind of counting is called a ‘GPT’ (general purpose timer)
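The counting described above can be worked through with a small sketch. The function and struct names are hypothetical; the arithmetic is simply clock frequency times interval.

```c
#include <stdint.h>

/* Number of clock edges to count for a given interval.
   Example from the notes: 1 kHz clock, 2 s interval => 2000 edges. */
uint32_t gpt_ticks(uint32_t clk_hz, uint32_t interval_ms) {
    return (uint32_t)(((uint64_t)clk_hz * interval_ms) / 1000u);
}

/* Minimal counter model: one call per clock edge. */
typedef struct {
    uint32_t count;     /* edges counted so far */
    uint32_t terminal;  /* edges per pulse */
} gpt_t;

/* Returns 1 on the edge where the terminal count is reached
   (the counter then restarts from 0), otherwise 0. */
int gpt_clock_edge(gpt_t *t) {
    if (++t->count >= t->terminal) {
        t->count = 0;
        return 1;
    }
    return 0;
}
```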
WLAN
o we are transmitting the data in serial manner => CRC
Peripherals together are implemented as one subsystem
- LSPSS
  o Low Speed Peripheral SS
  o ex: UART, I2C, QSPI
  o LSPSS : Also requires(optional) an internal DMA
  o If there is no internal DMA, the NOC connected DMA controller will do the peripheral transfers.
UART (Universal Asynchronous Receiver Transmitter)

baud rate
o indication of bits transmitted per second
Both transmitting UART component and receiving UART component are configured to work on same baud rate.
o transmitting block will transmit the bits as per the above baud rate
o receiving block receives the bits as per the configured baud rate
//uart_transmit_test.c
test() {
  uint32 wdata = 0;
  //GPIO
  //enable the clock to GPIO
  //enable the clock to DMA
  //enable the clock to DDR
  //program LCR
  wdata |= (0x3 << 0); //LCR[1:0] = 0x3
  wdata |= (0x1 << 2); //LCR[2] = 0x1
  wdata |= (0x1 << 3); //LCR[3] = 0x1
  out_reg32(uart_lcr, wdata);
}
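Since both UARTs must agree on the baud rate, it helps to see the numbers behind it. The sketch below computes the duration of one bit and a clock divisor. The function names are hypothetical, and the 16x oversampling divisor formula is an assumption (common in 16550 style UARTs), not something taken from this SOC's data sheet.

```c
#include <stdint.h>

/* Duration of one bit in nanoseconds (integer division, rounds down). */
uint32_t uart_bit_time_ns(uint32_t baud) {
    return 1000000000u / baud;
}

/* Divisor for a UART that samples each bit 16 times (assumption),
   rounded to the nearest integer. */
uint32_t uart_divisor_16x(uint32_t uart_clk_hz, uint32_t baud) {
    return (uart_clk_hz + 8u * baud) / (16u * baud);
}
```

For example, at 9600 baud one bit lasts about 104 microseconds, so a receiver configured for a different baud rate samples at the wrong instants and captures wrong data.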

SESSION#6

RISC instructions

11 categories
o Load and store instructions
LDR
STR
o these are the only two instructions which can directly access memory locations.
CISC
o 0x1000, 0x1020
add 0x1000 0x1020 => CISC, which is not possible in RISC
o this instruction gets converted to an opcode.
o this will require 32 bits for opcode storage.
o opcode is a unique 32 bit binary number, which maps exactly (1-1) to the instruction we want to perform.
ex: 0x32832800
o One instruction is sufficient
o Same thing in RISC would require at least 4 instructions
LDR X0, 0x1000
LDR X1, 0x1020
ADD X0, X0, X1
STR X0, 0x1000 //storing in to 0x1000 location
RISC applications need more instruction memory for the same work.
o this will require 128 bits (4 x 32 bits) for opcode storage.
CPUSS clusters
o group of processors
o generally required only in A series processors
o M series processors don’t require clusters.
o ex: Washing machine
big little
fetch, decode, execute, memory operations, register write backs
ARM processors are developed based on the common architecture.
o ARM might have released 100’s of processors
o Ex: cortex-M3, cortex-A15, cortex-A57, etc
o All these processors are developed based on a common architecture
o ARM V1
o ARM V2
o ARM V3
o ARM V4
o ARM V5 => assume it was only supporting 32 bit instructions(instruction opcode size can be 32 bit only)
o Memory was very costly
o being a RISC architecture, each operation we want to do may require more instructions.
o ARM V6
o ARM V7
o Thumb2 instructions are based on a 16 bit instruction (opcode) size
o resulting in half the opcode size.
o later memory became cheaper =>
o ARM V8A
Same ARM v8 architecture can work in both modes
AArch32
AArch64
ARMV7 architecture
15 GPRs (R0 to R14)
ARMv8
31 GPRs (X0 to X30 or W0 to W30)
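~(1<<4) = ~10000 = 01111 (looking only at the lowest 5 bits)
AND X0, X0, #0xF (0xF = 01111)
this makes bit 4 to bit 63 of the X0 value ‘0’
only bit positions 0-3 are kept, all other positions of X0 will be 0
note: on the full 64 bit register, ~(1<<4) has every bit set except bit 4, so ANDing with that full width value clears only bit 4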
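The two masking operations that are easy to mix up here are "keep only the low n bits" and "clear only bit n". A small C sketch of both, with hypothetical function names:

```c
#include <stdint.h>

/* Keep bits [n-1:0] and clear bits [63:n]. Valid for 0 < n < 64.
   The mask is (1 << n) - 1, e.g. n = 4 gives 0xF. */
uint64_t keep_low_bits(uint64_t x, unsigned n) {
    return x & ((UINT64_C(1) << n) - 1u);
}

/* Clear only bit n and leave every other bit untouched.
   The mask is ~(1 << n) taken over the full 64 bits. */
uint64_t clear_bit(uint64_t x, unsigned n) {
    return x & ~(UINT64_C(1) << n);
}
```

With x = 0xFF, keeping the low 4 bits gives 0x0F, while clearing bit 4 gives 0xEF.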

SESSION#7

agenda:

Interrupt controller
GIC
Cache and Cache coherency
Multi core processors – Cache coherency
MMU
CPUSS testplan
Processor interfaces
ARM compiler and linker
CPUSS testbench coding
CPUSS testcase coding

Notes:

Every SOC has interrupt controller
Cortex A series:
o GIC
o Present outside Processor core
Cortex M series:
o NVIC
o Vector interrupt controller
o Part of M core
Interrupt handling – registers involved
o interrupt status register
o interrupt source register
o interrupt enable register
o
o interrupt clear register

o peripheral component can generate interrupt for many reasons
o USB
o device is connected – interrupt needs to be generated
o device is disconnected – interrupt needs to be generated
o Transfer is completed – interrupt needs to be generated
o We are attempting a wrong transfer – interrupt needs to be generated
o CRC error happened in data packet – interrupt needs to be generated
o 10 more reasons why interrupt can be generated
o we can’t keep 15 interrupt lines just for the USB controller alone
o usb_int
o generated to the processor – goes to INTC
o INTC indicates to the processor about usb_int
o processor performs a read to USB_INT_SOURCE/STATUS register
o processor gets the data
o processor gets to know why the interrupt is generated by the USB
o processor further makes the decision on what to do in each category
o that is where interrupt handling will involve a lot of branches
o processor state can be indicated using 2 variables
o PC
o which instruction the processor is executing currently
o PSTATE
o processor state

Why Interrupt concept is important for SOC verification engineer?
o every functional test will have interrupt generated
80% of testcases we work in SOC, will have interrupt being generated
many debugs end up with interrupt related issues
o interrupt not generated
o interrupt not serviced
o interrupt service routine is stuck
o interrupt is mapped to a wrong interrupt service routine
GIC has two sub components
o Distributor
o CPU interface
Whole CPUSS has only one GIC
CPUSS : can have up to 8 cores (processors)
It is the GIC’s responsibility to figure out which core the interrupt should be given to.
– generally same interrupt is not given to all the cores.
GIC has registers
o GIC registers can be programmed by the processor to control GIC behavior
o registers
o Global enable register
o IAR, EOI
ARM Exceptions
o Reset
o Abort
o Interrupt
o Secure world
o Virtual OS
GIC_enable();
C program function, which is part of every testcase.
Since it is required for every testcase, we don’t call in the user developed C file.
o it is part of initial Processor bringup sequence, that goes in to every testcase.
if 320 interrupts are coming to the GIC
o how does it store them?
o 1 register is 32 bits
320 bits(each interrupt status) : 10 registers
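The mapping from an interrupt number to its status register and bit position can be sketched as below. The array and function names are hypothetical and do not correspond to actual GIC register names; only the arithmetic (ID / 32 for the register, ID % 32 for the bit) is the point.

```c
#include <stdint.h>

#define NUM_IRQS      320u
#define BITS_PER_REG  32u
/* 320 interrupts / 32 bits per register = 10 registers. */
#define NUM_REGS      ((NUM_IRQS + BITS_PER_REG - 1u) / BITS_PER_REG)

/* One status bit per interrupt, packed in to 32 bit registers. */
uint32_t irq_status[NUM_REGS];

void irq_set_pending(uint32_t id) {
    irq_status[id / BITS_PER_REG] |= (1u << (id % BITS_PER_REG));
}

void irq_clear_pending(uint32_t id) {
    irq_status[id / BITS_PER_REG] &= ~(1u << (id % BITS_PER_REG));
}

int irq_is_pending(uint32_t id) {
    return (int)((irq_status[id / BITS_PER_REG] >> (id % BITS_PER_REG)) & 1u);
}
```

For example, interrupt 45 lands in register 1 at bit 13, which is a useful calculation when checking a status register value in the waveform during debug.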
4 categories of interrupts in ARM based systems
o PPI
o SPI
o SGI
o MSI
PCIe
o PCIe can have 256 x 32 x 8 = 65,536 functions connected
256 : buses
each bus can have 32 devices
each device can have 8 functions
o each function can generate an interrupt
o if we end up mapping one physical line for each interrupt, how many physical interrupt lines will we require?
o 65,536 lines => which is not feasible
o how to manage interrupt from various functions?
o Memory range is allocated
o one location for each PCIe function
o whenever the PCIe function wants to generate an interrupt, it actually performs a Memory_Write (MSI) to that specific location.
o PCIe Root complex keeps polling these locations, and figures out who specifically generated the interrupt.
o same concept is applicable for MSI in ARM GIC
o When some other SS wants to generate an interrupt to the ARM CPUSS
o that SS can perform an AHB write transaction in to the MSI register present in the GIC block
o GIC sees that someone performed a write to it,
o Based on the write data, it knows who has generated the interrupt
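The message based interrupt idea (a write whose data identifies the source) can be modelled with a very small sketch. Everything here is hypothetical: the number of IDs, the function names, and the choice to serve the lowest pending ID first. It is not the actual GIC register interface.

```c
#include <stdint.h>

#define MSI_MAX_IDS 64u   /* hypothetical number of message interrupts */

/* One pending bit per message interrupt ID. */
uint64_t msi_pending;

/* Models a bus write landing on the MSI register: the write data is
   the interrupt ID. Returns 1 if accepted, 0 if the ID is out of range. */
int msi_write(uint32_t wdata) {
    if (wdata >= MSI_MAX_IDS) {
        return 0;
    }
    msi_pending |= (UINT64_C(1) << wdata);
    return 1;
}

/* Returns the lowest pending ID and clears it, or -1 if none is pending. */
int msi_take_next(void) {
    for (uint32_t id = 0; id < MSI_MAX_IDS; id++) {
        if (msi_pending & (UINT64_C(1) << id)) {
            msi_pending &= ~(UINT64_C(1) << id);
            return (int)id;
        }
    }
    return -1;
}
```

The point of the scheme is that no dedicated wire is needed per source: any master that can reach the register over the bus can raise an interrupt.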
How does ARM processor views the memory?
o 2 types
o Device memory
o memory need not be cached
o we only read what is exactly required
o Normal memory
o memory meant for cacheable locations
o it can involve speculative access
o we may read more than what we need.
o normal memory gives better performance due to speculative accesses.
SOC fundamentals
CPUSS : ARM Cores
o L1 cache for each core (closest to the ARM core)
o L2 cache for all cores together (little further than L1)
o DDR (further than L2)
o Various other types of memories(USB, PCIe devices, Hard disk, Flash) (Far)
All the information in SOC operation falls in to two categories
o application
o Essentially the ARM instructions we are talking about
o Data (processed by the application)
o Watching some video
o Video player is the application
o Video is the data
o SOC operation requires applications to run faster
o if the application is present in L1 cache => it will run very quickly
so on L2, L3, DDR, Other memories
o DDR will be called as ‘system memory’
o Generally caches(L1, L2) will be small, they can’t hold complete application data
o What is typical DDR size?
o 8GB RAM
o this can hold various applications
o ex: video player, Chrome
o All software in laptop are installed?
o are they installed in DDR?
o No, DDR is volatile memory, it loses the content once power is turned off.
o when we open laptop and invoke video player
o video player application (a series of hexadecimal codes, each encoding one ARM instruction) is installed in C drive
o By means of Operating system, we open video player application
o SOC has a DMA controller which moves this application image in to DDR memory
o Processor knows which location it has been moved
o Processor is the one which has asked DMA to do that transfer.
o processor starts fetching instructions from the DDR
o processor has something called cacheable addresses
o Whole DDR memory can’t be cached.
o DDR itself can contain both types of information
o application
o data to be processed by application
o as a practice, only application related locations are cached.
o data doesn’t need to be cached, since we are not going to access the data repeatedly.
o we will be calling various options of the application repeatedly.
o Summary
o application data will be present in DDR
o some part of the application, which processor feels that it might be accessed regularly, puts that part of code in to L1 and L2 cache.
o Cache guidelines:
o cache uses the concepts of spatial locality and temporal locality
o temporal locality: the location that is being accessed frequently needs to be part of the cache
o spatial locality: if a specific location is accessed, there is a good chance that the following locations might also be accessed
o cache gets the data of the following locations also, even if the application is not asking for that data now.
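Spatial and temporal locality can be seen with a toy cache model. The sketch below is a direct mapped cache with 4 lines of 16 bytes each; these sizes, the struct and the function name are assumptions for illustration, and real L1/L2 caches are much larger and usually set associative.

```c
#include <stdint.h>

#define LINE_BYTES 16u   /* bytes fetched together on a miss */
#define NUM_LINES  4u    /* number of cache lines */

typedef struct {
    int      valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned hits;
    unsigned misses;
} cache_t;

/* Returns 1 on hit, 0 on miss. On a miss the whole line is filled,
   so neighbouring addresses in the same line hit afterwards. */
int cache_access(cache_t *c, uint32_t addr) {
    uint32_t line  = addr / LINE_BYTES;   /* line number in memory */
    uint32_t index = line % NUM_LINES;    /* which cache line it maps to */
    uint32_t tag   = line / NUM_LINES;    /* identifies the memory line */
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return 1;
    }
    c->valid[index] = 1;
    c->tag[index]   = tag;
    c->misses++;
    return 0;
}
```

Reading 16 consecutive words (addresses 0 to 60) misses only once per line and hits on the other three words of each line (spatial locality). Reading the same range a second time hits every time (temporal locality).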
o All these things are managed by a component called the ‘Memory management unit’
o MMU is present as part of CPUSS (not outside)
o It is required for every processor instruction access
o all processor instructions go through the MMU
o MMU decides whether the location is a cached location or not
o what type of access is it? application or data
o When we want to run more applications, we close some applications
o DMA will move the new applications in, and the earlier applications are flushed out.
o BASELINE:
o CPU only sees the below things
o L1, L2, DDR, Flash, Controller registers
o CPU doesn’t directly interface with peripheral memories.
o CPU want to fetch any information, it has to be from either
o L1, L2, DDR, On chip RAM or Flash
o memories in SOC are two types
o on-chip memory
o Boot ROM, Flash, On chip RAM
o off-chip memory
o DDR
When we run a testcase
o ARM compiler and linker gets called by default
o C code -> ASM code -> Hex image
o Verification engineer will do back door loading (only for simulation) of this image in to either
o DDR
o Flash
o CODE RAM
o Verification engineer will set these details as part of the testcase definition
o C hex Image is loaded in to DDR/Flash/Code RAM
o processor first starting from BOOT ROM
o once done, processor starts fetching instructions from one of the above memory locations
o C hex image => We are calling as application
o in real life
o we connect a bootable CD in to the laptop
o SOC has a DMA controller, which will move this image from CD to Flash
o application is now moved to Flash
o Processor fetches instructions from Flash and boots up
o How does the DMA controller know that it has to move the image, since the processor has not yet booted up?
o DMA controller will have some default settings, which work without even being programmed, hence it works automatically.
o It might have been taken care of in the basic boot image in the BOOT ROM.
o in the simulation, we could have just backdoor loaded the image to the Flash memory hierarchy
o real chip, some component has to move this data, that will be the DMA controller. ==> Front door access.
speculative access
speculation?
o Reliance: 2250
o buying 100 shares assuming it will go to 2500 in a week’s time
o this is called speculation.
non-speculation
o post office bonds
o fixed : 7% interest
all controller registers must be mapped device type of memory only
ARM CPUSS has L2 memory system
o L2 Memory subsystem has SCU (Snoop Control Unit)
o it takes care of cache coherency between L1 and L2 and DDR
o it uses concept of Dirty bit(location which are modified in cache, but not updated in system memory)
o MOESI
CPUSS components
o Multiple cores
o Timer
o Interrupt
o Core
o Trace
o Debug
o L2 memory system
o core governor
Why CPUSS requires multiple clock inputs from CLock controller?
o CPUSS internally has a Clock controller in it.
o clock controller can ensure the same clock signal frequency can be varied based on the application requirements.
o DVFS : Switch the frequency (and voltage) dynamically
o We don’t want our SOC level clock controller to do this work, it increases the latency.
o Hence Clock controller inside CPUSS, will help reduce this latency
o Clock controller has at least 2 PLLs which ensure that we can generate the various clock frequencies required by CPUSS.
frequency plan
o concept of running components at different frequencies based on the specific plan
LOW
MEDIUM
HIGH
TURBO
o Core requires one clock
o L2 requires another clock
CPUSS verification engineer needs to exercise all these things as part of verification