DMA-CNTLR-NOTES-MAY2021

SESSION#1

Overview:

DMA Controller: Direct Memory Access Controller

– Duration: 5 weeks

Whatever we do on an electronic device is all about data transfers.
o talking over phone
o playing a game
o sending an sms
o opening a chrome browser
What is a data transfer?
o source and destination
who is involved?
who provides? who consumes?
– 2 possible components
o memory
o DDR, Flash, SRAM, ROM, SDCARD, MMC, Harddisk
o peripheral
o USB, Ethernet, KBD, PCIe, Mouse, HDMI
4 types of transfers:
o memory to memory
ex: Transfer data from DDR to SRAM
o memory to peripheral
ex: Hard disk to USB
There is a movie file on the hard disk, and we want to transfer it to USB
o peripheral to memory
ex: USB to DDR
o peripheral to peripheral
ex: Data is coming from one peripheral, and we want to drive it to another peripheral
There is data coming from the Ethernet port; store that data to a pen drive.
My chip will perform better if we can do data transfers efficiently
o more data gets transferred in less time
o data transfer should happen with minimal processor intervention
o if the processor is involved, it will get busy with the data transfer and won’t be able to service other requests on the chip.
o to address this, the processor (main master of the SOC) delegates this transfer work to another component (the DMA controller), which will do the transfer work.
Analogy:
Bank (System)
Manager (master of the system)
clerks
accountant
Manager (processor) will give the work to the clerk (DMA controller)
Manager will give all instructions on how to do this work.
The clerk will complete the work
once done with the work, he will inform the Manager that the work is done
Manager will assign new work (if there is any) to the clerk, and this goes on.
The same relation holds between the Processor and the DMA controller
o ideally speaking, it is the processor which is supposed to read from the hard disk, get the data, and write this data to the USB port.
– let’s say this transfer takes 2 minutes (in a chip, 2 minutes is a very long time)
– during these 2 minutes the processor can’t address any other request
o other requests will get queued up.
o Solution to this problem:
o wherever there is a need for such data transfers on the chip, the chip architect introduces a DMA controller.
o the processor informs the DMA controller (in chip terminology, configures the DMA controller => programs the DMA controller registers), indicating what the processor wants the DMA controller to do.
o to make the DMA controller read from the hard disk and write the data to USB, the processor also needs to tell it ‘how much data to transfer’
o the DMA controller will have source_addr_register, dest_addr_register, data_size and some other variables => these will be programmed by the processor to configure the DMA controller
o then the DMA controller starts the transfer
o once the whole transfer is done, it indicates to the processor that it is done with the transfer.
o DMA controller uses a concept of ‘channel’ to do this transfer
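The delegation flow above can be sketched as a toy software model. This is only an illustration under assumptions: the class, the `start` method and the `done` flag are hypothetical names, and a real controller moves data over a bus rather than inside one Python bytearray.

```python
from dataclasses import dataclass


@dataclass
class DmaChannelModel:
    """Toy model of one DMA channel (names are illustrative, not from the spec)."""
    source_addr: int = 0
    dest_addr: int = 0
    data_size: int = 0
    done: bool = False  # stands in for the "I am done" indication to the processor

    def start(self, memory: bytearray) -> None:
        # Read data_size bytes from the source, then write them to the destination.
        data = bytes(memory[self.source_addr:self.source_addr + self.data_size])
        memory[self.dest_addr:self.dest_addr + self.data_size] = data
        self.done = True


def processor_delegates_copy(memory: bytearray, src: int, dst: int, size: int) -> bool:
    """Processor side: program the three values, start the channel, check done."""
    channel = DmaChannelModel()
    channel.source_addr = src
    channel.dest_addr = dst
    channel.data_size = size
    channel.start(memory)
    return channel.done
```

The processor only writes three values and starts the channel; the copy itself is the channel's job, which is the point of the manager/clerk analogy.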
DMA controller complex aspects:
o a chip often requires many parallel data transfers
o the DMA controller requires multiple channels
how many? depends upon the requirement
o DMA transfer behavior is different for memory and for peripherals
o DMA needs to have logic for both of them
o when concurrent transfers are happening, it needs to ensure that all these transfers happen properly to the right locations.
SOC project
almost 70% of SOC testcases will have DMA involved in their flow.
register programming
APB : Master interface
processor acts as a master
DMA acts as a slave
For data transfers, DMA is supposed to perform reads and perform writes
o write/reads can be to either memories or peripherals
o this is implemented using AXI interface
o DMA acts as a master
o Memory/peripherals will act as slaves
For developing testbench, how the diagram looks like
30 keywords discussed so far
o DMA
o Channel
o Source address
o destination address
o transfer size
o data transfers types
o data storage elements
o peripheral
o memories
o master interface
o slave interface
o why APB is used for programming register
o Why AXI is used for data transfers?
o how processor configures the DMA controller
o DMA controller registers
o ex: SPI controller registers (addr_regA, data_regA, control_reg)
MOSI, MISO : are SPI interface ports, they are not registers
Interrupt controller registers(priority_regA)
o Design configuration
o How processor interfaces with DMA controller
o through register programming
o DMA interrupt generation
Most important thing in the project: Design specification
o Design specification is a document provided by the design team
o this document tells everything about how design is supposed to work
Dual Core design
o It can have 2 cores: Core0, Core1
Configurable build and optional features
o DMA controller IP (RTL files) can be generated for different client requirements
o number of cores
o number of channels
o size of transfers
Clock divider for slow channels
o core1 is meant for peripheral transfers
o these peripherals refer to slower peripherals
o we are okay with lower performance, for the benefit of power saving
o hence we have the option to provide a low frequency clock
Block transfer in a frame context
o ex: a movie consists of frames
each frame is like one image
when we do data transfers, instead of saying ‘transfer 100 bytes’, we want to say ‘transfer frame by frame’
our DMA controller supports this frame transfer concept.
Three operation modes:
DMA controller can be put in 3 operation modes
– independent
– outstanding
– joint
o By keeping DMA in these modes, we can prioritize what is important for us
o Performance
o what is the size of transfers
o Importantly: the user can choose to keep the DMA controller in any of the above 3 modes to get the optimal performance out of it.
Three level priority arbitration
o DMA controller has cores, each having multiple channels
o if we configure DMA controller for doing multiple concurrent transfers
o we may want to give different priorities to different transfers (to different channels)
CH0 : High
CH1 : Normal
CH2 : Top
CH3 : Normal
CH4 : High
CH5 : Top
CH6 : High
CH7 : Top
o benefit:
o Some applications you want to run at high priority => you configure that channel with Top priority => hence that specific channel’s data transfers will happen with topmost priority => my electronic device will behave as per my expectation.
Windowed channel arbitration (tokens)
o Tokens
o Channels can be given tokens
o we can allot the AXI interface to these channels based on the tokens allotted to them.
Configurable interrupt controller with multiple processor support
o we can configure which interrupt is mapped to which processor
o since DMA has multiple interrupt lines, we can connect them to multiple processors
Supports any address alignment
o in case of aligned transfers, we are supposed to generate addresses which are multiples of either 4 or 8 or 16
ex: address must always be a multiple of 8 (it is aligned to a 64 bit data boundary)
addr lower 3 bits must be ‘0’
o DMA controller does not have any such restrictions
o AXI reads and AXI writes can happen to any address location without any alignment restrictions.
araddr = 32’h12536711;
Supports any buffer size alignment
o buffer size need not be a multiple of any number.
o we can do transfers of any size within the maximum limit
Supports command lists, including block lists
o one DMA command does one set of source address reads and one set of destination writes
ex: read 128 bytes from a memory location, write these 128 bytes to another memory location
o command lists?
o linked list of DMA commands
o it helps us to do back to back DMA commands
o where we are reading multiple locations and writing multiple locations using these multiple DMA commands
o multiple DMA commands => DMA command list
o Block lists can also be implemented using DMA
Peripheral flow control, including peripheral block transfer
o peripherals are different from memory
o memory is completely passive
o memory is a pure slave
o I can just perform a read or write to that memory without any other indication (no need of permission from memory).
o a peripheral is active
o we can do peripheral transfers only if the peripheral is ready for the transfer
o for ex: if the peripheral didn’t receive data from its slave device (ex: KBD controller and KBD panel; if I didn’t press any button on the KBD, the KBD controller won’t get any data), the KBD controller (peripheral) is not ready to complete the DMA command
o Hence in case of a peripheral transfer, the DMA controller requires an indication (req) from the peripheral that it is ready for the transfer (which is not required in case of memories)
peripheral block transfer
o DMA supports block transfer for peripherals also
Peripheral to peripheral transfer
o DMA supports data transfer from peripheral to peripheral
Scheduled transfers
o DMA has provision for scheduling transfers for a future time. (DOUBT??)
Endianness byte swapping
o little
o memory access where the lower bytes of the data bus go to the lower address locations of the memory
o addr = 32’h16; data = 32’h12345678;
78 => stored to 32’h16
56 => stored to 32’h17
34 => stored to 32’h18
12 => stored to 32’h19
o big
o memory access where the lower bytes of the data bus go to the higher address locations of the memory
o addr = 32’h16; data = 32’h12345678;
12 => stored to 32’h16
34 => stored to 32’h17
56 => stored to 32’h18
78 => stored to 32’h19
o one memory may support little endian, the other memory (of the present DMA command) may support big endian
o it requires our DMA to do this swapping before it writes to the destination memory
o in the above example, when the DMA controller gets data of 32’h12345678
o before it writes to the big endian memory, it should swap the bytes (lower to upper)
wdata = 32’h78563412; //byte swapping
Software control peripheral request
o Software control => processor programming the design registers
o peripheral request being controlled by means of register programming => study spec to understand in depth
Watchdog timer
o if the design is stuck/has no activity, the watchdog timer can generate a reset to reset the overall system
Channel pause and resume
o option where we can pause a channel for some duration, during which that specific channel won’t process any DMA commands
channels are the ones which process the DMA commands
APB3 registers
o APB3 protocol is used to program the registers
Complete status register set for debug
o design has status registers, which indicate various aspects of DMA controller status
o why are these important?
o at RTL level, we can open the RTL code and analyze where things are wrong
o once the chip is manufactured
o how to debug DMA controller internal issues?
o DMA controller provides status registers; by reading these registers, we get information about various aspects of DMA controller behavior
which makes debug easier.
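Going back to the endianness byte swapping feature in the list above, the 32’h12345678 example can be checked with a short sketch. The function names are made up for illustration; only the byte layout and the swapped value come from the notes.

```python
def store_word(addr: int, word: int, byteorder: str) -> dict:
    """Return {address: byte} for a 32-bit word stored as 'little' or 'big' endian."""
    data = word.to_bytes(4, byteorder)
    return {addr + i: byte for i, byte in enumerate(data)}


def swap_bytes32(word: int) -> int:
    """Reverse the four bytes of a 32-bit word (lowest byte becomes highest)."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")


little = store_word(0x16, 0x12345678, "little")  # 0x78 lands at 0x16
big = store_word(0x16, 0x12345678, "big")        # 0x12 lands at 0x16
swapped = swap_bytes32(0x12345678)               # value the DMA would drive on wdata
```

Here `swapped` equals 0x78563412, the same wdata value shown in the notes.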

Q: how is a multiple of 8 aligned to a 64 bit data boundary?
each location: holds one byte
address multiple of 8 => 16
if we are doing a 64 bit transfer (8 bytes)
8 bytes will be written to => 16, 17, 18, 19, 20, 21, 22, 23 (it fits perfectly within one 64 bit data boundary)
what is not aligned:
if starting address = 17
8 bytes will be written to => 17, 18, 19, 20, 21, 22, 23
24 (falls in another 64 bit data boundary)
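The alignment question above can be worked through in a few lines. This is a minimal sketch; the helper names are hypothetical and the 8-byte width is the 64 bit boundary used in the example.

```python
def is_aligned(addr: int, width_bytes: int = 8) -> bool:
    """True when addr is a multiple of width_bytes (lower address bits are 0)."""
    return addr % width_bytes == 0


def boundaries_touched(addr: int, nbytes: int, width_bytes: int = 8) -> int:
    """Count how many width_bytes-sized windows a transfer of nbytes spans."""
    first_window = addr // width_bytes
    last_window = (addr + nbytes - 1) // width_bytes
    return last_window - first_window + 1


aligned_case = boundaries_touched(16, 8)    # bytes 16..23, one window
unaligned_case = boundaries_touched(17, 8)  # bytes 17..24, spills into the next window
```

Starting at 16 the eight bytes stay in one 64 bit window; starting at 17 they touch two, which is what makes the access unaligned.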

SESSION#2

construction options
core build options:
– Data width (32 or 64 bits)
should the AXI wdata & rdata ports be 32 bits or 64 bits wide
– Data buffer size (16-512 bytes)
o Each channel in DMA is implemented as a FIFO
o when AXI performs a read from a memory or peripheral, the read data is stored in a FIFO
o DMA writes to the destination from the data stored in the FIFO.
o FIFO size can be from 16 to 512 bytes
– AXI write command depth (1-64 commands)
o AXI protocol can perform writes to the memory or peripherals
o for every write request, the memory or peripheral is supposed to provide a response.
o how many outstanding AXI commands (txs) can be pending while we still issue new requests.
– Address bits (16-32)
o AXI address bus size(16 or 32 bits)
– Buffer size bits (9-16)
– how many bits should be used for buffer size storage
AXI
o 5 channels
o Write address channel
o awvalid, awready
o Write data channel
o wvalid, wready
o Write resp channel
o bvalid, bready
o Read address channel
o arvalid, arready
o Read data and resp channel
o rvalid, rready
o for all these 5 channels, when the master/slave issues the command (valid), the other component should issue the ready signal
o if the master issues awvalid = 1, there is a specific number of clock cycles within which awready should be made ‘1’ by the slave.
o if the slave doesn’t do so, it is called a timeout condition.
SPI controller
o 8 bytes of transfer => hence buffer size can be small
DMA
o AXI
o name of the port, size(number of bits), direction, description
o APB
o Peripheral
unlimited pending read commands
o we can continue to issue reads, even though we are not getting read data from the slave
AXI supports 4 types of responses
OKAY
EXOKAY
DECERR
o AXI master is issuing a request with an address that is not mapped to any slave.
for ex: PIN = 999999; if this PIN code doesn’t exist, the post office will give me a decode error.
SLVERR
o Address is valid, but the slave is not in a position to respond; in those cases it issues SLVERR.
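A small sketch of how the decode error and slave error cases differ. The 2-bit encodings are the standard AXI RESP values; the address map and the `slave_ready` flag are hypothetical and exist only to drive the example.

```python
from enum import IntEnum


class AxiResp(IntEnum):
    OKAY = 0b00
    EXOKAY = 0b01
    SLVERR = 0b10
    DECERR = 0b11


# Hypothetical address map, used only for this illustration.
ADDRESS_MAP = {
    "ddr": (0x0000_0000, 0x0FFF_FFFF),
    "sram": (0x2000_0000, 0x2000_FFFF),
}


def respond(addr: int, slave_ready: bool = True) -> AxiResp:
    """Return the response a master would see for a request to addr."""
    for low, high in ADDRESS_MAP.values():
        if low <= addr <= high:
            # Address decodes to a slave; the slave itself may still fail.
            return AxiResp.OKAY if slave_ready else AxiResp.SLVERR
    # No slave owns this address (the "PIN code does not exist" case).
    return AxiResp.DECERR
```

EXOKAY is only returned for successful exclusive accesses, so it does not appear in this simple model.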
scan_en
o concept used during DFT
o when we want to run scan scenarios of the DFT, scan_en is driven ‘1’
o design will work in scan mode, not in functional mode
Peripheral interface
o periph_tx_req[31:1] => 31 bits in size
o signal which is driven by the peripheral controller to the DMA
o periph_tx_req[3] = 1
o the corresponding 3rd peripheral controller (ex: SPI controller) is telling that it is ready to transmit data: please provide me the data
o it (peripheral controller => SPI controller) is giving a request to the DMA, saying that it can start transmitting the data
o DMA performs a write to the SPI buffer memory (memory inside the SPI controller)
o SPI controller will use this data and transmit it to the SPI slave
o once this request is addressed (DMA has given the data to the SPI controller)
DMA drives periph_tx_clr[3] = 1, to indicate that it is done with the request
o periph_rx_req => it is also an input
o SPI controller is telling: I have received data from the slave and I am ready to give the data to you (the DMA)
o please issue a read to me, then I (the SPI controller) will give the data
o DMA will issue the read
o the read data is returned
o DMA will drive periph_rx_clr[3]=1, to indicate that it has completed the read.
o 3 should be configured in the channel-specific PERIPH_NUM register
o if there are 8 channels in a core, how many PERIPH_NUM registers will be there?
o 8 will be required
o One register per channel
o PERIPH_NUM tells which peripheral number the channel is currently doing the transfer to/from.
ex: CH4_PERIPH_NUM reg = 7 => CH4 is doing transfers to/from the 7th peripheral.
o why are periph_rx/tx_req 31 bits in size?
o DMA will be able to address up to 31 peripherals concurrently
both AXI and APB work on same clock
Modes
o for some kind of transfers
ex: smaller size read, small size write, independent mode may give best performance
ex: large size read, large size write, joint mode may give best performance
o independent mode
o normal
o outstanding mode
o joint mode
o normal (though we are in joint mode, but we will work in normal mode)
o joint mode
o independent, normal and joint are talked about w.r.t. channels
o if write channel and read channel work independently => it is independent mode
o if write channel and read channel work in outstanding mode => it is outstanding mode
independent mode – normal channel mode
o arbiter inside the core decides when to issue the write
o reads can be issued independent of writes
o we want to do 256 bytes transfer (move 256 bytes from a source memory to destination memory)
o All 256 bytes, may not be possible in single AXI transaction
o AXI in the current design can do a maximum tx of size = 16*4 = 64 bytes (the maximum number of bytes we can do in one AXI tx)
o 16 : maximum number of transfers (beats) in a transaction
o 4 : AXI data bus size is 32 bits => 4 bytes
o we can’t transfer all 256 bytes in a single AXI transaction
o at minimum, how many txs are required = 4
o sometimes a tx can be unaligned, due to which we may need to do more transactions than 4
o writes can’t happen unless data is available in the buffer
o this increases the latency
o to increase the efficiency in this mode, we have to increase the size of the data buffer, so that more writes can be done back to back, without waiting for read completion.
example in spec:
o read at read_addr = 32’h3000_0001; it is an unaligned address
DMA can’t issue a 64 byte read transaction to this location
DMA issues read_tx to read 1 byte only
read_len = 0; => 1 beat
read_size = 0; => 1 byte/beat
1*1 = 1 (1 byte will be read)
o 32’h3000_0008 is an aligned address for 64 bit transfers
o full size transaction is done
4 bytes/beat * 16 beats = 64 bytes
o how to achieve a 56 byte transfer
4 bytes/beat * 14 beats = 56 bytes
AXI read tx with ARLEN=13(number of beats = 13+1 = 14)
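The burst arithmetic above (bytes per beat times number of beats, with ARLEN = beats - 1) can be sketched as follows. The helper names are hypothetical; the 4 bytes/beat and 16 beat limits are the ones quoted in these notes.

```python
def arlen_for(nbytes: int, bytes_per_beat: int = 4, max_beats: int = 16) -> int:
    """ARLEN value for one full-width read of nbytes (ARLEN = beats - 1)."""
    if nbytes % bytes_per_beat != 0:
        raise ValueError("nbytes must be a whole number of beats")
    beats = nbytes // bytes_per_beat
    if not 1 <= beats <= max_beats:
        raise ValueError("transfer does not fit in one transaction")
    return beats - 1


def min_transactions(total_bytes: int, bytes_per_beat: int = 4, max_beats: int = 16) -> int:
    """Fewest transactions needed when every transaction is aligned and full size."""
    bytes_per_tx = bytes_per_beat * max_beats
    return -(-total_bytes // bytes_per_tx)  # ceiling division


full = arlen_for(64)             # 16 beats
partial = arlen_for(56)          # 14 beats
needed = min_transactions(256)   # 256 / 64
```

Unaligned start addresses need extra short transactions on top of this minimum, as the 32’h3000_0001 example shows.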
o independent
o Core has 8 channels
o these 8 channels can independently do reads, irrespective of writes
o for a given channel, there is a dependency between read and write (write can’t happen unless the buffer has the data)
o AXI has a dedicated read channel and a dedicated write channel
o this concept makes channels independent
o where a write of CH1 can be happening independent of some other channel’s read.
core: independent mode
o each channel
o read outstanding or write outstanding or both
o read outstanding
o read commands are issued once the write commands have been issued, before the write data has actually been written out.
o write outstanding
o write commands are issued once the read commands have been issued, before the read data has actually been read out from source location.
o read command is issued; even though the read data hasn’t come to the DMA, in the meantime we issue the write (assumption: the slave responds quickly, so the read data will be available and the write data phase can be done on time).
o how does it differ from normal mode?
o in normal mode, a write is issued only when the read data phase is completed and there is enough data in the channel FIFO.
o this mode requires AXI slave to be quick responding.
o when overflow will happen?
o Buffer full size = 128 bytes
o current size = 100 bytes
o write command is issued for writing 64 bytes to the destination
o without completing this write, read outstanding mode allows read to be issued even before 64 bytes are written
o read command is issued for reading 64 bytes
o for some reason, write destination slave is not ready to accept the data(it is slow), write 64 bytes transfer won’t happen.
o current FIFO occupancy = 100
o read bytes = 64
o total data size = 164 > FIFO_capacity => Overflow will happen
o in what case this will work?
o if write destination slave is quick, then 64 bytes would go out of FIFO => 100-64 = 36
o read will get 64 bytes => 36+64 = 100 < FIFO_SIZE => in this case, overflow won’t happen
o when underflow will happen?
o current FIFO size = 16 bytes
o read followed by write
o read for 64 bytes
o without waiting for the 64 byte read to complete (if it completes, the FIFO will have 80 bytes), DMA issues a write of 32 bytes (read data hasn’t come yet)
o FIFO has only 16 bytes and we are trying to transfer 32 bytes => FIFO underflow will happen.
o overflow or underflow can happen
o because we don’t have any direct control on memory/peripherals, they can be slow at times => it can result in overflow or underflow.
o overflow or underflow => ERROR condition?
o how DMA addresses these?
o DMA issues ERROR interrupt
o channel stops the transfers
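The overflow and underflow cases above are plain FIFO bookkeeping, which a small sketch can reproduce. The function is a simplified model with hypothetical names: it applies the bytes written out first and the bytes read in second, which is an assumption about ordering and not the real cycle-level behavior.

```python
def fifo_step(level: int, capacity: int, read_in: int, written_out: int):
    """Return (status, new_level) after draining written_out and filling read_in."""
    if written_out > level:
        return ("underflow", level)   # write wants more data than the FIFO holds
    level -= written_out
    if level + read_in > capacity:
        return ("overflow", level)    # read data has nowhere to go
    return ("ok", level + read_in)


# Slow write slave: nothing drained before 64 bytes of read data arrive.
slow_write = fifo_step(level=100, capacity=128, read_in=64, written_out=0)
# Quick write slave: 64 bytes leave first, then 64 bytes arrive.
quick_write = fifo_step(level=100, capacity=128, read_in=64, written_out=64)
# Write of 32 bytes issued while only 16 bytes are present.
early_write = fifo_step(level=16, capacity=128, read_in=0, written_out=32)
```

The three calls correspond to the 164 > capacity case, the 36 + 64 = 100 case and the 16 byte underflow case in the notes.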
o if we want to keep DMA in independent – outstanding channel mode
o CORE0_JOINT_MODE[0] = 1 => Core0 all the channels will work in ‘Joint mode’
o 0 => Core0 all the channels will work in ‘independent mode’
o how to choose between normal and outstanding mode?
to keep the channel in ‘normal channel mode’ => RD_OUTSTANDING = 0, WR_OUTSTANDING=0
to keep the channel in ‘Read outstanding mode’ => RD_OUTSTANDING = 1, WR_OUTSTANDING=0
to keep the channel in ‘Write outstanding mode’ => RD_OUTSTANDING = 0, WR_OUTSTANDING=1
to keep the channel in both ‘Read and Write outstanding mode’ => RD_OUTSTANDING = 1, WR_OUTSTANDING=1
STATIC_REG0[30] => RD_OUTSTANDING
STATIC_REG1[30] => WR_OUTSTANDING
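The four RD_OUTSTANDING / WR_OUTSTANDING combinations can be expressed as bit operations on the two static registers. Bit 30 is taken from the notes; the function names and the 32-bit register width are assumptions made for this sketch.

```python
OUTSTANDING_BIT = 30           # STATIC_REG0[30] and STATIC_REG1[30], per the notes
MASK = 1 << OUTSTANDING_BIT


def program_outstanding(static_reg0: int, static_reg1: int,
                        rd_outstanding: bool, wr_outstanding: bool):
    """Return the two register values with the outstanding bits set or cleared."""
    reg0 = (static_reg0 | MASK) if rd_outstanding else (static_reg0 & ~MASK)
    reg1 = (static_reg1 | MASK) if wr_outstanding else (static_reg1 & ~MASK)
    return reg0 & 0xFFFF_FFFF, reg1 & 0xFFFF_FFFF


def channel_mode(static_reg0: int, static_reg1: int) -> str:
    """Name the channel mode selected by the two outstanding bits."""
    rd = bool(static_reg0 & MASK)
    wr = bool(static_reg1 & MASK)
    names = {
        (False, False): "normal",
        (True, False): "read outstanding",
        (False, True): "write outstanding",
        (True, True): "read and write outstanding",
    }
    return names[(rd, wr)]
```

Other fields in the same registers are left untouched, which is why the sketch uses read-modify-write style masking.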
Joint mode
o each core will use a single arbiter for both read and write operations.
o once the channel moves to Joint mode
o the read data is not stored in the FIFO; it is directly routed onto the write data bus
o because of this, DMA can issue an AXI read burst whose size is bigger than the FIFO size (since the data is not saved in the FIFO)
o BURST_MAX_SIZE
o Read and write tx overall byte transfer size can’t exceed BURST_MAX_SIZE

SESSION#3

revision:

when we run a specific application (something that requires data transfer M->M, M->P, P->P)
o at embedded system level, DMA controller will be programmed to get the optimal performance
o anywhere a peripheral is involved in the transfer, we must use Independent Normal mode
o other modes are only applicable for M->M transfers

notes:

DMA command
o describes one transaction from Source -> destination
o 4 x 32 bits
1st 32 bits => source address (from where DMA controller needs to read the data)
2nd 32 bits => destination address (to where DMA controller wants to write the data)
3rd 32 bits => size of transfer (how much data to transfer)
4th 32 bits => details about next DMA command
Unique registers for every channel:
CMD0_REG
CMD1_REG
CMD2_REG
CMD3_REG
for CH0:
CH0_CMD0_REG
CH0_CMD1_REG
CH0_CMD2_REG
CH0_CMD3_REG
If I want to transfer 128 bytes of data from addr=32’h1200_0000 to dest_addr=32’h2400_0000 using CH3
o Processor should program
CH3_CMD0_REG=32’h1200_0000
CH3_CMD1_REG=32’h2400_0000
CH3_CMD2_REG=128
CH3_CMD3_REG=based on the next command we want to do
o once we enable and start the CH3, then CH3 starts doing this transfer
o MODE will be used based on what we programmed in CORE0_JOINT_MODE
Lets say using CH3, we want to do 4 back to back commands ==> command lists(list of commands)
o CH3 has only one set of registers
o how can we do 4 commands using one set of registers?
solution: 1 command will be programmed in to CH3_CMD0_REG to CH3_CMD3_REG
the other 3 commands will be stored in memory, and the AXI interface will be used to read them. Once DMA gets 128 bits (one command), it performs that operation. Once this operation is done, DMA_CTRL again reads the memory to get the next command.
There will be a field called ‘CMD_LAST’ in the last 32 bits of the command, which tells whether the current command is the last command or there is another command after it.
Command lists: (ex: CH5)
1st command will be present in design registers
CH5_CMD0_REG to CH5_CMD3_REG
remaining commands will be present in memory (processor will perform writes to this memory)
once the 1st command is done, DMA_CTRL gets these commands by reading the memory.
CMD3_REG will tell where the next command is present.
why we need command lists?
o used for doing many back to back transfers
o ex:
20KB
5X4KB
after last page, completion interrupt should be generated.
o how do we use the DMA controller to achieve this data transfer?
o 0x1000 = ‘b1_0000_0000_0000 = 4KB
2^10 = 1KB; the above value is 2^12 = 4KB
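The 20KB = 5 x 4KB example can be sketched as a command list: the first command sits in the channel registers and the rest sit in memory, each pointing at the next. The layout below is an assumption for illustration only: CMD_LAST is modeled as bit 0 of the 4th word and the next-command pointer as the rest of that word, which is not confirmed by the notes.

```python
PAGE = 0x1000      # 4KB
CMD_BYTES = 16     # one command = 4 x 32 bits
CMD_LAST = 0x1     # hypothetical position of the CMD_LAST flag in the 4th word


def build_command_list(src: int, dst: int, total_bytes: int, list_base: int):
    """Return (first_command, memory); list_base must be 16-byte aligned."""
    if total_bytes <= 0 or total_bytes % PAGE != 0:
        raise ValueError("total_bytes must be a positive multiple of 4KB")
    count = total_bytes // PAGE
    first, memory = None, {}
    for i in range(count):
        is_last = i == count - 1
        next_word = CMD_LAST if is_last else list_base + CMD_BYTES * i
        cmd = (src + i * PAGE, dst + i * PAGE, PAGE, next_word)
        if i == 0:
            first = cmd                                      # goes into CMD0..CMD3 registers
        else:
            memory[list_base + CMD_BYTES * (i - 1)] = cmd    # written by the processor
    return first, memory


def run_command_list(first, memory):
    """Walk the list as the DMA would; return (commands executed, bytes moved)."""
    cmd, executed, moved = first, 0, 0
    while True:
        _src, _dst, size, next_word = cmd
        executed += 1
        moved += size
        if next_word & CMD_LAST:
            return executed, moved    # completion interrupt would be raised here
        cmd = memory[next_word]       # fetch the next command over AXI
```

For example, `build_command_list(0x1200_0000, 0x2400_0000, 20 * 1024, 0x8000_0000)` yields one register command plus four commands in memory.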
Both peripheral and memory is all about data transfer, what is the difference?
o memory is completely passive
o peripheral is active
Peripheral control
o Ex: Rx peripheral (Receive peripheral)
o It is going to receive data from its slave device(Ex: SPI slave)
o RD_PERIPH_NUM
o one register for each channel
o CH3_RD_PERIPH_NUM=9 => CH3 is connected to Peripheral#9
Arbitration
o Why arbitration is required in DMA_CTRL?
o DMA has 8 channels
o multiple channels might be doing transfers concurrently
o Whereas there is only one physical AXI interface
o multiple channels (ex: we are using all 8 channels) are doing transfers on the same physical interface
o 3 priority levels which a channel can be set to
o Normal
o High
o Top
o Topmost priority
o how to set priority of a channel?
CH0_RD_PRIO
TOP => CH0 has top priority for read
CH0_WR_PRIO
NORMAL => CH0 has NORMAL priority for write
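A minimal sketch of three-level priority arbitration, using the CH0 to CH7 priorities listed in Session 1. How the real design breaks ties inside one priority level is not stated in these notes, so picking the lowest channel number is purely an assumption here.

```python
PRIORITY_RANK = {"NORMAL": 0, "HIGH": 1, "TOP": 2}

# Example priorities from the Session 1 list (CH0 : High, CH1 : Normal, ...).
PRIORITIES = {0: "HIGH", 1: "NORMAL", 2: "TOP", 3: "NORMAL",
              4: "HIGH", 5: "TOP", 6: "HIGH", 7: "TOP"}


def pick_channel(requesting, priorities=PRIORITIES) -> int:
    """Grant the requesting channel with the highest priority level.

    Tie-break (assumption): the lowest channel number wins within a level.
    """
    return max(requesting, key=lambda ch: (PRIORITY_RANK[priorities[ch]], -ch))
```

With channels 0, 1 and 3 requesting, CH0 (High) is granted; once a Top channel such as CH5 joins, it wins.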
Last week of course
o Token
o Scheduled channels
o Interrupt depth
o Software request of peripheral control
o Multiple processor control
o AXI timeouts
o watchdog timer
o clock gating
o Multiple output port control
o Limiting pending AXI commands
o clock 1 divider
o endianness swapping

SESSION#4

notes:

DMA controller has 2 types of registers
o static registers
o why static?
o once configured in a testcase(application), we don’t change them till end of the testcase
o shared registers
o all channels of a core share these registers
o channel specific registers
o these registers are specific to channels
o they may need to be reconfigured multiple times during the same testcase
configuration flows
o DMA controller has many features
o how to program the registers (configuration of registers) to implement a specific feature of the design

o general configuration
o At level design has specific behavior
CLKDIV
CORE0 to be independent or joint mode
CORE1 to be independent or joint mode

o configure and start a channel
o 2 cores => 8 channels/core
o we choose one or more channels for the current testcase (in real life, it is the application)
o we have to program some registers of those channels => which will make transfer happen
o static configuration
o command

o stop a channel
o by default a channel stops once all commands are executed by it.
o user can also forcefully stop a channel, by programming CH_ENABLE register to 0.

o pause and resume a channel
o restart a channel
o interrupt handling
o what is the significance of an interrupt in the DMA controller?
o DMA controller specific channel generates interrupt to indicate ‘I am done with the transfer’
o power down sequence

Analogy of DMA controller
System: Courier office
manager: processor
delivery boy: channel in the DMA controller
There is a requirement to get some documents from one place to another place
the delivery boy needs to get 1024 books and deliver them to some other location.
o he can only carry 128 books at a time
o he goes 8 times, gets 128 books each time, and delivers those 128 books each time
The whole sequence of 1024 book transfers (using 8 individual transactions) is called one command.
We create a command for each such set of book transfers
o in total we want to do 4*1024 such transfers
o we set up 4 commands to be executed by DB#1
o courier office: what does a command mean?
o from where to get the books (source address)
o where to deliver the books (destination address)
o how many books to carry? (size of transfer)
o once I am done with this set of transfers, is there any other work (new command) I need to do?
o if there is nothing, the DB will stop the work (channel has stopped)
o else he will start running a new command
one courier office can have multiple DBs
o multiple channels
how an interrupt can be correlated in the courier example:
o once the DB is done with the transfers, he goes and informs the manager that he is done.
o two possibilities
o inform at the end of every 1024 transfers => how many interrupts need to be generated = 4
o inform at the end of all 4*1024 transfers => how many interrupts need to be generated = 1
DMA controller understanding:
o processor : configures and decides which channel will do which transfer
o channel :
o channels are configured to read a location and write the data to another location
o what they are doing is called a => COMMAND.
performance

independent mode, 64 bit data bus
o total buffer size = 1024B
o data bus = 64b (8B)
o channel buffer size =16B
o at a time channel FIFO can hold only 16B
o each AXI transaction (all its beats together) can bring in at most 16 Bytes, since that is all the FIFO can store
o so the maximum size of an AXI tx can be = 16B = 128 bits
o AXI read: arlen = 1 (2 transfers in the whole transaction), arsize=3 (8 bytes per beat)
total AXI read transaction size = length * burst_size = 2*8 = 16 Bytes
o to do 1024 Bytes transfer
o how many such transactions do we need = 64
o when channel buffer size is increased to 32
o total AXI read transaction size = length * burst_size = 4*8 = 32 Bytes
o to do 1024 Bytes transfer
o how many such transactions do we need = 32
o every channel has a variable called ‘MAX_BURST_SIZE=32 Bytes’
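The performance numbers above follow from the AXI encoding: beats = arlen + 1 and bytes per beat = 2^arsize. A short sketch, with a hypothetical helper name:

```python
def transactions_needed(total_bytes: int, arlen: int, arsize: int) -> int:
    """Number of AXI reads needed when each read moves (arlen+1) * 2**arsize bytes."""
    bytes_per_tx = (arlen + 1) * (1 << arsize)
    return -(-total_bytes // bytes_per_tx)  # ceiling division


with_16b_buffer = transactions_needed(1024, arlen=1, arsize=3)  # 2 beats x 8 bytes
with_32b_buffer = transactions_needed(1024, arlen=3, arsize=3)  # 4 beats x 8 bytes
```

Doubling the channel buffer halves the number of transactions for the same 1024 byte transfer, which is the point of the comparison.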

Registers
o Registers are the medium of communication between the processor and the DMA controller.
o Processor wants to tell the DMA controller ‘now get into Joint mode’ => Processor programs one or multiple registers to do that
o Processor wants DMA to do some data transfer, how will it tell? => processor programs one or more registers to do that
o the more complex the design => more features => to implement more features, we need more registers
o in many cases => the count of registers can tell the complexity of the design.
SPI_CTRL => 15 to 20 registers => low complexity design
DMA_CTRL => 50 to 100 registers (number increases with the number of channels) => medium complexity design
USB_CORE => 50 to 100 registers => medium complexity design
DISPLAY SS => 100 to 300 registers => complex design
what is the significance of registers for a verification engineer?
o Registers are the medium of communication between the processor and the DMA controller.
o when we develop a testbench for the DMA controller, we are playing the role of the processor (replacing the processor with BFM+Generator)
o Anything we need to do to the DMA controller is done by programming the DMA controller registers.
o by having a better understanding of the registers, I will be able to program its behavior better.
o If I don’t know anything about the registers, I can’t do anything with the DMA.

when we go through the register map, we should understand the following:
o name of the register
o size of each register
o type of register
o RW : It is possible to read and write
o RO : read only
o WO : write only
o W1C : writing 1 to the register clears the register content.
o register address
o register fields, their significance

We should create an XLS with all above details

register address calculation
0x123 = 32’h123
CMD_REG0
CORE0_CH0_CMD_REG0?
address = 0x0 + 0x0 + 0x0 = 0x0
If processor wants to write data=32’h1234_5678 to CORE0_CH0_CMD_REG0
paddr = 0x0;
pwdata = 0x1234_5678;
pwrite = 1;
penable = 1;
psel = 1;
wait (pready == 1);
CORE1_CH3_CH_START_REG?
address = 0x800 + 0x300 + 0x44 = 0xB44
If the processor wants to start CH3 of CORE1, what is it going to do?
processor writes data=32’h1 to CORE1_CH3_CH_START_REG
paddr = 0xB44;
pwdata = 0x1;
pwrite = 1;
penable = 1;
psel = 1;
wait (pready == 1);
processor wants to get the interrupt status of CH5 of Core0?
processor needs to read CORE0_CH5_INT_STATUS_REG
addr = 0x0 + 0x500 + 0xAC = 0x5AC
processor will issue a read to address 0x5AC on the APB interface.
paddr = 0x5AC;
pwrite = 0;
penable = 1;
psel = 1;
wait (pready == 1);
we get prdata (which is the int_status_reg data)
processor wants to enable CH0 of CORE0
address = 0x0 + 0x0 + 0x40 = 0x40
paddr = 0x40;
pwdata = 0x1;
What is the total range of channel registers for both core0 and core1?
0x0 till 0x800 + 0x700 + 0xAC (0xFAC to 0xFAF)
range: 0x0 to 0xFAF
shared registers start at 0x1000
the range 0xFB0 to 0xFFF is the reserved region.
for shared registers, the address we see in the spec is the real register address; we don’t need to add core_base_address and channel_base_address.
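The address arithmetic above (core base + channel base + register offset) can be captured in one helper. The base addresses and offsets are the ones used in the worked examples; the dictionary keys are illustrative, and CH_ENABLE_REG is an assumed name for the register at offset 0x40.

```python
CORE_BASE = {0: 0x000, 1: 0x800}
CHANNEL_STRIDE = 0x100   # CH3 => 0x300, CH5 => 0x500

# Offsets taken from the worked examples in these notes.
REG_OFFSET = {
    "CMD_REG0": 0x00,
    "CH_ENABLE_REG": 0x40,   # assumed name for the enable register
    "CH_START_REG": 0x44,
    "INT_STATUS_REG": 0xAC,
}


def channel_reg_addr(core: int, channel: int, reg: str) -> int:
    """Address of a channel register = core base + channel base + register offset."""
    if not 0 <= channel <= 7:
        raise ValueError("each core has channels 0..7")
    return CORE_BASE[core] + channel * CHANNEL_STRIDE + REG_OFFSET[reg]
```

For instance `hex(channel_reg_addr(1, 3, "CH_START_REG"))` gives the 0xB44 value used in the APB write above. Shared registers are not covered by this helper, since their spec address is already the final address.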
design has
- 22 channel registers
- 24 shared registers
- if we take all channels in a core, total registers will be
  = 22*8 + 24 = 200 registers
verification engineer should know everything about all registers
wr_tokens
o CH3 got grant to do AXI writes
o let’s say CH4 is also waiting
o how many transactions can CH3 do before it releases the grant to CH4
o tokens is that number
o with more tokens, a channel can do more transfers before it releases the bus to another channel
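The token idea can be sketched as a windowed round robin: a channel keeps the grant for up to its token count, then hands over to the next waiting channel. This is a simplified model with hypothetical names; it ignores priority levels and assumes channels are visited in ascending order.

```python
def token_schedule(pending: dict, tokens: dict) -> list:
    """Return the order of AXI transactions, given pending counts and tokens per channel."""
    if any(tokens[ch] < 1 for ch in pending):
        raise ValueError("every channel needs at least one token")
    remaining = dict(pending)
    order = []
    while any(remaining.values()):
        for ch in sorted(remaining):
            burst = min(tokens[ch], remaining[ch])  # hold the grant for at most `tokens` txs
            order.extend([ch] * burst)
            remaining[ch] -= burst
    return order


# CH3 has 2 tokens and 4 pending writes; CH4 has 1 token and 2 pending writes.
schedule = token_schedule({3: 4, 4: 2}, {3: 2, 4: 1})
```

In this example CH3 issues two transactions per window and CH4 one, so CH3 gets twice the share of the bus.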
restrict register
DMA controller can issue up to 8 interrupts
o INT0_STATUS
o who generated this interrupt?
Core0 or Core1
CH0 or CH1 or…CH7
How does the interrupt sequence work?
o DMA Controller generates an interrupt on one of its interrupt lines
intr[3]
this goes to processor
It reads INT3_STATUS
16 bits data
data = 16’b0000_0000_0100_0000
– interrupt is generated due to CORE0, CH6
now processor will read CORE0_CH6_INT_STATUS_REG
data will be 13 bits
these 13 bits will tell what really caused CH6 to generate this interrupt.
based on this read data, processor will make decision on how to handle
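The first step of that sequence (decoding the 16-bit INTx_STATUS value) can be sketched as below. The bit mapping is an assumption: bits [7:0] are taken as CORE0 CH0..CH7 and bits [15:8] as CORE1 CH0..CH7. This matches the example above (bit 6 set means CORE0, CH6), but the spec should be checked for the real layout.

```python
CHANNELS_PER_CORE = 8


def decode_int_status(value):
    """Return a list of (core, channel) pairs whose bit is set in INTx_STATUS.

    Assumed layout: bit n belongs to core n // 8, channel n % 8.
    """
    sources = []
    for bit in range(2 * CHANNELS_PER_CORE):
        if value & (1 << bit):
            sources.append((bit // CHANNELS_PER_CORE, bit % CHANNELS_PER_CORE))
    return sources


# Value from the example: only bit 6 is set.
print(decode_int_status(0b0000_0000_0100_0000))
```

For each (core, channel) pair returned, the processor would then read that channel's INT_STATUS_REG to find the actual cause.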

SESSION#5

questions:

data bus width and burst_max_size?
- data bus width
  o size of WDATA or RDATA bus connecting the DMA to the AXI Slave
  reg [31:0] wdata, rdata; //bus width is 32 bits
  in each beat(transfer), it can carry a maximum of 32 bits
  o if AXI does a transaction with burst length of 16, total burst size = 16*4=64 Bytes
- burst_max_size
  o limits the maximum size of a burst that the DMA can issue
  o if burst_max_size=32 for a channel
  with 32 bit data bus width, DMA can issue AXI transactions of at most Length=8
  8*4 = 32 bytes
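The two calculations above, written out as a quick check. The function names are made up for this sketch; the numbers are the ones used in the notes (32 bit bus, burst length 16, burst_max_size 32).

```python
def burst_bytes(burst_len, bus_width_bits):
    """Total bytes moved by one burst: beats * bytes per beat."""
    return burst_len * (bus_width_bits // 8)


def max_burst_len(burst_max_size, bus_width_bits):
    """Largest burst length (in beats) that stays within burst_max_size bytes."""
    return burst_max_size // (bus_width_bits // 8)


print(burst_bytes(16, 32))    # burst length 16 on a 32 bit bus
print(max_burst_len(32, 32))  # burst_max_size=32 on a 32 bit bus
```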
Verification of DMA
o Reading the design specification
o List down features
o come up with scenarios
o develop testplan with detailed test description
o list down functional coverage points
o come up with TB architecture
o Develop the TB components
o integrate the TB components
o Develop the sanity testcases
o Develop functional testcases
o Setup the regression
o Run the regression
o generate coverage report
o Analyze the regression results, coverage report
o update existing testcases
o Add new testcases
o Final goal:
o All testcases passing
o 100% functional coverage
o 100% code coverage
o 100% assertion coverage
List down features
o go over the specification, list down all important aspects of the spec.
verification is not about creating 100’s of testcases
o it is about optimizing the number of testcases and making sure that we are able to check all the features and scenarios thoroughly
o grouping multiple related things into the same testcase

if we didn’t find the terms ‘underflow’ and ‘overflow’ in the testplan, that feature is not targeted

list down functional coverage points
o functional coverage gives the confidence that each feature has been checked

did I do following things?
o dual core design?
o clock divider?
o values of the clock divider are the bins
o channels
o number of channels are the bins
o 8 bins: 1, 2,3…8
o block transfer
o operation modes
o bins: independent mode, outstanding mode, joint mode
o channel priority
o bins: normal, top, high
o cross coverage: Channel number X priority
o tokens
o rd_tokens
o wr_tokens
o interrupt depth
o address alignment
o buffer size alignment
o command lists
o block command lists
o peripheral control
o bins
peripheral to peripheral transfer
peripheral to mem transfer
mem to peripheral transfer
o scheduled transfers
o endianness byte wrapping
bins : 16, 32, 64
bins : little, big
cross of both
o software control peripheral request
o watchdog timer configuration values
o channel pause and resume

The above functional coverage points need to be converted into covergroups and coverpoints.
o coverage class
o integrate this coverage class in to the testbench

come up with TB architecture
design files are compiling properly or not?

class compilation has order dependency
module compilation has no order dependency
o module verilog files can be placed in any order
o what if design doesn’t compile properly?
o RTL design team didn’t give us the complete set of Verilog files.
o We need to report this to design team.
o for any project or any new RTL release, we should always check: are the RTL files compiling properly?
o RTL team also does the compilation checks before they give the RTL to verification team.
o compilation is clean
o RTL files are provided properly

clock generation
- how many clocks do we need to pass from TB to the DUT?

reset
o design will only have one reset

every interface requires 3 clocking blocks
o master
o slave
o monitor
what is the order we follow to include files in this case?
include peripheral signals in apb
o we will still keep peripheral interface separate
o we will just instantiate peripheral interface inside APB BFM

12Q. in the absence of program block, how do we ensure no race occurs?
o even if we remove program block, there won’t be any race condition

SESSION#6

revision:

Test plan for DMA controller
o XLS
o features, scenario, testcase, description, detailed test flow, status, comments
Projects are tracked based on coverage
o projects are tracked in phases
o P1 : Register access, basic DMA tests should be passing
o P2 : multi channel tests, peripheral tests should pass
o P3 : All performance tests should pass
Developed TB
o skeletal structure

Notes:

Any design we verify, the very 1st testcase that is coded is register_reset_test
o when reset is applied to the design, design registers should get reset
o after applying reset, perform reads to the design registers, compare those values to the reset values given in the specification.
Testcase always has 3 things in it
o reset
o brings design in to a known state of operation
o OUR TB: reset is applied as part of top_most module(any project we go, reset is always in top module)
o configure
o programming design registers
o applying scenario
o generating the transaction
register_reset_test
o configure
o this is not a functional test, hence no need to program the registers
o applying scenario
o read all the design registers
Static_reg0: default value
{1000_0100_0000_0001_0000_0000_0000_0000}
8401_0000
which field is mismatching?
once the reference model is implemented, register read data checks will be done there.
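To answer "which field is mismatching?", one option is to XOR the read value with the expected reset value and check which fields the differing bits fall into. A minimal sketch follows; the field names and bit ranges in FIELDS are placeholders invented for this example and are not the real Static_reg0 layout, which must come from the spec.

```python
STATIC_REG0_RESET = 0b1000_0100_0000_0001_0000_0000_0000_0000  # 0x8401_0000

# Hypothetical field map: name -> (msb, lsb). Replace with the spec's fields.
FIELDS = {
    "field_a": (31, 24),
    "field_b": (23, 16),
    "field_c": (15, 0),
}


def mismatching_fields(expected, actual, fields):
    """Return the names of fields where expected and actual differ."""
    diff = expected ^ actual
    bad = []
    for name, (msb, lsb) in fields.items():
        mask = ((1 << (msb - lsb + 1)) - 1) << lsb
        if diff & mask:
            bad.append(name)
    return bad


# Example: the read value has bit 16 cleared compared to the reset value.
print(mismatching_fields(STATIC_REG0_RESET, 0x8400_0000, FIELDS))
```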
test_reg_wr_rd
o write random data to all registers
o read back from all those registers
o compare write data with read data => reference model or register model
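A minimal sketch of what such a register model could look like, so that the write/read compare accounts for the access type of each register. Several things here are assumptions for illustration: the class and method names, the RW access type, the addresses used in the example, and the choice that a WO register reads back as 0.

```python
MASK32 = 0xFFFF_FFFF


class RegisterModel:
    """Tiny reference model: predicts read data from the writes it has seen."""

    def __init__(self):
        self._regs = {}  # address -> {"access": str, "value": int}

    def add(self, addr, access, reset=0):
        if access not in ("RW", "RO", "WO", "W1C"):
            raise ValueError("unknown access type: " + access)
        self._regs[addr] = {"access": access, "value": reset & MASK32}

    def write(self, addr, data):
        reg = self._regs[addr]
        data &= MASK32
        if reg["access"] in ("RW", "WO"):
            reg["value"] = data
        elif reg["access"] == "W1C":
            reg["value"] &= ~data & MASK32  # a written 1 clears that bit
        # RO: the write is ignored

    def read(self, addr):
        reg = self._regs[addr]
        if reg["access"] == "WO":
            return 0  # assumption: write-only registers read as 0
        return reg["value"]


model = RegisterModel()
model.add(0x0, "RW")
model.add(0x4, "RO", reset=0x8401_0000)
model.add(0x8, "W1C", reset=0xF)

model.write(0x0, 0x1234_5678)
model.write(0x4, 0xFFFF_FFFF)
model.write(0x8, 0x5)
print(hex(model.read(0x0)), hex(model.read(0x4)), hex(model.read(0x8)))
```

In the test, the read data returned by the design would be compared against model.read() instead of directly against the write data, since only RW registers are expected to read back exactly what was written.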
resolving pslverr
o write
error for 0x24 to 0x3c