1. DMA Controller: Direct Memory Access Controller

– Duration: 5 weeks

  1. Whatever we do on an electronic device is all about data transfers.
    o talking over the phone
    o playing a game
    o sending an SMS
    o opening the Chrome browser
  2. What is a data transfer?
    o source and destination
    o who is involved? who provides the data? who consumes it?
    – 2 possible components
    o memory: DDR, Flash, SRAM, ROM, SD card, MMC, hard disk
    o peripheral: USB, Ethernet, KBD, PCIe, mouse, HDMI
    4 types of transfers:
    o memory to memory
    ex: transfer data from DDR to SRAM
    o memory to peripheral
    ex: hard disk to USB
    A movie file is present on the hard disk and we want to transfer it to USB.
    o peripheral to memory
    ex: USB to DDR
    o peripheral to peripheral
    ex: data coming from one peripheral is driven to another peripheral
    Data arriving on the Ethernet port is stored to a pen drive.
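The four transfer types follow directly from whether each end of the transfer is a memory or a peripheral. A minimal Python sketch of that classification, using the component lists from these notes; the names `endpoint_kind` and `transfer_type` are made up for illustration.

```python
# Component lists as given in these notes (hard disk is treated as a memory here).
MEMORIES = {"DDR", "Flash", "SRAM", "ROM", "SD card", "MMC", "Hard disk"}
PERIPHERALS = {"USB", "Ethernet", "KBD", "PCIe", "Mouse", "HDMI"}

def endpoint_kind(name):
    """Classify one end of a transfer as 'memory' or 'peripheral'."""
    if name in MEMORIES:
        return "memory"
    if name in PERIPHERALS:
        return "peripheral"
    raise ValueError(f"unknown component: {name}")

def transfer_type(source, destination):
    """Return one of the four transfer types for a source/destination pair."""
    return f"{endpoint_kind(source)} to {endpoint_kind(destination)}"

examples = [
    transfer_type("DDR", "SRAM"),
    transfer_type("Hard disk", "USB"),
    transfer_type("USB", "DDR"),
    transfer_type("Ethernet", "USB"),
]
```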
  3. A chip performs better if it can do data transfers efficiently
    o more data gets transferred in less time
    o data transfers should happen with minimal processor intervention
    o if the processor is involved, it gets busy with the data transfer and won’t be able to service other requests on the chip.
    o to address this, the processor (main master of the SoC) delegates the transfer work to another component (the DMA controller), which does the transfer.
    Analogy: Bank (the system)
    Manager (master of the system)
    The manager (processor) gives the work to a clerk (DMA controller).
    The manager gives all instructions on how to do this work.
    The clerk completes the work.
    Once done, the clerk informs the manager that the work is done.
    The manager assigns new work (if there is any) to the clerk, and this goes on.
    The same relation holds between the processor and the DMA controller.
    o strictly speaking, it is the processor that is supposed to read the data from the hard disk and write it to the USB port.
    – let’s say this transfer takes 2 minutes (on a chip, 2 minutes is a very long time)
    – during those 2 minutes the processor can’t service any other request
    o other requests will get queued up.
    o Solution to this problem:
    o wherever such data transfers are needed on the chip, the chip architect introduces a DMA controller.
    o the processor configures the DMA controller (in chip terminology, it programs the DMA controller registers) to indicate what it wants the DMA controller to do.
    o the processor tells the DMA controller to read from the hard disk and write the data to USB; it also needs to tell it how much data to transfer
    o the DMA controller has source_addr_register, dest_addr_register, data_size, and some other registers => these are programmed by the processor to configure the DMA controller
    o then the DMA controller starts the transfer
    o once the whole transfer is done, it indicates to the processor that the transfer is complete.
    o the DMA controller uses the concept of a ‘channel’ to do this transfer
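The programming sequence above (configure, start, report done) can be sketched as a small behavioral model in Python. The register names follow these notes (source_addr_register, dest_addr_register, data_size). The `configure` and `start` methods, the `done` flag standing in for the interrupt, and the dictionary used as byte-addressed memory are assumptions made for illustration, not the register map of the actual design.

```python
class DmaChannelModel:
    """Toy model of one DMA channel: the processor programs it, then it copies data."""

    def __init__(self, memory):
        self.memory = memory            # assumed: dict mapping byte address -> byte value
        self.source_addr_register = 0
        self.dest_addr_register = 0
        self.data_size = 0              # number of bytes to transfer
        self.done = False               # stands in for the "transfer done" indication

    def configure(self, src, dst, size):
        # Step 1: the processor writes the configuration registers.
        self.source_addr_register = src
        self.dest_addr_register = dst
        self.data_size = size
        self.done = False

    def start(self):
        # Step 2: the DMA controller moves the data without the processor's help.
        for offset in range(self.data_size):
            byte = self.memory.get(self.source_addr_register + offset, 0)
            self.memory[self.dest_addr_register + offset] = byte
        # Step 3: the DMA controller tells the processor it is done.
        self.done = True


memory = {0x1000 + i: i for i in range(8)}   # 8 bytes of source data
channel = DmaChannelModel(memory)
channel.configure(src=0x1000, dst=0x2000, size=8)
channel.start()
copied = [memory[0x2000 + i] for i in range(8)]
```

In the real design the processor would be free to service other requests between `configure` and the done indication; this sequential model only shows the order of the steps.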
    DMA controller complex aspects:
    o a chip often requires many parallel data transfers
    o so the DMA controller requires multiple channels
    how many? depends on the requirement
    o DMA transfer behavior is different for memories and for peripherals
    o the DMA controller needs logic for both
    o when concurrent transfers are happening, it needs to ensure that all of them go to the right locations.
  4. SOC project
    almost 70% of SOC testcases will have DMA involved in their flow.
  5. register programming
    APB : Master interface
    processor acts as a master
    DMA acts as a slave
  6. For data transfers, DMA is supposed to perform reads and perform writes
    o write/reads can be to either memories or peripherals
    o this is implemented using AXI interface
    o DMA acts as a master
    o Memory/peripherals will act as slaves
  7. For developing the testbench: what the diagram looks like
  8. 30 keywords discussed so far
    o DMA
    o Channel
    o Source address
    o destination address
    o transfer size
    o data transfers types
    o data storage elements
    o peripheral
    o memories
    o master interface
    o slave interface
    o why APB is used for programming register
    o Why AXI is used for data transfers?
    o how processor configures the DMA controller
    o DMA controller registers
    o ex: SPI controller registers (addr_regA, data_regA, control_reg)
    MOSI, MISO: these are SPI interface ports, not registers
    Interrupt controller registers (priority_regA)
    o Design configuration
    o How processor interfaces with DMA controller
    o through register programming
    o DMA interrupt generation
  9. Most important thing in the project: Design specification
    o Design specification is a document provided by the design team
    o this document tells everything about how design is supposed to work
  10. Dual Core design
    o It can have 2 cores: Core0, Core1
    Configurable build and optional features
    o DMA controller IP (RTL files) can be generated for different client requirements
    o number of cores
    o number of channels
    o size of transfers
    Clock divider for slow channels
    o core1 is meant for peripheral transfers
    o these peripherals are the slower peripherals
    o we accept lower performance in exchange for power savings
    o hence there is an option to provide a low-frequency clock
    Block transfer in a frame context
    o ex: a movie consists of frames
    each frame is like one image
    when we do data transfers, instead of saying ‘transfer 100 bytes’, we want to say ‘transfer frame by frame’
    our DMA controller supports this frame transfer concept.
    Three operation modes:
    DMA controller can be put in 3 operation modes
    – independent
    – outstanding
    – joint
    o By keeping the DMA controller in one of these modes, we can prioritize what is important for us
    o performance
    o size of transfers
    o the user can choose among the above 3 modes to get the optimal performance out of the DMA controller.
    Three level priority arbitration
    o DMA controller has cores, each having multiple channels
    o if we configure the DMA controller for multiple concurrent transfers
    o we may want to give different priorities to different transfers (to different channels)
    CH0 : High
    CH1 : Normal
    CH2 : Top
    CH3 : Normal
    CH4 : High
    CH5 : Top
    CH6 : High
    CH7 : Top
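Given the priority assignment above, the following Python sketch shows one way an arbiter might pick the next channel to serve. It assumes the ordering Top > High > Normal and breaks ties by lowest channel number. The tie-break rule and the name `pick_channel` are assumptions, since these notes only state that three priority levels exist.

```python
# Lower rank wins. The Top > High > Normal ordering is assumed.
PRIORITY_RANK = {"Top": 0, "High": 1, "Normal": 2}

# Channel priorities taken from the example assignment in these notes.
CHANNEL_PRIORITY = {
    0: "High", 1: "Normal", 2: "Top", 3: "Normal",
    4: "High", 5: "Top", 6: "High", 7: "Top",
}

def pick_channel(requesting):
    """Return the requesting channel with the best priority, or None if none request."""
    if not requesting:
        return None
    # Sort key: priority rank first, then channel number as an assumed tie-break.
    return min(requesting, key=lambda ch: (PRIORITY_RANK[CHANNEL_PRIORITY[ch]], ch))

winner_mixed = pick_channel([0, 1, 5])    # High, Normal, Top
winner_tie = pick_channel([4, 6])         # both High
winner_normal = pick_channel([3, 1])      # both Normal
```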
    o benefit:
    o if some application needs to run at high priority => configure its channel with Top priority => that channel’s data transfers happen with the topmost priority => the electronic device behaves as expected.
    Windowed channel arbitration (tokens)
    o Tokens
    o channels can be given tokens
    o the AXI interface can be allotted to these channels based on the tokens allotted to them.
    Configurable interrupt controller with multiple processor support
    o we can configure which interrupt is mapped to which processor
    o since the DMA controller has multiple interrupt lines, we can connect them to multiple processors
    Supports any address alignment
    o in the case of aligned transfers, generated addresses must be multiples of 4, 8, or 16
    ex: address must always be a multiple of 8 (aligned to a 64-bit data boundary)
    lower 3 bits of addr must be 0
    o DMA controller does not have any such restriction
    o AXI reads and AXI writes can go to any address without alignment restrictions.
    araddr = 32'h12536711;
    Supports any buffer size alignment
    o buffer size need not be a multiple of any particular number.
    o we can do transfers of any size up to the maximum limit
    Supports command lists, including block lists
    o one DMA command does one set of source reads and one set of destination writes
    ex: read 128 bytes from a memory location, write these 128 bytes to another memory location
    o command lists?
    o a linked list of DMA commands
    o it lets us issue back-to-back DMA commands
    o where multiple locations are read and multiple locations are written using these multiple DMA commands
    o multiple DMA commands => DMA command list
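A command list can be pictured as a linked list in which each command holds one source, one destination, a size, and a link to the next command. The Python sketch below is an illustration only: the `DmaCommand` fields and the `run_command_list` helper are hypothetical names, and the real command layout is defined by the design specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DmaCommand:
    src: int                               # source start address
    dst: int                               # destination start address
    size: int                              # number of bytes to copy
    next: Optional["DmaCommand"] = None    # link to the next command, None ends the list

def run_command_list(head, memory):
    """Execute commands back to back; return how many commands were processed."""
    count = 0
    cmd = head
    while cmd is not None:
        for offset in range(cmd.size):
            memory[cmd.dst + offset] = memory.get(cmd.src + offset, 0)
        count += 1
        cmd = cmd.next                     # follow the link to the next command
    return count

memory = {0x100 + i: 0xA0 + i for i in range(4)}
memory.update({0x200 + i: 0xB0 + i for i in range(2)})
second = DmaCommand(src=0x200, dst=0x900, size=2)
first = DmaCommand(src=0x100, dst=0x800, size=4, next=second)
executed = run_command_list(first, memory)
```

The processor only has to point the channel at the first command; the remaining commands are picked up by following the links.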
    o Block lists can also be implemented using DMA
    Peripheral flow control, including peripheral block transfer
    o peripherals are different from memories
    o memory is completely passive
    o memory is a pure slave
    o we can read or write a memory without any other indication (no permission needed from the memory).
    o a peripheral is active
    o we can do a peripheral transfer only if the peripheral is ready for it
    o for ex: if the peripheral has not received data from the slave device (ex: KBD controller and KBD panel; if no button is pressed on the KBD, the KBD controller won’t get any data), the KBD controller (peripheral) is not ready to complete the DMA command
    o hence for a peripheral transfer, the DMA controller requires an indication (req) from the peripheral that it is ready for the transfer (which is not required for memories)
    peripheral block transfer
    o DMA supports block transfers for peripherals as well
    Peripheral to peripheral transfer
    o DMA supports data transfer from peripheral to peripheral
    Scheduled transfers
    o DMA has provision for scheduling transfers for a future time. (DOUBT??)
    Endianness byte swapping
    o little
    o memory access where the lower byte of the data bus goes to the lower address of the memory
    o addr = 32'h16; data = 32'h12345678;
    78 => stored to 32'h16
    56 => stored to 32'h17
    34 => stored to 32'h18
    12 => stored to 32'h19
    o big
    o memory access where the lower byte of the data bus goes to the higher address of the memory
    o addr = 32'h16; data = 32'h12345678;
    12 => stored to 32'h16
    34 => stored to 32'h17
    56 => stored to 32'h18
    78 => stored to 32'h19
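The two layouts above can be reproduced with Python's built-in `int.to_bytes`, which takes the byte order as an argument. The helpers `layout` and `byte_swap32` are hypothetical names used only for this sketch. `byte_swap32` reverses the four bytes of a 32-bit word, which is the conversion needed to go from one layout to the other.

```python
def layout(addr, data, byteorder):
    """Return {address: byte} for a 32-bit word stored at addr in the given byte order."""
    raw = data.to_bytes(4, byteorder)      # byteorder is "little" or "big"
    return {addr + i: raw[i] for i in range(4)}

def byte_swap32(data):
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes(data.to_bytes(4, "little"), "big")

little = layout(0x16, 0x12345678, "little")
big = layout(0x16, 0x12345678, "big")
swapped = byte_swap32(0x12345678)
```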
    o one memory may be little endian while the other memory (of the present DMA command) may be big endian
    o this requires our DMA controller to do the swapping before it writes to the destination memory
    o in the above example, when the DMA controller gets data of 32'h12345678
    o before it writes to the big endian memory, it should swap the bytes (lower to upper)
    wdata = 32'h78563412; //byte swapping
    Software control peripheral request
    o Software control => processor programming the design registers
    o peripheral request is controlled by means of register programming => study the spec to understand it in depth
    Watchdog timer
    o if the design is stuck or has no activity, the watchdog timer can generate a reset to reset the overall system
    Channel pause and resume
    o option to pause a channel for some duration, during which that channel won’t process any DMA commands
    channels are the ones that process the DMA commands
    APB3 registers
    o APB3 protocol is used to program the registers
    Complete status register set for debug
    o design has status registers, which indicate various aspects of DMA controller status
    o why are these important?
    o at RTL level, we can open the RTL code and analyze where things go wrong
    o once the chip is manufactured
    o how do we debug DMA controller internal issues?
    o the DMA controller provides status registers; by reading them we get information about various aspects of DMA controller behavior,
    which makes debug easier.

Q: how is a multiple of 8 aligned to a 64 bit data boundary?
each address location holds one byte
address multiple of 8 => ex: 16
if we are doing a 64 bit transfer (8 bytes)
the 8 bytes will be written to => 16, 17, 18, 19, 20, 21, 22, 23 (fits perfectly within one 64 bit data boundary)
what is not aligned:
if starting address = 17
the 8 bytes will be written to => 17, 18, 19, 20, 21, 22, 23, 24
24 falls in another 64 bit data boundary
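The alignment check in this Q&A comes down to the low address bits. Here is a minimal Python sketch, assuming byte addressing and a 64-bit (8-byte) data bus; the function names `is_aligned` and `boundaries_touched` are illustrative.

```python
def is_aligned(addr, nbytes=8):
    """True if addr is a multiple of nbytes (nbytes must be a power of two)."""
    return (addr & (nbytes - 1)) == 0      # the low log2(nbytes) bits must all be 0

def boundaries_touched(addr, nbytes=8, bus_bytes=8):
    """Number of bus-width windows covered by a transfer of nbytes starting at addr."""
    first = addr // bus_bytes
    last = (addr + nbytes - 1) // bus_bytes
    return last - first + 1

aligned_16 = is_aligned(16)                 # bytes 16..23 sit inside one 64-bit window
aligned_17 = is_aligned(17)                 # bytes 17..24 spill into the next window
windows_16 = boundaries_touched(16)
windows_17 = boundaries_touched(17)
unaligned_example = is_aligned(0x12536711)  # araddr value used earlier in these notes
```

A DMA controller that supports any address alignment has to handle the second case itself, for example by splitting the access across the two windows.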
