pcie-assignment-6 - VLSI Guru

PCIe Assignment #1

PCIe architecture concepts
PCI concepts

Questions:

Explain ISA/EISA – 8/16/32 bit architecture first developed for the PC/XT by IBM

Operational Frequency: 8.3 MHz
16 bit or 32 bit Bus
Bandwidth: 8.3 MBps-33 MBps
PCI – The parallel bus with auto-configuration developed by Intel

Operational Frequency: 33 MHz
32 bit or 64 bit Bus
Bandwidth: 132 MBps-264 MBps
PCI-X – A double-wide “PCI” bus with a faster clock, developed by IBM, HP and Compaq primarily for use in servers

Operational Frequency: 66 MHz / 100 MHz / 133 MHz / 266 MHz / 533 MHz
32 bit or 64 bit Bus
Bandwidth: up to ~4 GBps
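The peak figures quoted for these buses follow from clock rate times bus width. A minimal Python sketch, assuming one transfer per clock and ignoring all protocol overhead (the function name `peak_bandwidth_mbps` is made up for illustration):

```python
def peak_bandwidth_mbps(clock_mhz, bus_width_bits):
    """Peak bandwidth in MBps, assuming one transfer per clock cycle."""
    return clock_mhz * bus_width_bits / 8


# (bus, clock in MHz, width in bits), taken from the figures listed above
buses = [
    ("EISA", 8.3, 32),
    ("PCI", 33, 32),
    ("PCI", 33, 64),
    ("PCI-X", 533, 64),
]

for name, clock_mhz, width in buses:
    bw = peak_bandwidth_mbps(clock_mhz, width)
    print(f"{name:6s} {width:2d} bit @ {clock_mhz} MHz: {bw:.1f} MBps")
```

The 64 bit PCI-X case at 533 MHz gives 4264 MBps, which is the ~4 GBps quoted above.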
Explain PCIe evolution from ISA, PCI, PCI-X, then PCIe
1. The initial bus architectures that used ISA/EISA were as shown in the picture above.
  Some salient features of such bus architectures are:
  1. High speed peripherals are placed on the local bus for quick access to memory.
  2. The CPU local bus operates on a single clock, so the slowest device on the bus gates performance.
  3. With increasing CPU speeds, ISA suffers from a throughput bottleneck.

Explain The PCI Bus Architecture features (not PCIe)

Isolation (electrical and frequency) of devices from CPU.
Allows concurrent cycles on Local and PCI bus
Local Bus frequency can be increased independent of the PCI bus speed / loading.
PCI – one of the first fully HW-SW integrated bus mastering implementations.
Plug and Play in PCI
1995 – 66 MHz, 64 bit PCI

PCI Bus Mastering: Devices and peripherals that require service can gain immediate and direct control of the bus through an arbitration process. This avoids the CPU having to oversee every transaction request, which in turn reduces the overall latency of the system in servicing I/O transactions.
PCI Plug and Play: Devices are automatically detected, resources are allocated, and the appropriate drivers are invoked and configured at power-on.

Explain PCI-X improvements over PCI
- Used to meet server market demands, where connector size, cost and signal routing are not sensitive constraints.
  Improved PCI protocol with the following enhancements:

  · Registered inputs
  · No wait states
  · Supports split transactions
  · Data transfer in blocks
  · Specified data transfer size
  · Interrupt handling using MSI (Message Signaled Interrupts)
  · No Snoop (NS) attribute – marks transactions that do not need to wait for a snoop result from the processor
  · Relaxed ordering attribute for traffic
  · PCI-X 2.0 – supports ECC and uses 1.5 V signaling
Transaction Flow comparison using an example
- See Picture Below
  Scenario:
  1. Device A wants some data from device B.
  2. Say the information resides in A1, which is outside B.
  3. B needs some time to process this request; this delay is the latency period to service the request.
  If no transactions are allowed on the bus until this request has been serviced, then in a multi-master system efficiency is impacted heavily.

  How do PCI, PCI-X and PCI Express handle such a scenario?

The mechanisms used are:
1. Delayed transaction support – PCI
2. Split transaction – PCI-X
3. PCI Express uses split transactions as well, combined with a more advanced credit-based flow control mechanism.

   PCI uses the concept of delayed transactions between master and slave. In a normal transaction, devices on the bus must first arbitrate (compete) to take control of the bus. The arbitration algorithm should be fair to all users, which is an inherent problem for isochronous transfers.
   In the above case, Device A arbitrates for control of the bus to begin the transaction. Let's take the simple case shown in the diagram, with only two devices (a typical system would have more).
   Once Device A has successfully arbitrated to become master, it sends a transaction. Device B decodes the transaction and determines that Device A has requested information from Function A1. Since Device B does not readily have the requested data, it terminates the transaction with a retry response, giving Device B some time to fetch the required data from A1.
   Terminating the transaction also makes the bus available for other devices to arbitrate.
   In this delayed transaction protocol, Device A has to arbitrate for the bus once again and then resend the original request to Device B. The process is iterative: Device A may have to retry several times before Device B has the data and the transaction completes.
PCI-X adopts a split transaction protocol.
  This does not require the master to continuously retry transactions that cannot be immediately serviced by the target device.
   In the same scenario as discussed for PCI, Device B does not terminate the transaction with a retry; instead, in PCI-X it sends out a split response. This split response lets the master know that the transaction will be completed sometime in the future.
   The bus is now free for other devices. As soon as Device B has the data and gains control of the bus through arbitration, it sends a split completion packet with the data to Device A, completing the transaction.
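
The difference between the two protocols can be sketched by counting how often the bus is used to complete one read. This is a toy model, not a cycle-accurate simulation: the time units, the `retry_interval` parameter and both function names are assumptions made for illustration.

```python
def delayed_transaction_attempts(target_latency, retry_interval):
    """PCI-style delayed transaction: count how often device A must win
    arbitration and issue the request before device B has the data."""
    attempts = 0
    now = 0
    while True:
        attempts += 1                 # A arbitrates and (re)sends the request
        if now >= target_latency:     # B has fetched the data from A1
            return attempts
        now += retry_interval         # B answered "retry"; A tries again later


def split_transaction_bus_uses():
    """PCI-X-style split transaction: A sends one request (answered with a
    split response), then B masters one split completion carrying the data."""
    return {"requests_by_A": 1, "completions_by_B": 1}


print(delayed_transaction_attempts(target_latency=10, retry_interval=3))  # 5
print(split_transaction_bus_uses())
```

In the delayed case the number of bus accesses grows with the target's latency, while the split case always needs two, regardless of how long B takes.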

Explain Need for PCI-Express(PCIe)

1) BW limitations: When PCI was introduced, the throughput it typically delivers in practice, approx. 90 MBps, was adequate for the 1990s. Gigabit Ethernet by itself now requires a throughput of 125 MBps.
2) Inability to support real-time data transfers: Applications such as streaming audio/video require guaranteed BW and deterministic latency. When PCI was defined, the need for real-time I/O BW was limited, so no mechanism was defined for it. PCI-based systems always use fair arbitration; a priority scheme is the only workaround to provide dedicated BW and deterministic latency.
3)        Future I/O support (security – virtualization)
4)        Reliability at higher speeds while supporting multiple slots
5)        Form Factor – Pin count – PCB, SI Challenges

Draw the PCIe architecture

Write down various components involved in PCIe architecture

•Root Complex – Connects the CPU to the PCIe topology. The root complex generates transactions on behalf of the CPU.
•End-Point – Bottom of the PCIe tree structure. Has only one upstream port. Can request / complete transactions.
•Legacy End-Point – Uses old PCI bus operations for backward compatibility.
•Switch – Provides fan-out and aggregation. Switches act as packet routers across ports and provide peer-to-peer support (mandatory).
•Bridge – Connects different buses (Forward: PCIe to PCI; Reverse: PCI to PCIe).

Does PCIe have a master-slave concept in its architecture?
Explain the layered architecture in PCIe

PCIe has a layered architecture as shown in the figure below. The three layers of the PCIe stack are:
Transaction Layer: Responsible for converting requests or completion data from the device core into a valid PCIe transaction.
Data Link Layer: Ensures the integrity of transactions across the link.
Physical Layer: Actually transmits and receives transactions across a PCIe link. On power-on the physical layer initialises the number of lanes to be used, the link speed, etc. The TL and DLL are oblivious to how the data is transmitted; that is taken care of by the PL.
How does packet translation work across PCIe layers?
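
As a rough sketch of the idea, each layer wraps what it receives from the layer above: the Transaction Layer builds the packet (header plus payload), the Data Link Layer adds a sequence number and a link CRC, and the Physical Layer adds start and end framing. The Python below is only an illustration under stated assumptions: the header is an opaque 12 byte placeholder, `zlib.crc32` stands in for the real LCRC calculation, and the framing bytes are placeholders for the Gen 1 start/end symbols, which are really special 8b-10b control codes rather than data bytes.

```python
import struct
import zlib

STP = b"\xfb"  # placeholder for the start-of-packet framing symbol
END = b"\xfd"  # placeholder for the end-of-packet framing symbol


def transaction_layer(header, payload):
    """Build a transaction layer packet (TLP): header followed by data."""
    return header + payload


def data_link_layer(tlp, seq):
    """Prepend a 2 byte field holding a 12 bit sequence number and append
    a 4 byte CRC (zlib.crc32 is a stand-in for the real LCRC)."""
    body = struct.pack(">H", seq & 0x0FFF) + tlp
    return body + struct.pack(">I", zlib.crc32(body))


def physical_layer(dll_packet):
    """Frame the packet for transmission on the link."""
    return STP + dll_packet + END


header = bytes(12)                  # hypothetical, all-zero 12 byte header
payload = b"\xde\xad\xbe\xef"
tlp = transaction_layer(header, payload)
framed = physical_layer(data_link_layer(tlp, seq=5))
print(len(tlp), len(framed))  # 16 24
```

The receiver undoes the wrapping in reverse order: the Physical Layer strips the framing, the Data Link Layer checks the CRC and sequence number, and the Transaction Layer consumes the header and payload.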

Pin-Link Efficiency calculations

32 bit – PCI
32 bit = 4 bytes (bidirectional bus)
Frequency: 33 MHz
Throughput / BW = 33 * 4 = 132 MBps
PCI needs a minimum of 32 pins for the data + say an additional 42 sideband, power and gnd pins = 74 pins for a 32 bit PCI card
The pin-link efficiency is calculated as 132 MBps (Tx and Rx) / 74 pins = 1.8 MBps per pin
64 bit – PCI-X
A similar calculation with an 8 byte bus, a 533 MHz operational frequency and approx. 150 pins for a PCI-X card results in a pin-link efficiency of:
533 * 8 (i.e. 4264 MBps Tx-Rx) / 150 pins = 28.4 MBps per pin
PCI Express is a 2.5 Gbps per lane (Gen 1) serial dual-simplex standard. It has separate transmit and receive lines, unlike PCI/PCI-X, which are parallel interfaces.
Since PCI Express is a differential standard, each Tx and Rx line has 2 pins: Tx+/Tx- and Rx+/Rx-.
This results in just 4 pins for the PCI Express data lines and 4 more pins for power and ground.

PCIe
The PCIe calculation differs slightly because PCIe uses 8b-10b encoding for its data, so every byte is sent as 10 bits on the wire.
2500 Mbps * (1 byte / 10 bits) = 250 MBps for transmit; similarly, we have another 250 MBps for receive.
So we have 500 MBps per lane.
Hence, pin-link efficiency = 500 MBps / 8 pins = 62.5 MBps per pin
This is for a x1 PCIe link. The concepts of links and lanes are discussed later.
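
The three pin-link efficiency figures can be reproduced with a few lines of Python. This is only a sketch using the pin counts and rates assumed in the text above; the helper name `pin_link_efficiency` is made up for illustration.

```python
def pin_link_efficiency(bandwidth_mbps, pins):
    """Bandwidth delivered per pin, in MBps per pin."""
    return bandwidth_mbps / pins


# 32 bit PCI: 33 MHz * 4 bytes, about 74 pins
pci = pin_link_efficiency(33 * 4, 74)

# 64 bit PCI-X: 533 MHz * 8 bytes, about 150 pins
pcix = pin_link_efficiency(533 * 8, 150)

# x1 PCIe Gen 1: 2500 Mbps line rate, 8b-10b encoding (10 line bits per byte),
# transmit and receive counted together, 8 pins
per_direction_mbps = 2500 / 10
pcie_x1 = pin_link_efficiency(2 * per_direction_mbps, 8)

print(f"{pci:.1f} {pcix:.1f} {pcie_x1:.1f}")  # 1.8 28.4 62.5
```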
B:D:F
1. Bus Number
2. Device Number
3. Function Number
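
A B:D:F identifier is commonly packed into a single 16 bit value. The sketch below assumes the conventional field widths (8 bit bus, 5 bit device, 3 bit function), which the list above does not spell out; the helper names and the `bb:dd.f` print format (as used by tools such as `lspci`) are chosen for illustration.

```python
def pack_bdf(bus, device, function):
    """Pack bus (8 bits), device (5 bits) and function (3 bits) into 16 bits."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus, device or function out of range")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(bdf):
    """Split a 16 bit B:D:F value back into (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x07


def format_bdf(bdf):
    """Render a B:D:F value as bb:dd.f in hexadecimal."""
    bus, device, function = unpack_bdf(bdf)
    return f"{bus:02x}:{device:02x}.{function}"


bdf = pack_bdf(bus=3, device=31, function=7)
print(hex(bdf), format_bdf(bdf))  # 0x3ff 03:1f.7
```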
PCIe Lane connect

Before we see how PCIe handles transactions, it's time to introduce some more PCI Express specifics…

PCI is a parallel multi-drop interface, whereas PCIe is a serial point-to-point interface.
In a point-to-point interface only one device resides at each end of the connection.
A connection between two PCIe devices is called a LINK. A link is made up of 1 to 32 LANES. A lane has 4 wires, namely TX+, TX-, RX+ and RX-. A PCIe link with 1 lane is called x1 PCIe, one with 4 lanes x4 PCIe, and so on (x8 PCIe, x32 PCIe, etc.).
Every link is a dual unidirectional interface, as each lane has a separate transmit and receive line for communication.
The collection of transmit and receive pairs at a device end is called a PORT.
Using multiple lanes offers flexible bandwidth: a wider link carries proportionally more data.
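
To make the scaling concrete, here is a small Python sketch that computes wire count and Gen 1 bandwidth for a given link width. It assumes 250 MBps per lane per direction (2.5 Gbps line rate with 8b-10b encoding) and the link widths x1, x2, x4, x8, x12, x16 and x32; the names are illustrative only.

```python
GEN1_LANE_MBPS = 250                    # per lane, per direction
VALID_WIDTHS = (1, 2, 4, 8, 12, 16, 32)


def link_summary(lanes):
    """Return signal wire count and per-direction bandwidth for a link."""
    if lanes not in VALID_WIDTHS:
        raise ValueError(f"x{lanes} is not a supported link width")
    return {
        "wires": lanes * 4,             # TX+, TX-, RX+, RX- per lane
        "mbps_per_direction": lanes * GEN1_LANE_MBPS,
    }


for width in (1, 4, 32):
    print(f"x{width}:", link_summary(width))
```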

EMBEDDED CLOCKING
Notice that there is no clock signal in the diagram. So how does PCIe implement clocking?
PCIe uses 8b-10b encoding to embed the clock within the transmitted data stream; the encoding guarantees enough bit transitions for the receiver to recover the clock. At initialisation the fastest signalling rate supported by both devices is identified.

Explain PCIe Flow Control

So how does PCI Express handle the above example?
It uses an advanced split transaction protocol (similar to PCI-X) together with flow control logic.
Let's take the example of a highway (read: a link in PCIe) with different roads for different kinds of traffic. Each road allows only a certain type of traffic (say only cars or only cycles; the split can also be based on speed, such as 100 km/h roads and 10 km/h roads). A similar setup is used in the PCIe implementation.
Highway = Link
Roads    = Virtual channels
Different vehicles = Traffic classes (TC)

The different kinds of packet traffic are called traffic classes. Traffic in both directions on the link is scheduled according to traffic class priority. The highest priority is given to traffic classes that require dedicated bandwidth, such as isochronous traffic. Other traffic classes are balanced as low priority transactions to avoid bottlenecks.
Virtual channels are like virtual wires between two devices. The finite physical link bandwidth is divided up among the supported virtual channels as appropriate. Each virtual channel has its own set of queues, buffers and control logic, and a credit-based mechanism to track how full or empty those buffers are on either side (think of it like signal stops or toll gates which monitor traffic on roads).
Each virtual channel (VC) supports a certain type of traffic. This mechanism ensures that bottlenecked transactions on one virtual channel do not affect traffic on other VCs.
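
The credit-based mechanism can be sketched as follows. This is a deliberately simplified model with one credit per packet and a single buffer; real PCIe tracks separate credit types per virtual channel. The class and method names are invented for illustration.

```python
from collections import deque


class Receiver:
    """Owns a fixed-size buffer; each free slot corresponds to one credit."""

    def __init__(self, buffer_slots):
        self.capacity = buffer_slots
        self.buffer = deque()

    def accept(self, packet):
        if len(self.buffer) >= self.capacity:
            raise OverflowError("transmitter sent without a credit")
        self.buffer.append(packet)

    def drain(self, count=1):
        """Process up to `count` packets; return how many slots were freed."""
        freed = 0
        while self.buffer and freed < count:
            self.buffer.popleft()
            freed += 1
        return freed


class Transmitter:
    """Sends only while it holds credits advertised by the receiver."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = receiver.capacity   # advertised at initialisation

    def send(self, packet):
        if self.credits == 0:
            return False                   # must wait for a credit update
        self.credits -= 1
        self.receiver.accept(packet)
        return True

    def credit_update(self, count):
        self.credits += count


rx = Receiver(buffer_slots=2)
tx = Transmitter(rx)
print(tx.send("a"), tx.send("b"), tx.send("c"))  # True True False
tx.credit_update(rx.drain(1))                    # receiver frees one slot
print(tx.send("c"))                              # True
```

Because the transmitter never sends without a credit, the receiver's buffer cannot overflow and no packet has to be retried for lack of space.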
PCI Express supports a maximum of eight traffic classes and eight virtual channels.
e.g. A x1 PCIe link can have eight virtual channels and a x32 PCIe link can support just one VC; the number of VCs is independent of the link width.
There are three simple rules:
1. Once a packet is assigned a traffic class, it cannot change that class while moving through a virtual channel.
2. Each VC can support one or more traffic classes.
3. A single TC cannot be mapped to multiple virtual channels.
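
The mapping rules can be checked mechanically. The sketch below builds a TC-to-VC table from (TC, VC) pairs and rejects any mapping that puts one traffic class on two virtual channels; the function name and the numbering of TCs and VCs from 0 to 7 are assumptions for illustration.

```python
MAX_TC = 8
MAX_VC = 8


def build_tc_vc_map(pairs):
    """Build a {TC: VC} table, enforcing the TC/VC mapping rules."""
    mapping = {}
    for tc, vc in pairs:
        if not 0 <= tc < MAX_TC:
            raise ValueError(f"TC{tc} is out of range")
        if not 0 <= vc < MAX_VC:
            raise ValueError(f"VC{vc} is out of range")
        if tc in mapping and mapping[tc] != vc:
            # Rule 3: a single TC cannot be mapped to multiple VCs
            raise ValueError(f"TC{tc} is already mapped to VC{mapping[tc]}")
        mapping[tc] = vc                # Rule 2: several TCs may share a VC
    return mapping


# TC0 and TC1 share VC0; TC7 gets VC1 to itself
print(build_tc_vc_map([(0, 0), (1, 0), (7, 1)]))  # {0: 0, 1: 0, 7: 1}

try:
    build_tc_vc_map([(0, 0), (0, 1)])   # TC0 on two VCs breaks rule 3
except ValueError as err:
    print("rejected:", err)
```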

These rules are easy to remember with the analogy 🙂
Cars, bicycles and trucks can all go on the same road.
But bicycles cannot spread across all the roads of a highway, blocking other types of traffic.