PCIe Assignment #1

Questions:

  1. Explain ISA/EISA – 8/16/32 bit architecture first developed for the PC/XT by IBM
  1. Operational Frequency: 8.3 MHz 
  2. 16 bit or 32 bit Bus
  3. Bandwidth: 8.3 MBps-33 MBps
  4. PCI – The parallel bus with auto-configuration, developed by Intel

  1. Operational Frequency: 33 MHz
  2. 32 bit or 64 bit Bus
  3. Bandwidth: 132 MBps-264 MBps
  4. PCI-X – A double-wide “PCI” bus with a faster clock, developed by IBM, HP and Compaq primarily for use in servers

  1. Operational Frequency: 66 MHz/100 MHz/133 MHz/266 MHz/533 MHz
  2. 32 bit or 64 bit Bus
  3. Bandwidth: up to ~4 GBps
  4. Explain the evolution of PCIe: from ISA to PCI, PCI-X, then PCIe
    1. The initial bus architectures which used ISA/EISA were as shown in the picture above.
      Some salient features of such bus architectures are:
      1. High-speed peripherals are placed on the local bus for quick access to memory.
      2. The CPU local bus is operated on a single clock, so the slowest device on the bus gates performance.
      3. With increasing CPU speeds, ISA suffers from a throughput bottleneck.
Picture
Picture
Picture
  1. Explain the need for PCI Express (PCIe)

1) BW limitations: When PCI was introduced, its typical real-world throughput of approximately 90 MBps was adequate for the 1990s. Gigabit Ethernet alone now requires a throughput of 125 MBps.
2) Inability to support real-time data transfers: Applications such as streaming audio/video require guaranteed BW and deterministic latency. When PCI was defined, the need for real-time I/O BW was limited, so no mechanism was defined for it. PCI-based systems always use fair arbitration; a priority scheme is the only workaround to give dedicated BW and deterministic latency.
3) Future I/O support (security, virtualization)
4) Reliability at higher speeds while supporting multiple slots
5) Form factor: pin count, PCB and signal-integrity (SI) challenges

  1. Draw the PCIe architecture
Picture
  1. Write down various components involved in PCIe architecture

Root Complex – Connects the CPU to the PCIe topology. The root complex generates transactions on behalf of the CPU.
End-Point – Sits at the bottom of the PCIe tree structure. Has only one upstream port. Can request / complete transactions.
Legacy End-Point – Uses old PCI bus operations for backward compatibility.
Switch – Provides fan-out and aggregation. Switches act as packet routers across ports and provide peer-to-peer support (mandatory).
Bridge – Connects different buses (Forward: PCIe to PCI; Reverse: PCI to PCIe).

  1. Does PCIe have a master-slave concept in its architecture?
  2. Explain the layered architecture in PCIe.
Picture
  1. PCIe has a layered architecture, as shown in the figure. The three layers of the PCIe stack are:
    Transaction Layer (TL): Responsible for converting requests or completion data from the device core into a valid PCIe transaction.
    Data Link Layer (DLL): Ensures the integrity of transactions across the link.
    Physical Layer (PL): Actually transmits and receives transactions across a PCIe link. On power-on, the physical layer initialises the link, negotiating the number of lanes to be used, the link speed, etc. The TL and DLL are oblivious to how the data is transmitted; that is taken care of by the PL.
  2. How does packet translation work across the PCIe layers?
Picture
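
The layer-by-layer wrapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field sizes (a 3- or 4-DW header, a 12-bit sequence number, a 32-bit link CRC, one framing marker at each end) reflect the Gen 1 packet layout as generally described, `zlib.crc32` is a stand-in for the real LCRC computation, and the function names and marker bytes are hypothetical.

```python
import struct
import zlib

# Placeholder framing bytes. On a real Gen 1 link the start/end markers are
# 8b/10b control symbols, not ordinary data bytes; these values are only labels.
START_MARK = b"\xFB"
END_MARK = b"\xFD"


def build_tlp(header: bytes, payload: bytes = b"") -> bytes:
    """Transaction Layer: a 3- or 4-DW header followed by an optional payload."""
    if len(header) not in (12, 16):
        raise ValueError("TLP header must be 12 or 16 bytes (3 or 4 DWs)")
    if len(payload) % 4 != 0:
        raise ValueError("payload must be a whole number of DWs")
    return header + payload


def add_data_link_layer(tlp: bytes, sequence_number: int) -> bytes:
    """Data Link Layer: prepend a sequence number and append a 32-bit CRC."""
    seq = struct.pack(">H", sequence_number & 0x0FFF)  # 12-bit sequence number
    # Stand-in checksum; the real LCRC is computed differently.
    crc = struct.pack(">I", zlib.crc32(seq + tlp) & 0xFFFFFFFF)
    return seq + tlp + crc


def add_physical_layer(dll_packet: bytes) -> bytes:
    """Physical Layer: frame the packet with start and end markers."""
    return START_MARK + dll_packet + END_MARK


header = bytes(12)                 # dummy all-zero 3-DW header
payload = b"\xDE\xAD\xBE\xEF"      # one DW of data
tlp = build_tlp(header, payload)
dll_packet = add_data_link_layer(tlp, sequence_number=5)
wire_packet = add_physical_layer(dll_packet)
print(len(tlp), len(dll_packet), len(wire_packet))  # 16 22 24
```

Each layer only adds its own wrapper around what it receives from the layer above; the receiver strips the wrappers in the reverse order.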
  1. Pin-Link Efficiency calculations

32-bit PCI
32 bits = 4 bytes (bidirectional bus)
Frequency: 33 MHz
Throughput/BW = 33 * 4 = 132 MBps
PCI needs a minimum of 32 pins for data plus, say, an additional 42 sideband, power and ground pins = 74 pins for a 32-bit PCI card.
Pin-link efficiency = 132 MBps (Tx and Rx) / 74 pins ≈ 1.8 MBps per pin

64-bit PCI-X
A similar calculation with an 8-byte bus, a 533 MHz operational frequency and approximately 150 pins for a PCI-X card gives a pin-link efficiency of:
(533 * 8) MBps / 150 pins = 4264 MBps (Tx and Rx) / 150 pins ≈ 28.4 MBps per pin
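
The two parallel-bus calculations above can be reproduced with a short Python sketch. The function names are hypothetical, and the pin counts (74 and approximately 150) are the estimates used in the text, not exact connector figures.

```python
def parallel_bus_bandwidth(clock_mhz, bus_width_bits):
    """Peak bandwidth in MBps: one transfer of the full bus width per clock."""
    return clock_mhz * bus_width_bits / 8


def pin_link_efficiency(bandwidth_mbps, pin_count):
    """Bandwidth delivered per pin, in MBps per pin."""
    return bandwidth_mbps / pin_count


pci_bw = parallel_bus_bandwidth(33, 32)      # 132.0 MBps
pcix_bw = parallel_bus_bandwidth(533, 64)    # 4264.0 MBps

pci_eff = pin_link_efficiency(pci_bw, 74)
pcix_eff = pin_link_efficiency(pcix_bw, 150)

print(f"PCI:   {pci_bw:.0f} MBps, {pci_eff:.1f} MBps/pin")    # PCI:   132 MBps, 1.8 MBps/pin
print(f"PCI-X: {pcix_bw:.0f} MBps, {pcix_eff:.1f} MBps/pin")  # PCI-X: 4264 MBps, 28.4 MBps/pin
```
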
PCI Express is a 2.5 GT/s (Gen 1, often quoted as 2.5 GHz) serial dual-simplex standard. It has separate transmit and receive lines, unlike PCI/PCI-X, which are parallel interfaces.
Since PCI Express is a differential standard, each Tx and Rx line has 2 pins: TX+/TX- and RX+/RX-.
This results in just 4 pins for the PCI Express data lines of one lane, and 4 more pins for power and ground.

  1. PCIe
    There is a slight variation for PCIe, as it uses 8b/10b encoding for its data: every byte is sent as 10 bits on the wire.
    2500 Mbps * (1 byte / 10 bits) = 250 MBps for transmit; similarly, we have another 250 MBps for receive.
    So we have 500 MBps per lane.
    Hence, pin-link efficiency = 500 MBps / 8 pins = 62.5 MBps per pin.
    This is for a x1 PCIe link. The concepts of links and lanes are discussed later.
  2. B:D:F
    1. Bus Number
    1. Device Number
    1. Function Number
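
As an illustration of how the three B:D:F fields fit together, here is a minimal Python sketch. It assumes the conventional field widths (8-bit bus, 5-bit device, 3-bit function, packed into 16 bits) and prints the result in the `bb:dd.f` hexadecimal style used by tools such as `lspci`; the helper names are hypothetical.

```python
def encode_bdf(bus, device, function):
    """Pack bus (8 bits), device (5 bits) and function (3 bits) into 16 bits."""
    if not 0 <= bus <= 0xFF:
        raise ValueError("bus number must be 0..255")
    if not 0 <= device <= 0x1F:
        raise ValueError("device number must be 0..31")
    if not 0 <= function <= 0x7:
        raise ValueError("function number must be 0..7")
    return (bus << 8) | (device << 3) | function


def decode_bdf(bdf):
    """Split a 16-bit B:D:F value back into (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7


def format_bdf(bdf):
    """Render a B:D:F value as bb:dd.f in hexadecimal."""
    bus, device, function = decode_bdf(bdf)
    return f"{bus:02x}:{device:02x}.{function:x}"


bdf = encode_bdf(bus=3, device=0x1F, function=7)
print(hex(bdf), format_bdf(bdf))  # 0x3ff 03:1f.7
```
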
  3. PCIe Lane connect
Picture

Before we see how PCIe handles transactions, it's time to introduce some more PCI Express specifics.

PCI is a parallel multi-drop interface, whereas PCIe is a serial point-to-point interface.
In a point-to-point interface, only one device resides at each end (Tx or Rx) of the connection.
A connection between two PCIe devices is called a LINK. A link is made up of 1 to 32 LANES (the supported widths are x1, x2, x4, x8, x12, x16 and x32). A lane has 4 wires, namely TX+, TX-, RX+ and RX-. A PCIe link with 1 lane is called a x1 link, one with 4 lanes a x4 link, and so on for x8, x32, etc.
Every link is a dual unidirectional (dual-simplex) interface, as each lane has separate transmit and receive lines for communication.
The collection of transmit and receive pairs at a device end is called a PORT.
Using multiple lanes lets the link bandwidth be scaled to the application.
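
How bandwidth scales with lane count can be sketched as follows. The sketch assumes Gen 1 signalling (2500 Mbps per lane per direction, 8b/10b encoded, as in the pin-link efficiency calculation) and the link widths defined for PCIe (x1, x2, x4, x8, x12, x16, x32); the function names are hypothetical.

```python
GEN1_LINE_RATE_MBPS = 2500              # raw bits on the wire, per lane, per direction
SUPPORTED_WIDTHS = (1, 2, 4, 8, 12, 16, 32)


def lane_bandwidth(line_rate_mbps=GEN1_LINE_RATE_MBPS):
    """Usable MBps per lane in one direction after 8b/10b overhead."""
    data_rate_mbps = line_rate_mbps * 8 / 10   # 10 wire bits carry 8 data bits
    return data_rate_mbps / 8                  # bits -> bytes


def link_bandwidth(lanes, line_rate_mbps=GEN1_LINE_RATE_MBPS):
    """Return (per-direction, both-directions) bandwidth in MBps for a link."""
    if lanes not in SUPPORTED_WIDTHS:
        raise ValueError(f"x{lanes} is not a supported link width")
    per_direction = lanes * lane_bandwidth(line_rate_mbps)
    return per_direction, 2 * per_direction


for width in SUPPORTED_WIDTHS:
    one_way, both_ways = link_bandwidth(width)
    print(f"x{width:<2}: {one_way:.0f} MBps per direction, {both_ways:.0f} MBps total")
```

For a x1 link this gives 250 MBps per direction and 500 MBps in total, matching the per-lane figure used earlier; a x32 link scales this to 8000 MBps per direction.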

EMBEDDED CLOCKING 
If you look at the diagram, there is no clock signal. So how does PCIe implement clocking?
PCIe uses 8b/10b encoding to embed the clock within the transmitted data stream: the encoding guarantees enough signal transitions for the receiver to recover the clock from the data itself. At initialisation, the fastest signalling rate supported by both devices is identified.

Picture
  1. So how does PCI express handle the above example?
    It uses an advanced split-transaction scheme (similar to PCI-X) and flow-control logic.
    Let's take the example of a highway (read as a link in PCIe) with different roads for different kinds of traffic. Each road allows only a certain type of traffic (say only cars or only cycles; it can also be based on speed, such as 100 km/h roads and 10 km/h roads). A similar setup is used in the PCIe implementation.
    Highway = Link 
    Roads    = Virtual channels
    Different vehicles = Traffic classes (TC)

    The different kinds of packet traffic are called traffic classes. Traffic in both directions on the link is scheduled by traffic-class priority. The highest priority is given to traffic classes that require dedicated bandwidth, such as isochronous traffic. The remaining traffic classes are balanced among the low-priority transactions to avoid bottlenecks.
    Virtual channels are like virtual wires between two devices. The finite physical link bandwidth is divided up among the supported virtual channels as appropriate. Each virtual channel has its own set of queues, buffers, control logic and a credit-based mechanism to track how full or empty those buffers are on either side (think of it as traffic signals or toll gates that monitor traffic on the roads).
    Each virtual channel (VC) supports a certain type of traffic. This mechanism ensures that bottlenecked transactions on one virtual channel do not affect traffic on the other VCs.
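
The credit-based mechanism can be illustrated with a deliberately simplified Python sketch. It assumes one credit per packet and one buffer per virtual channel; real PCIe tracks separate header and data credits for different transaction types, so the class and method names below are hypothetical and only the principle carries over.

```python
from collections import deque


class VCReceiver:
    """Receive side of one virtual channel with a fixed number of buffer slots."""

    def __init__(self, buffer_slots):
        self.capacity = buffer_slots
        self.buffer = deque()

    def accept(self, packet):
        if len(self.buffer) >= self.capacity:
            raise RuntimeError("buffer overflow: sender ignored its credits")
        self.buffer.append(packet)

    def drain(self, count=1):
        """Consume up to `count` packets; return how many credits were freed."""
        freed = 0
        while self.buffer and freed < count:
            self.buffer.popleft()
            freed += 1
        return freed


class VCTransmitter:
    """Transmit side: sends only while it holds credits from the receiver."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = receiver.capacity   # advertised at initialisation

    def try_send(self, packet):
        if self.credits == 0:
            return False                   # hold the packet, never overrun the buffer
        self.credits -= 1
        self.receiver.accept(packet)
        return True

    def credit_update(self, freed):
        self.credits += freed              # receiver reports freed buffer space


rx = VCReceiver(buffer_slots=2)
tx = VCTransmitter(rx)
results = [tx.try_send(p) for p in ("A", "B", "C")]   # third send is blocked
tx.credit_update(rx.drain(1))                          # receiver frees one slot
retry = tx.try_send("C")                               # now succeeds
print(results, retry)  # [True, True, False] True
```

Because each virtual channel keeps its own credits and buffers, a channel that has run out of credits stalls only its own traffic.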
    PCI Express supports a maximum of EIGHT traffic classes and EIGHT virtual channels.
    The number of VCs is independent of the link width: e.g. a x1 PCIe link can have eight virtual channels, while a x32 PCIe link may support only one VC.
    There are three simple rules:
     1. Once a packet is assigned a traffic class, it cannot change it while moving through a virtual channel.
     2. Each VC can support one or more traffic classes.
     3. A single TC cannot be mapped to multiple virtual channels.
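
Rules 2 and 3, together with the limit of eight TCs and eight VCs, can be checked with a small Python sketch. The function name and the list-of-pairs input format are hypothetical; rule 1 concerns packets in flight and is not modelled here.

```python
MAX_TRAFFIC_CLASSES = 8
MAX_VIRTUAL_CHANNELS = 8


def build_vc_table(tc_vc_pairs):
    """Validate (TC, VC) pairs and return a dict of VC -> list of TCs."""
    tc_to_vc = {}
    for tc, vc in tc_vc_pairs:
        if not 0 <= tc < MAX_TRAFFIC_CLASSES:
            raise ValueError(f"TC{tc} is out of range")
        if not 0 <= vc < MAX_VIRTUAL_CHANNELS:
            raise ValueError(f"VC{vc} is out of range")
        # Rule 3: a TC may not be mapped to more than one VC.
        if tc in tc_to_vc and tc_to_vc[tc] != vc:
            raise ValueError(f"TC{tc} is mapped to both VC{tc_to_vc[tc]} and VC{vc}")
        tc_to_vc[tc] = vc

    # Rule 2: several TCs may share the same VC.
    vc_to_tcs = {}
    for tc, vc in sorted(tc_to_vc.items()):
        vc_to_tcs.setdefault(vc, []).append(tc)
    return vc_to_tcs


table = build_vc_table([(0, 0), (1, 0), (7, 1)])
print(table)  # {0: [0, 1], 1: [7]}
```

Passing `[(3, 0), (3, 1)]` raises `ValueError`, because TC3 would be spread over two virtual channels.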

    These analogies make the rules easy to remember 🙂
    Cars, bicycles, trucks can go on the same road. 
    But bicycles cannot go on all the roads in a highway blocking other types of traffic. 

Picture