PCIe
Assignment #1
- PCIe architecture concepts
- PCI concepts
Questions:
- Explain ISA/EISA – 8/16/32-bit architecture first developed for the PC/XT by IBM
- Operational frequency: 8.3 MHz
- 16-bit or 32-bit bus
- Bandwidth: 8.3 MBps-33 MBps
- PCI – The parallel bus with auto-configuration developed by Intel
- Operational frequency: 33 MHz
- 32-bit or 64-bit bus
- Bandwidth: 132 MBps-264 MBps
- PCI-X – A double-wide “PCI” bus with a souped-up clock, developed by IBM, HP and Compaq for use primarily in servers
- Operational frequency: 66 MHz/100 MHz/133 MHz/266 MHz/533 MHz
- 32-bit or 64-bit bus
- Bandwidth: ~4 GBps
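The PCI and PCI-X bandwidth figures above follow from bus width times clock frequency. Here is a minimal sketch of that arithmetic, assuming one transfer per clock cycle; the function name `peak_bandwidth_mbps` is made up for illustration. The ISA figures do not follow this simple formula, presumably because an ISA transfer takes more than one clock.

```python
def peak_bandwidth_mbps(clock_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in MBps, assuming one transfer per clock cycle."""
    return clock_mhz * bus_width_bits / 8


# Bus width / clock combinations taken from the notes above.
for name, clock_mhz, width_bits in [
    ("PCI 32-bit @ 33 MHz", 33, 32),
    ("PCI 64-bit @ 33 MHz", 33, 64),
    ("PCI-X 64-bit @ 533 MHz", 533, 64),
]:
    print(f"{name}: {peak_bandwidth_mbps(clock_mhz, width_bits):.0f} MBps")
```

The last case gives 4264 MBps, which is where the "~4 GBps" figure for PCI-X comes from.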
- Explain PCIe evolution from ISA, PCI, PCI-X, then PCIe
- The initial bus architectures, which used ISA/EISA, were as shown in the picture above.
Some salient features of such bus architectures are:
1. High-speed peripherals are placed on the local bus for quick access to memory.
2. The CPU local bus is operated on a single clock, so the slowest device on the bus gates performance.
3. With increasing CPU speeds, ISA suffers from a throughput bottleneck.
- Explain the PCI bus architecture features (not PCIe)
- Isolation (electrical and frequency) of devices from the CPU.
- Allows concurrent cycles on the local bus and the PCI bus.
- Local bus frequency can be increased independently of the PCI bus speed / loading.
- PCI – one of the first fully HW-SW integrated bus mastering implementations.
- Plug and Play in PCI
- 1995 – 66 MHz 64-bit PCI
- PCI Bus Mastering: Devices and peripherals requiring service can gain immediate and direct control of the bus through an arbitration process. This avoids the CPU overseeing every transaction request, which in turn reduces the overall latency of the system in servicing I/O transactions.
- PCI Plug and Play: Devices are automatically detected, resources are allocated, and the appropriate drivers are invoked and configured at power-on.
- Explain PCI-X improvements over PCI
- Used to meet server market demands, where connector size / cost / signal routing are not sensitive constraints.
Improved PCI protocol with the following enhancements:
· Registered inputs
· No wait states
· Supports split transactions
· Data transfer in blocks
· Specified data transfer size
· Interrupt handling using MSI (Message Signaled Interrupts)
· No Snoop (NS) attribute – for some kinds of transactions there is no wait for a snoop result from the processor
· Relaxed ordering attribute for traffic
· PCI-X 2.0 – supports ECC and uses 1.5 V signaling
- Transaction flow comparison using an example
- See picture below
Scenario:
1. Device A wants some data from device B.
2. Say the information resides in A1, which is outside B.
3. B needs some time to process this request, so there is a delay: the latency period to service the request.
If no transactions are allowed on the bus until this request has been serviced, then in a multi-master case system efficiency is heavily impacted.
How do PCI, PCI-X and PCI Express handle such a scenario?
- The mechanisms used are:
1. Delayed transaction support – PCI
2. Split transaction – PCI-X
3. PCI Express uses a split transaction model (similar to PCI-X) together with a more advanced flow control mechanism.
PCI uses the concept of delayed transactions between master and slave. In a normal transaction scenario, devices on the bus must first arbitrate / compete to take control of the bus. The arbitration algorithm should be fair to all users, which is an inherent problem for isochronous transfers.
In the above case, to begin the transaction, device A arbitrates for control of the bus. Let's take only a simple case as shown in the diagram, with only two devices (in conventional cases you would have multiple devices).
Once device A has successfully arbitrated to become a master, it sends a transaction. Device B decodes the transaction and determines that device A has requested information from function A1. Since device B does not readily have the data requested by device A, it terminates the transaction with a retry response, giving device B some time to go and fetch the required data from A1.
Terminating the transaction also allows the bus to become available for other devices to arbitrate.
In this delayed transaction protocol, device A has to arbitrate for control of the bus once again and then resend the original request to device B. This process is iterative and may have to be repeated several times before the transaction is finally serviced.
PCI-X adopts a split transaction protocol. This does not require the master to continuously retry transactions that cannot be immediately serviced by the device (target).
In the same scenario as discussed above for PCI, device B does not terminate the transaction with a retry; rather, in PCI-X it sends out a split response. This split response lets the master know that the transaction will be completed sometime in the future.
So now the bus is free for other devices. The moment device B has the data and is able to get control of the bus through arbitration, it sends a split completion packet with the data to device A, completing the transaction.
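To make the difference concrete, here is a rough sketch that counts how many times the bus is used to complete one read under each protocol. It is a toy model, not a cycle-accurate one: the function names, the fixed retry interval and the cycle counts are all assumptions made up for illustration.

```python
def delayed_transaction_bus_uses(target_latency: int, retry_interval: int) -> int:
    """Count how often device A must win the bus under PCI delayed transactions.

    Device A issues the request at cycle 0 and repeats it every
    `retry_interval` cycles. Every attempt made before `target_latency`
    cycles have passed ends in a retry; the first attempt at or after
    that point completes the transfer.
    """
    attempts = 1  # the initial request at cycle 0
    cycle = 0
    while cycle < target_latency:
        cycle += retry_interval
        attempts += 1
    return attempts


def split_transaction_bus_uses() -> int:
    """PCI-X: one request from device A plus one split completion from device B."""
    return 2


# Hypothetical numbers: B needs 100 cycles to fetch from A1, A retries every 20.
print("PCI delayed:", delayed_transaction_bus_uses(100, 20), "bus uses")
print("PCI-X split:", split_transaction_bus_uses(), "bus uses")
```

In this toy model the number of bus uses under delayed transactions grows with the target latency, while the split transaction always needs two.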
- Explain the need for PCI Express (PCIe)
1) BW limitations: When PCI was invented, the typical real-world throughput it offers, approx. 90 MBps, was adequate for the '90s. Now we have Gigabit Ethernet, which by itself requires a throughput of 125 MBps.
2) Inability to support real-time data transfers: Applications such as streaming audio / video require guaranteed BW and deterministic latency. When PCI was defined, the need for real-time I/O BW was limited, so no mechanism was defined. PCI-based systems always use fair arbitration; a priority scheme is the only workaround to give dedicated BW and deterministic latency.
3) Future I/O support (security, virtualization)
4) Reliability at higher speeds while supporting multiple slots
5) Form factor – pin count – PCB and SI challenges
- Draw the PCIe architecture
- Write down various components involved in PCIe architecture
•Root Complex – Connects the CPU to the PCIe topology. The root complex generates transactions on behalf of the CPU.
•End-Point – Bottom of the PCIe tree structure. Has only one upstream port. Can request / complete transactions.
•Legacy End-Point – Uses old PCI bus operations for backward compatibility.
•Switch – Provides fan-out and aggregation. Switches act as packet routers across ports and provide peer-to-peer support (mandatory).
•Bridge – Connects different buses (Forward: PCIe to PCI; Reverse: PCI to PCIe)
- Does PCIe have a master-slave concept in its architecture?
- Explain layered architecture in PCIe?
- PCIe has a layered architecture as shown in the figure below. The three layers of the PCIe stack are:
Transaction Layer: Responsible for converting requests or completion data from the device core into a valid PCIe transaction.
Data Link Layer: Ensures the integrity of transactions across the link.
Physical Layer: Actually transmits and receives transactions across a PCIe link. On power-on the physical layer initialises the number of lanes to be used, the link speed, etc. The TL and DLL are oblivious to how the data is transmitted; that is taken care of by the PL.
- How does packet translation work across PCIe layers?
- Pin-Link Efficiency calculations
32-bit PCI
32 bits = 4 bytes (bidirectional bus)
Frequency: 33 MHz
Throughput / BW = 33 * 4 = 132 MBps
PCI needs a minimum of 32 pins for the data, plus say an additional 42 sideband, power and ground pins = 74 pins for a 32-bit PCI card.
The pin-link efficiency is calculated as 132 MBps (Tx and Rx) / 74 pins = 1.8 MBytes/pin
64-bit PCI-X
A similar calculation with an 8-byte bus, a 533 MHz operational frequency and approx. 150 pins for a PCI-X card results in a pin-link efficiency of:
533 * 8 (i.e. 4264 MBps Tx-Rx) / 150 pins = 28.4 MBytes/pin
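The two parallel-bus figures above can be reproduced with a few lines of code. This is only a sketch of the arithmetic; `pin_link_efficiency` is an illustrative name, and the pin counts are the approximate ones used in these notes.

```python
def pin_link_efficiency(bandwidth_mbps: float, pin_count: int) -> float:
    """Bandwidth delivered per pin, in MBytes per pin."""
    return bandwidth_mbps / pin_count


pci = pin_link_efficiency(33 * 4, 74)      # 32-bit PCI at 33 MHz, ~74 pins
pcix = pin_link_efficiency(533 * 8, 150)   # 64-bit PCI-X at 533 MHz, ~150 pins

print(f"PCI:   {pci:.1f} MBytes/pin")
print(f"PCI-X: {pcix:.1f} MBytes/pin")
```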
PCI Express is a serial dual-simplex standard running at 2.5 Gbps per lane in each direction (Gen 1). It has separate transmit and receive lines, unlike PCI/PCI-X, which are parallel interfaces.
Since PCI Express is a differential standard, each Tx and Rx line has 2 pins: Tx+/Tx- and Rx+/Rx-.
This results in just 4 pins for the PCI Express data lines, plus 4 more pins for power and ground.
- PCIe
There is a slight variation in PCIe, as it uses 8b/10b encoding for its data:
2500 Mbps * (1 byte / 10 bits) = 250 MBps for transmit; similarly we have another 250 MBps for receive.
So we have 500 MBps per lane.
Hence, pin-link efficiency = 500 MBps / 8 pins = 62.5 MBytes/pin
This is for a x1 PCIe link. The concepts of links and lanes are discussed later.
- B:D:F
- Bus Number
- Device Number
- Function Number
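As an illustration of how a B:D:F address can be handled in software, here is a small sketch that packs the three numbers into a single 16-bit value and back. The field widths (8-bit bus, 5-bit device, 3-bit function) are not stated in these notes; they are the standard PCI widths. The helper names are made up for this example.

```python
def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus (8 bits), device (5 bits) and function (3 bits) into 16 bits."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus, device or function number out of range")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(value: int) -> tuple:
    """Split a 16-bit value back into (bus, device, function)."""
    return (value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x7


def format_bdf(bus: int, device: int, function: int) -> str:
    """Render as bus:device.function in hexadecimal."""
    return f"{bus:02x}:{device:02x}.{function:x}"


packed = pack_bdf(3, 0, 1)
print(hex(packed), unpack_bdf(packed), format_bdf(3, 0, 1))
```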
- PCIe Lane connect
Before we see how PCIe handles transactions, it's time to introduce some more PCI Express specifics…
PCI is a parallel multi-drop interface, whereas PCIe is a serial point-to-point interface. In a point-to-point interface only one device resides at each end (Tx or Rx).
The connection between two PCIe devices is called a LINK. A link is made up of 1 to 32 LANES.
A lane has 4 wires, namely TX+, TX-, RX+, RX-. If a PCIe link has 1 lane it's called a x1 PCIe link; if it has 4 lanes, a x4 PCIe link; and so on for x8, x32, etc.
Every link is a dual unidirectional interface, as each lane has a separate transmit and receive line for communication.
The collection of transmit and receive pairs at a device end is called a PORT.
This concept of multiple lanes offers flexible bandwidth.
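Here is a rough sketch of how bandwidth scales with lane count. It assumes the Gen 1 line rate of 2.5 Gbps per lane per direction and 8b/10b encoding (10 line bits per data byte); the function and constant names are illustrative only.

```python
GEN1_LINE_RATE_MBPS = 2500  # assumed raw line rate per lane, per direction


def lane_bandwidth_mbps(line_rate_mbps: float = GEN1_LINE_RATE_MBPS) -> float:
    """Usable MBps per lane in one direction after 8b/10b encoding."""
    return line_rate_mbps / 10  # 10 line bits carry 1 data byte


def link_bandwidth_mbps(lanes: int, both_directions: bool = True) -> float:
    """Usable MBps for a link with the given number of lanes."""
    if not 1 <= lanes <= 32:
        raise ValueError("a link is made up of 1 to 32 lanes")
    per_direction = lanes * lane_bandwidth_mbps()
    return per_direction * 2 if both_directions else per_direction


for lanes in (1, 4, 8, 32):
    print(f"x{lanes}: {link_bandwidth_mbps(lanes):.0f} MBps (Tx + Rx)")
```

A x1 link gives 500 MBps counting both directions, and a wider link multiplies that figure by its lane count.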
EMBEDDED CLOCKING
If you look at the diagram, there is no clock signal. So how does PCIe implement clocking?
PCIe uses 8b/10b encoding to embed the clock within the transmitted data stream. At initialisation, the fastest signalling rate supported by both devices is identified.
- Explain PCIe Flow Control
- So how does PCI Express handle the above example?
It uses an advanced split transaction protocol (similar to PCI-X) and flow control logic.
Let's take the example of a highway (read as a link in PCIe) with different roads for different kinds of traffic. Each road allows only a certain type of traffic (say only cars or only cycles; it can also be based on speed, e.g. 100 km/h roads or 10 km/h roads). A similar setup is used in the PCIe implementation.
Highway = Link
Roads = Virtual channels
Different vehicles = Traffic classes (TC)
The different kinds of packet traffic are called traffic classes. Traffic in both directions on the link is scheduled based on traffic class priority. Highest priority is given to traffic classes that require dedicated bandwidth, such as isochronous traffic. The remaining, lower-priority traffic classes are balanced against each other to avoid bottlenecks.
Virtual channels are like virtual wires between two devices. The finite physical link bandwidth is divided up among the supported virtual channels as appropriate. Each virtual channel has its own set of queues, buffers, control logic and a credit-based mechanism to track how full or empty those buffers are on either side (think of it like signal stops or toll gates which monitor traffic on roads).
Each virtual channel (VC) supports a certain type of traffic. This mechanism ensures that bottlenecked transactions on one virtual channel do not affect traffic on other VCs.
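Here is a minimal sketch of the credit-based idea for a single virtual channel. It is heavily simplified: one credit stands for one packet-sized buffer slot, whereas real PCIe tracks header and data credits separately for each type of transaction. The class and method names are hypothetical.

```python
from collections import deque


class VirtualChannel:
    """Toy model of credit-based flow control on one virtual channel."""

    def __init__(self, receiver_buffer_slots: int):
        # The receiver advertises its free buffer space as credits.
        self.credits = receiver_buffer_slots
        self.receiver_buffer = deque()

    def send(self, packet: str) -> bool:
        """Transmit only if a credit is available; otherwise stall."""
        if self.credits == 0:
            return False  # no buffer space at the receiver, packet must wait
        self.credits -= 1
        self.receiver_buffer.append(packet)
        return True

    def receiver_drain(self) -> str:
        """Receiver consumes a packet and returns one credit to the sender."""
        packet = self.receiver_buffer.popleft()
        self.credits += 1
        return packet


vc = VirtualChannel(receiver_buffer_slots=2)
print(vc.send("pkt0"), vc.send("pkt1"), vc.send("pkt2"))  # third send stalls
vc.receiver_drain()                                       # frees one slot
print(vc.send("pkt2"))                                    # now it goes through
```

Because each virtual channel keeps its own credits and buffers, a stalled sender on one channel does not consume buffer space belonging to another.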
PCI Express supports a maximum of EIGHT traffic classes and EIGHT virtual channels.
E.g. a x1 PCIe link can have eight virtual channels, and a x32 PCIe link can support just one VC.
There are three simple rules:
1. Once a packet is assigned a traffic class, that class cannot change while the packet moves through a virtual channel.
2. Each VC can support one or more traffic classes.
3. A single TC cannot be mapped to multiple virtual channels.
These are easy to remember with the analogy 🙂
Cars, bicycles and trucks can go on the same road.
But bicycles cannot go on all the roads of a highway, blocking other types of traffic.
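Rules 2 and 3, together with the limit of eight TCs and eight VCs, can be expressed as a small mapping check. This is only a sketch: `build_tc_vc_map` and `traffic_classes_on` are hypothetical helpers, and numbering the TCs and VCs 0 to 7 is an assumption made for this example.

```python
from typing import Dict, Iterable, List, Tuple

MAX_TRAFFIC_CLASSES = 8
MAX_VIRTUAL_CHANNELS = 8


def build_tc_vc_map(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Build a TC -> VC map, rejecting any TC mapped to more than one VC."""
    mapping: Dict[int, int] = {}
    for tc, vc in pairs:
        if not 0 <= tc < MAX_TRAFFIC_CLASSES:
            raise ValueError(f"TC{tc} is out of range")
        if not 0 <= vc < MAX_VIRTUAL_CHANNELS:
            raise ValueError(f"VC{vc} is out of range")
        if tc in mapping and mapping[tc] != vc:
            raise ValueError(f"TC{tc} cannot be mapped to multiple VCs")  # rule 3
        mapping[tc] = vc
    return mapping


def traffic_classes_on(mapping: Dict[int, int], vc: int) -> List[int]:
    """List the traffic classes carried by one VC (rule 2: one or more)."""
    return sorted(tc for tc, mapped_vc in mapping.items() if mapped_vc == vc)


tc_map = build_tc_vc_map([(0, 0), (1, 0), (7, 1)])
print(traffic_classes_on(tc_map, 0))  # TC0 and TC1 share VC0
```

Passing `[(0, 0), (0, 1)]` raises a `ValueError`, which corresponds to the bicycle trying to use every road at once.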