General
HyperTransport provides a point-to-point interconnect that can be extended to support a wide range of devices. Figure 2-1 on page 21 illustrates a sample HT system with four internal links. HyperTransport provides a high-speed, high-performance, point-to-point dual simplex link for interconnecting IC components on a PCB. Data is transmitted from one device to another across the link.
Figure 2-1. Example HyperTransport System
The width of the link and the clock frequency at which data is transferred are scalable:
Link width ranges from 2 bits to 32 bits
Clock frequency ranges from 200MHz to 800MHz (and 1GHz in the future)
This scalability allows for a wide range of link performance and potential applications with bandwidths ranging from 200MB/s to 12.8GB/s.
At the current revision of the spec, 1.04, there is no support for connectors, implying that all HyperTransport (HT) devices are soldered onto the motherboard. HyperTransport is technically an "inside-the-box" bus. In reality, connectors have been designed for systems that require board-to-board connections and for cases where analyzer interfaces are desired for debug.
Once again referring to Figure 2-1, the HT bus has been extended in the sample system via a series of devices known as tunnels. A tunnel is merely an HT device that performs some function, but in addition it contains a second HT interface that permits the connection of another HT device. In Figure 2-1, the tunnel devices provide connections to other I/O buses:
Infiniband
PCI-X
Ethernet
The end device is termed a cave, which always represents the termination of a chain of devices that all reside on the same HT bus. Cave devices include a function, but no additional HT connection. The series of devices that comprise an HT bus is sometimes simply referred to as an HT chain.
Additional HT buses (i.e. chains) may be implemented in a given system by using an HT-to-HT bridge. In this way, a fabric of HT devices may be implemented. Refer to the section entitled "Extending the Topology" on page 33 for additional detail.
Transfer Types Supported
HT supports two types of addressing semantics:
legacy PC, address-based semantics
messaging semantics common to networking environments
The first part of this book discusses the address-based semantics common to compatible PC implementations. Message-passing semantics are discussed in Chapter 19, entitled "Networking Extensions Overview," on page 443.
Address-Based Semantics
The HT bus was initially implemented as a PC-compatible solution that by definition uses address-based semantics. This includes a 40-bit, or 1 Terabyte (TB), address space. Transactions specify locations within this address space that are to be read from or written to. The address space is divided into blocks that are allocated for particular functions, listed in Figure 2-2 on page 23.
HT Address Map
HyperTransport does not contain dedicated I/O address space. Instead, CPU I/O space is mapped to a high memory address range (FD_FC00_0000h—FD_FDFF_FFFFh). Each HyperTransport device is configured at initialization time by the boot ROM configuration software to respond to a range of memory addresses. The devices are assigned addresses via the base address registers contained in the configuration register header. Note that these registers are based on the PCI configuration registers, and are also mapped to memory space (FD_FE00_0000h—FD_FFFF_FFFFh). Unlike the PCI bus, there is no dedicated configuration address space.
Read and write request command packets contain a 40-bit address, Addr[39:2]. Additional memory address ranges are used for interrupt signaling and system management messages. Details regarding the use of each range of address space are discussed in subsequent chapters that cover the related topics. For example, a detailed discussion of the configuration address space can be found in Chapter 13, entitled "Device Configuration," on page 305.
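Although the specification defines the ranges themselves rather than any particular software interface, the simple C sketch below illustrates how a 40-bit address might be classified against the two fixed ranges quoted above; the function and constant names are purely illustrative.

    #include <stdint.h>
    #include <stdio.h>

    /* Fixed HyperTransport address ranges quoted in the text (40-bit addresses). */
    #define HT_IO_BASE   0xFDFC000000ULL   /* FD_FC00_0000h */
    #define HT_IO_LAST   0xFDFDFFFFFFULL   /* FD_FDFF_FFFFh */
    #define HT_CFG_BASE  0xFDFE000000ULL   /* FD_FE00_0000h */
    #define HT_CFG_LAST  0xFDFFFFFFFFULL   /* FD_FFFF_FFFFh */

    enum ht_space { HT_SPACE_MEMORY, HT_SPACE_CPU_IO, HT_SPACE_CONFIG };

    /* Classify a 40-bit HT address into one of the memory-mapped spaces. */
    static enum ht_space ht_classify(uint64_t addr)
    {
        if (addr >= HT_IO_BASE  && addr <= HT_IO_LAST)  return HT_SPACE_CPU_IO;
        if (addr >= HT_CFG_BASE && addr <= HT_CFG_LAST) return HT_SPACE_CONFIG;
        return HT_SPACE_MEMORY;   /* everything else targets ordinary memory/MMIO */
    }

    int main(void)
    {
        printf("%d\n", ht_classify(0xFDFC000010ULL)); /* 1: CPU I/O */
        printf("%d\n", ht_classify(0xFDFE000000ULL)); /* 2: config  */
        printf("%d\n", ht_classify(0x0000100000ULL)); /* 0: memory  */
        return 0;
    }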
Data Transfer Type and Transaction Flow
The HT architecture supports several methods of data transfer between devices, including:
Programmed I/O
DMA
Peer-to-peer
Each method is illustrated and described below. An overview of packet types and transactions is discussed later in this chapter.
Programmed I/O Transfers
Transfers that originate as a result of executing code on the host CPU are called programmed I/O transfers. For example, a device driver for a given HT device might execute a read transaction to check its device status. Transactions initiated by the CPU are forwarded to the HT bus via the Host HT Bridge as illustrated in Figure 2-3. The example transaction is a write that is posted by the host bridge; thus, no response is returned from the target device. Non-posted operations, of course, require a response.
Transaction Flow During Programmed I/O Operation
DMA Transfers
HT devices may wish to perform a direct memory access (DMA) by simply initiating a read or write transfer. Figure 2-4 illustrates a master performing a DMA read operation from main DRAM. In this example, a response is required to return data back to the source HT device.
Transaction Flow During DMA Operation
Peer-to-Peer Transfers
In a peer-to-peer transfer, the requesting device issues the initial request to read data from a target device residing on the same bus. Note that even though the target device resides on the same bus, it ignores the request moving in the upstream direction (toward the host processor). When the request reaches the upstream bridge, it is turned around and sent in the downstream direction toward the target device. This time the target device detects the request and returns the requested data in a response packet.
Peer-to-Peer Transaction Flow
The peer-to-peer transfer does not occur directly between the requesting and responding devices as might be expected. Rather, the upstream bridge is involved in handling both the request and response to ensure that the transaction ordering requirements are managed correctly. This requirement exists to support PCI-compliant ordering. True, or direct, peer-to-peer transfers are supported when PCI ordering is not required, as defined by the networking extensions. See Chapter 19, entitled "Networking Extensions Overview," on page 443 for details.
HT Signals
The HT signals can be grouped into two broad categories:
The link signal group — used to transfer packets in both directions (High-Speed Signals).
The support signal group — provides required resources such as power and reset, as well as other signals that support optional features such as power management (Low-Speed Signals).
Primary HT Signal Groups
Link Packet Transfer Signals
The high-speed signals used for packet transfer in both directions across an HT link include:
CAD (command, address, data). Multiplexed signals that carry control packets (request, response, information) and data packets. Note that the width of the CAD bus is scalable from 2 bits to 32 bits. (See "Scalable Performance" on page 30.)
CLK (clock). Source-synchronous clock for CAD and CTL signals. A separate clock signal is required for each byte lane supported by the link. Thus, the number of CLK signals required is directly proportional to the number of bytes that can be transferred across the link at one time.
CTL (control). Indicates whether a control packet or data packet is currently being delivered via the CAD signals.
The figure below, entitled "Link Signals Used to Transfer Packets," illustrates these signals and defines the various widths of data bus supported. The variables "n" and "m" define the scaling option implemented. Refer to "Link Initialization" on page 282 for details regarding HT data width and clock speed scaling.
Link Signals Used to Transfer Packets
Link Support Signals
The low-speed link support signals consist of power- and initialization-related signals and power management signals. Power- and initialization-related signals include:
VLDT & Ground — The 1.2 volt supply that powers HT drivers and receivers
PWROK — Indicates to devices residing in the HT fabric that power and clock are stable.
RESET# — Used to reset and initialize the HT interface within devices and perhaps their internal logic (device specific).
Power management signals
LDTREQ# — Requests re-enabling links for normal operation.
LDTSTOP# — Enables and disables links during system state transitions.
Link Support Signals
Scalable Performance
The width of the transmit and receive portions of the link (CAD signals) may be different. For example, devices that typically send most of their data to main memory (upstream) and receive limited data from the host can implement a wide path in the high-performance direction and a narrow path for traffic in the lesser-used direction, thereby reducing cost.
The HyperTransport link combines the advantages of both serial and parallel bus architectures. HT provides options for the number of data paths implemented and for the clock rate at which data is transferred (see "Scalable Link Width and Speeds" on page 30), thus providing scalable link performance ranging from 0.2GB/s to 12.8GB/s. This scalability is helpful to system designers. For example:
An implementation that needs all the available bandwidth (e.g. system chipsets), can use wide links (up to 32 bits), running at the highest clock frequencies (up to 800MHz now and 1GHz in the future).
Implementations that don't require high bandwidth but do require low power may use narrow links (as few as 2 bits) and lower frequencies (down to 200MHz).
Scalable Link Width and Speeds
HyperTransport lends itself to scaling well because:
The high frequency bus translates to fewer pins required to transfer a specific amount of data. The same protocol is used regardless of link width.
Differential signaling results in a very low current path to ground, thereby reducing the number of power and ground pins required for devices.
Each additional byte lane added has its own source synchronous clock.
HT's implementation of ACPI compliant power management and interrupt signaling is message based, reducing pin count. Note that only two additional signals, LDTSTOP# and LDTREQ#, are required for managing power.
Data Widths
HT provides scalable data paths with link widths of 2-, 4-, 8-, 16-, or 32-bits wide in each direction, as pictured in Figure 2-10 on page 31. The link width used immediately following reset is restricted to no wider than 8 bits. Later during software initialization, configuration software determines the maximum link width that can be supported in each direction and configures both devices to use the maximum width supported for each direction. See "Tuning the Link Width (Firmware Initialization)" on page 295 for details.
Link Widths Supported
Table 2-1. Signals Used for Different Link Widths

  Pin Names               Number of Pins per Link Width
                           2      4      8     16     32
  Data Pins (CAD)          8     16     32     64    128
  Clock Pins (CLK)         4      4      4      8     16
  Control Pins (CTL)       4      4      4      4      4
  LDTSTOP#/LDTREQ#         2      2      2      2      2
  RESET#                   1      1      1      1      1
  PWROK                    1      1      1      1      1
  VHT                      2      2      3      6     10
  GND                      4      6     10     19     37
  Total Pins              26     36     57    105    199
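The high-speed pin counts in Table 2-1 follow directly from the signal definitions above: every CAD, CLK, and CTL signal is a differential pair, each direction of the dual-simplex link carries its own copy, and one CLK is required per byte lane (each group of 8 or fewer CAD bits). The short C sketch below, offered purely as an illustration, reproduces the CAD, CLK, and CTL rows of the table; power and ground pins are taken from the table itself and are not modeled.

    #include <stdio.h>

    /* Differential CLK pairs per direction: one per byte lane (or fraction of one). */
    static int clk_lanes(int width)    { return (width + 7) / 8; }

    /* Package pins for one link: 2 wires per differential pair, 2 directions. */
    static int pins(int pairs_per_dir) { return pairs_per_dir * 2 * 2; }

    int main(void)
    {
        const int widths[] = { 2, 4, 8, 16, 32 };
        printf("width  CAD  CLK  CTL\n");
        for (int i = 0; i < 5; i++) {
            int w = widths[i];
            printf("%5d  %3d  %3d  %3d\n",
                   w,
                   pins(w),            /* CAD pins: 8, 16, 32, 64, 128 */
                   pins(clk_lanes(w)), /* CLK pins: 4, 4, 4, 8, 16     */
                   pins(1));           /* CTL pins: always 4           */
        }
        return 0;
    }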
As mentioned earlier, asymmetrical link widths are allowed in HyperTransport. For example, devices that typically send the bulk of their data in one direction and receive limited data in the other can save on cost by implementing a wide path in the high-bandwidth direction and a narrow path for traffic in the low-bandwidth direction. Note that the HyperTransport protocol doesn't change with link width: packet formats remain the same, although it obviously takes more bit times to shift out a 32-bit value on a 2-bit link than on a 32-bit link (16 bit times vs. 1 bit time).
Clock Speeds
HyperTransport clock speeds currently supported are 200MHz, 300MHz, 400MHz, 500MHz, 600MHz, and 800MHz. Note that 700MHz is not supported. Both the rising and falling edges of the clock are used to clock signals. This clocking mechanism is referred to as double data rate (DDR) clocking. DDR clocking translates to an effective transfer rate that is double the actual clock frequency. In addition, because each link is dual simplex, the aggregate link bandwidth is four times what the clock rate alone would suggest.
Table 2-2 shows the bandwidth numbers based on symmetrical links for selected combinations of clock frequency and link width. For example, consider the bandwidth in gigabytes/second for a 32-bit link operating at 800MHz:
800MHz clock with DDR = effective transfer rate of 1,600MT/s (1.6GTransfers/s)
1.6GTransfers/s x 4 bytes = 6.4GB/s per direction
6.4GB/s in each direction = 12.8GB/s aggregate
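The same arithmetic generalizes to any width/frequency combination. The brief C sketch below is only an illustration of the calculation; it reproduces the 6.4GB/s per-direction and 12.8GB/s aggregate figures above and, with other inputs, the entries of Table 2-2.

    #include <stdio.h>

    /* Per-direction bandwidth in GB/s: DDR doubles the transfer rate,
     * and each transfer moves width/8 bytes. */
    static double per_direction_gbs(double clock_mhz, int width_bits)
    {
        double transfers_per_s = clock_mhz * 1e6 * 2.0;      /* DDR: both clock edges */
        return transfers_per_s * (width_bits / 8.0) / 1e9;   /* bytes/s to GB/s       */
    }

    int main(void)
    {
        /* 32-bit link at 800MHz: 6.4 GB/s each way, 12.8 GB/s aggregate. */
        double one_way = per_direction_gbs(800.0, 32);
        printf("per direction: %.1f GB/s, aggregate: %.1f GB/s\n",
               one_way, one_way * 2.0);                       /* dual simplex: x2 */
        return 0;
    }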
Table 2-2. Maximum Bandwidth Based on Various Speeds and Link Widths

  Link Width (bits)     Bandwidth per Link (in GBytes/sec)
                        800MHz     400MHz     200MHz
   2                      0.8        0.4        0.2
   4                      1.6        0.8        0.4
   8                      3.2        1.6        0.8
  16                      6.4        3.2        1.6
  32                     12.8        6.4        3.2
Extending the Topology
Based on point-to-point links, a HyperTransport chain may be extended into a fabric, using single and multi-link devices together. Devices defined for HT include:
Single HT link "cave" devices used to implement a peripheral function
Single- or multi-link bridges (HT-to-HT, or HT to one or more other protocols such as PCI, PCI-X, AGP, or InfiniBand)
Multi-link Tunnel devices used to implement a function and extend a link to a neighboring device downstream, thus creating a chain
These devices are the basic building blocks for the HT fabric.
Basic HT Device Types
The figure below, entitled "HyperTransport Topology Supporting All Three Major Device Types," exemplifies a HyperTransport topology that includes all three device types previously discussed. The basic difference between an HT-to-HT bridge and a tunnel device is:
A bridge creates a new link (with its own bus number), and acts as a HyperTransport host bridge for each secondary link.
A tunnel buffers signals and passes packets, but merely extends an existing link to another device. It is not a host, and the bus number is the same on both sides of the tunnel. It also implements an internal function of its own, which a bridge typically would not.
HyperTransport Topology Supporting All Three Major Device Types
Packetized Transfers
Transactions are constructed out of combinations of various packet types and carry the commands, address, and data associated with each transaction. Packets are organized in multiples of 4-byte blocks. If the link uses data paths that are narrower than 32 bits, successive bit-times are added to complete the packet transfer on an aligned 4-byte boundary. The primary packet types include:
Control Packets — used to manage various HT features, initiate transactions, and respond to transactions
Data Packets — used to carry the payload associated with a control packet (maximum payload is 64 bytes).
As illustrated in Figure 2-13, the control (CTL) signal differentiates control packets from data packets on the bus.
Distinguishing Control from Data Packets
For every group of 8 bits (or less) within the CAD path, there is a CLK signal. These groups of signals are transmitted source synchronously with the associated CLK signal. Source synchronous clocking requires that CLK and its associated group of CAD signals must all be routed with equal length traces in order to minimize skew between the signals.
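Because packets are always padded out to a multiple of four bytes, the number of bit times a packet occupies on a link of a given width is easy to compute. The small C sketch below is illustrative only; it reproduces the earlier observation that a 32-bit quantity takes 16 bit times on a 2-bit link but a single bit time on a 32-bit link.

    #include <stdio.h>

    /* Bit times needed to move 'bytes' of packet across a link 'width_bits' wide.
     * Packets are always padded out to a multiple of 4 bytes before transmission. */
    static unsigned bit_times(unsigned bytes, unsigned width_bits)
    {
        unsigned padded_bits = ((bytes + 3) / 4) * 4 * 8;   /* round up to 4-byte multiple */
        return padded_bits / width_bits;
    }

    int main(void)
    {
        printf("%u\n", bit_times(4, 2));   /* 16 bit times: 4-byte packet, 2-bit link   */
        printf("%u\n", bit_times(4, 32));  /*  1 bit time : 4-byte packet, 32-bit link  */
        printf("%u\n", bit_times(8, 8));   /*  8 bit times: 8-byte request, 8-bit link  */
        return 0;
    }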
Control Packets
Control packets manage various HT features, initiate transactions, and respond to transactions as listed below:
Information packets
Request packets
Response packets
Information packet (4 bytes)
Information packets are exchanged between the two devices on a link. They are used by the two devices to synchronize the link, convey a serious error condition using the Sync Flood mechanism, and to update flow control buffer availability dynamically (using tags in NOP packets). The information packets are:
NOP
Sync/Error
Request packet (4 or 8 bytes)
Request packets initiate HT transactions and special functions. The request packets include:
Sized Write (Posted)
Broadcast Message
Sized Write (non-posted)
Sized Read
Flush
Fence
Atomic Read-Modify-Write
Response packet (4 bytes)
Response packets are used in HT split-transactions to reply to a previous request. The response may be a Read Response with data, or simply a Target Done Response confirming a non-posted write has reached its destination.
Data Packets
Some Request/Response command packets have data associated with them. Data packet structure varies with the command that caused it:
Sized Dword Read Response or Write data packets are 1-16 dwords (4-64 bytes)
Sized Byte Read Response data packets are 1 dword (any byte combination valid)
Sized Byte Write data packets are 0-32 bytes (any byte combination valid)
Read-Modify-Write data packets carry the argument(s) of the atomic operation (see "Atomic" later in this chapter)
HyperTransport Protocol Concepts
Channels and Streams
In HyperTransport, as in other protocols, ordering rules are needed for read transactions, posted/non-posted write transactions, and responses returning from earlier requests. In a point-to-point fabric, all of these occur over the same link. In addition, transactions from different devices also merge onto the same links. HyperTransport implements Virtual Channels and I/O Streams to differentiate a device's posted requests, non-posted requests, and responses from each other and from those originating from different sources.
Virtual Channels
HyperTransport defines a set of three required virtual channels that dictate transaction management and ordering:
Posted Requests — Posted write transactions belong to this channel.
Non-Posted Requests — Reads, non-posted writes, and flushes belong to this channel.
Responses — Read responses and target done packets belong to this channel.
An additional set of Posted, Non-Posted, and Response virtual channels is required for isochronous transactions, if supported. This dedicated set of virtual channels assists in guaranteeing the bandwidth required of isochronous transactions.
When packets are sent over a link, they are sent in one of the virtual channels. Attribute bits in the packets tag them as to which channel they should travel. Each device is responsible for maintaining queues and buffers for managing the virtual channels and enforcing ordering rules.
Each device implements separate command/data buffers for each of the three required virtual channels, as pictured in Figure 2-14 on page 38. Doing so ensures that transactions moving in one virtual channel do not block transactions moving in another virtual channel. There are I/O ordering rules covering interactions between the three virtual channels of the same I/O stream. Transactions in different I/O streams have no ordering rules (with the exception of the ordering rules associated with Fence requests). Enforcing ordering rules between transactions in the same I/O stream prevents deadlocks from occurring and guarantees data is transferred correctly. Based on ordering requirements, nodes may not:
Make accepting a request dependent on the ability of that node to issue an outgoing request.
Make accepting a request dependent on the receipt of a response due to a request previously issued by that node.
Make issuing a response dependent on the ability to issue a request.
Make issuing a response dependent upon receipt of a response due to a previous request.
I/O Streams
In addition to virtual channels, HyperTransport also defines I/O streams. An I/O stream consists of the requests, responses, and data associated with a particular UnitID and HyperTransport link. Ordering rules require that I/O streams be treated independently from each other. When a request/response packet is sent, it is tagged with sender attributes (UnitID, Source Tag, and Sequence ID) that are used by other devices to identify the transaction stream in use, and the required ordering within it. Entries within the virtual channel buffers include the transaction stream identifiers (attributes).
When used properly, the independent I/O streams create the effect of separate connections between devices and the host bridge above them, much as a shared bus connection appears.
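One informal way to picture a stream identifier is as a small structure; the C sketch below assumes, as described above, that a stream is identified by the UnitID of the issuing device together with the link on which the packet arrived, and that ordering rules are enforced only between packets of the same stream. The names are illustrative, not part of the specification.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative I/O stream identifier: the attributes a packet carries. */
    struct ht_stream_id {
        uint8_t unit_id;   /* UnitID of the issuing device        */
        uint8_t link;      /* which HT link the packet arrived on */
    };

    /* Ordering rules only apply between packets in the same I/O stream. */
    static bool ordering_applies(struct ht_stream_id a, struct ht_stream_id b)
    {
        return a.unit_id == b.unit_id && a.link == b.link;
    }

    int main(void)
    {
        struct ht_stream_id x = { 5, 0 }, y = { 7, 0 };
        return ordering_applies(x, y) ? 1 : 0;   /* different UnitIDs: no ordering */
    }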
Transactions (Requests, Responses, and Data)
Transfers initiated by HT devices require one or more transactions to complete. These devices may need to perform a variety of operations that include:
sending or forwarding data (write)
requesting that a target return data to it (read)
performing an atomic read/modify/write operation
exercising additional control over the ordering of its posted transactions (using Flush and Fence commands)
broadcasting a message to all downstream agents (done by bridges only)
The format of these transactions also varies depending on the type of operation (request) specified, as listed below:
Requests that behave like reads and that require a read response and data (i.e., Sized Read, Atomic RMW)
Requests that behave like writes, and require a target done response to confirm completion (i.e. Non-posted Sized Writes)
Posted Requests that behave like writes but don't require any target response or data. (i.e. Posted Sized Writes, Broadcast Message, or Fence)
Transaction Requests
Every transaction begins with the transmission of a Request Packet. Note that the actual format of a request packet varies depending on the particular request, but in general each request contains the following information:
Target address within HyperTransport memory space
The request type (command)
Sender's transaction stream ID (UnitID, SeqID)
The amount of data to be transferred (if any)
Other attributes: virtual channel to use, etc.
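The list above can be pictured informally as a request descriptor, sketched below in C. The field and type names are illustrative only and correspond to the items in the list, not to the actual bit-level packet encoding described in later chapters.

    #include <stdint.h>
    #include <stdbool.h>

    /* The seven basic request types introduced in this chapter. */
    enum ht_request_type {
        HT_REQ_SIZED_WRITE_POSTED,
        HT_REQ_SIZED_WRITE_NONPOSTED,
        HT_REQ_SIZED_READ,
        HT_REQ_BROADCAST,
        HT_REQ_FLUSH,
        HT_REQ_FENCE,
        HT_REQ_ATOMIC_RMW
    };

    /* Illustrative request descriptor: one field per item in the list above. */
    struct ht_request {
        uint64_t             addr;      /* target address in HT memory space          */
        enum ht_request_type command;   /* request type                               */
        uint8_t              unit_id;   /* sender's transaction stream: UnitID        */
        uint8_t              seq_id;    /* ... and SeqID (0 if not an ordered stream) */
        uint8_t              count;     /* amount of data to transfer, if any         */
        bool                 isoc;      /* other attributes: isochronous channel, ... */
        bool                 pass_pw;   /* ... and ordering hints such as PassPW      */
    };

    int main(void)
    {
        /* Example: a 16-byte posted write to address 0x1000 from UnitID 5. */
        struct ht_request req = { 0x1000, HT_REQ_SIZED_WRITE_POSTED, 5, 0, 16, false, false };
        return req.command == HT_REQ_SIZED_WRITE_POSTED ? 0 : 1;
    }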
HT defines seven basic request types. The characteristics of each request type are discussed in the following sections.
Transaction Responses
Responses are generated by the target device in cases where data is to be returned from the target device, or when confirmation of transaction completion is required. Specifically, in HyperTransport, a response follows all non-posted requests. A target responds to:
Return data to satisfy an earlier read or Atomic Read-Modify-Write (RMW) request
Confirm the arrival of non-posted write data
Confirm the completion of a Flush operation
Report errors
The information in a response varies both with the request that causes it and with the direction the response is traveling in the HyperTransport fabric. However, the content of an HT response generally includes:
Response type (command)
Response direction (upstream or downstream)
Transaction stream (UnitID, Source Tag)
Misc. info: virtual channel to use, error, etc.
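A response can be pictured the same way; the illustrative C descriptor below mirrors the list above and, again, does not represent the bit-level packet format.

    #include <stdint.h>
    #include <stdbool.h>

    enum ht_response_type { HT_RSP_READ_RESPONSE, HT_RSP_TARGET_DONE };

    /* Illustrative response descriptor mirroring the list above. */
    struct ht_response {
        enum ht_response_type command;   /* Read Response or Target Done               */
        bool                  upstream;  /* direction the response is traveling        */
        uint8_t               unit_id;   /* transaction stream: UnitID                 */
        uint8_t               src_tag;   /* ... and Source Tag of the original request */
        bool                  isoc;      /* misc. attributes: virtual channel, ...     */
        bool                  error;     /* ... and error status                       */
    };

    int main(void)
    {
        struct ht_response rsp = { HT_RSP_TARGET_DONE, true, 5, 3, false, false };
        return rsp.error ? 1 : 0;
    }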
Transaction Types
As discussed earlier, HT defines seven basic transaction types. This section introduces the characteristics of each type and defines any sub-types that exist.
Sized Read Transactions
Sized Read transactions permit remote access to a device memory or memory-mapped I/O (MMIO) address space. The operation may be initiated on HT from the host bridge (PIO operation), or an HT device may wish to read data from memory (DMA operation) or from another HT device (peer-to-peer operation). Two types of Sized Read transactions define the different quantities of data to be read.
Sized (Byte) Read — this request defines an aligned 4 byte block of address space from which 0 to 4 bytes can be read. Any single byte location or any group of bytes within the 4 byte block can be accessed. The typical use of this transaction is for reading MMIO registers.
Sized (DW) Read — this request identifies an aligned 64 byte block of address space from which 4-64 bytes can be read. Any contiguous group of aligned 4-byte groups (DWs) can be accessed.
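For illustration only, the C helpers below check these two constraints: a Byte Read must stay within one aligned 4-byte block, and a DW Read transfers whole dwords within one aligned 64-byte block. (For simplicity the sketch models a contiguous range of bytes rather than the arbitrary byte mask a Byte Read request actually carries.)

    #include <stdbool.h>
    #include <stdint.h>

    /* Byte Read: bytes within a single aligned 4-byte block. */
    static bool byte_read_ok(uint64_t addr, unsigned len)
    {
        return len >= 1 && len <= 4 && (addr / 4) == ((addr + len - 1) / 4);
    }

    /* DW Read: 4 to 64 bytes of whole dwords within a single aligned 64-byte block. */
    static bool dw_read_ok(uint64_t addr, unsigned len)
    {
        return (addr % 4) == 0 && (len % 4) == 0 &&
               len >= 4 && len <= 64 &&
               (addr / 64) == ((addr + len - 1) / 64);
    }

    int main(void)
    {
        return !( byte_read_ok(0x1001, 2)    /* bytes 1-2 of a dword: fine         */
               && !byte_read_ok(0x1003, 2)   /* would cross a dword boundary: no   */
               &&  dw_read_ok(0x2000, 64)    /* a full aligned 64-byte block: fine */
               && !dw_read_ok(0x2020, 64));  /* would cross a 64-byte boundary: no */
    }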
The protocol associated with Sized Read transactions is illustrated in Figure 2-15 on page 41. These transactions begin with the delivery of a Sized Read Request packet and complete when the target device returns a corresponding response packet followed by data.
Figure 2-15. Example Protocol — Receiving Data from Target
The basic rules for maintaining high performance of HT reads include:
For reads, the requester won't issue the request until it has buffers available to receive all requested data without wait states.
The requester won't issue the request until it knows the target has room in its transaction queue to accept it (Flow Control).
Upon receiving the read request, the target won't issue the read response until it has all requested data and status available to send. Once it starts the response, there will be no wait states until the read response packet and all data (up to 16 dwords) have been sent.
Upon receiving the response, the requester will check the error bits to make certain the data is valid.
The target and any bridges in the path de-allocate buffers and queue entries as soon as the response has been sent.
Sized Write Transactions
Sized Write transactions permit the host bridge (PIO operation) to send data to a HyperTransport device, or permit a HyperTransport device to send data to memory (DMA operation) or to another device (peer-to-peer operation). Two types of Sized Write requests permit different sizes of memory or MMIO space to be accessed.
Sized (Byte) Write — this request identifies an aligned block of 32 bytes of address space into which data is to be written. The amount of data to be written can be from 0 to 32 bytes. Note that the maximum transfer size of 32 bytes only occurs if the start address is 32-byte aligned. If the start address is not on a 32-byte boundary, the transfer will be less than 32 bytes. Furthermore, no Byte Write transaction crosses a 32-byte address boundary. Any combination of bytes (need not be contiguous) can be written from the start address up to the next aligned 32-byte boundary.
Sized (DW) Write — this request identifies an aligned block of 64 bytes of address space into which data can be written. The start address must be aligned on a 4-byte boundary, and the data to be written is always aligned in 4-byte contiguous groups (DWs). The amount of data written can be from 1 to 16 DWs.
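The analogous checks for writes are sketched below, again for illustration only: a Byte Write never crosses an aligned 32-byte boundary, and a DW Write transfers 1 to 16 whole dwords within an aligned 64-byte block. As with reads, the sketch models a contiguous byte range rather than the byte mask a Byte Write actually carries.

    #include <stdbool.h>
    #include <stdint.h>

    /* Byte Write: up to 32 bytes, never crossing an aligned 32-byte boundary. */
    static bool byte_write_ok(uint64_t addr, unsigned len)
    {
        return len >= 1 && len <= 32 && (addr / 32) == ((addr + len - 1) / 32);
    }

    /* DW Write: 1 to 16 whole dwords within a single aligned 64-byte block. */
    static bool dw_write_ok(uint64_t addr, unsigned len)
    {
        return (addr % 4) == 0 && (len % 4) == 0 &&
               len >= 4 && len <= 64 &&
               (addr / 64) == ((addr + len - 1) / 64);
    }

    int main(void)
    {
        return !( byte_write_ok(0x3005, 8)    /* stays inside one 32-byte block: fine */
               && !byte_write_ok(0x301C, 8)   /* would cross a 32-byte boundary: no   */
               &&  dw_write_ok(0x4000, 16));  /* 4 dwords, aligned: fine              */
    }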
Non-Posted Sized Writes.
The packet protocol associated with Sized Write transactions depends on whether the Sized Write is posted or not. Figure 2-16 on page 43 illustrates the case of a non-posted Sized Write. This diagram illustrates the basic HT split-transaction request/target-done response sequence.
The basic rules for maintaining high performance in HT writes include:
The requester won't issue the non-posted write request until it knows the target can accommodate the request and all of the data to be sent. Refer to the section on Flow Control to see how this is managed for writes.
Upon receiving the write request and data, the target won't issue the target done response until it has properly delivered all data. Once it starts the response, there will be no wait states until the four bytes of the target done response packet have been sent.
Upon receiving the response, the requester will check the error bits to make certain delivery is complete.
The target and any bridges in the path de-allocate request queue entries as soon as the target done response has been sent.
Posted Sized Writes
In both cases the transaction begins with the Sized Write request followed by the data. Non-posted operations include a response packet that is delivered back to the requester as verification that the operation has completed, whereas posted writes end once the data is sent.
Flush
Flush is useful in cases where a device must be certain that its posted writes are "visible" in host memory before it takes subsequent action. Flush is an upstream, non-posted "dummy" read command that pushes all posted requests ahead of it to memory. Note that only previously posted writes within the same transaction stream as the Flush transaction need be flushed to memory. When an intermediate bridge receives a Flush transaction, it generates one or more Sized Write transactions as necessary to forward all data in its upstream posted-write buffer toward the host bridge. Ultimately, the host bridge receives the command and flushes the previously-posted writes to memory. Receipt of the response from the host bridge is confirmation that the flush operation has completed.
The protocol used when performing a Flush transaction is depicted below. When the Flush request reaches the host bridge, it completes previously-posted writes to memory. In this example, two previously-posted writes are flushed to memory, after which the Target Done (TgtDone) response is returned to the requester.
Example Protocol — Flush Transaction
Fence
Fence is designed to provide a barrier between posted writes, one which applies across all UnitIDs and therefore across all I/O streams and virtual channels; in this sense, the Fence command is global. The Fence request travels in the posted request virtual channel and has no response. The behavior of a Fence is as follows:
The PassPW bit must be clear so that the Fence pushes all requests in the posted channel ahead of it.
Packets with their PassPW bit clear will not pass a Fence regardless of UnitID.
Packets with their PassPW bit set may pass a Fence.
A nonposted request with PassPW clear will not pass a Fence as it is forwarded through the chain, but it may do so after it reaches a host bridge.
Fence requests are never issued as part of an ordered sequence, so their SeqID will always be 0. Fence requests with PassPW set, or with a nonzero SeqID, are legal, but may have an unpredictable effect. Fence is only issued from a device to a host bridge or from one host bridge to another. Devices are never the target of a fence so they do not need to perform the intended function. If a device at the end of the chain receives a fence, it must decode it properly to maintain proper operation of the flow control buffers. The device should then drop it.
Example Protocol — Fence Transaction
Atomic
Atomic Read-Modify-Write (ARMW) is used so that a memory location may be read, (evaluated and) modified, then conditionally written back — all without the race-condition of another device trying to do it at the same time. HT defines two types of Atomic operation:
Fetch & Add
Compare & Swap
The protocol associated with an Atomic transaction is shown in Figure 2-20 on page 46. The request is followed by a data packet that contains the argument of the atomic operation. The target device performs the requested operation and returns the original data read from the target location.
Example Protocol — Atomic Operation
Broadcast
Broadcast Message requests are sent downstream by host bridges, and are used to send messages to all devices. They are accepted and forwarded by all agents onto all links.
In a Broadcast transaction, the broadcast request works its way down the HT fabric. All devices recognize the Broadcast Message request type and the reserved address, accept the message, and pass it along. Examples of Broadcast Messages include Halt, Shutdown, and the End-Of-Interrupt (EOI) message.
Managing the Links
This section introduces a collection of miscellaneous topics that we have labeled Link Management. They include:
Flow Control
Initialization and Reset
Configuration
Error Detection and Handling
Each of these topics is discussed in the following sections.
Flow Control
Other than information packets, all packets are transmitted from a transmitter to a buffer in the receiver. The receiver buffer will overflow if the transmitter sends too many packets. Flow control ensures that the transmitter only sends as many packets to the receiver device as buffer space allows.
Information packets are not subject to flow control. They are not transmitted to buffers within a device. Devices are always ready to accept information packets (e.g. NOP packets). Only request packets, response packets and data packets are subject to flow control.
Flow control occurs across each link between the source and the ultimate target device. HyperTransport devices must implement the six types of buffers listed below as part of their receiver state machines. A designer implements buffers of appropriate size to meet bandwidth/performance requirements. The size of each buffer is conveyed to the transmitter during initialization, and available space is updated dynamically through NOP transmission.
HyperTransport requires the transmitter on each link to accept NOP packets from the receiver at reset indicating virtual channel buffering capacity, and then to establish a packet "coupon" scheme that:
Guarantees no transmitter will send a packet that the receiver can't accept.
Eliminates the need for inefficient disconnects and retries on the link.
Requires each receiver to dynamically inform the transmitter (via NOP packets) as buffer space becomes available.
With three virtual channels, there are three pairs of buffers in each receiver to handle request/responses and the data:
Posted Request Buffer
Posted Request Data Buffer
Non-Posted Request Buffer
Non-Posted Request Data Buffer
Response Buffer
Response Data Buffer
Buffer entries are sized according to what will be contained in them.
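One way to picture the coupon scheme is as a set of per-buffer credit counters maintained by the transmitter: the receiver advertises its buffer depths at reset, the transmitter spends a credit each time it sends a packet, and credits are returned as the receiver frees entries and reports them in NOP packets. The C sketch below is illustrative only (the buffer depths shown are arbitrary); a real implementation tracks command and data credits for all six buffer types in hardware.

    #include <stdbool.h>
    #include <stdio.h>

    /* The six receiver buffer types subject to flow control. */
    enum ht_buf {
        POSTED_REQ, POSTED_DATA,
        NONPOSTED_REQ, NONPOSTED_DATA,
        RESPONSE, RESPONSE_DATA,
        HT_BUF_COUNT
    };

    /* Transmitter-side credit counters, one per receiver buffer type. */
    struct ht_tx_credits { unsigned credits[HT_BUF_COUNT]; };

    /* At reset the receiver advertises its buffer depths via NOP packets. */
    static void init_credits(struct ht_tx_credits *tx, const unsigned depth[HT_BUF_COUNT])
    {
        for (int i = 0; i < HT_BUF_COUNT; i++) tx->credits[i] = depth[i];
    }

    /* A packet may be sent only if a credit is available; sending consumes one. */
    static bool try_send(struct ht_tx_credits *tx, enum ht_buf type)
    {
        if (tx->credits[type] == 0) return false;   /* receiver has no room: hold the packet */
        tx->credits[type]--;
        return true;
    }

    /* NOP packets from the receiver return credits as buffer entries are freed. */
    static void nop_release(struct ht_tx_credits *tx, enum ht_buf type, unsigned n)
    {
        tx->credits[type] += n;
    }

    int main(void)
    {
        struct ht_tx_credits tx;
        const unsigned depth[HT_BUF_COUNT] = { 4, 4, 2, 2, 2, 2 };  /* example depths only */

        init_credits(&tx, depth);
        printf("%d\n", try_send(&tx, NONPOSTED_REQ));  /* 1: credit available      */
        printf("%d\n", try_send(&tx, NONPOSTED_REQ));  /* 1: second entry used     */
        printf("%d\n", try_send(&tx, NONPOSTED_REQ));  /* 0: buffer full, must wait */
        nop_release(&tx, NONPOSTED_REQ, 1);            /* receiver frees one entry  */
        printf("%d\n", try_send(&tx, NONPOSTED_REQ));  /* 1: credit returned        */
        return 0;
    }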
If a device supports the optional isochronous channel, it must implement additional flow control buffers to support it. An "Isoc" bit is set in request and response packets to indicate the routing. If the "Isoc" bit is set, all link devices that support it will use these channels; others will pass isochronous packets along in the regular channels.
ISOC traffic is exempt from the fairness algorithm implemented for non-ISOC traffic, resulting in higher performance. Isochronous transactions are serviced by devices before non-isochronous traffic. Theoretically, isochronous traffic may result in starving non-isochronous traffic. Applications must guarantee that isochronous bandwidth does not exceed overall available bandwidth.
Initialization and Reset
HyperTransport defines two classes of reset events:
Cold Reset. This occurs on boot and starts when the PWROK and RESET# signals are both seen low. When this happens:
All devices and links return to default inactive state
Previously assigned UnitID numbers are "forgotten" and all return to default UnitID of 0.
All Configuration Space registers return to default state
All error bits and dynamic status bits are cleared
Warm Reset. This occurs when PWROK is high and RESET# is seen low.
All devices and links return to default inactive state
Previously assigned UnitID numbers are "forgotten", and all return to default UnitID of 0.
All Configuration Space registers defined as persistent retain previous values. The same is true for Status and error bits defined as persistent.
All other error bits and dynamic status bits are cleared
Because HyperTransport supports scalable link width and clock speed, a set of default minimum link capabilities are in effect following cold reset.
The initial link width is conveyed when both devices sample the CAD signal inputs from the other device at the end of reset. The initial link clock speed is 200MHz.
Later, configuration of the devices allows the CAD width and clock speed of each link to be optimized.
Refer to the core topic section on Reset and Initialization for details on this process.
It is the motherboard designer's responsibility to tie the upper CAD inputs to 0 if a device receiver is attached to a narrower transmitter CAD interface.
Configuration
At boot time, PCI configuration is used to set up HyperTransport devices:
Read in configuration information about device requirements and capabilities.
Program the device with address range, error handling policy, etc.
Basic configuration of a device is similar to that of PCI devices; however, HyperTransport-specific features are handled via the advanced capability registers.
Error Detection and Handling
HyperTransport defines required and optional error detection and handling. Key areas of error handling:
Cyclic Redundancy Check (CRC) generation and checking on each link.
Protocol (violation) errors
Receive buffer overflow errors
End-Of-Chain errors
Chain Down errors
Response errors