RE: [10GBASE-T] latency

Thanks Stephen.

RDMA applications are usually run over clusters that are separated by
tens of meters.  The buffer-to-buffer latencies achieved using
InfiniBand networks are on the order of 5 usec or less (and that includes
data DMA and digital processing prior to the MAC on TX and after the MAC
on RX).

WRT power, the less the better of course.  One question would be how
does the power of a low-latency 10GBase-T compare to the power of a
fiber-based transponder?  If the 10GBase-T solution's power is still
much less than that of fiber-based transponders, why sacrifice the


-----Original Message-----
[] On Behalf Of Stephen
Sent: Friday, February 20, 2004 10:55 AM
To: Gavin Parnaby
Subject: RE: [10GBASE-T] latency

Gavin and Serag

Gavin, thanks for that overview of the PAUSE functionality. One
question: does anyone know whether PAUSE is implemented end-to-end or
hop-by-hop in an Ethernet connection between source and sink via one or
more switches? Also, what is the primary method of flow control when the
following scenario occurs?

machine A <-- Speed A --> Switch <-- Speed B --> Server

My concern is that if Speed A < Speed B (e.g. A=1000BASE-T and
B=10GBASE-T) and we use PAUSE to ensure flow control, then 10GBASE-T may
have to include PAUSE. If not, buffers in the switch will overflow
when machine A requests a large amount of data from the server. Perhaps
flow control in a switch with multi-speed ports is handled using
something other than PAUSE? I am assuming this is a low-level switch and
TCP is not used to do flow control; is this correct?
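
A rough way to see the overflow concern in the scenario above: if the
server sends at Speed B and the switch can only drain toward machine A at
Speed A, the switch buffer fills at the difference of the two rates. The
sketch below is only illustrative. The function name and the
1,125,000-byte buffer size are hypothetical, and it assumes the server
transmits continuously at line rate with no flow control at all.

```python
def time_to_overflow_s(buffer_bytes, ingress_bps, egress_bps):
    """Seconds until a buffer fills when ingress outpaces egress.

    Returns None when the egress rate keeps up, so the buffer never fills.
    """
    net_fill_bps = ingress_bps - egress_bps
    if net_fill_bps <= 0:
        return None
    return buffer_bytes * 8 / net_fill_bps


# Hypothetical switch port buffer; real devices vary widely.
BUFFER_BYTES = 1_125_000

# Server -> switch at 10 Gb/s (Speed B), switch -> machine A at 1 Gb/s (Speed A).
t = time_to_overflow_s(BUFFER_BYTES, ingress_bps=10e9, egress_bps=1e9)
print(f"buffer fills after about {t * 1e3:.2f} ms")
```

Under these assumptions the buffer lasts about a millisecond, so some
form of back-pressure (PAUSE or otherwise) or a higher-layer mechanism
has to act within that time.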

RDMA considerations are a concern. RDMA is sure to want as low a latency
as possible. I believe there is a CFI scheduled for March? As Gavin
pointed out we have some headroom thanks to the maximum link length in
wireline versus optical. This headroom is about 8000 baud periods
(assuming an 833 MHz clock). However, perhaps RDMA applications will
require link lengths a lot less than 3 km.
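
For reference, the baud-period figures used in this thread are simple
conversions between time and symbol-clock periods. The sketch below
assumes the 833 MHz clock mentioned above; the function names are
illustrative, and the 1 us MAC-to-MAC figure is the one quoted further
down the thread, not a number taken from the standard.

```python
BAUD_RATE_HZ = 833e6  # symbol clock assumed in this thread


def to_baud_periods(seconds, baud_rate_hz=BAUD_RATE_HZ):
    """Number of baud periods that fit in a time interval."""
    return seconds * baud_rate_hz


def to_seconds(baud_periods, baud_rate_hz=BAUD_RATE_HZ):
    """Time interval covered by a number of baud periods."""
    return baud_periods / baud_rate_hz


# The ~8000 baud periods of headroom correspond to roughly 9.6 us.
headroom_us = to_seconds(8000) * 1e6

# A 1 us MAC-to-MAC budget is about 833 baud periods; half of it is
# a little over 400, in line with the "400-500" quoted in the thread.
full_budget = to_baud_periods(1e-6)
half_budget = to_baud_periods(0.5e-6)

print(f"{headroom_us:.2f} us, {full_budget:.0f} periods, {half_budget:.1f} periods")
```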

However is a low-latency PHY that consumes a lot of power any more
attractive than a higher-latency PHY that consumes less than half that



On Fri, 2004-02-20 at 07:55, Gavin Parnaby wrote:
> Hello Mario,
> I've been working on a NIC-on-a-chip for Gigabit Ethernet, and I have 
> some experience with the way flow control is used in Ethernet 
> controllers.
> The PAUSE function is used to ensure that the receive buffers in a NIC
> do not overflow. A PAUSE packet is triggered when a station's receive
> buffer is filled to a high-water mark. When the other station's MAC
> processes the PAUSE packet, it stops transmitting. The high-water mark
> level is calibrated so that all data potentially transmitted between
> the PAUSE packet being sent and it being processed will not overflow
> the buffer.
> The total latency between a PAUSE packet transmit being requested by
> station A and the PAUSE packet actually pausing the transmitter in
> station B determines how much additional data could be received before
> the flow stops. The processing time in the receiver is a part of this
> delay, along with the propagation delay, the time to send the PAUSE
> frame and potentially two maximum-length frames (one on each of
> station A & B) (these could be jumbo frames).
> So given this upper bound on the response time, it is possible to set 
> the watermark level so that PAUSE frames will prevent buffer overflow.
> If a standard increases the processing latency in the receiver then 
> the buffer sizes and watermark level would need to be changed in the 
> controller/switch, as more data would potentially need to be buffered 
> between the transmit/receive of a PAUSE packet. I do not believe that 
> this would create a major problem in the design of the controller.
> As you say, since the propagation delay of a 3km fiber link is
> substantially greater than for 100m UTP (~30,000 bit times compared to
> ~1140 bit times), the receive buffer size / space above the watermark
> level used in fiber controllers should be substantially larger than
> for Gigabit over copper. Jumbo frames also change the amount of
> buffering / watermark level needed. I think this indicates that an
> increase in receiver processing time for 10GBase-T is viable in terms
> of the PAUSE operation. There may be other requirements regarding RDMA
> etc.
> Regards,
> Gavin.
> On Fri, 2004-02-20 at 11:47, Stoltz, Mario wrote:
> > Hi Stephen,
> > 
> > The latency requirements in the standard are based on clause 44.3
> > (which refers to clause 31 and annex 31B). The underlying reason for
> > specifying delay budgets is "Predictable operation of the MAC
> > Control PAUSE operation", as the standard puts it.
> > 
> > In the 802.3ae days, there was a minor discussion in 2001 around the
> > latency budgets summed up in table 44.2 "Round-trip delay
> > constraints". Back then, I commented against the latency numbers of
> > draft 3.0 (which are now in the standard). My argument back then was
> > based on two points: a) the fact that the individual delay numbers
> > in table 44.2 seemed to be built assuming different semiconductor
> > technologies, and b) the fact that cabling delay is several orders
> > of magnitude above sublayer delay anyway if we look at the distance
> > objectives of the optical PHYs. For the sake of economic
> > feasibility, I proposed relaxing the numbers, but without success.
> > 
> > The current situation seems as if the delay budget threatens to
> > inhibit technically attractive solutions. What we are probably
> > missing (today as well as back then) is some data on the MAC control
> > PAUSE operation and how it is really used in the field. That could
> > tell us how reasonable it may be to add some slack to the current
> > numbers.
> > 
> > Some data, anyone?
> > 
> > Best regards,
> > Mario.
> > 
> > -----Original Message-----
> > From:
> > [] On Behalf Of 
> > Stephen Bates
> > Sent: Thursday, February 19, 2004 18:56
> > To: Booth, Bradley;
> > Subject: Re: [10GBASE-T] latency
> > 
> > 
> > 
> > Hi Brad and the 10GBASE-T Group
> > 
> > I used to work for Massana (now part of Agere) but am now an 
> > Assistant Prof at the University of Alberta. I've been talking to 
> > some of you about this latency issue as I think it has a huge 
> > bearing on the viability of 10GBASE-T.
> > 
> > I did some work based on the presentation of Scott Powell and others
> > that tried to estimate the power consumption of 10GBASE-T
> > components. Based on present performance criteria and the ADCs
> > featured at ISSCC this year, I concur with his results, which show
> > that the ADCs are, by far, the dominant power drain. For this and
> > other reasons I am coming to the conclusion that the trade-off
> > between the SNR target at the decoder input and coding gain is not
> > appropriate at present (I am assuming we are using the 1000BASE-T
> > code).
> > 
> > Part of my research is involved with coding and decoding in
> > high-speed systems with ISI. One area of application is obviously
> > 10GBASE-T. I know Sailesh presented some work on LDPC codes. Another
> > coding option people have mentioned is a concatenated code. Both of
> > these require that the latency budget in 10G be relaxed: in the
> > first case because LDPC requires an iterative decoder, and in the
> > second because we must interleave between the two codes.
> > 
> > I have heard the figure of 1us being the limit for MAC to MAC
> > latency in 10G, though I've not heard any justification or reasons
> > for this. Even assuming we can spend 50% of this in the decoder, we
> > still only have about 400-500 baud periods (and hence clock cycles)
> > to play with. This is a very small figure for both the options above.
> > 
> > I think getting a better idea of what the upper bound on latency
> > needs to be is very important and I would be interested in hearing
> > people's opinion on the coding options for 10GBASE-T. I hope to make
> > another of the study group meetings as soon as my teaching
> > commitments are concluded.
> > 
> > If anyone has any questions about this please feel free to contact 
> > me.
> > 
> > 
> > Regards
> > 
> > 
> > Stephen
> > 
> > On Wed, 2004-02-18 at 12:12, Booth, Bradley wrote:
> > > I remember Sailesh mentioning that if we are willing to make
> > > trade-offs against latency, we can make use of significantly
> > > more powerful techniques to reduce the complexity.  I know people
> > > have been looking at this as a possible issue.  What is an
> > > acceptable latency trade-off?  Is the current latency requirement
> > > for 1000BASE-T creating problems for it in latency-sensitive
> > > applications?
> > > 
> > > Any thoughts or comments?
> > > 
> > > Cheers,
> > > Brad
> > > 
> > > Bradley Booth
> > > Chair, IEEE 802.3 10GBASE-T Study Group
> > >
> > > 512-732-3924 (W)
> > > 512-422-6708 (C)

Dr. Stephen Bates

Dept. of Electrical and Computer Engineering      Phone: +1 780 492 2691
The University of Alberta                         Fax:   +1 780 492 1811
Canada, T6G 2V4
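
Gavin's description of the PAUSE response time earlier in the thread can
be turned into a rough headroom estimate. The sketch below is a minimal
one under stated assumptions: the per-metre delays (5 ns/m for fiber,
5.7 ns/m for UTP) are not given in the thread but reproduce the quoted
~30,000 and ~1140 bit times as round-trip figures at 1 Gb/s; the
1518-byte maximum frame and 64-byte PAUSE frame ignore preamble,
interframe gap and jumbo frames; and the receiver processing time is a
hypothetical input. All function and parameter names are illustrative.

```python
import math


def round_trip_bit_times(length_m, delay_ns_per_m, line_rate_bps):
    """Round-trip cable propagation delay expressed in bit times."""
    return 2 * length_m * delay_ns_per_m * line_rate_bps / 1e9


def pause_headroom_bytes(length_m, delay_ns_per_m, line_rate_bps,
                         rx_processing_bit_times=0,
                         max_frame_bytes=1518, pause_frame_bytes=64):
    """Buffer space needed above the high-water mark, in bytes.

    Sums the delay terms listed in the thread: propagation delay,
    receiver processing, the PAUSE frame itself, and one maximum-length
    frame already in progress at each station.
    """
    bits = (round_trip_bit_times(length_m, delay_ns_per_m, line_rate_bps)
            + rx_processing_bit_times   # hypothetical input
            + 8 * pause_frame_bytes
            + 2 * 8 * max_frame_bytes)
    return math.ceil(bits / 8)


GIGABIT = 1e9
# Assumed propagation delays: 5 ns/m (fiber), 5.7 ns/m (UTP).
fiber_bits = round_trip_bit_times(3000, 5.0, GIGABIT)
utp_bits = round_trip_bit_times(100, 5.7, GIGABIT)
print(f"fiber: ~{fiber_bits:.0f} bit times, UTP: ~{utp_bits:.0f} bit times")

fiber_bytes = pause_headroom_bytes(3000, 5.0, GIGABIT)
utp_bytes = pause_headroom_bytes(100, 5.7, GIGABIT)
print(f"headroom: fiber {fiber_bytes} bytes, UTP {utp_bytes} bytes")
```

Raising rx_processing_bit_times shows how added receiver latency
translates directly into extra buffer space above the watermark, which is
the trade-off discussed in the thread.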