RE: [10GBASE-T] latency
Gavin and Serag
Gavin, thanks for that overview of the PAUSE functionality. One question:
does anyone know whether PAUSE is implemented end-to-end or hop-by-hop
in an Ethernet connection between source and sink via one or more
switches? Also, what is the primary method of flow control when the
following scenario occurs?
machine A <-- Speed A --> Switch <-- Speed B --> Server
My concern is that if Speed A < Speed B (e.g. A=1000BASE-T and
B=10GBASE-T) and we use PAUSE to ensure flow control, then 10GBASE-T may
have to include PAUSE. If not, buffers in the switch will overflow
when machine A requests a large amount of data from the server. Perhaps
flow control in a switch with multi-speed ports is handled using
something other than PAUSE? I am assuming this is a low-level switch and
that TCP is not used to do flow control. Is this correct?
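To put rough numbers on the overflow concern, here is a minimal sketch.
It assumes the server bursts at the full Speed B rate while the switch
drains toward machine A at Speed A, with no flow control at all. The
512 KiB buffer size and the function name are hypothetical and are not
taken from any particular switch.

```python
def time_to_overflow_s(buffer_bytes, ingress_bps, egress_bps):
    """Seconds until a FIFO of buffer_bytes fills when data arrives at
    ingress_bps and drains at egress_bps. Returns None if it never fills."""
    net_bps = ingress_bps - egress_bps
    if net_bps <= 0:
        return None  # the egress port keeps up, so the buffer cannot overflow
    return buffer_bytes * 8 / net_bps

# Hypothetical example: 512 KiB of buffering, 10GBASE-T in, 1000BASE-T out.
t = time_to_overflow_s(512 * 1024, 10e9, 1e9)
print(f"buffer fills after roughly {t * 1e6:.0f} us")
```

Under these assumptions the buffer fills in well under a millisecond, so
something (PAUSE or a higher-layer mechanism) has to throttle the server.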
RDMA considerations are a concern, since RDMA is sure to want as low a
latency as possible. I believe there is a CFI scheduled for March? As
Gavin pointed out, we have some headroom because the maximum link length
is much shorter for wireline than for optical. This headroom is about
8000 baud periods (assuming an 833 MHz clock). However, perhaps RDMA
applications will require link lengths a lot less than 3 km.
However, is a low-latency PHY that consumes a lot of power any more
attractive than a higher-latency PHY that consumes less than half that?
On Fri, 2004-02-20 at 07:55, Gavin Parnaby wrote:
> Hello Mario,
> I've been working on a NIC-on-a-chip for Gigabit Ethernet, and I have
> some experience with the way flow control is used in Ethernet.
> The PAUSE function is used to ensure that the receive buffers in a NIC
> do not overflow. A PAUSE packet is triggered when a station's receive
> buffer is filled to a high-water mark. When the other station's MAC
> processes the PAUSE packet, it stops transmitting. The high-water mark
> level is calibrated so that all data potentially transmitted between the
> PAUSE packet being sent and it being processed will not overflow the
> buffer.
> The total latency between a PAUSE packet transmit being requested by
> station A and the PAUSE packet actually pausing the transmitter in
> station B determines how much additional data could be received before
> the flow stops. The processing time in the receiver is a part of this
> delay, along with the propagation delay, the time to send the PAUSE
> frame and potentially two maximum-length frames (one on each of station
> A & B) (these could be jumbo frames).
> So given this upper bound on the response time, it is possible to set
> the watermark level so that PAUSE frames will prevent buffer overflow.
> If a standard increases the processing latency in the receiver then the
> buffer sizes and watermark level would need to be changed in the
> controller/switch, as more data would potentially need to be buffered
> between the transmit/receive of a PAUSE packet. I do not believe that
> this would create a major problem in the design of the controller.
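The watermark calculation described above can be sketched as follows. This
is only an illustration: everything is expressed in bit times and both
directions are assumed to run at the same rate, so one bit time of delay
means one more bit arriving. Preamble and inter-frame gap are ignored, and
the function name, the 64-byte PAUSE frame size and the 512-bit-time
processing figure in the example are assumptions, not values from the
standard.

```python
def pause_headroom_bits(round_trip_prop_bits, rx_processing_bits,
                        max_frame_bytes=1518, pause_frame_bytes=64):
    """Upper bound on bits that can still arrive at station A after it
    requests a PAUSE, i.e. the space needed above the high-water mark."""
    frame_bits = max_frame_bytes * 8
    pause_bits = pause_frame_bytes * 8
    return (frame_bits               # A may be mid-frame, so PAUSE waits behind it
            + pause_bits             # time to transmit the PAUSE frame itself
            + round_trip_prop_bits   # PAUSE travels to B; B's in-flight data travels to A
            + rx_processing_bits     # B's receiver/MAC processing before it reacts
            + frame_bits)            # B may have just started a maximum-length frame

# Round-trip figures quoted in this thread, with a hypothetical 512 bit times
# of receiver processing.
copper = pause_headroom_bits(1140, 512)
fiber = pause_headroom_bits(30000, 512)
jumbo_copper = pause_headroom_bits(1140, 512, max_frame_bytes=9000)
print(copper, fiber, jumbo_copper)
```

With these inputs the two maximum-length frames dominate the copper case,
which is consistent with the point that extra receiver latency mainly
shifts the watermark rather than changing the controller design.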
> As you say, since the propagation delay of a 3km fiber link is
> substantially greater than for 100m UTP (~30,000 bit times compared to
> ~1140 bit times), the receive buffer size / space above the watermark
> level used in fiber controllers should be substantially larger than for
> Gigabit over copper. Jumbo frames also change the amount of buffering /
> watermark level needed. I think this indicates that an increase in
> receiver processing time for 10GBASE-T is viable in terms of the PAUSE
> operation. There may be other requirements regarding RDMA etc.
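The two bit-time figures above can be reproduced with a short calculation.
This is a sketch under assumptions: it reads both numbers as round-trip
delays at 1 Gb/s, and it uses about 5 ns/m for fiber and 5.7 ns/m for
UTP. Those per-metre delays and the function name are assumed here; they
are not stated in the thread.

```python
def round_trip_bit_times(length_m, delay_ns_per_m, bit_rate_bps):
    """Round-trip propagation delay of a link, expressed in bit times."""
    one_way_s = length_m * delay_ns_per_m * 1e-9
    return 2 * one_way_s * bit_rate_bps

# Assumed propagation delays: ~5 ns/m in fiber, ~5.7 ns/m in UTP.
fiber_3km = round_trip_bit_times(3000, 5.0, 1e9)
utp_100m = round_trip_bit_times(100, 5.7, 1e9)
print(round(fiber_3km), round(utp_100m))
```

At 10 Gb/s the same links would cost ten times as many bit times, which is
worth keeping in mind when carrying the Gigabit numbers over to 10GBASE-T.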
> On Fri, 2004-02-20 at 11:47, Stoltz, Mario wrote:
> > Hi Stephen,
> > The latency requirements in the standard are based on clause 44.3 (which
> > refers to clause 31 and annex 31B). The underlying reason for specifying
> > delay budgets is "Predictable operation of the MAC Control PAUSE
> > operation", as the standard puts it.
> > In the 802.3ae days, there was a minor discussion in 2001 around the
> > latency budgets summed up in table 44.2 "Round-trip delay constraints".
> > Back then, I commented against the latency numbers of draft 3.0 (which
> > are now in the standard).
> > My argument back then was based on two points: a) the fact that the
> > individual delay numbers in table 44.2 seemed to be built assuming
> > different semiconductor technologies, and b) the fact that cabling delay
> > is several orders of magnitude above sublayer delay anyway if we look at
> > the distance objectives of the optical PHYs. For the sake of economic
> > feasibility, I proposed relaxing the numbers, but without success.
> > It now seems that the delay budget threatens to inhibit
> > technically attractive solutions. What we are probably missing (today as
> > well as back then) is some data on the MAC control PAUSE operation and
> > how it is really used in the field. That could tell us how reasonable it
> > may be to add some slack to the current numbers.
> > Some data, anyone?
> > Best regards,
> > Mario.
> > -----Original Message-----
> > From: email@example.com
> > [mailto:firstname.lastname@example.org] On Behalf Of Stephen
> > Bates
> > Sent: Thursday, 19 February 2004 18:56
> > To: Booth, Bradley; email@example.com
> > Subject: Re: [10GBASE-T] latency
> > Hi Brad and the 10GBASE-T Group
> > I used to work for Massana (now part of Agere) but am now an Assistant
> > Prof at the University of Alberta. I've been talking to some of you
> > about this latency issue as I think it has a huge bearing on the
> > viability of 10GBASE-T.
> > I did some work based on the presentation of Scott Powell and others
> > that tried to estimate the power consumption of 10GBASE-T components.
> > Based on present performance criteria and the ADCs featured at ISSCC
> > this year, I concur with his results, which show that the ADCs are, by
> > far, the dominant power drain. For this and other reasons I am coming
> > to the conclusion that the trade-off between the SNR target at the
> > decoder input and coding gain is not appropriate at present (I am
> > assuming we are using the 1000BASE-T code).
> > Part of my research is involved with coding and decoding in high-speed
> > systems with ISI. One area of application is obviously 10GBASE-T. I know
> > Sailesh presented some work on LDPC codes. Another coding option people
> > have mentioned is a concatenated code. Both of these require that the
> > latency budget in 10G be relaxed: in the first case because LDPC
> > requires an iterative decoder, and in the second because we must
> > interleave between the two codes.
> > I have heard the figure of 1 us being the limit for MAC-to-MAC latency
> > in 10G, though I've not heard any justification or reasons for this.
> > Even assuming we can spend 50% of this in the decoder, we still only
> > have about 400-500 baud periods (and hence clock cycles) to play with.
> > This is a very small figure for both the options above.
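The baud-period estimate above follows from a one-line conversion. The
sketch below assumes the 833 MHz symbol rate mentioned elsewhere in this
thread and a decoder that gets exactly half of the 1 us MAC-to-MAC figure.
The function name is illustrative.

```python
def baud_periods(latency_s, baud_rate_hz):
    """Number of symbol (baud) periods that fit into a latency budget."""
    return latency_s * baud_rate_hz

# Half of a 1 us budget at an assumed 833 MHz symbol rate.
decoder_budget = baud_periods(0.5 * 1e-6, 833e6)
print(f"{decoder_budget:.1f} baud periods")
```

That lands at the low end of the 400-500 range, and every iteration of an
iterative decoder or every interleaver stage has to fit inside it.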
> > I think getting a better idea of what the upper bound on latency needs
> > to be is very important and I would be interested in hearing people's
> > opinion on the coding options for 10GBASE-T. I hope to make another of
> > the study group meetings as soon as my teaching commitments are
> > concluded.
> > If anyone has any questions about this please feel free to contact me.
> > Regards
> > Stephen
> > On Wed, 2004-02-18 at 12:12, Booth, Bradley wrote:
> > > I remember Sailesh mentioning that if we are willing to make
> > > trade-offs against latency, we can make use of significantly more
> > > powerful techniques to reduce the complexity. I know people have been
> > > looking at this as a possible issue. What is an acceptable latency
> > > trade-off? Is the current latency requirement for 1000BASE-T creating
> > > problems for it in latency sensitive applications?
> > >
> > > Any thoughts or comments?
> > >
> > > Cheers,
> > > Brad
> > >
> > > Bradley Booth
> > > Chair, IEEE 802.3 10GBASE-T Study Group
> > > firstname.lastname@example.org
> > > 512-732-3924 (W)
> > > 512-422-6708 (C)
Dr. Stephen Bates
Dept. of Electrical and Computer Engineering
The University of Alberta
Canada, T6G 2V4
Phone: +1 780 492 2691
Fax: +1 780 492 1811
email@example.com