
RE: [EFM] 802.3x Flow Control usage in transport networks- Short Burst vs Congest


-----Original Message-----
From: Roy Bynum [mailto:rabynum@xxxxxxxxxxxxxx]
Sent: Monday, February 24, 2003 11:29 AM
To: cribeiro@xxxxxxxxxxxxxxxx; Shahram Davari; 'Siamack Ayandeh'
Cc: Ben Brown; Geoff Thompson; mattsquire@xxxxxxx; Chau Chak;
Subject: [EFM] 802.3x Flow Control usage in transport networks- Short
Burst vs Congest


I will take your points one at a time.  This response is to:
"- In my opinion - which is based both on my practical experience and
theoretical understanding - PAUSE is only good for short bursts. PAUSE has
little effect for long bursts of traffic, because one can hold on for very
little time before filling up the transmission queue, which forces the
equipment to discard frames."

In testing using artificially created worst case scenarios, it turns out 
that the reality is very different from what you believe.  We found, 
particularly with EOS, that using 802.3x Flow Control "pause" frames to 
rate control a fully congested link worked better than using it to rate 
control extreme bursts of traffic.  We found that mildly or non-congested 
links with extreme bursts of traffic were the least able to properly make 
use of 802.3x, and small amounts of data could be lost.  If you are 
interested in how we came to that conclusion over several years of testing, 
read on.

Starting in early 1998, a data lab was set up by MCI in Richardson, Texas 
to test the characteristics of different protocols supporting IP over DWDM 
and Automated Switched Optical Networks (ASON).  This lab was originally 
specific to optical switching and was not part of the MCI Developer's Lab. 
We found that GbE was the most stable optical data link protocol to support 
optical switched/restored transport networks.  We found that Packet over 
SONET (PoS) from the IETF was the worst.  GbE and SONET/SDH were closely 
tied in restoration reliability.  (Work in this lab prompted the effort put 
into 10GbE WAN PHY.)  In late 1998 early prototypes of Ethernet Over SONET 
were made available to the optical data lab, which we tested using the test 
equipment that we had available to do the optical switched data transport 
testing.  Over a period of time, through 2001, additional test equipment 
and vendor systems (SONET transport nodes, Ethernet data switches, and 
IP routers) were added to the lab for continued characterization.

In early 1999, as part of the EOS testing we ran across a data switch 
vendor's "boxes" and a transmission node vendor's "boxes" that both 
supported 802.3x Flow Control.  We decided to look at the performance 
characteristics that were provided by "pause" frames.  We were very 
surprised at the performance and data reliability enhancement that using 
only 802.3x provided.  We decided to continue to test and characterize the 
use of 802.3x in various scenarios.

We specifically set up the testing to look only at the non-blocking 
Ethernet switches' support of 802.3x from different data switch vendors, 
combined with the EOS mapping and rate control of different transmission 
node vendors.  For the 802.3x Flow Control tests, there were no 
transmission buffers other than what was inherent in the non-blocking 
switches.  We used data generators from three different vendors, with up to 
12 different flows.  We used two switches in series at each end of the 
transmission link with traffic injected/removed at each of the two switches 
on each end of the link.  As part of this testing we also looked at 
priority queuing.

We set up one series of tests to attempt to generate an aggregate of 
traffic bandwidth that was at 90% of the Ethernet link speed, either 100 Mb/s 
or 1000 Mb/s depending on the link media.  The traffic transmission speed was 
~1.44 Mb/s (T1 payload) for the 100 Mb/s link and ~588 Mb/s (OC12C payload) for 
the 1000 Mb/s link.  The transmission link included multiple "rings" as well as 
two digital cross connect provisioned interconnects.  The data generators 
were set up to properly respond to "pause" frames.  We tested the switch 
configurations and test scenarios without the transmission link in place to 
be sure that the switches and data generator configurations were 
functioning  properly in the way that we had envisioned.  We put the 
transmission link in place and tested all of the scenarios again.  For some 
of the scenarios, the test period was 7 days.  For others, particularly 
where the performance and reliability were low, the test period was much 

It turns out that the fully congested EOS transmission nodes generate a 
steady stream of "pause" frames to rate control to the transmission payload 
rates.  In this scenario, the performance was the best (latency variance of 
125us) and the reliability was the best (data loss of about 0.0000000001 
frames per second).  I would expect the end to end reliability to be 
somewhat less because the Service Level Agreements (SLAs) that service 
providers give out do not support the higher reliability.  Since the 
performance of transmission facilities is measured in picoseconds per bit, 
and the performance of the transmission link (for leased circuit and 
switched circuit services only) is set at the edge of the network, not in 
the core, the same performance characteristics will be seen in the field 
as were seen in our lab.
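
To put rough numbers on the steady-state case, here is a minimal Python sketch of the arithmetic.  It is not taken from our test setup.  It assumes a simple on/off model in which the switch sends at full line rate whenever it is not paused, and it uses the 802.3x convention that pause time is counted in quanta of 512 bit times.  The function names are made up for illustration.

```python
PAUSE_QUANTUM_BITS = 512  # 802.3x pause_time is expressed in units of 512 bit times


def pause_duty_cycle(link_rate_bps, payload_rate_bps):
    """Fraction of time the sender must stay paused so that its average
    rate matches the transmission payload rate (simple on/off model)."""
    if not 0 < payload_rate_bps <= link_rate_bps:
        raise ValueError("payload rate must be positive and no more than the link rate")
    return 1.0 - payload_rate_bps / link_rate_bps


def pause_quanta_to_seconds(quanta, link_rate_bps):
    """Convert an 802.3x pause_time value (in quanta) to seconds on a given link."""
    return quanta * PAUSE_QUANTUM_BITS / link_rate_bps
```

Under this model, a 1000 Mb/s link feeding a ~588 Mb/s payload has to be paused roughly 41% of the time, and the largest pause_time value (65535 quanta) holds a 1000 Mb/s sender off for about 33.6 ms.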

What we found to be a problem was with very sharp bursts of traffic in 
non-congested links.  The distance lag of the link between the transmission 
facing data switch and the transmission node allowed the congestion level 
of the transmission node to be exceeded for a short period of time if the 
burst happened to hit the transmission node at a particular time relative 
to the 125us payload window.  Depending on how much the burst bandwidth 
exceeded the payload bandwidth at that critical time, some data would be 
lost.  The vast majority of the bursts hit within the mapping 
window such that the distance lag did not affect the performance of 
802.3x.  Even with deliberately setting up the timing of the bursts to 
create the problem in worst case, the data loss was still within the 
0.000001 frames per second of most SLAs. (This is still better data 
reliability than most IP networks deliver and definitely better than any IP 
services provided by a commercial service provider.  With performance that 
no IP network or service provider can even imagine.)
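
The distance lag effect can be approximated with a back-of-the-envelope calculation, sketched below in Python.  This is an illustrative model only, not what we measured.  It assumes the "pause" frame needs one propagation delay to reach the switch and that data already on the link keeps arriving for one more, so the node sees the burst for about one round trip after it asks for a pause.  The 5 us/km fiber delay is a common approximation, the function name is hypothetical, and the model ignores both the frame already in progress and the timing of the burst relative to the 125us payload window.

```python
FIBER_DELAY_S_PER_KM = 5e-6  # assumption: roughly 5 microseconds per km of fiber


def overshoot_bits(burst_rate_bps, payload_rate_bps, distance_km):
    """Estimate how many bits pile up at the transmission node between the
    moment it sends a pause frame and the moment the burst actually stops.

    The node keeps draining at the payload rate, so only the excess of the
    burst rate over the payload rate accumulates during the round trip.
    """
    round_trip_s = 2 * distance_km * FIBER_DELAY_S_PER_KM
    excess_bps = max(0.0, burst_rate_bps - payload_rate_bps)
    return excess_bps * round_trip_s
```

With a 1000 Mb/s burst, a ~588 Mb/s payload, and 10 km between the switch and the node, this gives about 41,200 bits (a little over three maximum-size frames) that the node has to absorb or lose.  A burst at or below the payload rate gives zero.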

When Priority Queuing was added, it was found that the lowest priority 
queues were throttled the most, while the high priority queue was not 
throttled at all, depending on the transfer rate of the high priority queue 
relative to the transmission link payload rate.  When combined with the 
other test results, the very best architecture would include the use of 
priority queuing and the injection of low priority "junk" traffic 
specifically to fill (congest) the transmission link and set up a steady 
state flow of "pause" frames from the transmission nodes at each end of the 
link.  This would allow the high priority traffic full access to the link 
while being sure that the full benefits of active rate control were

Thank you,
Roy Bynum

At 09:55 PM 2/21/2003 -0300, Carlos Ribeiro wrote:
>- In my opinion - which is based both on my practical experience and
>theoretical understanding - PAUSE is only good for short bursts. PAUSE has
>little effect for long bursts of traffic, because one can hold on for very
>little time before filling up the transmission queue, which forces the
>equipment to discard frames.