Re: Local Fault/Remote Fault




Stephen,

I sympathize with you. Please note that most, if not all, of the Link
Status architecture was developed and presented "on the fly" at the
November meeting in Tampa. Subsequently, all the clause editors had only
a couple of weeks to put this new architecture into formal Clause
documentation and get it ready for Task Group ballot. As a result, the
consistency of the Link Status architecture in P802.3ae D2.0 is
poor-to-mediocre at best.

That said, I want to thank you for taking the time to put your thoughts
together because this is your opportunity to improve the overall quality
of the P802.3ae document. I suggest that all the ideas below be arranged
into several D2.0 comments which are tied together by a common thread:
call it Link Status.

I've included some specific responses to your note below to help with
comment generation. I would appreciate it very much if you can take
ownership of this issue and follow through with the comments.

Stephen.Finch@ti.com wrote:
> 
> First, let me say that I participated in the definition of
> Local Fault and Remote Fault as presented at the November
> meeting so I think I understand what was intended.  The problem
> I'm having is finding all of the pieces of that definition
> in the draft so that those who didn't attend or see some
> of the slides that were presented can understand.
> 
> With that said, what I did was search through D2.0 looking
> for Local Fault and LF to find all of the associated text.
> I may have missed some.  What I found was incomplete and/or
> confusing.
> 
> Here are the concepts I think we need to communicate:
> 
> 1.  Any device, in either its transmit or receive paths, could
>     detect a fault condition.  The fault may be that the data
>     being received is invalid or that some internal problem
>     is causing the problem.  Some/many faults may go
>     undetected.  If a device detects a fault condition (i.e.,
>     a locally detected fault) it should set its link status
>     to zero and not forward what is received, but should,
>     at its output, either:
> 
>   a.  generate a local fault pulse ordered set if it is
>       capable of doing so,
> or
>   b.  generate all zeros or all ones, making it probable that
>       the next device in the link will detect the problem and
>       (hopefully) generate a local fault pulse ordered set.

I disagree with (b). The response to a detected fault condition should
be consistent. It's just as easy to generate a local fault pulse
ordered-set as it is to generate all zeros or all ones. Generating
multiple responses at a transmitter results in multiple interpretations
at the receiver. (a) should be the only response to a detected fault.
The accepted baseline proposal in taborek_2_1100.pdf implies that (a) is
the only response.
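
To make the single-response rule concrete, here is a minimal sketch of
one device handling one column. The function name, the "LF" token, and
the (link_status, output) tuple are illustrative assumptions, not text
from D2.0 or from the baseline proposal.

```python
def device_output(received, fault_detected):
    """Return (link_status, output) for one received column.

    Hypothetical model: a device that detects a fault sets its link
    status to 0 and emits a local fault ordered set ("LF") instead of
    forwarding; otherwise it forwards what it received unchanged.
    """
    if fault_detected:
        return 0, "LF"       # response (a) only, never all zeros or all ones
    return 1, received       # no fault detected: pass the column through


print(device_output("DATA", True))
print(device_output("RF", False))
```

With only one possible output in the fault case, the receiver has only
one thing to interpret.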
 
> 2.  All devices not detecting fault conditions should forward
>     whatever is received.  Local fault pulse ordered sets and
>     remote fault pulse ordered sets may be generated by other
>     devices and, when received, must be forwarded on.  With the
>     exception of the RS layer, reception of a Local fault
>     ordered set or a remote fault ordered set must have no
>     effect on the device receiving these pulse ordered sets.

This is not strictly true. Any fault condition will bring down the
entire link. The link remains down until the fault condition abates. The
Link Status protocol should protect against false fault detection
conditions such as those caused by random bit or signal errors. A fault
condition recognition process is implemented whereby detected fault
conditions are validated. A device which recognizes a fault condition
essentially operates in "fault" mode rather than "normal data" mode. In
this sense, the reception of a fault ordered-set DOES have impact on the
device receiving these pulse ordered sets.
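
As a rough illustration of the validation idea, the sketch below only
enters "fault" mode after several consecutive fault columns, so a single
corrupted column does not take the link down. The class name, the
threshold value, and the rule that one clean column clears the fault are
all assumptions made for the example; none of them come from D2.0.

```python
class FaultValidator:
    """Hypothetical fault recognition filter for one receive path."""

    def __init__(self, threshold=4):
        self.threshold = threshold   # assumed value, not from the draft
        self.count = 0               # consecutive fault columns seen so far
        self.fault_mode = False

    def observe(self, is_fault_column):
        """Update state with one received column; return True in fault mode."""
        if is_fault_column:
            self.count += 1
            if self.count >= self.threshold:
                self.fault_mode = True
        else:
            # Simplification: the fault is treated as abated on the
            # first column that is not a fault ordered set.
            self.count = 0
            self.fault_mode = False
        return self.fault_mode


validator = FaultValidator(threshold=3)
print([validator.observe(c) for c in [True, False, True, True, True, False]])
```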

> 3.  The RS layer is where the Local Fault Pulse Ordered Set is
>     processed.  The RS layer is the only place that a Remote
>     Fault Pulse Ordered Set can be generated.  If an RS receives
>     a Local Fault Pulse Ordered Set it must stop sending packets
>     and begin sending alternating columns of Idles and Remote
>     Fault Pulse Ordered Sets.  If an RS receives a Remote Fault
>     Pulse Ordered Set, it must stop sending packets and send
>     only Idles.

Correct.
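
For comment generation, the RS behavior in (3) can be sketched as
follows. The names, the string tokens, the choice to start the
alternation with an Idle column, and the "DATA" placeholder for normal
packet transmission are all assumptions for illustration only.

```python
from enum import Enum


class RxStatus(Enum):
    NORMAL = "normal"
    LOCAL_FAULT = "LF received"
    REMOTE_FAULT = "RF received"


def rs_transmit_columns(rx_status, n_columns):
    """Return the next n_columns a hypothetical RS would transmit."""
    if rx_status is RxStatus.LOCAL_FAULT:
        # Stop sending packets; alternate Idle and Remote Fault columns.
        return ["IDLE" if i % 2 == 0 else "RF" for i in range(n_columns)]
    if rx_status is RxStatus.REMOTE_FAULT:
        # Stop sending packets; send only Idles.
        return ["IDLE"] * n_columns
    return ["DATA"] * n_columns      # placeholder for normal traffic


print(rs_transmit_columns(RxStatus.LOCAL_FAULT, 4))
print(rs_transmit_columns(RxStatus.REMOTE_FAULT, 3))
```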

> Devices detecting fault conditions set their link status to 0
> and attempt to generate LF's (local fault ordered sets).  In some
> cases, multiple devices may be detecting faults and attempt to
> send LF's.

Correct.

> Station management can obtain each device's status and localize
> the problem.

Correct.
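
A minimal sketch of that localization step, assuming station management
has already collected one link status value per device. The device list
and the status values below are made up for the example, and no real
MDIO register access is shown.

```python
def locate_faults(device_status):
    """Return the names of devices reporting link status 0, in path order."""
    return [name for name, link_status in device_status if link_status == 0]


# Hypothetical snapshot, ordered from the medium toward the RS.
snapshot = [("PMA/PMD", 1), ("PCS", 0), ("PHY XS", 1), ("DTE XS", 1)]
print(locate_faults(snapshot))
```

More than one device may show up in the result, since several devices
can detect faults at the same time.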

> What I found in the standard is in the following clauses:
> 
> 45.2.1.2.3
> 45.2.2.1.7
> 45.2.3.1.7
> 45.2.4.2.3
> 45.2.5.2.3
> 46.2.5.1    (last paragraph)
> 46.2.6
> Table 46-4
> 48.1.3.1
> 48.2.2
> 48.2.4.5 and 48.2.4.5.1
> Figure 48-10
> 48.2.5.4 and 48.2.5.4.1 and 48.2.5.4.2
> 49.2.4.5
> 49.2.11.1.1  (definition of LFRAME_R)
> Figure 49-14 --> top state
> 
> I don't think these "pieces" capture what we need.  In fact, the
> inconsistent usage of terms is confusing.  For example, what
> does "detected a local fault signal on the inbound path" mean?

Loosely translated, the inbound path is any device's receiver. A local
fault signal could be a Loss_of_Signal, loss-of-sync, or local fault
message.

> I think we need some standardized terms used throughout.
> And I think we need a basic description (better written than what
> I did above) placed somewhere in the intro and not in one of
> the "component" pieces where it could be missed by others.
>
> Before I start on what I think should be done, I'd like confirmation
> that my description above is correct.  I'll then start on my proposed
> fixes.

Go for it!

> Steve Finch

-- 

Happy Holidays,
Rich

------------------------------------------------------- 
Richard Taborek Sr.                 Phone: 408-845-6102       
Chief Technology Officer             Cell: 408-832-3957
nSerial Corporation                   Fax: 408-845-6114
2500-5 Augustine Dr.        mailto:rtaborek@nSerial.com
Santa Clara, CA 95054            http://www.nSerial.com