RE: [EFM-OAM] Notes from yesterday's call
Dan, all,
I'm guessing Matt is talking about a counter that contains a sum of the
number of TLVs received of each type, not a sum of the actual counts inside
each TLV. That right, Matt?
For what it's worth, I submitted a comment based on what Jonathan Thatcher
suggested back in February. It adds a new field to each of the three
error_XXX TLVs (not the summary TLV) that holds the running count of errors
of that type that have exceeded the threshold since OAM initialization. The
counters would roll over on overflow and be non-resettable. Basically it's a
sum of the values populated in
each error_XXX TLV that's been generated and sent. The hope was this could
help address the concern from the last conference call that information
could be lost due to the objects only containing the "last seen" PDU info.
The idea was that if the information recipient received all the event PDUs
but didn't service the MIB objects for every one (or even lost some PDUs
along the way), the error count information sent in event PDUs wouldn't be
lost. Since each error TLV has a timestamp (consensus of 2nd conf. call),
the running counter and timestamp provide a simple way to keep track of the
number of errors that have exceeded the threshold over a known period of
time, whether you check after every received event PDU or every few seconds
(assuming you can deal with the overflow).
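A rough sketch of how a recipient might use the running counter and
timestamp together, in Python for illustration only. The 32-bit counter
width, the timestamp units, and every name here (ErrorTlv, Sender,
errors_between) are assumptions made for the example; none of them come
from the draft or from the submitted comment.

```python
from dataclasses import dataclass
from typing import Tuple

COUNTER_BITS = 32  # assumed width; the thread does not specify one
COUNTER_MOD = 1 << COUNTER_BITS


@dataclass
class ErrorTlv:
    """Hypothetical view of one error_XXX TLV as seen by the recipient."""
    timestamp: int      # when the event was generated (units assumed)
    errors: int         # error count carried in this TLV
    running_total: int  # proposed field: sum of `errors` since OAM init


class Sender:
    """Builds TLVs and keeps the non-resettable, wrapping running total."""

    def __init__(self) -> None:
        self.running_total = 0

    def make_tlv(self, timestamp: int, errors: int) -> ErrorTlv:
        # Add this TLV's count to the total, rolling over on overflow.
        self.running_total = (self.running_total + errors) % COUNTER_MOD
        return ErrorTlv(timestamp, errors, self.running_total)


def errors_between(older: ErrorTlv, newer: ErrorTlv) -> Tuple[int, int]:
    """Return (errors, elapsed time) between two sampled TLVs.

    Modular subtraction absorbs one rollover; the result is only valid
    if fewer than COUNTER_MOD errors occurred between the two samples.
    """
    count = (newer.running_total - older.running_total) % COUNTER_MOD
    elapsed = newer.timestamp - older.timestamp
    return count, elapsed


# Four TLVs are sent, but the recipient only looks at the first and last.
sender = Sender()
sent = [sender.make_tlv(t, e) for t, e in [(10, 5), (20, 7), (30, 3), (40, 9)]]
print(errors_between(sent[0], sent[3]))  # -> (19, 30)
```

The 7 + 3 + 9 errors reported after the first sample are still accounted
for even though the two middle TLVs were never examined, which is the case
the running counter is meant to cover.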
I thought Jonathan's idea was helpful and reasonable. I didn't remember
seeing it in comments on drafts since then so I entered it. Hope I didn't
distort it too badly, but does this help fix some of the concerns?
Brian
At 12:27 PM 4/30/2003 +0300, Romascanu, Dan (Dan) wrote:
>Matt,
>
>As long as we define counters for each event type and assuming these
>counters are accurate, why would solution b) (an event table) be 'more
>reliable and complete' than a) (a MIB group of counters per event type)?
>
>Dan
>
>
> > -----Original Message-----
> > From: Matt Squire [mailto:mattsquire@acm.org]
> > Sent: Wednesday, April 30, 2003 5:31 AM
> > To: Jonathan Thatcher
> > Cc: Brian Arnold; stds-802-3-efm-oam@ieee.org
> > Subject: Re: [EFM-OAM] Notes from yesterday's call
> >
> >
> >
> >
> > That's a great point. As a group, we have to give proper consideration
> > to the options before us on the subject of event notification. We could
> > a) Just have counters so that we know how many notifications we've
> >    received for every event type
> > b) Just have the event tables (as in D1.414), which may result in
> >    missed information, but the information there will be reliable and
> >    more complete than a simple counter
> > c) Provide both - neither is perfect and the combination is better
> >    than the parts
> > d) Find a different way (don't know what this is).
> >
> > We touched on a couple of these options in the past, but I don't know
> > that there's been clear consensus behind any path.
> >
> > Thoughts?
> >
> > - Matt
> >
> > Jonathan Thatcher wrote:
> > > It will be hard to convince network managers to use something that
> > > they know to be unreliable.
> > >
> > > But, there are two kinds of unreliability: unreliable communication
> > > (acceptable if rare) and unreliable information (absolutely
> > > unacceptable). Or, I think that we can sell the fact that some
> > > information might not "make it through." But, that which does get
> > > through must be known good. A related point is that with redundant
> > > communication (to increase the probability that information does
> > > "make it through"), there can be no case where the redundancy
> > > confuses the information recipient.
> > >
> > > jonathan
> > >
> > > | -----Original Message-----
> > > | From: Matt Squire [mailto:mattsquire@acm.org]
> > > | Sent: Tuesday, April 29, 2003 5:28 AM
> > > | To: Brian Arnold
> > > | Cc: Matt Squire; stds-802-3-efm-oam@ieee.org
> > > | Subject: Re: [EFM-OAM] Notes from yesterday's call
> > > |
> > > |
> > > |
> > > |
> > > | >>
> > > | >> Events. Lots of discussion on the events. First, we decided we
> > > | >> need to have counters for the number of events, not just for the
> > > | >> number of event PDUs (each PDU can contain multiple events of
> > > | >> different types). Second, we are uncomfortable with the current
> > > | >> C30 handling for events, where the latest received event info is
> > > | >> an attribute. Given that this information can change multiple
> > > | >> times per second, it's quite possible that the changes would be
> > > | >> missed. So it was suggested that instead of keeping the latest
> > > | >> PDU info, we should only keep counters. Seemed like people wanted
> > > | >> to think a little bit about that one.
> > > | >
> > > | >
> > > | > Did anyone bring up Jonathan Thatcher's parallel counter idea
> > > | > (keeps a running count, overflowing) from his comments on D1.3?
> > > | > Discussion was on the reflector in late February 2003. Seems that
> > > | > could be a way to maintain an accurate total count if we think we
> > > | > might miss an update, but maybe I'm misinterpreting the concern
> > > | > above?
> > > | >
> > > | >
> > > |
> > > | The suggestion was to add counters. I forget Jonathan's exact
> > > | previous comment, but the point brought up on the call was that we
> > > | cannot expect to reliably pass up the content of all events through
> > > | Clause 30. Given that they're unreliable, shouldn't we have a counter
> > > | (in addition or instead)? Since the method is unreliable, should we
> > > | have it at all?
> > > |
> > > | - Matt
> > > |
> > > |
> > > |
> > >
> > >
> >
> >
> >