
fwd from Jim Demmel: More on repeatability



> ... same answer on different machines.  His motivation was a number
> of his customers (civil engineers) who had contractual obligations to
> their customers to get repeatable answers, i.e. "the bridge is safe"
> will not change to "the bridge is not safe" if you run the code again.

A more likely situation is the following (at least with a well-designed
bridge safety assessment tool):

  (a)  The safety score is 0.995  --  where > 0.95 is considered safe
  (b)  The safety score is 0.993  --  where > 0.95 is considered safe

I doubt this discrepancy would be found alarming.  And, regardless of
repeatability, if I got the answer:
  (c)  The safety score is 0.952  --  where > 0.95 is considered safe
I would not be happy; I would re-check my assumptions to find out why
I got such a marginal score.  Human experience with the methods used,
and with interpreting their results, cannot be entirely discarded.

Where interval methods can contribute is by giving results as follows:

  (d)  The safety score is 0.995 +- 0.002  where > 0.95 is considered safe
  (e)  The safety score is 0.993 +- 0.002  where > 0.95 is considered safe

A genuinely marginal result would then look like:
  (f)  The safety score is 0.951 +- 0.002  where > 0.95 is considered safe

As a human, I would probably also be unhappy with:
  (g)  The safety score is 0.953 +- 0.002  where > 0.95 is considered safe
(but this would be affected by my experiences with such scores, and the
distribution of good and bad scores).

Michel.
---Sent: 2011-08-04 22:20:09 UTC