
Re:[RPRWG] Fw: News Article "Beth Israel Deaconess copes with a massive computer crash "




Robert,

802.1y in the making is addressing this and some similar problems. 
The fix in simple words is to retain the designated port in blocking 
state if it **keeps receiving** inferior BPDUs. 

-shyam

-------- Original Message --------
Subject: [RPRWG] Fw: News Article "Beth Israel Deaconess copes with a massive computer crash "
Date: Wed, 27 Nov 2002 13:41:53 -0500
From: "Robert D. Love" <rdlove@xxxxxxxxx>
Reply-To: "Robert D. Love" <rdlove@xxxxxxxx>
To: "802 SEC" <stds-802-sec@xxxxxxxx>
CC: "802.17" <stds-802-17@xxxxxxxx>, <jhalamka@xxxxxxxxxxxxxxxxxxxxx>

Alan Marshall alerted me to a rather interesting story that ran in
yesterday's Boston Globe (appended to the bottom of this email note)
which described a massive computer system outage caused by a problem
with Spanning Tree.  I followed up by writing to the head of the
computer center there.  His reply is attached directly below.
Basically, they violated the 7 hop count limit for spanning tree.  One
consequence of that violation is that the system came down on their
heads producing a three day problem.  If the system simply didn't allow
the traffic to continue beyond 7 hop counts, that would be
understandable.  However, having the whole system crash is a very
undesirable outcome.  Are there any changes that should be considered in
802.1, or the standards that use Spanning Tree to minimize the risk of a
sudden massive system crash?

Tony, you may want to have 802.1 investigate this problem.

Other working group chairs whose standards use spanning tree may want to
alert your working groups to this newly discovered feature.

Best regards,

Robert D. Love
President, Resilient Packet Ring Alliance
President, LAN Connect Consultants
7105 Leveret Circle     Raleigh, NC 27615
Phone: 919 848-6773       Mobile: 919 810-7816
email: rdlove@xxxxxxxx          Fax: 208 978-1187
----- Original Message -----
From: jhalamka@xxxxxxxxxxxxxxxxxxxxx
To: rdlove@xxxxxxxx
Sent: Wednesday, November 27, 2002 11:06 AM
Subject: RE: News Article "Beth Israel Deaconess copes with a massive
computer crash "


Here's the technical explanation for you.

When TAC was first able to access and assess the network, we found the
Layer 2 structure of the network to be unstable and out of specification
with 802.1D standards. The management VLAN (VLAN 1) had in some
locations 10 Layer 2 hops from root.

The conservative default values for the Spanning Tree Protocol (STP)
impose a maximum network diameter of seven. This means that two distinct
bridges in the network should not be more than seven hops away from one
another.
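
As a rough illustration of the diameter rule above, the Python sketch
below finds the largest hop distance between any two bridges in a
topology. The switch names, the chain-shaped topology, and the helper
functions are hypothetical and are not taken from the CareGroup network.
Hop-counting conventions also vary, so treat the numbers as illustrative
only.

```python
from collections import deque


def hops_from(start, links):
    """Breadth-first search: hop count from `start` to every reachable bridge."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def network_diameter(links):
    """Largest shortest-path hop count between any two bridges."""
    return max(max(hops_from(bridge, links).values()) for bridge in links)


def chain(n):
    """Hypothetical topology: n bridges daisy-chained in a line."""
    names = [f"sw{i}" for i in range(n)]
    links = {name: [] for name in names}
    for a, b in zip(names, names[1:]):
        links[a].append(b)
        links[b].append(a)
    return links


if __name__ == "__main__":
    # Eleven bridges in a row put the far end 10 hops from sw0.
    print(network_diameter(chain(11)))
```

A check of this kind only measures the topology as drawn; it does not
model which links spanning tree actually blocks.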

Part of this restriction comes from the age field that Bridge Protocol
Data Units (BPDUs) carry: when a BPDU is propagated from the root bridge
towards the leaves of the tree, the age field is incremented each time
it goes through a bridge. Eventually, when the age field of a BPDU goes
beyond max age, it is discarded. Typically, this will occur if the root
is too far away from some bridges of the network. This issue will impact
convergence of the spanning tree.
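
The aging mechanism described above can be sketched in a few lines of
Python. This is a simplified model, not the 802.1D state machine: the
function names are hypothetical, the per-hop increment of 1 and the max
age of 20 are assumed values, and the "goes beyond max age" wording is
modeled as a strict greater-than comparison. It shows only how age
bounds the reach of a BPDU; the diameter of seven also depends on the
other timer defaults, which this sketch does not model.

```python
def bpdu_survives(hops_from_root, max_age=20, age_increment=1):
    """Return True if a BPDU is still accepted after `hops_from_root` bridges.

    Assumed model: each bridge adds `age_increment` to the age field, and
    the BPDU is discarded once the age goes beyond `max_age`.
    """
    message_age = 0
    for _ in range(hops_from_root):
        message_age += age_increment
        if message_age > max_age:
            return False
    return True


def max_reach(max_age=20, age_increment=1):
    """Largest hop count at which a BPDU is still accepted under this model."""
    hops = 0
    while bpdu_survives(hops + 1, max_age, age_increment):
        hops += 1
    return hops


if __name__ == "__main__":
    print(max_reach())       # assumed values: max_age=20, increment=1
    print(max_reach(20, 2))  # a larger per-hop increment halves the reach
```

Bridges beyond the reach computed here never see valid root information
under this model, which is one way a tree can fail to converge.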

A major contributor to this STP issue was the PACS network and its
connection to the CareGroup network. To eliminate its influence on the
CareGroup network we isolated it with a Layer 3 boundary. All
redundancy in the network was removed to ensure no STP loops were
possible.


Full connectivity was restored to remote devices and networks that were
disconnected in troubleshooting efforts prior to TAC's involvement.
Redundancy was returned between the core campus devices. Spanning Tree
was stabilized and localized issues were pursued.

Thanks for your support.

  -----Original Message-----
  From: Robert D. Love [mailto:rdlove@xxxxxxxxx]
  Sent: Wednesday, November 27, 2002 9:26 AM
  To: jhalamka@xxxxxxxxxxxxxxxxxxxxx
  Subject: News Article "Beth Israel Deaconess copes with a massive
computer crash "


  Dear Dr. Halamka,

  I read the referenced article with great interest since for the past
20 years I have been deeply involved in the creation of IEEE 802
standards.  These standards include Ethernet, WiFi, and other evolving
standards which all use the Spanning Tree Protocol.  The article brings
up the question as to whether there is any fundamental weakness in the
algorithm that those of us creating the standards need to be aware of.
Therefore, I would greatly appreciate any help you can provide me in
finding out more information to allow me to better understand any
potential weaknesses in the Spanning Tree Algorithm.  I would also like
to know if you believe that this problem should be addressed by IEEE
802.3, the working group which created the Ethernet standards, or by
other working groups within IEEE 802.

  My concern is more than academic.  I am presently the Vice Chair of
the Working Group (IEEE 802.17) which is defining a metropolitan area
networking standard, Resilient Packet Ring, which will also be using the
spanning tree protocol.

  My full contact information is contained in the signature line.
E-mail is probably the easiest way to reach me.

  Thank you in advance for any assistance you can provide me.

  Best regards,

  Robert D. Love
  President, Resilient Packet Ring Alliance
  President, LAN Connect Consultants
  7105 Leveret Circle     Raleigh, NC 27615
  Phone: 919 848-6773       Mobile: 919 810-7816
  email: rdlove@xxxxxxxx          Fax: 208 978-1187

        Beth Israel Deaconess copes with a massive computer crash

        By Anne Barnard, Globe Staff, 11/26/2002

        Thirteen days ago, as his computer crunched the mountain of data
he hoped would be his humble contribution to medical progress, the
researcher - he shall remain nameless - got a phone call he'd never
forget.


        It was Dr. John Halamka, the former emergency-room physician who
runs Beth Israel Deaconess Medical Center's gigantic computer network.
He told the professor that his flood of numbers was overwhelming the
system, threatening to freeze thousands of electronic medical records
and grind the hospital's network to a halt.

        ''He said, `Oh, my God!' and pulled the plug out of the wall,''
Halamka said last week.

        It was too late. Somewhere in the web of copper wires and glass
fibers that connects the hospital's two campuses and satellite offices,
the data was stuck in an endless loop. Halamka's technicians shut down
part of the network to contain it, but that created a cascade of new
problems.

        The entire system crashed, freezing the massive stream of
information - prescriptions, lab tests, patient histories, Medicare
bills - that shoots through the hospital's electronic arteries every
day, touching every aspect of care for hundreds of patients.

        Within a few hours, Cisco Systems, the hospital's network
provider, was loading thousands of pounds of network equipment onto an
airplane in California, bound for a 2 a.m. arrival at Logan
International Airport. In North Carolina's Research Triangle area,
computer experts were being rousted out of bed to join a battalion of
electronic shock troops who would troubleshoot the situation. Closer to
home, Cisco technicians were converging on Boston from across
Massachusetts.

        The crisis began on a Wednesday afternoon, Nov. 13, and lasted
nearly four days. Before it was over, the hospital would revert to the
paper systems that governed patient care in the 1970s, in some cases
reverting to forms printed ''Beth Israel Hospital,'' from before its
1996 merger. Hundreds of employees, from lab technicians to chief
executive officer Paul Levy, would work overtime running a
quarter-million sheets of paper from one end of the campus to the other.


        And hospitals across the country - not to mention investment
banks, insurance companies and every other business that relies on a
constantly accessible stream of quickly-changing information - would get
a scary reminder of how dependent they are on their networks, and what
would happen if they disappeared.

        ''It's like the Y2K that never happened,'' said Dianne Anderson,
vice president for patient care services at Beth Israel Deaconess.

        Now, Halamka - the hospital's chief information officer and a
networking addict who answers e-mails on his Blackberry device whether
he's at a meeting or a family dinner - is hustling to answer questions
from all over the country, from community hospitals in Western
Massachusetts and major medical centers such as Johns Hopkins
University, and financial-services companies that could lose millions in
a crash.

        ''The message,'' he said, ''is make sure you're ready for a
massive disruption of your network - whether it's 9/11 or a natural
disaster or whatever.''

        As a result of the crash, Beth Israel Deaconess plans to spend
$3 million to replace its entire network - creating an entire parallel
set of wires and switches, double the capacity the medical center
thought it needed.

        No other Massachusetts hospital has ever reported such a
long-lasting or disruptive network crash, said Elliot Stone, executive
director of the Massachusetts Health Data Consortium, a group that
brings together chief information officers from hospitals and health
plans around the state. He praised Beth Israel Deaconess for being open
about the problem and sharing lessons learned, both about technology
itself and about policy - such as the need to enforce rules against
unauthorized additions of new software onto the network. Not least,
Stone said, Halamka's counterparts see the incident as ammunition in
their constant quest to convince management to pay for network upgrades.

        The crash surprised experts in the field because most disaster
planners mainly worry about backing up hard drives and building
redundant servers. But in this case, it wasn't those repositories of
information that were in trouble. It was the network itself - the
''pipes'' that carry the information from one place to the other. It was
like when at busy times at the office, your e-mail slows down - only so
bad that everything ceased to function.

        ''Usually, when you think about backup, you're talking about
backing up hard drives. You don't think about the network itself,'' said
Mark Tuomenoksa, founder and chairman of Woburn-based OpenReach, a
network-security consulting company.

        Halamka said that was the case at Beth Israel Deaconess: ''We
don't just have a backup generator, we have a backup-backup generator,
and then we have batteries. Servers are clustered; data writes on five
different hard drives.'' There is even a double ''pipeline'' between the
computer center on Tremont Street and Beth Israel Deaconess's main
campuses - but during the crash, both were clogged.

        The crisis had nothing to do with the particular software the
researcher was using. The problem had to do with a system called
''spanning tree protocol,'' which finds the most efficient way to move
information through the network and blocks alternate routes to prevent
data from getting stuck in a loop. The large volume of data the
researcher was uploading happened to be the last drop that made the
network overflow.

        Halamka said Beth Israel Deaconess's recent economic troubles
were not behind the problem. In fact, on Oct. 1, hospital officials had
approved a consultant's plan to overhaul the network - just not quite in
time. ''Now,'' he said, ''we're going to do it faster.''

        The crisis also tapped into medicine's ambivalence about
computers. Yesterday, doctors at Brigham and Women's Hospital reported
in the Archives of Internal Medicine that 73 percent of
medication-related mistakes involved in malpractice claims are
preventable and probably could be averted through computerized
prescription ordering - the latest in a growing pile of evidence that
computerization can cut medical errors.

        At the same time, clinicians have sometimes been wary of turning
over control to a computer, Tuomenoksa said: ''When I enter something
into a computer, how do I know it got there?''

        That was part of the problem Beth Israel Deaconess had: New
information could sometimes be entered, but since network function was
fading in and out, clinicians weren't sure whether that information was
being delivered. So, the hospital decided to shut down the computers -
taping handwritten ''Do Not Use'' notes to monitors - creating an
instant generation gap, said Anderson, the hospital's top nurse
executive.

        ''Nurses and doctors over the age of 35 were very much at
ease,'' she said. ''The younger nurses and doctors were very uncertain.
We were teaching residents how to write orders; we were showing nurses
how to do flow sheets.''

        Meanwhile, the hospital was figuring out how to run at its usual
pace without the 100,000 e-mails it usually sends a day. The lab was
dumping 3,000 results a day on paper into plastic bins, to be delivered
by runners who came by every 10 to 15 minutes. Microbiologists were
ferrying lab results. Cardiac fellows were digging through paper records
to find old cardiograms to compare to new ones. People at all levels of
the hospital hierarchy had to deal with each other face to face.

        ''The lab is usually anonymous until something goes wrong,''
said Gina McCormack, technical director of the West Campus lab. ''A lot
of people realized we're here. People got to understand each other's
jobs.''

        Anne Barnard can be reached, when the network is working, at
abarnard@xxxxxxxxxx

        This story ran on page C1 of the Boston Globe on 11/26/2002.
        © Copyright 2002 Globe Newspaper Company.
