Monday, 31 March 2008

Reliability rules

Here's an example of what makes a storage box unreliable: a bug found on Centera
that can make the whole cluster totally unresponsive.

Symptom: The CentraStar software running on Gen4/Gen4LP nodes
may begin to restart frequently 30-90 days after upgrading to
version 3.1.3. The IPMI driver (Intel's Intelligent Platform
Management Interface) consumes small amounts of memory over
that period without returning it for use by other, more
critical components. Those components ultimately run out of
memory, which causes the CentraStar software running on the
node to restart. As a result, the node goes offline.
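To see why a small leak takes weeks to bite, here is a rough back-of-the-envelope sketch. The advisory gives no memory sizes or leak rates, so every number below is made up for illustration, and `days_until_exhaustion` is a hypothetical helper, not anything from CentraStar. It also assumes the leak is steady, which the advisory does not say.

```python
def days_until_exhaustion(free_mb: float, leak_mb_per_day: float) -> float:
    """Days until a steady leak has consumed the given amount of free memory."""
    if leak_mb_per_day <= 0:
        raise ValueError("leak rate must be positive")
    return free_mb / leak_mb_per_day


# Hypothetical headroom and leak rates (NOT taken from the advisory).
FREE_MB = 270.0
for leak in (3.0, 6.0, 9.0):
    days = days_until_exhaustion(FREE_MB, leak)
    print(f"{leak} MB/day -> {days:.0f} days")
```

With these invented figures, a leak of only a few megabytes a day lands in the same 30-90 day window, which is why this kind of bug tends to slip through short test cycles.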

Reliability is what we (customers) are after.
