Monday, 31 March 2008

Reliability rules

Here is an example of what makes a storage box unreliable. This bug, found on Centera,
can make the whole cluster totally unresponsive:

Symptom: The CentraStar software running on Gen4/Gen4LP nodes
may begin to restart frequently 30-90 days after upgrading to
version 3.1.3. The IPMI driver (Intel's Intelligent Platform
Management Interface) is consuming small amounts of memory
over 30-90 days without returning it for use by other more
critical components, which will then ultimately run out of
memory and subsequently cause a restart of the CentraStar software
running on the node. As a result the node will go offline.
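
A slow leak like this one is easy to miss because each day's loss is tiny. As a rough illustration only (this is not how CentraStar or the IPMI driver works, and the function name, the sample values and the memory limit are all made up for the example), here is a minimal sketch that fits a straight line to periodic memory readings and estimates how many days remain before a limit is hit:

```python
def days_until_exhaustion(samples, memory_limit_mb):
    """Estimate days left before memory runs out.

    samples: list of (day, used_mb) readings, oldest first (hypothetical data).
    memory_limit_mb: usage level at which the node is assumed to fail.
    Returns None when usage is not growing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n

    # Ordinary least-squares slope: MB leaked per day.
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    leak_mb_per_day = sxy / sxx

    if leak_mb_per_day <= 0:
        return None  # no upward trend, nothing to extrapolate

    # Extrapolate from the last reading at the fitted leak rate.
    _, last_used = samples[-1]
    return (memory_limit_mb - last_used) / leak_mb_per_day


if __name__ == "__main__":
    # Made-up readings: 2 MB lost per day, 200 MB assumed limit.
    readings = [(0, 100), (10, 120), (20, 140)]
    print(days_until_exhaustion(readings, memory_limit_mb=200))
```

With readings like these, a leak of a couple of megabytes a day looks harmless in any single week, yet the extrapolation lands in the same 30-90 day range the advisory describes.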

Reliability is what we (customers) are after.

Saturday, 29 March 2008

Reliability before Performance

Performance is a very important metric when shopping for IT infrastructure, and for storage in particular. Machines keep getting faster, and some now believe that the performance offered by most computing components exceeds what applications actually require.
Although performance is always taken into account when choosing a solution, I believe the first and foremost criterion for success is reliability.
Vendors, first build a reliability-centric solution, then add performance to it. Thanks!

Monday, 3 March 2008

There are two ways to benchmark an I/O stack (storage, OS, filesystem) against a given application:
  • Install the application
  • Install an application simulator
Installing the application can be complicated, especially when the application itself is complex.
Running an application simulator, on the other hand, is great because it is quicker to install and run.
While looking for the best filesystem to run a mail server, I stumbled on this great mailstore simulator.
With it I was able to complete numerous test runs before deploying my application.
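
To give an idea of what such a simulator does, here is a minimal sketch in Python. It is not the mailstore simulator mentioned above; the function name, file naming, message sizes and delete ratio are all assumptions chosen for illustration. It mimics a mail store by writing many small files with an fsync each, reading them back, and deleting a share of them, while timing each phase:

```python
import os
import random
import tempfile
import time


def simulate_mailstore(root, messages=200, min_size=1024, max_size=16384,
                       delete_ratio=0.5, seed=0):
    """Write, read back and delete small 'message' files under root.

    All parameters are illustrative defaults, not tuned to any real mail server.
    Returns a dict of counts and per-phase timings in seconds.
    """
    rng = random.Random(seed)
    paths = []
    bytes_written = 0

    # Delivery phase: one file per message, synced to disk like a mail server would.
    start = time.perf_counter()
    for i in range(messages):
        path = os.path.join(root, f"msg-{i:06d}.eml")
        size = rng.randint(min_size, max_size)
        with open(path, "wb") as f:
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())
        paths.append(path)
        bytes_written += size
    write_seconds = time.perf_counter() - start

    # Retrieval phase: read every message back (may be served from page cache).
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as f:
            f.read()
    read_seconds = time.perf_counter() - start

    # Expunge phase: delete a random share of the messages.
    to_delete = rng.sample(paths, int(messages * delete_ratio))
    start = time.perf_counter()
    for path in to_delete:
        os.remove(path)
    delete_seconds = time.perf_counter() - start

    return {
        "messages": messages,
        "deleted": len(to_delete),
        "bytes_written": bytes_written,
        "write_seconds": write_seconds,
        "read_seconds": read_seconds,
        "delete_seconds": delete_seconds,
    }


if __name__ == "__main__":
    # Point the temporary directory at the filesystem under test via dir=...
    with tempfile.TemporaryDirectory() as workdir:
        print(simulate_mailstore(workdir))
```

Running the same script with the working directory placed on each candidate filesystem gives comparable numbers. A real simulator models far more (mailbox layout, concurrency, realistic size distributions), so treat this only as a starting point.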

Other interesting simulators I have used:
- filebench
- NetApp Simulator
- vdbench
- iometer
- slamd
- mailstore simulator