Monday, March 14, 2005

The network is the computer

I have been extremely disappointed with the support from a vendor over the past few weeks.

The OS is reporting ECC memory errors, though only correctable ones at that.

Being the diligent SA that I am, I logged a service call. As there weren't any service disruptions (yet), I did not press on like I did in the past.

The error messages persisted and triggered many alerts in the monitoring software, causing much alarm among the rest of the SAs.

The errors went away for a few days, and when the vendor called, I told them maybe we could close the call for now, as they had not given any constructive solutions nor been as proactive as I had hoped.

Then a few days later, the error kept coming back, triggering a lot of error messages!

This time round, I logged a service call with the vendor again. Guess what, all I got in response was their new DIMM replacement policy.

Nothing constructive. I mean, if I were the vendor, I'd suggest something to find out more details, instead of only asking me to run data collection tools.

I would have requested that the user run some diagnostic tools. Extended POST maybe, or a validation test maybe?

Oh hell no .. all I have is the PDF file of the DIMM replacement policy.

If you are an SA, you will know that it doesn't thrill you the least bit to know your vendor's DIMM replacement policy.

All I want is availability and performance!! By all means give me something constructive!

Anyway, I requested some downtime from my users, shut the box down, set up extended diags and ran a full POST.

AH AH! 1 bank of memory was blacklisted!

The server booted up with 1GB less RAM.

Let's see how long they take to respond to this ....

gold support contract .....
*mutters*

Guess I better study harder to get my RHCE.

^z

du & df inconsistency???

I am extremely puzzled by the du and df inconsistency.
I know that open files keep holding space on the filesystem, and df will report that space as used. But strangely, lsof / did not show anything!
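
For what it's worth, the open-file effect can be reproduced with a small sketch. This is Python on a POSIX system, not anything taken from the box in question; the file name app.log and the size are made up for illustration. It writes a file, keeps it open, deletes it, and then compares a du-style walk of the directory with what the still-open descriptor reports.

```python
import os
import tempfile


def du_bytes(path):
    """Sum file sizes by walking the directory tree, roughly what du sees."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def demo_deleted_but_open(size=1024 * 1024):
    """Return (du_before, du_after, held_size, link_count) for an unlinked open file."""
    with tempfile.TemporaryDirectory() as workdir:
        log_path = os.path.join(workdir, "app.log")  # hypothetical log file
        with open(log_path, "wb") as handle:
            handle.write(b"x" * size)
            handle.flush()
            du_before = du_bytes(workdir)
            os.unlink(log_path)  # the name is gone, so a directory walk cannot find it
            du_after = du_bytes(workdir)
            info = os.fstat(handle.fileno())  # the open descriptor still pins the data
            return du_before, du_after, info.st_size, info.st_nlink


if __name__ == "__main__":
    before, after, held, links = demo_deleted_but_open()
    print("du before unlink:", before)
    print("du after unlink: ", after)
    print("size still held: ", held, "link count:", links)
```

The walk stops seeing the file as soon as it is unlinked, while the descriptor still reports its full size with a link count of zero. That is the kind of space df keeps counting until the last process closes the file.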

Nevertheless, the df command is what the monitoring uses, so many alerts were reported in the monitoring tool.

I logged on and did some checks. It turned out that many processes are holding onto the filesystem, by grace of fuser -cu / .

I did a for loop to check on it, running ptree and pfiles on the output of fuser, and found that one particular kind of process (hundreds of them!) is hogging the filesystem ...
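
The grouping step of such a check can be sketched roughly as below. This is Python rather than the shell loop mentioned above, and it assumes the combined stdout and stderr of fuser -cu looks like a pid, then usage letters, then the login name in parentheses. The sample line and the user names in it are made up.

```python
import re
from collections import Counter

# pid, optional usage letters (e.g. c = current directory, o = open file), then (login name)
ENTRY = re.compile(r"(\d+)([a-z]*)\(([^)]*)\)")


def parse_fuser(text):
    """Return a list of (pid, usage_codes, user) tuples found in fuser -cu style output."""
    return [(int(pid), codes, user) for pid, codes, user in ENTRY.findall(text)]


def count_by_user(entries):
    """Count how many processes each user has on the filesystem."""
    return Counter(user for _pid, _codes, user in entries)


if __name__ == "__main__":
    # Made-up sample line, assumed to resemble `fuser -cu / 2>&1` output.
    sample = "/:     101c(root)     2044o(appuser)     2045o(appuser)"
    entries = parse_fuser(sample)
    for user, count in count_by_user(entries).most_common():
        print(user, count)
```

The pids that come out of parse_fuser are what one would then feed to ptree and pfiles, one at a time, to see which files each process actually holds.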

hmm ... does a running process that merely uses the filesystem count as hogging the filesystem?

It looks like the filesystem is meant to be an application home directory, with the application logging to it .. and users are extremely inclined to run tail -f on those logs. ....
*mutters*

Does that cause the df output to be high? (will find out ..)

I wonder ....