texjata
2012-05-11 14:33:27 UTC
texjata [http://community.zenoss.org/people/texjata] created the discussion
"Zenprocess false positives and quirkiness"
To view the discussion, visit: http://community.zenoss.org/message/66322#66322
--------------------------------------------------------------
On our Zenoss 3.2.1 RedHat package install (on Scientific Linux 5.8), zenprocess is throwing a lot of false positives, sometimes more than 20 per minute. Zenprocess is currently disabled as a result.
Occasionally all monitored processes for a server begin alerting and then clear after a minute. Forcing a remodel clears this up.
Multiple times daily, many servers begin alerting that all of their monitored processes are down, though which servers alert is not consistent. Restarting zenprocess resolves this, but it often starts again within an hour or less. I've turned on debug logging for zenprocess, but it logs no actual errors. It does appear to be alerting on process GUIDs (they look like GUIDs), as I get emails like:
Unable to read processes on device <HOSTNAME REDACTED>; error: usr_local_bin_python 9be7ff841d97fce0fed903685794a63d
but I can connect to the server and plainly see that the process is running and has been running the whole time. Additionally, the GUID is different from what the process monitor on the server says it should be. These errors go away temporarily after restarting zenprocess and stop completely when zenprocess is stopped. I've also set "Ignore Parameters", but the issue persists. It's essentially unusable: the first time it failed was overnight, and it sent 7000 emails. Zenprocess is currently stopped except for testing/investigation of this issue.
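The identifier in the alert looks like a path-derived name ("usr_local_bin_python", i.e. /usr/local/bin/python with slashes turned into underscores) followed by a 32-character hex string, which is the length of an MD5 hex digest. A minimal sketch for comparing that shape against what is actually running, assuming (not confirmed anywhere in this thread or taken from Zenoss source) that the hex part is an MD5 of the process's argument string. The function names `hypothetical_process_id` and `split_alert_id` are made up for illustration.

```python
import hashlib


def hypothetical_process_id(command_line: str) -> str:
    """Build an identifier shaped like the one in the alert e-mail.

    ASSUMPTION: name part = executable path with the leading '/' dropped and
    the remaining '/' replaced by '_'; hex part = MD5 of the argument string.
    This only mirrors the observed format; it is not Zenoss's own code.
    """
    # Split "/usr/local/bin/python job.py --flag" into executable and args.
    executable, _, params = command_line.partition(" ")
    name = executable.lstrip("/").replace("/", "_")
    digest = hashlib.md5(params.encode("utf-8")).hexdigest()
    return f"{name} {digest}"


def split_alert_id(alert_id: str) -> tuple[str, str]:
    """Split an alert identifier into its name part and its hex part."""
    name, _, digest = alert_id.rpartition(" ")
    return name, digest


if __name__ == "__main__":
    alert = "usr_local_bin_python 9be7ff841d97fce0fed903685794a63d"
    print(split_alert_id(alert))
    # Under the assumption above, different arguments give different hex parts.
    print(hypothetical_process_id("/usr/local/bin/python job.py --a"))
    print(hypothetical_process_id("/usr/local/bin/python job.py --b"))
```

If the hex part really does track the arguments, two instances of the same python binary run with different arguments would produce different identifiers, which may be worth checking against the processes that alert.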
History:
We ran the Zenoss 2.4 stack install for about two years. Recently I created a virtual instance, migrated to it, upgraded to 3.0.3, then upgraded to 3.2.1. This zenprocess issue is unfortunate: people here had been seriously considering dropping Zenoss, but are very impressed with the new version's layout and speed improvements.
What can I do to resolve this? What information can I collect and post that would be useful? Does this warrant a bug report, or is this perhaps an issue with my particular upgrade/instance?
--------------------------------------------------------------
Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/66322#66322]
Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]