Discussion:
Zenprocess false positives and quirkiness
texjata
2012-05-11 14:33:27 UTC
texjata [http://community.zenoss.org/people/texjata] created the discussion

"Zenprocess false positives and quirkiness"

To view the discussion, visit: http://community.zenoss.org/message/66322#66322

--------------------------------------------------------------
On our Zenoss 3.2.1 install (Red Hat package on Scientific Linux 5.8), zenprocess is throwing a lot of false positives, sometimes more than 20 per minute.  Zenprocess is currently disabled as a result.

Occasionally all monitored processes for a server begin alerting and then clearing after a minute.  Forcing a remodel clears this up.
Multiple times daily, many servers begin alerting that all monitored processes are down, and which servers alert is not consistent.  Restarting zenprocess resolves this, but it often starts again within an hour or less.  I've turned on debug logging for zenprocess, but it logs no actual errors.  It does appear to be alerting on process GUIDs (at least they look like GUIDs), as I get emails like:
Unable to read processes on device <HOSTNAME REDACTED>; error: usr_local_bin_python 9be7ff841d97fce0fed903685794a63d

but I can connect to the server and plainly see that the process is and has been running.  Additionally, the GUID is different from what the process monitor on the server says it should be.  These errors go away temporarily after restarting zenprocess and stop completely when zenprocess is stopped.  I've also set "Ignore Parameters" but the issue persists.  It's essentially unusable: it first failed overnight and sent 7000 emails.  Zenprocess is currently stopped except for testing/investigation of this issue.
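
For anyone who wants to compare these identifiers across alerts, here is a minimal Python sketch. It assumes the alert text has the shape quoted above, and it assumes the 32-character hex suffix is an MD5 digest of the process arguments. That second point is only a guess based on the length of the string, not something confirmed from the zenprocess source. The helper names parse_alert and args_digest are hypothetical.

```python
import hashlib
import re

# Pattern for the alert text quoted in this post (assumed format).
ALERT_RE = re.compile(
    r"Unable to read processes on device (?P<device>.+?); "
    r"error: (?P<name>\S+) (?P<digest>[0-9a-f]{32})$"
)


def parse_alert(line):
    """Return (device, process_name, digest), or None if the line does not match."""
    match = ALERT_RE.search(line.strip())
    if match is None:
        return None
    return match.group("device"), match.group("name"), match.group("digest")


def args_digest(args):
    """MD5 hex digest of the stripped argument string (assumed hashing scheme)."""
    return hashlib.md5(args.strip().encode("utf-8")).hexdigest()
```

Running parse_alert over the alert emails and comparing each digest with args_digest of the arguments that ps shows on the server would indicate whether the identifier in the alert matches the running process or a stale one, provided the MD5 assumption holds.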

History:
We ran a Zenoss 2.4 stack install for about 2 years.  Recently I created a virtual instance, migrated to it, upgraded to 3.0.3, then upgraded to 3.2.1.  This zenprocess issue is unfortunate, as people here were seriously considering dropping Zenoss but are very impressed with the new version, its layout and its speed improvements.

What can I do to resolve this?  What information can I collect and post which would be useful?  Does this warrant a bug report or is this perhaps an issue with my particular upgrade/instance?
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/66322#66322]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
James Stewart
2012-08-30 05:24:09 UTC
James Stewart [http://community.zenoss.org/people/amorphic] created the discussion

"Re: Zenprocess false positives and quirkiness"

To view the discussion, visit: http://community.zenoss.org/message/68306#68306

--------------------------------------------------------------
Hi Tex,

I've upgraded our backup server from 3.1.0 to 3.2.1 for evaluation and I'm running into the exact same problems that you describe. Remodelling a device or even just enabling/disabling monitoring of a given process confuses the zenprocess daemon and the only solution is a daemon restart.

It seems that there was a major refactor of zenprocess between 3.1 and 3.2 which wasn't tested thoroughly before release. From looking at bug reports it seems that the developers gave up on fixing this in 3.x, advising people to upgrade to 4.x instead (which was still in beta at the time).

4.x is still a little fresh for us to roll out into production and we rely too heavily on process monitoring to upgrade to 3.2.1 as it stands. Did you ever find any workarounds to alleviate the zenprocess problems in 3.2.1?

Cheers,

James
--------------------------------------------------------------

texjata
2012-08-30 12:59:29 UTC
texjata [http://community.zenoss.org/people/texjata] created the discussion

"Re: Zenprocess false positives and quirkiness"

To view the discussion, visit: http://community.zenoss.org/message/68316#68316

--------------------------------------------------------------
Sadly I haven't found an actual solution.  My workaround has been to change the modeling frequency to weekly (and I'm tempted to disable automated modeling altogether) and to set up a cron job to restart zenprocess every 4 hours.  The latter is really to prevent Zenoss-initiated spam attacks when a couple of servers are modeled during the sleeping hours.  I've also had to lower the priority of process-related alerts, modifying important ones on a case-by-case basis.
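
The cron part of that workaround could be sketched roughly as below. The script only builds the crontab line. The zenprocess restart command, and installing the entry in the zenoss user's crontab, are assumptions about a typical install and may need adjusting. The function name restart_cron_line is hypothetical.

```python
def restart_cron_line(every_hours=4, command="zenprocess restart"):
    """Build a crontab entry that runs `command` at minute 0 every N hours."""
    if not 1 <= every_hours <= 23:
        raise ValueError("every_hours must be between 1 and 23")
    return "0 */%d * * * %s" % (every_hours, command)


if __name__ == "__main__":
    # Print the entry so it can be pasted into `crontab -e` for the zenoss user.
    print(restart_cron_line())
```

With the defaults, the entry fires at 00:00, 04:00, 08:00 and so on, which matches the 4-hour restart interval described above.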

The real downside is that my superiors are considering alternative monitoring solutions, which is unfortunate because, though Zenoss has its flaws, I do like it quite a bit.  It's easy for newbies to get into and flexible enough to get creative with ;)  It's also cheap enough to start with, and there's a support option if you can get the money.

I'm hoping to have the time to test an upgrade to the 4.x branch, though that may be a while.
--------------------------------------------------------------
