Discussion:
3.2.X Process monitoring bugs
brockp
2011-11-11 02:59:31 UTC
brockp [http://community.zenoss.org/people/brockp] created the discussion

"3.2.X Process monitoring bugs"

To view the discussion, visit: http://community.zenoss.org/message/62576#62576

--------------------------------------------------------------
Starting with 3.2.0 and continuing in 3.2.1, there is a bug where processes that are found by the modeler are 'not found' by zenprocess and show up as down even though the processes are running.

This is easy to reproduce: the error manifests itself if you have two mysqld's running (like your normal system mysql and Zenoss's mysqld.bin).

Use a regex of 'mysqld' for the process and tell it to ignore command line parameters. In my case, on an Ubuntu machine with Zenoss 3.2.1 and the stock Ubuntu MySQL, the modeler finds 3 mysqld processes:

2011-11-10 21:42:33,647 DEBUG zen.ZenModeler: snmpidx: 1070 process: {'procName': 'mysqld', 'parameters': '', '_procPath': '/usr/sbin/mysqld'}
2011-11-10 21:42:33,651 DEBUG zen.ZenModeler: snmpidx: 15432    process: {'procName': 'mysqld_safe', 'parameters': '/usr/local/zenoss/mysql/bin/mysqld_safe --defaults-file=/usr/local/zenoss/mysql/my.cnf --port=3307 --socket=/usr/local/zenoss/my', '_procPath': '/bin/sh'}
2011-11-10 21:42:33,651 DEBUG zen.ZenModeler: snmpidx: 15491    process: {'procName': 'mysqld.bin', 'parameters': '--defaults-file=/usr/local/zenoss/mysql/my.cnf --basedir=/usr/local/zenoss/mysql --datadir=/usr/local/zenoss/mysql/data --user=m', '_procPath': '/usr/local/zenoss/mysql/bin/mysqld.bin'}

zenprocess, on the other hand, will only find one of them (both PIDs 1070 and 15491 are attributed to the mysqld.bin component):
2011-11-10 21:47:45,814 DEBUG zen.zenprocess: Found process 1070 on usr_local_zenoss_mysql_bin_mysqld.bin
2011-11-10 21:47:45,817 DEBUG zen.zenprocess: Found process 15491 on usr_local_zenoss_mysql_bin_mysqld.bin

2011-11-10 21:47:45,842 DEBUG zen.zenprocess: Queueing event {'monitor': 'localhost', 'component': '/usr/sbin/mysqld', 'agent': 'zenprocess', 'summary': 'Process not running: /usr/sbin/mysqld', 'manager': 'localhost6.localdomain6', 'eventGroup': 'Process', 'eventKey': '/Processes/MySQL/osProcessClasses/mysqld', 'device': 'myth', 'eventClass': '/Status/OSProcess', 'message': "Process not running: /usr/sbin/mysqld\n Using regex 'mysqld' \nAll Processes have stopped since the last model occurred. Last Modification time (2011/11/10 21:42:39)", 'severity': 4}

2011-11-10 21:47:45,843 DEBUG zen.zenprocess: Queueing event {'monitor': 'localhost', 'component': '/bin/sh', 'agent': 'zenprocess', 'summary': 'Process not running: /bin/sh', 'manager': 'localhost6.localdomain6', 'eventGroup': 'Process', 'eventKey': '/Processes/MySQL/osProcessClasses/mysqld', 'device': 'myth', 'eventClass': '/Status/OSProcess', 'message': "Process not running: /bin/sh\n Using regex 'mysqld' \nAll Processes have stopped since the last model occurred. Last Modification time (2011/11/10 21:42:39)", 'severity': 4}
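
To make the mismatch easier to spot in a longer zenprocess log, here is a minimal sketch that groups the "Found process" PIDs by component and lists the components reported as not running. It is not part of Zenoss; the function name `summarize` and the two regexes are assumptions based only on the line format shown above, and the two event lines in `SAMPLE` are trimmed to their 'summary' field.

```python
import re
from collections import defaultdict

# Patterns derived from the DEBUG lines quoted above (an assumption about the
# log format, not something taken from the Zenoss source).
FOUND_RE = re.compile(r"zen\.zenprocess: Found process (\d+) on (\S+)")
DOWN_RE = re.compile(r"'summary': 'Process not running: ([^']+)'")

# Lines from the log above; the event dictionaries are trimmed to 'summary'.
SAMPLE = [
    "2011-11-10 21:47:45,814 DEBUG zen.zenprocess: Found process 1070 on usr_local_zenoss_mysql_bin_mysqld.bin",
    "2011-11-10 21:47:45,817 DEBUG zen.zenprocess: Found process 15491 on usr_local_zenoss_mysql_bin_mysqld.bin",
    "2011-11-10 21:47:45,842 DEBUG zen.zenprocess: Queueing event {'summary': 'Process not running: /usr/sbin/mysqld'}",
    "2011-11-10 21:47:45,843 DEBUG zen.zenprocess: Queueing event {'summary': 'Process not running: /bin/sh'}",
]


def summarize(log_lines):
    """Group found PIDs by component and collect components reported down."""
    pids_by_component = defaultdict(list)
    down = []
    for line in log_lines:
        found = FOUND_RE.search(line)
        if found:
            pids_by_component[found.group(2)].append(int(found.group(1)))
            continue
        not_running = DOWN_RE.search(line)
        if not_running:
            down.append(not_running.group(1))
    return dict(pids_by_component), down


if __name__ == "__main__":
    found, down = summarize(SAMPLE)
    print("PIDs by component:", found)
    print("Reported down:", down)
```

On the sample lines, both PIDs end up under the single mysqld.bin component while /usr/sbin/mysqld and /bin/sh are reported down, which is the symptom described here.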

The annoying thing is that they are still running:
ps aux | grep mysqld
mysql     1070  0.1  0.5 848600 11732 ?        Ssl  10:24   0:52 /usr/sbin/mysqld
root     15432  0.0  0.0   4220   620 pts/0    S    20:50   0:00 /bin/sh /usr/local/zenoss/mysql/bin/mysqld_safe --defaults-file=/usr/local/zenoss/mysql/my.cnf --port=3307 --socket=/usr/local/zenoss/mysql/tmp/mysql.sock --old-passwords --datadir=/usr/local/zenoss/mysql/data --log-error=/usr/local/zenoss/mysql/data/mysqld.log --pid-file=/usr/local/zenoss/mysql/data/myth.pid --lower-case-table-names=1 --default-table-type=InnoDB
mysql    15491  0.1  1.2 196468 25552 pts/0    Sl   20:50   0:04 /usr/local/zenoss/mysql/bin/mysqld.bin --defaults-file=/usr/local/zenoss/mysql/my.cnf --basedir=/usr/local/zenoss/mysql --datadir=/usr/local/zenoss/mysql/data --user=mysql --pid-file=/usr/local/zenoss/mysql/data/myth.pid --skip-external-locking --port=3307 --socket=/usr/local/zenoss/mysql/tmp/mysql.sock --old-passwords --lower-case-table-names=1 --default-table-type=InnoDB
zenoss   18318  0.0  0.0   9140  1064 pts/0    S+   21:56   0:00 grep --color=auto mysqld
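
The same check can be scripted. This is a rough sketch, assuming the standard 11-column `ps aux` layout; the helper name `running_commands` is hypothetical, and the command lines in `SAMPLE` are the ones above with most arguments trimmed. Unlike the "ignore command line parameters" setting, it matches against the full command line.

```python
import re

# ps aux lines from the output above, with long argument lists trimmed.
SAMPLE = """\
mysql     1070  0.1  0.5 848600 11732 ?        Ssl  10:24   0:52 /usr/sbin/mysqld
root     15432  0.0  0.0   4220   620 pts/0    S    20:50   0:00 /bin/sh /usr/local/zenoss/mysql/bin/mysqld_safe --port=3307
mysql    15491  0.1  1.2 196468 25552 pts/0    Sl   20:50   0:04 /usr/local/zenoss/mysql/bin/mysqld.bin --port=3307
zenoss   18318  0.0  0.0   9140  1064 pts/0    S+   21:56   0:00 grep --color=auto mysqld
"""


def running_commands(ps_output, pattern):
    """Return (pid, command) pairs from `ps aux` text whose command matches."""
    regex = re.compile(pattern)
    matches = []
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        pid, command = int(fields[1]), fields[10]
        if command.startswith("grep "):
            continue  # ignore the grep process itself
        if regex.search(command):
            matches.append((pid, command))
    return matches


if __name__ == "__main__":
    for pid, command in running_commands(SAMPLE, "mysqld"):
        print(pid, command)
```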

This used to work just fine in Zenoss 3.1.x.

If you massage the regex to exclude the one it finds, for example by changing 'mysqld' to 'mysqld$', the system will start showing /usr/sbin/mysqld as up and won't find mysqld.bin (as expected). So Zenoss can see the process; it just isn't displaying it correctly.
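
As an illustration of why the anchored regex narrows the match, here is a small sketch using plain Python `re.search` on the three process names from the modeler log. It assumes the regex is applied as an unanchored search over the process name; that is an assumption, not something confirmed from the Zenoss source, and `matching_names` is a hypothetical helper.

```python
import re

# Process names as reported by the modeler in the log above.
PROC_NAMES = ["mysqld", "mysqld_safe", "mysqld.bin"]


def matching_names(pattern, names):
    """Return the names that an unanchored regex search would match."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


if __name__ == "__main__":
    for pattern in ("mysqld", "mysqld$"):
        print(pattern, "->", matching_names(pattern, PROC_NAMES))
```

With 'mysqld' all three names match, while 'mysqld$' only matches the plain mysqld, since the `$` requires the name to end right after "mysqld".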

If you want my zenmodeler or zenprocess log files, let me know.
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/62576#62576]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
Luca Maranzano
2011-11-22 14:06:27 UTC
Luca Maranzano [http://community.zenoss.org/people/liuk] created the discussion

"Re: 3.2.X Process monitoring bugs"

To view the discussion, visit: http://community.zenoss.org/message/62766#62766

--------------------------------------------------------------
Hi!
The same problem is occurring on our Zenoss 3.2.1, just upgraded from 3.1.0.

Besides, under Infrastructure -> Processes -> Process Instances, all entries are marked in RED as DOWN!

I'll try to delete some processes and recreate them from scratch to see if the error persists.

More later,
Luca
--------------------------------------------------------------

wizard113
2012-01-04 01:33:31 UTC
wizard113 [http://community.zenoss.org/people/wizard113] created the discussion

"Re: 3.2.X Process monitoring bugs"

To view the discussion, visit: http://community.zenoss.org/message/63501#63501

--------------------------------------------------------------
Have you tried restarting zenprocess?  I ran into the same thing when I tried to monitor the puppetd process, and restarting zenprocess did the trick.
--------------------------------------------------------------

nozen
2012-01-04 08:30:22 UTC
nozen [http://community.zenoss.org/people/nozen] created the discussion

"Re: 3.2.X Process monitoring bugs"

To view the discussion, visit: http://community.zenoss.org/message/63521#63521

--------------------------------------------------------------
I'm running 3.2.1 and I've seen the problem you're having; I'm only just starting to look at process monitoring.

But I found one post to be most useful, and it solved my Linux process monitoring.

I can't find the thread, but it was on this forum and I used it today. Basically, go through the steps in this order:

1. Delete process
2. model device
3. add process and include any changes e.g. error level, zmonitor etc
4. model device
5. should be ok now

It's not quite what you're talking about, but I'm curious to see if it works.

Also check out this ticket:

http://dev.zenoss.org/trac/ticket/7870

cheers,
nozen.
--------------------------------------------------------------

Luca Maranzano
2012-01-05 22:22:23 UTC
Luca Maranzano [http://community.zenoss.org/people/liuk] created the discussion

"Re: 3.2.X Process monitoring bugs"

To view the discussion, visit: http://community.zenoss.org/message/63531#63531

--------------------------------------------------------------
Restarting zenprocess didn't help, and even after recreating the process the problem is still present.

In ticket 7870 on Trac, someone from Zenoss posted a message saying it has been fixed, but it seems impossible to get any update about this issue (patch, new minor release, who knows!), despite several requests from different users.
This is quite annoying IMVHO.

Still waiting.
Cheers,
Luca
--------------------------------------------------------------

jshardlow
2012-01-30 11:24:17 UTC
jshardlow [http://community.zenoss.org/people/jshardlow] created the discussion

"Re: 3.2.X Process monitoring bugs"

To view the discussion, visit: http://community.zenoss.org/message/63970#63970

--------------------------------------------------------------
I have just upgraded to 3.2.1 myself and this is driving me crazy. In my case we have a script running a daemon (both found in ps/snmpwalk). Zenoss will pick up both, but one will be marked up and the other down. Annoying, as the same setup was working fine in 2.5.2.

Even messing about with the regex to try to ignore the script process doesn't seem to help.
--------------------------------------------------------------

Luca Maranzano
2012-01-30 21:50:40 UTC
Luca Maranzano [http://community.zenoss.org/people/liuk] created the discussion

"Re: 3.2.X Process monitoring bugs"

To view the discussion, visit: http://community.zenoss.org/message/63994#63994

--------------------------------------------------------------
We have similar issues, but only for certain similar processes. We're still waiting for the fix; see ticket 7870 on Trac.

Really annoying!

Cheers
Luca
--------------------------------------------------------------

omeganon
2012-01-31 15:48:08 UTC
omeganon [http://community.zenoss.org/people/omeganon] created the discussion

"Re: 3.2.X Process monitoring bugs"

To view the discussion, visit: http://community.zenoss.org/message/64050#64050

--------------------------------------------------------------
This has been a longstanding bug that I've seen since 2.5.2. Process monitoring is very unreliable, with several long-running false positives daily. They'll show as down for hours or days, then return to up status with no change on the device being monitored. I too am eagerly awaiting the fix for #7870 and have been watching it for months...
--------------------------------------------------------------
