Discussion:
4.2 Process Monitoring issue (still?)
omeganon
2012-10-09 20:43:31 UTC
Permalink
omeganon [http://community.zenoss.org/people/omeganon] created the discussion

"4.2 Process Monitoring issue (still?)"

To view the discussion, visit: http://community.zenoss.org/message/68959#68959

--------------------------------------------------------------
Hi!

So I've recently gone through the upgrade process of 3.2.1 stack -> 3.2.1 RPM -> 4.2 RPM, and for the most part things seem to be working. One of the driving factors was the osprocess bug in 3.x whereby the modeler would see a process but zenprocess wouldn't, or would see it only intermittently.

This issue is still occurring for me with 4.2, at least the part where the modeler sees a process but zenprocess does not. It affects about 19 processes across about half as many hosts. Is anyone else seeing this? Suggestions?

Thanks in advance...

Here is one specific example --

** Process definition -

     regex: bigboard-wsgi
     Ignore parameters when modeling: No
     Ignore parameters: No

** SNMP output -

HOST-RESOURCES-MIB::hrSWRunIndex.12720 = INTEGER: 12720
HOST-RESOURCES-MIB::hrSWRunIndex.12722 = INTEGER: 12722
HOST-RESOURCES-MIB::hrSWRunIndex.12725 = INTEGER: 12725
HOST-RESOURCES-MIB::hrSWRunIndex.12727 = INTEGER: 12727
HOST-RESOURCES-MIB::hrSWRunIndex.12729 = INTEGER: 12729
HOST-RESOURCES-MIB::hrSWRunIndex.12731 = INTEGER: 12731
HOST-RESOURCES-MIB::hrSWRunIndex.12734 = INTEGER: 12734
HOST-RESOURCES-MIB::hrSWRunIndex.12739 = INTEGER: 12739
HOST-RESOURCES-MIB::hrSWRunIndex.12744 = INTEGER: 12744
HOST-RESOURCES-MIB::hrSWRunIndex.12751 = INTEGER: 12751
HOST-RESOURCES-MIB::hrSWRunName.12720 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunName.12722 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunName.12725 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunName.12727 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunName.12729 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunName.12731 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunName.12734 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunName.12739 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunName.12744 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunName.12751 = STRING: "apache2"
HOST-RESOURCES-MIB::hrSWRunID.12720 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunID.12722 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunID.12725 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunID.12727 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunID.12729 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunID.12731 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunID.12734 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunID.12739 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunID.12744 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunID.12751 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrSWRunPath.12720 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.12722 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.12725 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.12727 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.12729 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.12731 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.12734 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.12739 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.12744 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.12751 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunParameters.12720 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunParameters.12722 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunParameters.12725 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunParameters.12727 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunParameters.12729 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunParameters.12731 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunParameters.12734 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunParameters.12739 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunParameters.12744 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunParameters.12751 = STRING: "-k start"
HOST-RESOURCES-MIB::hrSWRunType.12720 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunType.12722 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunType.12725 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunType.12727 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunType.12729 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunType.12731 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunType.12734 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunType.12739 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunType.12744 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunType.12751 = INTEGER: application(4)
HOST-RESOURCES-MIB::hrSWRunStatus.12720 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunStatus.12722 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunStatus.12725 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunStatus.12727 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunStatus.12729 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunStatus.12731 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunStatus.12734 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunStatus.12739 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunStatus.12744 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunStatus.12751 = INTEGER: runnable(2)
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12720 = INTEGER: 170264
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12722 = INTEGER: 172776
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12725 = INTEGER: 140665
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12727 = INTEGER: 126206
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12729 = INTEGER: 173027
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12731 = INTEGER: 128354
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12734 = INTEGER: 177407
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12739 = INTEGER: 148148
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12744 = INTEGER: 172852
HOST-RESOURCES-MIB::hrSWRunPerfCPU.12751 = INTEGER: 166666
HOST-RESOURCES-MIB::hrSWRunPerfMem.12720 = INTEGER: 16336 KBytes
HOST-RESOURCES-MIB::hrSWRunPerfMem.12722 = INTEGER: 16292 KBytes
HOST-RESOURCES-MIB::hrSWRunPerfMem.12725 = INTEGER: 16268 KBytes
HOST-RESOURCES-MIB::hrSWRunPerfMem.12727 = INTEGER: 16320 KBytes
HOST-RESOURCES-MIB::hrSWRunPerfMem.12729 = INTEGER: 16604 KBytes
HOST-RESOURCES-MIB::hrSWRunPerfMem.12731 = INTEGER: 16256 KBytes
HOST-RESOURCES-MIB::hrSWRunPerfMem.12734 = INTEGER: 16356 KBytes
HOST-RESOURCES-MIB::hrSWRunPerfMem.12739 = INTEGER: 16312 KBytes
HOST-RESOURCES-MIB::hrSWRunPerfMem.12744 = INTEGER: 16340 KBytes
HOST-RESOURCES-MIB::hrSWRunPerfMem.12751 = INTEGER: 16508 KBytes
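
A quick way to spot other processes with the same padding is to scan the snmpwalk text for hrSWRunPath values that carry leading or trailing whitespace. The following is only a sketch: the function name padded_paths is made up for illustration, the line format is assumed to match the output pasted above, and the second sample line (pid 4977) is reconstructed from the zenmodeler debug output rather than copied from snmpwalk.

```python
import re

# Matches lines such as:
#   HOST-RESOURCES-MIB::hrSWRunPath.12720 = STRING: "bigboard-wsgi    "
LINE_RE = re.compile(r'^HOST-RESOURCES-MIB::hrSWRunPath\.(\d+) = STRING: "(.*)"$')

def padded_paths(snmpwalk_text):
    """Return {pid: path} for hrSWRunPath values with surrounding whitespace.

    Hypothetical helper for illustration; not part of Zenoss or net-snmp.
    """
    hits = {}
    for line in snmpwalk_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # some other OID, or an unexpected line format
        pid, path = match.groups()
        if path != path.strip():
            hits[pid] = path
    return hits

SAMPLE = '''\
HOST-RESOURCES-MIB::hrSWRunPath.12720 = STRING: "bigboard-wsgi    "
HOST-RESOURCES-MIB::hrSWRunPath.4977 = STRING: "/usr/lib/postgresql/8.3/bin/postgres"
'''

for pid, path in sorted(padded_paths(SAMPLE).items()):
    print(pid, repr(path))
```

Running this over the full walk for each affected host should show whether every failing process has a padded path, or whether something else is going on for some of them.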

** Zenmodel references --
(highly redacted; I can provide full output privately as needed). After this runs I see the process under OS Processes for the host as 'bigboard-wsgi -k start' with a Down status.

[***@buzz ~]$ zenmodeler run -v10 -d battlezone.int 2>&1 | grep bigboard-wsgi
2012-10-09 15:29:56,081 DEBUG zen.ZenModeler: Plugin zenoss.snmp.HRSWRunMap results = ({}, {<Products.DataCollector.plugins.CollectorPlugin.GetTableMap object at 0x6a5ac10>: {'.1.3.6.1.2.1.25.4.2.1.4.12744': 'bigboard-wsgi    ', '.1.3.6.1.2.1.25.4.2.1.4.12720': 'bigboard-wsgi    ', '.1.3.6.1.2.1.25.4.2.1.4.12751': 'bigboard-wsgi    ', '.1.3.6.1.2.1.25.4.2.1.4.12729': 'bigboard-wsgi    ', '.1.3.6.1.2.1.25.4.2.1.4.12722': 'bigboard-wsgi    ', '.1.3.6.1.2.1.25.4.2.1.4.12725': 'bigboard-wsgi    ', '.1.3.6.1.2.1.25.4.2.1.4.12727': 'bigboard-wsgi    ', '.1.3.6.1.2.1.25.4.2.1.4.12731': 'bigboard-wsgi    ', '.1.3.6.1.2.1.25.4.2.1.4.4977': '/usr/lib/postgresql/8.3/bin/postgres', '.1.3.6.1.2.1.25.4.2.1.4.12734': 'bigboard-wsgi    ', '.1.3.6.1.2.1.25.4.2.1.4.12739': 'bigboard-wsgi    '}}})
2012-10-09 15:29:56,084 DEBUG zen.ZenModeler: battlezone.int tabledata = {'hrSWRunEntry': {'12751': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}, '12729': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}, '12727': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}, '12725': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}, '12722': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}, '12720': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}, '12739': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}, '2809': {'procName': 'pdflush', 'parameters': '', '_procPath': 'pdflush'}, '12731': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}, '12734': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}, '12744': {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}}}}
2012-10-09 15:29:56,091 DEBUG zen.ZenModeler: snmpidx: 12720    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}
2012-10-09 15:29:56,091 DEBUG zen.ZenModeler: snmpidx: 12722    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}
2012-10-09 15:29:56,092 DEBUG zen.ZenModeler: snmpidx: 12725    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}
2012-10-09 15:29:56,092 DEBUG zen.ZenModeler: snmpidx: 12727    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}
2012-10-09 15:29:56,092 DEBUG zen.ZenModeler: snmpidx: 12729    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}
2012-10-09 15:29:56,092 DEBUG zen.ZenModeler: snmpidx: 12731    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}
2012-10-09 15:29:56,092 DEBUG zen.ZenModeler: snmpidx: 12734    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}
2012-10-09 15:29:56,093 DEBUG zen.ZenModeler: snmpidx: 12739    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}
2012-10-09 15:29:56,093 DEBUG zen.ZenModeler: snmpidx: 12744    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}
2012-10-09 15:29:56,093 DEBUG zen.ZenModeler: snmpidx: 12751    process: {'procName': 'apache2', 'parameters': '-k start', '_procPath': 'bigboard-wsgi    '}

** zenprocess run --
(redacted. I can provide full output privately if needed).

[***@buzz ~]$ zenprocess run -v10 -d battlezone.int 2>&1 | grep bigboard-wsgi
2012-10-09 15:40:42,924 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,934 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,934 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,934 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,935 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,935 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,935 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,935 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,936 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,936 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi   
2012-10-09 15:40:42,941 WARNING zen.zenprocess: (battlezone.int) Process not running: bigboard-wsgi     -k start
   Using regex 'bigboard-wsgi'
2012-10-09 15:40:42,943 DEBUG zen.RRDUtil: /opt/zenoss/perf/Devices/battlezone.int/os/processes/bigboard-wsgi     7e2b4ad503b9da9040a4bc38538f3efe/count_count.rrd: 0.0, @ N
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/68959#68959]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
omeganon
2012-10-11 20:50:09 UTC
Permalink
omeganon [http://community.zenoss.org/people/omeganon] created the discussion

"Re: 4.2 Process Monitoring issue (still?)"

To view the discussion, visit: http://community.zenoss.org/message/69039#69039

--------------------------------------------------------------
All right, I had some time to dig into this today. The problem appears to be that the processes in question have trailing spaces in the SNMP output, and zenprocess is not ignoring them, though I'll bet zenmodeler is.

I added a bit of debug output to illustrate --

[***@buzz ZenRRD]$ zenprocess run -v10 -d battlezone.int 2>&1 | egrep 'bigboard-wsgi.*bigboard-wsgi'
2012-10-11 15:42:38,013 DEBUG zen.zenprocess: Regex search for self._config.regex: 'bigboard-wsgi' in processName: 'bigboard-wsgi -k start'
2012-10-11 15:42:38,014 DEBUG zen.zenprocess: Checking useName nameRe:  '(.?)bigboard\-wsgi\ \ \ \ $', cleanNameOnly:  'bigboard-wsgi', nameOnly:  'bigboard-wsgi    '
2012-10-11 15:42:38,014 DEBUG zen.zenprocess: Discarding match based on name mismatch: bigboard-wsgi bigboard-wsgi  

Code --

  if result and useName:
      nameOnly = self._config.name.rsplit(' ', 1)[0]
      cleanNameOnly = globalPrepId(name)
      nameRe = '(.?)' + re.escape(nameOnly) + '$'
      log.debug("Checking useName nameRe:  '%s', cleanNameOnly:  '%s', nameOnly:  '%s'" % (nameRe, cleanNameOnly, nameOnly))
      nameMatch = re.search(nameRe, cleanNameOnly)
      if not nameMatch or nameMatch.group(1) not in ('', '_'):
          log.debug("Discarding match based on name mismatch: %s %s" % (cleanNameOnly, nameOnly))
          result = False

I'm not sure yet what the correct behavior should be but I expect that it should be removing leading and trailing whitespace before the rsplit...
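
The mismatch can be reproduced outside Zenoss with just the re module. This is a minimal sketch, not the actual zenprocess code: name_matches is a made-up stand-in for the check quoted above, the config name is assumed to be the padded process name followed by the hash seen in the perf path, and the cleaned name is taken directly from the debug output instead of calling globalPrepId.

```python
import re

def name_matches(config_name, clean_name_only, strip_name=False):
    """Stand-in for the useName check; hypothetical, for illustration only."""
    # Drop the trailing hash: everything after the last space.
    name_only = config_name.rsplit(' ', 1)[0]
    if strip_name:
        # Candidate workaround: ignore padding left over from SNMP.
        name_only = name_only.strip()
    name_re = '(.?)' + re.escape(name_only) + '$'
    match = re.search(name_re, clean_name_only)
    return match is not None and match.group(1) in ('', '_')

# Assumed value of self._config.name (five spaces before the hash).
CONFIG_NAME = 'bigboard-wsgi     7e2b4ad503b9da9040a4bc38538f3efe'
# cleanNameOnly as it appears in the debug log.
CLEAN_NAME = 'bigboard-wsgi'

print(name_matches(CONFIG_NAME, CLEAN_NAME))                   # False
print(name_matches(CONFIG_NAME, CLEAN_NAME, strip_name=True))  # True
```

Because rsplit(' ', 1) only removes the last of the five spaces, four of them stay in nameOnly, get escaped into the regex, and can never match the already-cleaned name.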
--------------------------------------------------------------
omeganon
2012-10-11 21:06:02 UTC
Permalink
omeganon [http://community.zenoss.org/people/omeganon] created the discussion

"Re: 4.2 Process Monitoring issue (still?)"

To view the discussion, visit: http://community.zenoss.org/message/69040#69040

--------------------------------------------------------------
Well, this _works_, but I think there's probably a better solution further up the stack --

  if result and useName:
      nameOnly = self._config.name.rsplit(' ', 1)[0].strip()

2012-10-11 15:59:34,725 DEBUG zen.zenprocess: Checking useName nameRe:  '(.?)bigboard\-wsgi$', cleanNameOnly:  'bigboard-wsgi', nameOnly:  'bigboard-wsgi'
2012-10-11 15:59:34,725 DEBUG zen.zenprocess: battlezone.int Found process 12751 on bigboard-wsgi     -k start bigboard-wsgi     7e2b4ad503b9da9040a4bc38538f3efe
2012-10-11 15:59:34,726 DEBUG zen.zenprocess: pre-nameonly: usr_sbin_bacula-fd bbaf4d01ceca1d2490f9dd65ec5e1c0e
2012-10-11 15:59:34,740 DEBUG zen.zenprocess: Found new bigboard-wsgi     -k start bigboard-wsgi     7e2b4ad503b9da9040a4bc38538f3efe pid 12729 on battlezone.int

The reason I think the _real_ solution is somewhere else is that the perf dir uses the spaces in the name --

/opt/zenoss/perf/Devices/battlezone.int/os/processes/bigboard-wsgi     7e2b4ad503b9da9040a4bc38538f3efe:
total 128
drwxr-x---.  2 zenoss zenoss  4096 Oct 11 15:59 .
drwxr-x---. 32 zenoss zenoss  4096 Sep 28 01:59 ..
-rw-r--r--.  1 zenoss zenoss 40440 Oct 11 15:57 count_count.rrd
-rw-rw-r--.  1 zenoss zenoss 40440 Oct 11 15:59 cpu_cpu.rrd
-rw-rw-r--.  1 zenoss zenoss 40440 Oct 11 15:59 mem_mem.rrd

but 3.2.1 didn't --

/usr/local/zenoss/zenoss/perf/Devices/battlezone.int/os/processes/bigboard-wsgi\ 7e2b4ad503b9da9040a4bc38538f3efe/
total 1316
drwxr-x---  2 zenoss zenoss   4096 2012-10-03 16:39 .
drwxr-x--- 28 zenoss zenoss   4096 2012-09-28 01:59 ..
-rw-r--r--  1 zenoss zenoss  40288 2012-10-11 16:03 count_count.rrd
-rw-r--r--  1 zenoss zenoss  40288 2012-10-11 16:03 cpu_cpu.rrd
-rw-r--r--  1 zenoss zenoss  40288 2012-10-11 16:03 mem_mem.rrd
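
For what it's worth, the 3.2.1 directory name looks like what you would get by collapsing runs of whitespace in the process name before it is used anywhere. A rough sketch of that idea, where normalize_process_name is a hypothetical helper and not something that exists in Zenoss, and where I am only guessing that 3.2.1 did something equivalent:

```python
def normalize_process_name(raw_name):
    """Trim the name and collapse each run of whitespace to a single space.

    Hypothetical helper for illustration; not taken from the Zenoss source.
    """
    return ' '.join(raw_name.split())

# Name as it appears in the 4.2 perf directory (five spaces before the hash).
NAME_42 = 'bigboard-wsgi     7e2b4ad503b9da9040a4bc38538f3efe'

print(normalize_process_name(NAME_42))
```

If something like this were applied where the process name is first stored, the strip() in zenprocess would presumably no longer be needed, but RRD directories already created under the padded name would likely have to be renamed by hand.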
--------------------------------------------------------------