Discussion:
Zenoss stop monitoring localhost every 1-2 days
James M
2013-01-24 09:31:34 UTC
James M [http://community.zenoss.org/people/James] created the discussion

"Zenoss stop monitoring localhost every 1-2 days"

To view the discussion, visit: http://community.zenoss.org/message/71190#71190

--------------------------------------------------------------
Hey,
After upgrading to 4.23 (OS: CentOS 5.4), the Zenoss server is for some reason having trouble monitoring itself (localhost).
Every 1-2 days I get SNMP timeout errors.

Examples:
"Unable to read processes on device localhost; Timeout on device"

"Scan stopped; Collection time exceeded interval - Elapsed time 1080.009079 seconds greater than 180 seconds"

All other SNMP monitors are working perfectly; only localhost fails.
It does not self-heal and it loses data (the first time, I lost all data for localhost for a day).

Restarting Zenoss does not fix it.
Only a server reboot does.

Any ideas?
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/71190#71190]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
jmp242
2013-01-24 13:54:53 UTC
jmp242 [http://community.zenoss.org/people/jmp242] created the discussion

"Re: Zenoss stop monitoring localhost every 1-2 days"

To view the discussion, visit: http://community.zenoss.org/message/71205#71205

--------------------------------------------------------------
Can you snmpwalk the Zenoss server? If a reboot is what it takes to fix it, it sounds like something is hanging net-snmp on the server. Next time, just try restarting net-snmp and see if that fixes it.

--
James Pulver
ZCA Member
LEPP Computer Group
Cornell University
--------------------------------------------------------------

James M
2013-01-24 14:03:11 UTC
James M [http://community.zenoss.org/people/James] created the discussion

"Re: Zenoss stop monitoring localhost every 1-2 days"

To view the discussion, visit: http://community.zenoss.org/message/71195#71195

--------------------------------------------------------------
Thanks, James.
I'll do that the next time I notice the problem and will update with the results.
--------------------------------------------------------------

James M
2013-01-27 07:35:39 UTC
James M [http://community.zenoss.org/people/James] created the discussion

"Re: Zenoss stop monitoring localhost every 1-2 days"

To view the discussion, visit: http://community.zenoss.org/message/71247#71247

--------------------------------------------------------------
Hey,

OK, this is getting weird.

While the problem was occurring, I noticed that Zenoss reported both of the server's interfaces as down, although they are clearly up: I can log in via SSH, I checked the interface status from the CLI, and the other monitors clearly work, which means Zenoss itself is still able to monitor.

snmpwalk failed both from the Zenoss GUI and from the CLI.

"
[***@zenossprod-2 ~]# snmpwalk -v2c -cpublic 127.0.0.1:161 system
Timeout: No Response from 127.0.0.1:161
"

I restarted the snmpd service; after that, snmpwalk succeeded and the localhost monitors seem to be back to normal.

Any idea how to prevent it from happening again?
--------------------------------------------------------------

jmp242
2013-01-28 15:09:28 UTC
jmp242 [http://community.zenoss.org/people/jmp242] created the discussion

"Re: Zenoss stop monitoring localhost every 1-2 days"

To view the discussion, visit: http://community.zenoss.org/message/71239#71239

--------------------------------------------------------------
It sounds like you need to debug what's happening with net-snmp. I take it that it never recovers on its own? So it's not system load... I don't know much about fixing the SNMP daemon itself.

--
James Pulver
ZCA Member
LEPP Computer Group
Cornell University
--------------------------------------------------------------

dhopp
2013-01-28 20:06:12 UTC
dhopp [http://community.zenoss.org/people/dhopp] created the discussion

"Re: Zenoss stop monitoring localhost every 1-2 days"

To view the discussion, visit: http://community.zenoss.org/message/71255#71255

--------------------------------------------------------------
Next time it stops responding, can you verify something is listening on port 161?

netstat -an | grep :161

Also, is there anything in /var/log/messages or /var/log/secure?

--Dennis
--------------------------------------------------------------

James M
2013-01-31 15:14:20 UTC
James M [http://community.zenoss.org/people/James] created the discussion

"Re: Zenoss stop monitoring localhost every 1-2 days"

To view the discussion, visit: http://community.zenoss.org/message/71337#71337

--------------------------------------------------------------
Hey,
It is listening on port 161:

[***@zenossprod-2 ~]# netstat -an | grep :161
udp        0      0 0.0.0.0:161                 0.0.0.0:*  


In /var/log/messages I can see these errors:

Jan 31 15:08:18 zenossprod-2 snmpd[2955]: warning: cannot open /etc/hosts.allow: Too many open files
Jan 31 15:08:18 zenossprod-2 snmpd[2955]: warning: cannot open /etc/hosts.deny: Too many open files
Jan 31 15:08:18 zenossprod-2 snmpd[2955]: Connection from UDP: [10.13.4.144]:58909 REFUSED
Jan 31 15:08:20 zenossprod-2 snmpd[2955]: /proc/stat: Too many open files
Jan 31 15:08:22 zenossprod-2 snmpd[2955]: cannot open /proc/net/dev ...
Jan 31 15:08:22 zenossprod-2 snmpd[2955]: could not create socket
Jan 31 15:08:25 zenossprod-2 snmpd[2955]: /proc/stat: Too many open files
Jan 31 15:08:25 zenossprod-2 snmpd[2955]: could not open /proc/net/arp
Jan 31 15:08:25 zenossprod-2 snmpd[2955]: Unable to create netlink socket
Jan 31 15:08:30 zenossprod-2 snmpd[2955]: /proc/stat: Too many open files

Files attached.
--------------------------------------------------------------
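
In the excerpt above, every "Too many open files" line carries the tag snmpd[2955]. A minimal sketch for tallying such messages per process, assuming the syslog layout shown above (month, day, time, host, then the process tag as the fifth field). The function name fd_errors_by_process is made up for illustration and is not part of Zenoss or net-snmp.

```shell
# fd_errors_by_process LOGFILE
# Count "Too many open files" messages per process tag (e.g. snmpd[2955]).
# Assumes the syslog layout: Mon DD HH:MM:SS host tag: message
fd_errors_by_process() {
    grep 'Too many open files' "$1" |
        awk '{ tag = $5; sub(/:$/, "", tag); count[tag]++ }
             END { for (t in count) print count[t], t }' |
        sort -rn
}

# Example (hypothetical invocation, run as root):
#   fd_errors_by_process /var/log/messages
```

Each output line is a count followed by the process tag, with the most frequent tag first.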

nilie
2013-01-31 19:52:43 UTC
nilie [http://community.zenoss.org/people/nilie] created the discussion

"Re: Zenoss stop monitoring localhost every 1-2 days"

To view the discussion, visit: http://community.zenoss.org/message/71342#71342

--------------------------------------------------------------
It seems your server has reached the maximum open files limit and can't open new files. Try investigating which process is consuming the resource. Hint: look at the output of the lsof command.
--------------------------------------------------------------
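
The lsof hint can be sketched as a small filter that ranks processes by the number of rows lsof prints for them. This is a rough ranking under stated assumptions: it relies on the usual lsof column order (COMMAND first, PID second), and it counts every row, including entries such as cwd or mem that are not file descriptors. The function name summarize_lsof is hypothetical.

```shell
# summarize_lsof [N]
# Read lsof output on stdin and print the N processes (default 10)
# with the most rows, one per line, as "count command pid".
summarize_lsof() {
    awk 'NR > 1 { open[$1 " " $2]++ }
         END { for (p in open) print open[p], p }' |
        sort -rn |
        head -n "${1:-10}"
}

# Example (run as root so every process is visible):
#   lsof -n | summarize_lsof 5
```

A process whose count keeps growing between runs is the one worth a closer look.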

James M
2013-02-06 08:26:13 UTC
James M [http://community.zenoss.org/people/James] created the discussion

"Re: Zenoss stop monitoring localhost every 1-2 days"

To view the discussion, visit: http://community.zenoss.org/message/71391#71391

--------------------------------------------------------------
Thanks a lot, nilie!
I've changed the open files limit and the problem seems to be resolved!
--------------------------------------------------------------
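
The thread does not say how the limit was changed or to what value. If snmpd keeps accumulating descriptors, a higher limit would only delay the failure, so one cautious follow-up is to watch how close the daemon gets to its limit. A minimal sketch, assuming a Linux /proc filesystem and a limit passed in by hand (the 1024 in the example is only a placeholder, not a value from this thread). The function names are hypothetical.

```shell
# fd_usage_pct USED LIMIT: integer percentage of the limit in use.
fd_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# check_fd_headroom PID LIMIT [THRESHOLD_PCT]
# Count PID's open descriptors via /proc/PID/fd and warn (return 1)
# once usage reaches THRESHOLD_PCT (default 80) of LIMIT.
check_fd_headroom() {
    pid=$1
    limit=$2
    threshold=${3:-80}
    used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    pct=$(fd_usage_pct "$used" "$limit")
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: pid $pid has $used of $limit descriptors open (${pct}%)"
        return 1
    fi
    echo "OK: pid $pid has $used of $limit descriptors open (${pct}%)"
}

# Example (run as root; 1024 is a placeholder limit):
#   check_fd_headroom "$(pidof snmpd)" 1024
```

Run periodically, for example from cron, a warning here gives time to restart snmpd before monitoring of localhost stops.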
