Discussion:
Weird stop of snmpd after a flow off Zenoss requests
Michel Lombart
2012-03-12 20:44:04 UTC
Michel Lombart [http://community.zenoss.org/people/mel] created the discussion

"Weird stop of snmpd after a flow off Zenoss requests"

To view the discussion, visit: http://community.zenoss.org/message/65123#65123

--------------------------------------------------------------
Hello everybody,

I have good knowledge of Linux (mostly Debian), but I'm new to SNMP.

Last week, I installed a Zenoss server to monitor 6 physical servers on the Internet (all at the same hosting company). 4 of them are Proxmox PVE hosts running around 30 virtual servers (KVM and OpenVZ), and 2 are basic Debian servers. The 4 Proxmox servers are identical, including updates. As a first step, Zenoss monitors each physical server and 10 of the virtual ones.

On Saturday, I noticed that the snmpd daemon on two of the Proxmox physical servers suddenly stopped working at the same moment. The process was still running, but the server did not reply. I manually killed the process and restarted the snmpd daemon on both servers. On Sunday, and again today, the same problem occurred: the snmpd daemon on these two Proxmox servers halted at the same moment, 19:46 according to the Zenoss event alert!

I opened the syslog of the 6 physical servers and of the virtual servers, and I saw some weird things.

First weird thing:

Some physical servers receive one request from Zenoss every 5 minutes. Others receive 2 requests, in the same second, every 5 minutes, and some others 3 requests, in the same second, every 5 minutes. The virtual servers receive 1 request every 5 minutes. Why these differences?

Second weird thing:

Around 19:36 / 19:37, every server received a lot of requests from the Zenoss server within the same second. A lot means from 70 to 140! It seems that this flood is what disrupts the snmpd daemon on 2 servers, the same two each day. Why this flood, and why does it disrupt the daemon on two servers and not on the others, even those that are identical?
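To see the per-second pattern described above without scrolling through syslog by hand, a small script can bucket the snmpd log lines by timestamp and flag the seconds that look like a flood. This is a sketch under assumptions: it expects snmpd to log each request as a "Connection from UDP: [ip]:port->..." syslog line (which only happens if connection logging is enabled on the agent), and the host name pve1, the Zenoss address 10.0.0.5 and the threshold of 50 are hypothetical.

```python
import re
from collections import Counter

# Assumed syslog line shape (adjust the regex to what your snmpd really logs), e.g.
# "Mar 12 19:36:02 pve1 snmpd[1234]: Connection from UDP: [10.0.0.5]:40123->[10.0.0.1]:161"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ snmpd\[\d+\]: "
    r"Connection from UDP: \[(?P<src>[\d.]+)\]"
)


def requests_per_second(lines, source_ip):
    """Count logged snmpd connections from source_ip, bucketed by syslog timestamp (1 s)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("src") == source_ip:
            counts[m.group("ts")] += 1
    return counts


def bursts(counts, threshold):
    """Return (timestamp, count) pairs, sorted by timestamp string, for seconds with >= threshold requests."""
    return sorted((ts, n) for ts, n in counts.items() if n >= threshold)
```

Usage, with the hypothetical values above: `with open("/var/log/syslog") as f: print(bursts(requests_per_second(f, "10.0.0.5"), 50))`. Running it without a threshold (`requests_per_second(...).values()`) also shows the 1, 2 or 3 requests per polling second mentioned in the first weird thing.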

I have no idea what is happening, nor which log might contain more information.

Thanks for your help!

Michel.
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/65123#65123]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
Michel Lombart
2012-03-12 20:52:38 UTC
Michel Lombart [http://community.zenoss.org/people/mel] created the discussion

"Re: Weird stop of snmpd after a flow off Zenoss requests"

To view the discussion, visit: http://community.zenoss.org/message/65125#65125

--------------------------------------------------------------
I would add that not every virtual server is flooded!
--------------------------------------------------------------

jcurry
2012-03-14 11:37:26 UTC
jcurry [http://community.zenoss.org/people/jcurry] created the discussion

"Re: Weird stop of snmpd after a flow off Zenoss requests"

To view the discussion, visit: http://community.zenoss.org/message/65184#65184

--------------------------------------------------------------
Hi Michel,
The constant time of day and your "every 12 hours" comment strongly suggest that this is when the zenmodeler daemon kicks in; it normally runs every 12 hours. You could check $ZENHOME/logs/zenmodeler.log to see when it last ran, and also to check the details for your 2 offending servers.
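To confirm the 12-hour pattern from the log itself, the timestamps of the lines that mention one device can be grouped into modeling runs and the gaps between runs compared. A minimal sketch, assuming zenmodeler.log lines start with a timestamp such as "2012-03-12 19:36:02,123" (check a few lines of your own log first); the device name pve2 and the one-hour gap used to separate runs are hypothetical.

```python
from datetime import datetime, timedelta


def run_starts(lines, device, gap=timedelta(hours=1)):
    """Return the first timestamp of each cluster of log lines mentioning `device`.

    Assumes each line starts with "YYYY-MM-DD HH:MM:SS"; lines further apart
    than `gap` start a new cluster (i.e. a new modeling run).
    """
    starts, last = [], None
    for line in lines:
        if device not in line:  # simple substring match on the device name
            continue
        try:
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip continuation lines, tracebacks, etc.
        if last is None or ts - last > gap:
            starts.append(ts)
        last = ts
    return starts
```

Usage: `with open(os.path.expandvars("$ZENHOME/logs/zenmodeler.log")) as f: starts = run_starts(f, "pve2")`, then `[b - a for a, b in zip(starts, starts[1:])]` lists the intervals between runs, which should sit close to 12 hours if this is the modeler cycle.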

The modeler cycle will ask for all the information specified in the modeler plugins associated with a device (check the left-hand Modeler Plugins menu for a device to see what this amounts to). Typically this is a load of SNMP requests. The Configuration Properties left-hand menu will show you the SNMP timeout and retries values for a device; check whether they are different for different devices.
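The timeout and retries values matter because every unanswered request holds the modeler for the full timeout on each attempt. The arithmetic below is a rough sketch that assumes a fixed timeout with no back-off, requests sent one after another, and a retries value that excludes the first attempt (if your setting counts total tries, drop the + 1); the 2.5 s / 2 retries / 100 requests figures are purely illustrative.

```python
def worst_case_wait(timeout_s, retries, unanswered_requests=1):
    """Upper bound on seconds spent waiting for requests that never get a reply.

    Assumes each attempt waits the full timeout, there is no back-off, and the
    unanswered requests are handled sequentially rather than in parallel.
    """
    attempts = 1 + retries  # the first try plus each retry
    return timeout_s * attempts * unanswered_requests
```

With those illustrative numbers, `worst_case_wait(2.5, 2)` is 7.5 s for a single request, and `worst_case_wait(2.5, 2, 100)` is 750.0 s, more than 12 minutes of retries aimed at a single struggling agent.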

My suspicion would be that your 2 servers either are not responding to SNMP at all or maybe only have partial support for the questions being asked, resulting in lots of retries. It is very unusual to manage to crash an snmpd, but not impossible! Try the snmpd.log on your servers (mine is in /var/log/net-snmpd.log).

If you find that some of the modeler plugins are asking SNMP questions that a device cannot answer, then remove that plugin.

You can always model a device from the command line (as the zenoss user) with:
zenmodeler run -v 10 -d <device name as known to Zenoss>

If you want to test the modelling for a particular plugin, say, zenoss.snmp.InterfaceMap, then you can add the --collect parameter:
zenmodeler run -v 10 -d <device name as known to Zenoss> --collect InterfaceMap

(you don't need the zenoss.snmp preface on the --collect parameter)
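When a device only times out on some questions, running the modeler once per plugin and timing each run can point to the plugin that is waiting on unanswered requests. A sketch that wraps the command shown above, using only the run, -v, -d and --collect options from this thread; it must run as the zenoss user, the device name pve2 is hypothetical, and the plugin list should be copied from the device's Modeler Plugins menu (only InterfaceMap is named in this thread).

```python
import subprocess
import time


def modeler_command(device, plugin=None, verbosity=10):
    """Build the zenmodeler command line used in this thread for one device."""
    cmd = ["zenmodeler", "run", "-v", str(verbosity), "-d", device]
    if plugin:
        cmd += ["--collect", plugin]  # short plugin name, no zenoss.snmp prefix
    return cmd


def time_plugins(device, plugins):
    """Run zenmodeler once per plugin and return wall-clock seconds for each.

    Output is not captured, so the verbose modeler log still goes to the terminal.
    """
    timings = {}
    for plugin in plugins:
        start = time.monotonic()
        subprocess.run(modeler_command(device, plugin), check=False)
        timings[plugin] = time.monotonic() - start
    return timings
```

Usage: `print(time_plugins("pve2", ["InterfaceMap"]))`. A plugin whose run time is close to a multiple of the timeout calculation is a good candidate for removal on that device.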

Cheers,
Jane
--------------------------------------------------------------

Michel Lombart
2012-03-14 21:09:37 UTC
Michel Lombart [http://community.zenoss.org/people/mel] created the discussion

"Re: Weird stop of snmpd after a flow off Zenoss requests"

To view the discussion, visit: http://community.zenoss.org/message/65203#65203

--------------------------------------------------------------
Thanks a lot, Jane!

You're right. I checked the zenmodeler log and found that these two physical servers did not reply. The times match perfectly.

I was able to reproduce the problem by setting the zenmodeler cycle to 10 minutes.

I do not believe it is a plugin that gets no reply on these two physical servers, because they belong to a group of four almost identical servers (maybe only the serial numbers differ :^0). Moreover, manual modeling works fine. Two of them are buggy; the other two work.

I should check their load, even though it is low. What is really interesting, and I only see it now, is that the Zenoss virtual server runs on one of the problematic physical servers! So I do not believe it is a network infrastructure issue.

I've disabled the zenmodeler daemon; after all, the physical and virtual servers are very stable. That solved the problem for now. I will enable it again once I've found out why these servers reply so slowly.
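To find out why these servers reply slowly, one option is to poll them repeatedly with a cheap query and compare reply times with the servers that behave. A sketch assuming the net-snmp command-line tools are installed and the agents accept SNMP v2c with the community "public" (both hypothetical here; use your own settings). It asks for sysUpTime.0 with a 1 s timeout and no retries, so a slow reply shows up as a failure instead of being hidden by a retry. The measured time includes starting the snmpget process, so treat it only as a comparison between hosts.

```python
import statistics
import subprocess
import time

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0 in the standard MIB-II system group


def snmpget_command(host, community="public", timeout_s=1, retries=0):
    """net-snmp snmpget for sysUpTime.0 with an explicit timeout and retry count."""
    return ["snmpget", "-v", "2c", "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, SYS_UPTIME_OID]


def probe_latency(host, samples=20, pause_s=1.0, **kwargs):
    """Time repeated snmpget calls; return (latencies of successes, number of failures)."""
    latencies, failures = [], 0
    for _ in range(samples):
        start = time.monotonic()
        result = subprocess.run(snmpget_command(host, **kwargs),
                                capture_output=True, text=True)
        elapsed = time.monotonic() - start
        if result.returncode == 0:
            latencies.append(elapsed)
        else:
            failures += 1  # timeout or error reported by snmpget
        time.sleep(pause_s)
    return latencies, failures


def summarize(latencies, failures):
    """Median and maximum latency in seconds, plus the failure count."""
    return {
        "median_s": statistics.median(latencies) if latencies else None,
        "max_s": max(latencies, default=None),
        "failures": failures,
    }
```

Usage: `print(summarize(*probe_latency("pve2")))`, run once against each of the four Proxmox hosts, ideally both from the Zenoss server and from another machine.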

Thanks again!

Michel.
--------------------------------------------------------------
