Jo Rhett
2012-08-07 23:01:59 UTC
Jo Rhett [http://community.zenoss.org/people/jorhett] created the discussion
"Has anyone else seen random bursts of query timeouts?"
To view the discussion, visit: http://community.zenoss.org/message/67736#67736
--------------------------------------------------------------
I've been chasing this for about two weeks now and am coming up dry. Has anyone else seen the following problem? Any idea where to look?
About once every other day, we get a smattering of query timeouts: ten or twenty seemingly random queries all time out at once. The event log indicates that they were unable to get a response in time. By the time we get the alerts and log in, the problem has long since cleared.
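Since the problem has cleared by the time anyone logs in, the event timestamps are the main evidence left behind. A minimal sketch for characterizing the bursts, assuming the timeout events' first-seen times have already been exported from the event console into Python `datetime` objects (the export step, the hypothetical `group_bursts`/`summarize` helpers, and the 2-minute gap threshold are illustrative assumptions, not part of Zenoss). The output gives each burst's start, duration, size, and spacing, which can then be compared against daemon cycle intervals or anything else that runs on a schedule.

```python
from datetime import datetime, timedelta
from typing import List


def group_bursts(stamps: List[datetime],
                 max_gap: timedelta = timedelta(minutes=2)) -> List[List[datetime]]:
    """Group timestamps into bursts: a new burst starts whenever the gap to the
    previous event exceeds max_gap (the 2-minute default is an arbitrary choice)."""
    bursts: List[List[datetime]] = []
    for ts in sorted(stamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts


def summarize(bursts: List[List[datetime]]) -> List[str]:
    """One line per burst: start, duration, event count, time since previous burst."""
    lines = []
    prev_start = None
    for burst in bursts:
        start, end = burst[0], burst[-1]
        since = "" if prev_start is None else f" (+{start - prev_start} since previous)"
        lines.append(f"{start.isoformat()} duration={end - start} events={len(burst)}{since}")
        prev_start = start
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real Zenoss event log.
    t0 = datetime(2012, 8, 1, 3, 15, 0)
    events = [t0, t0 + timedelta(seconds=20), t0 + timedelta(seconds=45),
              t0 + timedelta(days=2, hours=1),
              t0 + timedelta(days=2, hours=1, seconds=30)]
    for line in summarize(group_bursts(events)):
        print(line)
```

If the bursts turn out to start at suspiciously regular offsets, that would support the idea of something periodic inside the collector rather than truly random failures.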
Obvious troubleshooting:
1. I've set up constant ping and TCP monitoring between some of the affected systems and confirmed that there was no network outage when the timeouts occurred.
2. Many of the services that reported failures would, if they were really down, cause large primary failures that other systems would notice (e.g., DB servers would log database failure messages), and this simply doesn't happen.
3. The timing of the messages is completely random and unrelated to load. In fact, they have happened during off-peak periods more often than during peak load.
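Item 1 is most informative when the probe runs continuously from the Zenoss server itself, so its log can be lined up against the timeout events. A minimal sketch of such a probe, assuming a plain TCP connect to each service port is a fair stand-in for reachability (the hypothetical `tcp_probe`/`monitor` helpers, the 5-second timeout, and the example hosts and ports are placeholders, not Zenoss settings):

```python
import socket
import time
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional, Tuple


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connect fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers refused connections and socket timeouts
        return None


def monitor(targets: Iterable[Tuple[str, int]], interval: float = 10.0,
            iterations: Optional[int] = None,
            out: Callable[[str], None] = print) -> None:
    """Probe each (host, port) every `interval` seconds and emit one
    UTC-timestamped line per probe. Runs forever when iterations is None."""
    targets = list(targets)
    n = 0
    while iterations is None or n < iterations:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        for host, port in targets:
            rtt = tcp_probe(host, port)
            status = "FAILED" if rtt is None else f"{rtt * 1000:.1f} ms"
            out(f"{stamp} {host}:{port} {status}")
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical targets: replace with the hosts and service ports Zenoss polls.
    monitor([("db1.example.com", 3306), ("web1.example.com", 80)], interval=10.0)
```

Each line carries a UTC timestamp, so when comparing against the event console, check which time zone the events are displayed in. A burst of Zenoss timeouts with no matching FAILED lines or latency jump in this log would point further toward the collector side.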
In short, it looks like this "timeout" is occurring inside Zenoss itself and is not actually a problem with the remote service. Some sort of internal locking?
1. This started about two weeks ago, and before that there had been no changes to the system for many months, so it isn't related to a change.
2. This server does NOTHING except run Zenoss. It has no cron scripts unrelated to Zenoss, etc.
3. Zenoss and SAR monitoring of the system show no resource consumption issues: plenty of free memory, CPU, etc.
Environment:
  CentOS 5
  Zenoss Stack 3.2.1
  24 GB main memory
  16 cores
--------------------------------------------------------------
Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/67736#67736]
Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]