Discussion:
Has anyone else seen random bursts of query timeouts?
Jo Rhett
2012-08-07 23:01:59 UTC
Jo Rhett [http://community.zenoss.org/people/jorhett] created the discussion

"Has anyone else seen random bursts of query timeouts?"

To view the discussion, visit: http://community.zenoss.org/message/67736#67736

--------------------------------------------------------------
I've been chasing this for about two weeks now and coming up dry. Has anyone else seen the following problem? Any idea where to look?

About once every other day, we get a spattering of query timeouts. Pick any random ten or twenty queries, and they all time out. The event logs indicate that they were unable to get a response in time. By the time we get the alerts and log in, the problem has long since cleared.

Obvious troubleshooting:

1. I've set up constant ping and tcp monitoring between some of the affected systems and proved that there was no networking outage when the timeouts occurred.

2. Many of the services that reported failures would, had they really failed, cause large primary failures that other systems would notice (e.g. DB servers would write DB failure messages to the logs), and this simply doesn't occur.

3. The timing of the messages is completely random and unrelated to load. In fact, they have happened during off-peak periods more often than during peak load.
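
The TCP monitoring described in step 1 could be sketched roughly as follows. This is only an illustration, not the monitoring actually used in this thread: the function names, the one-second interval, and the log format are all assumptions, and it targets a current Python 3 rather than the Python shipped with CentOS 5.

```python
import socket
import time
from datetime import datetime, timezone


def probe_tcp(host, port, timeout=2.0):
    """Attempt one TCP connect; return (succeeded, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        ok = False
    return ok, time.monotonic() - start


def format_result(host, port, ok, elapsed, now=None):
    """Build one timestamped log line so gaps can be matched to alert times."""
    now = now or datetime.now(timezone.utc)
    status = "ok" if ok else "FAIL"
    return f"{now:%Y-%m-%d %H:%M:%S} UTC {host}:{port} {status} {elapsed:.3f}s"


def monitor(host, port, interval=1.0, count=None):
    """Probe repeatedly; count=None runs until interrupted."""
    n = 0
    while count is None or n < count:
        ok, elapsed = probe_tcp(host, port)
        print(format_result(host, port, ok, elapsed), flush=True)
        n += 1
        time.sleep(interval)
```

Left running against an affected host (for example `monitor("db1.example.com", 3306)`, a hypothetical target), the log can be compared against the timestamps of the timeout events to confirm or rule out a network gap.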

In short, we've isolated that this "timeout" seems to be occurring inside Zenoss itself, and is not actually a problem with the remote service. Some sort of internal locking?

1. This started about two weeks ago, and there had been zero other changes to the system for many months. Not related to a change.

2. This server does NOTHING except run Zenoss. It has no cron scripts unrelated to Zenoss, etc.

3. Zenoss and SAR monitoring of the system indicate no resource consumption issues -- plenty of free memory, cpu, etc.

Environment:
   CentOS 5
   Zenoss Stack 3.2.1
   24 GB main memory
   16 cores
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/67736#67736]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
Jo Rhett
2012-08-27 18:53:12 UTC
Jo Rhett [http://community.zenoss.org/people/jorhett] created the discussion

"Re: Has anyone else seen random bursts of query timeouts?"

To view the discussion, visit: http://community.zenoss.org/message/68178#68178

--------------------------------------------------------------
Nobody has seen this before?
--------------------------------------------------------------

nilie
2012-08-28 02:41:00 UTC
nilie [http://community.zenoss.org/people/nilie] created the discussion

"Re: Has anyone else seen random bursts of query timeouts?"

To view the discussion, visit: http://community.zenoss.org/message/68184#68184

--------------------------------------------------------------
Are these SNMP queries?
--------------------------------------------------------------

Jo Rhett
2012-08-28 07:32:14 UTC
Jo Rhett [http://community.zenoss.org/people/jorhett] created the discussion

"Re: Has anyone else seen random bursts of query timeouts?"

To view the discussion, visit: http://community.zenoss.org/message/68185#68185

--------------------------------------------------------------
SNMP queries. SQL queries. Localhost commands. Everything and anything. There's no consistency in this; it appears to be "every check command run in that one minute interval".
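
One way to check the "everything in one minute" pattern is to bucket timeout events by minute and see which checks fall into each bucket. The sketch below is a rough illustration only: it assumes the events have already been exported as `(timestamp, check_name)` pairs in `YYYY-MM-DD HH:MM:SS` form, which is a hypothetical format and not something Zenoss is known to produce as-is.

```python
from collections import defaultdict
from datetime import datetime


def bursts_by_minute(events, threshold=5):
    """Return {minute: sorted check names} for minutes with >= threshold distinct checks.

    events: iterable of (timestamp_string, check_name) pairs (assumed format).
    """
    buckets = defaultdict(set)
    for ts, check in events:
        # Truncate each timestamp to the minute it falls in.
        minute = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(second=0)
        buckets[minute].add(check)
    return {
        minute: sorted(checks)
        for minute, checks in sorted(buckets.items())
        if len(checks) >= threshold
    }
```

If unrelated check types (SNMP, SQL, local commands) consistently share the same minute, that points at the scheduler or collector rather than at any one monitored service.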

The nature of it made me suspect that we were running out of memory, file handles, or something like that, but after setting up some extensive reporting I am certain that this is not what is happening. I/O doesn't spike; it actually drops during the outage period. No issues hitting file descriptor or any other limits.
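
For reference, a system-wide file handle check of this kind can be sketched on Linux by reading `/proc/sys/fs/file-nr`, which holds three numbers: allocated handles, allocated-but-unused handles, and the system maximum. This is a minimal illustration and not the reporting actually used here; the function names are made up for the example.

```python
def parse_file_nr(text):
    """Parse the three whitespace-separated counters from /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def handle_usage(text):
    """Return the fraction of the system-wide file handle limit currently in use."""
    allocated, unused, maximum = parse_file_nr(text)
    return (allocated - unused) / maximum


def sample(path="/proc/sys/fs/file-nr"):
    """Read the live counters (Linux only) and return the usage fraction."""
    with open(path) as f:
        return handle_usage(f.read())
```

Sampling this every few seconds and logging it with a timestamp shows whether handle usage climbs toward the limit around the time of a timeout burst.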

My best guess is that there is some global lock contention within Zenoss itself that we are slamming into.
--------------------------------------------------------------

rolfs
2012-08-28 21:46:18 UTC
rolfs [http://community.zenoss.org/people/rolfs] created the discussion

"Re: Has anyone else seen random bursts of query timeouts?"

To view the discussion, visit: http://community.zenoss.org/message/68232#68232

--------------------------------------------------------------
DHCP or network card issues? Is it using a static IP?
--------------------------------------------------------------

Jo Rhett
2012-08-28 22:59:31 UTC
Jo Rhett [http://community.zenoss.org/people/jorhett] created the discussion

"Re: Has anyone else seen random bursts of query timeouts?"

To view the discussion, visit: http://community.zenoss.org/message/68225#68225

--------------------------------------------------------------
Static IP. No networking problems. No packet loss to the system during these outages (left a spray running), and tests running on the local system that use no networking fail at the same time.

Whatever the problem is, it's internal to the queueing mechanism for running tests.
--------------------------------------------------------------
