Discussion:
Zenoss 4.2 Core - Worklist not clearing
mfallone
2012-11-20 19:09:11 UTC
mfallone [http://community.zenoss.org/people/mfallone] created the discussion

To view the discussion, visit: http://community.zenoss.org/message/70027#70027

--------------------------------------------------------------
Greetings, I am running Zenoss Core 4.2 on CentOS 6 x64 with 16GB of RAM and had been monitoring ~15 servers successfully for over a month.  After I added more servers (~300), I started having problems adding further devices and modeling my existing ones.


- When manually modeling through the UI, I see "Zenhub has connected", but it then hangs for over a minute.
- When viewing the logs, I see "2012-11-20 13:36:48,698 WARNING zen.zensyslog: No service named 'EventService': ZenHub may be disconnected" (zenoss status shows zenhub up and I can netcat to localhost:8789).
- Enabling debugging on zenhub shows that the worklist grows and then holds at around 50 items.  I do not see anything in the 'Jobs' area of the Zenoss UI.
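
For anyone wanting to repeat the netcat check from a script, here is a minimal sketch, assuming Python 3 and only the standard library. The function name port_is_open is made up for illustration; 8789 is the ZenHub port mentioned above. A successful connect only shows that something is listening on the port, not that ZenHub is actually servicing requests, which matches what I am seeing.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: roughly equivalent to a bare netcat connect test.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


if __name__ == "__main__":
    # 8789 is the ZenHub port from the post; adjust if yours differs.
    print("zenhub port open:", port_is_open("localhost", 8789))
```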


Is there any way to clean out the worklist?  I enabled debugging in zenjobs and see the following:


2012-11-20 13:59:25,946 ERROR celery.apps.worker:
Mediator
=================================================
  File "/opt/zenoss/lib/python2.7/threading.py", line 525, in __bootstrap
    self.__bootstrap_inner()
  File "/opt/zenoss/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/opt/zenoss/lib/python/celery/utils/threads.py", line 51, in run
    self.body()
  File "/opt/zenoss/lib/python/celery/worker/mediator.py", line 69, in body
    return
  File "/opt/zenoss/lib/python/celery/worker/buckets.py", line 142, in get
    not_empty.wait(timeout)
  File "/opt/zenoss/lib/python2.7/threading.py", line 263, in wait
    _sleep(delay)
=================================================
LOCAL VARIABLES
=================================================
{'delay': 0.03471708297729492,
'endtime': 1353437965.827212,
'gotit': False,
'remaining': -9.989738464355469e-05,
'saved_state': None,
'self': <Condition(<thread.lock object at 0x5779cf0>, 1)>,
'timeout': 1.0,
'waiter': <thread.lock object at 0x57797d0>}




Thread-7
=================================================
  File "/opt/zenoss/lib/python2.7/threading.py", line 525, in __bootstrap
    self.__bootstrap_inner()
  File "/opt/zenoss/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/opt/zenoss/lib/python/billiard/pool.py", line 274, in run
    return self.body()
  File "/opt/zenoss/lib/python/billiard/pool.py", line 499, in body


Thread-5
=================================================
  File "/opt/zenoss/lib/python2.7/threading.py", line 525, in __bootstrap
    self.__bootstrap_inner()
  File "/opt/zenoss/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/opt/zenoss/lib/python/billiard/pool.py", line 274, in run
    return self.body()
  File "/opt/zenoss/lib/python/billiard/pool.py", line 300, in body
    time.sleep(0.8)
=================================================
LOCAL VARIABLES
=================================================
{'self': <Supervisor(Thread-5, started daemon 139794026075904)>}




Thread-6
=================================================
  File "/opt/zenoss/lib/python2.7/threading.py", line 525, in __bootstrap
    self.__bootstrap_inner()
  File "/opt/zenoss/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/opt/zenoss/lib/python/billiard/pool.py", line 274, in run
    return self.body()
  File "/opt/zenoss/lib/python/billiard/pool.py", line 319, in body
    for taskseq, set_length in iter(taskqueue.get, None):
  File "/opt/zenoss/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/opt/zenoss/lib/python2.7/threading.py", line 244, in wait
    waiter.acquire()
=================================================
LOCAL VARIABLES
=================================================
{'saved_state': None,
'self': <Condition(<thread.lock object at 0x5779930>, 1)>,
'timeout': None,
'waiter': <thread.lock object at 0x57797f0>}
MainThread
=================================================
  File "/opt/zenoss/Products/Jobber/zenjobs.py", line 118, in <module>
    zj.run()
  File "/opt/zenoss/Products/Jobber/zenjobs.py", line 63, in run
    return self.celery.Worker(**kwargs).run()
  File "/opt/zenoss/lib/python/celery/apps/worker.py", line 140, in run
    self.run_worker()
  File "/opt/zenoss/lib/python/celery/apps/worker.py", line 222, in run_worker
    worker.start()
  File "/opt/zenoss/lib/python/celery/worker/__init__.py", line 238, in start
    component.start()
  File "/opt/zenoss/lib/python/celery/worker/consumer.py", line 350, in start
    self.consume_messages()
  File "/opt/zenoss/lib/python/celery/worker/consumer.py", line 364, in consume_messages
    self.connection.drain_events(timeout=1)
  File "/opt/zenoss/lib/python/kombu/connection.py", line 167, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 261, in drain_events
    return connection.drain_events(**kwargs)
  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 93, in drain_events
    return self.wait_multi(self.channels.values(), timeout=timeout)
  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 99, in wait_multi
    chanmap.keys(), allowed_methods, timeout=timeout)
  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 158, in _wait_multiple
    channel, method_sig, args, content = read_timeout(timeout)
  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 131, in read_timeout
    return self.method_reader.read_method()
  File "/opt/zenoss/lib/python/amqplib/client_0_8/method_framing.py", line 218, in read_method
    self._next_method()
  File "/opt/zenoss/lib/python/amqplib/client_0_8/method_framing.py", line 133, in _next_method
    frame_type, channel, payload = self.source.read_frame()
  File "/opt/zenoss/lib/python/amqplib/client_0_8/transport.py", line 149, in read_frame
    frame_type, channel, size = unpack('>BHI', self._read(7))
  File "/opt/zenoss/lib/python/amqplib/client_0_8/transport.py", line 261, in _read
    s = self.sock.recv(65536)
  File "/opt/zenoss/lib/python/celery/apps/worker.py", line 309, in cry_handler
    logger.error("\n" + cry())
  File "/opt/zenoss/lib/python/celery/utils/__init__.py", line 145, in cry
    traceback.print_stack(frame, file=out)
=================================================
LOCAL VARIABLES
=================================================
{'frame': <frame object at 0x7f243c003800>,
'main_thread': None,
'out': <StringIO.StringIO instance at 0x4e15638>,
'sep': '=================================================\n',
't': <TaskHandler(Thread-6, started daemon 139794015586048)>,
'thread': <_MainThread(MainThread, started 139794333112064)>,
'tid': 139794333112064,
'tmap': {139793927235328: <Mediator(Mediator, started daemon 139793927235328)>,
          139793937725184: <ResultHandler(Thread-7, started daemon 139793937725184)>,
          139794015586048: <TaskHandler(Thread-6, started daemon 139794015586048)>,
          139794026075904: <Supervisor(Thread-5, started daemon 139794026075904)>,
          139794333112064: <_MainThread(MainThread, started 139794333112064)>}}




Debug output in zenhub.log:
2012-11-20 13:40:51,280 INFO zen: Setting logging level to DEBUG
2012-11-20 13:40:51,281 INFO zen.zenoss.protocols.amqp: error closing publisher [Errno 4] Interrupted system call
2012-11-20 13:40:51,318 DEBUG zen.Events: ===============  incoming event  ===============
2012-11-20 13:40:51,318 DEBUG zen.Events: Got a localhost zenhub heartbeat event (timeout 90 sec).
2012-11-20 13:40:51,318 DEBUG zen.zenoss.protocols.amqp: Publishing with routing key zenoss.heartbeat.localhost to exchange zenoss.heartbeats
2012-11-20 13:40:51,343 INFO zen.ZenHub: Worker (2329) reports 2012-11-20 13:26:28,145 INFO zen.pbclientfactory: Initial connect timed out after 30 seconds
2012-11-20 13:40:51,343 INFO zen.ZenHub: Worker (2331) reports 2012-11-20 13:26:28,304 INFO zen.pbclientfactory: Initial connect timed out after 30 seconds
2012-11-20 13:40:54,154 DEBUG zen.hub: adding listener for localhost:EventService
2012-11-20 13:40:54,157 DEBUG zen.hub: adding listener for localhost:ZenStatusConfig
2012-11-20 13:40:54,180 DEBUG zen.ZenHub: worklist has 1 items
2012-11-20 13:40:54,180 DEBUG zen.ZenHub: get candidate workers for sendEvents...
2012-11-20 13:40:54,180 DEBUG zen.ZenHub: candidate workers are [0, 1]
2012-11-20 13:40:54,180 DEBUG zen.ZenHub: Giving sendEvents to worker 0, (localhost:Products.ZenHub.services.EventService.sendEvents)
2012-11-20 13:40:54,181 DEBUG zen.ZenHub: worklist has 1 items
2012-11-20 13:40:54,181 DEBUG zen.ZenHub: get candidate workers for getDevicePingIssues...
2012-11-20 13:40:54,181 DEBUG zen.ZenHub: candidate workers are [1]
2012-11-20 13:40:54,181 DEBUG zen.ZenHub: Giving getDevicePingIssues to worker 1, (localhost:Products.ZenHub.services.EventService.getDevicePingIssues)
2012-11-20 13:40:54,217 DEBUG zen.ZenHub: worklist has 1 items
2012-11-20 13:40:54,218 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:40:54,232 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.EventService.getDevicePingIssues finished in 0.0501899719238
2012-11-20 13:40:54,232 DEBUG zen.ZenHub: worklist has 1 items
2012-11-20 13:40:54,232 DEBUG zen.ZenHub: get candidate workers for getConfigProperties...
2012-11-20 13:40:54,232 DEBUG zen.ZenHub: candidate workers are [1]
2012-11-20 13:40:54,232 DEBUG zen.ZenHub: Giving getConfigProperties to worker 1, (localhost:Products.ZenHub.services.ZenStatusConfig.getConfigProperties)
2012-11-20 13:40:54,238 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.ZenStatusConfig.getConfigProperties finished in 0.00522994995117
2012-11-20 13:40:54,274 DEBUG zen.ZenHub: worklist has 1 items
2012-11-20 13:40:54,275 DEBUG zen.ZenHub: get candidate workers for getThresholdClasses...
2012-11-20 13:40:54,275 DEBUG zen.ZenHub: candidate workers are [1]
2012-11-20 13:40:54,275 DEBUG zen.ZenHub: Giving getThresholdClasses to worker 1, (localhost:Products.ZenHub.services.ZenStatusConfig.getThresholdClasses)
2012-11-20 13:40:54,283 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.ZenStatusConfig.getThresholdClasses finished in 0.00764012336731
2012-11-20 13:40:54,285 DEBUG zen.ZenHub: worklist has 1 items
2012-11-20 13:40:54,285 DEBUG zen.ZenHub: get candidate workers for getCollectorThresholds...
2012-11-20 13:40:54,285 DEBUG zen.ZenHub: candidate workers are [1]
2012-11-20 13:40:54,285 DEBUG zen.ZenHub: Giving getCollectorThresholds to worker 1, (localhost:Products.ZenHub.services.ZenStatusConfig.getCollectorThresholds)
2012-11-20 13:40:54,333 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.ZenStatusConfig.getCollectorThresholds finished in 0.047210931778
2012-11-20 13:40:54,336 DEBUG zen.ZenHub: worklist has 1 items
2012-11-20 13:40:54,336 DEBUG zen.ZenHub: get candidate workers for getDeviceConfigs...
2012-11-20 13:40:54,336 DEBUG zen.ZenHub: candidate workers are [1]
2012-11-20 13:40:54,336 DEBUG zen.ZenHub: Giving getDeviceConfigs to worker 1, (localhost:Products.ZenHub.services.ZenStatusConfig.getDeviceConfigs)
2012-11-20 13:40:56,124 DEBUG zen.hub: adding listener for localhost:EventService
2012-11-20 13:40:56,127 DEBUG zen.hub: adding listener for localhost:ProcessConfig
2012-11-20 13:40:56,151 DEBUG zen.ZenHub: worklist has 1 items
2012-11-20 13:40:56,151 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:40:56,151 DEBUG zen.ZenHub: worklist has 2 items
2012-11-20 13:40:56,152 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:40:56,190 DEBUG zen.ZenHub: worklist has 3 items
2012-11-20 13:40:56,190 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:40:56,283 DEBUG zen.ZenHub: worklist has 3 items
2012-11-20 13:40:56,283 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:40:56,785 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.ZenStatusConfig.getDeviceConfigs finished in 2.44905090332
2012-11-20 13:40:56,786 DEBUG zen.ZenHub: worklist has 3 items
2012-11-20 13:40:56,786 DEBUG zen.ZenHub: get candidate workers for getDevicePingIssues...
2012-11-20 13:40:56,786 DEBUG zen.ZenHub: candidate workers are [1]
2012-11-20 13:40:56,837 DEBUG zen.ZenHub: Giving sendEvents to worker 1, (localhost:Products.ZenHub.services.EventService.sendEvents)
2012-11-20 13:40:56,837 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:40:57,980 DEBUG zen.ZenHub: worklist has 2 items
2012-11-20 13:40:57,980 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:41:01,284 DEBUG zen.ZenHub: worklist has 2 items
2012-11-20 13:41:01,285 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:41:06,285 DEBUG zen.ZenHub: worklist has 2 items
2012-11-20 13:41:06,285 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:41:11,286 DEBUG zen.ZenHub: worklist has 2 items
2012-11-20 13:41:11,286 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:41:16,287 DEBUG zen.ZenHub: worklist has 2 items
2012-11-20 13:41:16,287 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:41:21,287 DEBUG zen.ZenHub: worklist has 2 items
2012-11-20 13:41:21,287 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:41:21,319 DEBUG zen.Events: ===============  incoming event  ===============
2012-11-20 13:41:21,319 DEBUG zen.Events: Got a localhost zenhub heartbeat event (timeout 90 sec).
2012-11-20 13:41:21,319 DEBUG zen.zenoss.protocols.amqp: Publishing with routing key zenoss.heartbeat.localhost to exchange zenoss.heartbeats
2012-11-20 13:41:22,753 DEBUG zen.hub: adding listener for localhost:EventService
2012-11-20 13:41:22,756 DEBUG zen.hub: adding listener for localhost:EventLogConfig
2012-11-20 13:41:22,780 DEBUG zen.ZenHub: worklist has 3 items
2012-11-20 13:41:22,780 DEBUG zen.ZenHub: all workers are busy
2012-11-20 13:41:22,780 DEBUG zen.ZenHub: worklist has 4 items
2012-11-20 13:41:22,780 DEBUG zen.ZenHub: all workers are busy
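
To see how the worklist depth changes over a longer stretch than the excerpt above, the "worklist has N items" lines can be pulled out of the log with a short script. This is only a sketch, assuming Python 3 and the exact DEBUG line format shown above. The names worklist_depths and summarize are made up for illustration, and the log path in the usage comment is an assumption that may differ on your install.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2012-11-20 13:40:54,180 DEBUG zen.ZenHub: worklist has 1 items
WORKLIST_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) DEBUG zen\.ZenHub: "
    r"worklist has (\d+) items"
)


def worklist_depths(lines):
    """Return a list of (timestamp, depth) for each worklist line found."""
    samples = []
    for line in lines:
        match = WORKLIST_RE.match(line)
        if match is None:
            continue  # not a worklist line; skip it
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        # The log prints milliseconds after the comma.
        stamp = stamp.replace(microsecond=int(match.group(2)) * 1000)
        samples.append((stamp, int(match.group(3))))
    return samples


def summarize(samples):
    """Return basic statistics for the samples, or None if there are none."""
    if not samples:
        return None
    depths = [depth for _, depth in samples]
    return {
        "count": len(depths),
        "first": samples[0][0],
        "last": samples[-1][0],
        "min": min(depths),
        "max": max(depths),
    }


# Usage (the path is an assumption, adjust to where your zenhub.log lives):
# with open("/opt/zenoss/log/zenhub.log") as handle:
#     print(summarize(worklist_depths(handle)))
```

A depth that keeps climbing while "all workers are busy" repeats would suggest the workers are stuck rather than merely slow, but the script itself only reports the numbers.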

Thanks,
/mike
--------------------------------------------------------------
