Discussion:
Problems with heartbeat failure
elreah
2013-07-01 08:53:11 UTC
elreah [http://community.zenoss.org/people/elreah] created the discussion

"Problems with heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/73791#73791

--------------------------------------------------------------
Hello

We are using Zenoss only as an event management tool. My only problem is that I get a lot of heartbeat failures.
Every 2-3 hours I get a failure for zenmodeler, zenhub, zeneventd, zenactiond and zenstatus.

We use Zenoss 4.2.0 on CentOS 6.3.

I already tried clearing the heartbeats, but it didn't change anything.

I noticed that messages are sometimes queued in RabbitMQ (zenevents and rawevents). Maybe this has something to do with the heartbeat events?

I really hope someone can help me.

/elreah
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/73791#73791]

Start a new discussion in zenoss-users at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
hydruid
2013-07-03 14:15:26 UTC
hydruid [http://community.zenoss.org/people/hydruid] created the discussion

"Re: Problems with heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/73834#73834

--------------------------------------------------------------
What device are you getting heartbeat failures for? If it's for the Zenoss server itself (e.g. 127.0.0.1 or localhost), then delete that device. If you've already deleted the Zenoss server device, then add it back.
--------------------------------------------------------------

elreah
2013-07-04 08:22:42 UTC
elreah [http://community.zenoss.org/people/elreah] created the discussion

"Re: Problems with heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/73820#73820

--------------------------------------------------------------
Yes, they are all for localhost.
What do you mean by "delete that device"? Where can I delete this device?
--------------------------------------------------------------

hydruid
2013-07-04 11:43:09 UTC
hydruid [http://community.zenoss.org/people/hydruid] created the discussion

"Re: Problems with heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/73821#73821

--------------------------------------------------------------
Check to see if localhost or 127.0.0.1 is listed as a device under
Infrastructure, possibly under Server/Linux.

The solution is:
A.  If it exists under Infrastructure as a device, delete it.
B.  If it doesn't exist under Infrastructure as a device, add it.

Make sense?
--------------------------------------------------------------

elreah
2013-07-05 14:30:57 UTC
elreah [http://community.zenoss.org/people/elreah] created the discussion

"Re: Problems with heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/73825#73825

--------------------------------------------------------------
OK, I first deleted it and cleared heartbeats. --> still heartbeat failures
Then I added it back. --> still heartbeat failures

I noticed that most failures are from zensyslog.
--------------------------------------------------------------

hydruid
2013-07-05 14:39:40 UTC
hydruid [http://community.zenoss.org/people/hydruid] created the discussion

"Re: Problems with heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/73841#73841

--------------------------------------------------------------
Are the heartbeats for 127.0.0.1 or localhost? If I remember correctly
(I didn't use 4.2.0 long before going to 4.2.3), it's looking for localhost.
Make sure that when you add it back you use the "name" it's looking for
(127.0.0.1 vs. localhost), clear the heartbeat events, and then restart zenoss!
--------------------------------------------------------------

mwcotton
2013-07-06 19:15:53 UTC
mwcotton [http://community.zenoss.org/people/mwcotton] created the discussion

"Re: Problems with heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/73859#73859

--------------------------------------------------------------
In earlier versions of Zenoss, when I received heartbeat failures associated with a specific daemon, I found that the rate of incoming events was overwhelming the ability of the Zenoss system to process them. There were two approaches to addressing this issue:
1. Get a more powerful Zenoss server with faster I/O.
2. Reduce the quantity of incoming events.

In your case, since the heartbeat errors are from zensyslog, I would suggest forwarding the syslog messages from your monitored devices to a syslog-ng install for "pre-filtering". Use this point to drop messages you don't care about, so the Zenoss server will not have to do it.

Note: as I said, this was for earlier versions. I have no experience with these fancy new versions, but once I understood the Zenoss heartbeat mechanism, it actually turned out to be my friend.
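
To make the "pre-filtering" idea concrete, here is a minimal Python sketch of dropping low-priority syslog messages before they reach the Zenoss server. It is only an illustration: in practice the filter would live in the syslog-ng configuration, and the function names and the default threshold below are hypothetical. The one fixed fact it relies on is the standard syslog PRI header, where the number in `<...>` is facility * 8 + severity.

```python
import re

# Standard syslog PRI header, e.g. "<34>". PRI = facility * 8 + severity.
_PRI = re.compile(r"^<(\d{1,3})>")


def severity(line):
    """Return the syslog severity (0 = emergency .. 7 = debug), or None if no PRI header."""
    match = _PRI.match(line)
    if match is None:
        return None
    return int(match.group(1)) % 8


def prefilter(lines, max_severity=4):
    """Yield only messages worth forwarding (hypothetical helper).

    Keeps messages whose severity number is <= max_severity (4 = warning),
    and keeps lines without a PRI header rather than guessing about them.
    """
    for line in lines:
        sev = severity(line)
        if sev is None or sev <= max_severity:
            yield line


if __name__ == "__main__":
    sample = [
        "<34>Oct 11 22:14:15 host su: 'su root' failed",  # severity 2, kept
        "<191>Oct 11 22:14:16 host app: debug noise",     # severity 7, dropped
    ]
    for kept in prefilter(sample):
        print(kept)
```

Everything that is dropped at this stage is an event that zensyslog never has to parse, which is the point of the suggestion above.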
--------------------------------------------------------------

hydruid
2013-07-06 19:24:41 UTC
hydruid [http://community.zenoss.org/people/hydruid] created the discussion

"Re: Problems with heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/73850#73850

--------------------------------------------------------------
Perhaps it would be best to enable the Watchdog option for that daemon in
case it gets overloaded or dies for some reason. I had 2 heartbeat failures
on a fresh install that wasn't monitoring anything, and cleared them up
by starting the daemons!
--------------------------------------------------------------

mwcotton
2013-07-07 00:15:57 UTC
mwcotton [http://community.zenoss.org/people/mwcotton] created the discussion

"Re: Problems with heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/73861#73861

--------------------------------------------------------------
Starting the daemons to clear their heartbeat failures makes sense. The way it works is that the daemon regularly creates a special type of event (I will call it the "pre heartbeat" event). If another event of the same type doesn't come along within a certain time period, a heartbeat event for the daemon is created.

So if the daemon ran at one time for a short while, it would have created this special pre heartbeat event. If the daemon was then shut down or died, the pre heartbeat event would time out because another one was not created, and the system would create the actual heartbeat event. When you restart the daemon, the pre heartbeat event is updated and the heartbeat event is moved to history. I don't know what this "pre heartbeat" event is actually called; I just named it "pre heartbeat".

If you have a heartbeat event for daemon XX and you don't need and never plan to run the XX daemon, you can move the heartbeat event to history and it will not reappear. The heartbeat mechanism is actually a very cool feature of Zenoss, and I don't understand why it doesn't get more love.
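
The timeout logic described above can be sketched in a few lines of Python. This is an illustrative model only, not Zenoss code: the class name, method names, and the 180-second timeout in the example are all assumptions made for the sketch.

```python
import time


class HeartbeatMonitor:
    """Toy model: remember each daemon's last heartbeat and report overdue ones."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # (device, daemon) -> (time last seen, timeout in seconds)
        self._records = {}

    def beat(self, device, daemon, timeout):
        """Record a heartbeat; another one is expected within `timeout` seconds."""
        self._records[(device, daemon)] = (self._clock(), timeout)

    def failures(self):
        """Return the sorted (device, daemon) pairs whose heartbeat is overdue."""
        now = self._clock()
        return sorted(
            key
            for key, (seen, timeout) in self._records.items()
            if now - seen > timeout
        )

    def forget(self, device, daemon):
        """Drop the record for a daemon that will never run again."""
        self._records.pop((device, daemon), None)


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example runs instantly
    monitor = HeartbeatMonitor(clock=lambda: now[0])
    monitor.beat("localhost", "zensyslog", timeout=180)
    now[0] = 200.0
    print(monitor.failures())  # zensyslog is overdue
    monitor.beat("localhost", "zensyslog", timeout=180)
    print(monitor.failures())  # a fresh heartbeat clears the failure
```

In this model, a daemon that ran once and then stopped stays in the failure list until it either sends a new heartbeat (the restart case) or its record is dropped with `forget`, which loosely corresponds to moving the heartbeat event to history for a daemon you never intend to run.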
--------------------------------------------------------------
