Discussion:
Help with Linux thresholds
Troy Clifton
2012-01-10 16:03:21 UTC
Troy Clifton [http://community.zenoss.org/people/tclifton] created the discussion

"Help with Linux thresholds"

To view the discussion, visit: http://community.zenoss.org/message/63638#63638

--------------------------------------------------------------
First, I'm a newbie and still trying to figure this out.  Second, I'm trying to set up some custom thresholds on typical SSH/Linux base monitoring, but I'm not really sure what the formulas should be, and I can't find documentation to point me in the right direction.
Any suggestions?

      •     Disk alerts for each volume on each server  (I get this one from the default thresholds)
            o     Warning – disk 85% full
            o     Critical – disk 95% full
      •     CPU Alerts
            o     Warning – CPU at 85% for 5 minutes
            o     Critical – CPU at 95% for 5 minutes
      •     Memory alerts
            o     Warning - <250 MB available
            o     Critical - <100 MB available
      •     Swap Space
            o     Warning - swap >1500 MB
            o     Critical - swap >3000 MB

Any suggestions on how to proceed with these threshold calculations?
Thanks,
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/63638#63638]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
jmp242
2012-01-10 16:36:12 UTC
jmp242 [http://community.zenoss.org/people/jmp242] created the discussion

"Re: Help with Linux thresholds"

To view the discussion, visit: http://community.zenoss.org/message/63639#63639

--------------------------------------------------------------
Do you want the thresholds for all filesystems monitored by Zenoss, or do you want it per device (per filesystem is sort of possible with transforms, but somewhat tricky for a newbie)...

--
James Pulver
ZCA Member
LEPP Computer Group
Cornell University
--------------------------------------------------------------

Troy Clifton
2012-01-10 16:38:22 UTC
Troy Clifton [http://community.zenoss.org/people/tclifton] created the discussion

"Re: Help with Linux thresholds"

To view the discussion, visit: http://community.zenoss.org/message/63640#63640

--------------------------------------------------------------
I am trying to create a new base monitoring template: I copied the Device/Server/SSH/Linux Device template and am creating thresholds on the copy.  Then I can apply that template to a sub-device class under the Server/SSH/Linux device class, and it will be applied to each server in that class.
I need thresholds so I can send email alerts out as well as alerts to the console.
--------------------------------------------------------------

jmp242
2012-01-10 16:45:35 UTC
jmp242 [http://community.zenoss.org/people/jmp242] created the discussion

"Re: Help with Linux thresholds"

To view the discussion, visit: http://community.zenoss.org/message/63656#63656

--------------------------------------------------------------
OK. I don't believe the filesystem thresholds are defined in the device templates; they're in /Filesystem. (Though this is my upgraded system: I don't have /Device/Server/SSH in the perf templates, though it is in the modeler plugins and is a Device Class...)

My guess is you'll be using standard device perf templates unless you got a ZenPack to provide SSH-based ones... In any event, you'll want to modify the perf template. But BE CAREFUL: don't modify the original one for testing; make a copy (override) so you don't break anything horribly and have to reinstall.

--
James Pulver
ZCA Member
LEPP Computer Group
Cornell University
--------------------------------------------------------------

Troy Clifton
2012-01-10 16:58:41 UTC
Troy Clifton [http://community.zenoss.org/people/tclifton] created the discussion

"Re: Help with Linux thresholds"

To view the discussion, visit: http://community.zenoss.org/message/63645#63645

--------------------------------------------------------------
Hi James,
I agree not to modify any default templates.  I have actually copied the default device template that monitors CPU, IO, memory, uptime/load into a new one.  I am trying to create the thresholds off those values, and that is where my problem presented itself.  I can't figure out how to configure these calculations and can't find any documentation on what they should be.
For example, in my screenshot, how do I make sure I choose the correct datapoint and, even more important, since the values are not typical numbers, how do you calculate real minimum and maximum values for the boxes?

[Inline screenshot: cid:***@01CCCF86.BEB768C0]

Troy Clifton
Expedia Inc. | GCS Monitoring and Tools
(c) 512.968.2570 | ***@expedia.com
--------------------------------------------------------------

jmp242
2012-01-10 20:37:15 UTC
jmp242 [http://community.zenoss.org/people/jmp242] created the discussion

"Re: Help with Linux thresholds"

To view the discussion, visit: http://community.zenoss.org/message/63661#63661

--------------------------------------------------------------
I think, for Linux, you could look at the existing threshold for ssCpuRawIdle_ssCpuRawIdle and see that the minimum value is 2. This leads to an alert roughly equivalent to CPU utilization at 98 percent. For some of the others you have to do a calculation. I'd try searching the forums to see if these are available anywhere while I try to get some additional information for you.

--
James Pulver
ZCA Member
LEPP Computer Group
Cornell University
--------------------------------------------------------------

dhopp
2012-01-12 01:56:08 UTC
dhopp [http://community.zenoss.org/people/dhopp] created the discussion

"Re: Help with Linux thresholds"

To view the discussion, visit: http://community.zenoss.org/message/63690#63690

--------------------------------------------------------------
You have to understand what some of these values are.  For example, with net-snmp the values labeled 'Raw' are actual timeticks and not percentages.  The default template uses a DERIVE value, which will generate a percent, but that percent can be above 100% since it scales with the number of processors in use (so if you have a 4-core box the max it could be is 400%).

It gets tricky because if you want to know when CPU is 95% utilized (with 100% being everything maxed out), the threshold needs to know how many cores the server has.
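
A minimal sketch of that arithmetic, assuming the derived rate really does top out at 100% per core as described above. The function names are hypothetical helpers for working out numbers by hand, not part of Zenoss or net-snmp.

```python
def normalized_cpu_percent(raw_percent, cores):
    """Scale a percentage that can reach 100 * cores down to a 0-100 range."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return raw_percent / float(cores)


def raw_threshold_for(target_percent, cores):
    """Return the un-normalized value that corresponds to a 0-100 target.

    Assumption: the raw value maxes out at 100 * cores.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return target_percent * cores


if __name__ == "__main__":
    # On a 4-core box, "95% of everything" is 380 on the 0-400 scale.
    print(raw_threshold_for(95, 4))
    print(normalized_cpu_percent(380, 4))
```

The catch is the one already mentioned: the core count differs per server, so a single fixed number in a shared template only fits boxes with the same number of cores.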

What the default template does is use ssCpuIdle (which is a value between 0% and 100%) and check whether it is less than 2, which essentially means 98% utilized.  For a newbie, I would play with this number, as it is the easiest to work with.
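
As a rough illustration of how an idle minimum maps to a utilization target, assuming ssCpuIdle really is a 0-100 percentage. The helper name is hypothetical, and the 85/95 targets are the ones asked for at the top of this thread.

```python
def idle_minimum_for(utilization_percent):
    """Return the minimum idle value equivalent to a utilization ceiling.

    Assumption: idle and utilization are both 0-100 and sum to 100.
    """
    if not 0 <= utilization_percent <= 100:
        raise ValueError("utilization must be between 0 and 100")
    return 100 - utilization_percent


if __name__ == "__main__":
    for target in (85, 95, 98):
        # 98 gives the minimum of 2 used by the default threshold.
        print(target, idle_minimum_for(target))
```

So an 85% warning would be a threshold with a minimum of 15 on the idle datapoint, and a 95% critical would be a minimum of 5. This does not cover the "for 5 minutes" part of the requirement.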

As for memory, again you have to think about how the Linux kernel works.  Memory on a Linux server will almost always be >90% utilized because of the filesystem cache.  This is not a bad thing, and with newer kernels you could even see swap utilized when you don't think it needs to be (without tuning kernel parameters).  I would probably check whether swap is greater than 50% utilized and then 75% or so.
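
A small sketch of the swap percentage check, assuming you can get total and free swap in the same unit (kB here). The function names are hypothetical, and treating 50% as a warning and 75% as critical is just one reading of the numbers suggested above.

```python
def swap_used_percent(total_kb, free_kb):
    """Percentage of swap in use; 0.0 when no swap is configured."""
    if total_kb <= 0:
        return 0.0
    return 100.0 * (total_kb - free_kb) / total_kb


def swap_severity(total_kb, free_kb, warn=50.0, crit=75.0):
    """Classify swap usage against assumed warning/critical percentages."""
    used = swap_used_percent(total_kb, free_kb)
    if used > crit:
        return "critical"
    if used > warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # 4,000,000 kB of swap with 1,500,000 kB free is 62.5% used.
    print(swap_used_percent(4000000, 1500000))
    print(swap_severity(4000000, 1500000))
```

A percentage has the advantage over fixed numbers like 1500 MB that the same threshold works on servers with different amounts of swap.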

--Dennis
--------------------------------------------------------------
