Discussion:
Bottleneck discovery
Kaleb Wallace
2011-12-21 17:04:25 UTC
Kaleb Wallace [http://community.zenoss.org/people/Fortecor] created the discussion

"Bottleneck discovery"

To view the discussion, visit: http://community.zenoss.org/message/63377#63377

--------------------------------------------------------------
Hello Everyone,
  I am a new Zenoss user and am loving it so far.  I have been in IT for the past 12 years and a Net Admin for the past 7.  I have never ventured into software troubleshooting, because I never needed to: all of the networks I managed were small.  My customer base has been mom-and-pop shops with a max of 30 PCs, which are easy to troubleshoot.  In November I landed this job with a company that has over 500 nodes on its network, with a fiber backbone.  A Cisco Catalyst 6509 is the main network switch, with various other Cisco switches, all connected to each other with fiber.  This is a distribution center, and you can't walk around a corner without seeing another piece of the network.  We have 20 different VLANs and roughly 30 servers, some virtualized with VMware and others physical.  Sorry about the lengthy explanation; I am just trying to describe the underlying network, which should help head off questions later.

     So here starts my issue.  I was wondering if I can use Zenoss to pinpoint my trouble, or if I've just got to go old-fashioned and work through one node at a time.  I have really slow network speeds across a single fiber connection.  It was brought to our attention by users complaining that a shared Excel spreadsheet, used by about 20 different people, was really slow.  Our original thought was that 20 people should not be accessing a single Excel spreadsheet simultaneously.  However, this is not the problem.  I did copy tests, copying a single 17MB file from one PC/server to another, some over this single fiber connection and others not.  I received these results...

*Test 1:*
     Over a 10Gb fiber connection, the transfer of this 17MB file averaged 2 minutes.  All of these copies originated from the same computer, which we will call PC1.  I tried copying to 3 different servers: 1 virtual Windows 2k8 (server1), 1 physical 2k8 (server2), and my Linux server (linny), which sits on my desk running my personal Samba server and wiki.  From PC1 to all three servers, the transfer averaged 2 minutes.  Each server is across our warehouse (1/2 mile) from PC1, reached through a single fiber connection (1 fiber for transmit and 1 for receive) that has roughly 3 junctions in it.  I did 4 different copies of this same file and each had similar results.
     This test showed me that the transfer of a single file from location 2 (east side of the warehouse, 1/2 mile from location 1) to location 1 (west side of the warehouse) is extremely slow.

*Test 2:*
     I then ran this same "copy test" from the servers (all 3) to PC1.  The file transfer went from an average of 2 minutes to an average of 2 seconds.  Again, each test was performed 4 times.
     This test showed me that the same fiber connection is slow from location 2 to location 1, but fast and normal from location 1 to location 2.
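
As a rough sanity check on the numbers from Tests 1 and 2, the copy times can be converted to average throughput. This is only a sketch: it assumes 1 MB = 8 Mbit, ignores protocol overhead, and treats "2 minutes" and "2 seconds" as exact, which they are not.

```python
FILE_SIZE_MB = 17  # size of the test file used in the copy tests


def throughput_mbit_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput in megabits per second, assuming 1 MB = 8 Mbit."""
    return size_mb * 8 / seconds


slow = throughput_mbit_per_s(FILE_SIZE_MB, 120)  # Test 1: about 2 minutes
fast = throughput_mbit_per_s(FILE_SIZE_MB, 2)    # Test 2: about 2 seconds

print(f"slow direction: {slow:.2f} Mbit/s")
print(f"fast direction: {fast:.2f} Mbit/s")
print(f"fast/slow ratio: {fast / slow:.0f}x")
```

Under those assumptions the slow direction works out to roughly 1 Mbit/s against 68 Mbit/s the other way, a difference of about 60x over the same link.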

*Test 3:*
Now, let me throw this into the mix.  When I initiated a file transfer from server1 to server2 from PC1, the transfer averaged 2 minutes.  The average was the same for file transfers between all 3 servers when initiated from PC1.

*Test 4:*
     Next I used another PC (PC2) on the same network segment as PC1 (on a Cisco switch, not taking the path through the fiber), thus taking the fiber out of the equation, and transferred the same file from PC1 to PC2.  BAM, this 17MB file took an average of 2 seconds, and transferring back from PC2 to PC1 took the same time.
     So this test rules out the PCs and this particular network segment as the cause of the slowness.

*Test 5:*
     I then went across the warehouse to location 1 and tested from my workstation (PC3), again taking the fiber out of the equation, but this time using the network segment that the servers are on rather than the segment that the PCs are on.  The same file transferred from PC3 to all 3 servers in an average of 2 seconds, and from all 3 servers to PC3 in an average of 2 seconds.  I got the same results transferring the file from server to server, using all 3 servers and initiating the transfer from PC3 this time: an average of 2 seconds.
     This test rules out the servers and this network segment as the problem, and pinpoints the problem to the fiber.

*Test 6:*
     This test just confirms the other tests.  I set up a Linux box at location 2 and started ping floods over the fiber, which of course gave me exactly what I suspected: dots all over the screen.  Then I ping flooded a PC in the same segment and got 1 dot.  I then went to my Linux server at my desk (linny) at location 1 and ran the same ping flood through the fiber, and I was surprised to find dots all over the screen there as well.  So I have a problem somewhere in that fiber path, whether it is the fiber cable itself or the switches.
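
In a ping flood, each dot left on the screen is a request that got no reply, so the dots can be turned into a loss percentage from the summary line that ping prints at the end. The sketch below assumes an iputils-style summary line; the sample line and its numbers are made up for illustration and are not output from this network.

```python
import re


def packet_loss_percent(ping_summary: str) -> float:
    """Compute packet loss from a ping summary line (iputils-style format assumed)."""
    match = re.search(
        r"(\d+) packets transmitted, (\d+) (?:packets )?received", ping_summary
    )
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


# Hypothetical summary line; the counts are illustrative only.
sample = "10000 packets transmitted, 9400 received, 6% packet loss, time 8123ms"
loss = packet_loss_percent(sample)
print(f"{loss:.1f}% loss")
```

Running the same flood from both ends and comparing the two percentages gives a number to put next to the copy times, rather than just "dots all over the screen".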

*What I need:*
     Now, the fiber, as I have said, runs a distance of 1/2 a mile and spans the warehouse, with junctions in multiple spots.  Our warehouse is 100ft high, and the fiber is snaked through all the beams and twists and turns of the warehouse, passing through aisles and the pathways of the forklifts and motorized jacks.  That makes it difficult to just go troubleshoot the cable, so I first need to determine whether the cause is a switch or a switch configuration, and if so, which switch.  I also can't just replace equipment, because our network is so critical to getting our product out.  This is why I am hoping Zenoss can help me pinpoint the problem; however, I have no idea where to start with the software.  I have been researching this, Wireshark, and all sorts of other programs, and it seems that Zenoss could be the closest to what I need.  Does anyone have any suggestions as to how I should start monitoring for this?

Sorry for the long spiel, but I had to lay the groundwork so this thread wouldn't get really long with questions.  Thank you all for your time in reading this and for your assistance.

Kaleb Wallace
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/63377#63377]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
jmp242
2011-12-21 17:25:01 UTC
jmp242 [http://community.zenoss.org/people/jmp242] created the discussion

"Re: Bottleneck discovery"

To view the discussion, visit: http://community.zenoss.org/message/63378#63378

--------------------------------------------------------------
Zenoss, with enough work and setup, might help you notice these issues before users do, but it's not really a diagnostic tool, nor something you'll want to spend weeks setting up, configuring, and learning just to troubleshoot this issue. Having a monitoring system is a very good idea, and Zenoss is (IMO) one of the best, so consider it down the road, but it won't help with your immediate problem.

You need something like Pathview:
http://www.appneta.com/products/pathview/
This will help you find network issues and give ideas as to the problem based on the network responses. Also, you might need a Fluke:
http://www.flukenetworks.com/enterprise-network/network-testing/optiview-xg-network-analysis-tablet
that handles fiber. We use one for copper, and it can tell you if there's a break and how far along the cable it is... I don't know if it can do the same with fiber, and they aren't going to be cheap.

--
James Pulver
ZCA Member
LEPP Computer Group
Cornell University
--------------------------------------------------------------

Kaleb Wallace
2011-12-21 17:31:04 UTC
Kaleb Wallace [http://community.zenoss.org/people/Fortecor] created the discussion

"Re: Bottleneck discovery"

To view the discussion, visit: http://community.zenoss.org/message/63379#63379

--------------------------------------------------------------
Thank you so much for your reply, and such a speedy one at that.  I will look into these, but my problem is the IT staff who were here before me: I have to prove to them what the cause is, because they don't think it is the Ciscos or their fiber.  LOL, it is quite cumbersome to jump through these hoops.  Thank you so much again.

Kaleb
--------------------------------------------------------------

nilie
2011-12-21 19:02:58 UTC
nilie [http://community.zenoss.org/people/nilie] created the discussion

"Re: Bottleneck discovery"

To view the discussion, visit: http://community.zenoss.org/message/63380#63380

--------------------------------------------------------------
As far as I can see, we can reduce your problem to an asymmetric speed over your fiber link. Check along the slow path over the fiber link, looking specifically for a layer 2 problem, most likely a duplex mismatch, but also consider any other sign such as discarded packets, input/output queueing, etc.
Zenoss will not help you directly in troubleshooting this issue, but if set up correctly and used on a day-to-day basis, it surely will give you some useful hints.
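
One way to work through the layer 2 checks is to capture the interface status from each switch port along the slow path and scan it for the duplex setting and the error counters. The following is a minimal sketch, assuming text in the style of Cisco IOS "show interfaces" output; the sample excerpt, the interface name, and all counter values are made up for illustration, and the helper names are hypothetical.

```python
import re

# Hypothetical excerpt in the style of "show interfaces" output.
# The interface name and every value below are invented for this example.
SAMPLE = """\
GigabitEthernet1/1 is up, line protocol is up
  Half-duplex, 1000Mb/s
     1532 input errors, 1490 CRC, 42 frame, 0 overrun, 0 ignored
     0 output errors, 311 collisions, 2 interface resets
     0 babbles, 97 late collision, 0 deferred
"""

# Counters worth a look when a link is fast one way and slow the other.
COUNTER_PATTERNS = {
    "input_errors": r"(\d+) input errors",
    "crc": r"(\d+) CRC",
    "output_errors": r"(\d+) output errors",
    "late_collisions": r"(\d+) late collision",
}


def parse_interface(text):
    """Extract the duplex setting and selected error counters from the text."""
    duplex = re.search(r"(Full|Half|Auto)-duplex", text, re.IGNORECASE)
    stats = {"duplex": duplex.group(1).lower() if duplex else None}
    for name, pattern in COUNTER_PATTERNS.items():
        match = re.search(pattern, text)
        stats[name] = int(match.group(1)) if match else None
    return stats


def find_problems(stats):
    """Return human-readable reasons this interface deserves a closer look."""
    reasons = []
    if stats["duplex"] == "half":
        reasons.append("duplex is half")
    for name in COUNTER_PATTERNS:
        if stats[name]:  # skips counters that are zero or missing
            reasons.append(f"{name}={stats[name]}")
    return reasons


stats = parse_interface(SAMPLE)
print(find_problems(stats))
```

Counters are cumulative, so a single reading only says that errors occurred at some point; comparing two readings taken a few minutes apart while a copy test runs shows whether they are still climbing.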

Hoping this will be of some help.
--------------------------------------------------------------

Kaleb Wallace
2011-12-21 20:41:15 UTC
Kaleb Wallace [http://community.zenoss.org/people/Fortecor] created the discussion

"Re: Bottleneck discovery"

To view the discussion, visit: http://community.zenoss.org/message/63383#63383

--------------------------------------------------------------
Thank you *NILIE*, I appreciate the info.  I'll start looking into what you suggested.  I am grateful for everyone's input.

Kaleb
--------------------------------------------------------------
