
    L2 network head scratcher, losing pings to Management VLAN

    • crustachioC
      crustachio @Kelly
      last edited by

      @Kelly
      Thanks. Traceroute functions as expected; when testing from the 5406 core it simply times out, no hops (which there shouldn't be since it's directly connected). It never attempts to route via another path.

      • DashrenderD
        Dashrender
        last edited by

        @crustachio said in L2 network head scratcher, losing pings to Management VLAN:

        y VLANs that we're still in the process of retiring. However the management VLAN doesn't live there. The management VLAN is

        Is the VLAN ID the same for the old and the new Management VLAN?

        • T
          thecreaitvone91 @crustachio
          last edited by

          @crustachio said in L2 network head scratcher, losing pings to Management VLAN:

          The last little piece of the puzzle... Our old "core" switch (Cisco 3750) is still in use to an extent. It's largely used to route legacy VLANs that we're still in the process of retiring. However the management VLAN doesn't live there. The management VLAN is defined/directly connected on the 5406R core switch. The old 3750 is set to route any management VLAN traffic to the "new" 5406R core. That said, the existing remote wireless links are served off the old 3750 core. So I'm wondering if there's some kind of situation that is causing traffic destined for the same remote MAC to be unsure of which direction to go (old core/new core).

          Are you sure you don't have asymmetric routing going on? You can make asymmetric routing work if there is a valid reason for it, but firewalls and routers have to be set up for it; otherwise the packets will be dropped.
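One rough way to check for asymmetry is to record the path in each direction and compare the two. A minimal sketch, assuming the hop lists have already been collected by hand (for example from a traceroute run at each end); the function name and device names are hypothetical:

```python
def is_asymmetric(forward_path, reverse_path):
    """Return True when the return path is not the forward path reversed.

    Each path is a list of device names, ordered from sender to receiver.
    """
    return list(forward_path) != list(reversed(reverse_path))


# Hypothetical example: traffic leaves via the new core but returns via the old one.
forward = ["pc", "core-5406", "remote-sw"]
reverse = ["remote-sw", "core-3750", "pc"]
print(is_asymmetric(forward, reverse))
```

If the two directions differ, any stateful device on only one of the paths is a candidate for dropping the traffic.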

          • notverypunnyN
            notverypunny
            last edited by

            A lot of good things to consider here so far. Keep spanning tree in mind as soon as you're dealing with topology changes and intermittent issues. It can come up and bite you in the a$$ if you've got a static config somewhere or a new vlan that isn't part of the config.

            • crustachioC
              crustachio @Dashrender
              last edited by

              @Dashrender said in L2 network head scratcher, losing pings to Management VLAN:

              @crustachio said in L2 network head scratcher, losing pings to Management VLAN:

              y VLANs that we're still in the process of retiring. However the management VLAN doesn't live there. The management VLAN is

              Is the VLAN ID the same for the old and the new Management VLAN?

              There was no "old" management VLAN (I know right). Mgmt was done in the default VLAN (1) on the 3750. Hence the creation of a dedicated mgmt VLAN on the 5406 when we started migrating off the 3750.

              • DashrenderD
                Dashrender
                last edited by

                So you made a new VLAN specifically for management, alright. and what are you pinging and from where on this new management VLAN?

                i.e. are you pinging the switch connected to the fiber on the far side? are you pinging from the switch connected to that same fiber on your side? or your PC?

                • crustachioC
                  crustachio
                  last edited by crustachio

                  I need to clarify something I said erroneously:

                  "The old 3750 is set to route any management VLAN traffic to the "new" 5406R core. That said, the existing remote wireless links are served off the old 3750 core. So I'm wondering if there's some kind of situation that is causing traffic destined for the same remote MAC to be unsure of which direction to go (old core/new core)."

                  That bolded sentence (the one about the remote wireless links being served off the old 3750) is actually untrue. I don't know why I was thinking the wireless link was still served off the 3750. We moved it to the 5406 a while back and it has been working fine.

                  So to clarify, the "working" link (wireless bridge) actually terminates in an L2 access switch on the roof, which trunks back to the 5406 core. The "new" fiber link terminates directly on the 5406. The mgmt VLAN only lives on the 5406. There should be no way any traffic is trying to go out to the 3750. Traceroute confirms this -- when the fiber link is working (intermittently), traceroute shows a hop from my PC to the 5406, then to the remote switch. When the fiber link is down, traceroute hops to the 5406 then dies.

                  • crustachioC
                    crustachio @Dashrender
                    last edited by crustachio

                    @Dashrender said in L2 network head scratcher, losing pings to Management VLAN:

                    So you made a new VLAN specifically for management, alright. and what are you pinging and from where on this new management VLAN?

                    i.e. are you pinging the switch connected to the fiber on the far side? are you pinging from the switch connected to that same fiber on your side? or your PC?

                    Pinging fails FROM any host in the mgmt VLAN on the local side TO any host in the mgmt VLAN on the far side. That includes the remote switch, a UPS, and a WAP.

                    On the local side, I've tried pinging from my PC (which is not ACL restricted from talking to the mgmt VLAN or anything), the core switch itself, and other switches in the mgmt VLAN. And of course our NMS.

                    I need to go back onsite and console into the remote switch to see if pings work the other way.
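For an intermittent fault like this, it can help to log timestamped reachability for every far-side target from one place, so the results can be lined up against switch logs later. A minimal sketch, assuming the system `ping` command is available and that its exit status is a good enough signal; the target names and the 192.0.2.x addresses are placeholders, not the real hosts:

```python
import platform
import subprocess
from datetime import datetime, timezone


def ping_once(host, timeout_s=2):
    """Send one ICMP echo with the system ping; True if it exits with status 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def probe_all(targets, probe=ping_once):
    """Probe each named target once and return (timestamp, name, reachable) rows."""
    now = datetime.now(timezone.utc)
    return [(now, name, probe(addr)) for name, addr in targets.items()]


# Placeholder targets standing in for the remote switch, UPS, and WAP.
TARGETS = {"remote-sw": "192.0.2.10", "ups": "192.0.2.11", "wap": "192.0.2.12"}
```

Calling `probe_all(TARGETS)` on a timer and appending the rows to a file gives a record of when each target dropped out. The `probe` argument can be swapped for another check if ICMP is filtered.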

                    • DashrenderD
                      Dashrender @crustachio
                      last edited by

                      @crustachio said in L2 network head scratcher, losing pings to Management VLAN:

                      I need to go back onsite and console into the remote switch to see if pings work the other way.

                      If you have PCs at that remote site, since you said normal data VLANs are working, you could remote into one of them, then access a switch from there and see if pinging from that side works.

                      • DashrenderD
                        Dashrender @crustachio
                        last edited by

                        @crustachio said in L2 network head scratcher, losing pings to Management VLAN:

                        ...we recently installed a new direct buried fiber circuit to each building

                        this is fiber you own, it doesn't go through a carrier like AT&T/Cox/Comcast/etc?

                        • crustachioC
                          crustachio @Dashrender
                          last edited by

                          @Dashrender said in L2 network head scratcher, losing pings to Management VLAN:

                          @crustachio said in L2 network head scratcher, losing pings to Management VLAN:

                          I need to go back onsite and console into the remote switch to see if pings work the other way.

                          If you have PCs at that remote site, since you said normal data VLANs are working, you could remote into one of them, then access a switch from there and see if pinging from that side works.

                          Nice suggestion, but the remote PC VLAN is not authorized to SSH into the management VLAN of the switch.

                          @Dashrender said in L2 network head scratcher, losing pings to Management VLAN:

                          @crustachio said in L2 network head scratcher, losing pings to Management VLAN:

                          ...we recently installed a new direct buried fiber circuit to each building

                          this is fiber you own, it doesn't go through a carrier like AT&T/Cox/Comcast/etc?

                          We own it, it's a simple PTP SMF span.

                          • crustachioC
                            crustachio @notverypunny
                            last edited by crustachio

                            @notverypunny said in L2 network head scratcher, losing pings to Management VLAN:

                            A lot of good things to consider here so far. Keep spanning tree in mind as soon as you're dealing with topology changes and intermittent issues. It can come up and bite you in the a$$ if you've got a static config somewhere or a new vlan that isn't part of the config.

                            I think you are on to something. I had discarded STP from being in the mix at first because we're really not doing any complicated STP -- no PVST or anything. I checked right away to confirm the 5406 was the root, and the remote switch is an appropriately low priority, and everything looked normal. But digging into the STP topology change history logs on the switches does in fact show numerous topo change requests happening, and in the last 15 minutes I've correlated intermittent responsiveness on the remote switch to topo change requests coming from a completely different L2 access switch on the LAN.

                            That switch is generating "CIST starved for a BPDU Rx on port 1 (uplink port)" errors and is therefore self-promoting to root, forcing topo changes across the tree.

                            If I manually set the STP priority on that switch and let STP reconverge, things go back to normal for a short while and the fiber "problem" switch comes back. Until the "CIST starved for a BPDU Rx" error reoccurs on the other switch, then things go haywire again.

                            OK, now we're getting somewhere. Not sure why that port is no longer receiving BPDU packets... filtering is not enabled, there's no root-guard in place. I'll keep digging, but now I'm on the trail.
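The correlation described above can also be done mechanically once the timestamps are written down. A minimal sketch, assuming outage start times and topology change times have been copied out of the ping log and the switch logs by hand; the function name, the 30-second window, and the sample times are all made up for illustration:

```python
from datetime import datetime, timedelta


def outages_near_events(outage_starts, event_times, window=timedelta(seconds=30)):
    """Map each outage start to the events seen within `window` before it."""
    matches = {}
    for start in outage_starts:
        matches[start] = [
            event for event in event_times
            if timedelta(0) <= start - event <= window
        ]
    return matches


# Made-up timestamps for illustration only.
topo_changes = [datetime(2020, 1, 1, 10, 0, 5), datetime(2020, 1, 1, 10, 7, 40)]
outages = [datetime(2020, 1, 1, 10, 0, 12), datetime(2020, 1, 1, 10, 4, 0)]

for start, events in outages_near_events(outages, topo_changes).items():
    print(start, "preceded by", len(events), "topology change(s)")
```

Outages with no nearby topology change would point to a second cause, so the empty matches are as interesting as the hits.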

                            THANKS

                            • crustachioC
                              crustachio
                              last edited by

                              OK, still not sure why that "other" access switch on the LAN is getting starved for BPDU packets, but as a band-aid I enabled "tcn-guard" on its upstream port to prevent its topology change notifications from flooding the network and goofing the remote fiber switch. So far, so good.

                              I wonder if this is some odd interop issue from the fact that our old 3750 is still on the LAN running its default flavor of PVST. Our Aruba is doing MSTP and has been interop'ing fine alongside the 3750 until now. The plot thickens!

                              If nothing else this will motivate me to finish pulling the plug on that old 3750. Got some work to do yet...

                              • crustachioC
                                crustachio
                                last edited by

                                Welp, got it figured out, and it had nothing to do with any of my theories 😆

                                The "other" access switch that was generating all the BPDU starvation errors was also a remote switch at a completely different site (unrelated to this fiber replacement), connected via PTP Ubiquiti NanoBeam radio. The head-end radio, even though it was set for simple bridge mode, had STP toggled on for some [mistaken] reason. Of course Ubiquiti NanoBeams don't speak HPE MSTP, so it was borking the BPDUs to that remote switch. Since that switch was getting starved for BPDUs, it was self-promoting to root bridge. Of course on the upstream switch I had root-guard enabled to prevent the remote switch from actually becoming root, but the TCNs still propagated out and somehow kept crippling the original problem switch on the new fiber. I'm not sure why it was only causing problems on these remote switches on the new fiber, and no other switches/links, but hey.
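The starvation-then-self-promotion behavior can be illustrated with a toy model. This is a simplified sketch, not how any particular switch implements MSTP: the class, the 6-second aging period, and the bridge IDs are assumptions chosen for illustration. The one firm rule it relies on is that the numerically lower bridge ID (priority, then MAC) wins the root election.

```python
from dataclasses import dataclass


@dataclass
class Bridge:
    bridge_id: tuple        # (priority, MAC); the lower value is the better bridge
    root_id: tuple          # bridge ID currently believed to be the root
    last_bpdu_s: float      # time the last acceptable BPDU arrived on the root port
    timeout_s: float = 6.0  # assumed aging period for root information

    def receive_bpdu(self, advertised_root, now_s):
        """Accept a BPDU advertising a root at least as good as the current one."""
        if advertised_root <= self.root_id:
            self.root_id = advertised_root
            self.last_bpdu_s = now_s

    def tick(self, now_s):
        """Age out stale root info; return True if the bridge just claimed root."""
        starved = now_s - self.last_bpdu_s > self.timeout_s
        if self.root_id != self.bridge_id and starved:
            self.root_id = self.bridge_id
            return True
        return False


core = (0, "00:00:00:00:00:01")
access = Bridge(bridge_id=(32768, "00:00:00:00:00:02"), root_id=core, last_bpdu_s=0.0)
access.receive_bpdu(core, now_s=2.0)  # BPDUs arriving normally
print(access.tick(now_s=4.0))         # still within the aging period
print(access.tick(now_s=9.0))         # BPDUs stopped: the bridge claims root
```

In this model, each time the starved bridge claims root and then hears the real root again, the believed root flips, which lines up with the repeated topology changes seen in the logs.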

                                Final solution: Disable STP on the Ubiquiti radio. BPDU starvation resolved immediately, and management VLAN connectivity to the remote fiber switches was restored as well. Problem solved.

                                Thanks very much to all for being a sounding board and the great suggestions. Special thanks to @notverypunny for pointing me in the right direction with STP. Teaches me to step back and look at the patterns.

                                • crustachioC
                                  crustachio
                                  last edited by

                                  Post Script:

                                  Immediately following my last "solution" update, I drove over to the remote site to button things up. En route I noticed a work crew standing around a concrete bridge over a small canal, which our fiber conduit happens to run alongside. The bridge had just collapsed (nobody injured, thankfully). The conduit is torn apart pretty good but the fiber is still intact. Not sure it will stay that way; I can't see how they'll get the bridge removed without disturbing or removing that conduit entirely. There's also a gas line that runs alongside, which complicates things further.

                                  There's never a good time for something like that, but this was just plain uncanny.

                                  • DashrenderD
                                    Dashrender @crustachio
                                    last edited by

                                    @crustachio said in L2 network head scratcher, losing pings to Management VLAN:

                                    Post Script:

                                    Immediately following my last "solution" update, I drove over to the remote site to button things up. En route I noticed a work crew standing around a concrete bridge over a small canal, which our fiber conduit happens to run alongside. The bridge had just collapsed (nobody injured, thankfully). The conduit is torn apart pretty good but the fiber is still intact. Not sure it will stay that way; I can't see how they'll get the bridge removed without disturbing or removing that conduit entirely. There's also a gas line that runs alongside, which complicates things further.

                                    There's never a good time for something like that, but this was just plain uncanny.

                                    oh man - at least you still have the wifi beam connection option.
