    ZeroTier network blip

    • Dashrender

      So I'm using ZeroTier in production (one server and one user).

      Said user contacted me last night to say that they couldn't connect to the network share.

      I hopped online to see the status and the main ZeroTier website was down. I didn't have ZT installed on my laptop, so I couldn't test it from my location.

      I tweeted at ZT, and within 5 minutes they responded that they didn't see any issues. I checked the website again and it was back. I had the user try again, and it was working for them too.

      @adam-ierymenko any thoughts on what happened?

      • scottalanmiller

        A DNS issue on your end, or a network routing issue? Are you using Google DNS? I've had Verizon routes fail for hours before, losing access to whole countries, but only from certain endpoints.

        • Dashrender

          Both the website and the ZT network were down.

          I suppose it could be a DNS issue.

          • scottalanmiller @Dashrender

            @Dashrender said:

            Both the website and the ZT network were down.

            I suppose it could be a DNS issue.

            I'm assuming that this is a hosted ZT instance and not one that you run yourself, hence the question?

            • Dashrender

              Correct. I am using a free account currently. I now have 4 devices connected, soon to be 6.

              I doubt it was a DNS issue, but it's possible.

              • stacksofplates

                I noticed the site was down last night for a couple minutes, but I still had access to the other devices on the network.

                The site (controller) should be able to go down and everything should still be able to communicate. You just can't add/remove/change devices until the controller comes back.

                • Dashrender

                  I'm guessing the user in my case had the computer off; when they turned it on, the controller was offline, so the laptop couldn't register with the network and stayed down.

                  • adam.ierymenko

                    We caught a network glitch on the web site, but this should not have affected actual virtual networks. If it did then please explain what you saw -- the system should not be vulnerable to this.

                    FYI network controllers issue config and certificates to network members but are not (by design) a point of failure for actual network communications. If a network controller goes down the network continues to work, but it just isn't possible to change the network (add new devices, de-authorize devices, change IP assignment settings, etc.).

                    We're doing a round of infrastructure upgrades in the next few weeks anyway. Web will go to redundant bare metal servers and the root infrastructure (which is critical) is getting even more robust and geo-distributed. (It's already spread across three providers on four continents and all nodes are independent.)

                    • coliver @adam.ierymenko

                      What happens if a machine goes down and then comes back up when the network controller is down?

                      • adam.ierymenko

                        It should use its cached network config and certs -- see the networks.d/<nwid>.conf files, etc.
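A rough way to see what a node has cached is to list the `networks.d/<nwid>.conf` files and how long ago each was last written. This is only a sketch: the home directory `/var/lib/zerotier-one` is an assumed Linux default (other platforms differ, and reading it usually needs root), the function name `cached_network_configs` is made up for illustration, and treating the file modification time as "last refreshed from the controller" is an assumption rather than documented behavior.

```python
from pathlib import Path
import time

# Assumption: typical ZeroTier home directory on Linux; adjust for your platform.
ZT_HOME = Path("/var/lib/zerotier-one")


def cached_network_configs(home=ZT_HOME, now=None):
    """Return (network_id, age_in_hours) for each networks.d/<nwid>.conf file.

    The age is derived from the file's modification time, which is only a
    proxy for when the node last stored a config for that network.
    """
    now = time.time() if now is None else now
    netdir = Path(home) / "networks.d"
    if not netdir.is_dir():
        return []
    results = []
    for conf in sorted(netdir.glob("*.conf")):
        # Assumption: skip per-network local settings files (<nwid>.local.conf)
        # if present; only <nwid>.conf is of interest here.
        if conf.name.endswith(".local.conf"):
            continue
        age_hours = (now - conf.stat().st_mtime) / 3600
        results.append((conf.stem, round(age_hours, 1)))
    return results


if __name__ == "__main__":
    for nwid, age in cached_network_configs():
        print(f"{nwid}: cached config last written {age} h ago")
```

If the list is empty on a node that has joined a network, the node may have nothing cached to fall back on while the controller is unreachable.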

                        • Dashrender @Dashrender

                          @Dashrender said:

                          I'm guessing the user in my case had the computer off; when they turned it on, the controller was offline, so the laptop couldn't register with the network and stayed down.

                          I'll have to confirm, but I believe this is the situation in question. The laptop was turned off at the time of the outage. They turned it on during the outage and tried to connect, and it didn't work. The user did no troubleshooting, and before I could do much, the problem was over.

                          • adam.ierymenko

                            There can be issues if a network controller is down for a long time because certs have (effective) TTLs, so an old node that's been offline could be unable to communicate. But the controller would have to be down for a while. Since ZT addresses are portable, if a controller goes down it can be brought up elsewhere with the same identity (failover).

                            We're adding multi-homing soon, which will make this even more robust:

                            https://github.com/zerotier/ZeroTierOne/blob/adamierymenko-dev/node/Cluster.hpp#L71

                            Multi-homing will also be useful for nodes within networks. For example, you could create a global Cassandra cluster behind a single IP on your virtual LAN. Next version should contain an alpha version of cluster/multi-homing capability.
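The cert TTL point above comes down to simple arithmetic: a node that has been offline longer than its certificate's lifetime needs a fresh one, and it can't get one while the controller is unreachable. A minimal sketch of that check follows. The 24-hour value and the name `cert_probably_stale` are purely hypothetical; the real effective TTL is not stated in this thread.

```python
from datetime import datetime, timedelta

# Hypothetical value for illustration only; not ZeroTier's actual lifetime.
ASSUMED_CERT_TTL = timedelta(hours=24)


def cert_probably_stale(last_online, now, ttl=ASSUMED_CERT_TTL):
    """Return True if the node was offline longer than the assumed cert TTL."""
    return (now - last_online) > ttl


if __name__ == "__main__":
    last_online = datetime(2015, 1, 1, 9, 0)
    now = last_online + timedelta(days=2)  # e.g. a laptop used twice a week
    print(cert_probably_stale(last_online, now))
```

Under this assumed TTL, a laptop that has been off for two days would come up with a stale cert and would depend on the controller being reachable at that moment.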

                            • adam.ierymenko

                              @Dashrender How long was the laptop asleep? If it was a while it's possible that its cert was no longer valid and it couldn't get a new one.

                              Unlucky moment... multi-homing/cluster of network controllers should make that orders of magnitude less likely. We're doing a lot of robustness work right now (not that it's bad as-is).

                              • Dashrender

                                @adam-ierymenko I'm guessing the laptop was off two+ days. The user only uses it two days a week at most.
