    PRTG Alternative...

    IT Discussion
    • wrx7mW
      wrx7m @Jimmy9008
      last edited by

      @jimmy9008 - So you are trying to categorize "types" of downtime?

      • J
        Jimmy9008 @wrx7m
        last edited by

        @wrx7m said in PRTG Alternative...:

        @jimmy9008 - So you are trying to categorize "types" of downtime?

        Yes, and with the ability to add or remove types to see different reports. For example, 0.0087% of downtime was a line issue and 0.0045% was a bad release by development. We might exclude the latter from the report, as it's not the fault of the infrastructure team and should not, for example, affect their yearly bonus.

        • wrx7mW
          wrx7m @Jimmy9008
          last edited by wrx7m

          @jimmy9008 I can see why you would want this. If you are using a secondary ticketing system for all other issues, you can have PRTG send an email alert with certain info, have the other ticketing system create a ticket from it, and track the downtime there. That is the only way I can think of to automate most of it.

          • J
            Jimmy9008
            last edited by

            I imagine something can do this. Don't mind moving away from PRTG if needed.

            Essentially, the infrastructure team get a bonus for 99.995% uptime or more. If less, no bonus. Development often run releases that cause downtime, or sometimes even screw up and restart services without permission, causing downtime. I'd like to exclude those events from the reports when working out whether the infrastructure team get their bonus, but I can't see a way to do that within PRTG.

            Server crashes, performance issues, and line drops would still be included in the calculation.
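
To make the calculation concrete, here is a rough sketch of what such a report would compute. It is not tied to PRTG or any other tool; the `Outage` record, the category names, and the durations are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    seconds: float  # how long the outage lasted
    category: str   # hypothetical label such as "line" or "dev-release"


def uptime_percent(outages, period_seconds, excluded=frozenset()):
    """Uptime over the period, ignoring outages in excluded categories."""
    counted = sum(o.seconds for o in outages if o.category not in excluded)
    return 100.0 * (1.0 - counted / period_seconds)


YEAR = 365 * 24 * 3600  # seconds in a non-leap year
BONUS_THRESHOLD = 99.995  # target from the post above

# Made-up outages: 15 minutes of line trouble, 20 minutes from a bad release.
outages = [
    Outage(900, "line"),
    Outage(1200, "dev-release"),
]

raw = uptime_percent(outages, YEAR)
infra_only = uptime_percent(outages, YEAR, excluded={"dev-release"})
meets_bonus = infra_only >= BONUS_THRESHOLD
```

With these numbers the raw figure falls below 99.995%, while the figure with `dev-release` excluded stays above it. For scale, 99.995% over a year leaves roughly 26 minutes of allowed downtime.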

            • wrx7mW
              wrx7m @Jimmy9008
              last edited by

              @jimmy9008 I can see the motivation to have these numbers. What ticketing system do you use outside of PRTG?

              • stacksofplatesS
                stacksofplates
                last edited by

                I'll see if Alertmanager has tag abilities for alerts. I know there are comments but not sure if you can sort by anything.

                Having devs be able to change services in prod sounds like fun....

                • wrx7mW
                  wrx7m @stacksofplates
                  last edited by

                  @stacksofplates lol

                  • J
                    Jimmy9008 @wrx7m
                    last edited by

                    @wrx7m said in PRTG Alternative...:

                    @jimmy9008 I can see the motivation to have these numbers. What ticketing system do you use outside of PRTG?

                    We use helpscout.

                    • J
                      Jimmy9008 @stacksofplates
                      last edited by

                      @stacksofplates said in PRTG Alternative...:

                      I'll see if Alertmanager has tag abilities for alerts. I know there are comments but not sure if you can sort by anything.

                      Having devs be able to change services in prod sounds like fun....

                      It is indeed. They are supposed to deploy to develop, test, staging, then live. But sometimes they will just make a mistake. It's the same as an admin accidentally restarting the working server they are RDP'd into.

                      As far as the infrastructure team care, the OS, hardware, and networking are under their remit. Any service Dev needs on a server to make the product work is Dev's choice. They also have the choice to restart the services under their remit. They just don't care that doing so may affect another team's bonus.

                      For example, they should be telling us when a deployment is planned so we can add planned maintenance for that time, but they often forget. (Yes, that's all a business problem, but it's still my problem: I can't currently use PRTG to prove that downtime should be excluded for the team, because I can't rerun the stats after the event.)

                      • wrx7mW
                        wrx7m @Jimmy9008
                        last edited by

                        @jimmy9008 - Interesting. Does helpscout allow you to specify a category of downtime?

                        • J
                          Jimmy9008 @wrx7m
                          last edited by

                          @wrx7m said in PRTG Alternative...:

                          @jimmy9008 - Interesting. Does helpscout allow you to specify a category of downtime?

                          Yes. Within helpscout you could tag a record with, say, "Dev issue". But this is an entirely separate system from PRTG, and I'm not sure how to bring the two sets of data together.

                          You could tag "Dev issue" and count the number of dev issues in a year. But that wouldn't tell you how much of the 0.006% downtime was due to that compared to any other cause. Helpscout has no understanding of the downtime data.
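
One way the two data sets could be combined, as a sketch only: match tagged tickets to downtime windows by time overlap. This assumes the downtime windows and the tagged tickets can each be exported with start and end times, which is not confirmed for either PRTG or helpscout; the function names and sample data are hypothetical, and tickets are assumed not to overlap each other.

```python
from datetime import datetime


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Seconds shared by two time windows (0 if they do not overlap)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0.0, (end - start).total_seconds())


def downtime_by_tag(outages, tickets):
    """outages: list of (start, end); tickets: list of (tag, start, end).

    Returns seconds of downtime per tag, plus "untagged" for downtime
    not covered by any ticket.
    """
    totals = {}
    for o_start, o_end in outages:
        remaining = (o_end - o_start).total_seconds()
        for tag, t_start, t_end in tickets:
            secs = overlap_seconds(o_start, o_end, t_start, t_end)
            if secs > 0:
                totals[tag] = totals.get(tag, 0.0) + secs
                remaining -= secs
        if remaining > 0:
            totals["untagged"] = totals.get("untagged", 0.0) + remaining
    return totals


# Made-up export data: a 20-minute outage and a 5-minute outage.
outages = [
    (datetime(2018, 7, 1, 10, 0), datetime(2018, 7, 1, 10, 20)),
    (datetime(2018, 7, 10, 14, 0), datetime(2018, 7, 10, 14, 5)),
]
# One ticket tagged "Dev issue" that spans the first outage.
tickets = [
    ("Dev issue", datetime(2018, 7, 1, 9, 55), datetime(2018, 7, 1, 10, 30)),
]

totals = downtime_by_tag(outages, tickets)
```

Here `totals` attributes the 20-minute outage to "Dev issue" and leaves the 5-minute one as "untagged", which is the per-cause split that counting tickets alone cannot give.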

                          • wrx7mW
                            wrx7m @Jimmy9008
                            last edited by

                            @jimmy9008 - Not ideal, but you could include a screenshot or log of the total downtime from PRTG in the helpscout ticket and classify it as a dev issue.

                            • J
                              Jimmy9008 @wrx7m
                              last edited by

                              @wrx7m said in PRTG Alternative...:

                              @jimmy9008 - Not ideal, but you could include a screenshot or log of the total downtime from PRTG in the helpscout and classify it as dev issue.

                              It's quite a workaround. It would be better to have it all in one system.

                              • wrx7mW
                                wrx7m @Jimmy9008
                                last edited by

                                @jimmy9008 - Absolutely

                                • wrx7mW
                                  wrx7m @Jimmy9008
                                  last edited by

                                  @jimmy9008

                                  Here is a constant interval option (enterprise plan) -

                                  https://www.statuscake.com/pricing/

                                  • stacksofplatesS
                                    stacksofplates @Jimmy9008
                                    last edited by

                                    @jimmy9008 said in PRTG Alternative...:

                                    @stacksofplates said in PRTG Alternative...:

                                    I'll see if Alertmanager has tag abilities for alerts. I know there are comments but not sure if you can sort by anything.

                                    Having devs be able to change services in prod sounds like fun....

                                    It is indeed. They are supposed to deploy to develop, test, staging, then live. But sometimes they will just make a mistake. It's the same as an admin accidentally restarting the working server they are RDP'd into.

                                    As far as the infrastructure team care, the OS, hardware, and networking are under their remit. Any service Dev needs on a server to make the product work is Dev's choice. They also have the choice to restart the services under their remit. They just don't care that doing so may affect another team's bonus.

                                    Yeah that's crazy. That should be handled by something like Kubernetes or Nomad/Consul. Humans restarting in prod should be an emergency scenario. Let the orchestration tools do the work.

                                    • stacksofplatesS
                                      stacksofplates
                                      last edited by

                                      I'll look at Alertmanager when I get home. The comments section might be enough.

                                      • stacksofplatesS
                                        stacksofplates
                                        last edited by

                                        I just looked. The only place to add comments in Alertmanager is when an alert is silenced. I looked in Grafana as well, and that might be of use. Grafana will let you set alerts on specific metrics, and then you can set annotations on those alerts. Here's a sample graph with alerts (they're the red dotted lines).

                                        0_1532217196124_alerts.png

                                        You can click on the alert and give an annotation.

                                        0_1532217291469_annotation.png

                                        Then when you hover over the alert you can see the annotations and tags.

                                        0_1532217335973_annotation-alert.png
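
If the annotations carry tags, they could in principle be pulled back out for a report. As far as I know, Grafana's HTTP API serves annotations at `/api/annotations` and accepts `from`, `to` (epoch milliseconds), and `tags` query parameters, but check the docs for your version. The sketch below only builds the request URL; the host name and the `dev-issue` tag are made up, and authentication is left out.

```python
from urllib.parse import urlencode


def annotations_url(base_url, start_ms, end_ms, tags):
    """Build a query URL for annotations in a time range with given tags."""
    # doseq=True repeats the "tags" key once per tag in the list.
    query = urlencode({"from": start_ms, "to": end_ms, "tags": tags}, doseq=True)
    return f"{base_url.rstrip('/')}/api/annotations?{query}"


# Hypothetical host and tag; the range covers July 2018 in epoch ms (UTC).
url = annotations_url(
    "http://grafana.example:3000",
    1530403200000,  # 2018-07-01 00:00 UTC
    1533081600000,  # 2018-08-01 00:00 UTC
    ["dev-issue"],
)
```

The returned annotations would still need to be matched against downtime durations to get a percentage per cause.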
