Mastering Zabbix
Monitoring systems play a crucial role in any IT environment. They are extensively used not only to measure your system's performance, but also to forecast capacity issues. This is where Zabbix, one of the most popular monitoring solutions for networks and applications, comes into the picture. This new edition will provide you with all the knowledge you need to make strategic and practical decisions about the Zabbix monitoring system.
The setup you'll do with this book will fit your environment and monitoring needs like a glove. You will be guided through the initial steps of choosing the correct size and configuration for your system, to what to monitor and how to implement your own custom monitoring component. Exporting and integrating your data with other systems is also covered.
By the end of this book, you will have a tailor-made and well-configured monitoring system and will understand with absolute clarity how crucial it is to your IT environment. A basic working knowledge of Zabbix and Linux is assumed so that the book can focus on how to use every component to its full potential.

This laid the technology foundation that Andrea has built on ever since. He also has a Red Hat Certified Engineer certification. Throughout his career, he has worked on many large-scale environments, often in very complex roles and on a consultancy basis.
This has further enhanced his growing skillset, adding to his practical knowledge base and cementing his appetite for theoretical technical study.
His time was mainly spent on reducing "ownership costs" with specialization in monitoring and automation. This is where he came across Zabbix and the technical and administrative flexibility that it offered. With this as a launch pad, Andrea was inspired to develop Orabbix, the first piece of open source software to monitor Oracle that is completely integrated with Zabbix.
Currently, Andrea is working as a senior architect for a leading global investment bank in a very diverse and challenging environment. Andrea also plays a critical role within the extended management team for the security awareness of the bank, dealing with disciplines such as security, secrecy, standardization, auditing, regulator requirements, and security-oriented solutions.
As an open source product, Zabbix is easy to obtain and deploy, and its unique approach to metrics and alarms has helped to set it apart from its competitors, both open and commercial. It's a powerful, compact package with very low requirements in terms of hardware and supporting software for a basic yet effective installation.
If you add its relative ease of use, it's clear that it can be a very good contender for small environments with a tight budget. But it's when it comes to managing a huge number of monitored objects, with a complex configuration and dependencies, that Zabbix's scalability and inherently distributed architecture really shine. More than anything, Zabbix can be an ideal solution in large and complex distributed environments, where being able to manage monitored objects efficiently and extract meaningful information from them and their events is just as important as, if not more important than, the usual considerations about cost, accessibility, and ease of use.
The purpose of this book is to help you make the most of your Zabbix installation to leverage all of its power to monitor any large and complex environment effectively.

Preface

What this book covers

Chapter 1, Deploying Zabbix, focuses on choosing the optimal hardware and software configuration for the Zabbix server and database in relation to the current IT infrastructure, monitoring goals, and possible evolution. This chapter also includes a section that covers an interesting database-sizing digression, which is useful in calculating the final database size using a standard environment as the baseline.
Correct environment sizing and a brief discussion about metrics and measurements that can also be used for capacity planning will be covered here. The chapter contains practical examples and calculations framed in a theoretical approach to give the reader the skills required to adapt the information to real-world deployments.

Chapter 2, Distributed Monitoring, explores various Zabbix components both on the server side and the agent side.
Different distributed solutions will be applied to the same example networks to highlight the advantages and possible drawbacks of each. In addition to the deployment and configuration of agents, the chapter takes proxies, maintenance, and change management into account too. This section will cover all the possible architectural implementations of Zabbix and weigh the pros and cons of each.

Chapter 3, High Availability and Failover, covers the subjects of high availability and failover.
For each of the three main Zabbix tiers, you will learn to choose among different HA options. The discussion will build on the information provided in the previous two chapters in order to end the first part of the book with a few complete deployment scenarios that will include high-availability servers and databases hierarchically organized in tiered, distributed architectures geared toward monitoring thousands of objects scattered in different geographical locations.
This chapter will include a real-world, practical example and certain possible scenarios that have been implemented.

Chapter 4, Collecting Data, will explore powerful Zabbix built-in functionalities, how to use them, and how to choose the best metrics to ensure thorough monitoring without overloading the system.
There will also be special considerations about aggregated values and their use in monitoring complex environments with clusters or the more complex grid architectures.

Chapter 5, Visualizing Data, focuses on getting the most out of the data visualization features of Zabbix.
You will learn how to leverage live monitoring data to make dynamic maps, how to organize a collection of graphs for big-screen visualization in control centers, and how to implement a general qualitative view. This chapter will cover the data center quality view slide show completely, which is really useful in highlighting problems and warning first-level support proactively.

Chapter 6, Managing Alerts, gives examples of complex triggers and trigger conditions as well as advice on choosing the right number of triggers and alerting actions.
The purpose is to help you walk the fine line between being blind to possible problems and being overwhelmed by false positives. You will also learn how to use actions to automatically fix simple problems, raise actions without the need for human intervention to correlate different triggers and events, and tie escalations to your operations management workflow. This section will make you aware of what can be automated, reducing your administrative workload and optimizing the administration process in a proactive way.
Chapter 7, Managing Templates, offers guidelines for effective template management: building complex template schemes out of simple components, understanding and managing the effects of template modification, maintaining existing monitored objects, and assigning templates to discovered hosts.
This will conclude the second part of the book that is dedicated to the different Zabbix monitoring and data management options. The third and final part will discuss Zabbix's interaction with external products and all its powerful extensibility features.
Chapter 8, Handling External Scripts, helps you learn how to write scripts to monitor objects that are not covered by the core Zabbix features. The relative advantages and disadvantages of keeping the scripts on the server side or agent side, how to launch or schedule them, and a detailed analysis of the Zabbix agent protocol will also be covered.
This chapter will make you aware of all the possible side effects, delays, and load caused by scripts; you will be able to implement all the needed external checks, as you will be well aware of all that is connected with them and the relative observer effect.
The chapter will include different implementations of working with Bash, Java, and Python so that you can easily write your own scripts to extend and enhance Zabbix's monitoring possibilities.

Chapter 9, Extending Zabbix, delves into the Zabbix API and how to use it to build specialized frontends and complex extensions.
It also covers how to harvest monitoring data for further elaboration and reporting. It will include simple example implementations written in Python that will illustrate how to export and further manipulate data, how to perform massive and complex operations on monitored objects, and finally, how to automate different management aspects such as user creation and configuration, trigger activation, and the like.

Chapter 10, Integrating Zabbix, wraps things up by discussing how to make other systems know about Zabbix and the other way around.
This is key to the successful management of any large and complex environment. You will learn how to use built-in Zabbix features, API calls, or direct database queries to communicate with different upstream and downstream systems and applications. There will be concrete examples of possible interaction with inventory applications, trouble ticket systems, and data warehouse systems.
Managing Alerts

Checking conditions and alarms is the most characteristic function of any monitoring system, and Zabbix is no exception. What really sets Zabbix apart is that every alarm condition (or trigger, as it is known in this system) can be tied not only to a single measurement, but also to an arbitrarily complex calculation based on all of the data available to the Zabbix server. Furthermore, just as triggers are independent from items, the actions that the server can take based on the trigger status are independent from the individual trigger, as you will see in the subsequent sections.
It's based on extensive data collection, as discussed in Chapter 4, Collecting Data, and eventually leads to managing messages, recipients, and delivery media, as we'll see later in the chapter. But all this revolves around the conditions defined for the checks, and this is the main business of triggers.
The expression form, accessible through the Add button, lets you choose an item, a function to perform on the item's data, and some additional parameters and gives an output as shown in the following screenshot: You can see how there's a complete item key specification, not just the name, to which a function is applied. The result is then compared to a constant using a greater than operator. The syntax for referencing item keys is very similar to that for a calculated item.
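Since the screenshot is not reproduced here, the structure just described (a full item key, a function applied to it, and a comparison with a constant) can be sketched as a small string-building helper. This is only an illustration: it assumes the brace-delimited expression syntax used by the Zabbix 2.x series, and the host name Alpha, the helper name, and the threshold are hypothetical.

```python
def build_trigger_expression(host, item_key, function, parameter, operator, constant):
    """Assemble a trigger expression string of the assumed legacy form
    {host:item_key.function(parameter)}<operator><constant>."""
    reference = "{%s:%s.%s(%s)}" % (host, item_key, function, parameter)
    return "%s%s%s" % (reference, operator, constant)


# Hypothetical host "Alpha"; the item is referenced by its complete key, not its name.
expression = build_trigger_expression(
    "Alpha", "system.cpu.load[all,avg1]", "last", 0, ">", 5
)
print(expression)
```

The point of the sketch is only to show which parts the expression form asks you to choose; in practice the frontend assembles the string for you.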
In addition to this basic way of referring to item values, triggers also add a comparison operator that wraps all the calculations up to a Boolean expression. This is the one great unifier of all triggers; no matter how complex the expression, it must always return either a True value or a False value.
There are no intermediate or soft states for triggers. A trigger can also be in an UNKNOWN state if it's impossible to evaluate the trigger expression, for example, because one of the items has no data. Zabbix doesn't apply short-circuit evaluation of the "and" and "or" operators (previously, until Zabbix 2. ...).

It doesn't matter that the two hosts are monitored by two different proxies. Everything will work as expected as long as the proxy where the trigger is defined has access to the two monitored hosts' historical data.
You can apply all the same functions available for calculated items to your items' data. This means that you can either specify a time period in seconds or a number of measurements, and the trigger will take all of the item's data in the said period and apply the function to it.
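The difference between the two kinds of parameter can be sketched outside Zabbix with a few lines of Python. This is a simplified model, not the server's actual implementation: the sample history, the function names, and the exact handling of the window boundary are assumptions made for illustration.

```python
from statistics import mean


def values_in_period(samples, now, seconds):
    """Return the values whose timestamp falls within the last `seconds` seconds."""
    return [value for timestamp, value in samples if now - seconds < timestamp <= now]


def last_values(samples, count):
    """Return the values of the most recent `count` measurements."""
    ordered = sorted(samples)  # sort by timestamp
    return [value for _, value in ordered[-count:]]


# Hypothetical history: (timestamp in seconds, value), one sample per minute.
samples = [(0, 1.0), (60, 2.0), (120, 3.0), (180, 4.0), (240, 5.0), (300, 6.0)]
now = 300

avg_by_time = mean(values_in_period(samples, now, 180))  # time-period parameter
avg_by_count = mean(last_values(samples, 5))             # number-of-measurements parameter
print(avg_by_time, avg_by_count)
```

With a regular one-minute interval the two selections overlap heavily; they diverge as soon as the interval changes or samples arrive irregularly.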
Which one should you use in your triggers? While it obviously depends on your specific needs and objectives, each one has its strengths that make it useful in the right context. For all kinds of passive checks initiated by the server, you'll often want to stick to a time period expressed as an absolute value.
A #5 parameter, that is, the last five measurements, will cover quite a different time period if you change the check interval of the item it refers to. It's not usually obvious that such a change will also affect related triggers. Moreover, a time period expressed in seconds may be closer to what you really mean to check and thus may be easier to understand when you revisit the trigger definition at a later date.
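A quick back-of-the-envelope calculation shows how much the covered period moves. The sketch below assumes evenly spaced checks and uses hypothetical intervals; the function name is illustrative only.

```python
def covered_seconds(measurement_count, check_interval_seconds):
    """Approximate time span covered by the last N measurements of an item
    that is checked at a fixed interval."""
    return measurement_count * check_interval_seconds


# The same "last five measurements" parameter under three hypothetical check intervals.
for interval in (30, 60, 600):
    print("interval %4ds -> roughly %5ds of data" % (interval, covered_seconds(5, interval)))
```

Moving an item from a one-minute to a ten-minute interval silently stretches the trigger's window from about five minutes to about fifty, whereas a parameter expressed in seconds would keep meaning the same thing.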
On the other hand, you'll often want to opt for the #num version of the parameter for many active checks, where there's no guarantee that you will have a constant, reliable interval between measurements. This is especially true for trapper items of any kind and for log files. With these kinds of items, referencing the number of measurements is often the best option.

These can be useful to create triggers that may change their status only during certain times of the day or during certain specific days or, better yet, to define well-known exceptions to common triggers when we know that some otherwise unusual behavior is to be expected. Consider, for example, a case where there's a bug in one of your company's applications that causes a rogue process to quickly fill up a filesystem with huge log files.
While the development team is working on it, they ask you to keep an eye on the said filesystem and kill the process if it's filling the disk up too quickly.

The web frontend will display different severity values with different colors, and you will be able to create different actions based on them, but they have no further meaning or function in the system.
This means that the severity of a trigger will not change over time based on how long that trigger has been in a PROBLEM state, nor can you assign a different severity to different thresholds in the same trigger. If you really need a warning alert when a disk is over 90 percent full and a critical alert when it's percent full, you will need to create two different triggers with two different thresholds and severities.
This may not be the best course of action though, as it could lead to warnings that are ignored and not acted upon, critical alerts that fire when it's already too late and you have already lost service availability, a redundant configuration with redundant messages and more possibilities for mistakes, or a worse signal-to-noise ratio. A better approach would be to clearly assess the actual severity of the potential for the disk to fill up and create just one trigger with a sensible threshold and, possibly, an escalating action if you fear that the warning could get lost among the others.
Choosing between absolute values and percentages

If you look at many native agent items, you'll see that a lot of them can express measurements either as absolute values or as percentages. It often makes sense to do this while creating one's own custom items as both representations can be quite useful in and of themselves. When it comes to creating triggers on them, though, the two can differ quite a lot, especially if you have the task of keeping track of available disk space.
Filesystem sizes and disk usage patterns vary quite a lot between different servers, installations, application implementations, and user engagements. While a free space of 5 percent of a hypothetical disk A could be small enough that it would make sense to trigger a warning and act upon it, the same 5 percent could mean a lot more space for a large disk array, enough for you to not really need to act immediately but plan a possible expansion without any urgency.
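To put rough numbers on this, the following sketch converts the same 5 percent of free space into absolute terms for two disks. The sizes (a 20 GiB disk and a 10 TiB array) are hypothetical and chosen only for illustration; the text does not state them.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def free_bytes(total_bytes, free_percent):
    """Absolute free space corresponding to a given percentage of a disk's size."""
    return total_bytes * free_percent / 100


# Hypothetical sizes: a small 20 GiB disk and a large 10 TiB disk array.
small_disk_free = free_bytes(20 * GIB, 5)
large_array_free = free_bytes(10 * 1024 * GIB, 5)

print("small disk:  %.0f GiB free" % (small_disk_free / GIB))
print("large array: %.0f GiB free" % (large_array_free / GIB))
```

Under these assumptions the same percentage leaves about 1 GiB on the small disk and about 512 GiB on the array, which is why an identical threshold can be urgent in one case and merely a planning note in the other.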
This may lead you to think that percentages are not really useful in these cases and even that you can't really put disk-space-related triggers in templates, as it would be better to evaluate every single case and build triggers that are tailor-made for every particular disk with its particular usage pattern. While this can certainly be a sensible course of action for particularly sensitive and critical filesystems, it can quickly become too much work in a large environment where you may need to monitor hundreds of different filesystems.
You will still need to create more specialized triggers for those special, critical disks, but you'd have to anyway. This means that no matter how big the disk is, based on its usage pattern it could quickly fill up. Note also how the trigger would need progressively smaller and smaller percentages for it to assume a PROBLEM state, so you'd automatically get more frequent and urgent notifications as the disk is filling up. For these kinds of checks, percentage values should prove more flexible and easy to understand than absolute ones, so that's what you probably want to use as a baseline for templates.
On the other hand, absolute values may be your best option if you want to create a very specific trigger for a very specific filesystem.

Understanding operations as correlations

As you may have already realized, practically every interesting trigger expression is built as a logical operation between two or more simpler expressions.
Naturally, this is not the only way to create useful triggers. Many simple checks on the status of an agent. Let's see a few more examples of relatively complex triggers.

During the rest of the day, the number of sessions is neither predictable nor that significant, so you keep sampling it but don't want to receive any alert. It's used here as a label to make the example easier to read and not as an instance of an actual ready-to-use native item. The only problem with this trigger is that if the number of sessions drops below five in that window of time but it doesn't come up again until after , the trigger will stay in the PROBLEM state until the next day.
This may be a great nuisance if you have set up multiple actions and escalations on that trigger, as they would go on for a whole day no matter what you do to address the actual session problems.