In the last post, I presented a simple way to monitor some specific metrics of your IBM i.
In addition to system monitors, you can also set up message queue monitoring. Information about the status of system services and application services often ends up in a message queue (typically QSYSOPR), hence the need for an effective probe that picks up those messages.
For this reason, again through the Navigator for i GUI, you can define monitoring for specific messages and, in some cases, even the replies that the system gives automatically. Once connected to Navigator, click MONITORS and choose MESSAGE MONITORS. Now click on CREATE NEW MESSAGE MONITOR and fill in the form as you prefer. In my example, I’m going to configure a QSYSOPR monitor that works only during office hours.
Now, by clicking on Message Set, you can define the triggers and the related actions. To do that, you need to specify a list of message IDs that you want to monitor. You can start from the predefined message ID lists shipped by IBM, which cover some system topics. On this same page, you can also define automatic replies that the system gives to jobs in MSGW status:
Last but not least is the ability to perform an action when a message has been sent to the message queue; for instance, you can remove that message from the queue or run an OS command (such as sending an email):
I think that is everything on this topic. Please give me your feedback about the monitoring world and your experience with it.
One of the most important allies of a sysadmin is a good monitoring system.
In this first tutorial, I will show you how easy it is to implement an IBM i system monitor using features that are already included in the OS, without third-party software.
To do that, we are going to use IBM Navigator for i. Once connected, click on the icon below and choose SYSTEM MONITORS:
At this point, you can see the active monitors (if there are any) or you can configure a new one. Keep in mind that, at the moment, only a fairly small set of metrics is available for your own monitoring. A few days ago I submitted an idea asking IBM to make it possible to define your own metrics using SQL; here you can find the link to my idea, and if you find it interesting, please vote for it.
So let’s start with a simple monitoring example: I want to create a monitor that checks my disk space utilization. From the drop-down list, choose CREATE NEW SYSTEM MONITOR. Then choose Disk Storage Utilization (average) from the metric list and set the frequency of the check; in my example I’ve chosen 5 minutes.
Now that we have chosen the metric and the frequency of the check, we only need to set the thresholds and define, using OS commands, what happens when a threshold is reached.
In my case, I will define one threshold that is triggered when disk usage stays above 70% for more than two intervals. When the condition is met, the monitor will email me:
As you have probably understood, you can combine several metrics in the same “check”.
Keep in mind that once you enable your monitor, the system automatically starts collecting the system status according to your settings. This data is stored and can be analysed as a graph; this is a very convenient feature if you want to check the data over the long term to determine, for example, whether there are any growth trends. It is also possible to check the monitoring log, where you can see when a threshold has been reached.
Here you can find an example of a monitoring graph:
I will put another post on this blog about message monitoring.
One of the dreams of sysadmins is monitoring SQL usage on the system: checking how much temporary storage is used by SQL statements, spotting long-running SQL statements and so on. Every day in my job I see situations in which users run a lot of SQL statements with ODBC tools in Excel and, without them noticing, temporary storage starts to grow because of all the correlations being made.
But what if I told you that on IBM i this feature is already in the OS?
Since IBM i V7R3M0 there is a built-in mechanism called Query Supervisor, whose goal is to define thresholds on metrics of SQL statement execution, such as elapsed time, temporary storage, I/O count and CPU time.
As I said before, this feature is part of the operating system, so you don’t need any additional product; the only thing you need is to decide what you want to do when a threshold is reached. That is because with this real-time monitor you can set up different actions, such as:
holding job
ending SQL execution
sending message to a message queue
logging SQL statement
At this point, we can start with some considerations about this topic:
as mentioned earlier, this is real-time monitoring, so unlike other mechanisms or tools it evaluates thresholds on live data, not on estimates or assumptions.
metrics that can be monitored are CPU time (i.e., the number of CPU seconds used during the execution of an SQL statement), elapsed time (i.e., the number of seconds taken to execute the SQL statement), temporary storage (i.e., the amount of temporary storage allocated, expressed in MB), and I/O count (i.e., the number of I/O operations performed to execute the SQL statement).
not all SQL statements trigger this mechanism; in fact, some statements that do not require any real processing by the system (such as a trivial INSERT) are not considered. In addition, for some SQL statements the metrics are not computed on the entire execution but only on some of its steps (for example, in a DELETE only the search for the records selected by the WHERE clause is part of the computation, while the actual deletion of the records is not counted). If an SQL statement contains references or calls to functions or procedures, the metric takes into account the time taken to execute each individual function.
thresholds and the conditions that trigger them can be defined very precisely; in fact, you can target individual subsystems, users, or jobs. By combining these filters, you can handle individual situations in the optimal way.
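The filtering described in the last point could look roughly like the sketch below. It assumes the QSYS2.ADD_QUERY_THRESHOLD procedure with its named parameters; the threshold name, the user profile, the subsystem and the 2048 MB limit are hypothetical values chosen only for illustration.

```sql
-- Sketch only: flag statements that allocate more than 2048 MB of
-- temporary storage, but only for one user running in one subsystem.
-- TMPSTGODBC, ODBCUSER and QUSRWRK are hypothetical example values.
CALL QSYS2.ADD_QUERY_THRESHOLD(
    THRESHOLD_NAME     => 'TMPSTGODBC',
    THRESHOLD_TYPE     => 'TEMPORARY STORAGE',
    THRESHOLD_VALUE    => 2048,
    INCLUDE_USERS      => 'ODBCUSER',
    INCLUDE_SUBSYSTEMS => 'QUSRWRK'
);
```

Narrow filters like these keep the exit program from being called for workloads you are not interested in.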
Let’s add a new threshold on my dummy LPAR!
Before starting, we can check whether anything is already registered with this query:
As you can see, I don’t have any rows (no threshold has been set up).
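A minimal sketch of such a check could simply read the QSYS2.QUERY_SUPERVISOR view; selecting every column and ordering by threshold name is only an illustrative choice.

```sql
-- List the Query Supervisor thresholds currently registered on the system.
-- An empty result means that no threshold has been defined yet.
SELECT *
  FROM QSYS2.QUERY_SUPERVISOR
 ORDER BY THRESHOLD_NAME;
```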
Now we will create a new threshold that checks every 5 minutes for statements that have been running for more than 30 minutes:
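As a hedged sketch, a threshold of this kind could be registered with the QSYS2.ADD_QUERY_THRESHOLD procedure. The name LONGSQL is hypothetical, and both values are expressed in seconds on the assumption that elapsed time and detection frequency use that unit (30 minutes = 1800, 5 minutes = 300).

```sql
-- Sketch only: raise the threshold for any statement running longer than
-- 30 minutes (1800 seconds), checking every 5 minutes (300 seconds).
-- LONGSQL is a hypothetical threshold name.
CALL QSYS2.ADD_QUERY_THRESHOLD(
    THRESHOLD_NAME      => 'LONGSQL',
    THRESHOLD_TYPE      => 'ELAPSED TIME',
    THRESHOLD_VALUE     => 1800,
    DETECTION_FREQUENCY => 300
);
```

No include or exclude filter is specified here, so the threshold is meant to apply to all statements.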
To check that everything is OK, let’s re-run the previous query; I expect to find two rows:
Please pay attention to the DETECTION_FREQUENCY parameter: it indicates the interval between monitoring checks. You need to identify a value that lets you catch critical conditions in the best possible way; here is a specific page about this parameter.
Now we have to build the program that will intercept and handle the crossing of the threshold. The IBM documentation presents some examples; I will also leave one below. It simply e-mails me the details of the job that crossed the threshold:
To complete everything, we only need to register this program on the exit point QIBM_QQQ_QRY_SUPER with this command:
ADDEXITPGM EXITPNT(QIBM_QQQ_QRY_SUPER) FORMAT(QRYS0100) PGMNBR(1) PGM(UTILITIES/QRYSUP) TEXT('Query supervisor EXITPGM')
On IBM i a lot of attention is given to startup programs… we all love our QSTRUPs, but not many people give the right attention to shutdown programs. As you can guess, a shutdown program is a program that is executed when our IBM i system is powered off. Closing our applications and services in the right way can be helpful in many situations, so I think it’s quite important to understand how we can achieve this goal. In this post, I will show you three possible ways.
The first and classic one is to schedule an IPL via the GO POWER menu. In this menu, you can manage all the schedules for automatic shutdowns and restarts. Keep in mind that when the scheduled shutdown time is reached, QSYSSCD submits a job called QPWROFFPGM to the QCTL job queue, and that job calls the QEZPWROFFP program in QSYS. By default, this program powers off your partition immediately. So, if you have scheduled a Power On time, your system will restart when that time is reached; otherwise, the system will remain off until someone turns it on. From this menu, permanent or occasional schedules can be set. An example of scheduling can be seen in the image below.
As I’ve already said, this mechanism is based on calling the program QSYS/QEZPWROFFP, so for instance you can run RTVCLSRC on this program and edit the source in order to add all the instructions you need to shut down your services in the best possible way. Note that this job runs under the QSYSOPR user profile, so you need to check the authorities on your commands and programs before creating this program. Consider also that you can remove the PWRDWNSYS instruction from the source; some application platforms use this mechanism to close interactive jobs and start the nightly phase, so in that case scheduling via GO POWER is not the best way to plan an automatic IPL.
The second way, which I personally prefer, is to use exit points. In case you don’t know, IBM ships the operating system with exit points that let you customize some of its behaviour. When we talk about closing services or powering down our systems, QIBM_QWC_PWRDWNSYS and QIBM_QWC_PRERESTRICT are the exit points we need.
We can see the current configuration with a simple SQL statement:
In my example, no program is set up on these exit points; you can see that because the EXIT_PROGRAMS column does not show any value other than 0.
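A minimal sketch of such a statement, assuming the QSYS2.EXIT_POINT_INFO view and limiting the output to the two exit points discussed here, could be:

```sql
-- Show how many exit programs are registered on the two shutdown-related
-- exit points. EXIT_PROGRAMS = 0 means that nothing is registered yet.
SELECT EXIT_POINT_NAME,
       EXIT_POINT_FORMAT,
       EXIT_PROGRAMS
  FROM QSYS2.EXIT_POINT_INFO
 WHERE EXIT_POINT_NAME IN ('QIBM_QWC_PWRDWNSYS', 'QIBM_QWC_PRERESTRICT');
```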
When you run the PWRDWNSYS command, you automatically call the program that is registered on the QIBM_QWC_PWRDWNSYS exit point. Using this method, you can close services, subsystems, etc. in the order that you prefer, and only after that, if the program returns a green light, can the PWRDWNSYS process proceed.
By running the ENDSBS *ALL command instead of PWRDWNSYS, you call the program that is registered on the QIBM_QWC_PRERESTRICT exit point. As in the other case, you can define what this program does. Keep in mind that this program works with several flags that indicate whether it ended successfully or not, so that this condition can be handled in the best possible way. One example of this usage is a clustered infrastructure: you can manage the resource switchover when ENDSBS *ALL is executed.
Here is an example of a simple program that runs on my system when ENDSBS *ALL *IMMED is performed:
At this point, to register it, I run this command:
ADDEXITPGM EXITPNT(QIBM_QWC_PRERESTRICT) FORMAT(PRER0100) PGMNBR(1) PGM(UTILITIES/ENDSYS) TEXT('Close my system in an ordered way')
and if I re-run my SQL, I now find that one program is registered for QIBM_QWC_PRERESTRICT:
At this point, leveraging the exit point mechanism, it’s possible to write a simple CL program that runs PWRDWNSYS or ENDSBS *ALL, and the system, “under the hood”, will close services in an ordered and controlled way. Clearly, it is possible to combine all these methods, but on the other hand debugging becomes more difficult when errors happen.