All You need to know about Query supervisor

One of the dream of sysadmins is monitoring system SQL usage, to check how much temporary storage is used by SQL statements, to check long-running SQL statements and so on. Daily, in my job I am able to see a lot of situations in which user run a lot of SQL statements with ODBC tools in Excel and without them noticing, the temporary memory began to grow because of all the correlations made.

But, what happens if I tell you that in IBM i this feature is already in OS?

Since IBM i V7R3M0 there is a nested mechanism that is called Query Supervisor and its goal is to define metrics on the execution of SQL statements such as the execution time, temporary storage, I/O and CPU count.

So, as I said before this feature is in the operative system so you needn’t to get some additional feature, the only thing you need is only to understand what do you want to do when a threshold has been reached. Yes, that because with this real-time monitor you are able to set up some different actions as:

  • holding job
  • ending SQL execution
  • sending message to a message queue
  • logging SQL statement

At this point, we can start with some considerations about this topic:

  1. as mentioned earlier, this is real-time monitoring so unlike other mechanisms or tools it calculates thresholds on live data, not based on estimates or assumptions.
  2. metrics that can be monitored are CPU time (i.e., the amount of CPU seconds used during the execution of a SQL statement), elapsed time (i.e., the amount of seconds used to execute the SQL statement), temporary storage (i.e., the amount of temporary memory allocated expressed in MB), I/O count (i.e., the amount of I/O operations used to execute the SQL statement).
  3. not all SQL statements trigger this mechanism, in fact some statements that do not require any kind of processing by the system (such as a trivial INSERT) are not considered. In addition, for some SQL statements the metrics computation is not based on the entire execution but only on the execution of some steps (for example, in a DELETE only the search for records to be deleted in the WHERE clause is part of the metrics computation, while the actual deletion of records is not counted). If an SQL statement contains references or calls to functions or procedures, in that case the metric takes into account the time taken to execute each individual function.
  4. it is possible to go and define very precisely the thresholds and the mechanisms that trigger them; in fact, it is possible to go and work on individual subsystems, users, or jobs. By combining these filters, it is possible to manage individual situations in the optimal way.

Let’s add a new threshold on my dummy lpar!

So before starting, we are able to check if something was already registered with this query:

SELECT * FROM QUERY_SUPERVISOR 

As you can see, I don’t have any row (no threshold has been setted up).

Now we will create a new threshold that monitor every 5 minutes all statements that are running for more than 30 minutes:

CALL QSYS2.ADD_QUERY_THRESHOLD(THRESHOLD_NAME => 'MAXTIME', THRESHOLD_TYPE  => 'ELAPSED TIME', THRESHOLD_VALUE => 1800, DETECTION_FREQUENCY => 300, LONG_COMMENT => 'Check queries running for 30 mins');

In addition, I want to create another threshold only for job of BIUSER in BISBS; in this case I will monitor current temporary storage:

CALL QSYS2.ADD_QUERY_THRESHOLD(THRESHOLD_NAME => 'BIUSER',THRESHOLD_TYPE  => 'TEMPORARY STORAGE', THRESHOLD_VALUE => 1024, DETECTION_FREQUENCY => 300, INCLUDE_USERS=> 'BIUSER', SUBSYSTEMS=> 'BISBS', LONG_COMMENT => 'Check queries with tempstg > 1GB')

In order to check if everything is ok, let’s check the previous query, I expect to find two rows:

Please, pay attention to the DETECTION_FREQUENCY parameter, it indicates the distance between monitoring checks. It is necessary to identify a correct value that allows you to go and intercept critical conditions in the best possible way, here is a specific page about this parameter.

Now we have to build the program that will intercept and manage the attainment of the threshold. The IBM documentation presents some examples, I will also leave one below. It will simply e-mail me detailing the job that crossed the threshold:

To complete everything, we need only to register this program in exit point QIBM_QQQ_QRY_SUPER with this command:  ADDEXITPGM EXITPNT(QIBM_QQQ_QRY_SUPER) FORMAT(QRYS0100) PGMNBR(1) PGM(UTILITIES/QRYSUP) TEXT('Query supervisor EXITPGM')


Here you can find some documentation:

Andrea

All You need to know about Shutdown Programs

In IBM i a lot of attention is given to startup programs… we all love our QSTRUPs, but not so much people give the right attention to shut down programs. As you can understand, shutdown programs are a program that will be executed when our IBM i system is powered off. Closing our applications and our service in the right way can be helpful several times, so I think it’s quite important to understand how we can achieve this goal. In this post, I will show you three possible ways.

The first and the classic one is to schedule an IPL via GO POWER menu. In this menu, you are able to manage all schedulation about automatic shutdowns and restarts. Consider that he scheduled shutdown time is reached, QSYSSCD submit a new job on the QCTL jobq that is called QPWROFFPGM, and that job calls the QEZPWROFFP program in QSYS. By default, this program will power off in immediate way your partition. So, if you have scheduled a Power On time, your system will restart when this time will be reached; alternatively, the system will remain off until someone turns it on. From this menu, permanent or occasional schedules can be set, An example of scheduling can be seen in the image below.

As I’ve already said, this mechanism is based on calling the program QSYS/QEZPWROFFP, so for instance you can do a RTVCLSRC of this program, and you can edit this source in order to place all the instructions you need to close in the better way possible your services. Take care that this job runs with QSYSOPR user profile, so you need to check authorization for your command/programs before create this program. Consider also that you can remove the PWRDWNSYS instruction from the source; some application platforms using this way to close interactive jobs and start nightly phase, so in this case scheduling via GO POWER is not the best way to plan an automatic IPL.


The second way, that I personally prefer, is by using Exit Points. In case you don’t know, IBM ships operating systems with the opportunity to change some behaviour. When we talk about closing services or powering down our systems, QIBM_QWC_PWRDWNSYS and QIBM_QWC_PRERESTRICT are the exit points we need.

We can see current configuration with a simple SQL statement:

So, in my example I do not set up any program on this exit points, you can see that because on EXIT_PROGRAMS column there isn’t any with numbers different from 0.


When you run the PWRDWNSYS command, you automatically call the program that is set on QIBM_QWC_PWRDWNSYS exit point. Using this method, you can close services, subsystems, etc. in the order that you prefer and after that, only if the program return a green light, the PWRDWNSYS process can proceed.

By running the ENDSBS *ALL command, instead of PWRDWNSYS, you call the program that is set on QIBM_QWC_PRERESTRICT exit point. As in the other case, you can define what this program does. Let’s consider that this program will be called with several flags that indicate if the program ends well or not, in order to manage this condition in the best possible way. One example of this usage, is a clustered infrastructure: you can manage resources switchover when ENDSBS *ALL is executed.

Here is an example of an easy program that run on my system when ENDSBS *ALL *IMMED is performed:

So, at this point in order to register that I will run this command: ADDEXITPGM EXITPNT(QIBM_QWC_PRERESTRICT) FORMAT(PRER0100) PGMNBR(1) PGM(UTILITIES/ENDSYS) TEXT('Close my system in an ordered way')

and if I re-run my SQL, now I will find that for QIBM_QWC_PRERESTRICT there is one program registered:

At this point, leveraging the exit point mechanism, it’s possible to make a simple CL that run PWRDWNSYS or ENDSBS *ALL, and the system “under the hood” will close services in an ordered and controlled way. Clearly, is it possible to combine all these methods, but on the other side it just becomes more difficult to debug in case errors happens.


Here you can find more detail directly from IBM: PWRDWNSYS and ENDSBS *ALL Exit Programs

And… what about you, do you power off your system in this way? Please tell us your experience on the comments.

Andrea