All You need to know about Query supervisor
One of the dream of sysadmins is monitoring system SQL usage, to check how much temporary storage is used by SQL statements, to check long-running SQL statements and so on. Daily, in my job I am able to see a lot of situations in which user run a lot of SQL statements with ODBC tools in Excel and without them noticing, the temporary memory began to grow because of all the correlations made.
But, what happens if I tell you that in IBM i this feature is already in OS?
Since IBM i V7R3M0 there is a nested mechanism that is called Query Supervisor and its goal is to define metrics on the execution of SQL statements such as the execution time, temporary storage, I/O and CPU count.
So, as I said before this feature is in the operative system so you needn’t to get some additional feature, the only thing you need is only to understand what do you want to do when a threshold has been reached. Yes, that because with this real-time monitor you are able to set up some different actions as:
- holding job
- ending SQL execution
- sending message to a message queue
- logging SQL statement
- …
At this point, we can start with some considerations about this topic:
- as mentioned earlier, this is real-time monitoring so unlike other mechanisms or tools it calculates thresholds on live data, not based on estimates or assumptions.
- metrics that can be monitored are CPU time (i.e., the amount of CPU seconds used during the execution of a SQL statement), elapsed time (i.e., the amount of seconds used to execute the SQL statement), temporary storage (i.e., the amount of temporary memory allocated expressed in MB), I/O count (i.e., the amount of I/O operations used to execute the SQL statement).
- not all SQL statements trigger this mechanism, in fact some statements that do not require any kind of processing by the system (such as a trivial INSERT) are not considered. In addition, for some SQL statements the metrics computation is not based on the entire execution but only on the execution of some steps (for example, in a DELETE only the search for records to be deleted in the WHERE clause is part of the metrics computation, while the actual deletion of records is not counted). If an SQL statement contains references or calls to functions or procedures, in that case the metric takes into account the time taken to execute each individual function.
- it is possible to go and define very precisely the thresholds and the mechanisms that trigger them; in fact, it is possible to go and work on individual subsystems, users, or jobs. By combining these filters, it is possible to manage individual situations in the optimal way.
Let’s add a new threshold on my dummy lpar!
So before starting, we are able to check if something was already registered with this query:
SELECT * FROM QUERY_SUPERVISOR
As you can see, I don’t have any row (no threshold has been setted up).
Now we will create a new threshold that monitor every 5 minutes all statements that are running for more than 30 minutes:
CALL QSYS2.ADD_QUERY_THRESHOLD(THRESHOLD_NAME => 'MAXTIME', THRESHOLD_TYPE => 'ELAPSED TIME', THRESHOLD_VALUE => 1800, DETECTION_FREQUENCY => 300, LONG_COMMENT => 'Check queries running for 30 mins');
In addition, I want to create another threshold only for job of BIUSER in BISBS; in this case I will monitor current temporary storage:
CALL QSYS2.ADD_QUERY_THRESHOLD(THRESHOLD_NAME => 'BIUSER',THRESHOLD_TYPE => 'TEMPORARY STORAGE', THRESHOLD_VALUE => 1024, DETECTION_FREQUENCY => 300, INCLUDE_USERS=> 'BIUSER', SUBSYSTEMS=> 'BISBS', LONG_COMMENT => 'Check queries with tempstg > 1GB')
In order to check if everything is ok, let’s check the previous query, I expect to find two rows:
Please, pay attention to the DETECTION_FREQUENCY parameter, it indicates the distance between monitoring checks. It is necessary to identify a correct value that allows you to go and intercept critical conditions in the best possible way, here is a specific page about this parameter.
Now we have to build the program that will intercept and manage the attainment of the threshold. The IBM documentation presents some examples, I will also leave one below. It will simply e-mail me detailing the job that crossed the threshold:
To complete everything, we need only to register this program in exit point QIBM_QQQ_QRY_SUPER with this command: ADDEXITPGM EXITPNT(QIBM_QQQ_QRY_SUPER) FORMAT(QRYS0100) PGMNBR(1) PGM(UTILITIES/QRYSUP) TEXT('Query supervisor EXITPGM')
Here you can find some documentation:
- Query Supervisor – IBM Documentation
- QUERY_SUPERVISOR view – IBM Documentation
- ADD_QUERY_THRESHOLD procedure – IBM Documentation
- REMOVE_QUERY_THRESHOLD procedure – IBM Documentation
- Query Supervisor Exit Program – IBM Documentation
Andrea