Karma - Services Help
This document details each of the services monitored by karma, and how to configure monitoring for them.
The Oracle alert.log facility is similar to that of the unix syslog facility. It is used to report various system messages like startup and shutdown, as well as checkpoints, redo-log switches, and most importantly ORA-xxxx errors.
Monitoring ORA-xxxx errors is an important part of the DBA's responsibility, and Karma aims to ease that burden by watching the alert.log for you.
Karma monitors databases remotely, and as such cannot directly access an OS level file such as the alert.log. The solution (if you're interested in monitoring the alert.log of remote databases) is the run an additional script which comes with Karma on any machine whose alert.log you wish to monitor. Essentially it watches the file for changes, and writes any ORA-xxxx errors to a table. Karma then just watches this table for new entries. Checkout the karmagentd for more information on configuring that end of things.
Unlike other services, the alertlog facility actually ignores the warning and alert thresholds right now. The way it works currently is that if there are any ORA-7445, or ORA-600's an ALERT will be flagged, and for any other errors a WARNING is flagged. Only errors logged in the last day (by karmagentd which may differ from the time the db logged the error) are considered by this report. In addition, if you have your config file set to do alert.log monitoring, but the karmagentd daemon isn't running, an ALERT is also flagged. If you haven't created the karma_alertlog_errors table, it is set to NO_STATUS.
This section displays the Oracle initialization parameters from the v$parameter data dictionary view. This section is not configurable, and is always displayed.
SELECT name, value
ORDER BY name
Karma monitors extents of non-sys objects in your database. Basically if any object is within your warning or alert threshold of maxextents for that object, a WARNING or ALERT is flagged. If you have maxextents set to unlimited, you won't encounter a warning or alert status. If there are many objects in your database which are nearing their maxextents, karma displays the first 100 objects only.
Fragmentation occurs at the table (heap) or index (b-tree) level. Essentially when you create objects in a tablespace, if you set them all with different storage parameters, or a pctincrease which is non-zero you'll likely cause tablespace fragmentation of the objects contained therein.
Fragmentation can be resolved by rearranging objects in other tablespaces, rebuilding with different storage parameters, or export/import. Ideally though, it would be best to avoid fragmentation altogether. How can we accomplish this? Oracle recommends in their latest whitepaper on the subject ``How To Stop Defragmenting and Start Living'' to avoid fragmentation altogether by creating tablespaces with with uniform extent sizes, and leaving objects to assume the default storage params when they're created. For more information, check: How To Stop Defragmentating and Start Living.
If you're running Oracle 8i, should seriously consider using locally managed tablespaces. These new types of tablespaces use an allocation bitmap to keep track of equally sized extents which you set at tablespace creation time. For more information, see the following: Getting the Best Out of Oracle8i Space Management,by Dave Ensor
Hitratios are a simple way to get a big picture of how well your database is performing. Essentially a hitratio gives you a ratio with which to quickly judge how many I/O requests are being satisfied via memory vs I/O requests which actually require disk I/O.
We monitor data block buffer hitratio only, for now. The query looks like the following:
SELECT name, value
WHERE name IN ('consistent gets', 'db block gets', 'physical reads')
Then calculate like this:
hitratio = logical reads - physical reads / logical reads
logical reads = consistent gets + db block gets
There are a number of things to look at when tuning your buffer cache hitratio. Generally you want to shoot for upwards of 95% in an OLTP database. Increasing the size of your db_block_buffers parameter. In addition, tuning slow sql queries can often improve your buffer cache hitratio quite dramatically.
Latches are special types of locks that Oracle uses to protect areas of shared memory. There are latches protecting the redo log buffer, shared pool, buffer cache, and sort segments. Locks when used in the strict Oracle sense are used to protect transactions.
You can find out what's happening with the latches in your database by issuing the following query:
SELECT name, gets, misses
ORDER BY name
Then calculate various ratios with this formula:
ratio = (gets - misses) * 100 / gets
Tuning latch contention can be a complex issue, so I won't go into all the details here. Suffice to say that application tuning, or tuning queries that have been identified as doing a lot of disk I/O can often produce substantial benefit with respect to latch contention.
Multi-threaded server is a facility Oracle provides for installations which require a very large number of user sessions, typically 500-1000. Multi-threaded server reduces the memory requirements, and OS load, and is often appropriate for database-backed websites.
As with every facility, in order for it to run properly, it needs to be monitored to ensure no contention for it's resources. With MTS the database preallocates shared server and dispatcher processes. If you don't have enough shared servers or dispatcher processes started, Oracle can dynamically spawn more up to a given threshold controlled by mts_max_servers and mts_max_dispatchers. Generally, however, if you're dynamically spawning these processes, and then they are dying, you'll have performance problems. You want enough of them prespawned to handle all but peak loads. If you're experiencing contention, you'll want to prespawn more processes with mts_dispatchers, or mts_servers. See your Oracle documentation for configuration details.
Karma provides limited ability to monitor operating system level statistics similar to the way it allows monitoring of the alert.log. The karmaagentd script also checks via ``uptime'' the load average and percent idle. As with the alert.log info, this data populates a table which karmad then monitors for changes. Checkout the karmagentd for more information on configuring that end of things.
For now only load averages are monitored, however additional OS level statistics may be added.
Redologs are where Oracle writes all transactions to, in addition to writing to a block of memory, which eventually makes its way to datafiles on disk. Redologs capture INSERT, UPDATE, and DELETE activity, and provide security in case the database or machine which it runs on crashes. The are crucial to point in time recovery. Generally we don't want to be switching redo-logs too quickly lest we degrade the databases performance.
You can view how often the redolog files are switching with the following query:
SELECT TO_CHAR(first_time, 'J'), TO_CHAR (first_time, 'SSSSS'),
group#, sequence#, TO_CHAR (first_time, 'DD/MM/YYYY HH24:MI')
ORDER BY 1,2
Generally it is recommended that these files not switch more than once every half hour during peak database activity. If the redolog files are switching too often, it's probably a good idea to make the files bigger. For large databases, 100M+ redolog files are not uncommon.
You can change the size of the redolog files while the database is running. Suppose you currently have four logfile groups (1-4) which are 25M and you want to make them 50M. You would do something like this:
1. ALTER DATABASE ADD LOGFILE GROUP 5 ('log5a', 'log5b') SIZE 50M
ALTER DATABASE ADD LOGFILE GROUP 6 ('log6a', 'log6b') SIZE 50M
ALTER DATABASE ADD LOGFILE GROUP 7 ('log7a', 'log7b') SIZE 50M
ALTER DATABASE ADD LOGFILE GROUP 8 ('log8a', 'log8b') SIZE 50M
2. ALTER SYSTEM SWITCH LOGFILE
(do this step until you are using the new logfiles)
3. ALTER DATABASE DROP LOGFILE GROUP 1
ALTER DATABASE DROP LOGFILE GROUP 2
ALTER DATABASE DROP LOGFILE GROUP 3
ALTER DATABASE DROP LOGFILE GROUP 4
During step 2, you'll have to verify that your system is now using the new logfiles. You'll be able to do that with the following query:
SELECT group#, status FROM v$log
When the redologs in groups 5-8 are in use or 'CURRENT' you'll be allowed to drop the old logfiles. Oracle won't allow you to drop a logfile that is in use.
The Oracle deferror queue contains transactions that have failed to replicate for various reasons.
Monitoring the deferror queue is crucial to maintaining the health of a replicated environment. Karma monitors the number of transactions which have failed with errors. If it gets too high a warning or alert is flagged.
SELECT deferred_tran_id, origin_tran_db, destination,
to_char(start_time, 'HH24:MI:SS') , error_number
WHERE start_time > sysdate - 1
ORDER BY start_time
The Oracle deftran queue contains transactions bound for remote databases.
Monitoring the deftran queue is crucial to maintaining the health of a replicated environment. Karma monitors the number of transactions pending in this queue. If it gets too high a warning or alert is flagged.
For further information, take a look at the deftran queue with this query:
SELECT t.deferred_tran_id, t.delivery_order,
to_char(t.start_time, 'DD/MM/YYYY HH24:MI:SS')
FROM deftrandest d, deftran t
WHERE d.deferred_tran_id = t.deferred_tran_id
AND d.delivery_order = t.delivery_order
ORDER BY t.start_time
Rollback segment activity is an important facility to monitor in your database to maintain reliable performance. Whenever a transaction modifies a block of data in your database, rollback segments provide a read-consistent view to the other sessions in the database, giving the picture of the data before any changes began. Additionally, as with redologs, rollback segments are important for database recovery.
SELECT a.name, b.status, b.gets, b.waits
FROM v$rollname a, v$rollstat b
WHERE a.usn = b.usn
Slow SQL queries can be one of the most frustrating and performance degrading aspects of database administration. What makes it particularly frustrating is if you have developers on your production box. :-)
Bad queries manage to find their way into every database. Karma provides a method to be a little more proactive about monitoring this activity, and letting you know hopefully before they become a problem. Karma, though, can only help identify those queries that are problems, it can't optimize them.
Optimizing queries can mean anything from analyzing related tables and indexes in a schema, providing hints to suggest a better execution plan, creating indexes to provide Oracle with a faster way to the data, or actually rearranging the query so that perhaps it enables an index that it previously disabled. Application tuning, by redefining the problem in a way that makes it simpler to solve, can also often be a very viable and attractive way to improve some particularly bad queries. For more information on all aspects of SQL query tuning see: Guy Harrison's book "Oracle SQL High Performance Tuning" - ISBN 0136142311
Please test the slow sql funcationality before running it on your production database and limit how often you run it, as it can itself be a ``slow sql'' query.
The following query should help you get started:
SELECT disk_reads, executions, sql_text
If diskreads are excessively high relative to a low number of executions, the query will likely need tuning. Executions represent the number of times the same query was reused.
karma allows tablespaces to be monitored like you monitor disk capacity with ``df'' in Unix. This is above and beyond the extents and fragmentation which you can monitor separately.
Tablespaces with any datafiles set to AUTOEXTEND are ignored in this analysis. Use this query to see where your tablespaces are currently at:
SELECT a.tablespace_name, a.total, b.used
FROM (SELECT tablespace_name, SUM (bytes) total
GROUP BY tablespace_name) a,
(SELECT tablespace_name, SUM (bytes) used
GROUP BY tablespace_name) b
WHERE a.tablespace_name = b.tablespace_name (+)
This section merely monitors that the database is up and reachable. In addition you can view performance statistics from v$sysstat, and other miscellaneous database information. This section is always enabled, and cannot be disabled.