NAME
Karma - Services Help
DESCRIPTION
This document details each of the services monitored by karma, and how
to configure monitoring for them.
Alert Log Errors
The Oracle alert.log facility is similar to that of the unix syslog
facility. It is used to report various system messages like startup and
shutdown, as well as checkpoints, redo-log switches, and most
importantly ORA-xxxx errors.
Monitoring ORA-xxxx errors is an important part of the DBA's
responsibility, and Karma aims to ease that burden by watching the
alert.log for you.
Karma monitors databases remotely, and as such cannot directly access an
OS level file such as the alert.log. The solution (if you're interested
in monitoring the alert.log of remote databases) is the run an
additional script which comes with Karma on any machine whose alert.log
you wish to monitor. Essentially it watches the file for changes, and
writes any ORA-xxxx errors to a table. Karma then just watches this
table for new entries. Checkout the karmagentd for more information on
configuring that end of things.
Unlike other services, the alertlog facility actually ignores the
warning and alert thresholds right now. The way it works currently is
that if there are any ORA-7445, or ORA-600's an ALERT will be flagged,
and for any other errors a WARNING is flagged. Only errors logged in the
last day (by karmagentd which may differ from the time the db logged the
error) are considered by this report. In addition, if you have your
config file set to do alert.log monitoring, but the karmagentd daemon
isn't running, an ALERT is also flagged. If you haven't created the
karma_alertlog_errors table, it is set to NO_STATUS.
Initialization Parameters
This section displays the Oracle initialization parameters from the
v$parameter data dictionary view. This section is not configurable, and
is always displayed.
`SELECT name, value FROM v$parameter ORDER BY name'
Extents
Karma monitors extents of non-sys objects in your database. Basically if
any object is within your warning or alert threshold of maxextents for
that object, a WARNING or ALERT is flagged. If you have maxextents set
to unlimited, you won't encounter a warning or alert status. If there
are many objects in your database which are nearing their maxextents,
karma displays the first 100 objects only.
Fragmentation
Fragmentation occurs at the table (heap) or index (b-tree) level.
Essentially when you create objects in a tablespace, if you set them all
with different storage parameters, or a pctincrease which is non-zero
you'll likely cause tablespace fragmentation of the objects contained
therein.
Fragmentation can be resolved by rearranging objects in other
tablespaces, rebuilding with different storage parameters, or
export/import. Ideally though, it would be best to avoid fragmentation
altogether. How can we accomplish this? Oracle recommends in their
latest whitepaper on the subject "How To Stop Defragmenting and Start
Living" to avoid fragmentation altogether by creating tablespaces with
with uniform extent sizes, and leaving objects to assume the default
storage params when they're created. For more information, check:
If you're running Oracle 8i, should seriously consider using locally
managed tablespaces. These new types of tablespaces use an allocation
bitmap to keep track of equally sized extents which you set at
tablespace creation time. For more information, see the following:
Hit Ratios
Hitratios are a simple way to get a big picture of how well your
database is performing. Essentially a hitratio gives you a ratio with
which to quickly judge how many I/O requests are being satisfied via
memory vs I/O requests which actually require disk I/O.
We monitor data block buffer hitratio only, for now. The query looks
like the following:
`SELECT name, value'
`FROM v$sysstat'
`WHERE name IN ('consistent gets', 'db block gets', 'physical reads')'
Then calculate like this:
`hitratio = logical reads - physical reads / logical reads'
`logical reads = consistent gets + db block gets'
There are a number of things to look at when tuning your buffer cache
hitratio. Generally you want to shoot for upwards of 95% in an OLTP
database. Increasing the size of your db_block_buffers parameter. In
addition, tuning slow sql queries can often improve your buffer cache
hitratio quite dramatically.
Latch Contention
Latches are special types of locks that Oracle uses to protect areas of
shared memory. There are latches protecting the redo log buffer, shared
pool, buffer cache, and sort segments. Locks when used in the strict
Oracle sense are used to protect transactions.
You can find out what's happening with the latches in your database by
issuing the following query:
`SELECT name, gets, misses'
`FROM v$latch'
`ORDER BY name'
Then calculate various ratios with this formula:
`ratio = (gets - misses) * 100 / gets'
Tuning latch contention can be a complex issue, so I won't go into all
the details here. Suffice to say that application tuning, or tuning
queries that have been identified as doing a lot of disk I/O can often
produce substantial benefit with respect to latch contention.
MTS - Multi-Threaded Server
Multi-threaded server is a facility Oracle provides for installations
which require a very large number of user sessions, typically 500-1000.
Multi-threaded server reduces the memory requirements, and OS load, and
is often appropriate for database-backed websites.
As with every facility, in order for it to run properly, it needs to be
monitored to ensure no contention for it's resources. With MTS the
database preallocates shared server and dispatcher processes. If you
don't have enough shared servers or dispatcher processes started, Oracle
can dynamically spawn more up to a given threshold controlled by
mts_max_servers and mts_max_dispatchers. Generally, however, if you're
dynamically spawning these processes, and then they are dying, you'll
have performance problems. You want enough of them prespawned to handle
all but peak loads. If you're experiencing contention, you'll want to
prespawn more processes with mts_dispatchers, or mts_servers. See your
Oracle documentation for configuration details.
OS Statistics
Karma provides limited ability to monitor operating system level
statistics similar to the way it allows monitoring of the alert.log. The
karmaagentd script also checks via "uptime" the load average and percent
idle. As with the alert.log info, this data populates a table which
karmad then monitors for changes. Checkout the karmagentd for more
information on configuring that end of things.
For now only load averages are monitored, however additional OS level
statistics may be added.
Redologs
Redologs are where Oracle writes all transactions to, in addition to
writing to a block of memory, which eventually makes its way to
datafiles on disk. Redologs capture INSERT, UPDATE, and DELETE activity,
and provide security in case the database or machine which it runs on
crashes. The are crucial to point in time recovery. Generally we don't
want to be switching redo-logs too quickly lest we degrade the databases
performance.
You can view how often the redolog files are switching with the
following query:
`SELECT TO_CHAR(first_time, 'J'), TO_CHAR (first_time, 'SSSSS'),'
`group#, sequence#, TO_CHAR (first_time, 'DD/MM/YYYY HH24:MI')'
`FROM v$log'
`ORDER BY 1,2'
Generally it is recommended that these files not switch more than once
every half hour during peak database activity. If the redolog files are
switching too often, it's probably a good idea to make the files bigger.
For large databases, 100M+ redolog files are not uncommon.
You can change the size of the redolog files while the database is
running. Suppose you currently have four logfile groups (1-4) which are
25M and you want to make them 50M. You would do something like this:
`1. ALTER DATABASE ADD LOGFILE GROUP 5 ('log5a', 'log5b') SIZE 50M'
`ALTER DATABASE ADD LOGFILE GROUP 6 ('log6a', 'log6b') SIZE 50M'
`ALTER DATABASE ADD LOGFILE GROUP 7 ('log7a', 'log7b') SIZE 50M'
`ALTER DATABASE ADD LOGFILE GROUP 8 ('log8a', 'log8b') SIZE 50M'
`2. ALTER SYSTEM SWITCH LOGFILE'
`(do this step until you are using the new logfiles)'
`3. ALTER DATABASE DROP LOGFILE GROUP 1'
`ALTER DATABASE DROP LOGFILE GROUP 2'
`ALTER DATABASE DROP LOGFILE GROUP 3'
`ALTER DATABASE DROP LOGFILE GROUP 4'
During step 2, you'll have to verify that your system is now using the
new logfiles. You'll be able to do that with the following query:
`SELECT group#, status FROM v$log'
When the redologs in groups 5-8 are in use or 'CURRENT' you'll be
allowed to drop the old logfiles. Oracle won't allow you to drop a
logfile that is in use.
Deferred Transaction Error Queue
The Oracle deferror queue contains transactions that have failed to
replicate for various reasons.
Monitoring the deferror queue is crucial to maintaining the health of a
replicated environment. Karma monitors the number of transactions which
have failed with errors. If it gets too high a warning or alert is
flagged.
`SELECT deferred_tran_id, origin_tran_db, destination,'
`to_char(start_time, 'HH24:MI:SS') , error_number'
`FROM deferror'
`WHERE start_time > sysdate - 1'
`ORDER BY start_time'
Deferred Transaction Queue
The Oracle deftran queue contains transactions bound for remote
databases.
Monitoring the deftran queue is crucial to maintaining the health of a
replicated environment. Karma monitors the number of transactions
pending in this queue. If it gets too high a warning or alert is
flagged.
For further information, take a look at the deftran queue with this
query:
`SELECT t.deferred_tran_id, t.delivery_order, '
`to_char(t.start_time, 'DD/MM/YYYY HH24:MI:SS') '
`FROM deftrandest d, deftran t'
`WHERE d.deferred_tran_id = t.deferred_tran_id'
`AND d.delivery_order = t.delivery_order'
`ORDER BY t.start_time'
Rollback Segment Contention
Rollback segment activity is an important facility to monitor in your
database to maintain reliable performance. Whenever a transaction
modifies a block of data in your database, rollback segments provide a
read-consistent view to the other sessions in the database, giving the
picture of the data before any changes began. Additionally, as with
redologs, rollback segments are important for database recovery.
`SELECT a.name, b.status, b.gets, b.waits'
`FROM v$rollname a, v$rollstat b'
`WHERE a.usn = b.usn'
Slow SQL
Slow SQL queries can be one of the most frustrating and performance
degrading aspects of database administration. What makes it particularly
frustrating is if you have developers on your production box. :-)
Bad queries manage to find their way into every database. Karma provides
a method to be a little more proactive about monitoring this activity,
and letting you know hopefully before they become a problem. Karma,
though, can only help identify those queries that are problems, it can't
optimize them.
Optimizing queries can mean anything from analyzing related tables and
indexes in a schema, providing hints to suggest a better execution plan,
creating indexes to provide Oracle with a faster way to the data, or
actually rearranging the query so that perhaps it enables an index that
it previously disabled. Application tuning, by redefining the problem in
a way that makes it simpler to solve, can also often be a very viable
and attractive way to improve some particularly bad queries. For more
information on all aspects of SQL query tuning see:
Please test the slow sql funcationality before running it on your
production database and limit how often you run it, as it can itself be
a "slow sql" query.
The following query should help you get started:
`SELECT disk_reads, executions, sql_text'
`FROM v$sqlarea'
If diskreads are excessively high relative to a low number of
executions, the query will likely need tuning. Executions represent the
number of times the same query was reused.
Tablespace Quotas
karma allows tablespaces to be monitored like you monitor disk capacity
with "df" in Unix. This is above and beyond the extents and
fragmentation which you can monitor separately.
Tablespaces with any datafiles set to AUTOEXTEND are ignored in this
analysis. Use this query to see where your tablespaces are currently at:
`SELECT a.tablespace_name, a.total, b.used'
`FROM (SELECT tablespace_name, SUM (bytes) total'
`FROM dba_data_files'
`GROUP BY tablespace_name) a,'
`(SELECT tablespace_name, SUM (bytes) used' `FROM dba_segments'
`GROUP BY tablespace_name) b'
`WHERE a.tablespace_name = b.tablespace_name (+)'
UP Status
This section merely monitors that the database is up and reachable. In
addition you can view performance statistics from v$sysstat, and other
miscellaneous database information. This section is always enabled, and
cannot be disabled.