Tuesday, August 24, 2010

NAGIOS/Centreon: Troubleshooting Graphs

From Wiki Centreon

Here is a short conversation on how to troubleshoot graphs in Centreon:

Conversation:
.. I know, this is currently only a log from an IRC session, but I (or any volunteer) may turn it into a good, structured HowTo :)
(14:22:09) grandmoun: How can i create graphs in centreon??
(14:28:27) nfilus: graphs are autogenerated, if the service you defined returns performance data
(14:29:01) zelia5: how do i generate perfdata ? ^^
(14:31:09) nfilus: # /usr/local/nagios/libexec/check_centreon_ping -H www.google.de
(14:31:15) nfilus: GPING OK - rtt min/avg/max/mdev = 23.269/23.269/23.269/0.000 ms|time=23.269ms;20;40;; ok=1
(14:31:25) nfilus: |time .... is the perfdata
Compare this with the service details under Monitoring -> Services -> Details -> [your_service], as shown in the image below: Image:perfdata.png
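The split between status text and perfdata can be reproduced on the command line. The sketch below uses a hypothetical helper, extract_perfdata (not part of Nagios or Centreon), and assumes the plugin follows the usual "text|perfdata" layout:

```shell
#!/bin/sh
# Hypothetical helper: print the perfdata part of a plugin output line,
# i.e. everything after the first "|". Prints nothing if there is no "|".
extract_perfdata() {
    printf '%s\n' "$1" | sed -n 's/^[^|]*|//p'
}

# Example with the plugin output quoted in the log above:
extract_perfdata 'GPING OK - rtt min/avg/max/mdev = 23.269/23.269/23.269/0.000 ms|time=23.269ms;20;40;; ok=1'
# prints: time=23.269ms;20;40;; ok=1
```

If the helper prints nothing for your plugin, the plugin returns no performance data and no graph can be generated for that service.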
(09:43:30) nfilus: let's try to analyze step by step
(09:43:35) dharrison: ok cool
(09:44:25) nfilus: your service is running and the last check timestamp is quite recent in centreon?
(09:44:58) nfilus: look for "last check" at the main page in centreon or in monitoring
(09:45:58) dharrison: everything seems to be running ok
(09:46:40) nfilus: go to administration -> options -> centstorage -> options
(09:47:38) nfilus: no empty fields?
(09:47:49) dharrison: nope
(09:48:06) nfilus: what storage type :)
(09:48:17) nfilus: rrd & mysql?
(09:48:24) dharrison: yup
(09:48:54) nfilus: check on filesystem if service-perfdata file exists and is 644 user:nagios group:nagios
(09:50:20) dharrison: looks like its 777 nagios & www-data
(09:50:43) nfilus: that's too much, but shouldn't be the problem
(09:50:53) nfilus: ok
(09:51:07) nfilus: goto centstorage -> manage in left menu
(09:51:35) nfilus: and choose the service you are interested in
(09:51:43) dharrison: theres nothing there
(09:51:48) dharrison: its empty
(09:52:35) nfilus: that's a symptom, lets look for the cause ...
(09:52:47) nfilus: na values - no graphs, sorry! :)
(09:52:59) dharrison: lol that would make sense  :-)
(09:53:29) nfilus: go to monitoring to your service details
(09:53:41) dharrison: any service?
(09:53:54) nfilus: the one you are interested in mostly
(09:54:31) dharrison: ok i have picked a host, and we will go for CPU Usage
(09:55:08) nfilus: ok
(09:55:31) nfilus: in status details: you have a status and performance data?
(09:56:16) dharrison: yes
(09:56:35) nfilus: please paste the perfdata here
(09:56:54) dharrison: '5 min avg Load'=1%;85;90;0;100
(09:57:35) nfilus: looks ok
(09:57:55) nfilus: so, perfdata is generated, but not processed
(09:58:56) nfilus: go to config -> command -> misc 
(09:59:31) nfilus: you should have sth like a process-service-perfdata command
(09:59:47) nfilus: (i think my definition is not standard)
(09:59:55) dharrison: yup i have that
(10:00:13) nfilus: open it and paste the command line
(10:01:13) dharrison: $USER1$/process-service-perfdata  "$LASTSERVICECHECK$" "$HOSTNAME$" "$SERVICEDESC$" "$LASTSERVICESTATE$" "$SERVICESTATE$" "$SERVICEPERFDATA$"
(10:02:04) nfilus: looks ok
(10:04:20) nfilus: config -> nagios -> nagios.cfg -> data
(10:05:10) dharrison: ok
(10:05:16) nfilus: perfdata option is yes
(10:05:26) nfilus: service command is process-service-perfdata
(10:05:37) nfilus: service data file is /usr/local/nagios/var/service-perfdata
(10:06:13) nfilus: ok?
(10:06:16) dharrison: its /var/log/nagios3/service-perfdata
(10:06:28) dharrison: and perfdata option is yes
(10:07:22) nfilus: is this the same path as defined in administration -> options -> centstorage -> options?
(10:08:14) dharrison: yes, just checked
(10:08:38) nfilus: so, this is the file you checked before for access, right?
(10:09:10) dharrison: yup
(10:09:36) dharrison: but its not the same file that $USER1$ points to. is that correct?
(10:10:24) nfilus: you mean  $USER1$/process-service-perfdata?
(10:11:08) dharrison: yup
(10:11:40) nfilus: no, this was the command that gets the perfdata from service checks and writes them into /var/log/nagios3/service-perfdata
(10:11:47) dharrison: oh ok
(10:12:30) nfilus: please do
(10:12:36) nfilus: tail -f /var/log/nagios3/service-perfdata
(10:13:04) nfilus: and watch for changes for 1-2 minutes
(10:13:27) dharrison: ok running now
(10:13:33) nfilus: is there any data coming in?
(10:13:37) dharrison: yes
(10:14:37) nfilus: ok, 
(10:14:38) nfilus:  ps ax | grep cent
(10:14:43) nfilus: centstorage is running?
(10:16:24) dharrison: seems to be
(10:17:07) nfilus: ok, do
(10:17:18) nfilus: tail -f /usr/local/centreon/log/centstorage.log
(10:17:28) nfilus: any errors or warnings?
(10:18:18) dharrison: no such log file
(10:20:11) nfilus: path centreon is in usr local, yes?
(10:20:59) dharrison: yes
(10:24:57) nfilus: grep LOG /usr/local/centreon/bin/centstorage
(10:25:07) nfilus: what's the log path?
(10:26:07) dharrison: "/usr/local/centreon/log/centstorage.log";
(10:26:46) nfilus: ls -lad  /usr/local/centreon/log
(10:26:57) nfilus: drwxrwxr-x 2 www-data nagios ?
(10:27:42) dharrison: yup   lol
(10:28:50) nfilus: that's not normal, that no log file is there if centstorage is running!
(10:29:11) nfilus: is there a logAnalyser.log?
(10:29:21) dharrison: yes
(10:34:56) nfilus: can you restart centstorage
(10:35:05) dharrison: yeah 2secs
(10:36:15) dharrison: it did bring this up when i stopped it No lock file found in /var/run/centreon/centstorage.pid
(10:36:49) dharrison: ive stopped it but says its still running????
(10:37:04) dharrison: whats the process name for centstorage?
(10:37:40) nfilus: something like /usr/bin/perl -w /usr/local/centreon/bin/centstorage
(10:38:59) dharrison: hey hey  can't write /usr/local/centreon/log/centstorage.log: Permission denied
(10:39:17) dharrison: when i typed that command above
(10:40:23) nfilus: you are root?
(10:41:10) nfilus: there is no centstorage.log until now and  /usr/local/centreon/log is writeable, yes?
(10:41:30) dharrison: i have now ran that as sudo and came back ok
(10:42:55) dharrison: i ran  /usr/bin/perl -w /usr/local/centreon/bin/centstorage   as sudo which i should have done tbh. sorry
(10:43:03) dharrison: and there is now a centstorage.log
(10:43:39) nfilus: watch it for progress and errors
(10:43:41) nfilus: tail -f 
(10:44:16) dharrison: just two lines at the mo.
(10:44:26) dharrison: 1 stating that its starting
(10:44:32) dharrison: 2 with the PID Number
(10:44:44) nfilus: woow, that's progress :)
(10:44:52) dharrison: lol certainly is
(10:45:26) dharrison: nothing else is coming through
(10:46:13) nfilus: it should stay silent if no errors occur
(10:46:27) nfilus: like in my case:
(10:46:29) nfilus: 22/10/2009 10:47:01 - ERROR while updating /var/lib/centreon/status/186.rrd at 1256201216 -> 100 : illegal attempt to update using time 1256201216 when last update time is 1529719541 (minimum one second step)
(10:47:31) dharrison: lol
(10:47:42) dharrison: nope still silent......but no graphs still
(10:48:46) nfilus: wait 5 minutes and then go back to admin -> options -> centstorage -> manage
(10:48:58) nfilus: there should be some data now
(10:49:44) dharrison: ok currently still empty. but you reckon to wait a few more minutes?
(10:50:51) nfilus: yes, the perfdata needs to be filled in
(10:51:30) dharrison: ok
(10:56:10) nfilus: so, .... is there any data?
(10:56:32) dharrison: WHOA DUDE!
(10:56:33) nfilus: ... or any errors
(10:56:37) dharrison: data
(10:56:39) dharrison: lots
Image:graph_NaN.png
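
Two of the checks from the conversation, the perfdata file permissions and the writable log directory, can be scripted. This is a minimal sketch: the helper names perfdata_file_status and log_dir_writable are made up for illustration, and the paths in the comments are the ones from the log and may differ on your installation.

```shell
#!/bin/sh
# Hypothetical helper: report whether a perfdata file exists and show its
# permission bits, owner and group as printed by "ls -l".
perfdata_file_status() {
    if [ ! -e "$1" ]; then
        echo "missing"
        return 1
    fi
    # Fields 1, 3 and 4 of "ls -ld" are mode, owner and group.
    ls -ld "$1" | awk '{ print substr($1, 1, 10), $3, $4 }'
}

# Hypothetical helper: succeed if the current user can create files in the
# given log directory (the root cause in the log was "Permission denied").
log_dir_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Usage (paths taken from the log above):
#   perfdata_file_status /var/log/nagios3/service-perfdata
#   log_dir_writable /usr/local/centreon/log || echo "cannot write log"
# A 644 file owned by nagios:nagios would be reported as:
#   -rw-r--r-- nagios nagios
```

Run log_dir_writable as the user that starts centstorage; running it as root hides exactly the permission problem found in the conversation.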

centstorage.log errors


uninitialized value ...

Use of uninitialized value in multiplication (*) at /usr/local/centreon/bin/centstorage line 506
(14:54:33) nfilus: the problem is : $interval = getServiceCheckIntervalWithSVCid($index) * getIntervalLenght($con_oreon);
(14:55:27) nfilus: either the global interval (Configuration -> Nagios -> nagios.cfg -> Tuning : Timing Interval) 
           is not defined in config, or there is no check interval for some services
(14:56:56) iLLiZT: Hmm, there might not be a check interval defined for a couple of services, but shouldn't they use some kind of default then?
(14:58:40) nfilus: no
(14:58:59) iLLiZT: Ok, so I have to define the normal check interval and retry check interval for all services?
(14:59:32) nfilus: either for every service or in the used templates
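
The multiplication that fails can be illustrated with a small sketch. This is not the centstorage code; compute_interval is a hypothetical function, and the values 5 and 60 are example settings chosen for illustration.

```shell
#!/bin/sh
# Sketch only, not the centstorage code: multiply a service's check interval
# by the global interval length (in seconds) and refuse to continue if either
# value is undefined, which is the situation behind the warning above.
compute_interval() {
    check_interval=$1
    interval_length=$2
    if [ -z "$check_interval" ] || [ -z "$interval_length" ]; then
        echo "undefined check interval or interval length" >&2
        return 1
    fi
    echo $((check_interval * interval_length))
}

# A check interval of 5 with an interval length of 60 seconds:
compute_interval 5 60
# prints: 300
```

When either factor is missing there is nothing sensible to multiply, which is why the check interval has to be set on every service or in the templates it uses.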

timestamp error while updating - case A

31/1/2010 13:31:30 - ERROR while updating /var/lib/centreon/metrics/561.rrd at 1264941084 -> 31 : illegal attempt to update using time 1264941084 when last update time is 1264941084 (minimum one second step)
In this case, where all timestamps are the same (1264941084), the cause was the check_smart service with a very old smartctl, which produced malformed perfdata by repeating a metric twice (... temp=55234323 temp=34 ...). To find out which service a metric id (here: 561) belongs to, query MySQL:
mysql> select host_name, service_description from metrics, index_data where index_id = id and metric_id = 561;
Afterwards, run the check command for that service_description on that host_name from the command line to see the unparsed performance data output.
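
A repeated metric can be spotted mechanically once you have the raw perfdata string. The helper below, find_duplicate_labels, is a hypothetical sketch that assumes labels are unquoted and contain no spaces; the hours=1200 metric in the example is invented.

```shell
#!/bin/sh
# Hypothetical helper: print every perfdata label that occurs more than once.
# Assumes labels are unquoted and contain no spaces.
find_duplicate_labels() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        awk -F= 'NF > 1 { if (++seen[$1] == 2) print $1 }'
}

# Example based on the malformed check_smart output described above:
find_duplicate_labels 'temp=55234323 temp=34 hours=1200'
# prints: temp
```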

timestamp error while updating - case B

31/1/2010 13:31:30 - ERROR while updating /var/lib/centreon/metrics/561.rrd at 1264941084 -> 31 : illegal attempt to update using time 1264941084 when last update time is 1564941084 (minimum one second step)
In this second case, the last timestamp in the error message (1564941084) is much greater than the first one (1264941084): it lies years in the future, in August 2019, although the log entry dates from January 2010. Check the system clock on your monitoring server. The system time may be jumping or being readjusted by NTP, /etc/adjtime, or vmware-tools.
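
To see whether an RRD file has already been updated with a timestamp ahead of the system clock, compare its last update time with the current time. The sketch below uses a hypothetical helper, timestamp_in_future; the commented usage assumes rrdtool is installed and that the RRD path matches your setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an RRD "last update" timestamp lies after
# the reference time (defaults to the current system time).
timestamp_in_future() {
    last_update=$1
    now=${2:-$(date +%s)}
    [ "$last_update" -gt "$now" ]
}

# Usage, assuming rrdtool is installed and the path matches your setup:
#   last=$(rrdtool last /var/lib/centreon/metrics/561.rrd)
#   timestamp_in_future "$last" && echo "RRD is ahead of the system clock"
```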

Can't use string (...) as a HASH ref while "strict refs"

Can't use string ("HOSTSTATE::UP") as a HASH ref while "strict refs" in use at 419
This error is common for people who migrate from pnp4nagios, or who imported their old Nagios commands into Centreon and chose to overwrite the default values. centstorage expects the performance data coming from the plugins in a well-defined format; if the format deviates, centstorage can no longer parse the values. The format is determined by the command that Nagios uses as Service Performance Data Processing Command in Configuration -> Nagios -> nagios.cfg -> Data (default: process-service-perfdata). Check the parameters of this command as defined in Configuration -> Commands -> Miscellaneous -> "command-name", which should be:
$USER1$/process-service-perfdata  "$LASTSERVICECHECK$" "$HOSTNAME$" "$SERVICEDESC$" "$LASTSERVICESTATE$" "$SERVICESTATE$" "$SERVICEPERFDATA$"
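
A command line copied from the Centreon form can be compared against this expected macro list. The following is a sketch only; perfdata_macros, has_expected_macros and EXPECTED_MACROS are hypothetical names, and the check looks only at the order of the macros, not at the rest of the command.

```shell
#!/bin/sh
# Macros expected by centstorage, in order, as listed above.
EXPECTED_MACROS='$LASTSERVICECHECK$ $HOSTNAME$ $SERVICEDESC$ $LASTSERVICESTATE$ $SERVICESTATE$ $SERVICEPERFDATA$'

# Hypothetical helper: list the Nagios macros that appear as stand-alone
# arguments in a command line, in order, separated by single spaces.
perfdata_macros() {
    printf '%s\n' "$1" | tr ' "' '\n\n' | grep '^\$[A-Z0-9]*\$$' |
        paste -s -d ' ' -
}

# Hypothetical helper: succeed if the command line passes exactly the
# expected macros in the expected order.
has_expected_macros() {
    [ "$(perfdata_macros "$1")" = "$EXPECTED_MACROS" ]
}

# Usage: paste your command line between single quotes.
#   has_expected_macros '<your command line>' || echo "unexpected macros"
```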

Customize graphs

Q: Where and how do I configure Centreon to use the performance data to create a graph?
A: Centreon uses the data as soon as it is parsed by centstorage and copied into the configured storage (RRD only, or RRD and DB). Go to Views -> Curves and define colors for your metrics (time, temperature, total, ...). In Administration -> Options -> CentStorage -> Manage you can hide performance metrics you do not need, so that they are not displayed on the graphs. For more control over the graph output, use the graph templates.

Source:
http://en.doc.centreon.com/Troubleshooting:Graphs
