ElasticSearch internals monitoring by Zabbix (v2 traps)
6 Dec 2014 · #linux #zabbix
Here is a more resource-oriented version of the ElasticSearch monitoring from the previous article, this time using zabbix traps. It also comes with a very basic template, which was much requested in the comments:
Graphs included:
- Shard and node status
- Indices task speed
- Indices task time spent
Installation:
- Save this as /etc/zabbix/zabbix_agentd.d/elasticsearch.conf:
#Key jvm.uptime_in_millis used to trigger trap sending
UserParameter=es[*],/etc/zabbix/elasticsearch.py $1
- And here is the data getter: /etc/zabbix/elasticsearch.py
- Then import the template: template_app_elasticsearch.xml
How to add a new counter:
- Browse the JSON output of your server's:
curl http://localhost:9200/_nodes/_local/stats?all=true
- Write the path to the value of interest, using a dot as separator, for example:
indices.docs.count
- Create a new counter in zabbix with:
key name = es[path.you.found]
and type = zabbix_trap
And here is the difference from the previous version:
Please note that you also need to add this path to the traps2 section of the elasticsearch.py file. Then execute elasticsearch.py without any parameters, and the zabbix_sender debug output will be written to the console. In the top section you should find your new counter key (if it isn't there, the key was not found or is empty in the JSON output), and in the bottom section the number of failed items should be zero (if it isn't, no such key is configured for this server in the zabbix web interface).
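For example, to start collecting indices.docs.count, the change in elasticsearch.py would look roughly like this (the exact dict shape is an assumption based on the comments below; check your copy of the script):

```python
# traps2 maps dot-separated stats-JSON paths to values filled in at runtime.
# Add the new path alongside the existing ones (keys shown are examples):
traps2 = {
    "jvm.uptime_in_millis": None,
    "jvm.mem.heap_used_in_bytes": None,
    "indices.docs.count": None,  # the newly added counter
}

# Resolving such a path against the stats JSON, as the script does:
def get_path(data, path):
    for part in path.split("."):
        data = data[part]
    return data

stats = {"indices": {"docs": {"count": 573692}}}
print(get_path(stats, "indices.docs.count"))  # -> 573692
```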
Comments imported from blogger:

Just the usual monitoring of a remote host by zabbix-agent. Something like this:
- install zabbix-agent on remote host
- add this host to zabbix-server, assign it to some "generic linux template" and check that basic data has started collecting (zabbix-agent must return the same "hostname" as is configured for this host on the zabbix-server side)
- then proceed with adding the elasticsearch checks - import the template from this article on the zabbix-server side, and assign it to your host
- on host side configure
/etc/zabbix/zabbix_agentd.d/elasticsearch.conf and
/etc/zabbix/elasticsearch.py
So, the template is installed only once on the server side and then assigned to the hosts that need it. But the agent-check config and the data-collection script must be installed on each elasticsearch host (each sends data about its own localhost back to the zabbix-server).

when I run script I get this error:
12539:20150107:102815.007 item "nw-esclient-201.prod.pcln.com:es[jvm.uptime_in_millis]" became not supported: Received value [File "/usr/local/zabbix/bin/elasticsearch.py", line 22 "status", ^SyntaxError: invalid syntax] is not suitable for value type [Numeric (float)]

Maybe there is some EOL issue from when you downloaded the script?
What is shown when you try:
# python elasticsearch.py
?

bash-4.1$ python elasticsearch.py
File "elasticsearch.py", line 22
"status",
^
SyntaxError: invalid syntax

Dunno, the syntax looks valid. At least for the python2 I have installed on my home box (there is no elasticsearch there):
# wget https://github.com/sepich/zabbix/raw/master/elasticsearch.py &>/dev/null
# python elasticsearch.py
Unable to load JSON data!
So, no syntax errors at all. What's your OS and python version?

Thanks for helping out, OS=Red Hat Enterprise Linux Server release 6.5
with Python 2.6.6

Hi Tom,
add a None value to all keys in the traps1 and traps2 dictionaries,
i.e. (line 22) "status": None,
Worked for me on same environment
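The underlying cause: the script's traps apparently used set literals ({"a", "b", ...}), a syntax only added in Python 2.7, so Python 2.6 stops at the first comma with "invalid syntax". A dict with None values parses on both (a sketch, keys shortened for illustration):

```python
# Set literal -- SyntaxError on Python 2.6 (uncomment to see):
# traps1 = {"status", "unassigned_shards", "active_shards"}

# Dict with None values -- valid on Python 2.6 and later:
traps1 = {
    "status": None,
    "unassigned_shards": None,
    "active_shards": None,
}
print(sorted(traps1))  # iterating a dict yields its keys, as before
```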

Thanks Federico, that helped:
bash-4.1$ ./elasticsearch.py jvm.uptime_in_millis
10116541216
But this one errors:
bash-4.1$ ./elasticsearch.py jvm_heap_p_used
zabbix_sender [18783]: Warning: [line 1] '-' encountered as 'Hostname', but no default hostname was specified
Sending failed.

Hi,
Can you help me:
Traceback (most recent call last):
File "./elasticsearch.py", line 70, in
main()
File "./elasticsearch.py", line 64, in main
stats=stats[c.pop(0)]
UnboundLocalError: local variable 'stats' referenced before assignment

There is no such code (stats=stats[c.pop(0)]) anywhere in the file:
https://github.com/sepich/zabbix/blob/master/elasticsearch.py
Maybe you are trying with some older version?
Could you try with the latest?

Hi There,
This is a great solution, thank you for sharing. I have a small problem with it: the elasticsearch.py script runs OK on my other nodes, but on my master node it returns the "Unable to load JSON data!" error. I don't know python, so I cannot really figure out what the code does, but I assume something fails here:
for node_id in all['nodes']:
    if all['nodes'][node_id]['host'].startswith(os.uname()[1]):
        node = all['nodes'][node_id]
        if len(sys.argv) == 1:
            print "node found"
except:
    print "Unable to load JSON data!"
    sys.exit(1)
Any ideas?

Looks like the cluster node name differs from the hostname on that server. Tell me what they are and I'll think about how the script should be modified to handle this.

Hmm, that's strange :) I have the same naming convention on all nodes.
hostnames: eslog001.abc.local to eslog005.abc.local (first one is the master and the problem is there)
Node names: ES_ONE, ES_TWO, ES_THREE, ES_FOUR, ES_FIVE

In this case it should not work on the other nodes either ;)
if all['nodes'][node_id]['host'].startswith(os.uname()[1]):
Basically, what this code does is search all cluster nodes for the one whose name starts with the current hostname. So, for eslog001.abc.local the hostname would be eslog001.
If your cluster node names are not equal to the hostnames, you can just hardcode it in the script like this:
if all['nodes'][node_id]['host'].startswith('ES_ONE'):

This is really strange. I created a script as you advised, with the node address and node name hardcoded. It still does not work on the problematic server (unable to load JSON), but if I run the exact same script on the others, with the problematic server's address and name hardcoded, it works :)

Actually, python scripts are not so hard to debug.
Just open a python prompt:
# python
Then copy-paste everything from the script that is above the
def main():
line (actually only the imports and getKeys are needed here). Then copy-paste the beginning of main():
f = requests.get("http://localhost:9200/_cluster/health")
health = f.json()
f = requests.get("http://localhost:9200/_nodes/_local/stats?all=true")
all = f.json()
Now we are coming to the problem place. Let's check which nodes exist in the stats report:
>>> for node_id in all['nodes']: print all['nodes'][node_id]['host']
all node names should be printed out. And we are looking here for the current server, with this name:
>>> print os.uname()[1]
Do you see a node in the previous output whose name starts with exactly the same characters?
Oh, I remember now that the error was "Unable to load JSON data!", so it is an exception during one of those commands. When you copy-pasted the lines from the script, where did you see errors?

Hi
I have tried using this script but keep getting this error when I run it:
File "/etc/zabbix/elasticsearch.py", line 22
"status",
^
SyntaxError: invalid syntax

Hi all,
If you receive the error "Unable to load JSON data!", try running the command:
curl -XGET 'http://localhost:9200/_cluster/health'
In my case, the elasticsearch process was not listening on localhost (127.0.0.1) and I got the error:
curl: (7) couldn't connect to host
Then running "sudo netstat -ntlp" revealed the IP address and port elasticsearch is listening on, and replacing "localhost" with this IP address in the script solved the problem.
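One way to avoid hardcoding the address is to make it overridable instead of editing the script each time. A sketch (ES_HOST is a name I made up; the original script hardcodes "localhost:9200"):

```python
import os

# Hypothetical: let an environment variable override the default host:port
ES_HOST = os.environ.get("ES_HOST", "localhost:9200")
HEALTH_URL = "http://{0}/_cluster/health".format(ES_HOST)
STATS_URL = "http://{0}/_nodes/_local/stats?all=true".format(ES_HOST)
print(HEALTH_URL)
```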

zabbix_sender [6743]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 33; total: 33; seconds spent: 0.000243"}]
info from server: "processed: 0; failed: 33; total: 33; seconds spent: 0.000243"
sent: 33; skipped: 0; total: 33

Hello.
I performed the process but do not have the /etc/zabbix/zabbix_agentd.d directory, just /etc/zabbix/agentd_conf.d.
I use version 2.4 of zabbix, but the monitoring is not bringing any results.
Can you help me?

Sure, you need to use the '/etc/zabbix/agentd_conf.d' folder if that is what the 'Include' statement in your /etc/zabbix/zabbix_agentd.conf points to

Thanks for your fast reply.
Look at this:
/etc/zabbix/zabbix_agentd.conf is all right.
### Option: Include
# You may include individual files or all files in a directory in the configuration file.
# Installing Zabbix will create include directory in /etc/zabbix, unless modified during the compile time.
#
# Mandatory: no
# Default:
# Include=
# Include=/etc/zabbix/zabbix_agentd.userparams.conf
# Include=/etc/zabbix/zabbix_agentd.conf.d/
Include=/etc/zabbix/zabbix_agentd.conf.d/
I also changed the /etc/zabbix/elasticsearch.py
cfg = '/etc/zabbix/zabbix_agentd.conf.d' # path to zabbix-agent config
Nothing occurs

Did you try to run:
# /etc/zabbix/elasticsearch.py
And did you try to run:
# zabbix_agentd -k 'es[indices.docs.count]'
What are the results?

One bug: it doesn't find the node if the case differs between the hostname and the node name; inserting a couple of .lower() calls into the comparison fixes that.
I also think it's a good idea to modify line 72 to be:
out += "{0} es.{1} {2}\n".format(os.uname()[1],t,s)
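The case-insensitive fix mentioned above could look like this (a sketch; the helper name is mine, the startswith() comparison mirrors the one quoted from the script earlier in this thread):

```python
import os

def node_matches(node_host, hostname=None):
    # case-insensitive version of the script's comparison:
    #   all['nodes'][node_id]['host'].startswith(os.uname()[1])
    if hostname is None:
        hostname = os.uname()[1]
    return node_host.lower().startswith(hostname.lower())

print(node_matches("ESLOG001.abc.local", "eslog001"))  # -> True
```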

Running /etc/zabbix/elasticsearch.py, nothing occurs.
The other command returns this result:
aryel@backend-elasticsearch-dev-data-igm-pri-instance-l7d5:/usr/sbin$ ./zabbix_agentd -t 'es[indices.docs.count]'
es[indices.docs.count] [m|ZBX_NOTSUPPORTED]

Retrying the command:
- es.status 503
- es.indices.search.fetch_total 0
- es.indices.merges.total 11950
- es.indices.refresh.total_time_in_millis 1394850
- es.indices.indexing.index_time_in_millis 372322
- es.os.mem.actual_used_in_bytes 3064516608
- es.indices.merges.total_time_in_millis 3231557
- es.indices.flush.total 253
- es.indices.indexing.index_total 328279
- es.indices.indexing.delete_total 0
- es.indices.docs.deleted 0
- es.jvm.mem.heap_committed_in_bytes 3203792896
- es.indices.indexing.delete_time_in_millis 0
- es.indices.get.missing_total 0
- es.indices.search.query_time_in_millis 0
- es.indices.get.exists_time_in_millis 0
- es.indices.search.fetch_time_in_millis 0
- es.indices.search.query_total 0
- es.indices.docs.count 573692
- es.jvm.mem.heap_used_in_bytes 756227784
- es.indices.refresh.total 108393
- es.indices.store.throttle_time_in_millis 25133
- es.indices.warmer.total_time_in_millis 1250
- es.indices.get.exists_total 0
- es.indices.get.missing_time_in_millis 0
- es.indices.flush.total_time_in_millis 15132
- es.os.mem.actual_free_in_bytes 4799635456
- es.indices.warmer.total 5491

ZBX_NOTSUPPORTED means the zabbix agent does not know what the 'es[]' item key means. The agent should be restarted after you put the config file in place.
- es.indices.merges.total 11950 ... - means the data is being collected ok.

I reinstalled the agent, but something is still wrong.
Could you help me?
aryel@backend-elasticsearch-dev-data-igm-pri-instance-l7d5:/etc/zabbix$ ./elasticsearch.py
node found
- es.status yellow
- es.unassigned_shards 0
- es.active_primary_shards 10
- es.initializing_shards 1
- es.active_shards 19
- es.relocating_shards 0
- es.indices.search.fetch_total 0
- es.indices.merges.total 13889
- es.indices.refresh.total_time_in_millis 1618108
- es.indices.indexing.index_time_in_millis 441191
- es.os.mem.actual_used_in_bytes 3084935168
- es.indices.merges.total_time_in_millis 3596862
- es.indices.flush.total 311
- es.indices.indexing.index_total 395003
- es.indices.indexing.delete_total 0
- es.indices.docs.deleted 0
- es.jvm.mem.heap_committed_in_bytes 3203792896
- es.indices.indexing.delete_time_in_millis 0
- es.indices.get.missing_total 0
- es.indices.search.query_time_in_millis 0
- es.indices.get.exists_time_in_millis 0
- es.indices.search.fetch_time_in_millis 0
- es.indices.search.query_total 0
- es.indices.docs.count 401643
- es.jvm.mem.heap_used_in_bytes 1063793304
- es.indices.refresh.total 124231
- es.indices.store.throttle_time_in_millis 94
- es.indices.warmer.total_time_in_millis 2
- es.indices.get.exists_total 0
- es.indices.get.missing_time_in_millis 0
- es.indices.flush.total_time_in_millis 17301
- es.os.mem.actual_free_in_bytes 4779216896
- es.indices.warmer.total 19
zabbix_sender [2106]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 33; total: 33; seconds spent: 0.000331"}]
info from server: "processed: 0; failed: 33; total: 33; seconds spent: 0.000331"
sent: 33; skipped: 0; total: 33

Hi there, I had elasticsearch 1.5 and this script was working like a charm, but after upgrading to elasticsearch 2.0 it stopped working and gives this error:
Traceback (most recent call last):
File "/etc/zabbix/elasticsearch.py", line 117, in
main()
File "/etc/zabbix/elasticsearch.py", line 94, in main
out += getKeys(node,traps2) #getting stats values
UnboundLocalError: local variable 'node' referenced before assignment
I don't see any difference in the output of http://localhost:9200/_nodes/_local/stats?all=true
Any idea?

I've found the issue. Previously I was using hostnames; now I am using IP addresses. I forgot to change this in the script. :)

Having a problem.
If I run the script directly as root or zabbix (with a shell):
./elasticsearch.py `hostname`
: No such file or directory
or ./elasticsearch.py
: No such file or directory
If I run it as "python ./elasticsearch.py" it produces the output fine.
Suggestions?

If I run it manually just once, the graphs build without issue, so I added a cronjob to give it a kick.
I still don't understand exactly why this is.
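The bare ": No such file or directory" (nothing printed before the colon) is a classic symptom of a CRLF line ending after the shebang - the same EOL issue guessed at earlier in this thread. A way to check and fix, using demo.py as a stand-in for the downloaded script:

```shell
# reproduce the symptom: a script whose shebang line ends in CRLF
printf '#!/usr/bin/env python\r\nprint "ok"\r\n' > demo.py
head -1 demo.py | od -c | head -1   # a \r before the \n reveals the problem
sed -i 's/\r$//' demo.py            # strip the carriage returns in place
```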

Please consider Elasticsearch 2.x compatibility patch:
https://github.com/islepnev/zabbix/commit/f413717ce5c3a4b9ead0c8f417a2ba2a536b78d9

Dear Sepa, please explain how you make your script send traps periodically. It works fine once, but then nothing happens. Do you use cron to schedule it?

There is one 'active check' which triggers trap sending whenever it is requested by the agent. There is even a note about it in elasticsearch.conf:
#Key jvm.uptime_in_millis used to trigger trap sending
So you can tune the time period of this active check to also change how often the traps are sent.
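In other words, the script piggybacks the traps on one ordinary item poll. A minimal sketch of that pattern (function names and values are made up; the real script reads the metrics from the stats JSON and pipes the lines to zabbix_sender):

```python
import sys

def collect():
    # hypothetical stand-in for the real JSON-scraping code
    return {"jvm.uptime_in_millis": 8242898, "indices.docs.count": 573692}

def sender_lines(stats, host="-"):
    # zabbix_sender stdin format: "<host> <key> <value>" per line;
    # "-" tells zabbix_sender to take the hostname from its config file
    return "".join("{0} es.{1} {2}\n".format(host, k, v)
                   for k, v in sorted(stats.items()))

stats = collect()
if len(sys.argv) > 1:
    # passive check: the agent polls es[<key>] and gets one value back...
    print(stats.get(sys.argv[1], ""))
# ...and answering that poll is what triggers pushing everything else
# (the real script pipes this text to: zabbix_sender -c <agent conf> -i -)
print(sender_lines(stats), end="")
```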

Sorry, I didn't get it.
I put
sudo zabbix_agentd -t 'es[jvm.uptime_in_millis]'
and get
elasticsearch% sudo zabbix_agentd -t 'es[jvm.uptime_in_millis]'
es[jvm.uptime_in_millis] [t|8242898]
and that's all; zabbix_agentd doesn't start with this parameter

Well, finally I get it. I added the jvm.uptime_in_millis param to the trap list that is sent to the server in the .py file and set the time period for this trap, which is actually a "Zabbix agent (active)" item.

Hi, can you give me some more hints about this, please? I added this to the traps:
out += "- {0} {1}\n".format("es[jvm.uptime_in_millis]","5000")
but I still don't get how this is supposed to work.

So I have tried, let's call it option 1:
"jvm.mem.heap_used_in_bytes",
"os.mem.actual_free_in_bytes",
"os.mem.actual_used_in_bytes",
"jvm.uptime_in_millis",
and this option 2
"jvm.mem.heap_used_in_bytes",
"os.mem.actual_free_in_bytes",
"os.mem.actual_used_in_bytes",
"es[jvm.uptime_in_millis]",
I changed the key on the Zabbix server to match both of the above.
I get this if I use the first option:
zabbix_sender [21571]: DEBUG: answer [{"response":"success","info":"processed: 33; failed: 1; total: 34; seconds spent: 0.000328"}]
info from server: "processed: 33; failed: 1; total: 34; seconds spent: 0.000328"
sent: 34; skipped: 0; total: 34
I get this result with option 2:
zabbix_sender [19248]: DEBUG: answer [{"response":"success","info":"processed: 33; failed: 0; total: 33; seconds spent: 0.000305"}]
info from server: "processed: 33; failed: 0; total: 33; seconds spent: 0.000305"
sent: 33; skipped: 0; total: 33
For the life of me I can't get it to update the trap stats automatically; it does a great job if I run the script manually.
Is there a bug somewhere?

Hi,
I am struggling with elasticsearch.py.
Everything has worked well for months with RHEL 6.4, zabbix 2.4 client, Java 1.7 and Elasticsearch 1.2.4.
Now we upgraded and have RHEL 6.7, zabbix client 3.0.4, Java 1.8 and ES 2.1.2.
In this new environment elasticsearch.py does not return any info when started from the command line. It does in the old environment. Do you have any idea what I can do to get this working?

Hi, can you please add the steps to monitor another host instead of localhost?
If I want to monitor a host like 10.0.1.10 from my zabbix server 10.0.1.5, what steps do I need to perform on the remote host and on the zabbix server?