Roomba Python API with checkmk
I wanted to know what my Roomba was up to without going through official channels.
This is a part two in the series of setting up checkmk monitoring for all the IoT devices on my network.
- Creating Home IT Monitoring with checkmk
- Roomba Python API with checkmk
- Reolink Python API with checkmk
- Creating a Library for Google Nest API Monitoring
Now that I had checkmk setup, I would have to answer two questions:
- How can I create scripts to get the status of a Roomba on my LAN?
- How can I add this script to my checkmk site?
Monitoring a Roomba
It’s sometimes fun to walk through the unnecessarily convoluted path you take to find the answer to something online.
When I initially wanted to look at what tools were available for Roomba hacking, I started by looking at Hackaday. Their posts are well tagged, and had a few articles for Roombas https://hackaday.com/tag/roomba/. The one that caught my eye was a project where someone turned the mapping output of the Roomba into levels for the original Doom video game http://www.richwhitehouse.com/index.php?postid=72.
Looking at the code for this project it had some Python scripts to interact with the Roomba API, but they were pretty tightly coupled the GUI logic for this particular tool. However, the comments mentioned the logic coming from this GitHub project https://github.com/koalazak/dorita980.
There’s 3 interfaces implemented for that script:
- The roomba will respond to a UDP broadcast identifying itself on the network giving you the IP and username.
- After holding down the “dock” button on the roomba it enters a sync mode. In this mode you can connect over TCP to request the password.
- You can connect to a MQTT broker on the Roomba with the credentials gathered above. When you first connect the Roomba will give a dump of its state.
My Roomba is a more basic model that doesn’t do the localization used in the Doomba project, but it gives some values that are worth tracking like it’s WiFi signal strength and its battery percentage.
Creating a Custom checkmk Check
This took a lot more trial and error then it should have for two main reasons. First a lot of the documentation is not up to date with the 2.0 release, so there was a lot of slightly wrong information out there. This is compounded since the only documentation I found that covered the exact thing I was trying to do was even more out of date.
The other issue is that checkmk is extremely customizable and extensible, so it took awhile to figure out how to bolt on the functionality I was trying to achieve.
The checkmk documentation on plugin development describes four main ways to create a service:
- Localcheck - A light way way to extend an agent running on one of the monitored hosts
- Nagios-compatible check plug-in - A even more restricted way to run a script on a agent or the checkmk server. Let’s you include plugins developed for Nagios
- Log Message Parsing - A limited way to generate metrics from logs
- Genuine Checkmk plug-in - A more involved script that can be seemlessly set up and configured through the GUI
This article also describes how there are four kinds of agents that can invoke a check:
- Checkmk Agent - The “normal” agents installed on a PC like system by the install scripts
- Special agent - An agent that runs directly on the checkmk server and remotely queries the host without running on it (using HTTP or another protocol).
- SNMP - Monitors an SNMP compatible host
- Active Check - Similar to special agent, but perhaps limited to Nagios compatible plugins.
This gave a lot of combinations for how I might write a plugin.
My initial thought was that the
Special agent seemed like the best fit for my use case, but there didn’t seem to be any documentation on how to write a plugin that would run like this. I guess I could look at the source of the existing special agents, but I kept looking for a simpler solution.
My next thought was to look at writing an Active check since it’s the other type of check that doesn’t need to run on the host its monitoring. However, I found that nagios plugins are extremely limited. You can only report an enum of whether the service was OK and weren’t able to plot values over time.
Finally, I read over the page on writing Local checks. These don’t really make sense from an organizational perspective because they run on a host you installed an agent onto. However, they give a very simple, very flexible way to pipe values and thresholds into checkmk.
Basically, you can make the agent run an arbitrary script as part of it’s update. The script returns a string over stdout that is interpreted as a series of metrics and warn/error thresholds.
There are a few limitations:
- The script runs on, and will be tracked as the host with the agent installed. This is fine for me.
- You can’t send configuration to the script from the GUI. I got around this by having the script itself contain the configuration.
- The script runs at the same update rate as the rest of the services on the agent. I was able to get around this by having the results be cached. This was an issue since I wanted to decrease how often the Roomba was woken from sleep. The Local checks documentation explains how to do this by putting the script in a folder named with the cache period.
Despite these limitations I was able to achieve my goal of being able to monitor if my Roomba was at a critical battery level, or having WiFi issues:
See Reolink Python API with checkmk for the next device I added.