All posts by Neil

NumerousApp libraries updated

Newest versions of my NumerousApp API bindings for python and ruby are released. Now includes support for the new fine-grained permissions server APIs.

PYTHON details:

RUBY details:

NumerousApp overview: www.numerousapp.com

API: http://docs.numerous.apiary.io/

Friday the 13th and NumerousApp

I made a new NumerousApp metric that will tell you when the next Friday the 13th is: http://n.numerousapp.com/m/1w4c4tk4uh29j

Nerd info: I wrote this brute-force python program to find the next Friday the 13th:

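from datetime import datetime, date

# start from today; weekday() == 4 means Friday (Monday is 0)
dt = datetime.now()
while dt.day != 13 or dt.weekday() != 4:
    # not a Friday the 13th: step to the next day via the ordinal day number
    dt = date.fromordinal(dt.toordinal()+1)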

It gets today via datetime.now() and loops until it is a Friday and a 13th. The weekday() method returns 0 .. 6 with Monday as 0; hence Friday is 4. If “dt” isn’t a Friday the 13th, I just bump it to the next day. The toordinal() method converts a date into a number of days since some arbitrary long-ago time. So I do that, add one, and convert back to a date to get the next day.

I didn’t optimize this algorithm because it just doesn’t matter. I could have added code to compute exactly how many days to add to get to the next 13th and then just test if that is a Friday. Even more generally we could write an algorithm that would directly compute when the next Friday the 13th is (has to take month lengths and leap years into account).
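For the curious, here is a rough sketch of the first optimization mentioned above: hop from one 13th to the next instead of stepping a day at a time. The function name next_friday_13th is made up for this illustration; it is not the code that runs nightly.

```python
from datetime import date

def next_friday_13th(start):
    """Return the first Friday the 13th on or after `start`."""
    year, month = start.year, start.month
    if start.day > 13:
        # this month's 13th has already passed; begin with next month
        month += 1
    while True:
        if month > 12:
            # rolled past December: move to January of the next year
            year, month = year + 1, 1
        candidate = date(year, month, 13)
        if candidate.weekday() == 4:  # Monday is 0, so Friday is 4
            return candidate
        month += 1

print(next_friday_13th(date.today()))
```

Like the brute-force loop, this returns today if today happens to be a Friday the 13th.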

But: We know we’ll have to iterate no more than a few hundred times because every year has at least one Friday the 13th (proving this is left as an exercise for the reader). Speed just doesn’t matter here at all and this works and it is simple.
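If you would rather check that claim than prove it, a brute-force check is easy. This is only a sketch: the helper name friday_13ths is invented here, and the year range is my choice (the Gregorian calendar, weekdays included, repeats every 400 years, so any 400 consecutive years cover every case).

```python
from datetime import date

def friday_13ths(year):
    """All Friday-the-13th dates in the given year."""
    thirteenths = (date(year, month, 13) for month in range(1, 13))
    return [d for d in thirteenths if d.weekday() == 4]

# Assumption: 2000-2399 is used as the sample 400-year cycle.
years = range(2000, 2400)
per_year = {y: friday_13ths(y) for y in years}

# True only if no year in the cycle lacks a Friday the 13th
every_year_has_one = all(per_year[y] for y in years)

# longest run of days between consecutive Friday the 13ths
all_dates = [d for y in years for d in per_year[y]]
longest_gap = max((b - a).days for a, b in zip(all_dates, all_dates[1:]))

print(every_year_has_one, longest_gap)
```

The longest gap is also the worst-case number of iterations for the day-by-day loop.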

Once a night I run the above code and use my python NumerousApp API class to update the metric. This too could have been optimized; there’s no reason to run the code at all except on a 13th (indeed, except on a Friday the 13th) but it was just easier to stick it into the nightly updates I already do for my other metrics.

Newest versions of NumerousApp API class libraries

As I wrote earlier, I have created Python and Ruby class libraries for the NumerousApp APIs. I’ve recently made a bunch of updates; the newest versions are:

And, as always, github repos: https://github.com/outofmbufs/

Ignore / Enjoy.

Arduino Data Monitor Program

Finally got around to uploading (to github) and documenting my Arduino program for monitoring analog inputs. It’s pretty cool; it:

  • Keeps a ring-buffer of readings and allows you to see what your monitored input has been doing over time.
  • Implements a web server – you can access the readings with your web browser.
  • Has a JSON interface
  • Can be used to provide network-accessible analog input readings.
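If the ring-buffer idea in the first bullet is unfamiliar, here is a rough illustration in python. The Arduino program itself is not python, and the class name ReadingRing and its methods are invented for this sketch; see the github source for the real thing.

```python
from collections import deque

class ReadingRing:
    """Fixed-size history: once full, each new reading evicts the oldest."""

    def __init__(self, capacity):
        # deque with maxlen drops items from the left when it overflows
        self._buf = deque(maxlen=capacity)

    def record(self, timestamp, value):
        self._buf.append((timestamp, value))

    def history(self):
        """Readings from oldest to newest."""
        return list(self._buf)

ring = ReadingRing(capacity=3)
for t, v in [(1, 900), (2, 880), (3, 410), (4, 401)]:
    ring.record(t, v)
print(ring.history())  # only the three most recent readings remain
```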

Documentation: https://github.com/outofmbufs/arduino-datamon/wiki and of course source code on github too.

I’m using it with a simple circuit: a pull-up resistor (one side attached to +5V) and a photo-resistor (one side attached to the pull-up, the other to ground). This is a simple voltage divider and the voltage you will read at the midpoint will depend on how much light is hitting the sensor. I made four of these to monitor several different indicator lights on equipment in my house, as well as the basement area lights. I determined the appropriate resistor values experimentally; the behavior of the photo-resistor depends quite a bit on how much light hits it and thus how much of a difference it sees between “on” and “off”.
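To make the divider arithmetic concrete, here is a small sketch. It assumes a 10-bit analog input (readings 0 to 1023) whose reference is the same +5V that feeds the pull-up, and the resistor values below are purely hypothetical; as I said, the real ones were found by experiment.

```python
def divider_adc_reading(r_pullup, r_photo, adc_max=1023):
    """Expected reading at the midpoint of the pull-up / photo-resistor divider.

    The midpoint voltage is Vcc * r_photo / (r_pullup + r_photo); with the
    ADC referenced to Vcc, the supply voltage cancels out of the reading.
    """
    return round(adc_max * r_photo / (r_pullup + r_photo))

# Hypothetical values: 10k pull-up, photo-resistor at 5k (lit) or 50k (dark)
lit = divider_adc_reading(10_000, 5_000)
dark = divider_adc_reading(10_000, 50_000)
print(lit, dark)
```

More light means lower photo-resistor resistance and therefore a lower reading, which is why a low value means the light is on.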

I hooked four of these simple circuits up to my Arduino and I use this data monitor code to record the readings. I can also get real-time readings. For example, right now I can tell that my basement area lights are on (I know there are workers doing maintenance today) because when I surf to http://monitor/v2 I get:

pin 2 value 401 @ 454619317

(“monitor” is the hostname of my arduino on my network.) Granted, to interpret this I had to know that pin 2 (the “2” in the “v2” part of the URL) is the pin connected to the photo-resistor monitoring the basement lighting, and that values below about 800 mean the basement lights are on. Of course I’ve also written a small status program that shows me this in English. If I surf to that web page (which runs a CGI script that queries the monitor and returns human-readable results) I get:

Lights:

Basement: ON for 3.2 hours
Server Room: OFF for 5.3 days
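The interpretation step can be sketched in a few lines of python. This is not my actual status program: the function names are invented here, the line format is taken from the one example reading shown earlier, and the number after the “@” is simply carried along as an opaque timestamp.

```python
import re

# Matches lines shaped like the example: "pin 2 value 401 @ 454619317"
READING_RE = re.compile(r"pin (\d+) value (\d+) @ (\d+)")

def parse_reading(line):
    """Split a monitor reading into (pin, value, stamp) integers."""
    m = READING_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError("unrecognized reading: %r" % line)
    pin, value, stamp = (int(g) for g in m.groups())
    return pin, value, stamp

def lights_on(value, threshold=800):
    """Readings below the threshold mean the light is on (800 is approximate)."""
    return value < threshold

pin, value, stamp = parse_reading("pin 2 value 401 @ 454619317")
print("pin", pin, "ON" if lights_on(value) else "OFF")
```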

Code for the arduino monitor on github as already mentioned: https://github.com/outofmbufs/arduino-datamon/

Simple Screen Scraping

I’ve integrated a lot of data sources in my NumerousApp metrics and some of them come from screen scraping web pages. I’ve stumbled into a fairly general way to do this so I’m writing it up here.

First off, make sure the web site you are scraping from allows it. The example I’ll use here is scraping data off the LCRA “How full are our lakes” site: http://www.lcra.org/water/Pages/default.aspx

This is a public-data site and we are allowed to scrape it. I’ll show how to get that “lake level” percentage (34% at the time I’m writing this) off the site in a way that works for a lot of other sites as well.

First off we need to inspect the HTML surrounding the number we want. I use Chrome which has a very easy way to do this:

[Image: LCRA screen scrape example]

In this screen grab I have right-clicked on the highlighted “34%” and selected “Inspect Element” from the pop-up menu. Down below we see the crucial thing we are looking for: a “div” tag with a class identifier. This is the key. It can be a div tag, a span tag, or anything that has either a “class” or (even better) an “id” attribute associated with it.

Because of the tools people use to create web sites like this, and sometimes because they specifically create the site with scraping in mind, the number part of the display (the “34%” in this case) will often be contained within an HTML element with its own tag with a unique identifier. If you’ve lucked into a site that meets these criteria you can screen-scrape it very simply without parsing much of the structure of the site at all. Here’s how I do it with python using the Requests library (for http) and BeautifulSoup (for HTML parsing):

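import requests
import bs4

selectThis = "[class=lcraHowFull-Percentage]"
q = requests.get("http://www.lcra.org/water/Pages/default.aspx")
# name the parser explicitly so bs4 doesn't have to guess (and warn)
soup = bs4.BeautifulSoup(q.text, "html.parser")
items = soup.select(selectThis)

v = None
# take the first string under the selected element that parses as a number
for s in items[0].stripped_strings:
    for c in "%$,":          # strip "noise" characters that adorn numbers
        s = s.replace(c, "")
    try:
        v = float(s)
        break
    except ValueError:
        pass

print(v)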

For the “selectThis” value we can use any CSS selector syntax; in this case the LCRA conveniently tagged the item with class “lcraHowFull-Percentage”.

The code simply takes every string found under that search criteria, strips out some “noise” characters that often adorn numbers, and tries to convert each one to a floating point number.

[ as an aside, there are faster ways to strip those characters out than the loop I wrote but unfortunately the string translation functions changed from python to python3; I wrote the code this way to work unchanged under either version of python ]
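For reference, if you only care about python3, the translation-table version looks something like this (the helper name strip_noise is made up for the example; this form does not run under python 2, which is exactly why I didn’t use it):

```python
# python3 only: the third argument to str.maketrans lists characters to delete
NOISE_TABLE = str.maketrans("", "", "%$,")

def strip_noise(s):
    """Remove the "noise" characters in one pass via str.translate."""
    return s.translate(NOISE_TABLE)

print(strip_noise("$1,234.50"))
```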

The first thing that successfully converts to a number is (we hope) the data we want. Easy!

One advantage of this is that we’ve completely ignored all the formatting, layout, and structure of the web page. We found a number, wherever it happened to be, that was tagged with the identifier we were looking for. This is both the strength and weakness of this technique. It’s probably robust against future changes in the web site formatting (assuming they don’t change the identifier). But it’s also just blindly accepting that “the first number we find under that identifier, whatever it is, is the one we are looking for”. It’s up to you whether this is “good enough” as far as scraping goes.

Of course my real code contains more error checking and an argparse section (so I can supply command arguments for the URL and the selector) and so forth. But the above code works as-is as a simple example.
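To give a flavor of that, here is a sketch of how the pieces might be arranged with argparse and a little error checking. It is not my real code: the function names, the argument names, and the 30-second timeout are all choices made for this illustration.

```python
import argparse

NOISE_CHARS = "%$,"

def first_number(strings):
    """Return the first string that converts to a float, or None if none do."""
    for s in strings:
        for c in NOISE_CHARS:
            s = s.replace(c, "")
        try:
            return float(s)
        except ValueError:
            continue
    return None

def build_parser():
    parser = argparse.ArgumentParser(
        description="Scrape the first number found under a CSS selector.")
    parser.add_argument("url", help="page to fetch")
    parser.add_argument("selector", help="CSS selector for the element")
    return parser

def scrape(url, selector):
    # Third-party imports are kept local so the helpers above can be
    # used (and tested) without requests/bs4 installed.
    import requests
    import bs4
    response = requests.get(url, timeout=30)
    response.raise_for_status()            # fail loudly on HTTP errors
    soup = bs4.BeautifulSoup(response.text, "html.parser")
    items = soup.select(selector)
    if not items:                          # selector matched nothing
        return None
    return first_number(items[0].stripped_strings)

def main(argv=None):
    args = build_parser().parse_args(argv)
    value = scrape(args.url, args.selector)
    if value is None:
        raise SystemExit("no number found for selector %r" % args.selector)
    print(value)
```

Calling main() with no arguments reads the URL and selector from the command line; passing a list such as main([url, selector]) is handy for trying it out interactively.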

Sometimes you have to do more work with the “soup” and explicitly parse/traverse the tree to find the data you want. But so far I’ve found that this simple outline works in a surprisingly wide variety of cases. All you have to do is find the right CSS selector to pluck the right class or id descriptor and off you go.