andrew makes things

a blog about programming, machine learning, and whatever else I'm working on

Adding RSS Feeds to Any Site With Huginn


This is my third post about Huginn, a tool that I’ve been working on with the generous support of other open source collaborators. Huginn is a light-weight platform for building data-gathering and data-reacting tasks for everyday life. Think of it as an open source Yahoo! Pipes, IFTTT, or Zapier.

In this post I will show you how to create an RSS feed for a website that doesn’t have one, using Huginn.

Problem: Many sites don’t have RSS feeds.
Solution: Let Huginn watch the site for changes and build a feed for you.

Know When the World Changes, With Huginn


This is my second post about Huginn, a tool that I’ve been working on with the generous support of other open source collaborators. Huginn is a light-weight platform for building data-gathering and data-reacting tasks for everyday life. Think of it as an open source Yahoo! Pipes, IFTTT, or Zapier.

In this post I will show you how to set up standing alerts about the world; basically, your Huginn will be able to answer arbitrary requests like “Tell me when the date of the next Super Bowl is announced”, “Tell me when we discover gravity waves”, or “Tell me when there is a tsunami headed toward San Francisco”.

Problem: I often think of events that I’d like to be alerted to, but I frequently miss them in the news.
Solution: Let Huginn watch the news (via Twitter) and alert you when there are spikes of interest around topics that you care about.

Parsing Bash in JavaScript in Chrome With Browserify


For a side project, I wanted to be able to use js-shell-parse to parse complex Bash commands in JavaScript, in a Chrome extension. (More on this craziness in a future post!)

The js-shell-parse library is targeted at node, and it makes frequent use of require and of various npm packages. Being a node noob, I hadn’t used Browserify before, but it turned out to be exactly what I needed: a tool to bundle complex dependency chains of node packages for the browser. Here are the steps to convert js-shell-parse into a single, compiled bundle:

# Clone the repo
git clone https://github.com/grncdr/js-shell-parse.git

# Install the various npm dependencies and browserify (you may need to use sudo)
npm install -g pegjs pegjs-override-action isarray array-map browserify

# Run the included build script
node build.js > js-shell-parse.js

# Create a very simple loader script called 'loader.js' that contains one line
echo "window.jsShellParse = require('./js-shell-parse');" > loader.js

# Run browserify on the loader, which will parse the AST and bundle all dependencies
browserify loader.js -o compiled-js-shell-parse.js

Finally, you can include the output in any website!

<script src="compiled-js-shell-parse.js"></script>

and try this in the console:

var structure = jsShellParse('echo "The date is: `date`" > output');
console.log(JSON.stringify(structure));
[{"type":"command","command":{"type":"literal","value":"echo"},"args":[{"type":"concatenation","pieces":[{"type":"literal","value":"The date is: "},{"type":"command-substitution","commands":[{"type":"command","command":{"type":"literal","value":"date"},"args":[],"redirects":[],"env":{},"control":";","next":null}]}]}],"redirects":[{"type":"redirect-fd","fd":1,"op":">","filename":{"type":"literal","value":"output"}}],"env":{},"control":";","next":null}]

If you’re playing with js-shell-parse, the tests are helpful to see what kinds of shell/bash commands it can parse. (Pretty much everything!)

Never Forget Your Umbrella Again, With Huginn


Huginn is a tool that I’ve been working on, with the support of generous open source collaborators, for about a year. Huginn is a light-weight infrastructure for building data-gathering and data-reacting tasks for your everyday life. Think of it as an open source Yahoo! Pipes, IFTTT, or Zapier. It wouldn’t surprise me if, in the future, Huginn evolves into something like Google Now, but without the creepiness factor, because you control and host your own data.

I haven’t done a very good job of sharing all of the things that can be built with Huginn, but I’m resolved to start.

So, in this post, a very simple example:

Problem: I always forget to check the weather, leave my umbrella at home, and get soaked.
Solution: Let Huginn send you an email (or SMS) each morning when rain is expected in your area.

Archive a PDF of Your Posterous Blog


My wife and I had a private travel blog on Posterous. Unfortunately, Posterous was acqui-hired by Twitter and is shutting down, so I spent a few minutes figuring out how to save a PDF of the blog. Chrome and Firefox did pretty poorly at saving a decent-looking PDF of my long blog, so I installed wkhtmltopdf and made one myself. Here’s how.

Install wkhtmltopdf, then run:

wkhtmltopdf --enable-plugins --margin-bottom 0 --margin-top 0 --margin-left 0 --margin-right 0 "http://yourblog.com" archive.pdf

If your blog has a password, log in with a browser and copy the cookie. Firefox makes this easy: open the Developer Toolbar and enter cookie list.

wkhtmltopdf --cookie cookiename cookievalue ...

Finally, if you’re on a Mac and used the pre-built wkhtmltopdf disk image, you can still run it from the command line:

/path/to/wkhtmltopdf.app/Contents/MacOS/wkhtmltopdf ...

Command Line Accounting With Ledger and Reckon, an Update


I’ve been using ledger, combined with a custom Ruby gem called reckon, to balance my small business’s accounts for the last few years. The command line, Bayesian statistics, and Double Entry Accounting! What could be better? Here’s how I do it.

First, I export the year’s transaction history from Chase (in my case) and save it as a CSV file called chase-2012.csv. It looks something like this:

Type,Post Date,Description,Amount
DEBIT,12/31/2012,"ODESK***BAL-27DEC12 650-12345 CA           12/28",-123.45
DEBIT,12/24/2012,"ODESK***BAL-20DEC12 650-12345 CA           12/21",-123.45
DEBIT,12/24/2012,"GH *GITHUB.COM     FP 12345 CA        12/23",-12.00

Then, I make a new ledger file called 2012.dat and start it with:

2012/01/01 * Checking
    Assets:Bank:Checking            $10,000.00
    Equity:Opening Balances

Here, $10,000.00 is the hypothetical starting balance of my bank account on the first day of 2012. Since I’ve been using ledger for a while, this is just the balance of the account from the summary that I generated at the end of 2011.

Now, I run reckon, initially with the -p option to see its analysis of the CSV file:

> reckon -f chase-2012.csv -v -p --contains-header

What is the account name of this bank account in Ledger? |Assets:Bank:Checking| 
I didn't find a high-likelyhood money column, but I'm taking my best guess with column 4.
+------------+------------+----------------------------------------------------------+
| Date       | Amount     | Description                                              |
+------------+------------+----------------------------------------------------------+
| ...        | ...        | ...                                                      |
| 2012/12/24 | -$12.00    | DEBIT; GH *GITHUB.COM FP 12345 CA 12/23                  |
| 2012/12/24 | -$123.45   | DEBIT; ODESK***BAL-20DEC12 650-12345 CA 12/21            |
| 2012/12/31 | -$123.45   | DEBIT; ODESK***BAL-27DEC12 650-12345 CA 12/28            |
+------------+------------+----------------------------------------------------------+
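The kind of column detection reckon just performed can be approximated with a toy heuristic: score each column by how many of its rows parse as a currency amount, and pick the best scorer. This is a hypothetical Python sketch of the idea, not reckon's actual (Ruby) implementation:

```python
import csv
import io
import re

# A value "looks like money" if it is a signed decimal, optionally with
# a dollar sign and thousands separators, e.g. -123.45 or $1,200.00.
MONEY_RE = re.compile(r'^-?\$?\d+(,\d{3})*(\.\d{1,2})?$')

def guess_money_column(csv_text, has_header=True):
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header:
        rows = rows[1:]
    ncols = max(len(row) for row in rows)
    scores = []
    for col in range(ncols):
        values = [row[col].strip() for row in rows if col < len(row)]
        hits = sum(1 for v in values if MONEY_RE.match(v))
        scores.append(hits / len(values) if values else 0.0)
    return scores.index(max(scores))

sample = '''Type,Post Date,Description,Amount
DEBIT,12/31/2012,"ODESK***BAL-27DEC12 650-12345 CA           12/28",-123.45
DEBIT,12/24/2012,"ODESK***BAL-20DEC12 650-12345 CA           12/21",-123.45
DEBIT,12/24/2012,"GH *GITHUB.COM     FP 12345 CA        12/23",-12.00'''

print(guess_money_column(sample))  # prints 3: the zero-indexed "Amount" column
```

Only the Amount column matches the money pattern on every row, so it wins; dates contain slashes and descriptions contain spaces, so neither matches.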

It looks like reckon has guessed the correct columns from the CSV, so now I run it in “learning” mode. It loads in my data from 2011 and uses it to guess at labels for my 2012 data, using a simple Naive Bayes classifier.

> reckon -f chase-2012.csv -v -o 2012.dat -l 2011.dat --contains-header

...
+------------+---------+------------------------------------------------+
| 2012/12/24 | -$12.00 | DEBIT; GH *GITHUB.COM FP 12345 CA 12/23        |
+------------+---------+------------------------------------------------+
To which account did this money go? ([account]/[q]uit/[s]kip) |Expenses:Web Hosting:Github| 
+------------+----------+-------------------------------------------------+
| 2012/12/24 | -$123.45 | DEBIT; ODESK***BAL-20DEC12 650-12345 CA 12/21   |
+------------+----------+-------------------------------------------------+
To which account did this money go? ([account]/[q]uit/[s]kip) |Expenses:Programming| 
+------------+----------+-------------------------------------------------+
| 2012/12/31 | -$123.45 | DEBIT; ODESK***BAL-27DEC12 650-12345 CA 12/28   |
+------------+----------+-------------------------------------------------+
To which account did this money go? ([account]/[q]uit/[s]kip) |Expenses:Programming|

In each of these cases, the Bayesian classifier correctly guessed the appropriate label for these expenses based on last year’s data.
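The core of that kind of classifier fits in a few lines: tokenize each transaction description, count token frequencies per account from last year's labeled data, then score a new description against each account with smoothed log-probabilities. This is a toy illustration of the idea, not reckon's actual implementation:

```python
import math
import re
from collections import defaultdict

class TokenClassifier:
    """Toy naive Bayes over description tokens."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # account -> token -> n
        self.totals = defaultdict(int)                       # account -> total tokens

    @staticmethod
    def tokens(description):
        return re.findall(r'[a-z]+', description.lower())

    def train(self, description, account):
        for tok in self.tokens(description):
            self.counts[account][tok] += 1
            self.totals[account] += 1

    def classify(self, description):
        best, best_score = None, float('-inf')
        for account in self.counts:
            # log P(tokens | account) with add-one smoothing
            score = sum(
                math.log((self.counts[account][tok] + 1) / (self.totals[account] + 1))
                for tok in self.tokens(description)
            )
            if score > best_score:
                best, best_score = account, score
        return best

clf = TokenClassifier()
# Hypothetical 2011 training rows:
clf.train('GH *GITHUB.COM FP 12345 CA 11/08', 'Expenses:Web Hosting:Github')
clf.train('ODESK***BAL-17NOV11 650-12345 CA', 'Expenses:Programming')

print(clf.classify('GH *GITHUB.COM FP 12345 CA 12/23'))
# prints Expenses:Web Hosting:Github
```

Even with a single training example per account, the shared tokens ("github", "odesk") dominate the score, which is why last year's ledger is usually enough to label this year's recurring charges.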

Now, with a fully-updated 2012.dat file, I run it through ledger and get the following hypothetical results:

> ledger -f 2012.dat -s bal

    $20,000   Assets:Bank:Checking
   $-10,000   Equity:Opening Balances
    $258.90   Expenses
    $246.90     Programming
     $12.00     Web Hosting
     $12.00       Github
   $-10,258.90  Income
   $-10,258.90    Some source of income that makes this math work

I like to do all of this work inside of my Dropbox folder in case I delete or overwrite a file by mistake.

Want to do all of this yourself? Start by visiting ledger-cli.org, or by installing ledger with homebrew:

> brew install ledger

Then install reckon:

> (sudo) gem install reckon

Have fun!

Reckon is available on GitHub, and, for updates, you should follow me on Twitter.