I’ve been having fun today sitting on a heating pad for a bad lower back and learning a bit of statistics, python, clojure and Incanter by porting the examples from Toby Segaran’s “Programming Collective Intelligence” to clojure. My code is on github.
I just thought I’d share a few of my initial thoughts after today’s futzing around.
1. I like the clojure set functions. They feel much clearer to me than iterating over a set of keys to find a match. I would often write a helper function to do that, or add a comment for clarity, but with clojure it is just (intersection coll1 coll2).
2. let is a wonderful thing when you have to string a lot of formulae together (or even when it isn’t a lot yet). It would look almost like perl if I did it all inline.
3. I love, love, love the combination of a good repl with easy to use unit tests. Explore in one and codify your understanding in the other (I’ll let you guess which way around that works).
4. I got caught out by operator precedence translating from the python to clojure. Even in Java I try to make the order of operations explicit. It was easy to see what was wrong once I found it, but it took me ages to actually find the bug I had.
5. I love being able to do a sum of products by doing (sum (map * ratings1 ratings2)) or (reduce + (map * ratings1 ratings2)). You can just keep adding collections to the map and it will keep multiplying.
6. I’m having fun. I’m enjoying teasing out the python code. I like how doing the translation forces me to read the python properly and think about how I’d do it in clojure.
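To make points 1 and 5 concrete, here is a minimal sketch. It is not the code from my github repository; the film titles and scores are made up for illustration, and the names common-films and sum-of-products are hypothetical.

```clojure
(require '[clojure.set :as set])

;; Hypothetical ratings data: film title -> score.
(def ratings1 {"Alien" 4.0 "Brazil" 3.0 "Casablanca" 5.0})
(def ratings2 {"Alien" 2.0 "Casablanca" 4.0 "Duck Soup" 1.0})

;; Point 1: the films both people rated, without iterating over keys by hand.
(def common-films
  (set/intersection (set (keys ratings1)) (set (keys ratings2))))

;; Point 5: sum of products over the shared films.
;; Both score sequences are read off the same ordered list of films,
;; so the scores line up pairwise when map multiplies them.
(def sum-of-products
  (let [films (sort common-films)]
    (reduce + (map * (map ratings1 films) (map ratings2 films)))))
```

Here a map is used as a function of its keys, so (map ratings1 films) looks up each film's score in turn.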
Good work Bruce. Nice idea about porting the examples to Clojure. You’ve reminded me I need to write up my experiences of Clojure so far.
Have you seen http://i-need-closures.blogspot.com/ ? Somebody did some conversion work in common lisp a while ago. Maybe it’ll be interesting to take some cribs from there? Although, IIRC he worked in a very imperative style.
That looks interesting and reminds me of some of the syntactic sugar I like in clojure (especially how he puts the data together). Having clojure.set makes things really clear for me too (though I understand that it makes it clear for *me* and not the world).
Our Pearson Correlations look very similar. I’m glad I wasn’t the only one who thought it needed a gigantic let. His Pearson also has a divide by 0 error, which I guard against by testing whether there are any common films. I might still steal some ideas though. Thanks for the pointer.
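For anyone curious what the gigantic let looks like, here is a rough sketch of a Pearson correlation with the no-common-films check. It is an illustration rather than the exact code in either port: the function name pearson and the local names are hypothetical, and the extra guard for a zero denominator (when one person's scores are all the same) is my assumption, not something discussed above.

```clojure
(require '[clojure.set :as set])

(defn pearson
  "Pearson correlation between two maps of film -> score.
  Returns 0 when there are no common films, or when the
  denominator would be 0 (an assumed extra guard)."
  [ratings1 ratings2]
  (let [films (set/intersection (set (keys ratings1)) (set (keys ratings2)))
        n     (count films)]
    (if (zero? n)
      0
      (let [xs     (map ratings1 films)
            ys     (map ratings2 films)
            sum-x  (reduce + xs)
            sum-y  (reduce + ys)
            sum-xx (reduce + (map * xs xs))
            sum-yy (reduce + (map * ys ys))
            sum-xy (reduce + (map * xs ys))
            ;; Infix equivalent: sum-xy - (sum-x * sum-y / n).
            ;; Prefix notation forces the order of operations to be explicit.
            numerator   (- sum-xy (/ (* sum-x sum-y) n))
            denominator (Math/sqrt (* (- sum-xx (/ (* sum-x sum-x) n))
                                      (- sum-yy (/ (* sum-y sum-y) n))))]
        (if (zero? denominator)
          0
          (/ numerator denominator))))))
```

Calling (pearson {"a" 1 "b" 2 "c" 3} {"a" 2 "b" 4 "c" 6}) gives 1.0, since the second set of scores is exactly double the first.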