I made a tiny Mac app that checks GitHub Status. It lives in your menu bar as a status item, goes orange for a minor outage and red for a major outage, and stays deliciously black if GitHub is all systems go. Clicking it shows you the current status message, if any. You can get it here.
We are the music makers, and we are the dreamers of dreams. Since the dawn of consciousness, humanity has strived to reach those faint glimmers of light in the night sky, to satisfy our curiosity, to know what we can find in the great encompassing dark. Few are the men and women who have taken the reins and dared to go to those faraway places that we can only dream of.
History is largely told as a chronicle of great people doing great things, but for most of us life is not made up of big moments; it's made of small ones. When Neil Armstrong uttered those all-too-famous words and took his giant leap, we all took part in a big moment. We all felt pride in what we had achieved, finally dipping our toes into the ocean of the universe.
It was Neil deGrasse Tyson who said that “we stopped dreaming”, that the end of the space exploration programme has taken something away from us. With the passing of the first man on the moon, I hope that we can return to dreaming, and maybe look toward tomorrow.
So today marks the last day of my internship at Opposable Games, where I've spent the summer hacking on their iOS-based games. I've learnt an awful lot about developing for iOS and had a really great time doing it.
I come from a mostly web-based development background, spending most of my time working in scripting languages, with databases, etcetera. It's been refreshing to switch to high-performance, low-level programming, although a certain amount of pain has been involved.
I've been working with the Cocos2d library, which is very good for rapid game development. However, as newer versions of Xcode were released, a small number of bugs seemed to creep into the project. Given that little else was changing, my guess is that the library hasn't been updated to work with the newer compilers, or something of that ilk. Either way, I'd very much recommend that anyone who wants to do iOS game development take a look at Cocos.
Speaking of Xcode, I'm not a fan. It's crashed on me many times, run slowly and forced me to reboot both my Mac and my iOS devices to get it working. It's a shame that it's the only development platform for iOS, because as an IDE I'm not impressed.
Another problem with working in a native programming language (Objective-C) is that operations are memory-unsafe and there's no garbage collector. As great as that is from a performance standpoint, when you're used to programming with the “safeties” on, it can be a weird shift to stop using them. Manual memory management is always going to be more hit and miss than having a garbage collector, but for the most part I think we've done a pretty OK job in this department.
One of the games we're working on requires a more or less completely real-time sound engine. I was tasked with building the prototype that plays the sounds in (as close as possible to) perfect sync. This led to me learning about the lowest level of Core Audio, which I would describe as being like a jet fighter: it's insanely powerful, but get anything wrong and you're going to crash and burn horribly (in most cases getting an earful of audio noise for your trouble).
If you're a programmer interested in getting into the games industry, I'd very much advise you to do a short internship with a company like Opposable, where you can learn what it's like to build the sorts of technologies required for modern video games. It's nothing like any other type of system you're ever going to build, you'll learn loads, and you'll probably have a really great time doing it.
For my part, I've had a great time this summer and am now looking forward to a well-earned break before heading back to university for my final year.
Last weekend I participated in the Data Science London group’s hackathon. The challenge was to take some data provided by EMI and use it to build a recommender system that could predict how much a user would like a track based on previous ratings, demographic data and some interview responses.
When I arrived at the event I grouped up with some guys from a company called Musicmetric. The team eventually split into two groups: a guy called Ben and I worked on the recommender system problem, while the rest of the Musicmetric team started building visualisations of the data.
The hackathon officially started at 1pm London time on Saturday and ran until 1pm the next day. I was one of a small group of people who survived the entire 24 hours; most participants went home late on Saturday evening or early on Sunday morning. Food was provided, which was excellent and allowed us to focus entirely on the problem. As a tea drinker I was slightly disappointed by the quality of the tea, but everything else was really good.
The hackathon took place at The HUB Westminster, which is a really nice workspace. It is light and airy, and there were even some rooms left intentionally dark for crashing in (I slept on a beanbag for about 2 hours; if you go to a future hackathon, I'd recommend taking a thermarest or camping mattress).
The problem was hosted on the Kaggle platform, which provides training and test data, evaluates your predictions for the test data behind the scenes and gives you back a score. You can see the scores of all the other participants, and within seconds of the competition starting someone had posted a very good solution. This was probably because the data set had been released before the competition started, so someone could train a really strong model ahead of time, test it in cross-validation, and then run it against the test data and submit. The evaluation criterion for the problem was RMSE (root mean squared error), which means we had to focus on minimising the overall distance between our predictions and the correct answers, as opposed to the number of instances we got exactly right.
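As a rough illustration of why RMSE rewards being close rather than being exactly right, here is a minimal Python sketch of the metric itself (the function name `rmse` is ours, not Kaggle's):

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Being 1 point off on every rating costs very little...
print(rmse([69, 49], [70, 50]))  # 1.0
# ...whereas one wild miss dominates the score.
print(rmse([70, 90], [70, 50]))  # ~28.28
```

An accuracy metric would score the first pair as zero correct; RMSE treats it as nearly perfect.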
Our first solution was to apply simple collaborative filtering. This seemed like an obvious approach because we were trying to build a recommendation system given a bunch of input (user, item, rating) triples and a bunch of (user, item) pairs to predict. The RMSE of this approach in cross-validation was about 22 (on ratings out of 100), with a result of roughly 18 on the actual test set.
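The post doesn't say which flavour of collaborative filtering we used, so the following is only a sketch of one common variant, user-based filtering with cosine similarity, assuming the ratings arrive as (user, item, rating) triples. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def index_by_user(triples):
    """Map each user to a {item: rating} dict built from (user, item, rating) triples."""
    by_user = defaultdict(dict)
    for user, item, rating in triples:
        by_user[user][item] = rating
    return by_user


def cosine(a, b):
    """Cosine similarity of two users over the items both have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(by_user, user, item, fallback):
    """Similarity-weighted average of other users' ratings for `item`."""
    mine = by_user.get(user, {})
    weighted_sum = total_weight = 0.0
    for other, ratings in by_user.items():
        if other == user or item not in ratings:
            continue
        weight = cosine(mine, ratings)
        weighted_sum += weight * ratings[item]
        total_weight += abs(weight)
    # Nobody comparable has rated the item: fall back (e.g. to the global mean).
    return weighted_sum / total_weight if total_weight else fallback


triples = [("a", "x", 80), ("a", "y", 20),
           ("b", "x", 80), ("b", "y", 20), ("b", "z", 90),
           ("c", "x", 10), ("c", "y", 90), ("c", "z", 10)]
by_user = index_by_user(triples)
print(predict(by_user, "a", "z", fallback=50.0))  # pulled towards b's 90
```

With raw 0 to 100 ratings, cosine similarity is never negative, so mean-centring each user's ratings first (Pearson correlation) is a common refinement.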
We were given a lot of demographic information for each user, so it seemed to make sense to break our approach down by demographic bins. However, none of the combinations of demographic information we tried yielded any gain in cross-validation or against the actual test data.
After racking our brains for a while, we came up with the idea of using a random forest, an ensemble method: shove all the demographic, interview response and other data in and let the forest predict the ratings in a brute-force manner. We implemented this with roughly 2 hours to go until the end of the competition. Knowing we did not have long to run our solutions, we started with a very rough-and-ready version and jumped several places in the rankings. Excited, we started running a number of random forests with different parameters to find which parameter gave us the biggest jump. After determining that tree height was going to give the best results, we set two forests running with different tree heights, one on each of our laptops.
They both finished and we submitted them with a minute and 20 seconds to go. We jumped all the way to third place, which was really exciting. The winner used exactly the same approach as we did but had been running it since the start of the competition, which suggests that we might well have been able to win with more time to fiddle with the parameters.
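We tuned the forests by hand, but the same search can be written down as a small cross-validated sweep. This is a sketch under assumptions: `train_and_score` is a hypothetical callback you supply (for instance, one that fits scikit-learn's `RandomForestRegressor(max_depth=depth)` on the training rows and returns the RMSE on the held-out rows), and the depths to try are arbitrary.

```python
import random


def k_fold(n_rows, k, seed=0):
    """Shuffle row indices once and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def sweep(rows, depths, train_and_score, k=5):
    """Return (best_depth, mean_score_per_depth); lower scores (e.g. RMSE) are better."""
    folds = k_fold(len(rows), k)
    mean_scores = {}
    for depth in depths:
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            train = [row for i, row in enumerate(rows) if i not in held]
            test = [rows[i] for i in held_out]
            fold_scores.append(train_and_score(train, test, depth))
        mean_scores[depth] = sum(fold_scores) / len(fold_scores)
    best = min(mean_scores, key=mean_scores.get)
    return best, mean_scores
```

Each candidate depth costs k full training runs, which is why a sweep like this is hard to fit into the last two hours of a competition.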
Thoughts about the data
The data EMI provided contained a lot of information. We found, however, that the demographic information did not improve our accuracy at all. There are a couple of conclusions we could draw from this. The first is that music taste is not affected by age, gender, region or any of the other information we were provided with. I'm not sure I believe that 94-year-old males have the same listening tastes as 16-year-old females, so I'm going to reject this conclusion.
The more likely conclusion is that there wasn't enough data for the demographic information to help. Every time you split by demographic, you reduce the size of your training and validation sets. This reduces the accuracy of the individual models, and therefore the accuracy of the overall model built from all the bins. Given a couple of orders of magnitude more data, we might well have been able to produce an accurate model based on demographic information.
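To make the shrinking-data argument concrete, here is a small sketch that counts how many rows land in each bin as you cross more demographic fields. The field names and the toy data are hypothetical, not EMI's actual schema.

```python
from collections import Counter


def bin_sizes(rows, fields):
    """Count rows in each bin formed by crossing the given demographic fields."""
    return Counter(tuple(row[field] for field in fields) for row in rows)


def mean_bin_size(rows, fields):
    """Average number of training rows available to each per-bin model."""
    return len(rows) / len(bin_sizes(rows, fields))


# 60 toy rows: 2 genders x 3 regions, 10 rows each
rows = [{"gender": g, "region": r}
        for g in ("M", "F")
        for r in ("North", "South", "London")
        for _ in range(10)]
print(mean_bin_size(rows, ["gender"]))            # 30.0
print(mean_bin_size(rows, ["gender", "region"]))  # 10.0
```

Each extra field divides the data again, so after crossing a handful of fields most bins are left with only a few examples to learn from.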
I had a great time at the Data Science Hackathon and would very much like to participate in another one. There were prizes, free t-shirts, good free food and really excellent people who understand a lot more about machine learning and data mining than I do. I'm really, really glad that I went. I'd like to give a special shout-out to Ben for being an awesome teammate, Greg for being supportive overnight when I began to burn out, and Carlos for running things and being a generally awesome dude.
TL;DR: Opposable Games good, Sam learns things, program games for iOS not Android
So this week I started my summer internship at Opposable Games, a small independent game development team in Bristol. I'm really enjoying working with the team, spending 3 days a week in the office with the other programmers and game designers. The week started with a meeting where we discussed progress on current projects and decided what should be done this week. Whilst I didn't have much to contribute to that first meeting, it was good to see exactly what the rest of the team was working on and to get to know the people I'm working with.
After the meeting I dove head first into building an iOS game from the ground up. Ben suggested I use the Cocos2d framework. Having never coded for iOS before, I made quite slow progress at the beginning of the week, focusing mainly on learning how to write code for the platform and how to use the framework. Now, at the end of the week, I feel that I've made excellent progress, having integrated graphics, physics, networking and sound into the project. A number of features are yet to be implemented; they don't require crazy technical gymnastics, but they will take a lot of building and polish.
Working with Opposable so far has been great fun. I've learnt a lot, and I've also been asked for creative input on my project and others. I really enjoy writing code and seeing results come out the other end, and interfacing with the projects the group is building has been really satisfying. In particular, they have a controller system that connects over the network, and my iOS program has had to integrate with it. For this I needed to learn socket programming, which is incredibly complex, but it was very satisfying to see the working result.
Whilst I am only an intern, there have been times when my technical knowledge has been called upon to help the other programmers in the team, particularly with the Git revision control system. We're also continually discussing which technologies I'm familiar with, to see if there's a good place where I can deploy my expertise.
Week 1 status: happy, learned new things, tired.
I often build scripts that need some kind of network persistence layer, or tiny web services that munge files or JSON or whatever. When I have to do this I don't immediately reach for Rails or any of the other super heavyweight frameworks, because I don't need all the extra superpowers they come with; I can deal with a little more of the manual stuff, since I'm not going to be spending much time doing it anyway. This article is a guide to setting up tiny Python projects on heroku, using the Notely Server as an example.
Set up your project
Make sure you've got the heroku gem, foreman and virtualenv installed, then run the following commands:
# create directory
mkdir app_name && cd app_name
# create a virtual python environment that won't screw with your global one
virtualenv venv --distribute
# use the python environment and install dependencies
source venv/bin/activate
pip install flask
pip install psycopg2
# create a base app.py file
wget http://samphippen.com/app.py -O app.py
# create files necessary for heroku to run
pip freeze > requirements.txt
echo "web: python app.py" > Procfile
# add everything into git
wget http://samphippen.com/pyapp.gitignore -O .gitignore
git init
git add .
git commit -m "Initial commit"
# set up heroku
heroku create --stack cedar
heroku addons:add shared-database
# push to heroku and open in a browser
git push heroku master
heroku open
In app.py, you can see a route that matches “/” and returns the text ‘Hello World!’. This is the starting point for our app: use the Flask docs to change something, run the server locally with foreman start to see what it's doing, and then push it back to heroku.
When you ran the giant blob of commands above, you added a Postgres database to heroku. You can interface with this database through a psycopg2 connection. To create one, you can use the following Python snippet:
import os
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
import psycopg2

# DATABASE_URL has the form postgres://user:password@host:port/dbname
url = urlparse(os.environ["DATABASE_URL"])
conn = psycopg2.connect(dbname=url.path[1:], user=url.username,
                        password=url.password, host=url.hostname, port=url.port)
Once you've got a database connection, you can query it using psycopg2's cursor interface.
This is, I'm pretty sure, the fastest way currently in existence to get from nothing to a running web service with a database that you can build stuff on. For me it's been incredibly useful to be able to throw these services up; I wouldn't have been able to do that without heroku.
Let me know if you've done something cool with this by mailing me.
Redis has a bunch of data types: Strings, Hashes, Lists, Sets and Sorted Sets. Firstly, you'll note that there's no integer data type, but redis does have an INCR command. This command operates on redis's string data type: if the string actually holds an integer, that integer is atomically incremented. Whilst I know that you can store integers (and floats) in strings, it doesn't seem to me a good way of storing such commonly used data types. Additionally, if you're using a redis binding and you do something like this:
>>> redis.set("my_key", 0)
True
>>> redis.get("my_key")
'0'
The binding has no way of figuring out whether the data it gets back should be the integer 0 or the string “0”. This means that anywhere you write integer values to keys, you have to add extra code when you pull the data back out of redis so that it can be treated as an integer. Other key/value stores and databases have been able to store values as integers for a long time. (Redis does the same thing with strings for booleans and nulls.)
You'd think that with redis's more advanced data structures (hashes and lists, for example) you'd be able to do some nesting, so that you could have, say, a list of hashes. Unfortunately this is not the case. When we were working with redis we spent a little while trying to come up with a solution, and we came up with two alternatives:
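In practice that extra code is a small conversion at every read site. A minimal sketch, assuming a redis-py style client whose `get` returns `bytes` (or `str` with `decode_responses=True`), or `None` for a missing key; the helper name `as_int` is hypothetical:

```python
def as_int(raw, default=None):
    """Convert a value read back from redis into an int.

    The binding can't know that "0" was meant to be a number, so the caller
    has to say so explicitly at every read site.
    """
    if raw is None:  # the key doesn't exist
        return default
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return int(raw)


print(as_int(b"0"))             # 0
print(as_int("42"))             # 42
print(as_int(None, default=0))  # 0
```

A typical call site would then read `count = as_int(r.get("my_key"), default=0)`.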
A list of JSON strings: redis's list structure can only store strings (or integer-like strings), so we nested our data structures using JSON strings. This meant that items had to be JSON-encoded on the way into the list and JSON-parsed on the way out. This wasn't too much of a pain, but it wasn't particularly elegant.
Make keys hierarchical: for Student Robotics we decided to namespace our keys with the prefix “org.srobo”. For our list of teams we had keys of the form “org.srobo.teams.n.thing”, where n was the team number. This meant that we could nest our data structures as a tree of keys, storing values in some nodes and nothing in others.
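The first approach (a list of JSON strings) boils down to a pair of encode/decode helpers around the list commands. This is a sketch, not our production code; the helper names are hypothetical, and the usage comment assumes redis-py's `rpush` and `lrange`:

```python
import json


def encode_records(records):
    """Serialise each dict to a JSON string so it can live in a redis list."""
    return [json.dumps(record, sort_keys=True) for record in records]


def decode_records(raw_items):
    """Parse the strings (or bytes) returned by LRANGE back into dicts."""
    return [json.loads(raw) for raw in raw_items]


teams = [{"number": 1, "name": "Foo"}, {"number": 2, "name": "Bar"}]
encoded = encode_records(teams)
# With a real client: r.rpush("teams", *encoded)
#                     decode_records(r.lrange("teams", 0, -1))
assert decode_records(encoded) == teams
```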
Of these solutions I tend to prefer the first one. Whilst it's slightly more horrible, it does mean that all your data is conceptually stored in one place in redis. Redis's keyspace is flat, so there's nothing in redis that can interact directly with our structured hierarchy; instead, that was dealt with in Python scripts.
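The hierarchical scheme needs that Python layer to translate between nested structures and flat dotted keys. A minimal sketch, assuming values only live at leaf keys (a key can't be both a value and a parent) and that everything comes back as strings; the function names are hypothetical:

```python
def flatten(prefix, tree):
    """Turn nested dicts into {"prefix.a.b": value} pairs ready for MSET."""
    flat = {}
    for name, value in tree.items():
        key = f"{prefix}.{name}"
        if isinstance(value, dict):
            flat.update(flatten(key, value))
        else:
            flat[key] = value
    return flat


def unflatten(prefix, flat):
    """Rebuild nested dicts from dotted keys that start with `prefix.`."""
    tree = {}
    start = prefix + "."
    for key, value in flat.items():
        if not key.startswith(start):
            continue
        *parents, leaf = key[len(start):].split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


teams = {"teams": {"3": {"name": "Foo", "college": "Bar"}}}
flat = flatten("org.srobo", teams)
# {"org.srobo.teams.3.name": "Foo", "org.srobo.teams.3.college": "Bar"}
assert unflatten("org.srobo", flat) == teams
```

With redis-py you could write the flat mapping with `r.mset(flat)` and find the keys again with `r.scan_iter(match="org.srobo.teams.*")` before fetching their values.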
Redis has a publish/subscribe mechanism which is extremely useful. The basic idea is that you can subscribe to, or publish on, a “channel”. Nothing in particular relates the data you've got stored in redis to the messages on any given channel; in fact, you could store no data in redis at all and use it purely as a publish/subscribe mechanism. I can think of many strategies for combining stored keys with channels, but for our project we came up with a pretty good solution.
In our solution we use the redis MONITOR command, which streams every command the server executes. We read that output, and any time a variable is modified we publish a message on a channel with the same name as the variable, letting any subscribed programs know that it has been updated. We don't publish the value, just the fact that an update has occurred.
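A sketch of that relay, under stated assumptions: the set of "write" commands below is our own guess and should match the commands your code actually uses, MONITOR lines are parsed with `shlex` (which only approximates redis's escaping), and `client` is assumed to be a redis-py style object with a `publish(channel, message)` method.

```python
import shlex

# Assumption: the commands we treat as "modifying a variable".
WRITE_COMMANDS = {"set", "incr", "decr", "incrby", "append", "del",
                  "hset", "hdel", "lpush", "rpush", "lpop", "rpop",
                  "sadd", "srem", "zadd", "zrem"}


def modified_key(monitor_line):
    """Return the key written by one MONITOR output line, or None.

    Lines look like: 1339518083.107412 [0 127.0.0.1:60866] "set" "foo" "bar"
    (older servers omit the bracketed db/client part).
    """
    _, _, rest = monitor_line.strip().partition(" ")
    if rest.startswith("["):
        rest = rest.partition("] ")[2]
    try:
        parts = shlex.split(rest)
    except ValueError:  # unbalanced quotes in an odd line; ignore it
        return None
    if len(parts) >= 2 and parts[0].lower() in WRITE_COMMANDS:
        return parts[1]  # only the first key, even for multi-key DEL
    return None


def forward_updates(client, monitor_lines):
    """Publish "updated" on a channel named after each modified key."""
    for line in monitor_lines:
        key = modified_key(line)
        if key is not None:
            client.publish(key, "updated")
```

One way to feed it is the standard output of `redis-cli monitor`, e.g. `subprocess.Popen(["redis-cli", "monitor"], stdout=subprocess.PIPE, text=True).stdout`. PUBLISH is deliberately not a write command here, so the relay doesn't react to its own messages. Bear in mind that MONITOR has a noticeable performance cost on a busy server.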
Redis is a very cool piece of technology, and I think it's definitely worth having a play around with. We used it for a production system over the weekend at about 20 updates a second, and it seemed to work fairly stably. I'm not convinced I prefer it over SQL or other NoSQL stores (like MongoDB), but I've met people who use it in production, and they all say they love it.
So yesterday (trust me, it was yesterday from my point of view) I created an application called notely (github). I've just finished the sync component of the notely software: you can now “pair” your notely instance with another notely instance and sync them by typing “notely sync”. The notely server is a tiny Python app whose construction I plan to blog about at some point in the future. It's hosted on heroku, because heroku's cool. You can get the source here: github
edit 23:36:10 UTC+1: there was an error which has now been fixed.
I'm at an airport with nothing but a laptop and wifi, so I built a little command-line utility that lets me quickly save small text notes for myself. I'll probably extend it to support note sections. The tool is called notely and it has a really simple command-line interface. I mostly made it because I often want a list of a few text items for things like to-do lists, or to leave reminders for myself, and something super fast like this is exactly what I need. Todo: sync, and a webpage that makes the data available on tablets/phones/whatever. The code's available here
London is a great place for tourists, and people often talk about the “hottest” places in London. I wanted to build some way to visualise this, and I’m really happy with the results.
By mining the 4square API I was able to determine where people were within London: places like coffee shops, tourist attractions, clubs and gig venues are all unified in 4square. By taking a map from OpenStreetMap and overlaying the 4square data using a simple lighting equation, I got really nice results.
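The post doesn't give the lighting equation, so here is a sketch under that assumption: each venue acts as a light whose brightness falls off as weight / (1 + d²/r²), the summed brightness per pixel is normalised to 0..1, and that value can then be used as an overlay alpha on the map image. The bounding box, radius, weights (for example check-in counts) and function names are all hypothetical.

```python
def to_pixel(lat, lon, bbox, width, height):
    """Linearly map lat/lon inside bbox=(min_lat, min_lon, max_lat, max_lon) to pixels.

    A rough approximation that is tolerable for a city-sized area; north is at y = 0.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    x = (lon - min_lon) / (max_lon - min_lon) * (width - 1)
    y = (max_lat - lat) / (max_lat - min_lat) * (height - 1)
    return x, y


def light_map(points, width, height, radius=10.0):
    """Sum light from (x, y, weight) points with falloff weight / (1 + d^2 / r^2)."""
    grid = [[0.0] * width for _ in range(height)]
    r2 = radius * radius
    for px, py, weight in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                grid[y][x] += weight / (1.0 + d2 / r2)
    return grid


def normalise(grid):
    """Scale intensities into 0..1 so they can be used as an overlay alpha."""
    peak = max(max(row) for row in grid) or 1.0
    return [[value / peak for value in row] for row in grid]
```

The brute-force loop touches every pixel for every venue, which is fine for a few thousand venues on a modest image but would want a cutoff radius for anything larger.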