A new year with new resolutions! But I’ll be honest, New Year’s resolutions never really did it for me. I kinda set mini-goals throughout the year, so I usually don’t need to make drastic changes on January 1st. But, to keep in the spirit of things, I’ll tie in what I’ve been doing since December and call them my resolutions.
Using a predetermined universe of companies, the app allows you to compare company fundamental data based on certain accounting figures. Eventually, this is summarized into one number for each of the 5 categories (listed below) as well as an overall figure. A higher number is better, except for those values below with an *. Each category consists of multiple figures:
A programmer will always aim to take out human repetition. Usually, this means a small investment in time leads to a much larger windfall later when the magic happens with the click of a button (or more likely a command sent to the shell). Sometimes, however, the opposite happens.
Even with a cursory introduction to “Big Data”, you will likely see some mention of MapReduce. It provides the framework for Hadoop, which in turn has been used (with some modifications) by many companies such as Facebook, Google, Twitter, and Yahoo. MapReduce is a programming model that abstracts the complexities of parallel computing and lets users tap into that power without having to get into the low-level architecture. I covered this in my Data Science class and was surprised at how easy it was to pick up. So I copied what they did in Python and created a MapReduce emulator in R.
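To give a feel for the model, here is a minimal single-machine sketch in Python. It is not the R emulator from the post or the class code; the names `map_reduce`, `word_mapper`, and `count_reducer` are made up for illustration, and the sample lines are invented. It only imitates the three phases (map, shuffle, reduce) in one process, with no parallelism.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Imitate the map, shuffle, and reduce phases in a single process."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: gather the values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in a line of text.
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    # Hypothetical reducer: add up the 1s emitted for this word.
    return sum(counts)


# Invented sample input, purely for illustration.
lines = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = map_reduce(lines, word_mapper, count_reducer)
print(word_counts)
```

The appeal of the model is that only the mapper and reducer change from problem to problem; in a real cluster the framework handles splitting the input and moving the grouped values between machines.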
I had actually worked on this interactive digit recognizer a while ago. I planned to make it more comprehensive, but will do that later. The basic premise is to have a canvas for the user to write a digit from 0 to 9, then run an algorithm to guess which digit was written. This is a very common problem and has been tackled thoroughly before (for examples, see post offices, license plate reading, and Google Books). In fact, current methods have over 99% accuracy (though significantly lower on noisy data) when dealing with various inputs. I started with the basics and have a few ideas of my own on where I want to use it next.
I just hopped on the CouchSurfing.org bandwagon (yeah, a couple of years late) and it is a truly fascinating experiment. For those unfamiliar with it, the original premise was to allow travelers to stay with locals and get a more personalized experience when exploring the world. That was then. Recently, it has expanded rapidly to 6 million members, secured $15M in Series B funding, and become the go-to couch-surfing service. A member can use it to offer their couch or find places to crash around the world. They also have a (messy) stream that allows users to ask for tips and plan meetups. I signed up for it because I was bored. I want to meet more interesting people and share experiences with them. Plus, if I travel in the future, it’ll be awesome to know locals around the globe.
For this project, I decided to look at charitable giving data from around the world. What I find most fascinating is the psychology and sociology behind charity, but I can only speculate about that, since I don’t have the means to test various ideas. Most numbers given by organizations are dispersed among many sites, making them hard to collect. But, luckily, Gallup conducts a survey that provides some numbers so we can entertain such a discussion.
I just turned in Assignment 2 for my Data Analysis Course so I can now share it here. (Unfortunately, I’ve been warned that people have been plagiarizing, so I’ve removed my files to prevent cheating… which ironically I did not list as a challenge for a MOOC below, but should be added.) In this assignment, we were given sensor data from the Samsung Galaxy SII recorded while users performed specific activities. The goal was to develop a model on some training data to predict what activity the test subjects were performing. As usual, I wish I had more time to spend on it because I always feel like there is more I can add. Using random forests, I got a misclassification error rate of about 5% on the test subjects. Not too shabby, but at some point I would like to compare it to other models such as SVMs or neural networks.
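For anyone unsure what a misclassification error rate measures, it is just the share of test predictions that do not match the true label. Here is a small Python sketch of that calculation. The function name `misclassification_rate` and the activity labels are made up for illustration; they are not the assignment's data or code.

```python
def misclassification_rate(actual, predicted):
    """Return the fraction of predictions that differ from the true labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    # Count the positions where the predicted label is wrong.
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Made-up labels, purely for illustration (not the Samsung data).
actual = ["walk", "sit", "stand", "walk", "lying"]
predicted = ["walk", "sit", "sit", "walk", "lying"]
rate = misclassification_rate(actual, predicted)
print(rate)
```

One wrong guess out of five gives a rate of 0.2 here; an error rate of about 5% means roughly one in twenty test observations was labeled incorrectly.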
47 years. That is the life expectancy in Sierra Leone, the lowest of all countries. Compare that with the highest life expectancy of 83 years in San Marino. It’s easy to brush these numbers aside as another dull statistic, but that is a 36-year difference. I have yet to even experience that length of time! Imagine being told that your life will end 36 years earlier than expected. It’s not very easy for most of us to comprehend.