Pravega, 2016

Laneone

Pravega was a data science hackathon held at the Indian Institute of Science, Bangalore, in January 2016. The event was sponsored solely by Microsoft, who were promoting a new platform called Azure ML. It took care of most of the heavy lifting in machine learning and offered a simple user interface for training models on data.

We were given a three-hour lecture on using machine learning for introductory tasks, such as prediction with linear regression and other easy-to-explain examples. At the end of the lecture, they asked us to find data-sets on http://data.gov.in and use them to produce valuable visualizations. The topic was open, so you could pick any subject you were comfortable with and draw conclusions from it.

The Azure ML platform took raw data from the user and trained a model on it using the algorithm you chose. The trained model could then be exposed as a web service hosted on the Azure cloud and called at any time. As part of the hackathon, we were supposed to obtain raw data, train a model on it, expose the model as a web service, optionally call the web service from a thin client, and display the visualization to the user.

We picked blackouts as our area of interest and looked at two data-sets: one from data.gov.in, which was rather useless, and one from http://powercuts.in, which wouldn't let you download the reports. We later found a cached copy of the same set in a repository on GitHub. It had only around 292 rows of data, which I assume affected the algorithm's predictions.

The idea was to learn the different factors that influenced power cuts in India. The data-set included variables such as whether the cut was planned or unplanned, its geolocation, the date and the time, among others. We used Azure ML to train on the reports with the Two-Class Boosted Decision Tree algorithm. The input data-set was divided into two parts: 70% was used to train the model, and the remaining 30% was used to test it and obtain an accuracy score for judging how well the model performs. We got an accuracy of 99.7%, which seemed a bit too good to be true at the time. We planned to expose this as a web service that took a timestamp and a location and told you whether a power blackout could be expected in the next 8 hours.
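To make the 70/30 split and the accuracy score concrete, here is a minimal sketch in plain Python. It is not what Azure ML ran for us: the rows are synthetic placeholders rather than the power cut reports, the helper names are made up for illustration, and a simple majority-class baseline stands in for the Two-Class Boosted Decision Tree.

```python
import random
from collections import Counter


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def majority_label(labels):
    """Return the most common label; used here as a stand-in 'model'."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Placeholder data: 292 (row_id, label) pairs, NOT the real data-set.
rows = [(i, 1 if i % 4 == 0 else 0) for i in range(292)]

train, test = train_test_split(rows)
print(len(train), len(test))  # 204 88

# "Train" the baseline on the 70% part, then score it on the held-out 30%.
guess = majority_label([label for _, label in train])
test_labels = [label for _, label in test]
print(accuracy([guess] * len(test_labels), test_labels))
```

With roughly 88 rows held out, a single wrong prediction shifts the score by more than one percentage point, which is one reason a score from such a small test set deserves some suspicion.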

We did not win anything at this hackathon, but it was an amazing experience. The judges and the mentors we worked with throughout were extremely helpful and resourceful. I loved the time I spent there and learnt a significant amount about machine learning and the Azure ML platform.
