
RapidMiner wannabe: EDA Miner

Konstantinos Mouratidis ・ 2 min read

During my MSc in Data Science, I really hated how we were taught closed-source tools like SAP HANA and RapidMiner (only partly open), and I found Weka to be pretty old and ugly. Innocent me said: "How hard can it be to create one?"

GitHub Student Developer Pack?

I used Travis CI for pipelines, and private repos from GitHub Pro (it wasn't free back then) for untested features and parts of the codebase (like a dev branch). It also helped with integrations with other tools. I also used Namecheap, GitKraken, and PyCharm.

My Final Project

A web-based visualization and analytics dashboard that is able to connect to APIs, receive your data (including datatype inference), and allow you to define, train, export and (almost) productionize Machine Learning models.
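
To make the "datatype inference" part concrete, here's a rough, pandas-based sketch of the idea; it's just an illustration, not the dashboard's actual inference code:

```python
# Minimal sketch of column datatype inference with pandas -- an
# illustration of the idea only, not EDA Miner's actual implementation.
import io
import pandas as pd

csv_data = io.StringIO(
    "user_id,signup_date,score\n"
    "1,2019-05-01,3.7\n"
    "2,2019-06-12,4.1\n"
)

df = pd.read_csv(csv_data)                             # numeric types are guessed on read
df["signup_date"] = pd.to_datetime(df["signup_date"])  # upgrade date-like strings

for col in df.columns:
    print(col, "->", pd.api.types.infer_dtype(df[col]))
# user_id -> integer, signup_date -> datetime64, score -> floating
```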

Link to Code

KMouratidis / EDA_miner

Swiss army knife, but for visualization, analytics, and machine learning. View docs here: http://edaminer.com/docs/ and a demo (don't abuse) here: http://edaminer.com/

EDA_miner


A visualization and analytics dashboard that is able to connect to APIs, receive your data, and allow you to run Machine Learning models from a server. Started as a university project, and will probably be deployed on their servers later this year. Also being worked on together with university staff for an E.U.-sponsored project.

Want to contribute? Take a moment to review the style and contributor guidelines

Want to chat? Join us on

Just looking around? Then you can either install locally or with Docker (a quick sanity-check snippet follows the steps below).

Locally:

  1. Get Python 3.6+, optionally with Anaconda. You might want to set up a virtual environment.
  2. Download (either via git clone https://github.com/KMouratidis/EDA_miner or as a zip)
  3. You'll need redis (if on Windows, you might also need this) and graphviz (for pygraphviz)
  4. Run pip install -r requirements.txt.
  5. Navigate to the /EDA_miner folder.
  6. Create an env.py file with your credentials, according to the given template…
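
If you want a quick check that the local setup worked, something like this should run cleanly once Redis is up. It's a hypothetical snippet, not part of the repo, and assumes Redis on the default localhost:6379:

```python
# Hypothetical post-install sanity check (not part of the repo): verify
# the core dependencies import and a local Redis server answers on the
# default port.
import importlib

for module in ("dash", "flask", "sklearn", "plotly", "redis"):
    importlib.import_module(module)
    print(f"{module}: import OK")

import redis

try:
    redis.Redis(host="localhost", port=6379).ping()
    print("Redis server: reachable")
except redis.exceptions.ConnectionError:
    print("Redis server: not running -- start redis-server first")
```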

How I built it

It took me ~3 months for most of the code and another ~3 months for improvements and additional features. It uses Redis and SQLite as databases, lots of Flask (and extensions) and Dash for the interface, sklearn for modeling, and plotly for visualization, with Docker for deployment. There's lots of info in the huge contributor file.
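
To give a feel for how those pieces fit together, here's a tiny generic sketch: Dash for the UI, Redis caching the data, sklearn doing the modeling. It's an illustration of the stack, not the app's actual code, and assumes a recent Dash 2.x plus a local Redis server:

```python
# Generic sketch of the stack: a Dash callback reads cached data from
# Redis and fits a scikit-learn model. Not EDA Miner's actual code.
import json

import redis
from dash import Dash, Input, Output, dcc, html
from sklearn.linear_model import LinearRegression

r = redis.Redis(host="localhost", port=6379)   # assumes a local Redis server
app = Dash(__name__)

app.layout = html.Div([
    dcc.Input(id="x-value", type="number", value=5),
    html.Div(id="prediction"),
])

# Pretend an earlier upload step cached a tiny dataset in Redis
r.set("dataset", json.dumps({"X": [[1], [2], [3], [4]], "y": [2, 4, 6, 8]}))


@app.callback(Output("prediction", "children"), Input("x-value", "value"))
def predict(x):
    if x is None:
        return ""
    data = json.loads(r.get("dataset"))
    model = LinearRegression().fit(data["X"], data["y"])
    return f"Predicted y for x={x}: {model.predict([[x]])[0]:.2f}"


if __name__ == "__main__":
    app.run(debug=True)
```

Keeping the data in Redis instead of Python globals is what lets multiple callbacks (and multiple worker processes) share it.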

I learnt more in this project than in the past 2 years of intense self-studying, mainly due to looking things up in depth: from CI/CD stuff (Travis, cron for autodeploys, git) to networking and servers (configuring nginx, port forwarding, domain stuff), and from NoSQL databases to mailing protocols, metaprogramming, and advanced object-oriented patterns (I rediscovered the Django wheel). I also got a bit into unit testing, Docker, and bash.
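
By "metaprogramming and advanced object-oriented patterns" I mean things like components registering themselves so the rest of the code can discover them. A generic, self-contained example of that kind of pattern (not the project's code):

```python
# Generic illustration of a self-registration pattern: every subclass of
# Graph adds itself to a registry, so new chart types only need to be
# defined, not wired up by hand. Not EDA Miner's actual code.
class Graph:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Graph.registry[cls.__name__.lower()] = cls


class ScatterPlot(Graph):
    pass


class Histogram(Graph):
    pass


print(Graph.registry)
# {'scatterplot': <class '__main__.ScatterPlot'>, 'histogram': <class '__main__.Histogram'>}
```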

I stole lots of ideas from RapidMiner, Orange, Power BI, a SAP algorithm, Plotly, etc. Sadly, I'm not maintaining it anymore, and the rush to add new features before graduation kinda broke a few things :/

Additional Thoughts / Feelings / Stories

I never worked so hard on anything in my entire life. I hope I find the will to rebuild it in a better way, with a cleaner architecture that accounts for scalability and bigger data workloads.
