During my MSc in Data Science I really hated how we were taught closed-source tools like SAP HANA and RapidMiner (only partly open), and I found Weka to be pretty old and ugly. Innocent me said: "How hard can it be to build one?"
GitHub SDP? I used TravisCI for pipelines, and private repos from GitHub Pro (it wasn't free then) for untested features / parts of the codebase (like a dev branch). It also helped with integrations with other tools. I also used Namecheap, GitKraken, and PyCharm.
A web-based visualization and analytics dashboard that can connect to APIs, receive your data (including datatype inference), and let you define, train, export and (almost) productionize Machine Learning models.
Swiss army knife, but for visualization, analytics, and machine learning. View docs here: http://edaminer.com/docs/ and a demo (don't abuse) here: http://edaminer.com/
A visualization and analytics dashboard that can connect to APIs, receive your data, and let you run Machine Learning models from a server. It started as a university project and will probably be deployed on their servers later this year. It is also being worked on together with university staff for an E.U.-sponsored project.
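To make the "datatype inference" part of the description more concrete, here is a minimal sketch of how incoming API data (which typically arrives as strings) can be coerced to richer types with pandas. The payload and the `infer_types` helper are purely illustrative assumptions, not EDA_miner's actual code.

```python
import pandas as pd

# Made-up API payload: every value comes in as a string.
payload = [
    {"user": "alice", "age": "34", "signup": "2019-05-01", "score": "7.5"},
    {"user": "bob",   "age": "28", "signup": "2019-06-12", "score": "3.1"},
]
df = pd.DataFrame(payload)

def infer_types(frame):
    """Try numeric first, then datetime; fall back to the original strings."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
            continue
        except (ValueError, TypeError):
            pass
        try:
            out[col] = pd.to_datetime(out[col])
        except (ValueError, TypeError):
            pass
    return out

typed = infer_types(df)
print(typed.dtypes)
```

In this sketch `age` ends up numeric, `signup` becomes a datetime column, and `user` stays a plain string column.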
Want to contribute? Take a moment to review the style and contributor guidelines.
Just looking around? Then you can install either locally or with Docker.
- Get Python 3.6+, optionally with Anaconda. You might want to set up a virtual environment.
- Download the repository (either via `git clone https://github.com/KMouratidis/EDA_miner` or as a zip).
- You'll need redis (if on Windows, you might also need this) and graphviz (for pygraphviz).
- Install the requirements: `pip install -r requirements.txt`.
- Navigate to the
- Create an `env.py` file with your credentials, according to the given template…
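For orientation, an `env.py` might look something like the sketch below. The real template ships with the repo; every name and value here is a placeholder assumption, not the actual required keys.

```python
# env.py -- hypothetical credentials sketch; replace with the repo's template.

# Local services the app talks to (names are illustrative)
REDIS_HOST = "localhost"
REDIS_PORT = 6379
SQLITE_PATH = "users.db"

# Dashboard login (placeholder values -- never commit real secrets)
ADMIN_USER = "admin"
ADMIN_PASSWORD = "change-me"

# Keys for any external APIs you want to connect to
API_KEYS = {
    "some_service": "your-api-key-here",
}
```

Keeping credentials in an untracked `env.py` (or environment variables) is what keeps them out of version control.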
It took me ~3 months for most of the code and another ~3 months for improvements and additional features. It uses Redis and SQLite as databases, lots of Flask (and extensions) and Dash for the interface, sklearn for modeling, and plotly for visualization, with Docker for deployment. There's lots of info in the huge contributor file.
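The "sklearn for modeling" plus "export" workflow mentioned above can be sketched as follows: train a scikit-learn model, persist it with joblib, and reload it later (e.g. inside a Flask endpoint). The dataset and model choice are illustrative assumptions; EDA_miner's actual pipeline differs.

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data standing in for whatever the dashboard ingested
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a simple model and check it on held-out data
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")

# Export, then reload as a server process would
joblib.dump(model, "model.joblib")
restored = joblib.load("model.joblib")
assert (restored.predict(X_test) == model.predict(X_test)).all()
```

Serializing the fitted estimator is what makes the "(almost) productionize" step possible: the serving process only needs the artifact, not the training code.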
I learnt more in this project than in the past 2 years of intense self-studying, mainly from looking things up in depth: from CI/CD (Travis, cron for auto-deploys, git) to networking and servers (configuring nginx, port forwarding, domain stuff), from NoSQL databases to mailing protocols, metaprogramming, and advanced object-oriented patterns (I rediscovered the Django wheel). I also got a bit into unit testing, Docker, and bash.
I stole lots of ideas from RapidMiner, Orange, Power BI, a SAP algorithm, Plotly, etc. Sadly, I'm not maintaining it anymore, and the rush to add new features before graduation kinda broke a few things :/
I never worked so hard on anything in my entire life. I hope I find the will to rebuild it in a better way, with a cleaner architecture that accounts for scalability and bigger data workloads.