Data Science today :: Where do you or your skills fit in?

#datascience #database #python #file

Image Courtesy : O'Reilly

Source / Scrape : Like wiki pages, today's data source could be anything: a Splunk query, a server log, or any device which produces data in any format (.csv, .log, .json, .xml, .xls, etc.). You need to be skilled in a bunch of tools such as Python / Pandas, shell scripts, jq, CSV, sed and so on.
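As a minimal sketch of the idea, the snippet below reads the same kind of record from a CSV string and from a JSON-lines string using only the Python standard library, then merges both into one list of dicts. The sample data, the field names and the `load_csv` / `load_jsonl` helpers are made up for illustration; with Pandas the same job would likely take fewer lines.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a real export or server log.
CSV_TEXT = "host,status\nweb01,200\nweb02,500\n"
JSONL_TEXT = '{"host": "web03", "status": 404}\n{"host": "web04", "status": 200}\n'


def load_csv(text):
    """Parse CSV text into a list of dicts, casting status to int."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"host": r["host"], "status": int(r["status"])} for r in rows]


def load_jsonl(text):
    """Parse JSON-lines text (one JSON object per line) into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Both sources end up in the same shape, whatever the original format was.
records = load_csv(CSV_TEXT) + load_jsonl(JSONL_TEXT)
print(records)
```

Once every source is in one common shape, the later steps do not need to care where a record came from.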

Data Cleansing : Pandas is a good option, but you are not limited to it. The Community Edition of Pentaho PDI can help with data cleansing. SSIS (developed in SSDT) has transformations which help with data cleansing. Knowledge of regular expressions (regex) will be of great help.
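To show where regex helps, here is a small sketch that cleans a few messy log lines with Python's `re` module. The log format, the sample lines and the `clean` helper are assumptions made for this example, not a standard.

```python
import re

# Hypothetical raw log lines; the format is an assumption for this example.
RAW_LINES = [
    "2024-01-05  10:02:11   ERROR   disk  full on /dev/sda1 ",
    "2024-01-05 10:02:15 info   retrying",
    "garbage line without timestamp",
]

# Named groups make the parsed fields self-describing.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Za-z]+)\s+"
    r"(?P<message>.*)$"
)


def clean(line):
    """Return a normalised record, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["level"] = record["level"].upper()
    # Collapse runs of whitespace inside the message.
    record["message"] = re.sub(r"\s+", " ", record["message"]).strip()
    return record


# Keep only the lines that could be parsed.
cleaned = [rec for rec in map(clean, RAW_LINES) if rec is not None]
print(cleaned)
```

Lines that do not match are dropped here; in a real pipeline you would probably log or count them instead.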

Database : Knowledge of databases is key. Any RDBMS (such as Microsoft SQL Server, Postgres, MySQL or Oracle) will do. I personally like MySQL 8, as it has NoSQL and API capability.
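The SQL basics carry over between engines. The sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in for the servers named above, only because it needs no installation. The `requests` table and its rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real RDBMS server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (host TEXT, status INTEGER)")

# Hypothetical rows, e.g. the output of an earlier cleansing step.
conn.executemany(
    "INSERT INTO requests (host, status) VALUES (?, ?)",
    [("web01", 200), ("web02", 500), ("web03", 200), ("web04", 404)],
)
conn.commit()

# Count requests per status code.
rows = conn.execute(
    "SELECT status, COUNT(*) AS hits FROM requests "
    "GROUP BY status ORDER BY status"
).fetchall()
print(rows)

conn.close()
```

The `?` placeholders keep values out of the SQL string, which is a good habit on any engine. Note that the placeholder style differs between database drivers.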

Explore : Jupyter / Anaconda / IPython + Pandas + Matplotlib is a good combination. Spark with Zeppelin will also work, but setting up the cluster is too much work.

Deliver : A GraphQL or REST API is one option. Accessing the data via the DB, using ORM tools such as SQLAlchemy or direct SQL, will be the time-saving option.
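A rough sketch of the direct SQL route: run a query and hand the result over as JSON, which is the shape a REST endpoint would usually return anyway. SQLite stands in for the real database, and the `sales` table and the `query_to_json` helper are hypothetical names used only for this example.

```python
import json
import sqlite3


def query_to_json(conn, sql, params=()):
    """Run a query and return the rows as a JSON array of objects."""
    # sqlite3.Row lets each row be converted to a dict keyed by column name.
    conn.row_factory = sqlite3.Row
    rows = conn.execute(sql, params).fetchall()
    return json.dumps([dict(row) for row in rows])


# Hypothetical data to deliver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.5), ("south", 80.0)],
)

payload = query_to_json(
    conn, "SELECT region, amount FROM sales WHERE amount > ?", (100,)
)
print(payload)

conn.close()
```

An ORM such as SQLAlchemy would replace the raw SQL string with model classes, at the cost of some setup.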

Transform : Knowledge of powerful open source JS libraries such as D3.js and Chart.js will be helpful. Pentaho Community Edition is a good alternative. If you have access to Tableau (paid version), that is good too.

Knowledge of Docker will help you with one or more of the components given above.

https://www.datascienceatthecommandline.com/1e/
