Dmitriy Zub ☀️

Artstation Data Analysis Project

Contents: introduction, project goals, tools used, data extraction, data cleaning, data visualization, observations, conclusions, kaggle-github links.

Introduction

Hey! My name is Dimitry Zub, and I'm currently transitioning from 3D modeling to Data Analytics (a little over a month in) and Python programming (about 3 months in). I recently completed the Google Data Analytics Professional Certificate on Coursera.

Project goals

The goal of this project was to better understand the process of gathering, processing, cleaning, analyzing, and visualizing data using industry tools. Besides that, I wanted to find out which software, tags, and affiliations are the most popular among artists on Artstation.

Tools used

To scrape data these Python libraries/packages were used:

  • requests
  • json
  • Google Sheets API
  • selenium
  • regex

To clean, analyze and visualize data:

  • Google Sheets
  • Tableau

Data extraction

To extract data Python was used. Jump to the GitHub source code:

Data cleaning

To clean and prepare data I used Google Sheets. Functions that were used:

=IMPORTRANGE()
=FILTER()
=SPLIT() or =SPLIT() + CHAR(10)
=IF() or =IF() + =ISBLANK()
=ISBLANK()
=COUNTIF()
=ROW()

=IMPORTRANGE("SHEET_URL","'SHEET_NAME'!RANGE") was used to import specific column(s) into a different sheet in order to clean and analyze them.

=FILTER() was used to, well, keep only the rows that match a condition, e.g. =FILTER(A1:A, A1:A="Student") returns only the cells in column A that are equal to "Student".

=SPLIT() + CHAR(10) were used to split a multiline cell:

# multi-line cell
ZBrush
Maya
Substance Painter
Mari
Photoshop
____
=SPLIT(A1,CHAR(10)) 
will split multiline cell to separate columns like so: 
ZBrush  Maya    Substance Painter   Mari    Photoshop

=IF() was used to leave the output blank when the source cell is blank.

=ISBLANK() was used in combination with =IF() to check if the cell is blank.

=COUNTIF() was used to count the number of occurrences of a word in a specific range, e.g. =COUNTIF(SHEET_NAME!RANGE,"Word or cell reference")

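=ROW() was used to create unique IDs, e.g. =ROW(A2)-1 gives ID 1 for the first data row (row 2, below the header) rather than ID 2, because of the subtraction. Note that =ROW(A1)-1 would return 0, since =ROW(A1) is 1.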
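
For readers who would rather do the same cleaning in Python than in Google Sheets, the functions above map roughly onto the standard library, as sketched below. The sample cells are made up for illustration, and this is not how the project itself was cleaned.

```python
from collections import Counter

# Hypothetical sample column: each cell is a multi-line string,
# and an empty string stands in for a blank cell.
cells = [
    "ZBrush\nMaya\nSubstance Painter",
    "",
    "Maya\nPhotoshop",
]

# =IF() + =ISBLANK() analogue: skip blank cells instead of processing them.
non_blank = [cell for cell in cells if cell.strip()]

# =SPLIT(A1,CHAR(10)) analogue: one list of names per cell.
split_cells = [cell.split("\n") for cell in non_blank]

# =FILTER() analogue: keep only the cells that mention "Maya".
with_maya = [names for names in split_cells if "Maya" in names]

# =COUNTIF() analogue: count how often each name occurs across all cells.
counts = Counter(name for names in split_cells for name in names)

# =ROW()-based ID analogue: sequential IDs starting at 1.
ids = list(range(1, len(non_blank) + 1))

print(counts.most_common(1))  # "Maya" is the only name that appears twice
```

Counter does the counting for every name in one pass, where =COUNTIF() needs one formula per word.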

Data visualization

Tableau public dashboard

Note: the visualizations contain data bias. Not every tag and affiliation was counted, due to difficulties with data extraction and mistakes I made.

Observations

Observations on personal mistakes made:

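The problem was that each cell contained a multiline string, and when I tried to =SPLIT() it using "\n" as the delimiter, it didn't work, since Sheets does not interpret "\n" in a formula string as a line break. Instead, I had to use =SPLIT(CELL,CHAR(10)), where CHAR(10) is the line feed character.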

One option would have been to split the tags into separate values directly in the Python code and write them to the table under different columns, so no time is wasted afterwards.
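
A minimal sketch of that idea, assuming the scraped tags arrive as a single newline-separated string. The `tags_to_columns` helper, the sample value, and the fixed column count are hypothetical and are not taken from the project's source code.

```python
import csv
import io


def tags_to_columns(raw_tags, max_columns):
    """Split a newline-separated string into a fixed-width list of cells."""
    # splitlines() handles both "\n" and "\r\n"; strip() drops stray spaces.
    tags = [tag.strip() for tag in raw_tags.splitlines() if tag.strip()]
    # Truncate, then pad with empty strings so every row has the same width.
    tags = tags[:max_columns]
    return tags + [""] * (max_columns - len(tags))


# Hypothetical scraped value: one multi-line string per artwork.
raw = "ZBrush\nMaya\nSubstance Painter\nMari\nPhotoshop"
row = tags_to_columns(raw, max_columns=6)

# Write the row as CSV; an in-memory buffer stands in for a real file or sheet.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_line = buffer.getvalue()
```

Padding every row to the same width keeps the columns aligned in the table, so there is nothing left to =SPLIT() once the data lands in the sheet.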

It would also have been a good idea to check right away in the table whether I was getting what I expected 🙂

Observations about Artstation:

In most cases, the description of the role / type of affiliation is unclear, confusing, and not specific.

Because of this, from a recruiter's point of view, it may be difficult to find candidates simply because something else is indicated in the brief role description, even though the artist may be extremely good professionally.

Consequence: an artist may simply never appear on the radar of a recruiter or anyone else who's looking for a specific artist role.

As I see it, the type of employment should be standardized using a drop-down list, keyword search, or something similar to how the software used is selected when publishing an artwork.

There, everything is clear and understandable, e.g. "Software used: Blender, Photoshop", unlike:

  • "Award winning Tattoo artist, digital art, 3d character art": too much in one line; how do you tell in which position the person would show themselves in the best light?
  • "I draw stuff": unclear, not specific, confusing.

Conclusions

The analysis of artists is very interesting. The next step(s) might be to understand the relationship between:

  • the number of views and the number of likes and comments.
  • the number of views and the software used.
  • the number of views and the tags used.
  • the top 10 most popular programs and the top 10 most popular tags.

Another step would be to understand which genre is the most popular and why.

Links:

Thank you for going all the way to these words! Hope you get a bit of value out of this post.
