DEV Community

Cover image for Flatten Pandas dataframe MultiIndex
Rolando Gómez Tabar
Rolando Gómez Tabar

Posted on • Updated on

Flatten Pandas dataframe MultiIndex

Context

A few days ago, I had to extract a data from Oracle Database using SQL, and then PIVOT a long set of values in a column, to multiple columns. The sintax to do this, requires to write every single value you want to be as a column, not one but twice to avoid the default quotes.

See reference
https://www.oracle.com/technical-resources/articles/database/sql-11g-pivot.html

There are a few options like pivoting using PIVOT XML, or even more recent to build a JSON column using JSON_ARRAYAGG and JSON_OBJECT to get dynamic columns or any value as an attribute, but still not so straightforward.

First try:

Lets try with a python notebook (I use VSCode notebook).
First import pandas:

import pandas as pd
Enter fullscreen mode Exit fullscreen mode

We will use a random free dataset:

# source file from https://www.kaggle.com/gregorut/videogamesales
df = pd.read_csv('./vgsales.csv')
df
Enter fullscreen mode Exit fullscreen mode

vgsales.csv

Using the function pivot_table we will transponse the values of the column Genre to be a column for every value in the dataset.

pivot_df = df.pivot_table(index=['Year', 'Publisher'], columns=['Genre'], values=['Global_Sales'], aggfunc=np.sum, fill_value=0)
pivot_df
Enter fullscreen mode Exit fullscreen mode

simple df.pivot_table

Now let's reset_index() to "try" flattening the indexes.

mi_pivot_df = pivot_df.reset_index()
mi_pivot_df
Enter fullscreen mode Exit fullscreen mode

pivot_df.reset_index
Ook, not sure, but, lets try to export to an Excel, with index=False to avoid showing the index column to the left.

pivot_df.to_excel('./global_sales_by_publishers_genres.xlsx', index=False)
Enter fullscreen mode Exit fullscreen mode

pivot_df.to_excel 1
So, it's not implemented. Although, it does work when you remove the index=False part. However it will show the index column to the left and the two column indexes, which I don't want to be there.

Second try

I searched the solution to flat the most, but no answer was full, complete enough, and free of errors. I collected all, and by try and error, got here, a working solution. I also works after using pivot function:

flat_index_pivot_df = pivot_df['Global_Sales'].reset_index()
flat_index_pivot_df
Enter fullscreen mode Exit fullscreen mode

df.reset_index 2

Now you can get a clean Excel Sheet, free of MultiIndex.

flat_index_pivot_df.to_excel('./global_sales_by_publishers_genres.xlsx', index=False)
Enter fullscreen mode Exit fullscreen mode

Do you know another solution?

Top comments (1)

Collapse
 
chris1610 profile image
Chris Moffitt

Nice article. I wrote a small utility for pandas called sidetable which flattens multi-indexes and gives some additional options. It might be useful to you and your readers.

You can see it here - github.com/chris1610/sidetable#fla...