I’m always looking for new tools to improve my work as a data scientist. This time I came across Mito, a spreadsheet provided as a Python library, which lets you manipulate a dataset simply, quickly and, above all, interactively.
Mito provides a graphical interface within the Jupyter Lab environment, so you can manipulate any dataset.
In practice, Mito combines the classic strengths of spreadsheets, such as their ease of use, with the full power of Python. In fact, every operation carried out within the interface is automatically translated into Python code that can also be used elsewhere.
Mito provides the following features:
- import/export datasets
- add/delete columns to a dataset
- build pivot tables
- merge two datasets
- plot charts
- filter/sort columns
- column statistics
In this article, I give an overview of Mito, as well as a practical example of usage.
Mito is an interactive spreadsheet, provided as a Python library, that runs within Jupyter Lab, which must be installed in advance.
Mito can be easily installed through the following commands:
python3 -m pip install mitoinstaller
python3 -m mitoinstaller install
The official Mito documentation includes a detailed description of common installation problems, which can help if the installation process fails. In my case, the installation went smoothly.
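As a quick sanity check after installing, you can ask Python whether the packages are visible on the import path. This is only a minimal sketch: the helper `is_installed` is a hypothetical name introduced here, and it assumes that `mitosheet` and `jupyterlab` are the importable package names. It does not replace the troubleshooting steps in the official documentation.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Assumption: "mitosheet" and "jupyterlab" are the importable package names.
for name in ("mitosheet", "jupyterlab"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either package is reported as missing, the interpreter running the check is probably not the one used for the installation.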
Once Mito is installed, you must restart Jupyter Lab, if it is running, in order to enable Mito. You can start Jupyter Lab from the command line with the following command:
jupyter lab
To launch the Mito interactive interface, create a new notebook and run a cell that imports the mitosheet package and calls mitosheet.sheet().
Each time you modify a dataset through the Mito interface, Mito generates the equivalent Python code in the cell below.
Datasets are stored as Pandas dataframes, which can also be manipulated directly in a cell.
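Because the datasets are ordinary Pandas dataframes, operations such as filtering, sorting, and column statistics can also be written by hand in a cell. The sketch below uses made-up data and hypothetical column names (`city`, `population`). It illustrates the kind of Pandas code involved and is not a copy of what Mito generates.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "city": ["Rome", "Pisa", "Milan", "Turin"],
    "population": [2800000, 90000, 1400000, 850000],
})

# Filter: keep only the rows whose population exceeds one million.
large = df[df["population"] > 1_000_000]

# Sort: order the filtered rows by population, largest first.
large_sorted = large.sort_values(by="population", ascending=False)

# Column statistics: count, mean, std, min, quartiles, and max.
stats = df["population"].describe()

print(large_sorted)
print(stats)
```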
The first time you run Mito, a popup opens, asking for some information, such as your email address.
After you sign up, the Mito interactive interface starts within the notebook environment. You can switch to full-screen mode by clicking the button at the top right.
The interface provides a menu bar. Starting from the left, the following menu items are available:
- undo: revert the last change made to the dataset;
- import: load a new dataset from the file system. A popup opens that lets you browse the directories of the file system. Supported formats include Excel (XLSX) and CSV;
- export: download a manipulated dataset as a CSV file to your local file system;
- add column: add a new column to the dataset. You can change the column name as well as the column values. For the values, you can either enter them manually or calculate them from the other columns. To insert a formula, double-click the first row of the column;
- delete column: remove a column completely;
- pivot: build a pivot table. The resulting table opens in a new tab, so it can be manipulated separately;
- merge: merge two datasets. You can choose among several merge types.
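The add column, delete column, pivot, and merge items above correspond to common Pandas operations. The following is a minimal sketch of those operations written by hand, assuming two small made-up dataframes (`sales` and `managers`) with hypothetical column names. The left join is just one example of a merge type, and the code Mito actually generates may look different.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "units": [10, 5, 8, 12],
    "unit_price": [2.0, 3.0, 2.0, 3.0],
})
managers = pd.DataFrame({
    "region": ["North", "South"],
    "manager": ["Alice", "Bob"],
})

# Add column: compute a new column from existing ones.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Pivot: total revenue per region (rows) and product (columns).
pivot = sales.pivot_table(index="region", columns="product",
                          values="revenue", aggfunc="sum")

# Merge: attach each region's manager with a left join on "region".
merged = sales.merge(managers, on="region", how="left")

# Delete column: drop a column that is no longer needed.
merged = merged.drop(columns=["unit_price"])

print(pivot)
print(merged)
```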
Continue Reading on Towards Data Science