※ Open data is data that is openly accessible, exploitable, editable and shared by anyone for any purpose. Open data is licensed under an open license. (https://en.wikipedia.org/wiki/Open_data)
For example:
- https://data.gov/
- https://www.data.go.jp/?lang=en
- https://github.com/awesomedata/awesome-public-datasets
We have released dim (Open Data Package Manager), a CLI tool for managing open data.
dim
Data Installation Manager: Manage the open data in your project like a package manager.
Join community
We are looking for members to develop together as an open source community.
Features
- 📀 Record the source URL and post-processing, etc., of downloaded open-data
- 🔧 Prepare all open data needed for the project in one command by using the dim.json recorded by someone else
- 🚀 General post-processing, such as unzip, encoding, etc., is available from the start
- 🔍 Search open-data from CKAN
- 🧠 Generate code to process data using GPT-3
Document
For more information about how to use it, please refer to this document.
Quick Start
Install the dim
Install the dim from binary files or Run the dim using Deno
Install the dim from binary files
Download the dim from binary files.
curl -L https://github.com/c-3lab/dim/releases/latest/download/aarch64-apple-darwin-dim -o /usr/local/bin/dim
curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-apple-darwin-dim -o /usr/local/bin/dim
curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-pc-windows-msvc-dim.exe -o C:\Users\user-name\dim.exe
…
Demo
I thought open data should be managed by a package manager, just like software (e.g., npm, apt, pip, gem).
It would be convenient for users to fetch open data with a command like:
npm install xxxxx
After data is installed, it is recorded in dim.json, much like package.json.
Stop chaotic open data management
Package managers (npm, gem, apt, and so on) have established a systematic way to manage software and libraries. However, open data users have no comparable systematic approach.
If you were given the assignment to visualize a map using some kind of open data, how would you prepare the data?
The following flow is a common example.
1. Search Google for the open data you want
2. When you find the open data you want, download it from your browser
3. Check the open data, and return to step 1 if it is incomplete or not what you wanted
4. Process the open data for use (character encoding conversion, file format conversion, etc.)
5. Save the open data in the project directory or a database
This process is sufficient for simple projects.
However, you may want to record the specs (name, URL, last-updated date, etc.) of the open data in cases such as:
- Projects developed by multiple people
- Projects to be maintained in the medium to long term
- Public projects (published on GitHub as OSS, etc.)
List of required open data specifications
If you download the open data from various sites and process datasets, you may forget where you downloaded the open data from or how you processed the data. Therefore, it is useful to record the following specifications.
- URL
- Last-updated
- Version
- Post-processing
- Hash value, etc.
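As a rough illustration of what such a record could hold, the following Python sketch builds one entry with the fields listed above. The `DataRecord` class, its field names, and `make_record` are hypothetical names chosen for this example; they are not dim's actual file format. The sketch also assumes that "last-updated" means the time of download.

```python
import hashlib
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DataRecord:
    """Hypothetical record for one downloaded dataset (not dim's real schema)."""

    name: str
    url: str
    last_updated: str  # ISO 8601 timestamp; assumed here to be the download time
    version: str
    post_processing: List[str] = field(default_factory=list)
    sha256: str = ""


def make_record(
    name: str,
    url: str,
    content: bytes,
    version: str = "1",
    post_processing: Optional[List[str]] = None,
) -> DataRecord:
    """Build a record from downloaded bytes, hashing them with SHA-256."""
    return DataRecord(
        name=name,
        url=url,
        last_updated=datetime.now(timezone.utc).isoformat(),
        version=version,
        post_processing=list(post_processing or []),
        sha256=hashlib.sha256(content).hexdigest(),
    )


# Example: record a small CSV that was unzipped after download.
record = make_record(
    "example", "https://example.com", b"id,value\n1,42\n", post_processing=["unzip"]
)
print(asdict(record))
```

Storing the hash makes it possible to notice later that the file behind the same URL has changed.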
Approach
We have released dim (Open Data Package Manager) v1.0, a CLI tool.
Features
(1) Support for search/download/processing/recording processes
dim supports searching, downloading, processing, and recording open data. It can also run this series of steps through interactive commands.
(2) Support for post-processing commonly used in data processing
dim includes several post-processes commonly used in data processing. Each post-process is recorded together with the data URL. You can also use your own scripts as post-processes.
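To make the idea of recorded post-processes concrete, here is a minimal Python sketch that applies named steps in order. The step names (`unzip`, `sjis-to-utf8`), the `POST_PROCESSES` table, and the helper functions are assumptions made for this example and do not reflect dim's actual command names or implementation.

```python
import io
import zipfile
from typing import Callable, Dict, List


def convert_encoding(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode text bytes, e.g. from Shift_JIS to UTF-8."""
    return data.decode(source).encode(target)


def unzip_first(data: bytes) -> bytes:
    """Return the contents of the first file in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.read(archive.namelist()[0])


# Hypothetical step names mapped to functions; a custom script could be
# registered in the same way.
POST_PROCESSES: Dict[str, Callable[[bytes], bytes]] = {
    "unzip": unzip_first,
    "sjis-to-utf8": lambda data: convert_encoding(data, "shift_jis"),
}


def apply_post_processes(data: bytes, steps: List[str]) -> bytes:
    """Apply the recorded steps in the order they were recorded."""
    for step in steps:
        data = POST_PROCESSES[step](data)
    return data


# Example: a zipped Shift_JIS CSV, built in memory so no download is needed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "データ,1\n".encode("shift_jis"))

result = apply_post_processes(buffer.getvalue(), ["unzip", "sjis-to-utf8"])
print(result.decode("utf-8"))
```

Because the steps are stored as plain names, the same list can be replayed by anyone who has the record.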
(3) Prepare data in one step using the existing data specification file
You can fetch and process all open data in one step by using a data specification file (dim.json) that has already been recorded.
By publishing the data specification file (dim.json) on GitHub, you can share just that file without including the open data itself in the repository.
(This is the same as publishing package.json etc. to GitHub.)
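The one-step preparation can be pictured with the following Python sketch, which reads a specification file and fetches every entry into a data directory. The JSON keys (`contents`, `name`, `url`) and the `install_all` function are assumptions for illustration only, not dim's real schema or code. The usage example swaps in a stand-in fetcher so that it runs without network access.

```python
import json
import tempfile
import urllib.request
from pathlib import Path
from typing import Callable, List


def fetch_url(url: str) -> bytes:
    """Download a URL and return the response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def install_all(
    spec_path: Path,
    data_dir: Path,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[Path]:
    """Fetch every entry in the spec file and save it under data_dir."""
    spec = json.loads(spec_path.read_text(encoding="utf-8"))
    data_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for entry in spec["contents"]:  # "contents", "name", "url" are assumed keys
        target = data_dir / entry["name"]
        target.write_bytes(fetch(entry["url"]))
        saved.append(target)
    return saved


# Usage with a stand-in fetcher that returns the URL itself as the file body.
with tempfile.TemporaryDirectory() as tmp:
    spec_file = Path(tmp) / "spec.json"
    spec_file.write_text(
        json.dumps({"contents": [{"name": "example", "url": "https://example.com"}]}),
        encoding="utf-8",
    )
    paths = install_all(
        spec_file, Path(tmp) / "data_files", fetch=lambda url: url.encode("utf-8")
    )
    installed_names = [path.name for path in paths]
    print(installed_names)
```

Only the small specification file needs to live in the repository; the data itself is re-created from it on each machine.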
About the development environment
- Language: TypeScript
- Execution environment: Deno
- CI/CD: GitHub Actions
  - CI: Test/Lint/Type Check/Coverage
  - CD: Tagging automatically builds the dim binary, uploads it, and publishes a release
We are using Deno, which some expect to replace Node.js. We chose Deno for the following reasons:
- It is simple to set up, so projects are easy to start
- A linter and a formatter are provided as standard features
- TypeScript can be executed as is
Using dim
Install dim
Download the dim binary for your platform.
macOS (Apple Silicon):
curl -L https://github.com/c-3lab/dim/releases/latest/download/aarch64-apple-darwin-dim -o /usr/local/bin/dim
macOS (Intel):
curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-apple-darwin-dim -o /usr/local/bin/dim
Windows:
curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-pc-windows-msvc-dim.exe -o C:\Users\user-name\dim.exe
Linux (x86_64):
curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-unknown-linux-gnu-dim -o /usr/local/bin/dim
On macOS and Linux, grant execute permission to the user:
chmod u+x /usr/local/bin/dim
New Project
Initialize the project
The init command generates dim.json, dim-lock.json, and data_files/.
$ dim init
Install data
This command stores information about the installed data in dim.json and dim-lock.json.
$ dim install https://example.com -n "example"
Installed data is saved in data_files/.
$ ls ./data_files
Install all data written to dim.json shared by members
Install all the data listed in a dim.json shared by other members.
Make sure dim.json exists in the current directory.
$ ls ./
dim.json ....
Install all data listed in dim.json.
$ dim install
Installed data is saved in data_files/.
$ ls ./data_files
Besides these, dim has many other features expected of a package manager.
https://github.com/c-3lab/dim#command-usage
For example:
- Search for open data
- Update open data
- Uninstall open data
- Download a dim.json via the internet
- Use dim from Python (https://github.com/c-3lab/dim-python)
We have released v1.0 of dim, which manages open data like a package manager.
There are still many features we want to add. If you share our concerns and would like to tackle these issues together, you are very welcome to join us.