I've talked at conferences before, but this time was a bit different. This Thursday, May 28th, was my first time presenting at a local Drupal meetup. I'd been wanting to do it for a couple of years but wasn't able to until now, at a virtual event. I want to thank all the people working hard to keep this community alive and for giving all of us the chance to keep learning even when we're apart.
In this presentation, I talked about open data in Drupal and how DKAN is helping open data efforts grow across the world. Here, I'm going to do a quick written version of my talk, so read on!
Basically, open data is any data that users can use and redistribute freely. For data to be considered "open", it should be available to anyone: accessible, reusable, interoperable and open to the public in general.
For data to be really open, it should be presented in structured formats that make it easy for people to compare it, exchange it, trace it and reuse it effectively. All of this encourages transparency and accountability in public institutions.
Having open data allows people to make well-informed decisions based on data, lets users tell the story of their communities and strengthens citizen participation.
Well, Drupal helps us manage that data, and that's what DKAN does. Basically, DKAN is an open source, open data platform based on Drupal. It allows organizations to upload and publish their data, while users can search and download that data.
But it is not enough just to open the data. It is really important for users to be able to find it, understand it, compare it to other data they are looking at (so it has to be interoperable) and, of course, download it. All of this is possible thanks to catalogs.
Open data catalogs are more than just the data itself. They also record who collected the data, when it was collected and what format it is in. All of this is important because it helps users understand what they are looking at and makes it easy to compare.
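To make the catalog idea a bit more concrete, here is a minimal Python sketch of what a catalog entry could look like and how that metadata makes data findable. The field names, the `find_by_format` helper and the sample entries are all made up for illustration; they are not DKAN's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CatalogEntry:
    """One dataset's metadata; field names here are illustrative only."""
    title: str
    publisher: str       # who collected the data
    collected_on: date   # when it was collected
    file_format: str     # what format it is in


def find_by_format(catalog, file_format):
    """Return the entries whose format matches, ignoring case."""
    wanted = file_format.lower()
    return [e for e in catalog if e.file_format.lower() == wanted]


# Made-up sample entries, only here to exercise the helper.
catalog = [
    CatalogEntry("Bus routes", "Transport Office", date(2020, 1, 15), "CSV"),
    CatalogEntry("School locations", "Education Office", date(2019, 8, 1), "JSON"),
    CatalogEntry("Air quality", "Environment Office", date(2020, 3, 2), "csv"),
]

csv_titles = [e.title for e in find_by_format(catalog, "csv")]
print(csv_titles)
```

Without the metadata, a user would have to open every file to know whether it is useful; with it, a simple filter is enough.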
DKAN was first created as a distribution for Drupal 7. It was kind of a monolithic structure where Drupal was in charge of managing everything: data, metadata, stories, visualizations, users, the publishing workflow and the look and feel of the application.
It has been working like that for some years now. It has an active community, and there are a lot of sites out in the world opening their data through it.
The structure of DKAN for Drupal 7 poses some scalability challenges, and the entry barrier for starting to use it is a bit high. These are some of the reasons why DKAN was rebuilt for Drupal 8.
The rebuild of DKAN was designed around a componentized architecture: it is a Drupal 8 module where the frontend and backend are decoupled, and it is schema driven and API first.
With this new architecture, Drupal is in charge of managing just users, files, metadata and the search index, while the frontend is handled by a decoupled React application, and we can add as many other pieces/microservices as we want.
The main components for managing a site in DKAN v2 are:
- the DKAN module
- the decoupled frontend, which is based on:
  - data-catalog-frontend: the starter kit for managing the frontend. Here you can define any new components you need and update the colors and the whole look and feel of your application.
  - data-catalog-components: a package with all the React components for showing the different pieces of data.
- DKAN Tools: a CLI tool created to help manage DKAN sites.
One of the big changes between DKAN classic and v2 is that DKAN v2 uses just one content type called "Data", which has a single field for "Json Metadata".
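Here is a rough Python sketch of the schema-driven idea, assuming a toy schema that only lists required field names and their types. The `SCHEMA` contents and the `validate` helper are hypothetical and much simpler than the JSON Schema files a real site would use; the point is only that the stored record is one JSON string and the schema decides what is valid.

```python
import json

# Hypothetical schema: required field names mapped to their expected types.
SCHEMA = {
    "title": str,
    "description": str,
    "keyword": list,
}


def validate(json_metadata, schema):
    """Return a list of problems; an empty list means the record is valid."""
    record = json.loads(json_metadata)
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


# The single "Json Metadata" field holds the whole record as one JSON string.
good = json.dumps(
    {"title": "Bus routes", "description": "Routes and stops", "keyword": ["transport"]}
)
bad = json.dumps({"title": "Bus routes", "keyword": "transport"})

good_problems = validate(good, SCHEMA)
bad_problems = validate(bad, SCHEMA)

# Supporting one more field only means changing the schema, not the storage.
EXTENDED_SCHEMA = {**SCHEMA, "license": str}
extended_problems = validate(good, EXTENDED_SCHEMA)

print(good_problems, bad_problems, extended_problems)
```

Notice that adding the `license` requirement did not touch `validate` or the way the record is stored.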
Having a single JSON field gives us a higher level of scalability, because we can support multiple schemas without having to change a lot of components in our Drupal site. Instead, we just need to update the schema, and that way we can support more or fewer fields. Some of the most common schemas right now are:
Even though having just one field may seem hard for users, DKAN actually shows all the fields from the schema using a React form:
Another way of getting data into the platform is through the Harvest component. This is a tool that allows users to collect data from an endpoint and make all the data and metadata available in the DKAN portal.
So publishers can easily update the data and metadata as needed in a user-friendly way, and general users can then search and find the data easily.
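The harvesting idea can be sketched in Python as a small extract, transform and load pipeline. This is not DKAN's Harvest code, just an illustration under my own assumptions: `fake_endpoint` stands in for a real HTTP request, the portal is a plain dictionary, and all function and field names are hypothetical.

```python
def extract(fetch):
    """Pull the raw records from the source; fetch stands in for an HTTP call."""
    return fetch()


def transform(record):
    """Normalize a raw record: trim the title and lowercase the keywords."""
    return {
        "identifier": record["identifier"],
        "title": record["title"].strip(),
        "keyword": [k.lower() for k in record.get("keyword", [])],
    }


def load(portal, records):
    """Store records by identifier so a repeated harvest updates, not duplicates."""
    for record in records:
        portal[record["identifier"]] = record
    return portal


def harvest(fetch, portal):
    """Run the three steps in order against one source."""
    return load(portal, [transform(r) for r in extract(fetch)])


def fake_endpoint():
    """Made-up source data, used instead of a real remote catalog."""
    return [
        {"identifier": "a1", "title": "  Bus routes ", "keyword": ["Transport"]},
        {"identifier": "b2", "title": "Air quality"},
    ]


portal = {}
harvest(fake_endpoint, portal)
harvest(fake_endpoint, portal)  # running it again keeps the same two datasets

print(sorted(portal))
```

Keying the stored records by identifier is the design choice that matters here: it lets the same source be harvested on a schedule without piling up copies.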
Another pretty cool feature of DKAN v2 is that it is API first, so you can do everything just with URLs. You can play with DKAN v2 directly at https://demo.getdkan.com/ and, if you go to the "API" section, you'll be able to test the API there.
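As a small illustration of "everything with URLs", this Python sketch only builds the addresses you would request; it does not call the site. The `DATASETS_PATH` value is my assumption about where the dataset metadata lives, so check it against the "API" section of the demo site before relying on it. The helper names and the identifier in the example are hypothetical too.

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://demo.getdkan.com/"
# Assumed path; confirm it in the "API" section of the demo site.
DATASETS_PATH = "api/1/metastore/schemas/dataset/items"


def dataset_list_url(base_url=BASE_URL):
    """URL that would list the metadata of every dataset."""
    return urljoin(base_url, DATASETS_PATH)


def dataset_url(identifier, base_url=BASE_URL):
    """URL for one dataset; the identifier is escaped so it is safe in a path."""
    return f"{dataset_list_url(base_url)}/{quote(identifier, safe='')}"


print(dataset_list_url())
print(dataset_url("abc 123"))
```

Any HTTP client, or just the browser, can then be pointed at those addresses.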
DKAN v2 is being actively developed and there is a lot more coming soon, so I encourage you to use it, play with it and keep an eye on it in general.
This talk was part of a Drupal meetup in Costa Rica. The whole event was streamed here, so take a look at it. We talked about open data, but Ronald Aguilar also talked about the Drupal community and Alain Martinez talked about his journey as an entrepreneur using Drupal.