
Help gathering data

Paul (peiche) ・ 1 min read

I don't even really know where to start, so I figure I'll just throw this out there and see where it goes.

My daughter is disabled. She's in sports and dance, but it's taken my wife and me years to find out about the organizations she's now part of, mostly through word of mouth. Other families told us because they had years of experience that we didn't, and now we're passing the information on to less experienced families. Everyone we've talked to agrees that this is a problem: there's really no good way of discovering which organizations are out there and what they can help with.

There are some sites, like https://www.challengedathletes.org/resources/, that are really just lists of links, with nothing to indicate that this group has wheelchair basketball or that group has adaptive ballet. So I'm thinking: what if I built a site that provided an index? Searchable and faceted, using something like Algolia or AWS CloudSearch. That part I can do. But how would I go about gathering the information? Could I scrape it? If so, how do I organize it? Do I crowdsource by petitioning /r/disability, the Facebook support groups my family belongs to, and other places across the interwebs?
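To make "faceted" concrete, here is a minimal in-memory sketch in Python. It assumes a hypothetical `Program` record with set-valued `activities` and `disabilities` facets plus a `region`; in practice Algolia or CloudSearch would do the filtering and counting that `search` and `facet_counts` do here. The sample records and example.org URLs are made up.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Program:
    """One organization/program record (field names are illustrative)."""
    name: str
    url: str
    activities: set[str] = field(default_factory=set)    # e.g. {"adaptive ballet"}
    disabilities: set[str] = field(default_factory=set)  # who the program serves
    region: str = ""


def _matches(program, attr, wanted):
    """Set-valued facets match by membership, plain strings by equality."""
    value = getattr(program, attr)
    return wanted in value if isinstance(value, set) else value == wanted


def search(programs, text="", **facets):
    """Programs whose name contains `text` and that match every facet filter."""
    return [
        p for p in programs
        if text.lower() in p.name.lower()
        and all(_matches(p, attr, wanted) for attr, wanted in facets.items())
    ]


def facet_counts(programs, attr):
    """How many programs carry each value of a set-valued facet."""
    counts = Counter()
    for p in programs:
        counts.update(getattr(p, attr))
    return counts


# Made-up sample records for illustration only.
programs = [
    Program("City Wheelchair Hoops", "https://example.org/hoops",
            {"wheelchair basketball"}, {"mobility"}, "North"),
    Program("Adaptive Dance Studio", "https://example.org/dance",
            {"adaptive ballet"}, {"mobility", "autism"}, "South"),
]

print([p.name for p in search(programs, activities="adaptive ballet")])
# ['Adaptive Dance Studio']
print(facet_counts(programs, "disabilities")["mobility"])  # 2
```

The facet counts are what a search UI would show next to each filter checkbox.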

I can design the data model. I can build the webapp. I can make it fast and pretty and easy to use. But how do I get the data?

Discussion

I've had ideas wither away because they lacked access to the data sources required to drive their purpose. It's frustrating at times, even more so when you know the datasets exist.

Page scraping might work, but I wonder how complex it would get. It's one thing to scrape a vendor's product catalog, and quite another to parse independent websites for whatever data they happen to contain.

A couple alternative approaches:

  • Outsource the compilation to an offshore contractor
  • Create a way for the community to contribute, and run a campaign to populate an initial dataset by reaching out directly to the organizations and asking them to fill out a survey (or something similar) that captures the data
 

The first step is getting as many programs and resources into the system as possible. Scraping the public link dumps you already have is a perfectly reasonable start: some data is better than no data, and once you have records, you can start attaching tags to them. BeautifulSoup is the last dedicated scraping tool I've used, but you should also be able to pull a list of links and descriptions with a little JavaScript in the browser console.
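A minimal sketch of that first scraping pass, in Python. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs with no extra installs, and it assumes the resource page lists each organization as an ordinary `<a>` link whose text names the organization; descriptions that sit outside the link would need page-specific handling. `fetch`, `LinkListParser`, and `extract_links` are hypothetical helpers, not an existing API.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkListParser(HTMLParser):
    """Collect (href, link text) pairs from every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def fetch(url):
    """Download a page and decode it using the charset the server reports."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_links(html):
    """Return (href, text) pairs, keeping only absolute http(s) links."""
    parser = LinkListParser()
    parser.feed(html)
    parser.close()
    return [(h, t) for h, t in parser.links
            if h.startswith(("http://", "https://"))]


sample = ('<ul><li><a href="https://example.org/a">Adaptive Ballet Co.</a></li>'
          '<li><a href="/about">About</a></li></ul>')
print(extract_links(sample))
# [('https://example.org/a', 'Adaptive Ballet Co.')]
```

Against a live page you would call `extract_links(fetch(url))` and store each pair as an untagged record, ready for the tagging step.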

After getting data on which programs exist, what I'd do is set up a form where people can report that such-and-such program serves this-or-that need (and therefore needs this-or-that tag). Some manual validation or cleanup will probably be necessary, but it's crucial to collect as many reports on as many programs as possible: any individual report can be incomplete or even incorrect, but the more reports you have, the more reliably you know what a given program offers. Then it's a matter of generating publicity: blog, post on social media, and explore partnering with disability support and activist organizations or getting sponsorships.
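One simple way to turn many individually unreliable reports into published tags is a vote threshold. This Python sketch assumes each form submission arrives as a `(program_id, tags)` pair; `accepted_tags` and its `min_reports` / `min_share` thresholds are hypothetical placeholders that would need tuning against real submissions. Tags that fall short are a natural queue for manual review.

```python
from collections import Counter, defaultdict


def accepted_tags(reports, min_reports=3, min_share=0.5):
    """Decide which tags to publish for each program.

    `reports` is an iterable of (program_id, tags) pairs, one per submitted
    form. A tag is accepted for a program when at least `min_reports` reports
    mention it AND it appears in at least `min_share` of that program's
    reports. Both thresholds are arbitrary placeholders.
    """
    tag_counts = defaultdict(Counter)  # program_id -> Counter of tag mentions
    report_counts = Counter()          # program_id -> number of reports
    for program_id, tags in reports:
        report_counts[program_id] += 1
        tag_counts[program_id].update(set(tags))  # count each tag once per report

    result = {}
    for program_id, counts in tag_counts.items():
        total = report_counts[program_id]
        result[program_id] = {
            tag for tag, n in counts.items()
            if n >= min_reports and n / total >= min_share
        }
    return result


# Made-up submissions for illustration only.
reports = [
    ("dance-studio", {"adaptive ballet", "autism"}),
    ("dance-studio", {"adaptive ballet"}),
    ("dance-studio", {"adaptive ballet", "wheelchair"}),
    ("dance-studio", {"hip hop"}),
]
print(accepted_tags(reports))  # {'dance-studio': {'adaptive ballet'}}
```

Requiring both an absolute count and a share keeps a single enthusiastic submitter from tagging a program alone, while still letting a well-reported tag through.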

The Wine AppDb is an example of a website that does something similar to build a community-driven database.

This sounds like something that needs to exist. Good luck!