From Abstract to Schematic: COVID Infection Probability Algorithm

#datascience #diagram #covid19

Social media is truly a double-edged sword. It lets us share ideas FAST, but it can also lead to mass hysteria over false information. Unfortunately, social media has propagated a lot of the "fake news" that exists about COVID-19.

This week, I was tasked with building a high-level look at a COVID-19 risk algorithm for a web app designed to combat this misinformation. Ideally, any person could log on, enter their information, and see their risk of becoming infected with COVID-19. Another goal is for the algorithm to be semi-anonymous, storing no deeply personal information that could uniquely identify a user to the outside world.

Diagram

To model the data needed for the algorithm, I created an entity-relationship diagram to show the connections between the different sets of data that would be collected. The diagram also shows how that data would be stored in a database so that a final risk analysis can be conducted.

Watch this video for an in-depth look at the diagram:

CORRECTION: In the video, I mentioned that the primary key is a unique identifier to differentiate entities. I meant to say that the primary key is a unique identifier for tuples. The primary key differentiates tuples within a table.

The diagram breaks the data down into different entities, which are linked together. Each entity has a set of attributes that, in this case, are the basis for our risk analysis.

Entities and their attributes

Person: this is the subject of the risk analysis. When a user first uses the web app, a universally unique identifier (UUID) is created for them in order to maximize anonymity.

Additionally, basic information such as gender and age is collected, along with marital status, number of children, and pre-existing health conditions. The user's employer and town are also stored as attributes of the person, but they are foreign keys, that is, references to other entities. All of these attributes are tied to the UUID, which serves as the primary key for the user.

Geographic Location: this is the basic location of the user. In the interest of privacy, the town is collected rather than the user's address.

Based on the town, the population density, air quality, and type of location (urban/rural) can be determined. Additionally, the user will provide their style of housing. Based on location, the nearest hospital can be determined, which will serve as a foreign key to the hospital entity.

Hospital: the hospital name is the primary key for a set of data relating to the user's healthcare accessibility. Since the hospital has already been determined from the user's location, its available ICU beds and proximity to the user can be looked up.

Occupation: the company name serves as the primary key. The ability to work remotely, stress level, industry, and amount are collected.

Behavior: the user is asked a series of yes-or-no questions about their activity. The answers are stored as boolean values and used to calculate a "behavior risk level", which serves as the primary key and is assigned to the user.
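To make the entity relationships more concrete, here is a minimal sketch of how some of these entities might be laid out as tables, using Python's built-in sqlite3 module. The table names, column names, and column types are my own assumptions for illustration and only cover a subset of the attributes described above (the Behavior entity is left out for brevity). It is not the schema from the diagram, just one possible reading of it.

```python
import sqlite3
import uuid

# Hypothetical schema: names and types are illustrative assumptions.
SCHEMA = """
CREATE TABLE hospital (
    name               TEXT PRIMARY KEY,
    available_icu_beds INTEGER
);
CREATE TABLE geographic_location (
    town               TEXT PRIMARY KEY,
    population_density REAL,
    air_quality        REAL,
    location_type      TEXT CHECK (location_type IN ('Urban', 'Rural')),
    nearest_hospital   TEXT REFERENCES hospital(name)
);
CREATE TABLE occupation (
    company_name      TEXT PRIMARY KEY,
    can_work_remotely INTEGER,  -- boolean stored as 0 or 1
    stress_level      INTEGER,
    industry          TEXT
);
CREATE TABLE person (
    uuid     TEXT PRIMARY KEY,
    age      INTEGER,
    gender   TEXT,
    town     TEXT REFERENCES geographic_location(town),
    employer TEXT REFERENCES occupation(company_name)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_person(conn: sqlite3.Connection, age: int, gender: str,
               town: str, employer: str) -> str:
    """Insert a person under a freshly generated UUID and return that UUID."""
    person_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
        (person_id, age, gender, town, employer),
    )
    return person_id


# Example usage with made-up data.
conn = build_demo_db()
conn.execute("INSERT INTO hospital VALUES ('Example General', 12)")
conn.execute(
    "INSERT INTO geographic_location "
    "VALUES ('Exampleville', 1500.0, 42.0, 'Urban', 'Example General')"
)
conn.execute("INSERT INTO occupation VALUES ('Example Co', 1, 3, 'Software')")
person_id = add_person(conn, 30, "F", "Exampleville", "Example Co")
```

With foreign keys enabled, trying to add a person whose town or employer does not exist in the referenced table raises sqlite3.IntegrityError, which is the database enforcing the links drawn in the diagram.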

Calculations

Each piece of data collected from a person is used to calculate that individual's "total risk level". This risk level is put in the context of other risky activities, for example driving or skydiving. The total risk level is an attribute of the user's entity, but it is also the output of the web application.
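The exact formula is not part of this high-level look, so the following is only a rough sketch of what the calculation could look like. I am assuming that the behavior risk level is the fraction of yes-or-no questions answered "yes" (with "yes" meaning the riskier choice), and that the total risk level is a weighted average of component scores between 0 and 1. The question names, component names, and weights are all hypothetical placeholders.

```python
from typing import Mapping


def behavior_risk_level(answers: Mapping[str, bool]) -> float:
    """Fraction of questions answered True (True = riskier behavior)."""
    if not answers:
        return 0.0
    return sum(answers.values()) / len(answers)


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"behavior": 0.4, "location": 0.3, "occupation": 0.2, "health": 0.1}


def total_risk_level(scores: Mapping[str, float],
                     weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted average of component scores, each expected in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total_weight


# Example usage with made-up answers and component scores.
answers = {
    "attends_large_gatherings": True,
    "uses_public_transit": False,
    "travels_frequently": False,
    "eats_at_restaurants": True,
}
scores = {
    "behavior": behavior_risk_level(answers),
    "location": 0.6,
    "occupation": 0.2,
    "health": 0.1,
}
risk = total_risk_level(scores)
```

A weighted average keeps the result between 0 and 1, which would make it easier to place next to reference activities like driving or skydiving, but a real model would need weights backed by actual data.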
