Recommender algorithms for distributed social networks

#algorithms #socialnetworks #pods #python

In distributed social networks, each person stores their content, such as tweets or videos, in their own data pod, and services and other pods can query that data. This architecture can provide substantial benefits in privacy, algorithmic fairness, and protection against surveillance capitalism. However, the same distributed architecture makes search and discovery of content more complex than in the centralized social networks we know today.

In traditional centralized social networks, algorithms compare the content consumption patterns of a person with those of other people who like similar content, and use the overlap to create a list of recommended content. Centralized networks have 'data lakes' that contain information about all the content and all the consumption data of all the people that use the service, making the work of the algorithms easier. In a distributed network, there is no holistic view of all the content stored in all the pods, and a pod does not know what content other people have been consuming.
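To make the centralized case concrete, here is a minimal sketch of user-based collaborative filtering over a toy 'data lake'. The `likes` dictionary, the function names, and the choice of Jaccard similarity are assumptions made for illustration; real services use far larger data sets and more sophisticated models.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(person: str, likes: dict, top_n: int = 3) -> list:
    """Rank unseen items by the similarity of the people who liked them."""
    own = likes[person]
    scores = defaultdict(float)
    for other, other_likes in likes.items():
        if other == person:
            continue
        similarity = jaccard(own, other_likes)
        if similarity == 0.0:
            continue  # nothing in common, so this person adds no signal
        for item in other_likes - own:
            scores[item] += similarity
    # Highest score first; ties are broken by item name for a stable order
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Toy 'data lake': everyone's likes are visible in one place (hypothetical data)
likes = {
    "alice": {"video1", "video2", "video3"},
    "bob": {"video2", "video3", "video4"},
    "carol": {"video1", "video5"},
    "dave": {"video6"},
}
recommendations = recommend("alice", likes)
print(recommendations)
```

The sketch only works because `likes` holds everyone's data at once. That global view is exactly what a single pod lacks.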

However, distributed social networks offer several benefits over centralized networks for the people who use them:

The data about a person is stored within the pod of that person, enhancing privacy and reducing the impact of 'surveillance capitalism,' the term for companies collecting vast troves of data about people to maximize the revenue per ad they show to those people.
Each person can select the algorithm to generate their feed or timeline as they have full access to the information about themselves in their data pod. When you can choose the algorithm yourself, you can pick algorithms that don't prioritize maximizing the engagement with the social network and reduce the amount of content selected to enrage, frighten, or evoke other negative emotions.

The question is thus: How can distributed social networks implement content recommendation capabilities in a robust, scalable, and cost-effective architecture that does not negate the privacy benefits of their distributed nature?

Three possible approaches to data discovery are:

Use P2P networking to exchange data. Requests for data can be proxied from one pod to the next based on the social graph of the person initiating the request. Depending on the application, people may wish to restrict sharing to their family, friends, neighbors, or colleagues. For other applications, sharing anonymized data with wider audiences may be acceptable.
Build a globally shared data set with all data. Each pod carries the burden of hosting a shard of the data set, and a routing mechanism tells a pod which other pods to contact to access a given shard.
Have individual pods upload their aggregated and anonymized data to a central API. The API processes the data and makes it available to the pods. The recommender algorithm running on a pod can use its local data and the processed data from the API to make recommendations.

All three options hold promise depending on the features provided by different services. The implementation of the pods can leverage previous work from distributed services such as distributed search engines (YaCy), P2P sharing networks, distributed ledgers, and consensus algorithms.
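As an illustration of the first approach listed above, the following sketch proxies a request across a social graph with a hop limit, so data is only collected from pods close to the requester. It simulates the network with in-memory dictionaries. The names `follows`, `pod_data`, and `max_hops` are assumptions for this example and do not describe the protocol of any particular platform.

```python
from collections import deque


def proxied_query(start: str, follows: dict, pod_data: dict,
                  max_hops: int = 2) -> dict:
    """Collect shared items from pods within max_hops of the requester."""
    results = {}
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        pod, hops = queue.popleft()
        if pod != start and pod_data.get(pod):
            results[pod] = pod_data[pod]
        if hops == max_hops:
            continue  # hop limit reached: do not proxy the request further
        for neighbor in follows.get(pod, []):
            if neighbor not in visited:  # avoid loops in the social graph
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return results


# Hypothetical social graph and per-pod shared content
follows = {"alice": ["bob", "carol"], "bob": ["dave"], "dave": ["erin"]}
pod_data = {"bob": ["post-b"], "dave": ["post-d"], "erin": ["post-e"]}

nearby = proxied_query("alice", follows, pod_data, max_hops=2)
print(nearby)
```

Lowering `max_hops` to 1 restricts the query to direct contacts, which matches the case where people only want to share with family or friends.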

While the pods improve privacy, people must share some of their data to enable recommendations on social networks. They will have to select the services that implement this sharing based on their evaluation of the benefits of joining the service versus the degree of loss of privacy. The Byoda distributed social networking platform implements 'data contracts' enforced by the pod that explicitly specify how the data for the service in the pod is shared. Other platforms may not provide such a feature, but people will still have control over the data stored in their pod and can stop further sharing that data at any point.
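The idea of a pod enforcing what it shares can be sketched as follows. This is not the Byoda data contract format or API: the `DataContract` and `Pod` classes, their fields, and the service name are all hypothetical. They only illustrate a pod refusing to share anything that the accepted contract does not list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: which fields a service may read, and how far."""
    service: str
    shareable_fields: frozenset
    max_hops: int = 1  # how far across the social graph data may travel


@dataclass
class Pod:
    data: dict
    contracts: dict = field(default_factory=dict)

    def accept(self, contract: DataContract) -> None:
        """The pod owner joins a service by accepting its contract."""
        self.contracts[contract.service] = contract

    def revoke(self, service: str) -> None:
        """Stop all further sharing with a service."""
        self.contracts.pop(service, None)

    def share(self, service: str, requested: list, hops: int = 1) -> dict:
        """Return only the requested fields that the contract allows."""
        contract = self.contracts.get(service)
        if contract is None or hops > contract.max_hops:
            return {}
        return {name: self.data[name] for name in requested
                if name in contract.shareable_fields and name in self.data}


pod = Pod(data={"email": "alice@example.com",
                "liked_videos": ["video1", "video2"]})
pod.accept(DataContract("video-service", frozenset({"liked_videos"})))

shared = pod.share("video-service", ["email", "liked_videos"])
pod.revoke("video-service")
after_revoke = pod.share("video-service", ["liked_videos"])
print(shared, after_revoke)
```

Because the check runs inside the pod, revoking the contract takes effect immediately for any future request, although data that was already shared cannot be recalled.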

People may require services to anonymize any information that is shared. While filtering out personally identifiable data points and hashing data can hide ownership for small data sets, services on distributed social networks must consider that extensive collections of anonymized information may be at risk of de-anonymization. The services must design solutions to minimize those risks.
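A minimal sketch of the techniques mentioned above, assuming a pod or service wants to strip identifying fields, replace pod identifiers with a keyed hash, and suppress items reported by too few pods. The `PII_FIELDS` list, the record layout, and the `min_pods` threshold are assumptions for illustration. The threshold is a simple k-anonymity-style rule and does not guarantee protection against de-anonymization.

```python
import hashlib
import hmac
from collections import Counter

# Assumed list of identifying fields; a real service would define its own
PII_FIELDS = {"name", "email", "location"}


def pseudonymize(pod_id: str, secret: bytes) -> str:
    """Replace a pod identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret, pod_id.encode(), hashlib.sha256).hexdigest()


def strip_pii(record: dict) -> dict:
    """Drop the fields that directly identify a person."""
    return {key: value for key, value in record.items()
            if key not in PII_FIELDS}


def aggregate_likes(records: list, min_pods: int = 2) -> dict:
    """Count likes per item, hiding items liked by fewer than min_pods pods."""
    counts = Counter()
    for record in records:
        counts.update(set(record["liked"]))  # each pod counts once per item
    return {item: n for item, n in counts.items() if n >= min_pods}


# Hypothetical records as they might exist before sharing
records = [
    {"name": "Alice", "email": "alice@example.com",
     "liked": ["video1", "video2"]},
    {"name": "Bob", "liked": ["video2"]},
    {"name": "Carol", "liked": ["video2", "video3"]},
]
clean = [strip_pii(record) for record in records]
totals = aggregate_likes(clean, min_pods=2)
token = pseudonymize("alice-pod", b"local-secret")
print(totals)
```

Items liked by a single pod are dropped from `totals` because a rare item can act as a fingerprint for the one person who liked it.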

In all but the smallest networks of pods, there is a risk that some pods are operated by bad actors who insert incorrect data. This can result in people getting incorrect recommendations. This problem is already present in centralized social networks, where the companies owning the services make varying levels of effort to moderate all submitted content, with varying levels of success. A more effective and scalable moderation strategy may be to leverage the social graph to moderate content. Instead of a single company enforcing its moderation policies and, in some cases, failing to implement them effectively, distributed social networks could empower people to select their own moderators. New, open-source distributed social networks provide a platform to research and experiment with such strategies.
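One way such a strategy could look, as a sketch only: each person chooses their own set of moderators, and their pod hides any item flagged by enough of those moderators. The `flags` structure, the moderator names, and the `min_flags` threshold are hypothetical and not taken from any existing platform.

```python
def visible_items(items: list, chosen_moderators: set, flags: dict,
                  min_flags: int = 1) -> list:
    """Hide items flagged by at least min_flags of the chosen moderators."""
    visible = []
    for item in items:
        # Only flags from moderators this person selected are counted
        flagged_by = flags.get(item, set()) & chosen_moderators
        if len(flagged_by) < min_flags:
            visible.append(item)
    return visible


# Hypothetical flags: item -> moderators who flagged it
flags = {"post2": {"mod_a"}, "post3": {"mod_b", "mod_c"}}
feed = ["post1", "post2", "post3"]

alice_view = visible_items(feed, {"mod_a"}, flags)
bob_view = visible_items(feed, {"mod_b", "mod_c"}, flags, min_flags=2)
print(alice_view, bob_view)
```

Two people looking at the same feed can end up with different views, which is the point: the moderation policy follows the person's own choices instead of a single company's rules.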

The BYODA distributed social networking platform is currently implementing P2P capabilities and invites everyone to host their own pod and contribute to the design and implementation of the platform and services on the platform.
