In my previous post, we went through what a Graph is in the context of math, as well as node-based features used for machine-learning on graphs.
TLDR: a graph is a set of nodes connected by edges, both of with can contain features. Graph ML is concerned with using graph-based representations to infer these features on new graphs, or in some cases to learn structures existing within a graph.
The previous post covered features which assumed that we wanted to perform inference on the nodes or edges of a graph. Some applications however, tend to warrant entire-graph operations; For example, predicting whether a new molecule is toxic or whether a new protein is compatible with particular enzymes.
To this end, there are three common approaches:
- Bag of nodes
- Weisfeiler-Lehman Kernel
This is the simplest method. Summary statistics, such as histograms, node degree, centrality measures from node-level operations can be aggregated and used as a graph-level representation. However, this is entirely based on node-level data, which means that larger, structural features of the graph may be missed.
The idea behind this method is to iteratively aggregate node-level information, which contain data extending past their local ego graph (their immediate neighbourhood). This method can get mathsy quite quickly;
A label is assigned to each node, such as the node-degree. A hash-function then iteratively assigns each node a new label using the multi-set of current labels within the current node's neighbourhood (multi-set, since some neighbours may have the same degree). This is run a fixed number of time (depending on graph-size and how much data we wish to capture). Each node now encodes the structure of its neighbourhood, which can then be summarized for further processing.
This is a typically combinatoric-ally difficult problem, since it analyses different possible subgraph structures existing in a graph (called graphlets). A graphlet kernel would encode the number of times graphlets of a certain type occur in a graph, typically in a column vector. A similar approach looks at paths that occur in a graph, and encodes the number of times particular degree-sequences occur. A slight variation to this approach uses only the shortest-path between nodes which is used to encode this data.