Introduction to Graph Matching

[1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

The graph matching problem (GMP), is meant to find an alignment of nodes between two graphs that minimizes the number of edge disagreements between those two graphs. Therefore, the GMP can be formally written as an optimization problem:

\begin{equation} \begin{aligned} \min & {\;-\text{trace}(APB^T P^T)}\\ \text{s.t. } & {\;P \: \epsilon \: \mathcal{P}} \\ \end{aligned} \end{equation}

Where \(\mathcal{P}\) is the set of possible permutation matrices.

The Quadratic Assignment problem is a combinatorial opimization problem, modeling following the real-life problem:

“Consider the problem of allocating a set of facilities to a set of locations, with the cost being a function of the distance and flow between the facilities, plus costs associated with a facility being placed at a certain location. The objective is to assign each facility to a location such that the total cost is minimized.” [1]

When written as an optimization problem, the QAP is represented as:

\begin{equation} \begin{aligned} \min & {\; \text{trace}(APB^T P^T)}\\ \text{s.t. } & {\;P \: \epsilon \: \mathcal{P}} \\ \end{aligned} \end{equation}

Since the GMP objective function is the negation of the QAP objective function, any algorithm that solves one can solve the other.

This class is an implementation of the Fast Approximate Quadratic Assignment Problem (FAQ), an algorithm designed to efficiently and accurately solve the QAP, as well as GMP.

[1] Optimierung, Diskrete & Er, Rainer & Ela, A & Burkard, Rainer & Dragoti-Cela, Eranda & Pardalos, Panos & Pitsoulis, Leonidas. (1998). The Quadratic Assignment Problem. Handbook of Combinatorial Optimization. 10.1007/978-1-4613-0303-9_27.

[2]:
from graspologic.match import graph_match
from graspologic.simulations import er_np
/home/runner/.cache/pypoetry/virtualenvs/graspologic-pkHfzCJ8-py3.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

For the sake of this tutorial, we will use FAQ to solve the GMP for two graphs where we know a solution exists. Below, we sample a binary graph (undirected and no self-loops) \(G_1 \sim ER_{NP}(50, 0.3)\). Then, we randomly shuffle the nodes of \(G_1\) to initiate \(G_2\). The number of edge disagreements as a result of the node shuffle is printed below.

[3]:
n = 50
p = 0.3

np.random.seed(1)
G1 = er_np(n=n, p=p)
node_shuffle_input = np.random.permutation(n)
G2 = G1[np.ix_(node_shuffle_input, node_shuffle_input)]
print("Number of edge disagreements: ", np.sum(abs(G1-G2)))
Number of edge disagreements:  1012.0

Visualize the adjacency matrices

[4]:
from graspologic.plot import heatmap

heatmap(G1, cbar=False, title = 'G1 [ER-NP(50, 0.3) Simulation]')
_ = heatmap(G2, cbar=False, title = 'G2 [G1 Randomly Shuffled]')
../../_images/tutorials_matching_faq_7_0.png
../../_images/tutorials_matching_faq_7_1.png

Below, we solve the GMP using the graph_match function. The number of edge disagreements after optimization is printed below. With zero edge disagreements, we see that FAQ is successful in unshuffling the graph.

[5]:
_, perm_inds, score, misc = graph_match(G1, G2)
G2 = G2[perm_inds][:, perm_inds] # permute both rows and columns to preserve adjacency
print("Number of edge disagreements: ", np.sum(abs(G1-G2)))
Number of edge disagreements:  0.0