Nomination via SGM¶
This class implements Vertex Nomination via Seeded Graph Matching (VNviaSGM) with the algorithm described in [1].
Given two graphs \(G_1\) and \(G_2\) with associated adjacency matrices \(A\) and \(B\), VNviaSGM proposes a nomination list of potential matches in graph \(G_2\) to a vertex of interest \(voi \in G_1\) with associated probabilities.
Let \(A_L(a)\) be the induced subgraph derived from \(A\), and centered about vertex \(a \in A\) with a maximum distance from \(a\) of \(L\). VNviaSGM first finds \(A_L(voi)\), and if no seeds are in this subgraph, the algorithm stops early and returns a nomination list of None.
Define \(S_A \subset A_L(voi)\) to be the seed vertices in the subgraph centered around the voi, with associated seeds from graph \(B\) (\(S_B\)).
Two subgraphs are then generated around \(S_A\) for graph \(A\), as well as around the associated seeds \(S_B\) for graph \(B\).
Specifically, define \(SG_1 = \underset{s_A \in S_A}{\bigcup} A_L(s_A)\) and \(SG_2 = \underset{s_B \in S_B}{\bigcup} B_L(s_B)\)
These subgraphs (\(SG_1\) and \(SG_2\)) are matched using SGM over several random initializations, resulting in probabilities corresponding to the proportion in which a node in \(B\) is matching to the voi. See Graph Matching Algorithm Reference for more details.
[1] Patsolic, HG, Park, Y, Lyzinski, V, Priebe, CE. Vertex nomination via seeded graph matching. Stat Anal Data Min: The ASA Data Sci Journal. 2020; 13: 229– 244. https://doi.org/10.1002/sam.11454
[1]:
from graspologic.nominate import VNviaSGM
from graspologic.simulations import er_np
from graspologic.plot import heatmap
import numpy as np
import matplotlib.pyplot as plt
np.set_printoptions(suppress=True)
/home/runner/.cache/pypoetry/virtualenvs/graspologic-pkHfzCJ8-py3.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
[2]:
# Define parameters
n = 50
p = 0.3
num_seeds = 4
voi = 5 # choose a vertex of interest
[3]:
np.random.seed(2)
G1 = er_np(n=n, p=p)
node_shuffle_input = np.random.permutation(n)
G2 = G1[np.ix_(node_shuffle_input, node_shuffle_input)]
heatmap(G1, title = "Origional ER Graph (unshuffled)")
heatmap(G2, title = "Shuffled ER graph")
[3]:
<Axes: title={'center': 'Shuffled ER graph'}>
[4]:
kklst= [(xx, yy) for xx, yy in zip(node_shuffle_input, np.arange(len(node_shuffle_input)))]
kklst.sort(key=lambda x:x[0])
print("Association voi in G1 to vertex in G2 =", kklst[voi])
kklst = np.array(kklst)
Association voi in G1 to vertex in G2 = (5, 37)
The algorithm produces an \(n \times 2\) nomination list, where n is the number of nominees. Each row has the following format (vertex \(j \in G_2\), probability that j matches voi). Note: the output is sorted with the largest probability coming first in the output list.
[5]:
VNalg = VNviaSGM()
print(VNalg.fit_predict(G1, G2, voi, [kklst[0:num_seeds, 0], kklst[0:num_seeds, 1]]))
[[37. 1.]
[48. 0.]
[25. 0.]
[ 7. 0.]
[ 8. 0.]
[11. 0.]
[14. 0.]
[19. 0.]
[21. 0.]
[23. 0.]
[26. 0.]
[45. 0.]
[34. 0.]
[35. 0.]
[38. 0.]
[40. 0.]
[41. 0.]
[42. 0.]
[43. 0.]
[24. 0.]]
As seen, the actual correspondence is 5–37 and the model predicts that 5 (in graph \(G_1\)) matches with 37 (in graph \(G_2\)) with >90% confidence.