{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Padded Graph Matching\n", "Consider the scenario where one would like to match graphs $A$ and $B$ with $n_1$ and $n_2$ nodes, respectively, where \n", "$n_1 < n_2$. The most straightforward fashion to 'pad' $A$, such that $A$ and $B$ have the same shape, is to add $n_2 - n_1$ isolated nodes to $A$ (represented as empty row/columns in the adjacency matrix). This padding scheme is known as $\\textit{naive padding}$, substituting $A \\oplus 0_{(n_2-n_1)x(n_2-n_1)}$ and $B$ in place of $A$ and $B$, respectively.\n", "\n", "The effect of this is that one matches $A$ to the best subgraph of $B$. That is, the isolated vertices added to $A$ through padding have an affinity to the low-density subgraphs of $B$, in effect giving the isolates a false signal. \n", "\n", "Instead, we may desire to match $A$ to the best fitting induced subgraph of $B$. This padding scheme is known as $\\textit{adopted padding}$, and is achieved by substituting $\\tilde{A} \\oplus 0_{(n_2-n_1)x(n_2-n_1)}$ and $\\tilde{B}$ in place of $A$ and $B$, respectively, where $\\tilde{A} = 2A - 1_{n_1}1_{n_1}^T$ and $\\tilde{B} = 2B - 1_{n_2}1_{n_2}^T$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To demonstrate the difference between the two padding schemes, we sample two graph's $G_1$ and $G_2'$, each having 400 vertices, from a $0.5 \\sim SBM(4,b,\\Lambda)$, where b assigns 100 vertices to each of the k = 4 blocks, and\n", "\n", "\\begin{align*}\n", "\\Lambda &= \\begin{bmatrix} \n", "0.9 & 0.4 & 0.3 & 0.2\\\\\n", "0.4 & 0.9 & 0.4 & 0.3\\\\\n", "0.3 & 0.4 & 0.9 & 0.4\\\\\n", "0.2 & 0.3 & 0.4 & 0.7\n", "\\end{bmatrix}\\\\\n", "\\end{align*}\n", "\n", "We create $G_2$ from $G_2'$ by removing 25 nodes from each block of $G_2'$, yielding a 300 node graph (example adapted from section 2.5 of [1]).\n", "\n", "The goal of the matching in this case is to recover $G_2$ by matching the right most figure below and $G_1$. That is, we seek to recover the shared community structure common between two graphs of differing shapes.\n", "\n", "[1] \n", "D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski, C. Priebe,\n", " \"Seeded graph matching\", Pattern Recognit. 87 (2019) 203–215" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SBM correlated graph pairs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# simulating G1', G2, deleting 25 vertices\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from graspologic.match import graph_match\n", "from graspologic.simulations import sbm_corr\n", "from graspologic.plot import heatmap\n", "\n", "np.random.seed(1)\n", "rng = np.random.default_rng(1)\n", "\n", "directed = False\n", "loops = False\n", "block_probs = [\n", " [0.9, 0.4, 0.3, 0.2],\n", " [0.4, 0.9, 0.4, 0.3],\n", " [0.3, 0.4, 0.9, 0.4],\n", " [0.2, 0.3, 0.4, 0.7],\n", "]\n", "n = 100\n", "n_blocks = 4\n", "rho = 0.5\n", "block_members = np.array(n_blocks * [n])\n", "n_verts = block_members.sum()\n", "G1, G2_full = sbm_corr(block_members, block_probs, rho, directed, loops)\n", "\n", "keep_indices = np.concatenate(\n", " (np.arange(75), np.arange(100, 175), np.arange(200, 275), np.arange(300, 375))\n", ")\n", "\n", "G2 = G2_full[keep_indices][:, keep_indices]\n", "\n", "G2_deleted = np.full((G1.shape), -1)\n", "G2_deleted[np.ix_(keep_indices, keep_indices)] = G2\n", "\n", "topleft_G2 = np.zeros((400, 400))\n", "topleft_G2[:300, :300] = G2\n", "fig, axs = plt.subplots(1, 4, figsize=(20, 10))\n", "heatmap(G1, ax=axs[0], cbar=False, title=\"G1'\")\n", "heatmap(G2_full, ax=axs[1], cbar=False, title=\"G2 (before deletions)\")\n", "heatmap(G2_deleted, ax=axs[2], cbar=False, title=\"G2 (after deletions)\")\n", "_ = heatmap(topleft_G2, ax=axs[3], cbar=False, title=\"G2 (to top left corner)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Naive vs Adopted Padding" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "seed2 = rng.choice(np.arange(G2.shape[0]), 8)\n", "seed1 = [int(x / 75) * 25 + x for x in seed2]\n", "\n", "partial_match = np.column_stack((seed1, seed2))\n", "\n", "naive_indices1, naive_indices2, _, _ = graph_match(\n", " G1, G2, partial_match=partial_match, padding=\"naive\", rng=rng\n", ")\n", "G2_naive = G2[naive_indices2][:, naive_indices2]\n", "G2_naive_full = np.zeros(G1.shape)\n", "G2_naive_full[np.ix_(naive_indices1, naive_indices1)] = G2_naive\n", "\n", "adopted_indices1, adopted_indices2, _, _ = graph_match(\n", " G1, G2, partial_match=partial_match, padding=\"adopted\", rng=rng\n", ")\n", "G2_adopted = G2[adopted_indices2][:, adopted_indices2]\n", "G2_adopted_full = np.zeros(G1.shape)\n", "G2_adopted_full[np.ix_(adopted_indices1, adopted_indices1)] = G2_adopted\n", "\n", "fig, axs = plt.subplots(1, 2, figsize=(14, 7))\n", "heatmap(G2_naive_full, ax=axs[0], cbar=False, title=\"Naive Padding\")\n", "heatmap(G2_adopted_full, ax=axs[1], cbar=False, title=\"Adopted Padding\")\n", "\n", "\n", "def compute_match_ratio(indices1, indices2):\n", " match_ratio = 0\n", " for i in range(len(indices2)):\n", " if (indices1[i] == keep_indices[i]) and (indices2[i] == i):\n", " match_ratio += 1\n", " return match_ratio / len(indices2)\n", "\n", "print(f\"Matching accuracy with naive padding: {compute_match_ratio(naive_indices1, naive_indices2):.2f}\")\n", "print(f\"Matching accuracy with adopted padding: {compute_match_ratio(adopted_indices1, adopted_indices2):.2f}\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe that the two padding schemes perform as expected. The naive scheme permutes $G_2$ such that it matches a subgraph of $G_1$, specifically the subgraph of the first three blocks. Additionally, (almost) all isolated vertices of $G_2$ are permuted to the fourth block of $G_1$.\n", "\n", "On the other hand, we see that adopted padding preserves the common block structure between $G_1$ and $G_2$." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.7 ('.venv': poetry)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "vscode": { "interpreter": { "hash": "bc13b1df6b0248aaf34f3e7f5790740f6f355623370613abc0a11b70d06c20f2" } } }, "nbformat": 4, "nbformat_minor": 4 }