{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Padded Graph Matching\n",
    "Consider the scenario where one would like to match graphs $A$ and $B$ with $n_1$ and $n_2$ nodes, respectively, where \n",
    "$n_1 < n_2$. The most straightforward fashion to 'pad' $A$, such that $A$ and $B$ have the same shape, is to add $n_2 - n_1$ isolated nodes to $A$ (represented as empty row/columns in the adjacency matrix). This padding scheme is known as $\\textit{naive padding}$, substituting $A \\oplus 0_{(n_2-n_1)x(n_2-n_1)}$ and $B$ in place of $A$ and $B$, respectively.\n",
    "\n",
    "The effect of this is that one matches $A$ to the best subgraph of $B$. That is, the isolated vertices added to $A$ through padding have an affinity to the low-density subgraphs of $B$, in effect giving the isolates a false signal.  \n",
    "\n",
    "Instead, we may desire to match $A$ to the best fitting induced subgraph of $B$. This padding scheme is known as $\\textit{adopted padding}$, and is achieved by substituting $\\tilde{A} \\oplus 0_{(n_2-n_1)x(n_2-n_1)}$ and $\\tilde{B}$ in place of $A$ and $B$, respectively, where $\\tilde{A} = 2A - 1_{n_1}1_{n_1}^T$ and $\\tilde{B} = 2B - 1_{n_2}1_{n_2}^T$.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To demonstrate the difference between the two padding schemes, we sample two graph's $G_1$ and $G_2'$, each having 400 vertices, from a $0.5 \\sim SBM(4,b,\\Lambda)$, where b assigns 100 vertices to each of the k = 4 blocks, and\n",
    "\n",
    "\\begin{align*}\n",
    "\\Lambda &= \\begin{bmatrix} \n",
    "0.9 & 0.4 & 0.3 & 0.2\\\\\n",
    "0.4 & 0.9 & 0.4 & 0.3\\\\\n",
    "0.3 & 0.4 & 0.9 & 0.4\\\\\n",
    "0.2 & 0.3 & 0.4 & 0.7\n",
    "\\end{bmatrix}\\\\\n",
    "\\end{align*}\n",
    "\n",
    "We create $G_2$ from $G_2'$ by removing 25 nodes from each block of $G_2'$, yielding a 300 node graph (example adapted from section 2.5 of [1]).\n",
    "\n",
    "The goal of the matching in this case is to recover $G_2$ by matching the right most figure below and $G_1$. That is, we seek to recover the shared community structure common between two graphs of differing shapes.\n",
    "\n",
    "<a id=\"1\">[1]</a> \n",
    "D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski, C. Priebe,\n",
    "        \"Seeded graph matching\", Pattern Recognit. 87 (2019) 203–215"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SBM correlated graph pairs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# simulating G1', G2, deleting 25 vertices\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from graspologic.match import graph_match\n",
    "from graspologic.simulations import sbm_corr\n",
    "from graspologic.plot import heatmap\n",
    "\n",
    "np.random.seed(1)\n",
    "rng = np.random.default_rng(1)\n",
    "\n",
    "directed = False\n",
    "loops = False\n",
    "block_probs = [\n",
    "    [0.9, 0.4, 0.3, 0.2],\n",
    "    [0.4, 0.9, 0.4, 0.3],\n",
    "    [0.3, 0.4, 0.9, 0.4],\n",
    "    [0.2, 0.3, 0.4, 0.7],\n",
    "]\n",
    "n = 100\n",
    "n_blocks = 4\n",
    "rho = 0.5\n",
    "block_members = np.array(n_blocks * [n])\n",
    "n_verts = block_members.sum()\n",
    "G1, G2_full = sbm_corr(block_members, block_probs, rho, directed, loops)\n",
    "\n",
    "keep_indices = np.concatenate(\n",
    "    (np.arange(75), np.arange(100, 175), np.arange(200, 275), np.arange(300, 375))\n",
    ")\n",
    "\n",
    "G2 = G2_full[keep_indices][:, keep_indices]\n",
    "\n",
    "G2_deleted = np.full((G1.shape), -1)\n",
    "G2_deleted[np.ix_(keep_indices, keep_indices)] = G2\n",
    "\n",
    "topleft_G2 = np.zeros((400, 400))\n",
    "topleft_G2[:300, :300] = G2\n",
    "fig, axs = plt.subplots(1, 4, figsize=(20, 10))\n",
    "heatmap(G1, ax=axs[0], cbar=False, title=\"G1'\")\n",
    "heatmap(G2_full, ax=axs[1], cbar=False, title=\"G2 (before deletions)\")\n",
    "heatmap(G2_deleted, ax=axs[2], cbar=False, title=\"G2 (after deletions)\")\n",
    "_ = heatmap(topleft_G2, ax=axs[3], cbar=False, title=\"G2 (to top left corner)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Naive vs Adopted Padding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "seed2 = rng.choice(np.arange(G2.shape[0]), 8)\n",
    "seed1 = [int(x / 75) * 25 + x for x in seed2]\n",
    "\n",
    "partial_match = np.column_stack((seed1, seed2))\n",
    "\n",
    "naive_indices1, naive_indices2, _, _ = graph_match(\n",
    "    G1, G2, partial_match=partial_match, padding=\"naive\", rng=rng\n",
    ")\n",
    "G2_naive = G2[naive_indices2][:, naive_indices2]\n",
    "G2_naive_full = np.zeros(G1.shape)\n",
    "G2_naive_full[np.ix_(naive_indices1, naive_indices1)] = G2_naive\n",
    "\n",
    "adopted_indices1, adopted_indices2, _, _ = graph_match(\n",
    "    G1, G2, partial_match=partial_match, padding=\"adopted\", rng=rng\n",
    ")\n",
    "G2_adopted = G2[adopted_indices2][:, adopted_indices2]\n",
    "G2_adopted_full = np.zeros(G1.shape)\n",
    "G2_adopted_full[np.ix_(adopted_indices1, adopted_indices1)] = G2_adopted\n",
    "\n",
    "fig, axs = plt.subplots(1, 2, figsize=(14, 7))\n",
    "heatmap(G2_naive_full, ax=axs[0], cbar=False, title=\"Naive Padding\")\n",
    "heatmap(G2_adopted_full, ax=axs[1], cbar=False, title=\"Adopted Padding\")\n",
    "\n",
    "\n",
    "def compute_match_ratio(indices1, indices2):\n",
    "    match_ratio = 0\n",
    "    for i in range(len(indices2)):\n",
    "        if (indices1[i] == keep_indices[i]) and (indices2[i] == i):\n",
    "            match_ratio += 1\n",
    "    return match_ratio / len(indices2)\n",
    "\n",
    "print(f\"Matching accuracy with naive padding: {compute_match_ratio(naive_indices1, naive_indices2):.2f}\")\n",
    "print(f\"Matching accuracy with adopted padding: {compute_match_ratio(adopted_indices1, adopted_indices2):.2f}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We observe that the two padding schemes perform as expected. The naive scheme permutes $G_2$ such that it matches a subgraph of $G_1$, specifically the subgraph of the first three blocks. Additionally, (almost) all isolated vertices of $G_2$ are permuted to the fourth block of $G_1$.\n",
    "\n",
    "On the other hand, we see that adopted padding preserves the common block structure between $G_1$ and $G_2$."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.7 ('.venv': poetry)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  },
  "vscode": {
   "interpreter": {
    "hash": "bc13b1df6b0248aaf34f3e7f5790740f6f355623370613abc0a11b70d06c20f2"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}