{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3-final" }, "orig_nbformat": 2, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# Mixed Membership Stochastic Blockmodel (MMSBM)" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import graspologic\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "%matplotlib inline" ] }, { "source": [ "Unlike [Stochastic Block Models (SBM)](./sbm.ipynb), Mixed Membership Stochastic Blockmodels (MMSBM) allow nodes to pertain to multiple communities when interacting with other nodes.\n", "\n", "Given a network with $k$ communities, an MMSBM is parametrized by the number of nodes $n$ of the graph, a block connectivity matrix $B \\in \\mathbb{R}^{k x k}$ where each element specifies the probability of interaction between nodes based upon their community membership, and a vector $\\vec{\\alpha}$ which controls the mixed-membership allowed for each node.\n", "\n", "Below, we sample a two-block MMSBM with following parameters:\n", "\n", "\\begin{align*}\n", "n &= 100\\\\\n", "P &= \\begin{bmatrix} \n", "0.7 & 0.2\\\\\n", "0.2 & 0.05\n", "\\end{bmatrix}\\\\\n", "\\vec{\\alpha} &= \\begin{bmatrix} 0.1 & 0.1 \\end{bmatrix}\\\\\n", "\\end{align*}" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from graspologic.simulations import mmsbm\n", "\n", "rng = np.random.default_rng(123)\n", "\n", "n = 100\n", "p = [[0.7, 0.2],\n", " [0.2, 0.05]]\n", "alpha = [0.1]*2\n", "\n", "G = mmsbm(n, p, alpha, rng = rng)" ] }, { "source": [ "## Visualize the graph using heatmap" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from graspologic.plot import heatmap\n", "\n", "heatmap(G, cbar= False, title ='MMSBM Simulation');" ] }, { "source": [ "## Generating various types of graphs" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Each node $p$ is assigned a membership vector $\\vec{\\pi}_p \\in \\mathbb{R}^{k}$ sampled from a [Dirichlet distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution):\n", "\\begin{align*}\n", "\\vec{\\pi}_p \\sim \\mbox{Dir}(\\vec{\\alpha})\\\\\n", "\\end{align*}\n", "\n", "This vector indicates the probability of a node to interact with other nodes as if belonging to a certain community. For example, in a four communities graph, let:\n", "\n", "\\begin{align*}\n", "\\vec{\\pi}_p &= \\begin{bmatrix} 1 & 0 & 0 & 0 \\end{bmatrix}\\\\\n", "\\vec{\\pi}_q &= \\begin{bmatrix} 0.25 & 0.25 & 0.25 & 0.25 \\end{bmatrix}\\\\\n", "\\end{align*}\n", "\n", "According to $\\vec{\\pi}_p$, node $p$ will always behave as if pertaining to the first community when interacting with other nodes, while node $q$ will interact 25% of the time as if belonging to community 1, 25% of the time as if belonging to community 2, etc.\n", "\n", "Therefore, controlling these membership vectors by varying the parameter $\\vec{\\alpha}$ allows generation of various types of graphs.\n" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "### Generating Erdos-Renyi (ER) graphs" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "To generate [Erdos-Renyi (ER)](./erdos_renyi.ipynb), one of the $\\vec{\\alpha}$ entries can be set to be much greater than the others.\n", "\n", "For example, consider the two graphs (undirected, no self-loops) parameterized by:\n", "\n", "\\begin{align*}\n", "n &= 100\\\\\n", "P &= \\begin{bmatrix} \n", "0.8 & 0.2\\\\\n", "0.2 & 0.5\n", "\\end{bmatrix}\\\\\n", "\\end{align*}\n", "\n", "With $\\vec{\\alpha}$ respectively equal to:\n", "\n", "\\begin{align*}\n", "\\vec{\\alpha_1} &= \\begin{bmatrix} 100 & 1 \\end{bmatrix}\\\\\n", "\\vec{\\alpha_2} &= \\begin{bmatrix} 1 & 100 \\end{bmatrix}\\\\\n", "\\end{align*}\n" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from graspologic.simulations import er_np\n", "\n", "n = 100\n", "p = [[0.8, 0.2],\n", " [0.2, 0.5]]\n", "\n", "alpha_1 = [100, 1]\n", "G_1 = mmsbm(n, p, alpha_1, rng = rng)\n", "\n", "alpha_2 = [1, 100]\n", "G_2 = mmsbm(n, p, alpha_2, rng = rng)" ] }, { "source": [ "### Visualize the graphs using heatmap" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "heatmap(G_1, cbar= False, title ='MMSBM Simulation(alpha = [100, 1])');\n", "heatmap(G_2, cbar= False, title ='MMSBM Simulation(alpha = [1, 100])');" ] }, { "source": [ "### Differences with Stochastic Block Model (SBM) graphs" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "According to the [Stochastic Block Models (SBM)](./sbm.ipynb), each node in a graph can only pertain to one community. Therefore, one way to recreate such model with an MMSBM is to set all the entries of $\\vec{\\alpha}$ equal to the same value. The closer the value is to $0$ the more the simulated graphs will appear similar to an SBM, the closer the value is to $1$ the more uniform the graph will look, thus losing the community structure dictated by the SBM.\n", "\n", "To show this we sample a four-block SBM (undirected, no self-loops) graph with following parameters:\n", "\\begin{align*}\n", "n &= \\begin{bmatrix} 50 & 50 & 50 & 50 \\end{bmatrix}\\\\\n", "B &=\\begin{bmatrix} 0.8 & 0.2 & 0.2 & 0.2 \\\\ 0.2 & 0.8 & 0.2 & 0.2 \\\\ 0.2 & 0.2 & 0.8 & 0.2 \\\\ 0.2 & 0.2 & 0.2 & 0.8 \\end{bmatrix}\n", "\\end{align*}\n", "\n", "We also sample a four-block MMSBM with parameter $\\vec{\\alpha}$ such that:\n", "\\begin{align*}\n", "\\vec{\\alpha} &= \\begin{bmatrix} 0.01 & 0.01 & 0.01 & 0.01 \\end{bmatrix}\\\\\n", "\\end{align*}\n" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from graspologic.simulations import sbm\n", "\n", "n = [50] * 4\n", "p = [[0.8,0.2,0.2,0.2],\n", " [ 0.2,0.8,0.2,0.2],\n", " [ 0.2,0.2,0.8,0.2],\n", " [ 0.2,0.2,0.2,0.8]]\n", "\n", "G_sbm = sbm(n, p, directed=False, loops=False)\n", "\n", "n = 200\n", "k = 4\n", "alpha = [0.01]*k\n", "\n", "G_mmsbm = mmsbm(n, p, alpha= alpha, rng = rng)" ] }, { "source": [ "### Plot SBM and MMSBM graphs using heatmap" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "heatmap(G_sbm, cbar= False, title ='SBM Simulation');\n", "heatmap(G_mmsbm, cbar= False, title ='MMSBM Simulation(alpha = 0.01)');" ] }, { "source": [ "As expected, the two graphs appear to be generated from four-block SBMs with connectivity matrix $B$. In the MMSBM case, the number of nodes assigned to each block is not specified which is why we can see a different number of nodes per block.\n", "\n", "Now let's try increasing the values of each entry of the concentration parameter $\\vec{\\alpha}$ first to $0.25$, then $0.5$ and finally to $1$." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alpha = [0.25]*k\n", "print(\"First increase alpha to: \" + str(alpha))\n", "\n", "G_mmsbm_25 = mmsbm(n, p, alpha= alpha, rng = rng)\n", "heatmap(G_mmsbm_25, cbar= False, title ='MMSBM Simulation(alpha = 0.25)');\n", "\n", "alpha = [0.5]*k\n", "print(\"Then increase alpha to: \" + str(alpha))\n", "\n", "G_mmsbm_50 = mmsbm(n, p, alpha= alpha, rng = rng)\n", "heatmap(G_mmsbm_50, cbar= False, title ='MMSBM Simulation(alpha = 0.5)');\n", "\n", "alpha = [1.0]*k\n", "print(\"Finally increase alpha to: \" + str(alpha))\n", "\n", "G_mmsbm_100 = mmsbm(n, p, alpha= alpha, rng = rng)\n", "heatmap(G_mmsbm_100, cbar= False, title ='MMSBM Simulation(alpha = 1.0)');" ] }, { "source": [ "As expected, as we increase the value of $\\vec{\\alpha}$ we increase the mixed-membership allowed at each node so that the community structure is progressively lost." ], "cell_type": "markdown", "metadata": {} } ] }