Correlated Random Dot Product Graph (RDPG) Graph Pair

[1]:
from graspologic.simulations.rdpg_corr import rdpg_corr
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_context('talk')
/home/runner/.cache/pypoetry/virtualenvs/graspologic-pkHfzCJ8-py3.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

RDPG is a latent position generative model. An explanation of the uncorrelated model is in the tutorial.

Here, we want to generate a pair of graphs with the same latent positions but with correlation between edges.

There are several parameters in this function: \(X\) and \(Y\) are the input matrices (latent positions) which are used to generate the probability matrix; \(r\) is the correlation between the graph pair, which should be (-1,1) (note that not all values of r may be possible for a given set of latent positions).

Below, we sample a RDPG graph pair (undirected and no self-loops), G1 and G2, with the following parameters: \begin{align*} n &= [50, 50]\\ r &= 0.5 \end{align*}

[2]:
np.random.seed(1234)
X = np.array([[0.5, 0.2, 0.2]] * 50 + [[0.1, 0.1, 0.1]] * 50)
Y = None
r = 0.3

G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)
[3]:
X @ X.T
[3]:
array([[0.33, 0.33, 0.33, ..., 0.09, 0.09, 0.09],
       [0.33, 0.33, 0.33, ..., 0.09, 0.09, 0.09],
       [0.33, 0.33, 0.33, ..., 0.09, 0.09, 0.09],
       ...,
       [0.09, 0.09, 0.09, ..., 0.03, 0.03, 0.03],
       [0.09, 0.09, 0.09, ..., 0.03, 0.03, 0.03],
       [0.09, 0.09, 0.09, ..., 0.03, 0.03, 0.03]])

Visualize the graphs using heatmap

Here, we define difference rate to be the number of edges between the two graphs which are not the same (exist or not exist) out of all potential edges (roughly \(n^2\))

[4]:
from graspologic.plot import heatmap

fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(G1, ax=axs[0], cbar=False, title = 'Corr. RDPG 1')
heatmap(G2, ax=axs[1], cbar=False, title = 'Corr. RDPG 2')
heatmap(G1-G2, ax=axs[2], cbar=False, title='diff(G1-G2)')
ndim=G1.shape[0]
print("Difference rate is ", np.sum(abs(G1-G2))/(ndim*ndim))
Difference rate is  0.1438
../../_images/tutorials_simulations_rdpg_corr_6_1.png

Compare it to the correlated SBM graph pair

Below, we sample a two-block SBM graph pair (undirected and no self-loops) G1 and G2 with the following parameters:

\begin{align*} n &= [50, 50]\\ p &= \begin{bmatrix} 0.33 & 0.09\\ 0.09 & 0.03 \end{bmatrix}\\ r &= 0.5 \end{align*}

This happens to be the SBM formulation of the same model framed as an RDPG above. Let’s see the difference between the correlated RDPG and correlated SBM graph pairs.

[5]:
from graspologic.simulations import sbm_corr

np.random.seed(123)
directed = False
loops = False
n_per_block = 50
n_blocks = 2
block_members = np.array(n_blocks * [n_per_block])
n_verts = block_members.sum()
rho = .3
block_probs = np.array([[0.33, 0.09], [0.09, 0.03]])

A1, A2 = sbm_corr(block_members, block_probs, rho, directed=directed, loops=loops)
fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(A1, ax=axs[0], cbar=False, title="Corr. SBM 1")
heatmap(A2, ax=axs[1], cbar=False, title="Corr. SBM 2")
heatmap(A1 - A2, ax=axs[2], cbar=False, title="Diff (G1 - G2)")

ndim=G1.shape[0]
print("Difference rate with sbm_corr function is ", np.sum(abs(A1-A2))/(ndim*ndim))
Difference rate with sbm_corr function is  0.145
../../_images/tutorials_simulations_rdpg_corr_9_1.png

We can see the difference between G1 and G2 with both functions are similar.

Varying the correlation

We change the correlation between the graph pairs from -0.5 to 0.9 and see the difference between graph 1 and graph 2:

[6]:
X = np.random.dirichlet([10, 10], size=100)
Y = None

np.random.seed(12345)
r = -0.5

G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)


fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(G1, ax=axs[0], cbar=False, title="Corr. RDPG 1")
heatmap(G2, ax=axs[1], cbar=False, title="Corr. RDPG 2")
heatmap(G1 - G2, ax=axs[2], cbar=False, title="Diff (G1 - G2)")
ndim=G1.shape[0]
print("Difference rate when correlation = -0.5 is ", np.sum(abs(G1-G2))/(ndim*ndim))
Difference rate when correlation = -0.5 is  0.7332
../../_images/tutorials_simulations_rdpg_corr_13_1.png
[7]:
np.random.seed(12345)
r = 0.3

G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)


fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(G1, ax=axs[0], cbar=False, title="Corr. RDPG 1")
heatmap(G2, ax=axs[1], cbar=False, title="Corr. RDPG 2")
heatmap(G1 - G2, ax=axs[2], cbar=False, title="Diff (G1 - G2)")
ndim=G1.shape[0]
print("Difference rate when correlation =0.3 is ", np.sum(abs(G1-G2))/(ndim*ndim))
Difference rate when correlation =0.3 is  0.3544
../../_images/tutorials_simulations_rdpg_corr_14_1.png
[8]:
np.random.seed(12345)
r = 0.9

G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)


fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(G1, ax=axs[0], cbar=False, title="Corr. RDPG 1")
heatmap(G2, ax=axs[1], cbar=False, title="Corr. RDPG 2")
heatmap(G1 - G2, ax=axs[2], cbar=False, title="Diff (G1 - G2)")
ndim=G1.shape[0]
print("Difference rate when correlation =0.9 is ", np.sum(abs(G1-G2))/(ndim*ndim))
Difference rate when correlation =0.9 is  0.0508
../../_images/tutorials_simulations_rdpg_corr_15_1.png

We calculate the difference rate between graph 1 and graph 2 with different correlation ranging from -0.5 to 0.9, and show them in a scatter plot:

[9]:
np.random.seed(12345)
X = np.random.dirichlet([10, 10], size=100)
Y = None
rlist=[]
for i in range(-5,10):
    g1,g2 = rdpg_corr(X, Y, i/10, rescale=False, directed=False, loops=False)
    ndim=g1.shape[0]
    rate=np.sum(abs(g1-g2))/(ndim*ndim)
    rlist.append(rate)
[10]:
x_list = np.linspace(-0.5,0.9,15)
plt.plot(x_list,rlist,'o-')
plt.xlabel("Correlation")
_ = plt.ylabel('Difference rate')
../../_images/tutorials_simulations_rdpg_corr_18_0.png

We can see that the difference rate goes down as the correlation grows.