Tutorial on Replica Trick

Author

Clark Miyamoto

Published

February 28, 2025

In physics, we want to compute the quenched free energy \(F\). Equivalently, in Bayesian machine learning / probabilistic graphical models, we want to compute the evidence \(p(\mathcal D)\). \[\begin{align} - \beta F = \mathbb E[\log Z] \end{align}\] However, in practice it’s extremely difficult to calculate this quantity. Here I’ll present Giorgio Parisi’s method (for which he won the Nobel Prize) to compute such quantities: the replica trick.

“Quenched free energy” refers to an average over the parameters \(\theta\) in your Hamiltonian \(H_\theta\) (assuming \(\theta\) is sampled from some distribution). Note this is distinct from the thermodynamic average, which is computed by taking derivatives of the partition function.

The trick can be summarized as applying the formula \[\begin{align} \mathbb E[\log Z] = \lim_{n \to 0} \frac{1}{n} \log \mathbb E[Z^n] \end{align}\] where one interprets \(Z^n\) as \(n\) statistically independent replicas of the system. From there, you non-rigorously analytically continue \(n\) down to zero.
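Before diving in, here is a numerical sanity check of the identity above on a toy random variable (my own illustrative choice, not from the model studied below): take \(Z = e^G\) with \(G \sim \mathcal N(\mu, \sigma^2)\), so \(\mathbb E[\log Z] = \mu\) exactly, while \(\frac1n \log \mathbb E[Z^n] = \mu + n\sigma^2/2 \to \mu\) as \(n \to 0\).

```python
import numpy as np

# Sanity check of E[log Z] = lim_{n->0} (1/n) log E[Z^n] on a lognormal toy:
# Z = exp(G), G ~ N(mu, sigma^2), so E[log Z] = mu exactly.
rng = np.random.default_rng(0)
mu, sigma = 0.5, 0.3
Z = np.exp(rng.normal(mu, sigma, size=1_000_000))

quenched = np.mean(np.log(Z))            # E[log Z]
n = 1e-2                                 # "small" replica number
annealed = np.log(np.mean(Z**n)) / n     # (1/n) log E[Z^n]

print(quenched, annealed)                # both close to mu = 0.5
```

The residual gap between the two estimates is \(\mathcal O(n)\) plus Monte Carlo noise; shrinking \(n\) further closes it.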

Random Energy Model

This section follows Information, Physics, and Computation by Mézard and Montanari, Chapter 8

Here we do a tutorial calculation on the random energy model (REM), whose partition function is \[\begin{align} Z = \sum_{i=1}^{2^N} \exp(- \beta E_{i}) \end{align}\]

where \(E_i \sim \mathcal N(0, N/2)\) iid. Our replicated partition function becomes \[\begin{align} Z^n & = \sum_{i_1, ..., i_n = 1}^{2^N} \exp(- \beta E_{i_1} - ... - \beta E_{i_n}) \end{align}\] The configuration of the replicated system is given by a tuple \((i_1, ..., i_n)\) where \(i_\alpha \in \{1, ..., 2^N\}\), so there are a total of \(2^{Nn}\) possible configurations. A convenient way of re-expressing this is to insert the identity in the form \(\sum_{j=1}^{2^N} \mathbb I(i_a = j) = 1\), which lets us write each energy as \(E_{i_a} = \sum_{j=1}^{2^N} E_j \, \mathbb I(i_a = j)\) \[\begin{align} Z^n & = \sum_{i_1, ..., i_n = 1}^{2^N} \exp \left (- \sum_{j=1}^{2^N} \beta E_j \sum_{a=1}^n \mathbb I(i_a = j)\right) \\ & = \sum_{i_1, ..., i_n = 1}^{2^N} \prod_{j=1}^{2^N} \exp \left (- \beta E_j \sum_{a=1}^n \mathbb I(i_a = j)\right) \end{align}\] where \(\mathbb I\) is the indicator function. We can now take the expectation over the \(E_j\): because they are iid, it factorizes over \(j\), and each factor is a Gaussian moment generating function, \(\mathbb E[e^{t E_j}] = e^{t^2 N/4}\) for \(E_j \sim \mathcal N(0, N/2)\). \[\begin{align} \mathbb E[Z^n] = \sum_{i_1, ..., i_n = 1}^{2^N} \prod_{j=1}^{2^N} \exp \left (\frac{\beta^2N}{4} \left [\sum_{a=1}^n \mathbb I(i_a = j) \right]^2\right) \end{align}\] Note that \(\sum_{j=1}^{2^N}\left [\sum_{a=1}^n \mathbb I(i_a = j) \right]^2 = \sum_{j=1}^{2^N} \sum_{a,b=1}^n \mathbb I(i_a = j) \mathbb I(i_b = j) = \sum_{a,b=1}^n \mathbb I(i_a = i_b)\). \[\begin{align} \mathbb E[Z^n] = \sum_{i_1, ..., i_n = 1}^{2^N} \exp \left (\frac{\beta^2N}{4} \sum_{a,b=1}^n \mathbb I(i_a = i_b) \right) \end{align}\] If we interpret this directly, the energy of the entire replicated system now looks like \(E_{i_1, ..., i_n} = \frac{-\beta N}{4} \sum_{a,b=1}^n \mathbb I(i_a = i_b)\). So after quenching, the replicas “interact” (precisely: the energy no longer factorizes over replicas, \(E_{i_1, ..., i_n} \neq \sum_a E_{i_a}\)).
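The averaged expression can be checked against a brute-force Monte Carlo estimate of \(\mathbb E[Z^n]\) for a tiny REM instance. This is a minimal sketch (the specific values of \(N\), \(n\), \(\beta\), and the sample count are my own choices for illustration):

```python
import numpy as np
from itertools import product

# Check E[Z^n] = sum over configs of exp((beta^2 N / 4) sum_{a,b} I(i_a = i_b))
# against a Monte Carlo average of Z^n over the disorder, for a tiny REM.
N, n, beta = 2, 2, 0.4
M = 2**N                       # number of energy levels
rng = np.random.default_rng(1)

# Monte Carlo: draw E_i ~ N(0, N/2) iid, average Z^n over many disorder draws.
E = rng.normal(0.0, np.sqrt(N / 2), size=(400_000, M))
Z = np.exp(-beta * E).sum(axis=1)
mc = np.mean(Z**n)

# Exact expression obtained from the replica computation above.
exact = sum(
    np.exp((beta**2 * N / 4) * sum(ia == ib for ia in idx for ib in idx))
    for idx in product(range(M), repeat=n)
)
print(mc, exact)
```

The two numbers agree to within Monte Carlo noise; the agreement degrades for large \(\beta\) or \(n\), where \(Z^n\) becomes heavy-tailed.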

Now we introduce the overlap matrix \(Q_{ab} \equiv \mathbb I(i_a = i_b)\), whose entries lie in \(\{0, 1\}\) (because they’re given by an indicator function). Strictly speaking, \(Q\) is a function of the configuration: each tuple \((i_1, ..., i_n)\) generates its own \(n \times n\) matrix. \[\begin{align} \mathbb E[Z^n] & = \sum_{i_1, ..., i_n = 1}^{2^N} \exp \left (\frac{\beta^2N}{4} \sum_{a,b=1}^n Q_{ab}(i_1, ..., i_n) \right)\\ & = \sum_{Q} \mathcal N_N(Q) \exp\left (\frac{\beta^2N}{4} \sum_{a,b=1}^n Q_{ab} \right) \end{align}\]

  • \(\mathcal N_N(Q)\) is the number of configurations \((i_1, ..., i_n)\) that generate a given \(Q\) (the tuples \((i_1, ..., i_n)\) are the microstates, and \(Q\) is the macrostate generated by said microstates).
  • \(\sum_Q\) runs over all symmetric matrices with ones on the diagonal and off-diagonal entries in \(\{0, 1\}\). There are \(n (n-1)/2\) degrees of freedom, hence \(2^{n (n-1)/2}\) such matrices.
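The counting in the second bullet is easy to confirm by enumeration for small \(n\); a minimal sketch:

```python
import numpy as np
from itertools import product

# Enumerate all symmetric {0,1} matrices with ones on the diagonal for small n
# and confirm there are 2^(n(n-1)/2) of them.
def overlap_matrices(n):
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    for bits in product([0, 1], repeat=len(pairs)):
        Q = np.eye(n, dtype=int)
        for (a, b), v in zip(pairs, bits):
            Q[a, b] = Q[b, a] = v
        yield Q

for n in range(2, 6):
    count = sum(1 for _ in overlap_matrices(n))
    print(n, count)              # 2, 8, 64, 1024
    assert count == 2 ** (n * (n - 1) // 2)
```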

Because the number of configurations is exponential (in \(N\)), we can rewrite \(\mathcal N_N\) in large-deviation form. This is a fancy way of saying we insert the identity in the form \(\exp \log (\cdot) = 1\). \[\begin{align} \mathcal N_N(Q) = \exp(\log \mathcal N_N (Q)) \doteq \exp(N s(Q)) \end{align}\] where \(s(Q) = \frac{1}{N} \log \mathcal N_N(Q)\) is the entropy density (we’ll always apply Stirling’s approximation to this thing).

An abridged expression for our quenched partition function is \(\mathbb E[Z^n] = \sum_{Q} \exp(N g(Q))\), where \(g(Q) \equiv s(Q) + \frac{\beta^2}{4}\sum_{a,b=1}^n Q_{ab}\). Now, our method of attack is to estimate the sum using a saddle point. We must proceed carefully, because sometimes the saddle point exhibits replica symmetry breaking!

Replica Symmetry Solution

Definition (Replica Symmetry (Breaking)): At the beginning of the calculation, we assumed replicas are statistically independent, so we can permute replicas at no cost. Stated formally, let \(\pi\) be a permutation of \(\{1, ..., n\}\) and define \((Q^\pi)_{ab} \equiv Q_{\pi(a)\pi(b)}\); by construction \[\begin{align} g(Q^\pi) = g(Q) \end{align}\] When the dominant saddle point \(Q^*\) is itself invariant under permutations, \(Q^{*\pi} = Q^*\), the system has replica symmetry; when it is not, replica symmetry is broken.

In the case where we have replica symmetry, this fixes the form of \(Q\), as its entries should be invariant under permutation: \(Q_{aa} = \alpha\), and \(Q_{ab} = \beta\) for all combinations \(a\neq b\) (here \(\alpha, \beta\) are matrix entries, not the inverse temperature). \[\begin{align} Q = \begin{pmatrix} \alpha & \beta & \beta & \\ \beta & \alpha & \beta & ...\\ \beta & \beta & \alpha & \\ & \vdots & & \ddots \end{pmatrix} = (\alpha - \beta) \mathbb I + \beta ~ xx^T, && \text{where } x = (1,1,...) \end{align}\] Hence it should look like an identity plus a rank-1 correction.
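The rank-1 structure pins down the spectrum of any RS matrix: one eigenvalue \(\alpha + (n-1)\beta\) along \(x\), and an \((n-1)\)-fold degenerate eigenvalue \(\alpha - \beta\). A quick numerical check (toy values; `a`, `b` stand for the matrix entries \(\alpha, \beta\) above, renamed to avoid clashing with the inverse temperature):

```python
import numpy as np

# Spectrum of the replica-symmetric matrix Q = (a - b) I + b x x^T.
# Rank-1 structure => one eigenvalue a + (n-1) b (eigenvector x)
# and (n-1) degenerate eigenvalues a - b (the subspace orthogonal to x).
n, a, b = 5, 1.0, 0.3
x = np.ones(n)
Q = (a - b) * np.eye(n) + b * np.outer(x, x)

eig = np.sort(np.linalg.eigvalsh(Q))
print(eig)
```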

However, by Murphy’s law (which I learned from Interstellar (2014)), if something can go wrong, it will go wrong. And in physics, it goes extra crazy. Replica symmetry can be spontaneously broken in the \(N \to \infty\) limit (analogous to the \(\mathbb Z_2\) symmetry breaking exhibited by the Curie-Weiss model). We’ll see this in action in the next section…

Going back to our problem, let’s analyze the entries of \(Q\) assuming replica symmetry. We already know the diagonal \(Q_{aa} = 1\); what about the off-diagonal? Recall entries can only take two values, \(Q_{ab} \in \{0, 1\}\):

  • For \(Q_{ab} = 0\), we’ll denote these solutions as \(Q_{RS,0}\). Then \[\begin{align} \mathcal N_N(Q_{RS,0}) & = \prod_{\alpha=0}^{n-1} (2^N - \alpha) \\ s(Q_{RS,0}) & = \frac{1}{N} \sum_{\alpha =0}^{n-1} \log(2^N - \alpha) \approx \frac{1}{N} \sum_{\alpha = 0}^{n-1} \log(2^N) = n \log 2\\ g_0(\beta, n) \equiv g(Q_{RS,0}) & = n (\frac{\beta^2}{4} + \log 2) \end{align}\]
  • For \(Q_{ab} = 1\), we’ll denote these solutions as \(Q_{RS,1}\). Then \[\begin{align} \mathcal N_N(Q_{RS,1}) & = 2^N\\ s(Q_{RS,1}) & = \log 2\\ g_1(\beta, n) \equiv g(Q_{RS,1}) & = \frac{n^2 \beta^2}{4} + \log 2 \end{align}\]

Because we’re approximating the sum by its saddle point, our solution is dictated by \(\max[g_0(\beta, n), g_1(\beta, n)]\). Interpretation-wise, we can say there’s a phase transition between the \(g_0\) solution and the \(g_1\) solution at the point where they cross, \(\beta_c(n) = \sqrt{4 \log 2 /n}\).

  • When \(n > 1\):
    • If \(\beta > \beta_c\) (low temperature phase), then \(g_1 > g_0\). The \(g_1\) solution dominates, all replicas are locked together.
    • If \(\beta < \beta_c\) (high temperature phase), then \(g_0 > g_1\). The \(g_0\) solution dominates, hence all replicas are independent.
  • When \(n < 1\), which is the limit we’ll eventually take, the phases flip (in temperature!)
    • If \(\beta > \beta_c\) (low temperature phase), then \(g_1 < g_0\).

    • If \(\beta < \beta_c\) (high temperature phase), then \(g_0 < g_1\).

      If we kept taking the max, replicas would become independent at low temperatures and locked together at high temperatures. Wait whaaaat? That’s backwards from the \(n > 1\) case…
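These orderings follow from \(g_1 - g_0 = (n-1)(n\beta^2/4 - \log 2)\), and can be verified numerically; a minimal sketch:

```python
import numpy as np

# The two RS branches: g0 = n (beta^2/4 + log 2), g1 = n^2 beta^2/4 + log 2,
# crossing at beta_c(n) = sqrt(4 log 2 / n).
def g0(beta, n): return n * (beta**2 / 4 + np.log(2))
def g1(beta, n): return n**2 * beta**2 / 4 + np.log(2)
def beta_c(n):   return np.sqrt(4 * np.log(2) / n)

# n > 1: g1 is larger at low temperature (beta > beta_c), g0 at high temperature.
n = 2.0
assert g1(1.5 * beta_c(n), n) > g0(1.5 * beta_c(n), n)
assert g0(0.5 * beta_c(n), n) > g1(0.5 * beta_c(n), n)

# n < 1: the orderings flip in temperature.
n = 0.5
assert g0(1.5 * beta_c(n), n) > g1(1.5 * beta_c(n), n)
assert g1(0.5 * beta_c(n), n) > g0(0.5 * beta_c(n), n)
print("orderings verified")
```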

In the \(n < 1\) regime, the replica trick requires us to use the minimum of \(g(Q)\)… There is no rigorous mathematical justification for this.

Let’s compute the free energy now \[\begin{align} - \beta f & \equiv \lim_{N \to \infty} \frac{1}{N} \mathbb E [\log Z] \\ & = \lim_{N \to \infty} \lim_{n \to 0} \frac{1}{N n} \log (\mathbb E[Z^n]) & \text{Replica trick formula}\\ & = \lim_{n \to 0} \lim_{N \to \infty} \frac{1}{N n} \log (\mathbb E[Z^n]) & \text{Interchange limits (non rigorous)}\\ & = \lim_{n \to 0} \frac{g_0(n, \beta)}{n} = \frac{\beta^2}{4} + \log 2 \end{align}\] We have recovered the high temperature phase \(\beta < \beta_c\), but this misses the existence of the phase transition. In general, the RS solution isn’t able to find the low temperature phase.

One-Step Replica-Symmetry Breaking (1RSB) Solution

To recover the low-temperature phase, perhaps the system breaks replica symmetry! So how do we find these new saddle points? We’ll use one-step replica symmetry breaking.

The idea is to suppose \(n\) is a multiple of \(x\), divide the \(n\) replicas into \(n/x\) groups of \(x\) elements, and set \[\begin{align*} Q_{aa} &= \alpha_1 &\text{In this problem it is 1}\\ Q_{ab} & = \beta_1 & \text{If $a$ and $b$ are in the same group}\\ Q_{ab} & = \gamma_1 & \text{If $a$ and $b$ are NOT in the same group} \end{align*}\] Recall that for this problem, \(Q\) is a symmetric matrix with elements in \(\{0,1\}\). So \(Q_{aa} = 1\), and this ansatz only differs from the RS one if we take \(\beta_1 = 1\) and \(\gamma_1 = 0\).
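The 1RSB ansatz is a block-diagonal matrix of all-ones blocks; a quick sketch of its construction (toy values of \(n\) and \(x\)):

```python
import numpy as np

# Build the 1RSB overlap ansatz for n replicas split into n/x groups of x:
# ones within each diagonal block, zeros between blocks (beta_1 = 1, gamma_1 = 0).
n, x = 6, 3
Q = np.kron(np.eye(n // x, dtype=int), np.ones((x, x), dtype=int))

print(Q)
# sum_{a,b} Q_ab = (n/x) * x^2 = n x, the term that enters g_1RSB below.
print(Q.sum())
```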

So now our indices become \(i_1 = ... = i_x\), \(i_{x+1} = ... = i_{2x}\), etc. The number of microstates \((i_1, ..., i_n)\) which satisfy this becomes \[\begin{align} \mathcal N_N(Q) &= 2^N \times (2^N - 1) \times ... \times (2^N - n/x +1 )\\ s(Q) &= (n/x) \log 2\\ g_{1RSB}(\beta, n, x) & = n (\frac{\beta^2}{4} x + \frac{1}{x} \log 2) \end{align}\] So now we’ll minimize this (recall our discussion / example earlier). Minimizing w.r.t. \(x\) yields \[\begin{align} x_* = \frac{2 \sqrt{\log 2}}{\beta} = \frac{\beta_c}{\beta} \end{align}\] We can now compute our free energy \[\begin{align} - \beta f & = \lim_{N \to \infty} \frac{1}{N}\mathbb E[\log Z]\\ & = \lim_{N \to \infty} \lim_{n \to 0} \frac{1}{Nn} \log(\mathbb E[Z^n]) \\ & = \lim_{n \to 0}\lim_{N \to \infty}\frac{1}{Nn} \log(\mathbb E[Z^n])\\ & = \lim_{n \to 0} \min_x \frac{g_{1RSB}(\beta, n, x)}{n} & \text{Exchanged limits}\\ & = \beta \sqrt{\log 2} \end{align}\] which is the correct result for the low temperature phase \(\beta > \beta_c\) (where \(x_* \leq 1\); for \(\beta < \beta_c\) one sets \(x = 1\), recovering the RS solution)!
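The minimization over \(x\) is easy to check numerically; a minimal sketch (a grid search at one low-temperature point of my choosing):

```python
import numpy as np

# Minimize g_1RSB / n = beta^2 x / 4 + log(2) / x over x on a grid, then
# compare with the analytic x* = 2 sqrt(log 2) / beta and -beta f = beta sqrt(log 2).
beta = 2.0                               # low-temperature point: beta > beta_c = 2 sqrt(log 2)
x = np.linspace(0.05, 2.0, 200_000)
g_over_n = beta**2 * x / 4 + np.log(2) / x

i = np.argmin(g_over_n)
x_star = 2 * np.sqrt(np.log(2)) / beta   # analytic minimizer
print(x[i], x_star)                      # both near 0.8326
print(g_over_n[i], beta * np.sqrt(np.log(2)))
```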


Appendix

Understanding minimization of \(g(\cdot)\) via toy model

In order to understand this oddity, we can examine a toy example. Consider the integral \[\begin{align} Z(n) = \int e^{-(N/2) \sum_{(ab)} Q_{ab}^2} \prod_{(ab)} dQ_{ab} = \left (\frac{2\pi}{N} \right)^{n(n-1)/4} \end{align}\] where \((ab)\) runs over all unordered pairs of indices \(a,b \in \{1,...,n\}\) with \(a\neq b\). For example: all unordered pairs of \(\{1,2,3,4\}\) is \(\{(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)\}\); i.e., \(\sum_{(ab)}\) is equivalent to \(\sum_{a < b}\).
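Since the integrand factorizes over the \(n(n-1)/2\) pairs, the closed form is just a 1D Gaussian integral raised to that power; a quick numerical check (with my own toy values of \(N\) and \(n\)):

```python
import numpy as np

# Check Z(n) = (2 pi / N)^(n(n-1)/4): the integrand factorizes over the
# n(n-1)/2 unordered pairs, each contributing a 1D Gaussian integral
# sqrt(2 pi / N). Compare a numerical 1D integral against the closed form.
N, n = 3.0, 4
q = np.linspace(-10, 10, 200_001)
one_d = np.sum(np.exp(-N * q**2 / 2)) * (q[1] - q[0])   # ~ sqrt(2 pi / N)

pairs = n * (n - 1) // 2
Z_numeric = one_d**pairs
Z_exact = (2 * np.pi / N) ** (n * (n - 1) / 4)
print(Z_numeric, Z_exact)
```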

Anyways, in our case \(g(Q) = - \frac{1}{2} \sum_{a<b}Q^2_{ab}\). If we assume the saddle point is dominated by the replica-symmetric point \(Q^*_{ab} = q_0\) for \(a\neq b\), we get \[\begin{align} g(Q^*) = - n(n-1)q_0^2 /4 \end{align}\] To recover the \(Z(n=0) = 1\) limit, we should take \(q_0 = 0\). However, for \(0 < n < 1\) the prefactor \(-n(n-1)/4\) is positive, so \(q_0 = 0\) is where \(g(Q^*)\) is minimized, not maximized. Hmm…

Mézard & Montanari explain this by saying that the number of degrees of freedom, \(n(n-1)/2\), becomes negative when \(n < 1\). But it isn’t clear to me whether this reasoning generalizes, and what the implications are.

Saddle Point

The saddle point approximation is for estimating integrals of the form \[\begin{align} I = \int e^{-f(x)} dx \end{align}\] The idea is that the exponential will pick out the minimum of \(f(x)\) and exponentially suppress other contributions. We can get an analytic result if we Taylor expand \(f(x)\) to second order around its minimum \(x_0\) (where \(f'(x_0) = 0\)) \[\begin{align} f(x) \approx f(x_0) + \cancel{f'(x_0)} (x - x_0) + \frac{1}{2} f''(x_0) (x - x_0)^2 + \mathcal O((x - x_0)^3) \end{align}\] Plugging this into the integral, we get a Gaussian \[\begin{align} I \approx e^{-f(x_0)} \int e^{-\frac{f''(x_0)}{2}(x-x_0)^2} dx = e^{-f(x_0)} \sqrt{\frac{2\pi}{f''(x_0)}} \end{align}\]
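Here is the approximation in action on a toy integrand of my choosing, \(f(x) = a(\cosh x - 1)\), which has its minimum at \(x_0 = 0\) with \(f(x_0) = 0\) and \(f''(x_0) = a\):

```python
import numpy as np

# Laplace / saddle-point check: I = int exp(-f(x)) dx with f(x) = a (cosh x - 1).
# Minimum at x0 = 0, f(x0) = 0, f''(x0) = a, so the estimate is sqrt(2 pi / a).
a = 100.0
x = np.linspace(-3, 3, 400_001)
I_numeric = np.sum(np.exp(-a * (np.cosh(x) - 1))) * (x[1] - x[0])
I_saddle = np.sqrt(2 * np.pi / a)

print(I_numeric, I_saddle)   # agree to ~0.1% for a = 100
```

The relative error of the saddle-point estimate shrinks like \(1/a\) here, which is why the method becomes exact in the \(N \to \infty\) limit used in the main text.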

Gärtner-Ellis Theorem

Definition (Exposed Point): \(x \in \mathbb R\) is considered an exposed point of a function \(f: \mathbb R \to \mathbb R\) if \(\exists t\) s.t. \(\forall y \neq x\) the following holds \[\begin{align} t x - f(x) > t y - f(y) \end{align}\]
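In words: the linear functional \(y \mapsto t y - f(y)\) attains its strict maximum at \(x\). For a strictly convex \(f\), every point is exposed with \(t = f'(x)\); a minimal grid-based check with \(f(y) = y^2/2\) (my own toy example):

```python
import numpy as np

# For f(y) = y^2/2 and t = f'(x) = x, check t*x - f(x) > t*y - f(y)
# for all y != x on a grid, i.e. x is an exposed point.
def f(y):
    return y**2 / 2

x = 1.3
t = x                        # t = f'(x) exposes x for strictly convex f
y = np.linspace(-5, 5, 1001)
y = y[np.abs(y - x) > 1e-9]  # exclude y = x itself

assert np.all(t * x - f(x) > t * y - f(y))
print("x =", x, "is exposed with t =", t)
```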