# Quasi-Bayesian Inference for Grouped Panels


### Why Quasi-Bayesian Clustering?

There are two popular methods to recover latent group structure:

**Frequentist clustering** is given by the loss minimizer

$$
(\hat{\boldsymbol{\beta}},\hat{\boldsymbol{\gamma}}) = \arg\min_{\boldsymbol{\beta},\boldsymbol{\gamma}} \overline{L}(\boldsymbol{\beta},\boldsymbol{\gamma})
$$

- $\overline{L}(\boldsymbol{\beta},\boldsymbol{\gamma})$ is a sample loss function, for example, a least-squares loss (a minimal sketch is given after this list)
- the loss specification is flexible and more robust to model misspecification than a full data likelihood
- but quantifying uncertainty in $\boldsymbol{\gamma}$ is difficult
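
To make the loss component concrete, here is a minimal sketch in Python of a grouped least-squares loss for a panel $y_{it} = x_{it}'\beta_{g_i} + \varepsilon_{it}$, together with a standard k-means-style alternation between group assignment and group-specific coefficients. The variable names, panel layout, and number of iterations are illustrative assumptions, not necessarily the estimator used in the paper.

```python
import numpy as np

def grouped_ls_loss(y, X, beta, gamma):
    """Sample least-squares loss L_bar(beta, gamma) for a grouped panel.

    y: (N, T) outcomes, X: (N, T, p) regressors,
    beta: (G, p) group coefficients, gamma: (N,) labels in {0, ..., G-1}.
    """
    resid = y - np.einsum("itp,ip->it", X, beta[gamma])
    return np.mean(resid ** 2)

def argmin_grouped_ls(y, X, G, n_iter=50, seed=0):
    """Minimize the loss by alternating group assignment and pooled OLS."""
    rng = np.random.default_rng(seed)
    N, T, p = X.shape
    gamma = rng.integers(0, G, size=N)
    beta = np.zeros((G, p))
    for _ in range(n_iter):
        # 1) refit group-specific coefficients by pooled OLS within each group
        for g in range(G):
            idx = np.where(gamma == g)[0]
            if idx.size:
                Xg, yg = X[idx].reshape(-1, p), y[idx].reshape(-1)
                beta[g] = np.linalg.lstsq(Xg, yg, rcond=None)[0]
        # 2) reassign each unit to the group with the smallest sum of squared residuals
        fits = np.einsum("itp,gp->igt", X, beta)           # (N, G, T) fitted values
        ssr = ((y[:, None, :] - fits) ** 2).sum(axis=2)    # (N, G)
        gamma = ssr.argmin(axis=1)
    return beta, gamma
```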

**Bayesian clustering** is characterized by the posterior[^1]

$$
\pi(\boldsymbol{\beta},\boldsymbol{\gamma}\mid\boldsymbol{W}) \propto \pi(\boldsymbol{W}\mid\boldsymbol{\beta},\boldsymbol{\gamma})\,\pi(\boldsymbol{\beta},\boldsymbol{\gamma})
$$

- $\pi(\boldsymbol{W}\mid\boldsymbol{\beta},\boldsymbol{\gamma})$ is the likelihood and $\pi(\boldsymbol{\beta},\boldsymbol{\gamma})$ is a prior, for example, a mixture prior on $\boldsymbol{\gamma}$ and a normal prior on $\boldsymbol{\beta}$ (sketched in code after this list)
- it is straightforward to quantify uncertainty in the group structure $\boldsymbol{\gamma}$
- but specifying the full data distribution, i.e., the likelihood, can be difficult
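
For comparison, a minimal sketch of the Bayesian log-posterior under one concrete, purely illustrative specification: i.i.d. Gaussian errors for the likelihood, an independent categorical (mixture) prior on the labels $\boldsymbol{\gamma}$, and independent normal priors on the group coefficients $\boldsymbol{\beta}$. The error scale `sigma` and prior scale `tau` are assumptions, not the paper's choices.

```python
import numpy as np

def log_posterior(y, X, beta, gamma, pi_g, sigma=1.0, tau=10.0):
    """Unnormalized log posterior log pi(beta, gamma | W) for the toy model

    likelihood: y_it ~ N(x_it' beta_{gamma_i}, sigma^2), independently,
    prior:      gamma_i ~ Categorical(pi_g)  (mixture prior on the labels),
                beta_g  ~ N(0, tau^2 I)      (normal prior on the coefficients).
    """
    resid = y - np.einsum("itp,ip->it", X, beta[gamma])
    log_lik = -0.5 * np.sum(resid ** 2) / sigma ** 2
    log_prior_gamma = np.sum(np.log(pi_g[gamma]))
    log_prior_beta = -0.5 * np.sum(beta ** 2) / tau ** 2
    return log_lik + log_prior_gamma + log_prior_beta
```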

The idea of **quasi-Bayesian clustering** is to combine the loss component and the prior component, leading to the **quasi-posterior**

$$
\pi_{\psi}(\boldsymbol{\beta},\boldsymbol{\gamma}\mid\boldsymbol{W}) \propto \exp\{-\psi\,\overline{L}(\boldsymbol{\beta},\boldsymbol{\gamma})\}\,\pi(\boldsymbol{\beta},\boldsymbol{\gamma})
$$

Here, we have an additional **learning rate parameter** $\psi$, which controls the relative weight on the loss component. In plain words, it determines how much we learn from the data (relative to prior knowledge).
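
Here is a minimal sketch of the resulting quasi-posterior under the same illustrative loss and priors as above: the scaled loss replaces the log-likelihood, and $\psi$ multiplies it. Scaling the average loss by the sample size $n = N \times T$ is one common convention, not necessarily the paper's; the Gibbs update for the labels is included only to show that uncertainty in $\boldsymbol{\gamma}$ remains easy to sample.

```python
import numpy as np

def log_quasi_posterior(y, X, beta, gamma, psi, pi_g, tau=10.0):
    """Unnormalized log quasi-posterior:
    -psi * n * L_bar(beta, gamma) + log prior, with n = N * T.
    Larger psi puts more weight on the data (loss); smaller psi, on the prior.
    """
    N, T = y.shape
    resid = y - np.einsum("itp,ip->it", X, beta[gamma])
    loss = np.mean(resid ** 2)                                   # L_bar
    log_prior = np.sum(np.log(pi_g[gamma])) - 0.5 * np.sum(beta ** 2) / tau ** 2
    return -psi * N * T * loss + log_prior

def gibbs_update_gamma(y, X, beta, psi, pi_g, rng):
    """Draw each unit's label from its exact conditional quasi-posterior."""
    fits = np.einsum("itp,gp->igt", X, beta)                     # (N, G, T)
    ssr = ((y[:, None, :] - fits) ** 2).sum(axis=2)              # (N, G)
    log_w = -psi * ssr + np.log(pi_g)[None, :]                   # per-unit log weights
    log_w -= log_w.max(axis=1, keepdims=True)                    # numerical stability
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(pi_g), p=w[i]) for i in range(y.shape[0])])
```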

### Learning Rate Calibration

One key feature of the quasi-Bayesian clustering framework is the calibration of the learning rate. To see how the learning rate can improve inference, I show below the quasi-posterior densities from two bootstrapped samples under different learning rates.

- When the learning rate is large (say, $\psi=3.0$), the quasi-posterior puts a large weight on the loss component and thus suffers from selection bias (as shown by the different posterior modes), while the variance is relatively low (the posterior densities are quite concentrated)
- As the learning rate gets smaller, the two posteriors get closer and closer: the posterior modes move toward the true value, at the expense of larger variance (the quasi-posteriors become more dispersed)
- Ideally, we would calibrate the learning rate to the point where bootstrapped samples give identical posterior distributions (as the bottom-right panel with $\psi=0.05$ shows), while keeping the variance as small as possible; a code sketch of this calibration loop follows the list
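
Below is a minimal sketch of the calibration rule described in the last bullet: scan a grid of learning rates from large to small, draw quasi-posterior samples of a scalar summary on a few bootstrap resamples of the units at each $\psi$, and return the largest $\psi$ at which the bootstrapped posteriors essentially agree. The agreement criterion (difference of posterior means relative to the posterior spread), the grid, and the sampler argument `draw_quasi_posterior` (a stand-in for whatever MCMC routine is actually used) are all illustrative assumptions.

```python
import numpy as np

def calibrate_learning_rate(y, X, draw_quasi_posterior, psi_grid,
                            n_boot=2, tol=0.1, seed=0):
    """Return the largest psi at which bootstrapped quasi-posteriors agree.

    draw_quasi_posterior(y, X, psi, rng) -> 1-D array of posterior draws of a
    scalar summary (e.g. one group coefficient); placeholder for the sampler.
    """
    rng = np.random.default_rng(seed)
    N = y.shape[0]
    for psi in sorted(psi_grid, reverse=True):          # large psi first
        draws = []
        for _ in range(n_boot):
            idx = rng.integers(0, N, size=N)            # resample units with replacement
            draws.append(draw_quasi_posterior(y[idx], X[idx], psi, rng))
        means = np.array([d.mean() for d in draws])
        spread = np.mean([d.std() for d in draws])
        # agreement: posterior means differ by less than `tol` posterior sd's
        if means.max() - means.min() < tol * spread:
            return psi
    return min(psi_grid)
```

Scanning from the largest $\psi$ downward and stopping at the first value that passes the agreement check is what keeps the variance as small as possible: any smaller $\psi$ would also pass, but would only inflate the posterior spread.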

[^1]: Another likelihood-based clustering method is the finite mixture model, usually estimated with an efficient EM algorithm. Like Bayesian methods, it requires the likelihood to be correctly specified.