STA702
Duke University
Regression Model (Sampling model) \[\mathbf{Y}\mid \boldsymbol{\beta}, \phi \sim \textsf{N}(\mathbf{X}\boldsymbol{\beta}, \phi^{-1} \mathbf{I}_n) \]
Semi-Conjugate Prior Independent Normal Gamma \[\begin{align*} \boldsymbol{\beta}& \sim \textsf{N}(\mathbf{b}_0, \boldsymbol{\Phi}_0^{-1}) \\ \phi & \sim \textsf{Gamma}(\nu_0/2, \textsf{SS}_0/2) \end{align*}\]
Regression Model (Sampling model) \[\mathbf{Y}\mid \boldsymbol{\beta}, \phi \sim \textsf{N}(\mathbf{X}\boldsymbol{\beta}, \phi^{-1} \mathbf{I}_n) \]
Conjugate Normal-Gamma Model: factor joint prior \(p(\boldsymbol{\beta}, \phi ) = p(\boldsymbol{\beta}\mid \phi)p(\phi)\) \[\begin{align*} \boldsymbol{\beta}\mid \phi & \sim \textsf{N}(\mathbf{b}_0, \phi^{-1}\boldsymbol{\Phi}_0^{-1}) & p(\boldsymbol{\beta}\mid \phi) & = \frac{|\phi \boldsymbol{\Phi}_0|^{1/2}}{(2 \pi)^{p/2}}e^{\left\{- \frac{\phi}{2}(\boldsymbol{\beta}- \mathbf{b}_0)^T \boldsymbol{\Phi}_0 (\boldsymbol{\beta}- \mathbf{b}_0) \right\}}\\ \phi & \sim \textsf{Gamma}(\nu_0/2, \textsf{SS}_0/2) & p(\phi) & = \frac{1}{\Gamma{(\nu_0/2)}} \left(\frac{\textsf{SS}_0}{2} \right)^{\nu_0/2} \phi^{\nu_0/2 - 1} e^{- \phi \frac{\textsf{SS}_0}{2}}\\ \Rightarrow (\boldsymbol{\beta}, \phi) & \sim \textsf{NG}(\mathbf{b}_0, \boldsymbol{\Phi}_0, \nu_0, \textsf{SS}_0) \end{align*}\]
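A minimal NumPy sketch of drawing \((\boldsymbol{\beta}, \phi)\) from this Normal-Gamma prior by composition (the hyperparameter values here are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

p = 2
b0 = np.zeros(p)              # prior mean for beta
Phi0 = np.eye(p)              # prior precision of beta (before 1/phi scaling)
nu0, SS0 = 4.0, 4.0           # phi ~ Gamma(shape=nu0/2, rate=SS0/2)

S = 10_000
# numpy's gamma is parameterized by scale = 1/rate
phi = rng.gamma(shape=nu0 / 2, scale=2 / SS0, size=S)
# beta | phi ~ N(b0, (phi * Phi0)^{-1}): draw N(0, Phi0^{-1}) and rescale
z = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Phi0), size=S)
beta = b0 + z / np.sqrt(phi)[:, None]

print(phi.mean())   # near E[phi] = nu0 / SS0 = 1
```

Sampling \(\phi\) first and then \(\boldsymbol{\beta}\mid \phi\) mirrors the factorization \(p(\boldsymbol{\beta}, \phi) = p(\boldsymbol{\beta}\mid \phi)\,p(\phi)\).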
Normal-Gamma distribution indexed by 4 hyperparameters
Note Prior Covariance for \(\boldsymbol{\beta}\) is scaled by \(\sigma^2 = 1/\phi\)
Likelihood: \({\cal{L}}(\boldsymbol{\beta}, \phi) \propto \phi^{n/2} e^{- \frac{\phi}{2} (\mathbf{Y}- \mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}- \mathbf{X}\boldsymbol{\beta})}\)
\[\begin{eqnarray*}
p(\boldsymbol{\beta}, \phi \mid \mathbf{Y}) &\propto& \phi^{\frac {n}{2}}
e^{- \frac \phi 2 (\mathbf{Y}- \mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}- \mathbf{X}\boldsymbol{\beta}) } \times \\
& & \phi^{\frac{\nu_0}{2} - 1} e^{- \phi \frac{\textsf{SS}_0}{2} }\times
\phi^{\frac{p}{2}} e^{- \frac{\phi}{2} (\boldsymbol{\beta}- \mathbf{b}_0)^T \boldsymbol{\Phi}_0 (\boldsymbol{\beta}- \mathbf{b}_0) }
\end{eqnarray*}\]
Quadratic in Exponential \[\exp\left\{- \frac{\phi}{2} (\boldsymbol{\beta}- \mathbf{b})^T \boldsymbol{\Phi}(\boldsymbol{\beta}- \mathbf{b}) \right\} = \exp\left\{- \frac{\phi}{2} (\boldsymbol{\beta}^T \boldsymbol{\Phi}\boldsymbol{\beta}- 2 \boldsymbol{\beta}^T \boldsymbol{\Phi}\mathbf{b}+ \mathbf{b}^T\boldsymbol{\Phi}\mathbf{b})\right\}\]
\[\begin{eqnarray*} p(\boldsymbol{\beta}, \phi \mid \mathbf{Y}) &\propto& \phi^{\frac {n}{2}} e^{- \frac \phi 2 ( \mathbf{Y}^T\mathbf{Y}- 2 \boldsymbol{\beta}^T \mathbf{X}^T \mathbf{Y}+ \boldsymbol{\beta}^T \mathbf{X}^T \mathbf{X}\boldsymbol{\beta})} \times \\ & & \phi^{\frac{\nu_0}{2} - 1} e^{- \phi \frac{\textsf{SS}_0}{2} }\times \phi^{\frac{p}{2}} e^{- \frac{\phi}{2} (\boldsymbol{\beta}^T\boldsymbol{\Phi}_0\boldsymbol{\beta}- 2 \boldsymbol{\beta}^T \boldsymbol{\Phi}_0 \mathbf{b}_0 + \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0) } \end{eqnarray*}\]
\[\begin{eqnarray*} p(\boldsymbol{\beta}, \phi \mid \mathbf{Y}) &\propto& \phi^{\frac {n + p + \nu_0}{ 2} - 1} e^{- \frac \phi 2 (\textsf{SS}_0 + \mathbf{Y}^T\mathbf{Y}+ \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0) } \times \\ & & e^{-\frac{\phi}{2} (\boldsymbol{\beta}^T(\mathbf{X}^T\mathbf{X})\boldsymbol{\beta}-2 \boldsymbol{\beta}^T\textcolor{red}{\mathbf{X}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}}\mathbf{X}^T\mathbf{Y}+ \boldsymbol{\beta}^T\boldsymbol{\Phi}_0\boldsymbol{\beta}- 2 \boldsymbol{\beta}^T \boldsymbol{\Phi}_0 \mathbf{b}_0) } \end{eqnarray*}\]
\[\begin{eqnarray*} & = & \phi^{\frac {n + p + \nu_0}{ 2} - 1} e^{- \frac \phi 2 (\textsf{SS}_0 + \mathbf{Y}^T\mathbf{Y}+ \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0)} \times \\ & & e^{ -\frac{\phi}{2} \left( \boldsymbol{\beta}^T (\mathbf{X}^T\mathbf{X}+ \boldsymbol{\Phi}_0) \boldsymbol{\beta}\right) } \times \\ & & e^{ -\frac{\phi}{2} \left( -2 \boldsymbol{\beta}^T (\mathbf{X}^T\mathbf{X}\textcolor{red}{\hat{\boldsymbol{\beta}}} + \boldsymbol{\Phi}_0 \mathbf{b}_0) \right)} \end{eqnarray*}\]
\[\begin{eqnarray*} p(\boldsymbol{\beta}, \phi \mid \mathbf{Y}) &\propto& \phi^{\frac {n + p + \nu_0}{ 2} - 1} e^{- \frac \phi 2 (\textsf{SS}_0 + \mathbf{Y}^T\mathbf{Y}+ \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0 )} \times \\ & & e^{ -\frac{\phi}{2} \left( \boldsymbol{\beta}^T \textcolor{red}{(\mathbf{X}^T\mathbf{X}+ \boldsymbol{\Phi}_0)} \boldsymbol{\beta} \right) } \times \qquad \qquad \qquad \qquad \boldsymbol{\Phi}_n \equiv \mathbf{X}^T\mathbf{X}+ \boldsymbol{\Phi}_0 \\ & & e^{ -\frac{\phi}{2} \left( -2 \boldsymbol{\beta}^T \textcolor{red}{\boldsymbol{\Phi}_n \boldsymbol{\Phi}_n^{-1}} (\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}}+ \boldsymbol{\Phi}_0 \mathbf{b}_0) \right)} \times \qquad \qquad \mathbf{b}_n \equiv \boldsymbol{\Phi}_n^{-1} (\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}}+ \boldsymbol{\Phi}_0 \mathbf{b}_0) \\ & & e^{ -\frac{\phi}{2} ( \textcolor{red}{\mathbf{b}_n^T \boldsymbol{\Phi}_n \mathbf{b}_n - \mathbf{b}_n^T \boldsymbol{\Phi}_n \mathbf{b}_n}) } \end{eqnarray*}\]
\[\begin{eqnarray*} & = & \phi^{\frac {n + \nu_0}{ 2} - 1} e^{- \frac \phi 2 ( \textsf{SS}_0 + \mathbf{Y}^T\mathbf{Y}+ \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0 - \mathbf{b}_n^T \boldsymbol{\Phi}_n \mathbf{b}_n)} \times \\ & & \textcolor{red}{\phi^{\frac p 2}} e^{ -\frac{\phi}{2} \left( (\boldsymbol{\beta}- \mathbf{b}_n)^T \boldsymbol{\Phi}_n (\boldsymbol{\beta}- \mathbf{b}_n) \right) } \end{eqnarray*}\]
\[\begin{eqnarray*} & \propto & \phi^{\frac {n + \nu_0}{ 2} - 1} e^{- \frac \phi 2 ( \textsf{SS}_0 + \mathbf{Y}^T\mathbf{Y}+ \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0 - \mathbf{b}_n^T \boldsymbol{\Phi}_n \mathbf{b}_n)} \times \\ & & \textcolor{red}{|\phi \boldsymbol{\Phi}_n |^{\frac 1 2}} e^{ -\frac{\phi}{2} \left( (\boldsymbol{\beta}- \mathbf{b}_n)^T \boldsymbol{\Phi}_n (\boldsymbol{\beta}- \mathbf{b}_n) \right) } \end{eqnarray*}\]
Posterior density (up to normalizing constants) \(p(\boldsymbol{\beta}, \phi \mid \mathbf{Y}) = p(\phi \mid \mathbf{Y}) p(\boldsymbol{\beta}\mid \phi, \mathbf{Y})\) \[\begin{eqnarray*} p(\phi \mid \mathbf{Y}) p(\boldsymbol{\beta}\mid \phi, \mathbf{Y}) & \propto & \phi^{\frac {n + \nu_0}{ 2} - 1} e^{- \frac \phi 2 ( \textsf{SS}_0 + \mathbf{Y}^T\mathbf{Y}+ \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0 - \mathbf{b}_n^T \boldsymbol{\Phi}_n \mathbf{b}_n)} \times \\ & & (2 \pi)^{- \frac p 2} |\phi \boldsymbol{\Phi}_n |^{\frac 1 2}e^{- \frac{\phi}{2} (\boldsymbol{\beta}- \mathbf{b}_n)^T \boldsymbol{\Phi}_n (\boldsymbol{\beta}- \mathbf{b}_n) } \end{eqnarray*}\]
Marginal \[\begin{eqnarray*} p(\phi \mid \mathbf{Y}) & \propto & \phi^{\frac {n + \nu_0}{ 2} - 1} e^{- \frac \phi 2 ( \textsf{SS}_0 + \mathbf{Y}^T\mathbf{Y}+ \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0 - \mathbf{b}_n^T \boldsymbol{\Phi}_n \mathbf{b}_n)} \times \\ & & \int_{\mathbb{R}^p} (2 \pi)^{- \frac p 2} |\phi \boldsymbol{\Phi}_n |^{\frac 1 2}e^{- \frac{\phi}{2} (\boldsymbol{\beta}- \mathbf{b}_n)^T \boldsymbol{\Phi}_n (\boldsymbol{\beta}- \mathbf{b}_n)} \, d\boldsymbol{\beta}\\ & = & \phi^{\frac {n + \nu_0}{ 2} - 1} e^{- \frac \phi 2 ( \textsf{SS}_0 + \mathbf{Y}^T\mathbf{Y}+ \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0 - \mathbf{b}_n^T \boldsymbol{\Phi}_n \mathbf{b}_n)} \end{eqnarray*}\]
Conditional Normal for \(\boldsymbol{\beta}\mid \phi, \mathbf{Y}\) and marginal Gamma for \(\phi \mid \mathbf{Y}\)
No need for Gibbs sampling!
\[\begin{eqnarray*} \boldsymbol{\beta}\mid \phi, \mathbf{Y}& \sim &\textsf{N}(\mathbf{b}_n, (\phi \boldsymbol{\Phi}_n)^{-1}) \\ \phi \mid \mathbf{Y}&\sim &\textsf{Gamma}(\frac{\nu_n}{2}, \frac{\textsf{SS}_n}{2}) \\ (\boldsymbol{\beta}, \phi) \mid \mathbf{Y}& \sim & \textsf{NG}(\mathbf{b}_n, \boldsymbol{\Phi}_n, \nu_n, \textsf{SS}_n) \end{eqnarray*}\]
Hyperparameters: \[\begin{align*} \boldsymbol{\Phi}_n & = \mathbf{X}^T\mathbf{X}+ \boldsymbol{\Phi}_0 & \quad \mathbf{b}_n & = \boldsymbol{\Phi}_n^{-1} (\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}}+ \boldsymbol{\Phi}_0 \mathbf{b}_0) \\ \nu_n & = n + \nu_0 & \quad \textsf{SS}_n & = \textsf{SS}_0 + \mathbf{Y}^T\mathbf{Y}+ \mathbf{b}_0^T \boldsymbol{\Phi}_0 \mathbf{b}_0 - \mathbf{b}_n^T \boldsymbol{\Phi}_n \mathbf{b}_n \end{align*} \]
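The conjugate updates above can be sketched directly in NumPy (synthetic data; prior hyperparameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true + rng.normal(size=n)

b0 = np.zeros(p)              # prior mean
Phi0 = 0.1 * np.eye(p)        # weak prior precision
nu0, SS0 = 2.0, 2.0

XtX = X.T @ X
bhat = np.linalg.solve(XtX, X.T @ Y)          # OLS estimate

# posterior hyperparameters of NG(b_n, Phi_n, nu_n, SS_n)
Phi_n = XtX + Phi0
b_n = np.linalg.solve(Phi_n, XtX @ bhat + Phi0 @ b0)
nu_n = n + nu0
SS_n = SS0 + Y @ Y + b0 @ Phi0 @ b0 - b_n @ Phi_n @ b_n

print(b_n)    # posterior mean: bhat shrunk slightly toward b0
```

With a weak prior precision, \(\mathbf{b}_n\) stays close to \(\hat{\boldsymbol{\beta}}\); increasing \(\boldsymbol{\Phi}_0\) pulls it toward \(\mathbf{b}_0\).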
\[\begin{align*} \textsf{SS}_n & = \textsf{SS}_0 + \| \mathbf{Y}- \mathbf{X}\mathbf{b}_n \|^2 + (\mathbf{b}_0 - \mathbf{b}_n)^T \boldsymbol{\Phi}_0 (\mathbf{b}_0 - \mathbf{b}_n) \\ & = \textsf{SS}_0 + \| \mathbf{Y}- \mathbf{X}\mathbf{b}_n \|^2 + \| \mathbf{b}_0 - \mathbf{b}_n \|^2_{\boldsymbol{\Phi}_0} \end{align*}\]
Inner product induced by prior precision \(\langle u, v \rangle_A \equiv u^TAv\)
\(\| \mathbf{b}_0 - \mathbf{b}_n \|^2_{\boldsymbol{\Phi}_0}\) mismatch of prior and posterior mean under prior
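A quick numerical check (synthetic data, arbitrary hyperparameters) that the two expressions for \(\textsf{SS}_n\) agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2
X = rng.normal(size=(n, p))
Y = X @ np.array([0.5, -1.0]) + rng.normal(size=n)

b0 = np.ones(p)
Phi0 = 2.0 * np.eye(p)
SS0 = 1.0

XtX = X.T @ X
Phi_n = XtX + Phi0
b_n = np.linalg.solve(Phi_n, X.T @ Y + Phi0 @ b0)   # note X'X bhat = X'Y

# SS_n from completing the square
form1 = SS0 + Y @ Y + b0 @ Phi0 @ b0 - b_n @ Phi_n @ b_n
# SS_n as residual sum of squares plus prior/posterior mean mismatch
resid = Y - X @ b_n
diff = b0 - b_n
form2 = SS0 + resid @ resid + diff @ Phi0 @ diff

print(np.isclose(form1, form2))   # True
```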
Suppose \(\boldsymbol{\theta}\mid \phi \sim \textsf{N}(m, \phi^{-1}\boldsymbol{\Sigma})\) and \(\phi \sim \textsf{Gamma}(\nu/2, \nu {\hat{\sigma}}^2/2)\). Then \(\boldsymbol{\theta}\) \((p \times 1)\) has a \(p\)-dimensional multivariate \(t\) distribution \[\boldsymbol{\theta}\sim t_\nu(m, {\hat{\sigma}}^2\boldsymbol{\Sigma})\] with location \(m\), scale matrix \({\hat{\sigma}}^2\boldsymbol{\Sigma}\) and density
\[p(\boldsymbol{\theta}) \propto \left[ 1 + \frac{1}{\nu} \frac{ (\boldsymbol{\theta}- m)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\theta}- m)}{{\hat{\sigma}}^2} \right]^{- \frac{p + \nu}{2}}\]
Note: this holds for the prior and for the posterior given \(\mathbf{Y}\)
Marginal density \(p(\boldsymbol{\theta}) = \int_0^\infty p(\boldsymbol{\theta}\mid \phi) p(\phi) \, d\phi\)
\[\begin{eqnarray*} p(\boldsymbol{\theta}) & \propto & \int | \boldsymbol{\Sigma}/\phi|^{-1/2} e^{- \frac{\phi}{2} (\boldsymbol{\theta}- m)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\theta}- m)} \phi^{\nu/2 - 1} e^{- \phi \frac{\nu {\hat{\sigma}}^2}{2}}\, d \phi \\ & \propto & \int \phi^{p/2} \phi^{\nu/2 - 1} e^{- \phi \frac{(\boldsymbol{\theta}- m)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\theta}- m)+ \nu {\hat{\sigma}}^2}{2}}\, d \phi \\ & \propto & \int \phi^{\frac{p +\nu}{2} - 1} e^{- \phi \frac{(\boldsymbol{\theta}- m)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\theta}- m)+ \nu {\hat{\sigma}}^2}{2}} \, d \phi \\ & = & \Gamma((p + \nu)/2 ) \left( \frac{(\boldsymbol{\theta}- m)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\theta}- m)+ \nu {\hat{\sigma}}^2}{2} \right)^{- \frac{p + \nu}{2}} \\ & \propto & \left( (\boldsymbol{\theta}- m)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\theta}- m)+ \nu {\hat{\sigma}}^2\right)^{- \frac{p + \nu}{2}} \\ & \propto & \left( 1 + \frac{1}{\nu} \frac{(\boldsymbol{\theta}- m)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\theta}- m)}{{\hat{\sigma}}^2} \right)^{- \frac{p + \nu}{2}} \end{eqnarray*}\]
\[\begin{eqnarray*} \boldsymbol{\beta}\mid \phi, \mathbf{Y}& \sim & \textsf{N}( \mathbf{b}_n, \phi^{-1} \boldsymbol{\Phi}_n^{-1}) \\ \phi \mid \mathbf{Y}& \sim & \textsf{Gamma}\left(\frac{\nu_n}{2}, \frac{\textsf{SS}_n}{ 2} \right) \end{eqnarray*}\]
Let \({\hat{\sigma}}^2= \textsf{SS}_n/\nu_n\) (Bayesian MSE)
The marginal posterior distribution of \(\boldsymbol{\beta}\) is multivariate Student-t \[ \boldsymbol{\beta}\mid \mathbf{Y}\sim t_{\nu_n} (\mathbf{b}_n, {\hat{\sigma}}^2\boldsymbol{\Phi}_n^{-1}) \]
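A sketch of drawing from this marginal Student-\(t\) by composition, sampling \(\phi \mid \mathbf{Y}\) first and then \(\boldsymbol{\beta}\mid \phi, \mathbf{Y}\) (posterior hyperparameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
p = 2
b_n = np.array([1.0, -0.5])                    # posterior location
Phi_n = np.array([[4.0, 1.0], [1.0, 3.0]])     # posterior precision
nu_n, SS_n = 20.0, 40.0                        # so sigma2_hat = SS_n/nu_n = 2

S = 20_000
# phi | Y ~ Gamma(nu_n/2, rate SS_n/2); numpy uses scale = 1/rate
phi = rng.gamma(shape=nu_n / 2, scale=2 / SS_n, size=S)
# beta | phi, Y ~ N(b_n, (phi Phi_n)^{-1})
z = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Phi_n), size=S)
beta = b_n + z / np.sqrt(phi)[:, None]

print(beta.mean(axis=0))   # near b_n, the t location (nu_n > 1)
```

Marginally over \(\phi\) these draws are \(t_{\nu_n}(\mathbf{b}_n, {\hat{\sigma}}^2\boldsymbol{\Phi}_n^{-1})\), so no Gibbs iteration is needed.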
Any linear combination \(\lambda^T\boldsymbol{\beta}\) has a univariate \(t\) distribution with \(\nu_n\) degrees of freedom \[\lambda^T\boldsymbol{\beta}\mid \mathbf{Y}\sim t_{\nu_n} (\lambda^T\mathbf{b}_n, {\hat{\sigma}}^2\lambda^T\boldsymbol{\Phi}_n^{-1}\lambda)\]
use for an individual \(\beta_j\), the mean of \(Y\) at \(\mathbf{x}\), \(\mathbf{x}^T \boldsymbol{\beta}\), or predictions \(Y^* = {\mathbf{x}^*}^T \boldsymbol{\beta}+ \epsilon^*\)
Suppose \(\mathbf{Y}^* \mid \boldsymbol{\beta}, \phi \sim \textsf{N}_s(\mathbf{X}^* \boldsymbol{\beta}, \mathbf{I}_s/\phi)\) and is conditionally independent of \(\mathbf{Y}\) given \(\boldsymbol{\beta}\) and \(\phi\)
What is the predictive distribution of \(\mathbf{Y}^* \mid \mathbf{Y}\)?
Use the representation that \(\mathbf{Y}^* \mathrel{\mathop{=}\limits^{\rm D}}\mathbf{X}^* \boldsymbol{\beta}+ \boldsymbol{\epsilon}^*\) and \(\boldsymbol{\epsilon}^*\) is independent of \(\mathbf{Y}\) given \(\phi\)
\[\begin{eqnarray*} \mathbf{X}^* \boldsymbol{\beta}+ \boldsymbol{\epsilon}^* \mid \phi, \mathbf{Y}& \sim & \textsf{N}(\mathbf{X}^*\mathbf{b}_n, (\mathbf{X}^{*} \boldsymbol{\Phi}_n^{-1} \mathbf{X}^{*T} + \mathbf{I}_s)/\phi) \\ \mathbf{Y}^* \mid \phi, \mathbf{Y}& \sim & \textsf{N}(\mathbf{X}^*\mathbf{b}_n, (\mathbf{X}^{*} \boldsymbol{\Phi}_n^{-1} \mathbf{X}^{*T} + \mathbf{I}_s)/\phi) \\ \phi \mid \mathbf{Y}& \sim & \textsf{Gamma}\left(\frac{\nu_n}{2}, \frac{{\hat{\sigma}}^2\nu_n}{ 2} \right) \end{eqnarray*}\]
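The same composition trick gives posterior predictive draws of \(\mathbf{Y}^*\): sample \(\phi \mid \mathbf{Y}\), then \(\mathbf{Y}^* \mid \phi, \mathbf{Y}\) from the normal above (posterior hyperparameters and \(\mathbf{X}^*\) below are synthetic placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)
p, s = 2, 4
b_n = np.array([1.0, 2.0])
Phi_n = np.diag([5.0, 5.0])
nu_n, SS_n = 10.0, 10.0
X_star = rng.normal(size=(s, p))     # s new design points

S = 5000
phi = rng.gamma(nu_n / 2, scale=2 / SS_n, size=S)
# predictive covariance given phi: (X* Phi_n^{-1} X*' + I_s) / phi
V = X_star @ np.linalg.inv(Phi_n) @ X_star.T + np.eye(s)
z = rng.multivariate_normal(np.zeros(s), V, size=S)
Y_star = X_star @ b_n + z / np.sqrt(phi)[:, None]

print(Y_star.shape)    # (5000, 4): S predictive draws at the s points
```

The \(\mathbf{I}_s\) term adds the new observation noise on top of the posterior uncertainty about \(\boldsymbol{\beta}\).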
need to specify Normal prior mean \(\mathbf{b}_0\) and precision \(\boldsymbol{\Phi}_0\)
need to specify Gamma shape \(\nu_0/2\) (\(\nu_0\) is the prior df) and rate \(\textsf{SS}_0/2\) (based on a prior estimate of \(\sigma^2\))
hard in higher dimensions!
default choices?
Jeffreys prior is invariant to model parameterization of \(\boldsymbol{\theta}= (\boldsymbol{\beta},\phi)\) \[p(\boldsymbol{\theta}) \propto |{\cal{I}}(\boldsymbol{\theta})|^{1/2}\]
\({\cal{I}}(\boldsymbol{\theta})\) is the Expected Fisher Information matrix \[{\cal{I}}(\boldsymbol{\theta}) = - \textsf{E}\left[ \frac{\partial^2 \log({\cal{L}}(\boldsymbol{\theta}))}{\partial \theta_i \partial \theta_j} \right]\]
log likelihood expressed as function of sufficient statistics
\[\log({\cal{L}}(\boldsymbol{\beta}, \phi)) = \frac{n}{2} \log(\phi) - \frac{\phi}{2} \| (\mathbf{I}_n - \mathbf{P}_\mathbf{x}) \mathbf{Y}\|^2 - \frac{\phi}{2}(\boldsymbol{\beta}- \hat{\boldsymbol{\beta}})^T(\mathbf{X}^T\mathbf{X})(\boldsymbol{\beta}- \hat{\boldsymbol{\beta}})\]
\[\begin{eqnarray*} \frac{\partial^2 \log {\cal{L}}} { \partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} & = & \left[ \begin{array}{cc} -\phi (\mathbf{X}^T\mathbf{X}) & -(\mathbf{X}^T\mathbf{X}) (\boldsymbol{\beta}- \hat{\boldsymbol{\beta}}) \\ - (\boldsymbol{\beta}- \hat{\boldsymbol{\beta}})^T (\mathbf{X}^T\mathbf{X}) & -\frac{n}{2} \frac{1}{\phi^2} \\ \end{array} \right] \\ \textsf{E}[\frac{\partial^2 \log {\cal{L}}} { \partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}] & = & \left[ \begin{array}{cc} -\phi (\mathbf{X}^T\mathbf{X}) & \mathbf{0}_p \\ \mathbf{0}_p^T & -\frac{n}{2} \frac{1}{\phi^2} \\ \end{array} \right] \\ & & \\ {\cal{I}}((\boldsymbol{\beta}, \phi)^T) & = & \left[ \begin{array}{cc} \phi (\mathbf{X}^T\mathbf{X}) & \mathbf{0}_p \\ \mathbf{0}_p^T & \frac{n}{2} \frac{1}{\phi^2} \end{array} \right] \end{eqnarray*}\]
Jeffreys’ Prior (don’t use!) \[\begin{eqnarray*} p_J(\boldsymbol{\beta}, \phi) & \propto & |{\cal{I}}((\boldsymbol{\beta}, \phi)^T) |^{1/2} = |\phi \mathbf{X}^T\mathbf{X}|^{1/2} \left(\frac{n}{2} \frac{1}{\phi^2} \right)^{1/2} \propto \phi^{p/2 - 1} |\mathbf{X}^T\mathbf{X}|^{1/2} \\ & \propto & \phi^{p/2 - 1} \end{eqnarray*}\]
\[ {\cal{I}}((\boldsymbol{\beta}, \phi)^T) = \left[ \begin{array}{cc} \phi (\mathbf{X}^T\mathbf{X}) & \mathbf{0}_p \\ \mathbf{0}_p^T & \frac{n}{2} \frac{1}{\phi^2} \end{array} \right] \]
\[\begin{align*} p_{IJ}(\boldsymbol{\beta}) & \propto |\phi \mathbf{X}^T\mathbf{X}|^{1/2} \propto 1 \\ p_{IJ}(\phi) & \propto \phi^{-1} \\ p_{IJ}(\boldsymbol{\beta}, \phi) & \propto p_{IJ}(\boldsymbol{\beta}) p_{IJ}(\phi) = \phi^{-1} \end{align*}\]
Use the Independent Jeffreys Prior \(p_{IJ}(\boldsymbol{\beta}, \phi) \propto p_{IJ}(\boldsymbol{\beta}) p_{IJ}(\phi) = \phi^{-1}\)
Formal Posterior Distribution \[\begin{eqnarray*} \boldsymbol{\beta}\mid \phi, \mathbf{Y}& \sim & \textsf{N}(\hat{\boldsymbol{\beta}}, (\mathbf{X}^T\mathbf{X})^{-1} \phi^{-1}) \\ \phi \mid \mathbf{Y}& \sim& \textsf{Gamma}((n-p)/2, \| \mathbf{Y}- \mathbf{X}\hat{\boldsymbol{\beta}}\|^2/2) \\ \boldsymbol{\beta}\mid \mathbf{Y}& \sim & t_{n-p}(\hat{\boldsymbol{\beta}}, {\hat{\sigma}}^2(\mathbf{X}^T\mathbf{X})^{-1}) \end{eqnarray*}\]
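Under the independent Jeffreys prior, a composition sampler reduces to classical OLS quantities: the credible interval below numerically matches the frequentist \(t\) confidence interval. A NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(11)
n, p = 60, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(size=n)

XtX = X.T @ X
bhat = np.linalg.solve(XtX, X.T @ Y)
resid = Y - X @ bhat
sigma2_hat = resid @ resid / (n - p)       # classical MSE

# posterior by composition: phi | Y ~ Gamma((n-p)/2, ||resid||^2 / 2),
# then beta | phi, Y ~ N(bhat, (phi X'X)^{-1})
S = 10_000
phi = rng.gamma((n - p) / 2, scale=2 / (resid @ resid), size=S)
z = rng.multivariate_normal(np.zeros(p), np.linalg.inv(XtX), size=S)
beta = bhat + z / np.sqrt(phi)[:, None]

# 95% equal-tail credible interval for beta_1 (approximates the
# classical t_{n-p} confidence interval centered at bhat_1)
lo, hi = np.quantile(beta[:, 0], [0.025, 0.975])
print(lo, hi)
```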
Bayesian Credible Sets \(p(\boldsymbol{\beta}\in C_\alpha \mid \mathbf{Y}) = 1 - \alpha\) correspond to frequentist Confidence Regions \[\frac{\mathbf{x}^T\boldsymbol{\beta}- \mathbf{x}^T \hat{\boldsymbol{\beta}}} {\sqrt{{\hat{\sigma}}^2\mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1} \mathbf{x}} }\sim t_{n-p}\]
conditional on \(\mathbf{Y}\) for Bayes and conditional on \(\boldsymbol{\beta}\) for frequentist
the model in vector form \(Y \mid \beta, \phi \sim \textsf{N}_n (X\beta, \phi^{-1} I_n)\)
What if we transform the mean \(X\beta = X H H^{-1} \beta\) with new \(X\) matrix \(\tilde{X} = X H\) where \(H\) is \(p \times p\) and invertible and coefficients \(\tilde{\beta} = H^{-1} \beta\).
obtain the posterior for \(\tilde{\beta}\) using \(Y\) and \(\tilde{X}\)
\[ Y \mid \tilde{\beta}, \phi \sim \textsf{N}_n (\tilde{X}\tilde{\beta}, \phi^{-1} I_n)\]
since \(\tilde{X} \tilde{\beta} = X H \tilde{\beta} = X \beta\) invariance suggests that the posterior for \(\beta\) and \(H \tilde{\beta}\) should be the same
plus the posterior of \(H^{-1} \beta\) and \(\tilde{\beta}\) should be the same
Exercise for the Energetic Student
With some linear algebra, show that this is true for a normal prior if \(b_0 = 0\) and \(\Phi_0 = k\, X^TX\) for some \(k > 0\)
Popular choice is to take \(k = \phi/g\) which is a special case of Zellner’s g-prior \[\beta \mid \phi, g \sim \textsf{N}\left(0, \frac{g}{\phi} (X^TX)^{-1}\right)\]
Full conditional \[\beta \mid \phi, g, y_1, \ldots, y_n \sim \textsf{N}\left(\frac{g}{1 + g} \hat{\beta}, \frac{1}{\phi} \frac{g}{1 + g} (X^TX)^{-1}\right)\]
one parameter \(g\) controls shrinkage
if \(\phi \sim \textsf{Gamma}(\nu_0/2, s_0/2)\) then the posterior is \[\phi \mid y_1, \ldots, y_n \sim \textsf{Gamma}(\nu_n/2, s_n/2)\]
Conjugate so we could skip Gibbs sampling and sample directly from gamma and then conditional normal!
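A sketch of this direct (non-Gibbs) sampler under a \(g\)-prior. The \(g\)-prior is the conjugate Normal-Gamma with \(b_0 = 0\) and \(\Phi_0 = X^TX/g\), so the general updates specialize as in the code (synthetic data; hyperparameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, g = 100, 2, 50.0
X = rng.normal(size=(n, p))
Y = X @ np.array([2.0, -1.0]) + rng.normal(size=n)

nu0, s0 = 1.0, 1.0
XtX = X.T @ X
bhat = np.linalg.solve(XtX, X.T @ Y)

# g-prior as conjugate NG with b0 = 0, Phi0 = X'X / g:
Phi_n = (1 + 1 / g) * XtX
b_n = (g / (1 + g)) * bhat          # shrinkage toward zero
nu_n = nu0 + n
s_n = s0 + Y @ Y - b_n @ Phi_n @ b_n

# direct sampling: phi from its Gamma posterior, then beta | phi
S = 10_000
phi = rng.gamma(nu_n / 2, scale=2 / s_n, size=S)
z = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Phi_n), size=S)
beta = b_n + z / np.sqrt(phi)[:, None]

print(beta.mean(axis=0))   # near (g/(1+g)) * bhat
```

With \(g = 50\) the shrinkage factor \(g/(1+g) \approx 0.98\) leaves the posterior mean close to \(\hat\beta\).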
If \(X^TX\) is nearly singular, certain elements of \(\beta\) (or linear combinations of \(\beta\)) may have huge variances under the \(g\)-prior (or flat prior), as the MLEs are highly unstable!
Ridge regression protects against the explosion of variances and ill-conditioning with the conjugate priors: \[\beta \mid \phi \sim \textsf{N}(0, \frac{1}{\phi \lambda} I_p)\]
Posterior for \(\beta\) (conjugate case) \[\beta \mid \phi, \lambda, y_1, \ldots, y_n \sim \textsf{N}\left((\lambda I_p + X^TX)^{-1} X^T Y, \frac{1}{\phi}(\lambda I_p + X^TX)^{-1} \right)\]
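A sketch of why the ridge posterior mean stays stable when \(X^TX\) is nearly singular (two nearly collinear synthetic columns; \(\lambda = 1\) is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)        # nearly collinear with x1
X = np.column_stack([x1, x2])
Y = x1 + rng.normal(size=n)

XtX = X.T @ X
lam = 1.0
# ridge posterior mean: (lambda I + X'X)^{-1} X'Y
ridge_mean = np.linalg.solve(lam * np.eye(2) + XtX, X.T @ Y)

print(np.linalg.cond(XtX))                    # enormous: nearly singular
print(np.linalg.cond(lam * np.eye(2) + XtX))  # modest after adding lambda I
print(ridge_mean)                             # stable, moderate coefficients
```

Adding \(\lambda I_p\) bounds the smallest eigenvalue away from zero, which is exactly the ill-conditioning protection described above; the OLS coefficients on these two columns would be wildly unstable.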
The posterior mean (or mode) given \(\lambda\) is biased, but one can show that there is always a value of \(\lambda\) for which the Ridge estimator has smaller frequentist expected squared error loss than the MLE!
related to penalized maximum likelihood estimation
Choice of \(\lambda\)
Bayes Regression and the choice of \(\Phi_0\) in general is a very important problem and provides the foundation for many variations on shrinkage estimators, variable selection, hierarchical models, nonparametric regression and more!
Be sure that you can derive the full conditional posteriors for \(\beta\) and \(\phi\) as well as the joint posterior in the conjugate case!