
The Horseshoe Estimator for Sparse Signals

Reading Group
Presenter: Zhen Hu, Cognitive Radio Institute
Friday, October 08, 2010

Authors: Carlos M. Carvalho, Nicholas G. Polson and James G. Scott

Outline

Introduction
Robust Shrinkage of Sparse Signals
Comparison with Other Bayes Estimators
Estimation Risk
Classification Risk
Estimation of Hyperparameters
Final Remarks

Introduction

This paper proposes a new approach to sparse signal detection called the horseshoe estimator. The advantage of the horseshoe estimator is its robustness in handling unknown sparsity and large outlying signals. The paper argues that the horseshoe is a good default for robust estimation of sparse normal means, and presents a new representation theorem for the posterior mean under normal scale mixtures.

The Proposed Estimator

Suppose we observe a p-dimensional vector y and wish to estimate θ under quadratic loss. Suppose that θ is believed to be sparse, in the sense that many of its entries are zero, or nearly so. The proposed estimator arises as the posterior mean under an exchangeable model that the authors call the horseshoe prior. This model has an interpretation as a scale mixture of normals, in which each local scale λ_i receives a standard half-Cauchy distribution on the positive reals.
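The formula omitted from the slide is the horseshoe hierarchy; in the paper's notation it reads

\[
  y_i \mid \theta_i \sim N(\theta_i, \sigma^2), \qquad
  \theta_i \mid \lambda_i \sim N(0, \lambda_i^2), \qquad
  \lambda_i \sim C^{+}(0, 1),
\]

where C^{+}(0, 1) is the standard half-Cauchy distribution on the positive reals. A global scale τ enters by taking λ_i ~ C^{+}(0, τ), or equivalently θ_i | λ_i, τ ~ N(0, λ_i² τ²) with λ_i ~ C^{+}(0, 1).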

The Proposed Estimator

Observe the difference from a typical model involving only a common variance component τ²: here each θ_i is mixed over its own λ_i², and each λ_i has an independent half-Cauchy prior. This places the horseshoe in the widely studied class of multivariate scale mixtures of normals.

Horseshoe Prior

The name horseshoe prior arises from the observation, reconstructed below, about the prior implied for the shrinkage weight κ_i = 1/(1 + λ_i²), assuming τ = σ = 1.
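In symbols, with the shrinkage weight κ_i defined above, the posterior mean can be written as

\[
  E(\theta_i \mid y) = \{1 - E(\kappa_i \mid y)\}\, y_i,
  \qquad
  \kappa_i = \frac{1}{1 + \lambda_i^2},
  \qquad
  \kappa_i \sim \mathrm{Be}\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right),
\]

and the Be(1/2, 1/2) density, proportional to κ_i^{-1/2} (1 - κ_i)^{-1/2}, is unbounded at both κ_i = 0 and κ_i = 1, which is the horseshoe shape referred to on the next slides.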

The Implied Priors

Why Horseshoe

This density function, the Be(1/2, 1/2) prior on the shrinkage weight κ_i, resembles a horseshoe: it is unbounded at both κ_i = 0 (no shrinkage, a signal left alone) and κ_i = 1 (total shrinkage to zero, pure noise).

Strengths of the Horseshoe Estimator

It is highly adaptive both to unknown sparsity and to unknown signal-to-noise ratio.
It is robust to large, outlying signals.
It exhibits a strong form of multiplicity control by limiting the number of spurious signals.
The horseshoe shares one of the most appealing features of Bayesian and empirical-Bayes model-selection techniques: after a simple thresholding rule is applied, the horseshoe exhibits an automatic penalty for multiple hypothesis testing.

The Horseshoe Density Function

The density is not expressible in closed form, but very tight upper and lower bounds in terms of elementary functions are available.
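The bounds take the following form (a sketch; K denotes a positive constant whose exact value is given in the paper), for θ ≠ 0:

\[
  \frac{K}{2}\,\log\!\left(1 + \frac{4}{\theta^2}\right)
  \;<\; p_{\mathrm{HS}}(\theta) \;<\;
  K\,\log\!\left(1 + \frac{2}{\theta^2}\right).
\]

These bounds make the two key features explicit: a logarithmic pole at the origin and tails that decay like 1/θ².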

Comparison

Its Interesting Features

It is symmetric about zero.
It has heavy, Cauchy-like tails that decay like θ_i^{-2}.
It has an infinitely tall spike at 0, in the sense that the density approaches infinity logarithmically fast as θ_i → 0 from either side.

This paper is primarily about why the horseshoe is a good default shrinkage prior for sparse signals. The prior's flat tails allow each θ_i to be large if the data warrant such a conclusion, and yet its infinitely tall spike at zero means that the estimate can also be quite severely shrunk.

Robust Shrinkage of Sparse Signals

A representation of the posterior mean: assume a normal likelihood of known variance, a prior p(θ) for the mean, and the marginal density m(y) obtained by mixing the likelihood over the prior, for one sample y.
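The representation itself is the standard identity for the posterior mean under a normal likelihood (a reconstruction of the formulas shown on the slide): with y | θ ~ N(θ, σ²), prior p(θ), and marginal m(y) = ∫ N(y; θ, σ²) p(θ) dθ,

\[
  E(\theta \mid y) = y + \sigma^2 \, \frac{d}{dy} \log m(y).
\]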

Why Robust by Horseshoe Prior

Its Cauchy-like tails ensure a redescending score function, in the venerable tradition of robust Bayes estimators involving heavy-tailed priors. Since the horseshoe density decays like θ^{-2}, the prior is essentially uniform (flat) far from the origin, so the score d/dy log m(y) becomes small there and stays small over a wider range than under lighter-tailed priors.

The Horseshoe Score Function

The following results speak to the horseshoe's robustness to large, outlying signals. Using the previously quoted identities, it is easy to show that for fixed τ, the difference between E(θ | y) and y is bounded for all y. The horseshoe prior is therefore of bounded influence, and the bound decays to zero for large |y|, independent of τ.
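Via the representation above, E(θ | y) − y = σ² d/dy log m(y), so the claim is that the score of the marginal redescends:

\[
  \sup_{y}\, \bigl| E(\theta \mid y) - y \bigr| < \infty,
  \qquad
  \bigl| E(\theta \mid y) - y \bigr| \;\to\; 0 \quad \text{as } |y| \to \infty .
\]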

Comparison with Other Bayes Estimators

The advantage of the horseshoe estimator can now be stated directly. In sparse situations, posterior learning of the global scale τ allows most noise observations to be shrunk very near zero. Yet this small value of τ will not inhibit the estimation of large, obvious signals, because of the redescending score function. Contrast this with the commonly used double-exponential prior.
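For contrast, the double-exponential (Laplace) prior with scale τ has a score function that does not redescend:

\[
  p(\theta \mid \tau) = \frac{1}{2\tau}\exp\!\left(-\frac{|\theta|}{\tau}\right),
  \qquad
  \frac{d}{d\theta}\log p(\theta \mid \tau) = -\frac{\operatorname{sign}(\theta)}{\tau}
  \quad (\theta \neq 0).
\]

Because the score stays at magnitude 1/τ however large |θ| is, a small τ learned from many noise observations also forces a non-vanishing shift on the largest observations.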

A Comparison of Score Functions

Small values of τ lead to strong shrinkage near the origin, just as under the horseshoe.

A Comparison of the Posterior Mean

An Illustrative Example

Two standard normal observations were simulated for each of 1000 means: 10 signals of mean 10, 90 signals of mean 2, and 900 noise components of mean 0. Two models were then fit to these data: one using independent horseshoe priors and one using independent double-exponential priors.

The double-exponential prior tends to shrink the small observations too little and the larger observations too much; the sketch below illustrates this behavior.
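A minimal numerical sketch of the comparison, not the paper's implementation: the global scale τ is held fixed rather than learned (at 1, and at 0.1 as a stand-in for the small value posterior learning would pick in a sparse problem), the noise variance is treated as known, and each posterior mean is computed coordinate-wise by one-dimensional quadrature.

# A sketch only: coordinate-wise posterior means under a horseshoe prior and a
# double-exponential prior, with the global scale tau held fixed and the noise
# variance known, computed by 1-D quadrature instead of the paper's full Bayes fit.
import numpy as np
from scipy import integrate

rng = np.random.default_rng(0)

# The example: 10 signals at 10, 90 signals at 2, 900 noise components at 0,
# with two N(theta_i, 1) draws each, summarized by their average, y_i ~ N(theta_i, 0.5).
theta = np.concatenate([np.full(10, 10.0), np.full(90, 2.0), np.zeros(900)])
sigma2 = 0.5
y = theta + rng.normal(scale=np.sqrt(sigma2), size=theta.size)

def horseshoe_mean(yi, tau):
    # theta | lam ~ N(0, lam^2), lam ~ C+(0, tau), so
    # E(theta | yi, lam) = lam^2 / (lam^2 + sigma2) * yi and yi | lam ~ N(0, lam^2 + sigma2);
    # average the conditional mean over the posterior of lam by quadrature.
    def w(lam):
        v = lam * lam + sigma2
        return ((2.0 / (np.pi * tau)) / (1.0 + (lam / tau) ** 2)
                * np.exp(-yi * yi / (2.0 * v)) / np.sqrt(v))
    num = integrate.quad(lambda lam: lam ** 2 / (lam ** 2 + sigma2) * w(lam), 0.0, np.inf)[0]
    den = integrate.quad(w, 0.0, np.inf)[0]
    return num / den * yi

def laplace_mean(yi, tau):
    # Double-exponential prior with scale tau; normalize by the posterior mode
    # (the soft-threshold value) so the quadrature is well scaled numerically.
    mode = np.sign(yi) * max(abs(yi) - sigma2 / tau, 0.0)
    def logpost(t):
        return -((yi - t) ** 2) / (2.0 * sigma2) - np.abs(t) / tau
    def post(t):
        return np.exp(logpost(t) - logpost(mode))
    num = integrate.quad(lambda t: t * post(t), yi - 8.0, yi + 8.0)[0]
    den = integrate.quad(post, yi - 8.0, yi + 8.0)[0]
    return num / den

groups = {"signals at 10": slice(0, 10),
          "signals at 2": slice(10, 100),
          "noise at 0": slice(100, 1000)}
for tau in (1.0, 0.1):
    hs = np.array([horseshoe_mean(v, tau) for v in y])
    de = np.array([laplace_mean(v, tau) for v in y])
    for name, s in groups.items():
        print(f"tau={tau:3}  {name:>13}:  mean|y|={np.mean(np.abs(y[s])):5.2f}  "
              f"mean|HS|={np.mean(np.abs(hs[s])):5.2f}  mean|DE|={np.mean(np.abs(de[s])):5.2f}")

# The tau = 0.1 rows show the trade-off described on the slide: with a global scale
# small enough to pull the noise toward zero, the double-exponential also drags the
# signals at 10 down by roughly sigma2/tau, while the horseshoe's redescending score
# leaves them essentially at their observed values.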

Estimation Risk

The goal is to assess the risk properties of the horseshoe prior under both squared-error and absolute-error loss, against three alternatives: the double-exponential model, and fully Bayesian and empirical-Bayes versions of the discrete-mixture model with Strawderman-Berger priors.
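Written out in the usual way, the two losses for an estimate θ̂ of θ are

\[
  L_2(\hat\theta, \theta) = \sum_{i=1}^{p} (\hat\theta_i - \theta_i)^2,
  \qquad
  L_1(\hat\theta, \theta) = \sum_{i=1}^{p} \lvert \hat\theta_i - \theta_i \rvert .
\]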

Experiment 1: strongly sparse signals

Strongly sparse signals are vectors in which some of the components are identically zero.

Experiment 2: Weakly sparse signals

A vector is considered weakly sparse if none of its components are identically zero, but its components nonetheless follow some kind of power law.

Results in Experiment 1

Results in Experiment 2

Comments

The double-exponential systematically loses out to the horseshoe, and to both versions of the discrete-mixture rule, under both squared-error and absolute-error loss. It is equally interesting that no meaningful systematic edge could be found among the other three approaches: all three models have heavy tails, and all three can shrink y arbitrarily close to zero.

Classification Risk

We now describe a simple thresholding rule for the horseshoe estimator that can yield accurate decisions about whether each observation is signal or noise. The horseshoe estimator yields significance weights, written below. By analogy with the decision rule one would apply to the discrete mixture under a 0-1 loss function, one possible threshold is to call y_i a signal if its significance weight is at least 1/2, and to call it noise otherwise.
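A reconstruction of the omitted definition, in terms of the shrinkage weights κ_i introduced earlier:

\[
  w_i = 1 - E(\kappa_i \mid y),
  \qquad
  \text{declare } y_i \text{ a signal if } w_i \ge \tfrac{1}{2}, \text{ and noise otherwise.}
\]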

Experiment 3: Fixed-k asymptotics

Under fixed-k asymptotics, the number of true signals remains fixed while the number of noise observations grows without bound. We study this regime by fixing 10 true signals that are repeatedly tested in the face of an increasingly large number of noise observations.

Experiment 4: Ideal signal-recovery asymptotics

Results

Comments

The weak signal should rapidly be overwhelmed by noise, while the strong signal should remain significant. This is precisely what happens under both the discrete mixture and the horseshoe prior, whose significance weights coincide almost perfectly as n grows. Comprehensive results are given in the paper. The horseshoe significance weights are a good stand-in for the posterior inclusion probabilities under the discrete mixture: they lead to nearly identical numerical summaries of the strength of evidence in the data and nearly identical classifications of signal versus noise.

Estimation of Hyperparameters

One possibility is to proceed with a fully Bayesian solution by placing priors on the model hyperparameters. A second possibility is to estimate τ and σ (along with the mixing weight w, if the discrete-mixture model is being used) by empirical Bayes. A third possibility is cross-validation, a common approach when fitting double-exponential priors to regression models.
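One concrete version of the fully Bayesian option, consistent with the authors' use of half-Cauchy priors for scale parameters (the specific hyperpriors shown here are an illustration, not a quote from the slides):

\[
  \tau \mid \sigma \sim C^{+}(0, \sigma),
  \qquad
  p(\sigma^2) \propto \frac{1}{\sigma^2}.
\]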

Final Remarks

We do not claim that the horseshoe is a panacea for sparse problems, merely a good default option. It is both surprising and interesting that its answers coincide so closely with the answers from the two-group gold standard of Bayesian mixture models. The horseshoe estimator arrives at a good multiple-testing procedure by way of a shrinkage rule. Its combination of strong global shrinkage through τ along with robust local adaptation to signals through the λ_i is unmatched by other common Bayes rules using scale mixtures.

Thank you!