<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>3. User guide — pypmc 1.2.2 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/plot_directive.css" type="text/css" />
<!--[if lt IE 9]>
<script src="_static/js/html5shiv.min.js"></script>
<![endif]-->
<script src="_static/jquery.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script type="text/x-mathjax-config">MathJax.Hub.Config({"extensions": ["tex2jax.js"], "jax": ["input/TeX", "output/HTML-CSS"], "TeX": {"Macros": {"vecgamma": "{\\vec \\gamma}", "vecth": "{\\vec \\theta}"}}})</script>
<script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="4. Examples" href="examples.html" />
<link rel="prev" title="2. Installation" href="installation.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home">
pypmc
</a>
<div class="version">
1.2.2
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="introduction.html">1. Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="installation.html">2. Installation</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">3. User guide</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#densities">3.1. Densities</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#component-density">3.1.1. Component density</a></li>
<li class="toctree-l3"><a class="reference internal" href="#mixture-density">3.1.2. Mixture density</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#indicator-function">3.2. Indicator function</a></li>
<li class="toctree-l2"><a class="reference internal" href="#markov-chain">3.3. Markov chain</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#initialization">3.3.1. Initialization</a></li>
<li class="toctree-l3"><a class="reference internal" href="#adaptation">3.3.2. Adaptation</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#importance-sampling">3.4. Importance sampling</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#standard">3.4.1. Standard</a></li>
<li class="toctree-l3"><a class="reference internal" href="#deterministic-mixture">3.4.2. Deterministic mixture</a></li>
<li class="toctree-l3"><a class="reference internal" href="#comparison">3.4.3. Comparison</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#pmc">3.5. PMC</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#basic-approach">3.5.1. Basic approach</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id6">3.5.2. Student’s t</a></li>
<li class="toctree-l3"><a class="reference internal" href="#pmc-with-multiple-em-steps">3.5.3. PMC with multiple EM steps</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#variational-bayes">3.6. Variational Bayes</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#classic-version">3.6.1. Classic version</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#id13">3.6.1.1. Initialization</a></li>
<li class="toctree-l4"><a class="reference internal" href="#running">3.6.1.2. Running</a></li>
<li class="toctree-l4"><a class="reference internal" href="#results">3.6.1.3. Results</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#mixture-reduction">3.7. Mixture reduction</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#hierarchical-clustering">3.7.1. Hierarchical clustering</a></li>
<li class="toctree-l3"><a class="reference internal" href="#vbmerge">3.7.2. VBmerge</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#putting-it-all-together">3.8. Putting it all together</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="examples.html">4. Examples</a></li>
<li class="toctree-l1"><a class="reference internal" href="references.html">5. References</a></li>
<li class="toctree-l1"><a class="reference internal" href="api.html">6. Reference Guide</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">pypmc</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item active"><span class="section-number">3. </span>User guide</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/user_guide.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="user-guide">
<h1><span class="section-number">3. </span>User guide<a class="headerlink" href="#user-guide" title="Permalink to this heading"></a></h1>
<section id="densities">
<h2><span class="section-number">3.1. </span>Densities<a class="headerlink" href="#densities" title="Permalink to this heading"></a></h2>
<p>Pypmc revolves around adapting mixture densities of the form</p>
<div class="math notranslate nohighlight">
\[q(x) = \sum_{j=1}^K \alpha_j q_j(x), \: \sum_{j=1}^K \alpha_j = 1\]</div>
<p>where each component <span class="math notranslate nohighlight">\(q_j\)</span> is either a <a class="reference external" href="https://en.wikipedia.org/wiki/Normal_distribution">Gaussian</a></p>
<div class="math notranslate nohighlight">
\[q_j(x) = \mathcal{N}(x | \mu_j, \Sigma_j)\]</div>
<p>or a <a class="reference external" href="https://en.wikipedia.org/wiki/Student%27s_t-distribution">Student’s t</a> distribution</p>
<div class="math notranslate nohighlight">
\[q_j(x) = \mathcal{T}(x | \mu_j, \Sigma_j, \nu_j) \,.\]</div>
<p>The free parameters of the mixture density, <span class="math notranslate nohighlight">\(\theta\)</span>, are the component weights
<span class="math notranslate nohighlight">\(\alpha_j\)</span>, the means <span class="math notranslate nohighlight">\(\mu_j\)</span>, the covariances
<span class="math notranslate nohighlight">\(\Sigma_j\)</span> and in case <span class="math notranslate nohighlight">\(q_j = \mathcal{T}\)</span> the degree of
freedom <span class="math notranslate nohighlight">\(\nu_j\)</span> for <span class="math notranslate nohighlight">\(j=1 \dots K\)</span>.</p>
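<p>As a quick illustration of the mixture formula above (independent of pypmc), a
two-component Gaussian mixture can be evaluated by hand with NumPy and SciPy;
the numbers below are arbitrary:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>import numpy as np
from scipy.stats import multivariate_normal

# arbitrary two-component example: weights, means, covariances
alpha = np.array([0.6, 0.4])
mu = [np.zeros(2), np.ones(2)]
sigma = [np.eye(2), 0.5 * np.eye(2)]

# q(x) = sum_j alpha_j N(x | mu_j, Sigma_j)
x = np.array([0.5, 0.5])
q_x = sum(a * multivariate_normal(m, s).pdf(x)
          for a, m, s in zip(alpha, mu, sigma))
</pre></div>
</div>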
<section id="component-density">
<h3><span class="section-number">3.1.1. </span>Component density<a class="headerlink" href="#component-density" title="Permalink to this heading"></a></h3>
<p>The two densities — Gauss and Student’s t — supported by pypmc
come in two variants whose methods have identical names but differ in
their arguments. The standard classes are
<a class="reference internal" href="api.html#pypmc.density.gauss.Gauss" title="pypmc.density.gauss.Gauss"><code class="xref py py-class docutils literal notranslate"><span class="pre">Gauss</span></code></a> and
<a class="reference internal" href="api.html#pypmc.density.student_t.StudentT" title="pypmc.density.student_t.StudentT"><code class="xref py py-class docutils literal notranslate"><span class="pre">StudentT</span></code></a>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">mean</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="n">sigma</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([[</span> <span class="mf">0.1</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.001</span><span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">0.001</span><span class="p">,</span> <span class="mf">0.02</span><span class="p">]])</span>
<span class="n">dof</span> <span class="o">=</span> <span class="mf">5.</span>
<span class="n">gauss</span> <span class="o">=</span> <span class="n">pypmc</span><span class="o">.</span><span class="n">density</span><span class="o">.</span><span class="n">gauss</span><span class="o">.</span><span class="n">Gauss</span><span class="p">(</span><span class="n">mean</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="n">student_t</span> <span class="o">=</span> <span class="n">pypmc</span><span class="o">.</span><span class="n">density</span><span class="o">.</span><span class="n">student_t</span><span class="o">.</span><span class="n">StudentT</span><span class="p">(</span><span class="n">mean</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">dof</span><span class="p">)</span>
<span class="c1"># density at (1, 1)</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">gauss</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span>
<span class="c1"># draw sample from density</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">gauss</span><span class="o">.</span><span class="n">propose</span><span class="p">()</span>
</pre></div>
</div>
<p>For the use as proposal densities in Markov chains (see below), there
are also <em>local</em> variants whose mean can be varied in each call to
<code class="docutils literal notranslate"><span class="pre">evaluate</span></code> or <code class="docutils literal notranslate"><span class="pre">propose</span></code>. <a class="reference internal" href="api.html#pypmc.density.gauss.LocalGauss" title="pypmc.density.gauss.LocalGauss"><code class="xref py py-class docutils literal notranslate"><span class="pre">LocalGauss</span></code></a>
and <a class="reference internal" href="api.html#pypmc.density.student_t.LocalStudentT" title="pypmc.density.student_t.LocalStudentT"><code class="xref py py-class docutils literal notranslate"><span class="pre">LocalStudentT</span></code></a> don’t take the
<code class="docutils literal notranslate"><span class="pre">mean</span></code> argument in the constructor. To reproduce the previous
results, one would do:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">local_gauss</span> <span class="o">=</span> <span class="n">pypmc</span><span class="o">.</span><span class="n">density</span><span class="o">.</span><span class="n">gauss</span><span class="o">.</span><span class="n">LocalGauss</span><span class="p">(</span><span class="n">sigma</span><span class="p">)</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">local_gauss</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">mean</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">local_gauss</span><span class="o">.</span><span class="n">propose</span><span class="p">(</span><span class="n">mean</span><span class="p">)</span>
</pre></div>
</div>
</section>
<section id="mixture-density">
<h3><span class="section-number">3.1.2. </span>Mixture density<a class="headerlink" href="#mixture-density" title="Permalink to this heading"></a></h3>
<p>Mixture densities are represented by
<a class="reference internal" href="api.html#pypmc.density.mixture.MixtureDensity" title="pypmc.density.mixture.MixtureDensity"><code class="xref py py-class docutils literal notranslate"><span class="pre">MixtureDensity</span></code></a>. One can create a
mixture density from a list of arbitrary component densities and
weights. Usually, however, all one needs are the convenience shortcuts
to create mixtures of Gaussian or Student's t components from means and
covariances. For example, for two components with weights 60% and 40%:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pypmc.density.mixture</span> <span class="kn">import</span> <span class="n">create_gaussian_mixture</span><span class="p">,</span> <span class="n">create_t_mixture</span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">])</span>
<span class="n">means</span> <span class="o">=</span> <span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">D</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">D</span><span class="p">)]</span>
<span class="n">covariances</span> <span class="o">=</span> <span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">eye</span><span class="p">(</span><span class="n">D</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">eye</span><span class="p">(</span><span class="n">D</span><span class="p">)]</span>
<span class="n">gauss_mixture</span> <span class="o">=</span> <span class="n">create_gaussian_mixture</span><span class="p">(</span><span class="n">means</span><span class="p">,</span> <span class="n">covariances</span><span class="p">,</span> <span class="n">weights</span><span class="p">)</span>
<span class="n">dofs</span> <span class="o">=</span> <span class="p">[</span><span class="mi">13</span><span class="p">,</span> <span class="mi">17</span><span class="p">]</span>
<span class="n">mixture</span> <span class="o">=</span> <span class="n">create_t_mixture</span><span class="p">(</span><span class="n">means</span><span class="p">,</span> <span class="n">covariances</span><span class="p">,</span> <span class="n">dofs</span><span class="p">,</span> <span class="n">weights</span><span class="p">)</span>
</pre></div>
</div>
<p>The most common interaction pattern with a mixture density requires
only a few attributes and methods:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">gauss_mixture</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">D</span><span class="p">))</span>
<span class="n">samples</span> <span class="o">=</span> <span class="n">gauss_mixture</span><span class="o">.</span><span class="n">propose</span><span class="p">(</span><span class="n">N</span><span class="o">=</span><span class="mi">500</span><span class="p">)</span>
<span class="n">D</span> <span class="o">=</span> <span class="n">gauss_mixture</span><span class="o">.</span><span class="n">dim</span>
<span class="n">first_component</span> <span class="o">=</span> <span class="n">gauss_mixture</span><span class="o">.</span><span class="n">components</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">second_weight</span> <span class="o">=</span> <span class="n">gauss_mixture</span><span class="o">.</span><span class="n">weight</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
</pre></div>
</div>
</section>
</section>
<section id="indicator-function">
<span id="indicator"></span><h2><span class="section-number">3.2. </span>Indicator function<a class="headerlink" href="#indicator-function" title="Permalink to this heading"></a></h2>
<p>The indicator function <span class="math notranslate nohighlight">\(\mathbf{1}_V\)</span> can be used to limit the
support of the target density to the volume <span class="math notranslate nohighlight">\(V\)</span> in the samplers
discussed below. It is defined as</p>
<div class="math notranslate nohighlight">
\[\begin{split}\mathbf{1}_{V} (x) =
\begin{cases}
1, x \in V \\
0, {\rm else}
\end{cases}\end{split}\]</div>
<p>The <a class="reference internal" href="api.html#module-pypmc.tools.indicator" title="pypmc.tools.indicator"><code class="xref py py-mod docutils literal notranslate"><span class="pre">indicator</span></code></a> module provides indicator
functions for a <a class="reference internal" href="api.html#pypmc.tools.indicator.ball" title="pypmc.tools.indicator.ball"><code class="xref py py-func docutils literal notranslate"><span class="pre">ball</span></code></a> and a
<a class="reference internal" href="api.html#pypmc.tools.indicator.hyperrectangle" title="pypmc.tools.indicator.hyperrectangle"><code class="xref py py-func docutils literal notranslate"><span class="pre">hyperrectangle</span></code></a> in <span class="math notranslate nohighlight">\(D\)</span> dimensions.
The indicator function can be merged with the (unbounded) target
density such that the wrapper calls the target density only if the
parameter vector is in V and returns <span class="math notranslate nohighlight">\(\log(0)= -\infty\)</span> otherwise:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pypmc.tools.indicator</span> <span class="kn">import</span> \
<span class="n">merge_function_with_indicator</span>
<span class="k">def</span> <span class="nf">target_density</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="c1"># define unbounded density on log scale</span>
<span class="c1"># define indicator</span>
<span class="n">ind_lower</span> <span class="o">=</span> <span class="p">[</span><span class="n">p</span><span class="o">.</span><span class="n">range_min</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">priors</span><span class="p">]</span>
<span class="n">ind_upper</span> <span class="o">=</span> <span class="p">[</span><span class="n">p</span><span class="o">.</span><span class="n">range_max</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">priors</span><span class="p">]</span>
<span class="n">ind</span> <span class="o">=</span> <span class="n">pypmc</span><span class="o">.</span><span class="n">tools</span><span class="o">.</span><span class="n">indicator</span><span class="o">.</span><span class="n">hyperrectangle</span><span class="p">(</span><span class="n">ind_lower</span><span class="p">,</span> <span class="n">ind_upper</span><span class="p">)</span>
<span class="c1"># merge with indicator</span>
<span class="n">log_target</span> <span class="o">=</span> <span class="n">merge_function_with_indicator</span><span class="p">(</span><span class="n">target_density</span><span class="p">,</span> <span class="n">ind</span><span class="p">,</span> <span class="o">-</span><span class="n">np</span><span class="o">.</span><span class="n">inf</span><span class="p">)</span>
</pre></div>
</div>
</section>
<section id="markov-chain">
<h2><span class="section-number">3.3. </span>Markov chain<a class="headerlink" href="#markov-chain" title="Permalink to this heading"></a></h2>
<section id="initialization">
<h3><span class="section-number">3.3.1. </span>Initialization<a class="headerlink" href="#initialization" title="Permalink to this heading"></a></h3>
<p>We provide a generic implementation of adaptive local-random-walk MCMC
<a class="reference internal" href="references.html#hst01" id="id1"><span>[HST01]</span></a> featuring Gauss and Student’s t local proposals. To create a
<a class="reference internal" href="api.html#pypmc.sampler.markov_chain.MarkovChain" title="pypmc.sampler.markov_chain.MarkovChain"><code class="xref py py-class docutils literal notranslate"><span class="pre">MarkovChain</span></code></a>, one needs three ingredients:</p>
<ol class="arabic simple">
<li><p>Evaluate the target density on the log scale.</p></li>
<li><p>A local proposal density.</p></li>
<li><p>A valid initial point.</p></li>
</ol>
<p>For example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pypmc.density.student_t.LocalStudentT</span>
<span class="kn">import</span> <span class="nn">pypmc.sampler.markov_chain.AdaptiveMarkovChain</span>
<span class="c1"># unit gaussian, unnormalized</span>
<span class="k">def</span> <span class="nf">log_target</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="o">-</span><span class="mf">0.5</span> <span class="o">*</span> <span class="n">x</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">prop</span> <span class="o">=</span> <span class="n">LocalStudentT</span><span class="p">(</span><span class="n">prop_sigma</span><span class="p">,</span> <span class="n">prop_dof</span><span class="p">)</span>
<span class="n">start</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mf">2.</span><span class="p">,</span> <span class="mf">10.</span><span class="p">])</span>
<span class="n">mc</span> <span class="o">=</span> <span class="n">AdaptiveMarkovChain</span><span class="p">(</span><span class="n">log_target</span><span class="p">,</span> <span class="n">prop</span><span class="p">,</span> <span class="n">start</span><span class="p">)</span>
</pre></div>
</div>
<p>The initial proposal covariance should be chosen similar to the
target’s covariance, but scaled to yield an acceptance rate in the
range of 20%. For a Gaussian target and a Gaussian proposal in
<span class="math notranslate nohighlight">\(D\)</span> dimensions, the scaling should be <span class="math notranslate nohighlight">\(2.38^2/D\)</span>.</p>
<p>In order to constrain the support of the target in a simple way, one
can pass an <a class="reference internal" href="api.html#module-pypmc.tools.indicator" title="pypmc.tools.indicator"><code class="xref py py-class docutils literal notranslate"><span class="pre">indicator</span></code></a> function to the
constructor using the keyword argument <code class="docutils literal notranslate"><span class="pre">ind=indicator</span></code>. Then any
proposed point is first checked to lie in the support; i.e., if
<code class="docutils literal notranslate"><span class="pre">indicator(x)</span> <span class="pre">==</span> <span class="pre">True</span></code>. Only then is the target density called. This
leads to significant speed-ups if the mass of the target density is
close to a boundary, and its evaluation is slow.</p>
</section>
<section id="adaptation">
<h3><span class="section-number">3.3.2. </span>Adaptation<a class="headerlink" href="#adaptation" title="Permalink to this heading"></a></h3>
<p>The prototypical use is to run the chain for a number of iterations
until it finds the bulk of the distribution, and to discard these
samples as burn-in or warm-up. Then the samples can be used to tune
the proposal covariance:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">mc</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="mi">10</span><span class="o">**</span><span class="mi">4</span><span class="p">)</span>
<span class="n">mc</span><span class="o">.</span><span class="n">clear</span><span class="p">()</span>
<span class="c1"># run 100,000 steps adapting the proposal every 500 steps</span>
<span class="c1"># hereby save the accept count which is returned by mc.run</span>
<span class="n">accept_count</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">200</span><span class="p">):</span>
<span class="n">accept_count</span> <span class="o">+=</span> <span class="n">mc</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="mi">500</span><span class="p">)</span>
<span class="n">mc</span><span class="o">.</span><span class="n">adapt</span><span class="p">()</span>
</pre></div>
</div>
<p>Note that the proposal can be tuned continuously; the Markov property
is then lost, but the samples are still asymptotically distributed according
to the target; i.e., there is no need to fix the proposal to generate
valid samples.</p>
<p>Parameters such as the desired minimum and maximum acceptance rate
can be set via
<code class="xref py py-meth docutils literal notranslate"><span class="pre">set_adapt_params</span></code>.</p>
</section>
</section>
<section id="importance-sampling">
<h2><span class="section-number">3.4. </span>Importance sampling<a class="headerlink" href="#importance-sampling" title="Permalink to this heading"></a></h2>
<section id="standard">
<h3><span class="section-number">3.4.1. </span>Standard<a class="headerlink" href="#standard" title="Permalink to this heading"></a></h3>
<p>The standard
<a class="reference internal" href="api.html#pypmc.sampler.importance_sampling.ImportanceSampler" title="pypmc.sampler.importance_sampling.ImportanceSampler"><code class="xref py py-class docutils literal notranslate"><span class="pre">ImportanceSampler</span></code></a>
implements serial importance sampling to compute the expectation of
some function <span class="math notranslate nohighlight">\(f\)</span> under the target <span class="math notranslate nohighlight">\(P\)</span> as</p>
<div class="math notranslate nohighlight" id="fundamental-is">
\[E_P[f] = \int dx P(x) f(x) \approx \frac{1}{N} \sum_{i=1}^N P(x_i) / q(x_i) f(x_i)=\frac{1}{N} \sum_{i=1}^N w_i f(x_i), x \sim q,\]</div>
<p>where <span class="math notranslate nohighlight">\(w_i\)</span> is the importance weight and <span class="math notranslate nohighlight">\(q\)</span> is the
proposal density.</p>
<p>To start, one only needs the target density <span class="math notranslate nohighlight">\(P\)</span> defined by a
function that computes <span class="math notranslate nohighlight">\(\log P(x)\)</span> for an input vector
<span class="math notranslate nohighlight">\(x\)</span>, and similarly for <span class="math notranslate nohighlight">\(q\)</span>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pypmc.sampler.importance_sampling.ImportanceSampler</span>
<span class="n">sampler</span> <span class="o">=</span> <span class="n">ImportanceSampler</span><span class="p">(</span><span class="n">log_target</span><span class="p">,</span> <span class="n">log_proposal</span><span class="p">)</span>
</pre></div>
</div>
<p>Optionally, the <code class="docutils literal notranslate"><span class="pre">sampler</span></code> accepts an <a class="reference internal" href="api.html#module-pypmc.tools.indicator" title="pypmc.tools.indicator"><code class="xref py py-class docutils literal notranslate"><span class="pre">indicator</span></code></a>;
see <a class="reference internal" href="#indicator"><span class="std std-ref">Indicator function</span></a>. What to do with <code class="docutils literal notranslate"><span class="pre">sampler</span></code>? Run it:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">sampler</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">N</span><span class="o">=</span><span class="mi">500</span><span class="p">)</span>
</pre></div>
</div>
<p>to draw 500 samples. If the proposal is a
<a class="reference internal" href="api.html#pypmc.density.mixture.MixtureDensity" title="pypmc.density.mixture.MixtureDensity"><code class="xref py py-class docutils literal notranslate"><span class="pre">MixtureDensity</span></code></a> and the option
<code class="docutils literal notranslate"><span class="pre">trace_sort=True</span></code>, then <code class="docutils literal notranslate"><span class="pre">run</span></code> returns the generating component for
each sample.</p>
<p>The samples and weights are stored in two
<code class="xref py py-attr docutils literal notranslate"><span class="pre">history</span></code>
objects:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">samples</span> <span class="o">=</span> <span class="n">sampler</span><span class="o">.</span><span class="n">samples</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">sampler</span><span class="o">.</span><span class="n">weights</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</pre></div>
</div>
<p>Note that a <a class="reference internal" href="api.html#pypmc.tools.History" title="pypmc.tools.History"><code class="xref py py-class docutils literal notranslate"><span class="pre">History</span></code></a> object can contain the output
of several runs; the last one is available as <code class="docutils literal notranslate"><span class="pre">history[-1]</span></code>.</p>
<p>The samples are ordered according to the generating component if
<cite>trace_sort=True</cite>.</p>
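<p>With the samples and weights retrieved above, the fundamental importance-sampling
estimate from the beginning of this section is a one-liner; the function
<span class="math notranslate nohighlight">\(f\)</span> below is an arbitrary example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># E_P[f] ~ 1/N sum_i w_i f(x_i); here f(x) = x_0 as an arbitrary example
w = np.ravel(weights)
f_values = samples[:, 0]
estimate = np.mean(w * f_values)
# if the target is only known up to normalization, use the self-normalized
# estimator instead: np.sum(w * f_values) / np.sum(w)
</pre></div>
</div>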
</section>
<section id="deterministic-mixture">
<h3><span class="section-number">3.4.2. </span>Deterministic mixture<a class="headerlink" href="#deterministic-mixture" title="Permalink to this heading"></a></h3>
<p>If weighted samples from the same target but different proposal
densities are available, the weights can be combined in a clever way
as though they were drawn from the mixture of individual proposals
<a class="reference internal" href="references.html#cor-12" id="id2"><span>[Cor+12]</span></a>. This preserves the unbiasedness of the <a class="reference internal" href="#fundamental-is"><span class="std std-ref">fundamental
estimate of importance sampling</span></a>. The motivation to
combine multiple proposals is to improve the variance of the estimator
by reducing the effect of <cite>outliers</cite>; i.e., samples with very large
weights in the tails of <span class="math notranslate nohighlight">\(q\)</span>. For proposals <span class="math notranslate nohighlight">\(\{q_l:
l=1 \dots T\}\)</span> and <span class="math notranslate nohighlight">\(N_l\)</span> available samples per proposal, the
combined importance weight of sample <span class="math notranslate nohighlight">\(x\)</span> becomes</p>
<div class="math notranslate nohighlight">
\[\frac{P(x)}{\frac{1}{\sum_{k=1}^T N_k} \sum_{l=1}^T N_l q_l(x)} \,.\]</div>
<p>The function
<a class="reference internal" href="api.html#pypmc.sampler.importance_sampling.combine_weights" title="pypmc.sampler.importance_sampling.combine_weights"><code class="xref py py-class docutils literal notranslate"><span class="pre">combine_weights</span></code></a> takes the
samples and regular importance weights as lists of arrays and the
proposals as a list and returns the combined weights as
<a class="reference internal" href="api.html#pypmc.tools.History" title="pypmc.tools.History"><code class="xref py py-class docutils literal notranslate"><span class="pre">History</span></code></a> object such that the weights for each
proposal are easily accessible.</p>
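<p>A sketch of the calling convention just described, assuming the samples, weights,
and proposals of two earlier runs are available under the hypothetical names
<code class="docutils literal notranslate"><span class="pre">samples1</span></code>, <code class="docutils literal notranslate"><span class="pre">weights1</span></code>, <code class="docutils literal notranslate"><span class="pre">prop1</span></code>, etc.:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>from pypmc.sampler.importance_sampling import combine_weights

# samples and weights as lists of arrays, one entry per run, plus the proposals
combined = combine_weights([samples1, samples2],
                           [weights1, weights2],
                           [prop1, prop2])
# combined is a History object; e.g. the combined weights of the last run
weights_last_run = combined[-1]
</pre></div>
</div>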
</section>
<section id="comparison">
<h3><span class="section-number">3.4.3. </span>Comparison<a class="headerlink" href="#comparison" title="Permalink to this heading"></a></h3>
<p>Compared to the regular
<a class="reference internal" href="api.html#pypmc.sampler.importance_sampling.ImportanceSampler" title="pypmc.sampler.importance_sampling.ImportanceSampler"><code class="xref py py-class docutils literal notranslate"><span class="pre">ImportanceSampler</span></code></a>,
<code class="xref py py-class docutils literal notranslate"><span class="pre">combined_weights</span></code> requires
more memory and slightly more cpu, but usually increases the relative
effective sample size, and in most cases significantly increases the
total effective sample size compared to throwing away samples from all
but the last run. If the samples are all drawn from the <cite>same</cite>
proposal, then both samplers yield identical results.</p>
</section>
</section>
<section id="pmc">
<h2><span class="section-number">3.5. </span>PMC<a class="headerlink" href="#pmc" title="Permalink to this heading"></a></h2>
<p><em>Population Monte Carlo</em> <a class="reference internal" href="references.html#cap-08" id="id3"><span>[Cap+08]</span></a> is a class of algorithms designed
to approximate the target density by a mixture density. The basic idea
is to minimize the Kullback-Leibler divergence between the target and
the mixture by optimizing the mixture parameters. The expectation
values taken over the unknown target distribution are approximated by
importance sampling using samples from the proposal mixture; the set
of samples is the <em>population</em>. The algorithm is a form of expectation
maximization (EM) and yields the optimal values of the parameters of a
Gaussian or Student’s t mixture density. The crucial task (more on
this below) is to supply a good initial proposal.</p>
<section id="basic-approach">
<h3><span class="section-number">3.5.1. </span>Basic approach<a class="headerlink" href="#basic-approach" title="Permalink to this heading"></a></h3>
<p>In the simplest scheme, new samples are drawn from the proposal
<span class="math notranslate nohighlight">\(q\)</span> in each iteration, importance weights computed, and only one
EM step is performed to tune the mixture parameters of the
proposal. Then new samples are drawn, and the updating is iterated
until a user-defined maximum number of steps or some heuristic
convergence criterion is reached <a class="reference internal" href="references.html#bc13" id="id4"><span>[BC13]</span></a>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pypmc.density.mixture.MixtureDensity</span>
<span class="kn">import</span> <span class="nn">pypmc.sampler.importance_sampling.ImportanceSampler</span>
<span class="kn">import</span> <span class="nn">pypmc.mix_adapt.pmc.gaussian_pmc</span>
<span class="n">initial_proposal</span> <span class="o">=</span> <span class="n">MixtureDensity</span><span class="p">(</span><span class="n">initial_prop_components</span><span class="p">)</span>
<span class="n">sampler</span> <span class="o">=</span> <span class="n">ImportanceSampler</span><span class="p">(</span><span class="n">log_target</span><span class="p">,</span> <span class="n">initial_proposal</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
<span class="n">generating_components</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sampler</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="mi">10</span><span class="o">**</span><span class="mi">3</span><span class="p">,</span> <span class="n">trace_sort</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span>
<span class="n">samples</span> <span class="o">=</span> <span class="n">sampler</span><span class="o">.</span><span class="n">samples</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">sampler</span><span class="o">.</span><span class="n">weights</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">gaussian_pmc</span><span class="p">(</span><span class="n">samples</span><span class="p">,</span> <span class="n">sampler</span><span class="o">.</span><span class="n">proposal</span><span class="p">,</span>
<span class="n">weights</span><span class="p">,</span>
<span class="n">latent</span><span class="o">=</span><span class="n">generating_components</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span>
<span class="n">mincount</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">rb</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">copy</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</pre></div>
</div>
<p>In the example code, we keep track of which sample came from which
component by passing the argument <code class="docutils literal notranslate"><span class="pre">trace_sort=True</span></code> to the
<code class="docutils literal notranslate"><span class="pre">sampler</span></code> that returns the indices from the <code class="docutils literal notranslate"><span class="pre">run</span></code> method. The PMC
update can use this information to prune irrelevant components that
contributed less than <code class="docutils literal notranslate"><span class="pre">mincount</span></code> samples. If <code class="docutils literal notranslate"><span class="pre">mincount=0</span></code>, the
pruning is disabled. This may lead to many components with vanishing
weights, which can slow down the PMC update, but otherwise does no
harm.</p>
<p>Note that in the actual parameter update, one needs the latent
variables but when using the recommended Rao-Blackwellization
(<code class="docutils literal notranslate"><span class="pre">rb=True</span></code>), the generating components are ignored, and the
corresponding latent variables are inferred from the data. This is
more time consuming, but leads to more robust fits <a class="reference internal" href="references.html#cap-08" id="id5"><span>[Cap+08]</span></a>. The
faster but less powerful variant <code class="docutils literal notranslate"><span class="pre">rb=False</span></code> then requires that the
generating components be passed to <code class="docutils literal notranslate"><span class="pre">latent</span></code>.</p>
<p>The keyword <code class="docutils literal notranslate"><span class="pre">copy=False</span></code> allows <code class="docutils literal notranslate"><span class="pre">gaussian_pmc</span></code> to update the
<code class="docutils literal notranslate"><span class="pre">density</span></code> in place.</p>
</section>
<section id="id6">
<h3><span class="section-number">3.5.2. </span>Student’s t<a class="headerlink" href="#id6" title="Permalink to this heading"></a></h3>
<p>A Student’s t distribution should be preferred over a Gaussian mixture
if one suspects long tails in the target density. In the original
proposal by Cappé et al. <a class="reference internal" href="references.html#cap-08" id="id7"><span>[Cap+08]</span></a>, the degrees of freedom of each
component, <span class="math notranslate nohighlight">\(\nu_k\)</span>, had to be set manually, and it was not
updated. To add more flexibility and put less burden on the user, we
update <span class="math notranslate nohighlight">\(\nu_k\)</span> by numerically solving equation 16 of <a class="reference internal" href="references.html#hod12" id="id8"><span>[HOD12]</span></a>,
which involves the digamma function.</p>
<p>The function <a class="reference internal" href="api.html#pypmc.mix_adapt.pmc.student_t_pmc" title="pypmc.mix_adapt.pmc.student_t_pmc"><code class="xref py py-func docutils literal notranslate"><span class="pre">student_t_pmc</span></code></a> is invoked
just like its Gaussian counterpart, but has three extra arguments to
limit the number of steps of the numerical solver
(<code class="docutils literal notranslate"><span class="pre">dof_solver_steps</span></code>), and to pass the allowed range of values of
<span class="math notranslate nohighlight">\(\nu_k\)</span> (<code class="docutils literal notranslate"><span class="pre">mindof,</span> <span class="pre">maxdof</span></code>). The Student’s t distribution
converges to the Gaussian distribution as <span class="math notranslate nohighlight">\(\nu_k \to \infty\)</span>,
but for practical purposes, <span class="math notranslate nohighlight">\(\nu_k \approx 30\)</span> is usually close
enough to <span class="math notranslate nohighlight">\(\infty\)</span> and thus provides a sufficient upper bound.</p>
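<p>This can be checked numerically, e.g. with SciPy (a small illustration, not part of pypmc):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>import numpy as np
from scipy.stats import norm, t

# for nu around 30 the Student's t pdf is already very close to the Gaussian pdf
x = np.linspace(-4., 4., 81)
max_abs_diff = np.max(np.abs(t.pdf(x, df=30) - norm.pdf(x)))
</pre></div>
</div>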
<p>For small problems (few samples/components), the numerical
solver may add a significant overhead to the overall time of one PMC
update. But since it adds flexibility, our recommendation is to start
with it and to only turn it off (<code class="docutils literal notranslate"><span class="pre">dof_solver_steps=0</span></code>) if the overhead is
intolerable.</p>
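<p>A sketch of such a call, mirroring the Gaussian example above (the proposal must then be a
Student's t mixture, and the numerical values of the extra arguments below are arbitrary choices):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>from pypmc.mix_adapt.pmc import student_t_pmc

# same calling pattern as gaussian_pmc, plus the degree-of-freedom options
student_t_pmc(samples, sampler.proposal, weights,
              latent=generating_components[-1],
              mincount=20, rb=True, copy=False,
              dof_solver_steps=100, mindof=1e-5, maxdof=1e3)
</pre></div>
</div>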
</section>
<section id="pmc-with-multiple-em-steps">
<h3><span class="section-number">3.5.3. </span>PMC with multiple EM steps<a class="headerlink" href="#pmc-with-multiple-em-steps" title="Permalink to this heading"></a></h3>
<p>In order to make the most out of the available samples, it is better
to run multiple EM update steps to infer the parameters of the mixture
density. The convergence criterion is the likelihood value given in
Eq. 5 of <a class="reference internal" href="references.html#cap-08" id="id9"><span>[Cap+08]</span></a>. Depending on the sought precision, several
hundred EM steps may be required. We advise the user to decide
based on the cost of computing new samples whether it is worth running
the EM for many iterations or if one gets better results by just
computing new samples for a mixture that is not quite at the (local)
optimum.</p>
<p>A <a class="reference internal" href="api.html#pypmc.mix_adapt.pmc.PMC" title="pypmc.mix_adapt.pmc.PMC"><code class="xref py py-class docutils literal notranslate"><span class="pre">PMC</span></code></a> object handles the convergence
testing for both Gaussian and Student’s t mixtures as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pmc</span> <span class="o">=</span> <span class="n">PMC</span><span class="p">(</span><span class="n">samples</span><span class="p">,</span> <span class="n">prop</span><span class="p">)</span>
<span class="n">pmc</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">iterations</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">rel_tol</span><span class="o">=</span><span class="mf">1e-10</span><span class="p">,</span> <span class="n">abs_tol</span><span class="o">=</span><span class="mf">1e-5</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
<p>This means that at most 1000 updates are performed, but the
updates stop earlier if convergence is reached. For
details, see <a class="reference internal" href="api.html#pypmc.mix_adapt.pmc.PMC.run" title="pypmc.mix_adapt.pmc.PMC.run"><code class="xref py py-meth docutils literal notranslate"><span class="pre">run</span></code></a> that calls
<a class="reference internal" href="api.html#pypmc.mix_adapt.pmc.gaussian_pmc" title="pypmc.mix_adapt.pmc.gaussian_pmc"><code class="xref py py-func docutils literal notranslate"><span class="pre">gaussian_pmc</span></code></a> or
<a class="reference internal" href="api.html#pypmc.mix_adapt.pmc.student_t_pmc" title="pypmc.mix_adapt.pmc.student_t_pmc"><code class="xref py py-func docutils literal notranslate"><span class="pre">student_t_pmc</span></code></a> to do the heavy lifting.</p>
</section>
</section>
<section id="variational-bayes">
<h2><span class="section-number">3.6. </span>Variational Bayes<a class="headerlink" href="#variational-bayes" title="Permalink to this heading"></a></h2>
<p>The general idea of variational Bayes is pedagogically explained in
<a class="reference internal" href="references.html#bis06" id="id10"><span>[Bis06]</span></a>, Ch. 10. In a nutshell, the unknown joint posterior density
of hidden (or latent) data <span class="math notranslate nohighlight">\(Z\)</span> and the parameters <span class="math notranslate nohighlight">\(\vecth\)</span>
is approximated by a distribution that factorizes as</p>
<div class="math notranslate nohighlight">
\[q(Z, \vecth) = q(Z) q(\vecth)\]</div>
<p>In our case, we assume the data <span class="math notranslate nohighlight">\(X\)</span> to be generated from a
mixture of Gaussians; i.e.,</p>
<div class="math notranslate nohighlight">
\[X \sim P(X|\vecth) = \prod_{i=1}^N \sum_k \alpha_k q_k(x_i|\vecth),\]</div>
<p>where the latent data have been marginalized out. The priors over the
parameters <span class="math notranslate nohighlight">\(\vecth\)</span> are chosen conjugate to the likelihood such
that the posterior <span class="math notranslate nohighlight">\(q(\vecth)\)</span> has the same functional form as
the prior. The prior and the variational posterior over <span class="math notranslate nohighlight">\(\vecth\)</span>
depend on hyperparameters <span class="math notranslate nohighlight">\(\vecgamma_0\)</span> and <span class="math notranslate nohighlight">\(\vecgamma\)</span>
respectively. The only difference between <span class="math notranslate nohighlight">\(P(\vecth)\)</span> and
<span class="math notranslate nohighlight">\(q(\vecth)\)</span> are the values of the hyperparameters, hence the
knowledge update due to the data <span class="math notranslate nohighlight">\(X\)</span> is captured by updating the
values of <span class="math notranslate nohighlight">\(\vecgamma\)</span>. In practice, this results in an
expectation-maximization-like algorithm that seeks to optimize the
lower bound of the evidence, or equivalently minimize the
Kullback-Leibler divergence <span class="math notranslate nohighlight">\(KL(q||P)\)</span>. The result of the
optimization is a <em>local</em> optimum <span class="math notranslate nohighlight">\(\vecgamma^{\ast}\)</span> that
depends rather sensitively on the starting values. In each step,
<span class="math notranslate nohighlight">\(q(Z)\)</span> and <span class="math notranslate nohighlight">\(q(\vecth)\)</span> are alternately updated.</p>
<p>Note that variational Bayes yields an approximation of the <cite>posterior</cite>
over the mixture parameters <span class="math notranslate nohighlight">\(q(\vecth | \vecgamma^{\ast})\)</span>,
while the output of PMC is an optimal value <span class="math notranslate nohighlight">\(\vecth^{\ast}\)</span>. So
in variational Bayes we can fully account for the uncertainty, while
in PMC we cannot. However, when we are forced to create <cite>one</cite> mixture
density based on <span class="math notranslate nohighlight">\(q(\vecth | \vecgamma^{\ast})\)</span>, we choose
<span class="math notranslate nohighlight">\(\vecth^{\ast}\)</span> at the mode; i.e.</p>
<div class="math notranslate nohighlight">
\[\vecth^{\ast} = \arg \max_{\vecth} q(\vecth | \vecgamma^{\ast}).\]</div>
<p>Perhaps the biggest advantage of variational Bayes over PMC is that we
can choose a prior that is noninformative but still prevents the usual
pathologies of maximum likelihood such as excessive model complexity
due to components that are responsible for only one sample and whose
covariance matrix shrinks to zero. Variational Bayes is very effective
at automatically determining a suitable number of components by
assigning weight zero to irrelevant components.</p>
<p>As opposed to PMC, variational Bayes has a natural convergence
criterion, the lower bound to the evidence. We propose to run as many
update steps as necessary until the change of the lower bound is less
than some user-configurable number. Often the smaller that number, the
more irrelevant components are removed.</p>
<p>We implement two variants of variational Bayes; both yield a posterior
over the parameters of a Gaussian mixture. In either case, one can
fully specify all hyperparameter values for both the prior and the
starting point of the posterior.</p>
<p>The <em>classic</em> version <a class="reference internal" href="references.html#bis06" id="id11"><span>[Bis06]</span></a> is the most well known and widely
used. It takes <span class="math notranslate nohighlight">\(N\)</span> samples as input. The <em>mixture reduction</em>
version <a class="reference internal" href="references.html#bgp10" id="id12"><span>[BGP10]</span></a> seeks to compress an input mixture of Gaussians to an
output mixture with fewer components. This variant arises as a
limiting case of the classic version.</p>
<section id="classic-version">
<span id="classic-vb"></span><h3><span class="section-number">3.6.1. </span>Classic version<a class="headerlink" href="#classic-version" title="Permalink to this heading"></a></h3>
<p>A basic example: draw samples from a standard Gaussian in 2D. Then run
variational Bayes to recover that Gaussian. Paste the following code into
your Python shell and you should get plots similar to those shown, up to the randomly drawn data points:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">from</span> <span class="nn">pypmc.mix_adapt.variational</span> <span class="kn">import</span> <span class="n">GaussianInference</span>
<span class="kn">from</span> <span class="nn">pypmc.tools</span> <span class="kn">import</span> <span class="n">plot_mixture</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="c1"># data points</span>
<span class="n">N</span> <span class="o">=</span> <span class="mi">500</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">normal</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">2</span><span class="o">*</span><span class="n">N</span><span class="p">)</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="c1"># maximum number of components in mixture</span>
<span class="n">K</span> <span class="o">=</span> <span class="mi">6</span>
<span class="n">vb</span> <span class="o">=</span> <span class="n">GaussianInference</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">components</span><span class="o">=</span><span class="n">K</span><span class="p">,</span>
<span class="n">alpha</span><span class="o">=</span><span class="mi">10</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">K</span><span class="p">),</span>
<span class="n">nu</span><span class="o">=</span><span class="mi">3</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">K</span><span class="p">))</span>
<span class="c1"># plot data and initial guess</span>
<span class="n">plt</span><span class="o">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">data</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">data</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="s1">'gray'</span><span class="p">)</span>
<span class="n">initial_mix</span> <span class="o">=</span> <span class="n">vb</span><span class="o">.</span><span class="n">make_mixture</span><span class="p">()</span>
<span class="n">plot_mixture</span><span class="p">(</span><span class="n">initial_mix</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s1">'gist_rainbow'</span><span class="p">)</span>
<span class="n">x_range</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
<span class="n">y_range</span> <span class="o">=</span> <span class="n">x_range</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="n">x_range</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylim</span><span class="p">(</span><span class="n">y_range</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_aspect</span><span class="p">(</span><span class="s1">'equal'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s1">'Initial'</span><span class="p">)</span>
<span class="c1"># compute variational Bayes posterior</span>
<span class="n">vb</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">prune</span><span class="o">=</span><span class="mf">0.5</span><span class="o">*</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">/</span> <span class="n">K</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="c1"># obtain most probable mixture and plot it</span>
<span class="n">mix</span> <span class="o">=</span> <span class="n">vb</span><span class="o">.</span><span class="n">make_mixture</span><span class="p">()</span>
<span class="n">plt</span><span class="o">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">data</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">data</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="s1">'gray'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="n">x_range</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylim</span><span class="p">(</span><span class="n">y_range</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_aspect</span><span class="p">(</span><span class="s1">'equal'</span><span class="p">)</span>
<span class="n">plot_mixture</span><span class="p">(</span><span class="n">mix</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s1">'gist_rainbow'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s1">'Final'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
<p>(<a class="reference download internal" download="" href="_downloads/c9eb61305bef74860085c67537d59b36/user_guide-1.py"><code class="xref download docutils literal notranslate"><span class="pre">Source</span> <span class="pre">code</span></code></a>, <a class="reference download internal" download="" href="_downloads/2d7494cd43f3f876b47b635c6f2056bc/user_guide-1.png"><code class="xref download docutils literal notranslate"><span class="pre">png</span></code></a>, <a class="reference download internal" download="" href="_downloads/6b5ec9b0af6336ede0afce3bbc6ca356/user_guide-1.hires.png"><code class="xref download docutils literal notranslate"><span class="pre">hires.png</span></code></a>, <a class="reference download internal" download="" href="_downloads/4b5f3844d5f86b47be21b187f40b0663/user_guide-1.pdf"><code class="xref download docutils literal notranslate"><span class="pre">pdf</span></code></a>)</p>
<figure class="align-default">
<img alt="_images/user_guide-1.png" class="plot-directive" src="_images/user_guide-1.png" />
</figure>
<section id="id13">
<h4><span class="section-number">3.6.1.1. </span>Initialization<a class="headerlink" href="#id13" title="Permalink to this heading"></a></h4>
<p>In more complicated examples, it may be necessary to give good
starting values to the means and covariances of the components in
order to accelerate convergence to a sensible solution. You can pass
this information when you create the
<a class="reference internal" href="api.html#pypmc.mix_adapt.variational.GaussianInference" title="pypmc.mix_adapt.variational.GaussianInference"><code class="xref py py-class docutils literal notranslate"><span class="pre">GaussianInference</span></code></a>
object. Internally, the info is forwarded to a call to
<a class="reference internal" href="api.html#pypmc.mix_adapt.variational.GaussianInference.set_variational_parameters" title="pypmc.mix_adapt.variational.GaussianInference.set_variational_parameters"><code class="xref py py-meth docutils literal notranslate"><span class="pre">set_variational_parameters</span></code></a>,
where all parameter names and symbols are explained in detail.</p>
<p>If an initial guess in the form of a Gaussian
<a class="reference internal" href="api.html#pypmc.density.mixture.MixtureDensity" title="pypmc.density.mixture.MixtureDensity"><code class="xref py py-class docutils literal notranslate"><span class="pre">MixtureDensity</span></code></a> is available, it can
be used to define the initial values via
<code class="docutils literal notranslate"><span class="pre">GaussianInference(...</span> <span class="pre">initial_guess=mixture)</span></code>.</p>
<p>Note that the <code class="docutils literal notranslate"><span class="pre">vb</span></code> object carries the posterior distribution of
hyperparameters describing a Gaussian mixture. Invoking
<code class="docutils literal notranslate"><span class="pre">make_mixture()</span></code> singles out the mixture at the mode of the
posterior. To have a well-defined mode, one needs <code class="docutils literal notranslate"><span class="pre">nu[k]</span> <span class="pre">></span> <span class="pre">d</span></code> and
<code class="docutils literal notranslate"><span class="pre">alpha[k]</span> <span class="pre">></span> <span class="pre">0</span></code> for at least one component <code class="docutils literal notranslate"><span class="pre">k</span></code>. We set <span class="math notranslate nohighlight">\(\nu=3\)</span>
such that the covariance at the mode of the Wishart distribution</p>
<div class="math notranslate nohighlight">
\[\boldsymbol{\Sigma} = (\nu - d) \boldsymbol{W}^{-1} = \boldsymbol{W}^{-1}\]</div>
<p>equals <span class="math notranslate nohighlight">\(\boldsymbol{W}^{-1}\)</span> for <span class="math notranslate nohighlight">\(d=2\)</span>. This allows us to
plot the initial guess. The default placement
<code class="docutils literal notranslate"><span class="pre">GaussianInference(...initial_guess="random")</span></code> is to randomly select
<code class="docutils literal notranslate"><span class="pre">K</span></code> data points and start with a Gaussian of unit covariance
there. <code class="docutils literal notranslate"><span class="pre">K</span></code> is the maximum number of components and has to be chosen
by user. A safe procedure is to choose <code class="docutils literal notranslate"><span class="pre">K</span></code> larger than desired, and
let variational Bayes figures out the right value.</p>
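<p>As a sketch, the constructor call from the example above can be written with the
default random placement made explicit (all keyword values as quoted in this section):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># explicit form of the default placement: pick K data points at random
# and start with a unit-covariance Gaussian at each of them
K = 6
vb = GaussianInference(data, components=K, initial_guess='random',
                       alpha=10*np.ones(K), nu=3*np.ones(K))
</pre></div>
</div>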
</section>
<section id="running">
<h4><span class="section-number">3.6.1.2. </span>Running<a class="headerlink" href="#running" title="Permalink to this heading"></a></h4>
<p>Running variational Bayes with <code class="docutils literal notranslate"><span class="pre">vb.run()</span></code> can take a while if you
have a lot of data points, lots of components, and high-dimensional
data. Monitor the progress with <code class="docutils literal notranslate"><span class="pre">verbose=True</span></code>.</p>
<p>The pruning (removal) of components is determined by the <code class="docutils literal notranslate"><span class="pre">prune</span></code>
keyword. After a VB update, every component is <em>responsible</em> for an
effective number of samples. If this number falls below the threshold set
by <code class="docutils literal notranslate"><span class="pre">prune</span></code>, the component is pruned. In our experiments, a good rule
of thumb to remove many components is to set the threshold to
<span class="math notranslate nohighlight">\(N/(2K)\)</span>, as in the example above where <code class="docutils literal notranslate"><span class="pre">prune=0.5*len(data)/K</span></code>.</p>
</section>
<section id="results">
<h4><span class="section-number">3.6.1.3. </span>Results<a class="headerlink" href="#results" title="Permalink to this heading"></a></h4>
<p>Continuing the example, you can inspect how all hyperparameters were
updated by the data:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">vb</span><span class="o">.</span><span class="n">prior_posterior</span><span class="p">()</span>
</pre></div>
</div>
<p>and you can check that the mean of the most probable Gaussian
(assuming the mixture only has one component) is close to zero and the
covariance is close to the identity matrix:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">mix</span> <span class="o">=</span> <span class="n">vb</span><span class="o">.</span><span class="n">make_mixture</span><span class="p">()</span>
<span class="n">mix</span><span class="o">.</span><span class="n">components</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">mu</span>
<span class="n">mix</span><span class="o">.</span><span class="n">components</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">sigma</span>
</pre></div>
</div>
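<p>A quick numerical check along these lines (a sketch only; the tolerances are chosen
loosely because the data points are random):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>import numpy as np

mix = vb.make_mixture()
component = mix.components[0]

# the mean should be close to the origin and the covariance close to the identity
print(np.allclose(component.mu, np.zeros(2), atol=0.2))
print(np.allclose(component.sigma, np.eye(2), atol=0.3))
</pre></div>
</div>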
</section>
</section>
</section>
<section id="mixture-reduction">
<h2><span class="section-number">3.7. </span>Mixture reduction<a class="headerlink" href="#mixture-reduction" title="Permalink to this heading"></a></h2>
<p>Let us suppose samples are fed into a clustering algorithm that yields
a Gaussian mixture. To save memory, we discard the samples and retain
only the mixture as a description of the data. Assume the same
procedure is carried out on different sets of samples from the same
parent distribution, and we end up with a collection of mixture
densities that contain similar information. How do we combine them?
Simply concatenating all components into one big mixture would be overly
complex, as similar information is stored in every mixture. How, then, can
we compress this collection into a single
Gaussian mixture with fewer components but similar descriptive power?
We provide two algorithms for this task, illustrated in the example
<a class="reference internal" href="examples.html#ex-mix-red"><span class="std std-ref">Mixture reduction</span></a>.</p>
<section id="hierarchical-clustering">
<h3><span class="section-number">3.7.1. </span>Hierarchical clustering<a class="headerlink" href="#hierarchical-clustering" title="Permalink to this heading"></a></h3>
<p>While the KL divergence between two Gaussians is known analytically,
the corresponding result between Gaussian mixtures is not known. The
<cite>hierarchical clustering</cite> described in <a class="reference internal" href="references.html#gr04" id="id14"><span>[GR04]</span></a> seeks to minimize an
ad-hoc function used as a proxy for the metric between two Gaussian
mixtures. The basic idea is very simple: map input components to
output components such that every component in the output mixture is
made up of an <cite>integer</cite> number of input components (<cite>regroup</cite>
step). Then update the output component weights, means, and
covariances (<cite>refit</cite> step). Continue until the metric is unchanged.</p>
<p>Note that this is a discrete problem: each input component is
associated with exactly one output component, so if the mapping does not
change, the metric does not change either. Output components can
only die out if they receive no input component. This is typically
rare, so the number of output components is essentially chosen by the
user, not by the algorithm implemented in
<a class="reference internal" href="api.html#pypmc.mix_adapt.hierarchical.Hierarchical" title="pypmc.mix_adapt.hierarchical.Hierarchical"><code class="xref py py-class docutils literal notranslate"><span class="pre">Hierarchical</span></code></a>. The user has to
supply the input mixture and an initial guess of the output mixture,
thereby defining the maximum number of components:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pypmc.mix_adapt.hierarchical</span> <span class="kn">import</span> <span class="n">Hierarchical</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">Hierarchical</span><span class="p">(</span><span class="n">input_mixture</span><span class="p">,</span> <span class="n">initial_guess</span><span class="p">)</span>
</pre></div>
</div>
<p>where both arguments are <a class="reference internal" href="api.html#pypmc.density.mixture.MixtureDensity" title="pypmc.density.mixture.MixtureDensity"><code class="xref py py-class docutils literal notranslate"><span class="pre">pypmc.density.mixture.MixtureDensity</span></code></a>
objects. To perform the clustering:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">h</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</pre></div>
</div>
<p>Optional arguments to <code class="xref py py-meth docutils literal notranslate"><span class="pre">pypmc.density.mixture.MixtureDensity.run</span></code>
are the tolerance by which the metric may change to declare
convergence (<code class="docutils literal notranslate"><span class="pre">eps</span></code>), whether to remove output components with zero
weight (<code class="docutils literal notranslate"><span class="pre">kill</span></code>), and the total number of (regroup + refit) steps
(<code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p>
</section>
<section id="vbmerge">
<h3><span class="section-number">3.7.2. </span>VBmerge<a class="headerlink" href="#vbmerge" title="Permalink to this heading"></a></h3>
<p>In <a class="reference internal" href="references.html#bgp10" id="id15"><span>[BGP10]</span></a>, a variational algorithm is derived in the limit of large
<span class="math notranslate nohighlight">\(N\)</span>, the total number of <cite>virtual</cite> input samples. That is, the
original samples are not required, only the mixtures. Hence the
clustering is much faster but less accurate compared to standard
variational Bayes. To create a
<a class="reference internal" href="api.html#pypmc.mix_adapt.variational.VBMerge" title="pypmc.mix_adapt.variational.VBMerge"><code class="xref py py-class docutils literal notranslate"><span class="pre">VBMerge</span></code></a> object, the required
inputs are a <a class="reference internal" href="api.html#pypmc.density.mixture.MixtureDensity" title="pypmc.density.mixture.MixtureDensity"><code class="xref py py-class docutils literal notranslate"><span class="pre">MixtureDensity</span></code></a>, the total
number of samples encoded in the mixture <span class="math notranslate nohighlight">\(N\)</span>, and the
maximum number of components <span class="math notranslate nohighlight">\(K\)</span> desired in the compressed
output mixture:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pypmc.mix_adapt.variational</span> <span class="kn">import</span> <span class="n">VBMerge</span>
<span class="n">VBMerge</span><span class="p">(</span><span class="n">input_mixture</span><span class="p">,</span> <span class="n">N</span><span class="p">,</span> <span class="n">K</span><span class="p">)</span>
</pre></div>
</div>
<p>As a rule of thumb, if <span class="math notranslate nohighlight">\(N\)</span> is not known, one should choose a large
number such as <span class="math notranslate nohighlight">\(N=10^4\)</span> to obtain decent results.</p>
<p>The classes <a class="reference internal" href="api.html#pypmc.mix_adapt.variational.VBMerge" title="pypmc.mix_adapt.variational.VBMerge"><code class="xref py py-class docutils literal notranslate"><span class="pre">VBMerge</span></code></a> and
<a class="reference internal" href="api.html#pypmc.mix_adapt.variational.GaussianInference" title="pypmc.mix_adapt.variational.GaussianInference"><code class="xref py py-class docutils literal notranslate"><span class="pre">GaussianInference</span></code></a> share the
same interface; please check <a class="reference internal" href="#classic-vb"><span class="std std-ref">Classic version</span></a>.</p>
<p>The great advantage compared to hierarchical clustering is that the
number of output components is chosen automatically. One starts with
(too) many components, updates, and removes those components with
small weight using the <code class="docutils literal notranslate"><span class="pre">prune</span></code> argument to
<a class="reference internal" href="api.html#pypmc.mix_adapt.variational.GaussianInference.run" title="pypmc.mix_adapt.variational.GaussianInference.run"><code class="xref py py-meth docutils literal notranslate"><span class="pre">pypmc.mix_adapt.variational.GaussianInference.run</span></code></a>.</p>
</section>
</section>
<section id="putting-it-all-together">
<h2><span class="section-number">3.8. </span>Putting it all together<a class="headerlink" href="#putting-it-all-together" title="Permalink to this heading"></a></h2>
<p>The examples in the next section show how to use the different
algorithms in practice. The most advanced example, <a class="reference internal" href="examples.html#ex-mcmc-vb"><span class="std std-ref">MCMC + variational Bayes</span></a>,
demonstrates how to combine various algorithms to integrate and sample
from a multimodal function:</p>
<blockquote>
<div><ol class="arabic simple">
<li><p>run multiple Markov chains to learn the local features of the
target density;</p></li>
<li><p>combine the samples into a mixture density with variational Bayes;</p></li>
<li><p>run importance sampling;</p></li>
<li><p>rerun variational Bayes on the importance samples;</p></li>
<li><p>repeat importance sampling with the improved proposal;</p></li>
<li><p>combine all samples with the deterministic-mixture approach.</p></li>
</ol>
</div></blockquote>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="installation.html" class="btn btn-neutral float-left" title="2. Installation" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="examples.html" class="btn btn-neutral float-right" title="4. Examples" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>© Copyright 2014-2021, Frederik Beaujean and Stephan Jahn and others.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>