Commit 09644db

committed
reganchor
Former-commit-id: 9e2ef2b [formerly 8403d9b] Former-commit-id: bbb2c55
1 parent 3b88888 commit 09644db

File tree

70 files changed

+2620
-0
lines changed

2014_acl_reganchor.tex

+120
@@ -0,0 +1,120 @@
%
% File acl2014.tex
%

%%
%% Based on the style files for ACL-2013, which were, in turn,
%% Based on the style files for ACL-2012, which were, in turn,
%% based on the style files for ACL-2011, which were, in turn,
%% based on the style files for ACL-2010, which were, in turn,
%% based on the style files for ACL-IJCNLP-2009, which were, in turn,
%% based on the style files for EACL-2009 and IJCNLP-2008...

%% Based on the style files for EACL 2006 by

%% and that of ACL 08 by Joakim Nivre and Noah Smith

\documentclass[11pt]{article}
\usepackage{style/acl2014}
\usepackage{times}
\usepackage{url}
\usepackage{latexsym}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\usepackage{booktabs}
\usepackage{algorithm}
\usepackage[noend]{algorithmic}
%\usepackage[caption=false]{subfig}
\usepackage[table]{xcolor}
\usepackage{subfigure}

\usepackage{style/mfirstuc}
\newcommand{\etal}[2]{\makefirstuc{#1}~et~al.~\cite{#1-#2}}
\newcommand{\cd}[1]{\bar{\bm{Q}}_{#1, \cdot} }
\newcommand{\citet}[1]{\newcite{#1}}

\newif\ifcomment\commentfalse
\input{style/preamble}

\newcommand{\red}[1]{{\color{red}{\bf #1}}}
\newcommand{\blue}[1]{{\color{blue}{\bf #1}}}
\newcommand{\green}[1]{{\color{green}{\bf #1}}}
\newcommand{\purple}[1]{{\color{purple}{\bf #1}}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\title{Anchors Regularized: Adding Robustness and Extensibility \\
to Scalable Topic-Modeling Algorithms}

\author{Thang Nguyen \\
iSchool and \abr{umiacs}, \\
University of Maryland \\
and National Library of Medicine, \\
National Institutes of Health \\
\email{[email protected]} \\\And
Yuening Hu \\
Computer Science \\
University of Maryland \\
\email{[email protected]} \\ \And
Jordan Boyd-Graber \\
iSchool and \abr{umiacs} \\
University of Maryland \\
\email{[email protected]} \\
}

\date{}



\begin{document}

%\maketitle

% TODO
% 1. Explain different corpora for TI
% 2. Hyperparameter selection for HL
% 3. Discussion of HL equivalence, VB and Gibbs competitive
% 4. Explain why NIPS has poor WIKITI
% 5. Remove informed prior equation
% 6. Rewrite final discussion

%\jbgcomment{Took a stab at improving the abstract, but not sure it's all the way
%there yet.}

\begin{abstract}
Spectral methods offer scalable alternatives to Markov chain Monte
Carlo and expectation maximization. However, these new methods lack
the rich priors associated with probabilistic models. We examine
Arora et al.'s anchor words algorithm for topic modeling and develop
new, regularized algorithms that not only mathematically resemble
Gaussian and Dirichlet priors but also improve the interpretability
of topic models. Our new regularization approaches make these
efficient algorithms more flexible; we also show that these methods can
be combined with informed priors.
\end{abstract}

\input{2014_acl_reganchor/sections/intro}
\input{2014_acl_reganchor/sections/background}
\input{2014_acl_reganchor/sections/model}
\input{2014_acl_reganchor/sections/experiments}
\input{2014_acl_reganchor/sections/discussion}
\input{2014_acl_reganchor/sections/conclusion}

\section*{Acknowledgments}

We would like to thank the anonymous reviewers, Hal Daum\'e III, Ke Wu,
and Ke Zhai for their helpful comments. This work was supported by
\abr{nsf} Grant IIS-1320538. Boyd-Graber is also supported by
\abr{nsf} Grant CCF-1018625. Any opinions, findings, conclusions, or
recommendations expressed here are those of the authors and do not
necessarily reflect the view of the sponsor.

\newpage

%\bibliographystyle{style/icml2013}
\bibliographystyle{style/acl2014}
%\bibliographystyle{apalike}
%\footnotesize
\bibliography{bib/journal-full,bib/thang,bib/jbg,bib/ynhu}

\end{document}

2014_acl_reganchor/figures.R

Whitespace-only changes.
@@ -0,0 +1 @@
f3ef1ca20b78d9066d345f1ed397685f7b3573ab
@@ -0,0 +1 @@
b99ebb9b0977d5bcdb78d100af1fb2f0c23c7071
@@ -0,0 +1 @@
bb98e2cc9c6003dc0ac1b8327b40efdd6b668189
@@ -0,0 +1 @@
51b3fddc6a5dfc8c110d085be5443e47ba66bd35
@@ -0,0 +1 @@
7ff71ef45294e3196b3a5772698a51393d5c565b
@@ -0,0 +1 @@
5f34ca6f520f2b5f404b57cee95a24626972c7bd
@@ -0,0 +1 @@
2cac3f72fdb6cd9b25eee492f0ad370d2e850d05
@@ -0,0 +1 @@
929643606b09f827d6ec8131e9fa30768315f0c0
@@ -0,0 +1 @@
0e4dbd1fdad2ba1843a870cdb579dd2c98ff47e1
@@ -0,0 +1 @@
37b07a36f124e013d1937dc60e52ef6ea8c4f715
@@ -0,0 +1 @@
34b1ef3c16049f2b65051287c5ac6a7a10eaedb0
@@ -0,0 +1 @@
ea8ce91056ffd5d02ec57345c3277ebe5e144752
@@ -0,0 +1 @@
dc951b22efd866289e52857c49b445be264e397b
@@ -0,0 +1 @@
ea59da73ff2e256a491005381262afe56444e1e3
@@ -0,0 +1 @@
021b1ced8ee4aa70be85efe22a63d6dced0b6b21
@@ -0,0 +1 @@
7807fd34fd814e3eac3a84fa177b145c92414de5
@@ -0,0 +1 @@
051e622d0da12660659257c198fc4d6387907685
@@ -0,0 +1 @@
07f7aa3de444a1963b64cf48ce48ca8ea75728ce
6.3 KB
Binary file not shown.

2014_acl_reganchor/figures/DiffC.pdf

6.15 KB
Binary file not shown.
@@ -0,0 +1 @@
e9250684498f480de292b8e7cd7b9f0581fac0fb

2014_acl_reganchor/figures/HL.pdf

11.7 KB
Binary file not shown.

2014_acl_reganchor/figures/HL_HL.pdf

17.9 KB
Binary file not shown.

2014_acl_reganchor/figures/HL_L2.jpg

35.3 KB

2014_acl_reganchor/figures/HL_TI.pdf

12.2 KB
Binary file not shown.
@@ -0,0 +1 @@
980d48cc6187a013839572f8d7328a49599e8b7f
@@ -0,0 +1 @@
739e4bdbd0a19e185cdb23f956e330f31505e838

2014_acl_reganchor/figures/M_HL.pdf

5.2 KB
Binary file not shown.

2014_acl_reganchor/figures/M_TI.pdf

5.23 KB
Binary file not shown.
@@ -0,0 +1 @@
c67b1e28c45fdb142c64475d22c4da52d92bd917
@@ -0,0 +1 @@
c8f012f334dbf9489ac2b2804fe6d775f970b030
@@ -0,0 +1 @@
18f1409aef727c3134513370a57968e3b7eb1552
@@ -0,0 +1 @@
86965f939b41322590525594277b4fb6299fa198
6.43 KB
Binary file not shown.

2014_acl_reganchor/figures/TI.pdf

11.6 KB
Binary file not shown.

2014_acl_reganchor/figures/TI_HL.pdf

12.4 KB
Binary file not shown.

2014_acl_reganchor/figures/TI_L2.jpg

29.9 KB

2014_acl_reganchor/figures/TI_TI.pdf

17.9 KB
Binary file not shown.
17.9 KB
Binary file not shown.
92.8 KB
Binary file not shown.
Binary file not shown.
50.5 KB
41.2 KB
Binary file not shown.
10.1 KB
+73
@@ -0,0 +1,73 @@
# Compare per-topic word scores between two models (orig.txt vs. beta.txt).
library(ggplot2)
library(GGally)

# Parallel-coordinates plot of old vs. new values; word_label is currently unused.
parallel_plot <- function(top, bottom, old_col, new_col, word_label, topic_group) {
  whole <- rbind(top, bottom)
  p <- ggparcoord(data = whole, columns = c(old_col, new_col), scale = "globalminmax", groupColumn = topic_group)

  for (i in seq_len(nrow(whole))) {
    row <- whole[i, ]
    yval_old <- as.numeric(row[[old_col]])
    yval_new <- as.numeric(row[[new_col]])
    label <- as.character(row$word)
    # Label each line with its word just outside the left and right axes.
    p <- p + annotate("text", x = 0.9, y = yval_old, label = label, colour = "black")
    p <- p + annotate("text", x = 2.1, y = yval_new, label = label, colour = "black")
  }

  return(p)
}

# Input columns: V1 = anchor (topic), V2 = word, V3 = score.
orig <- read.table("orig.txt")
beta <- read.table("beta.txt")

beta <- data.frame(score = beta$V3,
                   type = "beta",
                   anchor = beta$V1,
                   word = beta$V2,
                   key = sprintf("%s_%s", beta$V1, beta$V2))

# rank() is ascending; flip it so higher-scoring words get smaller rank values.
beta$rank <- ave(beta$score, beta$anchor, FUN = rank)
beta$rank <- max(beta$rank) - beta$rank

orig <- data.frame(score = orig$V3,
                   type = "orig",
                   anchor = orig$V1,
                   word = orig$V2,
                   key = sprintf("%s_%s", orig$V1, orig$V2))

orig$rank <- ave(orig$score, orig$anchor, FUN = rank)
orig$rank <- max(orig$rank) - orig$rank

words <- rbind(orig, beta)
diffs <- merge(orig, beta, by = "key")

diffs$word <- diffs$word.x
diffs$anchor <- diffs$anchor.x
diffs$rank <- diffs$rank.x - diffs$rank.y
diffs$orig_rank <- diffs$rank.x
diffs$beta_rank <- diffs$rank.y
diffs$orig_score <- diffs$score.x
diffs$beta_score <- diffs$score.y
diffs$score <- diffs$score.x - diffs$score.y

density_plot <- ggplot(words, aes(rank, log(score))) + geom_line() + facet_grid(type ~ anchor) + ylim(c(-25, 0)) + ylab("log p(word|topic)") + xlab("Rank of word in topic") + scale_x_continuous(labels = NULL)

diff_rank <- diffs[order(diffs$rank), ]
diff_score <- diffs[order(diffs$score), ]

# Keep the num_words most extreme differences at each end of the ordering.
num_words <- 15
top_diff_rank <- head(diff_rank, num_words)
bottom_diff_rank <- tail(diff_rank, num_words)
top_diff_score <- head(diff_score, num_words)
bottom_diff_score <- tail(diff_score, num_words)

rank_diff <- parallel_plot(top_diff_rank, bottom_diff_rank, which(colnames(top_diff_rank) == "orig_rank"), which(colnames(top_diff_rank) == "beta_rank"), which(colnames(top_diff_rank) == "word"), which(colnames(top_diff_rank) == "anchor"))
score_diff <- parallel_plot(top_diff_score, bottom_diff_score, which(colnames(top_diff_score) == "orig_score"), which(colnames(top_diff_score) == "beta_score"), which(colnames(top_diff_score) == "word"), which(colnames(top_diff_score) == "anchor"))


3.56 KB
14.1 KB
Binary file not shown.
13.5 KB
Binary file not shown.
7.56 KB
Binary file not shown.
@@ -0,0 +1 @@
d71c33128faad75cd7fa7917d5d9e9290c074dea
45.8 KB
Binary file not shown.
16.7 KB
101 KB
