
Commit 044c163

Adding Dockerfile
1 parent ef67d43 commit 044c163

3 files changed: +63 -19 lines changed


.dockerignore (+5)

@@ -0,0 +1,5 @@
+__pycache__/
+.git/
+checkpoints/
+data/
+lib/

Dockerfile (+20)

@@ -0,0 +1,20 @@
+FROM ubuntu:16.04
+
+MAINTAINER Riddhiman Dasgupta <[email protected]>
+
+RUN apt-get update
+RUN apt-get install -y --no-install-recommends git curl wget ca-certificates bzip2 unzip openjdk-8-jdk-headless
+RUN apt-get -y autoclean && apt-get -y autoremove
+
+RUN curl -o /root/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+    chmod +x /root/miniconda.sh && \
+    /root/miniconda.sh -b && \
+    rm /root/miniconda.sh && \
+    /root/miniconda3/bin/conda clean -ya
+
+ENV PATH /root/miniconda3/bin:$PATH
+WORKDIR /root/treelstm.pytorch
+COPY requirements.txt .
+RUN ["/bin/bash", "-c", "pip install -r requirements.txt"]
+
+CMD ["/bin/bash"]
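As a quick sanity check of the image this Dockerfile produces (a sketch, not part of the commit; it reuses the `treelstm` tag from the README below and assumes you build from the repository root so `requirements.txt` is in the build context):

```
docker build -t treelstm .
# Miniconda is first on PATH inside the container, so this should report the conda-installed Python
docker run --rm treelstm python --version
```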

README.md (+38 -19)

@@ -1,33 +1,52 @@
 # Tree-Structured Long Short-Term Memory Networks
 This is a [PyTorch](http://pytorch.org/) implementation of Tree-LSTM as described in the paper [Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks](http://arxiv.org/abs/1503.00075) by Kai Sheng Tai, Richard Socher, and Christopher Manning. On the semantic similarity task using the SICK dataset, this implementation reaches:
-- Pearson's coefficient: `0.8492` and MSE: `0.2842` with learning rate of `0.010` and fine-tuned embeddings
-- Pearson's coefficient: `0.8674` and MSE: `0.2536` with learning rate of `0.025` and frozen embeddings
+- Pearson's coefficient: `0.8492` and MSE: `0.2842` using hyperparameters `--lr 0.010 --wd 0.0001 --optim adagrad --batchsize 25`
+- Pearson's coefficient: `0.8674` and MSE: `0.2536` using hyperparameters `--lr 0.025 --wd 0.0001 --optim adagrad --batchsize 25 --freeze_embed`
+- Pearson's coefficient: `0.8676` and MSE: `0.2532` are the numbers reported in the original paper.
+- Known differences include the way the gradients are accumulated (normalized by batchsize or not).
 
 ### Requirements
-- Python (tested on **2.7.13** and **3.6.3**)
-- [PyTorch](http://pytorch.org/) (tested on **0.1.12** and **0.2.0**)
-- [tqdm](https://github.com/tqdm/tqdm)
+- Python (tested on **3.6.4**, should work on **>=2.7**)
 - Java >= 8 (for Stanford CoreNLP utilities)
+- Other dependencies are in `requirements.txt`
 
 ### Usage
-- First run the script `./fetch_and_preprocess.sh`, which, as the name suggests, does two things:
-    - Fetch data, such as:
-        - [SICK dataset](http://alt.qcri.org/semeval2014/task1/index.php?id=data-and-tools) (semantic relatedness task)
-        - [Glove word vectors](http://nlp.stanford.edu/projects/glove/) (Common Crawl 840B) -- **Warning:** this is a 2GB download!
-        - [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml) and [Stanford POS Tagger](http://nlp.stanford.edu/software/tagger.shtml)
-    - Preprocess data, i.e. generate dependency parses using [Stanford Neural Network Dependency Parser](http://nlp.stanford.edu/software/nndep.shtml).
-- Run `python main.py` to try the Dependency Tree-LSTM from the paper to predict similarity for pairs of sentences on the SICK dataset. For a list of all command-line arguments, have a look at `config.py`.
-- The first run takes a few minutes to read and store the GLOVE embeddings for the words in the SICK vocabulary to a cache for future runs. In later runs, only the cache is read in.
+Before delving into how to run the code, here is a quick overview of the contents:
+- Use the script `fetch_and_preprocess.sh` to download the [SICK dataset](http://alt.qcri.org/semeval2014/task1/index.php?id=data-and-tools), the [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml) and [Stanford POS Tagger](http://nlp.stanford.edu/software/tagger.shtml), and the [Glove word vectors](http://nlp.stanford.edu/projects/glove/) (Common Crawl 840B -- **Warning:** this is a 2GB download!), and additionally preprocess the data, i.e. generate dependency parses using the [Stanford Neural Network Dependency Parser](http://nlp.stanford.edu/software/nndep.shtml).
+- `main.py` does the actual heavy lifting of training the model and testing it on the SICK dataset. For a list of all command-line arguments, have a look at `config.py`.
+- The first run caches GLOVE embeddings for words in the SICK vocabulary; later runs read only this cache.
 - Logs and model checkpoints are saved to the `checkpoints/` directory with the name specified by the command line argument `--expname`.
 
-### Results
-- Using hyperparameters `--lr 0.010 --wd 0.0001 --optim adagrad --batchsize 25` gives Pearson's coefficient of `0.8492` and MSE of `0.2842`
-- Using hyperparameters `--lr 0.025 --wd 0.0001 --optim adagrad --batchsize 25 --freeze_embed` gives Pearson's coefficient of `0.8674` and MSE of `0.2536`
-- In the original paper, the numbers reported include Pearson's coefficient of `0.8676` and MSE of `0.2532`
-
-Minor differences include the way the gradients are accumulated (normalized by batchsize or not) and embeddings are updated (frozen or fine-tuned).
+Next, these are the different ways to run the code in this repository to train a TreeLSTM model.
+#### Local Python Environment
+If you have a working Python 3 environment, simply run the following sequence of steps:
+```
+- bash fetch_and_preprocess.sh
+- pip install -r requirements.txt
+- python main.py
+```
+#### Pure Docker Environment
+If you want to use a Docker container, simply follow these steps:
+```
+- docker build -t treelstm .
+- docker run -it treelstm bash
+- bash fetch_and_preprocess.sh
+- python main.py
+```
+#### Local Filesystem + Docker Environment
+If you want to use a Docker container, but want to persist data and checkpoints in your local filesystem, simply follow these steps:
+```
+- bash fetch_and_preprocess.sh
+- docker build -t treelstm .
+- docker run -it --mount type=bind,source="$(pwd)",target="/root/treelstm.pytorch" treelstm bash
+- python main.py
+```
+**NOTE**: Setting the environment variable `OMP_NUM_THREADS=1` usually gives a speedup on the CPU. Use it like `OMP_NUM_THREADS=1 python main.py`. To run on a GPU, set the `CUDA_VISIBLE_DEVICES` environment variable instead. Usually, CUDA does not give much speedup here, since we are operating at a batchsize of `1`.
 
 ### Notes
+- (**Apr 02, 2018**) Added Dockerfile.
+- (**Apr 02, 2018**) Now works on **PyTorch 0.3.1** and **Python 3.6**; removed dependency on **Python 2.7**.
 - (**Nov 28, 2017**) Added **frozen embeddings**, closed gap to paper.
 - (**Nov 08, 2017**) Refactored model to get **1.5x - 2x speedup**.
 - (**Oct 23, 2017**) Now works with **PyTorch 0.2.0**.
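Putting the README's pieces together, a full training run that matches the frozen-embedding configuration above could look like the sketch below; every flag is taken from the README, while the `--expname` value is a made-up placeholder:

```
# CPU run with the hyperparameters reported for Pearson ~0.8674 / MSE ~0.2536
OMP_NUM_THREADS=1 python main.py --lr 0.025 --wd 0.0001 --optim adagrad --batchsize 25 --freeze_embed --expname sick_frozen
```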
