
Commit eb1cc30

#1. Support sequence-to-sequence model
#2. Fix bug in softmax output layer when computing hidden layer value
#3. Refactoring code
1 parent ddcd55d commit eb1cc30

28 files changed: +1166 -338 lines

README.md

+37 -15
@@ -1,16 +1,14 @@
# RNNSharp
- RNNSharp is a toolkit of deep recurrent neural network which is widely used for many different kinds of tasks, such as sequence labeling. It's written by C# language and based on .NET framework 4.6 or above version.
+ RNNSharp is a deep recurrent neural network toolkit that is widely used for many different kinds of tasks, such as sequence labeling, sequence-to-sequence and so on. It is written in C# and requires .NET Framework 4.6 or above.

- This page will introduces you about what is RNNSharp, how it works and how to use it. To get the demo package, please access release page and download the package.
+ This page introduces what RNNSharp is, how it works and how to use it. To get the demo package, you can visit the release page.

## Overview
- RNNSharp supports many different types of deep recurrent neural network (aka DeepRNN) structures.In the aspect of historical memory, it supports BPTT(BackPropagation Through Time) and LSTM(Long Short-Term Memory) structures. And in respect of output layer structure, RNNSharp supports native output layer and recurrent CRFs[1]. In additional, RNNSharp also support forward RNN and bi-directional RNN structures.
+ RNNSharp supports many different types of deep recurrent neural network (aka DeepRNN) structures. For historical memory, it supports BPTT (BackPropagation Through Time) and LSTM (Long Short-Term Memory) structures. For the output layer, RNNSharp supports softmax, negative sampling softmax and recurrent CRFs [1]. In addition, RNNSharp also supports forward RNN and bi-directional RNN structures.

- For BPTT and LSTM, BPTT-RNN is usually called as "simple RNN", since the structure of its hidden layer node is very simple. It's not good at preserving long time historical memory. LSTM-RNN is more complex than BPTT-RNN, since its hidden layer node has inner-structure which helps it to save very long time historical memory. In general, LSTM has better performance than BPTT on longer sequences.
+ Comparing the two, BPTT-RNN is usually called "simple RNN", since the structure of its hidden layer node is very simple and it is not good at preserving long-term historical memory. LSTM-RNN is more complex, since its hidden layer node has an inner structure that preserves very long-term historical memory. In general, LSTM performs better than BPTT on longer sequences.
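
For readers who want the inner structure spelled out, the standard textbook LSTM cell update is shown below; RNNSharp's implementation may differ in details such as peephole connections, so treat this as background rather than the toolkit's exact formulas:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The cell state $c_t$ carries the long-term memory; the forget gate $f_t$ and input gate $i_t$ control how much of it is kept or overwritten, which is why LSTM copes with longer sequences better than the simple BPTT unit.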

- For native RNN output, many widely experiments and applications have proved that it has better results than tranditional algorithms, such as MMEM, for online sequence labeling tasks, such as speech recognition, auto suggestion and so on.
- For RNN-CRF, based on native RNN outputs and their transition, we compute CRF output for entire sequence. Compred with native RNN, RNN-CRF has better performance for many different types of sequence labeling tasks in offline, such as word segmentation, named entity recognition and so on. With the similar feature set, it has better performance than linear CRF.
+ For the output layer, the softmax output layer is the traditional type, widely used in online sequence labeling tasks such as speech recognition, auto suggestion and so on. The negative sampling softmax output layer is intended for tasks with a large output vocabulary, such as sequence generation (the sequence-to-sequence model). For recurrent CRF, RNNSharp computes the CRF output over the entire sequence from the softmax outputs and the tag transitions. Compared with a plain softmax output, RNN-CRF performs better on many offline sequence labeling tasks, such as word segmentation, named entity recognition and so on; with a similar feature set, it also outperforms a linear CRF.
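
To make the cost difference between the two softmax variants concrete, here is a minimal C# sketch; it is not RNNSharp's implementation, and every name in it is made up for illustration. The full softmax normalizes over the whole output vocabulary, while the negative sampling variant only normalizes over the gold token plus a handful of sampled negatives:

```csharp
// Illustrative only: contrasts full softmax (O(V) per token) with a sampled softmax
// (O(sampleSize) per token). None of these names come from RNNSharp.
using System;
using System.Collections.Generic;

static class SoftmaxSketch
{
    // Full softmax over the whole vocabulary.
    public static double[] FullSoftmax(double[] logits)
    {
        double max = double.MinValue;
        foreach (double l in logits) max = Math.Max(max, l);

        var probs = new double[logits.Length];
        double sum = 0.0;
        for (int i = 0; i < logits.Length; i++)
        {
            probs[i] = Math.Exp(logits[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.Length; i++) probs[i] /= sum;
        return probs;
    }

    // Sampled softmax: normalize only over the gold token and a few random negatives.
    public static double SampledGoldProbability(double[] logits, int goldIndex, int sampleSize, Random rng)
    {
        var candidates = new HashSet<int> { goldIndex };
        int wanted = Math.Min(sampleSize + 1, logits.Length);
        while (candidates.Count < wanted)
            candidates.Add(rng.Next(logits.Length));

        double sum = 0.0, gold = 0.0;
        foreach (int i in candidates)
        {
            double e = Math.Exp(logits[i] - logits[goldIndex]);
            sum += e;
            if (i == goldIndex) gold = e;
        }
        return gold / sum;
    }
}
```

With an output vocabulary of, say, 100,000 words, the sampled version touches only a few dozen entries per training token, which is why the negative sampling softmax output layer is used for tasks with a large output vocabulary such as sequence generation.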

For bi-directional RNN, the output result combines the result of both forward RNN and backward RNN. It usually has better performance than single-directional RNN.

@@ -20,8 +18,11 @@ Here is an example of deep bi-directional RNN-CRF network. It contains 3 hidden
Here is the inner structure of one bi-directional hidden layer.
![](https://github.com/zhongkaifu/RNNSharp/blob/master/RNNSharpLayer.jpg)

+ Here is the neural network for a sequence-to-sequence task. The "TokenN" nodes come from the source sequence, and "ELayerX-Y" are the auto-encoder's hidden layers. The auto-encoder is defined in the feature configuration file. "<s>" always marks the beginning of the target sentence, and "DLayerX-Y" are the decoder's hidden layers. The decoder generates one token at a time until "</s>" is generated.
+ ![](https://github.com/zhongkaifu/RNNSharp/blob/master/RNNSharpSeq2Seq.jpg)
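
The decoding procedure described above (emit one token at a time until "</s>" is produced) can be pictured roughly as follows; this is a hypothetical illustration, not RNNSharp's API, and predictNextToken merely stands in for the trained encoder-decoder:

```csharp
// Hypothetical greedy decoding loop for a sequence-to-sequence model.
// predictNextToken is assumed to return the most likely next target token
// given the source encoding and the target tokens generated so far.
using System;
using System.Collections.Generic;

static class DecodeSketch
{
    public static List<string> Decode(Func<IReadOnlyList<string>, string> predictNextToken, int maxLength = 100)
    {
        var target = new List<string> { "<s>" };           // decoding always starts from "<s>"
        while (target.Count < maxLength)
        {
            string next = predictNextToken(target);
            if (next == "</s>")                             // stop once the end marker is produced
                break;
            target.Add(next);
        }
        target.RemoveAt(0);                                 // drop the "<s>" start marker
        return target;
    }
}
```
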
## Supported Feature Types
- RNNSharp supports four types of feature set. They are template features, context template features, run time feature and word embedding features. These features are controlled by configuration file, the following paragraph will introduce what these features are and how to use them in configuration file.
+ RNNSharp supports four types of feature sets: template features, context template features, run time features and word embedding features. These features are controlled by the configuration file; the following paragraphs introduce how these features work.

## Template Features

@@ -155,9 +156,7 @@ Training corpus contains many records to describe what the model should be. For

In training file, each record can be represented as a matrix and ends with an empty line. In the matrix, each row describes one token and its features, and each column represents a feature in one dimension. In entire training corpus, the number of column must be fixed.

- When RNNSharp encodes, if the column size is N, according template file describes, the first N-1 columns will be used as input data for binary feature set generation and model training. The Nth column (aka last column) is the answer of current token, which the model should output.
- There is an example for named entity recognition task(The full training file is at release section, you can download it there):
+ Sequence labeling and sequence-to-sequence tasks use different training corpus formats. For sequence labeling tasks, the first N-1 columns are input features for training, and the Nth column (aka the last column) is the answer for the current token. Here is an example for a named entity recognition task (the full training file is in the release section and can be downloaded from there):

Word | Pos | Tag
-----------|------|----
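
As a hypothetical illustration of the column layout described above (not code from RNNSharp), one record line can be split into its feature columns and the answer tag like this:

```csharp
// Hypothetical sketch: split one whitespace-separated record line into
// the first N-1 feature columns and the Nth (last) column, which is the answer tag.
using System;
using System.Linq;

static class RecordSketch
{
    public static (string[] Features, string Tag) ParseToken(string line)
    {
        string[] columns = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return (columns.Take(columns.Length - 1).ToArray(), columns[columns.Length - 1]);
    }
}
```
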
@@ -195,13 +194,31 @@ The named entity type looks like "Position_NamedEntityType". "Position" is the w
ORGANIZATION : the name of one organization
LOCATION : the name of one location

+ For a sequence-to-sequence task, the training corpus format is different. Each sequence pair has two sections: one is the source sequence, the other is the target sequence. Here is an example:
+ 
+ Word
+ --------
+ What
+ is
+ your
+ name
+ ?
+ 
+ I
+ am
+ Zhongkai
+ Fu
+ 
+ In the above example, "What is your name ?" is the source sentence, and "I am Zhongkai Fu" is the target sentence generated by the RNNSharp seq-to-seq model. In the source sentence, besides word features, any other features can be added as well, such as the POS tag feature used in the sequence labeling example above.

## Test file format

- Test file has the similar format as training file. The only different between them is the last column. In test file, all columns are features for model decoding.
+ The test file has a similar format to the training file. For a sequence labeling task, the only difference between them is the last column: in the test file, all columns are features for model decoding. For a sequence-to-sequence task, the test file only contains the source sequence; the target sentence will be generated by the model.

- ## Tag Mapping File
+ ## Tag (Output Vocabulary) File

- This file contains available result tags of the model. For readable, RNNSharp uses tag name in corpus, however, for high efficiency in encoding and decoding, tag names are mapped into integer values. The mapping is defined in a file (-tagfile as parameter in console tool). Each line is one tag name.
+ For a sequence labeling task, this file contains the output tag set. For a sequence-to-sequence task, it is the output vocabulary file.
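
The file lists one entry per line, and the earlier wording of this section notes that names are mapped to integer values for efficient encoding and decoding. A rough, hypothetical loading sketch (not RNNSharp's actual loader) could look like this:

```csharp
// Hypothetical sketch: read the tag / output-vocabulary file, one entry per line,
// and assign each distinct name an integer index. Not RNNSharp's actual loader.
using System.Collections.Generic;
using System.IO;

static class TagFileSketch
{
    public static Dictionary<string, int> Load(string path)
    {
        var nameToIndex = new Dictionary<string, int>();
        foreach (string line in File.ReadLines(path))
        {
            string name = line.Trim();
            if (name.Length > 0 && !nameToIndex.ContainsKey(name))
                nameToIndex.Add(name, nameToIndex.Count);
        }
        return nameToIndex;
    }
}
```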

## Console Tool

@@ -232,11 +249,16 @@ RNNSharpConsole.exe -mode train <parameters>
-savestep <int>: save temporary model after every <int> sentence, default is 0
-dir <int> : RNN directional: 0 - Forward RNN, 1 - Bi-directional RNN, default is 0
-vq <int> : Model vector quantization, 0 is disable, 1 is enable. default is 0
+ -seq2seq <boolean> : train a sequence-to-sequence model if true, otherwise train a sequence labeling model. Default is false

- Example: RNNSharpConsole.exe -mode train -trainfile train.txt -validfile valid.txt -modelfile model.bin -ftrfile features.txt -tagfile tags.txt -hiddenlayertype BPTT -outputlayertype softmax -layersize 200,100 -alpha 0.1 -crf 1 -maxiter 20 -savestep 200K -dir 1 -vq 0 -grad 15.0
+ Example for sequence labeling task: RNNSharpConsole.exe -mode train -trainfile train.txt -validfile valid.txt -modelfile model.bin -ftrfile features.txt -tagfile tags.txt -hiddenlayertype BPTT -outputlayertype softmax -layersize 200,100 -alpha 0.1 -crf 1 -maxiter 20 -savestep 200K -dir 1 -vq 0 -grad 15.0

This command trains a bi-directional recurrent neural network with CRF output. The network has two BPTT hidden layers and one softmax output layer. The first hidden layer size is 200 and the second hidden layer size is 100.

+ Example for sequence-to-sequence task: RNNSharpConsole.exe -mode train -trainfile train.txt -modelfile model.bin -ftrfile features_seq2seq.txt -tagfile tags.txt -hiddenlayertype lstm -outputlayertype ncesoftmax -ncesamplesize 20 -layersize 300 -alpha 0.1 -crf 0 -maxiter 0 -savestep 200K -dir 0 -dropout 0 -seq2seq true
+ 
+ This command trains a forward-directional sequence-to-sequence LSTM model with a negative sampling softmax output layer. The encoder is defined in the [AUTOENCODER_XXX] section of the features_seq2seq.txt file.

### Decode Model

In this mode, the console tool is used to predict output tags of given corpus. The usage as follows:

RNNSharp.sln

+2 -1
@@ -1,7 +1,7 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 14
- VisualStudioVersion = 14.0.24720.0
+ VisualStudioVersion = 14.0.25420.1
MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "RNNSharp", "RNNSharp\RNNSharp.csproj", "{1D37FF75-66F5-4814-AE48-EBF5CB9A56BF}"
EndProject
@@ -17,6 +17,7 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution
README.md = README.md
RNNSharpLayer.jpg = RNNSharpLayer.jpg
RNNSharpOverview.jpg = RNNSharpOverview.jpg
+ RNNSharpSeq2Seq.jpg = RNNSharpSeq2Seq.jpg
EndProjectSection
EndProject
Global

RNNSharp/BPTTLayer.cs

+9 -10
@@ -4,6 +4,7 @@
using System.Runtime.CompilerServices;
using System.IO;
using AdvUtils;
+ using System.Collections.Generic;
/// <summary>
/// RNNSharp written by Zhongkai Fu ([email protected])
/// </summary>
@@ -196,10 +197,9 @@ public override void computeLayer(SparseVector sparseFeature, double[] denseFeat
if (SparseFeatureSize > 0)
{
double[] vector_b = SparseWeights[b];
- for (int i = 0; i < SparseFeature.Count; i++)
+ foreach (KeyValuePair<int, float> pair in SparseFeature)
{
- var entry = SparseFeature.GetEntry(i);
- score += entry.Value * vector_b[entry.Key];
+ score += pair.Value * vector_b[pair.Key];
}
}
cellOutput[b] += score;
@@ -324,10 +324,9 @@ private void learnBptt()
{
//sparse weight update hidden->input
vector_a = SparseWeightsDelta[a];
- for (i = 0; i < sparse.Count; i++)
+ foreach (KeyValuePair<int, float> pair in sparse)
{
- var entry = sparse.GetEntry(i);
- vector_a[entry.Key] += er2 * entry.Value;
+ vector_a[pair.Key] += er2 * pair.Value;
}
}

@@ -396,7 +395,7 @@ private void learnBptt()
vecDelta = RNNHelper.NormalizeGradient(vecDelta);

//Computing learning rate and update its weights
- Vector<double> vecLearningRate = RNNHelper.ComputeLearningRate(vecDelta, ref vecLearningRateWeights);
+ Vector<double> vecLearningRate = RNNHelper.UpdateLearningRate(vecDelta, ref vecLearningRateWeights);
vecLearningRateWeights.CopyTo(vector_lr, i);

//Update weights
@@ -439,7 +438,7 @@ private void learnBptt()
vecDelta = RNNHelper.NormalizeGradient(vecDelta);

//Computing learning rate and update its weights
- Vector<double> vecLearningRate = RNNHelper.ComputeLearningRate(vecDelta, ref vecLearningRateWeights);
+ Vector<double> vecLearningRate = RNNHelper.UpdateLearningRate(vecDelta, ref vecLearningRateWeights);
vecLearningRateWeights.CopyTo(vector_lr, i);

//Update weights
@@ -477,9 +476,9 @@ private void learnBptt()
if (sparse == null)
break;

- for (i = 0; i < sparse.Count; i++)
+ foreach (KeyValuePair<int, float> pair in sparse)
{
- int pos = sparse.GetEntry(i).Key;
+ int pos = pair.Key;

double delta = RNNHelper.NormalizeGradient(vector_bf[pos]);
double newLearningRate = RNNHelper.UpdateLearningRate(SparseWeightsLearningRate, b, pos, delta);
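
The ComputeLearningRate to UpdateLearningRate rename above suggests a per-weight adaptive learning rate that is updated as gradients arrive. As a hedged illustration of one common scheme (an AdaGrad-style update; this is an assumption, not the body of RNNHelper.UpdateLearningRate):

```csharp
// Assumed AdaGrad-style per-weight learning rate: accumulate squared gradients and
// shrink the step size as the history grows. Illustrative only, not RNNSharp's code.
using System;

static class LearningRateSketch
{
    public static double Update(double[][] sumSquaredGrads, int row, int col, double grad, double baseRate)
    {
        sumSquaredGrads[row][col] += grad * grad;
        return baseRate / (1.0 + Math.Sqrt(sumSquaredGrads[row][col]));
    }
}
```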
