-
Notifications
You must be signed in to change notification settings - Fork 25
/
Copy path140. Word Break II.java
executable file
·312 lines (266 loc) · 11.3 KB
/
140. Word Break II.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
H
tags: DP, Backtracking, DFS, Memoization, Hash Table
time: O(n!)
space: O(n!)
找出所有 word break variations, given dictionary. (`Word Break I` only checks possibility)
利用 memoization: `Map<prefix, List<suffix variations>>`
#### DFS + Memoization, pick a prefix, and find a list of suffix candidates
- IMPORANT, Memoization: `Map<prefix, List<suffix variations>>` to build substring segments. Reduces repeated calculation if the substring has been tried.
- Realize the input s expands into a tree of possible prefixes.
- Find list of candidates from subproblem, and cross-match
- DFS returns List<String> segments of target s: every for loop takes a prefix substring, and append with all suffix (result of dfs)
- Time O(n!). Worst case, permutation of unique letters: `s= 'abcdef....'`, and `dict=[a,b,c,d,e,f...]`
#### Method2: DFS on suffix + memo of failed cases, like in WordBreakI
- DFS on string: find a valid prefix, dfs on the suffix, building individual candidate in list till substring exhaust.
- improvement:
- use memo to record failed case (solved the timeout issue explained below)
- use min/max to as boundary for dict check.
- core code is short; helper code is slightly longer
#### Method3: Regular DPs, kinda too slow
- 两个DP一起用, 解决了timeout的问题: when a invalid case 'aaaaaaaaa' occurs, isValid[] stops dfs from occuring
- 1. isWord[i][j], subString(i,j)是否存在dict中?
- 2. 用isWord加快 isValid[i]: [i ~ end]是否可以从dict中找到合理的解?
- 从末尾开始查看i:因为我们需要测试isWord[i][j]时候,j>i, 而我们观察的是[i,j]这区间;
- j>i的部分同样需要考虑,我们还需要知道isValid[0~j+1]。 所以isValid[x]这次是表示[x, end]是否valid的DP。
- i 从 末尾到0, 可能是因为考虑到isWord[i][j]都是在[0~n]之内,所以倒过来数,坐标比较容易搞清楚。
- (回头看Word Break I, 也有坐标反转的做法)
- 3. dfs 利用 isValid 和isWord做普通的DFS。
#### Timeout Note
- Regarding regular solution: 如果不做memoization或者dp, 'aaaaa....aaa' will repeatedly calculate same substring
- Regarding double DP solution: 在Word Break里面用了set.contains(...), 在isValid里面,i 从0开始. 但是, contains()本身是O(n); intead,用一个isWord[i][j],就O(1)判断了i~j是不是存在dictionary
```
/*
Given a non-empty string s and a dictionary wordDict containing a list of non-empty words,
add spaces in s to construct a sentence where each word is a valid dictionary word.
"Return all such possible sentences."
Note:
The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.
Example 1:
Input:
s = "catsanddog"
wordDict = ["cat", "cats", "and", "sand", "dog"]
Output:
[
"cats and dog",
"cat sand dog"
]
Example 2:
Input:
s = "pineapplepenapple"
wordDict = ["apple", "pen", "applepen", "pine", "pineapple"]
Output:
[
"pine apple pen apple",
"pineapple pen apple",
"pine applepen apple"
]
Explanation: Note that you are allowed to reuse a dictionary word.
Example 3:
Input:
s = "catsandog"
wordDict = ["cats", "dog", "sand", "and", "cat"]
Output:
[]
*/
// Simplier solution, memoization
class Solution {
Map<String, List<String>> memo;
public List<String> wordBreak(String s, List<String> dict) {
List<String> rst = new ArrayList<>();
memo = new HashMap<>();
return dfs(new HashSet<>(dict), s);
}
private List<String> dfs(Set<String> dict, String s) {
if (memo.containsKey(s)) return memo.get(s); // calculated, just return
List<String> rst = new ArrayList<>();
if (s.length() == 0) return rst;
if (dict.contains(s)) rst.add(s); // total match word
// loop over form index -> n, find candidates, validate, dfs
StringBuffer sb = new StringBuffer();
for (int i = 1; i < s.length(); i++) {
sb.append(s.charAt(i - 1));
if (!dict.contains(sb.toString())) {
continue;
}
String suffix = s.substring(i);
List<String> segments = dfs(dict, suffix);
for (String segment : segments) {
rst.add(sb.toString() + " " + segment);
}
}
memo.put(s, rst);
return rst;
}
}
// Just use memo, with void dfs.
class Solution {
Map<String, List<String>> memo = new HashMap<>();
public List<String> wordBreak(String s, List<String> dict) {
List<String> rst = new ArrayList<>();
if (s == null || s.length() == 0 || dict == null || dict.size() == 0) return rst;
dfs(new HashSet<>(dict), s);
return memo.get(s);
}
private void dfs(Set<String> dict, String s) {
List<String> rst = new ArrayList<>();
if (dict.contains(s)) rst.add(s); // match word, populate suffix variation list
// loop over form index -> n: set prefix and corss-match with all possible suffix variations
for (int i = 1; i < s.length(); i++) {
String prefix = s.substring(0, i);
if (!dict.contains(prefix)) continue; // validation with dict
String suffix = s.substring(i);
if (suffix.length() > 0 && !memo.containsKey(suffix)) { // if calculated, skip dfs
dfs(dict, suffix);
}
List<String> segments = memo.get(suffix);
for (String segment : segments) {
rst.add(prefix + " " + segment);
}
}
memo.put(s, rst); // save result
}
}
// Method2: DFS like in WordBreakI, building individual candidate in list till substring exhaust
// Test front sub, and dfs on end substring. Use list to track candidates.
// End state: when i == end of string && test it with map.
// apply the same improvement as in wordBreakI: existing failure check, min/max boundary.
class Solution {
int min = 0, max = 0;
public List<String>wordBreak(String s, List<String> wordDict) {
List<String> rst = new ArrayList<>();
Set<String> dict = new HashSet<>(wordDict);
buildBoundary(dict);
dfs(rst, new ArrayList<>(), dict, new HashSet<>(), s);
return rst;
}
// core
private void dfs(List<String> rst, List<String> list, Set<String> dict, Set<String> failureSet, String s) {
int rstSize = 0;
if (validate(dict, s)) {
list.add(s);
rst.add(output(list));
list.remove(list.size() - 1);
}
StringBuffer sb = new StringBuffer();
for (int i = 0; i < s.length(); i++) {
sb.append(s.charAt(i));
String front = sb.toString();
if (validate(dict, front)) {
String end = s.substring(i + 1);
if (failureSet.contains(end)) continue;
list.add(front);
dfs(rst, list, dict, failureSet, end);
list.remove(list.size() - 1);
}
}
if (rst.size() == rstSize) failureSet.add(s);
}
// improvement fns: test boundary & dict contains
private void buildBoundary(Set<String> dict) {
for (String s : dict) {
int len = s.length();
min = Math.min(min, len);
max = Math.max(max, len);
}
}
private boolean validate(Set<String> dict, String s) {
int n = s.length();
if (n > max || n < min) return false;
return dict.contains(s);
}
// Helpers:
private String output(List<String> list) {
StringBuffer sb = new StringBuffer();
for (String s : list) sb.append(s).append(" ");
return sb.toString().trim();
}
}
/*
Works but supper slow.
Thoughts:
Dict is a look up table. We need to backtrack on s && result list.
When iterating over s && worst case of each remaining letter is a word, time: n * (n - 1) * (n - 2) * (n - 3) ... + 1 => O(n!)
return result when s string is traversed completely.
Questions to ask:
1. all chars in s need to be used?
2. can we jump over chars in s? NO
3. can we resume item in dict? YES
Problem:
dict.contains() is O(logN), so overall time went up to O(N! * LogN). Should improve.
DP1:
Is (i,j) a valid word form dictionary? DP[i][j] represents true/false valid word.
DP2:
for s[i,j] to be a valid entry in the result, s[0, i] has to be validated as well.
dp[i] represents: up to ith index, all substring ahead are valid
*/
class Solution {
public List<String> wordBreak(String s, List<String> wordDict) {
List<String> result = new ArrayList<>();
if(s == null || s.length() == 0 || wordDict == null || wordDict.size() == 0) {
return result;
}
boolean[][] isWord = isWord(wordDict, s);
boolean[] isValid = validatePossibility(wordDict, isWord, s);
helper(result, new ArrayList<String>(), isWord, isValid, s, 0);
return result;
}
public void helper(List<String> result, List<String> list, boolean[][] isWord, boolean[] isValid, String s, int start) {
if (!isValid[start]) {
return;
}
if (start >= s.length()) {
StringBuffer sb = new StringBuffer();
for (String word : list) {
sb.append(word + " ");
}
result.add(sb.toString().trim());
return;
}
for (int i = start; i < s.length(); i++) {
if (!isWord[start][i]) {//O(1)
continue;
}
list.add(s.substring(start, i + 1));
helper(result, list, isWord, isValid, s, i + 1);
list.remove(list.size() - 1);
}
}
// isWord[i][j]: is subString s[i, j] a word from dictionary?
public boolean[][] isWord(List<String> wordDict, String s) {
int n = s.length();
boolean[][] isWord = new boolean[n][n];
for (int i = 0; i < n; i++) {
for (int j = i; j < n; j++) {
isWord[i][j] = wordDict.contains(s.substring(i, j + 1));
}
}
return isWord;
}
/*
Verify: up to i letters, is it possible to satisfy the 'word break' rules?
Need to consider ith index: sequence DP, create dp[n + 1];
dp[i] = substring(i, j) valid && dp[after substring, j to end]
Calculating DP from right-side of the string: from the right side, we need to know the substring is valid, then move to left and check further
*/
public boolean[] validatePossibility(List<String> wordDict, boolean[][] isWord, String s) {
//optimize, find maxLength in wordDict to restrict string growth
int maxLen = 0;
for (String word : wordDict) {
maxLen = Math.max(maxLen, word.length());
}
int n = s.length();
boolean[] isValid = new boolean[n + 1];
isValid[n] = true;
for (int i = n - 1; i >= 0; i--) {
for (int j = i; j < n; j++) {
if (isWord[i][j] && isValid[j + 1]) {
isValid[i] = true;
break;
}
}
}
return isValid;
}
}
```