
Commit e18d615

Workaround for CoNSeP Dataset Download Issue (#1691)
- The CoNSeP dataset can be downloaded from this mirror: https://opendatalab.com/OpenDataLab/CoNSeP
- Each user is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use

### Checks
- [ ] Avoid including large-size files in the PR.
- [ ] Clean up long text outputs from code cells in the notebook.
- [ ] For security purposes, please check the contents and remove any sensitive info such as user names and private key.
- [ ] Ensure (1) hyperlinks and markdown anchors are working (2) use relative paths for tutorial repo files (3) put figure and graphs in the `./figure` folder
- [ ] Notebook runs automatically `./runner.sh -t <path to .ipynb file>`

Signed-off-by: YunLiu <[email protected]>
1 parent f534a1a commit e18d615

7 files changed (+22, -146 lines)
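Since the notebooks below no longer fetch the archive automatically, a user who has downloaded `consep_dataset.zip` by hand (for example from the mirror above) could extract it into the layout the tutorials expect with a short helper along these lines. This is a minimal sketch, not part of the commit; the paths are assumptions based on the defaults used in the diffs.

```python
# Minimal sketch: place a manually downloaded consep_dataset.zip where the
# tutorials expect it. All paths are assumptions based on the tutorial defaults;
# adjust them to your environment.
import os
import zipfile

workspace_path = "/workspace/Data/Pathology"                   # assumed workspace root
zip_path = os.path.join(workspace_path, "consep_dataset.zip")  # downloaded manually
consep_dir = os.path.join(workspace_path, "CoNSeP")            # directory the notebooks read

if not os.path.isdir(consep_dir):
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(workspace_path)

print(sorted(os.listdir(consep_dir)))  # expect "Train" and "Test" subfolders
```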

pathology/hovernet/README.MD (+1, -1)

@@ -8,7 +8,7 @@ It also has torch version notebooks to run training and evaluation.
  
  ### 1. Data
  
- CoNSeP datasets which are used in the examples can be downloaded from <https://warwick.ac.uk/fac/cross_fac/tia/data/HoVerNet/>.
+ CoNSeP datasets which are used in the examples are from <https://warwick.ac.uk/fac/cross_fac/tia/data/HoVerNet/>.
  
  - First download CoNSeP dataset to `DATA_ROOT` (default is `"/workspace/Data/Pathology/CoNSeP"`).
  - Run `python prepare_patches.py` to prepare patches from images.

pathology/hovernet/hovernet_torch.ipynb (+2, -9)

@@ -118,7 +118,6 @@
  "from matplotlib.patches import Rectangle\n",
  "from monai.config import print_config\n",
  "from monai.data import DataLoader, decollate_batch, CacheDataset, Dataset\n",
- "from monai.apps import download_and_extract\n",
  "from monai.networks.nets import HoVerNet\n",
  "from monai.metrics import DiceMetric\n",
  "from monai.transforms import (\n",
@@ -207,7 +206,7 @@
  "source": [
  "## Download dataset, prepare patch\n",
  "Each user is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use.\n",
- "1. download CoNSeP dataset\n",
+ "1. Download CoNSeP dataset. The dataset is from [CoNSeP](https://warwick.ac.uk/fac/cross_fac/tia/data/HoVerNet/)\n",
  "2. run ./prepare_patches.py to prepare patches from images. \n",
  "   Similar to https://github.com/vqdang/hover_net/blob/master/extract_patches.py"
  ]
@@ -218,13 +217,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "resource = \"https://warwick.ac.uk/fac/cross_fac/tia/data/hovernet/consep_dataset.zip\"\n",
- "md5 = \"a4fa18067849c536cba5fceee0427e81\"\n",
- "\n",
- "compressed_file = os.path.join(root_dir, \"consep_dataset.zip\")\n",
- "data_dir = os.path.join(root_dir, \"CoNSeP\")\n",
- "if not os.path.exists(data_dir):\n",
- "    download_and_extract(resource, compressed_file, root_dir, md5)"
+ "data_dir = os.path.join(root_dir, \"CoNSeP\")"
  ]
  },
  {
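With the download cell gone, this notebook assumes the data is already under `data_dir`. A pre-flight check like the sketch below (not part of the commit; `root_dir` and the Train/Test folder names are assumptions) would fail early with a pointer to the manual-download instructions instead of erroring later during patch preparation or training.

```python
# Minimal sketch of a pre-flight check replacing the removed download cell.
# root_dir and the Train/Test subfolder names are assumptions, not part of the commit.
import os

root_dir = "/workspace/Data/Pathology"  # assumed workspace root
data_dir = os.path.join(root_dir, "CoNSeP")

missing = [s for s in ("Train", "Test") if not os.path.isdir(os.path.join(data_dir, s))]
if missing:
    raise FileNotFoundError(
        f"CoNSeP data not found under {data_dir} (missing: {missing}). "
        "Download the dataset manually, e.g. from "
        "https://warwick.ac.uk/fac/cross_fac/tia/data/HoVerNet/ or "
        "https://opendatalab.com/OpenDataLab/CoNSeP, and extract it here."
    )
```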

pathology/nuclick/README.md (+1, -1)

@@ -4,7 +4,7 @@ It also has notebooks to run inference over trained (monai-zoo) model.
  
  ### 1. Data
  
- Training these model requires data. Some public available datasets which are used in the examples can be downloaded from [ConSeP](https://warwick.ac.uk/fac/cross_fac/tia/data/hovernet).
+ Training these model requires data. The datasets used in the examples are from [CoNSeP](https://warwick.ac.uk/fac/cross_fac/tia/data/HoVerNet/). Each user is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use.
  
  ### 2. Questions and bugs

pathology/nuclick/nuclei_classification_infer.ipynb (+2, -38)

@@ -63,7 +63,6 @@
  "import numpy as np\n",
  "import torch\n",
  "from monai.apps.nuclick.transforms import AddLabelAsGuidanced\n",
- "from monai.apps.utils import download_and_extract\n",
  "from monai.bundle import download\n",
  "from monai.config import print_config\n",
  "from monai.data import PILReader\n",
@@ -249,43 +248,6 @@
  "    plt.show()"
  ]
  },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "consep_dataset.zip: 146MB [00:40, 3.76MB/s] "
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "2022-12-06 10:50:51,968 - INFO - Downloaded: /tmp/tmpct_gfbvg/consep_dataset.zip\n",
- "2022-12-06 10:50:51,970 - INFO - Expected md5 is None, skip md5 check for file /tmp/tmpct_gfbvg/consep_dataset.zip.\n",
- "2022-12-06 10:50:51,972 - INFO - Writing into directory: workspace.\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "consep_zip = \"https://warwick.ac.uk/fac/cross_fac/tia/data/hovernet/consep_dataset.zip\"\n",
- "consep_dir = os.path.join(workspace_path, \"CoNSeP\")\n",
- "\n",
- "if not os.path.exists(consep_dir):\n",
- "    download_and_extract(consep_zip, output_dir=workspace_path)"
- ]
- },
  {
  "cell_type": "code",
  "execution_count": 8,
@@ -310,6 +272,8 @@
  }
  ],
  "source": [
+ "consep_dir = os.path.join(workspace_path, \"CoNSeP\")\n",
+ "\n",
  "image_file = os.path.join(consep_dir, \"Test\", \"Images\", \"test_12.png\")\n",
  "label_mat = os.path.join(consep_dir, \"Test\", \"Labels\", \"test_12.mat\")\n",
  "\n",

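For reference, the cell above now reads a test image and its Matlab annotation directly from `consep_dir`. The sketch below shows one way to load that pair; it is not from the commit, and the `.mat` key names are assumptions about the CoNSeP label format.

```python
# Minimal sketch: load one CoNSeP test image and its Matlab label file.
# The "inst_map"/"type_map" keys are assumptions about the CoNSeP .mat layout.
import os

import numpy as np
from PIL import Image
from scipy.io import loadmat

workspace_path = "workspace"  # assumed, matching the notebook's workspace_path
consep_dir = os.path.join(workspace_path, "CoNSeP")

image_file = os.path.join(consep_dir, "Test", "Images", "test_12.png")
label_mat = os.path.join(consep_dir, "Test", "Labels", "test_12.mat")

image = np.asarray(Image.open(image_file).convert("RGB"))
labels = loadmat(label_mat)
inst_map = labels["inst_map"]  # per-nucleus instance ids (assumed key)
type_map = labels["type_map"]  # per-pixel class ids (assumed key)
print(image.shape, inst_map.shape, int(inst_map.max()), "nuclei")
```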
pathology/nuclick/nuclei_classification_training_notebook.ipynb (+8, -42)

@@ -37,7 +37,7 @@
  "* Data pre-processing functions, as typically pathology data is given as a whole slide, which cannot be directly used for training as it's too large in size, in particular check `consep_nuclei_dataset`\n",
  "* Uses the dataset: \n",
  "\n",
- "This model is trained using [DenseNet121](https://docs.monai.io/en/latest/networks.html#densenet121) over [ConSeP](https://warwick.ac.uk/fac/cross_fac/tia/data/hovernet) dataset. \n",
+ "This model is trained using [DenseNet121](https://docs.monai.io/en/latest/networks.html#densenet121) over [CoNSeP](https://warwick.ac.uk/fac/cross_fac/tia/data/hovernet) dataset. \n",
  "\n",
  "> Please note that this model uses existing label mask as additional signal input while training.\n",
  "\n",
@@ -92,7 +92,6 @@
  "import torch.distributed\n",
  "from IPython.display import Image as IImage\n",
  "from monai.apps.nuclick.transforms import AddLabelAsGuidanced, SetLabelClassd, SplitLabeld\n",
- "from monai.apps.utils import download_and_extract\n",
  "from monai.config import IgniteInfo, print_config\n",
  "from monai.data import CacheDataset, DataLoader\n",
  "from monai.engines import SupervisedEvaluator, SupervisedTrainer\n",
@@ -146,44 +145,8 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## Configure Workspace Path"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "consep_dataset.zip: 146MB [00:23, 6.41MB/s] "
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "2022-12-06 10:46:41,905 - INFO - Downloaded: /tmp/tmp834vb_2l/consep_dataset.zip\n",
- "2022-12-06 10:46:41,907 - INFO - Expected md5 is None, skip md5 check for file /tmp/tmp834vb_2l/consep_dataset.zip.\n",
- "2022-12-06 10:46:41,910 - INFO - Writing into directory: workspace.\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "consep_zip = \"https://warwick.ac.uk/fac/cross_fac/tia/data/hovernet/consep_dataset.zip\"\n",
- "consep_dir = os.path.join(workspace_path, \"CoNSeP\")\n",
- "\n",
- "if not os.path.exists(consep_dir):\n",
- "    download_and_extract(consep_zip, output_dir=workspace_path)"
+ "## Configure Workspace Path\n",
+ "The datasets used in the examples are from [CoNSeP](https://warwick.ac.uk/fac/cross_fac/tia/data/HoVerNet/). Each user is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use."
  ]
  },
  {
@@ -204,6 +167,9 @@
  }
  ],
  "source": [
+ "# consep_dir points to the root directory of the consep dataset\n",
+ "consep_dir = os.path.join(workspace_path, \"CoNSeP\")\n",
+ "\n",
  "IImage(filename=os.path.join(consep_dir, \"Train\", \"Overlay\", \"train_8.png\"))"
  ]
  },
@@ -213,7 +179,7 @@
  "source": [
  "## Pre-processing utility functions\n",
  "\n",
- "`consep_nuclei_dataset` reads the raw Image and Matlab files provided in ConSeP dataset. For each Nuclei it tries to create a patch of 128x128 with single nuclei labeled with correspnding class index and rest of the nuclei falling in this patch are labeled as others (mask_value: 255)\n",
+ "The `consep_nuclei_dataset` function processes the raw image and Matlab files from the CoNSeP dataset. For each nucleus, it generates a 128x128 patch wherein the target nucleus is labeled with its corresponding class index, and all other nuclei within the patch are labeled as 'others' (using a mask value of 255).\n",
  "\n"
  ]
  },
@@ -225,7 +191,7 @@
  "source": [
  "def consep_nuclei_dataset(datalist, output_dir, crop_size, min_area=80, min_distance=20, limit=0) -> List[Dict]:\n",
  "    \"\"\"\n",
- "    Utility to pre-process and create dataset list for Patches per Nuclei for training over ConSeP dataset.\n",
+ "    Utility to pre-process and create a dataset list for Patches per Nuclei for training over CoNSeP dataset.\n",
  "\n",
  "    Args:\n",
  "        datalist: A list of data dictionary. Each entry should at least contain 'image_key': <image filename>.\n",

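The `consep_nuclei_dataset` helper touched in the diff above expects a datalist of image/label pairs and writes per-nucleus patches to `output_dir`. A hedged usage sketch follows; the "image"/"label" key names and `crop_size=128` are assumptions, so check the helper's docstring in the notebook for the exact contract.

```python
# Minimal usage sketch for the notebook's consep_nuclei_dataset helper.
# The "image"/"label" keys and crop_size=128 are assumptions; verify against
# the helper's docstring in the notebook before relying on this.
import glob
import os

workspace_path = "workspace"  # assumed workspace root
consep_dir = os.path.join(workspace_path, "CoNSeP")

images = sorted(glob.glob(os.path.join(consep_dir, "Train", "Images", "*.png")))
labels = sorted(glob.glob(os.path.join(consep_dir, "Train", "Labels", "*.mat")))
datalist = [{"image": img, "label": lab} for img, lab in zip(images, labels)]

# consep_nuclei_dataset is defined in the training notebook; it writes 128x128
# per-nucleus patches to output_dir and returns a list of dicts that can feed
# CacheDataset / DataLoader.
train_datalist = consep_nuclei_dataset(
    datalist, output_dir=os.path.join(workspace_path, "patches"), crop_size=128
)
print(len(train_datalist), "nuclei patches prepared")
```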
pathology/nuclick/nuclick_infer.ipynb (+2, -38)

@@ -57,7 +57,6 @@
  "import numpy as np\n",
  "import torch\n",
  "from monai.apps.nuclick.transforms import AddClickSignalsd, PostFilterLabeld\n",
- "from monai.apps.utils import download_and_extract\n",
  "from monai.bundle import download\n",
  "from monai.config import print_config\n",
  "from monai.data import PILReader\n",
@@ -247,43 +246,6 @@
  "    plt.show()"
  ]
  },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "consep_dataset.zip: 146MB [00:26, 5.86MB/s] "
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "2022-12-06 10:38:08,238 - INFO - Downloaded: /tmp/tmp7_dupxgu/consep_dataset.zip\n",
- "2022-12-06 10:38:08,239 - INFO - Expected md5 is None, skip md5 check for file /tmp/tmp7_dupxgu/consep_dataset.zip.\n",
- "2022-12-06 10:38:08,241 - INFO - Writing into directory: workspace.\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "consep_zip = \"https://warwick.ac.uk/fac/cross_fac/tia/data/hovernet/consep_dataset.zip\"\n",
- "consep_dir = os.path.join(workspace_path, \"CoNSeP\")\n",
- "\n",
- "if not os.path.exists(consep_dir):\n",
- "    download_and_extract(consep_zip, output_dir=workspace_path)"
- ]
- },
  {
  "cell_type": "code",
  "execution_count": 8,
@@ -309,6 +271,8 @@
  }
  ],
  "source": [
+ "consep_dir = os.path.join(workspace_path, \"CoNSeP\")\n",
+ "\n",
  "image_file = os.path.join(consep_dir, \"Test\", \"Images\", \"test_12.png\")\n",
  "foreground = [[190, 15], [218, 32], [296, 96]]\n",
  "\n",

pathology/nuclick/nuclick_training_notebook.ipynb (+6, -17)

@@ -94,7 +94,6 @@
  "import torch.distributed\n",
  "from IPython.display import Image as IImage\n",
  "from monai.apps.nuclick.transforms import AddPointGuidanceSignald, SplitLabeld\n",
- "from monai.apps.utils import download_and_extract\n",
  "from monai.config import IgniteInfo, print_config\n",
  "from monai.data import CacheDataset, DataLoader\n",
  "from monai.engines import SupervisedEvaluator, SupervisedTrainer\n",
@@ -148,20 +147,7 @@
  "metadata": {},
  "source": [
  "## Configure Workspace Path\n",
- "Please note this dataset is under noncommercial license. You may not use it for commercial purpose."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [],
- "source": [
- "consep_zip = \"https://warwick.ac.uk/fac/cross_fac/tia/data/hovernet/consep_dataset.zip\"\n",
- "consep_dir = os.path.join(workspace_path, \"CoNSeP\")\n",
- "\n",
- "if not os.path.exists(consep_dir):\n",
- "    download_and_extract(consep_zip, output_dir=workspace_path)"
+ "The datasets used in the examples are from [CoNSeP](https://warwick.ac.uk/fac/cross_fac/tia/data/HoVerNet/). Each user is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use."
  ]
  },
  {
@@ -182,6 +168,9 @@
  }
  ],
  "source": [
+ "# consep_dir points to the root directory of the consep dataset\n",
+ "consep_dir = os.path.join(workspace_path, \"CoNSeP\")\n",
+ "\n",
  "IImage(filename=os.path.join(consep_dir, \"Train\", \"Overlay\", \"train_1.png\"))"
  ]
  },
@@ -191,7 +180,7 @@
  "source": [
  "## Pre-processing utility functions\n",
  "\n",
- "`consep_nuclei_dataset` reads the raw Image and Matlab files provided in ConSeP dataset. For each Nuclei it tries to create a patch of 128x128 with single nuclei labeled with correspnding class index and rest of the nuclei falling in this patch are labeled as others (mask_value: 255)\n",
+ "The `consep_nuclei_dataset` function processes the raw image and Matlab files from the CoNSeP dataset. For each nucleus, it generates a 128x128 patch wherein the target nucleus is labeled with its corresponding class index, and all other nuclei within the patch are labeled as 'others' (using a mask value of 255).\n",
  "\n"
  ]
  },
@@ -203,7 +192,7 @@
  "source": [
  "def consep_nuclei_dataset(datalist, output_dir, crop_size, min_area=80, min_distance=20, limit=0) -> List[Dict]:\n",
  "    \"\"\"\n",
- "    Utility to pre-process and create dataset list for Patches per Nuclei for training over ConSeP dataset.\n",
+ "    Utility to pre-process and create a dataset list for Patches per Nuclei for training over CoNSeP dataset.\n",
  "\n",
  "    Args:\n",
  "        datalist: A list of data dictionary. Each entry should at least contain 'image_key': <image filename>.\n",
