
Commit 5f4c6e8

thomwolf and lhoestq authored
Quick fix :) (#606)
* Changing the name
* style + quality
* update doc and logo
* clean up
* circle-CI on the branche for now
* fix daily dialog dataset
* fix urls

Co-authored-by: Quentin Lhoest <[email protected]>
1 parent: c53558f

File tree: 428 files changed, +5147 −4898 lines


.circleci/config.yml (+7 -7)

@@ -1,7 +1,7 @@
 version: 2
 jobs:
     run_dataset_script_tests_pyarrow_0p17:
-        working_directory: ~/nlp
+        working_directory: ~/datasets
         docker:
             - image: circleci/python:3.6
         resource_class: medium
@@ -11,10 +11,10 @@ jobs:
             - run: source venv/bin/activate
             - run: pip install .[tests]
             - run: pip install pyarrow==0.17.1
-            - run: HF_SCRIPTS_VERSION=master python -m pytest -sv ./tests/
+            - run: HF_SCRIPTS_VERSION=datasets python -m pytest -sv ./tests/
 
     run_dataset_script_tests_pyarrow_1:
-        working_directory: ~/nlp
+        working_directory: ~/datasets
         docker:
             - image: circleci/python:3.6
         resource_class: medium
@@ -24,10 +24,10 @@ jobs:
             - run: source venv/bin/activate
             - run: pip install .[tests]
             - run: pip install pyarrow==1.0.0
-            - run: HF_SCRIPTS_VERSION=master python -m pytest -sv ./tests/
+            - run: HF_SCRIPTS_VERSION=datasets python -m pytest -sv ./tests/
 
     check_code_quality:
-        working_directory: ~/nlp
+        working_directory: ~/datasets
         docker:
             - image: circleci/python:3.6
         resource_class: medium
@@ -39,7 +39,7 @@
             - run: isort --check-only tests src benchmarks datasets metrics
             - run: flake8 tests src benchmarks datasets metrics
     build_doc:
-        working_directory: ~/nlp
+        working_directory: ~/datasets
         docker:
             - image: circleci/python:3.6
         steps:
@@ -49,7 +49,7 @@
             - store_artifacts:
                 path: ./docs/_build
     deploy_doc:
-        working_directory: ~/nlp
+        working_directory: ~/datasets
         docker:
             - image: circleci/python:3.6
         steps:
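For anyone reproducing the renamed test jobs outside CI, the same sequence can be run locally. A minimal sketch, assuming a clone at `~/datasets` and a fresh virtualenv; the `python -m venv venv` step is an assumption, since the diff above only shows the activation:

```bash
# Local approximation of the run_dataset_script_tests_pyarrow_1 job (sketch, not the official workflow)
cd ~/datasets                  # working_directory from the updated config
python -m venv venv            # assumption: venv creation happens in a config step not shown in this diff
source venv/bin/activate
pip install .[tests]
pip install pyarrow==1.0.0
HF_SCRIPTS_VERSION=datasets python -m pytest -sv ./tests/
```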

.circleci/deploy.sh (+3 -3)

@@ -28,12 +28,12 @@ function deploy_doc(){
 	fi
 }
 
-# You can find the commit for each tag on https://github.com/huggingface/nlp/tags
-# Deploys the master documentation on huggingface.co/nlp/master
+# You can find the commit for each tag on https://github.com/huggingface/datasets/tags
+# Deploys the master documentation on huggingface.co/datasets/master
 deploy_doc "master" master
 
 # Example of how to deploy a doc on a certain commit (the commit doesn't have to be on the master branch).
-# The following commit would live on huggingface.co/nlp/v1.0.0
+# The following commit would live on huggingface.co/datasets/v1.0.0
 #deploy_doc "b33a385" v1.0.0
 deploy_doc "99e0ee6" v0.3.0
 deploy_doc "21e8091" v0.4.0
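Following the pattern above, `deploy_doc` takes a commit reference and the path segment the build should live under. A future release would be pinned with a call like this; the commit hash and version tag shown here are hypothetical placeholders:

```bash
# Hypothetical example: docs built from commit abc1234 would live on huggingface.co/datasets/v0.5.0
deploy_doc "abc1234" v0.5.0
```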

AUTHORS (+1 -1)

@@ -1,4 +1,4 @@
-# This is the list of HuggingFace NLP authors for copyright purposes.
+# This is the list of HuggingFace Datasets authors for copyright purposes.
 #
 # This does not necessarily list everyone who has contributed code, since in
 # some cases, their employer may be the copyright holder. To see the full list

CONTRIBUTING.md (+15 -15)

@@ -1,13 +1,13 @@
-# How to contribute to nlp?
+# How to contribute to Datasets?
 
-1. Fork the [repository](https://github.com/huggingface/nlp) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
+1. Fork the [repository](https://github.com/huggingface/datasets) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
 
 2. Clone your fork to your local disk, and add the base repository as a remote:
 
 	```bash
-	git clone git@github.com:<your Github handle>/nlp.git
-	cd nlp
-	git remote add upstream https://github.com/huggingface/nlp.git
+	git clone git@github.com:<your Github handle>/datasets.git
+	cd datasets
+	git remote add upstream https://github.com/huggingface/datasets.git
 	```
 
 3. Create a new branch to hold your development changes:
@@ -24,11 +24,11 @@
 	pip install -e ".[dev]"
 	```
 
-   (If nlp was already installed in the virtual environment, remove
-   it with `pip uninstall nlp` before reinstalling it in editable
+   (If datasets was already installed in the virtual environment, remove
+   it with `pip uninstall datasets` before reinstalling it in editable
    mode with the `-e` flag.)
 
-5. Develop the features on your branch. If you want to add a dataset see more in-detail intsructions in the section [*How to add a dataset*](#how-to-add-a-dataset). Alternatively, you can follow the steps to [add a dataset](https://huggingface.co/nlp/add_dataset.html) and [share a dataset](https://huggingface.co/nlp/share_dataset.html) in the documentation.
+5. Develop the features on your branch. If you want to add a dataset see more in-detail intsructions in the section [*How to add a dataset*](#how-to-add-a-dataset). Alternatively, you can follow the steps to [add a dataset](https://huggingface.co/datasets/add_dataset.html) and [share a dataset](https://huggingface.co/datasets/share_dataset.html) in the documentation.
 
 6. Format your code. Run black and isort so that your newly added files look nice with the following command:
 
@@ -60,20 +60,20 @@
 8. Once you are satisfied, go the webpage of your fork on GitHub. Click on "Pull request" to send your to the project maintainers for review.
 
 ## How-To-Add a dataset
-1. Make sure you followed steps 1-4 of the section [*How to contribute to nlp?*](#how-to-contribute-to-nlp).
+1. Make sure you followed steps 1-4 of the section [*How to contribute to datasets?*](#how-to-contribute-to-datasets).
 
-2. Create your dataset folder under `datasets/<your_dataset_name>` and create your dataset script under `datasets/<your_dataset_name>/<your_dataset_name>.py`. You can check out other dataset scripts under `datasets` for some inspiration. Note on naming: the dataset class should be camel case, while the dataset name is its snake case equivalent (ex: `class BookCorpus(nlp.GeneratorBasedBuilder)` for the dataset `book_corpus`).
+2. Create your dataset folder under `datasets/<your_dataset_name>` and create your dataset script under `datasets/<your_dataset_name>/<your_dataset_name>.py`. You can check out other dataset scripts under `datasets` for some inspiration. Note on naming: the dataset class should be camel case, while the dataset name is its snake case equivalent (ex: `class BookCorpus(datasets.GeneratorBasedBuilder)` for the dataset `book_corpus`).
 
-3. **Make sure you run all of the following commands from the root of your `nlp` git clone.** To check that your dataset works correctly and to create its `dataset_infos.json` file run the command:
+3. **Make sure you run all of the following commands from the root of your `datasets` git clone.** To check that your dataset works correctly and to create its `dataset_infos.json` file run the command:
 
 	```bash
-	python nlp-cli test datasets/<your-dataset-folder> --save_infos --all_configs
+	python datasets-cli test datasets/<your-dataset-folder> --save_infos --all_configs
 	```
 
 4. If the command was succesful, you should now create some dummy data. Use the following command to get in-detail instructions on how to create the dummy data:
 
 	```bash
-	python nlp-cli dummy_data datasets/<your-dataset-folder>
+	python datasets-cli dummy_data datasets/<your-dataset-folder>
 	```
 
 5. Now test that both the real data and the dummy data work correctly using the following commands:
@@ -89,7 +89,7 @@
 	RUN_SLOW=1 pytest tests/test_dataset_common.py::LocalDatasetTest::test_load_dataset_all_configs_<your-dataset-name>
 	```
 
-6. If all tests pass, your dataset works correctly. Awesome! You can now follow steps 6, 7 and 8 of the section [*How to contribute to nlp?*](#how-to-contribute-to-nlp). If you experience problems with the dummy data tests, you might want to take a look at the section *Help for dummy data tests* below.
+6. If all tests pass, your dataset works correctly. Awesome! You can now follow steps 6, 7 and 8 of the section [*How to contribute to 🤗Datasets?*](#how-to-contribute-to-🤗Datasets). If you experience problems with the dummy data tests, you might want to take a look at the section *Help for dummy data tests* below.
 
 
 ### Help for dummy data tests
@@ -98,7 +98,7 @@ Follow these steps in case the dummy data test keeps failing:
 
 - Verify that all filenames are spelled correctly. Rerun the command
 	```bash
-	python nlp-cli dummy_data datasets/<your-dataset-folder>
+	python datasets-cli dummy_data datasets/<your-dataset-folder>
 	```
 	and make sure you follow the exact instructions provided by the command of step 5).
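Putting the renamed `datasets-cli` entry point together, the add-a-dataset commands from the updated guide condense to the sketch below; the placeholders are kept from the text above, and the commands are assembled from steps 3, 4 and 5 of the diff rather than run in sequence verbatim:

```bash
# Run from the root of the datasets clone (sketch assembled from the contributing guide above)
python datasets-cli test datasets/<your-dataset-folder> --save_infos --all_configs
python datasets-cli dummy_data datasets/<your-dataset-folder>
RUN_SLOW=1 pytest tests/test_dataset_common.py::LocalDatasetTest::test_load_dataset_all_configs_<your-dataset-name>
```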
