Create NeptuneHook mechanism for automatic metadata logging #1

Merged
merged 33 commits into from
Dec 30, 2022
Commits
1c07e5b
add code for NeptuneHook
AleksanderWWW Dec 12, 2022
fb2581e
apply pre-commit suggestions
AleksanderWWW Dec 12, 2022
a028895
make logging final model conditional
AleksanderWWW Dec 14, 2022
35336ab
add e2e test
AleksanderWWW Dec 15, 2022
81bdc4d
add output dir to gitignore
AleksanderWWW Dec 15, 2022
821b560
add torch to pyproject and pip installation of detectron to workflow
AleksanderWWW Dec 15, 2022
d604e9e
make custom_run_id a local variable
AleksanderWWW Dec 15, 2022
2c75c71
remove windows from workflow
AleksanderWWW Dec 16, 2022
e7484b2
add run syncing before assertions
AleksanderWWW Dec 16, 2022
d9c2145
give time to upload files
AleksanderWWW Dec 16, 2022
b8ba20a
temporarily remove problematic assert to see how the rest goes
AleksanderWWW Dec 16, 2022
65326e1
explicitly pass run to NeptuneHook and call sync after training
AleksanderWWW Dec 16, 2022
567c8dc
sync active run, not the closed one
AleksanderWWW Dec 16, 2022
78f86e0
sync before stoping run
AleksanderWWW Dec 16, 2022
c502ad5
change connecting with custom id to run id
AleksanderWWW Dec 19, 2022
e6f21c4
force installing lower version of numpy
AleksanderWWW Dec 19, 2022
922ca97
add sync after uploading checkpoint
AleksanderWWW Dec 19, 2022
0ade49b
increase number of epochs
AleksanderWWW Dec 19, 2022
dfaec88
fix checkpointing error
AleksanderWWW Dec 19, 2022
38fc29c
add removing checkpoint files after train (+sync before)
AleksanderWWW Dec 20, 2022
a13e8ea
force lower version of fvcore
AleksanderWWW Dec 20, 2022
1efc286
force precise version of fvcore
AleksanderWWW Dec 20, 2022
de82c8c
fix typo
AleksanderWWW Dec 20, 2022
44304e4
bring back previous version specification of fvcore
AleksanderWWW Dec 20, 2022
5465adf
modularize the code - create private methods for individual activities
AleksanderWWW Dec 20, 2022
c2fdbe8
verify type of config in _log_config method
AleksanderWWW Dec 20, 2022
c786969
verityf type of run before creating base handler
AleksanderWWW Dec 22, 2022
0ff6907
fix checkpointing issue by uploading from stream
AleksanderWWW Dec 27, 2022
56e1a7a
apply review suggestions
AleksanderWWW Dec 28, 2022
27793e2
test accuracy, not loss
AleksanderWWW Dec 28, 2022
24091e5
remove TODOs
AleksanderWWW Dec 30, 2022
a876116
delete train images
AleksanderWWW Dec 30, 2022
b101f49
add train images to gitignore
AleksanderWWW Dec 30, 2022
7 changes: 6 additions & 1 deletion .github/actions/e2e/action.yml
@@ -20,7 +20,12 @@ runs:
      run: pip install -e .[dev]
      shell: bash

    - name: Install detectron2
      working-directory: ${{ inputs.working_directory }}
      run: pip install 'git+https://github.com/facebookresearch/detectron2.git'
      shell: bash

    - name: Run tests
      working-directory: ${{ inputs.working_directory }}
      run: pytest -v --doctest-modules
      shell: bash
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -22,7 +22,7 @@ jobs:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]

Just curious, why not Windows?

        os: [ubuntu-latest, macos-latest]
        python-version: [3.9]
    steps:
      - uses: actions/checkout@v2
@@ -60,4 +60,4 @@ jobs:
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
          packages_dir: dist
5 changes: 5 additions & 0 deletions .gitignore
@@ -124,3 +124,8 @@ venv.bak/
.vscode

stream.bin

#detectron2
output/

datasets/coco/train2014/*jpg
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -7,4 +7,4 @@
- First ([#2](https://github.com/neptune-ai/neptune-detectron2/pull/1))

### Fixes
- First ([#2](https://github.com/neptune-ai/neptune-detectron2/pull/1))
3 changes: 1 addition & 2 deletions README.md
@@ -1,4 +1,3 @@
# Neptune - detectron2 integration

TODO: Update docs link
See [the official docs](https://docs.neptune.ai/integrations-and-supported-tools/model-training/).
18 changes: 18 additions & 0 deletions datasets/coco/annotations/instances_train2014.json
@@ -0,0 +1,18 @@
{
"info":
{
"description": "COCO 2014 Dataset",
"url": "http://cocodataset.org",
"version": "1.0",
"year": 2014,
"contributor": "COCO Consortium",
"date_created": "2017/09/01"
},
"images":
[{
"license": 5, "file_name": "COCO_train2014_000000057870.jpg", "coco_url": "http://images.cocodataset.org/train2014/COCO_train2014_000000057870.jpg", "height": 480, "width": 640, "date_captured": "2013-11-14 16:28:13", "flickr_url": "http://farm4.staticflickr.com/3153/2970773875_164f0c0b83_z.jpg", "id": 57870}, {"license": 5, "file_name": "COCO_train2014_000000384029.jpg", "coco_url": "http://images.cocodataset.org/train2014/COCO_train2014_000000384029.jpg", "height": 429, "width": 640, "date_captured": "2013-11-14 16:29:45", "flickr_url": "http://farm3.staticflickr.com/2422/3577229611_3a3235458a_z.jpg", "id": 384029}],
"categories": [{"supercategory": "person","id": 1,"name": "person"},{"supercategory": "vehicle","id": 2,"name": "bicycle"},{"supercategory": "vehicle","id": 3,"name": "car"},{"supercategory": "vehicle","id": 4,"name": "motorcycle"},{"supercategory": "vehicle","id": 5,"name": "airplane"},{"supercategory": "vehicle","id": 6,"name": "bus"},{"supercategory": "vehicle","id": 7,"name": "train"},{"supercategory": "vehicle","id": 8,"name": "truck"},{"supercategory": "vehicle","id": 9,"name": "boat"},{"supercategory": "outdoor","id": 10,"name": "traffic light"},{"supercategory": "outdoor","id": 11,"name": "fire hydrant"},{"supercategory": "outdoor","id": 13,"name": "stop sign"},{"supercategory": "outdoor","id": 14,"name": "parking meter"},{"supercategory": "outdoor","id": 15,"name": "bench"},{"supercategory": "animal","id": 16,"name": "bird"},{"supercategory": "animal","id": 17,"name": "cat"},{"supercategory": "animal","id": 18,"name": "dog"},{"supercategory": "animal","id": 19,"name": "horse"},{"supercategory": "animal","id": 20,"name": "sheep"},{"supercategory": "animal","id": 21,"name": "cow"},{"supercategory": "animal","id": 22,"name": "elephant"},{"supercategory": "animal","id": 23,"name": "bear"},{"supercategory": "animal","id": 24,"name": "zebra"},{"supercategory": "animal","id": 25,"name": "giraffe"},{"supercategory": "accessory","id": 27,"name": "backpack"},{"supercategory": "accessory","id": 28,"name": "umbrella"},{"supercategory": "accessory","id": 31,"name": "handbag"},{"supercategory": "accessory","id": 32,"name": "tie"},{"supercategory": "accessory","id": 33,"name": "suitcase"},{"supercategory": "sports","id": 34,"name": "frisbee"},{"supercategory": "sports","id": 35,"name": "skis"},{"supercategory": "sports","id": 36,"name": "snowboard"},{"supercategory": "sports","id": 37,"name": "sports ball"},{"supercategory": "sports","id": 38,"name": "kite"},{"supercategory": "sports","id": 39,"name": "baseball bat"},
{"supercategory": "sports","id": 40,"name": "baseball glove"},{"supercategory": "sports","id": 41,"name": "skateboard"},{"supercategory": "sports","id": 42,"name": "surfboard"},{"supercategory": "sports","id": 43,"name": "tennis racket"},{"supercategory": "kitchen","id": 44,"name": "bottle"},{"supercategory": "kitchen","id": 46,"name": "wine glass"},{"supercategory": "kitchen","id": 47,"name": "cup"},{"supercategory": "kitchen","id": 48,"name": "fork"},{"supercategory": "kitchen","id": 49,"name": "knife"},{"supercategory": "kitchen","id": 50,"name": "spoon"},{"supercategory": "kitchen","id": 51,"name": "bowl"},{"supercategory": "food","id": 52,"name": "banana"},{"supercategory": "food","id": 53,"name": "apple"},{"supercategory": "food","id": 54,"name": "sandwich"},{"supercategory": "food","id": 55,"name": "orange"},{"supercategory": "food","id": 56,"name": "broccoli"},{"supercategory": "food","id": 57,"name": "carrot"},{"supercategory": "food","id": 58,"name": "hot dog"},{"supercategory": "food","id": 59,"name": "pizza"},{"supercategory": "food","id": 60,"name": "donut"},{"supercategory": "food","id": 61,"name": "cake"},{"supercategory": "furniture","id": 62,"name": "chair"},{"supercategory": "furniture","id": 63,"name": "couch"},{"supercategory": "furniture","id": 64,"name": "potted plant"},{"supercategory": "furniture","id": 65,"name": "bed"},{"supercategory": "furniture","id": 67,"name": "dining table"},{"supercategory": "furniture","id": 70,"name": "toilet"},{"supercategory": "electronic","id": 72,"name": "tv"},{"supercategory": "electronic","id": 73,"name": "laptop"},{"supercategory": "electronic","id": 74,"name": "mouse"},{"supercategory": "electronic","id": 75,"name": "remote"},{"supercategory": "electronic","id": 76,"name": "keyboard"},{"supercategory": "electronic","id": 77,"name": "cell phone"},{"supercategory": "appliance","id": 78,"name": "microwave"},{"supercategory": "appliance","id": 79,"name": "oven"},{"supercategory": "appliance","id": 80,"name": "toaster"},
{"supercategory": "appliance","id": 81,"name": "sink"},{"supercategory": "appliance","id": 82,"name": "refrigerator"},{"supercategory": "indoor","id": 84,"name": "book"},{"supercategory": "indoor","id": 85,"name": "clock"},{"supercategory": "indoor","id": 86,"name": "vase"},{"supercategory": "indoor","id": 87,"name": "scissors"},{"supercategory": "indoor","id": 88,"name": "teddy bear"},{"supercategory": "indoor","id": 89,"name": "hair drier"},{"supercategory": "indoor","id": 90,"name": "toothbrush"}],
"annotations": [{"segmentation": [[91.48, 476.07, 90.67, 425.33, 130.76, 394.24, 126.67, 374.6, 116.85, 348.41, 119.31, 326.32, 136.49, 297.68, 161.86, 281.31, 189.68, 271.49, 217.5, 272.31, 247.78, 287.86, 262.51, 310.77, 259.24, 327.95, 255.96, 348.41, 243.69, 372.14, 228.96, 404.87, 223.23, 432.7, 236.32, 480.0, 217.5, 477.7, 204.41, 437.61, 193.77, 431.06, 191.32, 474.43, 182.32, 478.52, 163.49, 478.52, 173.31, 453.97, 174.13, 435.15, 161.86, 453.97, 155.31, 480.0, 136.49, 480.0, 129.94, 473.61, 127.49, 464.61, 121.76, 456.43, 116.03, 449.88, 108.67, 446.61, 107.03, 476.88, 97.21, 476.88]], "area": 23717.007900000004, "iscrowd": 0, "image_id": 57870, "bbox": [90.67, 271.49, 171.84, 208.51], "category_id": 62, "id": 2190436},
{"segmentation": [[62.5, 252.1, 48.73, 259.52, 33.9, 274.35, 33.9, 293.41, 30.72, 311.42, 36.01, 328.37, 51.9, 341.08, 60.38, 347.44, 80.5, 352.73, 100.63, 352.73, 106.99, 347.44, 145.12, 333.67, 157.83, 328.37, 175.84, 317.78, 188.55, 305.07, 194.9, 292.36, 197.02, 276.47, 191.73, 269.05, 183.25, 249.99, 183.25, 248.93, 172.66, 241.51, 163.13, 238.33, 149.36, 230.92, 143.0, 231.98, 131.35, 230.92, 118.64, 231.98, 99.57, 234.1, 94.27, 239.39, 75.21, 244.69, 68.85, 249.99]], "area": 14980.245250000007, "iscrowd": 0, "image_id": 384029, "bbox": [30.72, 230.92, 166.3, 121.81], "category_id": 61, "id": 2222000}
]
}
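The annotation file above follows the COCO instance format, in which `bbox` is `[x, y, width, height]` and each `segmentation` entry is a flat list of alternating x and y polygon coordinates. A minimal sketch, assuming that layout, of how one might derive the enclosing box from a polygon to sanity-check a hand-trimmed fixture like this one. The helper name `bbox_from_polygon` and the sample rectangle are hypothetical and not part of this PR.

```python
from typing import List


def bbox_from_polygon(polygon: List[float]) -> List[float]:
    """Return [x, y, width, height] enclosing a flat [x0, y0, x1, y1, ...] polygon."""
    xs = polygon[0::2]  # even positions hold x coordinates
    ys = polygon[1::2]  # odd positions hold y coordinates
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# Hypothetical rectangle used only for illustration (not taken from the fixture).
sample_polygon = [10.0, 20.0, 50.0, 20.0, 50.0, 80.0, 10.0, 80.0]
sample_bbox = bbox_from_polygon(sample_polygon)
print(sample_bbox)
```

Comparing the result against the stored `bbox` with a small tolerance, rather than exact equality, would be the safer choice for real annotations, since the stored values are rounded floats.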
6 changes: 4 additions & 2 deletions pyproject.toml
@@ -14,13 +14,15 @@ python = "^3.7"
# Python lack of functionalities from future versions
importlib-metadata = { version = "*", python = "<3.8" }

# TODO: Base requirements
neptune-client = ">=0.10.0"
numpy = "<1.24.0"
fvcore = "<0.1.5.post20221220"

# dev
pre-commit = { version = "*", optional = true }
pytest = { version = ">=5.0", optional = true }
pytest-cov = { version = "2.10.1", optional = true }
torch = "^1.13.0"

[tool.poetry.extras]
dev = [
@@ -92,4 +94,4 @@ force_grid_wrap = 2

[tool.flake8]
max-line-length = 120
extend-ignore = "E203"
7 changes: 6 additions & 1 deletion src/neptune_detectron2/__init__.py
@@ -14,7 +14,12 @@
# limitations under the License.
#

__all__ = [
    "NeptuneHook",
    "__version__",
]

from neptune_detectron2.impl import (
    NeptuneHook,
    __version__,
    # TODO: add objects that are to be imported from the impl package e.g. NeptuneCallback
)
111 changes: 107 additions & 4 deletions src/neptune_detectron2/impl/__init__.py
@@ -16,26 +16,129 @@

__all__ = [
    "__version__",
    # TODO: add importable public names here, `neptune-client` uses `import *`
    # https://docs.python.org/3/tutorial/modules.html#importing-from-a-package
    "NeptuneHook",
]

# TODO: use `warnings.warn` for user caused problems: https://stackoverflow.com/a/14762106/1565454
import os
import warnings
from typing import (
    Any,
    Optional,
)

try:
    # neptune-client=0.9.0+ package structure
    import neptune.new as neptune
    from neptune.new.internal.utils import verify_type
    from neptune.new.internal.utils.compatibility import expect_not_an_experiment
    from neptune.new.metadata_containers import Run
    from neptune.new.types import File

except ImportError:
    # neptune-client>=1.0.0 package structure
    import neptune
    from neptune.internal.utils import verify_type
    from neptune.internal.utils.compatibility import expect_not_an_experiment
    from neptune.metadata_containers import Run
    from neptune.types import File

import detectron2
from detectron2.checkpoint import Checkpointer
from detectron2.engine import hooks
from torch.nn import Module

from neptune_detectron2.impl.version import __version__

INTEGRATION_VERSION_KEY = "source_code/integrations/detectron2"

# TODO: Implementation of neptune-detectron2 here

class NeptuneHook(hooks.HookBase):
    def __init__(
        self,
        *,
        run: Optional[Run] = None,
        base_namespace: str = "training",
        smoothing_window_size: int = 20,
        log_model: bool = False,
        log_checkpoints: bool = False,
        **kwargs: Any,
    ):
        self._window_size = smoothing_window_size
        self.log_model = log_model
        self.log_checkpoints = log_checkpoints

        self._verify_window_size()

        # Create a new run only when none was passed; a wrong type is rejected below.
        self._run = neptune.init_run(**kwargs) if run is None else run

        verify_type("run", self._run, Run)

        # Strip a trailing slash so that nested keys do not contain "//".
        if base_namespace.endswith("/"):
            base_namespace = base_namespace[:-1]
        self._base_namespace = base_namespace

        self.base_handler = self._run[base_namespace]

        expect_not_an_experiment(self._run)

    def _verify_window_size(self) -> None:
        # Check the type first so that a non-int value fails with a clear message.
        if not isinstance(self._window_size, int):
            raise TypeError(f"Smoothing window size should be of type int. Got {type(self._window_size)} instead.")
        if self._window_size <= 0:
            raise ValueError(f"Smoothing window size should be greater than 0. Got {self._window_size}.")

    def _log_integration_version(self) -> None:
        self.base_handler[INTEGRATION_VERSION_KEY] = detectron2.__version__

    def _log_config(self) -> None:
        if hasattr(self.trainer, "cfg") and isinstance(self.trainer.cfg, dict):
            self.base_handler["config"] = self.trainer.cfg

    def _log_model(self) -> None:
        if hasattr(self.trainer, "model") and isinstance(self.trainer.model, Module):
            self.base_handler["model/summary"] = str(self.trainer.model)

    def _log_checkpoint(self, final: bool = False) -> None:
        if not self._can_save_checkpoint():
            warnings.warn("Checkpointer not present for the current trainer.")
            return

        self.trainer.checkpointer.save(f"neptune_iter_{self.trainer.iter}")
        neptune_model_path = "model/checkpoints/checkpoint_{}"

        neptune_model_path = neptune_model_path.format("final" if final else f"iter_{self.trainer.iter}")

        checkpoint_path = self.trainer.checkpointer.get_checkpoint_file()

        # Upload from an open stream, then delete the local checkpoint file.
        with open(checkpoint_path, "rb") as fp:
            self._run[neptune_model_path] = File.from_stream(fp)
        os.remove(checkpoint_path)

    def _log_metrics(self) -> None:
        storage = detectron2.utils.events.get_event_storage()
        for k, (v, _) in storage.latest_with_smoothing_hint(self._window_size).items():
            self.base_handler[f"metrics/{k}"].log(v)

    def _can_save_checkpoint(self) -> bool:
        return hasattr(self.trainer, "checkpointer") and isinstance(self.trainer.checkpointer, Checkpointer)

    def _should_perform_after_step(self) -> bool:
        return self.trainer.iter % self._window_size == 0

    def before_train(self) -> None:
        self._log_integration_version()
        self._log_config()
        self._log_model()

    def after_step(self) -> None:
        if not self._should_perform_after_step():
            return

        self._log_metrics()

        if self.log_checkpoints:
            self._log_checkpoint()

    def after_train(self) -> None:
        if self.log_model:
            self._log_checkpoint(final=True)

        self._run.sync()
        self._run.stop()
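The hook above logs once in `before_train`, then on every iteration whose index is a multiple of `smoothing_window_size`, and finally in `after_train`. A dependency-free sketch of that cadence follows. `RecordingHook` and `run_training` are hypothetical stand-ins for illustration only: they are not the detectron2 or Neptune API, and the stub passes the iteration number explicitly, whereas the real hook reads `self.trainer.iter`.

```python
from typing import List


class RecordingHook:
    """Hypothetical stand-in that records when a periodic hook would log."""

    def __init__(self, window_size: int = 20) -> None:
        # Same validation order as a type-then-range check: type first, then value.
        if not isinstance(window_size, int):
            raise TypeError(f"window_size should be int, got {type(window_size)}.")
        if window_size <= 0:
            raise ValueError(f"window_size should be greater than 0, got {window_size}.")
        self.window_size = window_size
        self.events: List[str] = []
        self.logged_iters: List[int] = []

    def before_train(self) -> None:
        self.events.append("before_train")

    def after_step(self, iteration: int) -> None:
        # Log only on multiples of the window size, including iteration 0.
        if iteration % self.window_size == 0:
            self.logged_iters.append(iteration)

    def after_train(self) -> None:
        self.events.append("after_train")


def run_training(hook: RecordingHook, max_iter: int) -> None:
    """Simplified training loop that calls the hook at each lifecycle point."""
    hook.before_train()
    for iteration in range(max_iter):
        hook.after_step(iteration)
    hook.after_train()


hook = RecordingHook(window_size=20)
run_training(hook, max_iter=50)
print(hook.logged_iters)  # iterations 0, 20 and 40
```

Under this cadence a run with `MAX_ITER = 2` and the default window of 20 logs metrics and a periodic checkpoint only at iteration 0, which is consistent with the `checkpoint_iter_0` key the end-to-end test looks for.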
Empty file added tests/__init__.py
Empty file.
29 changes: 29 additions & 0 deletions tests/conftest.py
@@ -0,0 +1,29 @@
import pytest
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer


@pytest.fixture(scope="session")
def cfg():
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.DATASETS.TRAIN = ("coco_2014_train",)
    cfg.DATASETS.TEST = ()
    cfg.DATALOADER.NUM_WORKERS = 2
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
    )  # Let training initialize from model zoo
    cfg.SOLVER.IMS_PER_BATCH = 1  # This is the real "batch size" commonly known to deep learning people
    cfg.SOLVER.BASE_LR = 0.00025
    cfg.SOLVER.MAX_ITER = 2
    cfg.SOLVER.STEPS = []  # do not decay learning rate
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 80
    cfg.MODEL.DEVICE = "cpu"
    yield cfg


@pytest.fixture(scope="session")
def trainer(cfg):
    yield DefaultTrainer(cfg)
33 changes: 33 additions & 0 deletions tests/test_e2e.py
@@ -0,0 +1,33 @@
import os

import neptune.new as neptune

from src.neptune_detectron2 import NeptuneHook
from tests.utils import get_images


def test_e2e(cfg, trainer):

    run = neptune.init_run()
    run_id = run["sys/id"].fetch()

    get_images()
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
    trainer.resume_or_load(resume=False)

    hook = NeptuneHook(run=run, log_checkpoints=True, log_model=True)
    trainer.register_hooks([hook])
    trainer.train()

    npt_run = neptune.init_run(with_id=run_id)

    assert npt_run.exists("training/config")

    assert npt_run.exists("model/checkpoints/checkpoint_iter_0")

    assert npt_run.exists("model/checkpoints/checkpoint_final")

    assert isinstance(npt_run["training/model/summary"].fetch(), str)

    cls_accuracy_vals = npt_run["training/metrics/fast_rcnn/cls_accuracy"].fetch_values()
    assert 0 < cls_accuracy_vals.iloc[-1]["value"] <= 1
12 changes: 12 additions & 0 deletions tests/utils.py
@@ -0,0 +1,12 @@
import os
import shutil


def get_images():
    img_dir = "./datasets/coco/train2014"
    if not os.path.isdir(img_dir) or len(os.listdir(img_dir)) == 0:
        os.makedirs(img_dir, exist_ok=True)
        os.system("wget http://images.cocodataset.org/train2014/COCO_train2014_000000057870.jpg")

Why do we need images stored in the repo if they are downloaded dynamically?

If we don't need them, some gitignore entries might be a good idea.

        os.system("wget http://images.cocodataset.org/train2014/COCO_train2014_000000384029.jpg")
        shutil.move("COCO_train2014_000000057870.jpg", img_dir)
        shutil.move("COCO_train2014_000000384029.jpg", img_dir)
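`get_images` above shells out to `wget` and then moves the files into place. One possible alternative, shown here only as a sketch and not as what the PR does, is to download straight into the target directory with the standard library. The function name `fetch_images` and its injectable `download` parameter are hypothetical; the default uses `urllib.request.urlretrieve`, and the demo swaps in an offline stub so that it runs without network access.

```python
import os
import tempfile
import urllib.request
from typing import Callable, Iterable, List


def fetch_images(
    urls: Iterable[str],
    img_dir: str,
    download: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> List[str]:
    """Download each URL into img_dir unless the file is already present."""
    os.makedirs(img_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(img_dir, os.path.basename(url))
        if not os.path.exists(target):
            download(url, target)
        saved.append(target)
    return saved


def fake_download(url: str, target: str) -> None:
    """Offline stand-in for urlretrieve so the demo needs no network."""
    with open(target, "wb") as fp:
        fp.write(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    paths = fetch_images(
        ["http://images.cocodataset.org/train2014/COCO_train2014_000000057870.jpg"],
        os.path.join(tmp, "train2014"),
        download=fake_download,
    )
    demo_names = [os.path.basename(p) for p in paths]
    demo_exists = all(os.path.exists(p) for p in paths)

print(demo_names, demo_exists)
```

Skipping files that already exist keeps repeated test runs from downloading the same images again, which is the same intent as the directory check in `get_images`.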