Labeling data for object detection and instance segmentation with the Segment Anything Model (SAM) and GroundingDINO. The project also contains an object detector and a segmenter with SAM, YOLO, and GroundingDINO support.
- Install PyTorch from https://pytorch.org/get-started/locally/
- Install GroundingDINO and SAM:
  - Download the repo https://github.com/IDEA-Research/Grounded-Segment-Anything
  - python -m pip install -e segment_anything
  - python -m pip install -e GroundingDINO
- Install the requirements from requirements.txt: pip install -r requirements.txt
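After these steps, a quick import check from Python confirms that the packages are visible to your environment (a minimal sanity check, not part of the project itself):

```python
# Minimal post-install sanity check: verify that PyTorch, GroundingDINO and SAM import.
import torch
import groundingdino
import segment_anything

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("groundingdino and segment_anything imported successfully")
```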
- Download SAM models from https://github.com/facebookresearch/segment-anything#model-checkpoints
- Place the model in /sam_models
- Download the GroundingDINO model from https://github.com/IDEA-Research/GroundingDINO
- Place the model in /gd
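A minimal sketch of loading the checkpoints from these folders; the concrete file names (sam_vit_h_4b8939.pth, groundingdino_swint_ogc.pth) are assumptions based on the upstream repositories, so adjust them to whatever you downloaded:

```python
# Sketch: load the SAM and GroundingDINO checkpoints from the project folders.
from segment_anything import sam_model_registry, SamPredictor
from groundingdino.util.inference import load_model

# Assumed file names; replace with the checkpoints you actually placed in these folders.
sam = sam_model_registry["vit_h"](checkpoint="sam_models/sam_vit_h_4b8939.pth")
sam_predictor = SamPredictor(sam)

gd_model = load_model(
    "gd/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",  # assumed config path
    "gd/groundingdino_swint_ogc.pth",                                     # assumed checkpoint path
)
```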
- In the configs from gd/GroundingDINO/groundingdino/config/, set text_encoder_type to "bert-base-uncased" (to download the BERT model online) or to a local path to a BERT-BASE-UNCASED model
- Replace weights and config in CNN_DICT for 'YOLOv8' in utils/ml_config.py with your model weight and YAML config paths
- Replace CLASSES_ENG and CLASSES_RU in utils/ml_config.py with your class names
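The exact contents of utils/ml_config.py are project specific; the sketch below only illustrates the kind of entries being edited, with placeholder paths and class names:

```python
# utils/ml_config.py (illustrative sketch; keys and structure are assumptions)
CNN_DICT = {
    'YOLOv8': {
        'weights': 'path/to/your/yolov8_weights.pt',  # your trained YOLOv8 weights
        'config': 'path/to/your/data.yaml',           # your YAML config
    },
}

CLASSES_ENG = ['person', 'car', 'dog']        # your class names in English
CLASSES_RU = ['человек', 'машина', 'собака']  # the same classes in Russian
```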
- Run annotator_light.py for the Annotator version without SAM, GroundingDINO, YOLO, etc.
- Run annotator.py for the Annotator version with SAM, GroundingDINO, YOLO, etc.
- Run detector.py for the Detector version with SAM, GroundingDINO, YOLO, etc.
- Run segmentator.py for the Segmentator version with SAM, GroundingDINO, YOLO, etc.
S - draw a new label
D - delete current label
Space - finish drawing the current label
Ctrl + C - copy current label
Ctrl + V - paste current label
Ctrl + A - SAM by points
Ctrl + M - SAM by box
Ctrl + G - GroundingDINO + SAM
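Ctrl + G chains text-prompted detection and segmentation. A rough sketch of that kind of pipeline with the upstream GroundingDINO and segment_anything APIs (the prompt, thresholds, file paths, and a CUDA device are assumptions, not the project's exact code):

```python
# Sketch: text prompt -> GroundingDINO boxes -> SAM masks (Grounded-SAM style pipeline).
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

gd_model = load_model("gd/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
                      "gd/groundingdino_swint_ogc.pth")                  # assumed paths
sam = sam_model_registry["vit_h"](checkpoint="sam_models/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image_source, image = load_image("example.jpg")      # image_source: HxWx3 RGB numpy array
boxes, logits, phrases = predict(gd_model, image,
                                 caption="car . person .",               # example text prompt
                                 box_threshold=0.35, text_threshold=0.25)

# GroundingDINO returns normalized cxcywh boxes; SAM expects absolute xyxy boxes.
h, w = image_source.shape[:2]
boxes_xyxy = box_convert(boxes * torch.tensor([w, h, w, h]), "cxcywh", "xyxy").numpy()

predictor.set_image(image_source)
for box in boxes_xyxy:
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    # masks[0] is a boolean HxW mask for one detected object
```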
- Segmentation by points:
  - The left mouse button sets a point inside the segment
  - The right mouse button sets a point outside the segment (background)
  - Space runs the SAM neural network to draw the label
- Segmentation inside a box:
  - Draw a rectangular label around the area to segment
  - Wait for the SAM label to appear
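Under the hood, these click-based modes map to SAM prompts (label 1 for a point inside the object, 0 for background); a minimal sketch, assuming an example image and checkpoint path:

```python
# Sketch: SAM segmentation from foreground/background click points.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_models/sam_vit_h_4b8939.pth")  # assumed path
predictor = SamPredictor(sam)
predictor.set_image(np.array(Image.open("example.jpg").convert("RGB")))

point_coords = np.array([[450, 320], [510, 340], [600, 100]])  # clicked pixel positions (x, y)
point_labels = np.array([1, 1, 0])  # 1 = left click (inside object), 0 = right click (background)

masks, scores, _ = predictor.predict(point_coords=point_coords,
                                     point_labels=point_labels,
                                     multimask_output=True)
best_mask = masks[np.argmax(scores)]  # highest-scoring candidate mask (boolean HxW array)
```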
The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.
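For the "generate masks for all objects in an image" use case, segment_anything also ships an automatic mask generator; a minimal sketch with an assumed checkpoint path:

```python
# Sketch: generate masks for every object in an image with SamAutomaticMaskGenerator.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_models/sam_vit_h_4b8939.pth")  # assumed path
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("example.jpg").convert("RGB"))
masks = mask_generator.generate(image)  # list of dicts: 'segmentation', 'area', 'bbox', ...
print(len(masks), "masks generated")
```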
@article{kirillov2023segany, title={Segment Anything}, author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross}, journal={arXiv:2304.02643}, year={2023} }
@article{liu2023grounding, title={Grounding dino: Marrying dino with grounded pre-training for open-set object detection}, author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others}, journal={arXiv preprint arXiv:2303.05499}, year={2023} }