feat: Add support for multiple webcams and enhance webcam handling
- Introduced `--webcam_indices` argument to specify multiple webcam indices.
- Default to using the first webcam (index 0) if no `--webcam_indices` flag is provided.
- Enhanced `YOFLO` class to handle multiple webcam threads concurrently.
- Implemented graceful shutdown of all webcam windows.
- Updated the main function to parse new arguments and initialize `YOFLO` accordingly.
- Improved logging and error handling for webcam operations.
- Updated README.md.
CharlesCNorton committed Jul 10, 2024
1 parent 16affb4 commit 92b388f
Showing 3 changed files with 53 additions and 35 deletions.
25 changes: 18 additions & 7 deletions README.md
@@ -14,7 +14,7 @@ Answer yes/no questions based on the visual input. This feature leverages Floren

### Inference Chain

Evaluate multiple inferences and determine overall results based on a sequence of phrases. This allows for a more comprehensive context analysis within individual frames by examining multiple aspects of the scene. For example, to determine if a person is working, you might check if their eyes are open, their are hands on the keyboard, and they are facing the computer. This feature addresses the limitation that newer and smaller vision-language models are capable of answering simple questions, but not compound ones.
Evaluate multiple inferences and determine overall results based on a sequence of phrases. This allows for a more comprehensive context analysis within individual frames by examining multiple aspects of the scene. For example, to determine if a person is working, you might check if their eyes are open, their hands are on the keyboard, and they are facing the computer. This feature addresses the limitation that newer and smaller vision-language models are capable of answering simple questions, but not compound ones.
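
A minimal sketch of how an inference chain could combine per-phrase answers. The `evaluate_chain` helper and the `ask` callable are hypothetical stand-ins for the model call, and the all-must-pass rule is an assumption for illustration, not necessarily how YOFLO aggregates its results.

```python
from typing import Callable, List, Tuple


def evaluate_chain(phrases: List[str], ask: Callable[[str], str]) -> Tuple[bool, List[bool]]:
    """Ask each yes/no question and combine the answers.

    `ask` is a hypothetical stand-in for the vision-language model call:
    it takes a phrase and returns the model's raw text answer.
    """
    # Normalise each answer; anything other than "yes" counts as a failure.
    results = [ask(phrase).strip().lower() == "yes" for phrase in phrases]
    # Assumption: the chain passes only when every phrase passes.
    return all(results), results


# Canned answers in place of a real model, for illustration only.
canned = {
    "Are their eyes open?": "Yes",
    "Are their hands on the keyboard?": "yes",
    "Are they facing the computer?": "no",
}
overall, details = evaluate_chain(list(canned), canned.__getitem__)
print(overall, details)  # False [True, True, False]
```

Splitting a compound question into simple ones keeps each model call within what a small vision-language model answers reliably; the combination logic then lives in ordinary code.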

### Inference Rate Calculation

@@ -44,6 +44,10 @@ Enable formatted output of detections for better readability. This makes it easi

Option to download the Florence-2 model directly from the Hugging Face Hub. This simplifies the setup process by automating the model download and initialization.

### Multi-Webcam Support

Support for multiple webcams, allowing concurrent processing and inference on multiple video feeds. This is useful for surveillance systems, multi-view analysis, and other applications requiring inputs from several cameras.
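
The concurrency pattern behind this feature can be sketched as one worker thread per camera index, all watching a shared `threading.Event` for shutdown. This is a simplified illustration: `run_feeds` and `read_frame` are hypothetical names, and the frame source is a plain callable rather than a real `cv2.VideoCapture`.

```python
import threading
import time
from typing import Callable, Dict, List


def run_feeds(indices: List[int], read_frame: Callable[[int], object],
              duration: float = 0.05) -> Dict[int, int]:
    """Run one worker per camera index, then stop them all together."""
    stop = threading.Event()
    # Each worker only ever updates its own key, so no lock is needed here.
    counts = {index: 0 for index in indices}

    def worker(index: int) -> None:
        while True:
            read_frame(index)      # stand-in for capturing and processing a frame
            counts[index] += 1
            # wait() doubles as the loop delay and returns True once stop is set.
            if stop.wait(0.001):
                break

    threads = [threading.Thread(target=worker, args=(index,)) for index in indices]
    for thread in threads:
        thread.start()
    time.sleep(duration)
    stop.set()                     # a single flag signals every worker
    for thread in threads:
        thread.join()              # wait until each worker has exited
    return counts


counts = run_feeds([0, 1], lambda index: f"frame from camera {index}")
print(counts)
```

Sharing one stop flag means shutdown needs a single `set()` call followed by a `join()` per thread, regardless of how many cameras are attached.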

## Model Information

This tool uses Microsoft's Florence-2, a powerful vision-language model designed to understand and generate detailed descriptions of visual inputs. Florence-2 combines advanced image processing with natural language understanding, making it ideal for complex tasks that require both visual and textual analysis. Florence-2 uses a unified sequence-to-sequence architecture to handle tasks from image-level understanding to fine-grained visual-semantic alignment. The model is trained on a large-scale, high-quality multitask dataset FLD-5B, which includes 126M images and billions of text annotations.
@@ -87,6 +91,7 @@ Run the script with the desired arguments. Below are the available flags and the
- `-pp`, `--pretty_print`: Enable pretty print for detections. This flag formats the output of detections for better readability, making it easier to interpret the results.
- `-il`, `--inference_limit`: Limit the inference rate to a specified number of inferences per second. This can help manage performance and ensure the system is not overloaded, providing a smoother operation.
- `-ic`, `--inference_chain`: Enable inference chain with specified phrases. Provide phrases in quotes, separated by spaces (e.g., `"Is it sunny?" "Is it raining?"`).
- `-wi`, `--webcam_indices`: Specify the indices of the webcams to use (e.g., `0 1 2`). If not provided, the first webcam (index 0) will be used by default.
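
The `--webcam_indices` behaviour described above can be reproduced in isolation with `argparse`. The sketch below mirrors the flag definition from this commit; the wrapper function `parse_webcam_indices` is a hypothetical name added for illustration.

```python
import argparse
from typing import List, Optional


def parse_webcam_indices(argv: Optional[List[str]] = None) -> List[int]:
    """Parse -wi/--webcam_indices and fall back to webcam 0."""
    parser = argparse.ArgumentParser()
    # nargs='+' collects one or more values; type=int converts each one.
    parser.add_argument("-wi", "--webcam_indices", nargs="+", type=int,
                        help="Specify the indices of the webcams to use (e.g., 0 1 2).")
    args = parser.parse_args(argv)
    # When the flag is omitted the attribute is None, so default to [0].
    return args.webcam_indices if args.webcam_indices else [0]


print(parse_webcam_indices([]))                      # [0]
print(parse_webcam_indices(["-wi", "0", "1", "2"]))  # [0, 1, 2]
```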

## Inference Chain Feature

@@ -143,7 +148,9 @@ To run the tool in headless mode without displaying the video feed:
python yoflo.py --model_path /path/to/Florence-2-base-ft --object_detection "person" --headless
```
### Enable Screenshot on Detection
### Enable Screenshot
on Detection
To enable screenshot capture whenever a target object is detected:
```sh
python yoflo.py --model_path /path/to/Florence-2-base-ft --object_detection "person" --screenshot
@@ -158,9 +165,7 @@ python yoflo.py --model_path /path/to/Florence-2-base-ft --object_detection "per
### Display Inference Speed
To log and display the inference speed (inferences per second):
```sh
python yoflo.py --model_path /path/to/Florence-2-base-ft --object_detection
"person" --display_inference_speed
python yoflo.py --model_path /path/to/Florence-2-base-ft --object_detection "person" --display_inference_speed
```
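
One way to compute an inferences-per-second figure is a sliding time window over recent inference timestamps. The sketch below is an assumption about how such a metric could be derived, not a copy of YOFLO's `update_inference_rate`; the `InferenceRateMeter` class and its injectable `clock` are illustrative names.

```python
import time
from collections import deque


class InferenceRateMeter:
    """Track inferences per second over a sliding time window."""

    def __init__(self, window_seconds=1.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the meter can be tested
        self.timestamps = deque()

    def record(self):
        """Register one inference and return the current rate."""
        now = self.clock()
        self.timestamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window_seconds


# Fake clock readings (in seconds) so the example is deterministic.
ticks = iter([0.0, 0.25, 0.5, 2.0])
meter = InferenceRateMeter(window_seconds=1.0, clock=lambda: next(ticks))
rates = [meter.record() for _ in range(4)]
print(rates)  # [1.0, 2.0, 3.0, 1.0]
```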
### Download Model from Hugging Face
@@ -181,6 +186,12 @@ To limit the inference rate to a specified number of inferences per second, for
python yoflo.py --model_path /path/to/Florence-2-base-ft --object_detection "person" --inference_limit 5
```
### Use Multiple Webcams
To use multiple webcams for object detection or inference:
```sh
python yoflo.py --model_path /path/to/Florence-2-base-ft --object_detection "person" --webcam_indices 0 1 --inference_limit 3
```
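
The command above pairs two webcams with `--inference_limit 3`. A minimal sketch of how a per-second limit could be enforced is shown below. The `RateLimiter` class and its injectable `clock` are hypothetical names rather than part of YOFLO, and the sketch does not settle whether the limit applies per camera or across all cameras.

```python
import time


class RateLimiter:
    """Allow at most `limit` actions per second (None disables the limit)."""

    def __init__(self, limit=None, clock=time.monotonic):
        # Minimum spacing between two allowed actions, in seconds.
        self.min_interval = 1.0 / limit if limit else 0.0
        self.clock = clock          # injectable so the limiter can be tested
        self.last_time = None

    def ready(self):
        """Return True if enough time has passed since the last allowed action."""
        now = self.clock()
        if self.last_time is not None and now - self.last_time < self.min_interval:
            return False            # too soon: skip inference on this frame
        self.last_time = now
        return True


# Fake clock readings (in seconds) so the example is deterministic.
ticks = iter([0.0, 0.1, 0.3, 0.35])
limiter = RateLimiter(limit=5, clock=lambda: next(ticks))
decisions = [limiter.ready() for _ in range(4)]
print(decisions)  # [True, False, True, False]
```

With several camera threads, one limiter per thread would cap each feed independently, while a single shared limiter would cap the total; a shared instance would also need a lock around `ready()`.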
## Minimum Requirements for Running YOFLO
1. **Operating System**:
@@ -189,7 +200,7 @@ python yoflo.py --model_path /path/to/Florence-2-base-ft --object_detection "per
2. **Minimum Hardware**:
- **CPU**: Intel Core i7
- **GPU**: 24 GB VRAM
- **GPU**: 16 GB VRAM
- **RAM**: 32 GB RAM
- **Camera**: USB camera connected
@@ -220,7 +231,7 @@ python yoflo.py --model_path /path/to/Florence-2-base-ft --object_detection "per
## Development Status
YOFLO-CLI has been successfully converted into a full Python package and is available on PyPI. The package currently supports object detection and binary inference based on referring expression comprehension. Future updates will focus on optimizations and adding new features as the project evolves.
YOFLO-CLI has been successfully converted into a full Python package and is available on PyPI. The package currently supports object detection, binary inference based on referring expression comprehension, as well as inference trees consisting of multiple phrases. Future updates will focus on optimizations and adding new features as the project evolves.
## Contributing
5 changes: 3 additions & 2 deletions setup.py
@@ -2,11 +2,12 @@

setup(
name='yoflo',
version='0.3.1',
version='0.4.2',
packages=find_packages(),
include_package_data=True,
install_requires=[
'packages'
'packages',
'packaging',
'torch',
'timm',
'transformers>=4.38.0',
58 changes: 32 additions & 26 deletions yoflo/yoflo.py
@@ -18,7 +18,7 @@ def setup_logging(log_to_file, log_file_path="alerts.log"):
logging.basicConfig(level=logging.INFO, format='%(message)s', handlers=handlers)

class YOFLO:
def __init__(self, model_path=None, display_inference_speed=False, pretty_print=False, inference_limit=None, class_names=None):
def __init__(self, model_path=None, display_inference_speed=False, pretty_print=False, inference_limit=None, class_names=None, webcam_indices=None):
"""Initialize the YO-FLO class with configuration options."""
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.model = None
@@ -34,13 +34,14 @@ def __init__(self, model_path=None, display_inference_speed=False, pretty_print=
self.display_inference_speed = display_inference_speed
self.stop_webcam_flag = threading.Event()
self.last_beep_time = 0
self.webcam_thread = None
self.webcam_threads = []
self.pretty_print = pretty_print
self.inference_limit = inference_limit
self.last_inference_time = 0
self.last_detection = None
self.last_detection_count = 0
self.inference_phrases = []
self.webcam_indices = webcam_indices if webcam_indices else [0] # Default to webcam 0 if not specified
if model_path:
self.init_model(model_path)

@@ -63,7 +64,6 @@ def init_model(self, model_path):
except Exception as e:
logging.error(f"Unexpected error initializing model: {e}")


def update_inference_rate(self):
"""Calculate and log the inference rate (inferences per second)."""
try:
@@ -165,28 +165,31 @@ def download_model(self):
logging.error(f"Error downloading model: {e}")

def start_webcam_detection(self):
"""Start a separate thread for webcam detection."""
"""Start separate threads for each specified webcam."""
try:
if self.webcam_thread and self.webcam_thread.is_alive():
if self.webcam_threads:
logging.warning("Webcam detection is already running.")
return
self.stop_webcam_flag.clear()
self.webcam_thread = threading.Thread(target=self._webcam_detection_thread)
self.webcam_thread.start()
for index in self.webcam_indices:
thread = threading.Thread(target=self._webcam_detection_thread, args=(index,))
thread.start()
self.webcam_threads.append(thread)
except Exception as e:
logging.error(f"Error starting webcam detection: {e}")

def _webcam_detection_thread(self):
"""Run the webcam detection loop in a separate thread."""
def _webcam_detection_thread(self, index):
"""Run the webcam detection loop in a separate thread for a specific webcam."""
try:
cap = cv2.VideoCapture(0)
cap = cv2.VideoCapture(index)
if not cap.isOpened():
logging.error("Error: Could not open webcam.")
logging.error(f"Error: Could not open webcam {index}.")
return
window_name = f'Object Detection Webcam {index}'
while not self.stop_webcam_flag.is_set():
ret, frame = cap.read()
if not ret:
logging.error("Error: Failed to capture image from webcam.")
logging.error(f"Error: Failed to capture image from webcam {index}.")
break
image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
image_pil = Image.fromarray(image)
@@ -204,7 +207,7 @@ def _webcam_detection_thread(self):
if self.pretty_print:
self.pretty_print_detections(filtered_detections)
else:
logging.info(f"Detections: {filtered_detections}")
logging.info(f"Detections from webcam {index}: {filtered_detections}")
if not self.headless:
frame = self.plot_bbox(frame, filtered_detections)
self.inference_count += 1
@@ -213,7 +216,7 @@ def _webcam_detection_thread(self):
if self.screenshot_active:
self.save_screenshot(frame)
if self.log_to_file_active:
self.log_alert(f"Detections: {filtered_detections}")
self.log_alert(f"Detections from webcam {index}: {filtered_detections}")
elif self.phrase:
results = self.run_expression_comprehension(image_pil, self.phrase)
if results:
@@ -223,35 +226,36 @@ def _webcam_detection_thread(self):
self.update_inference_rate()
if clean_result in ['yes', 'no'] and self.log_to_file_active:
if self.log_to_file_active:
self.log_alert(f"Expression Comprehension: {clean_result} at {datetime.now()}")
self.log_alert(f"Expression Comprehension from webcam {index}: {clean_result} at {datetime.now()}")
if self.inference_phrases:
inference_result, phrase_results = self.evaluate_inference_chain(image_pil)
logging.info(f"Inference Chain result: {inference_result}, Details: {phrase_results}")
logging.info(f"Inference Chain result from webcam {index}: {inference_result}, Details: {phrase_results}")
if self.pretty_print:
for idx, result in enumerate(phrase_results):
logging.info(f"Inference {idx + 1}: {'PASS' if result else 'FAIL'}")
logging.info(f"Inference {idx + 1} from webcam {index}: {'PASS' if result else 'FAIL'}")
self.inference_count += 1
self.update_inference_rate()
if not self.headless:
cv2.imshow('Object Detection', frame)
cv2.imshow(window_name, frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
self.last_inference_time = current_time
cap.release()
if not self.headless:
cv2.destroyAllWindows()
cv2.destroyWindow(window_name)
except cv2.error as e:
logging.error(f"OpenCV error in webcam detection thread: {e}")
logging.error(f"OpenCV error in webcam detection thread {index}: {e}")
except Exception as e:
logging.error(f"Error in webcam detection thread: {e}")
logging.error(f"Error in webcam detection thread {index}: {e}")

def stop_webcam_detection(self):
"""Stop the webcam detection thread."""
"""Stop all webcam detection threads."""
try:
self.object_detection_active = False
self.stop_webcam_flag.set()
if self.webcam_thread:
self.webcam_thread.join()
for thread in self.webcam_threads:
thread.join()
self.webcam_threads = []
logging.info("Webcam detection stopped")
except Exception as e:
logging.error(f"Error stopping webcam detection: {e}")
@@ -344,6 +348,7 @@ def main():
parser.add_argument("-pp", "--pretty_print", action='store_true', help="Enable pretty print for detections. Formats and prints detection results nicely in the console.")
parser.add_argument("-il", "--inference_limit", type=float, help="Limit the inference rate to X inferences per second. Useful for controlling the load on the system.", required=False)
parser.add_argument("-ic", "--inference_chain", nargs='+', help="Enable inference chain with specified phrases. Provide phrases in quotes, separated by spaces (e.g., 'Is it sunny?' 'Is it raining?').")
parser.add_argument("-wi", "--webcam_indices", nargs='+', type=int, help="Specify the indices of the webcams to use (e.g., 0 1 2).")

group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("-mp", "--model_path", type=str, help="Path to the pre-trained model directory. Use this if you have a local copy of the model.")
@@ -355,8 +360,9 @@ def main():

try:
setup_logging(args.log_to_file)
webcam_indices = args.webcam_indices if args.webcam_indices else [0]
if args.download_model:
yo_flo = YOFLO(display_inference_speed=args.display_inference_speed, pretty_print=args.pretty_print, inference_limit=args.inference_limit, class_names=args.object_detection)
yo_flo = YOFLO(display_inference_speed=args.display_inference_speed, pretty_print=args.pretty_print, inference_limit=args.inference_limit, class_names=args.object_detection, webcam_indices=webcam_indices)
yo_flo.download_model()
else:
if not os.path.exists(args.model_path):
@@ -365,7 +371,7 @@ def main():
if not os.path.isdir(args.model_path):
logging.error(f"Model path {args.model_path} is not a directory.")
return
yo_flo = YOFLO(model_path=args.model_path, display_inference_speed=args.display_inference_speed, pretty_print=args.pretty_print, inference_limit=args.inference_limit, class_names=args.object_detection)
yo_flo = YOFLO(model_path=args.model_path, display_inference_speed=args.display_inference_speed, pretty_print=args.pretty_print, inference_limit=args.inference_limit, class_names=args.object_detection, webcam_indices=webcam_indices)

if args.phrase:
yo_flo.phrase = args.phrase
