
Feat/66/press-defect-data-simulator-service: 프레스 공정 - 생산품 결함 시뮬레이터 구현#68

Merged
TaeHyunaivle merged 9 commits into main from feat/66/press-defect-simulator
Aug 27, 2025

Conversation

@haeyekim
Member

@haeyekim haeyekim commented Aug 23, 2025

✨ Description

  • Completed the MVP (Minimum Viable Product) of the AI model simulator and integrated all core components.
  • Added connection tests for external services (Azure Storage, the AI model) and features that automate the simulation workflow.
  • Set up a containerized deployment environment via Docker and added a requirements.txt file for dependency management.

✅ Task

  • [x] services/model_client.py: client class for communicating with the AI model
  • [x] services/scheduler_service.py: periodic simulation execution and management logic
  • [x] routers/connection_test_router.py: API endpoints for testing external service connections
  • [x] routers/simulator_router.py: API endpoints for simulator control and status monitoring
  • [x] main.py: FastAPI application setup and integration of all components
  • [x] requirements.txt: all Python package dependencies required by the project
  • [x] Dockerfile: containerized deployment environment

✅ Test Result

  • Verified via the /connection/test/all endpoint that both Azure Storage and the AI model service connect successfully.
  • Called /simulator/start and confirmed that the simulation runs periodically and is logged correctly.

📸 Screenshot (optional)

  • Attach result images, visualization graphs, etc. as needed.

📌 Etc

  • Please pay particular attention to the SchedulerService's async thread management and shutdown logic.
  • Please also check that the API endpoints' response structure and error handling are consistent.
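For reviewers focusing on the thread management and shutdown logic, here is a stripped-down sketch of the stop-event pattern a scheduler like this typically uses. The class and names are hypothetical simplifications, not the actual SchedulerService implementation.

```python
import threading
import time

class MiniScheduler:
    """Simplified sketch: a worker thread runs a job at a fixed
    interval until stop() sets the event and joins the thread."""

    def __init__(self, job, interval=0.05):
        self.job = job
        self.interval = interval
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, name="sim-scheduler")
        self._thread.start()

    def _run(self):
        # Event.wait doubles as the sleep; returns True once stop() is called.
        while not self._stop.wait(self.interval):
            self.job()

    def stop(self):
        self._stop.set()
        self._thread.join()

runs = []
sched = MiniScheduler(lambda: runs.append(1))
sched.start()
time.sleep(0.3)   # let a few ticks fire
sched.stop()
```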

Summary by CodeRabbit

  • New Features

    • Introduces a Press Defect Data Simulator service with REST endpoints for root, health, info, and version.
    • Connection test endpoints for storage and model services, plus a combined check.
    • Simulator controls: start, stop, restart, manual run, and status/health reporting.
    • Inspection validation and sample data checks for specific inspection IDs.
    • Runtime statistics and recent log retrieval with summaries.
  • Chores

    • Added containerization with a health check, environment configuration, and port exposure.
    • Included required dependencies for service runtime and integrations.

Add initial configuration (config) and logging system (logger) setup files for the simulator
Implement reading data from Azure Blob Storage and sending it to the AI model
Add a ModelServiceClient class for communicating with the AI model service. Includes the core interactions with the model: health checks, prediction requests, and retry logic.
Add a SchedulerService class that manages the automated simulation workflow. It implements the core features of periodically collecting data from Azure Storage, sending it to the AI model, and logging execution status, and also includes service initialization, health checks, and manual execution.
Add an API router that checks the connectivity and status of Azure Storage and the AI model service. Provides health-check endpoints for individual services and for all services combined, plus data validation.
Add an API router for controlling and monitoring the whole simulation. Provides start, stop, restart, and manual-run controls, plus queries for real-time status, statistics, logs, and health checks.
Set up the FastAPI app in the main application file and wire in the CORS middleware and routers. Added application startup/shutdown lifecycle management, base endpoints and a global exception handler, and settings validation plus external-service connection checks to improve overall system stability.
Add all required Python package dependencies to requirements.txt. These packages are needed for the simulator, services, and routers to run correctly.
Create a Dockerfile for the FastAPI application. Builds on Python 3.10 and includes system dependencies, Python packages, and the application code; defines port exposure, environment variables, a health check, and the run command to enable containerized deployment.
@coderabbitai
Contributor

coderabbitai bot commented Aug 23, 2025

Walkthrough

Adds a new FastAPI-based press-defect data simulator service with configuration, logging, Azure Blob integration, model-service client, scheduler, health/info endpoints, simulator control and connection-test routers, Dockerfile, and pinned requirements. Includes startup validation, health checks, structured logging, and async prediction workflows with manual and scheduled execution.
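The retrying async prediction workflow mentioned here can be sketched generically. The helper below is a simplified stand-in demonstrated against a fake flaky endpoint, not the service's actual aiohttp client; names and the backoff values are assumptions.

```python
import asyncio

async def request_with_retry(send, max_retries=3, backoff=0.01):
    """Retry an async request callable with exponential backoff.
    `send` is any coroutine function performing one request attempt."""
    for attempt in range(1, max_retries + 1):
        try:
            return await send()
        except Exception:
            if attempt == max_retries:
                raise
            await asyncio.sleep(backoff * 2 ** (attempt - 1))

# Demo: a fake endpoint that fails twice, then succeeds.
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return {"status": "ok"}

result = asyncio.run(request_with_retry(flaky))
```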

Changes

Cohort / File(s) Summary
Containerization
services/press-defect-data-simulator-service/Dockerfile
New Python 3.10 image; installs deps; sets env/ports; healthcheck on /health; runs uvicorn.
Configuration
services/press-defect-data-simulator-service/app/config/settings.py
New Pydantic settings with env loading, validation, summaries, and in-memory simulation state helpers.
Application Bootstrap
services/press-defect-data-simulator-service/app/main.py
FastAPI app with lifespan startup checks, global error handler, health/info/version/root endpoints, CORS, signal handling, and router registration.
Routers: Connection Tests
services/press-defect-data-simulator-service/app/routers/connection_test_router.py
Endpoints to test Azure and model service connectivity, statuses, and inspection validation; parallel test aggregation.
Routers: Simulator Control
services/press-defect-data-simulator-service/app/routers/simulator_router.py
Endpoints to start/stop/restart/manual-run scheduler; fetch status, stats, logs, health, and current settings.
Service: Azure Storage
services/press-defect-data-simulator-service/app/services/azure_storage.py
Async Azure Blob client: init, connection test, list inspections, list/download images, completeness validation, status; singleton instance.
Service: Model Client
services/press-defect-data-simulator-service/app/services/model_client.py
Async aiohttp client: health/ready/info, batch/single predict with retries, aggregated checks, status; singleton instance.
Service: Scheduler
services/press-defect-data-simulator-service/app/services/scheduler_service.py
Threaded scheduler using schedule: async init, periodic execution, lifecycle control, manual run, status and health reporting; singleton instance.
Utilities: Logger
services/press-defect-data-simulator-service/app/utils/logger.py
Structured rotating logger; result/error/anomaly persistence; retrieval API; scheduler/connection logs; singleton instance.
Dependencies
services/press-defect-data-simulator-service/requirements.txt
Adds pinned deps for FastAPI, aiohttp/httpx, Azure Blob, Pydantic v2, dotenv, uvicorn, tests/dev tools.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant App as FastAPI App
  participant Settings as Settings
  participant Azure as AzureStorageService
  participant Model as ModelServiceClient
  participant Sched as SchedulerService

  rect rgb(230,240,255)
  note over App: Startup (lifespan)
  App->>Settings: validate_settings()
  alt invalid
    App-->>User: Startup error
  else valid
    App->>Azure: initialize() / test_connection()
    App->>Model: test_connection()
    App->>Sched: health_check()
    App-->>User: Service ready
  end
  end
sequenceDiagram
  autonumber
  participant Sched as SchedulerService
  participant Settings as Settings/state
  participant Azure as AzureStorageService
  participant Model as ModelServiceClient
  participant Logger as SimulatorLogger

  rect rgb(235,255,235)
  note over Sched: Periodic job
  Sched->>Azure: download_inspection_images(inspection_id)
  alt images retrieved
    Sched->>Model: predict_inspection(inspection_id, images)
    alt success
      Sched->>Logger: log_simulation_success(...)
      Sched->>Settings: increment_simulation_stats(True)
    else failure
      Sched->>Logger: log_simulation_failure(...)
      Sched->>Settings: increment_simulation_stats(False)
    end
  else retrieval failed
    Sched->>Logger: log_simulation_failure(...)
    Sched->>Settings: increment_simulation_stats(False)
  end
  Sched->>Settings: update_current_inspection_id(next)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

Hop, hop, new gears spin in code’s bright light,
I nibble logs where schedulers take flight.
Azure clouds, models hum a tune,
Predictions bloom like carrots in June.
Health checks blink—green LEDs aglow—
Ship the image, watch results grow.
Thump-thump: to prod we go! 🥕🐇


@github-actions

🤖 Smart FAST automated code review

📦 Changed services (9):

  • painting-process-data-simulator-service
  • painting-process-equipment-defect-detection-model-service
  • painting-surface-defect-detection-model-service
  • press-defect-data-simulator-service
  • press-fault-data-simulator-service
  • press-fault-detection-model-service
  • vehicle-assembly-process-defect-detection-model-service
  • welding-machine-data-simulator-service
  • welding-machine-defect-detection-model-service

🔍 Services detected via git diff

📊 Overall check results:

⚠️ Some checks found problems - review required

🧪 Test & code quality results:

⚠️ Test & lint status unknown

  • Total tests: 35
  • Passed: 35 ✅
  • Coverage: 0%

🔒 Security scan results:

⚠️ Security scan status unknown

  • No security issues found ✅

✅ Check checklist:

  • ❌ Tests failed
  • ✅ Code quality checks passed
  • ❌ Security issues found

🚀 Deployment info:

  • ⚠️ Checks failed - deployment possible once the issues are resolved
    • 🧪 Test problems need to be resolved
    • 🔒 Security issues need to be resolved
  • 📋 Detailed logs are available in the Actions tab

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 17

🧹 Nitpick comments (30)
services/press-defect-data-simulator-service/requirements.txt (3)

1-52: Trim prod image: separate runtime vs dev/test dependencies

pytest, Pygments, watchfiles, and possibly pillow look like dev/test or tooling-only. Installing them in the runtime image increases attack surface and size. Consider splitting into requirements.txt (runtime) and requirements-dev.txt (dev/test), and have the Dockerfile install only runtime deps.

Example split:

-# services/press-defect-data-simulator-service/requirements.txt
+# services/press-defect-data-simulator-service/requirements.txt (runtime only)
 fastapi==0.116.1
 uvicorn==0.35.0
 starlette==0.47.2
 httpx==0.28.1
 aiohttp==3.12.15
 azure-core==1.35.0
 azure-storage-blob==12.26.0
 pydantic==2.11.7
 pydantic-settings==2.10.1
 python-dotenv==1.1.1
 PyYAML==6.0.2
 anyio==4.10.0
 schedule==1.2.2
 # ... keep the rest that are truly needed at runtime
-# dev/test below
-Pygments==2.19.2
-pytest==8.4.1
-watchfiles==1.1.0
-# pillow only if images are actually processed here
-# pillow==11.3.0

1-52: Pinning strategy: consider constraints file for transitive stability without over-freezing direct deps

Exact-pinning every package (including transitive utilities like typing_extensions) locks you into urgent rebuilds for security patches. A constraints file (pip -c constraints.txt) or upper-bounds pinning for non-critical libs can balance stability and agility.


1-52: Remove unused dependencies from requirements.txt

After verifying the code in services/press-defect-data-simulator-service:

  • Pillow is used in app/services/azure_storage.py (from PIL import Image), and aiohttp is used extensively in app/services/model_client.py.
  • However, requests, httpx, and websockets aren’t imported or referenced anywhere in this service.

Please remove these three packages from services/press-defect-data-simulator-service/requirements.txt to slim the image and reduce CVE exposure.

Suggested diff:

--- services/press-defect-data-simulator-service/requirements.txt
@@
- httpx==0.28.1
- requests==2.32.5
- websockets==15.0.1
services/press-defect-data-simulator-service/app/config/settings.py (2)

2-3: Remove unused import or make it useful

typing.Optional is imported but not used. Either remove it or update types (e.g., make azure_connection_string Optional[str]) to reflect that it can be missing.

-from typing import Optional
-from pydantic_settings import BaseSettings
+from typing import Optional
+from pydantic import Field
+from pydantic_settings import BaseSettings, SettingsConfigDict

12-16: Service name mismatch with context

The service is a press-defect simulator; current name says “Painting Process Data Simulator”. Align to avoid confusion in logs and API responses.

-    service_name: str = "Painting Process Data Simulator"
+    service_name: str = "Press Defect Data Simulator"
services/press-defect-data-simulator-service/Dockerfile (1)

6-17: Optional: run as non-root and use a slimmer base to reduce risk/size

Drop privileges and consider python:3.10-slim with build essentials only when needed. This improves security posture.

-FROM python:3.10
+FROM python:3.10-slim

@@
-RUN apt-get update && apt-get install -y \
-    gcc \
-    g++ \
-    curl \
-    && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends gcc g++ curl \
+  && rm -rf /var/lib/apt/lists/*

@@
-# 애플리케이션 실행
-CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8013"]
+# 애플리케이션 실행 (non-root)
+RUN useradd -ms /bin/bash appuser && chown -R appuser:appuser /app
+USER appuser
+CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8013", "--proxy-headers"]
services/press-defect-data-simulator-service/app/routers/simulator_router.py (3)

2-2: Remove unused import List

List isn’t used; keep imports lean.

-from typing import Dict, Any, Optional, List
+from typing import Dict, Any, Optional

37-39: Preserve original exception context with raise ... from e

This improves debuggability by keeping the traceback chain.

-        raise HTTPException(status_code=500, detail=f"상태 조회 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"상태 조회 실패: {str(e)}") from e
@@
-        raise HTTPException(status_code=500, detail=f"시뮬레이션 시작 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"시뮬레이션 시작 실패: {str(e)}") from e
@@
-        raise HTTPException(status_code=500, detail=f"시뮬레이션 중지 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"시뮬레이션 중지 실패: {str(e)}") from e
@@
-        raise HTTPException(status_code=500, detail=f"시뮬레이션 재시작 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"시뮬레이션 재시작 실패: {str(e)}") from e
@@
-        raise HTTPException(status_code=500, detail=f"수동 실행 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"수동 실행 실패: {str(e)}") from e
@@
-        raise HTTPException(status_code=500, detail=f"통계 조회 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"통계 조회 실패: {str(e)}") from e
@@
-        raise HTTPException(status_code=500, detail=f"로그 조회 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"로그 조회 실패: {str(e)}") from e
@@
-        raise HTTPException(status_code=500, detail=f"로그 요약 조회 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"로그 요약 조회 실패: {str(e)}") from e
@@
-        raise HTTPException(status_code=500, detail=f"설정 조회 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"설정 조회 실패: {str(e)}") from e

Also applies to: 64-66, 92-94, 119-121, 147-149, 176-178, 210-212, 244-246, 302-303


11-15: Unused SimulationSettings model and plural/singular log types: verify intended usage

SimulationSettings isn’t used by any endpoint; either wire it into a PATCH/PUT settings endpoint or remove it. Also, accepted log_type values are plural ("results", "errors", "anomalies"), while individual log records appear to use singular ('result', 'error', 'anomaly') in counters. Ensure simulator_logger.get_recent_logs supports the plural inputs consistently.

Suggested additions:

  • Add an endpoint to update scheduler interval and start_inspection_id using SimulationSettings.
  • Normalize accepted values to singular or plural across API and logger, and validate at router boundary.

Also applies to: 180-206, 214-240
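One way to resolve the plural/singular mismatch flagged above is a small normalizer at the router boundary. The helper and alias table below are hypothetical names shown only to illustrate the suggestion, not code from the PR.

```python
# Hypothetical normalizer: the API accepts "results"/"errors"/"anomalies"
# while individual log records are counted under the singular forms.
_LOG_TYPE_ALIASES = {
    "results": "result", "result": "result",
    "errors": "error", "error": "error",
    "anomalies": "anomaly", "anomaly": "anomaly",
}

def normalize_log_type(raw: str) -> str:
    """Map either plural or singular input to the canonical singular form."""
    try:
        return _LOG_TYPE_ALIASES[raw.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported log_type: {raw!r}")

canonical = normalize_log_type("Results")
```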

services/press-defect-data-simulator-service/app/main.py (4)

1-1: Remove unused import

asyncio is imported but unused.

-import asyncio

141-171: Simplify timestamp generation in /health and avoid logging internals

Building a fake log record to get a timestamp is brittle. Use datetime.now().isoformat() for clarity.

-        app_status = {
-            "application": "healthy",
-            "timestamp": simulator_logger.logger.handlers[0].formatter.formatTime(
-                simulator_logger.logger.makeRecord(
-                    "health", 20, __file__, 0, "", (), None
-                )
-            )
-        }
+        from datetime import datetime, timezone
+        app_status = {
+            "application": "healthy",
+            "timestamp": datetime.now(timezone.utc).isoformat()
+        }

203-205: Preserve exception context in /info

Use exception chaining for better diagnostics.

-        raise HTTPException(status_code=500, detail=f"서비스 정보 조회 실패: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"서비스 정보 조회 실패: {str(e)}") from e

241-260: Dev vs prod port mismatch with Docker CMD (8013 vs 8000): confirm docs and client expectations

The Dockerfile launches uvicorn on 8013; main uses 8000. That’s fine (different runtimes), but document this to avoid confusion for clients or health checks outside Docker.

services/press-defect-data-simulator-service/app/services/azure_storage.py (5)

2-8: Remove unused imports and tighten typing.

  • io, BlobClient, and PIL.Image are imported but never used.
  • Also, prefer typing.Any over built-in any in annotations elsewhere in this file.

This keeps lint clean and avoids confusion.

Apply:

-import base64
-import io
-from typing import List, Dict, Optional, Tuple
-from azure.storage.blob import BlobServiceClient, BlobClient
+import base64
+from typing import List, Dict, Optional, Tuple, Any
+from azure.storage.blob import BlobServiceClient
 from azure.core.exceptions import AzureError
 import asyncio
-from PIL import Image

74-95: Avoid scanning all blobs; use hierarchical listing for folder discovery.

Listing all blobs under press_defect_path and then deducing “inspection_*” folders is O(N files). Azure SDK supports hierarchical listing with delimiter to enumerate “folders” more efficiently.

-# press-defect/ 경로의 모든 blob 조회
-blob_list = await asyncio.get_event_loop().run_in_executor(
-    None, 
-    lambda: list(container_client.list_blobs(name_starts_with=f"{self.press_defect_path}/"))
-)
-
-# inspection 폴더 추출
-inspection_folders = set()
-for blob in blob_list:
-    blob_path = blob.name
-    if blob_path.startswith(f"{self.press_defect_path}/inspection_"):
-        parts = blob_path.split('/')
-        if len(parts) >= 3:
-            inspection_folder = parts[1]  # inspection_001
-            inspection_folders.add(inspection_folder)
+loop = asyncio.get_running_loop()
+def _walk():
+    # Use walk_blobs with delimiter "/" to get virtual directories under press_defect_path
+    return [p.name.rstrip('/').split('/')[-1]
+            for p in container_client.walk_blobs(name_starts_with=f"{self.press_defect_path}/", delimiter="/")
+            if p.name.rstrip('/').split('/')[-1].startswith("inspection_")]
+inspection_folders = await loop.run_in_executor(None, _walk)

If walk_blobs is unavailable in your installed sdk version, consider list_blobs(..., results_per_page) with delimiters via by_page().


188-201: Download images concurrently with bounded concurrency.

Currently each image is fetched sequentially. For inspections with many images, this will be slow. Use asyncio.gather with a semaphore to bound concurrency.

- for image_info in image_list:
-     blob_name = image_info['blob_name']
-     base64_data = await self.download_image_as_base64(blob_name)
-     if base64_data:
-         images_data.append({...})
-         successful_downloads += 1
+ sem = asyncio.Semaphore(5)
+ async def _fetch(image_info):
+     async with sem:
+         data = await self.download_image_as_base64(image_info['blob_name'])
+         return image_info, data
+ tasks = [_fetch(info) for info in image_list]
+ for image_info, base64_data in await asyncio.gather(*tasks):
+     if base64_data:
+         images_data.append({'image': base64_data, 'name': image_info['camera_name']})
+         successful_downloads += 1

216-260: Fix type annotations and make expected_count configurable.

  • Use typing.Any, not built-in any, in return type hints.
  • Hard-coding expected_count=21 reduces flexibility. Source this from settings with a safe default.
-async def validate_inspection_completeness(self, inspection_id: int) -> Dict[str, any]:
+async def validate_inspection_completeness(self, inspection_id: int) -> Dict[str, Any]:
@@
-    expected_count = 21  # 예상 이미지 수
+    expected_count = getattr(settings, "inspection_expected_count", 21)  # 예상 이미지 수
@@
-    def get_service_status(self) -> Dict[str, any]:
+    def get_service_status(self) -> Dict[str, Any]:

51-55: Be defensive when deriving inspection_id from folder name.

Parsing the first folder name with split('_')[1] assumes perfect naming. Add guards to avoid ValueError/IndexError.

-first_inspection_id = int(inspection_list[0].split('_')[1])
+first_token = inspection_list[0].split('_')
+first_inspection_id = int(first_token[1]) if len(first_token) > 1 and first_token[1].isdigit() else 1

Also applies to: 91-96

services/press-defect-data-simulator-service/app/routers/connection_test_router.py (4)

79-81: Preserve original traceback when raising HTTPException.

Use exception chaining to distinguish handler failures from original errors.

-except Exception as e:
-    simulator_logger.logger.error(f"Azure 연결 테스트 중 오류: {str(e)}")
-    raise HTTPException(status_code=500, detail=f"Azure 연결 테스트 실패: {str(e)}")
+except Exception as e:
+    simulator_logger.logger.error(f"Azure 연결 테스트 중 오류: {str(e)}")
+    raise HTTPException(status_code=500, detail=f"Azure 연결 테스트 실패: {str(e)}") from e

113-115: Apply exception chaining consistently across endpoints.

For consistent error diagnostics, chain the caught exception when re-raising HTTPException.

-raise HTTPException(status_code=500, detail=f"모델 서비스 연결 테스트 실패: {str(e)}")
+raise HTTPException(status_code=500, detail=f"모델 서비스 연결 테스트 실패: {str(e)}") from e

Repeat similarly for:

  • 전체 연결 테스트 실패
  • Azure 상태 조회 실패
  • 모델 서비스 상태 조회 실패
  • 데이터 검증 실패

Also applies to: 160-162, 187-189, 216-218, 251-251


51-55: Harden folder-name parsing for inspection_id.

Make the first_inspection_id extraction robust to unexpected folder names.

-first_inspection_id = int(inspection_list[0].split('_')[1])
+try:
+    parts = inspection_list[0].split('_', 1)
+    first_inspection_id = int(parts[1]) if len(parts) == 2 else 1
+except ValueError:
+    first_inspection_id = 1

29-77: Unify response envelope across endpoints.

Response shapes differ per endpoint (sometimes returning plain service status, sometimes wrapping with status/message). Consider a consistent envelope like: {status, message, data, error}. This simplifies client handling and reduces branching.

I can draft a small ResponseFactory utility and apply it across these handlers if you’d like.

Also applies to: 83-112, 117-158, 164-189, 190-218, 219-251
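As a sketch of the suggested uniform envelope, the field names follow the {status, message, data, error} shape proposed above; the helper name and payloads are hypothetical.

```python
from typing import Any, Optional

def envelope(status: str, message: str, data: Any = None,
             error: Optional[str] = None) -> dict:
    """One possible uniform response shape for all connection-test
    endpoints: {status, message, data, error}."""
    return {"status": status, "message": message, "data": data, "error": error}

ok = envelope("success", "Azure connection OK",
              data={"container": "press-defect"})
fail = envelope("error", "Azure connection failed", error="timeout")
```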

services/press-defect-data-simulator-service/app/services/model_client.py (2)

5-5: Remove unused import.

json is imported but never used.

-import json

140-167: Reuse a single ClientSession across retries to reduce connection churn.

Each attempt creates a new session, which is inefficient. Consider hoisting ClientSession outside the retry loop and reusing it. Keep timeouts per-request.

I can provide a concrete refactor if you’d like this change now.
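A sketch of that refactor, with a fake session standing in for aiohttp.ClientSession so the one-session-for-all-attempts pattern is visible; the class, URL, and payload are illustrative, not the service's real client code.

```python
import asyncio

class FakeSession:
    """Stand-in for aiohttp.ClientSession, counting how many sessions
    are created so the refactor's effect is observable."""
    created = 0

    def __init__(self):
        FakeSession.created += 1
        self.calls = 0

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return False

    async def post(self, url, json=None):
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("transient")
        return {"status": 200}

async def predict_with_retries(url, payload, max_retries=3):
    # One session for all attempts; only the request itself is retried.
    async with FakeSession() as session:
        for attempt in range(max_retries):
            try:
                return await session.post(url, json=payload)
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise
                await asyncio.sleep(0.01)

resp = asyncio.run(predict_with_retries("http://model/predict", {"images": []}))
```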

services/press-defect-data-simulator-service/app/utils/logger.py (2)

20-27: Ensure parent directories exist.

Path.mkdir without parents=True may fail if nested paths are provided. Use parents=True to be robust.

- log_dir.mkdir(exist_ok=True)
+ log_dir.mkdir(parents=True, exist_ok=True)
@@
- (log_dir / "simulation").mkdir(exist_ok=True)
- (log_dir / "errors").mkdir(exist_ok=True)
- (log_dir / "results").mkdir(exist_ok=True)
+ (log_dir / "simulation").mkdir(parents=True, exist_ok=True)
+ (log_dir / "errors").mkdir(parents=True, exist_ok=True)
+ (log_dir / "results").mkdir(parents=True, exist_ok=True)

120-145: Remove unnecessary f-strings.

These lines don’t interpolate variables; drop the f prefix to satisfy linters (F541).

-            message = f"☁️ Azure Storage 연결 성공"
+            message = "☁️ Azure Storage 연결 성공"
@@
-            message = f"❌ Azure Storage 연결 실패"
+            message = "❌ Azure Storage 연결 실패"
@@
-            message = f"🤖 모델 서비스 연결 성공"
+            message = "🤖 모델 서비스 연결 성공"
@@
-            message = f"❌ 모델 서비스 연결 실패"
+            message = "❌ 모델 서비스 연결 실패"
services/press-defect-data-simulator-service/app/services/scheduler_service.py (4)

175-205: Start path: set stop_event state and thread name for observability.

Minor: clear the stop event before starting; naming the thread helps debugging.

-# 스케줄러 시작
+self._stop_event.clear()
 # 스케줄러 시작
@@
- self.scheduler_thread = threading.Thread(target=self._run_scheduler)
+ self.scheduler_thread = threading.Thread(target=self._run_scheduler, name="sim-scheduler-thread")

147-155: Creating a new event loop per tick is heavy.

asyncio.run(...) per execution is acceptable for MVP but can be expensive. Consider dedicating an event loop in the scheduler thread and running coroutines on it; that would also enable graceful cancellation.

I can provide a targeted refactor to keep a long-lived loop in the scheduler thread if you want to pursue this now.
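A sketch of the long-lived-loop alternative: one event loop owned by a worker thread, with coroutines submitted via run_coroutine_threadsafe instead of a fresh asyncio.run per tick. The class name is hypothetical and the example job is a placeholder.

```python
import asyncio
import threading

class LoopThread:
    """A dedicated event loop living in a worker thread; other threads
    submit coroutines to it instead of spinning up a loop per tick."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def run(self, coro):
        # Thread-safe submission; blocks until the coroutine completes.
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result()

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()

async def tick(n):
    await asyncio.sleep(0)  # placeholder for the real periodic job
    return n * 2

lt = LoopThread()
results = [lt.run(tick(i)) for i in range(3)]
lt.stop()
```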


325-336: Health check should re-probe even when currently unhealthy.

Currently, if azure_ready/model_ready are False, you don’t retry. This makes recovery (e.g., transient outage) invisible until restart.

-            if self.azure_ready:
-                azure_test = await azure_storage_service.test_connection()
-                self.azure_ready = azure_test
+            azure_test = await azure_storage_service.test_connection()
+            self.azure_ready = azure_test
@@
-            if self.model_ready:
-                model_test = await model_service_client.test_connection()
-                self.model_ready = model_test
+            model_test = await model_service_client.test_connection()
+            self.model_ready = model_test

163-171: Tag the scheduled job for precise control.

Store the schedule job with a tag so you can clear just this job on restart rather than nuking all jobs.

- schedule.every(settings.scheduler_interval_seconds).seconds.do(self._schedule_job)
+ schedule.every(settings.scheduler_interval_seconds).seconds.do(self._schedule_job).tag(self._job_tag)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between abddb82 and 222fb3f.

📒 Files selected for processing (10)
  • services/press-defect-data-simulator-service/Dockerfile (1 hunks)
  • services/press-defect-data-simulator-service/app/config/settings.py (1 hunks)
  • services/press-defect-data-simulator-service/app/main.py (1 hunks)
  • services/press-defect-data-simulator-service/app/routers/connection_test_router.py (1 hunks)
  • services/press-defect-data-simulator-service/app/routers/simulator_router.py (1 hunks)
  • services/press-defect-data-simulator-service/app/services/azure_storage.py (1 hunks)
  • services/press-defect-data-simulator-service/app/services/model_client.py (1 hunks)
  • services/press-defect-data-simulator-service/app/services/scheduler_service.py (1 hunks)
  • services/press-defect-data-simulator-service/app/utils/logger.py (1 hunks)
  • services/press-defect-data-simulator-service/requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
services/press-defect-data-simulator-service/app/utils/logger.py (2)
services/press-fault-data-simulator-service/tests/utils/test_logger.py (5)
  • _S (64-67)
  • test_prediction_file_handler_writes_only_warning_and_above (220-258)
  • import_logger_module (77-88)
  • test_setup_loggers_creates_log_dir_and_configures_handlers (164-217)
  • fake_settings (50-74)
services/press-fault-data-simulator-service/app/utils/logger.py (2)
  • CustomJsonFormatter (30-37)
  • setup_loggers (40-93)
services/press-defect-data-simulator-service/app/routers/connection_test_router.py (2)
services/press-defect-data-simulator-service/app/services/azure_storage.py (5)
  • initialize (21-40)
  • get_service_status (252-260)
  • get_inspection_list (66-98)
  • validate_inspection_completeness (216-250)
  • get_inspection_images (100-140)
services/press-defect-data-simulator-service/app/services/model_client.py (3)
  • get_client_status (292-303)
  • check_all_services (256-290)
  • check_service_health (56-88)
services/press-defect-data-simulator-service/app/main.py (4)
services/press-defect-data-simulator-service/app/config/settings.py (2)
  • validate_settings (94-110)
  • get_settings_summary (63-91)
services/press-defect-data-simulator-service/app/services/scheduler_service.py (2)
  • stop_scheduler (218-244)
  • get_scheduler_status (297-323)
services/press-defect-data-simulator-service/app/services/azure_storage.py (1)
  • get_service_status (252-260)
services/press-defect-data-simulator-service/app/services/model_client.py (1)
  • get_client_status (292-303)
services/press-defect-data-simulator-service/app/services/azure_storage.py (1)
services/press-defect-data-simulator-service/app/utils/logger.py (1)
  • log_azure_connection (120-131)
services/press-defect-data-simulator-service/app/config/settings.py (1)
services/press-fault-data-simulator-service/app/config/settings.py (2)
  • Settings (7-47)
  • PREDICTION_API_FULL_URL (30-35)
services/press-defect-data-simulator-service/app/services/scheduler_service.py (4)
services/press-defect-data-simulator-service/app/config/settings.py (4)
  • update_simulation_status (113-115)
  • increment_simulation_stats (117-123)
  • update_current_inspection_id (125-127)
  • get_simulation_stats (129-142)
services/press-defect-data-simulator-service/app/utils/logger.py (4)
  • log_simulation_start (73-76)
  • log_simulation_failure (90-99)
  • log_simulation_success (78-88)
  • log_anomaly_detection (101-111)
services/press-defect-data-simulator-service/app/services/azure_storage.py (2)
  • initialize (21-40)
  • download_inspection_images (175-214)
services/press-defect-data-simulator-service/app/services/model_client.py (2)
  • check_model_ready (90-116)
  • predict_inspection (135-227)
services/press-defect-data-simulator-service/app/routers/simulator_router.py (2)
services/press-defect-data-simulator-service/app/config/settings.py (2)
  • get_simulation_stats (129-142)
  • get_settings_summary (63-91)
services/press-defect-data-simulator-service/app/services/scheduler_service.py (5)
  • get_scheduler_status (297-323)
  • start_scheduler (175-216)
  • stop_scheduler (218-244)
  • restart_scheduler (246-259)
  • manual_execution (261-295)
services/press-defect-data-simulator-service/app/services/model_client.py (2)
services/press-defect-data-simulator-service/app/utils/logger.py (1)
  • log_model_service (133-144)
services/press-defect-detection-model-service/app/main.py (2)
  • model_info (154-174)
  • root (82-99)
🪛 Ruff (0.12.2)
services/press-defect-data-simulator-service/app/utils/logger.py

123-123: f-string without any placeholders

Remove extraneous f prefix

(F541)


128-128: f-string without any placeholders

Remove extraneous f prefix

(F541)


136-136: f-string without any placeholders

Remove extraneous f prefix

(F541)


141-141: f-string without any placeholders

Remove extraneous f prefix

(F541)

services/press-defect-data-simulator-service/app/routers/connection_test_router.py

81-81: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


115-115: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


162-162: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


182-182: Do not use bare except

(E722)


188-188: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


217-217: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


251-251: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

services/press-defect-data-simulator-service/app/main.py

1-1: asyncio imported but unused

Remove unused import: asyncio

(F401)


204-204: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

services/press-defect-data-simulator-service/app/services/azure_storage.py

2-2: io imported but unused

Remove unused import: io

(F401)


4-4: azure.storage.blob.BlobClient imported but unused

Remove unused import: azure.storage.blob.BlobClient

(F401)


7-7: PIL.Image imported but unused

Remove unused import: PIL.Image

(F401)

services/press-defect-data-simulator-service/app/config/settings.py

2-2: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)

services/press-defect-data-simulator-service/app/services/scheduler_service.py

5-5: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)

services/press-defect-data-simulator-service/app/routers/simulator_router.py

2-2: typing.List imported but unused

Remove unused import: typing.List

(F401)


39-39: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


66-66: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


94-94: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


121-121: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


149-149: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


178-178: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


212-212: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


246-246: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


303-303: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

services/press-defect-data-simulator-service/app/services/model_client.py

5-5: json imported but unused

Remove unused import: json

(F401)


30-31: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


33-33: Local variable data is assigned to but never used

Remove assignment to unused variable data

(F841)


95-96: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


123-124: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


239-244: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🔇 Additional comments (2)
services/press-defect-data-simulator-service/app/utils/logger.py (1)

46-70: LGTM on structured rotating handlers.

Console + rotating file handlers with module/function context looks good and aligns with settings-driven levels.

services/press-defect-data-simulator-service/app/services/scheduler_service.py (1)

188-210: LGTM on startup sequencing and status logging.

Initialization gating, status logs, and next_execution_time setup are clear and help operability. Once the thread-safety fix lands, this flow should behave reliably.

Comment on lines +18 to +26
azure_connection_string: str = os.getenv("AZURE_CONNECTION_STRING")
azure_container_name: str = "simulator-data"
azure_press_defect_path: str = "press-defect"

# 모델 서비스 설정
model_service_url: str = os.getenv("MODEL_SERVICE_URL", "http://127.0.0.1:8000")
model_service_predict_endpoint: str = "/predict/inspection"
model_service_health_endpoint: str = "/health"
model_service_timeout: int = 300 # 5분 (21장 이미지 처리)

🛠️ Refactor suggestion

Avoid os.getenv in field defaults; let Pydantic Settings load env vars with type safety

Using os.getenv at import time bypasses pydantic-settings’ env loading and can yield type-mismatch defaults (None for str). Use Field(..., env="ENV_NAME") and proper Optional types.

-    azure_connection_string: str = os.getenv("AZURE_CONNECTION_STRING")
+    azure_connection_string: Optional[str] = Field(default=None, env="AZURE_CONNECTION_STRING")
@@
-    model_service_url: str = os.getenv("MODEL_SERVICE_URL", "http://127.0.0.1:8000")
+    model_service_url: str = Field(default="http://127.0.0.1:8000", env="MODEL_SERVICE_URL")
🤖 Prompt for AI Agents
In services/press-defect-data-simulator-service/app/config/settings.py around
lines 18–26, the fields use os.getenv at import time which bypasses pydantic
Settings env handling and can produce wrong defaults; change these to Pydantic
Field declarations that bind to env names and proper types: make
azure_connection_string Optional[str] with Field(None,
env="AZURE_CONNECTION_STRING"), leave azure_container_name and
azure_press_defect_path as string constants or use Field("simulator-data",
env="AZURE_CONTAINER_NAME") / Field("press-defect",
env="AZURE_PRESS_DEFECT_PATH") if you want them configurable, replace
model_service_url with Field("http://127.0.0.1:8000", env="MODEL_SERVICE_URL"),
model_service_predict_endpoint and model_service_health_endpoint can be
constants or Fields if needed, and model_service_timeout should be an int
Field(300, env="MODEL_SERVICE_TIMEOUT") so pydantic will cast the env var to
int; remove all os.getenv usages and rely on pydantic Settings to load and
validate types.
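
To see why this matters, here is a minimal stdlib-only illustration (the class and variable names mirror the PR only for readability; this is not the PR's code): a default built with `os.getenv` is evaluated once, at class-definition time, so any environment variable that arrives later — for example, one pydantic-settings would have loaded from `.env` — is silently missed.

```python
import os

# Ensure the variable is absent at definition time
os.environ.pop("AZURE_CONNECTION_STRING", None)

class Settings:
    # Captured NOW -> None, even if the variable is exported later
    azure_connection_string = os.getenv("AZURE_CONNECTION_STRING")

# The value arrives "too late" (e.g. loaded from a .env file afterwards)
os.environ["AZURE_CONNECTION_STRING"] = "UseDevelopmentStorage=true"

print(Settings.azure_connection_string)  # -> None, not the value set above
```

Declaring the field with `Field(default=None, env="AZURE_CONNECTION_STRING")` instead lets pydantic-settings resolve the variable at instantiation time, with type validation.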

Comment on lines +55 to +58
class Config:
case_sensitive = False
env_file = ".env"


🛠️ Refactor suggestion

Pydantic v2 config style: use SettingsConfigDict instead of v1-style Config

In pydantic-settings v2, prefer model_config = SettingsConfigDict(...). This also keeps consistency with the existing press-fault service.

-    class Config:
-        case_sensitive = False
-        env_file = ".env"
+    model_config = SettingsConfigDict(
+        case_sensitive=False,
+        env_file=".env",
+        env_file_encoding="utf-8",
+        extra="ignore",
+    )
🤖 Prompt for AI Agents
In services/press-defect-data-simulator-service/app/config/settings.py around
lines 55–58, replace the v1-style inner Config class with a v2-style
model_config: import SettingsConfigDict from pydantic_settings and set
model_config = SettingsConfigDict(case_sensitive=False, env_file=".env") on the
settings model, removing the old Config class; ensure you update imports
accordingly so the file no longer defines class Config and uses model_config
instead.

Comment on lines +112 to +124
# 설정 업데이트 함수들
def update_simulation_status(enabled: bool):
"""시뮬레이션 상태 업데이트"""
settings.simulation_enabled = enabled

def increment_simulation_stats(success: bool):
"""시뮬레이션 통계 업데이트"""
settings.total_simulations += 1
if success:
settings.successful_simulations += 1
else:
settings.failed_simulations += 1


⚠️ Potential issue

Thread-safety: protect shared mutable state updates

update_simulation_status/increment_simulation_stats mutate shared state accessed from an async app and a background thread. Add a lock to avoid races.

+from threading import Lock
+
+# Protect runtime mutable stats across threads/tasks
+_stats_lock = Lock()
@@
 def update_simulation_status(enabled: bool):
     """시뮬레이션 상태 업데이트"""
-    settings.simulation_enabled = enabled
+    with _stats_lock:
+        settings.simulation_enabled = enabled
@@
 def increment_simulation_stats(success: bool):
     """시뮬레이션 통계 업데이트"""
-    settings.total_simulations += 1
-    if success:
-        settings.successful_simulations += 1
-    else:
-        settings.failed_simulations += 1
+    with _stats_lock:
+        settings.total_simulations += 1
+        if success:
+            settings.successful_simulations += 1
+        else:
+            settings.failed_simulations += 1
🤖 Prompt for AI Agents
In services/press-defect-data-simulator-service/app/config/settings.py around
lines 112 to 124, the functions update_simulation_status and
increment_simulation_stats mutate shared module-level state without
synchronization, causing race conditions between async tasks and a background
thread; fix by creating a module-level threading.Lock (e.g., settings_lock) and
wrapping all reads/writes to settings.simulation_enabled,
settings.total_simulations, settings.successful_simulations, and
settings.failed_simulations in a with settings_lock: block so mutations are
atomic and exceptions still release the lock.
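
A runnable sketch of the locking pattern being suggested (the dict stands in for the module-level settings attributes; names are illustrative): without the lock, the read-modify-write pairs below could interleave between the event loop and the scheduler thread; with it, each update is atomic.

```python
from threading import Lock, Thread

# Single lock guarding all runtime counters shared across threads
_stats_lock = Lock()
stats = {"total": 0, "success": 0, "failure": 0}

def increment_simulation_stats(success: bool) -> None:
    # `x += 1` is a read-modify-write; the lock makes both updates atomic
    with _stats_lock:
        stats["total"] += 1
        stats["success" if success else "failure"] += 1

threads = [
    Thread(target=lambda: [increment_simulation_stats(i % 2 == 0) for i in range(1000)])
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stats)  # total == 8000 and success + failure == total
```

The invariant `success + failure == total` is exactly what unsynchronized increments can break under contention.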

Comment on lines +125 to +128
def update_current_inspection_id(inspection_id: int):
"""현재 처리 중인 inspection ID 업데이트"""
settings.current_inspection_id = inspection_id


🛠️ Refactor suggestion

Apply the same lock for current_inspection_id updates

current_inspection_id is also shared across routes and the scheduler.

 def update_current_inspection_id(inspection_id: int):
     """현재 처리 중인 inspection ID 업데이트"""
-    settings.current_inspection_id = inspection_id
+    with _stats_lock:
+        settings.current_inspection_id = inspection_id
🤖 Prompt for AI Agents
In services/press-defect-data-simulator-service/app/config/settings.py around
lines 125 to 128, the update_current_inspection_id function mutates a shared
variable without using the same synchronization primitive used elsewhere; wrap
the update in the shared lock (use the same lock object used for other shared
settings, e.g., settings.current_inspection_lock or whatever lock variable is
defined in this module) so the assignment to settings.current_inspection_id is
performed while holding that lock (use a context manager or acquire/release
around the assignment).

Comment on lines +230 to +239
# 우아한 종료를 위한 시그널 핸들러
def signal_handler(signum, frame):
"""시그널 핸들러 (Ctrl+C 등)"""
simulator_logger.logger.info(f"🛑 종료 신호 수신: {signum}")
sys.exit(0)

# 시그널 등록
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)


🛠️ Refactor suggestion

Let Uvicorn own signal handling; remove custom handler to guarantee graceful lifespan shutdown

Uvicorn already manages SIGINT/SIGTERM and triggers FastAPI lifespan shutdown (where you stop the scheduler). Overriding it with sys.exit(0) risks skipping cleanup. Recommend removing this section.

-# 우아한 종료를 위한 시그널 핸들러
-def signal_handler(signum, frame):
-    """시그널 핸들러 (Ctrl+C 등)"""
-    simulator_logger.logger.info(f"🛑 종료 신호 수신: {signum}")
-    sys.exit(0)
-
-# 시그널 등록
-signal.signal(signal.SIGINT, signal_handler)
-signal.signal(signal.SIGTERM, signal_handler)
+# Uvicorn handles SIGINT/SIGTERM and will invoke lifespan shutdown.
🤖 Prompt for AI Agents
In services/press-defect-data-simulator-service/app/main.py around lines
230-239, remove the custom signal_handler function and the two
signal.signal(...) registrations so Uvicorn can own SIGINT/SIGTERM handling;
instead ensure any shutdown cleanup (e.g., stopping the scheduler) is
implemented in the FastAPI lifespan or shutdown event handlers so cleanup runs
during Uvicorn-triggered graceful shutdown.
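
The recommended shape can be sketched without FastAPI itself, since a lifespan is just an async context manager (the `SchedulerService` here is a stand-in, not the PR's class): cleanup after the `yield` is what Uvicorn's own SIGINT/SIGTERM handling would run, and what a `sys.exit(0)` in a custom handler would skip.

```python
import asyncio
from contextlib import asynccontextmanager

events = []

class SchedulerService:
    def start_scheduler(self):
        events.append("started")

    def stop_scheduler(self):
        events.append("stopped")

scheduler_service = SchedulerService()

@asynccontextmanager
async def lifespan(app):
    scheduler_service.start_scheduler()   # startup work
    yield
    scheduler_service.stop_scheduler()    # runs on graceful shutdown

async def main():
    # FastAPI(lifespan=lifespan) drives this context for the app's lifetime
    async with lifespan(app=None):
        pass

asyncio.run(main())
print(events)  # ['started', 'stopped']
```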

Comment on lines +25 to +55
async def test_connection(self) -> bool:
"""모델 서비스 연결 테스트"""
try:
health_url = f"{self.base_url}{self.health_endpoint}"

async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=10)) as session:
async with session.get(health_url) as response:
if response.status == 200:
data = await response.json()
self.service_available = True
self.last_health_check = time.time()

simulator_logger.log_model_service(
True,
f"응답시간: {response.headers.get('x-process-time', 'N/A')}"
)
return True
else:
simulator_logger.log_model_service(
False,
f"HTTP {response.status}: {await response.text()}"
)
return False

except aiohttp.ClientError as e:
simulator_logger.log_model_service(False, f"연결 오류: {str(e)}")
return False
except Exception as e:
simulator_logger.log_model_service(False, f"알 수 없는 오류: {str(e)}")
return False


🛠️ Refactor suggestion

Reset availability on failures and avoid unused JSON read.

  • test_connection should set service_available=False on non-200 or exceptions; also record last_health_check timestamp in all cases.
  • The response body assigned to data is unused; skip parsing to reduce overhead.
 async def test_connection(self) -> bool:
@@
-    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=10)) as session:
-        async with session.get(health_url) as response:
-            if response.status == 200:
-                data = await response.json()
-                self.service_available = True
-                self.last_health_check = time.time()
-                
-                simulator_logger.log_model_service(
-                    True, 
-                    f"응답시간: {response.headers.get('x-process-time', 'N/A')}"
-                )
-                return True
-            else:
-                simulator_logger.log_model_service(
-                    False, 
-                    f"HTTP {response.status}: {await response.text()}"
-                )
-                return False
+    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=10)) as session:
+        start = time.time()
+        async with session.get(health_url) as response:
+            self.last_health_check = time.time()
+            if response.status == 200:
+                self.service_available = True
+                simulator_logger.log_model_service(
+                    True,
+                    f"응답시간: {response.headers.get('x-process-time', f'{self.last_health_check - start:.3f}s')}"
+                )
+                return True
+            else:
+                self.service_available = False
+                simulator_logger.log_model_service(
+                    False,
+                    f"HTTP {response.status}: {await response.text()}"
+                )
+                return False
@@
-except aiohttp.ClientError as e:
-    simulator_logger.log_model_service(False, f"연결 오류: {str(e)}")
-    return False
+except aiohttp.ClientError as e:
+    self.service_available = False
+    self.last_health_check = time.time()
+    simulator_logger.log_model_service(False, f"연결 오류: {str(e)}")
+    return False
 except Exception as e:
-    simulator_logger.log_model_service(False, f"알 수 없는 오류: {str(e)}")
-    return False
+    self.service_available = False
+    self.last_health_check = time.time()
+    simulator_logger.log_model_service(False, f"알 수 없는 오류: {str(e)}")
+    return False
🧰 Tools
🪛 Ruff (0.12.2)

30-31: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


33-33: Local variable data is assigned to but never used

Remove assignment to unused variable data

(F841)

🤖 Prompt for AI Agents
In services/press-defect-data-simulator-service/app/services/model_client.py
around lines 25 to 55, the test_connection flow currently leaves
service_available true only on success and parses JSON into an unused variable;
modify it so that on any non-200 response or any exception service_available is
explicitly set to False, ensure last_health_check is updated for both success
and failure paths (set time.time() before each return), and remove the unused
await response.json() call (only read headers/text as needed) to avoid
unnecessary parsing and overhead.

Comment on lines +229 to +237
async def predict_single_image(self, image_base64: str, image_name: str = "test") -> Tuple[bool, Optional[Dict[str, Any]], Optional[str]]:
"""단일 이미지 예측 요청 (테스트용)"""
try:
predict_url = f"{self.base_url}/predict/"

request_data = {
"image_base64": image_base64,
"image_name": image_name
}

🛠️ Refactor suggestion

Use the configured predict endpoint consistently.

predict_single_image hardcodes “/predict/”, diverging from settings.model_service_predict_endpoint. Use the configured endpoint to avoid drift.

- predict_url = f"{self.base_url}/predict/"
+ predict_url = f"{self.base_url}{self.predict_endpoint}"

Also applies to: 239-251

🤖 Prompt for AI Agents
In services/press-defect-data-simulator-service/app/services/model_client.py
around lines 229-237 (and similarly lines 239-251), the method
predict_single_image hardcodes the path "/predict/" instead of using the
configured settings.model_service_predict_endpoint; update the code to construct
predict_url by joining self.base_url with the configured endpoint (e.g.,
self.settings.model_service_predict_endpoint or equivalent attribute on the
client), ensuring any leading/trailing slashes are handled so you don't produce
duplicate slashes, and replace all other hardcoded uses in that method (lines
239-251) to use the same configured endpoint.

Comment on lines +231 to +237
# 스케줄 클리어
schedule.clear()

# 스레드 종료 대기 (최대 5초)
if self.scheduler_thread and self.scheduler_thread.is_alive():
self.scheduler_thread.join(timeout=5)


⚠️ Potential issue

Thread-safety: schedule.clear() called from a different thread can race with run_pending().

schedule is not thread-safe. Clearing jobs from the caller thread while the scheduler thread is iterating may cause sporadic failures.

Introduce a stop_event and a lock around all schedule operations; let the scheduler thread perform schedule.clear() during shutdown.

@@
-import schedule
+import schedule
+import threading
@@
-    def __init__(self):
+    def __init__(self):
         self.running = False
         self.scheduler_thread = None
+        self._stop_event = threading.Event()
+        self._sched_lock = threading.Lock()
+        self._job_tag = "sim_job"
@@
-    def _run_scheduler(self):
+    def _run_scheduler(self):
         """스케줄러 실행 (별도 스레드)"""
         try:
             simulator_logger.log_scheduler_event("스케줄러 스레드 시작")
             
             # 스케줄 등록
-            schedule.every(settings.scheduler_interval_seconds).seconds.do(self._schedule_job)
+            with self._sched_lock:
+                schedule.every(settings.scheduler_interval_seconds).seconds.do(self._schedule_job).tag(self._job_tag)
             
-            while self.running:
-                schedule.run_pending()
+            while self.running and not self._stop_event.is_set():
+                with self._sched_lock:
+                    schedule.run_pending()
                 time.sleep(1)  # 1초마다 스케줄 확인
             
+            # Ensure jobs are cleared on exit
+            with self._sched_lock:
+                schedule.clear(self._job_tag)
             simulator_logger.log_scheduler_event("스케줄러 스레드 종료")

And in stop_scheduler():

-    # 스케줄러 중지
-    self.running = False
-    
-    # 스케줄 클리어
-    schedule.clear()
+    # 스케줄러 중지
+    self.running = False
+    self._stop_event.set()
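
The protocol above can be exercised with the stdlib alone (`jobs` stands in for the `schedule` library's job registry; names are illustrative): the caller only signals and joins, while the worker thread owns every mutation of the registry, including the final clear.

```python
import threading
import time

stop_event = threading.Event()
sched_lock = threading.Lock()
results = []
jobs = [lambda: results.append("tick")]

def run_scheduler():
    while not stop_event.is_set():
        with sched_lock:            # serialize against any registrar thread
            for job in list(jobs):
                job()
        time.sleep(0.01)
    with sched_lock:
        jobs.clear()                # worker clears its own registry on exit

t = threading.Thread(target=run_scheduler)
t.start()
time.sleep(0.05)
stop_event.set()                    # stop_scheduler() only signals...
t.join(timeout=5)                   # ...and waits; no cross-thread clear()

print(t.is_alive(), len(jobs))     # False 0
```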

Comment on lines +281 to +285
return {
"success": success,
"inspection_id": self.current_inspection_id - 1, # 실행 후 증가했으므로 -1
"timestamp": datetime.now().isoformat()
}

🛠️ Refactor suggestion

Return the actually executed inspection_id.

Using self.current_inspection_id - 1 breaks when wrapping to start_inspection_id (can return 0). Capture executed_id before increment and return that.

-                return {
-                    "success": success,
-                    "inspection_id": self.current_inspection_id - 1,  # 실행 후 증가했으므로 -1
-                    "timestamp": datetime.now().isoformat()
-                }
+                executed_id = (self.current_inspection_id - 1) if self.current_inspection_id > settings.start_inspection_id else settings.max_inspection_count
+                return {
+                    "success": success,
+                    "inspection_id": executed_id,
+                    "timestamp": datetime.now().isoformat()
+                }

Or simply have execute_single_simulation return the executed_id alongside success to avoid inference here.

🤖 Prompt for AI Agents
In
services/press-defect-data-simulator-service/app/services/scheduler_service.py
around lines 281-285, the code returns self.current_inspection_id - 1 which can
be incorrect when the inspection id wraps (yielding 0); capture the actual
executed_id before you increment or advance self.current_inspection_id and
return that executed_id (or change execute_single_simulation to return
executed_id alongside success and use that), ensuring you replace the -1
arithmetic with the captured executed_id in the returned dict and update any
callers accordingly.
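
A sketch of the second option (the constants and closure are made up for illustration): record the id before advancing the counter, so the wrap from `MAX_ID` back to `START_ID` can never leak an off-by-one into the response.

```python
START_ID, MAX_ID = 1, 3

def make_runner():
    state = {"current": START_ID}

    def run_once():
        executed_id = state["current"]          # the id this run actually used
        nxt = state["current"] + 1
        state["current"] = START_ID if nxt > MAX_ID else nxt
        return {"success": True, "inspection_id": executed_id}

    return run_once

run = make_runner()
print([run()["inspection_id"] for _ in range(5)])  # [1, 2, 3, 1, 2]
```

With `current - 1` arithmetic, the fourth call would have reported `0` instead of `1`.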

Comment on lines +32 to +35
# 헬스체크 설정
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD curl -f http://localhost:8013/health || exit 1


⚠️ Potential issue

HEALTHCHECK will fail: curl isn’t installed

The image never installs curl, yet the HEALTHCHECK uses curl. Either install curl or switch to wget/python.

```diff
 RUN apt-get update && apt-get install -y \
-    gcc \
-    g++ \
+    gcc \
+    g++ \
+    curl \
     && rm -rf /var/lib/apt/lists/*
```

Alternatively, switch the probe to the Python runtime already present in the image (note: Dockerfile heredocs only work in RUN/COPY, so a HEALTHCHECK needs a one-liner):

```diff
-HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
-    CMD curl -f http://localhost:8013/health || exit 1
+HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
+    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8013/health', timeout=8)" || exit 1
```
📝 Committable suggestion


Suggested change

```dockerfile
# Install necessary build tools (and curl for the healthcheck)
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Healthcheck configuration
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8013/health || exit 1
```
🤖 Prompt for AI Agents
In services/press-defect-data-simulator-service/Dockerfile around lines 32-35, the HEALTHCHECK uses curl but the image never installs curl; either install curl in the image or change the HEALTHCHECK to use an available tool. Fix by adding a package install step for curl matching the base image (e.g., for Alpine `apk add --no-cache curl`; for Debian/Ubuntu `apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*`), or replace the HEALTHCHECK command with an equivalent using wget or a small Python one-liner if Python is present.
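For reference, the Python-based probe suggested above can also live as a standalone script copied into the image (e.g. a hypothetical healthcheck.py). This is a minimal sketch under the assumption that the service's /health endpoint answers HTTP 200 when healthy; port 8013 is taken from the Dockerfile under review.

```python
import urllib.request


def check_health(url: str = "http://localhost:8013/health",
                 timeout: float = 8.0) -> int:
    """Return 0 (healthy) if the endpoint answers HTTP 200 in time, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except Exception:  # connection refused, timeout, HTTP error, ...
        return 1
```

With an `if __name__ == "__main__": sys.exit(check_health())` guard at the bottom, the Dockerfile HEALTHCHECK could simply run `CMD python /app/healthcheck.py` — whether that path and filename fit the image layout is an assumption to verify.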

@haeyekim haeyekim changed the title Feat/66/press defect simulator Feat/66/press-defect-data-simulator-service: 프레스 공정 - 생산품 결함 시뮬레이터 구현 Aug 26, 2025
@TaeHyunaivle TaeHyunaivle merged commit 78f36bc into main Aug 27, 2025
17 of 25 checks passed