Exporting Custom Trained Model #45

Closed
NeuralNoble opened this issue Jul 14, 2024 · 3 comments

Comments

@NeuralNoble

Hello,

First, thank you for this amazing repo! I have a question about using custom-trained models with the CoreML INT8 export.

In the documentation, you mentioned:

Export CoreML INT8 models using the ultralytics Python package (with pip install ultralytics), or download them from our GitHub release assets. You should have 5 YOLOv8 models in total. Place these in the YOLO/Models directory as seen in the Xcode screenshot below.

from ultralytics import YOLO

# Loop through all YOLOv8 model sizes
for size in ("n", "s", "m", "l", "x"):
    # Load a YOLOv8 PyTorch model
    model = YOLO(f"yolov8{size}.pt")

    # Export the PyTorch model to CoreML INT8 format with NMS layers
    model.export(format="coreml", int8=True, nms=True, imgsz=[640, 384])

I have trained a custom YOLO model and would like to use it within the iOS app. Should I export it using the same process mentioned above? Specifically, should I follow the same format and parameters (format="coreml", int8=True, nms=True, imgsz=[640, 384]) when exporting my custom model?

Additionally, if I want to use my custom model, do I need to train all 5 sizes (n, s, m, l, x) and export all of them, or can I just use a single custom-trained model? If I do need to train and export all 5 sizes, should I avoid using the provided code and handle the export process separately for each size?

Any guidance or additional steps required for custom trained models would be greatly appreciated.

Thank you!

@pderrenger
Member

@NeuralNoble hello,

Thank you for your kind words and for using our repository! 😊

To address your questions:

  1. Exporting Custom Trained Models:
    Yes, you can export your custom-trained YOLO model to CoreML INT8 format using the same process. The parameters format="coreml", int8=True, nms=True, and imgsz=[640, 384] apply to custom models as well. Here's a concise example of exporting your custom model:

    from ultralytics import YOLO
    
    # Load your custom trained YOLO model
    model = YOLO("path/to/your/custom_model.pt")
    
    # Export the custom model to CoreML INT8 format with NMS layers
    model.export(format="coreml", int8=True, nms=True, imgsz=[640, 384])
  2. Training and Exporting Multiple Sizes:
    You do not need to train and export all 5 sizes (n, s, m, l, x) unless your application specifically requires models of different sizes. If a single custom trained model meets your needs, you can proceed with just that one. The provided code snippet is a loop for convenience if you have multiple models, but it is not mandatory to use all sizes.

  3. Additional Guidance:
    Ensure you are using the latest version of the ultralytics package to avoid known issues. If you run into difficulties, first verify that the problem persists with the latest package version.

If you have any further questions or run into any issues, feel free to ask. The YOLO community and the Ultralytics team are here to help!

@glenn-jocher
Member

@NeuralNoble thanks for asking!

The FastSAM_sInput class you've shown is indeed focused only on the image input, which is typical for many Core ML vision models. However, FastSAM's prompts (like bounding boxes or points) are typically used in post-processing, after the main model inference.

Here's a suggested approach to handle this:

  1. Run the FastSAM Core ML model on the input image.
  2. Implement the post-processing and prompting logic in Swift.

For the post-processing step, you'll need to implement the following:

  1. Decode the model output (probably segmentation masks).
  2. Apply the prompts (bounding boxes or points) to filter or select the appropriate segments.

Here's a rough outline of how you might structure this in your Swift code:

class FastSAMProcessor {
    let model: FastSAM_s
    
    init() throws {
        self.model = try FastSAM_s(configuration: MLModelConfiguration())
    }
    
    func process(image: CVPixelBuffer, prompt: FastSAMPrompt) throws -> [Mask] {
        // 1. Run the model
        let input = FastSAM_sInput(image: image)
        let output = try model.prediction(input: input)
        
        // 2. Decode the output
        let masks = decodeMasks(from: output)
        
        // 3. Apply the prompt
        let filteredMasks = applyPrompt(prompt, to: masks)
        
        return filteredMasks
    }
    
    private func decodeMasks(from output: FastSAM_sOutput) -> [Mask] {
        // Implement mask decoding logic here (e.g. parse the output multi-arrays into Mask values)
        return []  // placeholder so the skeleton compiles
    }
    
    private func applyPrompt(_ prompt: FastSAMPrompt, to masks: [Mask]) -> [Mask] {
        // Implement prompt application logic here
        return masks  // placeholder so the skeleton compiles
    }
}

enum FastSAMPrompt {
    case boundingBox(CGRect)
    case point(CGPoint)
    // Add other prompt types as needed
}

struct Mask {
    // Define your mask structure here
}

This approach allows you to use the Core ML model as-is, without modifying its inputs, and then apply the FastSAM-specific logic in Swift.
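
For reference, a hypothetical calling site for the outline above might look like this. It is only a sketch: pixelBuffer is assumed to be a CVPixelBuffer you already have (for example from the camera), and the prompt values are arbitrary.

do {
    let processor = try FastSAMProcessor()

    // A bounding-box prompt in image coordinates (values are illustrative)
    let prompt = FastSAMPrompt.boundingBox(CGRect(x: 100, y: 80, width: 200, height: 150))

    // `pixelBuffer` is assumed to come from the camera or a decoded image
    let masks = try processor.process(image: pixelBuffer, prompt: prompt)
    print("Selected \(masks.count) mask(s)")
} catch {
    print("FastSAM processing failed: \(error)")
}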

For the prompt application logic, you'll need to implement the algorithms described in the FastSAM paper or repository. This might involve operations like:

  • For bounding box prompts: Selecting masks that have a significant overlap with the given box.
  • For point prompts: Selecting masks that contain the given point.

The exact implementation will depend on the specific output format of your Core ML model and the details of how FastSAM uses these prompts.
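
As a very rough starting point, the applyPrompt stub in the FastSAMProcessor outline above could be filled in along these lines. This is only a sketch under stated assumptions: it presumes Mask exposes a boundingBox: CGRect property and a contains(_ point: CGPoint) -> Bool method (both hypothetical and dependent on how you decode the model output), and the 0.5 overlap threshold is arbitrary.

private func applyPrompt(_ prompt: FastSAMPrompt, to masks: [Mask]) -> [Mask] {
    switch prompt {
    case .boundingBox(let box):
        // Keep masks whose bounding box overlaps the prompt box significantly
        return masks.filter { mask in
            let intersection = mask.boundingBox.intersection(box)
            guard !intersection.isNull else { return false }
            let overlapArea = intersection.width * intersection.height
            let maskArea = mask.boundingBox.width * mask.boundingBox.height
            return maskArea > 0 && overlapArea / maskArea > 0.5
        }
    case .point(let point):
        // Keep masks that contain the prompted point
        return masks.filter { $0.contains(point) }
    }
}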

@john-rocky
Contributor

@NeuralNoble
Thank you for your interest in using a custom model.

In order to use a custom model with the code in this repository, you must set nms=True when exporting. This option adds the necessary post-processing (non-maximum suppression) to the exported model.

int8=True is a quantization option that makes the model more compact and speeds up inference on iOS; the model still works on iOS without it.

imgsz=[640, 640] can be changed to any size you like; however, larger sizes will increase memory consumption.
