Skip to content

Commit

Permalink
Initial commit: ComfyUI TogetherVision node with version 2.1.1
Browse files Browse the repository at this point in the history
  • Loading branch information
theshubzworld committed Jan 24, 2025
0 parents commit f6cb698
Show file tree
Hide file tree
Showing 17 changed files with 1,173 additions and 0 deletions.
24 changes: 24 additions & 0 deletions .github/workflows/publish.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
name: Publish to Comfy registry
on:
workflow_dispatch:
push:
branches:
- main
- master
paths:
- "pyproject.toml"

jobs:
publish-node:
name: Publish Custom Node to registry
runs-on: ubuntu-latest
# if this is a forked repository. Skipping the workflow.
if: github.event.repository.fork == false
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Publish Custom Node
uses: Comfy-Org/publish-node-action@main
with:
## Add your own personal access token to your Github Repository secrets and reference it here.
personal_access_token: ${{ secrets.REGISTRY_ACCESS_TOKEN }}
25 changes: 25 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# Environment files
.env
*.env

# Python cache files
__pycache__/
*.py[cod]
*$py.class

# Virtual environments
venv/
env/
.venv/

# IDE-specific files
.vscode/
.idea/

# Logs and databases
*.log
*.sqlite3

# OS generated files
.DS_Store
Thumbs.db
16 changes: 16 additions & 0 deletions .tracking
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
.env
.github/workflows/publish.yml
.gitignore
DESCRIPTION.md
LICENSE
README.md
Workflows/TogetherVision+Image Generator.json
__init__.py
icon.svg
images/Latest.png
images/node-screenshot-old.png
images/node-screenshot.png
pyproject.toml
requirements.txt
together_image_node.py
together_vision_node.py
52 changes: 52 additions & 0 deletions DESCRIPTION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
## ComfyUI Together Vision Node

A custom ComfyUI node that leverages Together AI's Vision models to generate detailed descriptions of images. This node integrates both paid (Llama-3.2-11B-Vision) and free (Llama-Vision-Free) models, allowing users to get high-quality image descriptions directly within their ComfyUI workflows.

### Key Features
- 🎯 Easy-to-use image description node for ComfyUI
- 🔄 Support for both paid and free Together AI Vision models
- 🎚️ Advanced parameter controls (temperature, top_p, top_k, repetition_penalty)
- 🔑 Flexible API key management (via node input or .env file)
- 📝 Customizable system and user prompts
- 🛠️ Comprehensive error handling and logging

### Quick Start
1. Install the node in your ComfyUI custom_nodes directory
2. Add your Together AI API key
3. Connect any image output to the node
4. Get detailed, AI-generated descriptions of your images

Perfect for:
- Content creators needing image descriptions
- Accessibility enhancement
- AI art analysis
- Visual content documentation
- Creative writing inspiration

### Technical Stack
- Python
- Together AI Vision API
- ComfyUI Framework
- PyTorch for tensor handling

### Extended Capabilities

#### New Free Image Generation Node
- 🆓 Introducing a cost-effective Free Image Generation Node
- 🔄 Powered by Flux Schnell Model
- 💡 Enables free image processing and generation
- 🚀 Expand your ComfyUI workflows without additional costs

#### Additional Use Cases
- Cost-effective AI image generation
- Experimental image creation
- Prototype development
- Educational AI exploration

### Model Flexibility
Our node now offers enhanced flexibility with:
- Paid Vision Models: Llama-3.2-11B-Vision-Instruct-Turbo
- Free Vision Models: Llama-Vision-Free
- Image Generation: Flux Schnell Model

**Note**: Free model usage may have limitations compared to paid versions. Performance and output quality can vary.
21 changes: 21 additions & 0 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 theshubzworld

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
194 changes: 194 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,194 @@
# ComfyUI-TogetherVision

A custom node for ComfyUI that enables image description using Together AI's Vision models. This node allows you to generate detailed descriptions of images using either the paid or free version of Together AI's Llama Vision models.

![Together Vision Node](images/node-screenshot.png)

## Features

🖼️ **Image Description & Text Generation**:
- Generate detailed descriptions of images using state-of-the-art vision models
- Toggle vision processing on/off for flexible usage
- Use as a text-only LLM when vision is disabled
- **New Free Image Generation Node**: Utilize free vision models for image analysis

🤖 **Multiple Models**:
- Paid Version: Llama-3.2-11B-Vision-Instruct-Turbo
- Free Version: Llama-Vision-Free

⚙️ **Customizable Parameters**:
- Temperature control
- Top P sampling
- Top K sampling
- Repetition penalty

🔑 **Flexible API Key Management**:
- Direct input in the node
- Environment variable through .env file

📝 **Custom Prompting**:
- System prompt customization
- User prompt customization

## Getting Started

### 1. Get Together AI API Key
1. Go to [Together AI API Settings](https://api.together.xyz/settings/api-keys)
2. Sign up or log in to your Together AI account
3. Click "Create API Key"
4. Copy your API key for later use

### 2. Installation

1. Clone this repository into your ComfyUI custom_nodes directory:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/thetheshubzworld/ComfyUI-TogetherVision.git
```

2. Restart ComfyUI - it will automatically install the required dependencies from requirements.txt

3. Set up your Together AI API key using one of these methods:
- Option 1: Create a `.env` file in the node directory:
```
TOGETHER_API_KEY=your_api_key_here
```
- Option 2: Input your API key directly in the node
## Usage
1. Add the "Together Vision 🔍" node to your workflow
2. Configure Vision Mode:
- Enable Vision (Default): Connect an image output to the node's image input
- Disable Vision: Skip image input for text-only generation
3. Select your preferred model (Paid or Free)
4. Configure the parameters:
- Temperature (0.0 - 2.0)
- Top P (0.0 - 1.0)
- Top K (1 - 100)
- Repetition Penalty (0.0 - 2.0)
5. Customize the prompts:
- System prompt: Sets the behavior of the AI
- User prompt: Specific instructions for image description or text generation
## Parameters
| Parameter | Description | Default | Range |
|-----------|-------------|---------|--------|
| Vision Enable | Toggles vision processing | True | True/False |
| Temperature | Controls randomness | 0.7 | 0.0 - 2.0 |
| Top P | Nucleus sampling | 0.7 | 0.0 - 1.0 |
| Top K | Top K sampling | 50 | 1 - 100 |
| Repetition Penalty | Prevents repetition | 1.0 | 0.0 - 2.0 |
## Image Resolution Limits
The node automatically handles high-resolution images:
- Images larger than 2048x2048 pixels will be automatically resized
- Aspect ratio is preserved during resizing
- High-quality LANCZOS resampling is used
For best results:
1. Keep image dimensions under 2048 pixels
2. Use ComfyUI's built-in resize nodes before this node
3. For very large images, consider splitting them into sections
## Rate Limits
### Free Model (Llama-Vision-Free)
- Limited to approximately 100 requests per day
- Rate limit resets every 24 hours
- Hourly limits may apply (typically 20-30 requests per hour)
### Paid Model (Llama-3.2-11B-Vision)
- Higher rate limits based on your Together AI subscription
- Better performance and reliability
- Priority API access
### Handling Rate Limits
When you hit a rate limit:
1. Wait for the specified time (usually 1 hour for hourly limits)
2. Switch to a different Together AI account
3. Upgrade to the paid model for higher limits
4. Consider batching your requests during off-peak hours
### Tips to Avoid Rate Limits
1. Cache results for repeated images
2. Use the paid model for production workloads
3. Monitor your API usage through Together AI dashboard
4. Space out your requests when possible
## Operating Modes
### Vision Mode (Default)
- Requires connected image input
- Generates detailed image descriptions
- Full vision + language capabilities
### Text-Only Mode
- No image input required
- Functions as a standard LLM
- Useful for text generation and chat
## Error Handling
The node includes comprehensive error handling and logging:
- API key validation
- Rate limit notifications
- Image processing errors
- API response errors
- Vision mode validation
## Examples
Here are some example prompts you can try:
1. Vision Mode - Detailed Description:
```
Describe this image in detail, including colors, objects, and composition.
```
2. Vision Mode - Technical Analysis:
```
Analyze this image from a technical perspective, including lighting, composition, and photographic techniques.
```
3. Text-Only Mode - Creative Writing:
```
Write a creative story about a magical forest.
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Together AI for providing the Vision API
- ComfyUI community for the framework and support
## Support
If you encounter any issues or have questions:
1. Check the error logs in ComfyUI
2. Ensure your API key is valid
3. Check Together AI's service status
4. Open an issue on GitHub
---
**Note**: This node requires a Together AI account and API key. You can get one at [Together AI's website](https://together.ai).
**Updated README.md to reflect automatic mode switching based on image connection**
The node now automatically switches between Vision Mode and Text-Only Mode based on the presence of an image input connection. When an image is connected, the node will generate detailed image descriptions. When no image is connected, the node will function as a text generation model.
**Flexible Processing Modes**
- **Image + Text Mode**: When an image is connected, generates descriptions and responses about the image
- **Text-Only Mode**: When no image is connected, functions as a text generation model
- Seamlessly switches between modes based on input connections
Loading

0 comments on commit f6cb698

Please sign in to comment.