## Environment Variables

The CLI respects environment variables that you've already set. If an environment variable is set, the CLI won't prompt for that setting unless you explicitly run the configuration wizard.

Key environment variables:

- `HUGGINGFACE_TOKEN`: HuggingFace API token for accessing models (optional)

The CLI stores your configuration in `~/.locallab/config.json` for future use, so you don't have to re-enter your settings each time you run LocalLab.
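
As a quick illustration, exporting a variable before launching means the CLI will skip the corresponding prompt. The token value below is a placeholder, not a real credential:

```shell
# HUGGINGFACE_TOKEN is one of the variables the CLI reads; the value is a placeholder.
export HUGGINGFACE_TOKEN="hf_your_token_here"

# Subsequent runs such as `locallab info` will no longer prompt for the token.
```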
### Stream Generation

Stream generated text token by token:

```python
async def stream_example():
    client = LocalLabClient("http://localhost:8000")  # or "https://your-ngrok-url.ngrok.app"
    try:
        print("Generating story: ", end="", flush=True)
        async for token in client.stream_generate("Once upon a time"):
            print(token, end="", flush=True)
        print()
    finally:
        await client.close()
```

### Stream Chat

Stream chat responses:

```python
async def stream_chat():
    client = LocalLabClient("http://localhost:8000")  # or "https://your-ngrok-url.ngrok.app"
    try:
        # The chat call body is elided in the source diff; this sketch assumes
        # a stream_chat method that yields tokens, mirroring stream_generate.
        async for token in client.stream_chat("Tell me a story"):
            print(token, end="", flush=True)
    finally:
        await client.close()
```

## Batch Processing
### Process Multiple Prompts
Generate responses for multiple prompts efficiently:

```python
async def batch_example():
    client = LocalLabClient("http://localhost:8000")  # or "https://your-ngrok-url.ngrok.app"
    try:
        prompts = [
            "Tell a joke",
            "Give a fun fact"
        ]

        responses = await client.batch_generate(prompts)

        for prompt, response in zip(prompts, responses["responses"]):
            print(f"\nPrompt: {prompt}")
            print(f"Response: {response}")
    finally:
        await client.close()
```

## Model Management

### Load Different Models

Switch between different models:

```python
async def model_management():
    client = LocalLabClient("http://localhost:8000")  # or "https://your-ngrok-url.ngrok.app"
    try:
        # List available models
        models = await client.list_models()
        print("Available models:", models)

        # Load a specific model
        await client.load_model("microsoft/phi-2")

        # Get current model info
        model_info = await client.get_current_model()
        print("Current model:", model_info)

        # Generate with loaded model
        response = await client.generate("Hello!")
        print(response)
    finally:
        await client.close()
```

## Error Handling

### Handle Common Errors

Properly handle potential errors:

```python
async def error_handling():
    try:
        # Try to connect
        client = LocalLabClient("http://localhost:8000")  # or "https://your-ngrok-url.ngrok.app"

        # Check server health
        if not await client.health_check():
            print("Server is not responding")
            return

        # Try generation
        try:
            response = await client.generate("Hello!")
            print(response)
        except Exception as e:
            print(f"Generation failed: {str(e)}")

    except ConnectionError:
        print("Could not connect to server")
    except Exception as e:
        print(f"Error: {str(e)}")
```

## Best Practices
1. **Always Close the Client**

   ```python
   try:
       # Your code here
       pass
   finally:
       await client.close()
   ```
2. **Check Server Health**

   ```python
   if not await client.health_check():
       print("Server not ready")
       return
   ```
3. **Use Proper Error Handling**

   ```python
   try:
       response = await client.generate(prompt)
   except Exception as e:
       print(f"Error: {str(e)}")
   ```
---

Need more examples? Check our [Community Examples](https://github.com/UtkarshTheDev/LocalLab/discussions/categories/show-and-tell) or ask in our [Discussion Forum](https://github.com/UtkarshTheDev/LocalLab/discussions).