Engineering · 3 min read

Deploying a Free LLM on Modal.com with the Hermes OpenCode Agent

A guide to deploying an LLM on Modal.com's free T4 GPU tier using the Hermes OpenCode Agent.

In this guide, I will show you how to deploy an LLM on Modal.com with a free T4 GPU, using the Hermes OpenCode Agent.


What is Modal.com?

Modal.com is a serverless platform for running your Python code on GPUs, optimized especially for ML/AI workloads.

| Feature | Value               |
|---------|---------------------|
| GPU     | T4 (free tier)      |
| RAM     | 16GB                |
| Storage | 256GB               |
| Price   | 40 hours/month free |

Setup

1. Install the Modal CLI

pip install modal
modal setup

2. Get an API Key

Create an account at modal.com and obtain an API key.
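As an aside, Modal also reads credentials from the MODAL_TOKEN_ID and MODAL_TOKEN_SECRET environment variables, which is handy in CI where the interactive `modal setup` flow is not available. A minimal stdlib check (the helper name is my own):

```python
import os

def modal_credentials_present() -> bool:
    """Return True when Modal API credentials are exported in the environment."""
    return bool(os.environ.get("MODAL_TOKEN_ID")) and bool(os.environ.get("MODAL_TOKEN_SECRET"))

# When these are unset, the CLI falls back to the ~/.modal.toml file
# written by `modal setup`.
```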


Hermes OpenCode Agent Deployment Script

#!/usr/bin/env python3
"""
Hermes OpenCode Agent - Modal.com Deployment
"""

import modal

app = modal.App("hermes-opencode-agent")

# Custom container image
image = modal.Image.debian_slim(python_version="3.11").pip_install([
    "fastapi==0.110.3",
    "uvicorn==0.29.0", 
    "pydantic==2.10.0",
    "requests==2.32.3",
    "transformers==4.45.0",
    "torch==2.3.0",
    "sentencepiece==0.2.0",
    "protobuf==3.20.3",
    "accelerate==1.1.0"
])

@app.function(
    image=image,
    gpu="T4",  # Free-tier GPU
    timeout=3600
)
@modal.asgi_app()  # expose the returned FastAPI app as a web endpoint
def serve_llm():
    """Serve the Phi-3 mini LLM with FastAPI"""
    from fastapi import FastAPI
    from pydantic import BaseModel
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    # Phi-3 mini model
    MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
    
    print("Loading Phi-3 mini model...")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    
    # Add padding token if not exists
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True
    )
    
    print(f"Model loaded on {model.device}!")
    
    # FastAPI app
    app = FastAPI(title="Hermes OpenCode LLM Agent")
    
    class CodeRequest(BaseModel):
        prompt: str
        max_tokens: int = 256
        
    class CodeResponse(BaseModel):
        code: str
        model: str
        tokens: int
    
    @app.get("/")
    async def root():
        return {
            "message": "Hermes OpenCode LLM Agent",
            "model": MODEL_NAME,
            "status": "ready",
            "gpu": "T4"
        }
    
    @app.post("/generate")
    async def generate_code(request: CodeRequest):
        """Generate code with Phi-3 model"""
        try:
            # Prepare prompt
            prompt = f"""<|system|>
You are a helpful AI assistant that generates high-quality code. Always respond with complete, runnable code blocks.
<|end|>
<|user|>
{request.prompt}
<|end|>
<|assistant|>
"""
            
            # Tokenize
            inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
            
            # Generate
            with torch.no_grad():
                outputs = model.generate(
                    **inputs,
                    max_new_tokens=request.max_tokens,
                    temperature=0.7,
                    do_sample=True,
                    pad_token_id=tokenizer.eos_token_id,
                    eos_token_id=tokenizer.eos_token_id
                )
            
            # Decode only the newly generated tokens. The prompt is excluded,
            # and because skip_special_tokens=True strips the <|assistant|>
            # marker, splitting on it afterwards would not work.
            new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
            assistant_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
            
            return CodeResponse(
                code=assistant_response,
                model=MODEL_NAME,
                tokens=len(new_tokens)
            )
            
        except Exception as e:
            # Return a proper HTTP error instead of a 200 with an error body
            from fastapi import HTTPException
            raise HTTPException(status_code=500, detail=str(e))
    
    @app.get("/health")
    async def health():
        return {
            "status": "healthy",
            "model": MODEL_NAME,
            "device": str(model.device),
            "dtype": str(model.dtype)
        }
    
    return app

@app.local_entrypoint()
def deploy():
    """Print deployment info; the web endpoint itself is published with `modal deploy`"""
    print("🚀 Hermes OpenCode LLM Agent on Modal.com")
    print("📦 Model: microsoft/Phi-3-mini-4k-instruct")
    print("🎯 GPU: T4 (free tier)")
    print("🔗 App URL: https://your-username--hermes-opencode-agent.modal.run")
    print("📊 Health endpoint: /health")
    print("⚡ Generate endpoint: /generate")
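For reference, the prompt string the /generate handler assembles follows Phi-3's chat markup (the <|system|>, <|user|>, <|end|>, and <|assistant|> tags). A standalone sketch of the same formatting, with a helper name of my own choosing:

```python
def build_phi3_prompt(user_prompt: str) -> str:
    """Build a Phi-3-style chat prompt matching the template used in serve_llm()."""
    system_msg = (
        "You are a helpful AI assistant that generates high-quality code. "
        "Always respond with complete, runnable code blocks."
    )
    return (
        f"<|system|>\n{system_msg}\n<|end|>\n"
        f"<|user|>\n{user_prompt}\n<|end|>\n"
        "<|assistant|>\n"  # generation continues from here
    )
```

In practice, `tokenizer.apply_chat_template` from transformers produces the same markup from a list of role/content messages and is less error-prone than hand-built f-strings.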

Deployment

Deploy the app (this publishes the web endpoint and prints its URL):

modal deploy hermes_modal_deploy.py

API Endpoints

| Endpoint  | Method | Description          |
|-----------|--------|----------------------|
| /         | GET    | Status page          |
| /health   | GET    | System health status |
| /generate | POST   | Code generation      |

Example Usage

# Health check
curl https://your-username--hermes-opencode-agent.modal.run/health

# Code generation
curl -X POST https://your-username--hermes-opencode-agent.modal.run/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Python ile Fibonacci serisi oluştur",
    "max_tokens": 256
  }'
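The same call can be made from Python with only the standard library. BASE_URL below is the placeholder hostname from the script above, so substitute your own deployment URL:

```python
import json
from urllib import request

BASE_URL = "https://your-username--hermes-opencode-agent.modal.run"  # placeholder

def build_generate_payload(prompt: str, max_tokens: int = 256) -> bytes:
    """Serialize a CodeRequest body as UTF-8 JSON bytes."""
    return json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")

def generate_code(prompt: str, max_tokens: int = 256) -> dict:
    """POST to /generate and return the parsed CodeResponse as a dict."""
    req = request.Request(
        BASE_URL + "/generate",
        data=build_generate_payload(prompt, max_tokens),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    result = generate_code("Generate a Fibonacci sequence in Python")
    print(result["code"])
```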

Cost Analysis

| Resource  | Cost | Limit          |
|-----------|------|----------------|
| GPU hours | $0   | 40 hours/month |
| Bandwidth | $0   | 10GB/month     |
| Storage   | $0   | 256GB          |

Total cost: $0 🎉


Advantages

  1. Free: 40 hours/month of T4 GPU at no cost
  2. Fast: high performance on NVIDIA GPUs
  3. Easy: one-command deployment from a Python script
  4. Scalable: automatic scaling
  5. Open source: support for open-source models

Conclusion

The Modal.com + Hermes OpenCode Agent combination offers:

  • ✅ Free GPU access
  • ✅ Easy deployment
  • ✅ A production-ready API
  • ✅ Open-source model support
  • ✅ Automatic scaling

A great starting point for all your AI projects! 🚀

