Deployment Guide

Complete guide for deploying AI/ML Framework models and APIs to production environments.

πŸš€ Quick Deployment Options

Local Deployment

bash
# Generate API
python -c "
from ai_ml_framework.api import APIGenerator
api = APIGenerator('model.pkl')
api.generate_main_script('api_main.py')
api.generate_requirements('api_requirements.txt')
"

# Run locally
pip install -r api_requirements.txt
python api_main.py
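
Once the server is up, you can exercise the generated endpoints from Python as well; a minimal smoke test, assuming the default /health and /predict routes shown later in this guide:

python
import requests

# Check service health
print(requests.get("http://localhost:8000/health").json())

# Request a prediction
payload = {"data": {"feature_1": 1.0, "feature_2": 2.0}}
print(requests.post("http://localhost:8000/predict", json=payload).json())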

Docker Deployment

bash
# Build Docker image
docker build -t ml-api .

# Run container
docker run -p 8000:8000 ml-api

# Test API
curl http://localhost:8000/health

Cloud Deployment

bash
# AWS ECS
python -m ai_ml_framework.api.deployment --platform aws

# Google Cloud Run
python -m ai_ml_framework.api.deployment --platform gcp

# Azure Container Instances
python -m ai_ml_framework.api.deployment --platform azure

🐳 Docker Deployment

Creating Dockerfile

The framework automatically generates a Dockerfile, but you can customize it:

dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies (curl is needed for the HEALTHCHECK below)
RUN apt-get update && apt-get install -y \
    gcc \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user
RUN useradd --create-home --shell /bin/bash app \
    && chown -R app:app /app
USER app

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
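
If you want the framework to emit the Dockerfile for you alongside the API script, a sketch is below; the generate_dockerfile helper is an assumption about the generator's API and may be named differently in your version.

python
from ai_ml_framework.api import APIGenerator

api = APIGenerator('model.pkl')
api.generate_main_script('main.py')
api.generate_requirements('requirements.txt')
# Hypothetical helper; check your framework version for the exact method name
api.generate_dockerfile('Dockerfile')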

Multi-stage Dockerfile

dockerfile
# Build stage
FROM python:3.9 as builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.9-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Docker Compose

yaml
version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - ENVIRONMENT=production
      - API_KEY=${API_KEY}
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    restart: unless-stopped

  monitoring:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  dashboard:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-storage:/var/lib/grafana

volumes:
  grafana-storage:

☸️ Kubernetes Deployment

Basic Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api
  labels:
    app: ml-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
      - name: ml-api
        image: ml-api:latest
        ports:
        - containerPort: 8000
        env:
        - name: ENVIRONMENT
          value: "production"
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: api-key
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: ml-api-service
spec:
  selector:
    app: ml-api
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer

Ingress Configuration

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ml-api-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - api.yourdomain.com
    secretName: ml-api-tls
  rules:
  - host: api.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ml-api-service
            port:
              number: 80

Horizontal Pod Autoscaler

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

☁️ Cloud Deployment

AWS Deployment

ECS Task Definition

json
{
  "family": "ml-api-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "ml-api",
      "image": "your-registry/ml-api:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "ENVIRONMENT",
          "value": "production"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ml-api",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
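
To register this task definition programmatically, a boto3 sketch is below; task-definition.json is assumed to be the file where you saved the JSON above.

python
import json

import boto3

# Register the task definition with ECS (assumes AWS credentials are configured)
ecs = boto3.client("ecs", region_name="us-west-2")

with open("task-definition.json") as f:
    task_def = json.load(f)

response = ecs.register_task_definition(**task_def)
print(response["taskDefinition"]["taskDefinitionArn"])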

CloudFormation Template

yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'ECS deployment for ML API'

Resources:
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: ml-api-cluster
      
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: ml-api-task
      Cpu: 512
      Memory: 1024
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Ref ExecutionRole
      
  Service:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSCluster
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: 2
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroupIds:
            - !Ref SecurityGroup
          Subnets:
            - !Ref Subnet1
            - !Ref Subnet2

Google Cloud Deployment

Cloud Build Configuration

yaml
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/ml-api', '.']
  
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/$PROJECT_ID/ml-api']
  
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: 'gcloud'
  args:
  - 'run'
  - 'deploy'
  - 'ml-api'
  - '--image=gcr.io/$PROJECT_ID/ml-api'
  - '--region=us-central1'
  - '--platform=managed'
  - '--allow-unauthenticated'
  - '--memory=1Gi'
  - '--cpu=1'
  - '--max-instances=10'

Kubernetes YAML

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api-gcp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
      - name: ml-api
        image: gcr.io/PROJECT_ID/ml-api
        ports:
        - containerPort: 8000
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

Azure Deployment

ARM Template

json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "containerName": {
      "type": "string",
      "defaultValue": "ml-api"
    }
  },
  "resources": [
    {
      "type": "Microsoft.ContainerInstance/containerGroups",
      "apiVersion": "2021-09-01",
      "name": "[parameters('containerName')]",
      "location": "[resourceGroup().location]",
      "properties": {
        "containers": [
          {
            "name": "[parameters('containerName')]",
            "properties": {
              "image": "ml-api:latest",
              "resources": {
                "requests": {
                  "cpu": 1.0,
                  "memoryInGB": 1.0
                }
              },
              "ports": [
                {
                  "port": 8000,
                  "protocol": "TCP"
                }
              ]
            }
          }
        ],
        "osType": "Linux",
        "ipAddress": {
          "type": "Public",
          "ports": [
            {
              "port": 8000,
              "protocol": "TCP"
            }
          ]
        }
      }
    }
  ]
}

πŸ”§ Configuration Management

Environment Variables

bash
# Production environment
export ENVIRONMENT=production
export API_HOST=0.0.0.0
export API_PORT=8000
export API_WORKERS=4
export LOG_LEVEL=INFO

# Database
export DATABASE_URL=postgresql://user:pass@host:5432/dbname

# Redis
export REDIS_URL=redis://localhost:6379/0

# Monitoring
export PROMETHEUS_ENABLED=true
export GRAFANA_ENABLED=true

# Security
export API_KEY=your-secure-api-key
export JWT_SECRET=your-jwt-secret

# MLflow
export MLFLOW_TRACKING_URI=http://mlflow:5000
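
How the service consumes these values is up to your application code; a minimal sketch of reading them with local-friendly defaults:

python
import os

# Read deployment settings from the environment, with defaults for local runs
ENVIRONMENT = os.environ.get("ENVIRONMENT", "development")
API_HOST = os.environ.get("API_HOST", "0.0.0.0")
API_PORT = int(os.environ.get("API_PORT", "8000"))
API_WORKERS = int(os.environ.get("API_WORKERS", "1"))
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")
DATABASE_URL = os.environ.get("DATABASE_URL")
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")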

Configuration Files

yaml
# config/production.yaml
framework:
  log_level: INFO
  environment: production
  
api:
  host: 0.0.0.0
  port: 8000
  workers: 4
  reload: false
  
database:
  url: ${DATABASE_URL}
  pool_size: 20
  max_overflow: 30
  
redis:
  url: ${REDIS_URL}
  max_connections: 100
  
monitoring:
  prometheus:
    enabled: true
    port: 9090
  grafana:
    enabled: true
    port: 3000
    
security:
  api_key: ${API_KEY}
  jwt_secret: ${JWT_SECRET}
  rate_limiting:
    requests_per_minute: 100
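
The ${VAR} placeholders are expanded from environment variables when the file is loaded; a minimal loader sketch (the framework may already handle this for you):

python
import os

import yaml

def load_config(path: str) -> dict:
    # Expand ${VAR} placeholders from the environment before parsing
    with open(path) as f:
        return yaml.safe_load(os.path.expandvars(f.read()))

config = load_config("config/production.yaml")
print(config["api"]["port"])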

πŸ“Š Monitoring and Logging

Prometheus Configuration

yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'ml-api'
    static_configs:
      - targets: ['ml-api:8000']
    metrics_path: /metrics
    scrape_interval: 5s
    
  - job_name: 'redis'
    static_configs:
      - targets: ['redis:6379']
      
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
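
The ml-api job assumes the service exposes a /metrics endpoint. If the generated API does not already do so, one way to add it with prometheus_client:

python
from fastapi import FastAPI
from prometheus_client import make_asgi_app

app = FastAPI()

# Mount a Prometheus scrape endpoint on the API container
app.mount("/metrics", make_asgi_app())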

Grafana Dashboard

json
{
  "dashboard": {
    "title": "ML API Dashboard",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{endpoint}}"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          }
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total{status=~\"5..\"}[5m]) / rate(http_requests_total[5m])",
            "legendFormat": "Error Rate"
          }
        ]
      }
    ]
  }
}

Structured Logging

python
import logging
import structlog

# Route structlog output through the standard library logger
logging.basicConfig(format="%(message)s", level=logging.INFO)

# Configure structured logging
structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.UnicodeDecoder(),
        structlog.processors.JSONRenderer()
    ],
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)

# Use in API
logger = structlog.get_logger()
logger.info("Request received", request_id=req_id, method=method, path=path)

πŸ”’ Security Best Practices

API Security

python
from ai_ml_framework.api import APIGenerator
from fastapi.middleware.cors import CORSMiddleware

# Add security features
api_generator = APIGenerator('model.pkl')

# Authentication
api_generator.add_authentication(api_key='secure-key')

# Rate limiting
api_generator.enable_rate_limiting(requests_per_minute=100)

# CORS configuration
app = api_generator.generate_api()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"]
)

Secrets Management

yaml
# Kubernetes secrets
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
type: Opaque
data:
  api-key: 
  jwt-secret: 
  database-url: 
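
The data values are intentionally left blank; Kubernetes expects them base64-encoded. A small helper to produce the encoded values (example values taken from the environment-variable section above):

python
import base64

# Kubernetes Secret `data` entries must be base64-encoded
secrets = {
    "api-key": "your-secure-api-key",
    "jwt-secret": "your-jwt-secret",
    "database-url": "postgresql://user:pass@host:5432/dbname",
}
for name, value in secrets.items():
    print(name, base64.b64encode(value.encode()).decode())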

Network Policies

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ml-api-netpol
spec:
  podSelector:
    matchLabels:
      app: ml-api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: redis

πŸ”„ CI/CD Pipeline

GitHub Actions

yaml
name: Deploy ML API

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.9
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install pytest
    - name: Run tests
      run: pytest

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Build Docker image
      run: |
        docker build -t ml-api:${{ github.sha }} .
        docker tag ml-api:${{ github.sha }} ml-api:latest
    - name: Push to registry
      run: |
        echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
        docker push ml-api:${{ github.sha }}
        docker push ml-api:latest

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
    - name: Deploy to Kubernetes
      run: |
        echo ${{ secrets.KUBECONFIG }} | base64 -d > kubeconfig
        export KUBECONFIG=kubeconfig
        kubectl set image deployment/ml-api ml-api=ml-api:${{ github.sha }}
        kubectl rollout status deployment/ml-api

GitLab CI/CD

yaml
stages:
  - test
  - build
  - deploy

test:
  stage: test
  script:
    - pip install -r requirements.txt
    - pytest

build:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

deploy:
  stage: deploy
  script:
    - kubectl set image deployment/ml-api ml-api=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - kubectl rollout status deployment/ml-api
  only:
    - main

⚑ Performance Optimization

Load Testing

python
# Use locust for load testing
from locust import HttpUser, task, between

class MLAPIUser(HttpUser):
    wait_time = between(1, 3)
    
    @task
    def health_check(self):
        self.client.get("/health")
    
    @task
    def predict(self):
        data = {
            "data": {
                "feature_1": 1.0,
                "feature_2": 2.0
            }
        }
        self.client.post("/predict", json=data)

# Run with: locust -f load_test.py --host=http://localhost:8000

Caching Strategy

python
import hashlib
import json

import redis

# Redis-backed prediction cache; `model` is the loaded model object
redis_client = redis.Redis(host='redis', port=6379, db=0)

def cached_prediction(input_data: dict):
    # Key the cache on a stable hash of the input payload
    input_hash = hashlib.sha256(
        json.dumps(input_data, sort_keys=True).encode()
    ).hexdigest()

    # Check cache first
    cached_result = redis_client.get(f"pred:{input_hash}")
    if cached_result:
        return json.loads(cached_result)

    # Make prediction
    result = model.predict(input_data)

    # Cache result for one hour
    redis_client.setex(f"pred:{input_hash}", 3600, json.dumps(result))
    return result

Performance Tuning

python
# Optimize API performance
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Request schema matching the /predict payload used elsewhere in this guide
class PredictionRequest(BaseModel):
    data: dict

# Enable async endpoints so inference does not block the event loop
@app.post("/predict")
async def predict(request: PredictionRequest):
    # async_predict is your non-blocking inference helper
    result = await async_predict(request.data)
    return result

# Use multiple workers; with workers > 1, uvicorn requires an import string
# rather than the app object
if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=8000, workers=4)