Deploying to Production

Production checklist, environment configuration, webhook security, observability, and operational best practices for Ploton integrations.

Production checklist

Before going live:

  • Switch from pk_test_ to pk_live_ API keys
  • Configure webhook signature verification
  • Set up a stable, HTTPS webhook endpoint (not ngrok)
  • Implement idempotent webhook handling
  • Add logging for task creation, webhook receipt, and errors
  • Configure alerting for task.failed events
  • Set up health monitoring for your webhook endpoint
  • Test the full flow end-to-end with a live key

Environment configuration

Required environment variables

# Your production API key
PLOTON_API_KEY=pk_live_your_production_key

# Webhook secret for signature verification
PLOTON_WEBHOOK_SECRET=whsec_your_webhook_secret

Optional environment variables

# Override the base URL (default: https://api.ploton.ai/v1)
# Useful for pointing to a staging environment
PLOTON_BASE_URL=https://api.ploton.ai/v1

Store these in your platform’s secrets manager — AWS Secrets Manager, Vault, Vercel env vars, Railway secrets, whatever you use. Don’t commit them to source control.
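
A fail-fast loader sketch in Python — variable names match the list above; failing at boot on a missing key beats failing on the first request:

```python
import os

def load_config() -> dict:
    """Read Ploton settings from the environment, failing fast if keys are missing."""
    return {
        "api_key": os.environ["PLOTON_API_KEY"],          # KeyError if unset: fail at boot
        "webhook_secret": os.environ["PLOTON_WEBHOOK_SECRET"],
        "base_url": os.environ.get("PLOTON_BASE_URL", "https://api.ploton.ai/v1"),
    }
```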

Webhook security

Always verify the X-Ploton-Signature header in production. Without verification, anyone who discovers your webhook URL can send fake events.

Ploton sends an X-Ploton-Signature header containing HMAC-SHA256(webhook_secret, raw_request_body) as a hex string. Your handler should:

  1. Read the raw request body (before JSON parsing)
  2. Compute the expected HMAC-SHA256 signature
  3. Compare using a constant-time function
  4. Reject with 401 if the signature doesn’t match

See Webhooks: Signature Verification for code examples in JavaScript, Python, PHP, and Rust.

Verify against the raw body bytes, not re-serialized JSON. If your framework parses JSON before your handler runs, the re-serialized body may differ from the original and the signature check will fail.
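
The four steps above can be sketched in Python with the standard library (framework-agnostic; wire `raw_body` and the header value in from your framework's request object):

```python
import hashlib
import hmac

def verify_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Return True if the X-Ploton-Signature header matches the raw request body."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which prevents timing attacks
    return hmac.compare_digest(expected, signature_header)
```

In your handler, reject with 401 whenever `verify_signature(...)` returns False.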

Webhook reliability

Respond fast

Return 200 within 30 seconds. If processing takes longer, acknowledge receipt immediately and handle the event in a background job or queue.
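
One way to decouple acknowledgement from processing — a sketch using the standard-library queue; `events` stands in for whatever job queue you run in production:

```python
import queue

events: "queue.Queue[dict]" = queue.Queue()

def handle_webhook(event: dict) -> int:
    """Acknowledge immediately; the real work happens off the request path."""
    events.put(event)   # hand off to a background worker
    return 200          # Ploton sees a fast 200, well under the 30-second limit
```

At startup, run a worker thread or consumer process that drains `events` and does the slow processing.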

Handle duplicates

Ploton may deliver the same event more than once. Use task_id + event as a deduplication key. Store processed keys in your database or cache (Redis works well here) with a 48-hour TTL, and skip events you’ve already handled.

Have a fallback

If your webhook endpoint goes down, you can poll for task results:

curl https://api.ploton.ai/v1/tasks/task_8xK2mP \
  -H "Authorization: Bearer $PLOTON_API_KEY"

Consider running a periodic reconciliation job that polls for completed tasks and catches anything your webhooks missed.
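
A reconciliation sketch; `fetch_completed_tasks` is a placeholder for a paginated GET /v1/tasks call, injected here so the catch-up logic stays testable:

```python
def reconcile(fetch_completed_tasks, handled_ids: set[str]) -> list[str]:
    """Return IDs of tasks that completed but were never handled by a webhook."""
    missed = []
    for task in fetch_completed_tasks():
        if task["task_id"] not in handled_ids:
            missed.append(task["task_id"])  # webhook never arrived: process now
    return missed
```

Run it on a schedule (e.g. every few minutes) and feed the missed IDs through the same processing path your webhook handler uses.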

Logging

What to log

  • Task creation: task ID, prompt summary (truncated), user ID, timestamp
  • Webhook receipt: event type, task ID, timestamp
  • Webhook processing: success/failure, duration
  • Errors: full error object from failed tasks

What not to log

  • Full API keys (log a masked version: pk_live_...xyz)
  • User OAuth tokens or credentials
  • Full task results if they contain PII (log a summary instead)

Example log format

Use structured JSON so you can query later. Each entry should include the event type, task ID, and timestamp:

{"event": "ploton.task.created", "task_id": "task_8xK2mP", "tool": "crm", "timestamp": "2025-06-15T14:22:00Z"}
{"event": "ploton.webhook.received", "webhook_event": "task.complete", "task_id": "task_8xK2mP", "timestamp": "2025-06-15T14:22:07Z"}
{"event": "ploton.task.failed", "task_id": "task_3nL7pR", "error_code": "service_unavailable", "recoverable": true, "timestamp": "2025-06-15T14:23:00Z"}
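
A sketch that emits entries in this shape with the standard library; field names follow the samples above:

```python
import json
from datetime import datetime, timezone

def log_event(event: str, **fields) -> str:
    """Serialize one structured log entry as a single JSON line."""
    entry = {"event": event, **fields,
             "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")}
    line = json.dumps(entry)
    print(line)  # or hand off to your logging pipeline
    return line
```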

Monitoring and alerting

Metrics to track

Metric                    | What it tells you               | Alert threshold
Task creation rate        | Demand on Ploton integration    | Sudden drop (integration may be broken)
Task failure rate         | Reliability of your integration | > 5% sustained
Webhook delivery latency  | Ploton-to-your-endpoint health  | > 10 seconds (p95)
Webhook processing errors | Your handler health             | Any 5xx response to Ploton
auth_token_expired errors | User re-auth needed             | Spike (may indicate OAuth scope issue)

Health check endpoint

Set up a health check that verifies your Ploton integration. Have it make a lightweight GET /v1/tasks?limit=1 call and confirm a 200 response:

curl https://api.ploton.ai/v1/tasks?limit=1 \
  -H "Authorization: Bearer $PLOTON_API_KEY"

If it returns 200, the integration is up. Return 503 from your health check if it’s unreachable or errors.
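
The decision logic as a sketch; `probe` stands in for the curl call above and is injected so the 200/503 mapping is explicit:

```python
def health_status(probe) -> int:
    """Return 200 if the Ploton API probe succeeds, 503 otherwise."""
    try:
        status = probe()  # e.g. GET /v1/tasks?limit=1, returning the HTTP status
    except Exception:
        return 503        # network error or timeout: report the integration as down
    return 200 if status == 200 else 503
```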

Error handling strategy

flowchart TD
    A["Task failed"] --> B{"recoverable?"}
    B -->|Yes| C["Retry with backoff"]
    B -->|No| D["Log & alert"]
    C --> E{"Max retries?"}
    E -->|No| F["Wait & retry"]
    E -->|Yes| D
    F --> A

    style A fill:#1a1630,stroke:#FF5F56,color:#e8e0f0
    style B fill:#1a1630,stroke:#FACC15,color:#e8e0f0
    style C fill:#1a1630,stroke:#FACC15,color:#e8e0f0
    style D fill:#1a1630,stroke:#FF5F56,color:#e8e0f0
    style E fill:#1a1630,stroke:#FACC15,color:#e8e0f0
    style F fill:#1a1630,stroke:#50FA7B,color:#e8e0f0

Recoverable errors

When a task fails with recoverable: true, you can retry. The common ones:

  • auth_token_expired — Prompt the user to re-authorize, then create a new task
  • service_unavailable — The third-party service is down. Retry after a delay.
  • rate_limited — Back off and retry

Non-recoverable errors

When recoverable: false, retrying won’t fix it. Something needs to change:

  • permission_denied — You need broader OAuth scopes
  • invalid_credentials — Credentials need updating
  • workflow_depth_exceeded — The prompt is too complex; simplify it or split into multiple tasks
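
The two lists above fold into one routing function — a sketch; the action names are illustrative, the error codes come from the lists:

```python
def next_action(error_code: str, recoverable: bool) -> str:
    """Map a failed task's error to a follow-up action."""
    if not recoverable:
        return "log_and_alert"          # retrying won't fix it; something must change
    if error_code == "auth_token_expired":
        return "reauthorize_user"       # prompt re-auth, then create a new task
    return "retry_with_backoff"         # service_unavailable, rate_limited
```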

Retry pattern

If task creation fails with a 429 or 5xx, retry with exponential backoff:

Attempt | Delay
1       | 2 seconds
2       | 4 seconds
3       | 8 seconds

After 3 failures, surface the error. Check the Retry-After header on 429 responses — it tells you how long the server wants you to wait.

Testing in production

Use test keys (pk_test_) in staging to run the full integration path without triggering real side effects. Staging should mirror production — same webhook endpoints, same error handling, same logging.

When staging is clean, swap in pk_live_ for production. The API behaves identically; only the connected service execution changes.

Next steps