
Architecture & Key Concepts

System Overview

GoCloudera is a three-component system: a lightweight agent deployed in your infrastructure, a multi-tenant backend API, and a dashboard frontend.

Key Concepts

Multi-Tenancy

Every piece of data in GoCloudera is scoped to a tenant. The tenantIsolation middleware automatically filters all database queries by tenant_id. Tenants cannot see each other’s data. Each tenant gets their own API key, notification channels, enforcement policies, alert rules, and customization settings.
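As a rough illustration of what tenant-scoped query filtering looks like, here is a minimal sketch. The function name, parameters, and SQL shape are invented for illustration; they are not GoCloudera's actual schema or the tenantIsolation middleware's API.

```python
def scope_to_tenant(base_query: str, params: list, tenant_id: str):
    """Append a tenant_id filter so a query can never cross tenants.

    Hypothetical helper; the real middleware operates on ORM queries,
    but the principle is the same: every query gains a tenant_id clause.
    """
    clause = " AND " if "WHERE" in base_query.upper() else " WHERE "
    return base_query + clause + "tenant_id = %s", params + [tenant_id]

query, params = scope_to_tenant(
    "SELECT * FROM alerts WHERE status = %s", ["open"], "tenant-42"
)
```

Because the filter is appended centrally rather than in each handler, forgetting it in one code path is not possible.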

The Three-Layer Optimization Engine

GoCloudera uses three complementary engines, orchestrated every 5 minutes:

Layer 1 — PolicyEngine (Proactive)

Evaluates enforcement policies against current metrics. Supports composite AND/OR conditions with nesting, range operators, schedule-aware enforcement, maintenance window suppression, and cooldown periods. When conditions are met, it executes actions (stop, scale down, resize) or queues them for approval.

Layer 2 — AnomalyDetectionService (Contextual)

Runs three independent statistical methods on every metric time series: Z-score analysis (> 3 standard deviations), the IQR method (outside 1.5x the interquartile range), and rate-of-change detection (delta exceeds 3 sigma of historical deltas). An anomaly is confirmed when 2 of 3 methods agree (composite confidence >= 0.5). Day-of-week seasonal baselines avoid false positives on predictable patterns.

Layer 3 — RemediationEngine (Reactive)

Responds to active alerts and incidents with automated remediation strategies: cost overrun mitigation, performance degradation response, security incident isolation, resource exhaustion cleanup, and network connectivity failover.
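The 2-of-3 voting in Layer 2 can be sketched in a few lines. The thresholds (3 sigma, 1.5x IQR, 3 sigma of deltas) follow the text above; the day-of-week seasonal baselining and the exact per-method scoring are omitted, so treat this as an illustration of the voting logic, not the production implementation.

```python
import statistics

def detect_anomaly(history: list, current: float):
    """Vote three statistical methods and confirm when >= 2 of 3 agree."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    # Method 1: Z-score — more than 3 standard deviations from the mean.
    z_hit = stdev > 0 and abs(current - mean) / stdev > 3

    # Method 2: IQR — outside 1.5x the interquartile range.
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    iqr_hit = current < q1 - 1.5 * iqr or current > q3 + 1.5 * iqr

    # Method 3: rate of change — delta exceeds 3 sigma of historical deltas.
    deltas = [b - a for a, b in zip(history, history[1:])]
    d_stdev = statistics.pstdev(deltas)
    roc_hit = d_stdev > 0 and abs(current - history[-1]) > 3 * d_stdev

    confidence = sum([z_hit, iqr_hit, roc_hit]) / 3
    return confidence >= 0.5, round(confidence, 2)
```

With a flat baseline around 10, a reading of 50 trips all three methods (confidence 1.0), while a reading of 10 trips none, which is why single-method blips do not raise anomalies.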

Agent Communication

The agent communicates with the backend in two modes:

HTTP mode — The agent POSTs data to /api/sync every 5 minutes and polls /api/actions every 15 seconds for pending commands. Simple, works through firewalls, no persistent connection needed.

gRPC mode — Bidirectional streaming on port 50051. The agent pushes data continuously and receives commands instantly (no polling delay). Uses a bounded write queue (max 1000 messages) with exponential backoff on reconnect, and falls back to HTTP automatically if the stream drops.

You can run both simultaneously (COMM_MODE=both) for redundancy.
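Two details of the gRPC mode are easy to sketch: the bounded write queue and the reconnect backoff. The 1000-message cap comes from the text above; the drop policy (oldest first) and the backoff base/cap values are illustrative assumptions, not confirmed behavior.

```python
from collections import deque

# Bounded write buffer: with maxlen set, the oldest messages are silently
# dropped once 1000 are queued, so a dead stream can't exhaust memory.
# (Drop-oldest is an assumption; the agent's actual policy may differ.)
write_queue: deque = deque(maxlen=1000)

def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 6):
    """Exponential backoff schedule for reconnects: base * 2^n, capped.

    base/cap values here are placeholders, not the agent's real settings.
    """
    return [min(cap, base * 2 ** n) for n in range(attempts)]
```

The cap matters: without it, a long outage would push retry intervals into hours and delay recovery once the backend returns.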

Action Queue & Approval Workflow

When the PolicyEngine, the RemediationEngine, or a user triggers an action (stop, start, resize, restart, terminate), it enters the ActionQueue. Queued actions can be executed in two ways:
  1. Platform execution — Backend assumes your cloud IAM role via STS (AWS) or service credentials (Azure/GCP) and executes directly.
  2. Agent execution — Backend pushes the command to the agent via gRPC or HTTP polling; the agent executes in your VPC.
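A minimal sketch of choosing between the two execution paths might look like the following. The selection order (platform credentials preferred, agent as fallback, otherwise hold) is a guess for illustration; the text does not specify how GoCloudera actually routes between them.

```python
def choose_executor(platform_creds_configured: bool, agent_connected: bool) -> str:
    """Route a queued action to an executor (illustrative logic only)."""
    if platform_creds_configured:
        return "platform"  # backend assumes the cloud IAM role via STS
    if agent_connected:
        return "agent"     # command pushed over gRPC or picked up by HTTP polling
    return "queued"        # hold until an executor becomes available
```

Agent execution keeps all cloud credentials inside your VPC, which is why some deployments skip platform credentials entirely.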

Cost Data Sources

GoCloudera collects real cost data from cloud billing APIs, not estimates:
  • AWS — Cost Explorer API (ce:GetCostAndUsage), Spot Price History, Reserved Instance utilization, Savings Plans coverage
  • Azure — Retail Prices API (https://prices.azure.com/api/retail/prices), Consumption API for actual usage, cost-by-tag grouping via Consumption API
  • GCP — Cloud Billing Catalog API for list prices, BigQuery billing export for actual costs, label-based cost allocation via BigQuery
When billing APIs aren’t configured, the agent falls back to hourly rate estimation based on instance type pricing.
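The fallback estimation is straightforward rate-times-hours arithmetic. The rate table below is hypothetical (real fallback pricing data is not specified in the text), but it shows the shape of the calculation.

```python
# Hypothetical on-demand hourly rates keyed by instance type; the agent's
# real pricing table is not documented here.
HOURLY_RATES = {"m5.large": 0.096, "m5.xlarge": 0.192}

def estimate_daily_cost(instance_type: str, hours_running: float = 24.0) -> float:
    """Fallback estimate used when no billing API is configured."""
    rate = HOURLY_RATES.get(instance_type)
    if rate is None:
        raise KeyError(f"no rate on file for {instance_type}")
    return round(rate * hours_running, 4)
```

Estimates like this ignore discounts (spot, reserved, savings plans), which is exactly why the billing-API path is preferred when available.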

LLM Spend Tracking

The AI Spend module tracks costs from LLM providers (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI) at the token level. It records input tokens, output tokens, model name, provider, latency, and maps costs to AI workloads (training, inference, fine-tuning, embedding). Unit economics are calculated automatically: cost per 1K tokens, cost per inference request, cost per training run. Budget tracking per workload shows burn rate and projected overage.
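The cost-per-1K-tokens unit metric is the obvious ratio; the sketch below shows the arithmetic (the formula is inferred from the description, not taken from GoCloudera's source).

```python
def cost_per_1k_tokens(total_cost_usd: float, input_tokens: int, output_tokens: int) -> float:
    """Blended cost per 1,000 tokens across input and output."""
    total_tokens = input_tokens + output_tokens
    if total_tokens == 0:
        return 0.0
    return total_cost_usd / total_tokens * 1000
```

For example, $3.00 spent across 500K input and 100K output tokens works out to $0.005 per 1K tokens; tracking this per workload is what makes burn rate and projected overage meaningful.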

Alert Rules vs Enforcement Policies

These are distinct systems that serve different purposes:

Alert Rules are monitoring thresholds that notify humans: "Tell me when GPU utilization exceeds 95% for 10 minutes." They support per-metric thresholds, duration conditions, scope filtering (all instances, by tag, or specific instances), and route to notification channels.

Enforcement Policies are automated actions that respond to conditions: "When daily cost exceeds $1000, scale down the lowest-utilized 50% of instances." They support composite AND/OR logic, budget-aware metrics, schedule constraints, maintenance window suppression, and three execution modes (auto, approval, notify-only).

You can use alert rules to monitor and enforcement policies to act, or use enforcement policies in notify-only mode to do both.
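The composite AND/OR logic with nesting can be sketched as a recursive evaluator. The condition schema below ("all"/"any" keys, "between" for the range operator) is invented for illustration and is not GoCloudera's actual policy format.

```python
def eval_condition(cond: dict, metrics: dict) -> bool:
    """Recursively evaluate a nested AND/OR condition tree over metrics."""
    if "all" in cond:   # AND: every child condition must hold
        return all(eval_condition(c, metrics) for c in cond["all"])
    if "any" in cond:   # OR: at least one child condition must hold
        return any(eval_condition(c, metrics) for c in cond["any"])
    value = metrics[cond["metric"]]
    op = cond["op"]
    if op == ">":
        return value > cond["value"]
    if op == "<":
        return value < cond["value"]
    if op == "between":  # range operator
        lo, hi = cond["value"]
        return lo <= value <= hi
    raise ValueError(f"unknown operator {op!r}")

# "Daily cost over $1000 AND (CPU under 20% OR GPU between 0 and 10%)"
policy = {"all": [
    {"metric": "daily_cost", "op": ">", "value": 1000},
    {"any": [
        {"metric": "cpu_util", "op": "<", "value": 20},
        {"metric": "gpu_util", "op": "between", "value": (0, 10)},
    ]},
]}
```

Because the tree is recursive, arbitrarily deep nesting of AND/OR groups falls out for free.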

Anomaly Detection Audit Trail

Every anomaly detection run writes a row to the analysis_audit_log table with: baseline statistics used (mean, stddev, Q1, Q3, IQR), whether the baseline was day-specific, current metric value, each method’s raw score and trigger status, the composite confidence, and whether it was classified as an anomaly. This enables debugging false positives, tuning thresholds, and regulatory audit compliance.

Inference Feedback Loop

When the platform generates a recommendation (resize suggestion, anomaly alert, cost optimization), users can mark it as accepted, rejected, or modified. This feedback is stored in the inference_feedback table with the user’s reason and the outcome metrics (cost/utilization before and after). Over time, this creates a labeled dataset for training ML models to improve recommendation quality.

Data Model

Core entities and their relationships:

Infrastructure

  • Backend: Node.js (Express) on AWS App Runner
  • Database: PostgreSQL on AWS RDS
  • Cache/Events: Redis on AWS ElastiCache
  • Agent: Python, deployed via Docker/systemd/K8s in customer infrastructure
  • CI/CD: GitHub Actions → ECR → App Runner with PostgreSQL service containers for testing
  • Auth: JWT with Cognito-compatible token format, API key auth for agents
  • gRPC: Port 50051 with optional TLS