Configuration & Customization

Tenant Branding

Customize the dashboard appearance for your organization. Navigate to Customization in the admin sidebar.

Branding Tab

| Setting | Description |
| --- | --- |
| Logo URL | Your company logo displayed in the sidebar (recommended: 200x50px PNG/SVG) |
| Favicon URL | Browser tab icon (16x16 or 32x32px) |
| Primary Color | Main brand color used for headers, buttons, and accents |
| Secondary Color | Secondary accent color for highlights and charts |
| Theme | Light, Dark, or Auto (follows system preference) |
| Custom CSS | Advanced: inject custom CSS rules to override any dashboard styling |
| App Name | Displayed in browser tab title |
| Contact Email | Support contact shown to your users |
| Support URL | Link to your internal support/help page |
| Disclaimer Text | Legal disclaimer shown at the bottom of pages |
| Footer Text | Custom footer text |

Branding is applied automatically across the entire dashboard when saved. Colors override the default MUI theme. Logo replaces the default sidebar logo.

Domain Tab

Configure a custom domain for your dashboard (e.g., gpu.yourcompany.com instead of app.gocloudera.com).

| Setting | Description |
| --- | --- |
| Custom Domain | Your desired domain (e.g., gpu.yourcompany.com) |
| Subdomain | Your tenant slug (auto-generated) |
| SSL Enabled | HTTPS for your custom domain (auto-provisioned) |

Setup steps:
  1. Enter your custom domain
  2. Copy the CNAME target provided
  3. Add a CNAME record in your DNS provider pointing your domain to the target
  4. Wait for DNS propagation (up to 48 hours)
  5. SSL certificate is provisioned automatically

Public Dashboards

Enable public dashboard access to share read-only views with stakeholders who don’t have accounts. When enabled, you can generate shareable links for specific dashboards.

Notification Channels

Configure where alerts and policy notifications are delivered. Navigate to Customization → Notifications.

Channel Types

Slack
  • Provide an Incoming Webhook URL from your Slack workspace
  • Alerts are formatted as rich Slack blocks with severity colors, context, and action buttons
Microsoft Teams
  • Provide an Incoming Webhook URL from your Teams channel
  • Alerts use MessageCard format with facts and theme colors
PagerDuty
  • Provide a PagerDuty Integration Key (Events API v2)
  • Critical alerts create PagerDuty incidents
  • Severity is mapped: critical → critical, high → error, medium → warning, low → info
Email
  • Provide one or more recipient email addresses
  • Alerts are sent as formatted HTML emails with tables
  • Supports digest mode: batch multiple alerts into a single email
Custom Webhook
  • Provide a URL, HTTP method, and optional headers
  • Alerts are sent as JSON payloads
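
As an illustration of the PagerDuty severity mapping listed above, the following minimal sketch builds the body of an Events API v2 trigger event. The field names (`routing_key`, `event_action`, `payload.summary`, `payload.source`, `payload.severity`) come from PagerDuty's Events API v2; the function name and the way GoCloudera assembles its events internally are assumptions made for this example.

```python
# Severity mapping as listed for the PagerDuty channel.
SEVERITY_MAP = {
    "critical": "critical",
    "high": "error",
    "medium": "warning",
    "low": "info",
}


def build_pagerduty_event(integration_key, summary, source, severity):
    """Return an Events API v2 trigger body (hypothetical helper, not the platform's code)."""
    return {
        "routing_key": integration_key,  # the Integration Key configured on the channel
        "event_action": "trigger",       # opens a new alert/incident in PagerDuty
        "payload": {
            "summary": summary,
            "source": source,
            "severity": SEVERITY_MAP[severity],
        },
    }
```

The resulting dictionary would be serialized to JSON and sent to the Events API v2 enqueue endpoint; the HTTP call is left out so the sketch stays free of network access.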

Per-Channel Filtering

Each channel can be configured with:
  • Alert types — which alert types to receive (idle_gpu, cost_threshold, high_gpu_utilization, etc.). Empty = all types.
  • Minimum priority — only receive alerts at or above this severity level (low, medium, high, critical)
  • Digest mode — instant (send immediately) or batched (group alerts into periodic digests)
  • Digest interval — when batched, how often to flush the digest (default: 5 minutes)
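
The type and priority filters above can be read as a simple predicate. The sketch below is one possible interpretation, not the platform's implementation; the function name and argument shapes are hypothetical.

```python
# Severity levels in ascending order, as listed for "Minimum priority".
PRIORITY_ORDER = ["low", "medium", "high", "critical"]


def channel_accepts(alert_type, severity, allowed_types, min_priority):
    """Return True if a channel with these filters would receive the alert."""
    # An empty list of alert types means "all types".
    type_ok = not allowed_types or alert_type in allowed_types
    # The alert must be at or above the channel's minimum priority.
    priority_ok = PRIORITY_ORDER.index(severity) >= PRIORITY_ORDER.index(min_priority)
    return type_ok and priority_ok
```

For example, a channel restricted to `idle_gpu` with a minimum priority of `high` would accept a `critical` idle-GPU alert but reject a `medium` one.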

Testing

Every channel has a Test button that sends a sample notification to verify the configuration is working.

Alert Rules

Define custom monitoring thresholds. Navigate to Alert Rules in the sidebar.

Creating a Rule

| Field | Description |
| --- | --- |
| Rule Name | Descriptive name (e.g., “Production GPU Overload”) |
| Metric | What to monitor: gpu_utilization, cpu_utilization, memory_utilization, daily_cost, hourly_cost, temperature, error_rate |
| Operator | Comparison: >, <, >=, <=, =, != |
| Threshold | Numeric value to compare against |
| Duration | How long the condition must persist (0 = immediate) |
| Severity | Alert severity: low, medium, high, critical |
| Scope | Which instances: All, By Tag, or Specific Instances |
| Notification Channels | Which channels to notify when triggered |
| Cooldown | Minutes before the rule can trigger again (prevents alert storms) |
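
To make the interaction of Operator, Threshold, Duration, and Cooldown concrete, here is a minimal sketch of how such a rule might be evaluated. It assumes metric samples arrive as `(minute, value)` pairs; that input shape, the function name, and the evaluation strategy are illustrative assumptions rather than a description of the platform's engine.

```python
import operator

# Comparison operators offered in the rule form, mapped to Python functions.
OPERATORS = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge,
    "<=": operator.le, "=": operator.eq, "!=": operator.ne,
}


def should_fire(samples, op, threshold, duration_min, now_min,
                last_fired_min=None, cooldown_min=0):
    """samples: list of (minute, value) pairs (hypothetical input shape)."""
    # Cooldown: stay silent if the rule fired too recently.
    if last_fired_min is not None and now_min - last_fired_min < cooldown_min:
        return False
    # Keep only the samples inside the duration window that ends now.
    window = [v for t, v in samples if now_min - duration_min <= t <= now_min]
    compare = OPERATORS[op]
    # Fire only if there is data and every sample in the window breaches the threshold.
    return bool(window) and all(compare(v, threshold) for v in window)
```

This sketch does not check that the samples cover the whole duration window; a stricter evaluator might also require a sample at or before the start of the window.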

Scope Filtering

  • All Instances — rule applies to every GPU instance in your tenant
  • By Tag — rule applies to instances matching a specific tag key/value pair (e.g., environment=production)
  • Specific Instances — rule applies only to selected instance IDs
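
The three scope options can be sketched as a single matching function. The dictionary layout used for instances and scopes here is hypothetical and chosen only for illustration.

```python
def in_scope(instance, scope):
    """Return True if the instance falls under the rule's scope (illustrative shapes)."""
    kind = scope["type"]
    if kind == "all":
        # All Instances: every instance in the tenant matches.
        return True
    if kind == "tag":
        # By Tag: the instance must carry the exact key/value pair.
        return instance.get("tags", {}).get(scope["key"]) == scope["value"]
    if kind == "instances":
        # Specific Instances: match on instance ID.
        return instance["id"] in scope["instance_ids"]
    raise ValueError(f"unknown scope type: {kind}")
```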

Enforcement Policies

Configure automated cost optimization rules. Navigate to Enforcement in the sidebar.

Execution Modes

| Mode | Behavior |
| --- | --- |
| Notify Only | Sends alerts when conditions are met. No automatic action. Start here. |
| Approval Required | Queues actions for admin approval before execution. |
| Auto | Executes actions immediately when conditions are met. Use with tested policies. |

Policy Templates

Pre-built templates you can clone and customize:

| Template | Description |
| --- | --- |
| Training Job Cost Guard | Stops training jobs that exceed a cost threshold |
| Dev/Test Auto-Shutdown | Stops dev/test instances at end of business hours |
| Inference Right-Size | Suggests downsizing for underutilized inference endpoints |
| Spot Instance Fallback | Automatically starts on-demand when spot instances are preempted |
| Weekend Cost Saver | Scales down non-essential instances on weekends |
| New Instance Alert | Notifies when any new GPU instance starts |

Composite Conditions

Policies support nested AND/OR logic:
AND
├── GPU utilization < 10% for 30 min
└── OR
    ├── Daily cost > $500
    └── Monthly budget utilization > 80%
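
The tree above can be evaluated with a short recursive function. The sketch below encodes the same example as nested dictionaries; the dictionary layout is an assumption for illustration, and the "for 30 min" duration on the first condition is left out to keep the example small.

```python
import operator

COMPARATORS = {"<": operator.lt, ">": operator.gt}


def evaluate(node, metrics):
    """Recursively evaluate a nested AND/OR condition tree against current metrics."""
    if "children" in node:
        results = [evaluate(child, metrics) for child in node["children"]]
        if node["op"] == "AND":
            return all(results)
        if node["op"] == "OR":
            return any(results)
        raise ValueError(f"unknown operator: {node['op']}")
    # Leaf node: compare one metric against a fixed value.
    return COMPARATORS[node["cmp"]](metrics[node["metric"]], node["value"])


# The example policy from the tree above (duration omitted).
POLICY = {
    "op": "AND",
    "children": [
        {"metric": "gpu_utilization", "cmp": "<", "value": 10},
        {"op": "OR", "children": [
            {"metric": "daily_cost", "cmp": ">", "value": 500},
            {"metric": "monthly_budget_utilization", "cmp": ">", "value": 80},
        ]},
    ],
}
```

With these inputs, an idle instance (5% GPU utilization) in a tenant that has used 85% of its monthly budget satisfies the policy even though its daily cost is below $500.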

Schedule Constraints

Policies can be restricted to specific time windows:
  • Active hours (e.g., 8am-10pm)
  • Active days (e.g., Mon-Fri)
  • Timezone-aware
  • Maintenance windows pause enforcement
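
A minimal sketch of how these constraints might combine is shown below. The function name, its parameters, and the half-open treatment of intervals are assumptions; overnight windows (for example 10pm-6am) are not handled.

```python
from datetime import datetime, time, timedelta, timezone


def policy_active(now_utc, tz, start, end, active_days, maintenance_windows=()):
    """Return True if enforcement may run at now_utc (an aware UTC datetime).

    tz: a tzinfo object for the policy's time zone.
    start, end: naive local times bounding the active hours.
    active_days: set of weekday numbers, Monday = 0.
    maintenance_windows: iterable of (start_utc, end_utc) aware datetimes.
    """
    # A maintenance window pauses enforcement regardless of the schedule.
    for win_start, win_end in maintenance_windows:
        if win_start <= now_utc < win_end:
            return False
    # Convert to the policy's time zone before checking day and hour.
    local = now_utc.astimezone(tz)
    return local.weekday() in active_days and start <= local.time() < end
```

A fixed-offset `timezone(timedelta(hours=-5))` works for a quick check; for named zones with daylight saving rules, a `zoneinfo.ZoneInfo` object can be passed as `tz` instead.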

Budget-Aware Metrics

Policies can reference budget metrics:
  • monthly_budget_utilization — percentage of monthly budget consumed
  • burn_rate — daily spend rate
  • projected_monthly_spend — extrapolated end-of-month total
  • days_remaining_in_month — days left in billing period
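
The exact formulas behind these metrics are not given here. The sketch below shows one plausible way to derive them from month-to-date spend, assuming the billing period is the calendar month and the projection is a simple linear extrapolation of the average daily spend.

```python
import calendar
from datetime import date


def budget_metrics(spend_to_date, monthly_budget, today):
    """Derive the four budget metrics from month-to-date spend (assumed formulas)."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days_elapsed = today.day  # counts today as a full day
    burn_rate = spend_to_date / days_elapsed
    return {
        "monthly_budget_utilization": 100 * spend_to_date / monthly_budget,
        "burn_rate": burn_rate,
        "projected_monthly_spend": burn_rate * days_in_month,
        "days_remaining_in_month": days_in_month - today.day,
    }
```

For instance, $3,000 spent by June 15 against a $10,000 budget gives 30% utilization, a burn rate of $200 per day, and a projected month-end total of $6,000.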

Escalation Policies

Configure multi-level escalation for critical alerts. Navigate to Customization or manage via API. Define escalation levels with increasing urgency:

| Level | Delay | Action |
| --- | --- | --- |
| 1 | 0 min | Notify Slack #ops-alerts |
| 2 | 15 min | Notify PagerDuty on-call |
| 3 | 30 min | Notify Engineering Manager email |

If an alert is not acknowledged within the delay window, it automatically escalates to the next level. Acknowledging an alert stops the escalation chain.
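
The escalation chain can be sketched as a lookup over the example levels. This assumes each delay is measured from the time the alert was raised rather than from the previous level; the text does not say which, so treat that as an assumption.

```python
# Example escalation levels: (delay in minutes, action).
ESCALATION_LEVELS = [
    (0, "Notify Slack #ops-alerts"),
    (15, "Notify PagerDuty on-call"),
    (30, "Notify Engineering Manager email"),
]


def due_actions(minutes_since_alert, acknowledged):
    """Return the actions that should have fired by now for an alert."""
    # Acknowledging the alert stops the chain: nothing further is due.
    if acknowledged:
        return []
    return [action for delay, action in ESCALATION_LEVELS
            if minutes_since_alert >= delay]
```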

Maintenance Windows

Schedule maintenance periods that suppress alerts and enforcement. Maintenance windows are managed programmatically through the API.

| Field | Description |
| --- | --- |
| Name | Window name (e.g., “Saturday Deploy”) |
| Start/End Time | UTC timestamps |
| Suppress Alerts | Don’t send notifications during window |
| Suppress Enforcement | Don’t execute policy actions during window |
| Scope | All instances, specific instance IDs, or tag-based |
| Recurring | Optional cron expression for recurring windows |

Data Retention

Configure how long GoCloudera keeps your data. Navigate to Customization → Data & Retention.

| Setting | Default | Range |
| --- | --- | --- |
| Retention Period | 90 days | 30-365 days |

Data older than the retention period is automatically archived and deleted nightly. You can also manually trigger cleanup or export data before deletion.
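
As a small illustration of the retention rule, the sketch below computes the cutoff timestamp before which data would be eligible for cleanup. The function name is hypothetical; the default and the allowed range come from the table above.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now, retention_days=90):
    """Return the timestamp before which data falls outside the retention period."""
    # The configurable range is 30-365 days.
    if not 30 <= retention_days <= 365:
        raise ValueError("retention_days must be between 30 and 365")
    return now - timedelta(days=retention_days)
```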

Exports

Export any data as CSV or PDF:
  • GPU instances and their current state
  • Cost data with breakdowns
  • AI spend with unit economics
  • Alert history
  • GPU metrics time series

User Settings

Personal preferences for individual users. Navigate to Settings (bottom of sidebar).

Profile

  • First name, last name
  • Timezone (affects how timestamps are displayed)
  • Language preference

Notifications

  • Personal notification preferences by severity and channel
  • Daily email digest toggle

Display

  • Items per page (10, 25, 50, 100)
  • Chart animation toggle

Cloud Account Configuration

Configure cloud provider access for cross-account actions. Navigate to Cloud Accounts in the admin sidebar.

AWS Cross-Account Access

GoCloudera uses AWS STS AssumeRole for secure cross-account access:
  1. Create an IAM role in your AWS account
  2. Set the trust policy to allow GoCloudera’s platform account
  3. Attach the required permissions policies
  4. Enter the Role ARN and External ID in Cloud Accounts
The platform requests temporary credentials (15-minute expiry) each time it needs to execute an action. No long-lived credentials are stored.
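
For step 2, a trust policy that allows another account to assume the role, guarded by an External ID, typically looks like the document generated below. The account ID `111122223333` and the External ID are placeholders; use the platform account ID and External ID shown in Cloud Accounts. The helper name is hypothetical.

```python
import json


def build_trust_policy(platform_account_id, external_id):
    """Build an IAM trust policy letting the given account assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Placeholder principal: the account that will call sts:AssumeRole.
            "Principal": {"AWS": f"arn:aws:iam::{platform_account_id}:root"},
            "Action": "sts:AssumeRole",
            # Requiring the External ID guards against the confused-deputy problem.
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


# Placeholder values for illustration only.
print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
```

The printed JSON can be pasted into the role's trust relationship in the IAM console. The 15-minute credential lifetime mentioned above matches the minimum session duration that STS AssumeRole allows (900 seconds).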

Azure

Provide your Azure subscription details and service principal credentials. The platform uses DefaultAzureCredential for authentication.

GCP

Provide your project ID and service account credentials. The platform uses Application Default Credentials.

API Keys

Each tenant has an API key for agent authentication. Manage API keys in Cloud Accounts:
  • View the current API key (masked by default)
  • Rotate the API key (generates a new key, invalidates the old one)
  • Copy the key for agent configuration
API keys authenticate agent-to-backend communication. They are scoped to a single tenant and grant data-write permissions only.