Configuration & Customization
Tenant Branding
Customize the dashboard appearance for your organization. Navigate to Customization in the admin sidebar.
Branding Tab
| Setting | Description |
|---|---|
| Logo URL | Your company logo displayed in the sidebar (recommended: 200x50px PNG/SVG) |
| Favicon URL | Browser tab icon (16x16 or 32x32px) |
| Primary Color | Main brand color used for headers, buttons, and accents |
| Secondary Color | Secondary accent color for highlights and charts |
| Theme | Light, Dark, or Auto (follows system preference) |
| Custom CSS | Advanced: inject custom CSS rules to override any dashboard styling |
| App Name | Displayed in browser tab title |
| Contact Email | Support contact shown to your users |
| Support URL | Link to your internal support/help page |
| Disclaimer Text | Legal disclaimer shown at the bottom of pages |
| Footer Text | Custom footer text |
Domain Tab
Configure a custom domain for your dashboard (e.g., gpu.yourcompany.com instead of app.gocloudera.com).
| Setting | Description |
|---|---|
| Custom Domain | Your desired domain (e.g., gpu.yourcompany.com) |
| Subdomain | Your tenant slug (auto-generated) |
| SSL Enabled | HTTPS for your custom domain (auto-provisioned) |
- Enter your custom domain
- Copy the CNAME target provided
- Add a CNAME record in your DNS provider pointing your domain to the target
- Wait for DNS propagation (up to 48 hours)
- SSL certificate is provisioned automatically
Public Dashboards
Enable public dashboard access to share read-only views with stakeholders who don’t have accounts. When enabled, you can generate shareable links for specific dashboards.
Notification Channels
Configure where alerts and policy notifications are delivered. Navigate to Customization → Notifications.
Channel Types
Slack
- Provide an Incoming Webhook URL from your Slack workspace
- Alerts are formatted as rich Slack blocks with severity colors, context, and action buttons
Microsoft Teams
- Provide an Incoming Webhook URL from your Teams channel
- Alerts use MessageCard format with facts and theme colors
PagerDuty
- Provide a PagerDuty Integration Key (Events API v2)
- Critical alerts create PagerDuty incidents
- Severity is mapped: critical → critical, high → error, medium → warning, low → info
Email
- Provide one or more recipient email addresses
- Alerts are sent as formatted HTML emails with tables
- Supports digest mode: batch multiple alerts into a single email
Webhook
- Provide a URL, HTTP method, and optional headers
- Alerts are sent as JSON payloads
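The PagerDuty severity mapping above can be sketched as a small translation step. This is only an illustration, not the platform's implementation: the alert fields (title, instance_id, severity) and the function name are hypothetical, while routing_key, event_action, and the payload fields follow the PagerDuty Events API v2 event format.

```python
import json

# Severity mapping as listed above: platform severity -> PagerDuty severity.
SEVERITY_MAP = {
    "critical": "critical",
    "high": "error",
    "medium": "warning",
    "low": "info",
}


def build_pagerduty_event(integration_key: str, alert: dict) -> dict:
    """Build an Events API v2 trigger event from a hypothetical alert dict."""
    return {
        "routing_key": integration_key,  # the PagerDuty Integration Key
        "event_action": "trigger",
        "payload": {
            "summary": alert["title"],
            "source": alert["instance_id"],
            "severity": SEVERITY_MAP[alert["severity"]],
        },
    }


# Placeholder key and made-up alert values, for illustration only.
event = build_pagerduty_event(
    "YOUR_INTEGRATION_KEY",
    {"title": "GPU idle for 2 hours", "instance_id": "gpu-node-1", "severity": "high"},
)
print(json.dumps(event, indent=2))
```

An unknown severity raises a KeyError here, which makes a misconfigured alert visible instead of silently defaulting.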
Per-Channel Filtering
Each channel can be configured with:
- Alert types: which alert types to receive (idle_gpu, cost_threshold, high_gpu_utilization, etc.). Empty = all types.
- Minimum priority: only receive alerts at or above this severity level (low, medium, high, critical)
- Digest mode: instant (send immediately) or batched (group alerts into periodic digests)
- Digest interval: when batched, how often to flush the digest (default: 5 minutes)
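As a rough sketch of how the type and priority filters could combine, the function below decides whether a channel should receive an alert. The dictionary keys (alert_types, min_priority, type, severity) are hypothetical names chosen for this example, and the default minimum of low is an assumption; only the "empty = all types" rule and the severity order come from the description above.

```python
# Severity levels from lowest to highest, as listed above.
PRIORITY_ORDER = ["low", "medium", "high", "critical"]


def channel_accepts(channel: dict, alert: dict) -> bool:
    """Return True if the alert passes the channel's type and priority filters."""
    allowed_types = channel.get("alert_types") or []
    # An empty list means the channel receives all alert types.
    if allowed_types and alert["type"] not in allowed_types:
        return False
    # Assumption: a channel without a minimum priority accepts every severity.
    min_rank = PRIORITY_ORDER.index(channel.get("min_priority", "low"))
    return PRIORITY_ORDER.index(alert["severity"]) >= min_rank


# Hypothetical channel: idle_gpu alerts only, at high severity or above.
ops_channel = {"alert_types": ["idle_gpu"], "min_priority": "high"}
print(channel_accepts(ops_channel, {"type": "idle_gpu", "severity": "critical"}))
print(channel_accepts(ops_channel, {"type": "idle_gpu", "severity": "medium"}))
```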
Testing
Every channel has a Test button that sends a sample notification to verify the configuration is working.
Alert Rules
Define custom monitoring thresholds. Navigate to Alert Rules in the sidebar.
Creating a Rule
| Field | Description |
|---|---|
| Rule Name | Descriptive name (e.g., “Production GPU Overload”) |
| Metric | What to monitor: gpu_utilization, cpu_utilization, memory_utilization, daily_cost, hourly_cost, temperature, error_rate |
| Operator | Comparison: >, <, >=, <=, =, != |
| Threshold | Numeric value to compare against |
| Duration | How long the condition must persist (0 = immediate) |
| Severity | Alert severity: low, medium, high, critical |
| Scope | Which instances: All, By Tag, or Specific Instances |
| Notification Channels | Which channels to notify when triggered |
| Cooldown | Minutes before the rule can trigger again (prevents alert storms) |
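To illustrate how Operator, Threshold, Duration, and Cooldown might interact, here is a minimal sketch. It assumes metric samples arrive as (minute, value) pairs in ascending time order and that Duration means the condition held continuously up to the latest sample; the Rule class and should_fire function are hypothetical, not the platform's evaluator.

```python
import operator
from dataclasses import dataclass

# Comparison operators from the table above.
OPERATORS = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge,
    "<=": operator.le, "=": operator.eq, "!=": operator.ne,
}


@dataclass
class Rule:
    metric: str
    op: str
    threshold: float
    duration_min: int = 0   # 0 = fire on the first breaching sample
    cooldown_min: int = 0


def should_fire(rule: Rule, samples: list, last_fired_min=None) -> bool:
    """Decide whether the rule fires at the most recent sample."""
    if not samples:
        return False
    compare = OPERATORS[rule.op]
    now = samples[-1][0]
    # Cooldown: stay quiet if the rule fired too recently.
    if last_fired_min is not None and now - last_fired_min < rule.cooldown_min:
        return False
    # Walk backwards to find where the current unbroken breach began.
    run_start = None
    for minute, value in reversed(samples):
        if not compare(value, rule.threshold):
            break
        run_start = minute
    return run_start is not None and now - run_start >= rule.duration_min


rule = Rule("gpu_utilization", ">", 90, duration_min=10, cooldown_min=30)
print(should_fire(rule, [(0, 95), (5, 97), (10, 99)]))
```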
Scope Filtering
- All Instances: rule applies to every GPU instance in your tenant
- By Tag: rule applies to instances matching a specific tag key/value pair (e.g., environment=production)
- Specific Instances: rule applies only to selected instance IDs
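The three scope options can be pictured as a simple membership test. The scope and instance dictionaries below use hypothetical field names (type, key, value, instance_ids, tags) chosen for illustration only.

```python
def in_scope(scope: dict, instance: dict) -> bool:
    """Return True if the instance falls inside the rule's scope."""
    kind = scope["type"]
    if kind == "all":
        return True
    if kind == "tag":
        # Match a single tag key/value pair, e.g. environment=production.
        return instance.get("tags", {}).get(scope["key"]) == scope["value"]
    if kind == "instances":
        return instance["id"] in scope["instance_ids"]
    raise ValueError(f"unknown scope type: {kind}")


prod_scope = {"type": "tag", "key": "environment", "value": "production"}
node = {"id": "gpu-node-1", "tags": {"environment": "production"}}
print(in_scope(prod_scope, node))
```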
Enforcement Policies
Configure automated cost optimization rules. Navigate to Enforcement in the sidebar.
Execution Modes
| Mode | Behavior |
|---|---|
| Notify Only | Sends alerts when conditions are met. No automatic action. Start here. |
| Approval Required | Queues actions for admin approval before execution. |
| Auto | Executes actions immediately when conditions are met. Use with tested policies. |
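The difference between the three modes comes down to what happens once a policy's conditions are met. The sketch below models that branching with in-memory lists; the enum values, function name, and return strings are illustrative assumptions rather than the platform's API.

```python
from enum import Enum


class Mode(Enum):
    NOTIFY_ONLY = "notify_only"
    APPROVAL_REQUIRED = "approval_required"
    AUTO = "auto"


def handle_match(mode: Mode, action: str, approval_queue: list, executed: list) -> str:
    """Apply the execution mode to an action whose policy conditions were met."""
    if mode is Mode.NOTIFY_ONLY:
        # Alert only; nothing is queued or executed.
        return f"alert: {action} recommended"
    if mode is Mode.APPROVAL_REQUIRED:
        # Hold the action until an admin approves it.
        approval_queue.append(action)
        return f"queued: {action} awaiting admin approval"
    # AUTO: act immediately.
    executed.append(action)
    return f"executed: {action}"


queue, done = [], []
print(handle_match(Mode.NOTIFY_ONLY, "stop gpu-node-1", queue, done))
print(handle_match(Mode.APPROVAL_REQUIRED, "stop gpu-node-1", queue, done))
print(handle_match(Mode.AUTO, "stop gpu-node-2", queue, done))
```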
Policy Templates
Pre-built templates you can clone and customize:
| Template | Description |
|---|---|
| Training Job Cost Guard | Stops training jobs that exceed a cost threshold |
| Dev/Test Auto-Shutdown | Stops dev/test instances at end of business hours |
| Inference Right-Size | Suggests downsizing for underutilized inference endpoints |
| Spot Instance Fallback | Automatically starts on-demand when spot instances are preempted |
| Weekend Cost Saver | Scales down non-essential instances on weekends |
| New Instance Alert | Notifies when any new GPU instance starts |
Composite Conditions
Policies support nested AND/OR logic.
Schedule Constraints
Policies can be restricted to specific time windows:
- Active hours (e.g., 8am-10pm)
- Active days (e.g., Mon-Fri)
- Timezone-aware
- Maintenance windows pause enforcement
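A timezone-aware schedule check along these lines could decide whether a policy is currently active. This is a sketch under stated assumptions: the function name and parameters are hypothetical, weekdays use Python's convention (Monday = 0), and windows that cross midnight are not handled. A fixed UTC offset keeps the example self-contained; a zoneinfo.ZoneInfo instance could be passed as tz instead.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def is_policy_active(now_utc: datetime, tz: tzinfo, start: time, end: time,
                     active_days: set) -> bool:
    """Check active days and hours in the policy's own timezone."""
    local = now_utc.astimezone(tz)
    # Assumption: the window is [start, end) and does not cross midnight.
    return local.weekday() in active_days and start <= local.time() < end


WEEKDAYS = {0, 1, 2, 3, 4}              # Mon-Fri
UTC_MINUS_5 = timezone(timedelta(hours=-5))

# 14:00 UTC on Monday 2024-01-08 is 09:00 local time at UTC-5.
now = datetime(2024, 1, 8, 14, 0, tzinfo=timezone.utc)
print(is_policy_active(now, UTC_MINUS_5, time(8, 0), time(22, 0), WEEKDAYS))
```

Converting to local time before checking the weekday matters: 04:00 UTC on a Monday is still Sunday evening at UTC-5.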
Budget-Aware Metrics
Policies can reference budget metrics:
- monthly_budget_utilization: percentage of monthly budget consumed
- burn_rate: daily spend rate
- projected_monthly_spend: extrapolated end-of-month total
- days_remaining_in_month: days left in billing period
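Combining the nested AND/OR logic described under Composite Conditions with these budget metrics, a policy condition might be evaluated as a small tree. The "all"/"any" keys, the leaf shape (metric, op, value), and the sample numbers are assumptions made for this sketch; the platform's actual condition schema is not shown in this guide.

```python
import operator

OPS = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge,
    "<=": operator.le, "=": operator.eq, "!=": operator.ne,
}


def evaluate(condition: dict, metrics: dict) -> bool:
    """Recursively evaluate a nested AND ("all") / OR ("any") condition tree."""
    if "all" in condition:
        return all(evaluate(child, metrics) for child in condition["all"])
    if "any" in condition:
        return any(evaluate(child, metrics) for child in condition["any"])
    # Leaf node: compare one metric against a value.
    return OPS[condition["op"]](metrics[condition["metric"]], condition["value"])


# Budget at least 80% used AND (projected overspend OR month nearly over).
condition = {
    "all": [
        {"metric": "monthly_budget_utilization", "op": ">=", "value": 80},
        {"any": [
            {"metric": "projected_monthly_spend", "op": ">", "value": 50000},
            {"metric": "days_remaining_in_month", "op": "<=", "value": 5},
        ]},
    ]
}
metrics = {
    "monthly_budget_utilization": 85,
    "burn_rate": 1200,
    "projected_monthly_spend": 42000,
    "days_remaining_in_month": 4,
}
print(evaluate(condition, metrics))  # True
```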
Escalation Policies
Configure multi-level escalation for critical alerts. Navigate to Customization or manage via API. Define escalation levels with increasing urgency:
| Level | Delay | Action |
|---|---|---|
| 1 | 0 min | Notify Slack #ops-alerts |
| 2 | 15 min | Notify PagerDuty on-call |
| 3 | 30 min | Notify Engineering Manager email |
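One way to read the example table is as a list of delays measured from the moment the alert fires. The sketch below makes that assumption explicit, along with the assumption that escalation continues only while the alert stays open; the function name is hypothetical.

```python
# (level, delay in minutes, action), taken from the example table above.
ESCALATION_LEVELS = [
    (1, 0, "Notify Slack #ops-alerts"),
    (2, 15, "Notify PagerDuty on-call"),
    (3, 30, "Notify Engineering Manager email"),
]


def due_actions(minutes_open: float) -> list:
    """Return every action whose delay has elapsed for a still-open alert."""
    return [action for _level, delay, action in ESCALATION_LEVELS
            if minutes_open >= delay]


print(due_actions(20))
```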
Maintenance Windows
Schedule maintenance periods that suppress alerts and enforcement. Maintenance windows are managed through the API.
| Field | Description |
|---|---|
| Name | Window name (e.g., “Saturday Deploy”) |
| Start/End Time | UTC timestamps |
| Suppress Alerts | Don’t send notifications during window |
| Suppress Enforcement | Don’t execute policy actions during window |
| Scope | All instances, specific instance IDs, or tag-based |
| Recurring | Optional cron expression for recurring windows |
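The suppression flags can be thought of as applying only while a window is open and the instance is in scope. The following sketch covers one-off windows with an instance-ID scope; tag-based scope and recurring cron expressions are left out. The dictionary keys and the exclusive end time are assumptions for illustration.

```python
from datetime import datetime, timezone


def suppression(window: dict, now: datetime, instance_id: str) -> tuple:
    """Return (alerts_suppressed, enforcement_suppressed) for one instance."""
    # Assumption: start is inclusive and end is exclusive, both in UTC.
    in_window = window["start"] <= now < window["end"]
    # instance_ids of None means the window applies to all instances.
    ids = window["instance_ids"]
    in_scope = ids is None or instance_id in ids
    active = in_window and in_scope
    return (active and window["suppress_alerts"],
            active and window["suppress_enforcement"])


window = {
    "name": "Saturday Deploy",
    "start": datetime(2024, 1, 6, 2, 0, tzinfo=timezone.utc),
    "end": datetime(2024, 1, 6, 4, 0, tzinfo=timezone.utc),
    "suppress_alerts": True,
    "suppress_enforcement": False,
    "instance_ids": None,
}
print(suppression(window, datetime(2024, 1, 6, 3, 0, tzinfo=timezone.utc), "gpu-node-1"))
```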
Data Retention
Configure how long GoCloudera keeps your data. Navigate to Customization → Data & Retention.
| Setting | Default | Range |
|---|---|---|
| Retention Period | 90 days | 30-365 days |
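In practical terms, a retention period translates into a cutoff timestamp. A minimal sketch, assuming data older than the cutoff becomes eligible for deletion (the function name is hypothetical; the default and range come from the table above):

```python
from datetime import datetime, timedelta, timezone

MIN_DAYS, MAX_DAYS, DEFAULT_DAYS = 30, 365, 90


def retention_cutoff(now: datetime, retention_days: int = DEFAULT_DAYS) -> datetime:
    """Return the timestamp before which data falls outside the retention period."""
    if not MIN_DAYS <= retention_days <= MAX_DAYS:
        raise ValueError(f"retention must be between {MIN_DAYS} and {MAX_DAYS} days")
    return now - timedelta(days=retention_days)


now = datetime(2024, 4, 30, tzinfo=timezone.utc)
print(retention_cutoff(now).date())
```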
Exports
Export any data as CSV or PDF:
- GPU instances and their current state
- Cost data with breakdowns
- AI spend with unit economics
- Alert history
- GPU metrics time series
User Settings
Personal preferences for individual users. Navigate to Settings (bottom of sidebar).
Profile
- First name, last name
- Timezone (affects how timestamps are displayed)
- Language preference
Notifications
- Personal notification preferences by severity and channel
- Daily email digest toggle
Display
- Items per page (10, 25, 50, 100)
- Chart animation toggle
Cloud Account Configuration
Configure cloud provider access for cross-account actions. Navigate to Cloud Accounts in the admin sidebar.
AWS Cross-Account Access
GoCloudera uses AWS STS AssumeRole for secure cross-account access:
- Create an IAM role in your AWS account
- Set the trust policy to allow GoCloudera’s platform account
- Attach the required permissions policies
- Enter the Role ARN and External ID in Cloud Accounts
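The trust policy in the second step typically follows the standard IAM pattern for AssumeRole with an external ID. The sketch below generates such a document; the account ID 111122223333 and the external ID are placeholders, not GoCloudera's real values, so use the ones shown in Cloud Accounts.

```python
import json


def build_trust_policy(platform_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy that lets another account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Placeholder principal: the platform's AWS account.
                "Principal": {"AWS": f"arn:aws:iam::{platform_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID must match on every AssumeRole call.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the role's trust relationship editor once the placeholders are replaced.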
Azure
Provide your Azure subscription details and service principal credentials. The platform uses DefaultAzureCredential for authentication.
GCP
Provide your project ID and service account credentials. The platform uses Application Default Credentials.
API Keys
Each tenant has an API key for agent authentication. Manage API keys in Cloud Accounts:
- View the current API key (masked by default)
- Rotate the API key (generates a new key, invalidates the old one)
- Copy the key for agent configuration