OpenSRE uses the Dagster GraphQL API to investigate data-pipeline incidents, fetching recent runs and their status, the full event log and root-cause exception for a failed run, asset materialization history, and sensor or schedule tick history. It works against both Dagster OSS (`dagster dev` and self-hosted `dagster-webserver`) and Dagster+ (the SaaS).
Prerequisites
- A reachable dagster-webserver instance:
  - Dagster OSS: run `dagster dev -f jobs.py` locally or deploy `dagster-webserver` to your infra. Default port `3000`.
  - Dagster+: an active deployment, e.g. `https://<org>.dagster.cloud/<deployment>` or `https://<org>.<region>.dagster.cloud/<deployment>`.
- Network access from the OpenSRE environment to the webserver
- For Dagster+: a User Token generated under Organization Settings → Tokens → User Tokens (not an Agent Token; Agent Tokens authenticate Hybrid agents and are rejected by the GraphQL endpoint)
Setup
Option 1: Onboarding wizard
The wizard prompts for:
- Dagster webserver URL: `http://localhost:3000` for OSS local dev, or `https://<org>.dagster.cloud/<deployment>` for Dagster+ (the client appends `/graphql` itself, so either form is fine)
- Dagster API token: required for Dagster+; leave blank for unauthenticated OSS

The wizard runs a version probe before saving, writes `DAGSTER_ENDPOINT` to your `.env`, and persists the API token (when provided) to your system keychain.
Option 2: Legacy CLI
Option 3: Manual configuration
Add to your `.env`:
| Variable | Default | Description |
|---|---|---|
| `DAGSTER_ENDPOINT` | (none) | Required. Base URL of the dagster-webserver. The client appends `/graphql` itself, so you can paste any of `https://host/deployment`, `https://host/deployment/`, or `https://host/deployment/graphql`; all collapse to the same canonical base. |
| `DAGSTER_API_TOKEN` | (empty) | Required for Dagster+ deployments. Leave empty for unauthenticated local OSS Dagster. Sent as the `Dagster-Cloud-Api-Token` header. |
Credentials can also live in the persistent store at `~/.opensre/integrations.json`, which is kept with `0o600` permissions.
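As an illustration of what owner-only (`0o600`) storage means in practice, here is a minimal Python sketch that writes a JSON file readable and writable only by its owner. It is not OpenSRE's own code: the helper name `write_private_json` and the JSON keys are hypothetical, the real schema of `integrations.json` is not shown on this page, and the mode check assumes a POSIX filesystem.

```python
import json
import os
import stat
import tempfile


def write_private_json(path: str, data: dict) -> None:
    """Write `data` as JSON to `path`, restricting access to the owner (0o600)."""
    # Create the file with 0o600 from the start so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    # Re-apply the mode in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


# Demo in a throwaway directory; the keys below are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "integrations.json")
    write_private_json(demo_path, {"dagster": {"endpoint": "http://localhost:3000"}})
    mode = stat.S_IMODE(os.stat(demo_path).st_mode)
    with open(demo_path, encoding="utf-8") as fh:
        stored = json.load(fh)
```

Creating the file with the restrictive mode up front, rather than tightening it afterwards, avoids a window in which a token could be read by other local users.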
Where to find your Dagster+ token and endpoint
Endpoint: look at the URL in your browser when logged into Dagster+. It is the part up through the deployment name, e.g. `https://acme.dagster.cloud/prod` if the address bar shows `https://acme.dagster.cloud/prod/runs`. EU accounts use a regional subdomain such as `https://acme.eu.dagster.cloud/prod`. A trailing `/graphql` is accepted and stripped automatically.
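The endpoint normalization described above can be sketched as follows. This is a minimal illustration of the stated behavior (trailing slash and trailing `/graphql` are stripped, then `/graphql` is appended once), not the client's actual implementation; the function names `normalize_endpoint` and `graphql_url` are hypothetical.

```python
def normalize_endpoint(raw: str) -> str:
    """Collapse user-supplied URL variants to one canonical base URL."""
    base = raw.strip().rstrip("/")
    # Accept a pasted GraphQL URL by dropping the trailing /graphql segment.
    if base.endswith("/graphql"):
        base = base[: -len("/graphql")]
    return base.rstrip("/")


def graphql_url(raw: str) -> str:
    """Return the URL the client would POST to: canonical base plus /graphql."""
    return normalize_endpoint(raw) + "/graphql"


variants = [
    "https://host/deployment",
    "https://host/deployment/",
    "https://host/deployment/graphql",
]
# All three variants collapse to a single canonical base.
canonical = {normalize_endpoint(v) for v in variants}
```

Normalizing once at configuration time means every later request can append `/graphql` unconditionally without risking a doubled path segment.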
API token:
- Click the user menu (your icon) → Organization Settings
- Open the Tokens tab
- Click + Create user token and give it a name like `opensre-integration`
- Copy the token immediately (Dagster+ shows it once and never again)
Token type matters. Use a User Token, not an Agent Token. Agent Tokens authenticate Hybrid agents talking to the Agents API and are rejected (HTTP 401) by the GraphQL endpoint.
Investigation tools
When OpenSRE investigates a Dagster-related alert, five diagnostic tools are available:
- List runs: recent pipeline/job runs with status, job name, timestamps, and pre-computed duration; filterable by status and job name
- Get run logs: event log for a specific run with `ExecutionStepFailureEvent` and `RunFailureEvent` entries; surfaces user-code exceptions from `error.cause` (e.g. the `ValueError` underlying Dagster's `DagsterExecutionStepExecutionError` wrapper) and pre-counts multi-step failures
- List assets with materialization: Dagster assets with their latest materialization timestamp and run id; useful for spotting stale or never-materialized assets
- List sensor ticks: recent tick history for a sensor (identified by the full `SensorSelector` triplet: repository location, repository, sensor name)
- List schedule ticks: recent tick history for a schedule (identified by the full `ScheduleSelector` triplet: repository location, repository, schedule name)
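Two of the behaviors listed above, unwrapping `error.cause` to reach the user-code exception and pre-computing run duration, can be sketched in a few lines of Python. This is an illustrative sketch, not OpenSRE's implementation: the helper names are hypothetical, and the dictionary shapes (`className`, `message`, `cause`, `startTime`, `endTime`) are assumptions about how the GraphQL response might be represented after JSON decoding.

```python
from typing import Optional


def root_cause(error: Optional[dict]) -> Optional[dict]:
    """Follow the `cause` chain and return the innermost error, if any."""
    current = error
    while current and current.get("cause"):
        current = current["cause"]
    return current


def run_duration_seconds(run: dict) -> Optional[float]:
    """Compute end minus start; None while the run has not started or finished."""
    start, end = run.get("startTime"), run.get("endTime")
    if start is None or end is None:
        return None
    return end - start


# Hypothetical step-failure payload: a Dagster wrapper around a user ValueError.
step_failure = {
    "className": "DagsterExecutionStepExecutionError",
    "message": "Error occurred while executing op",
    "cause": {"className": "ValueError", "message": "bad row", "cause": None},
}
innermost = root_cause(step_failure)
```

Reporting the innermost error keeps the investigation focused on the exception raised by user code rather than on the framework wrapper around it.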
Verify
Verification runs a `query { version }` probe against the configured endpoint and reports the running Dagster version on success.
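For readers who want to reproduce the probe by hand, the sketch below builds the equivalent HTTP request with Python's standard library. It is a hedged illustration rather than OpenSRE's code: the helper names `build_version_probe` and `parse_version` are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_version_probe(endpoint: str, api_token: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying `query { version }`."""
    url = endpoint.rstrip("/") + "/graphql"
    headers = {"Content-Type": "application/json"}
    if api_token:
        # Dagster+ only; unauthenticated OSS needs no token header.
        headers["Dagster-Cloud-Api-Token"] = api_token
    body = json.dumps({"query": "query { version }"}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_version(payload: dict) -> str:
    """Extract the version string from a decoded GraphQL response."""
    return payload["data"]["version"]


probe = build_version_probe("http://localhost:3000")
# To actually send it (requires a reachable webserver):
#   with urllib.request.urlopen(probe, timeout=10) as resp:
#       print(parse_version(json.load(resp)))
```

A non-JSON or HTML response at this step usually points to one of the URL or token problems covered under Troubleshooting.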
Troubleshooting
| Symptom | Fix |
|---|---|
| HTTP 401 with HTML body | The Dagster+ edge proxy rejected the request. Most likely causes: (1) the token is an Agent Token, not a User Token; (2) the user owning the token lacks a role on the target deployment; (3) the token was revoked or regenerated. Verify under Organization Settings → Tokens → User Tokens and confirm the user has access to the deployment in the URL. |
| Invalid JSON in response: Expecting value | The endpoint was reached but did not respond with JSON. Usually means the URL is wrong (e.g. you pasted a path that hits the Dagster+ UI instead of the GraphQL endpoint). The client appends /graphql automatically; paste only the base URL through the deployment name. |
| Request to Dagster failed: Connection refused | dagster-webserver is not running at the configured endpoint. Start it with dagster dev -f jobs.py for local OSS, or check the Dagster+ deployment status. |
| `runsOrError.__typename == InvalidPipelineRunsFilterError` | The status filter passed an unrecognized `RunStatus` value. Valid values: `QUEUED`, `NOT_STARTED`, `MANAGED`, `STARTING`, `STARTED`, `SUCCESS`, `FAILURE`, `CANCELING`, `CANCELED`. |
| `logsForRun` returns `RunNotFoundError` | The run id does not exist on this deployment. Confirm the run id and the deployment slug in the endpoint match. |
| Sensor query returns `SensorNotFoundError` | The `SensorSelector` triplet (`repository_location_name`, `repository_name`, `sensor_name`) did not match a sensor in the deployment. List sensors in the Dagster UI to confirm the exact names. |
| Schedule query returns `ScheduleNotFoundError` | The `ScheduleSelector` triplet (`repository_location_name`, `repository_name`, `schedule_name`) did not match a schedule in the deployment. List schedules in the Dagster UI to confirm the exact names. |
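The `InvalidPipelineRunsFilterError` case can be caught before a request is ever sent by checking status values locally. The following is a minimal sketch under that idea, using the valid values listed in the table; the helper `validate_statuses` is hypothetical and not part of OpenSRE or Dagster.

```python
# RunStatus values accepted by the status filter, as listed in the table above.
VALID_RUN_STATUSES = frozenset({
    "QUEUED", "NOT_STARTED", "MANAGED", "STARTING", "STARTED",
    "SUCCESS", "FAILURE", "CANCELING", "CANCELED",
})


def validate_statuses(statuses: list[str]) -> list[str]:
    """Upper-case and trim each status, raising ValueError on unknown values."""
    normalized = [s.strip().upper() for s in statuses]
    unknown = sorted(set(normalized) - VALID_RUN_STATUSES)
    if unknown:
        raise ValueError(
            f"Unrecognized RunStatus value(s): {', '.join(unknown)}. "
            f"Valid values: {', '.join(sorted(VALID_RUN_STATUSES))}"
        )
    return normalized


ok = validate_statuses(["failure", " Canceled "])

try:
    validate_statuses(["CANCELLED"])  # common misspelling with a double L
    rejected = False
except ValueError:
    rejected = True
```

Note the single-L spelling of `CANCELED`; the double-L variant is a plausible way to trigger the filter error.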
Security best practices
- Use a dedicated User Token scoped to a service-style user account when possible. Dagster+ does not have first-class service accounts; the community pattern is a separate user whose token you use.
- Keep tokens out of source control: use `.env` (gitignored) or the persistent store at `~/.opensre/integrations.json`.
- The GraphQL queries OpenSRE issues are read-only: list runs, fetch event logs, list assets, fetch sensor and schedule ticks. No mutations are sent.
- Rotate tokens periodically. Tokens can be revoked from the same Organization Settings → Tokens page.
- For local OSS Dagster without auth, restrict the webserver to localhost or your private network. Do not expose the default port of `dagster dev` to the internet.