# AWS RDS

> Investigate AWS RDS instance health and recent events during incidents

OpenSRE uses AWS RDS to investigate database instance health and surface recent operational events — failovers, maintenance windows, parameter changes, and backup activity — when an alert fires against a managed RDS database.

All RDS API calls are read-only and routed through the shared `aws_sdk_client` allowlist, so the integration cannot mutate your RDS resources.

## Prerequisites

* AWS credentials configured per the [AWS integration](/aws) (role ARN recommended)
* An RDS DB instance you want OpenSRE to investigate
* IAM permissions for the two RDS describe actions listed below

## Setup

### Environment variables

```bash theme={null}
RDS_DB_INSTANCE_IDENTIFIER=prod-orders-db
AWS_REGION=us-east-1
```

| Variable                     | Default     | Description                                                                                          |
| ---------------------------- | ----------- | ---------------------------------------------------------------------------------------------------- |
| `RDS_DB_INSTANCE_IDENTIFIER` | —           | Required. The DB instance identifier OpenSRE should investigate.                                     |
| `AWS_REGION`                 | `us-east-1` | AWS region the instance lives in. Used by both the integration config and per-tool param extraction. |
| `RDS_REGION`                 | `us-east-1` | Fallback used only when `AWS_REGION` is not set.                                                     |

Region resolution order (highest priority first):

1. `region` field on the source dict (when configured via the integrations store)
2. `AWS_REGION` environment variable
3. `RDS_REGION` environment variable
4. `us-east-1` (default)
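
The resolution order above can be sketched as a small helper. This is an illustrative sketch, not OpenSRE's actual implementation: the function name `resolve_rds_region` and its `source` and `env` parameters are hypothetical, and treating empty strings as unset is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_REGION = "us-east-1"


def resolve_rds_region(
    source: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the region using the documented priority order (hypothetical helper)."""
    env = os.environ if env is None else env
    source = source or {}
    candidates = (
        source.get("region"),    # 1. `region` field on the source dict
        env.get("AWS_REGION"),   # 2. AWS_REGION environment variable
        env.get("RDS_REGION"),   # 3. RDS_REGION environment variable
    )
    for value in candidates:
        if value:  # assumption: empty strings count as "not set"
            return value
    return DEFAULT_REGION        # 4. default


# The source dict wins over the environment.
print(resolve_rds_region({"region": "eu-west-1"}, {"AWS_REGION": "us-west-2"}))
# prints: eu-west-1
```

Passing `env` explicitly keeps the helper easy to test; omitting it reads from the process environment.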

## IAM permissions

The integration only needs two read-only RDS actions:

```json theme={null}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBInstances",
        "rds:DescribeEvents"
      ],
      "Resource": "*"
    }
  ]
}
```

Attach this policy to the IAM role or user already configured for the [AWS integration](/aws). If that role or user has the AWS managed `ReadOnlyAccess` policy attached, both actions are already covered.

## Tools

| Tool                    | AWS API call              | What it returns                                                                                                                                                                               |
| ----------------------- | ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `describe_rds_instance` | `rds:DescribeDBInstances` | Instance status, engine + version, instance class, Multi-AZ flag, endpoint address/port, storage type and size, availability zone, and backup window.                                         |
| `describe_rds_events`   | `rds:DescribeEvents`      | Recent events for the DB instance — failovers, maintenance, parameter group changes, and backup activity. Defaults to the last 60 minutes; bounded to 20160 minutes (14 days, the AWS limit). |

Both tools become available to the planner whenever `rds.db_instance_identifier` is present in the resolved sources.
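
To see roughly what the two API calls return outside OpenSRE, the sketch below issues the same read-only requests with boto3. It is a minimal illustration and not OpenSRE's tool code: the helper names (`build_client`, `clamp_duration`, `describe_instance`, `recent_events`) are hypothetical, and the lower bound of 1 minute in `clamp_duration` is an assumption. The boto3 calls and response keys are the standard RDS ones.

```python
from typing import Any, Dict, List

DEFAULT_DURATION_MINUTES = 60   # default lookback described in the table above
MAX_DURATION_MINUTES = 20160    # 14 days, the AWS limit


def build_client(region: str) -> Any:
    """Create an RDS client; boto3 is imported lazily so the helpers work with a stub."""
    import boto3
    return boto3.client("rds", region_name=region)


def clamp_duration(minutes: int = DEFAULT_DURATION_MINUTES) -> int:
    """Keep the lookback within 1..20160 minutes (lower bound is an assumption)."""
    return max(1, min(int(minutes), MAX_DURATION_MINUTES))


def describe_instance(client: Any, instance_id: str) -> Dict[str, Any]:
    """Summarise one DB instance via rds:DescribeDBInstances."""
    resp = client.describe_db_instances(DBInstanceIdentifier=instance_id)
    db = resp["DBInstances"][0]
    endpoint = db.get("Endpoint", {})
    return {
        "status": db.get("DBInstanceStatus"),
        "engine": db.get("Engine"),
        "engine_version": db.get("EngineVersion"),
        "instance_class": db.get("DBInstanceClass"),
        "multi_az": db.get("MultiAZ"),
        "endpoint": endpoint.get("Address"),
        "port": endpoint.get("Port"),
        "storage_type": db.get("StorageType"),
        "allocated_storage_gib": db.get("AllocatedStorage"),
        "availability_zone": db.get("AvailabilityZone"),
        "backup_window": db.get("PreferredBackupWindow"),
    }


def recent_events(
    client: Any, instance_id: str, minutes: int = DEFAULT_DURATION_MINUTES
) -> List[Dict[str, Any]]:
    """List recent events for one DB instance via rds:DescribeEvents."""
    resp = client.describe_events(
        SourceIdentifier=instance_id,
        SourceType="db-instance",  # required when SourceIdentifier is set
        Duration=clamp_duration(minutes),
    )
    return resp.get("Events", [])  # first page only; pagination is omitted here
```

With credentials configured, `describe_instance(build_client("us-east-1"), "prod-orders-db")` returns the summary fields, and `recent_events(...)` returns the event list for the same instance.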

### Use cases

* Verifying RDS instance status (`available`, `modifying`, `failed`) when an alert fires
* Detecting Multi-AZ failover events around an incident timestamp
* Tracing recent maintenance, parameter group changes, or backup activity that may correlate with the incident

## Troubleshooting

| Symptom                                       | Fix                                                                                                                                |
| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| **AccessDenied on `rds:DescribeDBInstances`** | Add the IAM policy above to the role or user used by the AWS integration.                                                          |
| **DBInstanceNotFound**                        | Confirm `RDS_DB_INSTANCE_IDENTIFIER` matches an instance in `AWS_REGION`.                                                          |
| **Tool reports the wrong region**             | Either `AWS_REGION` is set to a different region, or the source dict has a stale `region` field. Check the resolution order above. |

## Upstream correlation validation

OpenSRE also includes a deterministic smoke test for upstream correlation. It covers RDS CPU spike investigations where no trace ID is available.

The smoke test lets you validate correlation output locally, without live Datadog credentials or a full LLM investigation setup.

### Local smoke validation

Run:

```bash theme={null}
opensre tests upstream-correlation-smoke
```

Expected output includes separate sections for:

* correlated signals
* most likely causal driver(s)

Example:

```
Upstream Correlation Smoke Validation

Correlated signals:
- upstream-correlation (source=runtime, score=0.9)

Most likely causal driver(s):
- system.cpu.user{service:orders-web} (confidence=0.9)
  rationale=time_window=1.0, topology=1.0, periodicity=1.0, operator_hint=0.0
```

For machine-readable output:

```bash theme={null}
opensre tests upstream-correlation-smoke --json
```

### Live investigation validation

For live validation, configure Datadog and trigger an investigation against an RDS CPU spike alert.

The upstream correlation runtime automatically scopes RDS metrics to the alerting DB instance using the `dbinstanceidentifier` tag. This prevents metrics from other instances being aggregated into the result in environments with multiple RDS instances.
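
As an illustration of what tag scoping looks like, the sketch below builds a Datadog metric query filtered to a single instance. It is an assumption about the query shape, not the query OpenSRE actually issues: the helper name `scoped_rds_query` is hypothetical, and `aws.rds.cpuutilization` is used as an example metric from Datadog's AWS integration.

```python
def scoped_rds_query(
    metric: str, db_instance_identifier: str, aggregator: str = "avg"
) -> str:
    """Build a Datadog metric query limited to one RDS instance (hypothetical helper)."""
    # Datadog stores tag values in lowercase, so normalise before filtering.
    tag_value = db_instance_identifier.strip().lower()
    if not tag_value:
        raise ValueError("db_instance_identifier is required to scope the query")
    return f"{aggregator}:{metric}{{dbinstanceidentifier:{tag_value}}}"


print(scoped_rds_query("aws.rds.cpuutilization", "orders-rds-prod"))
# prints: avg:aws.rds.cpuutilization{dbinstanceidentifier:orders-rds-prod}
```

Without the tag filter, the same query would average CPU across every RDS instance reporting that metric, which can hide a spike on the alerting instance.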

Recommended alert fields:

```json theme={null}
{
  "service": "orders",
  "resource": "orders-rds-prod",
  "upstream_services": ["orders-web"]
}
```

Then run a live investigation:

```bash theme={null}
opensre investigate
```

When runtime evidence is available, the final report includes:

* correlated signals
* most likely causal driver(s)

This validation flow is a lightweight smoke and integration check; it does not require running synthetic benchmarks.
