
Reducing Credential Risk Without Breaking DX

For a long time, services have relied on the traditional username/password model to connect to databases. Most developer frameworks support this pattern well, with built-in mechanisms to read credentials from environment variables, configuration files, or class-level settings. The result is a simple, smooth developer experience, but it comes with a trade-off: the real challenge is not using credentials, it’s distributing and protecting them.

In our case, we followed a common approach:

This setup worked “fine”… until we hit an error while running at low scale. It was an isolated incident:

SQLException. Message: Connect failed to authenticate: reached max connection retries

From the log alone, we didn’t have much context. To understand what happened, we correlated events from CloudTrail, application logs, and ECS task history to reconstruct the sequence of events:

```mermaid
sequenceDiagram
    participant ECS as ECS Cluster
    participant Pod as Service Pod
    participant Secrets as AWS Secrets Manager
    participant RDS as Amazon RDS

    ECS->>Pod: Service scales up due to heavy load

    activate Secrets
    Note over Secrets: Password rotation started

    loop Retry until max attempts reached
        Pod->>Secrets: Fetch credentials
        Secrets-->>Pod: Returns stale credentials

        Pod->>RDS: Attempt connection
        RDS-->>Pod: Authentication failed
    end

    Pod-->>ECS: Startup failed:<br/>Max retry attempts reached

    Note over Secrets: Password rotation ends:<br/>DB and Secrets in sync
    deactivate Secrets

    ECS->>Pod: Pod scheduled for recreation
```

After we reviewed the timeline, several key observations stood out:

Given these insights, the team decided to revisit the authentication flow. Some proposed options included:

• Extending the password rotation interval
• Retrying more aggressively on startup

Both were reasonable workarounds, but they felt like patches. Extending rotation intervals weakens our security posture, while more aggressive retries make it harder to distinguish rotation issues from real configuration or connectivity problems, especially as the number of databases and application users grows.
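To make the failure mode concrete, here is a toy simulation in Java. It is not AWS code, and the 30-second rotation window is an assumed number, not one we measured. It models what the timeline above suggests: while rotation is in progress, every fetch returns credentials the database no longer accepts, so retries only succeed once they outlast the rotation window.

```java
// Toy model, not AWS code: while rotation is in progress the secret store still
// serves a password the database no longer accepts.
class RotationRaceDemo {

    /**
     * Simulates a pod's startup loop: fetch credentials, try to connect, back off, repeat.
     *
     * @param rotationEndsAtMs when (ms after pod start) the secret and the DB are back in sync
     * @param maxAttempts      connection attempts before giving up
     * @param backoffMs        fixed delay between attempts
     * @return the 1-based attempt that authenticated, or -1 if retries ran out
     */
    static int startPod(long rotationEndsAtMs, int maxAttempts, long backoffMs) {
        long now = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Credentials fetched before rotation finishes are stale; only later attempts succeed.
            if (now >= rotationEndsAtMs) {
                return attempt;
            }
            now += backoffMs; // simulated wait before the next fetch + connect
        }
        return -1; // "reached max connection retries"
    }

    public static void main(String[] args) {
        // Assumed 30 s rotation window; 5 attempts 2 s apart only span 8 s.
        System.out.println(startPod(30_000, 5, 2_000));  // -1: startup fails
        // More attempts succeed only once they outlast the window.
        System.out.println(startPod(30_000, 20, 2_000)); // 16
    }
}
```

Tuning `maxAttempts` or `backoffMs` to cover the window is exactly the "more aggressive retries" option. That is why it felt like a patch: the right numbers depend on how long rotation happens to take.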

Credentials are simple, but never free

Traditional database authentication has a few well-known issues:

Even with managed services like Secrets Manager, credentials remain static artifacts we must fetch, cache, and protect.

Complexity in our systems is inevitable, but depending on the context (the team, the architecture, the tools, etc.), we can move it around to harden specific parts of the system.

The question I wanted to answer was simple:

Can we remove static database passwords with minimum friction?

From credentials to roles

Role-Based Access Control (RBAC) isn’t new: it pushes credential complexity down to a layer that tools handle transparently. It’s a common pattern in infrastructure but less widely used in application development (I suspect because there is still a barrier between infrastructure and development teams, but that may be a topic for another post).

In our case, IAM database authentication replaces static passwords with short-lived auth tokens generated on demand (each token is valid for 15 minutes). The complexity moves from password distribution and rotation to the IAM layer.

```mermaid
sequenceDiagram
    participant Pod as Service Pod
    participant RDS as Amazon RDS
    participant IAM

    Pod-->>Pod: Generate IAM auth token (SigV4)
    Pod->>RDS: Connect using username + token
    RDS->>IAM: Validate token and permissions
    IAM-->>RDS: Signature valid
    RDS-->>Pod: Connection successful
```
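The "validate token and permissions" step only succeeds if the caller's IAM role is allowed the `rds-db:connect` action on that specific database user. The sketch below builds that resource ARN and a minimal policy. It assumes the commercial `aws` partition, and every identifier in `main` is a hypothetical placeholder: the account ID, the `DbiResourceId` `db-ABCDEFGHIJKL`, and the user `app_user`.

```java
// Builds the IAM pieces RDS checks when validating a token. Identifiers in main are placeholders.
class IamDbAuthPolicy {

    /** Resource ARN for one database user: arn:aws:rds-db:region:account:dbuser:DbiResourceId/user */
    static String dbUserArn(String region, String accountId, String dbiResourceId, String dbUser) {
        // Assumes the commercial "aws" partition; aws-cn and aws-us-gov use different prefixes.
        return String.format("arn:aws:rds-db:%s:%s:dbuser:%s/%s", region, accountId, dbiResourceId, dbUser);
    }

    /** Minimal identity-based policy allowing rds-db:connect as that user. */
    static String connectPolicy(String dbUserArn) {
        return String.format(
                "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\","
                        + "\"Action\":\"rds-db:connect\",\"Resource\":\"%s\"}]}",
                dbUserArn);
    }

    public static void main(String[] args) {
        String arn = dbUserArn("ap-northeast-1", "123456789012", "db-ABCDEFGHIJKL", "app_user");
        System.out.println(arn);
        System.out.println(connectPolicy(arn));
    }
}
```

The database user must also be enabled for IAM authentication. On PostgreSQL, for example, you do this by granting it the `rds_iam` role.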

On paper, it offers:

But there’s always a catch.

The common concern I hear is:

That sounds good, but isn’t it slower or more complex?

The experiment setup

I built a small Java application that connects to RDS in two ways:

  1. Traditional approach

    • Fetch credentials from Secrets Manager
    • Use JDBC with username/password
  2. IAM authentication

    • Generate an auth token using AWS SDK
    • Use the token as the database password

The goal wasn’t to squeeze out every last millisecond, just to observe realistic connection behavior.
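Timings like the ones below can be collected with a small wrapper around each step. This is a minimal sketch, not the harness used for the experiment: `StepTimer` is a hypothetical name, and the sleep in `main` stands in for the real secret fetch, token generation, or JDBC connect. Note that a single cold run also pays for JVM class loading and first-call SDK setup, so repeating each step gives a fairer picture.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical timing helper: runs one step and reports its wall-clock duration.
class StepTimer {

    /** A step's return value plus how long it took. */
    static final class Timed<T> {
        final T value;
        final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    static <T> Timed<T> time(String label, Supplier<T> step) {
        long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
        T value = step.get();
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(label + ": " + millis + " ms");
        return new Timed<>(value, millis);
    }

    public static void main(String[] args) {
        // Stand-in for e.g. generateAuthToken(); prints "Token generation time: N ms".
        Timed<String> t = time("Token generation time", () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "token";
        });
        System.out.println(t.value);
    }
}
```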

Measuring the cost

Secrets-based authentication

Secret fetch time: 1260 ms
Connection time: 780 ms
SecretsManagerClient closed.

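Total overhead (about 2040 ms) is split between:

• Fetching the secret from Secrets Manager: 1260 ms
• Opening the JDBC connection: 780 ms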

IAM-based authentication

Token generation time: 877 ms
Connection time (JDBC connect): 760 ms
Connection closed.

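Here, the cost shifts slightly (about 1637 ms in total):

• Generating the IAM auth token: 877 ms, versus 1260 ms for the secret fetch. The token is signed locally (SigV4), so this step involves no round trip to RDS.
• Opening the JDBC connection: 760 ms, versus 780 ms.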
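Unless the token is reused, the token generation cost is paid on every new connection. Because an RDS auth token stays valid for 15 minutes, one option is to cache it and re-sign shortly before it expires. `CachedTokenProvider` is a hypothetical helper, not part of the AWS SDK. Its generator would wrap something like the `generateAuthToken()` method in the code below, and the clock is injectable so the refresh logic can be tested.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper (not AWS SDK API): reuses an RDS IAM auth token until it is
// close to its 15-minute lifetime, then asks the generator for a fresh one.
class CachedTokenProvider {
    private static final Duration TOKEN_LIFETIME = Duration.ofMinutes(15);

    private final Supplier<String> generator; // e.g. () -> generateAuthToken()
    private final LongSupplier clockMillis;   // e.g. System::currentTimeMillis
    private final long refreshAfterMillis;

    private String token;
    private long issuedAtMillis;

    CachedTokenProvider(Supplier<String> generator, LongSupplier clockMillis, Duration safetyMargin) {
        if (safetyMargin.isNegative() || safetyMargin.compareTo(TOKEN_LIFETIME) >= 0) {
            throw new IllegalArgumentException("safety margin must be within the token lifetime");
        }
        this.generator = generator;
        this.clockMillis = clockMillis;
        this.refreshAfterMillis = TOKEN_LIFETIME.minus(safetyMargin).toMillis();
    }

    /** Returns a token with at least the safety margin of validity left, re-signing if needed. */
    synchronized String get() {
        long now = clockMillis.getAsLong();
        if (token == null || now - issuedAtMillis >= refreshAfterMillis) {
            token = generator.get();
            issuedAtMillis = now;
        }
        return token;
    }
}
```

An expired token does not close connections that are already open. It only stops new connections from authenticating, so the cache only matters for code that opens connections over time, such as pools.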

What changed

From a security and operational standpoint, this was a clear win for our use case.

What didn’t change much

This was important: adopting IAM auth did not require a mental model shift for application teams.

The code impact (smaller than expected)

Another concern is:

This will require a big refactor.

In reality, the change was localized:

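```java
import java.sql.Connection;
import java.sql.DriverManager;
import org.json.JSONObject;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.rds.RdsUtilities;
import software.amazon.awssdk.services.rds.model.GenerateAuthenticationTokenRequest;
import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest;

// DB_SECRET_NAME, DB_HOST, DB_PORT, DB_USER and jdbcUrl are defined elsewhere.

// Before: read one field (e.g. "password") from the JSON secret in Secrets Manager.
static String getPropertyFromSecret(String prop) {
    try (SecretsManagerClient smClient = SecretsManagerClient.builder().region(Region.AP_NORTHEAST_1)
            .credentialsProvider(DefaultCredentialsProvider.create()).build()) {
        GetSecretValueRequest req = GetSecretValueRequest.builder().secretId(DB_SECRET_NAME).build();
        JSONObject json = new JSONObject(smClient.getSecretValue(req).secretString());
        return json.getString(prop);
    }
}

// After: sign a short-lived IAM auth token locally; no call to RDS is made here.
static String generateAuthToken() {
    RdsUtilities utilities = RdsUtilities.builder().region(Region.AP_NORTHEAST_1)
            .credentialsProvider(DefaultCredentialsProvider.create()).build();
    GenerateAuthenticationTokenRequest tokenRequest = GenerateAuthenticationTokenRequest.builder()
            .hostname(DB_HOST).port(DB_PORT).username(DB_USER).build();
    return utilities.generateAuthenticationToken(tokenRequest);
}

// String password = getPropertyFromSecret("password");
String password = generateAuthToken();

// IAM-authenticated connections expect SSL/TLS, so jdbcUrl should enable the driver's SSL settings.
Connection conn = DriverManager.getConnection(jdbcUrl, DB_USER, password);
```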

The main difference is where the password comes from, not how the connection works.
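One way to keep that difference in a single place is to inject the password source. `ConnectionFactory` is a hypothetical name used for this sketch. The key detail is that the supplier is called on every `open()`, so an IAM token is obtained per new connection instead of being captured once at startup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper: connection code stays identical, only the password source changes.
class ConnectionFactory {
    private final String jdbcUrl;
    private final String user;
    private final Supplier<String> passwordSource;

    ConnectionFactory(String jdbcUrl, String user, Supplier<String> passwordSource) {
        this.jdbcUrl = Objects.requireNonNull(jdbcUrl);
        this.user = Objects.requireNonNull(user);
        this.passwordSource = Objects.requireNonNull(passwordSource);
    }

    /** Opens a new connection, asking the password source for a fresh value each time. */
    Connection open() throws SQLException {
        // With IAM auth this value is a token; it only has to be valid at connect time.
        return DriverManager.getConnection(jdbcUrl, user, passwordSource.get());
    }
}
```

Switching approaches then means passing `() -> getPropertyFromSecret("password")` or `() -> generateAuthToken()` to the constructor, with the rest of the data-access code unchanged.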

Adoption matters more than elegance

From a purely technical perspective, IAM authentication is not revolutionary.

What makes it valuable is this combination:

Security improvements that are hard to adopt usually fail. This one doesn’t need heroics.

When IAM authentication makes sense

It’s a good fit if:

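Tip: For high-connection or serverless workloads, consider combining it with RDS Proxy. Clients still authenticate to the proxy with an IAM token, but the proxy pools and reuses database connections, so far fewer new connections have to authenticate against the database itself.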

It might not be ideal if:

Context matters.

Final thoughts

This experiment didn’t magically speed up database connections. That wasn’t the goal.

What it did was remove an entire class of risk without making the system harder to operate.

For me, that’s the kind of trade-off worth making: small changes, measurable impact, and fewer things that can fail at 3 a.m.

