A Guide to Health Indicators in Spring Boot

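1. Overview

Spring Boot offers several ways to monitor the running state of an application and its components. HealthContributor and HealthIndicator are the two core APIs for this.

This article looks closely at how these two interfaces work and shows how to register custom health indicators, so that you can keep a better eye on service health in production. ✅

⚠️ Note: this article is aimed at developers who already have some experience with Spring Boot, so basic concepts are not covered.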

2. Dependencies

Health indicators are provided by the Spring Boot Actuator module, so we need to add the corresponding starter dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

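Once the starter is added, the /actuator/health endpoint is exposed by default, while some of the more sensitive endpoints have to be enabled explicitly (more on this later).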

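3. Built-in HealthIndicators

At startup, Spring Boot automatically registers a number of HealthIndicators that check the health of key components.

Common built-in indicators

✅ Always registered:
- DiskSpaceHealthIndicator: checks whether enough disk space is available.
- PingHealthIndicator: a basic liveness check that always reports {"status": "UP"}.

✅ Conditionally registered (auto-configured based on the classpath):
- Using a relational database → DataSourceHealthIndicator is registered
- Using Redis → RedisHealthIndicator is registered
- Using MongoDB → MongoHealthIndicator is registered (Spring Boot does not ship a Kafka indicator; that one has to be written by hand)
- Using Cassandra → CassandraHealthIndicator is registered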

Checking the health status

🔹 Overall health status:
GET /actuator/health

Sample response:

{
  "status": "UP",
  "components": {
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 499963170816,
        "free": 134414831616,
        "threshold": 10485760
      }
    },
    "ping": {
      "status": "UP"
    }
  }
}

🔹 Status of a single component:
GET /actuator/health/diskSpace

Sample response:

{
  "status": "UP",
  "details": {
    "total": 499963170816,
    "free": 134414831616,
    "threshold": 10485760,
    "exists": true
  }
}

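💡 Tip: the {name} in /actuator/health/{name} is the indicator's bean name with the HealthIndicator suffix removed. For example, the diskSpaceHealthIndicator bean (class DiskSpaceHealthIndicator) is served at /actuator/health/diskSpace. Because the rule follows the bean name rather than the class name, the built-in DataSourceHealthIndicator shows up as db, not dataSource.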
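The naming rule can be modelled in a few lines of plain Java. This is only a sketch: HealthNames and fromBeanName are hypothetical names invented for illustration, not part of Spring Boot, and the framework's own logic may handle more cases than this.

```java
import java.util.Locale;

// Hypothetical helper that mimics the "strip the HealthIndicator suffix" rule.
final class HealthNames {

    private static final String SUFFIX = "healthindicator";

    // Returns the path segment used under /actuator/health/ for a given bean name.
    static String fromBeanName(String beanName) {
        String lower = beanName.toLowerCase(Locale.ENGLISH);
        if (lower.endsWith(SUFFIX)) {
            // Cut the suffix off the original string so the casing of the prefix is kept
            return beanName.substring(0, beanName.length() - SUFFIX.length());
        }
        // No suffix: the bean name is used as-is
        return beanName;
    }

    public static void main(String[] args) {
        System.out.println(fromBeanName("diskSpaceHealthIndicator")); // prints "diskSpace"
        System.out.println(fromBeanName("rand"));                     // prints "rand"
    }
}
```

Bean names without the suffix pass through unchanged, which is why an explicitly named bean keeps exactly the name it was given.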

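4. Custom HealthIndicators

Beyond the built-in indicators, we often need to monitor our own components, such as third-party services or connections to internal systems. Doing so is straightforward: ✅

4.1 Basic Implementation

Implement the HealthIndicator interface and register the class as a Spring bean: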

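// Package names as of Spring Boot 2.x/3.x
import java.util.concurrent.ThreadLocalRandom;

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class RandomHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        // Uniform random value in [0, 1)
        double chance = ThreadLocalRandom.current().nextDouble();
        Health.Builder status = Health.up();
        if (chance > 0.9) {
            // Roughly 1 call in 10 reports DOWN
            status = Health.down();
        }
        return status.build();
    }
}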

This "random health" indicator reports DOWN roughly 10% of the time, which makes it handy for testing alerting logic.
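To sanity-check the 10% figure without starting a Spring context, the threshold can be simulated in plain Java. This is a rough sketch: RandomStatusSimulation is a hypothetical class, and it uses a seeded java.util.Random instead of ThreadLocalRandom purely so that runs are repeatable.

```java
import java.util.Random;

// Hypothetical stand-alone simulation of the indicator's threshold logic.
final class RandomStatusSimulation {

    // Same threshold as the indicator: values above 0.9 count as DOWN.
    static String statusFor(double chance) {
        return chance > 0.9 ? "DOWN" : "UP";
    }

    // Fraction of DOWN results over the given number of samples.
    static double downShare(long seed, int samples) {
        Random random = new Random(seed); // seeded so that runs are repeatable
        int down = 0;
        for (int i = 0; i < samples; i++) {
            if (statusFor(random.nextDouble()).equals("DOWN")) {
                down++;
            }
        }
        return (double) down / samples;
    }

    public static void main(String[] args) {
        System.out.printf("DOWN share: %.3f%n", downShare(42L, 100_000));
    }
}
```

With a large sample the printed share should land close to 0.1, which is the rate an alerting rule would see over time.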

4.2 Reactive Support

If the application is reactive (WebFlux), implement ReactiveHealthIndicator instead:

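import java.util.concurrent.ThreadLocalRandom;

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.ReactiveHealthIndicator;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Mono;

@Component
public class ReactiveRandomHealthIndicator implements ReactiveHealthIndicator {

    @Override
    public Mono<Health> health() {
        double chance = ThreadLocalRandom.current().nextDouble();
        if (chance > 0.9) {
            // Roughly 1 call in 10 reports DOWN
            return Mono.just(Health.down().build());
        }
        return Mono.just(Health.up().build());
    }
}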

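The method returns a Mono<Health>; the rest of the logic is the same.

4.3 Custom Indicator Names

By default, the bean name determines the endpoint path. For example: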

@Component("rand")
public class RandomHealthIndicator implements HealthIndicator {
    // ...
}

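The indicator is now available at /actuator/health/rand rather than /actuator/health/random.
✅ Rule: the bean name is used, not the class name.

4.4 Disabling a HealthIndicator

A specific indicator can be switched off with a configuration property: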

management.health.random.enabled=false

For this property to take effect on a custom indicator, the class must also carry the @ConditionalOnEnabledHealthIndicator annotation:

@Component
@ConditionalOnEnabledHealthIndicator("random")
public class RandomHealthIndicator implements HealthIndicator {
    // ...
}

Without it, the property has no effect! ⚠️ This is a common pitfall.

A test to verify the behavior:

@SpringBootTest
@AutoConfigureMockMvc
@TestPropertySource(properties = "management.health.random.enabled=false")
class DisabledRandomHealthIndicatorIntegrationTest {

    @Autowired
    private MockMvc mockMvc;

    @Test
    void givenADisabledIndicator_whenSendingRequest_thenReturns404() throws Exception {
        mockMvc.perform(get("/actuator/health/random"))
               .andExpect(status().isNotFound());
    }
}

✅ The same kind of property also works for built-in indicators. For example, to disable the disk space check:
management.health.diskspace.enabled=false
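How such a toggle is resolved can be sketched with java.util.Properties. IndicatorToggle is a hypothetical helper, not a Spring class. The fallback to management.health.defaults.enabled reflects my understanding of how Spring Boot resolves these flags and should be treated as an assumption.

```java
import java.util.Properties;

// Hypothetical model of how a per-indicator "enabled" flag could be resolved.
final class IndicatorToggle {

    static boolean isEnabled(Properties props, String name) {
        // 1. An indicator-specific property wins if it is present
        String specific = props.getProperty("management.health." + name + ".enabled");
        if (specific != null) {
            return Boolean.parseBoolean(specific);
        }
        // 2. Otherwise fall back to the global default (assumed to be true when unset)
        return Boolean.parseBoolean(
                props.getProperty("management.health.defaults.enabled", "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("management.health.random.enabled", "false");
        System.out.println(isEnabled(props, "random"));    // prints "false"
        System.out.println(isEnabled(props, "diskspace")); // prints "true"
    }
}
```

In a real application the annotation performs this check when the context starts, so a disabled indicator bean is never created and its path returns 404.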

4.5 Adding Details

A bare UP/DOWN is often not informative enough, so we can attach contextual details:

@Override
public Health health() {
    double chance = ThreadLocalRandom.current().nextDouble();
    Health.Builder builder = Health.up();
    
    if (chance > 0.9) {
        builder = Health.down();
    }

    return builder
        .withDetail("chance", chance)
        .withDetail("strategy", "thread-local")
        .build();
}

A call to /actuator/health/random might return:

{
  "status": "DOWN",
  "details": {
    "chance": 0.9883560157173152,
    "strategy": "thread-local"
  }
}

You can also pass all details at once as a Map:

Map<String, Object> details = new HashMap<>();
details.put("chance", chance);
details.put("strategy", "thread-local");
return builder.withDetails(details).build();

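Recording exceptions

When a call to a component fails, it is a good idea to attach the exception. Only the exception's class name and message are recorded, not the full stack trace: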

if (chance > 0.9) {
    builder = Health.down(new RuntimeException("Bad Luck"));
}

The response then contains an error field:

{
  "status": "DOWN",
  "details": {
    "error": "java.lang.RuntimeException: Bad Luck",
    "chance": 0.9603739107139401,
    "strategy": "thread-local"
  }
}
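The shape of the error value in the response above (class name, colon, message) can be reproduced in plain Java. ErrorDetailSketch is a hypothetical class that only imitates the output format. It does not call the Health builder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in that rebuilds the details section of the response.
final class ErrorDetailSketch {

    // Builds a details map with the same keys as the JSON shown above.
    static Map<String, Object> downDetails(Throwable ex, double chance) {
        Map<String, Object> details = new LinkedHashMap<>(); // keeps insertion order
        // Class name plus message; no stack trace is included
        details.put("error", ex.getClass().getName() + ": " + ex.getMessage());
        details.put("chance", chance);
        details.put("strategy", "thread-local");
        return details;
    }

    public static void main(String[] args) {
        System.out.println(downDetails(new RuntimeException("Bad Luck"), 0.96));
    }
}
```

Because only the class name and message end up in the response, anything needed for debugging beyond that should go to the application log.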

In tests, jsonPath can be used to verify the response:

mockMvc.perform(get("/actuator/health/random"))
       .andExpect(jsonPath("$.status").exists())
       .andExpect(jsonPath("$.details.strategy").value("thread-local"))
       .andExpect(jsonPath("$.details.chance").exists());

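4.6 Controlling Detail Exposure

In production, exposing details to everyone is not recommended. The level of exposure can be set through configuration:

# Allowed values: never, when_authorized, always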
management.endpoint.health.show-details=when_authorized

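- never: details are never returned
- always: details are always returned
- when_authorized: details are visible only to authorized users

A user counts as authorized when both of the following hold:

- the user is authenticated
- the user has one of the roles configured in management.endpoint.health.roles

For example: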

management.endpoint.health.roles=ADMIN,ACTUATOR
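The decision described above can be written down as a small plain-Java function. DetailVisibility and its members are hypothetical names for illustration only. The branch that treats an empty role list as "any authenticated user" is an assumption about the framework's behavior rather than something this article states.

```java
import java.util.Set;

// Hypothetical model of the show-details decision.
final class DetailVisibility {

    enum ShowDetails { NEVER, WHEN_AUTHORIZED, ALWAYS }

    static boolean showDetails(ShowDetails mode, boolean authenticated,
                               Set<String> userRoles, Set<String> configuredRoles) {
        switch (mode) {
            case NEVER:
                return false;
            case ALWAYS:
                return true;
            default: // WHEN_AUTHORIZED
                if (!authenticated) {
                    return false;
                }
                if (configuredRoles.isEmpty()) {
                    // Assumption: with no roles configured, authentication alone is enough
                    return true;
                }
                // The user needs at least one of the configured roles
                return userRoles.stream().anyMatch(configuredRoles::contains);
        }
    }

    public static void main(String[] args) {
        Set<String> configured = Set.of("ADMIN", "ACTUATOR");
        System.out.println(showDetails(ShowDetails.WHEN_AUTHORIZED, true, Set.of("ADMIN"), configured)); // prints "true"
        System.out.println(showDetails(ShowDetails.WHEN_AUTHORIZED, true, Set.of("USER"), configured));  // prints "false"
    }
}
```

With the ADMIN,ACTUATOR configuration, an authenticated user holding only a USER role still receives the bare status without details.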

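4.7 Custom Health Status

Spring Boot supports four statuses out of the box:

Status	Meaning
`UP`	Healthy
`DOWN`	Failing
`OUT_OF_SERVICE`	Temporarily taken out of service
`UNKNOWN`	State is unknown

These are public static final constants on the Status class rather than enum values, so new statuses can be added: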

Health.Builder warning = Health.status("WARNING");

Custom HTTP status code mapping

By default:

- UP → 200 OK
- DOWN / OUT_OF_SERVICE → 503 Service Unavailable

The mapping can be changed through configuration:

management.endpoint.health.status.http-mapping.down=500
management.endpoint.health.status.http-mapping.out_of_service=503
management.endpoint.health.status.http-mapping.warning=500

Alternatively, register an HttpCodeStatusMapper bean in code:

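import org.springframework.boot.actuate.health.HttpCodeStatusMapper;
import org.springframework.boot.actuate.health.Status;
import org.springframework.stereotype.Component;

@Component
public class CustomStatusCodeMapper implements HttpCodeStatusMapper {

    @Override
    public int getStatusCode(Status status) {
        // Compare with equals(): a Status created elsewhere is equal but not identical
        if (Status.DOWN.equals(status) || "WARNING".equals(status.getCode())) {
            return 500;
        }
        if (Status.OUT_OF_SERVICE.equals(status)) {
            return 503;
        }
        // UP, UNKNOWN and anything else
        return 200;
    }
}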

✅ SimpleHttpCodeStatusMapper is the default implementation, and it also picks up mappings from the configuration file.
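A table-driven mapper of this kind can be sketched in plain Java. StatusHttpMapping is a hypothetical class, not the Spring implementation. The key normalization (lower-casing and dropping punctuation so that out_of_service and OUT_OF_SERVICE match) is an assumption made so that property-style keys line up with status codes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical table-driven status-to-HTTP mapper.
final class StatusHttpMapping {

    private final Map<String, Integer> mappings = new HashMap<>();

    // Registers a mapping; returns this so calls can be chained.
    StatusHttpMapping map(String statusCode, int httpCode) {
        mappings.put(normalize(statusCode), httpCode);
        return this;
    }

    // Unmapped statuses fall back to 200 OK.
    int httpCodeFor(String statusCode) {
        return mappings.getOrDefault(normalize(statusCode), 200);
    }

    // "OUT_OF_SERVICE", "out_of_service" and "out-of-service" all become "outofservice"
    private static String normalize(String code) {
        return code.toLowerCase(Locale.ENGLISH).replaceAll("[^a-z0-9]", "");
    }

    public static void main(String[] args) {
        StatusHttpMapping mapping = new StatusHttpMapping()
                .map("down", 500)
                .map("out_of_service", 503)
                .map("warning", 500);
        System.out.println(mapping.httpCodeFor("WARNING")); // prints "500"
        System.out.println(mapping.httpCodeFor("UP"));      // prints "200"
    }
}
```

A lookup table keeps the mapping declarative, which is the main reason to prefer configuration over a hand-written if chain once more than a couple of custom statuses exist.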

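5. Health Indicators vs. Metrics

This is an important distinction, so don't mix the two up ❌

Aspect	HealthIndicator	Metrics
Purpose	Checks whether a component is reachable and working	Records values, counts, distributions, and trends
Typical use	Is the DB connection alive? Does Redis respond?	CPU usage, GC count, HTTP response time distribution
Data type	A status (UP/DOWN) plus a little context	Streams of numbers (Gauge, Counter, Timer, etc.)
Suggested tools	Spring Boot Actuator `health`	Micrometer + Prometheus

For example:

- Your service cannot reach Kafka? Report DOWN from a custom HealthIndicator for Kafka.
- Want to know how long Kafka message processing takes? Record it with @Timed and export it to Prometheus.

✅ In one sentence:
Health tells you whether something is wrong; Metrics tell you how bad it is.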

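6. Summary

This article walked through how to use HealthIndicator in Spring Boot, covering:

✅ Built-in indicators and how they are auto-configured
✅ Implementing custom indicators and the naming rules
✅ Adding exception information and details
✅ Access control and HTTP status code mapping
✅ The essential difference from Metrics

All of this is useful for microservice monitoring, Kubernetes probes, and alerting systems, so it is worth bookmarking for later.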

The sample code is available on GitHub: https://github.com/yourname/spring-boot-actuator-demo
