Preface

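A system can end up in an unhealthy state because a deployment is in progress, a service has terminated abnormally, or something else has gone wrong. When that happens we need to know how the system is doing, and health checks let us quickly determine whether it is running normally. Typically we expose a public HTTP endpoint dedicated to health checks.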

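.NET Core provides two health check libraries: Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions and Microsoft.Extensions.Diagnostics.HealthChecks. Together they form the most basic health check solution. The main extension components built on top of them are listed below; this article does not discuss them further.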

AspNetCore.HealthChecks.System
AspNetCore.HealthChecks.Network
AspNetCore.HealthChecks.SqlServer
AspNetCore.HealthChecks.MongoDb
AspNetCore.HealthChecks.Npgsql
AspNetCore.HealthChecks.Redis
AspNetCore.HealthChecks.AzureStorage
AspNetCore.HealthChecks.AzureServiceBus
AspNetCore.HealthChecks.MySql
AspNetCore.HealthChecks.DocumentDb
AspNetCore.HealthChecks.SqLite
AspNetCore.HealthChecks.Kafka
AspNetCore.HealthChecks.RabbitMQ
AspNetCore.HealthChecks.IdSvr
AspNetCore.HealthChecks.DynamoDB
AspNetCore.HealthChecks.Oracle
AspNetCore.HealthChecks.Uris

Exploring the Source

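Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions is the abstraction layer underneath .NET Core health checks, and it shows the design intent of the library. It defines a single unified interface, IHealthCheck, for checking the state of each monitored component in an application, such as background services and databases. The interface has only one method, CheckHealthAsync.

The method takes a HealthCheckContext, which represents the context associated with the current health check execution, along with an optional CancellationToken. Its return value, a HealthCheckResult, represents the state of the monitored component once the check has finished.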

The source code is as follows:

    public interface IHealthCheck
    {
        Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default);
    }

HealthCheckRegistration

HealthCheckContext has only one member: its Registration property, which holds a HealthCheckRegistration instance.
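To make the relationship between the interface and its context concrete, here is a minimal sketch of a custom check. ProbeHealthCheck and its probe delegate are hypothetical names invented for illustration; the delegate stands in for whatever real ping a component would need. On failure, the sketch reads the status from context.Registration rather than hard-coding one.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check: healthy while the supplied probe delegate returns true.
public sealed class ProbeHealthCheck : IHealthCheck
{
    // Hypothetical dependency; stands in for a real connectivity test.
    private readonly Func<bool> _probe;

    public ProbeHealthCheck(Func<bool> probe)
    {
        _probe = probe ?? throw new ArgumentNullException(nameof(probe));
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();

        if (_probe())
        {
            return Task.FromResult(HealthCheckResult.Healthy("Probe succeeded."));
        }

        // Report whatever failure status the registration was configured with.
        var failure = new HealthCheckResult(
            context.Registration.FailureStatus,
            "Probe failed.");
        return Task.FromResult(failure);
    }
}
```

Reading the failure status from the registration lets the same check class be registered as critical in one application and as merely degrading in another.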

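HealthCheckRegistration is an important object because it captures what a health check needs to care about. It exposes five properties, which are used to:

* identify the health check by name
* create the IHealthCheck instance
* set the timeout of the health check (so that the check itself does not consume too many resources)
* specify the status to report on failure
* hold a set of tags (which can be used to filter health checks)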

The relevant source for these five properties is:

    public Func<IServiceProvider, IHealthCheck> Factory
    {
        get => _factory;
        set
        {
            if (value == null)
            {
                throw new ArgumentNullException(nameof(value));
            }

            _factory = value;
        }
    }

    public HealthStatus FailureStatus { get; set; }

    public TimeSpan Timeout
    {
        get => _timeout;
        set
        {
            if (value <= TimeSpan.Zero && value != System.Threading.Timeout.InfiniteTimeSpan)
            {
                throw new ArgumentOutOfRangeException(nameof(value));
            }

            _timeout = value;
        }
    }

    public string Name
    {
        get => _name;
        set
        {
            if (value == null)
            {
                throw new ArgumentNullException(nameof(value));
            }

            _name = value;
        }
    }

    public ISet<string> Tags { get; }
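The five properties can be seen together in a short sketch that builds a registration and then exercises the Timeout validation shown in the source above. RegistrationSketch, AlwaysHealthyCheck, the name "sample-db", and the tag values are all made up for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class RegistrationSketch
{
    // Hypothetical trivial check, present only so there is something to register.
    private sealed class AlwaysHealthyCheck : IHealthCheck
    {
        public Task<HealthCheckResult> CheckHealthAsync(
            HealthCheckContext context,
            CancellationToken cancellationToken = default)
            => Task.FromResult(HealthCheckResult.Healthy());
    }

    public static HealthCheckRegistration Build()
    {
        return new HealthCheckRegistration(
            name: "sample-db",                       // hypothetical check name
            factory: _ => new AlwaysHealthyCheck(),  // creates the IHealthCheck instance
            failureStatus: HealthStatus.Degraded,    // status to report on failure
            tags: new[] { "ready", "db" },           // hypothetical tags for filtering
            timeout: TimeSpan.FromSeconds(5));
    }

    // Returns true when the setter rejects a zero timeout, as the guard
    // in the Timeout property requires.
    public static bool RejectsZeroTimeout(HealthCheckRegistration registration)
    {
        try
        {
            registration.Timeout = TimeSpan.Zero;
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

Because the setter throws before assigning, a rejected value leaves the previously configured timeout in place.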
 

HealthCheckResult

HealthCheckResult is a struct, which suggests the design was driven mainly by its role as a data carrier and by performance considerations.

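HealthCheckResult carries the result of a health check. As before, the type tells us what a health check result needs to capture:

* the current state of the component
* exception information
* a friendly description (whether the state is normal or not)
* additional key-value pairs describing the component; this is an open-ended property that makes it easy to record more information

The type has four public properties and three static methods. The relevant source is: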

    public struct HealthCheckResult
    {
        private static readonly IReadOnlyDictionary<string, object> _emptyReadOnlyDictionary = new Dictionary<string, object>();

        public HealthCheckResult(HealthStatus status, string description = null, Exception exception = null, IReadOnlyDictionary<string, object> data = null)
        {
            Status = status;
            Description = description;
            Exception = exception;
            Data = data ?? _emptyReadOnlyDictionary;
        }

        public IReadOnlyDictionary<string, object> Data { get; }

        public string Description { get; }

        public Exception Exception { get; }

        public HealthStatus Status { get; }

        public static HealthCheckResult Healthy(string description = null, IReadOnlyDictionary<string, object> data = null)
        {
            return new HealthCheckResult(status: HealthStatus.Healthy, description, exception: null, data);
        }

        public static HealthCheckResult Degraded(string description = null, Exception exception = null, IReadOnlyDictionary<string, object> data = null)
        {
            return new HealthCheckResult(status: HealthStatus.Degraded, description, exception: exception, data);
        }

        public static HealthCheckResult Unhealthy(string description = null, Exception exception = null, IReadOnlyDictionary<string, object> data = null)
        {
            return new HealthCheckResult(status: HealthStatus.Unhealthy, description, exception, data);
        }
    }

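All three methods create a HealthCheckResult for a different state based on the HealthStatus enum. The enum expresses the states a health check cares about: healthy, unhealthy, and degraded.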
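As an illustration of how the three factory methods and the open-ended Data property might be used together, the sketch below maps a queue depth to a result. ResultSketch, the "queueDepth" key, and the thresholds of 100 and 1000 are assumptions chosen for the example, not values from the library.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class ResultSketch
{
    // Hypothetical mapping from queue depth to a health check result.
    public static HealthCheckResult FromQueueDepth(int depth)
    {
        // Extra key-value data recorded alongside the status.
        var data = new Dictionary<string, object> { ["queueDepth"] = depth };

        if (depth < 100)
        {
            return HealthCheckResult.Healthy("Queue is draining normally.", data);
        }

        if (depth < 1000)
        {
            return HealthCheckResult.Degraded("Queue is backing up.", null, data);
        }

        return HealthCheckResult.Unhealthy("Queue is saturated.", null, data);
    }
}
```

Calling ResultSketch.FromQueueDepth(500) yields a result whose Status is HealthStatus.Degraded and whose Data contains the depth that was measured.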

The source of HealthStatus is:

    public enum HealthStatus
    {
        Unhealthy = 0,

        Degraded = 1,

        Healthy = 2,
    }

IHealthCheckPublisher

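Health checking is essentially a polling activity that has to run periodically. .NET Core abstracts the periodic part behind an interface, IHealthCheckPublisher, whose PublishAsync method receives the report produced by a run of the checks. We can implement this interface and combine it with our own scheduling logic.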
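A minimal sketch of a publisher follows. LatestStatusPublisher is a hypothetical class that only remembers the most recent aggregate status; a real publisher would more likely forward the report to a log or a monitoring system. What triggers PublishAsync, and how often, is left to whatever scheduler drives it.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical publisher that keeps the latest aggregate status in memory.
public sealed class LatestStatusPublisher : IHealthCheckPublisher
{
    // Stored as an int so it can be read and written with Volatile.
    private int _latest = (int)HealthStatus.Healthy;

    public HealthStatus Latest => (HealthStatus)Volatile.Read(ref _latest);

    public Task PublishAsync(HealthReport report, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();

        // Record the aggregate status of this run.
        Volatile.Write(ref _latest, (int)report.Status);
        return Task.CompletedTask;
    }
}
```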

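A health check run also produces a report. What should a report capture?

* additional key-value pairs describing the component; this is an open-ended property that makes it easy to record more information
* a friendly description (whether the state is normal or not)
* the current state of the component
* exception information
* the time the check took
* the associated tags

HealthReportEntry represents the report for a single health check, and HealthReport represents a set of them. Internally, HealthReport keeps a read-only dictionary of HealthReportEntry values with string keys. The source of HealthReport is: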

    public sealed class HealthReport
    {
        public HealthReport(IReadOnlyDictionary<string, HealthReportEntry> entries, TimeSpan totalDuration)
        {
            Entries = entries;
            Status = CalculateAggregateStatus(entries.Values);
            TotalDuration = totalDuration;
        }

        public IReadOnlyDictionary<string, HealthReportEntry> Entries { get; }

        public HealthStatus Status { get; }

        public TimeSpan TotalDuration { get; }

        private HealthStatus CalculateAggregateStatus(IEnumerable<HealthReportEntry> entries)
        {
            var currentValue = HealthStatus.Healthy;
            foreach (var entry in entries)
            {
                if (currentValue > entry.Status)
                {
                    currentValue = entry.Status;
                }

                if (currentValue == HealthStatus.Unhealthy)
                {
                    // Game over, man! Game over!
                    // (We hit the worst possible status, so there's no need to keep iterating)
                    return currentValue;
                }
            }

            return currentValue;
        }
    }
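The aggregation rule can be checked with a small sketch: because CalculateAggregateStatus keeps the lowest enum value it sees, a report with one healthy and one degraded entry comes out as Degraded. ReportSketch, the entry names "database" and "cache", and the durations are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class ReportSketch
{
    public static HealthReport Build()
    {
        // Entry names, descriptions, and timings are made up for this example.
        var entries = new Dictionary<string, HealthReportEntry>
        {
            ["database"] = new HealthReportEntry(
                HealthStatus.Healthy, "Responding.", TimeSpan.FromMilliseconds(12), null, null),
            ["cache"] = new HealthReportEntry(
                HealthStatus.Degraded, "High latency.", TimeSpan.FromMilliseconds(480), null, null),
        };

        // The constructor derives Status from the entries:
        // the worst (lowest-valued) status wins.
        return new HealthReport(entries, TimeSpan.FromMilliseconds(492));
    }
}
```

This ordering is why HealthStatus assigns 0 to Unhealthy and 2 to Healthy: a simple comparison is enough to find the worst status, and the loop can stop as soon as it reaches Unhealthy.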

Summary


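From the above we can see that a complete health check involves the health check context, the health status, the health check result, and the health check report. To make health checks easier to maintain, publishing can be abstracted out and combined with an external timer, so that the two together keep the health check program running.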