Measuring Availability Group synchronization lag

With all of the high-availability (HA) and disaster recovery (DR) features, the database administrator must understand how much data loss and downtime is possible under worst-case scenarios. Data loss affects your ability to meet recovery point objectives (RPO) and downtime affects your recovery time objectives (RTO). When using Availability Groups (AGs), your RTO and RPO depend on the replication of transaction log records between at least two replicas being extremely fast. The worse the performance, the more potential data loss will occur and the longer it can take for a failed-over database to come back online.

Availability Groups must retain all transaction log records until they have been distributed to all secondary replicas. Slow synchronization to even a single replica will prevent log truncation. If the log records cannot be truncated, your log will likely begin to grow. This becomes a maintenance concern because you either need to keep expanding your disk or you might run out of capacity entirely.

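If you suspect that a lagging secondary is what is blocking truncation, you can confirm it directly from sys.databases. This is a minimal sketch, not part of the original scripts:

--Databases whose log cannot be truncated because a secondary replica has not yet received/hardened the log records.
SELECT name,
       log_reuse_wait_desc
FROM sys.databases
WHERE log_reuse_wait_desc = 'AVAILABILITY_REPLICA';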

Availability modes

There are two availability modes, synchronous commit and asynchronous commit. Selecting a mode is equivalent to selecting whether you want to favor data protection or transaction performance. Both availability modes follow the same work flow, with one small yet critical difference.

[Figure: transaction log synchronization workflow between the primary and secondary replicas; the numbered steps referenced below refer to this diagram]

With synchronous commit mode, the application does not receive confirmation that the transaction committed until after the log records are hardened (step 5) on all synchronous secondary replicas. This is how AGs can guarantee zero data loss. Any transactions which were not hardened before the primary failed would be rolled back and an appropriate error would be bubbled up to the application for it to alert the user or perform its own error handling.

With asynchronous commit mode, the application receives confirmation that the transaction committed after the last log record is flushed (step 1) to the primary replica’s log file. This improves performance because the application does not have to wait for the log records to be transmitted but it opens up the AG to the potential of data loss. If the primary replica fails before the secondary replicas harden the log records, then the application will believe a transaction was committed but a failover would result in the loss of that data.

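The availability mode is configured per replica. As an illustrative sketch (the availability group name [MyAG] and the replica name 'SQLNODE2' are placeholders, not values from this article), switching between the two modes looks like this:

--Favor data protection:
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON 'SQLNODE2' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

--Favor transaction performance (accepts the possibility of data loss):
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON 'SQLNODE2' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);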

Measuring potential data loss

Thomas Grohser once told me, “do not confuse luck with high-availability.” A server may stay online without ever failing or turning off for many years, but if that server has no redundancy features then it is not highly available. That same server staying up for the entire year does not mean that you can meet five nines as a service level agreement (SLA).

Policy-based management is one method of verifying that you can achieve your RTOs and RPOs. I will be covering the dynamic management view (DMV) method because I find it more versatile and very useful when creating custom alerts in various monitoring tools. If you would like to read more on the policy-based management method, review this BOL post.

Calculations

There are two methods of calculating data loss. Each method has its own quirks which are important to understand and put into context.

Log send queue

T_data_loss = log_send_queue / log_generation_rate

Your first thought might be to look at the send rate rather than the generation rate, but it is important to remember that we are not looking for how long it will take to synchronize; we are looking for the window of time in which data would be lost. Also, this method measures data loss by time rather than by quantity.

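Both inputs are easy to get at: the send queue is exposed by sys.dm_hadr_database_replica_states and the generation rate can be taken from the 'Log Bytes Flushed/sec' counter, which is exactly what the full query below does. As a quick sketch for eyeballing the raw queue:

--Send queue (KB) and send rate (KB/sec) per database replica; NULL while a secondary is not synchronizing.
SELECT DB_NAME(database_id) AS database_name,
       log_send_queue_size,
       log_send_rate
FROM sys.dm_hadr_database_replica_states;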

This calculation can be a bit misleading if your write load is inconsistent. I once administered a system which used FILESTREAM. The database would have a very low write load until a 4 MB file was dropped into it. The instant after the transaction committed, the log send queue would be very large while the log generation rate still showed as very low. This made my alerts trigger even though the 4 MB of data synchronized extremely fast and the next poll would show that we were back within our RPO SLAs.

If you choose this calculation, you will need to trigger alerts only after your RPO SLAs have been violated for a period of time, such as after 5 polls at 1-minute intervals. This will help cut down on false positives.

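One way to implement that rule is sketched below. The dbo.rpo_checks table and the monitoring job that fills it are hypothetical, not part of the original article; the idea is simply to alert only when every one of the last five samples breached the SLA:

--Hypothetical example: assumes a job writes the calculated Sync_Lag_Secs into dbo.rpo_checks once per minute.
DECLARE @rpo_sla_secs INT = 60;          --example RPO SLA of 60 seconds
DECLARE @required_violations INT = 5;    --consecutive polls that must breach the SLA before alerting

IF (
	SELECT COUNT(*)
	FROM (
		SELECT TOP (@required_violations) sync_lag_secs
		FROM dbo.rpo_checks
		ORDER BY check_time DESC
	     ) recent
	WHERE recent.sync_lag_secs > @rpo_sla_secs
   ) = @required_violations
	RAISERROR('RPO SLA breached for the last 5 consecutive polls.', 16, 1);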

Last commit time

T_data_loss = last_commit_time_primary - last_commit_time_secondary

The last commit time method is easier to understand. The last commit time on your secondary replica will always be equal to or earlier than that of the primary replica. Finding the difference between these values will tell you how far your replica lags behind.

Similar to the log send queue method, the last commit time can be misleading on systems with an inconsistent workload. If a transaction occurs at 02:00 AM and then the write load on the database goes idle for one hour, this calculation will be misleading until the next transaction is synchronized. The metric would declare a one-hour lag even though there was no data to be lost during that hour.

While misleading, the hour lag is technically accurate. RPO measures the time period in which data may be lost. It does not measure the quantity of data that would be lost during that time frame. The fact that there was zero data to lose does not alter the fact that you would lose the last hour's worth of data. Even though the metric is accurate, it still skews the picture: if data had been flowing, you would not have seen a one-hour lag.

RPO metric queries

Log send queue method

 
;WITH UpTime AS
			(
			SELECT DATEDIFF(SECOND,create_date,GETDATE()) [upTime_secs]
			FROM sys.databases
			WHERE name = 'tempdb'
			),
	AG_Stats AS 
			(
			SELECT AR.replica_server_name,
				   HARS.role_desc, 
				   Db_name(DRS.database_id) [DBName], 
				   CAST(DRS.log_send_queue_size AS DECIMAL(19,2)) log_send_queue_size_KB, 
				   (CAST(perf.cntr_value AS DECIMAL(19,2)) / CAST(UpTime.upTime_secs AS DECIMAL(19,2))) / CAST(1024 AS DECIMAL(19,2)) [log_KB_flushed_per_sec]
			FROM   sys.dm_hadr_database_replica_states DRS 
			INNER JOIN sys.availability_replicas AR ON DRS.replica_id = AR.replica_id 
			INNER JOIN sys.dm_hadr_availability_replica_states HARS ON AR.group_id = HARS.group_id 
				AND AR.replica_id = HARS.replica_id 
			--I am calculating this as an average over the entire time that the instance has been online.
			--To capture a smaller, more recent window, you will need to:
			--1. Store the counter value.
			--2. Wait N seconds.
			--3. Recheck counter value.
			--4. Divide the difference between the two checks by N.
			INNER JOIN sys.dm_os_performance_counters perf ON perf.instance_name = Db_name(DRS.database_id)
				AND perf.counter_name like 'Log Bytes Flushed/sec%'
			CROSS APPLY UpTime
			),
	Pri_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
					, [log_KB_flushed_per_sec]
			FROM	AG_Stats
			WHERE	role_desc = 'PRIMARY'
			),
	Sec_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
					--Send queue will be NULL if secondary is not online and synchronizing
					, log_send_queue_size_KB
			FROM	AG_Stats
			WHERE	role_desc = 'SECONDARY'
			)
SELECT p.replica_server_name [primary_replica]
	, p.[DBName] AS [DatabaseName]
	, s.replica_server_name [secondary_replica]
	, CAST(s.log_send_queue_size_KB / NULLIF(p.[log_KB_flushed_per_sec], 0) AS BIGINT) [Sync_Lag_Secs] --NULLIF avoids divide-by-zero when no log has been flushed
FROM Pri_CommitTime p
LEFT JOIN Sec_CommitTime s ON [s].[DBName] = [p].[DBName]
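
The comments embedded in the query above describe sampling the counter over a shorter, more recent window instead of averaging over the entire instance uptime. A minimal sketch of that two-sample idea (the database name 'MyDatabase' and the 60-second window are placeholders):

--Sample 'Log Bytes Flushed/sec' twice and derive the KB flushed per second over the last minute.
DECLARE @first_sample BIGINT, @second_sample BIGINT, @window_secs INT = 60;

SELECT @first_sample = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Log Bytes Flushed/sec%'
	AND instance_name = 'MyDatabase';

WAITFOR DELAY '00:01:00';

SELECT @second_sample = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Log Bytes Flushed/sec%'
	AND instance_name = 'MyDatabase';

--KB flushed per second over the sampling window.
SELECT (@second_sample - @first_sample) / 1024.0 / @window_secs AS log_KB_flushed_per_sec;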
 

Last commit time method

NOTE: This query is a bit simpler and does not have to calculate cumulative performance monitor counters.

 
 
;WITH 
	AG_Stats AS 
			(
			SELECT AR.replica_server_name,
				   HARS.role_desc, 
				   Db_name(DRS.database_id) [DBName], 
				   DRS.last_commit_time
			FROM   sys.dm_hadr_database_replica_states DRS 
			INNER JOIN sys.availability_replicas AR ON DRS.replica_id = AR.replica_id 
			INNER JOIN sys.dm_hadr_availability_replica_states HARS ON AR.group_id = HARS.group_id 
				AND AR.replica_id = HARS.replica_id 
			),
	Pri_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
					, last_commit_time
			FROM	AG_Stats
			WHERE	role_desc = 'PRIMARY'
			),
	Sec_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
					, last_commit_time
			FROM	AG_Stats
			WHERE	role_desc = 'SECONDARY'
			)
SELECT p.replica_server_name [primary_replica]
	, p.[DBName] AS [DatabaseName]
	, s.replica_server_name [secondary_replica]
	, DATEDIFF(ss,s.last_commit_time,p.last_commit_time) AS [Sync_Lag_Secs]
FROM Pri_CommitTime p
LEFT JOIN Sec_CommitTime s ON [s].[DBName] = [p].[DBName]
 

Recovery time objective

Your recovery time objective involves more than just the performance of the AG synchronization.

Calculation

T_failover = T_detection + T_overhead + T_redo

Detection

From the instant that an internal error or timeout occurs to the moment that the AG begins to fail over is the detection window. The cluster checks the health of the AG by calling the sp_server_diagnostics stored procedure. If there is an internal error, the cluster will initiate a failover after receiving the results. This stored procedure is called at an interval that is one third of the total health-check timeout threshold. By default, it polls every 10 seconds with a timeout of 30 seconds.

If no error is detected, a failover may still occur if the health-check timeout is reached or the lease between the resource DLL and the SQL Server instance expires (20 seconds by default). For more details on these conditions, review this Books Online post.

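Both thresholds are configurable at the availability group level. A quick sketch for inspecting and, if appropriate, adjusting them; the group name [MyAG] is a placeholder:

--Current detection settings for each availability group.
SELECT name,
       failure_condition_level,
       health_check_timeout    --milliseconds; default is 30000
FROM sys.availability_groups;

--Example only: extend the health-check timeout to 60 seconds.
ALTER AVAILABILITY GROUP [MyAG] SET (HEALTH_CHECK_TIMEOUT = 60000);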

Overhead

Overhead is the time it takes for the cluster to fail over plus the time to bring the databases online. The failover time is typically constant and can be tested easily. Bringing the databases online is dependent upon crash recovery. This is typically very fast, but a failover in the middle of a very large transaction can cause delays while crash recovery rolls the transaction back. I recommend testing failovers in a non-production environment during operations such as large index rebuilds.

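If you want to time the overhead yourself, a planned manual failover is the simplest test. This is a sketch only; [MyAG] is a placeholder and the statement must be run on the synchronous secondary that is to become the new primary:

--Run on the target secondary replica (requires synchronous commit and a SYNCHRONIZED database).
ALTER AVAILABILITY GROUP [MyAG] FAILOVER;

--An asynchronous secondary only supports a forced failover, which allows data loss:
--ALTER AVAILABILITY GROUP [MyAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;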

Redo

Once the log records are hardened on the secondary replica, SQL Server must redo the transactions to roll everything forward. This is an area that we need to monitor, particularly if the secondary replica is underpowered compared to the primary replica. Dividing the redo_queue by the redo_rate will indicate your lag.

T_redo = redo_queue / redo_rate

RTO metric query

 
;WITH 
	AG_Stats AS 
			(
			SELECT AR.replica_server_name,
				   HARS.role_desc, 
				   Db_name(DRS.database_id) [DBName], 
				   DRS.redo_queue_size redo_queue_size_KB,
				   DRS.redo_rate redo_rate_KB_Sec
			FROM   sys.dm_hadr_database_replica_states DRS 
			INNER JOIN sys.availability_replicas AR ON DRS.replica_id = AR.replica_id 
			INNER JOIN sys.dm_hadr_availability_replica_states HARS ON AR.group_id = HARS.group_id 
				AND AR.replica_id = HARS.replica_id 
			),
	Pri_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
					, redo_queue_size_KB
					, redo_rate_KB_Sec
			FROM	AG_Stats
			WHERE	role_desc = 'PRIMARY'
			),
	Sec_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
					--Send queue and rate will be NULL if secondary is not online and synchronizing
					, redo_queue_size_KB
					, redo_rate_KB_Sec
			FROM	AG_Stats
			WHERE	role_desc = 'SECONDARY'
			)
SELECT p.replica_server_name [primary_replica]
	, p.[DBName] AS [DatabaseName]
	, s.replica_server_name [secondary_replica]
	, CAST(s.redo_queue_size_KB / NULLIF(s.redo_rate_KB_Sec, 0) AS BIGINT) [Redo_Lag_Secs] --NULLIF avoids divide-by-zero when the redo rate is 0
FROM Pri_CommitTime p
LEFT JOIN Sec_CommitTime s ON [s].[DBName] = [p].[DBName]
 

Synchronous performance

Everything discussed thus far has revolved around recovery in asynchronous commit mode. The final aspect of synchronization lag that will be covered is the performance impact of using synchronous commit mode. As mentioned above, synchronous commit mode guarantees zero data loss but you pay a performance price for that.

The impact to your transactions due to synchronization can be measured with performance monitor counters or wait types.

Calculations

Performance monitor counters

T_cost = T_transaction_delay / T_mirrored_write_transactions

Simple division of the transaction delay counter by the mirrored write transactions counter will provide you with the cost of enabling synchronous commit in units of time. I prefer this method over the wait types method that I will demonstrate next because it can be measured at the database level and accounts for implicit transactions. What I mean by that is, if I run a single INSERT statement with one million rows, it will calculate the delay induced on each of the rows. The wait types method would see the single insert as one action and provide you with the delay caused across all million rows. This difference is moot for the majority of OLTP systems because they typically have larger quantities of smaller transactions.

Wait type – HADR_SYNC_COMMIT

T_cost = T_wait_time / T_waiting_tasks_count

The wait type counter is cumulative, which means that you will need to extract snapshots in time and find their differences, or perform the calculation based on all activity since the SQL Server instance was last restarted.

Synchronization metric queries

Performance monitor counters method

NOTE: This script is much longer than the previous ones. That is because I chose to demonstrate how you would sample the performance counters and calculate over a recent window of time. This metric could also be accomplished with the up-time calculation demonstrated above.

 
--Check metrics first
 
IF OBJECT_ID('tempdb..#perf') IS NOT NULL
	DROP TABLE #perf
 
SELECT IDENTITY (int, 1,1) id
	,instance_name
	,CAST(cntr_value * 1000 AS DECIMAL(19,2)) [mirrorWriteTrnsMS]
	,CAST(NULL AS DECIMAL(19,2)) [trnDelayMS]
INTO #perf
FROM sys.dm_os_performance_counters perf
WHERE perf.counter_name LIKE 'Mirrored Write Transactions/sec%'
	AND object_name LIKE 'SQLServer:Database Replica%'
	
UPDATE p
SET p.[trnDelayMS] = perf.cntr_value
FROM #perf p
INNER JOIN sys.dm_os_performance_counters perf ON p.instance_name = perf.instance_name
WHERE perf.counter_name LIKE 'Transaction Delay%'
	AND object_name LIKE 'SQLServer:Database Replica%'
	AND trnDelayMS IS NULL
 
-- Wait for recheck
-- I found that these performance counters do not update frequently,
-- thus the long delay between checks.
WAITFOR DELAY '00:05:00'
GO
--Check metrics again
 
INSERT INTO #perf
(
	instance_name
	,mirrorWriteTrnsMS
	,trnDelayMS
)
SELECT instance_name
	,CAST(cntr_value * 1000 AS DECIMAL(19,2)) [mirrorWriteTrnsMS]
	,NULL
FROM sys.dm_os_performance_counters perf
WHERE perf.counter_name LIKE 'Mirrored Write Transactions/sec%'
	AND object_name LIKE 'SQLServer:Database Replica%'
	
UPDATE p
SET p.[trnDelayMS] = perf.cntr_value
FROM #perf p
INNER JOIN sys.dm_os_performance_counters perf ON p.instance_name = perf.instance_name
WHERE perf.counter_name LIKE 'Transaction Delay%'
	AND object_name LIKE 'SQLServer:Database Replica%'
	AND trnDelayMS IS NULL
	
--Aggregate and present
 
;WITH AG_Stats AS 
			(
			SELECT AR.replica_server_name,
				   HARS.role_desc, 
				   Db_name(DRS.database_id) [DBName]
			FROM   sys.dm_hadr_database_replica_states DRS 
			INNER JOIN sys.availability_replicas AR ON DRS.replica_id = AR.replica_id 
			INNER JOIN sys.dm_hadr_availability_replica_states HARS ON AR.group_id = HARS.group_id 
				AND AR.replica_id = HARS.replica_id 
			),
	Check1 AS
			(
			SELECT DISTINCT p1.instance_name
				,p1.mirrorWriteTrnsMS
				,p1.trnDelayMS
			FROM #perf p1
			INNER JOIN 
				(
					SELECT instance_name, MIN(id) minId
					FROM #perf p2
					GROUP BY instance_name
				) p2 ON p1.instance_name = p2.instance_name AND p1.id = p2.minId --keep only the first sample per instance
			),
	Check2 AS
			(
			SELECT DISTINCT p1.instance_name
				,p1.mirrorWriteTrnsMS
				,p1.trnDelayMS
			FROM #perf p1
			INNER JOIN 
				(
					SELECT instance_name, MAX(id) maxId
					FROM #perf p2
					GROUP BY instance_name
				) p2 ON p1.instance_name = p2.instance_name AND p1.id = p2.maxId --keep only the second sample per instance
			),
	AggregatedChecks AS
			(
				SELECT DISTINCT c1.instance_name
					, c2.mirrorWriteTrnsMS - c1.mirrorWriteTrnsMS mirrorWriteTrnsMS
					, c2.trnDelayMS - c1.trnDelayMS trnDelayMS
				FROM Check1 c1
				INNER JOIN Check2 c2 ON c1.instance_name = c2.instance_name
			),
	Pri_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
			FROM	AG_Stats
			WHERE	role_desc = 'PRIMARY'
			),
	Sec_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
			FROM	AG_Stats
			WHERE	role_desc = 'SECONDARY'
			)
SELECT p.replica_server_name [primary_replica]
	, p.[DBName] AS [DatabaseName]
	, s.replica_server_name [secondary_replica]
	, CAST(CASE WHEN ac.trnDelayMS = 0 THEN 1 ELSE ac.trnDelayMS END AS DECIMAL(19,2)) / NULLIF(ac.mirrorWriteTrnsMS, 0) sync_lag_MS
FROM Pri_CommitTime p
LEFT JOIN Sec_CommitTime s ON [s].[DBName] = [p].[DBName]
LEFT JOIN AggregatedChecks ac ON ac.instance_name = p.DBName
 

Wait types method

NOTE: For brevity I did not use the above two-check method to find the recent wait types, but that method can be implemented if you choose to use it.

 
  ;WITH AG_Stats AS 
			(
			SELECT AR.replica_server_name,
				   HARS.role_desc, 
				   Db_name(DRS.database_id) [DBName]
			FROM   sys.dm_hadr_database_replica_states DRS 
			INNER JOIN sys.availability_replicas AR ON DRS.replica_id = AR.replica_id 
			INNER JOIN sys.dm_hadr_availability_replica_states HARS ON AR.group_id = HARS.group_id 
				AND AR.replica_id = HARS.replica_id 
			),
	Waits AS
			(
			select wait_type
				, waiting_tasks_count
				, wait_time_ms
				, wait_time_ms/waiting_tasks_count sync_lag_MS
			from sys.dm_os_wait_stats where waiting_tasks_count >0
			and wait_type = 'HADR_SYNC_COMMIT'
			),
	Pri_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
			FROM	AG_Stats
			WHERE	role_desc = 'PRIMARY'
			),
	Sec_CommitTime AS 
			(
			SELECT	replica_server_name
					, DBName
			FROM	AG_Stats
			WHERE	role_desc = 'SECONDARY'
			)
SELECT p.replica_server_name [primary_replica]
	, w.sync_lag_MS
FROM Pri_CommitTime p
CROSS APPLY Waits w
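
As the note above mentions, the two-check approach can also be applied to the wait statistics so that the result reflects recent activity rather than everything since the last restart. A minimal sketch, assuming a one-minute sampling window:

--Snapshot HADR_SYNC_COMMIT, wait, snapshot again, then calculate the delta.
IF OBJECT_ID('tempdb..#sync_waits') IS NOT NULL
	DROP TABLE #sync_waits;

SELECT wait_type, waiting_tasks_count, wait_time_ms, 1 AS sample_id
INTO #sync_waits
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_SYNC_COMMIT';

WAITFOR DELAY '00:01:00';

INSERT INTO #sync_waits
SELECT wait_type, waiting_tasks_count, wait_time_ms, 2
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_SYNC_COMMIT';

SELECT (s2.wait_time_ms - s1.wait_time_ms)
	/ NULLIF(s2.waiting_tasks_count - s1.waiting_tasks_count, 0) AS sync_lag_MS
FROM #sync_waits s1
INNER JOIN #sync_waits s2 ON s1.wait_type = s2.wait_type
WHERE s1.sample_id = 1
	AND s2.sample_id = 2;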
 

Take-away

At this point, you should be ready to select a measurement method for your asynchronous or synchronous commit AGs and implement baselining and monitoring. I prefer the log send queue method for checking on potential data loss and the performance monitor counter method of measuring the performance impact of your synchronous commit replicas.

Translated from: https://www.sqlshack.com/measuring-availability-group-synchronization-lag/
