Oyente IOCP sin rendimiento

11

¿Alguien sabe lo que indica "Oyente IOCP sin rendimiento"?

Uno de nuestros servidores SQL acaba de tener un volcado de comprobación de errores:

=====================================================================                                            
       BugCheck Dump                                                                                             
=====================================================================                                            

This file is generated by Microsoft SQL Server                                                                   
version 9.00.5292.00                                                                                             
upon detection of fatal unexpected error. Please return this file,                                               
the query or program that produced the bugcheck, the database and                                                
the error log, and any other pertinent information with a Service Request.                                       


Computer type is AT/AT COMPATIBLE.                                                                               
Bios Version is DELL   - 1                                                                                       
Phoenix ROM BIOS PLUS Version 1.10 1.5.2                                                                         
Current time is 23:01:04 09/07/12.                                                                               
48 Unknown CPU 9., 2 Mhz processor (s).                                                                          
Windows NT 6.1 Build 7601 CSD Service Pack 1.                                                                    

Memory                               
MemoryLoad = 81%                     
Total Physical = 524278 MB           
Available Physical = 97549 MB        
Total Page File = 524276 MB          
Available Page File = 94472 MB       
Total Virtual = 8388607 MB           
Available Virtual = 7846765 MB       
**Dump thread - spid = 0, PSS = 0x0000000000000000, EC = 0x0000000000000000                                      
***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0008.txt              
* *******************************************************************************                                
*                                                                                                                
* BEGIN STACK DUMP:                                                                                              
*   09/07/12 23:01:04 spid 0                                                                                     
*                                                                                                                
* Non-yielding IOCP Listener                                                                                     
*                                                                                                                
* *******************************************************************************             </pre>                   


SQLDump0008.log contains:

<pre>
 No user action is required.
2012-09-07 18:30:11.28 spid782     Recovery of any in-doubt distributed transactions involving Microsoft Distributed Transaction Coordinator (MS DTC) has completed. This is an informational message only. No user action is required.
2012-09-07 20:58:54.53 spid196     The alert for 'average delay' has been raised. The current value of '509' surpasses the threshold '100'.
2012-09-07 20:59:24.74 spid477     The alert for 'average delay' has been raised. The current value of '299' surpasses the threshold '100'.
2012-09-07 21:44:06.53 spid23s     Database mirroring is inactive for database 'ToDoLists'. This is an informational message only. No user action is required.
2012-09-07 21:44:06.59 spid456     The alert for 'average delay' has been raised. The current value of '518' surpasses the threshold '100'.
2012-09-07 21:44:57.98 spid425     Error: 18056, Severity: 20, State: 27.
2012-09-07 21:44:57.98 spid425     The client was unable to reuse a session with SPID 425, which had been reset for connection pooling. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
2012-09-07 21:44:57.98 spid808     Error: 18056, Severity: 20, State: 27.
2012-09-07 21:44:57.98 spid808     The client was unable to reuse a session with SPID 808, which had been reset for connection pooling. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
2012-09-07 21:44:58.01 spid155     Error: 18056, Severity: 20, State: 27.
2012-09-07 21:44:58.01 spid155     The client was unable to reuse a session with SPID 155, which had been reset for connection pooling. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
2012-09-07 21:44:58.03 spid486     Task (Worker 0x00000001B93B21C0) was forced to yield 2 times: 
2012-09-07 21:44:58.04 spid65s     Database mirroring is inactive for database 'Tracking'. This is an informational message only. No user action is required.
2012-09-07 21:44:58.06 spid486     Task (Worker 0x0000000CB9B341C0) was forced to yield 8 times: 
2012-09-07 21:44:58.09 spid486     Task (Worker 0x0000000655A9E1C0) was forced to yield 3 times: 
2012-09-07 21:44:58.10 spid486     Task (Worker 0x00000006C03BE1C0) was forced to yield 8 times: 
2012-09-07 21:44:58.11 spid65s     Error: 1404, Severity: 16, State: 6.
2012-09-07 21:44:58.11 spid65s     The command failed because the database mirror is busy. Reissue the command later.
2012-09-07 21:44:58.11 spid486     Task (Worker 0x0000000C819D01C0) was forced to yield 2 times: 
2012-09-07 21:44:58.49 spid140     The alert for 'average delay' has been raised. The current value of '191' surpasses the threshold '100'.
2012-09-07 21:45:00.66 spid46s     SQL Server has encountered 6 occurrence(s) of cachestore flush for the 'Object Plans' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
2012-09-07 21:45:17.25 spid83s     SQL Server has encountered 6 occurrence(s) of cachestore flush for the 'SQL Plans' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
2012-09-07 21:45:17.25 spid54s     SQL Server has encountered 6 occurrence(s) of cachestore flush for the 'Bound Trees' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
2012-09-07 21:45:17.28 spid45s     The mirrored database "Tracking" is changing roles from "PRINCIPAL" to "MIRROR" due to Role Syncronization.
2012-09-07 21:45:17.61 spid46s     Bypassing recovery for database 'Tracking' because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.
2012-09-07 21:45:29.21 spid45s     Database mirroring is active with database 'Tracking' as the mirror copy. This is an informational message only. No user action is required.
2012-09-07 21:50:56.94 spid196s    SQL Server has encountered 5 occurrence(s) of cachestore flush for the 'Object Plans' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
2012-09-07 21:50:57.14 spid196s    SQL Server has encountered 5 occurrence(s) of cachestore flush for the 'SQL Plans' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
2012-09-07 21:50:57.14 spid196s    SQL Server has encountered 5 occurrence(s) of cachestore flush for the 'Bound Trees' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
2012-09-07 23:00:09.42 spid438     Error: 18056, Severity: 20, State: 27.
2012-09-07 23:00:09.42 spid438     The client was unable to reuse a session with SPID 438, which had been reset for connection pooling. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
2012-09-07 23:01:04.26 Server      Using 'dbghelp.dll' version '4.0.5'
2012-09-07 23:01:04.29 Server      **Dump thread - spid = 0, PSS = 0x0000000000000000, EC = 0x0000000000000000
2012-09-07 23:01:04.29 Server      ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0008.txt
2012-09-07 23:01:04.29 Server      * *******************************************************************************
2012-09-07 23:01:04.29 Server      *
2012-09-07 23:01:04.29 Server      * BEGIN STACK DUMP:
2012-09-07 23:01:04.29 Server      *   09/07/12 23:01:04 spid 0
2012-09-07 23:01:04.29 Server      *
2012-09-07 23:01:04.29 Server      * Non-yielding IOCP Listener
2012-09-07 23:01:04.29 Server      *
2012-09-07 23:01:04.29 Server      * *******************************************************************************
2012-09-07 23:01:04.29 Server      * -------------------------------------------------------------------------------
2012-09-07 23:01:04.29 Server      * Short Stack Dump
2012-09-07 23:01:04.33 spid73      The alert for 'average delay' has been raised. The current value of '304' surpasses the threshold '100'.
2012-09-07 23:01:04.34 Server      Stack Signature for the dump is 0x00000000000002E8

La alerta para el mensaje de "retraso promedio" se relaciona con la duplicación de la base de datos y es una alerta que se genera cuando el tiempo necesario para confirmar las transacciones supera el tiempo especificado. La is_event_loggedcolumna es 0 para la alerta de "retraso promedio".

Puede ver el resultado sys.configurationsen ¿Qué puede hacer que una sesión de duplicación agote el tiempo de espera y luego la conmutación por error? .

Max Vernon
fuente

Respuestas:

9

IOCP es un puerto de finalización de E / S. Un escucha IOCP sin rendimiento significa que el subproceso que maneja las rutinas de finalización de IO tardó (relativamente) mucho tiempo en hacer algo, y SQLOS pensó que podría estar bloqueado / bloqueado / lo que sea.

El servidor SQL hace mucho ASYNC IO. La forma en que funciona es cuando envía la solicitud de IO al sistema operativo, dice "Haga esto IO de forma asincrónica. Aquí hay un puntero de función para llamar cuando se hace".

La función que se llama es el oyente de finalización de E / S.

Considere una página leída del disco. Un hilo que ejecuta una selección necesita leer una página que no está en la memoria. Se necesita un PAGEIOLATCH, emite el IO asíncrono a las ventanas para leer la página y se va a dormir.

Cuando el sistema operativo finaliza el IO, llama a la función IOCP que marca el IO como "hecho". Poco después, un subproceso sql termina su cuántica de 4 ms y comprueba si las IO se manejan. Lo marca como hecho y le indica al hilo emisor que se active. El hilo SELECT se programa, libera el PAGEIOLATCH y la vida es buena.

Ahora, la cantidad de trabajo que realiza el IOCP varía según el tipo de IO involucrado. Creo que con DB Mirroring, funciona más de lo que lo haría si solo estuviera leyendo una página en el grupo de búferes.

Si es un programador que trabaja en un servidor sql y desea optimizar el código de duplicación de DB, es posible que tenga la tentación de poner más trabajo en la ruta de código de duplicación de IOCP en comparación con la ruta de código de hebra del sistema SQLOS.

O tal vez el IOCP necesita copiar los datos en un búfer de duplicación que es de tamaño fijo, y se queda en un bucle hasta que esté listo.

O tal vez <> sucede, y la función IOCP parece "atascada".

No me preocuparía por esto si sucediera durante una conmutación por error y sucediera mucha actividad de LOG. Si sucede de manera consistente, puede requerir más investigación.

StrayCatDBA
fuente