SQL Server: múltiples totales acumulados

Tengo una tabla base con transacciones y necesito crear una tabla con totales acumulados. Necesito que sean por cuenta y que también tengan algunos totales acumulados para cada cuenta (según el tipo de transacción), y dentro de eso, algunos totales acumulados por subcuenta.

Mi tabla base tiene estos campos (más o menos):

AccountID  |  SubAccountID   |  TransactionType  |  TransactionAmount

Teniendo en cuenta que tengo alrededor de 4 tipos de totales acumulados por Cuenta / TransactionType y 2 totales acumulados más por Cuenta / SubCuenta / TransactionType, y tengo alrededor de 2 millones de cuentas con aproximadamente 10 subcuentas cada una, y obtengo aproximadamente 10K transacciones cada minuto (a carga máxima), ¿cómo lo harías?

También es imprescindible que esto se ejecute de forma asincrónica a través de un trabajo de SQL, creando las agregaciones sin ser parte de las transacciones en sí.

Estoy bastante atrapado usando un cursor aquí, lo que lleva demasiado tiempo. Realmente agradecería cualquier consejo / artículo que haga más o menos lo mismo.

sql-server performance query scalability AvnerSo
fuente

El enfoque estándar de contabilidad es mantener los totales acumulados ya en una tabla. Almacene con cada transacción no solo el valor anterior sino también el nuevo valor de la cuenta. No está atascado con un cursor aquí, ya que esto se puede hacer en una instrucción SELECT sql.

TomTom

¿Está utilizando SQL Server 2000 o existen otras restricciones que le impiden utilizar las funciones de la ventana (ROW_NUMBER, RANK, etc.)?

Bryan

Nuestro sistema de contabilidad ha tenido problemas cuando los totales acumulados se almacenaron en una tabla física separada. El software de nuestro proveedor podría actualizar las transacciones reales sin actualizar la tabla de saldo real, lo que da como resultado un saldo operativo fuera de control. Un sistema bien diseñado puede evitar esto, pero tenga cuidado y considere cuán importante es la precisión si adopta el enfoque de tabla separada.

Ben Brocka

¿Por qué es un requisito y qué se intenta lograr? Dependiendo de la necesidad, probablemente podría consultar la tabla de transacciones a pedido para los datos especificados ('actuales') y mover / agregar filas al final del día (almacenamiento de datos, para lo que estoy seguro que SQL Server proporciona utilidades).

Clockwork-Muse

Estoy limitado a SQL Server 2005. No tengo que tener el último total siempre exacto, pero sí tengo que mantener todos los totales acumulados para cada acción realizada: una tabla de "Historial". TomTom: no mantendré esto con la tabla original. Necesito algunos totales acumulados de diferentes tipos de transacciones y no pertenecen a la tabla original. No creo que esto se pueda hacer solo con un SELECT: es un cursor o un bucle while. Me encantaría aprender lo contrario. X-Zero: este es un tipo de procedimiento de almacenamiento de datos. Solo necesito hacerlo cada minuto y no una vez al día.

AvnerSo

Respuestas:

Asíncrono implica que los totales acumulados no necesitan ser completamente exactos en todo momento, o sus patrones de cambio de datos son tales que una compilación total consecutiva será válida y precisa hasta la próxima carga. De todos modos, estoy seguro de que has pensado en eso, así que no voy a insistir.

Sus opciones principales para un método compatible de alto rendimiento son una función / procedimiento SQLCLR o un UPDATEmétodo de iteración basado en conjuntos basado en Hugo Kornelis. El método SQLCLR (implementado en un procedimiento, pero razonablemente fácil de traducir) se puede encontrar aquí .

No he podido encontrar el método de Hugo en línea, pero se detalla en el excelente MVP Deep Dives (Volumen 1). A continuación se muestra un código de muestra para ilustrar el método de Hugo (copiado de una de mis publicaciones en otro sitio para el que no tiene un inicio de sesión):

-- A work table to hold the reformatted data, and
-- ultimately, the results
CREATE  TABLE #Work
    (
    Acct_No         VARCHAR(20) NOT NULL,
    MonthDate       DATETIME NOT NULL,
    MonthRate       DECIMAL(19,12) NOT NULL,
    Amount          DECIMAL(19,12) NOT NULL,
    InterestAmount  DECIMAL(19,12) NOT NULL,
    RunningTotal    DECIMAL(19,12) NOT NULL,
    RowRank         BIGINT NOT NULL
    );

-- Prepare the set-based iteration method
WITH    Accounts
AS      (
        -- Get a list of the account numbers
        SELECT  DISTINCT Acct_No 
        FROM    #Refunds
        ),
        Rates
AS      (
        -- Apply all the accounts to all the rates
        SELECT  A.Acct_No,
                R.[Year],
                R.[Month],
                MonthRate = R.InterestRate / 12
        FROM    #InterestRates R
        CROSS 
        JOIN    Accounts A
        ),
        BaseData
AS      (
        -- The basic data we need to work with
        SELECT  Acct_No = ISNULL(R.Acct_No,''),
                MonthDate = ISNULL(DATEADD(MONTH, R.[Month], DATEADD(YEAR, R.[year] - 1900, 0)), 0),
                R.MonthRate,
                Amount = ISNULL(RF.Amount,0),
                InterestAmount = ISNULL(RF.Amount,0) * R.MonthRate,
                RunningTotal = ISNULL(RF.Amount,0)
        FROM    Rates R
        LEFT
        JOIN    #Refunds RF
                ON  RF.Acct_No = R.Acct_No
                AND RF.[Year] = R.[Year]
                AND RF.[Month] = R.[Month]
        )
-- Basic data plus a rank id, numbering the rows by MonthDate, and resetting to 1 for each new Account
INSERT  #Work
        (Acct_No, MonthDate, MonthRate, Amount, InterestAmount, RunningTotal, RowRank)
SELECT  BD.Acct_No, BD.MonthDate, BD.MonthRate, BD.Amount, BD.InterestAmount, BD.RunningTotal,
        RowRank = RANK() OVER (PARTITION BY BD.Acct_No ORDER BY MonthDate)
FROM    BaseData BD;

-- An index to speed the next stage (different from that used with the Quirky Update method)
CREATE UNIQUE CLUSTERED INDEX nc1 ON #Work (RowRank, Acct_No);

-- Iteration variables
DECLARE @Rank       BIGINT,
        @RowCount   INTEGER;

-- Initialize
SELECT  @Rank = 1,
        @RowCount = 1;

-- This is the iteration bit, processes a rank id per iteration
-- The number of rows processed with each iteration is equal to the number of groups in the data
-- More groups --> greater efficiency
WHILE   (1 = 1)
BEGIN
        SET @Rank = @Rank + 1;

        -- Set-based update with running totals for the current rank id
        UPDATE  This
        SET     InterestAmount = (Previous.RunningTotal + This.Amount) * This.MonthRate,
                RunningTotal = Previous.RunningTotal + This.Amount + (Previous.RunningTotal + This.Amount) * This.MonthRate
        FROM    #Work This
        JOIN    #Work Previous
                ON  Previous.Acct_No = This.Acct_No
                AND Previous.RowRank = @Rank - 1
        WHERE   This.RowRank = @Rank;

        IF  (@@ROWCOUNT = 0) BREAK;
END;

-- Show the results in natural order
SELECT  *
FROM    #Work
ORDER   BY
        Acct_No, RowRank;

En SQL Server 2012, puede usar las extensiones de función de ventanas, por ejemplo SUM OVER (ORDER BY).

Paul White 9
fuente

No estoy seguro de por qué quieres asíncrono, pero un par de vistas indexadas suena como el ticket aquí. Si desea un SUM simple para algún grupo que sea: defina el total acumulado.

Si realmente desea asincrónico, con 160 filas nuevas por segundo, sus totales acumulados siempre estarán desactualizados. Asíncrono significaría que no hay disparadores o vistas indexadas

gbn
fuente

Calcular los totales acumulados es notoriamente lento, ya sea que lo haga con un cursor o con una unión triangular. Es muy tentador desnormalizar, almacenar totales acumulados en una columna, especialmente si lo selecciona con frecuencia. Sin embargo, como es habitual cuando se desnormaliza, debe garantizar la integridad de sus datos desnormalizados. Afortunadamente, puede garantizar la integridad de los totales acumulados con restricciones: siempre y cuando todas sus restricciones sean confiables, todos sus totales correctos son correctos.

También de esta manera puede asegurarse fácilmente de que el saldo actual (totales acumulados) nunca sea negativo: la aplicación por otros métodos también puede ser muy lenta. El siguiente script demuestra la técnica.

    CREATE TABLE Data.Inventory(InventoryID INT NOT NULL IDENTITY,
      ItemID INT NOT NULL,
      ChangeDate DATETIME NOT NULL,
      ChangeQty INT NOT NULL,
      TotalQty INT NOT NULL,
      PreviousChangeDate DATETIME NULL,
      PreviousTotalQty INT NULL,
      CONSTRAINT PK_Inventory PRIMARY KEY(ItemID, ChangeDate),
      CONSTRAINT UNQ_Inventory UNIQUE(ItemID, ChangeDate, TotalQty),
      CONSTRAINT UNQ_Inventory_Previous_Columns UNIQUE(ItemID, PreviousChangeDate, PreviousTotalQty),
      CONSTRAINT FK_Inventory_Self FOREIGN KEY(ItemID, PreviousChangeDate, PreviousTotalQty)
        REFERENCES Data.Inventory(ItemID, ChangeDate, TotalQty),
      CONSTRAINT CHK_Inventory_Valid_TotalQty CHECK(TotalQty >= 0 AND (TotalQty = COALESCE(PreviousTotalQty, 0) + ChangeQty)),
      CONSTRAINT CHK_Inventory_Valid_Dates_Sequence CHECK(PreviousChangeDate < ChangeDate),
      CONSTRAINT CHK_Inventory_Valid_Previous_Columns CHECK((PreviousChangeDate IS NULL AND PreviousTotalQty IS NULL)
                OR (PreviousChangeDate IS NOT NULL AND PreviousTotalQty IS NOT NULL))
    );
    GO
    -- beginning of inventory for item 1
    INSERT INTO Data.Inventory(ItemID,
      ChangeDate,
      ChangeQty,
      TotalQty,
      PreviousChangeDate,
      PreviousTotalQty)
    VALUES(1, '20090101', 10, 10, NULL, NULL);
    -- cannot begin the inventory for the second time for the same item 1
    INSERT INTO Data.Inventory(ItemID,
      ChangeDate,
      ChangeQty,
      TotalQty,
      PreviousChangeDate,
      PreviousTotalQty)
    VALUES(1, '20090102', 10, 10, NULL, NULL);


Msg 2627, Level 14, State 1, Line 10
Violation of UNIQUE KEY constraint 'UNQ_Inventory_Previous_Columns'. Cannot insert duplicate key in object 'Data.Inventory'.
The statement has been terminated.

-- add more
DECLARE @ChangeQty INT;
SET @ChangeQty = 5;
INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)
SELECT TOP 1 ItemID, '20090103', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

SET @ChangeQty = 3;
INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)
SELECT TOP 1 ItemID, '20090104', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

SET @ChangeQty = -4;
INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)
SELECT TOP 1 ItemID, '20090105', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

-- try to violate chronological order

SET @ChangeQty = 5;
INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)
SELECT TOP 1 ItemID, '20081231', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

Msg 547, Level 16, State 0, Line 4
The INSERT statement conflicted with the CHECK constraint "CHK_Inventory_Valid_Dates_Sequence". The conflict occurred in database "Test", table "Data.Inventory".
The statement has been terminated.


SELECT ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;

ChangeDate              ChangeQty   TotalQty    PreviousChangeDate      PreviousTotalQty
----------------------- ----------- ----------- ----------------------- -----
2009-01-01 00:00:00.000 10          10          NULL                    NULL
2009-01-03 00:00:00.000 5           15          2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3           18          2009-01-03 00:00:00.000 15
2009-01-05 00:00:00.000 -4          14          2009-01-04 00:00:00.000 18


-- try to change a single row, all updates must fail
UPDATE Data.Inventory SET ChangeQty = ChangeQty + 2 WHERE InventoryID = 3;
UPDATE Data.Inventory SET TotalQty = TotalQty + 2 WHERE InventoryID = 3;
-- try to delete not the last row, all deletes must fail
DELETE FROM Data.Inventory WHERE InventoryID = 1;
DELETE FROM Data.Inventory WHERE InventoryID = 3;

-- the right way to update

DECLARE @IncreaseQty INT;
SET @IncreaseQty = 2;
UPDATE Data.Inventory SET ChangeQty = ChangeQty + CASE WHEN ItemID = 1 AND ChangeDate = '20090103' THEN @IncreaseQty ELSE 0 END,
  TotalQty = TotalQty + @IncreaseQty,
  PreviousTotalQty = PreviousTotalQty + CASE WHEN ItemID = 1 AND ChangeDate = '20090103' THEN 0 ELSE @IncreaseQty END
WHERE ItemID = 1 AND ChangeDate >= '20090103';

SELECT ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;

ChangeDate              ChangeQty   TotalQty    PreviousChangeDate      PreviousTotalQty
----------------------- ----------- ----------- ----------------------- ----------------
2009-01-01 00:00:00.000 10          10          NULL                    NULL
2009-01-03 00:00:00.000 7           17          2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3           20          2009-01-03 00:00:00.000 17
2009-01-05 00:00:00.000 -4          16          2009-01-04 00:00:00.000 20

Copiado de mi blog

Alaska
fuente