Optimizar una jerarquía CTE

Actualización a continuación

Tengo una tabla de cuentas con una arquitectura típica de cuenta principal / cuenta para representar una jerarquía de cuentas (SQL Server 2012). Creé una VIEW usando un CTE para descifrar la jerarquía, y en general funciona de maravilla y como se esperaba. Puedo consultar la jerarquía en cualquier nivel y ver las ramas fácilmente.

Hay un campo de lógica de negocios que debe devolverse en función de la jerarquía. Un campo en cada registro de cuenta describe el tamaño de la empresa (lo llamaremos CustomerCount). La lógica sobre la que necesito informar debe acumular CustomerCount desde toda la sucursal. En otras palabras, dada una cuenta, necesito resumir los valores de cuenta de clientes para esa cuenta junto con cada hijo en cada rama debajo de la cuenta a lo largo de la jerarquía.

Calculé con éxito el campo utilizando un campo de jerarquía integrado dentro del CTE, que se parece a acct4.acct3.acct2.acct1. El problema con el que me encuentro es simplemente hacerlo correr rápido. Sin este campo calculado, la consulta se ejecuta en ~ 3 segundos. Cuando agrego el campo calculado, se convierte en una consulta de 4 minutos.

Aquí está la mejor versión que he podido encontrar que devuelve los resultados correctos. Estoy buscando ideas sobre cómo puedo reestructurar esto COMO UNA VISTA sin tan grandes sacrificios por el rendimiento.

Entiendo la razón por la cual esto va lento (requiere calcular un predicado en la cláusula where), pero no puedo pensar en otra forma de estructurarlo y aún obtener los mismos resultados.

Aquí hay un código de muestra para construir una tabla y hacer el CTE más o menos exactamente como funciona en mi entorno.

Use Tempdb
go
CREATE TABLE dbo.Account
(
   Acctid varchar(1) NOT NULL
    , Name varchar(30) NULL
    , ParentId varchar(1) NULL
    , CustomerCount int NULL
);

INSERT Account
SELECT 'A','Best Bet',NULL,21  UNION ALL
SELECT 'B','eStore','A',30 UNION ALL
SELECT 'C','Big Bens','B',75 UNION ALL
SELECT 'D','Mr. Jimbo','B',50 UNION ALL
SELECT 'E','Dr. John','C',100 UNION ALL
SELECT 'F','Brick','A',222 UNION ALL
SELECT 'G','Mortar','C',153 ;


With AccountHierarchy AS

(                                                                           --Root values have no parent
    SELECT
        Root.AcctId                                         AccountId
        , Root.Name                                         AccountName
        , Root.ParentId                                     ParentId
        , 1                                                 HierarchyLevel  
        , cast(Root.Acctid as varchar(4000))                IdHierarchy     --highest parent reads right to left as in id3.Acctid2.Acctid1
        , cast(replace(Root.Name,'.','') as varchar(4000))  NameHierarchy   --highest parent reads right to left as in name3.name2.name1 (replace '.' so name parse is easy in last step)
        , cast(Root.Acctid as varchar(4000))                HierarchySort   --reverse of above, read left to right name1.name2.name3 for sorting on reporting only
        , cast(Root.Name as varchar(4000))                  HierarchyLabel  --use for labels on reporting only, indents names under sorted hierarchy
        , Root.CustomerCount                                CustomerCount   

    FROM 
        tempdb.dbo.account Root

    WHERE
        Root.ParentID is null

    UNION ALL

    SELECT
        Recurse.Acctid                                      AccountId
        , Recurse.Name                                      AccountName
        , Recurse.ParentId                                  ParentId
        , Root.HierarchyLevel + 1                           HierarchyLevel  --next level in hierarchy
        , cast(cast(recurse.Acctid as varchar(40)) + '.' + Root.IdHierarchy as varchar(4000))   IdHierarchy --cast because in real system this is a uniqueidentifier type needs converting
        , cast(replace(recurse.Name,'.','') + '.' + Root.NameHierarchy as varchar(4000)) NameHierarchy  --replace '.' for parsing in last step, cast to make room for lots of sub levels down the hierarchy
        , cast(Root.AccountName + '.' + Recurse.Name as varchar(4000)) HierarchySort    
        , cast(space(root.HierarchyLevel * 4) + Recurse.Name as varchar(4000)) HierarchyLabel
        , Recurse.CustomerCount                             CustomerCount

    FROM
        tempdb.dbo.account Recurse INNER JOIN
        AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)


SELECT
    hier.AccountId
    , Hier.AccountName
    , hier.ParentId
    , hier.HierarchyLevel
    , hier.IdHierarchy
    , hier.NameHierarchy
    , hier.HierarchyLabel
    , parsename(hier.IdHierarchy,1) Acct1Id
    , parsename(hier.NameHierarchy,1) Acct1Name     --This is why we stripped out '.' during recursion
    , parsename(hier.IdHierarchy,2) Acct2Id
    , parsename(hier.NameHierarchy,2) Acct2Name
    , parsename(hier.IdHierarchy,3) Acct3Id
    , parsename(hier.NameHierarchy,3) Acct3Name
    , parsename(hier.IdHierarchy,4) Acct4Id
    , parsename(hier.NameHierarchy,4) Acct4Name
    , hier.CustomerCount

    /* fantastic up to this point. Next block of code is what causes problem. 
        Logic of code is "sum of CustomerCount for this location and all branches below in this branch of hierarchy"
        In live environment, goes from taking 3 seconds to 4 minutes by adding this one calc */

    , (
        SELECT  
            sum(children.CustomerCount)
        FROM
            AccountHierarchy Children
        WHERE
            hier.IdHierarchy = right(children.IdHierarchy, (1 /*length of id field*/ * hier.HierarchyLevel) + hier.HierarchyLevel - 1 /*for periods inbetween ids*/)
            --"where this location's idhierarchy is within child idhierarchy"
            --previously tried a charindex(hier.IdHierarchy,children.IdHierarchy)>0, but that performed even worse
        ) TotalCustomerCount
FROM
    AccountHierarchy hier

ORDER BY
    hier.HierarchySort


drop table tempdb.dbo.Account

20/11/2013 ACTUALIZACIÓN

Algunas de las soluciones sugeridas hicieron fluir mis jugos, y probé un nuevo enfoque que se acerca, pero presenta un obstáculo nuevo / diferente. Honestamente, no sé si esto garantiza una publicación separada o no, pero está relacionado con la solución de este problema.

Lo que decidí fue que lo que hacía difícil la suma (cuenta de clientes) es la identificación de los niños en el contexto de una jerarquía que comienza en la parte superior y se reduce. Así que comencé creando una jerarquía que se construye de abajo hacia arriba, usando la raíz definida por "cuentas que no son padre de ninguna otra cuenta" y haciendo la unión recursiva al revés (root.parentacctid = recurse.acctid)

De esta manera, podría agregar el recuento de clientes secundarios al principal a medida que ocurre la recursividad. Debido a la forma en que necesito informes y niveles, estoy haciendo esto de arriba hacia abajo además de arriba hacia abajo, luego solo me uní a ellos a través de la identificación de la cuenta. Este enfoque resulta ser mucho más rápido que la cuenta de clientes de la consulta externa original, pero me encontré con algunos obstáculos.

Primero, estaba capturando inadvertidamente el recuento de clientes duplicados para cuentas que son padres de varios hijos. Estaba contando el doble o el triple de clientes para algunos acctidos, por la cantidad de niños que había. Mi solución fue crear otro cte que cuente cuántos nodos tiene un acct y dividir el recuento.customercount durante la recursión, por lo que cuando sumo toda la rama, el acct no se cuenta dos veces.

Entonces, en este punto, los resultados de esta nueva versión no son correctos, pero sé por qué. El cte ascendente está creando duplicados. Cuando pasa la recursividad, busca cualquier cosa en la raíz (hijos de nivel inferior) que sea hijo de una cuenta en la tabla de cuentas. En la tercera recursión, recoge las mismas cuentas que hizo en la segunda y las vuelve a colocar.

¿Ideas sobre cómo hacer un cte de abajo hacia arriba, o esto hace que fluyan otras ideas?

Use Tempdb
go


CREATE TABLE dbo.Account
(
    Acctid varchar(1) NOT NULL
    , Name varchar(30) NULL
    , ParentId varchar(1) NULL
    , CustomerCount int NULL
);

INSERT Account
SELECT 'A','Best Bet',NULL,1  UNION ALL
SELECT 'B','eStore','A',2 UNION ALL
SELECT 'C','Big Bens','B',3 UNION ALL
SELECT 'D','Mr. Jimbo','B',4 UNION ALL
SELECT 'E','Dr. John','C',5 UNION ALL
SELECT 'F','Brick','A',6 UNION ALL
SELECT 'G','Mortar','C',7 ;



With AccountHierarchy AS

(                                                                           --Root values have no parent
    SELECT
        Root.AcctId                                         AccountId
        , Root.Name                                         AccountName
        , Root.ParentId                                     ParentId
        , 1                                                 HierarchyLevel  
        , cast(Root.Acctid as varchar(4000))                IdHierarchy     --highest parent reads right to left as in id3.Acctid2.Acctid1
        , cast(replace(Root.Name,'.','') as varchar(4000))  NameHierarchy   --highest parent reads right to left as in name3.name2.name1 (replace '.' so name parse is easy in last step)
        , cast(Root.Acctid as varchar(4000))                HierarchySort   --reverse of above, read left to right name1.name2.name3 for sorting on reporting only
        , cast(Root.Acctid as varchar(4000))                HierarchyMatch 
        , cast(Root.Name as varchar(4000))                  HierarchyLabel  --use for labels on reporting only, indents names under sorted hierarchy
        , Root.CustomerCount                                CustomerCount   

    FROM 
        tempdb.dbo.account Root

    WHERE
        Root.ParentID is null

    UNION ALL

    SELECT
        Recurse.Acctid                                      AccountId
        , Recurse.Name                                      AccountName
        , Recurse.ParentId                                  ParentId
        , Root.HierarchyLevel + 1                           HierarchyLevel  --next level in hierarchy
        , cast(cast(recurse.Acctid as varchar(40)) + '.' + Root.IdHierarchy as varchar(4000))   IdHierarchy --cast because in real system this is a uniqueidentifier type needs converting
        , cast(replace(recurse.Name,'.','') + '.' + Root.NameHierarchy as varchar(4000)) NameHierarchy  --replace '.' for parsing in last step, cast to make room for lots of sub levels down the hierarchy
        , cast(Root.AccountName + '.' + Recurse.Name as varchar(4000)) HierarchySort    
        , CAST(CAST(Root.HierarchyMatch as varchar(40)) + '.' 
            + cast(recurse.Acctid as varchar(40))   as varchar(4000))   HierarchyMatch
        , cast(space(root.HierarchyLevel * 4) + Recurse.Name as varchar(4000)) HierarchyLabel
        , Recurse.CustomerCount                             CustomerCount

    FROM
        tempdb.dbo.account Recurse INNER JOIN
        AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)

, Nodes as
(   --counts how many branches are below for any account that is parent to another
    select
        node.ParentId Acctid
        , cast(count(1) as float) Nodes
    from AccountHierarchy  node
    group by ParentId
)

, BottomUp as
(   --creates the hierarchy starting at accounts that are not parent to any other
    select
        Root.Acctid
        , root.ParentId
        , cast(isnull(root.customercount,0) as float) CustomerCount
    from
        tempdb.dbo.Account Root
    where
        not exists ( select 1 from tempdb.dbo.Account OtherAccts where root.Acctid = OtherAccts.ParentId)

    union all

    select
        Recurse.Acctid
        , Recurse.ParentId
        , root.CustomerCount + cast ((isnull(recurse.customercount,0) / nodes.nodes) as float) CustomerCount
        -- divide the recurse customercount by number of nodes to prevent duplicate customer count on accts that are parent to multiple children, see customercount cte next
    from
        tempdb.dbo.Account Recurse inner join 
        BottomUp Root on root.ParentId = recurse.acctid inner join
        Nodes on nodes.Acctid = recurse.Acctid
)

, CustomerCount as
(
    select
        sum(CustomerCount) TotalCustomerCount
        , hier.acctid
    from
        BottomUp hier
    group by 
        hier.Acctid
)


SELECT
    hier.AccountId
    , Hier.AccountName
    , hier.ParentId
    , hier.HierarchyLevel
    , hier.IdHierarchy
    , hier.NameHierarchy
    , hier.HierarchyLabel
    , hier.hierarchymatch
    , parsename(hier.IdHierarchy,1) Acct1Id
    , parsename(hier.NameHierarchy,1) Acct1Name     --This is why we stripped out '.' during recursion
    , parsename(hier.IdHierarchy,2) Acct2Id
    , parsename(hier.NameHierarchy,2) Acct2Name
    , parsename(hier.IdHierarchy,3) Acct3Id
    , parsename(hier.NameHierarchy,3) Acct3Name
    , parsename(hier.IdHierarchy,4) Acct4Id
    , parsename(hier.NameHierarchy,4) Acct4Name
    , hier.CustomerCount

    , customercount.TotalCustomerCount

FROM
    AccountHierarchy hier inner join
    CustomerCount on customercount.acctid = hier.accountid

ORDER BY
    hier.HierarchySort 



drop table tempdb.dbo.Account

sql-server sql-server-2012 optimization cte hígado.larson
fuente

¿Has intentado poner los resultados del AccountHierarchy CTE en una tabla temporal (indexada en IdHierarchy) ENTONCES haciendo el cálculo consultando desde la tabla temporal? Es posible que se encuentre con la forma en que se implementan los CTE; es posible que esté ejecutando todo el CTE una vez por CADA fila en el CTE.

Jon Boulineau

¿Cuáles son los índices en la tabla subyacente?

Mike Walsh

¿Y cuántas filas en la tabla real?

Mike Walsh

@MaxVernon Gracias. No he publicado mucho, pero definitivamente veo la diferencia en la calidad de las respuestas para preguntas vagas.

liver.larson el

@JonBoulineau He considerado probar algo con tablas temporales, pero estoy tratando específicamente de ejecutar esto como una vista, lo que excluye las tablas temporales. ¿Alguna idea sobre cómo moverse o probar su última afirmación?

liver.larson el

Respuestas:

Editar: este es el segundo intento

Basado en la respuesta de @Max Vernon, aquí hay una manera de evitar el uso de CTE dentro de una subconsulta en línea, que es como unirse al CTE y supongo que es la razón de la baja eficiencia. Utiliza funciones analíticas disponibles solo en la versión 2012 de SQL-Server. Probado en SQL-Fiddle

Esta parte se puede omitir de la lectura, es una copia y pega de la respuesta de Max:

;With AccountHierarchy AS
(                                                                           
    SELECT
        Root.AcctId                                         AccountId
        , Root.Name                                         AccountName
        , Root.ParentId                                     ParentId
        , 1                                                 HierarchyLevel  
        , cast(Root.Acctid as varchar(4000))                IdHierarchyMatch        
        , cast(Root.Acctid as varchar(4000))                IdHierarchy
        , cast(replace(Root.Name,'.','') as varchar(4000))  NameHierarchy   
        , cast(Root.Acctid as varchar(4000))                HierarchySort
        , cast(Root.Name as varchar(4000))                  HierarchyLabel          ,
        Root.CustomerCount                                  CustomerCount   

    FROM 
        account Root

    WHERE
        Root.ParentID is null

    UNION ALL

    SELECT
        Recurse.Acctid                                      AccountId
        , Recurse.Name                                      AccountName
        , Recurse.ParentId                                  ParentId
        , Root.HierarchyLevel + 1                           HierarchyLevel
        , CAST(CAST(Root.IdHierarchyMatch as varchar(40)) + '.' 
            + cast(recurse.Acctid as varchar(40))   as varchar(4000))   IdHierarchyMatch
        , cast(cast(recurse.Acctid as varchar(40)) + '.' 
            + Root.IdHierarchy  as varchar(4000))           IdHierarchy
        , cast(replace(recurse.Name,'.','') + '.' 
            + Root.NameHierarchy as varchar(4000))          NameHierarchy
        , cast(Root.AccountName + '.' 
            + Recurse.Name as varchar(4000))                HierarchySort   
        , cast(space(root.HierarchyLevel * 4) 
            + Recurse.Name as varchar(4000))                HierarchyLabel
        , Recurse.CustomerCount                             CustomerCount
    FROM
        account Recurse INNER JOIN
        AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)

Aquí ordenamos las filas del CTE usando IdHierarchyMatchy calculamos los números de fila y un total acumulado (desde la siguiente fila hasta el final).

, cte1 AS 
(
SELECT
    h.AccountId
    , h.AccountName
    , h.ParentId
    , h.HierarchyLevel
    , h.IdHierarchy
    , h.NameHierarchy
    , h.HierarchyLabel
    , parsename(h.IdHierarchy,1) Acct1Id
    , parsename(h.NameHierarchy,1) Acct1Name
    , parsename(h.IdHierarchy,2) Acct2Id
    , parsename(h.NameHierarchy,2) Acct2Name
    , parsename(h.IdHierarchy,3) Acct3Id
    , parsename(h.NameHierarchy,3) Acct3Name
    , parsename(h.IdHierarchy,4) Acct4Id
    , parsename(h.NameHierarchy,4) Acct4Name
    , h.CustomerCount
    , h.HierarchySort
    , h.IdHierarchyMatch
        , Rn = ROW_NUMBER() OVER 
                  (ORDER BY h.IdHierarchyMatch)
        , RunningCustomerCount = COALESCE(
            SUM(h.CustomerCount)
            OVER
              (ORDER BY h.IdHierarchyMatch
               ROWS BETWEEN 1 FOLLOWING
                        AND UNBOUNDED FOLLOWING)
          , 0) 
FROM
    AccountHierarchy AS h  
)

Luego tenemos un CTE intermedio más donde usamos los totales acumulados anteriores y los números de fila, básicamente para encontrar dónde apuntan los extremos de las ramas de la estructura de árbol:

, cte2 AS
(
SELECT
    cte1.*
    , rn3  = LAST_VALUE(Rn) OVER 
               (PARTITION BY Acct1Id, Acct2Id, Acct3Id 
                ORDER BY Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)       
    , rn2  = LAST_VALUE(Rn) OVER 
               (PARTITION BY Acct1Id, Acct2Id 
                ORDER BY Acct3Id, Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
    , rn1  = LAST_VALUE(Rn) OVER 
               (PARTITION BY Acct1Id 
                ORDER BY Acct2Id, Acct3Id, Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
    , rcc3 = LAST_VALUE(RunningCustomerCount) OVER 
               (PARTITION BY Acct1Id, Acct2Id, Acct3Id 
                ORDER BY Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)       
    , rcc2 = LAST_VALUE(RunningCustomerCount) OVER 
               (PARTITION BY Acct1Id, Acct2Id 
                ORDER BY Acct3Id, Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
    , rcc1 = LAST_VALUE(RunningCustomerCount) OVER 
               (PARTITION BY Acct1Id 
                ORDER BY Acct2Id, Acct3Id, Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
FROM
    cte1 
)

y finalmente construimos la última parte:

SELECT
    hier.AccountId
    , hier.AccountName
    ---                        -- columns skipped 
    , hier.CustomerCount

    , TotalCustomerCount = hier.CustomerCount
        + hier.RunningCustomerCount 
        - ca.LastRunningCustomerCount

    , hier.HierarchySort
    , hier.IdHierarchyMatch
FROM
    cte2 hier
  OUTER APPLY
    ( SELECT  LastRunningCustomerCount, Rn
      FROM
      ( SELECT LastRunningCustomerCount
              = RunningCustomerCount, Rn
        FROM (SELECT NULL a) x  WHERE 4 <= HierarchyLevel 
      UNION ALL
        SELECT rcc3, Rn3
        FROM (SELECT NULL a) x  WHERE 3 <= HierarchyLevel 
      UNION ALL
        SELECT rcc2, Rn2 
        FROM (SELECT NULL a) x  WHERE 2 <= HierarchyLevel 
      UNION ALL
        SELECT rcc1, Rn1
        FROM (SELECT NULL a) x  WHERE 1 <= HierarchyLevel 
      ) x
      ORDER BY Rn 
      OFFSET 0 ROWS
      FETCH NEXT 1 ROWS ONLY
      ) ca
ORDER BY
    hier.HierarchySort ;

Y una simplificación, usando lo mismo cte1que el código anterior. Prueba en SQL-Fiddle-2 . Tenga en cuenta que ambas soluciones funcionan bajo el supuesto de que tiene un máximo de cuatro niveles en su árbol:

SELECT
    hier.AccountId
    ---                      -- skipping rows
    , hier.CustomerCount

    , TotalCustomerCount = CustomerCount
        + RunningCustomerCount 
        - CASE HierarchyLevel
            WHEN 4 THEN RunningCustomerCount
            WHEN 3 THEN LAST_VALUE(RunningCustomerCount) OVER 
                   (PARTITION BY Acct1Id, Acct2Id, Acct3Id 
                    ORDER BY Acct4Id
                    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)       
            WHEN 2 THEN LAST_VALUE(RunningCustomerCount) OVER 
                   (PARTITION BY Acct1Id, Acct2Id 
                    ORDER BY Acct3Id, Acct4Id
                    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
            WHEN 1 THEN LAST_VALUE(RunningCustomerCount) OVER 
                   (PARTITION BY Acct1Id 
                    ORDER BY Acct2Id, Acct3Id, Acct4Id
                    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
          END

    , hier.HierarchySort
    , hier.IdHierarchyMatch
FROM cte1 AS hier
ORDER BY
    hier.HierarchySort ;

Un tercer enfoque, con un solo CTE, para la parte recursiva y luego solo funciones de agregado de ventana ( SUM() OVER (...)), por lo que debería funcionar en cualquier versión desde 2005 en adelante. Prueba en SQL-Fiddle-3 Esta solución asume, como las anteriores, que hay 4 niveles máximos en el árbol de jerarquía:

;WITH AccountHierarchy AS
(                                                                           
    SELECT
          AccountId      = Root.AcctId                                         
        , AccountName    = Root.Name                                         
        , ParentId       = Root.ParentId                                     
        , HierarchyLevel = 1                                                   
        , HierarchySort  = CAST(Root.Acctid AS VARCHAR(4000))                
        , HierarchyLabel = CAST(Root.Name AS VARCHAR(4000))                   
        , Acct1Id        = CAST(Root.Acctid AS VARCHAR(4000))                
        , Acct2Id        = CAST(NULL AS VARCHAR(4000))                       
        , Acct3Id        = CAST(NULL AS VARCHAR(4000))                       
        , Acct4Id        = CAST(NULL AS VARCHAR(4000))                       
        , Acct1Name      = CAST(Root.Name AS VARCHAR(4000))                  
        , Acct2Name      = CAST(NULL AS VARCHAR(4000))                       
        , Acct3Name      = CAST(NULL AS VARCHAR(4000))                       
        , Acct4Name      = CAST(NULL AS VARCHAR(4000))                       
        , CustomerCount  = Root.CustomerCount                                   

    FROM 
        account AS Root

    WHERE
        Root.ParentID IS NULL

    UNION ALL

    SELECT
          Recurse.Acctid 
        , Recurse.Name 
        , Recurse.ParentId 
        , Root.HierarchyLevel + 1 
        , CAST(Root.AccountName + '.' 
            + Recurse.Name AS VARCHAR(4000)) 
        , CAST(SPACE(Root.HierarchyLevel * 4) 
            + Recurse.Name AS VARCHAR(4000)) 
        , Root.Acct1Id 
        , CASE WHEN Root.HierarchyLevel = 1 
              THEN cast(Recurse.Acctid AS VARCHAR(4000)) 
              ELSE Root.Acct2Id 
          END 
        , CASE WHEN Root.HierarchyLevel = 2 
              THEN CAST(Recurse.Acctid AS VARCHAR(4000)) 
              ELSE Root.Acct3Id 
          END 
        , CASE WHEN Root.HierarchyLevel = 3 
              THEN CAST(Recurse.Acctid AS VARCHAR(4000)) 
              ELSE Root.Acct4Id 
          END 

        , cast(Root.AccountName as varchar(4000))          
        , CASE WHEN Root.HierarchyLevel = 1 
              THEN CAST(Recurse.Name AS VARCHAR(4000)) 
              ELSE Root.Acct2Name 
          END 
        , CASE WHEN Root.HierarchyLevel = 2 
              THEN CAST(Recurse.Name AS VARCHAR(4000)) 
              ELSE Root.Acct3Name 
          END 
        , CASE WHEN Root.HierarchyLevel = 3 
              THEN CAST(Recurse.Name AS VARCHAR(4000)) 
              ELSE Root.Acct4Name 
          END 
        , Recurse.CustomerCount 
    FROM 
        account AS Recurse INNER JOIN 
        AccountHierarchy AS Root ON Root.AccountId = Recurse.ParentId
)

SELECT
      h.AccountId
    , h.AccountName
    , h.ParentId
    , h.HierarchyLevel
    , IdHierarchy = 
          CAST(COALESCE(h.Acct4Id+'.','') 
               + COALESCE(h.Acct3Id+'.','') 
               + COALESCE(h.Acct2Id+'.','') 
               + h.Acct1Id AS VARCHAR(4000))
    , NameHierarchy = 
          CAST(COALESCE(h.Acct4Name+'.','') 
               + COALESCE(h.Acct3Name+'.','') 
               + COALESCE(h.Acct2Name+'.','') 
               + h.Acct1Name AS VARCHAR(4000))   
    , h.HierarchyLabel
    , h.Acct1Id
    , h.Acct1Name
    , h.Acct2Id
    , h.Acct2Name
    , h.Acct3Id
    , h.Acct3Name
    , h.Acct4Id
    , h.Acct4Name
    , h.CustomerCount
    , TotalCustomerCount =  
          CASE h.HierarchyLevel
            WHEN 4 THEN h.CustomerCount
            WHEN 3 THEN SUM(h.CustomerCount) OVER 
                   (PARTITION BY h.Acct1Id, h.Acct2Id, h.Acct3Id)       
            WHEN 2 THEN SUM(h.CustomerCount) OVER 
                   (PARTITION BY Acct1Id, h.Acct2Id) 
            WHEN 1 THEN SUM(h.CustomerCount) OVER 
                   (PARTITION BY h.Acct1Id) 
          END
    , h.HierarchySort
    , IdHierarchyMatch = 
          CAST(h.Acct1Id 
               + COALESCE('.'+h.Acct2Id,'') 
               + COALESCE('.'+h.Acct3Id,'') 
               + COALESCE('.'+h.Acct4Id,'') AS VARCHAR(4000))   
FROM
    AccountHierarchy AS h  
ORDER BY
    h.HierarchySort ;

Un cuarto enfoque, que calcula como un CTE intermedio, la tabla de cierre de la jerarquía. Prueba en SQL-Fiddle-4 . El beneficio es que para los cálculos de sumas, no hay restricción en el número de niveles.

;WITH AccountHierarchy AS
( 
    -- skipping several line, identical to the 3rd approach above
)

, ClosureTable AS
( 
    SELECT
          AccountId      = Root.AcctId  
        , AncestorId     = Root.AcctId  
        , CustomerCount  = Root.CustomerCount 
    FROM 
        account AS Root

    UNION ALL

    SELECT
          Recurse.Acctid 
        , Root.AncestorId 
        , Recurse.CustomerCount
    FROM 
        account AS Recurse INNER JOIN 
        ClosureTable AS Root ON Root.AccountId = Recurse.ParentId
)

, ClosureGroup AS
(                                                                           
    SELECT
          AccountId           = AncestorId  
        , TotalCustomerCount  = SUM(CustomerCount)                             
    FROM 
        ClosureTable AS a
    GROUP BY
        AncestorId
)

SELECT
      h.AccountId
    , h.AccountName
    , h.ParentId
    , h.HierarchyLevel 
    , h.HierarchyLabel
    , h.CustomerCount
    , cg.TotalCustomerCount 

    , h.HierarchySort
FROM
    AccountHierarchy AS h  
  JOIN
    ClosureGroup AS cg
      ON cg.AccountId = h.AccountId
ORDER BY
    h.HierarchySort ;

ypercubeᵀᴹ
fuente

Se corrigió el código (y el violín vinculado). Había una opción CUANDO faltaba en la respuesta.

ypercubeᵀᴹ

+1 - Me gusta el uso de las funciones de 2012. ¡Tengo mucho aprendizaje ahora!

Max Vernon

ok, solo pasé algún tiempo sumergiéndome y me di cuenta de que el rendimiento fue excelente, pero los números no coinciden. Comprueba los resultados con mi original. Puedo ver a dónde ibas con los totales acumulados, pero va a tener que cambiar algunos para que funcione según lo previsto, y no he llegado a la solución correcta. Su enfoque me da un poco de forraje para trabajar, pero aún no es una solución viable.

liver.larson

Oh, creo que está muy mal. Por favor, no aceptes.

ypercubeᵀᴹ

Traté de corregirlo. Funciona bien con mi pequeña muestra, pero pruebe la corrección con sus datos. Sobre la eficiencia, qué puedo decir, solo podemos saberlo mediante pruebas (a menos que su nombre sea @Paul White).

ypercubeᵀᴹ

Creo que esto debería hacerlo más rápido:

;With AccountHierarchy AS
(                                                                           
    SELECT
        Root.AcctId                                         AccountId
        , Root.Name                                         AccountName
        , Root.ParentId                                     ParentId
        , 1                                                 HierarchyLevel  
        , cast(Root.Acctid as varchar(4000))                IdHierarchyMatch        
        , cast(Root.Acctid as varchar(4000))                IdHierarchy
        , cast(replace(Root.Name,'.','') as varchar(4000))  NameHierarchy   
        , cast(Root.Acctid as varchar(4000))                HierarchySort
        , cast(Root.Name as varchar(4000))                  HierarchyLabel          ,
        Root.CustomerCount                                  CustomerCount   

    FROM 
        tempdb.dbo.account Root

    WHERE
        Root.ParentID is null

    UNION ALL

    SELECT
        Recurse.Acctid                                      AccountId
        , Recurse.Name                                      AccountName
        , Recurse.ParentId                                  ParentId
        , Root.HierarchyLevel + 1                           HierarchyLevel
        , CAST(CAST(Root.IdHierarchyMatch as varchar(40)) + '.' 
            + cast(recurse.Acctid as varchar(40))   as varchar(4000))   IdHierarchyMatch
        , cast(cast(recurse.Acctid as varchar(40)) + '.' 
            + Root.IdHierarchy  as varchar(4000))           IdHierarchy
        , cast(replace(recurse.Name,'.','') + '.' 
            + Root.NameHierarchy as varchar(4000))          NameHierarchy
        , cast(Root.AccountName + '.' 
            + Recurse.Name as varchar(4000))                HierarchySort   
        , cast(space(root.HierarchyLevel * 4) 
            + Recurse.Name as varchar(4000))                HierarchyLabel
        , Recurse.CustomerCount                             CustomerCount
    FROM
        tempdb.dbo.account Recurse INNER JOIN
        AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)


SELECT
    hier.AccountId
    , Hier.AccountName
    , hier.ParentId
    , hier.HierarchyLevel
    , hier.IdHierarchy
    , hier.NameHierarchy
    , hier.HierarchyLabel
    , parsename(hier.IdHierarchy,1) Acct1Id
    , parsename(hier.NameHierarchy,1) Acct1Name
    , parsename(hier.IdHierarchy,2) Acct2Id
    , parsename(hier.NameHierarchy,2) Acct2Name
    , parsename(hier.IdHierarchy,3) Acct3Id
    , parsename(hier.NameHierarchy,3) Acct3Name
    , parsename(hier.IdHierarchy,4) Acct4Id
    , parsename(hier.NameHierarchy,4) Acct4Name
    , hier.CustomerCount
    , (
        SELECT  
            sum(children.CustomerCount)
        FROM
            AccountHierarchy Children
        WHERE
            Children.IdHierarchyMatch LIKE hier.IdHierarchyMatch + '%'
        ) TotalCustomerCount
        , HierarchySort
        , IdHierarchyMatch
FROM
    AccountHierarchy hier
ORDER BY
    hier.HierarchySort

Agregué una columna en el CTE llamada IdHierarchyMatchque es la versión hacia adelante IdHierarchypara habilitar la TotalCustomerCountsubconsultaWHERE cláusula de sea modificable.

Al comparar los costos estimados de subárbol para los planes de ejecución, esta forma debería ser aproximadamente 5 veces más rápida.

Max Vernon
fuente

Gracias por tomarse el tiempo de mirar esto. Es curioso, ese fue mi primer instinto, y pensé que agregar un comodín a un campo solo era posible usando SQL dinámico, por lo que ni siquiera lo intenté. Debería haberlo comprobado. Por lo tanto, el resultado es una mejora notable a las 2:49 (por debajo de 3:53), pero no tanto como esperaba. Me iré sin respuesta para ver qué otras ideas surgen. Gracias de nuevo por tomarse el tiempo para evaluar esto, de verdad. Lo aprecio.

liver.larson el

FYI, acabo de notar un error de sintaxis en mi implementación. Estamos a las 2:04 horas de ejecución. Aún mejor donde empecé. Todavía con el objetivo de más rápido.

liver.larson el

Me alegro de haber ayudado de alguna manera. Pasé alrededor de 2 horas anoche tratando de entender el problema. Tengo un profundo presentimiento que esto podría resolverse usando algún tipo de ROW_NUMER() OVER (ORDER BY...)cosa. Simplemente no pude obtener los números correctos. Es una pregunta realmente genial e interesante. Buen ejercicio cerebral!

Max Vernon

Intenté convertir esto en una vista vinculada al esquema (materializada), con el propósito de agregar un índice en el IdHierarchyMatchcampo, sin embargo, no puede agregar un índice agrupado en una vista vinculada al esquema que incluye un CTE. Me pregunto si esta limitación se resuelve en SQL Server 2014.

Max Vernon

@MaxVernon Para la versión 2012: SQL-Fiddle

ypercubeᵀᴹ

Le di una oportunidad también. No es muy bonito, pero parece funcionar mejor.

USE Tempdb
go

SET STATISTICS IO ON;
SET STATISTICS TIME OFF;
SET NOCOUNT ON;

--------
-- assuming the original table looks something like this 
-- and you cannot control it's indexes 
-- (only widened the data types a bit for the extra sample rows)
--------
CREATE TABLE dbo.Account
    (
      Acctid VARCHAR(10) NOT NULL ,
      Name VARCHAR(100) NULL ,
      ParentId VARCHAR(10) NULL ,
      CustomerCount INT NULL
    );

--------
-- inserting the same records as in your sample
--------
INSERT  Account
        SELECT  'A' ,
                'Best Bet' ,
                NULL ,
                21
        UNION ALL
        SELECT  'B' ,
                'eStore' ,
                'A' ,
                30
        UNION ALL
        SELECT  'C' ,
                'Big Bens' ,
                'B' ,
                75
        UNION ALL
        SELECT  'D' ,
                'Mr. Jimbo' ,
                'B' ,
                50
        UNION ALL
        SELECT  'E' ,
                'Dr. John' ,
                'C' ,
                100
        UNION ALL
        SELECT  'F' ,
                'Brick' ,
                'A' ,
                222
        UNION ALL
        SELECT  'G' ,
                'Mortar' ,
                'C' ,
                153;

--------
-- now lets up the ante a bit and add some extra rows with random parents 
-- to these 7 items, it is hard to measure differences with so few rows
--------
DECLARE @numberOfRows INT = 25000
DECLARE @from INT = 1
DECLARE @to INT = 7
DECLARE @T1 TABLE ( n INT ); 

WITH    cte ( n )
          AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY CURRENT_TIMESTAMP )
               FROM     sys.messages
             )
    INSERT  INTO @T1
            SELECT  n
            FROM    cte
            WHERE   n <= @numberOfRows;

INSERT  INTO dbo.Account
        ( acctId ,
          name ,
          parentId ,
          Customercount
        )
        SELECT  CHAR(64 + RandomNumber) + CAST(n AS VARCHAR(10)) AS Id ,
                CAST('item ' + CHAR(64 + RandomNumber) + CAST(n AS VARCHAR(10)) AS VARCHAR(100)) ,
                CHAR(64 + RandomNumber) AS parentId ,
                ABS(CHECKSUM(NEWID()) % 100) + 1 AS RandomCustCount
        FROM    ( SELECT    n ,
                            ABS(CHECKSUM(NEWID()) % @to) + @from AS RandomNumber
                  FROM      @T1
                ) A;

--------
-- Assuming you cannot control it's indexes, in my tests we're better off taking the IO hit of copying the data
-- to some structure that is better optimized for this query. Not quite what I initially expected,  but we seem 
-- to be better off that way.
--------
CREATE TABLE tempdb.dbo.T1
    (
      AccountId VARCHAR(10) NOT NULL
                            PRIMARY KEY NONCLUSTERED ,
      AccountName VARCHAR(100) NOT NULL ,
      ParentId VARCHAR(10) NULL ,
      HierarchyLevel INT NULL ,
      HPath VARCHAR(1000) NULL ,
      IdHierarchy VARCHAR(1000) NULL ,
      NameHierarchy VARCHAR(1000) NULL ,
      HierarchyLabel VARCHAR(1000) NULL ,
      HierarchySort VARCHAR(1000) NULL ,
      CustomerCount INT NOT NULL
    );

CREATE CLUSTERED INDEX IX_Q1
ON tempdb.dbo.T1  ([ParentId]);

-- for summing customer counts over parents
CREATE NONCLUSTERED INDEX IX_Q2 
ON tempdb.dbo.T1  (HPath) INCLUDE(CustomerCount);

INSERT  INTO tempdb.dbo.T1
        ( AccountId ,
          AccountName ,
          ParentId ,
          HierarchyLevel ,
          HPath ,
          IdHierarchy ,
          NameHierarchy ,
          HierarchyLabel ,
          HierarchySort ,
          CustomerCount 
        )
        SELECT  Acctid AS AccountId ,
                Name AS AccountName ,
                ParentId AS ParentId ,
                NULL AS HierarchyLevel ,
                NULL AS HPath ,
                NULL AS IdHierarchy ,
                NULL AS NameHierarchy ,
                NULL AS HierarchyLabel ,
                NULL AS HierarchySort ,
                CustomerCount AS CustomerCount
        FROM    tempdb.dbo.account;



--------
-- I cannot seem to force an efficient way to do the sum while selecting over the recursive cte, 
-- so I took it aside. I am sure there is a more elegant way but I can't seem to make it happen. 
-- At least it performs better this way. But it remains a very expensive query.
--------
;
WITH    AccountHierarchy
          AS ( SELECT   Root.AccountId AS AcId ,
                        Root.ParentId ,
                        1 AS HLvl ,
                        CAST(Root.AccountId AS VARCHAR(1000)) AS [HPa] ,
                        CAST(Root.accountId AS VARCHAR(1000)) AS hid ,
                        CAST(REPLACE(Root.AccountName, '.', '') AS VARCHAR(1000)) AS hn ,
                        CAST(Root.accountid AS VARCHAR(1000)) AS hs ,
                        CAST(Root.accountname AS VARCHAR(1000)) AS hl
               FROM     tempdb.dbo.T1 Root
               WHERE    Root.ParentID IS NULL
               UNION ALL
               SELECT   Recurse.AccountId AS acid ,
                        Recurse.ParentId ParentId ,
                        Root.Hlvl + 1 AS hlvl ,
                        CAST(Root.HPa + '.' + Recurse.AccountId AS VARCHAR(1000)) AS hpa ,
                        CAST(recurse.AccountId + '.' + Root.hid AS VARCHAR(1000)) AS hid ,
                        CAST(REPLACE(recurse.AccountName, '.', '') + '.' + Root.hn AS VARCHAR(1000)) AS hn ,
                        CAST(Root.hs + '.' + Recurse.AccountName AS VARCHAR(1000)) AS hs ,
                        CAST(SPACE(root.hlvl * 4) + Recurse.AccountName AS VARCHAR(1000)) AS hl
               FROM     tempdb.dbo.T1 Recurse
                        INNER JOIN AccountHierarchy Root ON Root.AcId = Recurse.ParentId
             )
    UPDATE  tempdb.dbo.T1
    SET     HierarchyLevel = HLvl ,
            HPath = Hpa ,
            IdHierarchy = hid ,
            NameHierarchy = hn ,
            HierarchyLabel = hl ,
            HierarchySort = hs
    FROM    AccountHierarchy
    WHERE   AccountId = AcId;

SELECT  --HPath ,
        AccountId ,
        AccountName ,
        ParentId ,
        HierarchyLevel ,
        IdHierarchy ,
        NameHierarchy ,
        HierarchyLabel ,
        PARSENAME(IdHierarchy, 1) Acct1Id ,
        PARSENAME(NameHierarchy, 1) Acct1Name ,
        PARSENAME(IdHierarchy, 2) Acct2Id ,
        PARSENAME(NameHierarchy, 2) Acct2Name ,
        PARSENAME(IdHierarchy, 3) Acct3Id ,
        PARSENAME(NameHierarchy, 3) Acct3Name ,
        PARSENAME(IdHierarchy, 4) Acct4Id ,
        PARSENAME(NameHierarchy, 4) Acct4Name ,
        CustomerCount ,
        Cnt.TotalCustomerCount
FROM    tempdb.dbo.t1 Hier
        CROSS APPLY ( SELECT    SUM(CustomerCount) AS TotalCustomerCount
                      FROM      tempdb.dbo.t1
                      WHERE     HPath LIKE hier.HPath + '%'
                    ) Cnt
ORDER BY HierarchySort;

DROP TABLE tempdb.dbo.t1;
DROP TABLE tempdb.dbo.Account;

souplex
fuente

valiente intento. Y esa es una generación de datos de muestra bastante dulce. Necesito mejorar haciendo algunos de esos trucos. Todavía estoy buscando esa solución elegante, estoy seguro de que está esperando.

liver.larson el