Orden de clasificación natural en C #

129

¿Alguien tiene un buen recurso o proporciona una muestra de un orden natural en C # para una FileInfomatriz? Estoy implementando la IComparerinterfaz en mi especie.

c# sorting file natural-sort Michael Kniskern
fuente

148

Lo más fácil de hacer es simplemente P / Invocar la función incorporada en Windows y usarla como la función de comparación en su IComparer:

[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
private static extern int StrCmpLogicalW(string psz1, string psz2);

Michael Kaplan tiene algunos ejemplos de cómo funciona esta función aquí , y los cambios que se hicieron para que Vista funcione de manera más intuitiva. El lado positivo de esta función es que tendrá el mismo comportamiento que la versión de Windows en la que se ejecuta, sin embargo, esto significa que difiere entre las versiones de Windows, por lo que debe considerar si esto es un problema para usted.

Entonces, una implementación completa sería algo como:

[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
    public static extern int StrCmpLogicalW(string psz1, string psz2);
}

public sealed class NaturalStringComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        return SafeNativeMethods.StrCmpLogicalW(a, b);
    }
}

public sealed class NaturalFileInfoNameComparer : IComparer<FileInfo>
{
    public int Compare(FileInfo a, FileInfo b)
    {
        return SafeNativeMethods.StrCmpLogicalW(a.Name, b.Name);
    }
}

Greg Beech
fuente

8

Gran respuesta. Advertencia: Esto no funcionará con Win2000, para esas pocas personas que aún ejecutan cosas en ese sistema operativo. Por otro lado, hay suficientes pistas entre el blog de Kaplan y la documentación de MSDN para crear una función similar.

Chris Charabaruk

9

Esto no es portátil, solo funciona en Win32, pero no funciona en Linux / MacOS / Silverlight / Windows Phone / Metro

linquize

20

@linquize - Dijo que .NET no Mono, por lo que Linux / OSX no es realmente una preocupación. Windows Phone / Metro no existía en 2008 cuando se publicó esta respuesta. ¿Y con qué frecuencia realiza operaciones de archivo en Silverlight? Entonces, para el OP, y probablemente para la mayoría de las otras personas, fue una respuesta adecuada. En cualquier caso, eres libre de proporcionar una mejor respuesta; así es como funciona este sitio.

Greg Beech

66

Esto no significa que la respuesta original fuera incorrecta. Solo agrego información adicional con información actualizada

linquize el

2

Para su información, si hereda de en Comparer<T>lugar de implementar IComparer<T>, obtiene una implementación integrada de la IComparerinterfaz (no genérica) que llama a su método genérico, para usar en las API que usan eso en su lugar. Básicamente, también es gratis: simplemente borre el "I" y cambie public int Compare(...)a public override int Compare(...). Lo mismo para IEqualityComparer<T>y EqualityComparer<T>.

Joe Amenta

75

Solo pensé en agregar a esto (con la solución más concisa que pude encontrar):

public static IOrderedEnumerable<T> OrderByAlphaNumeric<T>(this IEnumerable<T> source, Func<T, string> selector)
{
    int max = source
        .SelectMany(i => Regex.Matches(selector(i), @"\d+").Cast<Match>().Select(m => (int?)m.Value.Length))
        .Max() ?? 0;

    return source.OrderBy(i => Regex.Replace(selector(i), @"\d+", m => m.Value.PadLeft(max, '0')));
}

Lo anterior rellena los números en la cadena hasta la longitud máxima de todos los números en todas las cadenas y utiliza la cadena resultante para ordenar.

La conversión a ( int?) es para permitir colecciones de cadenas sin ningún número ( .Max()en un vacío enumerable arroja un InvalidOperationException).

Matthew Horsley
fuente

1

+1 No solo es lo más conciso, es lo más rápido que he visto. a excepción de la respuesta aceptada pero no puedo usarla debido a las dependencias de la máquina. Clasificó más de 4 millones de valores en unos 35 segundos.

Gene S

44

Esto es hermoso e imposible de leer. Supongo que los beneficios de Linq significarán (al menos) el mejor rendimiento promedio y el mejor de los casos, así que creo que voy a ir con eso. A pesar de la falta de claridad. Muchas gracias @Matthew Horsley

Ian Grainger

1

Esto es muy bueno, pero hay un error para ciertos números decimales, mi ejemplo fue la clasificación de k8.11 vs k8.2. Para solucionar esto, implementé la siguiente expresión regular: \ d + ([\.,] \ D)?

devzero

2

También debe tener en cuenta la longitud del segundo grupo (punto decimal + decimales) cuando ingresa este código m.Value.PadLeft (max, '0')

devzero

3

Creo que puedes usarlo en .DefaultIfEmpty().Max()lugar de lanzarlo int?. También vale la pena hacer una source.ToList()para evitar volver a enumerar lo enumerable.

Teejay

30

Ninguna de las implementaciones existentes se veía genial, así que escribí la mía. Los resultados son casi idénticos a la clasificación utilizada por las versiones modernas de Windows Explorer (Windows 7/8). Las únicas diferencias que he visto son 1) aunque Windows solía (por ejemplo, XP) manejar números de cualquier longitud, ahora está limitado a 19 dígitos: el mío es ilimitado, 2) Windows da resultados inconsistentes con ciertos conjuntos de dígitos Unicode: el mío funciona bien (aunque no compara numéricamente dígitos de pares sustitutos; tampoco Windows), y 3) el mío no puede distinguir diferentes tipos de pesos de clasificación no primarios si ocurren en diferentes secciones (por ejemplo, "e-1é" vs " é1e- ": las secciones anteriores y posteriores al número tienen diferencias de peso diacríticas y de puntuación).

public static int CompareNatural(string strA, string strB) {
    return CompareNatural(strA, strB, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase);
}

public static int CompareNatural(string strA, string strB, CultureInfo culture, CompareOptions options) {
    CompareInfo cmp = culture.CompareInfo;
    int iA = 0;
    int iB = 0;
    int softResult = 0;
    int softResultWeight = 0;
    while (iA < strA.Length && iB < strB.Length) {
        bool isDigitA = Char.IsDigit(strA[iA]);
        bool isDigitB = Char.IsDigit(strB[iB]);
        if (isDigitA != isDigitB) {
            return cmp.Compare(strA, iA, strB, iB, options);
        }
        else if (!isDigitA && !isDigitB) {
            int jA = iA + 1;
            int jB = iB + 1;
            while (jA < strA.Length && !Char.IsDigit(strA[jA])) jA++;
            while (jB < strB.Length && !Char.IsDigit(strB[jB])) jB++;
            int cmpResult = cmp.Compare(strA, iA, jA - iA, strB, iB, jB - iB, options);
            if (cmpResult != 0) {
                // Certain strings may be considered different due to "soft" differences that are
                // ignored if more significant differences follow, e.g. a hyphen only affects the
                // comparison if no other differences follow
                string sectionA = strA.Substring(iA, jA - iA);
                string sectionB = strB.Substring(iB, jB - iB);
                if (cmp.Compare(sectionA + "1", sectionB + "2", options) ==
                    cmp.Compare(sectionA + "2", sectionB + "1", options))
                {
                    return cmp.Compare(strA, iA, strB, iB, options);
                }
                else if (softResultWeight < 1) {
                    softResult = cmpResult;
                    softResultWeight = 1;
                }
            }
            iA = jA;
            iB = jB;
        }
        else {
            char zeroA = (char)(strA[iA] - (int)Char.GetNumericValue(strA[iA]));
            char zeroB = (char)(strB[iB] - (int)Char.GetNumericValue(strB[iB]));
            int jA = iA;
            int jB = iB;
            while (jA < strA.Length && strA[jA] == zeroA) jA++;
            while (jB < strB.Length && strB[jB] == zeroB) jB++;
            int resultIfSameLength = 0;
            do {
                isDigitA = jA < strA.Length && Char.IsDigit(strA[jA]);
                isDigitB = jB < strB.Length && Char.IsDigit(strB[jB]);
                int numA = isDigitA ? (int)Char.GetNumericValue(strA[jA]) : 0;
                int numB = isDigitB ? (int)Char.GetNumericValue(strB[jB]) : 0;
                if (isDigitA && (char)(strA[jA] - numA) != zeroA) isDigitA = false;
                if (isDigitB && (char)(strB[jB] - numB) != zeroB) isDigitB = false;
                if (isDigitA && isDigitB) {
                    if (numA != numB && resultIfSameLength == 0) {
                        resultIfSameLength = numA < numB ? -1 : 1;
                    }
                    jA++;
                    jB++;
                }
            }
            while (isDigitA && isDigitB);
            if (isDigitA != isDigitB) {
                // One number has more digits than the other (ignoring leading zeros) - the longer
                // number must be larger
                return isDigitA ? 1 : -1;
            }
            else if (resultIfSameLength != 0) {
                // Both numbers are the same length (ignoring leading zeros) and at least one of
                // the digits differed - the first difference determines the result
                return resultIfSameLength;
            }
            int lA = jA - iA;
            int lB = jB - iB;
            if (lA != lB) {
                // Both numbers are equivalent but one has more leading zeros
                return lA > lB ? -1 : 1;
            }
            else if (zeroA != zeroB && softResultWeight < 2) {
                softResult = cmp.Compare(strA, iA, 1, strB, iB, 1, options);
                softResultWeight = 2;
            }
            iA = jA;
            iB = jB;
        }
    }
    if (iA < strA.Length || iB < strB.Length) {
        return iA < strA.Length ? 1 : -1;
    }
    else if (softResult != 0) {
        return softResult;
    }
    return 0;
}

La firma coincide con el Comparison<string>delegado:

string[] files = Directory.GetFiles(@"C:\");
Array.Sort(files, CompareNatural);

Aquí hay una clase de contenedor para usar como IComparer<string>:

public class CustomComparer<T> : IComparer<T> {
    private Comparison<T> _comparison;

    public CustomComparer(Comparison<T> comparison) {
        _comparison = comparison;
    }

    public int Compare(T x, T y) {
        return _comparison(x, y);
    }
}

Ejemplo:

string[] files = Directory.EnumerateFiles(@"C:\")
    .OrderBy(f => f, new CustomComparer<string>(CompareNatural))
    .ToArray();

Aquí hay un buen conjunto de nombres de archivo que uso para las pruebas:

Func<string, string> expand = (s) => { int o; while ((o = s.IndexOf('\\')) != -1) { int p = o + 1;
    int z = 1; while (s[p] == '0') { z++; p++; } int c = Int32.Parse(s.Substring(p, z));
    s = s.Substring(0, o) + new string(s[o - 1], c) + s.Substring(p + z); } return s; };
string encodedFileNames =
    "KDEqLW4xMiotbjEzKjAwMDFcMDY2KjAwMlwwMTcqMDA5XDAxNyowMlwwMTcqMDlcMDE3KjEhKjEtISox" +
    "LWEqMS4yNT8xLjI1KjEuNT8xLjUqMSoxXDAxNyoxXDAxOCoxXDAxOSoxXDA2NioxXDA2NyoxYSoyXDAx" +
    "NyoyXDAxOCo5XDAxNyo5XDAxOCo5XDA2Nio9MSphMDAxdGVzdDAxKmEwMDF0ZXN0aW5nYTBcMzEqYTAw" +
    "Mj9hMDAyIGE/YTAwMiBhKmEwMDIqYTAwMmE/YTAwMmEqYTAxdGVzdGluZ2EwMDEqYTAxdnNmcyphMSph" +
    "MWEqYTF6KmEyKmIwMDAzcTYqYjAwM3E0KmIwM3E1KmMtZSpjZCpjZipmIDEqZipnP2cgMT9oLW4qaG8t" +
    "bipJKmljZS1jcmVhbT9pY2VjcmVhbT9pY2VjcmVhbS0/ajBcNDE/ajAwMWE/ajAxP2shKmsnKmstKmsx" +
    "KmthKmxpc3QqbTAwMDNhMDA1YSptMDAzYTAwMDVhKm0wMDNhMDA1Km0wMDNhMDA1YSpuMTIqbjEzKm8t" +
    "bjAxMypvLW4xMipvLW40P28tbjQhP28tbjR6P28tbjlhLWI1Km8tbjlhYjUqb24wMTMqb24xMipvbjQ/" +
    "b240IT9vbjR6P29uOWEtYjUqb245YWI1Km/CrW4wMTMqb8KtbjEyKnAwMCpwMDEqcDAxwr0hKnAwMcK9" +
    "KnAwMcK9YSpwMDHCvcK+KnAwMipwMMK9KnEtbjAxMypxLW4xMipxbjAxMypxbjEyKnItMDAhKnItMDAh" +
    "NSpyLTAwIe+8lSpyLTAwYSpyLe+8kFwxIS01KnIt77yQXDEhLe+8lSpyLe+8kFwxISpyLe+8kFwxITUq" +
    "ci3vvJBcMSHvvJUqci3vvJBcMWEqci3vvJBcMyE1KnIwMCEqcjAwLTUqcjAwLjUqcjAwNSpyMDBhKnIw" +
    "NSpyMDYqcjQqcjUqctmg2aYqctmkKnLZpSpy27Dbtipy27Qqctu1KnLfgN+GKnLfhCpy34UqcuClpuCl" +
    "rCpy4KWqKnLgpasqcuCnpuCnrCpy4KeqKnLgp6sqcuCppuCprCpy4KmqKnLgqasqcuCrpuCrrCpy4Kuq" +
    "KnLgq6sqcuCtpuCtrCpy4K2qKnLgrasqcuCvpuCvrCpy4K+qKnLgr6sqcuCxpuCxrCpy4LGqKnLgsasq" +
    "cuCzpuCzrCpy4LOqKnLgs6sqcuC1puC1rCpy4LWqKnLgtasqcuC5kOC5lipy4LmUKnLguZUqcuC7kOC7" +
    "lipy4LuUKnLgu5UqcuC8oOC8pipy4LykKnLgvKUqcuGBgOGBhipy4YGEKnLhgYUqcuGCkOGClipy4YKU" +
    "KnLhgpUqcuGfoOGfpipy4Z+kKnLhn6UqcuGgkOGglipy4aCUKnLhoJUqcuGlhuGljCpy4aWKKnLhpYsq" +
    "cuGnkOGnlipy4aeUKnLhp5UqcuGtkOGtlipy4a2UKnLhrZUqcuGusOGutipy4a60KnLhrrUqcuGxgOGx" +
    "hipy4bGEKnLhsYUqcuGxkOGxlipy4bGUKnLhsZUqcuqYoFwx6pilKnLqmKDqmKUqcuqYoOqYpipy6pik" +
    "KnLqmKUqcuqjkOqjlipy6qOUKnLqo5UqcuqkgOqkhipy6qSEKnLqpIUqcuqpkOqplipy6qmUKnLqqZUq" +
    "cvCQkqAqcvCQkqUqcvCdn5gqcvCdn50qcu+8kFwxISpy77yQXDEt77yVKnLvvJBcMS7vvJUqcu+8kFwx" +
    "YSpy77yQXDHqmKUqcu+8kFwx77yO77yVKnLvvJBcMe+8lSpy77yQ77yVKnLvvJDvvJYqcu+8lCpy77yV" +
    "KnNpKnPEsSp0ZXN02aIqdGVzdNmi2aAqdGVzdNmjKnVBZS0qdWFlKnViZS0qdUJlKnVjZS0xw6kqdWNl" +
    "McOpLSp1Y2Uxw6kqdWPDqS0xZSp1Y8OpMWUtKnVjw6kxZSp3ZWlhMSp3ZWlhMip3ZWlzczEqd2Vpc3My" +
    "KndlaXoxKndlaXoyKndlacOfMSp3ZWnDnzIqeSBhMyp5IGE0KnknYTMqeSdhNCp5K2EzKnkrYTQqeS1h" +
    "Myp5LWE0KnlhMyp5YTQqej96IDA1MD96IDIxP3ohMjE/ejIwP3oyMj96YTIxP3rCqTIxP1sxKl8xKsKt" +
    "bjEyKsKtbjEzKsSwKg==";
string[] fileNames = Encoding.UTF8.GetString(Convert.FromBase64String(encodedFileNames))
    .Replace("*", ".txt?").Split(new[] { "?" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(n => expand(n)).ToArray();

JD
fuente

Las secciones de dígitos deben compararse en sección, es decir, 'abc12b' debe ser menor que 'abc123'.

SOUser

Puede probar los siguientes datos: cadena pública [] nombres de archivo = {"-abc12.txt", " abc12.txt", "1abc_2.txt", "a0000012.txt", "a0000012c.txt", "a000012.txt" , "a000012b.txt", "a012.txt", "a0000102.txt", "abc1_2.txt", "abc12 .txt", "abc12b.txt", "abc123.txt", "abccde.txt", " b0000.txt "," b00001.txt "," b0001.txt "," b001.txt "," c0000.txt "," c0000c.txt "," c00001.txt "," c000b.txt "," d0. 20.2b.txt "," d0.1000c.txt "," d0.2000y.txt "," d0.20000.2b.txt ","

SOUser

@XichenLi Gracias por el buen caso de prueba. Si deja que el Explorador de Windows clasifique esos archivos, obtendrá resultados diferentes según la versión de Windows que esté utilizando. Mi código ordena esos nombres de manera idéntica a Server 2003 (y presumiblemente XP), pero diferente a Windows 8. Si tengo la oportunidad, intentaré averiguar cómo lo está haciendo Windows 8 y actualizar mi código.

JD

3

Tiene error Índice fuera de rango

linquize

3

Gran solución! Cuando lo comparé en un escenario normal con alrededor de 10,000 archivos, fue más rápido que el ejemplo regex de Matthew y casi el mismo rendimiento que StrCmpLogicalW (). Hay un error menor en el código anterior: el "while (strA [jA] == zeroA) jA ++;" y "while (strB [jB] == zeroB) jB ++;" debe ser "while (jA <strA.Length && strA [jA] == zeroA) jA ++;" y "while (jB <strB.Length && strB [jB] == zeroB) jB ++;". De lo contrario, las cadenas que contienen solo ceros generarán una excepción.

kuroki

22

Solución C # pura para linq orderby:

http://zootfroot.blogspot.com/2009/09/natural-sort-compare-with-linq-orderby.html

public class NaturalSortComparer<T> : IComparer<string>, IDisposable
{
    private bool isAscending;

    public NaturalSortComparer(bool inAscendingOrder = true)
    {
        this.isAscending = inAscendingOrder;
    }

    #region IComparer<string> Members

    public int Compare(string x, string y)
    {
        throw new NotImplementedException();
    }

    #endregion

    #region IComparer<string> Members

    int IComparer<string>.Compare(string x, string y)
    {
        if (x == y)
            return 0;

        string[] x1, y1;

        if (!table.TryGetValue(x, out x1))
        {
            x1 = Regex.Split(x.Replace(" ", ""), "([0-9]+)");
            table.Add(x, x1);
        }

        if (!table.TryGetValue(y, out y1))
        {
            y1 = Regex.Split(y.Replace(" ", ""), "([0-9]+)");
            table.Add(y, y1);
        }

        int returnVal;

        for (int i = 0; i < x1.Length && i < y1.Length; i++)
        {
            if (x1[i] != y1[i])
            {
                returnVal = PartCompare(x1[i], y1[i]);
                return isAscending ? returnVal : -returnVal;
            }
        }

        if (y1.Length > x1.Length)
        {
            returnVal = 1;
        }
        else if (x1.Length > y1.Length)
        { 
            returnVal = -1; 
        }
        else
        {
            returnVal = 0;
        }

        return isAscending ? returnVal : -returnVal;
    }

    private static int PartCompare(string left, string right)
    {
        int x, y;
        if (!int.TryParse(left, out x))
            return left.CompareTo(right);

        if (!int.TryParse(right, out y))
            return left.CompareTo(right);

        return x.CompareTo(y);
    }

    #endregion

    private Dictionary<string, string[]> table = new Dictionary<string, string[]>();

    public void Dispose()
    {
        table.Clear();
        table = null;
    }
}

James McCormack
fuente

2

Ese código es en última instancia de codeproject.com/KB/recipes/NaturalComparer.aspx (que no está orientado a LINQ).

mhenry1384

2

La publicación del blog acredita a Justin Jones ( codeproject.com/KB/string/NaturalSortComparer.aspx ) para el IComparer, no para Pascal Ganaye.

James McCormack

1

Nota menor, esta solución ignora espacios que no son lo mismo que Windows, y no es tan bueno como el código de Matthew Horsley a continuación. Por lo tanto, puede obtener 'string01' 'string 01' 'string 02' 'string02' por ejemplo (que se ve feo). Si elimina la separación de espacios, ordena las cadenas hacia atrás, es decir, 'string01' viene antes que 'string 01', lo que puede o no ser aceptable.

Michael Parker

Esto funcionó para direcciones, es decir, "1 Smith Rd", "10 Smith Rd", "2 Smith Rd", etc. - Ordenadas naturalmente. ¡Si! ¡Buena esa!

Piotr Kula

Por cierto, noté (y los comentarios en esa página vinculada también parecen indicar) que el argumento Tipo <T> es completamente innecesario.

jv-dev

18

La respuesta de Matthews Horsleys es el método más rápido que no cambia el comportamiento según la versión de Windows en la que se esté ejecutando su programa. Sin embargo, puede ser aún más rápido creando la expresión regular una vez y usando RegexOptions.Compiled. También agregué la opción de insertar un comparador de cadenas para que pueda ignorar mayúsculas y minúsculas si es necesario, y mejorar un poco la legibilidad.

    public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> items, Func<T, string> selector, StringComparer stringComparer = null)
    {
        var regex = new Regex(@"\d+", RegexOptions.Compiled);

        int maxDigits = items
                      .SelectMany(i => regex.Matches(selector(i)).Cast<Match>().Select(digitChunk => (int?)digitChunk.Value.Length))
                      .Max() ?? 0;

        return items.OrderBy(i => regex.Replace(selector(i), match => match.Value.PadLeft(maxDigits, '0')), stringComparer ?? StringComparer.CurrentCulture);
    }

Usar por

var sortedEmployees = employees.OrderByNatural(emp => emp.Name);

Esto toma 450ms para ordenar 100,000 cadenas en comparación con 300ms para la comparación predeterminada de cadenas .net, ¡bastante rápido!

Michael Parker
fuente

2

Vale la pena leer esto antes - Compilación y reutilización en expresiones regulares

mungflesh

16

Mi solución:

void Main()
{
    new[] {"a4","a3","a2","a10","b5","b4","b400","1","C1d","c1d2"}.OrderBy(x => x, new NaturalStringComparer()).Dump();
}

public class NaturalStringComparer : IComparer<string>
{
    private static readonly Regex _re = new Regex(@"(?<=\D)(?=\d)|(?<=\d)(?=\D)", RegexOptions.Compiled);

    public int Compare(string x, string y)
    {
        x = x.ToLower();
        y = y.ToLower();
        if(string.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length)) == 0)
        {
            if(x.Length == y.Length) return 0;
            return x.Length < y.Length ? -1 : 1;
        }
        var a = _re.Split(x);
        var b = _re.Split(y);
        int i = 0;
        while(true)
        {
            int r = PartCompare(a[i], b[i]);
            if(r != 0) return r;
            ++i;
        }
    }

    private static int PartCompare(string x, string y)
    {
        int a, b;
        if(int.TryParse(x, out a) && int.TryParse(y, out b))
            return a.CompareTo(b);
        return x.CompareTo(y);
    }
}

Resultados:

1
a2
a3
a4
a10
b4
b5
b400
C1d
c1d2

mpen
fuente

Me gusta. Es fácil de entender y no requiere Linq.

11

Debe tener cuidado: recuerdo vagamente haber leído que StrCmpLogicalW, o algo así, no era estrictamente transitivo, y he observado que los métodos de ordenación de .NET a veces se atascan en bucles infinitos si la función de comparación rompe esa regla.

Una comparación transitiva siempre informará que a <c si a <by b <c. Existe una función que hace una comparación de orden de clasificación natural que no siempre cumple con ese criterio, pero no puedo recordar si es StrCmpLogicalW u otra cosa.

Jonathan Gilbert
fuente

¿Tiene alguna prueba de esta declaración? Después de buscar en Google, no puedo encontrar ninguna indicación de que sea cierto.

mhenry1384

1

He experimentado esos bucles infinitos con StrCmpLogicalW.

thd

3

@mhenry: consulte connect.microsoft.com/VisualStudio/feedback/details/236900/… y codeproject.com/KB/cs/StringCompareBug.aspx

Brian el

El elemento de comentarios de Visual Studio 236900 ya no existe, pero aquí hay uno más actualizado que confirma el problema: connect.microsoft.com/VisualStudio/feedback/details/774540/... También ofrece una solución alternativa : CultureInfotiene una propiedad CompareInfo, y el objeto que devuelve puede proporcionarle SortKeyobjetos. Estos, a su vez, pueden compararse y garantizar la transitividad.

Jonathan Gilbert

9

Este es mi código para ordenar una cadena que tiene caracteres alfabéticos y numéricos.

Primero, este método de extensión:

public static IEnumerable<string> AlphanumericSort(this IEnumerable<string> me)
{
    return me.OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(50, '0')));
}

Luego, simplemente utilícelo en cualquier parte de su código como este:

List<string> test = new List<string>() { "The 1st", "The 12th", "The 2nd" };
test = test.AlphanumericSort();

Cómo funciona ? Al reemplazar con ceros:

  Original  | Regex Replace |      The      |   Returned
    List    | Apply PadLeft |    Sorting    |     List
            |               |               |
 "The 1st"  |  "The 001st"  |  "The 001st"  |  "The 1st"
 "The 12th" |  "The 012th"  |  "The 002nd"  |  "The 2nd"
 "The 2nd"  |  "The 002nd"  |  "The 012th"  |  "The 12th"

Funciona con números múltiples:

 Alphabetical Sorting | Alphanumeric Sorting
                      |
 "Page 21, Line 42"   | "Page 3, Line 7"
 "Page 21, Line 5"    | "Page 3, Line 32"
 "Page 3, Line 32"    | "Page 21, Line 5"
 "Page 3, Line 7"     | "Page 21, Line 42"

Espero que eso ayude.

Picsonald
fuente

6

Agregando a la respuesta de Greg Beech (porque he estado buscando eso), si quieres usar esto de Linq, puedes usar el OrderByque toma un IComparer. P.ej:

var items = new List<MyItem>();

// fill items

var sorted = items.OrderBy(item => item.Name, new NaturalStringComparer());

Wilka
fuente

2

Aquí hay un ejemplo relativamente simple que no usa P / Invoke y evita cualquier asignación durante la ejecución.

internal sealed class NumericStringComparer : IComparer<string>
{
    public static NumericStringComparer Instance { get; } = new NumericStringComparer();

    public int Compare(string x, string y)
    {
        // sort nulls to the start
        if (x == null)
            return y == null ? 0 : -1;
        if (y == null)
            return 1;

        var ix = 0;
        var iy = 0;

        while (true)
        {
            // sort shorter strings to the start
            if (ix >= x.Length)
                return iy >= y.Length ? 0 : -1;
            if (iy >= y.Length)
                return 1;

            var cx = x[ix];
            var cy = y[iy];

            int result;
            if (char.IsDigit(cx) && char.IsDigit(cy))
                result = CompareInteger(x, y, ref ix, ref iy);
            else
                result = cx.CompareTo(y[iy]);

            if (result != 0)
                return result;

            ix++;
            iy++;
        }
    }

    private static int CompareInteger(string x, string y, ref int ix, ref int iy)
    {
        var lx = GetNumLength(x, ix);
        var ly = GetNumLength(y, iy);

        // shorter number first (note, doesn't handle leading zeroes)
        if (lx != ly)
            return lx.CompareTo(ly);

        for (var i = 0; i < lx; i++)
        {
            var result = x[ix++].CompareTo(y[iy++]);
            if (result != 0)
                return result;
        }

        return 0;
    }

    private static int GetNumLength(string s, int i)
    {
        var length = 0;
        while (i < s.Length && char.IsDigit(s[i++]))
            length++;
        return length;
    }
}

No ignora los ceros a la izquierda, así que 01viene después 2.

Prueba de unidad correspondiente:

public class NumericStringComparerTests
{
    [Fact]
    public void OrdersCorrectly()
    {
        AssertEqual("", "");
        AssertEqual(null, null);
        AssertEqual("Hello", "Hello");
        AssertEqual("Hello123", "Hello123");
        AssertEqual("123", "123");
        AssertEqual("123Hello", "123Hello");

        AssertOrdered("", "Hello");
        AssertOrdered(null, "Hello");
        AssertOrdered("Hello", "Hello1");
        AssertOrdered("Hello123", "Hello124");
        AssertOrdered("Hello123", "Hello133");
        AssertOrdered("Hello123", "Hello223");
        AssertOrdered("123", "124");
        AssertOrdered("123", "133");
        AssertOrdered("123", "223");
        AssertOrdered("123", "1234");
        AssertOrdered("123", "2345");
        AssertOrdered("0", "1");
        AssertOrdered("123Hello", "124Hello");
        AssertOrdered("123Hello", "133Hello");
        AssertOrdered("123Hello", "223Hello");
        AssertOrdered("123Hello", "1234Hello");
    }

    private static void AssertEqual(string x, string y)
    {
        Assert.Equal(0, NumericStringComparer.Instance.Compare(x, y));
        Assert.Equal(0, NumericStringComparer.Instance.Compare(y, x));
    }

    private static void AssertOrdered(string x, string y)
    {
        Assert.Equal(-1, NumericStringComparer.Instance.Compare(x, y));
        Assert.Equal( 1, NumericStringComparer.Instance.Compare(y, x));
    }
}

Drew Noakes
fuente

2

De hecho, lo he implementado como un método de extensión StringComparerpara que puedas hacer, por ejemplo:

StringComparer.CurrentCulture.WithNaturalSort() o
StringComparer.OrdinalIgnoreCase.WithNaturalSort().

El resultante IComparer<string>se puede utilizar en todos los lugares como OrderBy, OrderByDescending, ThenBy, ThenByDescending, SortedSet<string>, etc, y que pueda mayúsculas y minúsculas todavía fácilmente pellizco, la cultura, etc.

La implementación es bastante trivial y debería funcionar bastante bien incluso en secuencias grandes.

También lo publiqué como un pequeño paquete NuGet , así que puedes hacer lo siguiente:

Install-Package NaturalSort.Extension

El código que incluye comentarios de documentación XML y un conjunto de pruebas está disponible en el repositorio NaturalSort.Extension GitHub .

El código completo es este (si aún no puede usar C # 7, simplemente instale el paquete NuGet):

public static class StringComparerNaturalSortExtension
{
    public static IComparer<string> WithNaturalSort(this StringComparer stringComparer) => new NaturalSortComparer(stringComparer);

    private class NaturalSortComparer : IComparer<string>
    {
        public NaturalSortComparer(StringComparer stringComparer)
        {
            _stringComparer = stringComparer;
        }

        private readonly StringComparer _stringComparer;
        private static readonly Regex NumberSequenceRegex = new Regex(@"(\d+)", RegexOptions.Compiled | RegexOptions.CultureInvariant);
        private static string[] Tokenize(string s) => s == null ? new string[] { } : NumberSequenceRegex.Split(s);
        private static ulong ParseNumberOrZero(string s) => ulong.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out var result) ? result : 0;

        public int Compare(string s1, string s2)
        {
            var tokens1 = Tokenize(s1);
            var tokens2 = Tokenize(s2);

            var zipCompare = tokens1.Zip(tokens2, TokenCompare).FirstOrDefault(x => x != 0);
            if (zipCompare != 0)
                return zipCompare;

            var lengthCompare = tokens1.Length.CompareTo(tokens2.Length);
            return lengthCompare;
        }
        
        private int TokenCompare(string token1, string token2)
        {
            var number1 = ParseNumberOrZero(token1);
            var number2 = ParseNumberOrZero(token2);

            var numberCompare = number1.CompareTo(number2);
            if (numberCompare != 0)
                return numberCompare;

            var stringCompare = _stringComparer.Compare(token1, token2);
            return stringCompare;
        }
    }
}

Tom Pažourek
fuente

2

Aquí hay una ingenua forma de LINQ sin expresiones regulares de una línea (prestada de python):

var alphaStrings = new List<string>() { "10","2","3","4","50","11","100","a12","b12" };
var orderedString = alphaStrings.OrderBy(g => new Tuple<int, string>(g.ToCharArray().All(char.IsDigit)? int.Parse(g) : int.MaxValue, g));
// Order Now: ["2","3","4","10","11","50","100","a12","b12"]

mshsayem
fuente

Se eliminó Dump () y se asignó a var, ¡y esto funciona de maravilla!

Arne S

@ArneS: fue escrito en LinQPad; y olvidé eliminar el Dump(). Gracias por señalarlo.

mshsayem

1

Ampliando un par de respuestas anteriores y haciendo uso de métodos de extensión, se me ocurrió lo siguiente que no tiene las advertencias de una posible enumeración enumerable múltiple o problemas de rendimiento relacionados con el uso de múltiples objetos regex, o llamar a regex innecesariamente, que Dicho esto, utiliza ToList (), que puede negar los beneficios en colecciones más grandes.

El selector admite la tipificación genérica para permitir que se asigne cualquier delegado, el selector muta los elementos de la colección de origen y luego los convierte en cadenas con ToString ().

    private static readonly Regex _NaturalOrderExpr = new Regex(@"\d+", RegexOptions.Compiled);

    public static IEnumerable<TSource> OrderByNatural<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        int max = 0;

        var selection = source.Select(
            o =>
            {
                var v = selector(o);
                var s = v != null ? v.ToString() : String.Empty;

                if (!String.IsNullOrWhiteSpace(s))
                {
                    var mc = _NaturalOrderExpr.Matches(s);

                    if (mc.Count > 0)
                    {
                        max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
                    }
                }

                return new
                {
                    Key = o,
                    Value = s
                };
            }).ToList();

        return
            selection.OrderBy(
                o =>
                String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
                     .Select(o => o.Key);
    }

    public static IEnumerable<TSource> OrderByDescendingNatural<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        int max = 0;

        var selection = source.Select(
            o =>
            {
                var v = selector(o);
                var s = v != null ? v.ToString() : String.Empty;

                if (!String.IsNullOrWhiteSpace(s))
                {
                    var mc = _NaturalOrderExpr.Matches(s);

                    if (mc.Count > 0)
                    {
                        max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
                    }
                }

                return new
                {
                    Key = o,
                    Value = s
                };
            }).ToList();

        return
            selection.OrderByDescending(
                o =>
                String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
                     .Select(o => o.Key);
    }

Vorspire
fuente

1

Inspirada en la solución de Michael Parker, aquí hay una IComparerimplementación que puede incluir en cualquiera de los métodos de pedido de linq:

private class NaturalStringComparer : IComparer<string>
{
    public int Compare(string left, string right)
    {
        int max = new[] { left, right }
            .SelectMany(x => Regex.Matches(x, @"\d+").Cast<Match>().Select(y => (int?)y.Value.Length))
            .Max() ?? 0;

        var leftPadded = Regex.Replace(left, @"\d+", m => m.Value.PadLeft(max, '0'));
        var rightPadded = Regex.Replace(right, @"\d+", m => m.Value.PadLeft(max, '0'));

        return string.Compare(leftPadded, rightPadded);
    }
}

Oliver
fuente

0

Necesitábamos un tipo natural para tratar el texto con el siguiente patrón:

"Test 1-1-1 something"
"Test 1-2-3 something"
...

Por alguna razón, cuando vi por primera vez SO, no encontré esta publicación e implementé la nuestra. En comparación con algunas de las soluciones presentadas aquí, aunque es similar en concepto, podría tener el beneficio de ser más simple y fácil de entender. Sin embargo, aunque intenté ver los cuellos de botella de rendimiento, sigue siendo una implementación mucho más lenta que la predeterminada OrderBy().

Aquí está el método de extensión que implemento:

public static class EnumerableExtensions
{
    // set up the regex parser once and for all
    private static readonly Regex Regex = new Regex(@"\d+|\D+", RegexOptions.Compiled | RegexOptions.Singleline);

    // stateless comparer can be built once
    private static readonly AggregateComparer Comparer = new AggregateComparer();

    public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> source, Func<T, string> selector)
    {
        // first extract string from object using selector
        // then extract digit and non-digit groups
        Func<T, IEnumerable<IComparable>> splitter =
            s => Regex.Matches(selector(s))
                      .Cast<Match>()
                      .Select(m => Char.IsDigit(m.Value[0]) ? (IComparable) int.Parse(m.Value) : m.Value);
        return source.OrderBy(splitter, Comparer);
    }

    /// <summary>
    /// This comparer will compare two lists of objects against each other
    /// </summary>
    /// <remarks>Objects in each list are compare to their corresponding elements in the other
    /// list until a difference is found.</remarks>
    private class AggregateComparer : IComparer<IEnumerable<IComparable>>
    {
        public int Compare(IEnumerable<IComparable> x, IEnumerable<IComparable> y)
        {
            return
                x.Zip(y, (a, b) => new {a, b})              // walk both lists
                 .Select(pair => pair.a.CompareTo(pair.b))  // compare each object
                 .FirstOrDefault(result => result != 0);    // until a difference is found
        }
    }
}

La idea es dividir las cadenas originales en bloques de dígitos y no dígitos ( "\d+|\D+"). Dado que esta es una tarea potencialmente costosa, se realiza solo una vez por entrada. Luego usamos un comparador de objetos comparables (lo siento, no puedo encontrar una manera más adecuada de decirlo). Compara cada bloque con su bloque correspondiente en la otra cadena.

Me gustaría recibir comentarios sobre cómo esto podría mejorarse y cuáles son los principales defectos. Tenga en cuenta que la mantenibilidad es importante para nosotros en este momento y actualmente no la estamos utilizando en conjuntos de datos extremadamente grandes.

Eric Liprandi
fuente

1

Esto falla cuando intenta comparar cadenas que son estructuralmente diferentes, por ejemplo, comparar "a-1" con "a-2" funciona bien, pero comparar "a" con "1" no lo es, porque "a" .CompareTo (1) lanza una excepción.

jimrandomh

@ jimrandomh, tienes razón. Este enfoque fue específico para nuestros patrones.

Eric Liprandi

0

Una versión que es más fácil de leer / mantener.

public class NaturalStringComparer : IComparer<string>
{
    public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();

    public int Compare(string x, string y) {
        const int LeftIsSmaller = -1;
        const int RightIsSmaller = 1;
        const int Equal = 0;

        var leftString = x;
        var rightString = y;

        var stringComparer = CultureInfo.CurrentCulture.CompareInfo;

        int rightIndex;
        int leftIndex;

        for (leftIndex = 0, rightIndex = 0;
             leftIndex < leftString.Length && rightIndex < rightString.Length;
             leftIndex++, rightIndex++) {
            var leftChar = leftString[leftIndex];
            var rightChar = rightString[leftIndex];

            var leftIsNumber = char.IsNumber(leftChar);
            var rightIsNumber = char.IsNumber(rightChar);

            if (!leftIsNumber && !rightIsNumber) {
                var result = stringComparer.Compare(leftString, leftIndex, 1, rightString, leftIndex, 1);
                if (result != 0) return result;
            } else if (leftIsNumber && !rightIsNumber) {
                return LeftIsSmaller;
            } else if (!leftIsNumber && rightIsNumber) {
                return RightIsSmaller;
            } else {
                var leftNumberLength = NumberLength(leftString, leftIndex, out var leftNumber);
                var rightNumberLength = NumberLength(rightString, rightIndex, out var rightNumber);

                if (leftNumberLength < rightNumberLength) {
                    return LeftIsSmaller;
                } else if (leftNumberLength > rightNumberLength) {
                    return RightIsSmaller;
                } else {
                    if(leftNumber < rightNumber) {
                        return LeftIsSmaller;
                    } else if(leftNumber > rightNumber) {
                        return RightIsSmaller;
                    }
                }
            }
        }

        if (leftString.Length < rightString.Length) {
            return LeftIsSmaller;
        } else if(leftString.Length > rightString.Length) {
            return RightIsSmaller;
        }

        return Equal;
    }

    public int NumberLength(string str, int offset, out int number) {
        if (string.IsNullOrWhiteSpace(str)) throw new ArgumentNullException(nameof(str));
        if (offset >= str.Length) throw new ArgumentOutOfRangeException(nameof(offset), offset, "Offset must be less than the length of the string.");

        var currentOffset = offset;

        var curChar = str[currentOffset];

        if (!char.IsNumber(curChar))
            throw new ArgumentException($"'{curChar}' is not a number.", nameof(offset));

        int length = 1;

        var numberString = string.Empty;

        for (currentOffset = offset + 1;
            currentOffset < str.Length;
            currentOffset++, length++) {

            curChar = str[currentOffset];
            numberString += curChar;

            if (!char.IsNumber(curChar)) {
                number = int.Parse(numberString);

                return length;
            }
        }

        number = int.Parse(numberString);

        return length;
    }
}

Kelly Elton
fuente

-2

Permítanme explicar mi problema y cómo pude resolverlo.

Problema: - Ordene los archivos basados en FileName de los objetos FileInfo que se recuperan de un Directorio.

Solución: seleccioné los nombres de archivo de FileInfo y recorté la parte ".png" del nombre del archivo. Ahora, simplemente haga List.Sort (), que clasifica los nombres de archivo en orden de clasificación natural. Según mis pruebas, descubrí que tener .png desordena el orden de clasificación. Echa un vistazo al siguiente código

var imageNameList = new DirectoryInfo(@"C:\Temp\Images").GetFiles("*.png").Select(x =>x.Name.Substring(0, x.Name.Length - 4)).ToList();
imageNameList.Sort();

girishkatta9
fuente

¿Puedo saber el motivo de -1 en esta respuesta?

girishkatta9

Orden de clasificación natural en C #

Respuestas: