I would like to extract the characters between the first and second underscore of the file names in a folder, and count how many files of each such type the folder contains. The folder contains files named in a particular format, such as:
2305195303310_ABC_A08_1378408840043.hl7
2305195303310_ABC_A08_1378408840043.hl7
Q37984932T467566261_DEF_R03_1378825633215.hl7
37982442T467537201_DEF_R03_1378823455384.hl7
37982442T467537201_MNO_R03_1378823455384.hl7
2305195303310_ABC_A08_1378408840053.hl7
Q37984932T467566261_DEF_R03_1378825633215.hl7
37982442T467537201_MNO_R03_1378823455384.hl7
and so on.
The script should give me output like:
ABC 3
DEF 3
MNO 2
No need to echo for everything; just do
ls *_*_* | ...
Also, instead of grep, sort and uniq, count the non-empty lines in an awk associative array directly:
ls *_*_* | cut -d_ -f2 | awk '/./ { count[$1]++; } END {for (f in count) { print f, count[f]; } }'
(Note: you can do the cut part in awk as well, of course, but for a comment field that's too awkward (pun unavoidable).)

ls should always be avoided: it breaks on many things, first and foremost names containing spaces. Apart from that, using coreutils is i) faster, ii) more portable than implementing it in gawk (and ls is notoriously non-portable between systems and locales), and iii) more faithful to the *nix way. Of course you can do it with a script, but why, when you have compiled executables that can do it for you?

/bin/ls -1 *_*_*
will be no worse than the for loop (which uselessly forks and execs a pipeline for every single file, instead of once). Your portability comment is a red herring: that awk code is portable across POSIX systems.

As for ls -1 being equivalent: I was thinking of ls alone.

Compare
time bash -c 'for i in a_b_c d_e_f aa_b_c dd_e_f ; do echo "$i" | cut -d_ -f2 ; done | grep . | sort | uniq -c'
vs
time bash -c 'printf "%s\\n" a_b_c d_e_f aa_b_c dd_e_f | cut -d_ -f2 | awk '\''/./ { count[$1]++; } END {for (f in count) { print f, count[f]; } }'\'''
More files will just make the iteration slower. On the workstation, the loop takes 15 ms vs 8 ms for a single pipe; the MacBook takes 24 ms vs 12 ms.
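
For reference, the points raised in the comments can be combined into one sketch: let the shell expand the glob instead of parsing ls output, run a single pipeline rather than one per file, and do the cut step inside awk. The function name count_types is hypothetical and not from the thread, and the trailing sort is an addition, since awk's for-in order is unspecified. It assumes file names contain no newlines.

```shell
#!/bin/sh
# count_types DIR
# Hypothetical helper (name not from the thread): prints "TYPE COUNT" for the
# field between the first and second underscore of each file name in DIR.
count_types() {
    # Subshell, so the cd does not affect the caller.
    (
        cd "$1" || exit 1
        # Let the shell expand the glob; names with spaces stay intact.
        set -- *_*_*
        # With no match the pattern is left as a literal string: print nothing.
        [ -e "$1" ] || exit 0
        # One pipeline for all files; awk splits on "_" itself (no cut needed).
        printf '%s\n' "$@" |
            awk -F_ '$2 != "" { count[$2]++ }
                     END { for (f in count) print f, count[f] }' |
            sort
    )
}
```

Calling count_types . in the folder from the question would then print one line per type followed by its count.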