Using a table or script in sed to replace many special characters with escape characters?

Note that ^^ for ^, ^| for | and ^& for & ... are not a requirement of sed. The ^ escape character is required by the CMD shell. If your text is not exposed to the command line, or to a command parameter in a .cmd / .bat command script, you only need to consider sed's escape character, which is a backslash \ ... These are two quite separate scopes (which can overlap, so it is often best to keep everything within sed's scope, as the following does).
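As a small illustration of the two scopes: inside sed itself, ^, | and & need no backslash when they sit in a bracket expression, so prefixing them with the CMD escape character takes a single substitution. This is only a sketch; the function name cmd_escape is made up for the example, and it assumes a POSIX-style shell rather than CMD.

```shell
# Hypothetical helper (not from the answer): prefix each ^, | and & with ^,
# i.e. the escaping that the CMD shell (not sed) asks for.
# Inside [...] these characters are literal, as long as ^ is not placed first.
cmd_escape() {
    sed 's/[|&^]/^&/g'    # in the replacement, ^ is literal and & is the matched character
}

printf '%s\n' 'a^b|c&d' | cmd_escape    # prints: a^^b^|c^&d
```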

Here is a sed script which will replace any number of find strings that you specify with their complementary replacement strings. The general format of the strings is a cross between a sed substitution command ( s/abc/xyz/p ) and a tabular format. You can "stretch" the middle delimiter so that you can align things.
You can use a FIXED string pattern ( F/... ), or a normal sed-style regular expression pattern ( s/... ) ... and you can adjust sed -n and each /p (in table.txt) as needed.
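To make the difference between the two line types concrete, here are two hand-written substitution commands applied to the same text. They are what one would expect an s line and an F line with find-string a.b and replacement x&y to mean; the function names regex_form and fixed_form are made up for this sketch.

```shell
# 's' style: the strings are used as written, so '.' is a wildcard
# and '&' in the replacement stands for the matched text.
regex_form() { sed 's/a.b/x&y/g'; }

# 'F' style: both strings are meant literally, so the special
# characters have to be backslash-escaped before sed sees them.
fixed_form() { sed 's/a\.b/x\&y/g'; }

printf '%s\n' 'a.b axb' | regex_form    # prints: xa.by xaxby
printf '%s\n' 'a.b axb' | fixed_form    # prints: x&y axb
```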

You need 3 files for a minimal run (and a 4th, dynamically derived from table.txt):

the main script table-to-regex.sed
the table file table.txt
the target file file-to-change.txt
the derived script table-derrived.sed

To run a table against a target file:

sed -nf table-to-regex.sed  table.txt > table-derrived.sed
# Here, check `table-derrived.sed` for errors as described in the example *table.txt*.  

sed -nf table-derrived.sed  file-to-change.txt
# Redirect *sed's* output via `>` or `>>` as need be, or use `sed -i -nf`

If you want to run table.txt against many files, just put the code snippet above into a simple loop to suit your needs. I can do it trivially in bash, but someone more familiar with the Windows CMD shell would be better suited than I am to set that up.
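One possible shape for such a loop, in POSIX sh rather than CMD, is sketched below. The function name apply_derived, the *.txt glob and the .new output suffix are assumptions made for the example; it expects the derived script to exist already and mirrors the sed -n usage above, so only lines whose substitution carries the p flag are written out.

```shell
# Hypothetical helper: apply an already-derived sed script to every
# *.txt file in a directory, writing each result next to its source.
apply_derived() {
    script=$1    # e.g. table-derrived.sed
    dir=$2       # directory holding the files to change
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue              # skip if the glob matched nothing
        sed -nf "$script" "$f" > "$f.new"    # -n: print only what the script prints
    done
}

# Usage (after deriving the script as shown above):
#   apply_derived table-derrived.sed .
```

Writing to a separate .new file keeps the originals untouched, which makes it easier to compare results before replacing anything.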

Here is the script: table-to-regex.sed

s/[[:space:]]*$//  # remove trailing whitespace

/^$\|^[[:space:]]*#/{p; b}  # empty and sed-style comment lines: print and branch
                            # printing keeps line numbers; for referencing errors

/^\([Fs]\)\(.\)\(.*\2\)\{4\}/{  # too many delims ERROR
      s/^/# error + # /p        # print a flagged/commented error
      b }                       # branch

/^\([Fs]\)\(.\)\(.*\2\)\{3\}/{                  # this may be a long-form 2nd delimiter
   /^\([Fs]\)\(.\)\(.*\2[[:space:]]*\2.*\2\)/{  # is long-form 2nd delimiter OK?
      s/^\([Fs]\)\(.\)\(.*\)\2[[:space:]]*\2\(.*\)\2\(.*\)/\1\2\n\3\n\4\n\5/
      t OK                                      # branch on true to :OK
   }; s/^/# error L # /p                        # print a flagged/commented error
      b }                                       # branch: long-form 2nd delimiter ERROR

/^\([Fs]\)\(.\)\(.*\2\)\{2\}/{     # this may be short-form delimiters
   /^\([Fs]\)\(.\)\(.*\2.*\2\)/{   # is short-form delimiters OK?
      s/^\([Fs]\)\(.\)\(.*\)\2\(.*\)\2\(.*\)/\1\2\n\3\n\4\n\5/
      t OK                         # branch on true to :OK  
   }; s/^/# error S # /p           # print a flagged/commented error
      b }                          # branch: short-form delimiters ERROR

{ s/^/# error - # /p        # print a flagged/commented error
  b }                       # branch: too few delimiters ERROR

:OK     # delimiters are okay
#============================
h   # copy the pattern-space to the hold space

# NOTE: /^s/ lines are considered to contain regex patterns, not FIXED strings.
/^s/{    s/^s\(.\)\n/s\1/   # shrink long-form delimiter to short-form
     :s; s/^s\(.\)\([^\n]*\)\n/s\1\2\1/; t s  # branch on true to :s 
      p; b }                                  # print and branch

# The following code handles FIXED-string /^F/ lines

s/^F.\n\([^\n]*\)\n.*/\1/  # isolate the literal find-string in the pattern-space
s/[]\/$*.^|[]/\\&/g        # convert the literal find-string into a regex of itself
H                          # append \n + find-regex to the hold-space

g   # Copy the modified hold-space back into the pattern-space

s/^F.\n[^\n]*\n\([^\n]*\)\n.*/\1/  # isolate the literal repl-string in the pattern-space
s/[\/&]/\\&/g                      # convert the literal repl-string into a regex of itself
H                                  # append \n + repl-regex to the hold-space

g   # Copy the modified hold-space back into the pattern-space

# Rearrange pattern-space into a / delimited command: s/find/repl/...      
s/^\(F.\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)$/s\/\5\/\6\/\4/

p   # Print the modified find-and-replace regular expression line
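The core idea behind the FIXED-string handling, turning a literal string into a regex of itself, can also be tried in isolation. The following is a standalone sketch and not an extract from the script above: the helper names escape_find and escape_repl are made up, the inner sed commands use : as their delimiter so that / and \ can both sit plainly inside the bracket expression, and single-line strings are assumed.

```shell
# Hypothetical helpers: backslash-escape a literal string so that it can be
# dropped into a /-delimited sed substitution command.
escape_find() { printf '%s\n' "$1" | sed 's:[][\/.^$*]:\\&:g'; }   # regex side
escape_repl() { printf '%s\n' "$1" | sed 's:[\/&]:\\&:g'; }        # replacement side

find=$(escape_find 'a.b*')    # becomes a\.b\*
repl=$(escape_repl 'x/&')     # becomes x\/\&

printf '%s\n' 'a.b* axb' | sed "s/$find/$repl/g"    # prints: x/& axb
```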

Here is an example table file, with a description of how it works: table.txt

# The script expects an input table file, which can contain 
#   comment, blank, and substitution lines. The text you are
#   now reading is part of an input table file.

# Comment lines begin with optional whitespace followed by #

# Each substitution line must start with: 's' or 'F'
#  's' lines are treated as normal `sed` substitution regular expressions
#  'F' lines are considered to contain `FIXED` (literal) string expressions 
# The 's' or 'F' must be followed by the 1st of 3 delimiters   
#   which must not appear elsewhere on the same line.
# A pre-test is performed to ensure conformity. Lines with 
#   too many or too few delimiters, or no 's' or 'F', are flagged   
#   with the text '# error ? #', which effectively comments them out.
#   '?' can be: '-' too few, '+' too many, 'L' long-form, 'S' short-form
#   Here is an example of a long-form error, as it appears in the output. 

# error L # s/example/(7+3)/2=5/

# 1st delimiter, eg '/' must be a single character.
# 2nd (middle) delimiter has two possible forms:
#   Either it is exactly the same as the 1st delimiter: '/' (short-form)
#   or it has a double-form for column alignment: '/      /' (long-form)
#   The long-form can have any amount of whitespace between the 2 '/'s
# 3rd delimiter must be the same as the 1st delimiter.

# After the 3rd delimiter, you can put any of sed's 
#    substitution commands, eg. 'g'

# With one condition, a trailing '#' comment to 's' and 'F' lines is
#    valid. The condition is that no delimiter character can be in the 
#    comment (delimiters must not appear elsewhere on the same line)

# For 's' type lines, it is implied that *you* have included all the 
#    necessary sed-escape characters!  The script does not add any 
#    sed-escape characters for 's' type lines. It will, however, 
#    convert a long-form middle-delimiter into a short-form delimiter.   

# For 'F' type lines, it is implied that both strings (find and replace) 
#    are FIXED/literal-strings. The script does add the  necessary 
#    sed-escape characters for 'F' type lines. It will also 
#    convert a long-form middle-delimiter into a short-form delimiter.   

# The result is a sed-script which contains one sed-substitution 
#    statement per line; it is just a modified version of your 
#    's' and 'F' strings "table" file.

# Note that the 1st delimiter is *always* in column 2.

# Here are some sample 's' and 'F' lines, with comments:
#

F/abc/ABC/gp               #-> These 3 are the same for 's' and 'F', 
s/abc/ABC/gp               #-> as no characters need to be escaped,  
s/abc/         /ABC/gp     #-> and the 2nd delimiter shrinks to one  

F/^F=Fixed/    /\1okay/p   # \1 is okay here, It is a FIXED literal
s|^s=sed regex||\1FAIL|p   # \1 will FAIL: back-reference not defined!

F|\\\\|////|               # this line == next line 
F|\\\\|        |////|p     # this line == previous line  
s|\\\\|        |////|p     # this line is different; 's' vs 'F'

F_Hello! ^.&`//\\*$/['{'$";"`_    _Ciao!_   # literal find / replace

Here is a sample input file whose text you want to change: file-to-change.txt

abc abc
^F=Fixed
   s=sed regex
\\\\ \\\\ \\\\ \\\\
Hello! ^.&`//\\*$/['{'$";"`
some non-matching text

Peter.O

I'm trying to understand ... I copied your texts and pasted them into 3 files: table-to-regex.sed , table.txt , table-derrived.sed . But when I try to run the second command, sed -nf table-derrived.sed file-to-change.text , CMD gives me this error: img152.imageshack.us/img152/30/76715330.png I hope I did the right thing by copying the text literally into 3 files.

user143822

Yes, that is exactly what is supposed to happen ... That particular line is there to intentionally create an error, to show you what not to do .. Read the line's comment. All the lines in table.txt have comments describing what they do ... Here is line 58 in table.txt : s|^s=sed regex||\1FAIL| # \1 will FAIL: back-reference not defined! . That line has its || changed to | in table-derrived.sed to become: s|^s=sed regex|\1FAIL| # \1 will FAIL: back-reference not defined!

Peter.O
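The || to | change mentioned above is the long-form middle delimiter being shrunk to the short form. A simplified, standalone version of that step is sketched here; it is adapted from the idea in table-to-regex.sed but is not the script's own code, and the name shrink_middle is made up. It assumes the delimiter appears nowhere else on the line (so no trailing comment containing it).

```shell
# Hypothetical helper: collapse a "stretched" middle delimiter
# (delimiter, optional whitespace, delimiter) down to a single delimiter.
# \2 is whatever character follows the leading 's' or 'F'.
shrink_middle() {
    sed 's/^\([Fs]\)\(.\)\(.*\)\2[[:space:]]*\2\(.*\)\2\(.*\)$/\1\2\3\2\4\2\5/'
}

printf '%s\n' 's|^s=sed regex||\1FAIL|p' | shrink_middle    # prints: s|^s=sed regex|\1FAIL|p
printf '%s\n' 's/abc/         /ABC/gp'   | shrink_middle    # prints: s/abc/ABC/gp
```

A line that is already in short form has no two delimiters separated only by whitespace, so it passes through unchanged.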

continued ... That particular line is an s type, not an F type, as noted in the main script table-to-regex.sed : # NOTE: /^s/ lines are considered to contain regex patterns, not FIXED strings. ... If you supply a bad regular expression, it will fail. The error is caused because sed sees \1 as a back-reference, but no such reference was defined in the search pattern. There is no way to trap that error, short of writing a full regular expression parser. That type of error should not occur with F type lines, because they are treated as FIXED strings (not regular expressions).

Peter.O
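The back-reference failure described above can be reproduced outside the table workflow. The sketch below assumes GNU sed, which rejects the whole script when \1 has no matching \( \) group; other sed implementations may behave differently. The function names bad and good are made up for the example.

```shell
# No \( \) group in the pattern, so \1 refers to nothing.
# GNU sed refuses to compile this and exits with a non-zero status.
bad()  { sed 's|^s=sed regex|\1FAIL|' 2>/dev/null; }

# With a group around "s=sed", \1 is defined and the substitution works.
good() { sed 's|^\(s=sed\) regex|\1 OK|'; }

printf '%s\n' 's=sed regex' | bad || echo 'sed rejected the script'
printf '%s\n' 's=sed regex' | good    # prints: s=sed OK
```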

StackExchange sites have special chat facilities for extended discussions like this one. Unfortunately, I can't get mine to work, but if you wish to discuss this further and you have an IRC client ( mIRC is the most common one for Windows), then we can chat on irc.freenode.com (you can just type /connect irc.freenode.com in mIRC's command line, and then, when the connection is established, type this command: /join #su447178 ) .. I'm there now ... ci vediamo.

Peter.O

Thank you very much.. removed the p from table.txt and the n from the second command.

user143822
