grep to ignore patterns

I am extracting URLs from a website using cURL, as shown below.

curl www.somesite.com | grep "<a href=.*title=" > new.txt

My new.txt file looks like this.

<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">

However, I need to extract only the lines below.

<a href="http://website1.com" title="something">
<a href="http://website2.com" information="something" title="something">

I am trying to ignore the <a href entries that have information in them and those whose title ends with NOTNEEDED.

How can I modify my grep statement?

grep Ramesh

Is the output you're showing here correct? The text describing it doesn't make sense alongside this example.

slm

Aren't you looking for curl www.somesite.com | grep "<a href=.*title=" | grep -v NOTNEEDED > new.txt?

terdon

@terdon, that is exactly what I was looking for. I can accept it as an answer if you post it.

Ramesh

Ramesh, that's basically @slm's answer. I just edited it so you can accept it.

terdon

Oh yes, I didn't realize the pipe was so powerful. I've accepted it as the answer. Thanks!

Ramesh

Answers:

I'm not completely following your example and its description, but it sounds like what you want is this:

$ grep -v "<a href=.*title=.*NOTNEEDED" sample.txt 
<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">

So, for your example:

$ curl www.example.com | grep "<a href=.*title=" | grep -v NOTNEEDED > new.txt
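
To try the filter without touching the network, the sketch below runs a keep-then-drop pipeline against the sample lines from the question. The temporary directory and the variable names kept and strict are made up for illustration, and the last filter on information= is an assumption based on the question's wish to skip anchors carrying that attribute.

```shell
# Recreate the sample lines from the question in a temporary file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">
EOF

# Keep anchor lines that have a title, then drop those mentioning NOTNEEDED.
kept=$(grep '<a href=.*title=' "$tmpdir/sample.txt" | grep -v 'NOTNEEDED')

# Assumption: also drop anchors that carry an information= attribute.
strict=$(printf '%s\n' "$kept" | grep -v 'information=')

printf '%s\n' "$kept"
echo '---'
printf '%s\n' "$strict"

rm -rf "$tmpdir"
```

Each grep -v stage only removes lines, so further exclusions can be appended to the pipeline one at a time.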

slm

I have a class in the <a href section. Basically, I don't want that in my output.

Ramesh

The grep man page says:

-v, --invert-match
    Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)

You can use a regular expression to invert several patterns at once:

grep -v 'red\|green\|blue'

or chain one inverted grep per pattern:

grep -v red | grep -v green | grep -v blue
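
Note that \| inside a basic regular expression is a GNU extension, so it may not work with every grep. As a rough sketch of two more portable spellings, the snippet below uses repeated -e options and an extended regular expression with -E. The sample colour list and the variable names are invented for the example.

```shell
# Sample input: one colour per line (made-up data).
colours=$(printf '%s\n' red orange green purple blue)

# Several -e options: a line is dropped if it matches any of the patterns.
with_e=$(printf '%s\n' "$colours" | grep -v -e red -e green -e blue)

# Extended regular expression (-E) with plain | alternation.
with_E=$(printf '%s\n' "$colours" | grep -Ev 'red|green|blue')

printf '%s\n' "$with_e"
```

Both forms should leave only the lines that match none of the three patterns.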

YesThatIsMyName