I have a large file (a chemical database), and I need to display only the header records, which are defined as lines that do not begin with ATOM, CONNECT, HETATM, TER, or END. I am supposed to use grep to do this. Here is a sample of the file (the full file is here):
HEADER TRANSFERASE 15-OCT-12 4HKD
TITLE CRYSTAL STRUCTURE OF HUMAN MST2 SARAH DOMAIN
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: SERINE/THREONINE-PROTEIN KINASE 3;
COMPND 3 CHAIN: A, B, C, D;
COMPND 4 FRAGMENT: SARAH DOMAIN, UNP RESIDUES 436-484;
COMPND 5 SYNONYM: MAMMALIAN STE20-LIKE PROTEIN KINASE 2, MST-2, STE20-LIKE
COMPND 6 KINASE MST2, SERINE/THREONINE-PROTEIN KINASE KRS-1, SERINE/THREONINE-
COMPND 7 PROTEIN KINASE 3 36KDA SUBUNIT, MST2/N, SERINE/THREONINE-PROTEIN
COMPND 8 KINASE 3 20KDA SUBUNIT, MST2/C;
COMPND 9 EC: 2.7.11.1;
COMPND 10 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE 3 ORGANISM_COMMON: HUMAN;
SOURCE 4 ORGANISM_TAXID: 9606;
SOURCE 5 GENE: STK3, KRS1, MST2;
SOURCE 6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE 7 EXPRESSION_SYSTEM_TAXID: 562;
SOURCE 8 EXPRESSION_SYSTEM_STRAIN: BL21 (DE3) CODON PLUS;
SOURCE 9 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID;
SOURCE 10 EXPRESSION_SYSTEM_PLASMID: HT-PET28A
KEYWDS HOMODIMERIZATION, HETERODOMERIZATION, SAV1, NEK2, RASSF, TRANSFERASE
EXPDTA X-RAY DIFFRACTION
AUTHOR G.G.LIU,Z.B.SHI,Z.C.ZHOU
REVDAT 1 04-SEP-13 4HKD 0
JRNL AUTH G.G.LIU,Z.B.SHI,Z.C.ZHOU
JRNL TITL CRYSTAL STRUCTURE OF HUMAN MST2 SARAH DOMAIN
JRNL REF TO BE PUBLISHED
JRNL REFN
REMARK 2
REMARK 2 RESOLUTION. 1.50 ANGSTROMS.
REMARK 3
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : PHENIX (PHENIX.REFINE: 1.8_1069)
REMARK 3 AUTHORS : PAUL ADAMS,PAVEL AFONINE,VICENT CHEN,IAN
REMARK 3 : DAVIS,KRESHNA GOPAL,RALF GROSSE-
REMARK 3 : KUNSTLEVE,LI-WEI HUNG,ROBERT IMMORMINO,
REMARK 3 : TOM IOERGER,AIRLIE MCCOY,ERIK MCKEE,NIGEL
REMARK 3 : MORIARTY,REETAL PAI,RANDY READ,JANE
REMARK 3 : RICHARDSON,DAVID RICHARDSON,TOD ROMO,JIM
REMARK 3 : SACCHETTINI,NICHOLAS SAUTER,JACOB SMITH,
REMARK 3 : LAURENT STORONI,TOM TERWILLIGER,PETER
REMARK 3 : ZWART
REMARK 3
REMARK 3 REFINEMENT TARGET : ML
REMARK 3
REMARK 3 DATA USED IN REFINEMENT.
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.50
REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 34.86
REMARK 3 MIN(FOBS/SIGMA_FOBS) : 1.380
REMARK 3 COMPLETENESS FOR RANGE (%) : 91.9
REMARK 3 NUMBER OF REFLECTIONS : 29481
REMARK 3
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 R VALUE (WORKING + TEST SET) : 0.197
REMARK 3 R VALUE (WORKING SET) : 0.195
REMARK 3 FREE R VALUE : 0.231
REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.080
REMARK 3 FREE R VALUE TEST SET COUNT : 1497
REMARK 3
REMARK 3 FIT TO DATA USED IN REFINEMENT (IN BINS).
REMARK 3 BIN RESOLUTION RANGE COMPL. NWORK NFREE RWORK RFREE
REMARK 3 1 34.8685 - 3.3427 0.97 2878 149 0.1998 0.2322
REMARK 3 2 3.3427 - 2.6535 0.98 2711 175 0.2033 0.2452
REMARK 3 3 2.6535 - 2.3182 0.96 2660 155 0.1968 0.2148
REMARK 3 4 2.3182 - 2.1063 0.94 2620 114 0.1875 0.2318
REMARK 3 5 2.1063 - 1.9553 0.91 2533 113 0.1909 0.2295
REMARK 3 6 1.9553 - 1.8400 0.91 2476 143 0.1883 0.2137
REMARK 3 7 1.8400 - 1.7479 0.90 2465 128 0.1840 0.2029
REMARK 3 8 1.7479 - 1.6718 0.90 2446 130 0.1783 0.2144
REMARK 3 9 1.6718 - 1.6074 0.90 2419 129 0.1864 0.2400
REMARK 3 10 1.6074 - 1.5520 0.90 2487 120 0.1938 0.2588
REMARK 3 11 1.5520 - 1.5030 0.85 2289 141 0.1993 0.2471
REMARK 3
REMARK 3 BULK SOLVENT MODELLING.
REMARK 3 METHOD USED : FLAT BULK SOLVENT MODEL
REMARK 3 SOLVENT RADIUS : 1.11
REMARK 3 SHRINKAGE RADIUS : 0.90
REMARK 3 K_SOL : NULL
REMARK 3 B_SOL : NULL
REMARK 3
REMARK 3 ERROR ESTIMATES.
REMARK 3 COORDINATE ERROR (MAXIMUM-LIKELIHOOD BASED) : 0.130
REMARK 3 PHASE ERROR (DEGREES, MAXIMUM-LIKELIHOOD BASED) : 21.520
REMARK 3
REMARK 3 B VALUES.
REMARK 3 FROM WILSON PLOT (A**2) : NULL
REMARK 3 MEAN B VALUE (OVERALL, A**2) : NULL
REMARK 3 OVERALL ANISOTROPIC B VALUE.
REMARK 3 B11 (A**2) : NULL
REMARK 3 B22 (A**2) : NULL
REMARK 3 B33 (A**2) : NULL
REMARK 3 B12 (A**2) : NULL
REMARK 3 B13 (A**2) : NULL
REMARK 3 B23 (A**2) : NULL
REMARK 3
REMARK 3 TWINNING INFORMATION.
REMARK 3 FRACTION: NULL
REMARK 3 OPERATOR: NULL
REMARK 3
REMARK 3 DEVIATIONS FROM IDEAL VALUES.
REMARK 3 RMSD COUNT
REMARK 3 BOND : 0.007 1771
REMARK 3 ANGLE : 1.179 2367
REMARK 3 CHIRALITY : 0.083 255
REMARK 3 PLANARITY : 0.006 317
REMARK 3 DIHEDRAL : 14.379 737
REMARK 3
REMARK 3 TLS DETAILS
REMARK 3 NUMBER OF TLS GROUPS : NULL
REMARK 3
REMARK 3 NCS DETAILS
REMARK 3 NUMBER OF NCS GROUPS : NULL
REMARK 3
REMARK 3 OTHER REFINEMENT REMARKS: NULL
REMARK 4
REMARK 4 4HKD COMPLIES WITH FORMAT V. 3.30, 13-JUL-11
REMARK 100
REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY PDBJ ON 22-OCT-12.
REMARK 100 THE RCSB ID CODE IS RCSB075574.
REMARK 200
REMARK 200 EXPERIMENTAL DETAILS
REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION
REMARK 200 DATE OF DATA COLLECTION : 16-APR-12
REMARK 200 TEMPERATURE (KELVIN) : 100
REMARK 200 PH : 4.6
REMARK 200 NUMBER OF CRYSTALS USED : 1
REMARK 200
REMARK 200 SYNCHROTRON (Y/N) : Y
REMARK 200 RADIATION SOURCE : SSRF
REMARK 200 BEAMLINE : BL17U
REMARK 200 X-RAY GENERATOR MODEL : NULL
REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M
REMARK 200 WAVELENGTH OR RANGE (A) : 0.97915
REMARK 200 MONOCHROMATOR : SI 111 CHANNEL
REMARK 200 OPTICS : NULL
REMARK 200
REMARK 200 DETECTOR TYPE : CCD
REMARK 200 DETECTOR MANUFACTURER : ADSC QUANTUM 315
REMARK 200 INTENSITY-INTEGRATION SOFTWARE : HKL-2000
REMARK 200 DATA SCALING SOFTWARE : HKL-2000
REMARK 200
REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 29548
REMARK 200 RESOLUTION RANGE HIGH (A) : 1.500
REMARK 200 RESOLUTION RANGE LOW (A) : 50.000
REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 2.000
REMARK 200
REMARK 200 OVERALL.
REMARK 200 COMPLETENESS FOR RANGE (%) : 92.3
REMARK 200 DATA REDUNDANCY : 5.300
REMARK 200 R MERGE (I) : NULL
REMARK 200 R SYM (I) : NULL
REMARK 200 <I/SIGMA(I)> FOR THE DATA SET : 17.1000
Answers:
Your comment is the right approach; if you have to use grep, you should probably use -v, which inverts the match so that only non-matching lines are printed. Then you just need to match every line that begins with one of the keywords you mentioned. -E enables extended regular expressions, ^ matches the start of the line, and (a|b|c) means "a or b or c". I suspect "CONNECT" in your question was a typo, since it does not appear in the file, so I would use CONECT in the pattern instead.
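A minimal sketch of the command this description implies. The file here is a small hypothetical sample created with mktemp, standing in for the real PDB entry; the coordinate-section lines are shortened placeholders, not real data.

```shell
# Hypothetical sample file; replace "$sample" with the path to the real file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
HEADER TRANSFERASE 15-OCT-12 4HKD
TITLE CRYSTAL STRUCTURE OF HUMAN MST2 SARAH DOMAIN
ATOM placeholder
HETATM placeholder
TER placeholder
CONECT placeholder
END
EOF

# -v prints only the lines that do NOT match.
# -E enables extended regular expressions, so (a|b|c) works unescaped.
# ^ anchors the alternation to the start of the line.
headers=$(grep -v -E '^(ATOM|CONECT|HETATM|TER|END)' "$sample")
printf '%s\n' "$headers"

rm -f "$sample"
```

Because the pattern is only anchored at the start of the line, it also excludes any record whose name merely begins with one of these keywords (for example ENDMDL), which is usually what you want for the coordinate section.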