In the 2014 challenge, Michael Stern suggests using OCR to parse an image of the number 2014 into 2014. I'd like to take this challenge in a different direction. Using the built-in OCR of the language or standard library of your choice, design the smallest image (in bytes) that is parsed into the ASCII string "2014".

Stern's original image is 7357 bytes, but with a little effort it can be losslessly compressed to 980 bytes. No doubt the black-and-white version (181 bytes) also works with the same code.

Rules: each answer must provide the image, its size in bytes, and the code needed to process it. No custom OCR is allowed, for obvious reasons! All reasonable languages and image formats are allowed.

Edit: in response to comments, I'm broadening this to include any preexisting library, or even http://www.free-ocr.com/ for languages where no OCR is available.

code-golf image-processing new-years Charles

How many languages or standard libraries have built-in OCR? Or do you intend "standard library" here to mean "any library that wasn't created specifically for this challenge"?

Peter Taylor

Does any development platform other than Mathematica have OCR built in?

Michael Stern

You should standardize: say something like "use free-ocr.com" or some other easily accessible OCR.

Justin

Shell (ImageMagick, Tesseract), 18 bytes

file=golf_2014
echo -n UDQKMTMgNQruqCqo6riKiO6I | base64 -d > $file.pbm
convert -border 2x2 -bordercolor white -resize 300% -sharpen 0 -monochrome $file.pbm $file.png
tesseract $file.png $file digits
cat $file.txt
rm $file.pbm $file.png $file.txt

The image is 18 bytes and can be reproduced like this:

echo -n UDQKMTMgNQruqCqo6riKiO6I | base64 -d > 2014.pbm

It looks like this (this is a PNG copy, not the original):

[image: 2014]

After processing with ImageMagick, it looks like this:

[image: 2014, enlarged]

Using ImageMagick version 6.6.9-7 and Tesseract version 3.02. The PBM image was created in GIMP and edited with a hex editor.

This version requires jp2a.

file=golf_2014
echo -n UDQKMTMgNQruqCqo6riKiO6I | base64 -d > $file.pbm
convert -border 2x2 -bordercolor white -resize 300% -sharpen 0 -monochrome $file.pbm $file.png
tesseract $file.png $file digits
cat $file.txt
convert -background black -fill white -border 2x2 -bordercolor black -pointsize 100 label:$(cat $file.txt) $file.jpg
jp2a --chars=" $(cat $file.txt) " $file.jpg
rm $file.pbm $file.png $file.txt $file.jpg

It outputs something like this:

    2014444444102         01144444102              214441                 214441     
   1             1      24           1            04    4                0     4     
  1    410201     0    0    410004    1       2014      4              21      4     
 24   42     0    4    4    0     1    0    24          4             04       4     
  22222      1    1   0    42     0    4    2   4100    4            1   41    4     
            1    42   0    4      2     2   2412   0    4          24   420    4     
          04    42    0    1      2     2          0    4         0   40  0    4     
       204    42      0    1      2     2          0    4       24   42   0    4     
     21     12        0    4      0    42          0    4      2     411114     1112 
    04   412          24    0     1    0           0    4      0                   0 
  24     1111111110    1    42  21    4            0    4      200011111001    40002 
  4               4     04    44     42            0    4                 0    4     
 0                4      214       10              0    4                 0    4     
  22222222222222222         222222                  22222                  22222

user13957

Very, very impressive. 3 bytes for the header, 5 bytes for the image dimensions, 10 bytes for the bitmap. The format is described here: netpbm.sourceforge.net/doc/pbm.html

Charles
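
The 3 + 5 + 10 byte breakdown in the comment above can be checked directly. What follows is a minimal Python sketch, not part of the original answer, that decodes the Base64 string from the shell answer and prints the bitmap. The helper name `decode_p4` is made up for illustration, and the parser assumes exactly the layout used here (magic number, dimensions, bitmap, with no comment lines in the header).

```python
import base64

# Base64 payload copied from the shell answer above.
PAYLOAD = "UDQKMTMgNQruqCqo6riKiO6I"


def decode_p4(data: bytes):
    """Split a raw PBM (P4) file into (width, height, rows of '#'/'.').

    Assumption: the header is exactly "P4\n<width> <height>\n" with no
    comments, which is the layout of the 18-byte file in this thread.
    """
    # maxsplit=2 keeps any newline bytes inside the bitmap untouched.
    magic, dims, bitmap = data.split(b"\n", 2)
    if magic != b"P4":
        raise ValueError("not a raw PBM file")
    width, height = (int(n) for n in dims.split())
    stride = (width + 7) // 8  # each row is padded to a whole byte
    rows = []
    for y in range(height):
        row = bitmap[y * stride:(y + 1) * stride]
        bits = "".join(f"{byte:08b}" for byte in row)[:width]
        # In PBM a 1 bit is black and a 0 bit is white.
        rows.append(bits.replace("1", "#").replace("0", "."))
    return width, height, rows


data = base64.b64decode(PAYLOAD)
width, height, rows = decode_p4(data)
print(len(data), width, height)
print("\n".join(rows))
```

The file decodes to 18 bytes describing a 13 by 5 image. Each 13-pixel row is padded to 2 bytes, so the 5 rows account for the 10 bitmap bytes, and the printed rows show the digits drawn 3 pixels wide (1 pixel for the "1") with single-pixel gaps.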

Java + Tesseract, 53 bytes

Since I don't have Mathematica, I decided to ~~bend the rules a bit and~~ use Tesseract to do the OCR. I wrote a program that renders "2014" into an image using various fonts, sizes and styles, and finds the smallest image that is recognized as "2014". The results depend on the fonts available.

Here is the winner on my computer: 53 bytes, using the "URW Gothic L" font: [image: 2014]

Code:

import java.awt.Color;
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import javax.imageio.ImageIO;

public class Ocr {
    public static boolean blankLine(final BufferedImage img, final int x1, final int y1, final int x2, final int y2) {
        final int d = x2 - x1 + y2 - y1 + 1;
        final int dx = (x2 - x1 + 1) / d;
        final int dy = (y2 - y1 + 1) / d;
        for (int i = 0, x = x1, y = y1; i < d; ++i, x += dx, y += dy) {
            if (img.getRGB(x, y) != -1) {
                return false;
            }
        }
        return true;
    }

    public static BufferedImage trim(final BufferedImage img) {
        int x1 = 0;
        int y1 = 0;
        int x2 = img.getWidth() - 1;
        int y2 = img.getHeight() - 1;
        while (x1 < x2 && blankLine(img, x1, y1, x1, y2)) x1++;
        while (x1 < x2 && blankLine(img, x2, y1, x2, y2)) x2--;
        while (y1 < y2 && blankLine(img, x1, y1, x2, y1)) y1++;
        while (y1 < y2 && blankLine(img, x1, y2, x2, y2)) y2--;
        return img.getSubimage(x1, y1, x2 - x1 + 1, y2 - y1 + 1);
    }

    public static int render(final Font font, final int w, final String name) throws IOException {
        BufferedImage img = new BufferedImage(w, w, BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = img.createGraphics();
        float size = font.getSize2D();
        Font f = font;
        while (true) {
            final FontMetrics fm = g.getFontMetrics(f);
            if (fm.stringWidth("2014") <= w) {
                break;
            }
            size -= 0.5f;
            f = f.deriveFont(size);
        }
        g = img.createGraphics();
        g.setFont(f);
        g.fillRect(0, 0, w, w);
        g.setColor(Color.BLACK);
        g.drawString("2014", 0, w - 1);
        g.dispose();
        img = trim(img);
        final File file = new File(name);
        ImageIO.write(img, "gif", file);
        return (int) file.length();
    }

    public static boolean ocr() throws Exception {
        Runtime.getRuntime().exec("/usr/bin/tesseract 2014.gif out -psm 8").waitFor();
        String t = "";
        final BufferedReader br = new BufferedReader(new FileReader("out.txt"));
        while (true) {
            final String s = br.readLine();
            if (s == null) break;
            t += s;
        }
        br.close();
        return t.trim().equals("2014");
    }

    public static void main(final String... args) throws Exception {
        int min = 10000;
        for (String s : GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames()) {
            for (int t = 0; t < 4; ++t) {
                final Font font = new Font(s, t, 50);
                for (int w = 10; w < 25; ++w) {
                    final int size = render(font, w, "2014.gif");
                    if (size < min && ocr()) {
                        render(font, w, "2014win.gif");
                        min = size;
                        System.out.println(s + ", " + size);
                    }
                }
            }
        }
    }
}

aditsu quit because SE is EVIL

I changed the rules to allow this and similar entries. Impressive file size.

Charles

Mathematica, ~~753~~ 100 bytes

f[n_,format_]:=
Module[{filename},
Print["raster: ",n," by ", n];
filename="2014At"<>ToString[n]<>"."<>format;
Print["filename:  ",filename];
Print["format: ",format];
Print["raster image: ",rasterImg=Rasterize[Style[2014,"OCR A Std"],
RasterSize->n,ImageSize->1n,ImageResolution->6n]];
Export[filename,rasterImg];
Print["Actual imported image: ",img=Switch[format,"PDF"|"HDF",Import[filename][[1]],
_,Import[filename]]];
Print["Identified text: ",TextRecognize[ImageResize[img,Scaled[3]]]];
Print["filesize (bytes): ",FileByteCount[filename]]]

My best case so far:

f[24, "PBM"]

[image: efficiency]

DavidC

Mathematica, 78 bytes

The trick to winning this in Mathematica will probably be the use of the ImageResize[] function, as shown below.

First, I created the text "2014" and saved it to a GIF file, for a fair comparison with David Carraher's solution. The text looks like this: [image: 2014]. This is not optimized in any way; it is just Geneva at a small font size, and other fonts and smaller sizes may be possible. A plain TextRecognize[] would fail, but TextRecognize[ImageResize[]] has no problem.

filename = "~/Desktop/2014.gif";
Print["Actual imported image: ", img = Import[filename]]
Print["Identified text: ", 
 TextRecognize[ImageResize[img, Scaled[2]]]]
Print["filesize (bytes): ", FileByteCount[filename]]

[image: results]

Fussing with the typeface, font size, degree of scaling, and so on will probably yield even smaller files that still work.

Michael Stern
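
Both Mathematica answers enlarge the image before calling TextRecognize (Scaled[2] and Scaled[3]), and the shell answer does the same with -resize 300%, so the stored file can stay tiny while the OCR engine sees larger glyphs. As a rough illustration only, here is a Python sketch of integer upscaling on a bitmap. It uses nearest-neighbour repetition, which is an assumption made for simplicity: ImageResize and ImageMagick apply their own resampling methods, which may differ. The names `upscale` and `to_p4` and the 3 by 5 sample glyph are hypothetical.

```python
def upscale(rows, factor):
    """Nearest-neighbour upscaling: repeat every pixel and every row."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in rows:
        wide = "".join(ch * factor for ch in row)
        scaled.extend([wide] * factor)
    return scaled


def to_p4(rows):
    """Pack rows of '#' (black) and '.' (white) into raw PBM (P4) bytes."""
    width, height = len(rows[0]), len(rows)
    out = bytearray(f"P4\n{width} {height}\n".encode("ascii"))
    for row in rows:
        bits = row.replace("#", "1").replace(".", "0")
        bits += "0" * (-len(bits) % 8)  # pad each row to a whole byte
        out += int(bits, 2).to_bytes(len(bits) // 8, "big")
    return bytes(out)


# Sample data: a 3x5 glyph for the digit 2 (illustrative only).
small = ["###", "..#", "###", "#..", "###"]
big = upscale(small, 3)

print("\n".join(big))
# Compare the stored size of the small bitmap with the enlarged one.
print(len(to_p4(small)), len(to_p4(big)))
```

Under these assumptions the small glyph packs into 12 bytes as PBM and the 3x enlargement into 38 bytes, which is why the answers keep the small file on disk and enlarge it only at recognition time.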

Very impressive file size.

DavidC

You can crop the white borders off the image to make it smaller, shorten the gaps between the digits, and perhaps redraw the digits to make them more compact.

swish

@swish Indeed, cropping the white border brings it down to 78 bytes.

Michael Stern

Produce the number 2014 from an image
