152

I know the URL of an image on the Internet.

For example, http://www.digimouth.com/news/media/2011/09/google-logo.jpg, which contains Google's logo.

Now, how can I download this image using Python without opening the URL in a browser and saving the file manually?

python web-scraping Pankaj Vatsa

1

Possible duplicate of How do I download a file over HTTP using Python?

Jaydev

316

Python 2

Here is a more straightforward way if all you want to do is save it as a file:

import urllib

urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

The second argument is the local path where the file should be saved.

Python 3

As SergO suggested, the code below should work with Python 3.

import urllib.request

urllib.request.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

Liquid_Fire

55

A good way to get the filename from the link is filename = link.split('/')[-1]

heltonbiker
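The split('/') trick from the comment above keeps any query string or fragment in the name. A minimal sketch of a more defensive variant, assuming only the standard library; the helper name filename_from_url and the fallback name "download" are hypothetical choices for illustration:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of a URL, ignoring query string and fragment."""
    path = urlsplit(url).path                  # drops "?query" and "#fragment"
    name = posixpath.basename(unquote(path))   # decode %xx escapes, keep last segment
    return name or default                     # URLs ending in "/" have no basename


print(filename_from_url("http://www.digimouth.com/news/media/2011/09/google-logo.jpg"))
```

For plain image links this gives the same result as link.split('/')[-1]; the difference only shows up for URLs such as .../b.png?size=large.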

2

With urlretrieve I just get a 1 KB file containing a dict and a 404 error text. Why? If I enter the URL in my browser, I can get the picture.

Yebach

2

@Yebach: The site you are downloading from may be using cookies, the User-Agent or other headers to decide what content to serve you. These will differ between your browser and Python.

Liquid_Fire
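If the server does look at the User-Agent, one thing to try is sending an explicit header with the standard library. This is only a sketch: the header value and the helper names build_request and download_with_headers are assumptions, and a given site may check cookies or other headers instead.

```python
import shutil
from urllib.request import Request, urlopen

# Assumption: a generic browser-like string; the value a particular site expects is unknown.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; image-downloader)"}


def build_request(url, headers=None):
    """Create a Request that carries explicit headers instead of urllib's defaults."""
    return Request(url, headers=headers or DEFAULT_HEADERS)


def download_with_headers(url, local_path, headers=None, timeout=10):
    """Fetch url with the given headers and stream the body to local_path."""
    with urlopen(build_request(url, headers), timeout=timeout) as response:
        with open(local_path, "wb") as out_file:   # binary mode keeps the bytes intact
            shutil.copyfileobj(response, out_file)
```

A call would look like download_with_headers("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg").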

27

Python 3: import urllib.request and urllib.request.urlretrieve(), accordingly.

SergO

1

@SergO: Can you add the Python 3 part to the original answer?

Sreejith Menon

27

import urllib
resource = urllib.urlopen("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")
output = open("file01.jpg","wb")
output.write(resource.read())
output.close()

file01.jpg will contain your image.

Noufal Ibrahim

2

You should open the file in binary mode: open("file01.jpg", "wb"). Otherwise you may corrupt the image.

Liquid_Fire
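To illustrate the binary-mode point: text mode may translate newline bytes on some platforms, and in Python 3 it refuses bytes outright. A small sketch, using an in-memory stream as a stand-in for the network response; save_stream and the sample payload are hypothetical:

```python
import io
import os
import shutil
import tempfile


def save_stream(stream, local_path, chunk_size=64 * 1024):
    """Copy a binary file-like object to local_path without altering any bytes."""
    with open(local_path, "wb") as out_file:   # "wb": no newline translation, no decoding
        shutil.copyfileobj(stream, out_file, chunk_size)


# Stand-in for a response body: JPEG-like magic bytes plus newline bytes
# that a text-mode write could rewrite.
payload = b"\xff\xd8\xff\xe0\r\n\x00\n"
target = os.path.join(tempfile.mkdtemp(), "file01.jpg")
save_stream(io.BytesIO(payload), target)

with open(target, "rb") as saved:
    print(saved.read() == payload)   # the file matches the payload byte for byte
```

The same helper accepts the object returned by urlopen, since that is also a binary file-like object.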

2

urllib.urlretrieve can save the image directly.

heltonbiker

17

I wrote a script that does just this, and it is available on my github for your use.

I used BeautifulSoup to let me parse any website for images. If you will be doing much web scraping (or intend to use my tool), I suggest you sudo pip install beautifulsoup4, the package that provides the bs4 module imported below. Information on BeautifulSoup is available here.

For convenience, here is my code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
import urllib

# use this image scraper from the location that 
#you want to save scraped images to

def make_soup(url):
    html = urlopen(url).read()
    return BeautifulSoup(html)

def get_images(url):
    soup = make_soup(url)
    #this makes a list of bs4 element tags
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    #compile our unicode list of image links
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename=each.split('/')[-1]
        urllib.urlretrieve(each, filename)
    return image_links

#a standard call looks like this
#get_images('http://www.wookmark.com')

Yup.

11

This can be done with requests. Load the page and dump the binary content to a file.

import os
import requests

url = 'https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg'
page = requests.get(url)

f_ext = os.path.splitext(url)[-1]
f_name = 'img{}'.format(f_ext)
with open(f_name, 'wb') as f:
    f.write(page.content)

AlexG

1

Set user headers in requests if you are getting a bad request :)

1UC1F3R616

8

Python 3

urllib.request: extensible library for opening URLs

from urllib.error import HTTPError
from urllib.request import urlretrieve

try:
    urlretrieve(image_url, image_local_path)
except FileNotFoundError as err:
    print(err)   # something wrong with local path
except HTTPError as err:
    print(err)  # something wrong with url

SergO
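The answer above catches HTTPError and FileNotFoundError. Network-level failures (DNS errors, refused connections) surface as URLError, which is the parent class of HTTPError, so a slightly broader sketch could look like this. The function name try_download and its True/False return convention are assumptions for illustration:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlretrieve


def try_download(image_url, image_local_path):
    """Return True on success, False if either the URL or the local path fails."""
    try:
        urlretrieve(image_url, image_local_path)
    except HTTPError as err:    # the server answered with an error status (404, 403, ...)
        print("HTTP error:", err.code, err.reason)
    except URLError as err:     # no usable answer: DNS failure, refused connection, ...
        print("URL error:", err.reason)
    except OSError as err:      # local path problems; FileNotFoundError is a subclass
        print("Local file error:", err)
    else:
        return True
    return False
```

The order of the except clauses matters: HTTPError is a subclass of URLError, which is itself a subclass of OSError, so the most specific class has to come first.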

6

A solution that works with both Python 2 and Python 3:

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
urlretrieve(url, "local-filename.jpg")

or, if the additional dependency on requests is acceptable and it is an http URL:

def load_requests(source_url, sink_path):
    """
    Load a file from an URL (e.g. http).

    Parameters
    ----------
    source_url : str
        Where to load the file from.
    sink_path : str
        Where the loaded file is stored.
    """
    import requests
    r = requests.get(source_url, stream=True)
    if r.status_code == 200:
        with open(sink_path, 'wb') as f:
            for chunk in r:
                f.write(chunk)

Martin Thoma

5

I made a script expanding on Yup.'s script. I fixed some things. It will now get around 403: Forbidden problems. It won't crash when an image fails to be retrieved. It tries to avoid corrupted previews. It gets the right absolute URLs. It gives out more information. It can be run with an argument from the command line.

# getem.py
# python2 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib2
import shutil
import requests
from urlparse import urljoin
import sys
import time

def make_soup(url):
    req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib2.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print 'Getting: ' + filename
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except:
            print '  An error occured. Continuing.'
    print 'Done.'

if __name__ == '__main__':
    url = sys.argv[1]
    get_images(url)

madprops
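The "right absolute URLs" part of the script above comes from urljoin, which resolves each src value against the page it was found on. A short sketch of just that step; absolute_image_urls is a hypothetical helper and cdn.example.com is a made-up host:

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve raw src attribute values against the URL of the page they came from."""
    resolved = []
    for src in srcs:
        if not src:                 # <img> tags without a usable src are skipped
            continue
        resolved.append(urljoin(page_url, src.strip()))
    return resolved


page = "http://www.wookmark.com/gallery/index.html"
for link in absolute_image_urls(page, ["thumb.jpg", "/static/logo.png", "//cdn.example.com/a.gif", None]):
    print(link)
# http://www.wookmark.com/gallery/thumb.jpg
# http://www.wookmark.com/static/logo.png
# http://cdn.example.com/a.gif
```

Relative paths are resolved against the page's directory, root-relative paths against its host, and scheme-relative links inherit the page's scheme; URLs that are already absolute pass through unchanged.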

3

Using the requests library

import requests
import shutil,os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir,'Images')#saving images to Images folder

def ImageDl(url):
    attempts = 0
    while attempts < 5:#retry 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url,headers=headers,stream=True,timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path,filename),'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw,f)
            print(filename)
            break
        except Exception as e:
            attempts+=1
            print(e)


ImageDl(url)

Sohan Das
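The retry loop in the answer above can also be pulled out into a reusable helper. This is one possible sketch, not the author's code: the name retry, the exponential backoff and the injectable sleep parameter are all assumptions added for illustration.

```python
import time


def retry(action, attempts=5, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call action() until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:
            if attempt == attempts:     # out of attempts: let the caller see the error
                raise
            print("attempt {} failed: {}".format(attempt, err))
            sleep(delay)                # wait before the next try
            delay *= backoff            # wait longer after each failure
```

With the standard library, a call might look like retry(lambda: urlretrieve(url, "local-filename.jpg")), assuming urlretrieve has been imported from urllib.request and url is defined.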

It seems the header is really important; in my case, I was getting 403 errors. It worked.

Ishtiyaq Husain

2

This is a very short answer.

import urllib
urllib.urlretrieve("http://photogallery.sandesh.com/Picture.aspx?AlubumId=422040", "Abc.jpg")

OO7

2

Version for Python 3

I adjusted the code of @madprops for Python 3

# getem.py
# python3 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib.request
import shutil
import requests
from urllib.parse import urljoin
import sys
import time

def make_soup(url):
    req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib.request.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print('Getting: ' + filename)
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except:
            print('  An error occured. Continuing.')
    print('Done.')

if __name__ == '__main__':
    get_images('http://www.wookmark.com')

Giovanni G. PY

1

Something fresh for Python 3 using Requests:

Comments are in the code. The function is ready to use.


import requests
from os import path

def get_image(image_url):
    """
    Get image based on url.
    :return: Image name if everything OK, False otherwise
    """
    image_name = path.split(image_url)[1]
    try:
        image = requests.get(image_url)
    except OSError:  # A little too broad, but works without extra imports. Catches all connection problems
        return False
    if image.status_code == 200:  # we could have retrieved an error page
        base_dir = path.join(path.dirname(path.realpath(__file__)), "images") # Use your own path or "" to use current working directory. Folder must exist.
        with open(path.join(base_dir, image_name), "wb") as f:
            f.write(image.content)
        return image_name
    return False  # non-200 status: nothing was saved

get_image("https://apod.nasddfda.gov/apod/image/2003/S106_Mishra_1947.jpg")

Pavel Pančocha

0

Late answer, but for python>=3.6 you can use dload, i.e.:

import dload
dload.save("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

If you need the image as bytes, use:

img_bytes = dload.bytes("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

Install using pip3 install dload

CONvid19

-2

import requests

img_data = requests.get('https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg')

with open('file_name.jpg', 'wb') as handler:
    # write the response body (bytes), not the Response object itself
    handler.write(img_data.content)

Lewis Mann

44

Welcome to Stack Overflow! While you may have solved this user's problem, code-only answers are not very helpful to users who come to this question in the future. Please edit your answer to explain why your code solves the original problem.

Joe C

1

TypeError: a bytes-like object is required, not 'Response'. It should be handler.write(img_data.content)

TitanFighter

It should be handler.write(img_data.read()).

jdhao

How to save an image locally using Python whose URL address I already know?