@Atalanttore: You shouldn't use the BeautifulSoup methods with the "unpythonic" names anymore. After all, you already used `find_all()` instead of `findAll()`. Instead of `findChild()` you should use `find()`.
`os.path.join()` is for paths, not for URLs! For URLs, `urllib.parse.urljoin()` is the right function.
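To illustrate the difference, a small sketch using the APOD base URL from the script and links that actually appear on the APOD page (the variable names are made up for the example). `urljoin()` resolves relative, root-relative and absolute links according to URL rules, while `os.path.join()` (shown here as `posixpath.join()`, which is what `os.path` is on Linux) only follows the platform's file system rules: on Windows that means backslash separators, and a component starting with `/` silently discards everything before it. The script's `create_desktop_background_scroll()` joins `TEMPORARY_DOWNLOAD_PATH` with `'/nasa_apod_desktop_backgrounds.xml'` in exactly that way, and it also builds the seed page URL with `os.path.join()`.

```python
import posixpath
from urllib.parse import urljoin

NASA_APOD_SITE = "http://apod.nasa.gov/apod/"

# Relative link, as in <a href="image/1907/CrescentSaturn_cassini_4824.jpg">
image_url = urljoin(NASA_APOD_SITE, "image/1907/CrescentSaturn_cassini_4824.jpg")
# Seed pages follow the apYYMMDD.html pattern
seed_url = urljoin(NASA_APOD_SITE, "ap190706.html")
# Root-relative link, as in <a href="/apod.rss">
rss_url = urljoin(NASA_APOD_SITE, "/apod.rss")
# Absolute links are returned unchanged
nasa_url = urljoin(NASA_APOD_SITE, "https://www.nasa.gov/")

# Path joining knows nothing about URLs: a component starting with "/"
# discards everything before it
xml_path = posixpath.join("/tmp/backgrounds/", "/nasa_apod_desktop_backgrounds.xml")

print(image_url)  # http://apod.nasa.gov/apod/image/1907/CrescentSaturn_cassini_4824.jpg
print(rss_url)    # http://apod.nasa.gov/apod.rss
print(xml_path)   # /nasa_apod_desktop_backgrounds.xml
```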
Porting a download script from Python 2 to Python 3
- __blackjack__
- User
- Posts: 13004
- Registered: Saturday, 2 June 2018, 10:21
- Location: 127.0.0.1
“Most people find the concept of programming obvious, but the doing impossible.” — Alan J. Perlis
- Atalanttore
- User
- Posts: 407
- Registered: Friday, 6 August 2010, 17:03
@__blackjack__: Thanks for the pointers.
I have now built the code for extracting the image URL from the HTML file into the download script.
At the moment a `ValueError` occurs (during logging), because no HTML code to parse is passed to the `get_image_info()` method.
Error message:
/usr/bin/python3.6 /home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py
2019-07-07 21:19:36,857 __main__: Starting
2019-07-07 21:19:36,857 __main__: Attempting to determine the current resolution.
2019-07-07 21:19:36,957 __main__: Using detected resolution of 3840x1080
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/usr/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py", line 387, in <module>
    TEMPORARY_DOWNLOAD_PATH = get_user_download_directory()
  File "/home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py", line 134, in get_user_download_directory
    logger.info("Using automatically detected path:", new_path)
Message: 'Using automatically detected path:'
Arguments: ('/home/ata/Downloads/nasa-apod-backgrounds',)
2019-07-07 21:19:36,962 __main__: Downloading contents of the site to find the image name
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/usr/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py", line 394, in <module>
    site_contents = download_site(NASA_APOD_SITE)
  File "/home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py", line 148, in download_site
    logger.info("Response", response.read())
Message: 'Response'
Arguments: (b'<!doctype html>\n<html>\n<head>\n<title>Astronomy Picture of the Day\n</title> \n<!-- gsfc meta tags -->\n<meta name="orgcode" content="661">\n<meta name="rno" content="phillip.a.newman">\n<meta name="content-owner" content="Jerry.T.Bonnell.1">\n<meta name="webmaster" content="Stephen.F.Fantasia.1">\n<meta name="description" content="A different astronomy and space science\nrelated image is featured each day, along with a brief explanation.">\n<!-- -->\n<meta name="keywords" content="Saturn, rings, shadow">\n<!-- -->\n<script language="javascript" id="_fed_an_ua_tag"\nsrc="//dap.digitalgov.gov/Universal-Federated-Analytics-Min.js?agency=NASA">\n</script>\n\n</head>\n\n<body BGCOLOR="#F4F4FF" text="#000000" link="#0000FF" vlink="#7F0F9F"\nalink="#FF0000">\n\n<center>\n<h1> Astronomy Picture of the Day </h1>\n<p>\n\n<a href="archivepix.html">Discover the cosmos!</a>\nEach day a different image or photograph of our fascinating universe is\nfeatured, along with a brief explanation written by a professional astronomer.\n<p>\n\n2019 July 7 \n<br> \n<a href="image/1907/CrescentSaturn_cassini_4824.jpg">\n<IMG SRC="image/1907/CrescentSaturn_cassini_1080.jpg"\nalt="See Explanation. Clicking on the picture will download\n the highest resolution version available." style="max-width:100%"></a>\n</center>\n\n<center>\n<b> Crescent Saturn </b> <br> \n<b> Image Credit: </b> \n<a href="https://www.nasa.gov/">NASA</a>, \n<a href="https://www.esa.int/">ESA</a>, \n<a href="https://www.spacescience.org/">SSI</a>,\n<a href="http://ciclops.org/ir_index_main/Cassini">Cassini Imaging Team</a>\n</center> <p> \n\n<b> Explanation: </b> \nSaturn never shows a crescent phase -- from Earth. 
\n\nBut when viewed from beyond, the \n<a href="https://solarsystem.nasa.gov/planets/saturn/overview/">majestic \ngiant planet</a> can show an unfamiliar diminutive sliver.\n\nThis <a href="https://photojournal.jpl.nasa.gov/catalog/PIA08388"\n>image of crescent Saturn</a> in natural color was taken by the robotic \n<a href="https://solarsystem.nasa.gov/missions/cassini/overview/"\n>Cassini spacecraft</a> in 2007.\n\nThe featured image captures \n<a href="https://en.wikipedia.org/wiki/Rings_of_Saturn">Saturn\'s\nmajestic rings</a> from the side of the ring plane opposite\nthe Sun -- the <a href="ap121222.html">unilluminated side</a> -- another\nvista not visible from Earth.\n\nPictured are many of \n<a href="https://en.wikipedia.org/wiki/Saturn">Saturn</a>\'s photogenic wonders, including the \n<a href="ap060503.html">subtle colors</a> of \n<a href="ap041102.html">cloud bands</a>, the complex \nshadows of the rings on the planet, and \nthe <a href="ap040721.html">shadow of the planet</a>\non the rings.\n\nA careful eye will find the moons \n<a href="ap170111.html">Mimas</a> (2 o\'clock) and \n<a href="ap061107.html">Janus</a> (4 o\'clock), \nbut the real challenge is to find \n<a href="ap051123.html">Pandora</a> (8 o\'clock). 
\n\nSaturn is now nearly \n<a href="https://in-the-sky.org/news.php?id=20190709_12_100"\n>opposite from the Sun</a> in the Earth\'s sky and so \n<a href="ap180614.html">can be seen</a> \nin the evening starting just after sunset for the rest of the night.\n\n\n<p> <center> \n<b> Tomorrow\'s picture: </b>galactic center in radio\n\n<p> <hr>\n<a href="ap190706.html"><</a>\n| <a href="archivepix.html">Archive</a>\n| <a href="lib/apsubmit2015.html">Submissions</a> \n| <a href="lib/aptree.html">Index</a>\n| <a href="https://antwrp.gsfc.nasa.gov/cgi-bin/apod/apod_search">Search</a>\n| <a href="calendar/allyears.html">Calendar</a>\n| <a href="/apod.rss">RSS</a>\n| <a href="lib/edlinks.html">Education</a>\n| <a href="lib/about_apod.html">About APOD</a>\n| <a href=\n"http://asterisk.apod.com/discuss_apod.php?date=190707">Discuss</a>\n| <a href="ap190708.html">></a>\n\n<hr><p>\n<b> Authors & editors: </b>\n<a href="http://www.phy.mtu.edu/faculty/Nemiroff.html">Robert Nemiroff</a>\n(<a href="http://www.phy.mtu.edu/">MTU</a>) &\n<a href="https://antwrp.gsfc.nasa.gov/htmltest/jbonnell/www/bonnell.html"\n>Jerry Bonnell</a> (<a href="http://www.astro.umd.edu/">UMCP</a>)<br>\n<b>NASA Official: </b> Phillip Newman\n<a href="lib/about_apod.html#srapply">Specific rights apply</a>.<br>\n<a href="https://www.nasa.gov/about/highlights/HP_Privacy.html">NASA Web\nPrivacy Policy and Important Notices</a><br>\n<b>A service of:</b>\n<a href="https://astrophysics.gsfc.nasa.gov/">ASD</a> at\n<a href="https://www.nasa.gov/">NASA</a> /\n<a href="https://www.nasa.gov/centers/goddard/">GSFC</a>\n<br><b>&</b> <a href="http://www.mtu.edu/">Michigan Tech. U.</a><br>\n</center>\n</body>\n</html>\n\n',)
2019-07-07 21:19:37,606 __main__: Grabbing the image URL
2019-07-07 21:19:37,608 __main__: Opening remote URL
Traceback (most recent call last):
  File "/home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py", line 400, in <module>
    filename = get_image(site_contents)
  File "/home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py", line 159, in get_image
    file_url, filename, file_size = get_image_info('a href', text)
  File "/home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py", line 237, in get_image_info
    remote_file = urllib.request.urlopen(file_url)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 511, in open
    req = Request(fullurl, data)
  File "/usr/lib/python3.6/urllib/request.py", line 329, in __init__
    self.full_url = url
  File "/usr/lib/python3.6/urllib/request.py", line 355, in full_url
    self._parse()
  File "/usr/lib/python3.6/urllib/request.py", line 384, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: ''
Process finished with exit code 1
Current code:
from gi.repository import GLib
from bs4 import BeautifulSoup
import logging
import subprocess
import urllib.request, urllib.parse, urllib.error
import re
import os
import random
import glob
from PIL import Image
from sys import stdout
from sys import exit
from lxml import etree
from datetime import datetime, timedelta
NASA_APOD_SITE = 'http://apod.nasa.gov/apod/'
TEMPORARY_DOWNLOAD_PATH = '/tmp/backgrounds/'
CUSTOM_FOLDER = 'nasa-apod-backgrounds'
RESOLUTION_TYPE = 'stretch'
DEFAULT_RESOLUTION_X = 1024
DEFAULT_RESOLUTION_Y = 768
IMAGE_SCROLL = True
IMAGE_DURATION = 1200
SEED_IMAGES = 10
SHOW_DEBUG = False
LOG_LEVEL = logging.DEBUG
LOG_FORMAT = '%(asctime)s %(name)s: %(message)s'
logger = logging.getLogger(__name__)
logger.setLevel(LOG_LEVEL)
formatter = logging.Formatter(LOG_FORMAT)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
# Use XRandR to grab the desktop resolution. If the scaling method is set to 'largest',
# we will attempt to grab it from the largest connected device. If the scaling method
# is set to 'stretch' we will grab it from the current value. Default will simply use
# what was set for the default resolutions.
def find_display_resolution():
    if RESOLUTION_TYPE == 'default':
        logger.info(f"Using default resolution of {DEFAULT_RESOLUTION_X}x{DEFAULT_RESOLUTION_Y}")
        return DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y
    resolution_x = 0
    resolution_y = 0
    logger.info("Attempting to determine the current resolution.")
    if RESOLUTION_TYPE == 'largest':
        regex_search = 'connected'
    else:
        regex_search = 'current'
    p1 = subprocess.Popen(["xrandr"], stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["grep", regex_search], stdin=p1.stdout, stdout=subprocess.PIPE)  # TODO: use Python's re module instead
    p3 = re.findall(regex_search, str(p1.communicate()[0]))
    p1.stdout.close()
    output = str(p2.communicate()[0])
    if RESOLUTION_TYPE == 'largest':
        # We are going to go through the connected devices and get the X/Y from the largest
        matches = re.finditer(" connected ([0-9]+)x([0-9]+)+", output)  # TODO: returns an iterator, which is always "true".
        if matches:
            largest = 0
            for match in matches:
                if int(match.group(1)) * int(match.group(2)) > largest:
                    resolution_x = match.group(1)
                    resolution_y = match.group(2)
        else:
            logger.warning("Could not determine largest screen resolution.")
    else:
        reg = re.search(".* current (.*?) x (.*?),.*", output)
        if reg:
            resolution_x = reg.group(1)
            resolution_y = reg.group(2)
        else:
            logger.warning("Could not determine current screen resolution.")
    # If we couldn't find anything automatically use what was set for the defaults
    if resolution_x == 0 or resolution_y == 0:
        resolution_x = DEFAULT_RESOLUTION_X
        resolution_y = DEFAULT_RESOLUTION_Y
        logger.warning("Could not determine resolution automatically. Using defaults.")
    logger.info(f"Using detected resolution of {resolution_x}x{resolution_y}")
    return int(resolution_x), int(resolution_y)
# Uses GLib to find the localized "Downloads" folder
# See: http://askubuntu.com/questions/137896/how-to-get-the-user-downloads-folder-location-with-python
def get_user_download_directory():
    downloads_dir = GLib.get_user_special_dir(GLib.USER_DIRECTORY_DOWNLOAD)
    if downloads_dir:
        # Add any custom folder
        new_path = os.path.join(downloads_dir, CUSTOM_FOLDER)
        logger.info("Using automatically detected path:", new_path)
    else:
        new_path = TEMPORARY_DOWNLOAD_PATH
        logger.warning("Could not determine download folder with GLib. Using default.")
    return new_path
# Download HTML of the site
def download_site(url):
    logger.info("Downloading contents of the site to find the image name")
    opener = urllib.request.build_opener()
    req = urllib.request.Request(url)
    try:
        response = opener.open(req)
        logger.info("Response", response.read())
        reply = response.read().decode()
    except urllib.error.HTTPError as error:
        logger.error(f"Error downloading {url} - {error.code}")
        reply = f"Error: {error.code})"
    return reply
# Finds the image URL and saves it
def get_image(text):
    logger.info("Grabbing the image URL")
    file_url, filename, file_size = get_image_info('a href', text)
    # If file_url is None, today's picture might be a video
    if file_url is None:
        return None
    logger.info(f"Found name of image: {filename}")
    save_to = os.path.join(TEMPORARY_DOWNLOAD_PATH, os.path.splitext(filename)[0] + '.png')
    if not os.path.isfile(save_to):
        # If the response body is less than 500 bytes, something went wrong
        if file_size < 500:
            logger.warning("Response less than 500 bytes, probably an error\nAttempting to just grab image source")
            file_url, filename, file_size = get_image_info('img src', text)
            # If file_url is None, today's picture might be a video
            if file_url is None:
                return None
            logger.info(f"Found name of image: {filename}")
            if file_size < 500:
                # Give up
                logger.error("Could not find image to download")
                exit()
            logger.info("Retrieving image")
            urllib.request.urlretrieve(file_url, save_to, print_download_status)
            # Adding additional padding to ensure entire line
            logger.info(f"\rDone downloading {human_readable_size(file_size)} ")
        else:
            urllib.request.urlretrieve(file_url, save_to)
    else:
        logger.info("File exists, moving on")
    return save_to
def get_image_info(element, source):
    # Grabs information about the image
    soup = BeautifulSoup(str(source), 'lxml')
    tags = soup.find_all('a')
    file_url = str()
    for tag in tags:
        if tag.find("img"):
            file_url = urllib.parse.urljoin(NASA_APOD_SITE, tag.get('href'))
        else:
            logger.warning("Could not find an image. May be a video today.")
            return None, None, None
    # Create our handle for our remote file
    logger.info("Opening remote URL")
    remote_file = urllib.request.urlopen(file_url)
    filename = os.path.basename(file_url)
    file_size = float(remote_file.headers.get("content-length"))
    return file_url, filename, file_size
# Resizes the image to the provided dimensions
def resize_image(filename):
    logger.info("Opening local image")
    image = Image.open(filename)
    current_x, current_y = image.size
    if (current_x, current_y) == (DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y):
        logger.info("Images are currently equal in size. No need to scale.")
    else:
        logger.info("Resizing the image from", image.size[0], "x", image.size[1], "to", DEFAULT_RESOLUTION_X, "x", DEFAULT_RESOLUTION_Y)
        image = image.resize((DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y), Image.ANTIALIAS)
        logger.info(f"Saving the image as {filename}")
        with open(filename, 'wb'):
            image.save(filename, 'PNG')
        #file_handler.close()
# Sets the new image as the wallpaper
def set_gnome_wallpaper(file_path):
    logger.info("Setting the wallpaper")
    command = "gsettings set org.gnome.desktop.background picture-uri file://" + file_path
    status, output = subprocess.getstatusoutput(command)  # TODO: use something like subprocess.run instead of subprocess.getstatusoutput
    return status
def print_download_status(block_count, block_size, total_size):
    written_size = human_readable_size(block_count * block_size)
    total_size = human_readable_size(total_size)
    # Adding space padding at the end to ensure we overwrite the whole line
    stdout.write(f"\r{written_size} bytes of {total_size} ")
    stdout.flush()
def human_readable_size(number_bytes):  # TODO: returns None for sizes greater than 1073741824.
    for x in ['bytes', 'KB', 'MB']:
        if number_bytes < 1024.0:
            return "%3.2f%s" % (number_bytes, x)
        number_bytes /= 1024.0
# Creates the necessary XML so background images will scroll through
def create_desktop_background_scroll(filename):
    if not IMAGE_SCROLL:
        return filename
    logger.info("Creating XML file for desktop background switching.")
    filename = os.path.join(TEMPORARY_DOWNLOAD_PATH, '/nasa_apod_desktop_backgrounds.xml')
    # Create our base, background element
    background = etree.Element("background")
    # Grab our PNGs we have downloaded
    images = glob.glob(TEMPORARY_DOWNLOAD_PATH + "/*.png")
    num_images = len(images)
    if num_images < SEED_IMAGES:
        # Let's seed some images
        # Start with yesterday and continue going back until we have enough
        logger.info("Downloading some seed images as well")
        days_back = 0
        seed_images_left = SEED_IMAGES
        while seed_images_left > 0:
            days_back += 1
            logger.info(f"Downloading seed image ({seed_images_left} left):")
            day_to_try = datetime.now() - timedelta(days=days_back)
            # Filenames look like /apYYMMDD.html
            seed_filename = os.path.join(NASA_APOD_SITE, "ap" + day_to_try.strftime("%y%m%d") + ".html")
            seed_site_contents = download_site(seed_filename)
            # Make sure we didn't encounter an error for some reason
            if seed_site_contents == "error":
                logger.error("Seed site contains an error")
                continue
            seed_filename = get_image(seed_site_contents)
            # If the content was a video or some other error occurred, skip the
            # rest.
            if seed_filename is None:
                continue
            resize_image(seed_filename)
            # Add this to our list of images
            images.append(seed_filename)
            seed_images_left -= 1
        logger.info("Done downloading seed images")
    # Get our images in a random order so we get a new order every time we get a new file
    random.shuffle(images)
    # Recalculate the number of pictures
    num_images = len(images)
    for i, image in enumerate(images):
        # Create a static entry for keeping this image here for IMAGE_DURATION
        static = etree.SubElement(background, "static")
        # Length of time the background stays
        duration = etree.SubElement(static, "duration")
        duration.text = str(IMAGE_DURATION)
        # Assign the name of the file for our static entry
        static_file = etree.SubElement(static, "file")
        static_file.text = images[i]
        # Create a transition for the animation with a from and to
        transition = etree.SubElement(background, "transition")
        # Length of time for the switch animation
        transition_duration = etree.SubElement(transition, "duration")
        transition_duration.text = "5"
        # We are always transitioning from the current file
        transition_from = etree.SubElement(transition, "from")
        transition_from.text = images[i]
        # Create our transition to element
        transition_to = etree.SubElement(transition, "to")
        # Check to see if we're at the end, if we are use the first image as the image to
        if i + 1 == num_images:
            transition_to.text = images[0]
        else:
            transition_to.text = images[i + 1]
    xml_tree = etree.ElementTree(background)
    xml_tree.write(filename, pretty_print=True)
    return filename
if __name__ == '__main__':
    logger.info("Starting")
    # Find desktop resolution
    DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y = find_display_resolution()
    # Set a localized download folder
    TEMPORARY_DOWNLOAD_PATH = get_user_download_directory()
    # Create the download path if it doesn't exist
    if not os.path.exists(os.path.expanduser(TEMPORARY_DOWNLOAD_PATH)):
        os.makedirs(os.path.expanduser(TEMPORARY_DOWNLOAD_PATH))
    # Grab the HTML contents of the file
    site_contents = download_site(NASA_APOD_SITE)
    if site_contents == "error":
        logger.error("Could not contact site.")
        exit()
    # Download the image
    filename = get_image(site_contents)
    if filename is not None:
        # Resize the image
        resize_image(filename)
    # Create the desktop switching xml
    filename = create_desktop_background_scroll(filename)
    # If the script was unable to download today's image and IMAGE_SCROLL is set to False,
    # the script exits
    if filename is None:
        logger.error("Today's image could not be downloaded.")
        exit()
    # Set the wallpaper
    status = set_gnome_wallpaper(filename)
    logger.info("Finished!")
Atalanttore
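The TODO comments in the code above could be tackled roughly like this. This is only a sketch: all function names are hypothetical, and the regular expressions assume xrandr lines of the form `HDMI-1 connected primary 1920x1080+0+0 ...` and `Screen 0: ... current 3840 x 1080, ...`, so check them against your own xrandr output. `re.findall()` returns a list, so "no match" is simply an empty, falsy list, and `max()` picks the largest area; note that the original loop never updates `largest`, so it keeps the last match rather than the largest one. `human_readable_size()` gets a fallback unit so it can no longer fall through to `None`, and `gsettings` is called via `subprocess.run()` with an argument list instead of a shell string. `universal_newlines=True` is used instead of `text=True` because the script runs on Python 3.6.

```python
import re
import subprocess
from urllib.parse import quote


def read_xrandr_output():
    """Run xrandr once and return its output as text (no grep pipeline needed)."""
    # universal_newlines=True decodes stdout to str and also works on Python 3.6
    completed = subprocess.run(
        ["xrandr"], stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return completed.stdout


def largest_connected_resolution(xrandr_output):
    """Return (width, height) of the largest connected output, or None."""
    # findall() returns a list, so "no match" is an empty, falsy list;
    # finditer() returns an iterator, which is always truthy
    sizes = [
        (int(width), int(height))
        for width, height in re.findall(
            r" connected (?:primary )?(\d+)x(\d+)", xrandr_output
        )
    ]
    if not sizes:
        return None
    # Compare by area instead of keeping whatever matched last
    return max(sizes, key=lambda size: size[0] * size[1])


def current_resolution(xrandr_output):
    """Return the current (width, height) from the 'Screen' line, or None."""
    match = re.search(r"current (\d+) x (\d+)", xrandr_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def human_readable_size(number_bytes):
    """Format a byte count; the last unit is a fallback, so None is never returned."""
    for unit in ["bytes", "KB", "MB", "GB"]:
        if number_bytes < 1024.0:
            return "%3.2f%s" % (number_bytes, unit)
        number_bytes /= 1024.0
    return "%3.2f%s" % (number_bytes, "TB")


def wallpaper_command(file_path):
    """Build the gsettings call as an argument list (no shell involved)."""
    # quote() percent-encodes e.g. spaces; file_path is assumed to be absolute
    uri = "file://" + quote(file_path)
    return ["gsettings", "set", "org.gnome.desktop.background", "picture-uri", uri]


def set_gnome_wallpaper(file_path):
    """Run gsettings and return its exit status."""
    return subprocess.run(wallpaper_command(file_path)).returncode


SAMPLE_XRANDR = (
    "Screen 0: minimum 8 x 8, current 3840 x 1200, maximum 32767 x 32767\n"
    "HDMI-1 connected primary 1920x1080+0+0 (normal left inverted right) 527mm x 296mm\n"
    "DP-1 connected 1920x1200+1920+0 (normal left inverted right) 518mm x 324mm\n"
    "VGA-1 disconnected (normal left inverted right)\n"
)
print(largest_connected_resolution(SAMPLE_XRANDR))  # (1920, 1200)
print(current_resolution(SAMPLE_XRANDR))            # (3840, 1200)
print(human_readable_size(1073741824))              # 1.00GB
```

The sample xrandr text is invented for the example; in the script, the result of `read_xrandr_output()` would be passed in instead.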
- __blackjack__
@Atalanttore: First there are two `TypeError`s from logging calls that pass additional arguments without any placeholders for them in the first argument. Either you format the values into the first argument before the logging call, as you already do elsewhere, or you add a placeholder.
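Both options as a minimal sketch. The `ListHandler` class and the logger name `apod_demo` exist only for this demonstration; the last part reproduces the formatting step that failed in the log above (`msg % self.args` inside `LogRecord.getMessage()`). The same comma-separated pattern is also still present in the `logger.info()` call in `resize_image()` in both posted versions of the script.

```python
import logging


class ListHandler(logging.Handler):
    """Collects the final message text of every record (demo only)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("apod_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

new_path = "/home/ata/Downloads/nasa-apod-backgrounds"

# Option 1: placeholder in the message, value as an extra argument;
# the %-formatting is deferred until a handler formats the record
logger.info("Using automatically detected path: %s", new_path)

# Option 2: format first, pass one finished string
logger.info(f"Using automatically detected path: {new_path}")

# What the broken call amounts to: extra arguments but no placeholder
broken = logging.LogRecord(
    "apod_demo", logging.INFO, "demo.py", 1, "Response", (b"<html>",), None
)
error_message = None
try:
    broken.getMessage()
except TypeError as error:
    error_message = str(error)

print(handler.messages)
print(error_message)  # not all arguments converted during string formatting
```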
`get_image_info()` contains strange code. `file_url` is initialized with an empty string. That has the right data type, but it can never be a meaningful, valid value for a URL. Why is it done that way? The value actually does get used later, namely when there is no <a> element in the HTML. That case should be handled more sensibly. I'm almost certain that this is exactly the case the code runs into here, and that it triggers the exception.
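A likely reason why the HTML contains no <a> element at all, judging from the posted code and log (not confirmed in the thread): `download_site()` calls `response.read()` twice. The first call, inside the logging call, reads the whole body (its content shows up under "Arguments:" in the log), so the second call returns `b''` and `get_image_info()` is handed an empty string. The revised version further down still reads twice, via the f-string. The sketch shows the effect with `io.BytesIO` standing in for the HTTP response, plus hypothetical `read_site()` and `download_site()` helpers that read only once; handling of `urllib.error.HTTPError` is left out.

```python
import io
import logging
import urllib.request

logger = logging.getLogger("apod_demo")


def read_site(response):
    """Read a response body exactly once and return it as text."""
    body = response.read()  # reads up to EOF; every further read() returns b""
    # Log something about the body without reading it a second time
    logger.debug("Response: %d bytes", len(body))
    return body.decode("utf-8")


def download_site(url):
    """Hypothetical variant of the script's download_site() with a single read."""
    with urllib.request.urlopen(url) as response:
        return read_site(response)


# io.BytesIO stands in for the object returned by opener.open(req)
fake_response = io.BytesIO(b'<a href="x.jpg"><img src="x.jpg"></a>')
first_read = fake_response.read()   # what ended up in the log message
second_read = fake_response.read()  # what ended up in `reply`
print(repr(second_read))  # b''

html = read_site(io.BytesIO(b'<a href="x.jpg"><img src="x.jpg"></a>'))
```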
In the loop over the <a> tags, the function returns (None, None, None) as soon as the HTML contains even *one* <a> tag that doesn't contain an <img> element. That looks pretty wrong. And if all <a> elements do contain an <img> element, the URL of the last <a> tag found is used.
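One way to restructure the loop, as a sketch: return the first <a> that actually wraps an <img>, skip all other links, and return `None` only when there is none (empty HTML or a video day). `find_image_url()` is a hypothetical name; it uses `find()` as suggested earlier in the thread, and the standard library's `html.parser` instead of `lxml` only so the example needs no extra parser package. Opening the URL and reading `content-length` would then happen in the caller, and only after a URL was actually found. The `element` parameter of `get_image_info()` is never used in the posted code, so it is left out here.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

NASA_APOD_SITE = "http://apod.nasa.gov/apod/"


def find_image_url(html, base_url=NASA_APOD_SITE):
    """Return the absolute URL of the first link wrapping an <img>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", href=True):
        # Links without an image (archive, navigation, credits) are skipped,
        # not treated as "no image today"
        if link.find("img") is not None:
            return urljoin(base_url, link["href"])
    # Reached for empty HTML as well as for pages without a linked image
    return None


page = (
    '<a href="archivepix.html">Discover the cosmos!</a>'
    '<a href="image/1907/CrescentSaturn_cassini_4824.jpg">'
    '<IMG SRC="image/1907/CrescentSaturn_cassini_1080.jpg"></a>'
    '<a href="ap190706.html">&lt;</a>'
)
print(find_image_url(page))
# http://apod.nasa.gov/apod/image/1907/CrescentSaturn_cassini_4824.jpg
print(find_image_url(""))  # None
```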
Then once again something from `os.path` is used on a URL. That isn't guaranteed to work in the first place, because paths are something different from URLs, and even on systems where paths and URLs look alike it falls flat as soon as the URL also has a "query" and/or "fragment" part.
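A sketch of taking the file name from a URL without `os.path`: split off query and fragment with `urllib.parse.urlsplit()`, take the last `/`-separated segment of the path, then decode percent escapes. `filename_from_url()` is a hypothetical helper, and the query string and fragment in the example URL are made up to show the problem; `posixpath.basename()` stands in for what `os.path.basename()` returns on Linux.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path  # e.g. "/apod/image/1907/CrescentSaturn_cassini_4824.jpg"
    # URL paths always use "/", independent of the operating system
    return unquote(path.rsplit("/", 1)[-1])


url = "http://apod.nasa.gov/apod/image/1907/CrescentSaturn_cassini_4824.jpg?size=large#top"
print(posixpath.basename(url))  # CrescentSaturn_cassini_4824.jpg?size=large#top
print(filename_from_url(url))   # CrescentSaturn_cassini_4824.jpg
```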
- Atalanttore
@__blackjack__: So the `logger` doesn't support joining strings with commas (the way the `print()` function does).
Is the code in `get_image_info()` less strange now, even though it still doesn't work as intended?
Current code:
from gi.repository import GLib
from bs4 import BeautifulSoup
import logging
import subprocess
import urllib.request, urllib.parse, urllib.error
import re
import os
import random
import glob
from PIL import Image
from sys import stdout
from sys import exit
from lxml import etree
from datetime import datetime, timedelta
NASA_APOD_SITE = 'http://apod.nasa.gov/apod/'
TEMPORARY_DOWNLOAD_PATH = '/tmp/backgrounds/'
CUSTOM_FOLDER = 'nasa-apod-backgrounds'
RESOLUTION_TYPE = 'stretch'
DEFAULT_RESOLUTION_X = 1024
DEFAULT_RESOLUTION_Y = 768
IMAGE_SCROLL = True
IMAGE_DURATION = 1200
SEED_IMAGES = 10
SHOW_DEBUG = False
LOG_LEVEL = logging.DEBUG
LOG_FORMAT = '%(asctime)s %(name)s: %(message)s'
logger = logging.getLogger(__name__)
logger.setLevel(LOG_LEVEL)
formatter = logging.Formatter(LOG_FORMAT)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
# Use XRandR to grab the desktop resolution. If the scaling method is set to 'largest',
# we will attempt to grab it from the largest connected device. If the scaling method
# is set to 'stretch' we will grab it from the current value. Default will simply use
# what was set for the default resolutions.
def find_display_resolution():
    if RESOLUTION_TYPE == 'default':
        logger.info(f"Using default resolution of {DEFAULT_RESOLUTION_X}x{DEFAULT_RESOLUTION_Y}")
        return DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y
    resolution_x = 0
    resolution_y = 0
    logger.info("Attempting to determine the current resolution.")
    if RESOLUTION_TYPE == 'largest':
        regex_search = 'connected'
    else:
        regex_search = 'current'
    p1 = subprocess.Popen(["xrandr"], stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["grep", regex_search], stdin=p1.stdout, stdout=subprocess.PIPE)  # TODO: use Python's re module instead
    p3 = re.findall(regex_search, str(p1.communicate()[0]))
    p1.stdout.close()
    output = str(p2.communicate()[0])
    if RESOLUTION_TYPE == 'largest':
        # We are going to go through the connected devices and get the X/Y from the largest
        matches = re.finditer(" connected ([0-9]+)x([0-9]+)+", output)  # TODO: returns an iterator, which is always "true".
        if matches:
            largest = 0
            for match in matches:
                if int(match.group(1)) * int(match.group(2)) > largest:
                    resolution_x = match.group(1)
                    resolution_y = match.group(2)
        else:
            logger.warning("Could not determine largest screen resolution.")
    else:
        reg = re.search(".* current (.*?) x (.*?),.*", output)
        if reg:
            resolution_x = reg.group(1)
            resolution_y = reg.group(2)
        else:
            logger.warning("Could not determine current screen resolution.")
    # If we couldn't find anything automatically use what was set for the defaults
    if resolution_x == 0 or resolution_y == 0:
        resolution_x = DEFAULT_RESOLUTION_X
        resolution_y = DEFAULT_RESOLUTION_Y
        logger.warning("Could not determine resolution automatically. Using defaults.")
    logger.info(f"Using detected resolution of {resolution_x}x{resolution_y}")
    return int(resolution_x), int(resolution_y)
# Uses GLib to find the localized "Downloads" folder
# See: http://askubuntu.com/questions/137896/how-to-get-the-user-downloads-folder-location-with-python
def get_user_download_directory():
    downloads_dir = GLib.get_user_special_dir(GLib.USER_DIRECTORY_DOWNLOAD)
    if downloads_dir:
        # Add any custom folder
        new_path = os.path.join(downloads_dir, CUSTOM_FOLDER)
        logger.info(f"Using automatically detected path: {new_path}")
    else:
        new_path = TEMPORARY_DOWNLOAD_PATH
        logger.warning("Could not determine download folder with GLib. Using default.")
    return new_path
# Download HTML of the site
def download_site(url):
    logger.info("Downloading contents of the site to find the image name")
    opener = urllib.request.build_opener()
    req = urllib.request.Request(url)
    try:
        response = opener.open(req)
        logger.info(f"Response: {response.read()}")
        reply = response.read().decode()
    except urllib.error.HTTPError as error:
        logger.error(f"Error downloading {url} - {error.code}")
        reply = f"Error: {error.code})"
    return reply
# Finds the image URL and saves it
def get_image(text):
    logger.info("Grabbing the image URL")
    file_url, filename, file_size = get_image_info('a href', text)
    # If file_url is None, today's picture might be a video
    if file_url is None:
        return None
    logger.info(f"Found name of image: {filename}")
    save_to = os.path.join(TEMPORARY_DOWNLOAD_PATH, os.path.splitext(filename)[0] + '.png')
    if not os.path.isfile(save_to):
        # If the response body is less than 500 bytes, something went wrong
        if file_size < 500:
            logger.warning("Response less than 500 bytes, probably an error\nAttempting to just grab image source")
            file_url, filename, file_size = get_image_info('img src', text)
            # If file_url is None, today's picture might be a video
            if file_url is None:
                return None
            logger.info(f"Found name of image: {filename}")
            if file_size < 500:
                # Give up
                logger.error("Could not find image to download")
                exit()
            logger.info("Retrieving image")
            urllib.request.urlretrieve(file_url, save_to, print_download_status)
            # Adding additional padding to ensure entire line
            logger.info(f"\rDone downloading {human_readable_size(file_size)} ")
        else:
            urllib.request.urlretrieve(file_url, save_to)
    else:
        logger.info("File exists, moving on")
    return save_to
def get_image_info(element, source):
    # Grabs information about the image
    soup = BeautifulSoup(str(source), 'lxml')
    tags = soup.find_all('a')
    if tags:
        for tag in tags:
            if tag.find("img"):
                file_url = urllib.parse.urljoin(NASA_APOD_SITE, tag.get('href'))
                # Create our handle for our remote file
                logger.info("Opening remote URL")
                remote_file = urllib.request.urlopen(file_url)
                filename = os.path.basename(file_url)  # TODO: not guaranteed to work in the first place, because paths are not the same thing as URLs
                file_size = float(remote_file.headers.get("content-length"))
                return file_url, filename, file_size
    else:
        logger.warning("Could not find an image. May be a video today.")
        return None, None, None
# Resizes the image to the provided dimensions
def resize_image(filename):
    logger.info("Opening local image")
    image = Image.open(filename)
    current_x, current_y = image.size
    if (current_x, current_y) == (DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y):
        logger.info("Images are currently equal in size. No need to scale.")
    else:
        logger.info("Resizing the image from", image.size[0], "x", image.size[1], "to", DEFAULT_RESOLUTION_X, "x", DEFAULT_RESOLUTION_Y)
        image = image.resize((DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y), Image.ANTIALIAS)
        logger.info(f"Saving the image as {filename}")
        with open(filename, 'wb'):
            image.save(filename, 'PNG')
        #file_handler.close()
# Sets the new image as the wallpaper
def set_gnome_wallpaper(file_path):
    logger.info("Setting the wallpaper")
    command = "gsettings set org.gnome.desktop.background picture-uri file://" + file_path
    status, output = subprocess.getstatusoutput(command)  # TODO: use something like subprocess.run instead of subprocess.getstatusoutput
    return status
def print_download_status(block_count, block_size, total_size):
    written_size = human_readable_size(block_count * block_size)
    total_size = human_readable_size(total_size)
    # Adding space padding at the end to ensure we overwrite the whole line
    stdout.write(f"\r{written_size} bytes of {total_size} ")
    stdout.flush()
def human_readable_size(number_bytes):  # TODO: returns None for sizes greater than 1073741824.
    for x in ['bytes', 'KB', 'MB']:
        if number_bytes < 1024.0:
            return "%3.2f%s" % (number_bytes, x)
        number_bytes /= 1024.0
# Creates the necessary XML so background images will scroll through
def create_desktop_background_scroll(filename):
if not IMAGE_SCROLL:
return filename
logger.info("Creating XML file for desktop background switching.")
filename = os.path.join(TEMPORARY_DOWNLOAD_PATH, '/nasa_apod_desktop_backgrounds.xml')
# Create our base, background element
background = etree.Element("background")
# Grab our PNGs we have downloaded
images = glob.glob(TEMPORARY_DOWNLOAD_PATH + "/*.png")
num_images = len(images)
if num_images < SEED_IMAGES:
# Let's seed some images
# Start with yesterday and continue going back until we have enough
logger.info("Downloading some seed images as well")
days_back = 0
seed_images_left = SEED_IMAGES
while seed_images_left > 0:
days_back += 1
logger.info(f"Downloading seed image ({seed_images_left} left):")
day_to_try = datetime.now() - timedelta(days=days_back)
# Filenames look like /apYYMMDD.html
seed_filename = os.path.join(NASA_APOD_SITE, "ap" + day_to_try.strftime("%y%m%d") + ".html")
seed_site_contents = download_site(seed_filename)
# Make sure we didn't encounter an error for some reason
if seed_site_contents == "error":
logger.error("Seed site contains an error")
continue
seed_filename = get_image(seed_site_contents)
# If the content was a video or some other error occurred, skip the
# rest.
if seed_filename is None:
continue
resize_image(seed_filename)
# Add this to our list of images
images.append(seed_filename)
seed_images_left -= 1
logger.info("Done downloading seed images")
# Get our images in a random order so we get a new order every time we get a new file
random.shuffle(images)
# Recalculate the number of pictures
num_images = len(images)
for i, image in enumerate(images):
# Create a static entry for keeping this image here for IMAGE_DURATION
static = etree.SubElement(background, "static")
# Length of time the background stays
duration = etree.SubElement(static, "duration")
duration.text = str(IMAGE_DURATION)
# Assign the name of the file for our static entry
static_file = etree.SubElement(static, "file")
static_file.text = images[i]
# Create a transition for the animation with a from and to
transition = etree.SubElement(background, "transition")
# Length of time for the switch animation
transition_duration = etree.SubElement(transition, "duration")
transition_duration.text = "5"
# We are always transitioning from the current file
transition_from = etree.SubElement(transition, "from")
transition_from.text = images[i]
# Create our transition "to" element
transition_to = etree.SubElement(transition, "to")
# Check to see if we're at the end, if we are use the first image as the image to
if i + 1 == num_images:
transition_to.text = images[0]
else:
transition_to.text = images[i + 1]
xml_tree = etree.ElementTree(background)
xml_tree.write(filename, pretty_print=True)
return filename
if __name__ == '__main__':
logger.info("Starting")
# Find desktop resolution
DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y = find_display_resolution()
# Set a localized download folder
TEMPORARY_DOWNLOAD_PATH = get_user_download_directory()
# Create the download path if it doesn't exist
if not os.path.exists(os.path.expanduser(TEMPORARY_DOWNLOAD_PATH)):
os.makedirs(os.path.expanduser(TEMPORARY_DOWNLOAD_PATH))
# Grab the HTML contents of the file
site_contents = download_site(NASA_APOD_SITE)
if site_contents == "error":
logger.error("Could not contact site.")
exit()
# Download the image
filename = get_image(site_contents)
if filename is not None:
# Resize the image
resize_image(filename)
# Create the desktop switching xml
filename = create_desktop_background_scroll(filename)
# If the script was unable to download today's image and IMAGE_SCROLL is set to False,
# the script exits
if filename is None:
logger.error("Today's image could not be downloaded.")
exit()
# Set the wallpaper
status = set_gnome_wallpaper(filename)
logger.info("Finished!")
Atalanttore
__blackjack__
@Atalanttore: "Concatenating with a comma" somehow sounds as if the comma meant something special there. In `print()` it has the same meaning as in the `Logger` methods and in all other functions and methods: it separates the arguments from each other. `print()` does not concatenate the arguments either; it simply outputs them one after another with a space in between, or with whatever was passed as the `sep` keyword argument.
The logger methods also accept any number of positional arguments and format them into the first argument *if* output is actually going to happen. This is meant for cases where converting an argument into a string is relatively "expensive", so that the conversion only has to happen if the message is actually going to be logged.
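To make the difference concrete, here is a small sketch; the helpers `print_to_string()`, `render_log_message()` and `log_resize()` as well as the logger name `apod_demo` are made up for illustration. `print()` joins its arguments with `sep`, whereas a `LogRecord` merges the extra arguments into the first one with the `%` operator. A call like `logger.info("Resizing the image from", image.size[0], "x", ...)` therefore fails with the same `TypeError` reported earlier in the thread, because the message has no placeholders for the extra arguments.

```python
import io
import logging


def print_to_string(*args, **kwargs):
    """Return what print() would write, so the separator becomes visible."""
    buffer = io.StringIO()
    print(*args, file=buffer, **kwargs)
    return buffer.getvalue()


def render_log_message(msg, *args):
    """Build a LogRecord and let it merge the arguments (msg % args)."""
    record = logging.LogRecord("apod_demo", logging.INFO, "<demo>", 0, msg, args, None)
    return record.getMessage()


logger = logging.getLogger("apod_demo")


def log_resize(old_size, new_size):
    # Placeholders go into the first argument, values follow as further
    # arguments; they are only merged when the message is actually emitted.
    logger.info("Resizing the image from %dx%d to %dx%d", *old_size, *new_size)


# print() separates its arguments with `sep` (a space by default):
print(repr(print_to_string("Resizing the image from", 1920, "x", 1080)))
# 'Resizing the image from 1920 x 1080\n'

# A logger expects placeholders for its extra arguments:
print(render_log_message("Resizing the image from %dx%d to %dx%d", 1920, 1080, 3840, 1080))
# Resizing the image from 1920x1080 to 3840x1080

# Without placeholders the merge fails:
try:
    render_log_message("Resizing the image from", 1920, "x", 1080)
except TypeError as error:
    print(error)  # not all arguments converted during string formatting
```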
`get_image_info()` looks more sensible now.
Edit: This time I noticed `download_site()`, which in case of an error returns a special error value that is of the same type as a valid result. The error value is the string f"Error: {error.code})". Where the function is called, however, the result is compared for equality with 'error'. Inside the function an exception is replaced by a special error value that every caller has to check explicitly. Exceptions were invented precisely to get rid of fragile error handling like this.
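A minimal sketch of the exception-based alternative, assuming the caller only wants to report the failure and stop: `download_site()` simply lets `urllib.error.URLError` propagate (`HTTPError` is a subclass of it), and a single place at the top decides what to do. The `main()` function and its return codes are illustrative assumptions, not part of the original script.

```python
import logging
import urllib.error
import urllib.request

NASA_APOD_SITE = 'http://apod.nasa.gov/apod/'

logger = logging.getLogger(__name__)


def download_site(url):
    """Return the decoded HTML of *url*.

    No magic return value on failure: urlopen() raises URLError
    (or its subclass HTTPError) and the caller decides what to do.
    """
    logger.info("Downloading %s", url)
    with urllib.request.urlopen(url) as response:
        # Assumes UTF-8, like the original .decode() call.
        return response.read().decode()


def main(url=NASA_APOD_SITE):
    try:
        site_contents = download_site(url)
    except urllib.error.URLError as error:
        # Handled once here instead of every caller comparing the
        # result with a special string such as "error".
        logger.error("Could not contact %s: %s", url, error)
        return 1
    logger.info("Downloaded %d characters", len(site_contents))
    return 0
```

In the script, the `if __name__ == '__main__':` block could then end with `sys.exit(main())`, so the exit status is set in exactly one place.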
`get_image_info` still looks wrong. Similar to `urljoin`, there are also `urlsplit` and `urlparse` for taking a URL apart again. A `return` nested deep inside a for loop is hard to read.
The real problem, however, is that if no a-tag with an img-tag is found, `None` is returned instead of `(None, None, None)`. That is bad because it is unexpected, and you don't check for that case at the call site either.
Why is `file_size` a float? Do you want to be able to handle half bytes as well?
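A sketch of taking the URL apart as suggested: `urllib.parse.urlsplit()` separates the path from the query string and fragment, and because URL paths always use `/`, `posixpath.basename()` (rather than the platform-dependent `os.path.basename()`) takes the last segment. The function name `file_name_from_url()` and the example URL are made up.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def file_name_from_url(url):
    """Return the last path segment of *url*, without query or fragment."""
    path = urlsplit(url).path  # e.g. '/apod/image/1907/example.jpg'
    # URL paths always use '/', so posixpath is correct on every platform;
    # unquote() turns escapes such as '%20' back into characters.
    return posixpath.basename(unquote(path))


# Hypothetical example URL:
print(file_name_from_url("https://apod.nasa.gov/apod/image/1907/example.jpg?size=large#top"))
# example.jpg
```

Note that a URL ending in `/` yields an empty string, so the caller should check for that case.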
@__blackjack__: Thanks for the explanations. Should the function `download_site()` return an exception (maybe a `ConnectionError`) if no page could be downloaded?
@Sirius3: Thanks for the explanations. I have adjusted the function `get_image_info()` further.
For `urlsplit` or `urlparse` I haven't found any example code yet that shows how to get at the file name simply and without nested regular expressions. How would you do it?
Current version of the function `get_image_info()` [the rest of the code has not changed]:
Code:
def get_image_info(element, source):
# Grabs information about the image
file_url = None
file_name = None
file_size = None
soup = BeautifulSoup(str(source), 'lxml')
tags = soup.find_all('a')
print("Tags:", tags) # List is always empty :(
if tags:
for tag in tags:
if tag.find("img"):
file_url = urllib.parse.urljoin(NASA_APOD_SITE, tag.get('href'))
# Create our handle for our remote file
logger.info("Opening remote URL")
remote_file = urllib.request.urlopen(file_url)
file_name = os.path.basename(file_url) # TODO: not guaranteed to work in the first place, because paths are not the same thing as URLs
file_size = int(remote_file.headers.get("content-length"))
else:
logger.warning("Could not find an image. May be a video today.")
return file_url, file_name, file_size
Atalanttore
@Atalanttore: Now you still have the problem that no warning is issued if none of the a-tags contains an img-tag.
The parameter `element` is not used at all.
In other places, too, you use return values (None) where it would be better to use exceptions. `exit` should not appear at all in a clean program, because callers of such functions have no way to handle the error.
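One way to address these points, sketched under assumptions: the search stops at the first `<a>` that wraps an `<img>`, and if there is none it raises an exception instead of returning `None` or a tuple of `None`s, so the case cannot be silently ignored. To stay runnable without third-party packages, this sketch uses the standard library's `html.parser.HTMLParser` instead of BeautifulSoup; with BeautifulSoup the same shape would be a loop over `soup.find_all('a')` that returns as soon as `tag.find('img')` matches, followed by a `raise` after the loop. `NoImageFoundError`, `get_image_url()` and `main()` are hypothetical names.

```python
import logging
from html.parser import HTMLParser
from urllib.parse import urljoin

NASA_APOD_SITE = 'http://apod.nasa.gov/apod/'

logger = logging.getLogger(__name__)


class NoImageFoundError(Exception):
    """Raised when a page has no <a> element that wraps an <img>."""


class _LinkedImageFinder(HTMLParser):
    """Remember the href of the first <a> that contains an <img>."""

    def __init__(self):
        super().__init__()
        self._current_href = None  # href of the <a> we are currently inside
        self.image_href = None     # result: first href that wraps an <img>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
        elif tag == "img" and self._current_href and self.image_href is None:
            self.image_href = self._current_href

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def get_image_url(html, base_url=NASA_APOD_SITE):
    """Return the absolute URL of the linked image or raise NoImageFoundError."""
    finder = _LinkedImageFinder()
    finder.feed(html)
    finder.close()
    if finder.image_href is None:
        raise NoImageFoundError("No linked image found; today's APOD may be a video.")
    return urljoin(base_url, finder.image_href)


def main(html):
    # The error is handled once, at the top level, instead of
    # calling exit() deep inside a helper function.
    try:
        image_url = get_image_url(html)
    except NoImageFoundError as error:
        logger.warning("%s", error)
        return 1
    logger.info("Image URL: %s", image_url)
    return 0
```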
@Sirius3: Thanks for the suggestions.
After commenting out a logger call that invokes `response.read()` in the function `download_site()`, the function now also returns the downloaded HTML source. Why is that?
Current code:
Code:
from gi.repository import GLib
from bs4 import BeautifulSoup
import logging
import subprocess
import urllib.request, urllib.parse, urllib.error
import re
import os
import random
import glob
from PIL import Image
from sys import stdout
from sys import exit
from lxml import etree
from datetime import datetime, timedelta
NASA_APOD_SITE = 'http://apod.nasa.gov/apod/'
TEMPORARY_DOWNLOAD_PATH = '/tmp/backgrounds/'
CUSTOM_FOLDER = 'nasa-apod-backgrounds'
RESOLUTION_TYPE = 'stretch'
DEFAULT_RESOLUTION_X = 1024
DEFAULT_RESOLUTION_Y = 768
IMAGE_SCROLL = True
IMAGE_DURATION = 1200
SEED_IMAGES = 10
SHOW_DEBUG = False
LOG_LEVEL = logging.DEBUG
LOG_FORMAT = '%(asctime)s %(name)s: %(message)s'
logger = logging.getLogger(__name__)
logger.setLevel(LOG_LEVEL)
formatter = logging.Formatter(LOG_FORMAT)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
# Use XRandR to grab the desktop resolution. If the scaling method is set to 'largest',
# we will attempt to grab it from the largest connected device. If the scaling method
# is set to 'stretch' we will grab it from the current value. Default will simply use
# what was set for the default resolutions.
def find_display_resolution():
if RESOLUTION_TYPE == 'default':
logger.info(f"Using default resolution of {DEFAULT_RESOLUTION_X}x{DEFAULT_RESOLUTION_Y}")
return DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y
resolution_x = 0
resolution_y = 0
logger.info("Attempting to determine the current resolution.")
if RESOLUTION_TYPE == 'largest':
regex_search = 'connected'
else:
regex_search = 'current'
p1 = subprocess.Popen(["xrandr"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["grep", regex_search], stdin=p1.stdout, stdout=subprocess.PIPE) # TODO: use Python's re module
p3 = re.findall(regex_search, str(p1.communicate()[0]))
p1.stdout.close()
output = str(p2.communicate()[0])
if RESOLUTION_TYPE == 'largest':
# We are going to go through the connected devices and get the X/Y from the largest
matches = re.finditer(" connected ([0-9]+)x([0-9]+)+", output) # TODO: returns an iterator, which is always truthy.
if matches:
largest = 0
for match in matches:
if int(match.group(1)) * int(match.group(2)) > largest:
resolution_x = match.group(1)
resolution_y = match.group(2)
else:
logger.warning("Could not determine largest screen resolution.")
else:
reg = re.search(".* current (.*?) x (.*?),.*", output)
if reg:
resolution_x = reg.group(1)
resolution_y = reg.group(2)
else:
logger.warning("Could not determine current screen resolution.")
# If we couldn't find anything automatically use what was set for the defaults
if resolution_x == 0 or resolution_y == 0:
resolution_x = DEFAULT_RESOLUTION_X
resolution_y = DEFAULT_RESOLUTION_Y
logger.warning("Could not determine resolution automatically. Using defaults.")
logger.info(f"Using detected resolution of {resolution_x}x{resolution_y}")
return int(resolution_x), int(resolution_y)
# Uses GLib to find the localized "Downloads" folder
# See: http://askubuntu.com/questions/137896/how-to-get-the-user-downloads-folder-location-with-python
def get_user_download_directory():
downloads_dir = GLib.get_user_special_dir(GLib.USER_DIRECTORY_DOWNLOAD)
if downloads_dir:
# Add any custom folder
new_path = os.path.join(downloads_dir, CUSTOM_FOLDER)
logger.info(f"Using automatically detected path: {new_path}")
else:
new_path = TEMPORARY_DOWNLOAD_PATH
logger.warning("Could not determine download folder with GLib. Using default.")
return new_path
# Download HTML of the site
def download_site(url):
logger.info("Downloading contents of the site to find the image name")
opener = urllib.request.build_opener()
req = urllib.request.Request(url)
try:
response = opener.open(req)
#logger.info(f"Response: {response.read()}")
reply = response.read().decode()
except urllib.error.HTTPError as error:
logger.error(f"Error downloading {url} - {error.code}")
reply = "error"
return reply
# Finds the image URL and saves it
def get_image(text):
logger.info("Grabbing the image URL")
file_url, filename, file_size = get_image_info(text)
# If file_url is None, today's picture might be a video
if file_url is None:
return None
logger.info(f"Found name of image: {filename}")
save_to = os.path.join(TEMPORARY_DOWNLOAD_PATH, os.path.splitext(filename)[0] + '.png')
if not os.path.isfile(save_to):
# If the response body is less than 500 bytes, something went wrong
if file_size < 500:
logger.warning("Response less than 500 bytes, probably an error\nAttempting to just grab image source")
file_url, filename, file_size = get_image_info(text)
# If file_url is None, today's picture might be a video
if file_url is None:
return None
logger.info(f"Found name of image: {filename}")
if file_size < 500:
# Give up
logger.error("Could not find image to download")
exit()
logger.info("Retrieving image")
urllib.request.urlretrieve(file_url, save_to, print_download_status)
# Adding additional padding to ensure entire line
logger.info(f"\rDone downloading {human_readable_size(file_size)} ")
else:
urllib.request.urlretrieve(file_url, save_to)
else:
logger.info("File exists, moving on")
return save_to
def get_image_info(source):
# Grabs information about the image
file_url = None
file_name = None
file_size = None
soup = BeautifulSoup(str(source), 'lxml')
tags = soup.find_all('a')
print("Tags:", tags) # List is always empty :(
if tags:
for tag in tags:
if tag.find("img"):
file_url = urllib.parse.urljoin(NASA_APOD_SITE, tag.get('href'))
# Create our handle for our remote file
logger.info("Opening remote URL")
remote_file = urllib.request.urlopen(file_url)
file_name = os.path.basename(file_url) # TODO: not guaranteed to work in the first place, because paths are not the same thing as URLs
file_size = int(remote_file.headers.get("content-length"))
else:
logger.warning("Could not find an image. May be a video today.")
return file_url, file_name, file_size
# Resizes the image to the provided dimensions
def resize_image(filename):
logger.info("Opening local image")
image = Image.open(filename)
current_x, current_y = image.size
if (current_x, current_y) == (DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y):
logger.info("Images are currently equal in size. No need to scale.")
else:
logger.info("Resizing the image from", image.size[0], "x", image.size[1], "to", DEFAULT_RESOLUTION_X, "x", DEFAULT_RESOLUTION_Y)
image = image.resize((DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y), Image.ANTIALIAS)
logger.info(f"Saving the image as {filename}")
with open(filename, 'wb'):
image.save(filename, 'PNG')
# Sets the new image as the wallpaper
def set_gnome_wallpaper(file_path):
logger.info("Setting the wallpaper")
command = "gsettings set org.gnome.desktop.background picture-uri file://" + file_path
status, output = subprocess.getstatusoutput(command) # TODO: use something like subprocess.run instead of subprocess.getstatusoutput
return status
def print_download_status(block_count, block_size, total_size):
written_size = human_readable_size(block_count * block_size)
total_size = human_readable_size(total_size)
# Adding space padding at the end to ensure we overwrite the whole line
stdout.write(f"\r{written_size} bytes of {total_size} ")
stdout.flush()
def human_readable_size(number_bytes): # TODO: returns None for sizes of 1073741824 bytes (1 GiB) or more.
for x in ['bytes', 'KB', 'MB']:
if number_bytes < 1024.0:
return "%3.2f%s" % (number_bytes, x)
number_bytes /= 1024.0
# Creates the necessary XML so background images will scroll through
def create_desktop_background_scroll(filename):
if not IMAGE_SCROLL:
return filename
logger.info("Creating XML file for desktop background switching.")
filename = os.path.join(TEMPORARY_DOWNLOAD_PATH, '/nasa_apod_desktop_backgrounds.xml')
# Create our base, background element
background = etree.Element("background")
# Grab our PNGs we have downloaded
images = glob.glob(TEMPORARY_DOWNLOAD_PATH + "/*.png")
num_images = len(images)
if num_images < SEED_IMAGES:
# Let's seed some images
# Start with yesterday and continue going back until we have enough
logger.info("Downloading some seed images as well")
days_back = 0
seed_images_left = SEED_IMAGES
while seed_images_left > 0:
days_back += 1
logger.info(f"Downloading seed image ({seed_images_left} left):")
day_to_try = datetime.now() - timedelta(days=days_back)
# Filenames look like /apYYMMDD.html
seed_filename = os.path.join(NASA_APOD_SITE, "ap" + day_to_try.strftime("%y%m%d") + ".html")
seed_site_contents = download_site(seed_filename)
# Make sure we didn't encounter an error for some reason
if seed_site_contents == "error":
logger.error("Seed site contains an error")
continue
seed_filename = get_image(seed_site_contents)
# If the content was a video or some other error occurred, skip the
# rest.
if seed_filename is None:
continue
resize_image(seed_filename)
# Add this to our list of images
images.append(seed_filename)
seed_images_left -= 1
logger.info("Done downloading seed images")
# Get our images in a random order so we get a new order every time we get a new file
random.shuffle(images)
# Recalculate the number of pictures
num_images = len(images)
for i, image in enumerate(images):
# Create a static entry for keeping this image here for IMAGE_DURATION
static = etree.SubElement(background, "static")
# Length of time the background stays
duration = etree.SubElement(static, "duration")
duration.text = str(IMAGE_DURATION)
# Assign the name of the file for our static entry
static_file = etree.SubElement(static, "file")
static_file.text = images[i]
# Create a transition for the animation with a from and to
transition = etree.SubElement(background, "transition")
# Length of time for the switch animation
transition_duration = etree.SubElement(transition, "duration")
transition_duration.text = "5"
# We are always transitioning from the current file
transition_from = etree.SubElement(transition, "from")
transition_from.text = images[i]
# Create our transition "to" element
transition_to = etree.SubElement(transition, "to")
# Check to see if we're at the end, if we are use the first image as the image to
if i + 1 == num_images:
transition_to.text = images[0]
else:
transition_to.text = images[i + 1]
xml_tree = etree.ElementTree(background)
xml_tree.write(filename, pretty_print=True)
return filename
if __name__ == '__main__':
logger.info("Starting")
# Find desktop resolution
DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y = find_display_resolution()
# Set a localized download folder
TEMPORARY_DOWNLOAD_PATH = get_user_download_directory()
# Create the download path if it doesn't exist
if not os.path.exists(os.path.expanduser(TEMPORARY_DOWNLOAD_PATH)):
os.makedirs(os.path.expanduser(TEMPORARY_DOWNLOAD_PATH))
# Grab the HTML contents of the file
site_contents = download_site(NASA_APOD_SITE)
if site_contents == "error":
logger.error("Could not contact site.")
exit() # TODO: `exit` should not appear at all in a clean program
# Download the image
filename = get_image(site_contents)
if filename is not None:
# Resize the image
resize_image(filename)
# Create the desktop switching xml
filename = create_desktop_background_scroll(filename)
# If the script was unable to download today's image and IMAGE_SCROLL is set to False,
# the script exits
if filename is None:
logger.error("Today's image could not be downloaded.")
exit() # TODO: `exit` should not appear at all in a clean program
# Set the wallpaper
status = set_gnome_wallpaper(filename)
logger.info("Finished!")
Atalanttore
__blackjack__
@Atalanttore: Because `response.read()` reads the entire response. After that it is "gone", as is usual with files.
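A small demonstration of this behavior; it uses a temporary local file and a `file://` URL so it runs without network access, and the helper name `read_twice()` is made up. The first `read()` returns the whole body, the second one returns empty `bytes`. The remedy is to call `read()` once and keep the result in a variable, as the later version of `download_site()` in this thread does with `site`.

```python
import urllib.request


def read_twice(url):
    """Read a response body twice to show that read() consumes it."""
    with urllib.request.urlopen(url) as response:
        first = response.read()   # the complete body
        second = response.read()  # nothing left to read: b""
    return first, second
```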
@__blackjack__: Thanks, I didn't know that. With this information I've now gotten a bit further with running the program.
The following error messages now appear after a total of 4 images have been downloaded:
Code:
2019-07-13 17:57:29,137 __main__: Done downloading images
Traceback (most recent call last):
File "/home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py", line 384, in <module>
filename = create_desktop_background_scroll(filename)
File "/home/ata/PycharmProjects/nasa-apod-desktop/nasa_apod_desktop.py", line 353, in create_desktop_background_scroll
xml_tree.write(filename, pretty_print=True)
File "src/lxml/etree.pyx", line 2039, in lxml.etree._ElementTree.write
File "src/lxml/serializer.pxi", line 721, in lxml.etree._tofilelike
File "src/lxml/serializer.pxi", line 780, in lxml.etree._create_output_buffer
File "src/lxml/serializer.pxi", line 770, in lxml.etree._create_output_buffer
PermissionError: [Errno 13] Permission denied
- Why does it fail because of a missing permission?
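One detail worth checking in `create_desktop_background_scroll()`: the second argument `'/nasa_apod_desktop_backgrounds.xml'` starts with a slash. `os.path.join()` discards all earlier components when a later component is absolute, so the XML file would most likely be written to the file system root, where a normal user usually has no write permission. The sketch uses `posixpath` (which is what `os.path` is on Linux) so the result is the same on every platform; `background_xml_path()` is a hypothetical helper.

```python
import posixpath

TEMPORARY_DOWNLOAD_PATH = '/tmp/backgrounds/'

# An absolute second component replaces everything before it:
print(posixpath.join(TEMPORARY_DOWNLOAD_PATH, '/nasa_apod_desktop_backgrounds.xml'))
# /nasa_apod_desktop_backgrounds.xml


def background_xml_path(download_dir):
    # A relative file name keeps the result inside download_dir.
    return posixpath.join(download_dir, 'nasa_apod_desktop_backgrounds.xml')


print(background_xml_path(TEMPORARY_DOWNLOAD_PATH))
# /tmp/backgrounds/nasa_apod_desktop_backgrounds.xml
```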
In several places in the code, the value of a constant is replaced with the return value of a function.
For example:
DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y = find_display_resolution()
- What should one make of that?
Current code:
Code:
from gi.repository import GLib
from bs4 import BeautifulSoup
import logging
import subprocess
import urllib.request, urllib.parse, urllib.error
import re
import os
import random
import glob
from PIL import Image
from sys import stdout
from sys import exit
from lxml import etree
from datetime import datetime, timedelta
NASA_APOD_SITE = 'http://apod.nasa.gov/apod/'
TEMPORARY_DOWNLOAD_PATH = '/tmp/backgrounds/'
CUSTOM_FOLDER = 'nasa-apod-backgrounds'
RESOLUTION_TYPE = 'stretch'
DEFAULT_RESOLUTION_X = 1024
DEFAULT_RESOLUTION_Y = 768
IMAGE_SCROLL = True
IMAGE_DURATION = 1200
COUNT_IMAGES_FROM_PREVIOUS_DAYS = 3
LOG_LEVEL = logging.DEBUG
LOG_FORMAT = '%(asctime)s %(name)s: %(message)s'
logger = logging.getLogger(__name__)
logger.setLevel(LOG_LEVEL)
formatter = logging.Formatter(LOG_FORMAT)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
# Use XRandR to grab the desktop resolution. If the scaling method is set to 'largest',
# we will attempt to grab it from the largest connected device. If the scaling method
# is set to 'stretch' we will grab it from the current value. Default will simply use
# what was set for the default resolutions.
def find_display_resolution(): # TODO: Is this necessary at all?
if RESOLUTION_TYPE == 'default':
logger.info(f"Using default resolution of {DEFAULT_RESOLUTION_X}x{DEFAULT_RESOLUTION_Y}")
return DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y
resolution_x = 0
resolution_y = 0
logger.info("Attempting to determine the current resolution.")
if RESOLUTION_TYPE == 'largest':
regex_search = 'connected'
else:
regex_search = 'current'
p1 = subprocess.Popen(["xrandr"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["grep", regex_search], stdin=p1.stdout, stdout=subprocess.PIPE) # TODO: use Python's re module
p3 = re.findall(regex_search, str(p1.communicate()[0]))
p1.stdout.close()
output = str(p2.communicate()[0])
if RESOLUTION_TYPE == 'largest':
# We are going to go through the connected devices and get the X/Y from the largest
matches = re.finditer(" connected ([0-9]+)x([0-9]+)+", output) # TODO: returns an iterator, which is always truthy.
if matches:
largest = 0
for match in matches:
if int(match.group(1)) * int(match.group(2)) > largest:
resolution_x = match.group(1)
resolution_y = match.group(2)
else:
logger.warning("Could not determine largest screen resolution.")
else:
reg = re.search(".* current (.*?) x (.*?),.*", output)
if reg:
resolution_x = reg.group(1)
resolution_y = reg.group(2)
else:
logger.warning("Could not determine current screen resolution.")
# If we couldn't find anything automatically use what was set for the defaults
if resolution_x == 0 or resolution_y == 0:
resolution_x = DEFAULT_RESOLUTION_X
resolution_y = DEFAULT_RESOLUTION_Y
logger.warning("Could not determine resolution automatically. Using defaults.")
logger.info(f"Using detected resolution of {resolution_x}x{resolution_y}")
return int(resolution_x), int(resolution_y)
# Uses GLib to find the localized "Downloads" folder
# See: http://askubuntu.com/questions/137896/how-to-get-the-user-downloads-folder-location-with-python
def get_user_download_directory():
downloads_dir = GLib.get_user_special_dir(GLib.USER_DIRECTORY_DOWNLOAD)
if downloads_dir:
# Add any custom folder
new_path = os.path.join(downloads_dir, CUSTOM_FOLDER)
logger.info(f"Using automatically detected path: {new_path}")
else:
new_path = TEMPORARY_DOWNLOAD_PATH
logger.warning("Could not determine download folder with GLib. Using default.")
return new_path
# Download HTML of the site
def download_site(url):
logger.info("Downloading contents of the site to find the image name")
opener = urllib.request.build_opener()
req = urllib.request.Request(url)
try:
response = opener.open(req)
site = response.read()
logger.info(f"Response: {site}")
reply = site.decode()
except urllib.error.HTTPError as error:
logger.error(f"Error downloading {url} - {error.code}")
reply = "error"
return reply
# Finds the image URL and saves it
def get_image(text):
logger.info("Grabbing the image URL")
file_url, filename, file_size = get_image_info(text)
# If file_url is None, today's picture might be a video
if file_url is None:
return None # TODO: use an exception
logger.info(f"Found name of image: {filename}")
save_to = os.path.join(TEMPORARY_DOWNLOAD_PATH, os.path.splitext(filename)[0] + '.png')
if not os.path.isfile(save_to):
# If the response body is less than 500 bytes, something went wrong
if file_size < 500:
logger.warning("Response less than 500 bytes, probably an error\nAttempting to just grab image source")
file_url, filename, file_size = get_image_info(text)
# If file_url is None, today's picture might be a video
if file_url is None:
return None # TODO: use an exception
logger.info(f"Found name of image: {filename}")
if file_size < 500:
# Give up
logger.error("Could not find image to download")
exit() # TODO: `exit` should not appear at all in a clean program
logger.info("Retrieving image")
urllib.request.urlretrieve(file_url, save_to, print_download_status)
# Adding additional padding to ensure entire line
logger.info(f"\rDone downloading {human_readable_size(file_size)} ")
else:
urllib.request.urlretrieve(file_url, save_to)
else:
logger.info("File exists, moving on")
return save_to
def get_image_info(source):
# Grabs information about the image
file_url = None
file_name = None
file_size = None
soup = BeautifulSoup(str(source), 'lxml')
tags = soup.find_all('a')
if tags:
for tag in tags:
if tag.find("img"): # TODO: issue a warning if no a-tag contains an img-tag
file_url = urllib.parse.urljoin(NASA_APOD_SITE, tag.get('href'))
# Create our handle for our remote file
logger.info("Opening remote URL")
remote_file = urllib.request.urlopen(file_url)
file_name = os.path.basename(file_url) # TODO: not guaranteed to work in the first place, because paths are not the same thing as URLs
file_size = int(remote_file.headers.get("content-length"))
else:
logger.warning("Could not find an image. May be a video today.")
return file_url, file_name, file_size
# Resizes the image to the provided dimensions
def resize_image(filename):
logger.info("Opening local image")
image = Image.open(filename)
current_x, current_y = image.size
if (current_x, current_y) == (DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y):
logger.info("Images are currently equal in size. No need to scale.")
else:
logger.info(f"Resizing the image from {image.size[0]} x {image.size[1]} to {DEFAULT_RESOLUTION_X} x {DEFAULT_RESOLUTION_Y}")
image = image.resize((DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y), Image.ANTIALIAS)
logger.info(f"Saving the image as {filename}")
with open(filename, 'wb'):
image.save(filename, 'PNG')
# Sets the new image as the wallpaper
def set_gnome_wallpaper(file_path):
logger.info("Setting the wallpaper")
command = "gsettings set org.gnome.desktop.background picture-uri file://" + file_path
status, output = subprocess.getstatusoutput(command) # TODO: use something like subprocess.run instead of subprocess.getstatusoutput
return status
def print_download_status(block_count, block_size, total_size):
written_size = human_readable_size(block_count * block_size)
total_size = human_readable_size(total_size)
# Adding space padding at the end to ensure we overwrite the whole line
stdout.write(f"\r{written_size} bytes of {total_size} ")
stdout.flush()
def human_readable_size(number_bytes): # TODO: returns None for sizes of 1073741824 bytes (1 GiB) or more.
for x in ['bytes', 'KB', 'MB']:
if number_bytes < 1024.0:
return "%3.2f%s" % (number_bytes, x)
number_bytes /= 1024.0
# Creates the necessary XML so background images will scroll through
def create_desktop_background_scroll(filename):
if not IMAGE_SCROLL:
return filename
logger.info("Creating XML file for desktop background switching.")
filename = os.path.join(TEMPORARY_DOWNLOAD_PATH, '/nasa_apod_desktop_backgrounds.xml')
# Create our base, background element
background = etree.Element("background")
# Grab our PNGs we have downloaded
images = glob.glob(TEMPORARY_DOWNLOAD_PATH + "/*.png")
num_images = len(images)
if num_images < COUNT_IMAGES_FROM_PREVIOUS_DAYS:
# Start with yesterday and continue going back until we have enough
logger.info("Downloading images of previous days as well")
days_back = 0
images_left = COUNT_IMAGES_FROM_PREVIOUS_DAYS
while images_left > 0:
days_back += 1
logger.info(f"Downloading image ({images_left} left):")
day_to_try = datetime.now() - timedelta(days=days_back)
# Filenames look like /apYYMMDD.html
archive_filename = os.path.join(NASA_APOD_SITE, "ap" + day_to_try.strftime("%y%m%d") + ".html")
archive_site_contents = download_site(archive_filename)
# Make sure we didn't encounter an error for some reason
if archive_site_contents == "error":
logger.error("Archive site contains an error")
continue
archive_filename = get_image(archive_site_contents)
# If the content was a video or some other error occurred, skip the
# rest.
if archive_filename is None:
continue
resize_image(archive_filename)
# Add this to our list of images
images.append(archive_filename)
images_left -= 1
logger.info("Done downloading images")
# Get our images in a random order so we get a new order every time we get a new file
random.shuffle(images)
# Recalculate the number of pictures
num_images = len(images)
for i, image in enumerate(images):
# Create a static entry for keeping this image here for IMAGE_DURATION
static = etree.SubElement(background, "static")
# Length of time the background stays
duration = etree.SubElement(static, "duration")
duration.text = str(IMAGE_DURATION)
# Assign the name of the file for our static entry
static_file = etree.SubElement(static, "file")
static_file.text = images[i]
# Create a transition for the animation with a from and to
transition = etree.SubElement(background, "transition")
# Length of time for the switch animation
transition_duration = etree.SubElement(transition, "duration")
transition_duration.text = "5"
# We are always transitioning from the current file
transition_from = etree.SubElement(transition, "from")
transition_from.text = images[i]
# Create our tranition to element
transition_to = etree.SubElement(transition, "to")
# Check to see if we're at the end, if we are use the first image as the image to
if i + 1 == num_images:
transition_to.text = images[0]
else:
transition_to.text = images[i + 1]
xml_tree = etree.ElementTree(background)
xml_tree.write(filename, pretty_print=True)
return filename
if __name__ == '__main__':
logger.info("Starting")
# Find desktop resolution
DEFAULT_RESOLUTION_X, DEFAULT_RESOLUTION_Y = find_display_resolution()
# Set a localized download folder
TEMPORARY_DOWNLOAD_PATH = get_user_download_directory()
# Create the download path if it doesn't exist
if not os.path.exists(os.path.expanduser(TEMPORARY_DOWNLOAD_PATH)):
os.makedirs(os.path.expanduser(TEMPORARY_DOWNLOAD_PATH))
# Grab the HTML contents of the file
site_contents = download_site(NASA_APOD_SITE)
if site_contents == "error":
logger.error("Could not contact site.")
exit() # TODO: `exit` sollte in einem sauberen Programm gar nicht vorkommen
# Download the image
filename = get_image(site_contents)
if filename is not None:
# Resize the image
resize_image(filename)
# Create the desktop switching xml
filename = create_desktop_background_scroll(filename)
# If the script was unable todays image and IMAGE_SCROLL is set to False,
# the script exits
if filename is None:
logger.error("Today's image could not be downloaded.")
exit() # TODO: `exit` sollte in einem sauberen Programm gar nicht vorkommen
# Set the wallpaper
status = set_gnome_wallpaper(filename)
logger.info("Finished!")
Atalanttore
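The TODO on `human_readable_size()` is accurate: if `number_bytes` is still 1024 or more after the "MB" step, the loop simply ends and the function implicitly returns `None`. A minimal sketch of a drop-in variant, assuming the same `"%3.2f"` output format and 1024-based units are wanted, adds more units and always returns something:

```python
def human_readable_size(number_bytes):
    """Format a byte count with 1024-based units, e.g. 1536 -> '1.50KB'."""
    units = ['bytes', 'KB', 'MB', 'GB', 'TB']
    for unit in units[:-1]:
        if number_bytes < 1024.0:
            return f"{number_bytes:3.2f}{unit}"
        number_bytes /= 1024.0
    # Anything still >= 1024 at this point is reported in the largest unit,
    # so the function can no longer fall off the end and return None.
    return f"{number_bytes:3.2f}{units[-1]}"


print(human_readable_size(500))            # 500.00bytes
print(human_readable_size(1536))           # 1.50KB
print(human_readable_size(3 * 1024 ** 3))  # 3.00GB
```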
- __blackjack__
@Atalanttore: Try printing out what it is trying to save, and where.
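Printing the value would reveal the cause: `os.path.join()` discards every component before one that is an absolute path, so the leading slash in `'/nasa_apod_desktop_backgrounds.xml'` throws away `TEMPORARY_DOWNLOAD_PATH` and targets the root directory. A small sketch, where `download_dir` is a hypothetical stand-in for `TEMPORARY_DOWNLOAD_PATH`:

```python
import os.path

# Hypothetical download directory, standing in for TEMPORARY_DOWNLOAD_PATH
download_dir = "/home/ata/Downloads/nasa_apod"

# The leading "/" makes the second argument absolute, so download_dir is dropped
with_slash = os.path.join(download_dir, "/nasa_apod_desktop_backgrounds.xml")
print(with_slash)     # /nasa_apod_desktop_backgrounds.xml

# Without it, the file name is appended to the directory as intended (POSIX output)
without_slash = os.path.join(download_dir, "nasa_apod_desktop_backgrounds.xml")
print(without_slash)  # /home/ata/Downloads/nasa_apod/nasa_apod_desktop_backgrounds.xml
```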
- Atalanttore
@__blackjack__: Thanks for the tip. Saving to `/nasa_apod_desktop_backgrounds.xml` with ordinary user permissions obviously can't work.
I have now removed the slash in front of the file name, and the line, with the name changed to `path_with_filename`, now looks like this:
Code:
path_with_filename = os.path.join(TEMPORARY_DOWNLOAD_PATH, 'nasa_apod_desktop_backgrounds.xml')
For the first time, the desktop background has changed and now shows a downloaded Astronomy Picture of the Day. The timed switching between the images doesn't work yet, though. The resolution of the image isn't ideal either, and the aspect ratio was not preserved.
Is it even necessary to adapt the size of the downloaded images to the screen resolution with the `resize_image()` function?
Regards
Atalanttore
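The code of `resize_image()` isn't shown in the thread, so it is unclear exactly how it resizes. If it scales width and height by different factors, the aspect ratio is lost. Keeping it means deriving both dimensions from one common scale factor. A minimal sketch with a hypothetical helper `scaled_size()` (pure arithmetic, no imaging library), using the 3840x1080 resolution from the log as the target:

```python
def scaled_size(width, height, target_width, target_height, fill=False):
    """Return (w, h) scaled by one common factor, so the aspect ratio is kept.

    Hypothetical helper, not from the thread.
    fill=False: largest size that fits entirely inside the target (may leave bars).
    fill=True:  smallest size that covers the whole target (edges must be cropped).
    """
    factor_x = target_width / width
    factor_y = target_height / height
    factor = max(factor_x, factor_y) if fill else min(factor_x, factor_y)
    return round(width * factor), round(height * factor)


print(scaled_size(4000, 3000, 3840, 1080))             # (1440, 1080)
print(scaled_size(4000, 3000, 3840, 1080, fill=True))  # (3840, 2880)
```

Which variant suits a wallpaper is a design choice: "fit" shows the whole picture with empty borders, "fill" covers the screen but requires cropping afterwards.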
- Atalanttore
__blackjack__ wrote (Sunday, 7 July 2019, 21:55): Then something from `os.path` is once again being used with a URL. That isn't guaranteed to work in the first place, because paths are something other than URLs, and even on systems where paths and URLs look alike it falls flat on its face as soon as the URL also has a “query” and/or “fragment” part.
- What is the difference between paths and URLs (on a Linux system)?
- `os.path.basename(file_url)` does work, but apparently won't always work.
What is the best (most Pythonic) way to extract the file name from a URL?
Atalanttore
- __deets__
With https://docs.python.org/3/library/urlli ... llib.parse - and URLs can contain hosts, ports, user names, passwords, protocols, parameters. Have you ever seen anything like that with files?
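To make the point concrete, here is a small sketch with `urllib.parse`, using a hypothetical URL that fills in every optional part. Only the `path` component is path-like, and taking the basename of the whole URL drags the query and fragment along. The same module also covers building URLs: `urljoin()` instead of `os.path.join()`. The base `"https://apod.nasa.gov/apod/"` is an assumption about what `NASA_APOD_SITE` contains; note how much the trailing slash matters.

```python
import os.path
import urllib.parse
from datetime import date

# Hypothetical URL with every optional part filled in, for illustration only
url = "https://user:secret@apod.nasa.gov:443/image/1906/gendlerM83.jpg?size=large#top"

parts = urllib.parse.urlsplit(url)
print(parts.scheme, parts.hostname, parts.port, parts.username)  # https apod.nasa.gov 443 user
print(parts.path)                # /image/1906/gendlerM83.jpg
print(parts.query, parts.fragment)  # size=large top

# Treating the whole URL as a file path keeps query and fragment in the "name":
print(os.path.basename(url))     # gendlerM83.jpg?size=large#top
# Taking the last segment of the path component does not:
last_segment = parts.path.split("/")[-1]
print(last_segment)              # gendlerM83.jpg

# Building archive page URLs (ASSUMPTION: base URL with a trailing slash)
archive_page = "ap" + date(2019, 7, 6).strftime("%y%m%d") + ".html"
archive_url = urllib.parse.urljoin("https://apod.nasa.gov/apod/", archive_page)
print(archive_url)               # https://apod.nasa.gov/apod/ap190706.html
# Without the trailing slash, the last segment of the base gets replaced:
archive_url_no_slash = urllib.parse.urljoin("https://apod.nasa.gov/apod", archive_page)
print(archive_url_no_slash)      # https://apod.nasa.gov/ap190706.html
```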
- Atalanttore
@__deets__: As I understand it now, a path becomes a URL when it contains more than directory names and file names.
Regards
Atalanttore
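One way to see the difference is to express the same local file both ways. As a URL it needs a scheme, and characters such as spaces must be percent-encoded; going back, the path is just one component of the URL and has to be unquoted. A minimal sketch with a hypothetical file name (for absolute paths, `pathlib.Path.as_uri()` does the first step as well):

```python
import urllib.parse

# Hypothetical local file path, for illustration only
path = "/home/ata/Pictures/apod 2019-07-07.png"

# As a URL: add the scheme and percent-encode (quote() keeps "/" by default)
file_url = "file://" + urllib.parse.quote(path)
print(file_url)  # file:///home/ata/Pictures/apod%202019-07-07.png

# Back again: split the URL, then unquote only its path component
parts = urllib.parse.urlsplit(file_url)
print(parts.scheme)                      # file
print(urllib.parse.unquote(parts.path))  # /home/ata/Pictures/apod 2019-07-07.png
```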
- Atalanttore
@__deets__: By "directory up", do you mean the command, or something else?
After some more searching, I came across a function that extracts the file name from a URL.
I have trimmed the function's code down to the essentials:
Code:
import os
import posixpath
import urllib.parse


def url2filename(url):
    """
    Return basename corresponding to url.
    Based on https://gist.github.com/zed/c2168b9c52b032b5fb7d
    """
    url_path = urllib.parse.urlsplit(url).path
    basename = posixpath.basename(urllib.parse.unquote(url_path))
    if (os.path.basename(basename) != basename
            or urllib.parse.unquote(posixpath.basename(url_path)) != basename):
        raise ValueError  # reject '%2f' or 'dir%5Cbasename.ext' on Windows
    return basename
Is this function a better approach than just using `os.path.basename()` for this?
Regards
Atalanttore
- __deets__
No. I mean “../../..”. Those are always paths. That function calls unquote too often for my taste; do it ONCE at the beginning. And since we know that URLs use / to separate their components, I would simply use that as the argument to split as well. Using posixpath there just because it happens to have the same separator isn't really any better than the os.path module.
- Atalanttore
@__deets__: I have reworked the code following your recommendations (provided I understood everything correctly), plus a few more changes. I decided against regular expressions.
Code:
import fnmatch
import urllib.parse

URL = "https://apod.nasa.gov/image/1906/gendlerM83-New-HST-ESO-LL.jpg"


def get_basename(url, file_extension):
    """Return basename corresponding to url."""
    url_path = urllib.parse.urlsplit(url).path
    unquoted_url = urllib.parse.unquote(url_path)
    basename = unquoted_url.split("/")[-1]
    if not fnmatch.fnmatch(basename, f"*.{file_extension}"):
        raise ValueError
    return basename


print(get_basename(URL, 'jpg'))
Regards
Atalanttore
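Two details of `get_basename()` may be worth a second look. Because the whole path is unquoted before splitting, an encoded slash such as `a%2Fb.jpg` becomes a real `/` and the function silently returns `b.jpg` instead of rejecting it. And `fnmatch.fnmatch()` normalizes case via `os.path.normcase()`, so `.JPG` matches `*.jpg` on Windows but not on Linux. A possible variant, sketched here under the hypothetical name `url_basename()`, still unquotes exactly once, but only the last segment after splitting on `/`, and gives the `ValueError` a message:

```python
import urllib.parse


def url_basename(url, file_extension):
    """Return the file name at the end of a URL's path component.

    Hypothetical helper (not from the thread). Raises ValueError if the last
    path segment is not a plain file name ending in .<file_extension>.
    """
    url_path = urllib.parse.urlsplit(url).path
    # Split on "/" (the URL separator) first and unquote only the last segment,
    # exactly once. An encoded slash ("%2F") then stays inside the name and is
    # detected below instead of silently creating an extra segment.
    basename = urllib.parse.unquote(url_path.split("/")[-1])
    # Reject empty names, directory references and local path separators
    # (including "\" for Windows, the 'dir%5Cbasename.ext' case).
    if basename in {"", ".", ".."} or "/" in basename or "\\" in basename:
        raise ValueError(f"no usable file name in URL {url!r}")
    # Compare the extension case-insensitively on every platform.
    if not basename.lower().endswith("." + file_extension.lower()):
        raise ValueError(f"expected a .{file_extension} file, got {basename!r}")
    return basename


print(url_basename("https://apod.nasa.gov/image/1906/gendlerM83-New-HST-ESO-LL.jpg", "jpg"))
```

Query strings and fragments are handled by `urlsplit()` before any of this, so `...M83%20big.JPG?x=1#top` yields `M83 big.JPG`.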