Problem parsing an XML document with namespaces

noisefloor · Monday, 23 April 2012, 19:49

Hello,

I would like to parse an XML document and collect the text of certain tags in a list.

The trouble is that I can't make sense of the namespaces at all, i.e. I don't have the faintest idea how to get at a tag's text when the tag has a namespace. The official docs (and my books, none of them from Galileo) unfortunately don't help me here...

The XML document looks like this:

Code:

<?xml version="1.0" encoding="utf-8"?>
<gpx xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" creator="Groundspeak Pocket Query" xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd http://www.groundspeak.com/cache/1/0 http://www.groundspeak.com/cache/1/0/cache.xsd" xmlns="http://www.topografix.com/GPX/1/0">
  <name>My Finds Pocket Query</name>
  <desc>Geocache file generated by Groundspeak</desc>
  <author>Groundspeak</author>
  <email>contact@groundspeak.com</email>
  <time>2012-04-19T10:41:42.7061834Z</time>
  <keywords>cache, geocache, groundspeak</keywords>
  <bounds minlat="12.9431" minlon="-95.36355" maxlat="54.862883" maxlon="77.569283" />
  <wpt lat="49.72775" lon="6.4933">
    <time>2009-03-02T08:00:00Z</time>
    <name>GC1N8XN</name>
    <desc>up by klot, Traditional Cache (1/1)</desc>
    <url>http://www.geocaching.com/seek/cache_details.aspx?guid=266ea959-126f-4e9c-8e7d-93084ef20f97</url>
    <urlname>up</urlname>
    <sym>Geocache Found</sym>
    <type>Geocache|Traditional Cache</type>
    <groundspeak:cache id="1146589" available="True" archived="False" xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:name>up</groundspeak:name>
      <groundspeak:placed_by>klot</groundspeak:placed_by>
      <groundspeak:owner id="1593650">klot</groundspeak:owner>
      <groundspeak:type>Traditional Cache</groundspeak:type>
      <groundspeak:container>Micro</groundspeak:container>
      <groundspeak:difficulty>1</groundspeak:difficulty>
      <groundspeak:terrain>1</groundspeak:terrain>
      <groundspeak:country>Luxembourg</groundspeak:country>
      <groundspeak:state>
      </groundspeak:state>
      <groundspeak:short_description html="False">take a rest and enjoy the scenic view</groundspeak:short_description>
    </groundspeak:cache>
  </wpt>
  <wpt lat="49.639267" lon="5.95545">
    <time>2008-07-10T07:00:00Z</time>
    <name>GC1E268</name>
    <desc>Have a break - A6 exit Steinfort by speedwalkers, Traditional Cache (1/1.5)</desc>
    <url>http://www.geocaching.com/seek/cache_details.aspx?guid=62feba0b-72c6-4d41-9ba8-58529270c63e</url>
    <urlname>Have a break - A6 exit Steinfort</urlname>
    <sym>Geocache Found</sym>
    <type>Geocache|Traditional Cache</type>
    <groundspeak:cache id="931591" available="True" archived="False" xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:name>Have a break - A6 exit Steinfort</groundspeak:name>
      <groundspeak:placed_by>speedwalkers</groundspeak:placed_by>
      <groundspeak:owner id="1307302">speedwalkers</groundspeak:owner>
      <groundspeak:type>Traditional Cache</groundspeak:type>
      <groundspeak:container>Small</groundspeak:container>
      <groundspeak:difficulty>1</groundspeak:difficulty>
      <groundspeak:terrain>1.5</groundspeak:terrain>
      <groundspeak:country>Luxembourg</groundspeak:country>
      <groundspeak:short_description html="True">
      </groundspeak:short_description>
    </groundspeak:cache>
  </wpt>
  <wpt lat="37.989583" lon="23.732033">
    <time>2008-03-06T08:00:00Z</time>
    <name>GC19Z1Y</name>
    <desc>National Archaeol. Museum by Mark-X, Traditional Cache (1.5/1)</desc>
    <url>http://www.geocaching.com/seek/cache_details.aspx?guid=6b2009dc-7d37-439c-b61e-20cb1f95f793</url>
    <urlname>National Archaeol. Museum</urlname>
    <sym>Geocache Found</sym>
    <type>Geocache|Traditional Cache</type>
    <groundspeak:cache id="809410" available="True" archived="False" xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:name>National Archaeol. Museum</groundspeak:name>
      <groundspeak:placed_by>Mark-X</groundspeak:placed_by>
      <groundspeak:owner id="879900">Mark-X</groundspeak:owner>
      <groundspeak:type>Traditional Cache</groundspeak:type>
      <groundspeak:container>Other</groundspeak:container>
      <groundspeak:difficulty>1.5</groundspeak:difficulty>
      <groundspeak:terrain>1</groundspeak:terrain>
      <groundspeak:country>Greece</groundspeak:country>
      <groundspeak:short_description html="False">A magnetic "nano" cache hidden somewhere in the yard of the largest and most important museum in Greece.</groundspeak:short_description>
    </groundspeak:cache>
  </wpt>
</gpx>

One goal would be, for example, a list of the countries (tag: <groundspeak:country>), which for this file should come out as:

Code:

['Luxembourg', 'Luxembourg', 'Greece']

If possible, a module that ships with Python should be used, but that is not a hard requirement.

Can anyone give me a tip or a pointer?

Regards, noisefloor

lunar · Monday, 23 April 2012, 20:04

@noisefloor: The standard library includes the ElementTree module, and from its documentation you can get, in a roundabout way, to an article that describes how to work with namespaces.
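As a minimal sketch of what that namespace handling amounts to: ElementTree reports each namespaced tag as `{namespace-URI}localname`, so the prefix written in the file never shows up in `elem.tag`. The shortened XML string below is a made-up stand-in for the real file, and the name `SAMPLE` is an assumption of this sketch.

```python
from xml.etree import ElementTree as ET

# Shortened, made-up stand-in for the GPX file from the question.
SAMPLE = """<gpx xmlns="http://www.topografix.com/GPX/1/0">
  <wpt lat="49.72775" lon="6.4933">
    <groundspeak:cache xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:country>Luxembourg</groundspeak:country>
    </groundspeak:cache>
  </wpt>
</gpx>"""

root = ET.fromstring(SAMPLE)

# Every tag comes back as "{namespace-URI}localname"; the prefix is gone.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)
```

Note that the unprefixed elements (`gpx`, `wpt`) also carry a namespace, because the root element declares a default one.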

noisefloor · Monday, 23 April 2012, 20:54

Hello,

ah, it's actually quite simple:

Code:

>>> from xml.etree import ElementTree as ET
>>> ns = 'http://www.groundspeak.com/cache/1/0'
>>> tree = ET.parse('test.xml')
>>> for elem in tree.iter():
...     if elem.tag == '{%s}country' % ns:
...         print(elem.text)
...
Luxembourg
Luxembourg
Greece

Thanks.

Regards, noisefloor

EyDu · Monday, 23 April 2012, 21:06

You should also read up on XPath; it is designed for exactly this kind of task.
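For illustration, a minimal sketch of the XPath route using only the limited XPath subset that ElementTree supports. The prefix `gs` and the names `SAMPLE`, `NAMESPACES` and `xpath_countries` are assumptions of this sketch; the XML is a shortened stand-in for the real file.

```python
from xml.etree import ElementTree as ET

# Shortened, made-up stand-in for the GPX file from the question.
SAMPLE = """<gpx xmlns="http://www.topografix.com/GPX/1/0">
  <wpt lat="49.72775" lon="6.4933">
    <groundspeak:cache xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:country>Luxembourg</groundspeak:country>
    </groundspeak:cache>
  </wpt>
  <wpt lat="37.989583" lon="23.732033">
    <groundspeak:cache xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:country>Greece</groundspeak:country>
    </groundspeak:cache>
  </wpt>
</gpx>"""

# The prefix "gs" is an arbitrary choice; only the URI has to match the file.
NAMESPACES = {"gs": "http://www.groundspeak.com/cache/1/0"}

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants; the prefix is resolved through NAMESPACES.
xpath_countries = [elem.text for elem in root.findall(".//gs:country", NAMESPACES)]
print(xpath_countries)
```

The prefix in the mapping does not have to be the one used in the document, since matching is done on the namespace URI.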

lunar · Monday, 23 April 2012, 21:34

@noisefloor: You can also filter by tag directly in ".iter()":

Code:

countries = [elem.text for elem in tree.iter('{%s}country' % ns)]

@EyDu: Well, in this case even XPath won't get any shorter or more elegant than this list comprehension (LC).
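One related pitfall with this file: the root element declares a default namespace, so unprefixed tags such as <wpt> and <name> need the same `{URI}` treatment. A minimal sketch, assuming a shortened stand-in for the real file; the names `SAMPLE`, `GPX_NS`, `GS_NS`, `unqualified` and `pairs` are made up for this example.

```python
from xml.etree import ElementTree as ET

# Shortened, made-up stand-in for the GPX file from the question.
SAMPLE = """<gpx xmlns="http://www.topografix.com/GPX/1/0">
  <wpt lat="49.72775" lon="6.4933">
    <name>GC1N8XN</name>
    <groundspeak:cache xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:name>up</groundspeak:name>
      <groundspeak:country>Luxembourg</groundspeak:country>
    </groundspeak:cache>
  </wpt>
  <wpt lat="37.989583" lon="23.732033">
    <name>GC19Z1Y</name>
    <groundspeak:cache xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:name>National Archaeol. Museum</groundspeak:name>
      <groundspeak:country>Greece</groundspeak:country>
    </groundspeak:cache>
  </wpt>
</gpx>"""

GPX_NS = "http://www.topografix.com/GPX/1/0"      # default namespace of the file
GS_NS = "http://www.groundspeak.com/cache/1/0"    # namespace behind "groundspeak:"

root = ET.fromstring(SAMPLE)

# A bare "wpt" matches nothing, because <wpt> inherits the default namespace.
unqualified = list(root.iter("wpt"))

# With qualified tags, pair each waypoint code with its country.
pairs = []
for wpt in root.iter("{%s}wpt" % GPX_NS):
    code = wpt.findtext("{%s}name" % GPX_NS)          # direct child <name>
    country = wpt.findtext(".//{%s}country" % GS_NS)  # nested <groundspeak:country>
    pairs.append((code, country))

print(len(unqualified))
print(pairs)
```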

noisefloor · Tuesday, 24 April 2012, 09:19

Hello,

The LC looks good.

Otherwise I see it the same way: since all I'm after here is listing the text inside certain tags, the LC solution is already very neat and compact.

Regards, noisefloor