Get a list of XML attribute values in Python


I have to admit I'm a fan of xmltramp because it is so easy to use.

Accessing the XML above then becomes:

    import xmltramp

    values = xmltramp.parse('''...''')

    def getValues(values, category):
        # keep only the <parent> elements whose name attribute matches
        cat = [parent for parent in values['parent':] if parent('name') == category]
        # in a nested comprehension the outer loop (parents) must come first
        cat_values = [child('value') for parent in cat for child in parent['child':]]
        return cat_values

    getValues(values, "CategoryA")
    getValues(values, "CategoryB")
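If xmltramp is not available, the same two-step logic (filter the parent elements by their name attribute, then flatten the value attributes of their children) can be sketched with the standard library's xml.etree.ElementTree. This is a substitute, not xmltramp's API; SAMPLE (the XML from the question, inlined) and get_values are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the XML from the question, inlined so the example is self-contained
SAMPLE = """<elements>
  <parent name="CategoryA">
    <child value="a1"/><child value="a2"/><child value="a3"/>
  </parent>
  <parent name="CategoryB">
    <child value="b1"/><child value="b2"/><child value="b3"/>
  </parent>
</elements>"""

def get_values(root, category):
    # Step 1: keep only the direct <parent> children whose name matches
    cat = [p for p in root.findall('parent') if p.get('name') == category]
    # Step 2: flatten; the outer loop over parents is written first
    return [c.get('value') for p in cat for c in p.findall('child')]

root = ET.fromstring(SAMPLE)
print(get_values(root, "CategoryA"))  # ['a1', 'a2', 'a3']
print(get_values(root, "CategoryB"))  # ['b1', 'b2', 'b3']
```

An unknown category simply yields an empty list, because no parent survives the filter in step 1.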

I need to get a list of attribute values from child elements in Python.

It's easiest to explain with an example.

Given XML like this:

    <elements>
        <parent name="CategoryA">
            <child value="a1"/>
            <child value="a2"/>
            <child value="a3"/>
        </parent>
        <parent name="CategoryB">
            <child value="b1"/>
            <child value="b2"/>
            <child value="b3"/>
        </parent>
    </elements>

I want to be able to do something like:

    >>> getValues("CategoryA")
    ['a1', 'a2', 'a3']
    >>> getValues("CategoryB")
    ['b1', 'b2', 'b3']

It looks like a job for XPath, but I'm open to any recommendations. I'd also like to hear about your favorite Python XML libraries.

In Python 3.x, retrieving a list of attributes is simply a matter of using the items() member.

Using ElementTree, the snippet below shows one way to get the list of attributes. NOTE that this example does not take namespaces into account; if namespaces are present, they will need to be handled.

    import xml.etree.ElementTree as ET

    flName = 'test.xml'
    tree = ET.parse(flName)
    root = tree.getroot()

    # '<child-node-of-root>' is a placeholder for the tag you want to inspect
    for element in root.findall('<child-node-of-root>'):
        attrList = element.items()  # list of (name, value) pairs
        print(len(attrList), " : [", attrList, "]")

REFERENCE:

Element.items()
Returns the element attributes as a sequence of (name, value) pairs.
The attributes are returned in an arbitrary order.

Python documentation
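A short illustration of items() on a fragment of the question's XML (the SAMPLE string is an assumption, inlined for the example). items() covers one element's attributes; to collect one attribute's values across several children, reading it with get() is usually simpler.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of the question's XML
SAMPLE = '<parent name="CategoryA"><child value="a1"/><child value="a2"/></parent>'

root = ET.fromstring(SAMPLE)

# items() gives the (name, value) pairs of a single element's attributes
print(list(root.items()))  # [('name', 'CategoryA')]

# For one attribute across all children, get() reads it directly
values = [child.get('value') for child in root.findall('child')]
print(values)  # ['a1', 'a2']
```

Wrapping items() in list() keeps the output the same whether the C accelerator or the pure-Python ElementTree implementation is in use.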

My favorite Python XML library is lxml, which wraps libxml2.
XPath seems like the way to go here, so I'd write this as something like:

    from lxml import etree

    def getValues(xml, category):
        # path is relative to the root <elements>: every child of the matching <parent>
        return [x.attrib['value']
                for x in xml.findall('parent[@name="%s"]/*' % category)]

    xml = etree.parse('filename.xml')

    >>> print(getValues(xml, 'CategoryA'))
    ['a1', 'a2', 'a3']
    >>> print(getValues(xml, 'CategoryB'))
    ['b1', 'b2', 'b3']

I'm not very experienced with Python, but here is an XPath solution using libxml2.

    import libxml2

    DOC = """<elements>
        <parent name="CategoryA">
            <child value="a1"/>
            <child value="a2"/>
            <child value="a3"/>
        </parent>
        <parent name="CategoryB">
            <child value="b1"/>
            <child value="b2"/>
            <child value="b3"/>
        </parent>
    </elements>"""

    doc = libxml2.parseDoc(DOC)

    def getValues(cat):
        # the XPath selects the value attribute nodes directly; .content is their text
        return [attr.content for attr in
                doc.xpathEval("/elements/parent[@name='%s']/child/@value" % cat)]

    print(getValues("CategoryA"))

With the result:

    ['a1', 'a2', 'a3']
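The libxml2 answer evaluates a fresh XPath expression on every call. If you need the values for many categories, one alternative (a sketch using the standard library's xml.etree.ElementTree rather than libxml2; build_index and SAMPLE are hypothetical names) is to walk the document once and build a dictionary keyed by the name attribute.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical: the XML from the question, inlined
SAMPLE = """<elements>
  <parent name="CategoryA"><child value="a1"/><child value="a2"/><child value="a3"/></parent>
  <parent name="CategoryB"><child value="b1"/><child value="b2"/><child value="b3"/></parent>
</elements>"""

def build_index(root):
    # Map each parent's name attribute to the value attributes of its direct children
    index = defaultdict(list)
    for parent in root.iter('parent'):
        for child in parent.findall('child'):
            index[parent.get('name')].append(child.get('value'))
    return index

index = build_index(ET.fromstring(SAMPLE))
print(index['CategoryA'])  # ['a1', 'a2', 'a3']
```

Because index is a defaultdict, looking up an unknown category returns an empty list (and adds that key); if two parent elements share a name, their values are concatenated.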

You can do this with BeautifulSoup:

    >>> from BeautifulSoup import BeautifulStoneSoup
    >>> soup = BeautifulStoneSoup(xml)
    >>> def getValues(name):
    ...     return [child['value'] for child in soup.find('parent', attrs={'name': name}).findAll('child')]

If you're working with HTML/XML, I'd recommend taking a look at BeautifulSoup. It is similar to a DOM tree but offers more functionality.

Using a standard W3C DOM such as the stdlib's minidom or pxdom:

    def getValues(category):
        for parent in document.getElementsByTagName('parent'):
            if parent.getAttribute('name') == category:
                return [
                    el.getAttribute('value')
                    for el in parent.getElementsByTagName('child')
                ]
        raise ValueError('parent not found')
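The minidom answer assumes a module-level document object already exists. A self-contained run using the standard library's xml.dom.minidom.parseString, with the question's XML inlined as SAMPLE (a hypothetical name):

```python
from xml.dom.minidom import parseString

# Hypothetical: the XML from the question, inlined
SAMPLE = """<elements>
  <parent name="CategoryA"><child value="a1"/><child value="a2"/><child value="a3"/></parent>
  <parent name="CategoryB"><child value="b1"/><child value="b2"/><child value="b3"/></parent>
</elements>"""

document = parseString(SAMPLE)

def getValues(category):
    # Linear scan over every <parent>; return on the first name match
    for parent in document.getElementsByTagName('parent'):
        if parent.getAttribute('name') == category:
            return [el.getAttribute('value')
                    for el in parent.getElementsByTagName('child')]
    raise ValueError('parent not found')

print(getValues('CategoryA'))  # ['a1', 'a2', 'a3']
```

Note that minidom's getAttribute returns an empty string, not None, when the attribute is missing, so a child without a value attribute shows up as ''.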

ElementTree 1.3 (unfortunately not 1.2, which was the version bundled with Python at the time) supports a subset of XPath, so you can write:

    import elementtree.ElementTree as xml

    def getValues(tree, category):
        parent = tree.find(".//parent[@name='%s']" % category)
        return [child.get('value') for child in parent]

Then you can do:

    >>> tree = xml.parse('data.xml')
    >>> getValues(tree, 'CategoryA')
    ['a1', 'a2', 'a3']
    >>> getValues(tree, 'CategoryB')
    ['b1', 'b2', 'b3']

lxml.etree (which also provides the ElementTree interface) will work the same way.
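In Python 3 the standard library's xml.etree.ElementTree is ElementTree 1.3, so the [@name='...'] predicate works without the separate elementtree package. One thing worth guarding against: find() returns None when no parent matches, and iterating over None raises TypeError. A sketch that returns an empty list instead (the empty-list behavior is a design choice, not part of the original answer; SAMPLE is a hypothetical inlined copy of the question's XML):

```python
import xml.etree.ElementTree as ET

# Hypothetical: the XML from the question, inlined
SAMPLE = """<elements>
  <parent name="CategoryA"><child value="a1"/><child value="a2"/><child value="a3"/></parent>
  <parent name="CategoryB"><child value="b1"/><child value="b2"/><child value="b3"/></parent>
</elements>"""

def getValues(tree, category):
    # ElementPath predicate: first <parent> anywhere below the root with a matching name
    parent = tree.find(".//parent[@name='%s']" % category)
    if parent is None:  # no such category
        return []
    # Iterating an element yields its child elements
    return [child.get('value') for child in parent]

tree = ET.ElementTree(ET.fromstring(SAMPLE))
print(getValues(tree, 'CategoryA'))  # ['a1', 'a2', 'a3']
print(getValues(tree, 'Missing'))    # []
```

Because the category is interpolated into the path with %s, a name containing a single quote would break the expression; for untrusted input, filtering with get('name') in Python is the safer choice.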