Parsing XML, am I over complicating things?

sabreW4K3@lemmy.tf · 2 years ago

TootSweet@lemmy.world · edit-2 2 years ago

Nah, I’d say ElementTree is the way to go.

JSON is so nice and easy to work with that it’s spoiled us. XML came before JSON. XML is really a terrible, overengineered, Lovecraftian pit of madness on which JSON is a massive improvement for many applications. (YAML’s also pretty yucky, though still an improvement on XML for the applications where JSON isn’t appropriate. It’s not terribly difficult to parse, more just littered with gotchas.)

But, if you want to parse XML in Python, ElementTree is the best way to do it.
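
For anyone who wants to see what that looks like, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The `<library>` document and its tag names are made up for illustration; they are not from the original question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
xml_text = """
<library>
  <book id="b1"><title>Dune</title><year>1965</year></book>
  <book id="b2"><title>Neuromancer</title><year>1984</year></book>
</library>
"""

# strip() removes the leading newline of the triple-quoted string.
root = ET.fromstring(xml_text.strip())

# Walk every <book> element, reading an attribute and two child texts.
books = [
    {
        "id": book.get("id"),
        "title": book.findtext("title"),
        "year": int(book.findtext("year")),
    }
    for book in root.iter("book")
]
print(books)
```

Everything comes back as strings, so numeric fields such as `year` have to be converted by hand.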

sabreW4K3@lemmy.tf · 2 years ago

I think I can get a JSON response. Would I then be able to do json[element], or would I still need to parse left, right and centre through complication valley?

TootSweet@lemmy.world · 2 years ago

JSON would be a lot easier. Once you’ve parsed it, you’ve got a little structure of dicts, lists, and primitives. So you’d be able to directly index things like you’re hoping.

Just to give an example:

>>> import json
>>> parsed = json.loads('{"foo":"a", "bar":"b", "baz":"c"}')
>>> parsed["foo"]
'a'

So, in short, yes, JSON would do what you’re hoping.
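
The same indexing carries over to nested data. A small sketch, assuming a made-up response body (the keys `user`, `tags`, and `count` are hypothetical):

```python
import json

# Hypothetical response body, invented for this example.
raw = '{"user": {"name": "sabre", "tags": ["python", "xml"]}, "count": 2}'
data = json.loads(raw)

# Objects become dicts and arrays become lists, so lookups chain naturally.
name = data["user"]["name"]
first_tag = data["user"]["tags"][0]

# dict.get avoids a KeyError when a key might be absent.
missing = data.get("nope", "default")

print(name, first_tag, missing)
```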

sabreW4K3@lemmy.tf · 2 years ago

Thank you very much!

TootSweet@lemmy.world · 2 years ago

Glad to help!

Oscar@programming.dev · 2 years ago

Another package to check out is lxml. I personally don’t like it due to its typing but sometimes I have been forced to use it for its added features over the builtin etree.

rglullis · edit-2 2 years ago

Perhaps knowing just a bit of xpath would solve your problem?
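
As a rough illustration, ElementTree accepts a limited subset of XPath in `find` and `findall`. The document below is made up; lxml supports fuller XPath if this subset is not enough.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
xml_text = """
<library>
  <book id="b1"><title>Dune</title><year>1965</year></book>
  <book id="b2"><title>Neuromancer</title><year>1984</year></book>
</library>
"""
root = ET.fromstring(xml_text.strip())

# Select by attribute value: the <title> of the book whose id is "b2".
title_b2 = root.find(".//book[@id='b2']/title").text

# Select by child text: every book whose <year> child is exactly "1965".
titles_1965 = [b.findtext("title") for b in root.findall("./book[year='1965']")]

print(title_b2, titles_1965)
```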

sabreW4K3@lemmy.tf · 2 years ago

Thank you for your thoughtful suggestion. I ended up getting it done with the JSON parser. Everything should be as easy as JSON.

Martín@lemmy.world · 2 years ago

Actually, XPath is arguably more flexible than what you get with JSON. There’s also JSONPath, but I don’t think I’ve seen it meaningfully used.

sabreW4K3@lemmy.tf · 2 years ago

Do you mind explaining why?

Martín@lemmy.world · edit-2 2 years ago

In both XML and JSON you have lists and embedding hierarchies (I use this term to abstract away from dictionaries/maps, which are not exactly represented in XML). These allow for browsing/iterating and filtering when you are after a particular node.

One difference is that nodes in XML are named (tags). Another thing that you have in XML and not in JSON is attributes. A good example of their use is querying by tag name, node id or class attributes in HTML (which is a loose example of XML). To do the equivalent in JSON, you need to work with keys and values, which are less structured and (arguably as a consequence) often missing such metadata. HTML is a popular example, but pretty much any XML has ids and other meta tags and attributes. JSON standards typically don’t, and it’s a long separate topic whether this is due to the characteristics of the format itself.
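
To make the contrast concrete, here is a small sketch with made-up data. The JSON layout (a `page` list of objects with `id`, `class`, and `text` keys) is only one of many ways the same content could be encoded, which is part of the point: the XML query relies on attributes the format provides, while the JSON filter relies on a convention the author had to invent.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical data, invented for this example.
xml_text = (
    '<page>'
    '<div id="nav" class="menu">Home</div>'
    '<div id="main" class="content">Hello</div>'
    '</page>'
)
json_text = (
    '{"page": ['
    '{"id": "nav", "class": "menu", "text": "Home"},'
    '{"id": "main", "class": "content", "text": "Hello"}'
    ']}'
)

# XML: tag name and attribute are part of the format, so one path expression does it.
root = ET.fromstring(xml_text)
xml_hit = root.find(".//div[@id='main']").text

# JSON: no built-in notion of attributes, so filter on whatever keys the data uses.
data = json.loads(json_text)
json_hit = next(item["text"] for item in data["page"] if item.get("id") == "main")

print(xml_hit, json_hit)
```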

PS: another big difference is that XML also allows for comments, which lets you encode intent, not only content.
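
A quick sketch of that difference, assuming Python 3.8 or later (where `TreeBuilder` gained the `insert_comments` option) and a made-up `<config>` document. Note that ElementTree drops comments unless asked to keep them, and that the standard `json` module rejects comments outright.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical config, invented for this example.
xml_text = (
    "<config>"
    "<!-- retry limit chosen after load testing -->"
    "<retries>3</retries>"
    "</config>"
)

# By default ElementTree discards comments; this option (Python 3.8+) keeps them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(xml_text, parser=parser)

# Comment nodes are elements whose tag is the ET.Comment factory function.
comments = [node.text.strip() for node in root.iter() if node.tag is ET.Comment]
retries = int(root.findtext("retries"))

# The same idea in JSON: the standard parser refuses the comment.
json_with_comment = '{\n  // retry limit chosen after load testing\n  "retries": 3\n}'
try:
    json.loads(json_with_comment)
    json_accepts_comments = True
except json.JSONDecodeError:
    json_accepts_comments = False

print(comments, retries, json_accepts_comments)
```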

sabreW4K3@lemmy.tf · edit-2 2 years ago

It seems that XML is better suited for more complex data?

Sorry I took so long to reply, I couldn’t wrap my head around it.