Are there any data cleaning focused LLMs? [also, rant]

AnomalyNexus@alien.top · 1 year ago

LocoMod@alien.top · 1 year ago

Ideally we'd be in a timeline where LLMs could do this better than classical methods, but we're not there yet. You can code a handler that cleans up retrieved HTML quite trivially, since you're just looking for the text in specific tags like articles, headers, paragraphs, etc. There are a ton of frameworks and examples out there on how to do this, and a proper handler would execute the cleanup in a fraction of the time that even the most powerful LLM could ever hope to.
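
To make that concrete, here is a minimal sketch of such a handler using only Python's standard-library `html.parser`. The names `KEEP_TAGS`, `SKIP_TAGS`, `TextExtractor`, and `clean_html` are made up for illustration, and the tag lists are an assumption about what counts as content on a given site, so adjust them per source. It also assumes reasonably well-formed markup (unclosed `<p>` tags will let extra text through); a real framework handles messy HTML better.

```python
from html.parser import HTMLParser

# Hypothetical tag lists: tune these for the pages you actually scrape.
KEEP_TAGS = {"article", "h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}
SKIP_TAGS = {"script", "style", "nav", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects text found inside KEEP_TAGS, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.keep_depth = 0   # how many KEEP_TAGS we are currently nested in
        self.skip_depth = 0   # how many SKIP_TAGS we are currently nested in
        self.blocks = []      # finished text blocks, one per element
        self._buf = []        # text fragments of the block being built

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in KEEP_TAGS:
            self._flush()     # a new block starts; close the previous one
            self.keep_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP_TAGS:
            self._flush()
            self.keep_depth = max(0, self.keep_depth - 1)

    def handle_data(self, data):
        # Keep text only when inside a wanted tag and outside every skipped one.
        if self.keep_depth > 0 and self.skip_depth == 0:
            self._buf.append(data)

    def _flush(self):
        # Join fragments and collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []


def clean_html(html):
    """Return the text of the wanted elements, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n".join(parser.blocks)
```

Usage is just `clean_html(page_source)` on whatever your fetcher returns. It is a single pass over the document with no model calls, which is why this kind of handler finishes in a fraction of the time an LLM would need.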