Hi everyone, I thought I'd write a post here to bounce some ideas around.

I have multiple clauses, and I want to extract the object of interest from each clause. For example, this is a clause: “Exit staircases shall be constructed of non-combustible materials to comply with the provisions of Cl.3.10.1.” Obviously, the object of interest here is ‘exit staircases’ or ‘staircases’, so I want that to be extracted. Here is another clause: “No structure or building shall be constructed within a sewer.” Now, there are multiple objects in that clause (i.e. structure, building, and sewer), but it is also obvious that the object of interest is ‘sewer’.

I ran this through GPT-3.5, and it works: the model returns the object of interest pretty accurately. However, is it possible to generate the responses in bulk for my huge list of clauses, instead of entering a prompt by hand for each one? In other words, given a list of clauses, how can I use an LLM to get back the object of interest of each clause without prompting it manually?
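To make the question concrete, here is a minimal sketch of what I imagine the batch step looks like: a plain loop that sends one prompt per clause. The `complete` argument is a hypothetical placeholder for whatever function actually calls the LLM (it takes a prompt string and returns the reply text), and the prompt wording is my own assumption, not a tested prompt.

```python
from typing import Callable, Dict, List

# Assumed prompt wording; adjust to whatever works in manual testing.
PROMPT_TEMPLATE = (
    "Extract the main object of interest from the clause below. "
    "Reply with the object only.\n\nClause: {clause}"
)


def extract_objects(
    clauses: List[str],
    complete: Callable[[str], str],  # hypothetical: prompt in, reply text out
) -> Dict[str, str]:
    """Return a mapping from each clause to the object the model reports."""
    results: Dict[str, str] = {}
    for clause in clauses:
        prompt = PROMPT_TEMPLATE.format(clause=clause.strip())
        answer = complete(prompt)
        # Light cleanup: trim whitespace and a trailing full stop, lower-case.
        results[clause] = answer.strip().rstrip(".").lower()
    return results
```

Passing the LLM call in as a function keeps the loop independent of any particular provider, so it can be tried out with a stub before spending API calls.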

Also, is this the correct/ideal way to extract the object of interest from a huge list of clauses? My final goal is to cluster similar objects of interest and see which clauses each one is linked to (via some kind of RAG approach), so that I can build some kind of knowledge graph from that. Do you think my method is the right approach?
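As a rough illustration of the linking step, the sketch below groups clauses under their extracted object. It only merges objects that match exactly after lower-casing and whitespace cleanup, which is a crude stand-in for real clustering (for example over embeddings); the function names are made up for this example.

```python
from collections import defaultdict
from typing import Dict, List


def normalize(obj: str) -> str:
    """Lower-case and collapse runs of whitespace."""
    return " ".join(obj.lower().split())


def group_clauses_by_object(extracted: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a clause -> object mapping into object -> list of clauses."""
    index: Dict[str, List[str]] = defaultdict(list)
    for clause, obj in extracted.items():
        index[normalize(obj)].append(clause)
    return dict(index)
```

Each key of the result would be a node in the graph, with edges to the clauses listed under it.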

Thanks!

  • fvillena@alien.topB
    1 year ago

    We are developing a library for this exact use case. https://github.com/plncmm/llmner

    Your task is called Named Entity Recognition, and llmNER is a library that uses the LLM of your choice to extract entities from texts given a natural-language description.

    You can give the model a list of strings to extract entities from, and the model will return a list of annotated strings from which you can extract the information you need.

    Be aware that our library is under active development right now.