VC firms are pioneering a new investment strategy: acquiring established businesses and optimizing them with AI to boost efficiency and customer reach.
The idea of AI accounting is so fucking funny to me. The problem is right in the name. They account for stuff. Accountants account for where stuff came from and where stuff went.
Machine learning algorithms are black boxes that can’t show their work. They can absolutely do things like detect fraud and waste by detecting abnormalities in the data, but they absolutely can’t do things like prove an absence of fraud and waste.
LLMs often use bizarre “reasoning” to come up with their responses. And if asked to explain those responses, they then use equally bizarre “reasoning.” That’s because the explanation is just another post-hoc response.
Unless explainability is built in, it is impossible to validate an LLM.
For usage like that you’d wire an LLM into a tool use workflow with whatever accounting software you have. The LLM would make queries to the rigid, non-hallucinating accounting system.
I still don’t think it would be anywhere close to a good idea, because you’d need a lot of safeguards, and if they fail you’ll fuck up your accounting and have some unpleasant meetings with the local equivalent of the IRS.
The LLM would make queries to the rigid, non-hallucinating accounting system.
And then it sometimes adds a hallucination before returning an answer - particularly when it encounters anything it wasn’t trained on, like important moments when business leaders should be taking a closer look.
There’s not enough popcorn in the world for the shitshow that is coming.
You’re misunderstanding tool use: the LLM only requests that something be done, then the actual system returns the result. You can also have it summarize the result, but hallucinations in that workload are remarkably low (though without tuning it can drop important information from the response).
The place where it can hallucinate is generating steps for your natural language query, or the entry stage. That’s why you need to safeguard like your ass depends on it. (Which it does, if your boss is stupid enough)
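To make the tool-use split concrete, here is a minimal sketch of it. Everything in it is made up for illustration: the `LEDGER` rows, the allowlists, `LedgerQuery`, and `fake_llm_plan` (a stand-in for whatever model call you would actually make). The point is only that the model proposes a structured query, a safeguard checks it at the entry stage, and the deterministic system produces the number.

```python
from dataclasses import dataclass

# Hypothetical ledger standing in for the accounting system of record.
LEDGER = [
    {"account": "revenue", "period": "2025-Q1", "amount": 1200.0},
    {"account": "revenue", "period": "2025-Q2", "amount": 1500.0},
    {"account": "expenses", "period": "2025-Q1", "amount": 800.0},
]

# Assumed allowlists; a real system would derive these from its chart of accounts.
ALLOWED_ACCOUNTS = {"revenue", "expenses"}
ALLOWED_PERIODS = {"2025-Q1", "2025-Q2"}


@dataclass(frozen=True)
class LedgerQuery:
    account: str
    period: str


def validate(query: LedgerQuery) -> None:
    """Safeguard at the entry stage: reject anything outside the allowlists."""
    if query.account not in ALLOWED_ACCOUNTS:
        raise ValueError(f"unknown account: {query.account!r}")
    if query.period not in ALLOWED_PERIODS:
        raise ValueError(f"unknown period: {query.period!r}")


def run_query(query: LedgerQuery) -> float:
    """Deterministic execution: the ledger, not the model, produces the number."""
    validate(query)
    return sum(
        row["amount"]
        for row in LEDGER
        if row["account"] == query.account and row["period"] == query.period
    )


def fake_llm_plan(question: str) -> LedgerQuery:
    """Stand-in for the model call; a real LLM could return any query here."""
    return LedgerQuery(account="revenue", period="2025-Q1")


total = run_query(fake_llm_plan("What was revenue in Q1 2025?"))
```

Note what the safeguard does and doesn’t do: it stops queries against accounts or periods that don’t exist, but it can’t tell whether a well-formed query matches what the person actually asked.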
I’m quite aware that it’s less likely to technically hallucinate in these cases. But focusing on that technicality doesn’t serve users well.
These (interesting and useful) use cases do not address the core issue that the query was written by the LLM, without expert oversight, which still leads to situations that are effectively hallucinations.
Technically, it is returning a “correct” direct answer to a question that no rational actor would ever have asked.
But when a hallucinated (correct-looking but deeply flawed) query is sent to the system of record, it’s most honest to still call the results a hallucination as well, even though they are technically real data, just astonishingly poorly chosen real data.
The meaningless, correct-looking and wrong result for the end user is still just going to be called a hallucination by common folks.
For common usage, it’s important not to promise end users that these scenarios are free of hallucination.
You and I understand that technically, they’re not getting back a hallucination, just an answer to a bad question.
But for the end user to understand how to use the tool safely, they still need to know that a meaningless, correct-looking, and wrong answer is still possible (and today, still also likely).
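A toy illustration of the “answer to a bad question” case, with invented numbers and a hypothetical `flawed_planner` standing in for an LLM that picks the wrong quarter. The system of record returns real data, and the result is still wrong for the person who asked. Returning the executed query next to the answer is just one assumed way to let the user notice the mismatch.

```python
# Hypothetical system of record: (account, period) -> amount.
LEDGER = {
    ("revenue", "2025-Q1"): 1200.0,
    ("revenue", "2025-Q2"): 1500.0,
}


def execute(account: str, period: str) -> float:
    """Deterministic lookup; this part never hallucinates."""
    return LEDGER[(account, period)]


def flawed_planner(question: str) -> tuple[str, str]:
    """Stand-in for an LLM that writes a valid but wrong query (wrong quarter)."""
    return ("revenue", "2025-Q1")


question = "What was revenue in Q2 2025?"
account, period = flawed_planner(question)
answer = execute(account, period)

# Real data, wrong question: the user asked about Q2 and got the Q1 figure.
report = {
    "question": question,
    "executed_query": {"account": account, "period": period},
    "answer": answer,
}
```

Nothing in `execute` failed here, which is exactly why the end user can’t be told the setup is hallucination-free.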
ERP systems already do that, just not using AI.
But ERP is not a cool buzzword, hence it can fuck off; we’re in 2025.