We all know there’s a lot of hype and skepticism around AI, and over the last year or so I’ve been hearing a lot about “Agentic” AI. I’ve struggled to get a real grasp on what that means without working examples; however, I’ve begun to see hints of something: videos mocking coders who scroll their phones while waiting for the AI to complete a task, peers claiming Claude but not GPT can do complex reasoning and planning. Not much, but enough for me to stop dismissing the term as pure buzzword.
Agentic AI is defined as “autonomous systems that act independently to achieve complex, multi-step goals without continuous human oversight.” This sounds fanciful, but my basic understanding is that these agentic systems do the large-scale reasoning and then use other apps to achieve smaller sub-goals. Essentially, they let you set up pipelines as verbal lists of tasks, then work through those tasks with some, perhaps limited, problem solving. A crucial aspect seems to be that the more tools you give the bot, the more it can do and the more failures it can handle. Sometimes “more tools” means a textbook or document about your work to help it reason and plan. Sometimes it means writing a script for it to use in future analyses.
Now, while these sound mildly interesting, they’re essentially useless if they’re locked behind a paywall. I’m not paying some company to think poorly for me. Someone else’s tools are not an extension of my skills or personal power, since I’d be neither able nor willing to build on them. However, the notion of Local Agentic AI changes this. If it’s on my computer, I can build on it even if I don’t fully understand what it’s doing. I can control it and treat it as an extension of myself – as humans do with all tools.
I’m a modest coder, and even basic AI has expanded my abilities there, just by helping me find algorithms I wouldn’t have known how to look for before. I have run local LLMs, but I’ve not tried these agentic LLMs. I worry I was unimpressed too quickly and gave up on a potentially useful tool. If I can tell the local agent to make a rough version of a function that does XXXX, then I can get more done. If I can tell it to write a simple script that makes a table I’d normally build by hand, check the script myself, then link that script to a command for a task I wouldn’t normally trust the AI with, then the AI can do a larger chunk of my work. The more scripts I make, the more the AI can do. The more scripts I download from open source communities, the more the AI can do. I don’t have to trust the AI if all it’s doing is moving information around and triggering scripts; I just have to check the scripts. If we start adding in robotics… yeah, I can see the hype.
Of course, the counter-argument is that we’ve had IFTTT triggers and pipelines for decades. So maybe this isn’t fundamentally new, but is it still an impetus to download more tools and build more pipelines? Will I fall behind if I don’t figure out how to use this efficiently and effectively (FOMO)? Does anyone here have experience with Agentic LLMs (especially local)? Also, what’s the best Lemmy community for learning more about this sort of thing and maybe also hooking it up to basic robots?


My limited understanding of “agentic AI” is that it’s an LLM prompted repeatedly by a script in a loop. You give it an initial prompt, and the LLM spits out a response indicating what it thinks it should do next (like run a program). The script looks at the LLM’s output, does what is asked if it can (or tells the LLM that it can’t/won’t do that, if so programmed), then feeds the result back in as the next prompt. This repeats until the LLM indicates in one of its responses that the complex task is done – or indefinitely, if the prompt is the sort that will never terminate.
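That loop can be sketched in a few lines. This is just an illustration, not any particular framework: `call_llm` is a stand-in for a real local model call, and the `RUN:`/`DONE:` convention is something I made up for the example.

```python
import subprocess

def call_llm(prompt: str) -> str:
    """Stand-in for a real local model call (e.g. an HTTP request to a
    local server). Stubbed here so the loop can run end to end."""
    if "2 + 2" in prompt:
        return "RUN: echo 4"
    return "DONE: the answer is 4"

def agent_loop(task: str, max_steps: int = 10) -> str:
    """Feed the task to the model, execute what it asks for, and loop
    the results back in until it declares the task done."""
    prompt = task
    for _ in range(max_steps):
        reply = call_llm(prompt).strip()
        if reply.startswith("DONE:"):
            return reply[len("DONE:"):].strip()
        if reply.startswith("RUN:"):
            cmd = reply[len("RUN:"):].strip()
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            prompt = f"Command output:\n{result.stdout}"
        else:
            # The script can refuse and tell the model how to behave.
            prompt = "I can't do that. Reply with RUN: <cmd> or DONE: <answer>."
    return "step limit reached"

print(agent_loop("What is 2 + 2?"))  # -> the answer is 4
```

The real versions mostly differ in how much plumbing surrounds this loop (conversation history, tool schemas, error handling), not in the basic shape.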
You can limit what the overall system is capable of by careful design of the control script – just using LLMs to bridge the gaps for tasks that are hard/impossible to program traditionally. For controlling a robot, I’d guess that you’d want to have actions for things like taking a picture with a camera (or reading other sensors) that the LLM can analyze, repositioning arms, controlling motors, etc?
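The “careful design of the control script” part could be as simple as a dispatch table of allowed actions; anything the model asks for outside that table is refused. The action names here are made up for illustration:

```python
def take_picture() -> str:
    return "image captured (stub)"

def move_arm(position: str) -> str:
    return f"arm moved to {position} (stub)"

# Only actions in this table are ever executed, regardless of what the
# model's output asks for.
ALLOWED_ACTIONS = {
    "take_picture": lambda arg: take_picture(),
    "move_arm": lambda arg: move_arm(arg),
}

def dispatch(request: str) -> str:
    """Parse a model request like 'move_arm: left' and run it only if
    it names an allowed action."""
    name, _, arg = request.partition(":")
    action = ALLOWED_ACTIONS.get(name.strip())
    if action is None:
        return f"refused: '{name.strip()}' is not an allowed action"
    return action(arg.strip())

print(dispatch("move_arm: left"))  # arm moved to left (stub)
print(dispatch("rm -rf /"))        # refused: ...
```

The LLM never touches the hardware directly; it only emits text, and the script decides what that text is allowed to cause.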
If I knew how to set up a robotic arm that could safely turn the pages of a traditional book, then combining it with a camera or flatbed scanner, a couple different vision models + traditional OCR (to sanity check each other), and wiring the whole thing up with a custom script to make really high-quality ebooks out of the books I own that no one has bothered to digitize would be neat… Maybe some day.
You might get more informed responses on db0 or another site; most of Lemmy is straight up AI-hostile. I have mixed feelings on it personally – I really don’t like the AI companies DDOSing the internet for profit while driving up the cost of electronics (etc), but writing off local AI tech entirely also seems like throwing out the baby with the bathwater to me…
Yeah, I was thinking about the code too. I think the looping output explanation makes a lot of sense and puts the “Agentic AI” into a healthier/more-realistic framework.
I’m a lot more inclined to write my own loops than trust someone else’s AI, but with that framework I’m not sure how useful these “AI agents” will be for most non-text-based problems, since that would require converting back and forth between text-based mediums and whatever medium the problem is in, which seems very problematic. For code, I could try giving prompts to catch typos, make tests, and improve functions. Even this seems pretty limited, since usually the AI can’t see the larger picture, identify the problem, and plan a solution on its own. Or maybe it can in some contexts, but not the stuff I’m working on – maybe my work isn’t routine enough, Idk. I have been using it to find and learn algorithms and get numpy notation, but it just doesn’t grasp what math needs to be done when I try to explain my problems.
I’ll have to think more on how to set up loops that are more generally useful and won’t require more work in making sure they’re doing what I want than the work they actually get done.
I don’t know enough about the specifics of what you’re doing to really give detailed suggestions, but I have been surprised by models like qwen3.5/3.6 giving reasonable results to questions about images – like answering questions based on comparing a few images. For example:
Those prompts actually worked for imagery from camera systems I have to deal with for my job – which, frankly, shocked me. The AM/PM one in particular read timestamps in the image (i.e. did OCR automatically) and converted from a 24 hour clock to answer the question. I’ve implemented that in scripts with older computer vision tools (like tesseract) that needed a lot of hand holding; the fact that current models can just do it is sort of mind blowing to me…
I think it’s very much a “how much data on this exists” sort of problem for most of these. Like, I can pick out bacteria from fungi on an agar plate trivially, but I don’t know if there are databases of agar plates characterizing different growths with different background colors and all the diversity that real life has. Honestly, I haven’t tried this yet. It might get it just fine, or might get it if I backlight the plate – of course, at that stage there are other programs for detecting colonies.
The dream, for me, is to get it to understand the protein structure files and DNA sequence files then hook it up to some lab robotics and automate experiments that are mostly trivial but slightly dynamic. Maybe start with something simple like cloning then build out to other methods. Some of this stuff exists already but companies charge you a fortune and go out of business (or get bought up and discontinued) constantly, so it kinda needs to be stuff I can build and maintain myself – or FOSS.
Even for purely computer stuff, any time I try to get the AI to help with my proteins, it’s functionally useless because it doesn’t have a way to “see” the protein’s structure file. I can write my own scripts to help with that, but I’ll have to work on the connection between the language the AI thinks in and the actual things my code detects. Or maybe I can tell it to ask questions based on the writing, then run code that analyzes the protein to answer those specific questions… Even then, much of what I’d want help with in looking at proteins is writing analyses of points in 3D space, and while it has helped me pick the right algorithms (sometimes), I haven’t really been able to give it enough information to let it check that things are being implemented correctly (I think this is alignment). That might mean something like hooking it up to pymol (a 3D viewer), or it might just be a bit too dumb. It’s hard to say without trying it, and it’s a lot of work for something that’s likely to get confused even with the ability to “look” at the protein structure.
I feel like, for coding, one thing I’m going to have to get it to do is stop after it makes a function or something, so I can check that it’s still going where it’s supposed to go or tell it what the next function needs to do. I don’t know. Maybe I’ll start with lots of hand-holding, then slowly build it up until it can reliably do more – or until I can’t get it to be reliable enough. Maybe there’s a coding community on Lemmy that’s a decent place to talk shop on how to build these scripts up and what local models are good at what?
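That stop-and-check idea is easy to express as a loop with a review step after each generated piece. This is only a sketch of the control flow; `generate` and `approve` are placeholders for a model call and a human (or automated) check:

```python
def checkpointed_build(tasks, generate, approve):
    """Ask for one function at a time and pause for review after each;
    stop the run as soon as a draft is rejected so a human can redirect it."""
    accepted = []
    for task in tasks:
        draft = generate(task)   # stand-in for a model call
        if not approve(draft):   # stand-in for 'I read it and said no'
            break
        accepted.append(draft)
    return accepted

# Stubs standing in for the model and the reviewer:
drafts = checkpointed_build(
    ["parse input", "write output", "delete everything"],
    generate=lambda t: f"def step():  # {t}",
    approve=lambda d: "delete" not in d,
)
print(drafts)  # the third draft is rejected, so only the first two are kept
```

The hand-holding level then becomes a dial: `approve` can start as a manual prompt for every draft and gradually become automated tests for the parts that have earned trust.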
I tried throwing this image at qwen3.6 locally with the prompt “What is in this image?” and it spat out:
So you might be able to get further than you expect. You might try collecting some of your own sample data for things you understand (like particular bacteria you’re interested in) and providing those images along with a prompt that includes labels for your sample imagery. (Like “Image 1 is an example of (…whatever…). Image 2 is an example of (…something else…). Which (if any) of these are in Image 3?”)
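Assembling that kind of labeled few-shot prompt is easy to script so it stays consistent across samples. How the images themselves get attached depends on the model runner you use; this sketch only builds the text part, and the file names are made up:

```python
def build_comparison_prompt(labeled: dict, query_image: str) -> str:
    """Build a few-shot image-comparison prompt from labeled example
    images plus one unlabeled sample to classify."""
    lines = []
    for i, (path, label) in enumerate(labeled.items(), start=1):
        lines.append(f"Image {i} ({path}) is an example of {label}.")
    lines.append(f"Image {len(labeled) + 1} ({query_image}) is the sample to classify.")
    lines.append("Which (if any) of the labeled examples does the sample match?")
    return "\n".join(lines)

print(build_comparison_prompt(
    {"ecoli.jpg": "E. coli colonies", "mold.jpg": "fungal growth"},
    "plate_today.jpg",
))
```

Keeping the prompt generation in a script also means the same wording gets reused every run, which makes it easier to tell whether a failure was the model or the prompt.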
I don’t know how well it would work, but worth a try.
I tried throwing this image at qwen3.6 as well, and it spits this out (but takes a REALLY long time to do it on my local hardware):
You might be able to get it to do something useful with analysis of protein renderings, but I think you’ll need some creativity with the prompts to stop it from overthinking things… It was able to recognize this as myoglobin in the “thinking” stage quickly enough, but then kept second-guessing itself about various aspects for a long time. I also tried throwing it at gemma4, but it didn’t really give me much besides recognizing it as some sort of protein structure.
You might find this project and associated HN post interesting – it’s basically someone doing that with electronics analysis instead of biology, IIUC. No idea how well it works, but interesting to look at and think about.
There’s !localllama@sh.itjust.works and probably some communities on lemmy.dbzer0.com as well – they’re not super-active though (at least from what’s made it to my attention browsing by all).
Yeah, I think this conversation is entering just-try-to-build-it territory, but it’s definitely worth a try.
Thanks. I’ll have to get through my current project, but this will be a good resource when I do more on LLMs.