
James Kachamila

The Potential Future of Large Action Models (LAMs) Looks Insane: A Quick Glimpse Through Tony Stark and Nelima


Believe it or not, my first observation of a Large Action Model (LAM) in action was back in 2008. I can already hear you asking, “How is that possible, though?” GPUs and TPUs with the necessary computational throughput and parallel processing capabilities were non-existent back then. Data ingestion pipelines and distributed processing frameworks were in their infancy. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) were not yet optimized for large-scale application. Optimization techniques like stochastic gradient descent (SGD) with per-parameter learning rates were just not there in 2008. Heck, you couldn’t even order food right to your doorstep back then. So, how? Well, the fascinating aspect of the human brain is its inclination to envision the future, and to explore that idea further, we’ll need to take a brief detour to Hollywood. More specifically, into the world of our technologically enhanced, cybernetically driven, exoskeleton-clad Marvel billionaire genius: Iron Man.


There is a scene that has been stuck in my head for the longest time. I remember watching it on Blu-ray from my couch. It is actually the first scene where we meet J.A.R.V.I.S. (Just A Rather Very Intelligent System) in the movie. Tony Stark asks J.A.R.V.I.S. if he’s online, and it responds amusedly with “For you, sir, always.” Tony then asks J.A.R.V.I.S. to create a new project file for the Mark II, and J.A.R.V.I.S. asks whether he would like to store the file on the Stark Industries Central Database. Back in 2008, this blew my mind, and I had tons of questions. How does J.A.R.V.I.S. understand language? What kind of algorithms does J.A.R.V.I.S. use to make decisions? Is J.A.R.V.I.S.’s personality programmed, or does it evolve? How does J.A.R.V.I.S. manage to control and integrate with so many different systems? Could I one day have a J.A.R.V.I.S. as well? What does Tony Stark’s data usage billing statement show? (Okay, maybe not that last one.)

Well, after one of the greatest economic runs in history, the introduction of the transformer architecture, and a zero-interest-rate environment, here we are, 16 years later. I believe very few people would have imagined that the Iron Man scene could be explained in technical terms and be closer to reality than ever before. Think about it: J.A.R.V.I.S. is a Large Language Model (LLM) that is also a Large Action Model (LAM)!

Tony Stark’s AI demonstrates the ability to understand and respond to natural language queries. J.A.R.V.I.S.’s ability to comprehend Tony Stark’s request and respond appropriately showcases this aspect of language understanding and generation. As for the LAM part, J.A.R.V.I.S. can perform tasks such as creating project files, managing databases, and interfacing with various systems and technologies like the robot arm. Basically, Tony Stark built an LLM that’s able to execute tasks through natural language commands, and he created an ecosystem of IoT devices that can receive requests and perform real-world and digital actions. It’s a glimpse of what we might expect from a combination of LLMs and action-oriented AI systems. There is more to say on whether Tony achieved AGI, but I guess that’s a question for when we finally have an arc reactor.

If you haven’t done so yet, I highly suggest checking out Figure AI’s video demo of their robot using OpenAI’s model to perform mechanized tasks through speech-to-speech reasoning.


Let’s take a step back and quickly explain what an LAM actually is. Large Action Models (LAMs) are models trained to understand what humans want and work out what moves to make next. Unlike LLMs, which figure out what to say next, LAMs are all about figuring out what to do next. They use an LLM’s reasoning capacity and have the ability to execute tasks by themselves; we call these software units “agents”. So instead of just answering natural language queries, they can achieve a specific goal. They break big tasks down into smaller, executable steps, like planning a trip or sorting out taxes. There are a few notable companies trying to unlock their full potential, such as Rabbit (with the r1), SuperAGI, and even OpenAI.
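
To make that concrete, here is a minimal sketch of the loop that separates an LAM-style agent from a plain LLM: a model proposes the next action, the system executes it, and the outcome feeds back into the next decision. Everything below is hypothetical and the model call is stubbed out with a canned plan; no vendor’s actual API is shown.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str        # which tool to invoke, e.g. "search_flights"
    argument: str    # a single argument, kept simple for this sketch

def propose_next_action(goal: str, history: list[str]) -> Action | None:
    """Stand-in for an LLM deciding what to *do* next (not what to *say*).

    A real model would condition on the goal and on the outcomes so far;
    here we just walk through a fixed plan to keep the sketch runnable.
    """
    plan = [
        Action("search_flights", "NYC to LIS"),
        Action("book_hotel", "Lisbon, 3 nights"),
        Action("add_to_calendar", "Trip to Lisbon"),
    ]
    return plan[len(history)] if len(history) < len(plan) else None

def execute(action: Action) -> str:
    """Stand-in for real-world execution (API calls, clicks, IoT commands)."""
    return f"done: {action.name}({action.argument})"

def run_agent(goal: str) -> list[str]:
    history: list[str] = []
    while (action := propose_next_action(goal, history)) is not None:
        history.append(execute(action))   # feed each outcome back into planning
    return history

print(run_agent("Plan my trip to Lisbon"))
```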

So, what’s the catch? Why haven’t we seen LAMs all over the place by now? As it usually goes, there are some challenges when it comes to designing and implementing these new systems.

· Generalization: There are countless “things” we could have LAMs do across the internet and our own devices. Seriously, think about all the tasks and individual clicks you did on your computer today; with the right integration, a LAM could theoretically do them for you (more on this later). And that’s where the challenge lies: designing LAMs that can generalize across different tasks and environments. Often, a specialized formal specification is needed for each specific problem, and solutions for one may not translate well to others, which makes it hard to develop these “actions”.

· Integration: LAMs interact with the real world through integration with external systems. This requires complex programming and the ability to adapt to a wide variety of interfaces and protocols. If you want the LAM to do something as simple as sending an email, you need a way for the system and Google to communicate with each other and manage authentication, for example (see the sketch after this list). Now multiply this by the number of external systems out there and you’ve got your work cut out for you.

· Decision-Making: In an era where ‘accelerate’ is a term embraced by those who view technology as an unstoppable force destined for exponential growth, there are inherent risks in relying solely on AI. LAMs are expected to make decisions autonomously. Ensuring those decisions are accurate, ethical, and aligned with human intentions is a significant challenge, given that language, the foundation of all LLMs, is imprecise, and there are bad actors at every turn.
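
To illustrate the integration point: every external service speaks a different protocol and auth scheme, so a LAM needs some common contract that hides those differences from the planner. The sketch below is invented for illustration (the class names, the OAuth handling, and the email action are all assumptions); no real Gmail or Google API is being called.

```python
from abc import ABC, abstractmethod

class LamAction(ABC):
    """One uniform contract the planner can target, whatever sits behind it."""

    @abstractmethod
    def authenticate(self) -> bool: ...

    @abstractmethod
    def run(self, **params) -> str: ...

class SendEmailAction(LamAction):
    def __init__(self, oauth_token: str):
        self.oauth_token = oauth_token      # e.g. an OAuth2 token for the provider

    def authenticate(self) -> bool:
        # Real code would validate or refresh the token against the provider.
        return bool(self.oauth_token)

    def run(self, to: str = "", subject: str = "", body: str = "") -> str:
        if not self.authenticate():
            raise PermissionError("email provider rejected credentials")
        # Real code would hit the provider's REST API here.
        return f"sent '{subject}' to {to}"

# The planner only ever sees the LamAction interface:
action = SendEmailAction(oauth_token="fake-token-for-demo")
print(action.run(to="friend@example.com", subject="Hi", body="Hello!"))
```

The payoff of an interface like this is that adding the hundredth integration looks the same as adding the first; the hard part, as the list above says, is writing and maintaining each adapter behind it.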


A retrospective is always good. It gives us context: what happened in the past and what is being done in the present. However, what does the future look like? My friend and I decided to take on a challenge in this ever-changing AI landscape. What if there was a way to solve all the issues above? What if an LAM could be specialized enough for those who need it and general enough to reach the masses? What if individual actions resulted in a better collective outcome? The answer to this challenge is named Nelima: a free, community-driven LAM platform designed to take actions on a user’s behalf from natural language prompts and, theoretically, execute any action. Stay with me here, because I did write “insane” in the title.


So what can you do with it? Well, you can schedule appointments, send emails, check the weather; it can even connect to IoT devices and let you command them, publish a website, or call an Uber for you! You can integrate your own custom actions to suit your specific needs, and you can layer multiple actions to perform more complex tasks. The awesome thing is that each time someone builds a function for a certain task, Nelima gains that ability and everyone can use it from then on; that’s the community-driven aspect of the LAM. The agent also gets to see the outcome of each micro-action undertaken toward the overarching goal, so genuine reasoning is possible, and it doesn’t shy away from taking measures to influence the direction of execution. Say it sees your grocery list while fulfilling a sub-action and, from its own knowledge, realizes a certain item contains allergens that might harm the user; it adds a warning even though the user never asked for one.
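
Here is a purely illustrative sketch of that community-driven idea: once anyone registers an action, every user (and every composite action) can reuse it. The registry, the decorator, and the action names below are all hypothetical; Nelima’s real interface may look nothing like this.

```python
from typing import Callable

REGISTRY: dict[str, Callable] = {}

def community_action(name: str):
    """Register a function so the whole platform can call it by name."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@community_action("get_weather")
def get_weather(city: str) -> str:
    return f"Sunny in {city}"            # stub; a real action would call an API

@community_action("send_email")
def send_email(to: str, body: str) -> str:
    return f"emailed {to}: {body}"       # stub

@community_action("weather_report")      # a layered action built from the others
def weather_report(city: str, to: str) -> str:
    forecast = REGISTRY["get_weather"](city)
    return REGISTRY["send_email"](to, f"Today's forecast: {forecast}")

print(REGISTRY["weather_report"]("Lisbon", "friend@example.com"))
```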

None of us is Tony Stark, but collectively we might just be without even knowing it. Throughout its evolution, AI has always been about the synthesis of disparate elements, harmonizing them to construct innovative solutions and advancements. Our hypothesis is that this might be the best way to achieve a truly competent LAM that can do anything, and we’re going to keep testing it. Buckle up: the future is going to look very different.

---

On a side note, if you liked this article and are curious about Nelima, we’re building up this community and welcoming developers and enthusiasts alike to come join and build things. Whether for your own personal benefit or as a contribution to the community, we’re looking for excited people to be part of this project; you can find us on Discord! If you’d like to try out the LAM yourself, head over to sellagen.com/nelima and give us feedback!
