This is a fun end-of-year piece inspired by "A Christmas Carol" where I explore where AI has been, where it is now, and what I believe is up ahead.
This piece is part of Festive Tech Calendar 2024, hence the callbacks to A Christmas Carol.
The Ghost of AI Past
I'm around the point where I'm considered older for a developer, and yet artificial intelligence predates me by decades, going back to the 1950s.
Over time, we've seen a number of cycles where people assume that the world's logic can be codified into computer programs to produce automated decision-making that rivals the intelligence of a trained human expert. This has come about through things like decision trees, expert systems, and even neural networks.
On its surface, the reasoning is clear: if you can boil down the processes of a human expert into a series of key decisions, computers should be able to flawlessly execute those decisions. Or, at a more biological level, if you can take the structure of a brain and emulate it with a neural network, it should be able to emulate the learning process exhibited by organic life.
While these systems saw some success, they also saw severe limitations. It turns out that actual decision-making is less cut and dried in many cases and relies on the trained experience and intuition of an expert - particularly as environments and conditions change over time. Approaches like fuzzy logic, machine learning, and reinforcement learning looked to address this to some degree, but their successes were limited and such systems were hard to understand and troubleshoot.
On the biological front, neural networks did deliver on their promises to some degree, but they required large amounts of computing power and large volumes of data in a clean, ready-to-train state, which was challenging for decades. The advent of cloud computing and cheap, readily available storage and processing resources helped overcome some of these barriers, particularly in fields like computer vision and speech.
Other fields like natural language processing sought to understand the intent of a body of text by breaking down the sentence structure and mapping it to supported actions. These systems struggled with the complexities of human language as well as the differences between how language is formally structured and how it actually occurs in speech and writing - particularly over the internet. However, some systems did see limited successes, such as the interactive fiction "Zork-like" games of the '70s and '80s, and AI chatbots like Alexa and Cortana (as well as their far simpler early predecessor, ELIZA).
Over the years we've seen a pattern emerge: a new AI breakthrough reaches public attention, hits a technical plateau that is not easy to climb past, and then leads to disappointment. These periods, when businesses shift their focus to other areas, are referred to as "AI winters".
The net effect of all of this is that AI has been a field that has historically been partitioned into academia, game development, and specialized industries such as robotics or analytics. I personally felt this pain as I grew up very interested in AI in the 90s, but the only AI jobs I could consider in the mid-2000s as I graduated undergrad were either game development or re-entering academia to pursue academic uses of AI.
The Ghost of AI Present
Over the past decade and a half, cloud computing and storage capabilities have helped AI significantly as organizations can now easily add new computing power on demand when performing experiments. Additionally, cheap storage costs in the cloud and the myriad ways that data can be collected have led to a slew of data innovations including data lakes, data warehouses, data lakehouses, and more. For the first time, average organizations have enough data to analyze and extract insights from, along with the computing power and commercial tools to do so.
This, combined with academic innovations in how neural networks are structured, trained, adjusted, and transferred to other contexts, led to new possibilities in image, speech, and video processing, recognition, and generation. These innovations led to real deliverables for end users, including image-based search, autonomous vacuums and lawn mowers, real-time transcription and translation software, and even the fabled self-driving car.
Of course, the biggest recent change in modern AI has been the advent of transformer-based models, from early examples like BERT to generative pre-trained transformers like GPT-3.5 Turbo (ChatGPT) and newer systems like GPT-4o mini and Claude. These transformer-based large language models are able to generate new content based on observed structures in their training data and an input consisting of text, speech, images, and/or video.
This has led to our current AI "boom" where we are seeing rapid iteration and innovation as organizations race to find the next great model or the next great way to apply one, fighting for AI supremacy - or at least to not be starved of customers and investment money.
As I write this, we are winding down 2024 and fatigue from these new innovations has broadly set in over the last few years. The tech community is splintered in what people care about. One group is intently following the latest model releases and new features and gets deep into comparing the different options. Others take a more holistic approach, looking at how people are integrating AI systems into their offerings and at how AI can be secured, monitored, and controlled within organizations. Others are looking at AI from a utilitarian perspective and trying to see how it can streamline their current workloads or make new capabilities available to employees and customers. Finally, many are simply starting to tune out these AI innovations as over-hyped and distracting from other developments.
You may find yourself in several of these buckets to some degree or another, or you might see yourself in none of them. But all of these areas are worth discussing in more detail.
2024 saw the rise of multi-modal models that could take in text, images, or other modalities of data and respond to them to some degree or another. We also saw a greater variety of models through the arrival of mini models like gpt-4o-mini that were designed to be small, fast, and cheap. We also saw significant progress from models and providers outside of OpenAI, including Phi, Meta, Mistral, Hugging Face, Stability AI, and others. This added dimensionality makes model selection a non-trivial activity and a real challenge organizations must face as they look to implement their own solutions around these models - particularly with new models emerging and old models being updated or retired on a rolling basis.
On the individual level, we're seeing AI adopted more by individuals looking to massage emails, document drafts, messages, and social posts. We're also seeing AI becoming more and more a part of the process for technical tasks like software engineering. As someone who is still active and writing code most days of the week, I don't always find myself using AI while coding, but it can be incredibly helpful when I move into new frontiers where I've not worked much with specific libraries or languages before, when I'm troubleshooting specific configuration issues, or when I'm simply forgetting the exact technical steps for a task. Sure, I could look many of these things up in a search engine, but having AI integrated into an IDE helps keep me engaged in my editor instead of finding myself switching over to my email tab while I'm in my browser searching for documentation.
Of course, security is a huge concern with any AI system and organizations now need to think about whether code completions or user queries are going to be used by another organization to train or fine-tune models. Additionally, code completion may generate code that appears in another codebase under a license the organization is not comfortable with. To address this, code generation tools are also using code scanning to make sure their output doesn't match code in these other repositories. Finally, enterprise-level AI security is becoming more standardized through settings at the organization level in services like GitHub.
AI systems aren't just wrappers around large language models anymore, either. While LLMs were once limited to the data they were trained on, techniques like retrieval-augmented generation (RAG) and AI orchestration are becoming more standard as ways of augmenting AI systems.
With RAG, an application uses an LLM and queries for additional data related to the user's input query (this relationship is often determined by using text embeddings and vector search). If the search reveals information likely related to what the user typed in, this information can be added to the system prompt that guides the agent in responding to the user. This process augments the prompt with additional context that was retrieved from elsewhere, allowing the system to generate more relevant output for the user.
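To make that flow concrete, here is a minimal sketch of the retrieve-then-augment step. It makes some loud assumptions: the `DOCUMENTS` list, the function names, and the `min_score` threshold are all hypothetical, and a simple bag-of-words count stands in for a real text embedding model and vector database. No LLM is actually called; the sketch stops at building the augmented system prompt.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; a real system would keep these in a vector store.
DOCUMENTS = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "Our support line is open Monday through Friday, 9am to 5pm Eastern.",
    "Premium subscribers can export reports as PDF or CSV files.",
]


def embed(text: str) -> Counter:
    """Toy stand-in for a text embedding: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], top_k: int = 1,
             min_score: float = 0.1) -> list[str]:
    """Return up to top_k documents that look related to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score >= min_score]


def build_system_prompt(query: str, documents: list[str]) -> str:
    """Augment a base system prompt with any retrieved context."""
    base = "You are a helpful support assistant."
    context = retrieve(query, documents)
    if not context:
        return base  # nothing relevant found, so the prompt is left alone
    bullet_list = "\n".join(f"- {doc}" for doc in context)
    return f"{base}\nUse this context when relevant:\n{bullet_list}"


if __name__ == "__main__":
    print(build_system_prompt("How long do refunds take to be processed?", DOCUMENTS))
```

In a production system the `embed` function would call an embedding model and the similarity search would run inside a vector database, but the shape of the flow stays the same: search, filter by relevance, then add what was found to the prompt.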
AI Orchestration is like RAG, but with additional scope and scale. While RAG systems typically search one external data source, AI Orchestration systems might look at multiple external data sources including web search, vector database search, running targeted database queries, and more. AI Orchestration systems typically offer these integrations as options for responding to a request and are not necessarily invoked for every incoming message. Additionally, AI Orchestration systems can be connected to actions to take if the request merits it. For example, if a support system is having difficulty assisting a customer, it could invoke an action to generate a support ticket on the user's behalf. For more on this topic, see my recent article on building a digital dungeon master with Semantic Kernel.
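As a rough illustration of that idea, the sketch below registers a couple of optional tools and only invokes the ones a message seems to call for. Everything in it is hypothetical: the `Tool` class, the tool functions, and especially the keyword matching, which is a crude stand-in for the model-driven tool selection that real orchestration frameworks perform. It is not how Semantic Kernel or any specific library is implemented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """A capability the orchestrator may choose to invoke (hypothetical shape)."""
    name: str
    keywords: tuple[str, ...]
    run: Callable[[str], str]


def search_knowledge_base(message: str) -> str:
    # Stand-in for a vector search or database query.
    return "KB: Password resets are under Settings > Security."


def create_support_ticket(message: str) -> str:
    # Stand-in for an action with side effects, such as calling a ticketing API.
    return f"Ticket created for: {message!r}"


TOOLS = [
    Tool("search_knowledge_base", ("password", "reset"), search_knowledge_base),
    Tool("create_support_ticket", ("escalate", "ticket", "human"), create_support_ticket),
]


def select_tools(message: str, tools: list[Tool]) -> list[Tool]:
    """Pick the tools relevant to this message.

    Keyword matching is an assumption made for this sketch; real
    orchestrators typically let the language model decide.
    """
    lowered = message.lower()
    return [tool for tool in tools if any(word in lowered for word in tool.keywords)]


def handle(message: str, tools: list[Tool] = TOOLS) -> dict[str, str]:
    """Run only the selected tools and collect their results by tool name."""
    return {tool.name: tool.run(message) for tool in select_tools(message, tools)}


if __name__ == "__main__":
    print(handle("Hello there"))
    print(handle("How do I reset my password?"))
    print(handle("This is still broken, please escalate"))
```

The design point is that tools are options rather than mandatory steps: a greeting triggers nothing, a question triggers a lookup, and a frustrated customer triggers an action.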
AI Orchestration systems are continuing to mature and evolve, with new innovations around planners, multi-agent interactions and coordination, and persistent memory currently in the works. Additionally, we're looking at how we can best test AI systems, as these systems tend to be non-deterministic (they can produce different results each time they run) and chaotic, which calls for different testing strategies than those employed in traditional software testing projects. However, we are making progress in this area through technologies like PromptFlow.
Now that we've covered where we are, let's look at what might be coming for AI in the years ahead.
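One way to picture the difference in testing strategy is to check properties of a response across many runs instead of comparing against one exact string. The sketch below is only an illustration under stated assumptions: `fake_assistant` is a hypothetical stand-in for a real model call, and the checks and names are invented for this example rather than taken from PromptFlow or any other tool.

```python
import random


def fake_assistant(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call: same facts, varying wording."""
    openers = ["Sure!", "Happy to help.", "Of course."]
    return f"{rng.choice(openers)} Refunds are processed within 5 business days."


def passes_checks(response: str) -> bool:
    """Property-style checks that tolerate differences in phrasing."""
    text = response.lower()
    has_key_fact = "5 business days" in text
    avoids_overpromising = "guarantee" not in text
    is_concise = len(response) < 200
    return has_key_fact and avoids_overpromising and is_concise


def pass_rate(question: str, runs: int = 20, seed: int = 42) -> float:
    """Run the assistant repeatedly and report the fraction of passing responses."""
    rng = random.Random(seed)
    results = [passes_checks(fake_assistant(question, rng)) for _ in range(runs)]
    return sum(results) / runs


if __name__ == "__main__":
    print(pass_rate("How long do refunds take?"))
```

With a real model, a team would likely set a threshold on the pass rate rather than demand that every single run passes.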
The Ghost of AI Future
I'm neither a futurist nor an AI researcher, though I have intensely studied AI in an academic and professional capacity, and I do have some predictions on the future directions of AI.
First, I see public attention desiring to move away from artificial intelligence unless significant new innovations are made. There's a general fatigue from keeping track of the rapid pace of innovation at the moment, and with that fatigue comes a desire for a new topic of interest to enter the tech discourse.
However, I'm not sure that desire is going to be satisfied. When we look at AI innovation at the moment we're seeing innovations in video generation and analysis, general performance improvements, and a shift towards giving organizations the opportunity to craft new add-ins and extensions into AI systems like Copilot and OpenAI models. These developments are further expanding the potential impact of these technologies while giving organizations new ways of connecting to these services.
Additionally, and much more likely to make a profound impact, the developments with transformer-based models have brought additional attention and resources to reinforcement learning. That field has had its own successes over the past decade, particularly in game playing, most notably with the AlphaGo program defeating the world champion at Go in 2017.
Reinforcement learning allows systems to experiment and interpret input without guidance on what optimal solutions should look like. This makes reinforcement learning ideal for building adaptive and optimized solutions to problems from game playing to robotic balance and motion.
While reinforcement learning can work well with sensors and actuators, it has also been shown to work well with raw screen input, including playing games like Super Mario Bros. based only on the pixels on the screen.
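The trial-and-error loop behind reinforcement learning can be sketched with tabular Q-learning, one classic algorithm in the field. The environment here is an assumption made purely for illustration: a five-cell corridor where the agent starts on the left and is rewarded only when it reaches the rightmost cell. The hyperparameters (`alpha`, `gamma`, `epsilon`) are arbitrary choices, and this is nowhere near the scale of the pixel-based or robotics systems mentioned above.

```python
import random

N_STATES = 5      # corridor cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q: list[list[float]], state: int, epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy: usually exploit the best known action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so the untrained agent does not get stuck going left.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> list[list[float]]:
    """Learn action values from experience; the agent is never told the answer."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q: list[list[float]]) -> list[int]:
    """Best known action for each non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The agent only ever sees a reward signal, yet it ends up preferring the action that leads toward the goal in every cell, which is the core idea that scales up to games and robotics.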
When you combine the disciplines of reinforcement learning, computer vision, large language models, and AI orchestration you start to see some interesting possibilities including AI systems that can interact with a computer at a graphical user interface level, interpreting icons and screens to perform common prescribed actions and workflows. This technology is in its infancy, but the direction the research is going could see AI systems driving mouse and keyboard actions to work with computers at a more generalized level.
Of course, systems that are built for specific tasks and interact with those systems directly are going to be much more successful at those tasks, but it's interesting to see a direction that workflow-based AI systems might take.
This raises the question that many of my students had when ChatGPT was unveiled: are our jobs safe? To answer that, you can ask yourself a few questions:
- Can your job be succinctly described in an email?
- Does your job rarely require expert knowledge or advanced intuition?
- Does your job involve a large degree of work that is similar in nature?
- Is there ample documentation on the decisions that need to be made or the business domain involved?
For most of us, the answers to these questions are no, but some aspects of our jobs might fit into these buckets, and those aspects will likely change.
Regardless of whether public interest in AI continues to hold, there will be people innovating in these areas for years and we will see other breakthroughs and innovations in the future.
What those AI systems look like is anyone's guess, but my personal guess is that they'll look less like Commander Data or HAL and more like a very gifted search engine or automated wizard. In order to succeed with them, you'll need to understand what you want and be able to describe it, but the more AI systems advance, the more our words will become magical in terms of their effects and outputs.