AI Essentials for Tech Executives

On April 24, O’Reilly Media will be hosting Coding with AI: The End of Software Development as We Know It—a live virtual tech conference spotlighting how AI is already supercharging developers, boosting productivity, and providing real value to their organizations. If you’re in the trenches building tomorrow’s development practices today and interested in speaking at the event, we’d love to hear from you by March 5. You can find more information and our call for presentations here.


99% of Executives Are Misled by AI Advice

As an executive, you’re bombarded with articles and advice on building AI products.

The problem is, a lot of this “advice” comes from other executives who rarely interact with the practitioners actually working with AI. This disconnect leads to misunderstandings, misconceptions, and wasted resources.

A Case Study in Misleading AI Advice

An example of this disconnect in action comes from an interview with Jake Heller, CEO of Casetext.

During the interview, Jake made a statement about AI testing that was widely shared:

One of the things we learned is that after it passes 100 tests, the odds that it will pass a random distribution of 100k user inputs with 100% accuracy is very high. (emphasis added)

This claim was then amplified by influential figures like Jared Friedman and Garry Tan of Y Combinator, reaching countless founders and executives.

The morning after this advice was shared, I received numerous emails from founders asking if they should aim for 100% test-pass rates.

If you’re not hands-on with AI, this advice might sound reasonable. But any practitioner would know it’s deeply flawed.

“Perfect” Is Flawed

In AI, a perfect score is a red flag. This happens when a model has inadvertently been trained on data or prompts that are too similar to tests. Like a student who was given the answers before an exam, the model will look good on paper but be unlikely to perform well in the real world.

If you are sure your data is clean but you’re still getting 100% accuracy, chances are your test is too weak or not measuring what matters. Tests that always pass don’t help you improve; they’re just giving you a false sense of security.
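How do you become sure your data is clean? One common practitioner tactic is to look for verbatim overlap between your test cases and your training or prompt data. Here’s a minimal sketch in Python; the 5-word window and 0.5 threshold are illustrative defaults, not a standard:

    # Rough leakage check: flag test cases whose word 5-grams mostly
    # appear verbatim in the training or prompt data.
    def ngrams(text, n=5):
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def flag_contaminated(test_cases, training_texts, threshold=0.5):
        train_grams = set()
        for text in training_texts:
            train_grams |= ngrams(text)
        flagged = []
        for case in test_cases:
            grams = ngrams(case)
            if not grams:
                continue  # too short to compare
            overlap = len(grams & train_grams) / len(grams)
            if overlap >= threshold:
                flagged.append((case, round(overlap, 2)))
        return flagged  # suspicious cases and their overlap ratios

Anything this flags is worth a manual look before you celebrate a perfect score.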

Most importantly, when all your models have perfect scores, you lose the ability to differentiate between them. You won’t be able to identify why one model is better than another, or strategize about how to make further improvements.

The goal of evaluations isn’t to pat yourself on the back for a perfect score.

It’s to uncover areas for improvement and ensure your AI is truly solving the problems it’s meant to address. By focusing on real-world performance and continuous improvement, you’ll be much better positioned to create AI that delivers genuine value. Evals are a big topic, and we’ll dive into them more in a future chapter.

Moving Forward

When you’re not hands-on with AI, it’s hard to separate hype from reality. Here are some key takeaways to keep in mind:

  • Be skeptical of advice or metrics that sound too good to be true.
  • Focus on real-world performance and continuous improvement.
  • Seek advice from experienced AI practitioners who can communicate effectively with executives. (You’ve come to the right place!)

We’ll dive deeper into how to test AI, along with a data review toolkit, in a future chapter. First, we’ll look at the biggest mistake executives make when investing in AI.


The #1 Mistake Companies Make with AI

One of the first questions I ask tech leaders is how they plan to improve AI reliability, performance, or user satisfaction. If the answer is “We just bought XYZ tool for that, so we’re good,” I know they’re headed for trouble. Focusing on tools over processes is a red flag and the biggest mistake I see executives make when it comes to AI.

Improvement Requires Process

Assuming that buying a tool will solve your AI problems is like joining a gym but not actually going. You’re not going to see improvement by just throwing money at the problem. Tools are only the first step; the real work comes after. For example, the metrics that come built into many tools rarely correlate with what you actually care about. Instead, you need to design metrics that are specific to your business, along with tests to evaluate your AI’s performance.

The data you get from these tests should also be reviewed regularly to make sure you’re on track. No matter what area of AI you’re working on—model evaluation, retrieval-augmented generation (RAG), or prompting strategies—the process is what matters most: tools and metrics alone won’t get you there unless you develop and follow processes for acting on what they tell you.
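To make “metrics that are specific to your business” concrete, here’s a minimal sketch in Python, assuming a hypothetical support bot where a good answer must cite an approved help-center article. The metric, field names, and source list are all made up for illustration:

    # Hypothetical business-specific metric: a support-bot answer counts
    # as "grounded" only if it cites at least one approved article.
    APPROVED_SOURCES = {"billing-faq", "setup-guide", "refund-policy"}  # illustrative

    def answer_is_grounded(cited_sources):
        return any(source in APPROVED_SOURCES for source in cited_sources)

    def weekly_review(logged_responses):
        # logged_responses: list of dicts like {"answer": ..., "sources": [...]}
        graded = [answer_is_grounded(r["sources"]) for r in logged_responses]
        rate = sum(graded) / len(graded)
        print(f"Grounded-answer rate: {rate:.1%} over {len(graded)} responses")
        return rate

The point isn’t this particular metric; it’s that the number comes from your own definition of a good answer and gets reviewed on a schedule.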

Rechat’s Success Story

Rechat is a great example of how focusing on processes can lead to real improvements. The company decided to build an AI agent that helps real estate agents with a wide variety of tasks across their jobs. However, the team was struggling with consistency. When the agent worked, it was great, but when it didn’t, it was a disaster. The team would make a change to address a failure mode in one place but end up causing issues in other areas. They were stuck in a cycle of whack-a-mole. They didn’t have visibility into their AI’s performance beyond “vibe checks,” and their prompts were becoming increasingly unwieldy.

When I came in to help, the first thing I did was apply a systematic approach that is illustrated in Figure 2-1.

Figure 2-1. The virtuous cycle [1]

This is a virtuous cycle for systematically improving large language models (LLMs). The key insight is that you need fast feedback loops, both quantitative and qualitative. You start with LLM invocations (both synthetic and human-generated), then simultaneously:

  • Run unit tests to catch regressions and verify expected behaviors
  • Collect detailed logging traces to understand model behavior

These feed into evaluation and curation, which need to become increasingly automated over time. The eval process combines:

  • Human review
  • Model-based evaluation
  • A/B testing

The results then inform two parallel streams:

  • Fine-tuning with carefully curated data
  • Prompt engineering improvements

These both feed into model improvements, which start the cycle again. The dashed line around the edge emphasizes that this is a continuous, iterative process—you keep cycling through it faster and faster to drive continuous improvement. By focusing on the processes outlined in this diagram, Rechat was able to reduce its error rate by over 50% without investing in new tools!
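To give a feel for the “unit tests” step in this cycle, here’s a minimal sketch of assertion-based checks on LLM outputs. The prompts are invented and generate_reply is a hypothetical stand-in for whatever your pipeline produces; the point is that the checks are cheap, deterministic, and run on every change:

    # Cheap, deterministic unit tests that run on every prompt or model
    # change. generate_reply() is a hypothetical stand-in for your pipeline.
    def run_unit_tests(generate_reply):
        cases = [
            # (input prompt, substrings the reply must contain)
            ("Confirm the showing at 12 Elm St for 3pm.", ["12 Elm St", "3pm"]),
            ("Draft a follow-up email to a new buyer lead.", []),
        ]
        failures = []
        for prompt, required in cases:
            reply = generate_reply(prompt)
            if "TODO" in reply or "[" in reply:
                failures.append((prompt, "placeholder text leaked into reply"))
            missing = [s for s in required if s.lower() not in reply.lower()]
            if missing:
                failures.append((prompt, f"reply is missing {missing}"))
        return failures  # an empty list means no regressions were caught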

Check out this ~15-minute video on how we implemented this process-first approach at Rechat.

Avoid the Red Flags

Instead of asking which tools you should invest in, you should be asking your team:

  • What are our failure rates for different features or use cases?
  • What categories of errors are we seeing?
  • Does the AI have the proper context to help users? How is this being measured?
  • What is the impact of recent changes to the AI?

The answers to each of these questions should involve appropriate metrics and a systematic process for measuring, reviewing, and improving them. If your team struggles to answer these questions with data and metrics, you are in danger of going off the rails!
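As a sketch of what data-backed answers can look like, assume each logged AI interaction is recorded with a feature name, an error category (or None), and an app version. The schema is invented for illustration:

    # Turning logged traces into answers to the questions above.
    # Assumed trace shape: {"feature": "email_draft",
    #                       "error": "made_up_address" or None,
    #                       "version": "v42"}
    from collections import Counter, defaultdict

    def failure_rates_by_feature(traces):
        totals, failures = Counter(), Counter()
        for t in traces:
            totals[t["feature"]] += 1
            if t["error"] is not None:
                failures[t["feature"]] += 1
        return {f: failures[f] / totals[f] for f in totals}

    def error_categories(traces):
        # What categories of errors are we seeing, and how often?
        return Counter(t["error"] for t in traces if t["error"] is not None)

    def impact_of_change(traces, before_version, after_version):
        # How did failure rates shift between two releases?
        by_version = defaultdict(list)
        for t in traces:
            by_version[t["version"]].append(t)
        before = failure_rates_by_feature(by_version[before_version])
        after = failure_rates_by_feature(by_version[after_version])
        return {f: after.get(f, 0.0) - before.get(f, 0.0)
                for f in set(before) | set(after)}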

Avoiding Jargon Is Critical

We’ve talked about why focusing on processes is better than just buying tools. But there’s one more thing that’s just as important: how we talk about AI. Using the wrong words can hide real problems and slow down progress. To focus on processes, we need to use clear language and ask good questions. That’s why we provide an AI communication cheat sheet for executives in the next section. That section helps you:

  • Understand what AI can and can’t do
  • Ask questions that lead to real improvements
  • Ensure that everyone on your team can participate

Using this cheat sheet will help you talk about processes, not just tools. It’s not about knowing every tech word. It’s about asking the right questions to understand how well your AI is working and how to make it better. In the next chapter, we’ll share a counterintuitive approach to AI strategy that can save you time and resources in the long run.


AI Communication Cheat Sheet for Executives

Why Plain Language Matters in AI

As an executive, using simple language helps your team understand AI concepts better. This cheat sheet will show you how to avoid jargon and speak plainly about AI. This way, everyone on your team can work together more effectively.

At the end of this chapter, you’ll find a helpful glossary. It explains common AI terms in plain language.

Helps Your Team Understand and Work Together

Using simple words breaks down barriers. It makes sure everyone—no matter their technical skills—can join the conversation about AI projects. When people understand, they feel more involved and responsible. They are more likely to share ideas and spot problems when they know what’s going on.

Improves Problem-Solving and Decision Making

Focusing on actions instead of fancy tools helps your team tackle real challenges. When we remove confusing words, it’s easier to agree on goals and make good plans. Clear talk leads to better problem-solving because everyone can pitch in without feeling left out.

Reframing AI Jargon into Plain Language

Here’s how to translate common technical terms into everyday language that anyone can understand.

Examples of Common Terms, Translated

Changing technical terms into everyday words makes AI easy to understand. Here are some examples of how to say things more simply:

  • Instead of “We’re implementing a RAG approach,” say “We’re making sure the AI always has the right information to answer questions well.”
  • Instead of “We’ll use few-shot prompting and chain-of-thought reasoning,” say “We’ll give examples and encourage the AI to think before it answers.”
  • Instead of “Our model suffers from hallucination issues,” say “Sometimes, the AI makes things up, so we need to check its answers.”
  • Instead of “Let’s adjust the hyperparameters to optimize performance,” say “We can tweak the settings to make the AI work better.”
  • Instead of “We need to prevent prompt injection attacks,” say “We should make sure users can’t trick the AI into ignoring our rules.”
  • Instead of “Deploy a multimodal model for better results,” say “Let’s use an AI that understands both text and images.”
  • Instead of “The AI is overfitting on our training data,” say “The AI is too focused on old examples and isn’t doing well with new ones.”
  • Instead of “Consider utilizing transfer learning techniques,” say “We can start with an existing AI model and adapt it for our needs.”
  • Instead of “We’re experiencing high latency in responses,” say “The AI is taking too long to reply; we need to speed it up.”

How This Helps Your Team

By using plain language, everyone can understand and join in. People from all parts of your company can share ideas and work together. This reduces confusion and helps projects move faster, because everyone knows what’s happening.

Strategies for Promoting Plain Language in Your Organization

Now let’s look at specific ways you can encourage clearer communication across your teams.

Lead by Example

Use simple words when you talk and write. When you make complex ideas easy to understand, you show others how to do the same. Your team will likely follow your lead when they see that you value clear communication.

Challenge Jargon When It Comes Up

If someone uses technical terms, ask them to explain in simple words. This helps everyone understand and shows that it’s okay to ask questions.

Example: If a team member says, “Our AI needs better guardrails,” you might ask, “Can you tell me more about that? How can we make sure the AI gives safe and appropriate answers?”

Encourage Open Conversation

Make it okay for people to ask questions and say when they don’t understand. Let your team know it’s good to seek clear explanations. This creates a friendly environment where ideas can be shared openly.

Conclusion

Using plain language in AI isn’t just about making communication easier—it’s about helping everyone understand, work together, and succeed with AI projects. As a leader, promoting clear talk sets the tone for your whole organization. By focusing on actions and challenging jargon, you help your team come up with better ideas and solve problems more effectively.

Glossary of AI Terms

Use this glossary to understand common AI terms in simple language:

  • AGI (Artificial General Intelligence): AI that can do any intellectual task a human can. Why it matters: While some define AGI as AI that’s as smart as a human in every way, this isn’t something you need to focus on right now. It’s more important to build AI solutions that solve your specific problems today.
  • Agents: AI models that can perform tasks or run code without human help. Why it matters: Agents can automate complex tasks by making decisions and taking actions on their own. This can save time and resources, but you need to watch them carefully to make sure they are safe and do what you want.
  • Batch Processing: Handling many tasks at once. Why it matters: If you can wait for AI answers, you can process requests in batches at a lower cost. For example, OpenAI offers batch processing that’s cheaper but slower.
  • Chain of Thought: Prompting the model to think and plan before answering. Why it matters: When the model thinks first, it gives better answers but takes longer. This trade-off affects speed and quality.
  • Chunking: Breaking long texts into smaller parts. Why it matters: Splitting documents helps you search them better. How you divide them affects your results.
  • Context Window: The maximum text the model can use at once. Why it matters: The model has a limit on how much text it can handle. You need to manage this to fit important information.
  • Distillation: Making a smaller, faster model from a big one. Why it matters: It lets you use cheaper, faster models with less delay (latency). But the smaller model might not be as accurate or powerful as the big one, so you trade some performance for speed and cost savings.
  • Embeddings: Turning words into numbers that show meaning. Why it matters: Embeddings let you search documents by meaning, not just exact words. This helps you find information even if different words are used, making searches smarter and more accurate.
  • Few-Shot Learning: Teaching the model with only a few examples. Why it matters: By giving the model examples, you can guide it to behave the way you want. It’s a simple but powerful way to teach the AI what is good or bad.
  • Fine-Tuning: Adjusting a pre-trained model for a specific job. Why it matters: It makes the AI better for your needs by teaching it with your data, but it might become less good at general tasks. Fine-tuning works best for specific jobs where you need higher accuracy.
  • Frequency Penalties: Settings to stop the model from repeating words. Why it matters: They make AI responses more varied and interesting, avoiding boring repetition.
  • Function Calling: Getting the model to trigger actions or code. Why it matters: It allows AI to interact with apps, making it useful for tasks like getting data or automating jobs.
  • Guardrails: Safety rules to control model outputs. Why it matters: Guardrails reduce the chance of the AI giving bad or harmful answers, but they are not perfect. It’s important to use them wisely and not rely on them completely.
  • Hallucination: When AI makes up things that aren’t true. Why it matters: AIs sometimes make stuff up, and you can’t completely stop this. Be aware that mistakes can happen, and check the AI’s answers.
  • Hyperparameters: Settings that affect how the model works. Why it matters: By adjusting these settings, you can make the AI work better. It often takes trying different options to find what works best.
  • Hybrid Search: Combining search methods to get better results. Why it matters: Using both keyword and meaning-based search gives better results than either alone, helping people find what they’re looking for more easily.
  • Inference: Getting an answer back from the model. Why it matters: When you ask the AI a question and it gives you an answer, that’s inference. Knowing this helps you understand how the AI works and the time or resources it might need to give answers.
  • Inference Endpoint: Where the model is available for use. Why it matters: It lets you use the AI model in your apps or services.
  • Latency: The time delay in getting a response. Why it matters: Lower latency means faster replies, improving user experience.
  • Latent Space: The hidden way the model represents data inside it. Why it matters: It helps us understand how the AI processes information.
  • LLM (Large Language Model): A big AI model that understands and generates text. Why it matters: LLMs power many AI tools, like chatbots and content creators.
  • Model Deployment: Making the model available online. Why it matters: It’s needed to put AI into real-world use.
  • Multimodal: Models that handle different data types, like text and images. Why it matters: People use words, pictures, and sounds. When AI can understand all of these, it can help users better, and your tools become more powerful.
  • Overfitting: When a model learns training data too well but fails on new data. Why it matters: If the AI is too tuned to old examples, it might not work well on new ones. Perfect scores on tests can be a sign of overfitting. You want the AI to handle new things, not just repeat what it learned.
  • Pre-training: The model’s initial learning phase on lots of data. Why it matters: It’s like giving the model a big education before it starts specific jobs. This helps it learn general things, but you might need to adjust it later for your needs.
  • Prompt: The input or question you give to the AI. Why it matters: Clear and detailed prompts help the AI understand what you want. Just like talking to a person, good communication gets better results.
  • Prompt Engineering: Designing prompts to get the best results. Why it matters: Learning to write good prompts makes the AI give better answers. It’s like improving your communication skills to get the best results.
  • Prompt Injection: A security risk where bad instructions are added to prompts. Why it matters: Users might try to trick the AI into ignoring your rules and doing things you don’t want. Knowing about prompt injection helps you protect your AI system from misuse.
  • Prompt Templates: Pre-made formats for prompts to keep inputs consistent. Why it matters: They help you communicate with the AI consistently by filling in blanks in a set format. This makes it easier to use the AI in different situations and ensures you get good results.
  • Rate Limiting: Limiting how many requests can be made in a time period. Why it matters: It prevents system overload, keeping services running smoothly.
  • Reinforcement Learning from Human Feedback (RLHF): Training AI using people’s feedback. Why it matters: It helps the AI learn from what people like or don’t like, making its answers better. But it’s a complex method, and you might not need it right away.
  • Reranking: Sorting results to pick the most important ones. Why it matters: When you have limited space (like a small context window), reranking helps you choose the most relevant documents to show the AI. This ensures the best information is used, improving the AI’s answers.
  • Retrieval-Augmented Generation (RAG): Providing relevant context to the LLM. Why it matters: A language model needs proper context to answer questions. Like a person, it needs access to information such as data, past conversations, or documents to give a good answer. Collecting and giving this info to the AI before asking it questions helps prevent mistakes or it saying, “I don’t know.”
  • Semantic Search: Searching based on meaning, not just words. Why it matters: It lets you find information using embeddings, even when the exact words differ. Combining it with keyword search (hybrid search) gives even better results.
  • Temperature: A setting that controls how creative AI responses are. Why it matters: It lets you choose between predictable and more imaginative answers. Adjusting temperature can affect the quality and usefulness of the AI’s responses.
  • Token Limits: The maximum number of words or pieces the model handles. Why it matters: They affect how much information you can input or get back. You need to plan your AI use within these limits, balancing detail and cost.
  • Tokenization: Breaking text into small pieces the model understands. Why it matters: It allows the AI to understand the text. Also, you pay for AI based on the number of tokens used, so knowing about tokens helps manage costs.
  • Top-p Sampling: Choosing the next word from the top choices that make up a set probability. Why it matters: It balances predictability and creativity in AI responses. The trade-off is between safe answers and more varied ones.
  • Transfer Learning: Using knowledge from one task to help with another. Why it matters: You can start with a strong AI model someone else made and adjust it for your needs. This saves time and keeps the model’s general abilities while making it better for your tasks.
  • Transformer: A type of AI model that uses attention to understand language. Why it matters: Transformers are the main type of model used in generative AI today, like the ones that power chatbots and language tools.
  • Vector Database: A special database for storing and searching embeddings. Why it matters: Vector databases store embeddings of text, images, and more, so you can search by meaning. This makes finding similar items faster and improves searches and recommendations.
  • Zero-Shot Learning: When the model does a new task without training or examples. Why it matters: You don’t give any examples to the AI. This is fine for simple tasks, but it can make complex tasks harder. Giving examples helps but takes up space in the prompt, so you need to balance prompt space with the need for examples.

Footnotes

  1. Diagram adapted from my blog post, “Your AI Product Needs Evals”.

This post is an excerpt (chapters 1-3) of an upcoming report of the same title. The full report will be released on the O’Reilly learning platform on February 27, 2025.
