
Core Concepts

Understanding the fundamental building blocks of agentic AI systems is essential before building complex applications. This chapter introduces the core concepts that form the foundation of Opik and agentic AI development.

LLM

At the heart of every agentic AI system is a Large Language Model (LLM). LLMs are neural networks trained on vast amounts of text data, enabling them to understand and generate human-like text, reason about problems, and follow instructions.

In the context of agentic systems, LLMs serve as the “brain” that:

  - interprets user requests and conversation context
  - reasons about which actions or tools are needed
  - generates responses, plans, and tool calls

When working with Opik, you’ll interact with LLMs through various providers (OpenAI, Anthropic, local models, etc.). The LLM receives prompts containing system instructions, tool descriptions, conversation history, and user requests, then generates responses that may include tool calls, reasoning, or direct answers.
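To make this concrete, here is a minimal sketch of a traced LLM call using Opik’s OpenAI integration; the model name and messages are illustrative, and the snippet assumes the opik and openai packages are installed and configured.

```python
from openai import OpenAI
from opik.integrations.openai import track_openai

# Wrap the OpenAI client so every call is logged to Opik automatically.
client = track_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": "Explain tracing in one sentence."},
    ],
)
print(response.choices[0].message.content)
```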

Traces and Spans

Traces and spans are fundamental concepts for understanding and debugging agentic systems. They provide a hierarchical view of how your agent processes requests and executes actions.

Traces

A trace represents a complete execution of an agentic task from start to finish. It captures the entire lifecycle of a single request or conversation, including:

  - the initial input or user request
  - every LLM call and tool invocation made along the way
  - intermediate reasoning steps and their outputs
  - the final response returned to the user

Each trace has a unique identifier and contains metadata such as timestamps, duration, and status (success, failure, timeout, etc.).
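As a minimal sketch, the @opik.track decorator from the Opik SDK turns each call of a function into a trace; the function below is illustrative.

```python
import opik

@opik.track
def answer_question(question: str) -> str:
    # A real agent would call an LLM and tools here; a canned answer
    # keeps the example self-contained.
    return f"You asked: {question}"

# Each invocation produces one trace with inputs, outputs, and timing.
answer_question("What is a trace?")
```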

Spans

Each trace is composed of multiple sub-steps called spans. A span represents a single unit of work within a trace, such as:

  - a single LLM call
  - a tool invocation (e.g., a web search or a database query)
  - a retrieval, preprocessing, or postprocessing step

Spans are organized hierarchically, allowing you to see the complete flow of execution. For example:

  - Trace: answer a user question
    - Span: LLM call to plan the response
    - Span: tool call to search a knowledge base
    - Span: LLM call to compose the final answer

This hierarchical structure makes it easy to:

  - pinpoint where failures occur
  - attribute latency and token cost to specific steps
  - follow the agent’s decision path step by step
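In code, nesting traced functions is one way to produce this hierarchy. A minimal sketch using the Opik SDK (function names are illustrative):

```python
import opik

@opik.track
def search_knowledge_base(query: str) -> str:
    # Logged as a child span of whatever traced function called it.
    return f"results for '{query}'"

@opik.track
def answer_question(question: str) -> str:
    context = search_knowledge_base(question)  # nested span
    return f"Answer based on: {context}"

# The outer call becomes the trace; the inner call appears as a nested span.
answer_question("How do spans nest?")
```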

Thread

A thread represents a conversation or session with an agentic system. It maintains the context and history of interactions between a user (or system) and the agent over time.

Key characteristics of threads:

  - a stable identifier that links related traces together
  - accumulated conversation history and context across turns
  - a lifespan that can cover many requests over time

In Opik, threads are essential for:

  - grouping all the traces that belong to one conversation
  - reviewing multi-turn interactions end to end
  - evaluating agent behavior at the conversation level rather than per request
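A minimal sketch of grouping traces into a thread, assuming Opik’s opik_context helper for updating the current trace; the thread ID is illustrative.

```python
import opik
from opik import opik_context

@opik.track
def chat_turn(user_message: str, thread_id: str) -> str:
    # Tag this trace with the conversation's thread ID so Opik can group it.
    opik_context.update_current_trace(thread_id=thread_id)
    return f"Echo: {user_message}"

# Both calls share a thread ID, so they appear as one conversation in Opik.
chat_turn("Hello!", thread_id="conversation-123")
chat_turn("Tell me more.", thread_id="conversation-123")
```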

Tool

Tools are the capabilities that agentic systems use to interact with the world beyond the LLM. A tool is essentially a function that the agent can call to perform actions, retrieve information, or manipulate data.

Common types of tools include:

  - web search and information retrieval
  - database queries and API calls
  - code execution and file operations
  - calculators and other domain-specific utilities

In Opik, tools are defined with:

  - a name that identifies the tool
  - a natural-language description of what the tool does and when to use it
  - a schema for the parameters the tool accepts

The LLM uses tool descriptions to decide which tools to call and with what parameters, making clear and accurate descriptions crucial for effective agent behavior.
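As an illustration, here is a tool definition in the widely used OpenAI function-calling format; the tool name, description, and schema are all hypothetical.

```python
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": (
            "Get the current weather for a city. Use this whenever the "
            "user asks about weather conditions."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'Paris'",
                },
            },
            "required": ["city"],
        },
    },
}
```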

Metrics

Metrics are quantitative measurements that help you understand and improve your agentic systems. They provide insights into performance, quality, cost, and user experience.

Common categories of metrics include:

Performance Metrics

  - latency (time to first response, total task duration)
  - throughput (requests handled per unit of time)
  - cost (token usage and provider charges)

Quality Metrics

  - accuracy and relevance of responses
  - hallucination rate and factual grounding
  - adherence to instructions and output format

Behavioral Metrics

  - tool-call frequency and success rate
  - number of reasoning steps or retries per task
  - how often the agent asks for clarification or gives up

Opik provides built-in support for tracking and analyzing metrics across traces, spans, and experiments, enabling data-driven optimization of your agentic systems.
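As a minimal sketch, Opik ships heuristic metrics such as Equals that score an output against a reference; the inputs below are illustrative.

```python
from opik.evaluation.metrics import Equals

metric = Equals()
result = metric.score(output="Paris", reference="Paris")
print(result.value)  # 1.0 when the output matches the reference exactly
```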

Datasets

Datasets are collections of test cases, examples, or scenarios used to evaluate and improve agentic systems. A dataset typically contains:

  - example inputs (user requests, questions, or scenarios)
  - expected outputs or reference answers
  - metadata such as tags, difficulty, or source

Datasets serve multiple purposes:

  - regression testing when prompts, models, or tools change
  - benchmarking alternative configurations against each other
  - capturing real failure cases so they stay fixed

In Opik, datasets can be:

  - created and managed through the SDK or the UI
  - populated from production traces or hand-written examples
  - used as the input for experiments and evaluations
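A minimal sketch of creating and populating a dataset with the Opik SDK; the dataset name and items are illustrative.

```python
import opik

client = opik.Opik()
dataset = client.get_or_create_dataset(name="capital-cities")

dataset.insert([
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What is the capital of Japan?", "expected_output": "Tokyo"},
])
```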

Experiments

Experiments are systematic tests in which you vary one or more aspects of your agentic system and measure the impact on performance. An experiment typically involves:

  - a dataset of test cases to run against
  - a specific configuration of the system (prompt, model, tools, parameters)
  - one or more metrics to score the results
  - a baseline or previous run to compare against

Common experiment types include:

  - prompt variations (rewording instructions, adding examples)
  - model comparisons (different providers, sizes, or versions)
  - tool and retrieval changes
  - parameter sweeps, such as temperature or context length

Opik’s experiment framework helps you:

  - run a task against every item in a dataset
  - score each result with your chosen metrics
  - compare runs side by side to see what actually improved
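A minimal sketch of an experiment using Opik’s evaluate() helper, reusing the dataset from the previous example; the task function is a stub standing in for a real agent.

```python
import opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Equals

client = opik.Opik()
dataset = client.get_or_create_dataset(name="capital-cities")

def task(item: dict) -> dict:
    # A real task would call your agent; this stub keeps the example runnable.
    answer = "Paris" if "France" in item["input"] else "unknown"
    return {"output": answer, "reference": item["expected_output"]}

evaluate(
    dataset=dataset,
    task=task,
    scoring_metrics=[Equals()],  # compares "output" against "reference"
    experiment_name="baseline-run",
)
```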

Prompts

Prompts are the instructions and context provided to LLMs that guide their behavior. In agentic systems, prompts are carefully structured to enable effective tool use, reasoning, and task completion.

Foundational Prompt

The foundational prompt is the base instruction set that defines the agent’s core identity, capabilities, and behavioral guidelines. It typically includes:

  - the agent’s role and identity (“You are a…”)
  - its core capabilities and known limitations
  - behavioral guidelines, tone, and safety constraints

The foundational prompt is usually static and defines the “personality” and fundamental behavior of your agentic system.

System Prompt

The system prompt is the primary instruction set sent to the LLM for each interaction. It combines:

  - the foundational prompt
  - descriptions of the currently available tools
  - dynamic context such as the date, user profile, or session state

The system prompt is more dynamic than the foundational prompt and may be customized based on the current thread, user context, or specific task requirements.
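A minimal sketch of assembling a system prompt from these parts; the strings and helper function are illustrative, not an Opik API.

```python
FOUNDATIONAL_PROMPT = (
    "You are a helpful research assistant. "
    "Be concise, cite sources, and admit uncertainty."
)

def build_system_prompt(tool_descriptions: list[str], user_context: str) -> str:
    # Combine the static foundational prompt with per-session details.
    tools_block = "\n".join(f"- {d}" for d in tool_descriptions)
    return (
        f"{FOUNDATIONAL_PROMPT}\n\n"
        f"Available tools:\n{tools_block}\n\n"
        f"Context: {user_context}"
    )

print(build_system_prompt(
    ["search_web(query): search the web for current information"],
    "The user is preparing a literature review.",
))
```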

Tool Descriptions

Tool descriptions are natural language explanations of available tools that help the LLM understand:

  - what each tool does and when to use it
  - what parameters it expects and in what format
  - what it returns and how to interpret the result

Well-written tool descriptions are critical for effective agent behavior. They should be:

  - clear and unambiguous about the tool’s purpose
  - specific about parameter types, formats, and constraints
  - honest about the tool’s limitations and failure modes
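To illustrate, compare a vague description with a specific one; both strings are hypothetical.

```python
vague_description = "Searches stuff."

specific_description = (
    "Search the product catalog by keyword. "
    "Input: 'query' (string), a product name or category such as 'wireless mouse'. "
    "Returns up to 10 matching products with name, price, and stock status. "
    "Does not search order history or customer data."
)
```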

User Prompt

The user prompt is the actual request or input from the user (or system) that the agent needs to process. It can be:

  - a direct question or command
  - a multi-step task description
  - a follow-up that depends on earlier turns in the conversation
  - an automated trigger from another system

The user prompt, combined with the system prompt and conversation history, forms the complete input to the LLM for generating a response.
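As a minimal sketch, here is that complete input in the common chat-completions message format; the content is illustrative.

```python
messages = [
    # System prompt: foundational instructions plus tool descriptions.
    {"role": "system", "content": "You are a helpful research assistant."},
    # Conversation history from the current thread:
    {"role": "user", "content": "Find papers on LLM evaluation."},
    {"role": "assistant", "content": "Here are three recent papers..."},
    # The new user prompt for this turn:
    {"role": "user", "content": "Summarize the second one."},
]
```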

Prompt Engineering Best Practices

Effective prompt engineering is crucial for building high-performing agentic systems:

  1. Be explicit: Clearly state what you want the agent to do

  2. Provide examples: Include few-shot examples when possible

  3. Structure information: Use clear formatting and organization

  4. Iterate and test: Continuously refine prompts based on results

  5. Consider context length: Balance detail with token efficiency

  6. Test edge cases: Ensure prompts handle unusual inputs gracefully

In the next chapter, we’ll see how these core concepts come together to build complete agentic systems.