
Since last year, Agentic AI has become the center of nearly every technical discussion. Even people outside the AI field have heard of terms like MCP and Agent.

This essay looks back on the past year and a half I spent building at the front line of this domain.

The app I worked on was almost entirely driven by LLMs. That meant I had to adopt every new trend that appeared in the LLM ecosystem. Quality is everything in an LLM-based product, and improving it required constant adaptation.

Whenever concepts like Multi-Agent, MCP, or Agentic Loop were introduced, I incorporated them into the product right away. Working this close to the edge made me realize something simple: the more advanced the technology gets, the more important the fundamentals of software engineering become.


Example 1. MCP and the Hidden Risks of Integration

The Model Context Protocol (MCP) created real value by standardizing how LLM-based apps talk to each other.

If a provider exposes an API that follows the MCP spec, it can be connected to existing agentic systems with minimal effort.

This ease of integration is why so many companies have recently announced MCP support.

But from an engineering perspective, what are the risks?

An MCP server is, in essence, a microservice. It inherits the same risks that any microservice architecture carries: network errors that would never occur in a single-server setup, and the need for careful error-handling policies such as retries and partial-result handling.

In production, you need well-defined retry rules, timeout strategies, and recovery behavior.
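To make the point concrete, here is a minimal sketch of one such policy: a retry wrapper with exponential backoff and jitter around a remote tool call. Everything here is an assumption for illustration — `call_tool`, `TransientError`, and the retry and delay numbers are placeholders, not part of the MCP spec.

```python
import random
import time

class TransientError(Exception):
    """A network-level failure worth retrying (timeout, 5xx, connection reset)."""

def call_with_retry(call_tool, *, retries=3, base_delay=0.5, timeout=10.0):
    """Retry a remote tool call with exponential backoff and jitter.

    `call_tool` is a placeholder for whatever client function performs
    the actual MCP request; it receives a per-attempt timeout.
    """
    for attempt in range(retries + 1):
        try:
            return call_tool(timeout=timeout)
        except TransientError:
            if attempt == retries:
                raise  # retry budget exhausted: surface the failure
            # Exponential backoff with jitter, so many clients retrying
            # at once don't hammer the server in lockstep.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)
```

The same wrapper is also where a partial-result policy would live: instead of re-raising on the final attempt, an agent might return whatever results it has collected so far and flag the tool call as degraded.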

If there’s a load balancer or proxy sitting between production MCP servers, debugging becomes even harder. Observability tools like OpenTelemetry are practically a requirement to trace failures across services.

Streaming adds another layer of difficulty. LLM servers often need to stream their output, since users can’t wait for the final token. The first token must arrive quickly.
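The time-to-first-token argument is easy to show with a toy generator. This sketch is purely illustrative: `generate_tokens` stands in for an LLM backend, and the 0.05-second per-token delay is an assumed latency.

```python
import time
from typing import Iterator

def generate_tokens(prompt: str) -> Iterator[str]:
    """Stand-in for an LLM backend that produces tokens one at a time."""
    for word in ["Hello", " ", "world", "!"]:
        time.sleep(0.05)  # assumed per-token model latency
        yield word

def stream_response(prompt: str) -> Iterator[str]:
    """Forward each token as it arrives instead of buffering the reply.

    The user sees the first token after one model step, rather than
    waiting for the entire generation to finish.
    """
    for token in generate_tokens(prompt):
        yield token
```

With buffering, the user waits for all four steps before seeing anything; with streaming, the first word appears after a single step.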

That raises practical questions.

How should multiple devices viewing the same stream be handled?

What happens if another stream interrupts an ongoing one?

If the client disconnects mid-stream, how do you resume safely?
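One common answer to the resume question is offset-based replay, in the spirit of Server-Sent Events' `Last-Event-ID`: the server buffers tokens with indices, the client remembers the last index it received, and on reconnect it asks for everything after that. The `StreamSession` class below is a hypothetical sketch of that idea, not an MCP API.

```python
from typing import Iterator, List, Tuple

class StreamSession:
    """Server-side buffer that lets a disconnected client resume a stream.

    Illustrative only: a real implementation would also need session
    expiry and bounds on buffer size.
    """

    def __init__(self) -> None:
        self.buffer: List[str] = []

    def append(self, token: str) -> None:
        self.buffer.append(token)

    def read_from(self, offset: int) -> Iterator[Tuple[int, str]]:
        # Replay every token at or after `offset`, tagged with its index,
        # so the client can persist the last index it has seen.
        for i in range(offset, len(self.buffer)):
            yield i, self.buffer[i]
```

A client that saw tokens up to index 1 before dropping the connection simply reconnects and calls `read_from(2)`; no tokens are duplicated or lost, at the cost of the server holding the buffer until the session ends.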