AI Can Do Everything Now - So What Should We Build?

[Hero image: OKAIBOX Dev Diary Day 2 - What Should Developers Build in the Age of AI?]

I was supposed to unbox the LattePanda IOTA today.

In Day 1, I even promised I’d crack open the hardware and share a detailed BOM breakdown. But before touching hardware, I need to sort out some thoughts first.

When AI Can Do Everything, What Should We Build?

I’ve been having a bit of an existential crisis lately.

It’s 2026. Fire up Cursor, say “build me this app,” and it just… appears. AI agents like OpenClaw have crossed 160K GitHub stars and can control a PC directly. MCP is becoming a standard protocol. AI reads files, fixes code, searches the web… anyone can do this now.

So the question that keeps nagging at me:

“What should I actually be building?”

When AI writes code for you, does writing code still matter? Even if I build an AI agent like OKAIBOX, OpenClaw is already doing it well. What kind of development actually means something in this era?

I sat with this question for a few days, trying different things. And I found my answer in an unexpected place.

AI Can’t Even Edit a Single Document

I have this quotation template we use at work - a .docx file. I asked AI to “update the amount in item 3.”

The text changed. But all the formatting was gone.

Table borders disappeared, fonts changed, merged cells came apart. The clean, professional quotation became a blob of plain text. I connected an MCP server directly and tried manipulating the XML structure, but style information kept getting stripped away.

HWP files (Korea’s standard word processor format)? Can’t even open them. “Unsupported format.” That’s it.

This got me thinking deeply.

Understanding “Content” vs. Handling “Form”

AI has reached near-perfect capability with content. Understanding text, grasping context, generating new content - it genuinely excels at this.

But form is an entirely different problem.

A document isn’t just a sequence of text. Fonts, margins, tables, merged cells, page breaks, headers/footers, style references… these intertwine in complex ways to create what we call a “document.”

Crack open a single DOCX file and you see the reality. It looks like one file, but it’s actually a ZIP archive containing dozens of XML parts: document.xml, styles.xml, numbering.xml, the .rels relationship files… modifying a single piece of text means keeping style references, numbering, and relationship data consistent across multiple files.
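You don’t need special tools to see this. Python’s standard zipfile module will list what’s inside - a quick sketch, with the file name as a placeholder:

import zipfile

# Any .docx is really a ZIP package; "quotation.docx" is a placeholder name.
with zipfile.ZipFile("quotation.docx") as z:
    for name in z.namelist():
        print(name)
# Typical entries: word/document.xml, word/styles.xml,
# word/numbering.xml, word/_rels/document.xml.rels, ...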

graph TD
    subgraph "Where AI excels"
        A[Plain Text] --> B[Content Understanding]
        B --> C[Text Modification]
        C --> D[Result Output]
    end

    subgraph "Where AI struggles"
        E[Formatted Document] --> F[Extract Archive]
        F --> G[Parse XML]
        G --> H[Map Styles]
        H --> I[Modify Content]
        I --> J[Reapply Styles]
        J --> K[Reassemble XML]
        K --> L[Recompress Archive]
    end

    style A fill:#e8f5e8
    style D fill:#e8f5e8
    style E fill:#ffebee
    style H fill:#fff3e0
    style J fill:#fff3e0

Left side: 4 steps. Right side: 8. But it’s not just about the number of steps.

The critical points are “Map Styles” and “Reapply Styles.” If even one piece of information gets lost here, the output breaks. And this isn’t a problem of “intelligence” - it’s a problem of tooling.

The AI model itself is smart enough. Explain a document structure and it understands. But there’s no intermediate tool that connects that understanding to actual file manipulation. Like having the world’s best hammer but no nails.
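To be concrete about what “intermediate tool” means: for DOCX, python-docx gets partway there. It edits run text in place and never touches the surrounding style or table XML, so formatting survives. A minimal sketch - the file name, table/row/cell indexes, and the new amount are all placeholders for illustration:

from docx import Document  # pip install python-docx

doc = Document("quotation.docx")      # placeholder file name
table = doc.tables[0]                 # assume the quotation is the first table
cell = table.rows[3].cells[2]         # assumed position of item 3's amount

# Replace only the text of the existing run; fonts, borders and merged cells
# stay intact because the style XML is never rewritten.
cell.paragraphs[0].runs[0].text = "1,250,000"

doc.save("quotation_edited.docx")

It only covers part of the format, and nothing comparable exists at all for the formats this post is really about.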

The Empty Layer in AI Development

Let’s zoom out.

The AI development ecosystem right now is buzzing with activity - chatbots, RAG systems, AI agents, automation tools. Everyone’s building valuable things on top of AI.

But something stands out. There’s a lot of action above AI, and it’s pretty quiet below.

Think about it this way:

graph TB
    subgraph "Where most AI development happens"
        App1[Chatbots]
        App2[RAG Systems]
        App3[AI Agents]
        App4[Automation Tools]
    end

    subgraph "AI Model Layer"
        LLM[GPT / Claude / Gemini]
    end

    subgraph "The empty layer"
        Infra1[Document Format Editing Engine]
        Infra2[Local File System Bridge]
        Infra3[Regional Service API Adapters]
        Infra4[Legacy Format Converters]
    end

    App1 --> LLM
    App2 --> LLM
    App3 --> LLM
    App4 --> LLM
    LLM --> Infra1
    LLM --> Infra2
    LLM --> Infra3
    LLM --> Infra4

    style App1 fill:#e1f5fe
    style App2 fill:#e1f5fe
    style App3 fill:#e1f5fe
    style App4 fill:#e1f5fe
    style LLM fill:#fff3e0
    style Infra1 fill:#ffebee
    style Infra2 fill:#ffebee
    style Infra3 fill:#ffebee
    style Infra4 fill:#ffebee

Plenty of apps are being built on top. Big tech companies are competing to improve the AI models in the middle. But the bottom layer - the infrastructure that connects AI to the real world - is largely empty.

Even if AI perfectly understands “change the amount in quotation item 3,” without a DOCX editing engine that preserves formatting, it simply can’t execute.

This is the gap in today’s AI development ecosystem. Vibrant on top, hollow underneath.

In Korea, This Gap Is Even Wider

This infrastructure layer is thin globally, but in Korea the situation is more severe.

| Area | Global Status | Korea Status |
| --- | --- | --- |
| Document Formats | DOCX editing libraries exist (incomplete) | No HWP/HWPX editing libraries |
| Messaging | WhatsApp/Telegram APIs well-established | KakaoTalk bot API limited |
| Government Automation | Mostly web-standard based | ActiveX/security programs required |
| Financial Services | Open banking APIs common | Complex certificate/security modules |

HWP is the prime example. It’s Korea’s proprietary document format by Hancom, with a binary structure that’s inherently difficult to parse. The official specification is only partially public, and building a proper editing library sometimes requires reverse engineering.
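For the curious: HWP 5.x sits on top of the same OLE compound-file container the old Office formats used, so you can at least enumerate its internal streams with olefile. Reading the records inside those streams is where the real work starts. A sketch - the file name is a placeholder and exact stream names vary by document:

import olefile  # pip install olefile

ole = olefile.OleFileIO("contract.hwp")   # placeholder file name
for stream in ole.listdir():              # each entry is a list of path parts
    print("/".join(stream))
ole.close()
# Expect streams like FileHeader, DocInfo, BodyText/Section0, ...
# Each holds packed (and often compressed) binary records - the part
# that needs a real parsing library.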

HWPX is better - it’s the next-gen format based on XML with an open structure. But a proper read-write-edit library? Doesn’t exist. Not in Python. Not in JavaScript.
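So today, handling HWPX means hand-rolling it: unzip, parse the section XML, edit, and write every other entry back byte-for-byte so nothing else changes. Roughly this - a sketch only, where the file names and the entry path are assumptions I still need to verify against real files:

import zipfile
from lxml import etree  # pip install lxml

SRC, DST = "report.hwpx", "report_edited.hwpx"   # placeholder file names
TARGET = "Contents/section0.xml"                 # assumed entry name

with zipfile.ZipFile(SRC) as zin, zipfile.ZipFile(DST, "w") as zout:
    for item in zin.infolist():
        data = zin.read(item.filename)
        if item.filename == TARGET:
            root = etree.fromstring(data)
            # ... locate and rewrite the text node you care about here ...
            data = etree.tostring(root, xml_declaration=True, encoding="UTF-8")
        # Re-use the original ZipInfo so per-entry compression and metadata survive.
        zout.writestr(item, data)

That’s a skeleton, not a library - tables, numbering, embedded objects, and every other real document feature still have to be handled on top of it.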

Will any global AI company build HWP support? No. Korea is the only country that uses it. OpenAI, Google, Anthropic - none of them will ever touch HWP.

This isn’t a problem that solves itself by waiting.

OKAIBOX Changes Direction

In Day 1, I introduced OKAIBOX as “Korea’s OpenClaw.” A hardware-based AI agent. KakaoTalk integration, native Windows, Korean service optimization.

But that alone would just make it a Korean version of OpenClaw. Playing on the surface.

What’s really needed is building the layer underneath.

graph LR
    subgraph "Day 1: Above the Surface"
        A1[OKAIBOX Hardware] --> A2[Windows 11 IoT]
        A2 --> A3[KakaoTalk Integration]
        A2 --> A4[Korean Web Automation]
        A2 --> A5["Open HWP<br/>(via Hangul program)"]
    end

    subgraph "Day 2: Below the Surface"
        B1[HWP/HWPX Editing Engine] --> B2[MCP Server]
        B2 --> B3[OKAIBOX]
        B2 --> B4[Claude / GPT]
        B2 --> B5[OpenClaw and<br/>all AI agents]
        B1 --> B6[Format-Preserving Edit]
        B1 --> B7[Table/Chart Manipulation]
    end

    style B1 fill:#e8f5e8
    style B2 fill:#e1f5fe

In Day 1, I planned to “run the Hangul program to handle HWP files.” But that’s just depending on the Hangul application. It’s not AI handling documents directly - it’s mimicking what humans already do.

The new direction: build an engine that directly parses and edits HWP/HWPX files, wrap it as an MCP server, and make it callable from any AI. Not just OKAIBOX - Claude, GPT, OpenClaw, anything can use this engine to handle Korean documents.

Not an app. Infrastructure.

From “Korea’s OpenClaw” to “Korean document infrastructure for AI.” That’s the pivot.
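To make the MCP half concrete, the tool surface could be as small as this - a sketch using the FastMCP helper from the official Python MCP SDK, where the server name, the replace_text tool, and the edit_hwpx_text function are all hypothetical stand-ins for the engine that doesn’t exist yet:

from mcp.server.fastmcp import FastMCP  # pip install mcp

mcp = FastMCP("hwpx-editor")

def edit_hwpx_text(path: str, old: str, new: str) -> None:
    """Stand-in for the real engine: parse, edit, and re-pack the HWPX file."""
    raise NotImplementedError("this is the part OKAIBOX still has to build")

@mcp.tool()
def replace_text(path: str, old: str, new: str) -> str:
    """Replace text in an HWPX document while preserving formatting."""
    edit_hwpx_text(path, old, new)
    return f"Replaced '{old}' with '{new}' in {path}"

if __name__ == "__main__":
    mcp.run()

Once something like that is registered, any MCP-capable client - Claude, GPT tooling, OpenClaw - can call it without knowing anything about HWPX internals.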

My Answer: Build the Infrastructure

Making it possible for AI to do what it currently can’t. Not touching the AI model itself, but expanding the contact surface where AI meets the real world. That’s the development I want to do.

Build an HWP parsing library, and every AI in the world can handle Korean documents. Build a format-preserving document editing engine, and AI can work with real business documents.

It’s not easy. You have to read file format specs, implement binary parsing, handle edge cases one by one. It’s not glamorous either. But one piece of infrastructure like this opens up countless possibilities for everything built on top.

This is the direction I found in the age of one-click everything.

Updated Roadmap

I’ve revised the plan from Day 1.

Priority 1: HWP/HWPX Editing Engine

  • Analyze the HWPX file format (starting with XML-based format)
  • Implement read/write/edit library
  • Format preservation as the key goal

Priority 2: MCP Server

  • Wrap the engine with MCP protocol
  • Make it immediately usable from Claude, GPT, etc.
  • Include DOCX format-preserving editing

Priority 3: OKAIBOX Hardware (in parallel)

  • LattePanda IOTA setup continues
  • Hardware and software development in parallel

The LattePanda unboxing has been pushed to Day 3 or Day 4. I’ll be tearing apart the HWPX file format in Day 3.

Wrapping Up

What’s the most valuable thing a developer can do in the age of AI?

Building tools that let AI reach places it currently can’t. That’s the answer I found.

OKAIBOX started with a shallow direction - “Korea’s OpenClaw.” But hitting real walls showed me what’s actually needed. The reality that AI can’t edit a single Hangul file. Filling that gap feels like the work I should be doing.

Next up, I’ll be dissecting actual HWPX files. It’s XML-based, so it should be more approachable than DOCX… right?


Series: OKAIBOX Dev Diary