First App!
Day 2 on the journey to create a distributed graph-based coding AI.
Frustrated by the poor capabilities of even the largest agents on very large codebases, I've started work on a distributed graph-based coding copilot, working name for now: gcoder.
Currently, even Claude Code and Cursor with Sonnet 4.5 [1M] fail to reliably debug large codebases (see my other blog posts on the Heimdall architecture), and at a cost of $1 per minute it gets extremely expensive.
My goal is to create a coding AI that can work on my projects at scale: cheaply, accurately, and fast. The app below was generated in 30 seconds and cost $0, thanks to using local LLM models.
Day 2 - first app created
Good progress today: I totally rewrote the entire codebase, and confirmed its capability by having it generate and run its first working code. Here's the prompt I gave it:
Create a Flask web application that:
1. Has a form where users enter a name
2. Uses Anthropic Claude API to analyze the name's origin and meaning
3. Displays the AI-generated analysis
4. Includes error handling
Requirements
- Flask web framework
- Anthropic Python SDK
- HTML templates with Bootstrap styling
- Proper error handling and validation
- Docker containerization
- Environment variable configuration

A small Flask app that looks up the etymology of your name. This tests a few things:
- Ensuring all imports and requirements line up
- Importing secrets
- Running the app in a local Docker container
- Running tests against the app
- Debugging runtime errors
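For reference, here's a minimal sketch of the kind of app the prompt describes. To be clear, this is not the generated code itself; the route, template name, and model choice are illustrative assumptions:

```python
# Sketch of a name-etymology Flask app (illustrative, not gcoder's actual output).
import os

import anthropic
from flask import Flask, render_template, request

app = Flask(__name__)
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])  # secret from env var

@app.route("/", methods=["GET", "POST"])
def index():
    analysis, error = None, None
    if request.method == "POST":
        name = request.form.get("name", "").strip()
        if not name:
            error = "Please enter a name."            # basic validation
        else:
            try:
                msg = client.messages.create(
                    model="claude-3-5-haiku-latest",  # assumed; any Claude model works
                    max_tokens=512,
                    messages=[{"role": "user",
                               "content": f"Analyze the origin and meaning of the name '{name}'."}],
                )
                analysis = msg.content[0].text
            except anthropic.APIError as exc:          # error-handling requirement
                error = f"API error: {exc}"
    return render_template("index.html", analysis=analysis, error=error)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Pair that with a Bootstrap-styled templates/index.html and a Dockerfile that passes ANTHROPIC_API_KEY through the environment, and every requirement in the prompt is covered.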

Overall, not bad for a single day's work. I think this proves the architecture is worth taking forward.
Technical Notes
Core Innovation
Traditional copilots treat code as text and dump entire repositories into LLM context windows. Heimdall instead:
- Represents everything as a graph - Code, domains, runtime, plans, and security as interconnected nodes
- Uses graph queries first - Structure-based operations are instant and free (see the sketch below)
- Falls back to semantic search - Vector embeddings only when graph queries are insufficient
- Reserves LLMs for reasoning - Planning, decisions, and generation only
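To make the graph-first idea concrete, here's a small sketch of a structure-based query using the FalkorDB Python client. The graph name, node labels, and relationship types are my own illustrative assumptions, not gcoder's actual schema:

```python
# Sketch: answering "who calls parse_config?" as a pure graph traversal.
# Labels (:Function), edges [:CALLS], and the graph name are assumed for illustration.
from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
graph = db.select_graph("codebase")

result = graph.query(
    """
    MATCH (caller:Function)-[:CALLS*1..5]->(:Function {name: 'parse_config'})
    RETURN DISTINCT caller.name, caller.file
    """
)
for name, path in result.result_set:
    print(f"{name} ({path})")
```

Because the structure is explicit in the graph, questions like this cost no tokens at all; no repository dump, no context window.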
The Three-Tier Intelligence Strategy
Tier 1: Graph Operations (70% of requests)
├─ OpenCypher queries on FalkorDB
├─ Pattern matching, traversals, graph algorithms
├─ Cost: $0, Latency: 50-200ms
└─ Use: Structure-based queries
Tier 2: Vector DB Embeddings (20% of requests)
├─ BERT embeddings via local encoder
├─ Semantic search when graph insufficient
├─ Cost: $0 (self-hosted), Latency: 50-100ms
└─ Use: Natural language → code mapping, external log linking
Tier 3: LLM Reasoning (10% of requests)
├─ Local: Qwen 32B, DeepSeek 33B, Llama 70B (80% of LLM work)
├─ API: Claude Haiku 4 (15% of LLM work)
├─ API: Claude Sonnet 4.5 (5% of LLM work - complex planning only)
├─ Cost: $0-$0.40/request, Latency: 1-5s
└─ Use: Planning, architectural decisions, code generation
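None of this works unless requests are routed cheaply before any model is invoked. Here's a minimal sketch of such a router; the heuristics, threshold, and stub backends are illustrative assumptions, not the actual gcoder implementation:

```python
# Sketch of a three-tier router: graph first, embeddings second, LLM last.
STRUCTURAL_HINTS = ("who calls", "where is", "depends on", "subclasses of")

def graph_query(request: str) -> list:
    return []        # stub: would run OpenCypher against FalkorDB (Tier 1)

def semantic_search(request: str) -> tuple[list, float]:
    return [], 0.0   # stub: would embed the request and search the vector DB (Tier 2)

def llm_reason(request: str) -> str:
    return "plan"    # stub: local model first, escalating to Claude when needed (Tier 3)

def route(request: str):
    # Tier 1: structural questions map straight onto the graph - free and fast.
    if any(hint in request.lower() for hint in STRUCTURAL_HINTS):
        rows = graph_query(request)
        if rows:
            return "tier1", rows
    # Tier 2: fuzzy natural-language lookups go to local embeddings.
    hits, score = semantic_search(request)
    if hits and score > 0.75:    # assumed confidence threshold
        return "tier2", hits
    # Tier 3: only genuine reasoning/generation reaches an LLM.
    return "tier3", llm_reason(request)

print(route("who calls parse_config?"))  # falls through to tier3 with these stubs
```

With real backends, the graph and vector tiers should absorb roughly 90% of traffic before an LLM is ever touched.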
Success Metrics
Projected vs Traditional Copilots:
- ~97% cost reduction ($1,900/mo vs $73,000/mo at 1K req/day)
- 4x faster (5s avg vs 20s+)
- 10x more accurate (graph-based precision vs text-based guessing)
- No context window limits (scale is bounded by the graph, not the prompt)
Targets:
- Code generation success: >80%
- Bug localization accuracy: >60% (top-5)
- Plan accuracy: >85%
- Latency P95: <7s
I'm making a significant number of additions today, including quite a few around contract negotiation. In this graph coder, contracts are all the external dependencies: APIs, configs, requirements, secrets, DB tables/fields/ORM, etc.
Any change to an interface gets negotiated through the graph planner, between all the parties involved, in an event-driven manner.
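To illustrate, here's a rough sketch of how a contract change and its negotiation could be modeled; every class, field, and method name is hypothetical:

```python
# Sketch: event-driven negotiation of a contract (interface) change.
from dataclasses import dataclass, field

@dataclass
class ContractChange:
    contract: str                  # e.g. "db.users.email" or "GET /api/users"
    proposed_by: str               # the node proposing the change
    diff: str                      # description of the interface change
    approvals: set = field(default_factory=set)

@dataclass
class Party:
    name: str
    def review(self, change: ContractChange) -> bool:
        # A real party would re-check its own contracts in the graph; accept by default here.
        return True

class GraphPlanner:
    def __init__(self):
        self.subscribers: dict[str, list[Party]] = {}  # contract -> dependent parties

    def subscribe(self, contract: str, party: Party) -> None:
        self.subscribers.setdefault(contract, []).append(party)

    def propose(self, change: ContractChange) -> bool:
        # Fan the proposal out to every party that depends on this contract;
        # it is applied only if all of them accept - one veto blocks the change.
        for party in self.subscribers.get(change.contract, []):
            if not party.review(change):
                return False
            change.approvals.add(party.name)
        return True

planner = GraphPlanner()
planner.subscribe("db.users.email", Party("auth-service"))
planner.subscribe("db.users.email", Party("billing-service"))
print(planner.propose(ContractChange("db.users.email", "gcoder", "widen to 320 chars")))
```

The key point is that the dependents are discovered through the graph itself, so the planner never has to guess who is affected by an interface change.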

This is all very early days, so I expect I'll make significant changes to the whole system as I learn more.
See you next post.