# EdgeBits — Tech Stack & Language Principle

> Single source of truth for *which language goes where, and why.* Read this before proposing a new service.

## 🎯 Core principle

> **Go for the control plane. Python for the data plane. UI in TypeScript/React. Connectors and egress are language-agnostic microservices &mdash; pick what fits the protocol library and the dev/market timing.**

Per-category rationale below. Don't pick a language by author preference.

---

## 🟢 Control plane &mdash; Go

| Service | Path | Why Go |
|---|---|---|
| Edge Manager API | [`edge-manager/api/`](edge-manager/api/) | chi router, single binary, structured concurrency. Fleet management benefits from low per-goroutine overhead. |
| Edge Sync (DMZ poller) | [`edge/edge-sync/`](edge/edge-sync/) | Static binary on every edge node. SQLite offline queue, exponential backoff. Tiny footprint suits ARM gateways. |
| Edge SDK | [`sdk/edge/`](sdk/edge/) | Typed Go client for the Edge REST API. |
| Edge Manager SDK | [`sdk/edge-manager/`](sdk/edge-manager/) | Typed Go client for the Edge Manager REST API. Used by Edge Sync. |
| TUIs (`edgebits-edge-tui`, `edgebits-manager-tui`) | [`edge/tui/`](edge/tui/), [`edge-manager/tui/`](edge-manager/tui/) | bubbletea + lipgloss; ship as single binaries operators can `scp`. |

Go conventions: `gofmt`, `slog` JSON, `context.Context` first param, check every error, table-driven tests. Full rules in [.claude/rules/coding-standards.md](.claude/rules/coding-standards.md) §Go.

---

## 🐍 Data plane &mdash; Python

| Service | Path | Why Python |
|---|---|---|
| Edge Gateway | [`edge/gateway/`](edge/gateway/) | FastAPI proxy. Pydantic validation. Auto OpenAPI at `/openapi.json` + Swagger at `/docs`. **Intentional &mdash; stays in Python.** See *Reassessment* below. |
| Edge Core | [`edge/core/`](edge/core/) | Shared library: envelope, event_bus, models, scheduler, observability, registry. Imported by every Python data-plane service. |
| Pipeline Engine | [`edge/pipeline-engine/`](edge/pipeline-engine/) | Hosts the preset library ([`blocks/presets/`](blocks/presets/)). Pandas/numpy for aggregation. |
| Event Engine | [`edge/event-engine/`](edge/event-engine/) | Hosts the AST-safe Python expression evaluator + Python actions. |
| Buffer Manager | [`edge/buffer/`](edge/buffer/) | TimescaleDB writer with hybrid time/size flush. (See *Reassessment* below &mdash; strongest "should this be Go?" candidate.) |
| Analytics (planned) | `analytics/` | pandas + numpy for OEE/energy, ONNX for ML, ReportLab for reports, **`pyrfc`** for SAP RFC (Python-only). |

Python conventions: type hints, Pydantic models, async I/O, snake_case files, structlog, exact pinned versions, no bare `except:`. Full rules in [.claude/rules/coding-standards.md](.claude/rules/coding-standards.md) §Python.

---

## 🌐 UI &mdash; TypeScript / React

| Surface | Path | Stack |
|---|---|---|
| Edge UI | [`edge/ui/`](edge/ui/) | React 18 + Vite + Tailwind + shadcn/ui + Lucide |
| Edge Manager UI | [`edge-manager/ui/`](edge-manager/ui/) | React 18 + Vite + shadcn/ui + Lucide; 12 pages |
| Marketing site | [`website/`](website/) | Static HTML + CSS + Lucide via CDN; Plus Jakarta Sans + Roboto Mono |
| Documentation site | [`docs/`](docs/) | MkDocs + `mkdocs-shadcn` (theme migration to MkDocs Material under consideration &mdash; [issues/pending/chore-docs-site.md](issues/pending/chore-docs-site.md)) |

UI conventions: strict TypeScript, functional components, no `any`, shadcn/ui everywhere (no raw HTML), Sonner toasts, `<Skeleton>` for loading, Lucide icons only (no emojis), CSS custom props (no inline styles), `rounded-[var(--radius-sm)]`. Full rules in [.claude/rules/coding-standards.md](.claude/rules/coding-standards.md) §TypeScript/React.

---

## 🔌 Connectors / Egress &mdash; language-agnostic microservices

> Pick the language that gives the best protocol library, the right runtime characteristics, and the timeline the market needs. Each block is its own Docker container.

### Connectors (ingest)

| Connector | Language | Status | Why this language |
|---|---|---|---|
| Modbus TCP/RTU | C++ | 🟡 Stub | `libmodbus` is the de-facto library; perf matters at high poll rates. **Production work pending &mdash; cash cow per [market.md](internal/market.md#L55).** |
| MQTT | Go | ✅ Shipped | Eclipse Paho Go; goroutine-per-subscription scales for sensor fleets. |
| REST/HTTP ingest | Node | ✅ Shipped | Express + node-fetch; cleanest path to webhook receivers. |
| Siemens S7 | Python | ✅ Shipped (most mature) | `python-snap7`; async polling cadence. |
| OPC-UA | C (via [S2OPC](https://www.s2opc.com)) | 🛠️ Planned | Apache 2.0, OPC Foundation certified, **ANSSI CSPN visa**, **SIL3 / IEC 61508**. C-based footprint suits ARM. Real procurement credentials. |
| EtherNet/IP (CIP) | C (`libplctag` likely) | 🛠️ Planned | Mirror S2OPC posture; de-facto Allen-Bradley library. Footprint over ergonomics. |
| SAP IDoc + OData | Python | 🛠️ Planned | `pyrfc` for RFC + `httpx` for OData. SAP RFC has no Go binding &mdash; not optional. |

### Egress

| Egress | Language | Status | Why this language |
|---|---|---|---|
| AVEVA PI Web API | Python | ✅ Shipped | `httpx` async; PI AF extension stays in same codebase. |
| REST/HTTP push | Node | ✅ Shipped | Cleanest async-await + retry-loop ergonomics; circuit breaker. |
| Sparkplug B | Node | ✅ Shipped | Eclipse Tahu Node version is most actively maintained. |
| REST (Go scaffold) | Go | 🟡 No manifest | **Either kill or document why three REST egresses exist** &mdash; [issues/pending/chore-website-copy-honesty.md](issues/pending/chore-website-copy-honesty.md). |
| REST (C++ scaffold) | C++ | 🟡 No manifest | Same. |
| SAP RFC/BAPI | Python | 🛠️ Planned | `pyrfc` &mdash; only viable choice. |
| FactoryTalk Historian | TBD | 🛠️ Planned (priority 1 next) | Likely Python or .NET (Rockwell ecosystem). |
| Wonderware / AVEVA System Platform | TBD | 🛠️ Planned (priority 2) | Likely Python via OPC-UA gateway. |

### Connector / egress contract (language-agnostic)

Every connector and egress, regardless of language, must satisfy this contract. Per [.claude/rules/architecture.md](.claude/rules/architecture.md):

| Requirement | Protocol |
|---|---|
| Data exchange | MQTT publish/subscribe of JSON Envelope on UNS topics |
| Health check | HTTP `GET /health` |
| Registration | HTTP `POST {GATEWAY_URL}/api/v1/services` |
| Config | `SERVICE_ID` + `GATEWAY_URL` env vars only |

If a new connector can't satisfy this 5-step contract, the language choice was wrong.
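
A minimal sketch of the HTTP and config side of the contract table above, using only the Python standard library. The registration body (`{"service_id": ...}`) and the health response (`{"status": "ok"}`) are assumptions made for illustration; the real schemas are whatever the Edge Gateway's OpenAPI spec defines. MQTT publishing is left out because it needs a client library.

```python
import json
import os
import urllib.request
from http.server import BaseHTTPRequestHandler


def load_config(env=os.environ):
    """Read the only two settings the contract allows: SERVICE_ID and GATEWAY_URL."""
    try:
        return {"service_id": env["SERVICE_ID"], "gateway_url": env["GATEWAY_URL"]}
    except KeyError as missing:
        raise RuntimeError(f"missing required env var: {missing.args[0]}") from None


def registration_request(cfg):
    """Build (but do not send) POST {GATEWAY_URL}/api/v1/services."""
    # Assumption: this body shape is hypothetical, not the gateway's real schema.
    body = json.dumps({"service_id": cfg["service_id"]}).encode("utf-8")
    return urllib.request.Request(
        url=cfg["gateway_url"].rstrip("/") + "/api/v1/services",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200; every other path gets a 404."""

    def do_GET(self):
        if self.path == "/health":
            payload = json.dumps({"status": "ok"}).encode("utf-8")  # hypothetical body
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log via structlog
```

To serve the health check, pass `HealthHandler` to `http.server.HTTPServer(("0.0.0.0", port), HealthHandler)`; to register, pass the built request to `urllib.request.urlopen`. The same three pieces translate directly to Go, Node, or C++.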

### Decision matrix &mdash; new connector / egress

| Situation | Use this | Why |
|---|---|---|
| Vendor library only in language X | Language X | E.g., SAP RFC = `pyrfc` = Python. |
| Security cert / footprint critical | C with certified library | E.g., S2OPC for OPC-UA. |
| High-frequency polling (>1000 reads/sec) | C++ or Go | Memory + GC overhead matters. |
| Async pub/sub with many subs | Go or Node | Lightweight concurrency. |
| HTTP receiver / pusher | Node | Cleanest ergonomics. |
| ML / scientific computing inside | Python | Pandas/numpy/ONNX. |
| Tight memory (<32MB) | Go or C++ | Avoid Python; avoid Node. |
| Speed-to-ship over perf | Python or Node | Higher productivity per hour. |

Defaults when nothing else applies: **Python for ingest, Node for push egress.**
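
The matrix and its defaults can be read as an ordered rule list. The sketch below encodes it that way; the `Needs` fields and, in particular, the order in which rules are checked are assumptions, since the table does not state which row wins when several apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Needs:
    """Hypothetical description of a proposed connector or egress."""
    kind: str                                      # "ingest" or "egress"
    vendor_library_language: Optional[str] = None  # e.g. "Python" for pyrfc
    certified_stack_required: bool = False
    reads_per_sec: int = 0
    memory_budget_mb: Optional[int] = None
    many_subscriptions: bool = False
    http_receiver_or_pusher: bool = False
    ml_inside: bool = False
    speed_to_ship: bool = False


def pick_language(n: Needs) -> str:
    """Return the matrix recommendation; first matching rule wins (assumed order)."""
    if n.vendor_library_language:
        return n.vendor_library_language
    if n.certified_stack_required:
        return "C with certified library"
    if n.reads_per_sec > 1000:
        return "C++ or Go"
    if n.memory_budget_mb is not None and n.memory_budget_mb < 32:
        return "Go or C++"
    if n.many_subscriptions:
        return "Go or Node"
    if n.http_receiver_or_pusher:
        return "Node"
    if n.ml_inside:
        return "Python"
    if n.speed_to_ship:
        return "Python or Node"
    # Defaults when nothing else applies.
    return "Python" if n.kind == "ingest" else "Node"
```

For example, `pick_language(Needs(kind="egress", vendor_library_language="Python"))` returns `"Python"`, which matches the SAP RFC row.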

---

## 🧱 Building blocks (presets, rules, actions) &mdash; in-process Python

In-process within Pipeline Engine + Event Engine. Loaded by the Python process at startup via the catalog. Not microservices.

| Type | Path | Count |
|---|---|---|
| Presets | [`blocks/presets/`](blocks/presets/) | 15 (aggregate, enrich, filter, script, transform) |
| Rules | [`blocks/event-processors/rules/`](blocks/event-processors/rules/) | 6 (threshold, range, change, deadband, expression, chain) |
| Actions | [`blocks/event-processors/actions/`](blocks/event-processors/actions/) | 8 (log, webhook, event, setpoint, email, slack, mqtt_publish, config_tune) |

**Why in-process Python:** blocks are evaluated per envelope, at thousands of envelopes per second, so cross-process IPC per envelope would dominate the cost. The script blocks (`lua_script`, `python_expr`) use sandboxed AST evaluators and follow the same in-process model.
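
To illustrate what "sandboxed AST evaluator" means in general, here is a minimal allow-list evaluator built on the standard `ast` module. It is a sketch of the technique, not the Event Engine's actual evaluator; `safe_eval` and the set of allowed operators are assumptions.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
            ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}


def safe_eval(expression, variables):
    """Evaluate arithmetic/comparison expressions; reject every other syntax."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, bool)):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in variables:
                raise ValueError(f"unknown name: {node.id}")
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.USub, ast.Not)):
            value = visit(node.operand)
            return -value if isinstance(node.op, ast.USub) else not value
        if isinstance(node, ast.BoolOp):
            # Simplification: evaluates all operands (no short-circuit), returns a bool.
            values = [visit(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Compare):
            left = visit(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _CMP_OPS:
                    raise ValueError(f"disallowed comparison: {type(op).__name__}")
                right = visit(comparator)
                if not _CMP_OPS[type(op)](left, right):
                    return False
                left = right
            return True
        # Calls, attribute access, subscripts, lambdas, strings, etc. all end up here.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))
```

Because nothing is ever passed to `eval` or `exec`, an expression such as `__import__('os').system('x')` fails with `ValueError` at the `Call` node instead of running. The evaluation stays inside the host process, which is the point of the in-process model.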

---

## 💾 Storage

| Layer | Tech | Why |
|---|---|---|
| Edge config | SQLite | Single-file; survives restart; no external dep. Per-node config tree. |
| Edge buffer (telemetry) | TimescaleDB | Hybrid time/size flush; continuous aggregates planned. **Heavy for ARM &mdash; reassessment tracked in `issues/`.** |
| Edge scheduler | SQLite (APScheduler) | Cron + interval + manual; survives restart. |
| Edge Manager state | PostgreSQL (`pgx/v5`) | Tenant-isolated rows; `(tenant_id, ...)` keys on every multi-tenant table. Migrated from in-memory; `MemoryStore` deleted. |
| Edge Manager queue | Redis + asynq | Background jobs (heartbeat, deploy orchestration). |
| Analytics (planned) | TimescaleDB | Cross-site OEE / energy / reports with continuous aggregates. |
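
The "hybrid time/size flush" used by the edge buffer can be sketched as a batch that is written out when it reaches a row limit or when its oldest row exceeds an age limit, whichever comes first. `HybridBuffer`, `write_batch`, and the default thresholds below are assumptions for illustration, not the Buffer Manager's actual values.

```python
import time


class HybridBuffer:
    """Collects rows and flushes on size OR age, whichever triggers first."""

    def __init__(self, write_batch, max_rows=500, max_age_s=5.0, clock=time.monotonic):
        self._write_batch = write_batch  # callable that persists a list of rows
        self._max_rows = max_rows        # size trigger (assumed default)
        self._max_age_s = max_age_s      # time trigger (assumed default)
        self._clock = clock              # injectable so tests can fake time
        self._rows = []
        self._oldest = None              # clock reading when the batch started

    def add(self, row):
        if not self._rows:
            self._oldest = self._clock()
        self._rows.append(row)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if either trigger fired; returns True when a flush happened."""
        if not self._rows:
            return False
        too_big = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._oldest >= self._max_age_s
        if too_big or too_old:
            self.flush()
            return True
        return False

    def flush(self):
        """Write out whatever is buffered, regardless of triggers."""
        batch, self._rows = self._rows, []
        self._oldest = None
        if batch:
            self._write_batch(batch)
```

The age trigger is only checked when `add` or `maybe_flush` runs, so a quiet stream needs a periodic caller (a scheduler tick, for instance) to invoke `maybe_flush`; otherwise a partial batch would sit in memory until the next row arrives.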

---

## 🚌 Transport

| Channel | Protocol | Use |
|---|---|---|
| UNS event bus | MQTT (Mosquitto / EMQX) | Every envelope on `uns/{site}/{area}/{line}/{device}/{tag}`. The wire format. |
| Service control | HTTP / REST | Edge Manager &rlarr; Edge Sync; Edge UI &rlarr; Edge Gateway; SDK clients. |
| Service-management commands | MQTT (`cmd/{service_id}/*`) | Cold/hot reload, flush, write_setpoint. |
| Live debug tunnel | WSS via bastion | On-demand outbound when an engineer needs to reach a node. |
| Outbound edge poll | HTTPS only | Edge Sync polls every 30s. **No inbound ports on the edge.** |
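
A small sketch of building and parsing the topic shapes in the table above. The helper names are hypothetical; the segment validation relies on standard MQTT rules (`/` separates levels, `+` and `#` are wildcards and cannot appear in a published topic name). The exact command suffixes under `cmd/{service_id}/` are assumed to match the command names listed.

```python
UNS_LEVELS = ("site", "area", "line", "device", "tag")


def uns_topic(site, area, line, device, tag):
    """Build uns/{site}/{area}/{line}/{device}/{tag}, rejecting unsafe segments."""
    parts = (site, area, line, device, tag)
    for name, part in zip(UNS_LEVELS, parts):
        if not part or any(ch in part for ch in "/+#"):
            raise ValueError(f"invalid {name!r} segment: {part!r}")
    return "uns/" + "/".join(parts)


def parse_uns_topic(topic):
    """Split a UNS topic back into its five named levels."""
    prefix, *rest = topic.split("/")
    if prefix != "uns" or len(rest) != len(UNS_LEVELS) or not all(rest):
        raise ValueError(f"not a UNS topic: {topic!r}")
    return dict(zip(UNS_LEVELS, rest))


def command_topic(service_id, command):
    """Build a service-management topic, e.g. cmd/{service_id}/flush."""
    return f"cmd/{service_id}/{command}"
```

Rejecting `/` inside a segment matters more than it looks: a device named `press/3` would silently add a sixth level and break every subscriber that parses by position.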

---

## 📡 Observability &mdash; fasten (audit + correlation SDK)

[`fasten`](https://github.com/nerdapplabs/fasten) is the cross-language audit + correlation SDK that backs every state-changing operation across Edge Node, Edge Manager, and Edge Sync. Vendored as a git submodule at [`fasten/`](fasten/); installed editable so submodule pulls auto-reflect at runtime.

| Surface | Library | Use |
|---|---|---|
| Audit emission | `fasten.emit()` (Py) / `fasten.Emit()` (Go) | Every state change emits a row with `actor + target + tenant + request_id`. Codes declared in [`audit_codes.py`](edge/core/observability/audit_codes.py) and [`edge-manager/api/internal/audit/codes.go`](edge-manager/api/internal/audit/codes.go) before emission. |
| Request-ID correlation | `fasten.shim.http.RequestIDMiddleware` (Py) / `fasten.RequestID` (Go) | A single `request_id` flows across edge ↔ manager ↔ egress and surfaces in audit, sys, and api logs. |
| API request logging | `fasten.shim.http.APILogger` (Py) / `fasten.APILogger` (Go) | Every HTTP request → fasten api ring buffer with method / path / status / duration_ms. |
| Sys log capture | `fasten.shim.structlog.configure()` (Py one-call) / `slog.New(fasten.NewSlogHandler(base))` (Go) | structlog (Py) / slog (Go) → fasten ring buffer + console; same shape in both languages. |
| Mountable reader | `fasten.reader.router()` (Py) / `fasten.NewReader()` (Go) | Both products serve canonical `/api/v1/logs/{sys,api,audit}` from the same SDK handler. |
| Storage | `fasten.store.sqlite.SQLiteStore` (default) / `fasten[postgres]` (regulated) | SQLite for dev + edge ops; Postgres for compliance-tier deployments per [compliance-plan.md](docs/roadmap/compliance-plan.md). |
| Redaction | `fasten.redactor()` | Secret-key scrubbing before persistence; OWASP-aligned defaults + per-tenant `extra_redact_keys`. |

Adopters use **public API only** — zero private imports across edge / edge-manager / edge-sync. Cross-language parity: Python and Go talk to the same audit-row shape on the wire.
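
To make the redaction row concrete, here is a plain-Python sketch of key-based scrubbing before persistence. It does not use or reproduce fasten's API: `redact`, `DEFAULT_REDACT_KEYS`, and the mask string are assumptions, and the default key list is illustrative rather than fasten's actual OWASP-aligned set.

```python
# Illustrative defaults only; NOT fasten's real key list.
DEFAULT_REDACT_KEYS = frozenset({"password", "secret", "token", "api_key", "authorization"})
MASK = "***"


def redact(value, extra_redact_keys=()):
    """Return a copy of a nested dict/list with sensitive keys masked."""
    keys = DEFAULT_REDACT_KEYS | {k.lower() for k in extra_redact_keys}

    def scrub(node):
        if isinstance(node, dict):
            # Mask by key name (case-insensitive); recurse into everything else.
            return {k: MASK if str(k).lower() in keys else scrub(v) for k, v in node.items()}
        if isinstance(node, list):
            return [scrub(item) for item in node]
        return node

    return scrub(value)
```

The function builds a new structure rather than mutating its input, so the caller can still use the original payload after the scrubbed copy has been stored. Per-tenant additions map onto the `extra_redact_keys` argument.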

---

## 🔐 Auth &amp; identity

Defined in [docs/roadmap/edge-auth-plan.md](docs/roadmap/edge-auth-plan.md). Phase 1 lands on Days 1-3 of the [website-product-gap-fix.md](docs/roadmap/website-product-gap-fix.md) sprint. Library stack:

| Concern | Edge Node (Python) | Edge Manager (Go) |
|---|---|---|
| Password hashing | `argon2-cffi` (argon2id, OWASP defaults) | `github.com/alexedwards/argon2id` |
| JWT mint + validate | [Authlib](https://docs.authlib.org/) (`authlib.jose`) | `github.com/golang-jwt/jwt/v5` |
| OIDC client (Phase 4 SSO) | Authlib (already in stack) | `golang.org/x/oauth2` + `github.com/coreos/go-oidc/v3` |
| TOTP MFA (Phase 3) | `pyotp` | `github.com/pquerna/otp` |
| WebAuthn (Phase 3) | `webauthn` (py-webauthn) | `github.com/go-webauthn/webauthn` |
| Rate limit (login brute-force) | `slowapi` | `golang.org/x/time/rate` (single-instance) or `github.com/ulule/limiter/v3` (Redis-backed) |
| Federated identity (Phase 4) | &mdash; (EM is the relying party) | [Zitadel](https://zitadel.com) as IdP-of-IdPs (open-source, Go-based, multi-tenant via orgs) |

Architecture: **EM-master / EN-replica with sync-down.** EM holds users + access matrix + tokens; EN holds a per-node-scoped read-only cache; login happens locally on each EN against the cache (DMZ-tolerant by design). Zitadel federates to corporate ADs / Azure AD / Okta / Google Workspace; customers can also BYO any OIDC IdP — same EM code path.

---

## 📜 API contracts &mdash; both APIs publish OpenAPI

| API | Spec | Swagger UI |
|---|---|---|
| Edge Gateway (FastAPI) | `/openapi.json` (auto-generated) | `/docs` |
| Edge Manager (Go chi) | `/api/v1/openapi.json` ([docs.ServeSpec](edge-manager/api/internal/docs/swagger.go)) | `/docs` ([docs.SwaggerUI](edge-manager/api/internal/docs/swagger.go)) |

Both specs are buildable locally and ready for partner / customer integration work.

---

## 📦 Build / runtime

- **Containerised** &mdash; every connector/egress ships as `linux/amd64 + linux/arm64` Docker image (`docker buildx`).
- **Dev compose** &mdash; volume mounts + `docker compose watch`. Always use [`docker-compose.dev.yml`](edge/docker-compose.dev.yml), never the prod compose.
- **Production compose** is the same shape with hardening overlays (no source mounts, real secrets).
- **Debian packaging** via `fpm` for connectors that ship as `.deb` to plants without Docker.

---

## 🔄 Reassessment &mdash; should some Python services move to Go or C++?

The "Python data plane" default holds for most services, but two are worth revisiting under measured pressure. **Don't rewrite pre-emptively.** Recommendations are conditional on benchmarks, not folklore.

### Move candidates (ranked)

| Service | Today | Move to | Trigger | Effort |
|---|---|---|---|---|
| **Buffer Manager** | Python | **Go** | Write-rate or RAM footprint pain in a customer benchmark, OR when buffer-tech change forces a rewrite anyway | ~2-3 weeks |
| **S7 connector** | Python | **C/C++** | After the C++ template stabilises from Modbus + OPC-UA + EtherNet/IP work; consistency with the certified-stack pattern | ~2-3 weeks |

### Why these two (and not others)

- **Buffer Manager** is the hottest write path; `pgx` is expected to outperform `asyncpg` on batch INSERT/COPY (to be confirmed by benchmark, per the discipline rules); it couples loosely to Edge Core (envelope shape is 8 fields).
- **S7 connector** wraps a C library (`snap7`) in Python; removing the wrapper aligns with the C/C++ certified-stack pattern (libmodbus, S2OPC, libplctag).

### Stay Python

| Service | Why |
|---|---|
| Edge Gateway | **Intentional.** FastAPI + Pydantic gives schema-driven validation, auto OpenAPI + Swagger, and the shortest path to evolving the API contract. Control-plane-vs-data-plane logic is the wrong frame here &mdash; the gateway's job is *typed HTTP over MQTT + config_tree*, where Python's tooling wins. Rewrite would trade validated iteration speed for a marginal latency gain the workload doesn't need. |
| Edge Core | Shared by every data-plane service. Move requires moving every consumer. |
| Pipeline Engine | Hosts the preset library + engine primitives. |
| Event Engine | Hosts AST-safe Python expression evaluator + 8 actions. |
| AVEVA PI egress | REST batch workload; PI AF extension wants Python iteration speed. |
| Future Analytics | pandas/numpy/ONNX/ReportLab/`pyrfc` &mdash; Python-only ecosystem. |
| In-process presets / rules / actions | Coupled to host process language. |

### Discipline rules for any rewrite proposal

1. **Don't rewrite without a benchmark.** "Python is slow" is folklore until measured on this workload.
2. **Don't rewrite during the v1 customer push.** Every rewrite-week is a week not spent shipping Modbus / OPC-UA / FactoryTalk / Postgres migration.
3. **Don't rewrite without tests + CI baseline.** Untested service in another language = uncatchable regressions. Add tests to the Python version first.
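
For rule 1, a throughput measurement does not need much machinery. The harness below is a minimal sketch: `envelopes_per_second` and its parameters are hypothetical, and a real benchmark should replay recorded customer traffic against the actual service rather than a lambda.

```python
import time


def envelopes_per_second(handler, envelopes, repeats=5):
    """Run handler over the sample several times; report the best observed rate."""
    if not envelopes:
        raise ValueError("need at least one envelope to measure")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for envelope in envelopes:
            handler(envelope)
        best = min(best, time.perf_counter() - start)
    if best == 0:
        return float("inf")  # timer resolution too coarse for this sample
    return len(envelopes) / best
```

Taking the best of several runs reduces noise from scheduling and warm-up. Compare the result against the target rate for the workload before arguing that a rewrite is needed.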

---

## 🚫 Anti-patterns

- **Don't add a sixth language without a market reason.** Polyglot is OK when justified by a vendor library; not OK when it's author preference.
- **Don't write business logic in the Edge Gateway.** Gateway is a pure proxy; presets go to Pipeline Engine, actions go to Event Engine. ([master-plan.md:58](internal/master-plan.md#L58) flags the existing violation.)
- **Don't skip the 5-step connector contract.** If you can't fit the contract, redesign &mdash; don't shim around it.
- **Don't use Python for the control plane** even if "it'd be faster to write." Edge Manager + Edge Sync are Go because uptime + single-binary deployment matter.
- **Don't use Go for ML / numerical work.** SciPy/Pandas/ONNX live in Python.
- **Don't use Node for long-running compute.** Async I/O is its strength; CPU-bound work belongs elsewhere.

---

## 📚 Enforcement

- [.claude/rules/coding-standards.md](.claude/rules/coding-standards.md) — per-language style.
- [.claude/rules/architecture.md](.claude/rules/architecture.md) — connector/egress 5-step contract.
- [.claude/rules/edge.md](.claude/rules/edge.md) — Edge platform service taxonomy.
- This file is the *map* across all of the above. New service proposals consult this **first**.
