Backend Conventions (Functions + Jobs)

This document defines how we write backend code in apps/functions so it is consistent, scalable, and “big-tech boring”: predictable contracts, safe retries, and clear module boundaries.

It is written to be actionable by engineers doing refactors.

Guardrails (non‑negotiables)

Public API is a contract: /v1/... endpoints are stable. Breaking changes require a new version and a deprecation policy.
Everything is retryable: any request or job may run more than once. Handlers MUST be idempotent.
Async-first architecture: user-facing endpoints should return quickly; long work is done by background jobs/workers.
Validate at boundaries: validate and normalize inputs at HTTP/job boundaries using schemas.
Observability by default: every request/job MUST emit correlated logs/traces and persist durable status.
Vendor coupling is allowed, but contained: GCP SDK usage should live in small adapters, not scattered through domain logic.

Paved path: Architecture we build toward

Public HTTP API

Express API (e.g. apps/functions/api.js) exposes /v1/... endpoints.
Endpoints should be thin: auth → validation → create resource → enqueue work → return ID.
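
As an illustration of the thin-endpoint flow above, here is a minimal sketch of an Express-shaped handler. It assumes auth and validation already ran in earlier middleware. `createSubmission`, `enqueuePublish`, `req.requestId`, and the response body are hypothetical names chosen for the example, not existing code in the repo.

```javascript
// Hypothetical factory: dependencies are injected so the handler stays thin and testable.
function makeCreateSubmissionHandler({ createSubmission, enqueuePublish }) {
  return async function createSubmissionHandler(req, res, next) {
    try {
      // Auth and schema validation are assumed to have run in earlier middleware.
      const submission = await createSubmission({
        userId: req.auth.userId,
        input: req.body,
      });
      await enqueuePublish({
        submissionId: submission.id,
        requestId: req.requestId,
      });
      // 202 Accepted: the work is queued, not finished.
      res.status(202).json({ id: submission.id, status: "queued" });
    } catch (err) {
      next(err); // Delegate to the shared error handler.
    }
  };
}
```

The handler holds no business rules: it creates the resource, enqueues the work, and returns the ID that clients later use to poll status.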

Async engine

Cloud Tasks queues drive orchestration and platform publishing workers.
Workers update durable state (Firestore) so clients can poll status and/or receive webhooks.

Layering (code boundaries)

http/: request/response shaping only (routers, handlers, status codes).
domain/ (target state): pure business logic (publishing rules, target resolution policy, mapping of states), no SDK calls.
adapters/gcp/ (target state): Firestore/Tasks/Storage wrappers; this is where vendor coupling lives.
integrations/: platform-specific API clients + translation (TikTok/X/YouTube/etc).
services/: application workflows (glue): calls domain + adapters + integrations.

Note: today we have a large shared/ folder. That’s OK. Refactors should move code toward the boundaries above.
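
The layering can be sketched in a few lines. The status rule, function names, and the in-memory Map (standing in for a Firestore wrapper) are assumptions made for illustration only.

```javascript
// domain/: pure logic, no SDK calls. (Hypothetical rule for illustration.)
function summarizeStatus(targetResults) {
  if (targetResults.length === 0) return "pending";
  if (targetResults.every((r) => r.ok)) return "published";
  if (targetResults.some((r) => r.ok)) return "partially_published";
  return "failed";
}

// adapters/: the only place that knows about storage. An in-memory Map
// stands in for a Firestore wrapper here.
function makeStatusStore(map = new Map()) {
  return {
    save: (submissionId, status) => map.set(submissionId, status),
    load: (submissionId) => map.get(submissionId),
  };
}

// services/: workflow glue that calls domain + adapters.
function recordPublishOutcome({ statusStore }, submissionId, targetResults) {
  const status = summarizeStatus(targetResults);
  statusStore.save(submissionId, status);
  return status;
}
```

Because `summarizeStatus` takes plain data and returns plain data, it can be unit tested without emulators, while the adapter can be swapped without touching the rule.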

Repository structure: what belongs where (today)

apps/functions/api.js
- Express app composition: middleware order, route mounting, 404, error handler.
apps/functions/http/routes/*
- Route definitions + middleware chain.
- Should call into services; avoid embedding business logic.
apps/functions/http/handlers/*
- Handler logic that is still HTTP-shaped (req/res). Keep small; delegate to services.
apps/functions/middleware/*
- Cross-cutting HTTP concerns: auth, validation, rate limiting, error handling, request IDs.
apps/functions/orchestrators/*
- Job/workflow orchestration. Should be “workflow glue,” not platform implementation details.
apps/functions/services/*
- Application-level workflows (media acquisition, AI transcription, template rendering, etc.).
apps/functions/integrations/*
- Platform-specific code: OAuth, token refresh, publish calls, polling, metrics.
apps/functions/shared/*
- Shared primitives and utilities. Refactor goal is to split into domain/ + adapters/ over time.

HTTP API standards (integration-friendly)

Request lifecycle (middleware order)

Use a consistent chain for all routes:

1. Request ID / trace context
2. Auth
3. Subscription / authorization
4. Rate limiting
5. Schema validation
6. Handler
7. Error handler (last)
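
The chain above can be sketched with Express-style middleware. To keep the example runnable without dependencies, a small `dispatch` function stands in for Express routing; the step names, header name, and error codes are hypothetical.

```javascript
// Each step is Express-style middleware: (req, res, next). Names are hypothetical.
const step = (name, check = () => null) => (req, res, next) => {
  req.trail.push(name);
  next(check(req)); // passing an error skips straight to the error handler
};

const chain = [
  step("requestId"),
  step("auth", (req) =>
    req.headers["x-api-key"] ? null : { status: 401, code: "unauthenticated" }),
  step("authorization"),
  step("rateLimit"),
  step("validate", (req) =>
    req.body ? null : { status: 400, code: "invalid_request" }),
  step("handler"),
];

// Registered last; Express identifies error handlers by their four arguments.
function errorHandler(err, req, res, next) {
  res.statusCode = err.status || 500;
  res.body = { error: { code: err.code || "internal" } };
}

// Minimal stand-in for Express dispatch so the sketch runs on its own.
function dispatch(req, res) {
  let i = 0;
  const next = (err) => {
    if (err) return errorHandler(err, req, res, () => {});
    const mw = chain[i++];
    if (mw) mw(req, res, next);
  };
  next();
}
```

With this ordering, an unauthenticated request never reaches rate limiting or validation, and every failure leaves through the same error handler.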

Authentication standards

Public API: API key auth for server-to-server integrations (Zapier/Make/n8n) is allowed.
Dashboard/API key management: Firebase ID token auth is allowed.

All auth middleware MUST set a single consistent auth context on the request (example: req.auth.userId).
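
A minimal sketch of two auth middlewares that produce the same context shape. `lookupApiKey` and `verifyIdToken` are injected, hypothetical dependencies (the latter is assumed to resolve to a decoded token with a `uid` field, as Firebase Admin's ID token verification does); the `x-api-key` header name and the `method` field are also assumptions.

```javascript
// Both middlewares produce the same shape, so downstream code never
// branches on how the caller authenticated.
const toAuthContext = (userId, method) => ({ userId, method });

function apiKeyAuth({ lookupApiKey }) {
  return async (req, res, next) => {
    let record;
    try {
      record = await lookupApiKey(req.headers["x-api-key"]);
    } catch (err) {
      return next(err); // a lookup failure is an internal error, not a 401
    }
    if (!record) return next({ status: 401, code: "unauthenticated" });
    req.auth = toAuthContext(record.userId, "api_key");
    next();
  };
}

function idTokenAuth({ verifyIdToken }) {
  return async (req, res, next) => {
    const token = (req.headers.authorization || "").replace(/^Bearer /, "");
    let decoded;
    try {
      decoded = await verifyIdToken(token);
    } catch {
      return next({ status: 401, code: "unauthenticated" });
    }
    req.auth = toAuthContext(decoded.uid, "firebase_id_token");
    next();
  };
}
```

Handlers and services then read `req.auth.userId` only, regardless of which middleware ran.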

Errors

Follow the project’s REST conventions doc (engineering/api-conventions-rest.md): stable error shape, consistent status codes.

Rules of thumb:

400: validation (client can fix payload)
401/403: auth/permission
404: missing resource
409: conflict/idempotency collision
429: rate limit exceeded
5xx: internal errors or upstream outage
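
One way to keep these rules consistent is a single mapping from internal error codes to HTTP responses. The code names and the `{ error: { code, message } }` body below are assumptions for illustration; the authoritative error shape is the one defined in engineering/api-conventions-rest.md.

```javascript
// Hypothetical internal codes mapped to the status codes listed above.
const STATUS_BY_CODE = new Map([
  ["invalid_request", 400],
  ["unauthenticated", 401],
  ["permission_denied", 403],
  ["not_found", 404],
  ["conflict", 409],
  ["rate_limited", 429],
]);

function toHttpError(err) {
  const status = STATUS_BY_CODE.get(err.code);
  if (status === undefined) {
    // Unknown errors become a generic 500 so internal details never leak.
    return {
      status: 500,
      body: { error: { code: "internal", message: "Internal error" } },
    };
  }
  return { status, body: { error: { code: err.code, message: err.message } } };
}
```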

Idempotency (required for side effects)

Endpoints that create resources or trigger side effects MUST support an Idempotency-Key header.

Same key + same endpoint + same auth context MUST return the same response.
If the same key is reused with a meaningfully different payload, return 409.
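
The two rules above can be sketched as a small store that scopes each key by auth context and endpoint. This is a sketch under stated assumptions: an in-memory Map stands in for durable storage, "meaningfully different" is interpreted as "different canonical JSON", and all names are hypothetical. It assumes payloads are parsed JSON (no `undefined` values).

```javascript
// Deterministic JSON: object keys are sorted so equal payloads compare equal.
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value && typeof value === "object") {
    const entries = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function makeIdempotencyStore(records = new Map()) {
  return {
    // Returns one of: fresh (with commit), replay (with response), conflict.
    begin({ userId, endpoint, key, payload }) {
      const scope = `${userId}|${endpoint}|${key}`;
      const fingerprint = stableStringify(payload);
      const existing = records.get(scope);
      if (!existing) {
        return {
          kind: "fresh",
          commit: (response) => records.set(scope, { fingerprint, response }),
        };
      }
      return existing.fingerprint === fingerprint
        ? { kind: "replay", response: existing.response }
        : { kind: "conflict" }; // caller responds with 409
    },
  };
}
```

A production version would also need to handle two in-flight requests with the same key, for example by reserving the key in a transaction before doing the work.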

Background jobs (Cloud Tasks) standards

Job payloads are versioned contracts

Every task payload MUST include the fields below, except where a field is marked optional or conditional:

version: integer (start at 1)
requestId: correlation ID (or inherit from trace context)
idempotencyKey: required when the job causes side effects
createdAt: ISO 8601 timestamp (optional)

Validate payloads at the worker boundary using schemas (Zod or @soku/schema).
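
To show the kind of check a worker boundary performs, here is a dependency-free sketch of the envelope validation. In practice this would be a Zod or @soku/schema schema; the hand-written validator, the `sideEffects` flag, and the error strings are stand-ins invented for the example.

```javascript
const SUPPORTED_VERSIONS = new Set([1]);

// Validates the common envelope fields of a task payload.
function parseTaskPayload(raw, { sideEffects }) {
  if (raw === null || typeof raw !== "object") {
    return { ok: false, errors: ["payload: expected an object"] };
  }
  const errors = [];
  if (!Number.isInteger(raw.version) || !SUPPORTED_VERSIONS.has(raw.version)) {
    errors.push("version: missing or unsupported");
  }
  if (typeof raw.requestId !== "string" || raw.requestId === "") {
    errors.push("requestId: required");
  }
  if (sideEffects && (typeof raw.idempotencyKey !== "string" || raw.idempotencyKey === "")) {
    errors.push("idempotencyKey: required for jobs with side effects");
  }
  if (
    raw.createdAt !== undefined &&
    (typeof raw.createdAt !== "string" || Number.isNaN(Date.parse(raw.createdAt)))
  ) {
    errors.push("createdAt: expected an ISO timestamp string");
  }
  if (errors.length > 0) return { ok: false, errors };
  const { version, requestId, idempotencyKey, createdAt } = raw;
  return { ok: true, value: { version, requestId, idempotencyKey, createdAt } };
}
```

Rejecting an unknown `version` explicitly is what lets payloads evolve: a worker that does not understand version 2 fails loudly instead of misreading it.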

Idempotency and retries (the core rule)

Cloud Tasks retries are normal. Workers MUST be safe under:

duplicate deliveries
timeouts and partial failures
worker redeploys mid-execution

Required pattern:

Persist a durable “run record” before calling external platforms.
If a run record indicates success already happened, return success without redoing the side effect.
Store platform IDs returned by upstreams (tweet/video/post IDs) and treat them as ground truth.
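
The required pattern can be sketched as follows. A Map stands in for the durable run-record store (in practice Firestore, ideally written in a transaction), and `publishToPlatform`, the record fields, and the status names are hypothetical.

```javascript
async function runPublish({ runs, publishToPlatform }, job) {
  const existing = runs.get(job.idempotencyKey);
  if (existing && existing.status === "succeeded") {
    return existing; // duplicate delivery: do not publish again
  }

  // Persist the run record before touching the external platform.
  runs.set(job.idempotencyKey, { status: "started", platformId: null });

  try {
    const { platformId } = await publishToPlatform(job);
    // The platform ID is the ground truth that the side effect happened.
    const record = { status: "succeeded", platformId };
    runs.set(job.idempotencyKey, record);
    return record;
  } catch (err) {
    runs.set(job.idempotencyKey, {
      status: "failed",
      platformId: null,
      errorCode: err.code || "unknown",
    });
    throw err; // rethrow so the queue can retry
  }
}
```

This sketch does not cover a crash between the platform call and the "succeeded" write. A record left in "started" is ambiguous, so a fuller implementation would check the platform (or use a platform-side idempotency token, where one exists) before publishing again.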

Trace propagation

If a request enqueues a task, it MUST propagate trace context (correlation) to the task payload and/or headers.
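
A minimal sketch of building an HTTP-target task that carries correlation in both the payload and the headers. It only constructs the task object and does not call the Cloud Tasks SDK. `workerUrl`, the `X-Request-Id` header, and the `trace` object are assumptions; `traceparent` is the W3C Trace Context header name.

```javascript
// Builds the `task` object for an HTTP-target Cloud Task.
function buildTask({ workerUrl, payload, trace }) {
  // Correlation travels in the payload so it survives even if headers are lost.
  const body = { ...payload, requestId: trace.requestId };
  return {
    httpRequest: {
      httpMethod: "POST",
      url: workerUrl,
      headers: {
        "Content-Type": "application/json",
        "X-Request-Id": trace.requestId,
        ...(trace.traceparent ? { traceparent: trace.traceparent } : {}),
      },
      // The body is sent as bytes; the Node.js client samples pass it base64-encoded.
      body: Buffer.from(JSON.stringify(body)).toString("base64"),
    },
  };
}
```

The worker can then read the request ID from either place and attach it to every log line it emits for that job.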

Queue naming and task naming

Queue names should be stable and descriptive (e.g. publish-x, publish-tiktok).
Task names should be deterministic when possible (so duplicates collapse) and include a stable identifier (submission ID / run ID).
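
A deterministic task name can be derived from stable identifiers, as in the sketch below. The function and its parameters are hypothetical; the resource path format and the allowed task ID characters (letters, digits, hyphens, underscores, up to 500 characters) follow the Cloud Tasks naming rules.

```javascript
function taskName({ project, location, queue, submissionId, runId }) {
  const raw = `${submissionId}-${runId}`;
  // Task IDs may contain only letters, digits, hyphens, and underscores.
  const taskId = raw.replace(/[^A-Za-z0-9_-]/g, "_").slice(0, 500);
  return `projects/${project}/locations/${location}/queues/${queue}/tasks/${taskId}`;
}
```

Because the same submission and run always yield the same name, a second enqueue attempt for the same work can be rejected by the queue as a duplicate rather than executed twice. Workers still need to be idempotent, since deduplication by name only applies for a limited window.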

Firestore standards

Single source of truth for status: all async workflows must update durable status documents that power UI + API status endpoints.
Prefer bounded data: avoid unbounded arrays in documents for high-churn data.
Transactions for concurrency: use transactions when multiple workers may race.
Timestamps: use serverTimestamp() for durable events; do not use client-supplied timestamps for authoritative state.
Pagination: use cursor pagination at the API boundary; create the Firestore indexes those queries need.
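
As an illustration of cursor pagination at the API boundary, the sketch below encodes the last item's sort key into an opaque cursor. An in-memory sorted array stands in for a Firestore query ordered by `createdAt` then `id`; the field names, cursor encoding, and `invalid_request` code are assumptions.

```javascript
// Opaque cursor: clients pass it back verbatim and never parse it.
function encodeCursor(lastItem) {
  const json = JSON.stringify({ createdAt: lastItem.createdAt, id: lastItem.id });
  return Buffer.from(json, "utf8").toString("base64url");
}

function decodeCursor(cursor) {
  try {
    const { createdAt, id } = JSON.parse(
      Buffer.from(cursor, "base64url").toString("utf8")
    );
    if (typeof createdAt !== "string" || typeof id !== "string") return null;
    return { createdAt, id };
  } catch {
    return null;
  }
}

// `sortedItems` must be ordered by (createdAt, id); `limit` must be >= 1.
function listPage(sortedItems, { limit, cursor }) {
  const after = cursor ? decodeCursor(cursor) : null;
  if (cursor && !after) {
    throw Object.assign(new Error("Invalid cursor"), { code: "invalid_request" });
  }
  const start = after
    ? sortedItems.findIndex(
        (it) =>
          it.createdAt > after.createdAt ||
          (it.createdAt === after.createdAt && it.id > after.id)
      )
    : 0;
  const rest = start === -1 ? [] : sortedItems.slice(start);
  const items = rest.slice(0, limit);
  const nextCursor = rest.length > limit ? encodeCursor(items[items.length - 1]) : null;
  return { items, nextCursor };
}
```

Including `id` as a tiebreaker keeps pages stable when several documents share the same timestamp.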

Integration standards (external platform APIs)

Timeouts + retries: external calls should have explicit timeouts and retry/backoff policy (do not retry non-retryable errors).
Error mapping: map platform errors into a small set of internal error codes (auth expired, media invalid, rate limited, platform outage, unknown).
Token lifecycle: refresh tokens in a dedicated module; do not refresh inline everywhere.
Media normalization: platform-specific constraints should live near platform code; cross-platform normalization should be reusable and tested.
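
The timeout, retry, and error-mapping rules can be sketched together. The status-to-code mapping, the choice of which codes are retryable, and the backoff constants are assumptions for illustration; real platforms need their own mapping. `call` is expected to enforce its own timeout and to reject with an error carrying `httpStatus`.

```javascript
// Internal codes from the list above; which ones retry is an assumption.
const RETRYABLE = new Set(["rate_limited", "platform_outage"]);

function mapPlatformError({ httpStatus }) {
  if (httpStatus === 401) return "auth_expired";
  if (httpStatus === 413 || httpStatus === 415 || httpStatus === 422) return "media_invalid";
  if (httpStatus === 429) return "rate_limited";
  if (httpStatus >= 500) return "platform_outage";
  return "unknown";
}

// Exponential backoff with a cap; jitter is omitted to keep the sketch deterministic.
function backoffMs(attempt, { baseMs = 500, maxMs = 30000 } = {}) {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function callWithRetry(call, { maxAttempts = 3, sleep = defaultSleep } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await call();
    } catch (err) {
      const code = mapPlatformError(err);
      // Non-retryable errors and exhausted attempts surface with an internal code.
      if (!RETRYABLE.has(code) || attempt >= maxAttempts) {
        throw Object.assign(err, { code });
      }
      await sleep(backoffMs(attempt));
    }
  }
}
```

An expired token fails on the first attempt with `auth_expired`, while a 503 is retried with growing delays until `maxAttempts` is reached.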

Logging & observability

Every HTTP request MUST have a request ID and trace context.
Every job execution MUST log:
- queue + task identifiers
- submission/run identifiers
- target platform/account
- outcome (success/failure) + error codes
- timing (duration)

Redaction rules:

NEVER log raw secrets (API keys, OAuth tokens).
Avoid logging full request bodies in production; log keys, sizes, and IDs.
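
A minimal sketch of both rules: a body summarizer that logs keys and sizes instead of values, and a job log entry carrying the fields listed above. The key-name pattern is a heuristic and the field names are hypothetical. It assumes bodies are parsed JSON.

```javascript
// Heuristic: any key that looks like a credential is never logged.
const SECRET_KEY_PATTERN = /(token|secret|password|api[-_]?key|authorization)/i;

function summarizeBody(body) {
  const summary = {};
  for (const [key, value] of Object.entries(body || {})) {
    summary[key] = SECRET_KEY_PATTERN.test(key)
      ? "[REDACTED]"
      : { type: typeof value, size: JSON.stringify(value ?? null).length };
  }
  return summary;
}

function jobLogEntry(job) {
  return {
    queue: job.queue,
    taskId: job.taskId,
    submissionId: job.submissionId,
    runId: job.runId,
    platform: job.platform,
    accountId: job.accountId,
    outcome: job.outcome, // "success" | "failure"
    errorCode: job.errorCode || null,
    durationMs: job.finishedAt - job.startedAt,
  };
}
```

Key-name matching catches the common cases only; secrets nested inside values or stored under unexpected keys still require care at the call site.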

Code review checklist (use in PRs)

Boundaries: HTTP code doesn’t embed business rules; platform code doesn’t drive orchestration policy.
Validation: all external inputs validated (HTTP + tasks).
Idempotency: create/charge/publish paths are safe to retry.
Status: durable status updates exist and are testable.
Errors: consistent error shape and actionable codes.
Observability: correlated logs/traces; no secrets in logs.
Tests: unit tests for business logic; integration tests for critical flows (emulators where possible).

Refactor playbook (how to “fix the codebase” safely)

When cleaning existing code, do it in this order to minimize risk:

1. Add/confirm contracts at boundaries (schemas for HTTP + task payloads).
2. Make execution idempotent (run records + dedupe keys) before changing structure.
3. Extract adapters (Firestore/Tasks/Storage wrappers) to reduce scattered vendor coupling.
4. Extract domain modules (pure logic) and write unit tests.
5. Reorganize folders once behavior is locked by tests.

LLM Notes

When adding new endpoints/jobs, follow the guardrails above and the repo’s existing docs:
- engineering/api-conventions-rest.md
- engineering/testing-strategy.md
- engineering/observability.md
Prefer small, composable modules with explicit inputs/outputs.
Do not introduce new architectural patterns (CQRS, event sourcing, etc.) without an ADR.
