Backend Conventions (Functions + Jobs)
This document defines how we write backend code in apps/functions so it is consistent, scalable, and “big-tech boring”: predictable contracts, safe retries, and clear module boundaries.
It is written to be actionable by engineers doing refactors.
Guardrails (non‑negotiables)
- Public API is a contract: /v1/... endpoints are stable. Breaking changes require versioning + deprecation policy.
- Everything is retryable: any request or job may run more than once. Handlers MUST be idempotent.
- Async-first architecture: user-facing endpoints should return quickly; long work is done by background jobs/workers.
- Validate at boundaries: validate and normalize inputs at HTTP/job boundaries using schemas.
- Observability by default: every request/job MUST emit correlated logs/traces and write durable status state.
- Vendor coupling is allowed, but contained: GCP SDK usage should live in small adapters, not scattered through domain logic.
Paved path: Architecture we build toward
Public HTTP API
- Express API (e.g. apps/functions/api.js) exposes /v1/... endpoints.
- Endpoints should be thin: auth → validation → create resource → enqueue work → return ID.
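The thin-endpoint flow can be sketched as a handler factory. This is a minimal illustration, not code from the repo: makeCreateSubmissionHandler, validate, createSubmission, enqueuePublish, the error body shape, and the 202 status are all assumptions, and req.auth / req.requestId are assumed to be set by upstream middleware.

```javascript
// Hypothetical thin handler. Every collaborator is injected, so the HTTP layer
// holds no business logic and can be tested without Express or GCP.
function makeCreateSubmissionHandler({ validate, createSubmission, enqueuePublish }) {
  return async function handleCreateSubmission(req, res) {
    // 1. Auth: assumed to have been resolved by upstream auth middleware.
    if (!req.auth || !req.auth.userId) {
      return res.status(401).json({ error: { code: "unauthenticated" } });
    }
    // 2. Validation at the boundary.
    const parsed = validate(req.body);
    if (!parsed.ok) {
      return res.status(400).json({ error: { code: "invalid_request", details: parsed.errors } });
    }
    // 3. Create the durable resource first so clients have something to poll.
    const submission = await createSubmission(req.auth.userId, parsed.value);
    // 4. Enqueue the long-running work instead of doing it inline.
    await enqueuePublish({ submissionId: submission.id, requestId: req.requestId });
    // 5. Return the ID right away.
    return res.status(202).json({ id: submission.id, status: "queued" });
  };
}
```

Because the handler only sequences injected functions, the publishing work itself never runs on the request path.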
Async engine
- Cloud Tasks queues drive orchestration and platform publishing workers.
- Workers update durable state (Firestore) so clients can poll status and/or receive webhooks.
Layering (code boundaries)
- http/: request/response shaping only (routers, handlers, status codes).
- domain/ (target state): pure business logic (publishing rules, target resolution policy, mapping of states), no SDK calls.
- adapters/gcp/ (target state): Firestore/Tasks/Storage wrappers; this is where vendor coupling lives.
- integrations/: platform-specific API clients + translation (TikTok/X/YouTube/etc).
- services/: application workflows (glue): calls domain + adapters + integrations.
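To illustrate how the layers are meant to relate, here is a small sketch with one function per layer. The names (summarizeTargets, makeStatusStore, refreshSubmissionStatus) and the status values are invented for the example, and an in-memory Map stands in for Firestore.

```javascript
// domain/ (hypothetical): pure logic, no SDK imports. Status names are illustrative.
function summarizeTargets(targetStatuses) {
  if (targetStatuses.every((s) => s === "pending")) return "pending"; // also covers []
  if (targetStatuses.some((s) => s === "pending" || s === "running")) return "running";
  if (targetStatuses.every((s) => s === "succeeded")) return "succeeded";
  if (targetStatuses.every((s) => s === "failed")) return "failed";
  return "partially_failed";
}

// adapters/gcp/ (hypothetical): the only layer that would touch Firestore.
// A Map stands in for the real SDK in this sketch.
function makeStatusStore(docs = new Map()) {
  return {
    async getTargets(submissionId) {
      return docs.get(submissionId)?.targets ?? [];
    },
    async setStatus(submissionId, status) {
      docs.set(submissionId, { ...(docs.get(submissionId) ?? {}), status });
    },
  };
}

// services/ (glue): sequences domain + adapter and owns no rules of its own.
async function refreshSubmissionStatus(store, submissionId) {
  const targets = await store.getTargets(submissionId);
  const status = summarizeTargets(targets.map((t) => t.status));
  await store.setStatus(submissionId, status);
  return status;
}
```

The domain function can be unit-tested with plain arrays, while the service is testable with any object that implements the two adapter methods.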
Note: today you have a large shared/ folder. That’s OK. Refactors should move code toward the boundaries above.
Repository structure: what belongs where (today)
apps/functions/api.js
- Express app composition: middleware order, route mounting, 404, error handler.
apps/functions/http/routes/*
- Route definitions + middleware chain.
- Should call into services; avoid embedding business logic.
apps/functions/http/handlers/*
- Handler logic that is still HTTP-shaped (req/res). Keep small; delegate to services.
apps/functions/middleware/*
- Cross-cutting HTTP concerns: auth, validation, rate limiting, error handling, request IDs.
apps/functions/orchestrators/*
- Job/workflow orchestration. Should be “workflow glue,” not platform implementation details.
apps/functions/services/*
- Application-level workflows (media acquisition, AI transcription, template rendering, etc.).
apps/functions/integrations/*
- Platform-specific code: OAuth, token refresh, publish calls, polling, metrics.
apps/functions/shared/*
- Shared primitives and utilities. Refactor goal is to split into domain/ + adapters/ over time.
HTTP API standards (integration-friendly)
Request lifecycle (middleware order)
Use a consistent chain for all routes:
- Request ID / trace context
- Auth
- Subscription / authorization
- Rate limiting
- Schema validation
- Handler
- Error handler (last)
Authentication standards
- Public API: API key auth for server-to-server integrations (Zapier/Make/n8n) is allowed.
- Dashboard/API key management: Firebase ID token auth is allowed.
All auth middleware MUST set a single consistent auth context on the request (example: req.auth.userId).
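A sketch of two auth middlewares that both write the same req.auth shape. The x-api-key header name, the method field, and the injected lookupApiKey function are assumptions made for illustration. verifyIdToken is also injected; it is assumed to behave like the Firebase Admin call of the same name, which resolves to a decoded token carrying uid.

```javascript
function unauthenticated(message) {
  return Object.assign(new Error(message), { code: "unauthenticated" });
}

// API key auth for server-to-server callers. lookupApiKey is hypothetical and
// is expected to resolve to { userId } or null.
function apiKeyAuth(lookupApiKey) {
  return async (req, _res, next) => {
    let owner = null;
    try {
      const key = req.headers["x-api-key"];
      owner = key ? await lookupApiKey(key) : null;
    } catch (err) {
      return next(err); // a lookup failure is an internal error, not a 401
    }
    if (!owner) return next(unauthenticated("invalid API key"));
    req.auth = { userId: owner.userId, method: "api_key" };
    return next();
  };
}

// ID token auth for the dashboard. verifyIdToken is injected so the sketch
// does not depend on the Firebase Admin SDK.
function idTokenAuth(verifyIdToken) {
  return async (req, _res, next) => {
    const match = /^Bearer (.+)$/.exec(req.headers.authorization || "");
    if (!match) return next(unauthenticated("missing bearer token"));
    let decoded;
    try {
      decoded = await verifyIdToken(match[1]);
    } catch {
      return next(unauthenticated("invalid ID token"));
    }
    req.auth = { userId: decoded.uid, method: "firebase_id_token" };
    return next();
  };
}
```

Downstream handlers then read req.auth.userId without caring which mechanism authenticated the caller.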
Errors
Follow the project’s REST conventions doc (engineering/api-conventions-rest.md): stable error shape, consistent status codes.
Rules of thumb:
- 400: validation (client can fix payload)
- 401/403: auth/permission
- 404: missing resource
- 409: conflict/idempotency collision
- 429: rate limit exceeded
- 5xx: internal errors or upstream outage
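The rules of thumb above could be centralized in a single mapping used by the error handler. The internal code names and the response body shape below are placeholders; the authoritative shape lives in engineering/api-conventions-rest.md.

```javascript
// Illustrative internal error codes mapped to HTTP statuses.
const STATUS_BY_CODE = new Map([
  ["validation_failed", 400],
  ["unauthenticated", 401],
  ["forbidden", 403],
  ["not_found", 404],
  ["idempotency_conflict", 409],
  ["rate_limited", 429],
  ["upstream_unavailable", 503],
]);

// Turn an internal error into { status, body }. Unknown codes become 500,
// and 5xx responses never echo the internal message back to the client.
function toHttpError(err, requestId) {
  const known = STATUS_BY_CODE.has(err.code);
  const status = known ? STATUS_BY_CODE.get(err.code) : 500;
  return {
    status,
    body: {
      error: {
        code: known ? err.code : "internal",
        message: status >= 500 ? "Something went wrong" : err.message,
        requestId,
      },
    },
  };
}
```

Keeping the mapping in one place means handlers throw coded errors and never pick status codes themselves.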
Idempotency (required for side effects)
Endpoints that create resources or trigger side effects MUST support an Idempotency-Key header.
- Same key + same endpoint + same auth context MUST return the same response.
- If the same key is reused with a meaningfully different payload, return 409.
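One way to implement these two rules is to store a payload fingerprint next to the saved response. The sketch below uses an in-memory Map as a stand-in for a durable store; the class and method names are hypothetical.

```javascript
const crypto = require("node:crypto");

// In-memory stand-in for a durable idempotency store (for example a Firestore
// collection keyed by the scoped key).
class IdempotencyStore {
  constructor() {
    this.records = new Map();
  }

  // Scope the client's key by auth context and endpoint so keys from
  // different users or routes never collide.
  static scope(userId, endpoint, key) {
    return `${userId}:${endpoint}:${key}`;
  }

  // JSON.stringify is key-order sensitive; a canonical serializer over the
  // fields that count as "meaningful" would be more robust.
  static fingerprint(payload) {
    return crypto.createHash("sha256").update(JSON.stringify(payload)).digest("hex");
  }

  // Returns { fresh: true } for a new key, { replay: true, response } for a
  // repeat, or { conflict: true } when the key is reused with another payload.
  check(scopedKey, payload) {
    const existing = this.records.get(scopedKey);
    if (!existing) return { fresh: true };
    if (existing.fingerprint !== IdempotencyStore.fingerprint(payload)) {
      return { conflict: true };
    }
    return { replay: true, response: existing.response };
  }

  save(scopedKey, payload, response) {
    this.records.set(scopedKey, {
      fingerprint: IdempotencyStore.fingerprint(payload),
      response,
    });
  }
}
```

A handler would return the stored response on replay and a 409 on conflict. With a real store, check and save need to be a single atomic create so that two concurrent requests with the same key cannot both see a fresh result.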
Background jobs (Cloud Tasks) standards
Job payloads are versioned contracts
Every task payload MUST include:
- version: integer (start at 1)
- requestId: correlation ID (or inherit from trace context)
- idempotencyKey: required when the job causes side effects
- createdAt: ISO timestamp (optional)
Validate payloads at the worker boundary using schemas (Zod or @soku/schema).
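As a rough picture of what the worker-boundary check enforces, here is a hand-rolled validator for the envelope fields. It is a stand-in for a real Zod or @soku/schema schema, and parseTaskPayload and its option name are hypothetical.

```javascript
// Validates the common envelope of a task payload. Job-specific fields would
// be checked by the job's own schema.
function parseTaskPayload(raw, { requiresIdempotencyKey = true } = {}) {
  if (raw === null || typeof raw !== "object" || Array.isArray(raw)) {
    return { ok: false, errors: ["payload must be an object"] };
  }
  const errors = [];
  if (!Number.isInteger(raw.version) || raw.version < 1) {
    errors.push("version must be an integer >= 1");
  }
  if (typeof raw.requestId !== "string" || raw.requestId === "") {
    errors.push("requestId must be a non-empty string");
  }
  if (
    requiresIdempotencyKey &&
    (typeof raw.idempotencyKey !== "string" || raw.idempotencyKey === "")
  ) {
    errors.push("idempotencyKey is required for jobs with side effects");
  }
  // Loose check: Date.parse accepts more than strict ISO 8601.
  if (
    raw.createdAt !== undefined &&
    (typeof raw.createdAt !== "string" || Number.isNaN(Date.parse(raw.createdAt)))
  ) {
    errors.push("createdAt must be a timestamp string");
  }
  return errors.length > 0 ? { ok: false, errors } : { ok: true, value: raw };
}
```

A payload that fails validation will fail the same way on every retry, so a worker would typically record the failure and acknowledge the task rather than let it be redelivered.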
Idempotency and retries (the core rule)
Cloud Tasks retries are normal. Workers MUST be safe under:
- duplicate deliveries
- timeouts and partial failures
- worker redeploys mid-execution
Required pattern:
- Persist a durable “run record” before calling external platforms.
- If a run record indicates success already happened, return success without redoing the side effect.
- Store platform IDs returned by upstreams (tweet/video/post IDs) and treat them as ground truth.
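The required pattern can be sketched as a wrapper around the platform call. A Map stands in for the durable run-record store, and publishOnce, the record fields, and the status values are illustrative.

```javascript
// In-memory stand-in for durable run records (Firestore in practice).
const runRecords = new Map();

// Runs publishToPlatform at most once per successful runId.
async function publishOnce(runId, publishToPlatform) {
  const existing = runRecords.get(runId);
  // Success already recorded: return the stored platform ID, skip the side effect.
  if (existing && existing.status === "succeeded") {
    return { platformId: existing.platformId, replayed: true };
  }
  // Persist the record BEFORE the external call so a crash leaves evidence.
  const attempts = (existing?.attempts ?? 0) + 1;
  runRecords.set(runId, { status: "running", attempts });
  try {
    const platformId = await publishToPlatform();
    runRecords.set(runId, { status: "succeeded", attempts, platformId });
    return { platformId, replayed: false };
  } catch (err) {
    runRecords.set(runId, { status: "failed", attempts, lastError: String(err.message) });
    throw err; // surface the failure so Cloud Tasks can retry
  }
}
```

This sketch does not cover two gaps a real implementation has to close: concurrent deliveries of the same run (the read and the write would need a transaction), and a crash after the platform accepted the post but before the success record was written (a record stuck in "running" should be reconciled against the platform before publishing again).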
Trace propagation
If a request enqueues a task, it MUST propagate trace context (correlation) to the task payload and/or headers.
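A sketch of carrying correlation in both the payload and the headers of an HTTP-target task. The object follows the Cloud Tasks httpRequest shape (method, URL, headers, base64 body), but nothing here calls the SDK. The X-Request-Id header name and both function names are assumptions; traceparent is the W3C Trace Context header.

```javascript
// Builds the task object an enqueueing adapter would pass to Cloud Tasks.
function buildTaskRequest({ url, payload, requestId, traceparent }) {
  const headers = {
    "Content-Type": "application/json",
    "X-Request-Id": requestId,
  };
  if (traceparent) headers.traceparent = traceparent;
  // requestId also goes in the payload so it survives if headers are dropped.
  const body = JSON.stringify({ ...payload, requestId });
  return {
    httpRequest: {
      httpMethod: "POST",
      url,
      headers,
      body: Buffer.from(body).toString("base64"),
    },
  };
}

// Worker side: recover correlation from the incoming request.
// Node lowercases incoming header names.
function readCorrelation(headers, payload) {
  return {
    requestId: headers["x-request-id"] || payload.requestId || null,
    traceparent: headers.traceparent || null,
  };
}
```

The worker would attach the recovered values to its logger before doing any work, so every log line for the job shares the originating request's ID.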
Queue naming and task naming
- Queue names should be stable and descriptive (e.g. publish-x, publish-tiktok).
- Task names should be deterministic when possible (so duplicates collapse) and include a stable identifier (submission ID / run ID).
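A deterministic task ID can be derived from the stable identifiers. The sketch below assumes the Cloud Tasks constraints as I understand them: task IDs may contain only letters, digits, hyphens, and underscores, are limited to 500 characters, and full task names follow the projects/.../locations/.../queues/.../tasks/... pattern. The function names and the hash-prefix layout are illustrative choices.

```javascript
const crypto = require("node:crypto");

// Same (platform, runId) always yields the same ID, so a duplicate enqueue
// is rejected by the queue instead of producing a second task.
function taskIdFor(platform, runId) {
  const stable = `${platform}:${runId}`;
  // A short hash prefix avoids sequential-looking IDs; the readable suffix
  // keeps the task easy to find in logs.
  const hash = crypto.createHash("sha256").update(stable).digest("hex").slice(0, 16);
  const readable = `${platform}-${runId}`.replace(/[^A-Za-z0-9_-]/g, "_");
  return `${hash}-${readable}`.slice(0, 500);
}

function taskNameFor({ project, location, queue }, taskId) {
  return `projects/${project}/locations/${location}/queues/${queue}/tasks/${taskId}`;
}
```

When the queue rejects a create call because the name already exists, the enqueueing code should treat that as success, not as an error to retry.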
Firestore standards
- Single source of truth for status: all async workflows must update durable status documents that power UI + API status endpoints.
- Prefer bounded data: avoid unbounded arrays in documents for high-churn data.
- Transactions for concurrency: use transactions when multiple workers may race.
- Timestamps: use serverTimestamp() for durable events; avoid mixing client timestamps for authoritative state.
- Pagination: use cursor pagination at the API boundary; store indexes as needed.
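For the pagination rule, a common approach is an opaque cursor that encodes the sort position of the last item returned. The sketch below pages an in-memory array where a real adapter would run an ordered query that starts after the cursor position with a limit. All names and the (createdAt, id) sort key are assumptions.

```javascript
// Opaque cursor: clients pass it back verbatim and never parse it.
function encodeCursor(position) {
  return Buffer.from(JSON.stringify(position)).toString("base64url");
}

// Returns the decoded position, or null if the cursor is malformed.
function decodeCursor(cursor) {
  try {
    const value = JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));
    if (typeof value.createdAt !== "number" || typeof value.id !== "string") return null;
    return value;
  } catch {
    return null;
  }
}

// Newest first, with id as a tie-breaker so the order is total.
function byNewestThenId(a, b) {
  return b.createdAt - a.createdAt || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);
}

function listPage(items, { limit, cursor = null }) {
  let ordered = [...items].sort(byNewestThenId);
  if (cursor) {
    const after = decodeCursor(cursor);
    if (!after) throw new Error("invalid_cursor");
    // Keep only items that sort strictly after the cursor position.
    ordered = ordered.filter((item) => byNewestThenId(after, item) < 0);
  }
  const page = ordered.slice(0, limit);
  const last = page[page.length - 1];
  const hasMore = ordered.length > limit;
  return {
    items: page,
    nextCursor: hasMore ? encodeCursor({ createdAt: last.createdAt, id: last.id }) : null,
  };
}
```

Because the cursor holds a position and not an offset, items inserted or deleted between requests do not cause skipped or repeated rows.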
Integration standards (external platform APIs)
- Timeouts + retries: external calls should have explicit timeouts and retry/backoff policy (do not retry non-retryable errors).
- Error mapping: map platform errors into a small set of internal error codes (auth expired, media invalid, rate limited, platform outage, unknown).
- Token lifecycle: refresh tokens in a dedicated module; do not refresh inline everywhere.
- Media normalization: platform-specific constraints should live near platform code; cross-platform normalization should be reusable and tested.
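The timeout, retry, and error-mapping rules can be combined in a small wrapper. This is a sketch under assumptions: the internal code names mirror the list above, the HTTP status mapping is a generic default that individual platforms may contradict, and PlatformError, withTimeout, and callWithRetry are hypothetical helpers.

```javascript
class PlatformError extends Error {
  constructor(code, message) {
    super(message);
    this.code = code;
  }
}

// Only these internal codes are worth retrying.
const RETRYABLE = new Set(["rate_limited", "platform_outage"]);

// Generic default mapping. media_invalid usually needs the platform's error
// body, so it is left to the platform-specific client.
function mapPlatformStatus(status) {
  if (status === 401) return "auth_expired";
  if (status === 429) return "rate_limited";
  if (status >= 500) return "platform_outage";
  return "unknown";
}

// Rejects if fn does not settle within ms. This does not cancel the
// underlying call; a real client would also pass an AbortSignal.
async function withTimeout(fn, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new PlatformError("platform_outage", `timed out after ${ms}ms`)),
      ms
    );
  });
  try {
    return await Promise.race([fn(), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff: baseDelayMs, then 2x, 4x, ... Non-retryable errors
// are rethrown immediately.
async function callWithRetry(
  fn,
  { attempts = 3, baseDelayMs = 200, timeoutMs = 10000, sleep = defaultSleep } = {}
) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await withTimeout(fn, timeoutMs);
    } catch (err) {
      if (!RETRYABLE.has(err.code) || attempt >= attempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

In-process retries like this should stay short, since the Cloud Tasks retry policy already covers longer outages; adding jitter to the delay is a common refinement.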
Logging & observability
- Every HTTP request MUST have a request ID and trace context.
- Every job execution MUST log:
  - queue + task identifiers
  - submission/run identifiers
  - target platform/account
  - outcome (success/failure) + error codes
  - timing (duration)
Redaction rules:
- NEVER log raw secrets (API keys, OAuth tokens).
- Avoid logging full request bodies in production; log keys, sizes, and IDs.
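A sketch of helpers that apply the logging and redaction rules: a key-based redactor, a body summarizer that logs keys and size instead of content, and a job log entry with the required fields. The key pattern, field names, and function names are assumptions.

```javascript
// Keys whose values are always masked. Deliberately broad: over-redacting is
// cheaper than leaking a credential.
const SECRET_KEY_PATTERN = /(authorization|api[-_]?key|token|secret|password)/i;

// Recursively masks values stored under secret-looking keys.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [
        key,
        SECRET_KEY_PATTERN.test(key) ? "[REDACTED]" : redact(inner),
      ])
    );
  }
  return value;
}

// Log the shape of a request body, not its contents.
function summarizeBody(body) {
  const json = JSON.stringify(body ?? null);
  return {
    keys: body && typeof body === "object" ? Object.keys(body) : [],
    bytes: Buffer.byteLength(json),
  };
}

// One structured entry per job execution, covering the required fields.
function jobLogEntry({
  queue, taskId, submissionId, runId, platform, accountId,
  outcome, errorCode = null, startedAt, finishedAt,
}) {
  return redact({
    queue, taskId, submissionId, runId, platform, accountId,
    outcome, errorCode,
    durationMs: finishedAt - startedAt,
  });
}
```

Key-based redaction does not catch secrets embedded in values, such as a token inside a URL or an error message, so free-form strings from upstream platforms still need care before logging.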
Code review checklist (use in PRs)
- Boundaries: HTTP code doesn’t embed business rules; platform code doesn’t drive orchestration policy.
- Validation: all external inputs validated (HTTP + tasks).
- Idempotency: create/charge/publish paths are safe to retry.
- Status: durable status updates exist and are testable.
- Errors: consistent error shape and actionable codes.
- Observability: correlated logs/traces; no secrets in logs.
- Tests: unit tests for business logic; integration tests for critical flows (emulators where possible).
Refactor playbook (how to “fix the codebase” safely)
When cleaning existing code, do it in this order to minimize risk:
- Add/confirm contracts at boundaries (schemas for HTTP + task payloads).
- Make execution idempotent (run records + dedupe keys) before changing structure.
- Extract adapters (Firestore/Tasks/Storage wrappers) to reduce scattered vendor coupling.
- Extract domain modules (pure logic) and write unit tests.
- Reorganize folders once behavior is locked by tests.
LLM Notes
- When adding new endpoints/jobs, follow the guardrails above and the repo’s existing docs:
  - engineering/api-conventions-rest.md
  - engineering/testing-strategy.md
  - engineering/observability.md
- Prefer small, composable modules with explicit inputs/outputs.
- Do not introduce new architectural patterns (CQRS, event sourcing, etc.) without an ADR.