Skip to main content

Tracing & Logging

Quick links: OrchestrationPublic API

Overview

The system implements distributed tracing and structured logging to provide end-to-end visibility across async operations. Every request generates a unique trace that flows through the API, orchestration, and platform-specific publish functions.

Implementation:

  • Logging: apps/functions/utils/logger.js
  • Trace context: Passed through function calls and Cloud Tasks

Distributed Tracing

Core Concepts

Trace: A complete request flow from API entry to all platform publishes

  • Identified by unique traceId (UUID)
  • May span multiple Cloud Functions executions
  • Crosses async boundaries via Cloud Tasks

Span: A single operation within a trace

  • Identified by unique spanId (UUID)
  • Has a parent span (except root span)
  • Tracks operation start/end and status
  • Contains operation-specific metadata

Flow: A business process that may generate multiple traces

  • Identified by flowId (often postSubmissionId)
  • Groups related operations across time
  • Useful for multi-step workflows

Trace Hierarchy

traceId: "550e8400-e29b-41d4-a716-446655440000"
├─ spanId: "root" (depth: 0)
│ └─ API Request: POST /v1/posts
│ ├─ spanId: "orchestrator-span" (depth: 1)
│ │ └─ Orchestration: Dispatch tasks
│ │ ├─ spanId: "instagram-span" (depth: 2)
│ │ │ └─ Publish to Instagram
│ │ ├─ spanId: "threads-span" (depth: 2)
│ │ │ └─ Publish to Threads
│ │ └─ spanId: "x-span" (depth: 2)
│ │ └─ Publish to X

Trace Context Propagation

1. API Layer (apps/functions/api.js)

// Generate root trace context
const traceId = uuidv4();
const spanId = 'root';
const depth = 0;

req.traceContext = { traceId, spanId, depth };

2. Orchestrator (apps/functions/orchestrators/publishing-service.js)

// Create child span for orchestration
const orchestratorSpanId = uuidv4();
const orchestratorContext = {
traceId: req.traceContext.traceId,
spanId: orchestratorSpanId,
parentSpanId: req.traceContext.spanId,
depth: req.traceContext.depth + 1
};

3. Platform Publish Tasks

// Each task receives trace context in payload
const taskPayload = {
userId,
postSubmissionId,
accountId,
traceContext: {
traceId: orchestratorContext.traceId,
spanId: uuidv4(), // New span for this platform
parentSpanId: orchestratorContext.spanId,
depth: orchestratorContext.depth + 1
}
};

Trace Context Fields

FieldTypeDescriptionExample
traceIdUUIDUnique trace identifier"550e8400-e29b-41d4-a716-446655440000"
spanIdUUIDUnique span identifier"7c9e6679-7425-40de-944b-e07fc1f90ae7"
parentSpanIdUUIDParent span (for hierarchy)"root"
depthnumberNesting level in trace tree0 (root), 1 (orchestrator), 2 (platform)
flowIdstringBusiness flow identifierpostSubmissionId

Structured Logging

Log Collection

All logs are written to the logs Firestore collection for queryability and persistence.

Collection: logs/{eventId}

Document structure:

{
// Identity
eventId: string, // Unique log entry ID (UUID)
timestamp: Timestamp, // When this log was created

// Trace context
traceId: string | null, // Distributed trace ID
spanId: string | null, // Current span ID
parentSpanId: string | null, // Parent span ID (for hierarchy)
depth: number, // Span nesting depth
flowId: string | null, // Business flow ID (e.g., postSubmissionId)

// Log metadata
level: string, // "debug" | "info" | "warn" | "error"
message: string, // Human-readable message
spanStatus: string, // "ok" | "error"

// Time-series optimization
bucketHour: string, // Format: "YYYY-MM-DD-HH" (e.g., "2025-01-15-10")

// Context
userId: string | null, // User who initiated the operation
functionName: string, // Cloud Function name

// Additional data
...metadata // Operation-specific fields (platform, accountId, etc.)
}

Log Levels

LevelUsageExample
debugDetailed troubleshooting info"Normalized video URL: https://..."
infoNormal operation events"Successfully dispatched to Instagram"
warnPotentially problematic events"Using default account, multiple exist"
errorErrors and failures"Failed to publish to X: Token expired"

Logger API

import logger from './utils/logger.js';

// Log with trace context
logger.info({
message: 'Orchestration started',
traceContext: { traceId, spanId, depth },
flowId: postSubmissionId,
userId: req.user.uid,
metadata: {
platforms: ['instagram', 'threads'],
accountIds: { instagram: 'acc_123', threads: 'acc_456' }
}
});

// Log error with span status
logger.error({
message: 'Failed to publish to Instagram',
traceContext: { traceId, spanId, depth },
flowId: postSubmissionId,
spanStatus: 'error',
error: err.message,
metadata: {
platform: 'instagram',
accountId: 'acc_123',
errorCode: 'invalid_token'
}
});

Time-Series Bucketing

To enable efficient time-series queries, all logs include a bucketHour field:

// Format: "YYYY-MM-DD-HH"
bucketHour: "2025-01-15-10"

Benefits:

  • Query logs by hour without full table scan
  • Composite index: (bucketHour ASC, timestamp ASC)
  • Enables fast time-range queries
  • Simplifies retention policies (delete old buckets)

Querying Logs

By Trace ID (Full Request Flow)

// Get all operations for a specific request
const logs = await db.collection('logs')
.where('traceId', '==', traceId)
.orderBy('timestamp', 'asc')
.get();

// Logs will include:
// - API request received
// - Orchestration started
// - Tasks dispatched
// - Each platform publish attempt

By Flow ID (Business Process)

// Get all operations for a post submission
const logs = await db.collection('logs')
.where('flowId', '==', postSubmissionId)
.orderBy('timestamp', 'asc')
.get();

// Useful for debugging a specific post across retries

By Time Range (Time-Series)

// Get all logs for a specific hour
const logs = await db.collection('logs')
.where('bucketHour', '==', '2025-01-15-10')
.orderBy('timestamp', 'asc')
.get();

// Get logs for multiple hours
const logs = await db.collection('logs')
.where('bucketHour', '>=', '2025-01-15-10')
.where('bucketHour', '<=', '2025-01-15-12')
.orderBy('bucketHour', 'asc')
.orderBy('timestamp', 'asc')
.get();

By User (User-Specific Logs)

// Get all operations initiated by a user
const logs = await db.collection('logs')
.where('userId', '==', userId)
.where('bucketHour', '>=', startBucket)
.orderBy('bucketHour', 'asc')
.orderBy('timestamp', 'asc')
.limit(100)
.get();

Failed Operations

// Find all errors in the last hour
const errors = await db.collection('logs')
.where('bucketHour', '==', currentBucket)
.where('level', '==', 'error')
.orderBy('timestamp', 'desc')
.get();

// Find failed spans
const failedSpans = await db.collection('logs')
.where('bucketHour', '==', currentBucket)
.where('spanStatus', '==', 'error')
.orderBy('timestamp', 'desc')
.get();

Composite Indexes

Required Firestore indexes for efficient log queries:

{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "traceId", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "ASCENDING" }
]
},
{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "flowId", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "ASCENDING" }
]
},
{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "bucketHour", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "ASCENDING" }
]
},
{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "userId", "order": "ASCENDING" },
{ "fieldPath": "bucketHour", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "ASCENDING" }
]
},
{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "bucketHour", "order": "ASCENDING" },
{ "fieldPath": "level", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "DESCENDING" }
]
}

See: firestore.indexes.json

Error Logs

High-severity errors are also written to a dedicated errorLogs collection for faster querying and alerting.

Collection: errorLogs/{id}

Document structure:

{
id: string, // Auto-generated
timestamp: Timestamp, // When error occurred
level: 'error', // Always "error"
message: string, // Error message
code: string, // Error code (e.g., "invalid_token")

// Context
userId: string | null,
functionName: string,
endpoint: string | null, // API endpoint if applicable

// Trace context
traceId: string | null,
spanId: string | null,
flowId: string | null,

// Error details
stack: string | null, // Stack trace
metadata: object // Additional error context
}

Usage:

// Automatically logged by error handling middleware
// Also manually logged for critical errors
logger.error({
message: 'Failed to publish to Instagram',
code: 'instagram_api_error',
traceContext,
flowId: postSubmissionId,
error: err,
metadata: { platform: 'instagram', statusCode: 400 }
});

Observability Dashboard Queries

Request Success Rate

// Calculate success rate for last hour
const bucket = getCurrentBucket(); // e.g., "2025-01-15-10"

const total = await db.collection('logs')
.where('bucketHour', '==', bucket)
.where('functionName', '==', 'api')
.count()
.get();

const errors = await db.collection('logs')
.where('bucketHour', '==', bucket)
.where('functionName', '==', 'api')
.where('level', '==', 'error')
.count()
.get();

const successRate = ((total - errors) / total) * 100;

Platform Performance

// Average success rate per platform
const platforms = ['instagram', 'threads', 'x', 'tiktok', 'youtube', 'linkedin'];

for (const platform of platforms) {
const logs = await db.collection('logs')
.where('bucketHour', '==', bucket)
.where('metadata.platform', '==', platform)
.get();

const total = logs.size;
const failures = logs.docs.filter(doc => doc.data().level === 'error').length;
const successRate = ((total - failures) / total) * 100;

console.log(`${platform}: ${successRate.toFixed(2)}% success`);
}

Slowest Operations

// Find operations with high duration (stored in metadata)
const slowOps = await db.collection('logs')
.where('bucketHour', '==', bucket)
.where('metadata.durationMs', '>', 5000) // >5 seconds
.orderBy('metadata.durationMs', 'desc')
.limit(10)
.get();

Error Frequency by Code

// Group errors by error code
const errors = await db.collection('errorLogs')
.where('timestamp', '>=', hourAgo)
.get();

const errorCounts = {};
errors.forEach(doc => {
const code = doc.data().code || 'unknown';
errorCounts[code] = (errorCounts[code] || 0) + 1;
});

// Output: { "invalid_token": 5, "rate_limit_exceeded": 12, ... }

Log Retention

Strategy

Logs are retained based on time and importance:

CollectionRetentionReason
logs30 daysDebugging recent issues, normal operations
errorLogs90 daysLonger retention for error pattern analysis
users/{uid}/apiRequests30 daysAPI usage tracking per user

Cleanup Process

Option 1: Scheduled Cloud Function

// Run daily to delete old logs
export const cleanupLogs = onSchedule('0 2 * * *', async () => {
const cutoff = new Date();
cutoff.setDate(cutoff.getDate() - 30);

const oldLogs = await db.collection('logs')
.where('timestamp', '<', cutoff)
.limit(500)
.get();

const batch = db.batch();
oldLogs.forEach(doc => batch.delete(doc.ref));
await batch.commit();
});

Option 2: TTL Policy (Firestore TTL)

// Set TTL field on log creation
{
...logData,
expireAt: new Date(Date.now() + 30 * 24 * 60 * 60 * 1000) // 30 days
}

// Enable Firestore TTL policy for automatic deletion
// gcloud firestore fields ttls update expireAt \
// --collection-group=logs \
// --enable-ttl

Best Practices

1. Always Include Trace Context

// ✅ Good: Include trace context
logger.info({
message: 'Dispatching task',
traceContext: { traceId, spanId, depth },
flowId: postSubmissionId
});

// ❌ Bad: Missing trace context
logger.info({ message: 'Dispatching task' });

2. Use Appropriate Log Levels

// ✅ Good: Error for failures
logger.error({ message: 'API call failed', error: err });

// ❌ Bad: Info for errors
logger.info({ message: 'Something went wrong' });

3. Structure Metadata

// ✅ Good: Structured metadata
logger.info({
message: 'Post published',
metadata: {
platform: 'instagram',
accountId: 'acc_123',
postId: 'post_456'
}
});

// ❌ Bad: Unstructured string
logger.info({ message: 'Post published to instagram acc_123 as post_456' });

4. Set Span Status

// ✅ Good: Mark span as error on failure
logger.error({
message: 'Publish failed',
spanStatus: 'error',
traceContext
});

// Mark span as ok on success
logger.info({
message: 'Publish succeeded',
spanStatus: 'ok',
traceContext
});

Visualization

Trace Timeline Example

Time: 0ms ──────────────────────────────────────> 5000ms

[API Request] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
traceId: 550e8400, spanId: root, depth: 0

[Orchestrator] ━━━━━━━━━━━━━━━━━━━━━━━
traceId: 550e8400, spanId: abc123, depth: 1

[Instagram] ━━━━━━━━━━━━━
spanId: def456, depth: 2

[Threads] ━━━━━━━━━━━
spanId: ghi789, depth: 2

[X] ━━━━━━━━━━━━━━━━━━━━━
spanId: jkl012, depth: 2