Tracing & Logging
Quick links: Orchestration • Public API
Overview
The system implements distributed tracing and structured logging to provide end-to-end visibility across async operations. Every request generates a unique trace that flows through the API, orchestration, and platform-specific publish functions.
Implementation:
- Logging:
apps/functions/utils/logger.js - Trace context: Passed through function calls and Cloud Tasks
Distributed Tracing
Core Concepts
Trace: A complete request flow from API entry to all platform publishes
- Identified by unique
traceId(UUID) - May span multiple Cloud Functions executions
- Crosses async boundaries via Cloud Tasks
Span: A single operation within a trace
- Identified by unique
spanId(UUID) - Has a parent span (except root span)
- Tracks operation start/end and status
- Contains operation-specific metadata
Flow: A business process that may generate multiple traces
- Identified by
flowId(oftenpostSubmissionId) - Groups related operations across time
- Useful for multi-step workflows
Trace Hierarchy
traceId: "550e8400-e29b-41d4-a716-446655440000"
├─ spanId: "root" (depth: 0)
│ └─ API Request: POST /v1/posts
│ ├─ spanId: "orchestrator-span" (depth: 1)
│ │ └─ Orchestration: Dispatch tasks
│ │ ├─ spanId: "instagram-span" (depth: 2)
│ │ │ └─ Publish to Instagram
│ │ ├─ spanId: "threads-span" (depth: 2)
│ │ │ └─ Publish to Threads
│ │ └─ spanId: "x-span" (depth: 2)
│ │ └─ Publish to X
Trace Context Propagation
1. API Layer (apps/functions/api.js)
// Generate root trace context
const traceId = uuidv4();
const spanId = 'root';
const depth = 0;
req.traceContext = { traceId, spanId, depth };
2. Orchestrator (apps/functions/orchestrators/publishing-service.js)
// Create child span for orchestration
const orchestratorSpanId = uuidv4();
const orchestratorContext = {
traceId: req.traceContext.traceId,
spanId: orchestratorSpanId,
parentSpanId: req.traceContext.spanId,
depth: req.traceContext.depth + 1
};
3. Platform Publish Tasks
// Each task receives trace context in payload
const taskPayload = {
userId,
postSubmissionId,
accountId,
traceContext: {
traceId: orchestratorContext.traceId,
spanId: uuidv4(), // New span for this platform
parentSpanId: orchestratorContext.spanId,
depth: orchestratorContext.depth + 1
}
};
Trace Context Fields
| Field | Type | Description | Example |
|---|---|---|---|
traceId | UUID | Unique trace identifier | "550e8400-e29b-41d4-a716-446655440000" |
spanId | UUID | Unique span identifier | "7c9e6679-7425-40de-944b-e07fc1f90ae7" |
parentSpanId | UUID | Parent span (for hierarchy) | "root" |
depth | number | Nesting level in trace tree | 0 (root), 1 (orchestrator), 2 (platform) |
flowId | string | Business flow identifier | postSubmissionId |
Structured Logging
Log Collection
All logs are written to the logs Firestore collection for queryability and persistence.
Collection: logs/{eventId}
Document structure:
{
// Identity
eventId: string, // Unique log entry ID (UUID)
timestamp: Timestamp, // When this log was created
// Trace context
traceId: string | null, // Distributed trace ID
spanId: string | null, // Current span ID
parentSpanId: string | null, // Parent span ID (for hierarchy)
depth: number, // Span nesting depth
flowId: string | null, // Business flow ID (e.g., postSubmissionId)
// Log metadata
level: string, // "debug" | "info" | "warn" | "error"
message: string, // Human-readable message
spanStatus: string, // "ok" | "error"
// Time-series optimization
bucketHour: string, // Format: "YYYY-MM-DD-HH" (e.g., "2025-01-15-10")
// Context
userId: string | null, // User who initiated the operation
functionName: string, // Cloud Function name
// Additional data
...metadata // Operation-specific fields (platform, accountId, etc.)
}
Log Levels
| Level | Usage | Example |
|---|---|---|
debug | Detailed troubleshooting info | "Normalized video URL: https://..." |
info | Normal operation events | "Successfully dispatched to Instagram" |
warn | Potentially problematic events | "Using default account, multiple exist" |
error | Errors and failures | "Failed to publish to X: Token expired" |
Logger API
import logger from './utils/logger.js';
// Log with trace context
logger.info({
message: 'Orchestration started',
traceContext: { traceId, spanId, depth },
flowId: postSubmissionId,
userId: req.user.uid,
metadata: {
platforms: ['instagram', 'threads'],
accountIds: { instagram: 'acc_123', threads: 'acc_456' }
}
});
// Log error with span status
logger.error({
message: 'Failed to publish to Instagram',
traceContext: { traceId, spanId, depth },
flowId: postSubmissionId,
spanStatus: 'error',
error: err.message,
metadata: {
platform: 'instagram',
accountId: 'acc_123',
errorCode: 'invalid_token'
}
});
Time-Series Bucketing
To enable efficient time-series queries, all logs include a bucketHour field:
// Format: "YYYY-MM-DD-HH"
bucketHour: "2025-01-15-10"
Benefits:
- Query logs by hour without full table scan
- Composite index:
(bucketHour ASC, timestamp ASC) - Enables fast time-range queries
- Simplifies retention policies (delete old buckets)
Querying Logs
By Trace ID (Full Request Flow)
// Get all operations for a specific request
const logs = await db.collection('logs')
.where('traceId', '==', traceId)
.orderBy('timestamp', 'asc')
.get();
// Logs will include:
// - API request received
// - Orchestration started
// - Tasks dispatched
// - Each platform publish attempt
By Flow ID (Business Process)
// Get all operations for a post submission
const logs = await db.collection('logs')
.where('flowId', '==', postSubmissionId)
.orderBy('timestamp', 'asc')
.get();
// Useful for debugging a specific post across retries
By Time Range (Time-Series)
// Get all logs for a specific hour
const logs = await db.collection('logs')
.where('bucketHour', '==', '2025-01-15-10')
.orderBy('timestamp', 'asc')
.get();
// Get logs for multiple hours
const logs = await db.collection('logs')
.where('bucketHour', '>=', '2025-01-15-10')
.where('bucketHour', '<=', '2025-01-15-12')
.orderBy('bucketHour', 'asc')
.orderBy('timestamp', 'asc')
.get();
By User (User-Specific Logs)
// Get all operations initiated by a user
const logs = await db.collection('logs')
.where('userId', '==', userId)
.where('bucketHour', '>=', startBucket)
.orderBy('bucketHour', 'asc')
.orderBy('timestamp', 'asc')
.limit(100)
.get();
Failed Operations
// Find all errors in the last hour
const errors = await db.collection('logs')
.where('bucketHour', '==', currentBucket)
.where('level', '==', 'error')
.orderBy('timestamp', 'desc')
.get();
// Find failed spans
const failedSpans = await db.collection('logs')
.where('bucketHour', '==', currentBucket)
.where('spanStatus', '==', 'error')
.orderBy('timestamp', 'desc')
.get();
Composite Indexes
Required Firestore indexes for efficient log queries:
{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "traceId", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "ASCENDING" }
]
},
{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "flowId", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "ASCENDING" }
]
},
{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "bucketHour", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "ASCENDING" }
]
},
{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "userId", "order": "ASCENDING" },
{ "fieldPath": "bucketHour", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "ASCENDING" }
]
},
{
"collectionGroup": "logs",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "bucketHour", "order": "ASCENDING" },
{ "fieldPath": "level", "order": "ASCENDING" },
{ "fieldPath": "timestamp", "order": "DESCENDING" }
]
}
See: firestore.indexes.json
Error Logs
High-severity errors are also written to a dedicated errorLogs collection for faster querying and alerting.
Collection: errorLogs/{id}
Document structure:
{
id: string, // Auto-generated
timestamp: Timestamp, // When error occurred
level: 'error', // Always "error"
message: string, // Error message
code: string, // Error code (e.g., "invalid_token")
// Context
userId: string | null,
functionName: string,
endpoint: string | null, // API endpoint if applicable
// Trace context
traceId: string | null,
spanId: string | null,
flowId: string | null,
// Error details
stack: string | null, // Stack trace
metadata: object // Additional error context
}
Usage:
// Automatically logged by error handling middleware
// Also manually logged for critical errors
logger.error({
message: 'Failed to publish to Instagram',
code: 'instagram_api_error',
traceContext,
flowId: postSubmissionId,
error: err,
metadata: { platform: 'instagram', statusCode: 400 }
});
Observability Dashboard Queries
Request Success Rate
// Calculate success rate for last hour
const bucket = getCurrentBucket(); // e.g., "2025-01-15-10"
const total = await db.collection('logs')
.where('bucketHour', '==', bucket)
.where('functionName', '==', 'api')
.count()
.get();
const errors = await db.collection('logs')
.where('bucketHour', '==', bucket)
.where('functionName', '==', 'api')
.where('level', '==', 'error')
.count()
.get();
const successRate = ((total - errors) / total) * 100;
Platform Performance
// Average success rate per platform
const platforms = ['instagram', 'threads', 'x', 'tiktok', 'youtube', 'linkedin'];
for (const platform of platforms) {
const logs = await db.collection('logs')
.where('bucketHour', '==', bucket)
.where('metadata.platform', '==', platform)
.get();
const total = logs.size;
const failures = logs.docs.filter(doc => doc.data().level === 'error').length;
const successRate = ((total - failures) / total) * 100;
console.log(`${platform}: ${successRate.toFixed(2)}% success`);
}
Slowest Operations
// Find operations with high duration (stored in metadata)
const slowOps = await db.collection('logs')
.where('bucketHour', '==', bucket)
.where('metadata.durationMs', '>', 5000) // >5 seconds
.orderBy('metadata.durationMs', 'desc')
.limit(10)
.get();
Error Frequency by Code
// Group errors by error code
const errors = await db.collection('errorLogs')
.where('timestamp', '>=', hourAgo)
.get();
const errorCounts = {};
errors.forEach(doc => {
const code = doc.data().code || 'unknown';
errorCounts[code] = (errorCounts[code] || 0) + 1;
});
// Output: { "invalid_token": 5, "rate_limit_exceeded": 12, ... }
Log Retention
Strategy
Logs are retained based on time and importance:
| Collection | Retention | Reason |
|---|---|---|
logs | 30 days | Debugging recent issues, normal operations |
errorLogs | 90 days | Longer retention for error pattern analysis |
users/{uid}/apiRequests | 30 days | API usage tracking per user |
Cleanup Process
Option 1: Scheduled Cloud Function
// Run daily to delete old logs
export const cleanupLogs = onSchedule('0 2 * * *', async () => {
const cutoff = new Date();
cutoff.setDate(cutoff.getDate() - 30);
const oldLogs = await db.collection('logs')
.where('timestamp', '<', cutoff)
.limit(500)
.get();
const batch = db.batch();
oldLogs.forEach(doc => batch.delete(doc.ref));
await batch.commit();
});
Option 2: TTL Policy (Firestore TTL)
// Set TTL field on log creation
{
...logData,
expireAt: new Date(Date.now() + 30 * 24 * 60 * 60 * 1000) // 30 days
}
// Enable Firestore TTL policy for automatic deletion
// gcloud firestore fields ttls update expireAt \
// --collection-group=logs \
// --enable-ttl
Best Practices
1. Always Include Trace Context
// ✅ Good: Include trace context
logger.info({
message: 'Dispatching task',
traceContext: { traceId, spanId, depth },
flowId: postSubmissionId
});
// ❌ Bad: Missing trace context
logger.info({ message: 'Dispatching task' });
2. Use Appropriate Log Levels
// ✅ Good: Error for failures
logger.error({ message: 'API call failed', error: err });
// ❌ Bad: Info for errors
logger.info({ message: 'Something went wrong' });
3. Structure Metadata
// ✅ Good: Structured metadata
logger.info({
message: 'Post published',
metadata: {
platform: 'instagram',
accountId: 'acc_123',
postId: 'post_456'
}
});
// ❌ Bad: Unstructured string
logger.info({ message: 'Post published to instagram acc_123 as post_456' });
4. Set Span Status
// ✅ Good: Mark span as error on failure
logger.error({
message: 'Publish failed',
spanStatus: 'error',
traceContext
});
// Mark span as ok on success
logger.info({
message: 'Publish succeeded',
spanStatus: 'ok',
traceContext
});
Visualization
Trace Timeline Example
Time: 0ms ──────────────────────────────────────> 5000ms
[API Request] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
traceId: 550e8400, spanId: root, depth: 0
[Orchestrator] ━━━━━━━━━━━━━━━━━━━━━━━
traceId: 550e8400, spanId: abc123, depth: 1
[Instagram] ━━━━━━━━━━━━━
spanId: def456, depth: 2
[Threads] ━━━━━━━━━━━
spanId: ghi789, depth: 2
[X] ━━━━━━━━━━━━━━━━━━━━━
spanId: jkl012, depth: 2
Related Documentation
- Orchestration - How tracing flows through orchestration
- Public API - API endpoints that generate traces
- Rate Limiting - Rate limit tracking logs