Solana Market Intelligence Platform

A case study in real-time data ingestion and stream processing

The Problem

Crypto markets move at millisecond scale. Traders need intelligence about price movements and wallet activity as it happens, not after the fact. The system had to ingest market data from multiple exchanges, run statistical analysis, and surface actionable signals with sub-100ms latency.

The hard part wasn't the algorithms; it was the infrastructure. Naive approaches would drop data during load spikes, introduce unacceptable latency, or fail under concurrent traffic. I needed a system that could handle hundreds of messages per second without losing data or adding lag.

Architecture & Design

The architecture separates concerns into three layers: ingestion (receive and store data), processing (compute signals), and serving (query results).

Ingestion Layer

FastAPI with async/await handles WebSocket feeds from multiple exchanges. Each message is written to PostgreSQL immediately, using idempotent writes. Nothing is buffered in memory, so there is no queue to grow without bound under load and no unwritten data to lose if the process crashes.

Processing Layer

Background workers (not in the request path) read from PostgreSQL, compute signals using historical data, and write results back. This is separated so signal computation latency doesn't affect API response time.

Serving Layer

FastAPI serves precomputed signals from PostgreSQL with Redis caching for hot data. Clients connect via WebSocket to receive real-time updates as signals are computed.

PostgreSQL is the source of truth. Redis is a read-through cache and pub/sub broker. This design prioritizes correctness over speed: if Redis goes down, queries fall back to PostgreSQL and no data is lost.

Technical Decisions & Tradeoffs

Async/await over threading

I chose FastAPI's async model over threading for concurrency.

Why: Market data ingestion is I/O-bound (network, database). Async/await keeps the event loop unblocked—when a query waits on the database, the loop handles other connections. Threads would be overkill and add context-switching overhead.

Tradeoff: CPU-bound work (signal computation) can't run on the event loop without blocking everything. Solution: background workers handle ML pipelines separately. Slightly more complex, but correct separation of concerns.
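
To make the I/O-bound argument concrete, here is a minimal sketch using only the standard library. The `fake_db_query` coroutine is a hypothetical stand-in for a database round trip (it just sleeps); it is not the project's code. The point is that many waits overlap on a single event loop, so total time tracks the slowest wait rather than the sum.

```python
import asyncio
import time


async def fake_db_query(delay: float) -> float:
    # Hypothetical stand-in for an awaited database call: while this
    # coroutine sleeps, the event loop is free to run other coroutines.
    await asyncio.sleep(delay)
    return delay


async def handle_many(n: int, delay: float) -> float:
    # Start n simulated queries concurrently and measure wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(fake_db_query(delay) for _ in range(n)))
    return time.perf_counter() - start
```

Calling `asyncio.run(handle_many(20, 0.05))` should take roughly the duration of one wait, not twenty. Replacing `asyncio.sleep` with a CPU-heavy loop would remove that benefit, which is the tradeoff described above.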

PostgreSQL as source of truth, not Redis

Redis is fast, but it does not offer PostgreSQL's durability guarantees, so I made PostgreSQL the canonical store.

Why: Trading signals must never be lost. PostgreSQL guarantees durability. Redis sits in front as a cache and pub/sub broker. If Redis dies, queries get slower (cache miss), but data isn't lost.

Tradeoff: More write latency (PostgreSQL is slower than Redis). Acceptable because ingestion is already separated from serving, so writes don't block API responses.
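
The "Redis can die without losing data" property can be sketched as a read path that treats the cache as optional. Everything here is an assumption for illustration: `TTLCache` is an in-memory stand-in for Redis (with an `available` flag to simulate an outage), and `load_from_db` stands in for a PostgreSQL query.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis; raises ConnectionError when 'down'."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.available = True  # flip to False to simulate an outage
        self._data: Dict[str, Tuple[float, Any]] = {}

    def _check(self) -> None:
        if not self.available:
            raise ConnectionError("cache unavailable")

    def get(self, key: str) -> Optional[Any]:
        self._check()
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._check()
        self._data[key] = (self.clock() + self.ttl, value)


def get_signal(key: str, cache: TTLCache,
               load_from_db: Callable[[str], Any]) -> Any:
    # Try the cache first; on a miss or an outage, read the source of truth.
    try:
        hit = cache.get(key)
        if hit is not None:
            return hit
    except ConnectionError:
        return load_from_db(key)
    value = load_from_db(key)
    try:
        cache.set(key, value)
    except ConnectionError:
        pass  # a failed cache write must not fail the request
    return value
```

With the cache up, repeated reads within the TTL hit the database once. With the cache down, every read goes to the database: slower, but still correct.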

Precompute signals, don't compute on-demand

Signals are computed by background workers and stored, not calculated when clients request them.

Why: Signal computation is heavy (requires historical data and ML models). Doing it on-demand would introduce unpredictable latency. Precomputation means API responses are fast and consistent.

Tradeoff: Signals lag by up to 30 seconds. Acceptable because the goal is trend detection, not tick-level trading. Clients prefer slightly stale but correct signals over fresh but incorrect ones.
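
A rough sketch of the precompute pattern, under stated assumptions: the signal formula (last price relative to the window mean) is a placeholder, not the project's model, and a plain dict stands in for the signals table. The worker-side `refresh_signals` does the heavy work; the request-side `read_signal` is only a lookup plus a staleness check.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Signal:
    symbol: str
    value: float        # placeholder metric, not the real model output
    computed_at: float  # seconds, on whatever clock the caller uses


def refresh_signals(windows: Dict[str, List[float]],
                    store: Dict[str, Signal], now: float) -> None:
    # Worker side: recompute every symbol's signal and overwrite the store.
    # Assumes positive prices, so the window mean is never zero.
    for symbol, prices in windows.items():
        if not prices:
            continue
        value = prices[-1] / fmean(prices) - 1.0
        store[symbol] = Signal(symbol, value, now)


def read_signal(store: Dict[str, Signal], symbol: str, now: float,
                max_age: float = 30.0) -> Optional[Signal]:
    # Request side: constant-time lookup; refuse results older than max_age.
    signal = store.get(symbol)
    if signal is None or now - signal.computed_at > max_age:
        return None
    return signal
```

The `max_age` default mirrors the 30-second lag mentioned above; whether the real API rejects older signals or serves them with a timestamp is not stated in the source.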

Idempotent writes for data ingestion

Every market update is written with a unique key (exchange + timestamp). Duplicate messages are silently dropped.

Why: The network can redeliver messages. Without idempotency, retries create duplicates. With it, retries are safe: writing the same message twice has the same effect as writing it once.

Tradeoff: Requires a unique constraint, and duplicate messages trigger constraint violations. These are handled gracefully (the conflicting insert is ignored). A small price for correctness.
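
As an illustration of the idempotent write, the sketch below uses SQLite from the standard library as a stand-in for PostgreSQL so that it runs anywhere. The table and column names are hypothetical. The `ON CONFLICT ... DO NOTHING` clause is accepted by both SQLite (3.24+) and PostgreSQL (9.5+), though the placeholder style differs between drivers.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    # Hypothetical schema: (exchange, ts) is the idempotency key.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE market_updates (
            exchange TEXT    NOT NULL,
            ts       INTEGER NOT NULL,
            price    REAL    NOT NULL,
            UNIQUE (exchange, ts)
        )
        """
    )
    return conn


def ingest(conn: sqlite3.Connection, exchange: str, ts: int,
           price: float) -> bool:
    # Returns True if the row was stored, False if it was a duplicate.
    cur = conn.execute(
        "INSERT INTO market_updates (exchange, ts, price) VALUES (?, ?, ?) "
        "ON CONFLICT (exchange, ts) DO NOTHING",
        (exchange, ts, price),
    )
    conn.commit()
    return cur.rowcount == 1
```

Delivering the same update twice leaves exactly one row, so a retrying feed needs no special handling on the caller's side.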

Challenges & Solutions

Backpressure during market spikes

An early version queued all incoming messages in memory. During high-volume market events, the queue would grow unbounded, causing memory exhaustion and dropped connections.

Solution: Write to PostgreSQL immediately (blocking only on I/O, not buffering). If PostgreSQL can't keep up, new connections fail gracefully rather than hanging. A Redis pub/sub layer decouples ingestion from processing, so ingestion stays fast even if workers lag.
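
One way to get "fail rather than hang" is to cap the number of writes in flight and reject anything beyond the cap instead of queueing it. The sketch below is an assumption about how that could look, not the project's implementation; `IngestGate` and its `write` callback are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable


class IngestGate:
    """Admit at most max_in_flight concurrent writes; reject the rest."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._in_flight = 0

    async def submit(self, write: Callable[[Any], Awaitable[None]],
                     message: Any) -> bool:
        # Reject immediately when saturated: no unbounded in-memory queue.
        if self._in_flight >= self._max:
            return False
        self._in_flight += 1
        try:
            await write(message)
            return True
        finally:
            self._in_flight -= 1
```

A rejected `submit` gives the caller an explicit signal to close the connection or ask the sender to back off, and memory use stays bounded by `max_in_flight`.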

Race conditions in signal computation

Multiple workers computing signals concurrently could read stale data or write conflicting updates, leading to duplicate or missing signals.

Solution: Use database transactions with row-level locking. Each worker claims a batch of unprocessed events (SELECT ... FOR UPDATE), computes signals, and writes the results atomically, so no two workers process the same event.
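
The claim-compute-write cycle could look roughly like the sketch below. It is written against a generic DB-API connection with `%s` placeholders (as used by psycopg); the table and column names are hypothetical. Note one assumption beyond the source: the query adds `SKIP LOCKED` (PostgreSQL 9.5+) so that a second worker skips rows another worker has locked instead of waiting on them.

```python
from typing import Any, Callable

# Hypothetical schema; the source does not show its tables.
CLAIM_SQL = """
    SELECT id, payload
    FROM market_events
    WHERE processed = FALSE
    ORDER BY id
    LIMIT %s
    FOR UPDATE SKIP LOCKED
"""
INSERT_SIGNAL_SQL = "INSERT INTO signals (event_id, value) VALUES (%s, %s)"
MARK_SQL = "UPDATE market_events SET processed = TRUE WHERE id = ANY(%s)"


def process_batch(conn: Any, compute_signal: Callable[[Any], float],
                  batch_size: int = 100) -> int:
    # Claim, compute, and write inside one transaction. The row locks are
    # held until commit, so no other worker can claim the same events.
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL, (batch_size,))
        rows = cur.fetchall()
        for event_id, payload in rows:
            cur.execute(INSERT_SIGNAL_SQL, (event_id, compute_signal(payload)))
        if rows:
            cur.execute(MARK_SQL, ([row[0] for row in rows],))
        conn.commit()
        return len(rows)
    except Exception:
        conn.rollback()  # releases the locks; the events stay unprocessed
        raise
```

If a worker crashes mid-batch, the transaction rolls back and the events become claimable again, so a failure cannot leave signals half-written.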

Historical data backfill blocking live ingestion

When backfilling historical data to train models, database writes would lag. Live market data would queue up and get dropped.

Solution: Separate worker pools with different resource limits. Backfill workers run on a resource-constrained pool (low CPU, low priority). Live data goes to a high-priority pool. PostgreSQL connection pooling ensures live queries never starve.

Query performance degradation under load

As the dataset grew (millions of price updates), queries for historical data became slow, increasing API latency unpredictably.

Solution: Explicit query optimization: indexes on (exchange, timestamp) and (user_id, signal_type), hourly data pre-aggregated in a separate table, and hot queries cached in Redis with a 30-second TTL. PostgreSQL's EXPLAIN ANALYZE is used to inspect query plans and catch performance regressions early.
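
To illustrate the indexing step, this sketch creates a composite index on (exchange, timestamp) and asks the planner whether a typical range query would use it. SQLite stands in for PostgreSQL here, so it uses `EXPLAIN QUERY PLAN` rather than `EXPLAIN ANALYZE`; the table, column, and index names are hypothetical.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    # Hypothetical schema with a composite index matching the query shape:
    # equality on exchange, then a range scan on ts.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE price_updates (
            exchange TEXT    NOT NULL,
            ts       INTEGER NOT NULL,
            price    REAL    NOT NULL
        );
        CREATE INDEX idx_price_exchange_ts ON price_updates (exchange, ts);
        """
    )
    return conn


def query_plan(conn: sqlite3.Connection, exchange: str, since_ts: int) -> str:
    # Return the planner's description of how it would run the query.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT ts, price FROM price_updates "
        "WHERE exchange = ? AND ts >= ? ORDER BY ts",
        (exchange, since_ts),
    ).fetchall()
    return " | ".join(str(row[-1]) for row in rows)
```

Checking that the plan mentions the index, in a test or a periodic job, is one cheap way to catch the kind of regression described above before it shows up as latency.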

Performance & Outcomes

P95 API latency: 45ms. Signal queries complete within milliseconds, suitable for real-time trading decisions.

Events per second: 500+. Ingestion layer sustains high market velocity without backpressure or message loss.

Replay speed: 100x. Historical backtesting completes in seconds, enabling rapid strategy iteration.

Data loss incidents: 0. Idempotent writes and PostgreSQL durability ensure no trades are missed.

What this demonstrates: The system successfully handles production throughput while maintaining latency guarantees. The architecture prioritizes correctness (no data loss) over raw speed, which is the right tradeoff for financial data. The separation of ingestion, processing, and serving allows independent scaling and prevents cascading failures.