Architecture

Pillow’s architecture is designed for scalability, performance, and reliability. This page provides an overview of the system design and component interactions.

Pillow follows a microservices architecture with clear separation of concerns:

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Pillow App    │    │    Mill API     │    │  Load Balancer  │
│   (Next.js)     │────│ (Go + Postgres) │────│                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Layer    │    │   Cache Layer   │    │    Messaging    │
│  (PostgreSQL)   │    │     (Redis)     │    │   (Redpanda)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Connectors    │    │    External     │    │   Monitoring    │
│ (Data Sources)  │    │      APIs       │    │    & Logging    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

Mill API

  • Language: Go 1.24+
  • Database: PostgreSQL with PostGIS
  • Purpose: Core API server and business logic
  • Features:
    • RESTful API endpoints
    • Authentication & authorization (JWT/service tokens)
    • Data validation and processing
    • Rate limiting and security
    • Health monitoring
    • OpenAPI 3.0 documentation

Pillow App

  • Language: TypeScript/Next.js 14
  • Purpose: Modern web application for property search and exploration
  • Features:
    • Property search and filtering
    • Interactive maps and visualizations
    • Market analytics and insights
    • Responsive, mobile-optimized UI
    • Real-time data updates

Connectors

  • Language: Go
  • Purpose: Automated data collection from property websites worldwide
  • Features:
    • Dual-phase runs: discovery of new listings + enrichment of stale records
    • Event-driven enrichment via Kafka (Redpanda)
    • Property data normalisation and validation
    • Batch and single property submission to Mill
    • JWT-based authentication with Mill API
    • Adaptive rate limiting with exponential backoff

Documentation Site

  • Framework: Starlight/Astro
  • Purpose: API documentation and guides
  • Features:
    • Interactive API explorer
    • Code examples
    • Developer guides
    • Search functionality

Mill’s operational datastore is PostgreSQL (with PostGIS). At a high level:

| Table | What it contains | Notes |
| --- | --- | --- |
| `properties` | The canonical, deduplicated property record (wide table) | For full field documentation, see Data Schema |
| `property_images` | Image URLs keyed to a property | `property_images.property_id` references `properties.id` |
| `users` | User accounts for authenticated access | Used to associate token ownership |
| `api_tokens` | Hashed API tokens (service/user tokens) | `api_tokens.user_id` references `users.id` |
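
The relations can be sketched as Go structs; the foreign keys match the table above, but every other field name is hypothetical (see Data Schema for the real columns):

```go
package main

import "fmt"

// Property sketches the canonical, deduplicated record. The real table is
// wide; only an illustrative field is shown here.
type Property struct {
	ID      int64
	Address string
}

// PropertyImage ties an image URL to its property.
type PropertyImage struct {
	ID         int64
	PropertyID int64 // references properties.id
	URL        string
}

// APIToken stores tokens hashed, never in plaintext.
type APIToken struct {
	ID     int64
	UserID int64 // references users.id
	Hash   string
}

func main() {
	p := Property{ID: 1, Address: "1 Example Street"}
	img := PropertyImage{ID: 10, PropertyID: p.ID, URL: "https://example.com/a.jpg"}
	fmt.Println(img.PropertyID == p.ID) // the FK ties images to their property
}
```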

Data Ingestion (Connectors → Mill → PostgreSQL)

```
Property Websites → Connectors  → Mill API    → Database
        ↓               ↓             ↓             ↓
  Scrape listings   Normalise     Validate     PostgreSQL
  Extract data      Rate limit    Geocode      + PostGIS
  Batch submit      Auth (JWT)    Deduplicate    indexes
```
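
The normalise and validate stages might look like this in Go; the `Listing` type and the scraped price format are illustrative assumptions, not the connectors' real types:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Listing is a hypothetical scraped record, standing in for the richer
// types the real connectors use.
type Listing struct {
	Address  string
	PriceRaw string // e.g. "€1,250,000" as scraped
	Price    int64  // normalised integer price
}

// normalise strips currency symbols and separators from the scraped price.
func normalise(l *Listing) error {
	s := strings.NewReplacer("€", "", "$", "", ",", "", " ", "").Replace(l.PriceRaw)
	p, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return fmt.Errorf("bad price %q: %w", l.PriceRaw, err)
	}
	l.Price = p
	return nil
}

// validate rejects records that should never be submitted to Mill.
func validate(l *Listing) error {
	if strings.TrimSpace(l.Address) == "" {
		return fmt.Errorf("missing address")
	}
	if l.Price <= 0 {
		return fmt.Errorf("non-positive price")
	}
	return nil
}

func main() {
	l := Listing{Address: "1 Example Street", PriceRaw: "€1,250,000"}
	if err := normalise(&l); err != nil {
		panic(err)
	}
	if err := validate(&l); err != nil {
		panic(err)
	}
	fmt.Println(l.Price) // 1250000
}
```

Validation also runs server-side in Mill; doing it in the connector first keeps bad records out of the batch submission entirely.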

A background feedback loop keeps existing records fresh. A single EnrichmentDispatcher reads all enrichment requests from Kafka and routes each to the correct enricher by source domain, then optionally gap-fills missing fields using other connectors in the same country:

```
Mill EnrichmentScheduler (hourly)
├─ Query properties with missing data or stale records
├─ Publish EnrichmentRequest to Kafka topic (includes address + country)
└─ Mark properties as queued (24h cool-off)

Connectors — EnrichmentDispatcher (single Kafka consumer)
├─ Read up to 200 messages from "property-enrichment"
├─ Route each message to the correct enricher by source domain
├─ Phase 1: Re-scrape the original listing URL (primary enrichment)
├─ Phase 2: Gap-fill missing fields via AddressSearcher connectors (same country)
└─ Submit merged result back to Mill
```
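
The routing step of the dispatcher can be sketched like so; the `Enricher` interface and source names are hypothetical, and the Kafka consumption and country-level gap-fill phase are omitted:

```go
package main

import (
	"fmt"
	"net/url"
)

// Enricher re-scrapes a listing on one source site.
type Enricher interface {
	Enrich(listingURL string) error
}

// logEnricher is a stand-in implementation for the sketch.
type logEnricher struct{ name string }

func (e logEnricher) Enrich(u string) error {
	fmt.Println(e.name, "enriching", u)
	return nil
}

// Dispatcher routes each enrichment request to the enricher registered for
// the listing URL's host, mirroring the single-consumer design above.
type Dispatcher struct {
	bySource map[string]Enricher
}

func (d *Dispatcher) Route(listingURL string) (Enricher, error) {
	u, err := url.Parse(listingURL)
	if err != nil {
		return nil, err
	}
	e, ok := d.bySource[u.Host]
	if !ok {
		return nil, fmt.Errorf("no enricher for source %q", u.Host)
	}
	return e, nil
}

func main() {
	d := &Dispatcher{bySource: map[string]Enricher{
		"example-listings.com": logEnricher{name: "example"},
	}}
	e, err := d.Route("https://example-listings.com/property/42")
	if err != nil {
		panic(err)
	}
	e.Enrich("https://example-listings.com/property/42")
}
```

Keying the map on the URL host is what lets one consumer serve every source site without per-site consumers.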

See Discovery & Enrichment for full details.

Request Flow

```
Client Request → Authentication → Rate Limiting → Business Logic → Response
      ↓                ↓                ↓                ↓              ↓
  HTTP/REST       JWT tokens      Redis counter    PostgreSQL    JSON response
                  Service keys    Per user/IP      + Redis cache
```

Scalability

  • API Layer: Stateless design allows multiple Mill instances
  • Database: PostgreSQL supports read replicas and partitioning
  • Frontend: CDN deployment with edge caching
  • Connectors: Distributed across multiple workers/containers

Performance

  • Caching Strategy: Multi-layer caching (Redis, CDN, browser)
  • Database Optimization: Proper indexing and query optimization
  • Connection Pooling: Efficient database connection management
  • Rate Limiting: Prevents abuse and ensures fair resource usage

Reliability

  • Load Balancing: Traffic distribution across multiple instances
  • Health Checks: Automatic service discovery and failover
  • Data Replication: Cross-region database backups
  • Monitoring: Real-time alerting and performance tracking

Authentication & Authorization

  • JWT Tokens: Stateless authentication for API access
  • API Keys: Service-to-service authentication
  • OAuth Integration: Third-party authentication support
  • Role-Based Access: Granular permission system

Data Protection

  • Encryption: Data encrypted at rest and in transit
  • Input Validation: All user inputs validated and sanitized
  • SQL Injection Prevention: Parameterized queries and ORM usage
  • Rate Limiting: Protection against brute force attacks

Network Security

  • HTTPS Only: All traffic encrypted with TLS
  • CORS Configuration: Proper cross-origin resource sharing
  • Security Headers: HSTS, CSP, and other security headers
  • API Gateway: Centralized security policy enforcement

Backend

  • API Server: Go with Gin framework
  • Database: PostgreSQL with PostGIS for spatial queries
  • Caching: Redis for session and data caching
  • Message Queue: Redpanda (Kafka-compatible) for event streaming
  • Monitoring: Prometheus + Grafana (planned)

Frontend

  • Framework: Next.js with React
  • Styling: Tailwind CSS
  • State Management: Zustand
  • Maps: Mapbox GL JS
  • Charts: Chart.js / D3.js

Infrastructure

  • Containerization: Docker + Docker Compose
  • Orchestration: Kubernetes (production)
  • CI/CD: GitHub Actions
  • Cloud Provider: AWS/GCP/Azure compatible
  • CDN: CloudFlare or AWS CloudFront

Local Development

  1. Docker Compose: Single command setup
  2. Hot Reload: Automatic code reloading
  3. Local Databases: Containerized services
  4. Testing Environment: Isolated test data

CI/CD Pipeline

  1. Code Push: Trigger automated tests
  2. Testing: Unit, integration, and E2E tests
  3. Build: Create container images
  4. Deploy: Automated deployment to staging/production

Observability

  • Logs: Structured logging with correlation IDs
  • Metrics: Performance and business metrics
  • Tracing: Distributed request tracing
  • Alerts: Automated incident response

To dive deeper into specific components: