# Architecture
Pillow’s architecture is designed for scalability, performance, and reliability. This page provides an overview of the system design and component interactions.
## System Overview

Pillow follows a microservices architecture with clear separation of concerns:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Pillow App    │     │    Mill API     │     │  Load Balancer  │
│   (Next.js)     │─────│ (Go + Postgres) │─────│                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Data Layer    │     │   Cache Layer   │     │    Messaging    │
│  (PostgreSQL)   │     │     (Redis)     │     │   (Redpanda)    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Connectors    │     │    External     │     │   Monitoring    │
│ (Data Sources)  │     │      APIs       │     │    & Logging    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```

## Core Components
### 1. The Mill (API Service)

- Language: Go 1.24+
- Database: PostgreSQL with PostGIS
- Purpose: Core API server and business logic
- Features:
- RESTful API endpoints
- Authentication & authorization (JWT/service tokens)
- Data validation and processing
- Rate limiting and security
- Health monitoring
- OpenAPI 3.0 documentation
### 2. Pillow App (Frontend)

- Language: TypeScript/Next.js 14
- Purpose: Modern web application for property search and exploration
- Features:
- Property search and filtering
- Interactive maps and visualizations
- Market analytics and insights
- Responsive, mobile-optimized UI
- Real-time data updates
### 3. Connectors (Data Collectors)

- Language: Go
- Purpose: Automated data collection from property websites worldwide
- Features:
- Dual-phase runs: discovery of new listings + enrichment of stale records
- Event-driven enrichment via Kafka (Redpanda)
- Property data normalisation and validation
- Batch and single property submission to Mill
- JWT-based authentication with Mill API
- Adaptive rate limiting with exponential backoff
### 4. Documentation Site

- Framework: Starlight/Astro
- Purpose: API documentation and guides
- Features:
- Interactive API explorer
- Code examples
- Developer guides
- Search functionality
## Database tables

Mill’s operational datastore is PostgreSQL (with PostGIS). At a high level:
| Table | What it contains | Notes |
|---|---|---|
| `properties` | The canonical, deduplicated property record (wide table) | For full field documentation, see Data Schema |
| `property_images` | Image URLs keyed to a property | `property_images.property_id` references `properties.id` |
| `users` | User accounts for authenticated access | Used to associate token ownership |
| `api_tokens` | Hashed API tokens (service/user tokens) | `api_tokens.user_id` references `users.id` |
## Data Flow

### Data Ingestion (Connectors → Mill → PostgreSQL)
```
Property Websites → Connectors → Mill API   →  Database
        ↓               ↓            ↓             ↓
  Scrape listings   Normalise    Validate     PostgreSQL
  Extract data      Rate limit   Geocode      + PostGIS
  Batch submit      Auth (JWT)   Deduplicate  indexes
```

### Event-Driven Enrichment Loop
A background feedback loop keeps existing records fresh. A single EnrichmentDispatcher reads all enrichment requests from Kafka and routes each to the correct enricher by source domain, then optionally gap-fills missing fields using other connectors in the same country:
```
Mill EnrichmentScheduler (hourly)
 ├─ Query properties with missing data or stale records
 ├─ Publish EnrichmentRequest to Kafka topic (includes address + country)
 └─ Mark properties as queued (24h cooloff)

Connectors — EnrichmentDispatcher (single Kafka consumer)
 ├─ Read up to 200 messages from "property-enrichment"
 ├─ Route each message to the correct enricher by source domain
 ├─ Phase 1: Re-scrape the original listing URL (primary enrichment)
 ├─ Phase 2: Gap-fill missing fields via AddressSearcher connectors (same country)
 └─ Submit merged result back to Mill
```

See Discovery & Enrichment for full details.
### API Request Processing
```
Client Request → Authentication → Rate Limiting → Business Logic → Response
      ↓                ↓               ↓                ↓              ↓
  HTTP/REST       JWT tokens     Redis counter     PostgreSQL    JSON response
                  Service keys   Per user/IP       + Redis cache
```

## Scalability Design
Section titled “Scalability Design”Horizontal Scaling
Section titled “Horizontal Scaling”- API Layer: Stateless design allows multiple Mill instances
- Database: PostgreSQL supports read replicas and partitioning
- Frontend: CDN deployment with edge caching
- Connectors: Distributed across multiple workers/containers
### Performance Optimization

- Caching Strategy: Multi-layer caching (Redis, CDN, browser)
- Database Optimization: Proper indexing and query optimization
- Connection Pooling: Efficient database connection management
- Rate Limiting: Prevents abuse and ensures fair resource usage
### High Availability

- Load Balancing: Traffic distribution across multiple instances
- Health Checks: Automatic service discovery and failover
- Data Replication: Cross-region database backups
- Monitoring: Real-time alerting and performance tracking
## Security Architecture

### Authentication & Authorization

- JWT Tokens: Stateless authentication for API access
- API Keys: Service-to-service authentication
- OAuth Integration: Third-party authentication support
- Role-Based Access: Granular permission system
### Data Protection

- Encryption: Data encrypted at rest and in transit
- Input Validation: All user inputs validated and sanitized
- SQL Injection Prevention: Parameterized queries and ORM usage
- Rate Limiting: Protection against brute force attacks
### Network Security

- HTTPS Only: All traffic encrypted with TLS
- CORS Configuration: Proper cross-origin resource sharing
- Security Headers: HSTS, CSP, and other security headers
- API Gateway: Centralized security policy enforcement
## Technology Stack

### Backend Services

- API Server: Go with Gin framework
- Database: PostgreSQL with PostGIS for spatial queries
- Caching: Redis for session and data caching
- Message Queue: Redpanda (Kafka-compatible) for event streaming
- Monitoring: Prometheus + Grafana (planned)
### Frontend Technologies

- Framework: Next.js with React
- Styling: Tailwind CSS
- State Management: Zustand
- Maps: Mapbox GL JS
- Charts: Chart.js / D3.js
### Infrastructure

- Containerization: Docker + Docker Compose
- Orchestration: Kubernetes (production)
- CI/CD: GitHub Actions
- Cloud Provider: AWS/GCP/Azure compatible
- CDN: CloudFlare or AWS CloudFront
## Development Workflow

### Local Development

- Docker Compose: Single command setup
- Hot Reload: Automatic code reloading
- Local Databases: Containerized services
- Testing Environment: Isolated test data
### CI/CD Pipeline

- Code Push: Trigger automated tests
- Testing: Unit, integration, and E2E tests
- Build: Create container images
- Deploy: Automated deployment to staging/production
## Monitoring & Observability

- Logs: Structured logging with correlation IDs
- Metrics: Performance and business metrics
- Tracing: Distributed request tracing
- Alerts: Automated incident response
## Next Steps

To dive deeper into specific components:
- The Mill - Core API service details
- Connectors - Data collection architecture
- Views - Frontend architecture patterns
- Deployment - Production deployment strategies