# Architecture
Pillow’s architecture is designed for scalability, performance, and reliability. This page provides an overview of the system design and component interactions.
## System Overview

Pillow follows a microservices architecture with clear separation of concerns:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Pillow App    │     │    Mill API     │     │  Load Balancer  │
│   (Next.js)     │─────│ (Go + Postgres) │─────│                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Data Layer    │     │   Cache Layer   │     │    Messaging    │
│  (PostgreSQL)   │     │     (Redis)     │     │   (Redpanda)    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Connectors    │     │    External     │     │   Monitoring    │
│ (Data Sources)  │     │      APIs       │     │    & Logging    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```

## Core Components
### 1. The Mill (API Service)

- Language: Go 1.24+
- Database: PostgreSQL with PostGIS
- Purpose: Core API server and business logic
- Features:
- RESTful API endpoints
- Authentication & authorization (JWT/service tokens)
- Data validation and processing
- Rate limiting and security
- Health monitoring
- OpenAPI 3.0 documentation
### 2. Pillow App (Frontend)

- Language: TypeScript/Next.js 14
- Purpose: Modern web application for property search and exploration
- Features:
- Property search and filtering
- Interactive maps and visualizations
- Market analytics and insights
- Responsive, mobile-optimized UI
- Real-time data updates
### 3. Connectors (Data Collectors)

- Language: Go
- Purpose: Automated data collection from property websites worldwide
- Features:
- Dual-phase runs: discovery of new listings + enrichment of stale records
- Event-driven enrichment via Kafka (Redpanda)
- Property data normalisation and validation
- Batch and single property submission to Mill
- JWT-based authentication with Mill API
- Adaptive rate limiting with exponential backoff
### 4. Documentation Site

- Framework: Starlight/Astro
- Purpose: API documentation and guides
- Features:
- Interactive API explorer
- Code examples
- Developer guides
- Search functionality
## Database tables

Mill’s operational datastore is PostgreSQL (with PostGIS). At a high level:
| Table | What it contains | Notes |
|---|---|---|
| `properties` | The canonical, deduplicated property record (wide table) | For full field documentation, see Data Schema |
| `property_images` | Image URLs keyed to a property | `property_images.property_id` references `properties.id` |
| `users` | User accounts for authenticated access | Used to associate token ownership |
| `api_tokens` | Hashed API tokens (service/user tokens) | `api_tokens.user_id` references `users.id` |
## Data Flow

### Data Ingestion (Connectors → Mill → PostgreSQL)
```
Property Websites → Connectors → Mill API   →  Database
        ↓               ↓            ↓             ↓
  Scrape listings   Normalise    Validate     PostgreSQL
  Extract data      Rate limit   Geocode      + PostGIS
  Batch submit      Auth (JWT)   Deduplicate  indexes
```

### Event-Driven Enrichment Loop
A background feedback loop keeps existing records fresh. A single EnrichmentDispatcher reads all enrichment requests from Kafka and routes each to the correct enricher by source domain, then optionally gap-fills missing fields using other connectors in the same country:
```
Mill EnrichmentScheduler (hourly)
 ├─ Query properties with missing data or stale records
 ├─ Publish EnrichmentRequest to Kafka topic (includes address + country)
 └─ Mark properties as queued (24h cooloff)

Connectors — EnrichmentDispatcher (single Kafka consumer)
 ├─ Read up to 200 messages from "property-enrichment"
 ├─ Route each message to the correct enricher by source domain
 ├─ Phase 1: Re-scrape the original listing URL (primary enrichment)
 ├─ Phase 2: Gap-fill missing fields via AddressSearcher connectors (same country)
 └─ Submit merged result back to Mill
```

See Discovery & Enrichment for full details.
### API Request Processing
```
Client Request → Authentication → Rate Limiting → Business Logic → Response
      ↓                ↓               ↓                ↓              ↓
  HTTP/REST       JWT tokens     Redis counter     PostgreSQL    JSON response
                  Service keys   Per user/IP       + Redis cache
```

## Scalability Design
Section titled “Scalability Design”Horizontal Scaling
Section titled “Horizontal Scaling”- API Layer: Stateless design allows multiple Mill instances
- Database: PostgreSQL supports read replicas and partitioning
- Frontend: CDN deployment with edge caching
- Connectors: Distributed across multiple workers/containers
### Performance Optimization

- Caching Strategy: Multi-layer caching (Redis, CDN, browser)
- Database Optimization: Proper indexing and query optimization
- Connection Pooling: Efficient database connection management
- Rate Limiting: Prevents abuse and ensures fair resource usage
### High Availability

- Load Balancing: Traffic distribution across multiple instances
- Health Checks: Automatic service discovery and failover
- Data Replication: Cross-region database backups
- Monitoring: Real-time alerting and performance tracking
## Security Architecture

### Authentication & Authorization

- JWT Tokens: Stateless authentication for API access
- API Keys: Service-to-service authentication
- OAuth Integration: Third-party authentication support
- Role-Based Access: Granular permission system
### Data Protection

- Encryption: Data encrypted at rest and in transit
- Input Validation: All user inputs validated and sanitized
- SQL Injection Prevention: Parameterized queries and ORM usage
- Rate Limiting: Protection against brute force attacks
### Network Security

- HTTPS Only: All traffic encrypted with TLS
- CORS Configuration: Proper cross-origin resource sharing
- Security Headers: HSTS, CSP, and other security headers
- API Gateway: Centralized security policy enforcement
## Technology Stack

### Backend Services

- API Server: Go with Gin framework
- Database: PostgreSQL with PostGIS for spatial queries
- Caching: Redis for session and data caching
- Message Queue: Redpanda (Kafka-compatible) for event streaming
- Monitoring: Prometheus + Grafana (planned)
### Frontend Technologies

- Framework: Next.js with React
- Styling: Tailwind CSS
- State Management: Zustand
- Maps: Mapbox GL JS
- Charts: Chart.js / D3.js
### Infrastructure

- Containerization: Docker + Docker Compose
- Orchestration: Kubernetes (production)
- CI/CD: GitHub Actions
- Cloud Provider: AWS/GCP/Azure compatible
- CDN: CloudFlare or AWS CloudFront
## Development Workflow

### Local Development

- Docker Compose: Single command setup
- Hot Reload: Automatic code reloading
- Local Databases: Containerized services
- Testing Environment: Isolated test data
### CI/CD Pipeline

- Code Push: Trigger automated tests
- Testing: Unit, integration, and E2E tests
- Build: Create container images
- Deploy: Automated deployment to staging/production
## Monitoring & Observability

- Logs: Structured logging with correlation IDs
- Metrics: Performance and business metrics
- Tracing: Distributed request tracing
- Alerts: Automated incident response
## Next Steps

To dive deeper into specific components:
- The Mill - Core API service details
- Connectors - Data collection architecture
- Views - Frontend architecture patterns
- Deployment - Production deployment strategies