Connectors
Connectors are Pillow’s data collection agents. They scrape property listings from third-party websites, normalise the data, and submit it to the Mill API for storage and indexing. Each connector targets a specific real estate website and region.
How Connectors Work
Every connector invocation performs two sequential phases:
- Discovery — scrape new listings from a third-party site and submit them to Mill.
- Enrichment — consume a batch of re-scrape requests from Kafka and push updated data back to Mill.
This dual-phase design means a single process keeps the property corpus both growing and accurate.
Discovery Phase
During discovery, the connector:
- Builds paginated search URLs for its target website and location.
- Scrapes each page, extracting individual listing URLs.
- Visits each listing page and maps the raw HTML/JSON into a standardised `Property` struct.
- Submits the batch to Mill via HTTP or Kafka.
All connectors implement the Connector interface:
```go
type Connector interface {
	GetName() string
	GetSource() string
	ScrapeProperties(opts ConnectorOptions) ([]Property, error)
	GetStats() ConnectorStats
	SetRateLimit(delay time.Duration)
	HealthCheck() error
}
```

Enrichment Phase
After discovery, if the connector also implements the `Enricher` interface and Kafka brokers are configured, it drains up to 50 enrichment requests from the `property-enrichment` Kafka topic before exiting.
```go
type Enricher interface {
	GetCountry() string
	EnrichProperty(ctx context.Context, req EnrichmentRequest) (*Property, error)
}
```

- `GetCountry()` returns an ISO 3166-1 alpha-2 country code (e.g. `"NZ"`, `"AU"`, `"MT"`). Messages for other countries are skipped.
- `EnrichProperty()` re-scrapes a known listing URL and returns a `*Property` with fresh data. Returning `(nil, nil)` is safe: it signals that the property is not handled by this connector.
Event-Driven Enrichment Pipeline
The enrichment pipeline is an event-driven loop between Mill and the connectors, powered by Kafka (Redpanda).
How Mill Populates the Queue
Mill’s `EnrichmentScheduler` runs as a background goroutine every hour. Each cycle it:
- Queries the `properties` table for candidates that are incomplete or stale:
  - Priority candidates: properties with missing images, zero price, or zero bedrooms.
  - Stale candidates: properties whose `updated_at` is older than two days.
- Applies a cool-off: a property is skipped if its `enrichment_queued_at` timestamp was set less than 24 hours ago, preventing duplicate work.
- Publishes one `EnrichmentRequest` message per candidate to the `property-enrichment` Kafka topic, using the `property_id` as the message key.
- Stamps `enrichment_queued_at = NOW()` on all published properties so they are not re-queued until the cool-off period elapses.
How Connectors Consume the Queue
The enrichment consumer (`connectors/common/enrichment_consumer.go`) operates in bounded batch mode: it reads up to 50 messages with a short fetch timeout (5 seconds), processes each one, then returns. It does not run as a long-lived loop; the run cadence is controlled externally (by a Kubernetes CronJob or one of the continuous run modes).
```text
processed = 0
for processed < batchSize:
    msg = reader.FetchMessage(timeout)        # short deadline
    if timeout:
        break                                 # done for this run
    processed += 1

    if msg.Country != enricher.GetCountry():
        commit(msg)                           # not our country, skip
        continue

    enriched = enricher.EnrichProperty(msg)
    if enriched != nil:
        submitter.SubmitProperty(enriched)    # POST back to Mill
    commit(msg)
```

Offsets are committed after each message. If `EnrichProperty` returns an error, the message is still committed to avoid tight retry loops; Mill’s deduplication and validation layers guard against corrupt data.
End-to-End Data Flow
```text
Mill EnrichmentScheduler (runs every hour)
  │
  ├─ SELECT candidates WHERE
  │    (missing images/price/rooms OR updated_at < now()-2d)
  │    AND (enrichment_queued_at IS NULL OR enrichment_queued_at < now()-1d)
  │
  ├─ PUBLISH to "property-enrichment" Kafka topic
  │    key = property_id
  │
  └─ UPDATE enrichment_queued_at = now()

Connector process (runs on schedule or continuously)
  │
  ├─ Phase 1: Discovery
  │    ScrapeProperties(opts) → POST /connectors/properties/batch
  │
  └─ Phase 2: Enrichment (if Enricher interface is implemented)
       consume "property-enrichment"
       ├─ filter by Country == GetCountry()
       ├─ EnrichProperty(sourceURL)
       └─ POST /connectors/properties/single
```

Run Modes
Connectors support five run modes, controlled by CLI flags:
| Mode | Flag | Description |
|---|---|---|
| Single connector | -connector <name> | Run one connector’s discovery (and enrichment if applicable). |
| All connectors | -mode discovery | Run all configured connectors sequentially in one pass. |
| Continuous | -mode discovery-continuous | Repeat discovery cycles with a configurable interval (default 5 min). |
| Enrichers only | -mode enrichers | Run only the enrichment phase for all enricher-capable connectors. |
| Enrichers continuous | -mode enrichers-continuous | Repeat enrichment cycles with a configurable interval. |
Running a Single Connector
```bash
cd connectors

# Discovery only (no Kafka needed)
go run . -connector homes-co-nz -mill-api http://localhost:4000

# Discovery + enrichment
go run . -connector homes-co-nz -mill-api http://localhost:4000 \
  -kafka-brokers localhost:19092

# Dry run: scrape but don't submit
go run . -connector maltapark -mill-api http://localhost:4000 -dry-run
```

Running All Connectors
```bash
cd connectors

# Single pass through all connectors
go run . -mode discovery -mill-api http://localhost:4000 \
  -kafka-brokers localhost:19092

# Continuous mode (repeats every 5 minutes by default)
go run . -mode discovery-continuous -mill-api http://localhost:4000 \
  -kafka-brokers localhost:19092
```

Running Enrichers Only
```bash
cd connectors

# Single enrichment pass
go run . -mode enrichers -mill-api http://localhost:4000 \
  -kafka-brokers localhost:19092

# Continuous enrichment
go run . -mode enrichers-continuous -mill-api http://localhost:4000 \
  -kafka-brokers localhost:19092
```

Available Connectors
| Connector | Region | Source | Enrichment |
|---|---|---|---|
homes-co-nz | New Zealand | homes.co.nz | Yes |
harcourts-nz | New Zealand | harcourts.co.nz | Yes |
harcourts | New Zealand | harcourts.com | Yes |
homes-nz | New Zealand | homes.co.nz | Yes |
realestate-nz | New Zealand | realestate.co.nz | Yes |
realestate-au | Australia | realestate.com.au | Yes |
harcourts-au | Australia | harcourts.com.au | Yes |
domain-au | Australia | domain.com.au | Yes |
homely-au | Australia | homely.com.au | Yes |
allhomes-au | Australia | allhomes.com.au | Yes |
property-au | Australia | property.com.au | Yes |
view-au | Australia | view.com.au | Yes |
onthehouse-au | Australia | onthehouse.com.au | Yes |
rent-au | Australia | rent.com.au | Yes |
reiwa-au | Australia | reiwa.com.au | Yes |
realcommercial-au | Australia | realcommercial.com.au | Yes |
commercialrealestate-au | Australia | commercialrealestate.com.au | Yes |
hausples-pg | PNG | hausples.com.pg | Yes |
marketmeri-pg | PNG | marketmeri.com | Yes |
property-pg | PNG | property.com.pg | Yes |
property-com-pg | PNG | property.com.pg | Yes |
property-com-fj | Fiji | property.com.fj | Yes |
housingsamoa-com | Samoa | housingsamoa.com | Yes |
zillow | USA | zillow.com | Yes |
realtor | USA | realtor.com | Yes |
redfin | USA | redfin.com | Yes |
openstreetmap | Global | openstreetmap.org | No |
realtor-ca | Canada | realtor.ca | Yes |
zolo-ca | Canada | zolo.ca | Yes |
point2homes-ca | Canada | point2homes.com | Yes |
inmuebles24 | Mexico | inmuebles24.com | Yes |
vivanuncios-mx | Mexico | vivanuncios.com.mx | Yes |
lamudi-mx | Mexico | lamudi.com.mx | Yes |
vivareal-br | Brazil | vivareal.com.br | Yes |
zapimoveis-br | Brazil | zapimoveis.com.br | Yes |
olx-br | Brazil | olx.com.br | Yes |
zonaprop-ar | Argentina | zonaprop.com.ar | Yes |
argenprop-ar | Argentina | argenprop.com | Yes |
mercadolibre-ar | Argentina | mercadolibre.com.ar | Yes |
homedy | Vietnam | homedy.com | Yes |
suumo-jp | Japan | suumo.jp | Yes |
99acres-in | India | 99acres.com | Yes |
zigbang-kr | South Korea | zigbang.com | Yes |
rightmove-co-uk | UK | rightmove.co.uk | Yes |
maltapark | Malta | maltapark.com | Yes |
immobilienscout24-de | Germany | immobilienscout24.de | Yes |
seloger-fr | France | seloger.com | Yes |
immobiliare-it | Italy | immobiliare.it | Yes |
idealista-es | Spain | idealista.com | Yes |
funda-nl | Netherlands | funda.nl | Yes |
homegate-ch | Switzerland | homegate.ch | Yes |
aqar-sa | Saudi Arabia | aqar.fm | Yes |
See Implemented Connectors for the complete list including health status.
Creating a New Connector
See Creating a Connector for a step-by-step guide, and Discovery & Enrichment for a deep dive into implementing the `Enricher` interface.
Next Steps
- Discovery & Enrichment — enrichment pipeline architecture and implementation details
- Creating a Connector — build a new source connector from scratch
- Data Access & Reliability — proxies, managed APIs, and headless rendering for improving connector reliability
- Implemented Connectors — full list and health status
- Mill API Documentation — how connectors interact with the API
- Architecture — overall system design