Connectors

Connectors are Pillow’s data collection agents. They scrape property listings from third-party websites, normalise the data, and submit it to the Mill API for storage and indexing. Each connector targets a specific real estate website and region.

A standard connector invocation performs two sequential phases:

  1. Discovery — scrape new listings from a third-party site and submit them to Mill.
  2. Enrichment — consume a batch of re-scrape requests from Kafka and push updated data back to Mill.

This dual-phase design means a single process keeps the property corpus both growing and accurate.

During discovery, the connector:

  1. Builds paginated search URLs for its target website and location.
  2. Scrapes each page, extracting individual listing URLs.
  3. Visits each listing page and maps the raw HTML/JSON into a standardised Property struct.
  4. Submits the batch to Mill via HTTP or Kafka.
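Step 1 above can be sketched as follows. This is only an illustration: the base URL and the `location` and `page` query parameter names are hypothetical, since every target site has its own search URL scheme.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildSearchURLs returns one search URL per results page.
// The "location" and "page" parameter names are assumptions made for this
// sketch; a real connector uses whatever its target site expects.
func buildSearchURLs(base, location string, pages int) ([]string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return nil, fmt.Errorf("parse base URL: %w", err)
	}
	urls := make([]string, 0, pages)
	for page := 1; page <= pages; page++ {
		q := u.Query()
		q.Set("location", location)
		q.Set("page", strconv.Itoa(page))
		u.RawQuery = q.Encode() // Encode sorts keys, so output is stable
		urls = append(urls, u.String())
	}
	return urls, nil
}

func main() {
	urls, err := buildSearchURLs("https://listings.example.com/search", "auckland", 3)
	if err != nil {
		panic(err)
	}
	for _, s := range urls {
		fmt.Println(s)
	}
}
```

Each returned URL would then be scraped for individual listing links (step 2).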

All connectors implement the Connector interface:

type Connector interface {
    GetName() string
    GetSource() string
    ScrapeProperties(opts ConnectorOptions) ([]Property, error)
    GetStats() ConnectorStats
    SetRateLimit(delay time.Duration)
    HealthCheck() error
}
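As a rough illustration of what satisfying this interface involves, here is a minimal stub. The `ConnectorOptions`, `Property` and `ConnectorStats` definitions are placeholders invented for the sketch (the real types will carry more fields), and `exampleConnector` fabricates listings rather than scraping a site.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Placeholder types: the real definitions live in the connectors codebase.
type ConnectorOptions struct{ MaxPages int }
type Property struct{ SourceURL string }
type ConnectorStats struct{ Scraped int }

// Connector is reproduced from the text above.
type Connector interface {
	GetName() string
	GetSource() string
	ScrapeProperties(opts ConnectorOptions) ([]Property, error)
	GetStats() ConnectorStats
	SetRateLimit(delay time.Duration)
	HealthCheck() error
}

// exampleConnector is a hypothetical connector used only for illustration.
type exampleConnector struct {
	delay time.Duration
	stats ConnectorStats
}

func (c *exampleConnector) GetName() string   { return "example" }
func (c *exampleConnector) GetSource() string { return "listings.example.com" }

func (c *exampleConnector) ScrapeProperties(opts ConnectorOptions) ([]Property, error) {
	if opts.MaxPages <= 0 {
		return nil, errors.New("MaxPages must be positive")
	}
	props := make([]Property, 0, opts.MaxPages)
	for page := 1; page <= opts.MaxPages; page++ {
		time.Sleep(c.delay) // wait between requests to respect the rate limit
		url := fmt.Sprintf("https://listings.example.com/p/%d", page)
		props = append(props, Property{SourceURL: url})
	}
	c.stats.Scraped += len(props)
	return props, nil
}

func (c *exampleConnector) GetStats() ConnectorStats         { return c.stats }
func (c *exampleConnector) SetRateLimit(delay time.Duration) { c.delay = delay }
func (c *exampleConnector) HealthCheck() error               { return nil }

// Compile-time check that exampleConnector satisfies Connector.
var _ Connector = (*exampleConnector)(nil)

func main() {
	var c Connector = &exampleConnector{}
	c.SetRateLimit(10 * time.Millisecond)
	props, err := c.ScrapeProperties(ConnectorOptions{MaxPages: 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(c.GetName(), len(props), c.GetStats().Scraped)
}
```

The `var _ Connector = ...` line is a common Go idiom: the build fails if a method is missing or has the wrong signature.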

After discovery, if the connector also implements the Enricher interface and Kafka brokers are configured, it drains up to 50 enrichment requests from the property-enrichment Kafka topic before exiting.

type Enricher interface {
    GetCountry() string
    EnrichProperty(ctx context.Context, req EnrichmentRequest) (*Property, error)
}

  • GetCountry() returns an ISO 3166-1 alpha-2 code (e.g. "NZ", "AU", "MT"). Messages for other countries are skipped.
  • EnrichProperty() re-scrapes a known listing URL and returns a *Property with fresh data. Returning (nil, nil) is safe: it signals that the property is not handled by this connector.

The enrichment pipeline is an event-driven loop between Mill and the connectors, powered by Kafka (Redpanda).

Mill’s EnrichmentScheduler runs as a background goroutine every hour. Each cycle it:

  1. Queries the properties table for candidates that are incomplete or stale:
    • Priority candidates — properties with missing images, zero price, or zero bedrooms.
    • Stale candidates — properties whose updated_at is older than two days.
  2. Applies a cooloff — a property is skipped if its enrichment_queued_at timestamp is less than 24 hours ago, preventing duplicate work.
  3. Publishes one EnrichmentRequest message per candidate to the property-enrichment Kafka topic, using the property_id as the message key.
  4. Stamps enrichment_queued_at = NOW() on all published properties so they are not re-queued until the cooloff period elapses.

The enrichment consumer (connectors/common/enrichment_consumer.go) operates in bounded batch mode — it reads up to 50 messages with a short timeout (5 seconds), processes each one, then returns. It does not run as a long-lived loop; the run cadence is controlled externally (by a Kubernetes CronJob or the continuous run mode).

for processed < batchSize:
    msg = reader.FetchMessage(timeout)       # short deadline
    if timeout: break                        # done for this run
    processed += 1
    if msg.Country != enricher.GetCountry():
        commit(msg)                          # not our country, skip
        continue
    enriched = enricher.EnrichProperty(msg)
    if enriched != nil:
        submitter.SubmitProperty(enriched)   # POST back to Mill
    commit(msg)

Offsets are committed after each message. If EnrichProperty returns an error the message is still committed to avoid tight retry loops — Mill’s deduplication and validation layers guard against corrupt data.

Mill EnrichmentScheduler (runs every hour)
├─ SELECT candidates WHERE
│    (missing images/price/rooms OR updated_at < now()-2d)
│    AND (enrichment_queued_at IS NULL OR enrichment_queued_at < now()-1d)
├─ PUBLISH to "property-enrichment" Kafka topic
│    key = property_id
└─ UPDATE enrichment_queued_at = now()

Connector process (runs on schedule or continuously)
├─ Phase 1: Discovery
│    ScrapeProperties(opts) → POST /connectors/properties/batch
└─ Phase 2: Enrichment (if Enricher interface is implemented)
     consume "property-enrichment"
     ├─ filter by Country == GetCountry()
     ├─ EnrichProperty(sourceURL)
     └─ POST /connectors/properties/single

Connectors support the following run modes, controlled by CLI flags:

| Mode | Flag | Description |
| --- | --- | --- |
| Single connector | -connector <name> | Run one connector’s discovery (and enrichment if applicable). |
| All connectors | -mode discovery | Run all configured connectors sequentially in one pass. |
| Continuous | -mode discovery-continuous | Repeat discovery cycles with a configurable interval (default 5 min). |
| Enrichers only | -mode enrichers | Run only the enrichment phase for all enricher-capable connectors. |
| Enrichers continuous | -mode enrichers-continuous | Repeat enrichment cycles with a configurable interval. |

cd connectors
# Discovery only (no Kafka needed)
go run . -connector homes-co-nz -mill-api http://localhost:4000
# Discovery + enrichment
go run . -connector homes-co-nz -mill-api http://localhost:4000 \
    -kafka-brokers localhost:19092
# Dry run: scrape but don't submit
go run . -connector maltapark -mill-api http://localhost:4000 -dry-run

cd connectors
# Single pass through all connectors
go run . -mode discovery -mill-api http://localhost:4000 \
    -kafka-brokers localhost:19092
# Continuous mode (repeats every 5 minutes)
go run . -mode discovery-continuous -mill-api http://localhost:4000 \
    -kafka-brokers localhost:19092

cd connectors
# Single enrichment pass
go run . -mode enrichers -mill-api http://localhost:4000 \
    -kafka-brokers localhost:19092
# Continuous enrichment
go run . -mode enrichers-continuous -mill-api http://localhost:4000 \
    -kafka-brokers localhost:19092
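For readers curious how these flags might be wired up, here is a minimal sketch using Go's standard flag package. It uses only the flag names shown in the commands above. The `runConfig` struct, the comma-separated broker list and the rule that at least one of -connector or -mode must be set are assumptions, not the project's actual parsing code.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// runConfig is a hypothetical container for the parsed flags.
type runConfig struct {
	Connector    string
	Mode         string
	MillAPI      string
	KafkaBrokers []string
	DryRun       bool
}

// parseArgs parses the flags used in the commands above.
func parseArgs(args []string) (runConfig, error) {
	fs := flag.NewFlagSet("connectors", flag.ContinueOnError)
	var cfg runConfig
	var brokers string
	fs.StringVar(&cfg.Connector, "connector", "", "run a single connector by name")
	fs.StringVar(&cfg.Mode, "mode", "", "discovery, discovery-continuous, enrichers or enrichers-continuous")
	fs.StringVar(&cfg.MillAPI, "mill-api", "", "base URL of the Mill API")
	fs.StringVar(&brokers, "kafka-brokers", "", "Kafka broker addresses")
	fs.BoolVar(&cfg.DryRun, "dry-run", false, "scrape but do not submit")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if brokers != "" {
		// Assumption: multiple brokers are separated by commas.
		cfg.KafkaBrokers = strings.Split(brokers, ",")
	}
	// Assumption: a run needs either a connector name or a mode.
	if cfg.Connector == "" && cfg.Mode == "" {
		return cfg, fmt.Errorf("set -connector or -mode")
	}
	return cfg, nil
}

// enrichmentEnabled mirrors the rule that enrichment needs Kafka brokers.
func (c runConfig) enrichmentEnabled() bool { return len(c.KafkaBrokers) > 0 }

func main() {
	cfg, err := parseArgs([]string{
		"-connector", "homes-co-nz",
		"-mill-api", "http://localhost:4000",
		"-kafka-brokers", "localhost:19092",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Connector, cfg.MillAPI, cfg.enrichmentEnabled())
}
```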
| Connector | Region | Source | Enrichment |
| --- | --- | --- | --- |
| homes-co-nz | New Zealand | homes.co.nz | Yes |
| harcourts-nz | New Zealand | harcourts.co.nz | Yes |
| harcourts | New Zealand | harcourts.com | Yes |
| homes-nz | New Zealand | homes.co.nz | Yes |
| realestate-nz | New Zealand | realestate.co.nz | Yes |
| realestate-au | Australia | realestate.com.au | Yes |
| harcourts-au | Australia | harcourts.com.au | Yes |
| domain-au | Australia | domain.com.au | Yes |
| homely-au | Australia | homely.com.au | Yes |
| allhomes-au | Australia | allhomes.com.au | Yes |
| property-au | Australia | property.com.au | Yes |
| view-au | Australia | view.com.au | Yes |
| onthehouse-au | Australia | onthehouse.com.au | Yes |
| rent-au | Australia | rent.com.au | Yes |
| reiwa-au | Australia | reiwa.com.au | Yes |
| realcommercial-au | Australia | realcommercial.com.au | Yes |
| commercialrealestate-au | Australia | commercialrealestate.com.au | Yes |
| hausples-pg | PNG | hausples.com.pg | Yes |
| marketmeri-pg | PNG | marketmeri.com | Yes |
| property-pg | PNG | property.com.pg | Yes |
| property-com-pg | PNG | property.com.pg | Yes |
| property-com-fj | Fiji | property.com.fj | Yes |
| housingsamoa-com | Samoa | housingsamoa.com | Yes |
| zillow | USA | zillow.com | Yes |
| realtor | USA | realtor.com | Yes |
| redfin | USA | redfin.com | Yes |
| openstreetmap | Global | openstreetmap.org | No |
| realtor-ca | Canada | realtor.ca | Yes |
| zolo-ca | Canada | zolo.ca | Yes |
| point2homes-ca | Canada | point2homes.com | Yes |
| inmuebles24 | Mexico | inmuebles24.com | Yes |
| vivanuncios-mx | Mexico | vivanuncios.com.mx | Yes |
| lamudi-mx | Mexico | lamudi.com.mx | Yes |
| vivareal-br | Brazil | vivareal.com.br | Yes |
| zapimoveis-br | Brazil | zapimoveis.com.br | Yes |
| olx-br | Brazil | olx.com.br | Yes |
| zonaprop-ar | Argentina | zonaprop.com.ar | Yes |
| argenprop-ar | Argentina | argenprop.com | Yes |
| mercadolibre-ar | Argentina | mercadolibre.com.ar | Yes |
| homedy | Vietnam | homedy.com | Yes |
| suumo-jp | Japan | suumo.jp | Yes |
| 99acres-in | India | 99acres.com | Yes |
| zigbang-kr | South Korea | zigbang.com | Yes |
| rightmove-co-uk | UK | rightmove.co.uk | Yes |
| maltapark | Malta | maltapark.com | Yes |
| immobilienscout24-de | Germany | immobilienscout24.de | Yes |
| seloger-fr | France | seloger.com | Yes |
| immobiliare-it | Italy | immobiliare.it | Yes |
| idealista-es | Spain | idealista.com | Yes |
| funda-nl | Netherlands | funda.nl | Yes |
| homegate-ch | Switzerland | homegate.ch | Yes |
| aqar-sa | Saudi Arabia | aqar.fm | Yes |

See Implemented Connectors for the complete list including health status.

See Creating a Connector for a step-by-step guide, and Discovery & Enrichment for a deep dive into implementing the Enricher interface.