
Harvester Health Report

This page provides real-time health status for all implemented harvesters. Health checks verify that harvesters can successfully access their target websites.

  • Total Harvesters: 30
  • Healthy: 18
  • Unhealthy: 12

| Status | Count | Percentage |
|---|---|---|
| ✅ Healthy | 18 | 60% |
| ❌ Unhealthy | 12 | 40% |

| Harvester | Status | Error |
|---|---|---|
| argenprop-ar | ✅ HEALTHY | - |
| harcourts | ✅ HEALTHY | - |
| harcourts-au | ✅ HEALTHY | - |
| harcourts-nz | ✅ HEALTHY | - |
| hausples-pg | ✅ HEALTHY | - |
| homedy | ❌ UNHEALTHY | Health check failed for homedy: failed to access https://homedy.com: Forbidden |
| homes-co-nz | ✅ HEALTHY | - |
| homes-nz | ✅ HEALTHY | - |
| housingsamoa-com | ❌ UNHEALTHY | Health check failed for housingsamoa-com: failed to access https://housingsamoa.com: Get "https://housingsamoa.com": dial tcp: lookup housingsamoa.com on 1.1.1.1:53: no such host |
| inmuebles24 | ❌ UNHEALTHY | Health check failed for inmuebles24: Forbidden |
| lamudi-mx | ✅ HEALTHY | - |
| marketmeri-pg | ✅ HEALTHY | - |
| mercadolibre-ar | ✅ HEALTHY | - |
| olx-br | ❌ UNHEALTHY | Health check failed for olx-br: Forbidden |
| openstreetmap | ✅ HEALTHY | - |
| point2homes-ca | ✅ HEALTHY | - |
| property-com-fj | ✅ HEALTHY | - |
| property-com-pg | ❌ UNHEALTHY | Health check failed for property-com-pg: failed to access https://property.com.pg: Get "https://property.com.pg": dial tcp: lookup property.com.pg on 1.1.1.1:53: no such host |
| property-pg | ❌ UNHEALTHY | Health check failed for property-pg: failed to access https://property.com.pg/: Get "https://property.com.pg/": dial tcp: lookup property.com.pg on 1.1.1.1:53: no such host |
| realestate-au | ❌ UNHEALTHY | Health check failed for realestate-au: failed to access https://www.realestate.com.au: Too Many Requests |
| realestate-nz | ✅ HEALTHY | - |
| realtor | ✅ HEALTHY | - |
| realtor-ca | ✅ HEALTHY | - |
| redfin | ✅ HEALTHY | - |
| vivanuncios-mx | ❌ UNHEALTHY | Health check failed for vivanuncios-mx: Forbidden |
| vivareal-br | ❌ UNHEALTHY | Health check failed for vivareal-br: Forbidden |
| zapimoveis-br | ❌ UNHEALTHY | Health check failed for zapimoveis-br: Forbidden |
| zillow | ✅ HEALTHY | - |
| zolo-ca | ❌ UNHEALTHY | Health check failed for zolo-ca: failed to access zolo.ca: Forbidden |
| zonaprop-ar | ❌ UNHEALTHY | Health check failed for zonaprop-ar: Forbidden |

homedy

Status: ❌ UNHEALTHY
Error: failed to access https://homedy.com: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:

  • Increase delay between requests
  • Rotate user agents
  • Use proxy rotation
  • Check if official API is available

housingsamoa-com

Status: ❌ UNHEALTHY
Error: dial tcp: lookup housingsamoa.com on 1.1.1.1:53: no such host
Issue: DNS lookup failed; the domain may no longer exist or may no longer be registered
Possible Solutions:

  • Verify domain is still active
  • Check if domain has changed
  • Remove harvester if site is permanently down

inmuebles24

Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:

  • Implement more sophisticated headers
  • Use browser automation if necessary
  • Check for API access

olx-br

Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:

  • Implement CAPTCHA handling
  • Use proxy rotation
  • Increase delays significantly

property-com-pg

Status: ❌ UNHEALTHY
Error: dial tcp: lookup property.com.pg on 1.1.1.1:53: no such host
Issue: DNS lookup failed; the domain may not exist or may be unreachable
Possible Solutions:

  • Verify domain accessibility
  • Check network connectivity
  • Remove if site is permanently down

property-pg

Status: ❌ UNHEALTHY
Error: dial tcp: lookup property.com.pg on 1.1.1.1:53: no such host
Issue: DNS lookup failed; this harvester targets the same domain as property-com-pg
Possible Solutions:

  • Verify domain accessibility
  • Consider consolidating with property-com-pg

realestate-au

Status: ❌ UNHEALTHY
Error: Too Many Requests
Issue: Website returns 429 Too Many Requests, which indicates rate limiting
Possible Solutions:

  • Significantly increase delay between requests
  • Implement exponential backoff
  • Reduce concurrent requests
  • Consider API access if available

vivanuncios-mx

Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:

  • Implement more sophisticated anti-bot evasion
  • Use browser automation
  • Check for API access

vivareal-br

Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:

  • Implement CAPTCHA handling
  • Use proxy rotation
  • Check for official API

zapimoveis-br

Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:

  • Implement sophisticated anti-bot evasion
  • Use browser automation
  • Check for API access

zolo-ca

Status: ❌ UNHEALTHY
Error: failed to access zolo.ca: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:

  • Increase delay between requests
  • Rotate user agents
  • Use proxy rotation

zonaprop-ar

Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:

  • Implement more sophisticated headers
  • Use browser automation if necessary
  • Check for API access

Common causes of failed health checks and their usual remedies:

403 Forbidden

Cause: Anti-bot protection blocking requests
Solutions:

  • Implement realistic browser headers
  • Rotate user agents
  • Use proxy rotation
  • Increase delays between requests
  • Consider browser automation
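
As a rough illustration of the first two remedies, the sketch below builds a request with browser-like headers and picks a user agent round-robin. It uses only Go's standard `net/http`; the helper name `newBrowserRequest`, the `userAgents` pool, and the header values are assumptions made for this example, not code from the harvesters.

```go
package main

import (
	"fmt"
	"net/http"
)

// userAgents is an illustrative pool (assumption); a real deployment would
// keep these strings current for the browsers it wants to resemble.
var userAgents = []string{
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
	"Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
}

// newBrowserRequest (hypothetical helper) builds a GET request whose headers
// resemble a browser's. attempt must be non-negative and selects the user
// agent round-robin.
func newBrowserRequest(url string, attempt int) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgents[attempt%len(userAgents)])
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
	req.Header.Set("Accept-Language", "en-US,en;q=0.9")
	return req, nil
}

func main() {
	// Only builds the requests; nothing is sent over the network.
	for attempt := 0; attempt < 4; attempt++ {
		req, err := newBrowserRequest("https://www.example.com", attempt)
		if err != nil {
			fmt.Println("build request:", err)
			return
		}
		fmt.Printf("attempt %d uses %q\n", attempt, req.Header.Get("User-Agent"))
	}
}
```

Headers alone may not be enough for sites that fingerprint clients more deeply, which is where the proxy rotation and browser automation items come in.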

429 Too Many Requests

Cause: Rate limiting (too many requests sent too quickly)
Solutions:

  • Increase delay between requests significantly
  • Implement exponential backoff
  • Reduce concurrent requests
  • Respect rate limit headers

DNS lookup failure (no such host)

Cause: Domain doesn’t exist or is inaccessible
Solutions:

  • Verify domain is still active
  • Check network connectivity
  • Verify DNS configuration
  • Remove harvester if site is permanently down

Network or connection errors

Cause: Network issues or site is down
Solutions:

  • Check network connectivity
  • Verify site is accessible
  • Implement retry logic with backoff
  • Check firewall settings

To check every registered harvester in one run:

cd harvesters
./scripts/check_harvester_health.sh

This script will:

  1. Iterate through all registered harvesters
  2. Run the health check for each
  3. Report status and errors
  4. Generate a summary

To check a single harvester by name:

cd harvesters
./bin/harvester -harvester <name> -health-check

Example:

./bin/harvester -harvester homes-co-nz -health-check

You can also check harvester health programmatically:

// HealthCheck returns nil when the target site is reachable.
harvester := homes_co_nz.NewHarvester()
err := harvester.HealthCheck()
if err != nil {
	log.Printf("Harvester unhealthy: %v", err)
}

Each harvester implements a HealthCheck() method that:

  1. Attempts to access the target website
  2. Verifies basic connectivity
  3. Returns an error if access fails
  4. Returns nil if healthy

Example implementation:

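func (h *Harvester) HealthCheck() error {
	// Placeholder URL; each harvester checks its own target site.
	testURL := "https://www.example.com"

	// A non-nil error from Visit means the site could not be accessed.
	err := h.collector.Visit(testURL)
	if err != nil {
		return fmt.Errorf("failed to access %s: %w", testURL, err)
	}
	return nil
}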

For ongoing monitoring:

  • Run health checks daily or weekly
  • Monitor error trends
  • Track recovery from temporary issues
  • Document persistent failures

Set up alerts for:

  • Harvesters that become unhealthy
  • Harvesters that remain unhealthy for extended periods
  • Sudden increases in error rates

For maintenance:

  • Update selectors when websites change structure
  • Adjust rate limits based on site responses
  • Remove harvesters for permanently unavailable sites
  • Update anti-bot evasion techniques as needed

If you’d like to help fix unhealthy harvesters:

  1. Identify the Issue: Review the error message and harvester code
  2. Test Locally: Run health checks and scraping tests
  3. Implement Fix: Update the harvester code
  4. Verify: Run health checks to confirm the fix
  5. Document: Update this page with resolution details

See Creating a New Harvester for development guidelines.