Harvester Health Report
This page provides real-time health status for all implemented harvesters. Health checks verify that harvesters can successfully access their target websites.
Summary
- Total Harvesters: 30
- Healthy: 18
- Unhealthy: 12
Health Status Overview
| Status | Count | Percentage |
|---|---|---|
| ✅ Healthy | 18 | 60% |
| ❌ Unhealthy | 12 | 40% |
Detailed Status
| Harvester | Status | Error |
|---|---|---|
| argenprop-ar | ✅ HEALTHY | - |
| harcourts | ✅ HEALTHY | - |
| harcourts-au | ✅ HEALTHY | - |
| harcourts-nz | ✅ HEALTHY | - |
| hausples-pg | ✅ HEALTHY | - |
| homedy | ❌ UNHEALTHY | Health check failed for homedy: failed to access https://homedy.com: Forbidden |
| homes-co-nz | ✅ HEALTHY | - |
| homes-nz | ✅ HEALTHY | - |
| housingsamoa-com | ❌ UNHEALTHY | Health check failed for housingsamoa-com: failed to access https://housingsamoa.com: Get "https://housingsamoa.com": dial tcp: lookup housingsamoa.com on 1.1.1.1:53: no such host |
| inmuebles24 | ❌ UNHEALTHY | Health check failed for inmuebles24: Forbidden |
| lamudi-mx | ✅ HEALTHY | - |
| marketmeri-pg | ✅ HEALTHY | - |
| mercadolibre-ar | ✅ HEALTHY | - |
| olx-br | ❌ UNHEALTHY | Health check failed for olx-br: Forbidden |
| openstreetmap | ✅ HEALTHY | - |
| point2homes-ca | ✅ HEALTHY | - |
| property-com-fj | ✅ HEALTHY | - |
| property-com-pg | ❌ UNHEALTHY | Health check failed for property-com-pg: failed to access https://property.com.pg: Get "https://property.com.pg": dial tcp: lookup property.com.pg on 1.1.1.1:53: no such host |
| property-pg | ❌ UNHEALTHY | Health check failed for property-pg: failed to access https://property.com.pg/: Get "https://property.com.pg/": dial tcp: lookup property.com.pg on 1.1.1.1:53: no such host |
| realestate-au | ❌ UNHEALTHY | Health check failed for realestate-au: failed to access https://www.realestate.com.au: Too Many Requests |
| realestate-nz | ✅ HEALTHY | - |
| realtor | ✅ HEALTHY | - |
| realtor-ca | ✅ HEALTHY | - |
| redfin | ✅ HEALTHY | - |
| vivanuncios-mx | ❌ UNHEALTHY | Health check failed for vivanuncios-mx: Forbidden |
| vivareal-br | ❌ UNHEALTHY | Health check failed for vivareal-br: Forbidden |
| zapimoveis-br | ❌ UNHEALTHY | Health check failed for zapimoveis-br: Forbidden |
| zillow | ✅ HEALTHY | - |
| zolo-ca | ❌ UNHEALTHY | Health check failed for zolo-ca: failed to access zolo.ca: Forbidden |
| zonaprop-ar | ❌ UNHEALTHY | Health check failed for zonaprop-ar: Forbidden |
Unhealthy Harvesters Details
homedy
Status: ❌ UNHEALTHY
Error: failed to access https://homedy.com: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:
- Increase delay between requests
- Rotate user agents
- Use proxy rotation
- Check if official API is available
housingsamoa-com
Status: ❌ UNHEALTHY
Error: dial tcp: lookup housingsamoa.com on 1.1.1.1:53: no such host
Issue: DNS lookup failed - domain may not exist or is no longer registered
Possible Solutions:
- Verify domain is still active
- Check if domain has changed
- Remove harvester if site is permanently down
inmuebles24
Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:
- Implement more sophisticated headers
- Use browser automation if necessary
- Check for API access
olx-br
Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:
- Implement CAPTCHA handling
- Use proxy rotation
- Increase delays significantly
property-com-pg
Status: ❌ UNHEALTHY
Error: dial tcp: lookup property.com.pg on 1.1.1.1:53: no such host
Issue: DNS lookup failed - domain may not exist or is inaccessible
Possible Solutions:
- Verify domain accessibility
- Check network connectivity
- Remove if site is permanently down
property-pg
Status: ❌ UNHEALTHY
Error: dial tcp: lookup property.com.pg on 1.1.1.1:53: no such host
Issue: DNS lookup failed - same domain as property-com-pg
Possible Solutions:
- Verify domain accessibility
- Consider consolidating with property-com-pg
realestate-au
Status: ❌ UNHEALTHY
Error: Too Many Requests
Issue: Website returns 429 Too Many Requests - rate limiting
Possible Solutions:
- Significantly increase delay between requests
- Implement exponential backoff
- Reduce concurrent requests
- Consider API access if available
vivanuncios-mx
Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:
- Implement more sophisticated anti-bot evasion
- Use browser automation
- Check for API access
vivareal-br
Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:
- Implement CAPTCHA handling
- Use proxy rotation
- Check for official API
zapimoveis-br
Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:
- Implement sophisticated anti-bot evasion
- Use browser automation
- Check for API access
zolo-ca
Status: ❌ UNHEALTHY
Error: failed to access zolo.ca: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:
- Increase delay between requests
- Rotate user agents
- Use proxy rotation
zonaprop-ar
Status: ❌ UNHEALTHY
Error: Forbidden
Issue: Website returns 403 Forbidden, likely due to anti-bot protection
Possible Solutions:
- Implement more sophisticated headers
- Use browser automation if necessary
- Check for API access
Common Error Types
403 Forbidden
Cause: Anti-bot protection blocking requests
Solutions:
- Implement realistic browser headers
- Rotate user agents
- Use proxy rotation
- Increase delays between requests
- Consider browser automation
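The first two suggestions above can be sketched as follows. This is a minimal illustration that uses Go's standard `net/http` package rather than the collector the harvesters appear to use; the `userAgents` pool and the `newBrowserLikeRequest` helper are hypothetical names, not part of the harvester code, and the header values are only examples.

```go
package main

import (
	"fmt"
	"net/http"
)

// userAgents is an illustrative pool (an assumption, not taken from the
// harvester code); real values should track current browser releases.
var userAgents = []string{
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
	"Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
}

// newBrowserLikeRequest builds a GET request with browser-style headers.
// The (non-negative) attempt number selects the User-Agent, so successive
// attempts rotate through the pool and wrap around at the end.
func newBrowserLikeRequest(url string, attempt int) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, fmt.Errorf("building request for %s: %w", url, err)
	}
	req.Header.Set("User-Agent", userAgents[attempt%len(userAgents)])
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
	req.Header.Set("Accept-Language", "en-US,en;q=0.9")
	return req, nil
}

func main() {
	// Only builds the requests; nothing is sent over the network here.
	for attempt := 0; attempt < 4; attempt++ {
		req, err := newBrowserLikeRequest("https://www.example.com", attempt)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("attempt %d: %s\n", attempt, req.Header.Get("User-Agent"))
	}
}
```

Headers alone may not be enough for sites that fingerprint clients more deeply, which is where the proxy and browser-automation options come in.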
429 Too Many Requests
Cause: Rate limiting - too many requests too quickly
Solutions:
- Increase delay between requests significantly
- Implement exponential backoff
- Reduce concurrent requests
- Respect rate limit headers
DNS Lookup Failed
Cause: Domain doesn’t exist or is inaccessible
Solutions:
- Verify domain is still active
- Check network connectivity
- Verify DNS configuration
- Remove harvester if site is permanently down
Connection Timeout
Cause: Network issues or site is down
Solutions:
- Check network connectivity
- Verify site is accessible
- Implement retry logic with backoff
- Check firewall settings
How to Run Health Checks
Check All Harvesters
```bash
cd harvesters
./scripts/check_harvester_health.sh
```
This script will:
- Iterate through all registered harvesters
- Run health check for each
- Report status and errors
- Generate a summary
Check Individual Harvester
```bash
cd harvesters
./bin/harvester -harvester <name> -health-check
```
Example:
```bash
./bin/harvester -harvester homes-co-nz -health-check
```
Programmatic Health Check
You can also check harvester health programmatically:
```go
// Construct the harvester and run its health check.
harvester := homes_co_nz.NewHarvester()
err := harvester.HealthCheck()
if err != nil {
	// A non-nil error means the target site could not be reached.
	log.Printf("Harvester unhealthy: %v", err)
}
```
Health Check Implementation
Each harvester implements a HealthCheck() method that:
- Attempts to access the target website
- Verifies basic connectivity
- Returns an error if access fails
- Returns nil if healthy
Example implementation:
```go
func (h *Harvester) HealthCheck() error {
	// URL used to probe the target site.
	testURL := "https://www.example.com"
	// Visit fails on network errors and on error responses such as 403 or 429.
	err := h.collector.Visit(testURL)
	if err != nil {
		// Wrap with %w so callers can inspect the underlying error.
		return fmt.Errorf("failed to access %s: %w", testURL, err)
	}
	return nil
}
```
Monitoring Recommendations
Regular Health Checks
- Run health checks daily or weekly
- Monitor error trends
- Track recovery from temporary issues
- Document persistent failures
Alerting
Set up alerts for:
- Harvesters that become unhealthy
- Harvesters that remain unhealthy for extended periods
- Sudden increases in error rates
Maintenance
- Update selectors when websites change structure
- Adjust rate limits based on site responses
- Remove harvesters for permanently unavailable sites
- Update anti-bot evasion techniques as needed
Contributing to Health Improvements
If you’d like to help fix unhealthy harvesters:
- Identify the Issue: Review the error message and harvester code
- Test Locally: Run health checks and scraping tests
- Implement Fix: Update the harvester with fixes
- Verify: Run health checks to confirm fix
- Document: Update this page with resolution details
See Creating a New Harvester for development guidelines.
Related Documentation
- Implemented Harvesters - List of all harvesters
- Harvester TODO List - Harvesters to implement
- Creating a New Harvester - Development guide