Identification Attributes

These attributes uniquely identify properties and track their data sources.

id

  • Type: VARCHAR(50)
  • Description: Primary unique identifier for the property
  • Example: "prop-12345"
  • Required: Yes
  • Indexed: Yes (primary key)

  • Type: VARCHAR(100)
  • Description: Universally unique identifier (UUID) for the property
  • Example: "123e4567-e89b-12d3-a456-426614174000"
  • Required: No
  • Indexed: Yes (unique index)

  • Type: VARCHAR(100)
  • Description: Identifier from the original source system (harvester)
  • Example: "realestate-com-au-12345"
  • Required: No
  • Use Case: Links property back to original listing source

  • Type: VARCHAR(50)
  • Description: Multiple Listing Service (MLS) number, if applicable
  • Example: "ML123456"
  • Required: No
  • Note: Only applicable in regions with MLS systems

  • Type: VARCHAR(50)
  • Description: Alternative identifier for integrations or external systems
  • Example: "EXT-789"
  • Required: No
  • Use Case: Integration with third-party systems

address_hash

  • Type: VARCHAR(64)
  • Description: MD5 hash of normalized address for deduplication
  • Example: "5d41402abc4b2a76b9719d911017c592"
  • Required: Yes
  • Indexed: Yes (unique key in Doris)
  • Note: Must be the first column in the table, because Doris requires UNIQUE KEY columns to lead the column list
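
The address hash described above can be sketched as follows. The normalization rules (lowercasing, replacing punctuation with spaces, collapsing whitespace) and the function names `normalize_address` and `address_hash` are assumptions made for illustration; the source only states that the hash is an MD5 of the normalized address.

```python
import hashlib
import re


def normalize_address(address: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace.

    These rules are assumed for illustration; the real pipeline may differ.
    """
    lowered = address.lower()
    no_punctuation = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(no_punctuation.split())


def address_hash(address: str) -> str:
    """Return the 32-character hex MD5 digest of the normalized address."""
    normalized = normalize_address(address)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

With these rules, "12 Main St., Sydney" and "  12 main st sydney " normalize to the same string and therefore share a hash, which is what lets the hash act as a deduplication key.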

data_source

  • Type: VARCHAR(100)
  • Description: Name of the harvester or data source that provided this property
  • Example: "realestate-com-au", "zillow", "manual-entry"
  • Required: No
  • Use Case: Track data provenance and quality by source

data_quality_score

  • Type: DOUBLE
  • Description: Quality score from 0.0 to 1.0 indicating completeness and accuracy
  • Example: 0.85
  • Range: 0.0 (poor) to 1.0 (excellent)
  • Calculation: Based on field completeness, validation, and consistency
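
One way a quality score in the 0.0 to 1.0 range could be derived from the three factors named above is a weighted average. This is a minimal sketch: the weights, the helper `field_completeness`, and the idea that each factor is itself a 0.0 to 1.0 value are all assumptions, since the source does not specify how the factors are combined.

```python
from typing import Iterable, Mapping

# Hypothetical weights: the three factors are named in the text, the weighting is not.
WEIGHTS = {"completeness": 0.5, "validation": 0.3, "consistency": 0.2}


def field_completeness(record: Mapping[str, object], expected_fields: Iterable[str]) -> float:
    """Fraction of expected fields that hold a non-empty value."""
    fields = list(expected_fields)
    if not fields:
        return 0.0
    filled = sum(1 for name in fields if record.get(name) not in (None, ""))
    return filled / len(fields)


def data_quality_score(completeness: float, validation: float, consistency: float) -> float:
    """Weighted average of the three component scores, clamped to [0.0, 1.0]."""
    raw = (
        WEIGHTS["completeness"] * completeness
        + WEIGHTS["validation"] * validation
        + WEIGHTS["consistency"] * consistency
    )
    return round(min(1.0, max(0.0, raw)), 4)
```

Clamping the result keeps the stored value inside the documented range even if a component score is out of bounds.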

  • Type: VARCHAR(50)
  • Description: Confidence level in the data accuracy
  • Values: "high", "medium", "low", "verified"
  • Example: "high"
  • Use Case: Filter properties by data reliability

last_verified

  • Type: DATETIME
  • Description: Timestamp when property data was last verified or updated from source
  • Example: "2024-01-15T10:30:00Z"
  • Required: No
  • Use Case: Track data freshness
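
A freshness check on the verification timestamp might look like the sketch below. The 30-day threshold and the names `parse_last_verified` and `is_stale` are hypothetical, and the code assumes the value arrives as an ISO 8601 string like the example above.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_last_verified(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2024-01-15T10:30:00Z"."""
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so it is mapped to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_stale(
    last_verified: Optional[str],
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # assumed threshold
) -> bool:
    """Treat a missing timestamp, or one older than max_age, as stale."""
    if last_verified is None:
        return True
    return now - parse_last_verified(last_verified) > max_age
```

Passing `now` explicitly (as a timezone-aware datetime) keeps the check deterministic and easy to test.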
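
Example Queries

-- Look up a single property by its primary identifier
SELECT * FROM properties WHERE id = 'prop-12345';

-- Find address hashes shared by more than one row (possible duplicates)
SELECT address_hash, COUNT(*) AS duplicate_count
FROM properties
GROUP BY address_hash
HAVING COUNT(*) > 1;

-- List high-quality properties, best first
SELECT * FROM properties
WHERE data_quality_score >= 0.8
ORDER BY data_quality_score DESC;

-- Count properties and average quality score per data source
SELECT data_source, COUNT(*) AS property_count, AVG(data_quality_score) AS avg_quality
FROM properties
GROUP BY data_source
ORDER BY property_count DESC;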

Best Practices

  1. Always set id: Generate a unique ID for every property
  2. Use address_hash for deduplication: Hash normalized addresses to identify duplicates
  3. Track data_source: Always record which harvester provided the data
  4. Maintain data_quality_score: Calculate and update quality scores during ingestion
  5. Set last_verified: Update timestamp when refreshing data from source
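
The practices above could be enforced with a small pre-insert check. This is a sketch only: `check_record` is a hypothetical helper, records are assumed to be plain dictionaries keyed by column name, and the rules simply mirror the five points in the list.

```python
import re
from typing import List, Mapping

# An MD5 digest rendered as lowercase hex is exactly 32 characters long.
MD5_HEX = re.compile(r"[0-9a-f]{32}")


def check_record(record: Mapping[str, object]) -> List[str]:
    """Return a list of best-practice violations; an empty list means none."""
    problems = []
    if not record.get("id"):
        problems.append("id is missing")
    if not MD5_HEX.fullmatch(str(record.get("address_hash") or "")):
        problems.append("address_hash is not a 32-character hex MD5 digest")
    if not record.get("data_source"):
        problems.append("data_source is missing")
    score = record.get("data_quality_score")
    if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("data_quality_score is missing or outside 0.0 to 1.0")
    if not record.get("last_verified"):
        problems.append("last_verified is missing")
    return problems
```

Returning a list of messages rather than raising on the first failure lets an ingestion job log every problem with a record in one pass.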