The Impact of Dimensionality Reduction Techniques on Real Estate Appraisal Performance in Tree-Based Machine Learning Models

Peng Wang

doi:10.64229/gaqshr61

Authors

Peng Wang, University of Southern California, Los Angeles, United States


Keywords:

Automated Valuation Models, Dimensionality Reduction, Feature Selection, Principal Component Analysis, Random Forest, Gradient Boosting, House Price Prediction

Abstract

Accurate and scalable real estate appraisal increasingly relies on machine learning models trained on rich, high-dimensional tabular data. While dimensionality reduction (DR) is often recommended to mitigate noise and multicollinearity, evidence on when DR actually helps tree-based estimators remains mixed. This study provides a systematic, model-level assessment of DR for automated valuation using structured residential attributes (physical, locational, neighborhood, and temporal). We benchmark three widely used tree-based learners (Decision Tree, Random Forest, and Gradient Boosting) under three input representations: (i) the full feature set, (ii) supervised feature selection using Random Forest importance (top-k), and (iii) unsupervised projection via principal component analysis (PCA). Performance is evaluated on held-out test data using the coefficient of determination (R²) and root-mean-square error (RMSE). Results indicate that the ensembles (Random Forest and Gradient Boosting) already handle moderate dimensionality well, so aggressive feature culling can slightly erode their accuracy; by contrast, a single Decision Tree benefits marginally from a compact, high-signal subset. PCA consistently reduces accuracy relative to the full feature set for all three tree models, because the highest-variance directions in the feature space do not necessarily align with the price-predictive directions. A practical implication for mass appraisal is that dimensionality reduction should be “task-aware”: embedded or selection methods tied to the target can help when models are capacity-limited, whereas unsupervised projections risk discarding valuation-relevant information. We close with guidance on when to prefer full features, selective pruning, or learned representations in property valuation pipelines.
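The abstract's point that high-variance directions need not be price-predictive can be illustrated with a small, deliberately contrived sketch. It is not the study's pipeline and reproduces none of its results. The data are synthetic; the feature names (`noise_feature`, `quality`), the scales, and the 400/100 split are assumptions chosen for illustration; and an ordinary least-squares line stands in for the tree-based learners to keep the example dependency-free. The features are left unstandardized so that the noisy feature dominates the variance.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def first_pc_direction(a, b):
    """Unit vector along the highest-variance direction of 2-D data (a, b)."""
    ma, mb, n = mean(a), mean(b), len(a)
    caa = sum((x - ma) ** 2 for x in a) / (n - 1)
    cbb = sum((y - mb) ** 2 for y in b) / (n - 1)
    cab = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)
    # Closed-form major-axis angle of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cab, caa - cbb)
    return math.cos(theta), math.sin(theta)


def fit_line(z, y):
    """Least-squares slope and intercept of y on a single input z."""
    mz, my = mean(z), mean(y)
    slope = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / sum(
        (zi - mz) ** 2 for zi in z
    )
    return slope, my - slope * mz


def r2_and_rmse(z, y, slope, intercept):
    """R^2 and RMSE of the fitted line on the given (held-out) data."""
    residual_ss = sum((yi - (slope * zi + intercept)) ** 2 for zi, yi in zip(z, y))
    my = mean(y)
    total_ss = sum((yi - my) ** 2 for yi in y)
    return 1 - residual_ss / total_ss, math.sqrt(residual_ss / len(y))


# Synthetic data (assumption): one high-variance feature unrelated to price,
# one low-variance feature that drives price.
random.seed(0)
n, split = 500, 400
noise_feature = [random.gauss(0, 10) for _ in range(n)]
quality = [random.gauss(0, 1) for _ in range(n)]
price = [50 * q + random.gauss(0, 5) for q in quality]

# Fit the PCA direction on the training rows only, then project every row.
pc = first_pc_direction(noise_feature[:split], quality[:split])
ma, mb = mean(noise_feature[:split]), mean(quality[:split])
pc1_scores = [pc[0] * (a - ma) + pc[1] * (b - mb)
              for a, b in zip(noise_feature, quality)]


def evaluate(z):
    """Fit on the training rows, score on the held-out rows."""
    slope, intercept = fit_line(z[:split], price[:split])
    return r2_and_rmse(z[split:], price[split:], slope, intercept)


r2_quality, rmse_quality = evaluate(quality)
r2_pc1, rmse_pc1 = evaluate(pc1_scores)
print(f"predictive feature: R2={r2_quality:.3f}, RMSE={rmse_quality:.2f}")
print(f"first principal component: R2={r2_pc1:.3f}, RMSE={rmse_pc1:.2f}")
```

In this setup the first principal component is almost entirely the noisy feature, so keeping only that component discards nearly all of the price signal. Real appraisal data are far less extreme, and standardizing features before PCA changes which directions dominate, so the sketch shows only the mechanism, not the magnitude of any effect.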

