Review-Grounded Explainable Recommendation with Faithfulness Evaluation on Amazon Reviews

Xiaohan Chang; Yifei Lu; Ziliang Samuel Zhong

doi:10.54732/jeecs.v11i1.2

Published: 2026-05-14

Keywords:

Amazon reviews, Evidence extraction, Explainable recommendation, Faithfulness evaluation, Review-grounded justification

Xiaohan Chang

Computer Science, University of Connecticut

Yifei Lu

Computer Science, University of California San Diego

Ziliang Samuel Zhong

New York University

Abstract

Review text can support explainable recommendations, but many recommender systems still optimize ranking accuracy without providing verifiable textual evidence, or they attach post-hoc explanations whose faithfulness to the model is unclear. This study addresses the lack of a reproducible evaluation setting that jointly measures recommendation quality and whether extracted review evidence actually supports model scoring. We propose Review-Grounded eXplainable Recommender (RGXRec), a lightweight hybrid method that combines interaction signals and TF-IDF review similarity, and we evaluate it on the Luxury Beauty and Video Games subsets of the Amazon Review Data. The pipeline includes rating thresholding, iterative 5-core pruning, chronological leave-one-out splitting, ranked recommendation, extractive evidence generation, and faithfulness evaluation. We compare RGXRec with popularity, metadata-graph KNN, SVD-MF, and ReviewSim using NDCG@K, Recall@K, MRR, evidence coverage, ROUGE-1, sentiment agreement, and a term-attribution faithfulness score. On Luxury Beauty, RGXRec achieves the best ranking performance, reaching NDCG@10 of 0.3606 and outperforming the strongest single-view baseline. On Video Games, collaborative and metadata signals remain stronger for ranking, but RGXRec preserves competitive accuracy while providing non-zero review-grounded faithfulness that interaction-only baselines cannot offer. These findings show that review-grounded recommendation should be evaluated on both ranking quality and explanation faithfulness.
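
The preprocessing and ranking-evaluation steps named in the abstract (iterative 5-core pruning, chronological leave-one-out splitting, and NDCG@K, Recall@K, and reciprocal rank) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the (user, item, timestamp) tuple layout, and the toy data are assumptions. It also assumes that rating thresholding has already been applied and that each user has exactly one held-out item, so the ideal DCG is 1.

```python
import math
from collections import Counter, defaultdict


def k_core(interactions, k=5):
    """Iteratively drop users and items with fewer than k interactions."""
    rows = list(interactions)
    while True:
        user_counts = Counter(user for user, _, _ in rows)
        item_counts = Counter(item for _, item, _ in rows)
        kept = [
            row for row in rows
            if user_counts[row[0]] >= k and item_counts[row[1]] >= k
        ]
        # Stop once a full pass removes nothing; otherwise recount and repeat.
        if len(kept) == len(rows):
            return kept
        rows = kept


def leave_one_out(interactions):
    """Hold out each user's most recent interaction as the test item."""
    by_user = defaultdict(list)
    for user, item, timestamp in interactions:
        by_user[user].append((timestamp, item))
    train, test = {}, {}
    for user, events in by_user.items():
        events.sort()  # chronological order (ties broken by item id)
        train[user] = [item for _, item in events[:-1]]
        test[user] = events[-1][1]
    return train, test


def rank_metrics(ranked, target, k=10):
    """Return (NDCG@k, Recall@k, reciprocal rank) for one held-out item."""
    if target not in ranked:
        return 0.0, 0.0, 0.0
    rank = ranked.index(target) + 1
    reciprocal_rank = 1.0 / rank
    if rank > k:
        return 0.0, 0.0, reciprocal_rank
    # With a single relevant item, DCG = 1 / log2(rank + 1) and IDCG = 1.
    return 1.0 / math.log2(rank + 1), 1.0, reciprocal_rank


# Toy data (hypothetical): user u3 is removed over two pruning passes with k=2.
rows = [
    ("u1", "a", 1), ("u1", "b", 2),
    ("u2", "a", 3), ("u2", "b", 4),
    ("u3", "c", 5), ("u3", "a", 6),
]
kept = k_core(rows, k=2)
train, test = leave_one_out(kept)
ndcg, recall, rr = rank_metrics(["x", "b", "y"], test["u1"], k=10)
```

Averaging these per-user values over all test users would give dataset-level NDCG@K, Recall@K, and MRR. Whether the paper truncates MRR at K is not stated in the abstract; the sketch leaves reciprocal rank untruncated.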

Issue

Vol. 11 No. 1 (2026): June

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Review-Grounded Explainable Recommendation with Faithfulness Evaluation on Amazon Reviews. (2026). JEECS (Journal of Electrical Engineering and Computer Sciences), 11(1), 9-22. https://doi.org/10.54732/jeecs.v11i1.2

