Authors
Keywords
Abstract
In the rapidly evolving world of e-commerce, metadata enrichment has become essential to improving the discoverability, structure, and value of product information. This study explores methods for enriching product metadata using semantic tagging combined with machine learning. As online product catalogs grow in size and complexity, often containing irregular patterns and incomplete data, the need for structured, context-aware tags becomes more pressing. Traditional tagging systems often suffer from sparse data, ambiguous labeling, and a lack of standardization, which degrade search performance and recommendation accuracy. To address these limitations, this paper presents a hybrid approach that combines structured semantic markup (e.g., schema.org, RDFa, JSON-LD), user-generated content, and five machine learning regression models (Random Forest, XGBoost, AdaBoost, Gradient Boosting, and Decision Tree regressors) to predict appropriate additional tags for product descriptions. The models were trained and tested on a dataset of 20 product entries, each evaluated on factors such as image quality, description length, and existing tag reliability. Statistical and correlation analyses revealed a strong positive relationship between the richness of visual and textual product content and the success of tag enrichment. Among the evaluated models, Random Forest Regression demonstrated the highest generalization ability, achieving an R² score of 0.9227 on the test set. 
It outperformed the other models: XGBoost (0.5527), Gradient Boosting (0.8324), AdaBoost (0.8999), and Decision Tree (0.7534). The latter two showed signs of overfitting, which highlights the importance of choosing models that maintain performance on unseen data. Visualization techniques, including scatterplot matrices and heatmaps, further supported these findings by illustrating the strong influence of image quality and description length on tag prediction outcomes. The study also examined the role of ontology association (e.g., AGROVOC) in improving semantic alignment and user personalization. By integrating user-generated metadata with expert-curated vocabularies, the research offers a balanced approach to improving metadata coherence, discoverability, and adaptive personalization in dynamic e-commerce environments.
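
The model comparison above rests on the R² score and on the gap between training and test performance as a sign of overfitting. The following is a minimal sketch of how those two quantities could be computed, written in plain Python. The function names `r2_score` and `overfit_gap` are illustrative choices, and the numbers are invented for demonstration; they are not taken from the study's 20-entry dataset or from its models.

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


def overfit_gap(train_r2, test_r2):
    """Train-minus-test R^2; a large positive gap suggests overfitting."""
    return train_r2 - test_r2


# Hypothetical tag-enrichment scores, invented for illustration only.
y_train = [0.2, 0.4, 0.6, 0.8]
pred_train = [0.2, 0.4, 0.6, 0.8]  # model reproduces the training targets exactly
y_test = [0.3, 0.5, 0.7]
pred_test = [0.5, 0.5, 0.5]        # model falls back to a constant on unseen data

train_r2 = r2_score(y_train, pred_train)
test_r2 = r2_score(y_test, pred_test)
gap = overfit_gap(train_r2, test_r2)
```

In this toy case the model fits the training targets perfectly but explains none of the variance in the held-out targets, so the gap is large. A model that generalizes well would show a small gap together with a high test R².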