A Self-training CRF Method for Recognizing Product Model Mentions in Web Forums
Important applications in product opinion mining, such as opinion summarization and aspect extraction, require the recognition of product mentions as a basic task. For consumer electronic products, Web forums are popular sources of valuable opinions, and forum users often refer to products by their model numbers. For example, a user comparing Blu-ray players in a post might write “BDP-93” and “BDP-103”. To handle opinions properly in such a scenario, applications need to recognize products correctly by their model numbers. Forums, however, are informal, and users mention model numbers in many different ways, which makes automatic product model recognition challenging. In this paper we propose a self-training strategy to learn a suitable CRF model for this task. Our method requires only a set of seed model numbers. Experiments in four different settings demonstrate that our method, by leveraging unlabeled sentences from the target forum, yielded an improvement of 19% in recall and 12% in F-measure over a supervised CRF model.
- Henry Silva Vieira
- Altigran S. da Silva
- Marco Cristo
- Edleno S. de Moura
- ECIR 2015 – 37th European Conference on IR Research
- Date and location: March 29 to April 2, 2015 – Vienna, Austria
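
The self-training idea summarized in the abstract (start from seed model numbers, label the unlabeled forum sentences automatically, train, then feed confident predictions back in as new training labels) can be illustrated with a minimal sketch. This is not the authors' implementation: a simple feature-overlap scorer stands in for the CRF so that the example runs with the standard library alone. The function names, the features (token shape and neighboring words), the confidence threshold, the digit-based candidate filter, and the toy sentences are all assumptions made for illustration.

```python
import re

def shape(token):
    """Coarse token shape: letter runs -> 'A', digit runs -> '9' ('BDP-93' -> 'A-9')."""
    return re.sub(r"[0-9]+", "9", re.sub(r"[A-Za-z]+", "A", token))

def features(tokens, i):
    """Hypothetical feature set: token shape plus left and right neighbor words."""
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return {"shape=" + shape(tokens[i]), "prev=" + prev_tok, "next=" + next_tok}

def train(sentences, known):
    """Stand-in for CRF training: collect features observed on currently labeled mentions."""
    seen = set()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in known:
                seen |= features(tokens, i)
    return seen

def confidence(model, tokens, i):
    """Fraction of a token's features already seen on labeled mentions."""
    feats = features(tokens, i)
    return len(feats & model) / len(feats)

def self_train(sentences, seeds, threshold=0.6, max_rounds=10):
    """Repeat: train on current labels, then promote confident predictions to labels."""
    known = set(seeds)
    for _ in range(max_rounds):
        model = train(sentences, known)
        new = set()
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                # Assumption: only tokens containing a digit are candidate model numbers.
                is_candidate = tok not in known and any(c.isdigit() for c in tok)
                if is_candidate and confidence(model, tokens, i) >= threshold:
                    new.add(tok)
        if not new:  # nothing confident left to add, so stop
            break
        known |= new
    return known

# Toy unlabeled "forum" sentences (invented for this sketch).
sentences = [s.split() for s in [
    "I compared the BDP-93 with my old player",
    "the BDP-103 is faster than the BDP-93",
    "the X200 is loud",
    "is the bdp103 worth the price",
]]
found = self_train(sentences, {"BDP-93"})
print(sorted(found))  # ['BDP-103', 'BDP-93', 'X200', 'bdp103']
```

In this toy run each round unlocks the next: "BDP-103" is accepted first because it shares its shape and left context with the seed, its right context then helps accept "X200", and the shape learned from "X200" finally covers the informal spelling "bdp103". A real system would replace `train` and `confidence` with a CRF and its marginal probabilities, and would need safeguards against the error propagation that self-training is prone to.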