Publication details

Customers' Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth

Authors	ŽIŽKA Jan, SVOBODA Arnošt
Year of publication	2015
Type	Article in Periodical
Magazine / Source	Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis
MU Faculty or unit	Faculty of Economics and Administration
Web	http://acta.mendelu.cz/63/6/2229/
Doi	http://dx.doi.org/10.11118/actaun201563062229
Field	Informatics
Keywords	text mining; customer opinion analysis; decision trees; decision rules; windowing; large data volumes; machine learning; computational complexity; training-set size
Attached files	actaun_2015063062229_Acta_Univ_2015.pdf
Description	A shortage of data is not the only data mining problem: having too much data can cause difficulties as well. An experimental investigation of how the number of reviews influences the knowledge mined from text documents demonstrated, unsurprisingly, a pronounced dependence of the computing time on the data volume. As the volume of hotel-service reviews grew, the CPU time of the text mining process increased strongly non-linearly, while the knowledge, expressed as generated semantically relevant words, also kept increasing, although by progressively smaller increments. Among other uses, the revealed relevant words (or phrases composed of them) can serve as significant keywords for information retrieval or for defining more detailed topics hidden in text documents. After the research described above, which aimed at revealing relevant words that represent the reviews, a follow-up series of experiments was started to mine knowledge that is more understandable to humans: automatically discovering significant phrases composed of relevant words. To find the phrases, a method of analyzing n-grams (here, contiguous sequences of n words) was applied to reviews written in English, Spanish, German, and Russian. The procedures were similar to those described in this article, using the same decision-tree/rule tool, the same data source, and windows each containing a constant 100,000 reviews. From the semantic point of view, and unlike the 1-grams described in this paper, the best phrases were provided by 3-grams, for example "breakfast very good" (a positive phrase), "no free Internet" (a negative phrase), and the like. Details can be found in Žižka and Dařena (2015).