Publication details

Customers' Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth

Authors

ŽIŽKA Jan, SVOBODA Arnošt

Year of publication 2015
Type Article in Periodical
Magazine / Source Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis
MU Faculty or unit

Faculty of Economics and Administration

Citation
Web http://acta.mendelu.cz/63/6/2229/
Doi http://dx.doi.org/10.11118/actaun201563062229
Field Informatics
Keywords text mining; customer opinion analysis; decision trees; decision rules; windowing; large data volumes; machine learning; computational complexity; training-set size
Description A shortage of data is not the only data mining problem; having too much data can cause difficulties as well. An experimental investigation of how the number of reviews influences the knowledge mined from text documents demonstrated, unsurprisingly, a pronounced dependence of computing time on data volume. As the volume of hotel-service reviews grew, the CPU time of the text mining process increased strongly non-linearly, while the knowledge, expressed as generated semantically relevant words, kept increasing too, although at a progressively smaller rate. Among other uses, the revealed relevant words (or phrases composed of them) can serve as significant keywords for information retrieval or for defining more detailed topics hidden in text documents. After the research described above, which aimed at revealing relevant words that represented the reviews, a follow-up series of experiments was started to mine knowledge that is easier for humans to understand: automatically discovering significant phrases composed of relevant words. To find the phrases, a method of analyzing n-grams (here, contiguous sequences of n words) was applied to reviews written in English, Spanish, German, and Russian. Procedures similar to those described in this article were used, with the same decision-tree/rule tool, the same data source, and windows each containing a constant 100,000 reviews. From the semantic point of view, and unlike the 1-grams described in this paper, the best phrases were provided by 3-grams, for example "breakfast very good" (a positive phrase), "no free Internet" (a negative phrase), and so on. Details can be found in Žižka and Dařena (2015).
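
As an illustration of the n-gram notion used in the description (a contiguous sequence of n words), the following minimal Python sketch extracts and counts 3-grams from a few review texts. It is not the authors' tool or procedure: the function names `ngrams` and `count_ngrams`, the lowercasing, the whitespace tokenization, and the sample reviews are all assumptions made for this example.

```python
from collections import Counter


def ngrams(text, n):
    """Return all contiguous n-word sequences in `text`.

    Assumption: words are obtained by lowercasing and splitting on whitespace;
    a real pipeline would likely also handle punctuation and other languages.
    """
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngrams(reviews, n):
    """Count how often each n-gram occurs across a collection of reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(ngrams(review, n))
    return counts


# Hypothetical sample reviews, built around the phrases quoted in the description.
reviews = [
    "breakfast very good and staff friendly",
    "no free Internet in the room",
    "breakfast very good",
]

trigram_counts = count_ngrams(reviews, 3)
for phrase, count in trigram_counts.most_common(3):
    print(phrase, count)
```

Raw frequency is only a starting point here; according to the description, the significance of words and phrases in the study was determined with a decision-tree/rule tool rather than by counting alone.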
