Topic modeling is a popular technique for exploring large document collections. It has proven useful for this task, but its application poses a number of challenges. First, the comparison of available algorithms is anything but simple, as researchers use many different datasets and criteria for their evaluation. A second challenge is the choice of a suitable metric for evaluating the calculated results. The metrics used so far provide a mixed picture, making it difficult to verify the accuracy of topic modeling outputs. Altogether, the choice of an appropriate algorithm and the evaluation of the results remain unresolved issues. Although many studies have reported promising performance by various topic models, prior research has not yet systematically investigated the validity of the outcomes in a comprehensive manner, that is, using more than a small number of the available algorithms and metrics. Consequently, our study has two main objectives. First, we compare all commonly used, non-application-specific topic modeling algorithms and assess their relative performance. The comparison is made against a known clustering and thus enables an unbiased evaluation of results. Our findings show a clear ranking of the algorithms in terms of accuracy. Second, we analyze the relationship between existing metrics and the known clustering, and thus objectively determine under what conditions these algorithms may be utilized effectively. This way, we enable readers to gain a deeper understanding of the performance of topic modeling techniques and the interplay of performance and evaluation metrics.

The tremendous increase in the volume and variety of unstructured text documents represents a major challenge for social science research today. Most text analysis is still based on labor-intensive human coding or dictionary-based methods, which are semi-automated but involve a large amount of manual labor as well. To overcome this problem, social scientists are increasingly incorporating computer-assisted text analysis techniques into their research toolboxes. As a result, the rise of computer-assisted text analysis techniques in the social sciences is driving the emergence of the field of computational social science. Qualitative researchers may find automated text analysis to be especially useful, as their work often relies heavily on labor-intensive, manual coding procedures.
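To make the idea of evaluating topic model output against a known clustering more concrete, here is a minimal sketch in plain Python. The text does not say which agreement measures the study uses, so purity and normalized mutual information (NMI) are assumptions chosen for illustration, and the function names and toy labels are hypothetical. Each document is assumed to be assigned to its single most probable topic.

```python
from collections import Counter
from math import log


def purity(true_labels, predicted_topics):
    """Share of documents that fall in the majority true class of their topic."""
    if len(true_labels) != len(predicted_topics):
        raise ValueError("both label sequences must have the same length")
    pair_counts = Counter(zip(predicted_topics, true_labels))
    # For every predicted topic, keep the size of its largest true-class group.
    largest_group = {}
    for (topic, _label), count in pair_counts.items():
        largest_group[topic] = max(largest_group.get(topic, 0), count)
    return sum(largest_group.values()) / len(true_labels)


def normalized_mutual_information(true_labels, predicted_topics):
    """NMI normalized by the arithmetic mean of the two entropies (range 0 to 1)."""
    n = len(true_labels)
    if n != len(predicted_topics):
        raise ValueError("both label sequences must have the same length")
    label_counts = Counter(true_labels)
    topic_counts = Counter(predicted_topics)
    pair_counts = Counter(zip(true_labels, predicted_topics))

    # I(L; T) = sum over pairs of p(l, t) * log(p(l, t) / (p(l) * p(t)))
    mutual_info = sum(
        (count / n) * log(count * n / (label_counts[label] * topic_counts[topic]))
        for (label, topic), count in pair_counts.items()
    )

    def entropy(counts):
        return -sum((c / n) * log(c / n) for c in counts.values())

    h_true, h_pred = entropy(label_counts), entropy(topic_counts)
    if h_true == 0 and h_pred == 0:
        return 1.0  # both partitions consist of a single cluster
    return 2 * mutual_info / (h_true + h_pred)


# Hypothetical toy data: known classes and the dominant topic per document.
known_classes = ["sports", "sports", "politics", "politics", "science", "science"]
dominant_topic = [2, 2, 0, 0, 1, 0]

print("purity:", purity(known_classes, dominant_topic))
print("NMI:", normalized_mutual_information(known_classes, dominant_topic))
```

Both scores ignore how topics are numbered, which matters because a topic model assigns arbitrary topic indices. Purity tends to increase as the number of topics grows, so a normalized measure such as NMI is usually the safer choice when comparing models with different topic counts.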