Application Logistic Regression in Assessing the Quality of Information - Wikipedia Articles Case
DOI:
https://doi.org/10.18559/SOEP.2017.12.3
Keywords:
Big Data, Management, Information quality, Logistic regression, Information noise
Abstract
The use of logistic regression in assessing data quality may have a significant impact on data management in the era of big data, where we deal with a large number of variables and a large amount of information describing a phenomenon or behaviour of interest. Calculating the information value (IV) indicator makes it possible to eliminate variables that are irrelevant or merely contribute to information overload. The article presents the use of logistic regression in assessing variables that describe the quality of articles published in the English version of Wikipedia. A classification of the variables based on their information value is presented, and the predictive capabilities of the variables are evaluated.
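As an illustration of the information value indicator mentioned in the abstract, the following is a minimal sketch of the standard weight-of-evidence formulation, IV = Σ (p_good − p_bad) · ln(p_good / p_bad), summed over the bins of a variable. It is not the authors' implementation: the function name, the example bins ("short", "long"), the labels, and the smoothing constant `eps` are all assumptions made for this example.

```python
import math
from collections import Counter


def information_value(bins, labels, eps=0.5):
    """Return the information value (IV) of a binned variable for a binary target.

    bins   -- the bin (category) assigned to each observation
    labels -- 1 for a "good" observation (e.g. a high-quality article), 0 otherwise
    eps    -- smoothing count added to every cell to avoid log(0) (an assumption)
    """
    good = Counter(b for b, y in zip(bins, labels) if y == 1)
    bad = Counter(b for b, y in zip(bins, labels) if y == 0)
    categories = sorted(set(bins))

    # Smoothed totals, so that the per-bin shares still sum to 1.
    total_good = sum(good.values()) + eps * len(categories)
    total_bad = sum(bad.values()) + eps * len(categories)

    iv = 0.0
    for c in categories:
        p_good = (good[c] + eps) / total_good  # share of "good" cases in bin c
        p_bad = (bad[c] + eps) / total_bad     # share of "bad" cases in bin c
        woe = math.log(p_good / p_bad)         # weight of evidence of bin c
        iv += (p_good - p_bad) * woe
    return iv


# Hypothetical data: article length binned as "short" or "long".
length_bins = ["short"] * 4 + ["long"] * 4
is_high_quality = [0, 0, 0, 1, 1, 1, 1, 0]
iv_length = information_value(length_bins, is_high_quality)

# A variable whose bins carry no information about the target.
noise_bins = ["a", "a", "b", "b"]
noise_labels = [0, 1, 0, 1]
iv_noise = information_value(noise_bins, noise_labels)
```

In this sketch a variable whose bins have identical shares of good and bad cases gets an IV of zero, and the IV grows as the shares diverge; variables with a low IV are the candidates for elimination described in the abstract.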
License
Copyright (c) 2017 Wydawnictwo UEP

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

