Wikidata Bibliography
A selection of scientific references about Wikidata, in English and French.
General presentations
- Denny Vrandečić, « Wikidata: a new platform for collaborative data collection », Proceedings of the 21st International Conference on World Wide Web, Association for Computing Machinery, WWW '12 Companion, 2012, p. 1063–1064.
- Denny Vrandečić and M. Krötzsch, « Wikidata: a free collaborative knowledgebase », Communications of the ACM, 2014, 57(10), p. 78–85. A highly cited article (over 3,000 citations as of June 2023). General presentation of Wikidata covering its history, its deployment in 287 languages, its data structure, the ability to cite sources, the web of data, and Wikidata applications.
Wikidata history
- Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert and Thomas Steiner, « From Freebase to Wikidata: The Great Migration », Proceedings of the 25th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, WWW '16, 2016, p. 1419–1428.
- Fredo Erxleben, Michael Günther, Markus Krötzsch and Julian Mendez, « Introducing Wikidata to the Linked Data Web », The Semantic Web – ISWC 2014, Springer International Publishing, Lecture Notes in Computer Science, 2014, p. 50–65.
- Denny Vrandečić, « The rise of Wikidata », IEEE Intelligent Systems, vol. 28, no. 4, 2013, p. 90–95.
Using Wikidata in libraries
- Giovanni Bergamin and Cristian Bacchi, « New ways of creating and sharing bibliographic information: an experiment of using the Wikibase Data Model for UNIMARC data », JLIS.it, vol. 9, no. 3, p. 35–74.
- Theo van Veen, « Wikidata: From “an” Identifier to “the” Identifier », Information Technology and Libraries, vol. 38, no. 2, p. 72–81.
Using Wikidata for research
- S. Deng, « Linked Data, Wikidata and Their Implementations », 2023.
- Daniel Mietchen, Gregor Hagedorn, Egon Willighagen, Mariano Rico, Asunción Gómez-Pérez, Eduard Aibar, Karima Rafes, Cécile Germain, Alastair Dunning, Lydia Pintscher and Daniel Kinzler, « Enabling Open Science: Wikidata for Research (Wiki4R) », Research Ideas and Outcomes, Pensoft Publishers, vol. 1, e7573.
- (French) Pascal Martinolli, « Wikidata pour la recherche », University of Montreal.
- Tisch and F. Pradel, « How can the social sciences benefit from knowledge graphs? A case study on using Wikidata and Wikipedia to examine the world’s billionaires ».
This article examines the usefulness of Wikidata for the social sciences by using knowledge graphs to compile a list of the world's billionaires (2010–2022). Knowledge graphs can be used to generate datasets about billionaires, and they also allow social science researchers to link different databases to enrich their research. The article also notes that Wikidata exhibits gender and nationality biases. Drawing on the genealogical information in Wikidata, it examines billionaires' family networks and shows that at least 15% of all billionaires have a family member who is also a billionaire.
Wikidata internal workings, tools and gadgets
- Stefan Heindorf, Martin Potthast, Benno Stein and Gregor Engels, « Vandalism Detection in Wikidata », Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Association for Computing Machinery, CIKM '16, 24 October 2016, p. 327–336.
- E. Zangerle, W. Gassler, M. Pichl, S. Steinhauser and G. Specht, « An empirical evaluation of property recommender systems for Wikidata and collaborative knowledge bases », Proceedings of the 12th International Symposium on Open Collaboration, August 2016, p. 1–8.
- D. Hernández, A. Hogan and M. Krötzsch, « Reifying RDF: What works well with Wikidata? », SSWS@ISWC, vol. 1457, 2015, p. 32–47.
- Vevake Balaraman, Simon Razniewski and Werner Nutt, « Recoin: Relative Completeness in Wikidata », Companion Proceedings of The Web Conference 2018, International World Wide Web Conferences Steering Committee, WWW '18, 23 April 2018, p. 1787–1792.
- A. Sarabadani, A. Halfaker and D. Taraborelli, « Building automated vandalism detection tools for Wikidata », Proceedings of the 26th International Conference on World Wide Web Companion, 2017, p. 1647–1654.
Wikidata is a knowledge base that anyone can modify. This model of open collaboration is powerful because it lowers barriers to participation and allows a large number of people to contribute. However, it exposes the knowledge base to the risk of vandalism and poor-quality contributions. In this work, we build on previous work on detecting vandalism in Wikipedia to detect vandalism in Wikidata. This work is novel in that identifying detrimental changes in a structured knowledge base requires feature engineering that is significantly different from that of a text-based wiki such as Wikipedia. We also examine the usefulness of such classifiers in reducing the overall workload of patrollers tasked with detecting vandalism in Wikidata. We describe an automatic classification strategy that detects 89% of vandalism while reducing patroller workload by 98%, relying lightly on the contextual characteristics of a change and heavily on the characteristics of the user making it.
Data alignment and sharing
- (French) Blandine Nouvel, « Des archéologues et le web des données: ateliers d'alignement du thésaurus PACTOLS avec Wikidata », in Journée d'étude : Outil numérique : pédagogie scientifique et médiation du patrimoine culturel, 2018; see also Christelle Molinié, Blandine Nouvel and Miled Rousset, « Aligner pour mieux diffuser, l'expérience du thésaurus PACTOLS pour l'archéologie avec Wikidata », in Webinaire wiki, data et GLAM, 2021.
- (French) Benoît Prieur, « Wikidata/Wikipédia & OpenStreetMap: deux communs en dialogue », in Séminaire Passages en Commun-UMR Passages, 2019.
- (French) Benoît Prieur, « Projet de valorisation des données relatives aux voies de Lyon », 2019.
Systematic reviews
- M. Mora-Cantallops, S. Sánchez-Alonso and E. García-Barriocanal, « A systematic literature review on Wikidata », Data Technologies and Applications, 2019, 53(3), p. 250–268.
This systematic review of the scientific literature on Wikidata, published in 2019, concludes that scientific output remains relatively limited compared with the potential of the knowledge base, although research activity has increased significantly. Most research is published in conference proceedings. At the time, only a few disciplines were benefiting from Wikidata applications, and there was a significant gap between research and practice.
Quality of data
- S. A. H. Beghaeiraveri, A. Gray and F. McNeill, « RQSS: Referencing Quality Scoring System for Wikidata ».
Wikidata is distinctive in that it attaches provenance information to statements about items in the form of references. This study examines the quality of these references using a Referencing Quality Scoring System (RQSS), which produces quantified scores for analysing and evaluating referencing quality and for monitoring it regularly. The evaluation shows that RQSS is practical and provides valuable information that Wikidata contributors and project developers can use to identify quality gaps.
- A. Piscopo and E. Simperl, « What we talk about when we talk about Wikidata quality: a literature survey », Proceedings of the 15th International Symposium on Open Collaboration, 2019, p. 1–11.
Data quality is a major concern in Wikidata. This survey reviews the literature on Wikidata data quality, analysing 28 articles classified according to the quality dimensions they address. It concludes that several quality dimensions, such as accuracy and reliability, have not yet been adequately addressed, and that future work should focus on them.
- K. Shenoy, F. Ilievski, D. Garijo, D. Schwabe and P. Szekely, « A study of the quality of Wikidata », Journal of Web Semantics, 2022, vol. 72.
This study sets out a framework for detecting and analysing low-quality statements in Wikidata, shedding light on the practices of its community. It explores three indicators of data quality in Wikidata, based on: (1) community consensus on recorded knowledge, assuming that statements that have been removed and not re-added are implicitly considered low quality; (2) statements that have been deprecated; and (3) constraint violations in the data. It combines these indicators to detect low-quality statements, revealing problems with duplicate entities, missing triples, violated type rules and taxonomic distinctions. Its results complement the Wikidata community's efforts to improve data quality, making it easier for users and editors to find and correct errors.
- M. Färber, F. Bartscherer, C. Menne and A. Rettinger, « Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO », Semantic Web, 2018, 9(1), p. 77–129.
A comparison of several large, freely accessible, cross-domain knowledge graphs (KGs): DBpedia, Freebase, OpenCyc, Wikidata and YAGO. The study defines data quality criteria against which KGs can be analysed and proposes a framework for finding the KG best suited to a given context.
- G. Amaral, A. Piscopo, L. A. Kaffee, O. Rodrigues and E. Simperl, « Assessing the quality of sources in Wikidata across languages: a hybrid approach », Journal of Data and Information Quality (JDIQ), 13(4), 2021, p. 1–35.
Wikidata content must be supported by credible references. This is particularly important because Wikidata explicitly encourages editors to add claims that are not widely agreed upon, as long as they are corroborated by references. Yet despite this essential link between content and references, Wikidata's ability to systematically assess and guarantee the quality of its references remains limited. To address this, this mixed-methods study assesses the relevance, accessibility and authority of Wikidata references, at scale and across languages, using online crowdsourcing, descriptive statistics and machine learning.
- A. Piscopo, C. Phethean and E. Simperl, « What makes a good collaborative knowledge graph: group composition and quality in Wikidata », Social Informatics: 9th International Conference, SocInfo 2017, Oxford, UK, September 13–15, 2017, Proceedings, Part I, p. 305–322.
The collaborative production processes in Wikidata had not yet been explored. Understanding them is essential to prevent potentially harmful community dynamics and to ensure the long-term viability of the project. This regression analysis examines how contributions from different types of users, namely bots and human editors (both registered and anonymous), affect the quality of output in Wikidata. The study also examines the effects of tenure and diversity of interests among registered users. Its results show that a balanced contribution from bots and human editors has a positive influence on quality, whereas a high number of anonymous edits can adversely affect it. Seniority and diversity of interests within groups also lead to better quality. These results may help identify and support groups likely to perform less well in Wikidata. Further work should analyse in detail the respective contributions of bots and registered users.