Wikidata Bibliography

Selection of scientific references in English and French about Wikidata.

General presentations

    • Denny Vrandečić, « Wikidata: a new platform for collaborative data collection », Proceedings of the 21st International Conference on World Wide Web, Association for Computing Machinery, WWW '12 Companion, 2012, p. 1063–1064.
    • Denny Vrandečić and Markus Krötzsch, « Wikidata: a free collaborative knowledgebase », Communications of the ACM, 2014, 57(10), p. 78–85.
    • Highly cited article (over 3,000 citations in June 2023). General presentation of Wikidata, covering its history, its deployment in 287 languages, how it is structured, the possibility of citing sources, the web of data, and Wikidata applications.

Wikidata history

    • Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert and Thomas Steiner, « From Freebase to Wikidata: The Great Migration », Proceedings of the 25th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, WWW '16, 2016, p. 1419–1428.
    • Fredo Erxleben, Michael Günther, Markus Krötzsch and Julian Mendez, « Introducing Wikidata to the Linked Data Web », The Semantic Web – ISWC 2014, Springer International Publishing, Lecture Notes in Computer Science, 2014, p. 50–65.
    • Denny Vrandečić, « The rise of Wikidata », IEEE Intelligent Systems, 28(4), 2013, p. 90–95.

Quality of data

Wikidata is unique in that it attaches provenance to statements on items in the form of references. This study looks at the quality of the references added, using a Referencing Quality Scoring System (RQSS), which provides quantified scores that make it possible to analyse and evaluate the quality of referencing, and then to check it regularly. The evaluation shows that RQSS is practical and provides valuable information that Wikidata contributors and project developers can use to identify quality gaps.

Data quality is a major point of interest in Wikidata. This review assesses the quality of data in Wikidata by analysing 28 articles classified according to the quality dimensions addressed. The review concludes that a number of quality dimensions have not yet been adequately addressed, such as accuracy and reliability. Future work should focus on these aspects.

This study sets out a framework for detecting and analysing low-quality statements in Wikidata, highlighting the practices of its community. It explores three indicators of data quality in Wikidata, based on: (1) community consensus on recorded knowledge, assuming that statements that have been removed and not reinstated are implicitly considered low quality; (2) statements that have been deprecated; and (3) constraint violations in the data. It combines these indicators to detect low-quality statements, revealing problems with duplicate entities, missing triples, broken type rules and taxonomic distinctions. Its results complement the Wikidata community's efforts to improve data quality, making it easier for users and editors to find and correct errors.
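As an illustration (not taken from the cited study), the second indicator above, deprecated statements, can be inspected directly, because Wikidata exposes statement ranks through its SPARQL endpoint. The sketch below, under the assumption that one queries the public endpoint, builds a SPARQL query counting deprecated-rank statements per property and parses a response in the standard SPARQL 1.1 JSON results format; the actual HTTP call is left out so the example stays self-contained.

```python
import json

def deprecated_statements_query(limit: int = 10) -> str:
    """SPARQL query: properties with the most deprecated-rank statements.

    Uses the wikibase:rank / wikibase:DeprecatedRank vocabulary from the
    Wikidata RDF model."""
    return f"""
    SELECT ?property (COUNT(?statement) AS ?deprecated) WHERE {{
      ?statement wikibase:rank wikibase:DeprecatedRank .
      ?item ?property ?statement .
    }}
    GROUP BY ?property
    ORDER BY DESC(?deprecated)
    LIMIT {limit}
    """

def parse_counts(results_json: str) -> dict:
    """Extract {property IRI: count} from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return {
        row["property"]["value"]: int(row["deprecated"]["value"])
        for row in data["results"]["bindings"]
    }

# Hypothetical response in the W3C SPARQL 1.1 JSON results format:
sample = json.dumps({
    "results": {"bindings": [
        {"property": {"value": "http://www.wikidata.org/prop/P569"},
         "deprecated": {"value": "42"}},
    ]}
})
print(parse_counts(sample))  # → {'http://www.wikidata.org/prop/P569': 42}
```

In practice the query string would be sent to https://query.wikidata.org/sparql with `format=json`; high deprecated-statement counts for a property are one signal, alongside removals and constraint violations, that the study combines.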

Comparison between several large cross-domain, freely accessible knowledge graphs (KGs): DBpedia, Freebase, OpenCyc, Wikidata and YAGO. This study provides data quality criteria against which KGs can be analysed, and proposes a framework for finding the KG best suited to a given context.

Wikidata content must be supported by credible references; this is particularly important, as Wikidata explicitly encourages editors to add claims that are not widely agreed, as long as they are corroborated by references. Nevertheless, despite this essential link between content and references, Wikidata's ability to systematically assess and guarantee the quality of its references remains limited. To this end, this mixed-methods study determines the relevance, accessibility and authority of Wikidata references, at scale and in different languages, using online crowdsourcing, descriptive statistics and machine learning.

The collaborative production processes in Wikidata have not yet been explored. It is essential to understand them in order to prevent potentially harmful community dynamics and ensure the long-term viability of the project. This regression analysis examines how the contribution of different types of users, namely bots and human editors, both registered and anonymous, affects the quality of output in Wikidata. In addition, this study examines the effects of tenure and diversity of interests among registered users. Its results show that a balanced contribution of bots and human editors has a positive influence on the quality of results, whereas a high number of anonymous edits can adversely affect it. Longer tenure and greater diversity of interests within the groups also lead to better quality. These results may be useful for identifying and dealing with groups likely to perform less well in Wikidata. Further work should analyse in detail the respective contributions of bots and registered users.