User:Arch dude/Automated DOY

From Wikipedia, the free encyclopedia

The "day of year" (DOY) articles are a bit messy. In particular, the need for references is contentious. Each of the 366 articles has a lede followed by three main sections: events, births, and deaths. I propose to replace these three sections using data from Wikidata. We should start with births and deaths, since this is technically easier and requires no modifications to articles. Extending this to events is a bigger effort which can be deferred, especially since most of the contention appears to be about births and deaths.

This approach should solve several problems (or at least what I see as problems):

  • selection: currently, a person is listed only when an editor modifies the DOY article, so many notable people may be missing.
  • ease of update: when a date must be corrected, the correction must be made in many places (the articles in every Wikipedia, the DOY article, and Wikidata).
  • referencing: in addition to the correction itself, the reference or references must be propagated to multiple places.

I propose to make Wikidata the primary location for birth and death events. The primary problem with this is that Wikidata itself is a mess for these dates, and in particular for the references that support them, but this is a problem that needs to be solved anyway. Rather than fighting over the accuracy and verifiability of dates in DOY, I think we should fight the battle in a single place.

Implementation concept

Birth dates

Write a script that extracts all Wikidata items that have:

  • English-language Wikipedia article
  • birth date with a specific DOY, with a reference. If the item has no reference, then it will not be used. A reference to a Wikipedia article is not considered valid.

Build the list with the article link and (when present) the death date. We would not add the reference itself because our DOY article would have a brief explanation that the references are in the Wikidata item.
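The extraction criteria above could be expressed as a SPARQL query. The following is a rough sketch in Python that only assembles the query text for one day of the year; it does not send it anywhere. The function name `build_doy_query` and the `DATE_PROPERTIES` table are made up for illustration. The query assumes that P569 and P570 are the birth and death date properties, that a reference consisting of "imported from Wikimedia project" (P143) is what "a reference to a Wikipedia article" looks like in practice, and that only day-precision dates should count. It has not been run against the live query service and may well need tuning to avoid timeouts.

```python
# Wikidata property IDs: P569 = date of birth, P570 = date of death.
DATE_PROPERTIES = {"birth": "P569", "death": "P570"}


def build_doy_query(kind: str, month: int, day: int) -> str:
    """Return SPARQL text selecting referenced dates for one day of the year.

    Hypothetical helper: it only builds the query string, it does not run it.
    """
    prop = DATE_PROPERTIES[kind]
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("month or day out of range")
    return f"""
SELECT DISTINCT ?item ?article ?date WHERE {{
  ?item p:{prop} ?statement .
  # Use the full value node so that the date precision can be checked.
  ?statement psv:{prop} ?dateNode .
  ?dateNode wikibase:timeValue ?date ;
            wikibase:timePrecision 11 .   # 11 = precise to the day
  # The statement must carry at least one reference ...
  ?statement prov:wasDerivedFrom ?reference .
  # ... and that reference must not be a bare import from a Wikimedia project.
  FILTER NOT EXISTS {{ ?reference pr:P143 ?wikimediaProject . }}
  # Require an English-language Wikipedia article.
  ?article schema:about ?item ;
           schema:isPartOf <https://en.wikipedia.org/> .
  FILTER(MONTH(?date) = {month} && DAY(?date) = {day})
}}
""".strip()


if __name__ == "__main__":
    print(build_doy_query("birth", 3, 14))
```

The same function covers the death-date case by passing `"death"` instead of `"birth"`.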

Death dates

Write a script that extracts all Wikidata items that have:

  • English-language Wikipedia article
  • death date with a specific DOY, with a reference. If the item has no reference, then it will not be used. A reference to a Wikipedia article is not considered valid.

Build the list with the article link and (when present) the birth date. We would not add the reference itself because our DOY article would have a brief explanation that the references are in the Wikidata item.
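As an illustration of the "build the list" step for both births and deaths, here is a small sketch that turns one result into a wikitext bullet. The `Person` record and `format_entry` function are hypothetical, and the line layout (year, article link, then the other date in parentheses) is only an assumption loosely modelled on the current DOY pages; showing years rather than full dates is also an assumption.

```python
from typing import NamedTuple, Optional


class Person(NamedTuple):
    """Hypothetical record for one query result."""
    article: str                 # title of the English Wikipedia article
    birth_year: Optional[int]
    death_year: Optional[int]


def format_entry(person: Person, kind: str) -> str:
    """Format one wikitext bullet for a births list or a deaths list."""
    if kind == "birth":
        year, other, label = person.birth_year, person.death_year, "d."
    elif kind == "death":
        year, other, label = person.death_year, person.birth_year, "b."
    else:
        raise ValueError("kind must be 'birth' or 'death'")
    line = f"* {year} – [[{person.article}]]"
    # The other date is shown only when Wikidata has it.
    if other is not None:
        line += f" ({label} {other})"
    return line


if __name__ == "__main__":
    # Placeholder data, not a real person.
    print(format_entry(Person("Jane Example", 1900, 1980), "death"))
```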

Events

Events are a more difficult issue. They do not necessarily map one-to-one with articles, and the brief description of the event on the DOY page is not currently associated directly with an article. I think we can create a template which can be added to an article. This template will explicitly list the date and description of the event. There are several issues here. One possibility is to add an "event" data item to Wikidata, but this is just a start.

Presentation options

The Wikidata information can replace the existing tables, be added as an additional table (either expanded or collapsed), or be added as a link to a separate article. I feel that it should replace the existing tables immediately for births and deaths, but events will probably require considerable additional work.

Research

  • Figure out where the DOY denizens congregate
  • Present this idea and get consensus
  • Learn how to create a Wikidata query
  • Learn how to write a Wikipedia script (Lua?)
  • Learn how to access Wikidata from a script
  • Learn how to emit wikitext (or whatever) from a script
  • Learn how to invoke a script when a page loads

Implementation

(to be updated based on research)

There is a problem with efficiency. We do not want to run three scripts across all of Wikidata every time a user brings up a DOY page. Ideally, we would maintain all 366 × 3 = 1,098 lists in a centralized fashion, and update a particular list only when a change to a Wikidata item affects that list. Note that with 300 Wikipedia languages, this is 329,400 potential lists, but the list maintainer software would only touch a list that was affected, which means that only the languages that have an article for the affected Wikidata item will be updated.
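To make the arithmetic and the "only touch affected lists" idea concrete, here is a minimal sketch. The `affected_lists` function and the `(language, kind, month, day)` key layout are hypothetical; the figure of 300 languages is the round number used above.

```python
from typing import Iterable, Optional, Set, Tuple

MonthDay = Tuple[int, int]            # (month, day)
ListKey = Tuple[str, str, int, int]   # (language, kind, month, day)

DAYS_OF_YEAR = 366
LIST_KINDS = 3                        # events, births, deaths
WIKIPEDIA_LANGUAGES = 300             # round figure, an assumption

LISTS_PER_LANGUAGE = DAYS_OF_YEAR * LIST_KINDS               # 1098
POTENTIAL_LISTS = LISTS_PER_LANGUAGE * WIKIPEDIA_LANGUAGES   # 329400


def affected_lists(birth: Optional[MonthDay],
                   death: Optional[MonthDay],
                   languages: Iterable[str]) -> Set[ListKey]:
    """Return the lists touched by one item.

    `languages` holds the Wikipedia languages that have an article for the
    item, so lists in all other languages are left alone.
    """
    langs = list(languages)
    keys: Set[ListKey] = set()
    for kind, date in (("birth", birth), ("death", death)):
        if date is None:
            continue
        month, day = date
        for lang in langs:
            keys.add((lang, kind, month, day))
    return keys


if __name__ == "__main__":
    print(LISTS_PER_LANGUAGE, POTENTIAL_LISTS)
    print(sorted(affected_lists((3, 14), (4, 18), ["en", "de"])))
```

When a date is corrected, both the old and the new day's lists would need refreshing, so a maintainer would presumably take the union of the keys computed before and after the change.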

It is likely that the best place to store a list is as a Wikidata item rather than as a transcludable page within a Wikipedia.

The functionality will be implemented as three separate templates, each of which takes the DOY as a parameter. Each template transcludes the relevant pre-built list.

After the lists have been built the first time, we need a process at Wikidata that identifies changes that can affect the lists. Basically, we want to check an entry when the data changes, or when the reference changes, or when a Wikipedia article is added or deleted for the Wikidata item.

To build the lists initially, we will need to operate in a way that minimizes the computational impact at Wikidata. Conceptually, we could simply step through all Wikidata items, and treat any that have birth, death, or event dates as requiring a date update. However, we should instead operate in a batch mode to build the lists one date at a time.
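A minimal sketch of the batch mode: walk through the 366 days of the year and build one list at a time. The names `all_days_of_year` and `build_all_lists` are hypothetical, and `build_one` stands in for whatever actually queries Wikidata and stores a list. A real run would presumably also pause between queries.

```python
import calendar
from typing import Callable, Iterator, Sequence, Tuple

LEAP_YEAR = 2000  # any leap year works; it is used only so February has 29 days


def all_days_of_year() -> Iterator[Tuple[int, int]]:
    """Yield all 366 (month, day) pairs, including February 29."""
    for month in range(1, 13):
        days_in_month = calendar.monthrange(LEAP_YEAR, month)[1]
        for day in range(1, days_in_month + 1):
            yield month, day


def build_all_lists(build_one: Callable[[str, int, int], None],
                    kinds: Sequence[str] = ("birth", "death")) -> int:
    """Call build_one(kind, month, day) once per list; return the number of calls."""
    count = 0
    for month, day in all_days_of_year():
        for kind in kinds:
            build_one(kind, month, day)
            count += 1
    return count


if __name__ == "__main__":
    built = []
    total = build_all_lists(lambda kind, m, d: built.append((kind, m, d)))
    print(total, built[0], built[-1])
```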

It is not clear how and when to build the wikitext list from the Wikidata list. I think the best approach is to run a script when the user opens a DOY page. The script would check its cached version of the list and serve it if it is up to date, but rebuild it if it is not. Because today's date is likely to have very high usage, the script should not go back to Wikidata if the wikitext was generated within the last minute.
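The caching rule could look something like the sketch below. Everything here is hypothetical: `DoyListCache`, the `fetch_version` callback (some cheap way to ask Wikidata whether the stored list has changed), and the `render` callback (the expensive rebuild of the wikitext). One deliberate deviation from the wording above: the sketch skips the Wikidata check if the list was generated or confirmed current within the last minute, not only if it was generated then, so an unchanged but popular list is checked at most once a minute.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class DoyListCache:
    """Serve cached wikitext; recheck Wikidata at most once per interval."""

    def __init__(self,
                 fetch_version: Callable[[Hashable], int],
                 render: Callable[[Hashable], str],
                 min_recheck_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_version = fetch_version
        self._render = render
        self._min_recheck = min_recheck_seconds
        self._clock = clock
        # key -> (wikitext, version, time of last generation or check)
        self._entries: Dict[Hashable, Tuple[str, int, float]] = {}

    def get(self, key: Hashable) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            wikitext, version, checked_at = entry
            if now - checked_at < self._min_recheck:
                return wikitext              # fresh enough: do not contact Wikidata
            current = self._fetch_version(key)
            if current == version:
                # Still up to date: remember that we checked, serve the cache.
                self._entries[key] = (wikitext, version, now)
                return wikitext
        else:
            current = self._fetch_version(key)
        # Missing or stale: rebuild and store.
        wikitext = self._render(key)
        self._entries[key] = (wikitext, current, now)
        return wikitext
```

Passing the clock in as a parameter is only there to make the behaviour easy to test without waiting a real minute.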

Conversion

Not all items in our current lists will meet the criteria for the new lists. The Wikidata item may be missing, or may not have a reference for the date. We can transition to the new scheme on a list-by-list basis. We need a script that shows an editor which current items fail, so the editor can update Wikidata. If there are 12 interested editors, each editor will need to check a month's worth of lists. This is handled independently for each Wikipedia language, but the Wikidata updates are used by all languages, so the editors benefit from each other's work.
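The checking script could be as simple as a set difference between the articles linked from the current DOY list and the articles that the Wikidata query returns for the same day. This is a minimal sketch; `find_failing_entries` and `normalize_title` are hypothetical names, and the only title normalization applied is treating underscores and spaces as equivalent.

```python
from typing import Iterable, List


def normalize_title(title: str) -> str:
    """Treat underscores and spaces as equivalent and trim whitespace."""
    return title.replace("_", " ").strip()


def find_failing_entries(current_articles: Iterable[str],
                         qualifying_articles: Iterable[str]) -> List[str]:
    """Return current list entries that the Wikidata-based list would drop.

    The result keeps the order of the current list so that an editor can
    work through the DOY page from top to bottom.
    """
    qualifying = {normalize_title(t) for t in qualifying_articles}
    return [t for t in current_articles
            if normalize_title(t) not in qualifying]


if __name__ == "__main__":
    # Placeholder titles, not real articles.
    current = ["Jane Example", "John_Placeholder", "Sam Sample"]
    from_wikidata = ["John Placeholder", "Jane Example"]
    print(find_failing_entries(current, from_wikidata))
```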