User talk:JL-Bot/Archive 7


Portal banner

Looking at Portal:Society/Featured articles, I wish that the banner was an editnotice, not something at the top of the page itself. The page is transcluded to Portal:Society, which is reader-facing, and thus shouldn't have all this technical information about how it updates; that information only needs to be displayed if someone goes in and actually tries to edit it manually. (please use {{ping|Sdkb}} on reply) {{u|Sdkb}}talk 21:39, 18 November 2020 (UTC)

@Sdkb: wrap it in <noinclude></noinclude> tags. Headbomb {t · c · p · b} 22:38, 18 November 2020 (UTC)
Headbomb, good thought! Doing that at Portal:Society/Featured articles would be easy enough and would solve the issue for that page, but looking at this list, there seem to be quite a few other portals with the same issue, so solving it within the banner itself would be ideal. To do that, would we want recursive noinclude tags (<noinclude<noinclude/>>) that only work in the portal namespace or something? {{u|Sdkb}}talk 22:52, 18 November 2020 (UTC)
It can be easily tackled by AWB if there's a consensus for it. But others may feel differently, and prefer the notice to display to prevent confusion / edit attempts. I don't particularly care either way myself. Headbomb {t · c · p · b} 22:56, 18 November 2020 (UTC)
I'll take care of it manually for the portals linked from the main page. Beyond that, I consider the portal realm a lost cause not worth the effort. {{u|Sdkb}}talk 23:46, 18 November 2020 (UTC)
The template already supports a |mbox=no parameter. What about adding a |mbox=noinclude option? -- JLaTondre (talk) 11:50, 20 November 2020 (UTC)
Could work, but coding for noincludes to be included is often tricky. It's possible though. Headbomb {t · c · p · b} 14:09, 20 November 2020 (UTC)

Fails to report some miscaps

Taxon has entries of TAXON which aren't reported in WP:JCW/MISCAPS.

This is likely due to the (journal) disambiguation. Headbomb {t · c · p · b} 06:13, 14 December 2020 (UTC)

Fixed. Your supposition was correct. -- JLaTondre (talk) 23:36, 19 December 2020 (UTC)

Exclusion with 1= not working

{{JCW-exclude|1=Prop=info|In prep|c=1}}

Doesn't seem to work in WP:CITEWATCH#Pseudo-scholarship. Does |2= need to be set explicitly as well? Headbomb {t · c · p · b} 16:52, 20 December 2020 (UTC)

It wasn't expecting 1= in the first parameter. Added support for that on exclude and other templates. It should work on next run. -- JLaTondre (talk) 20:20, 20 December 2020 (UTC)

Did I use it wrong?

I added

and imagined that would cascade into Wikipedia:WikiProject Military history/United States military history task force#Open tasks (cleanup) but that didn't happen on today's run. Detailed rewrite comments in the article talk page. Did I use the wrong template or can you help out a newbie here? Thanks Justaxn (talk) 23:33, 26 December 2020 (UTC)

JL-Bot does not do anything with {{cleanup rewrite}} and open task listings. It generates recognized content listings (see WP:RECOG) instead. Looking at [1], the project's open tasks list is manually maintained. -- JLaTondre (talk) 19:08, 27 December 2020 (UTC)

Miscaps

Could you add allcaps entries to WP:JCW/MISCAPS? Namely all FOOBAR JOURNAL OF FOO redlinks, but leave FOOBAR bluelinks behind (unless marked by {{R from miscaps}}). Headbomb {t · c · p · b} 20:13, 1 December 2020 (UTC)

Treating punctuation as irrelevant here, so FOOBAL-JOURNAL also gets picked up. Headbomb {t · c · p · b} 05:10, 4 December 2020 (UTC)
That should be straightforward. I will have some free time in a couple of weeks to start knocking the above requests off. -- JLaTondre (talk) 20:34, 4 December 2020 (UTC)
No rush. Headbomb {t · c · p · b} 20:36, 4 December 2020 (UTC)
First cut done & uploaded. It is looking for anything with a Unicode uppercase letter, but without a Unicode lowercase letter. That produces a lot of hits including some valid cases so let me know if you want to tweak the logic. As the additions are all items that don't match to a target, they all appear under nonexistent as the target. {{R from miscaps}} should already be handled by the prior logic, but let me know if something is not working right there. -- JLaTondre (talk) 16:33, 29 December 2020 (UTC)
For ALLCAPS, I'd say exclude everything with a number or 'weird' symbol in it (weird symbols being symbols that are not :, &, ;, -, , , , or ), and exclude everything that's 5 or fewer characters long.
Then maybe have a separate category for entries with 5 or fewer characters ('SHORTCAPS', no weird symbols, no numbers), then one with the entries that have numbers and weird symbols in them (WEIRDCAPS). It's possible the last two categories wouldn't be very useful, but for now there are a bit too many unexpected entries to make sense of it. Headbomb {t · c · p · b} 22:44, 29 December 2020 (UTC)
Changes made. I included . in the list of acceptable punctuation. I have to change the saving to not link the target names for these three cases. -- JLaTondre (talk) 22:13, 30 December 2020 (UTC)
Target linking removed for these. -- JLaTondre (talk) 23:09, 30 December 2020 (UTC)
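The classification discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function names are hypothetical, and the set of acceptable punctuation is an assumption, since the list quoted above is only partly legible (the period is included because the thread says it was added).

```python
import unicodedata
from typing import Optional

# ASSUMPTION: the exact set of acceptable punctuation is illustrative only;
# the list given in the discussion is partly garbled.
ALLOWED_PUNCT = set(":&;-,.()")


def is_allcaps(title: str) -> bool:
    """True if the title has a Unicode uppercase letter and no lowercase letter."""
    has_upper = any(unicodedata.category(ch) == "Lu" for ch in title)
    has_lower = any(unicodedata.category(ch) == "Ll" for ch in title)
    return has_upper and not has_lower


def classify(title: str) -> Optional[str]:
    """Return 'ALLCAPS', 'SHORTCAPS', 'WEIRDCAPS', or None if not all caps."""
    if not is_allcaps(title):
        return None
    # A digit or any symbol outside the allowed set makes the entry "weird".
    weird = any(
        ch.isdigit() or not (ch.isalpha() or ch.isspace() or ch in ALLOWED_PUNCT)
        for ch in title
    )
    if weird:
        return "WEIRDCAPS"
    # Entries of 5 or fewer characters go to their own bucket.
    if len(title) <= 5:
        return "SHORTCAPS"
    return "ALLCAPS"
```

Under these assumptions, classify("FOOBAR JOURNAL OF FOO") falls under ALLCAPS, while a five-letter entry such as "TAXON" lands in SHORTCAPS.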

Weirdness with featured pictures

Looking at Portal:Nepal#Featured pictures, what's the rationale for listing the filenames below the pictures? That seems rather... pointless clutter? Headbomb {t · c · p · b} 22:05, 8 November 2020 (UTC)

It provides a description (somewhat) of the picture. Otherwise, you just see the picture without a title, etc. (unless you have a widget/custom JS installed), which is not that informative or helpful in many cases. -- JLaTondre (talk) 01:26, 13 November 2020 (UTC)
I suppose that makes some sort of sense, I just kinda considered it redundant with the filename you see when following the URL / hovering on the picture with your mouse. Maybe stripping 'File:' and the extension would make more sense if it's to be used as a description, e.g. [2]? I could see this going either way. Headbomb {t · c · p · b} 01:40, 13 November 2020 (UTC)
The File: & extension are now removed from the file name in gallery mode (see example). A |no-captions option has also been added. -- JLaTondre (talk) 16:09, 31 December 2020 (UTC)

Double-blurb handling

In [3], the Katie Bouman article (first entry in that revision) has two DYK, causing issues. It reports a missing blurb and no date.

Bot should report both blurbs and both dates. Either as separate entries, or a combined one

or

  • ... that imaging scientist Katie Bouman first learned of the Event Horizon Telescope in 2007, while still in high school, and joined the project six years later? (2019-08-09; 2019-05-09)

I suggest the first (separate entries), since that's likely easier to implement, and you don't have to worry about different blurbs in a combined entry. Headbomb {t · c · p · b} 03:38, 22 October 2020 (UTC)

The issue is caused by the article's talk page using dyk1entry, which is an undocumented parameter of {{Article history}} (the help says to use dykentry, dyk2entry, etc.). I'll update it to recognize dyk1entry. As for multiple entries, the bot is currently designed to only display the first. I will update it to handle multiple entries (using the combined dates), but that will require some rework so will take a bit longer. Redoing the DYK blurb code has been on my TODO list for a while as it's currently inefficient. -- JLaTondre (talk) 13:29, 25 October 2020 (UTC)
dyk1entry now supported. It will show up in tomorrow's run. -- JLaTondre (talk) 21:00, 30 October 2020 (UTC)
Multiple entries are now supported. When the DYK blurb is a duplicate, it creates a combined entry (single blurb, multiple dates). See Wikipedia:WikiProject Astronomy/Did you know for an example of this with the Katie Bouman case. When the DYK blurbs are different, it will list both blurbs with their individual dates. However, these blurbs will be grouped together by the earliest blurb. See User:JL-Bot/Project content/Trial 5.1 for an example of this with Palembang Light Rail Transit and Atlantic and Pacific Railroad cases. Due to how the content processing works, this was the only way of doing it without an onerous restructuring. When next weekend's run happens, I will check the results, but let me know if you see anything odd. -- JLaTondre (talk) 23:44, 3 January 2021 (UTC)
So far, looks good. Headbomb {t · c · p · b} 03:59, 4 January 2021 (UTC)
There is an issue with blurbs that have multiple question marks (example). -- JLaTondre (talk) 18:48, 9 January 2021 (UTC)
Fixed issue with Article History blurb detection. Re-ran against all pages that output blurbs & things look good. -- JLaTondre (talk) 22:41, 10 January 2021 (UTC)
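The combining behaviour described above (one entry with several dates for a duplicate blurb, separate entries for different blurbs, grouped by the earliest date) might look something like this. It is a sketch under stated assumptions: the function names are hypothetical, dates are assumed to be ISO strings, and listing the dates in ascending order is a choice made here, not something the thread specifies.

```python
def merge_dyk_entries(entries):
    """Merge (blurb, date) pairs for one article.

    Identical blurbs collapse into a single entry carrying every date;
    distinct blurbs stay separate. Dates are ISO strings (YYYY-MM-DD),
    so plain string sorting orders them chronologically.
    """
    dates_by_blurb = {}
    for blurb, date in entries:
        dates_by_blurb.setdefault(blurb, []).append(date)
    return [(blurb, sorted(dates)) for blurb, dates in dates_by_blurb.items()]


def grouping_date(merged):
    """Earliest date across all of an article's blurbs, used to place the group."""
    return min(dates[0] for _, dates in merged)


def format_entry(blurb, dates):
    """Render one blurb followed by its dates, separated by semicolons."""
    return f"{blurb} ({'; '.join(dates)})"
```

Keying on the exact blurb text keeps the merge trivial; blurbs that differ even slightly are treated as different and listed separately.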

Bump DOI limit to 10.55000

Both listed here are valid. They should also be listed here if CrossRef returns information about them. Headbomb {t · c · p · b} 03:40, 23 January 2021 (UTC)

Wikipedia:JCW/DOI limit updated to 10.55000. For CrossRef, there are entries in the 10.51### range. I found a different endpoint that will allow me to grab them in bulk vs. iterating through them individually. Besides being much much faster, it will ensure that all are captured vs. depending on a pre-defined range. I will work on that next. -- JLaTondre (talk) 22:03, 25 January 2021 (UTC)
The CrossRef query has been updated to the new endpoint. Results have been uploaded & the deltas page has all the "new" ones. -- JLaTondre (talk) 21:01, 31 January 2021 (UTC)

Question about instructions

When using |content-a-class-articles=, or b or c-class, does one need to use the "Category:" prefix? For example, |content-a-class-articles=Category:A-Class Foo articles, or does one exclude the namespace prefix, like |content-a-class-articles=A-Class Foo articles? Funandtrvl (talk) 20:24, 30 January 2021 (UTC)

Either works. -- JLaTondre (talk) 20:41, 30 January 2021 (UTC)
TY, @JLaTondre: what is |WoRC-cat=yes, it isn't listed in the instructions, but I see it being used by WPUSA, etc. Funandtrvl (talk) 21:21, 31 January 2021 (UTC)
It's not a valid parameter. Someone mistakenly added it to a project years ago and it's just been copied since then. See this discussion. -- JLaTondre (talk) 23:37, 31 January 2021 (UTC)
Oh, that's interesting! Thanks for the update. Funandtrvl (talk) 04:55, 1 February 2021 (UTC)

Assistance requested

Hello, How can I get the bot to run for Wikipedia:WikiProject Zimbabwe/Rhodesia task force please? The C of E God Save the Queen! (talk) 14:13, 23 January 2021 (UTC)

Please see the documentation at Wikipedia:RECOG. You will need to add {{User:JL-Bot/Project content|...}} with the desired parameters. The WikiProject will need to have a category or a template that is used to make all its articles. That would go into the project parameter and then you will need to select what recognized content you wish to display and any formatting options. If you have any specific questions on the options, let me know. -- JLaTondre (talk) 14:35, 23 January 2021 (UTC)
I've set up a test at User:The C of E/testing, though I don't know if I set it up right. Can you take a look please @JLaTondre:? I ask because Rhodesia is a subgroup and when I try to set it up as Zimbabwe/Rhodesia task force it gets a redlink. The C of E God Save the Queen! (talk) 14:51, 23 January 2021 (UTC)
Since it is a subproject, you will need to use a category instead. I changed it over to |category = WikiProject Rhodesia articles & re-ran the bot against that page. -- JLaTondre (talk) 15:03, 23 January 2021 (UTC)
Thank you. The C of E God Save the Queen! (talk) 15:56, 23 January 2021 (UTC)
Could you give the bot a run on Wikipedia:WikiProject Zimbabwe/Rhodesia task force to see if it works @JLaTondre: please? I ask as Plastikspork and Sporkbot have made some alterations to the coding so not sure if that will have any effect? The C of E God Save the Queen! (talk) 16:22, 27 January 2021 (UTC)
Done. -- JLaTondre (talk) 22:03, 27 January 2021 (UTC)
It still wasn't working, because the template included all of the notes from the instruction page. I've updated it now, so hopefully it will work. Funandtrvl (talk) 20:20, 30 January 2021 (UTC)
Your comment confused me, but I figured it out after a bit... The C of E was asking about the recognized content list he added to the Wikipedia:WikiProject Zimbabwe/Rhodesia task force page. You are talking about Wikipedia:WikiProject Zimbabwe/Recognized content/Rhodesia. There are now duplicative recognized content lists that the two of you should sort out. I would suggest keeping the subpage and transcluding it onto the task force page, but that is up to you all. -- JLaTondre (talk) 20:39, 30 January 2021 (UTC)
Thank you for pointing that out. I've moved the content to the subpage, and transcluded it to the main project page. Funandtrvl (talk) 05:46, 1 February 2021 (UTC)

Weird 'Unstability' in JCW

Over time, I've noticed these sorts of weird, apparently non-deterministic inclusions/exclusions that change with no apparent cause.

  • [4] Adds journal
  • [5] Removes journal (Tag: 'Reverted')
  • [6] Adds journal
  • [7] Removes journal (Tag: 'Reverted')

What gives? Headbomb {t · c · p · b} 11:32, 9 November 2020 (UTC)

The cases where it is added are publisher only runs. The cases where it is removed are combined publisher & questionable runs. When it runs as publisher only, it sees a match based on the normalization logic (example: "International Journal of Education and Research" matches to "International Journal of Educational Research") and includes it. When it does a combined publisher & questionable run, it sees that the match is a questionable target (example: "International Journal of Education and Research" is on User:JL-Bot/Questionable.cfg/Journals) and it won't include a target as a sub-page of another target. The latter (excluding targets from another target) would seem to be the correct behavior. I will have to think about how to handle this for individual runs (maybe read the config for both types, only process the one needed, but still exclude targets from the other). -- JLaTondre (talk) 01:42, 13 November 2020 (UTC)
I made a change earlier this week that should effectively solve this. It's not the most efficient solution. I've been debating whether I want to spend more time trying to speed it up or just let it go. I could potentially shave off an hour or so of processing time for the individual cases, but it would add a little more complexity to the code. I'll probably leave it as is in the end. There haven't been any config changes to cause a reprocessing. I was going to let a few different runs occur and make sure everything looks good before I call it done. -- JLaTondre (talk) 01:21, 18 February 2021 (UTC)
Efficiency in runs is entirely up to you. This, to my knowledge, is the last outstanding 'feature/bug report' I had in mind for a while. Maybe when I get off my ass and think about how to deal with Publishers Cited by Wikipedia I'll cook up something else, but the bot looks to be in very solid shape these days. Headbomb {t · c · p · b} 01:57, 18 February 2021 (UTC)

DOI syntax

There's a special DOI syntax to manually tell CS1/2 templates to shut up when there's an odd but legit DOI, i.e. |doi=((...)).

So the bot should remove (( )) pairs from DOI for everything DOI-related. Headbomb {t · c · p · b} 01:09, 19 February 2021 (UTC)

Done. It will show in tonight's run (which should be the 20210220 full run). -- JLaTondre (talk) 00:28, 22 February 2021 (UTC)
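Removing the (( )) wrapper before any DOI processing could be sketched as below. The function name is hypothetical and this is only an illustration of the idea, not the bot's code.

```python
def strip_accept_markup(doi: str) -> str:
    """Remove one surrounding (( )) pair from a DOI value, if present.

    Only a wrapper around the whole value is removed, so parentheses that
    are part of the DOI itself are left alone.
    """
    doi = doi.strip()
    if doi.startswith("((") and doi.endswith("))"):
        return doi[2:-2].strip()
    return doi
```

Requiring both the opening and the closing pair means a DOI that merely ends in parentheses passes through unchanged.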

New synonym for new series

Concerning things like this, I didn't think of the Spanish Nueva Serie/Nueva Series. So those should be added whenever you've got time. Headbomb {t · c · p · b} 14:09, 12 March 2021 (UTC)

Done. It will show up whenever an update is triggered. -- JLaTondre (talk) 16:14, 13 March 2021 (UTC)

Citewatch is missing International Digital Organization for Scientific Information

On User:JL-Bot/Questionable.cfg/Publishers, there's

But WP:JCW/Questionable5#International Digital Organization for Scientific Information has no related entries, even if there are hits that should match this, e.g. WP:JCW/J57#Journal of Reproduction & Infertility. Headbomb {t · c · p · b} 19:24, 2 June 2021 (UTC)

Same for User:JL-Bot/Publishers.cfg and

Headbomb {t · c · p · b} 19:24, 2 June 2021 (UTC)

This is happening because Journal of Reproduction & Infertility redirects to Category:International Digital Organization for Scientific Information academic journals, but it is not in the category. When the bot sees a category in the configuration, it pulls all the category contents, but as that one is not in the category, it doesn't get picked up. The simplest solution is to add the category to the redirect. I could change it so it also looks for redirects to categories, but it seems like there is a pretty standard convention of categorizing them that works. -- JLaTondre (talk) 00:58, 6 June 2021 (UTC)
Hmmm... I see. No need to update anything then. This one's a bit different than the others because it's also the name of a legitimate journal. I need to think about things some more, but current behaviour is likely fine. Headbomb {t · c · p · b} 01:07, 6 June 2021 (UTC)
Actually, now that I think of it... that would sort of mean the bot is missing a huge bunch of relevant redirects, like the ISO ones, and the variant spelling ones (e.g. "and" vs "&" like the one above). It's probably a good idea to change the behaviour to redirects to the category. It won't affect many, but it'll affect enough that it's worth doing. Headbomb {t · c · p · b} 01:12, 6 June 2021 (UTC)
Change has been implemented. Results can be seen at WP:JCW/Questionable1#International Digital Organization for Scientific Information. It also impacted some other questionable results. However, there was no change to the Publisher output as Publishers.cfg has Category:International Digital Organization for Scientific Information academic journals as only a {{JCW-doi-redirects}} and no corresponding {{JCW-selected}}. As such, it is only looking for DOI based matches. In those cases, do you want the bot to treat the publisher in the {{JCW-doi-redirects}} as a 'selected' publisher? -- JLaTondre (talk) 01:03, 8 June 2021 (UTC)
Yes, that sounds optimal. Headbomb {t · c · p · b} 01:45, 8 June 2021 (UTC)
Done. It has increased Publishers from 24 to 28 pages. The entry for this one is 878 on page Publisher17. As you can see, it is titled "Category:International Digital Organization for Scientific Information academic journals" which is the downside of using the {{JCW-doi-redirects}}. -- JLaTondre (talk) 11:55, 9 June 2021 (UTC)

I'm a bit surprised at the extra four pages of publishers, but I'll take a gander. There is a pattern with the categories as the "name" of a publisher, they'll always be "Category:Publisher academic journals", so you could extract the base "Publisher" from it rather easily. Headbomb {t · c · p · b} 14:02, 9 June 2021 (UTC)

For the four extra pages, everything seems peachy. There's a lot of theses cited as |journal=University of..., which inflates the count, and the rest seems to be false matches that can be bypassed in the usual fashion. Headbomb {t · c · p · b} 14:49, 9 June 2021 (UTC)
Put in the rename of "Category: Publisher academic journals" to "Publisher". It will show up in next run. -- JLaTondre (talk) 22:42, 9 June 2021 (UTC)
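The rename relies on the naming pattern noted above ("Category:Publisher academic journals"). A minimal sketch of that extraction, with a hypothetical function name and the assumption that titles not matching the pattern are left as they are:

```python
import re

# Matches "Category:<Publisher> academic journals", allowing an optional
# space after the colon.
_CATEGORY_NAME = re.compile(r"^Category:\s*(?P<publisher>.+?) academic journals$")


def publisher_from_category(title: str) -> str:
    """Return the base publisher name, or the title unchanged if no match."""
    match = _CATEGORY_NAME.match(title)
    return match.group("publisher") if match else title
```

Falling back to the original title keeps entries such as Category:Self-publishing companies intact rather than guessing at a name.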

JL-Bot RECOG hiccup?

[8] and [9]. Same for [10].

What happened there? Nothing seems to have caused this. Headbomb {t · c · p · b} 19:12, 17 July 2021 (UTC)

For the Wikipedia:WikiProject Tropical cyclones/Showcase case, on the featured & good topics pages, the project template was rolled up into the WikiProject Weather template (see [11]). These pages are no longer marked with the project template or the main project category and so were correctly removed. Most sub-projects have an "all" category that contains all their related pages. This one does not. The project will either need to create one or list all the (many) subcategories of Category:WikiProject Tropical cyclones articles in the bot template. -- JLaTondre (talk) 20:20, 19 July 2021 (UTC)
For the Wikipedia:WikiProject Academic Journals/Recognized content and Wikipedia:WikiProject Academic Journals/Did you know cases, the Wikipedia API only returned 17k+ transclusions for Template:WikiProject Academic Journals. Previous runs have been in the 29k+ range. I have seen this before where the API seems to hiccup without giving an error. From the bot's perspective, everything looks good and it processes the data the API gave it. I have re-run those two pages and they are back to normal. -- JLaTondre (talk) 20:29, 19 July 2021 (UTC)
I suspected a hiccup, but I wanted to confirm. Thanks for checking. Headbomb {t · c · p · b} 15:15, 20 July 2021 (UTC)

Double checking

Hi all, just double checking, did I set up this page correctly? Best - Aza24 (talk) 14:27, 25 July 2021 (UTC)

It was created after this weekend's bot run. I re-ran against that page & you can see the results. Thanks. -- JLaTondre (talk) 20:12, 25 July 2021 (UTC)

Deprecated API queries

Hello!

It seems JL-Bot, or the library it depends on (MediaWiki::API/0.41), is doing numerous deprecated API queries like action=query&prop=info&intoken.

Any chance you can update these queries so they don't stop working in the near future?

See phab:T280806 for more information.

Thanks!

Reedy (talk) 20:31, 3 August 2021 (UTC)

I upgraded MediaWiki::API. The change log is ambiguous, but it refers to an API change and I no longer see intoken in the code. I ran several edits with the new version (those after 23:30, 3 August 2021) and everything looked good. Is there a way to double check that resolved the issue? Thanks. -- JLaTondre (talk) 00:57, 4 August 2021 (UTC)
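For context, the deprecated prop=info&intoken request was superseded in the MediaWiki Action API by action=query&meta=tokens. The sketch below only builds the request URL and performs no network call; it is written in Python purely for illustration (the bot itself uses the Perl MediaWiki::API library), and the function name is hypothetical.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def token_query_url(token_type: str = "csrf") -> str:
    """Build a token request URL using meta=tokens instead of intoken."""
    params = {
        "action": "query",
        "meta": "tokens",
        "type": token_type,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"
```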

No update on those this month? Headbomb {t · c · p · b} 14:55, 5 August 2021 (UTC)

Looks like there was a network connection failure during processing. I have re-run it. Results will appear in a few hours. -- JLaTondre (talk) 23:30, 5 August 2021 (UTC)
This is weird. Headbomb {t · c · p · b} 01:23, 6 August 2021 (UTC)
Weird, but accurate. See User:JL-Bot/DOI/10.50000. There was a single prefix and it was listed as CrossRef. It was not returned in the last query. The page is not linked in the listing so I think it is fine to just leave it, but I can delete it if you prefer. -- JLaTondre (talk) 01:58, 6 August 2021 (UTC)
Current behavior on the listing itself is fine, it's just the removal from the /DOI subpage that's weird to me. Headbomb {t · c · p · b} 04:23, 6 August 2021 (UTC)
It does not create empty pages (so no 10.8000, 10.27000, etc.). This seems the most user friendly method. Why would you want to click on a page to find out it's empty? -- JLaTondre (talk) 11:16, 6 August 2021 (UTC)
Well, User:JL-Bot/DOI/10.50000 page isn't empty, is what I'm saying. If 10.50505 got deleted, then it would be, but as of writing it's there. Headbomb {t · c · p · b} 11:25, 6 August 2021 (UTC)
10.50505 was not returned in the last results. There were no other 10.50xxx results so there was nothing for the bot to write to that page and it was skipped. I realized the delta report was not showing prefixes that no longer were being returned. I updated that and if a prefix is removed, it will now show up listed as NONE under the current column. It will also have a link to the CrossRef API to double check if it was a retrieval issue. I also checked back against prior runs and found one other that was removed so manually added it (That one, 10.23823, has a prefix redirect so you will need to decide what to do. Not sure what removal from CrossRef's database means.). For the 10.50505 case, it is returning as CrossRef today so, if that holds, it will return in next month's run. -- JLaTondre (talk) 01:39, 8 August 2021 (UTC)
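The delta behaviour described here, where a prefix that is no longer returned shows up as NONE in the current column, can be illustrated with a simple comparison of two runs. This is a sketch, not the bot's code: the function name and the dictionary layout (prefix mapped to registrant) are hypothetical, and showing NONE in the previous column for newly seen prefixes is an assumption made for symmetry.

```python
def prefix_deltas(previous: dict, current: dict) -> list:
    """Return (prefix, previous value, current value) rows for changed prefixes.

    A prefix missing from one of the runs is reported as "NONE" on that side,
    so removals are visible instead of silently dropping out of the report.
    """
    rows = []
    for prefix in sorted(set(previous) | set(current)):
        old = previous.get(prefix, "NONE")
        new = current.get(prefix, "NONE")
        if old != new:
            rows.append((prefix, old, new))
    return rows
```

Iterating over the union of both key sets is what makes removed prefixes appear; iterating over the current results alone would reproduce the gap described above.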

Did the rest of the bot get disabled? Normally it would pick up new DOI redirects daily, but the bot hasn't edited anything in 3-4 days now. Headbomb {t · c · p · b} 00:30, 12 August 2021 (UTC)

Library dependency problem. Should be fixed. -- JLaTondre (talk) 01:27, 12 August 2021 (UTC)

Portal:The arts/Recognized content

I put a lot of work into improving image captions in Portal:The arts/Recognized content. This is English Wikipedia, I believe that image captions should be in English, even if the filename has Thai, Cyrillic, Hebrew, or Chinese scripts. You messed up my work, changing, for example:

File:01-พระที่นั่งคูหาคฤหาสน์.jpg|Mansion booths → File:01-พระที่นั่งคูหาคฤหาสน์.jpg|01-พระที่นั่งคูหาคฤหาสน์
File:44444 חדרו של דוד בן גוריון בצריף בשדה בוקר.jpg|David Ben-Gurion's room in his house in Sde Boker → File:44444 חדרו של דוד בן גוריון בצריף בשדה בוקר.jpg|44444 חדרו של דוד בן גוריון בצריף בשדה בוקר
File:Lietavský hrad-východná strana.jpg|Lietavský Castle – east side → File:Lietavský hrad-východná strana.jpg|Lietavský hrad-východná strana
File:Царський курган 007.jpg|Royal mound → File:Царський курган 007.jpg|Царський курган 007
File:神龍蘭亭序全.JPG|Shenlong Lanting Xuquan → File:神龍蘭亭序全.JPG|神龍蘭亭序全.JPG

Please explain your theory of why inscrutable captions are better than meaningful ones. —Anomalocaris (talk) 07:30, 5 September 2021 (UTC)

@Anomalocaris: The short answer is that this is meant to be an automated listing, see the disclaimer at the top of that page. The captions listed reflect the file names. Headbomb {t · c · p · b} 15:11, 5 September 2021 (UTC)
Headbomb: I now know that Portal:The arts/Recognized content is used in Portal:The arts only as a store of names of articles, and the picture gallery isn't used for anything. That raises the question of what the picture gallery is for. If it has no purpose at all, we should get rid of it. If its purpose is to have a place to show all featured pictures, then the picture captions should be exactly correct, with the .jpg or whatever. But this idea of name-mangling image filenames to (usually, but not always) delete the suffix and using that as a picture caption makes no sense at all. –Anomalocaris (talk) 19:01, 5 September 2021 (UTC)
There is a |no-captions option that will result in only the image being displayed if you don't want the filename captions (see the Project content documentation for a full listing of options). The filename caption is designed to provide some description for the pictures. Unfortunately, I have not come up with another option than the filename (image description fields are too erratic to parse reliably). The bot is an opt-in service and it's up to each project / portal to decide if & how they want to use it. How Portal:The arts wants to use it is best discussed at Portal_talk:The_arts. -- JLaTondre (talk) 21:25, 6 September 2021 (UTC)

Doesn't strip .JPG in filenames

See Special:Permalink/1042346192#Featured pictures for example. Possibly applies to other capitalized file extensions. Headbomb {t · c · p · b} 15:13, 5 September 2021 (UTC)

Fixed; will show up in next run. -- JLaTondre (talk) 21:25, 6 September 2021 (UTC)
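Turning a file name into a caption while handling capitalized extensions could be sketched as follows. The function name and the list of extensions are assumptions for illustration; the bot's real list is not given in the discussion.

```python
import re

_PREFIX = re.compile(r"^File:", re.IGNORECASE)
# ASSUMPTION: illustrative set of image extensions, matched case-insensitively
# so that ".JPG" is stripped just like ".jpg".
_EXTENSION = re.compile(r"\.(?:jpe?g|png|gif|svg|tiff?|webp)$", re.IGNORECASE)


def caption_from_filename(name: str) -> str:
    """Strip the 'File:' prefix and a trailing image extension."""
    name = _PREFIX.sub("", name.strip())
    return _EXTENSION.sub("", name)
```

Matching against a known list of extensions, rather than anything after the last dot, avoids truncating names that merely contain a period.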

Cornercase

In here, entry 204, there's a dual listing for iUniverse.

Rank | Target/Group | Entries (Citations, Articles) | Total Citations | Distinct Articles | Citations/article
204 | Wikipedia:List of companies engaged in the self-publishing business [WP:SPSLIST] | | 5 | 5 | 1.000

Seems to be related to the lowercase IUniverse vs iUniverse. Headbomb {t · c · p · b} 14:01, 15 September 2021 (UTC)

Correct. MediaWiki page names always start with a capital letter. The DISPLAYTITLE keyword was added to allow the display of the name to be overwritten, but it is stored with a capital I in the database. In the configuration for the Wikipedia:List of companies engaged in the self-publishing business entry, it has both Category:Self-publishing companies and iUniverse. If you look at the category, you will see it shows up as a capital I (because of how MediaWiki page names work). Therefore the configuration is asking for both IUniverse and iUniverse. In the output, IUniverse is bolded because it matches to a page name in the database dump, but iUniverse is not bolded as it doesn't match to a page name in the database dump (even though for linking purposes, the MediaWiki software ignores the capitalization of the first letter). -- JLaTondre (talk) 23:30, 15 September 2021 (UTC)
Ah I see, one is via the category, the other hard-coded. I've updated the hard-coded one to be IUniverse too, which should be good enough for now. Headbomb {t · c · p · b} 00:00, 16 September 2021 (UTC)
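Since page names are stored with a capital first letter, one way to avoid this kind of dual listing is to normalize configured names before comparing them. The sketch below is only an illustration with hypothetical function names; it is not what the bot does, and it ignores other title normalization MediaWiki performs.

```python
def normalize_title(title: str) -> str:
    """Uppercase the first character, as MediaWiki does for stored page names."""
    title = title.strip()
    return title[:1].upper() + title[1:]


def dedupe_titles(titles):
    """Drop entries that differ only in the case of their first letter."""
    seen = set()
    unique = []
    for title in titles:
        key = normalize_title(title)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

With this, a configuration listing both iUniverse and IUniverse would collapse to the single stored name IUniverse.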

Was there a bot update?

It just did these [12] [13] out of nowhere. Which don't get me wrong, is great. Just wondering. Headbomb {t · c · p · b} 02:51, 28 September 2021 (UTC)

Yes, the bot logs unknown templates during the database dump parsing. I periodically update it for new ones. While the popular and citewatch pages updated now, the Invalid page ones will update with the next full run (as the individual listing pages are only updated with a new backup). -- JLaTondre (talk) 22:32, 28 September 2021 (UTC)
Which reminds me that things like |journal=Foobar [Barfoo] get normalized to |journal=Foobar. It should only be normalized to that for the purpose of /Target matching (and similar, if something else behaves like /Target), but the output should be listed as Foobar [Barfoo] if possible. Headbomb {t · c · p · b} 01:50, 29 September 2021 (UTC)
I feel like that was an early cleanup before there was the normalization process (when it was just outputting the alphabetical pages). I can move that over to be a normalization instead. What about if it's just |journal=[Foobar]? Do you still want the brackets kept? -- JLaTondre (talk) 01:00, 1 October 2021 (UTC)
I'd say yeah, keep the brackets for displaying (Foobar[Foobar][Foobar), ignoring them for the purpose of target matching (Foobar = [Foobar] = [Foobar). Headbomb {t · c · p · b} 02:04, 1 October 2021 (UTC)
Done. Results will show up in next updates (popular and citewatch with nightly runs, individual pages with next full run). -- JLaTondre (talk) 00:44, 9 October 2021 (UTC)
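The split between display and matching agreed on above might be sketched like this: the original string is kept for output, and a separate key with brackets ignored is used for target matching. The function name and the exact rules are assumptions drawn from the examples in the thread (Foobar [Barfoo] matches as Foobar; [Foobar] and [Foobar match as Foobar).

```python
import re

# A complete bracketed group, with any whitespace before it.
_BRACKETED = re.compile(r"\s*\[[^\[\]]*\]")


def match_key(journal: str) -> str:
    """Key used for target matching; the original string is kept for display."""
    name = journal.strip()
    # "Foobar [Barfoo]" -> "Foobar": drop bracketed groups if text remains.
    without_groups = _BRACKETED.sub("", name).strip()
    base = without_groups if without_groups else name
    # "[Foobar]" or "[Foobar" -> "Foobar": otherwise just drop the brackets.
    return base.replace("[", "").replace("]", "").strip()
```

The caller would list the entry under its original text and use match_key(entry) only when looking up a target.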

missing rank 2?

Rank 2 (Springer Science+Business Media normally) disappeared entirely from the publisher rankings. This is... weird to say the least. Headbomb {t · c · p · b} 22:45, 22 October 2021 (UTC)

Fixed. There was a logic error with paging that occurred when the second entry on the first page would push it over the maximum page size. -- JLaTondre (talk) 17:09, 23 October 2021 (UTC)

Edit at Wikipedia:WikiProject Ohio/Recognized content

In this edit you reverted my edit here. Salvatore Todaro is now a DAB page, and therefore is not a GA. The GA has been moved to Salvatore Todaro (mobster). Please reinstate my edit. Havelock Jones (talk) 01:03, 7 November 2021 (UTC)

@Havelock Jones: the solution here is to update the talk pages of Salvatore Todaro (mobster) and Salvatore Todaro, and the bot will automatically update things during the next run. Also please fix your signature, you have [[User:Have lock Jones|Havelock Jones]] in it when it should be [[User:Havelock Jones|Havelock Jones]] Headbomb {t · c · p · b} 01:58, 7 November 2021 (UTC)
I have never changed my signature, and it has always worked fine in the past. You appear to have introduced an invisible space with this edit. I have no idea why or even how you would do that, but it doesn't seem to be a problem with my signature. Havelock Jones (talk) 09:40, 7 November 2021 (UTC)
Yes my bad. No idea how that got there. Stray spacebar press possibly. Headbomb {t · c · p · b} 15:16, 7 November 2021 (UTC)
The talk page was moved with the article so that was not the issue. -- JLaTondre (talk) 12:55, 7 November 2021 (UTC)
I re-ran the bot on that page and it picked up the change. Given the short period between the move and the bot run, the Wikipedia software probably hadn't finished updating the categories (resulting in it providing the old values when the bot queried for the category). The delay is usually small, but can be longer when there is a lot of maintenance going on. Somewhere there is a page that lists the current times, but I don't remember where. Thanks for letting me know. -- JLaTondre (talk) 12:55, 7 November 2021 (UTC)

Line break issue

Since the October 2 update of Wikipedia:WikiProject LGBT studies/Recognized content, the bot has stopped putting in a line break after the end of the DYK section, resulting in the following all occurring on the same line:

<includeonly>Transcluding 30 of 908 total</includeonly>===In the News articles===

It displays correctly in the report itself (other than the "edit source" link missing for In the News), but causes an error when the DYK section is transcluded. Could you please have a look? Thanks for the bot, it is very useful!--Trystan (talk) 03:05, 9 November 2021 (UTC)

Yes, I will look into it, but probably cannot get to it before this weekend's run. -- JLaTondre (talk) 01:22, 11 November 2021 (UTC)
Fixed. Sorry for the delay. -- JLaTondre (talk) 14:51, 20 November 2021 (UTC)
Thanks!--Trystan (talk) 03:36, 24 November 2021 (UTC)

Could you split it into Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/DOI/10.20000 (10.20000–10.24999) and Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/DOI/10.25000 (10.25000–10.29999)? Thanks. Headbomb {t · c · p · b} 03:14, 25 March 2022 (UTC)

That's under your control, ;-) The bot uses the pages specified in {{JCW-Main}}. Add it to the WP:JCW/DOI line and it will get picked up at the next run. -- JLaTondre (talk) 21:59, 25 March 2022 (UTC)
Ah, didn't know that. I'll update the templates then. Headbomb {t · c · p · b} 22:05, 25 March 2022 (UTC)

10.55540 prefixes are valid

I don't know what the current limit is for Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance/Invalid DOI prefixes, but it could be bumped up to 10.60000. Headbomb {t · c · p · b} 20:32, 22 March 2022 (UTC)

Instead of the current hard limit, I will change it to take the largest value from the DOI listing and round up to next tenth. I should be able to do that this weekend. -- JLaTondre (talk) 22:51, 24 March 2022 (UTC)
Next thousand maybe? Like 10.55542 → 10.56000 ? Headbomb {t · c · p · b} 03:00, 25 March 2022 (UTC)
I picked tenths based on your request to make it 10.6 above. Using hundredths is fine also. -- JLaTondre (talk) 21:53, 25 March 2022 (UTC)
Done. It will show up in the 04/01 dump run. If you need it updated in the current dump, I can force a full run. -- JLaTondre (talk) 21:12, 26 March 2022 (UTC)
No need, it's just a single entry on a maintenance page that already has very few entries. Headbomb {t · c · p · b} 21:57, 26 March 2022 (UTC)
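The rounding discussed in this thread can be sketched as below. This is a minimal Python illustration, not the bot's code: the function name `next_prefix_limit` and the `step` parameter are hypothetical, and it assumes the limit should be strictly above the largest prefix seen.

```python
def next_prefix_limit(prefixes, step=1000):
    """Return the '10.NNNNN' limit just above the largest registrant code."""
    # Registrant codes are the digits after '10.' (e.g. 55542 in '10.55542').
    largest = max(int(p.split(".", 1)[1]) for p in prefixes)
    # Round up to the next multiple of `step`, strictly above the largest code.
    limit = (largest // step + 1) * step
    return f"10.{limit}"
```

With `step=1000` the example from the thread, 10.55542, gives 10.56000; with `step=10000` it gives 10.60000.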

User:JL-Bot/DOI subpages are too big

This reveals that 31 out of the 45 subpages of User:JL-Bot/DOI (sans /Deltas) are too big. Could these be split in chunks of 500 instead? E.g.

...

Headbomb {t · c · p · b} 22:15, 25 March 2022 (UTC)

Done and new results uploaded. -- JLaTondre (talk) 21:13, 26 March 2022 (UTC)

...Still 41 out of 79 pages. In batches of 250 maybe?

... Headbomb {t · c · p · b} 22:03, 26 March 2022 (UTC)

Done. -- JLaTondre (talk) 23:46, 27 March 2022 (UTC)
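As a rough illustration of the batching requested above, fixed-size splitting looks like this in Python. The function name is hypothetical, and the default of 250 is simply the batch size that was settled on in this thread.

```python
def split_into_pages(items, batch_size=250):
    """Split a list of entries into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

Halving the batch size roughly doubles the number of subpages, which matches the jump from 45 to 79 pages reported after the first split.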

Wikidata Journal Names

@Headbomb:, I updated the bot to recognize a couple more templates that showed up in the last database dump. However, I ran into an issue. There are two articles that use {{Q}} for the journal name. This template inserts content from Wikidata. The two cases are:

I assume there is an API for Wikidata that will allow me to look these cases up, but I haven't investigated yet. I assume these should be listed as just the journal name and drop the parenthesis id part? -- JLaTondre (talk) 21:24, 26 March 2022 (UTC)

If you can, yes sure. But really those are blights and should likely be converted to proper text inputs, so having a list of such citations would be best. If those are the only two, I've updated both articles already. Headbomb {t · c · p · b} 21:54, 26 March 2022 (UTC)
And if those were the only two, maybe the \{\{Q *\| pattern could be detected in WP:JCW/Patterns, instead of having the bot get coded for a special case that's super uncommon? Headbomb {t · c · p · b} 22:00, 26 March 2022 (UTC)
Sounds good. I will need to make a tweak to ensure the template flows through, but that is easier. -- JLaTondre (talk) 23:51, 27 March 2022 (UTC)
Done. See Wikipedia:JCW/Patterns row 65. -- JLaTondre (talk) 01:38, 29 March 2022 (UTC)
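For reference, the pattern quoted above behaves as follows when tried in Python's `re` module. This is only a sketch: the helper name is hypothetical, and the regex dialect the bot actually uses may differ.

```python
import re

# The pattern from the discussion: '{{Q', optional spaces, then a pipe.
Q_TEMPLATE = re.compile(r"\{\{Q *\|")

def uses_wikidata_q(journal_field):
    """Return True if the journal field invokes the {{Q}} template."""
    return bool(Q_TEMPLATE.search(journal_field))
```

Because the pipe must follow `Q` (with only spaces in between), other templates whose names merely start with Q are not matched.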

Been a while since the bot touched that page... Some issue with the bot? Headbomb {t · c · p · b} 16:52, 30 March 2022 (UTC)

Checking the logs, the update the other night failed due to the Wikipedia API failing to return results on a query. This caused it to not save the last run time for the questionable processing. Subsequent runs were then failing as they couldn't find the last run date to compare against the configuration date. I manually set it and it will run tonight. I should change it so that it always runs if it cannot find the prior run date. -- JLaTondre (talk) 23:04, 30 March 2022 (UTC)
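The proposed safeguard (always run when no prior run date is found) can be sketched as follows. This is a hypothetical Python illustration: the function and parameter names are assumptions, not the bot's actual code.

```python
from datetime import datetime

def should_run(last_run, config_changed):
    """Decide whether the questionable processing should run."""
    # No recorded run date (e.g. an earlier run failed before saving it):
    # run anyway rather than failing the comparison.
    if last_run is None:
        return True
    # Otherwise run only if the configuration changed after the last run.
    return config_changed > last_run
```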

Bot crash?

Things stopped uploading after WP:JCW/Questionable5.

Normally after that one, it edits Template:JCW-bottom-questionable then continues with WP:JCW/Publisher1 etc...

Headbomb {t · c · p · b} 21:17, 3 April 2022 (UTC)

It's hanging when it tries to write Questionable6. I don't see a reason for that. The page is autoconfirmed or confirmed access protected, but the bot is logged in so not sure why that would cause an issue. It doesn't fail either. The call just never returns and never times out which is weird. For the short term, I have it skipping that page. All the others should be updated now. -- JLaTondre (talk) 23:06, 4 April 2022 (UTC)
Maybe it doesn't like the blank page? It wasn't an issue in the past. It failed to write there a few times, but it never stopped the bot entirely. Headbomb {t · c · p · b} 00:01, 5 April 2022 (UTC)
Being blank shouldn't matter. It doesn't do anything with the existing text. It replaces the content. Anyhow, the current results are now only 5 pages so unable to debug further. I will try to remember to look at it after the next dump. -- JLaTondre (talk) 22:32, 6 April 2022 (UTC)

Follow up

I think I figured out why Special:Permalink/1084025401#Bot crash? had an issue with /Questionable6.

{{JCW-Main}} has this line in it

-->{{#ifexist:Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable6|{{#if:{{Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable6}}| * [[Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable6|6]]}}}}

When the bot attempts to write at /Questionable6, this causes a transclusion loop, which is either warned against, or prevented entirely. I'll try to tweak the template's logic to prevent such a loop. Headbomb {t · c · p · b} 17:01, 28 April 2022 (UTC)

It seemed like it was failing on the read actually. It's currently back down to only 5 pages so unable to test again. -- JLaTondre (talk) 16:36, 1 May 2022 (UTC)
I guess we'll see with the new dump. Though it too might not reach 6 pages. Headbomb {t · c · p · b} 18:45, 1 May 2022 (UTC)
Yeah, this month's dump only resulted in 5 pages. -- JLaTondre (talk) 23:46, 3 May 2022 (UTC)
Yup, choked on /Questionable6 again. Headbomb {t · c · p · b} 19:49, 22 May 2022 (UTC)
Fixed. -- JLaTondre (talk) 21:43, 22 May 2022 (UTC)

Doesn't display AWA: La revue de la femme noire correctly

Apparently it thinks it's an interwiki link. You can find the issue in WP:JCW/A99, below the 'The AWA Review' entry.

The solution here is likely to :-pad every entry with a 2- or 3-letter prefix before the :. So [[XX: ...]] and [[XXX: ...]] become [[:XX: ...]] and [[:XXX: ...]]. Either that, or the bot's interwiki list needs an update. Headbomb {t · c · p · b} 19:22, 3 July 2022 (UTC)

Fixed. The individual pages are updating. If it has any impact on the specified, etc. lists, it will show up in their next update. The bot uses a configuration file for interwiki and interlanguage prefixes as they typically don't change very often. It is created from Special:Interwiki. Reading that page at dump processing time is on the "nice to have list". -- JLaTondre (talk) 20:18, 4 July 2022 (UTC)
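A minimal sketch of the colon-padding idea, in Python. This is not the bot's implementation: the function name is hypothetical, and `prefixes` stands in for whatever the bot's configuration file (built from Special:Interwiki) contains.

```python
def safe_wikilink(title, prefixes):
    """Build a wikilink, colon-padding titles that start with a known prefix."""
    prefix, sep, _ = title.partition(":")
    # Only titles of the form 'prefix:rest' can be mistaken for interwiki or
    # interlanguage links; prefixes are compared case-insensitively.
    if sep and prefix.strip().lower() in prefixes:
        return f"[[:{title}]]"
    return f"[[{title}]]"
```

For example, with `{"awa"}` in the prefix set, `AWA: La revue de la femme noire` is emitted with a leading colon, while `The AWA Review` is left alone because it has no prefix before a colon.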

Would it be possible to make a thing similar to WP:JCW/TYPO, but for diacritics instead? Basically, list everything tagged by

As well as redlinks that are different diacritics of existing targets (e.g. Journal of Zoölogy → Journal of Zoology)

Rank | Target | Entries (Citations, Articles) | Total Citations | Distinct Articles | Citations/article
1 | Journal of Zoology | | 3 | 1 | 3.000
2 | Zeitschrift für Physik | | 2 | 1 | 2.000
2 | Tohoku Mathematical Journal | | 1 | 1 | 1.000
Implemented the first part (using the templates). That was just a copy of the Template:R from misspelling processing so easy. I will have to think through how to handle any arbitrary diacritic in a red link. Let me know if you see any issues with the current portion. -- JLaTondre (talk) 17:48, 26 July 2022 (UTC)
Looks good so far! Headbomb {t · c · p · b} 18:03, 26 July 2022 (UTC)
Do you have an actual example for the red link case? The Journal of Zoölogy example does not appear to exist in the latest dump (see J76). I coded something up but did not find any results. I suspect that means a logic error, but need a valid case to test with. -- JLaTondre (talk) 21:20, 26 July 2022 (UTC)
I'll try to dig something up. I don't know of an example off the top of my head, but very likely German journals will have something with fur instead of für, like in current entry 160. Headbomb {t · c · p · b} 21:54, 26 July 2022 (UTC)

I found three

  1. Zeitschrift fur Anorganische und Allgemeine Chemie → Zeitschrift für Anorganische und Allgemeine Chemie
  2. Zeitschrift fur Kinderheilkunde → Zeitschrift für Kinderheilkunde
  3. Zeitschrift fűr Anorganische und Allgemeine Chemie → Zeitschrift für Anorganische und Allgemeine Chemie

There's likely some more too, but it's a hard thing to search for. Headbomb {t · c · p · b} 22:05, 26 July 2022 (UTC)

Red link case has been added. Please check out the latest results. -- JLaTondre (talk) 00:42, 28 July 2022 (UTC)
Seems to work just fine, I'll let you know if anything's off. Thanks a bunch! Headbomb {t · c · p · b} 03:40, 28 July 2022 (UTC)
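One way to detect the red link case is to compare titles after stripping diacritics. The Python sketch below is an assumption about how such a check could work, not a description of the bot's logic, and the function names are hypothetical.

```python
import unicodedata

def fold_diacritics(title):
    """Strip combining marks so 'für', 'fűr' and 'fur' compare equal."""
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def diacritic_variant_of(red_link, existing_titles):
    """Return the existing title that differs from red_link only by diacritics."""
    folded = fold_diacritics(red_link)
    for title in existing_titles:
        if title != red_link and fold_diacritics(title) == folded:
            return title
    return None
```

This handles a missing diacritic (fur for für), a wrong one (fűr for für), and an extra one (Zoölogy for Zoology), since all of them fold to the same base letters.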

Could you add those tagged by {{R from incorrect name}} to that page? Headbomb {t · c · p · b} 13:49, 1 August 2022 (UTC)

I assume as a new section? -- JLaTondre (talk) 23:21, 1 August 2022 (UTC)
Nah, I'd keep it in the same. They're fairly interchangeable. Headbomb {t · c · p · b} 23:35, 1 August 2022 (UTC)
I ended up breaking it out as its own subpage: Maintenance/Incorrect Name. They really didn't fit misspellings. As the logic repeated across the various sub-types, I consolidated some of the code. That caused some changes in the prior results that I need to go through. A couple of things were reported under misspellings that I would not expect so I may be doing some more tweaks based on what I find. -- JLaTondre (talk) 01:10, 4 August 2022 (UTC)
Many of them were mis-tagged, and the rest of them are pretty much typos, or not conceptually different from them. I still think they should be merged together. It'll be 13 entries or so, and perhaps only 2-3 entries in the next dump. Headbomb {t · c · p · b} 02:37, 4 August 2022 (UTC)
The WP:JCW/Typo listing, however, now has many things that are neither typos nor incorrect names. Headbomb {t · c · p · b} 02:46, 4 August 2022 (UTC)
User:Anomie/linkclassifier.js is useful here if you don't make use of it (I also have a modified version at User:Headbomb/linkclassifier.js that highlights diacritic categories and {{R from ISO 4}} and similar). More or less, if it's in red or orange, it should be on those pages. There might be exceptions, and some have since been re-classified as {{R from long title}}/{{R from alternative title}}, but they should be rare. Headbomb {t · c · p · b} 03:27, 4 August 2022 (UTC)
Okay, I reverted prior changes and rolled it into the misspellings. It was a more straightforward change. -- JLaTondre (talk) 00:07, 5 August 2022 (UTC)
Thanks. Looks good! Headbomb {t · c · p · b} 00:39, 5 August 2022 (UTC)
One thing I notice is those maintenance pages tend to be updated once a dump. It would be nice if they could be updated alongside the rest to reflect changes in categories. It's not critical, but I doubt this would take much extra computational time. Headbomb {t · c · p · b} 00:54, 5 August 2022 (UTC)
Done. It was set to run on pattern config changes. I updated it to run on every run. It takes about 20 mins to process all the maintenance types. -- JLaTondre (talk) 19:10, 5 August 2022 (UTC)

One time custom run?

The pattern .* : .* is busting up WP:JCW/Patterns. Could you do a one time run on a separate page for that pattern alone? This would let me bring it down to something more reasonable. Headbomb {t · c · p · b} 02:09, 18 August 2022 (UTC)

Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance/Patterns/Semi Colons The table template would not handle it so I skipped the table. -- JLaTondre (talk) 23:14, 18 August 2022 (UTC)
Thanks! I should be able to get through with it by the time the next dump is generated. Headbomb {t · c · p · b} 01:12, 19 August 2022 (UTC)

Could you add these new categories/templates to the typo listing? Thanks. Headbomb {t · c · p · b} 13:34, 20 August 2022 (UTC)

Done. -- JLaTondre (talk) 15:34, 21 August 2022 (UTC)

RECOG expansion size

Some pages have ballooned in size over time, e.g. Portal:Association_football/Recognised_content and they run into template expansion issues. This fixes it rather easily. Could that be implemented? Headbomb {t · c · p · b} 08:42, 6 November 2022 (UTC)

Done. It will show up in next week's run since this week's run is already ongoing. -- JLaTondre (talk) 16:35, 6 November 2022 (UTC)

Lint war on WikiProject Plants

(ping operator @JLaTondre) Hi, this bot is edit warring to keep two specific errors intact on Wikipedia:WikiProject Plants/Recognized content. I and multiple editors in the past few months all appear to be at a loss as to how to correct the errors and not have this bot reinstate them each week. My edit corrected the two Lint issues (an unpaired rogue italics after the wikilink [[crow]], and a wonky issue* with any file captions ending in px which triggers erroneous "bogus-image-options" errors). This bot's next edit removed my changes, just like every other time a human fixed these, and this "fight" goes back all the way to May 2022. Are these an issue with the bot itself, or do these issues need to be corrected on another page that the bot is using as the source for each entry?

*I believe the px issue is a site wide issue not just a thing with this bot, but this is the only place I've encountered it so far. Adding a {{sp}} to the end of the caption ending in "px" clears the issue and is the cleanest solution I've seen for this so far. Zinnober9 (talk) 19:27, 12 December 2022 (UTC)

As is stated on the page header, the recognized content lists are automatically generated. Manual edits will always be overwritten on the next run. For the DYK entry, you would need to fix the DYK text on the respective article (so Talk:Commersonia dasyphylla in this case). The bot copies the DYK entry text verbatim. The DYK entry should be fixed regardless of the bot as the DYK text should be properly formatted on the article talk page. The extra '' is impacting the template presentation as it is italicizing the "A record of the entry may be seen at Wikipedia:Recent additions/2010/July" which normally is not italicized. For the px issue, I will update the bot to add {{sp}} at the end when the name ends in #px. Hopefully I can get that in by this weekend's run, but it might be the next one after that. -- JLaTondre (talk) 22:40, 12 December 2022 (UTC)
i.e. this is the fix. Headbomb {t · c · p · b} 00:10, 13 December 2022 (UTC)
Ah, thank you Headbomb. Figured that was the case (like transcluded errors), I just hadn't found that page yet in my searches.
@JLaTondre Thank you, that'll be great. Next week is fine, no rush, just didn't want the issue to perpetuate indefinitely. Since the files this px issue occurs with are commonly named File:somethingPX.type, any fix will need to avoid adding the {{sp}} to the file name (like File:somethingPX{{sp}}.type which would break the filename). So long as it's only the captions that get the appended px{{sp}}, that will be great. Happy holidays! Zinnober9 (talk) 05:32, 13 December 2022 (UTC)
Done. Change made and Wikipedia:WikiProject Plants/Recognized content updated. It will roll out to all other recognized content pages with this weekend's run. -- JLaTondre (talk) 01:18, 15 December 2022 (UTC)
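The caption workaround can be sketched as below. This Python snippet is illustrative only: the function name is hypothetical, and reading "#px" as "one or more digits followed by px" is an assumption.

```python
import re

def protect_px_caption(caption):
    """Append {{sp}} to a caption ending in '<digits>px'."""
    # A trailing size-like token would otherwise be read as an image option
    # and flagged as a bogus-image-options lint error.
    if re.search(r"\d+px$", caption):
        return caption + "{{sp}}"
    return caption
```

As noted above, this should be applied to the caption only; the file name itself must stay untouched, or the link would break.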

Issue with this month's JCW dump?

Is there an issue with this month's dump? The DOI compendium updated, but the compilation didn't. That's never happened before. Headbomb {t · c · p · b} 20:01, 3 April 2023 (UTC)

It bombed out as the DOI redirects processing did not account for RFD templates (as is currently on 10.15406). I worked around that and kicked the processing off again. It will take a while to complete. JLaTondre (talk) 22:15, 3 April 2023 (UTC)
Thanks. Headbomb {t · c · p · b} 22:23, 3 April 2023 (UTC)

What am I missing here?

In here, the bot choked, saying there was "No template or category parameter found", despite |template=WIR-251 being present. What gives? Could you re-run the bot once the issue is fixed? Headbomb {t · c · p · b} 20:59, 1 July 2023 (UTC)

It was the space indentation before the parameters. For a quick fix, I updated the pages to remove the spaces and re-ran the bot against those pages. I will update the bot to allow spaces to be present (I will leave DoNotArchiveUntil on this conversation until that is done). For the updated results, on 276, it removed the manual entry. However, this appears to be "correct" as {{WIR-276}} is not on Talk:María Pérez Rabaza. -- JLaTondre (talk) 00:03, 5 July 2023 (UTC)
Fixed. Spaces are now allowed before the bot parameters. -- JLaTondre (talk) 12:51, 8 July 2023 (UTC)
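A whitespace-tolerant parameter match of the kind described could look like the Python sketch below. The regular expression and function name are assumptions for illustration; only the `template` and `category` parameter names come from the error message quoted above.

```python
import re

# Allow optional indentation before the pipe and optional spaces around '='.
PARAM = re.compile(r"^\s*\|\s*(template|category)\s*=\s*(.+?)\s*$", re.MULTILINE)

def find_bot_params(wikitext):
    """Return (name, value) pairs for template/category parameters."""
    return [(m.group(1), m.group(2)) for m in PARAM.finditer(wikitext)]
```

With this pattern, an indented line such as `  |template=WIR-251` is picked up the same way as an unindented one.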

User:JL-Bot/Publishers.cfg exceeds expansion limit

Could it be split in 2 at /Publishers1.cfg (A–M) and /Publishers2.cfg (N–Z + others)? Headbomb {t · c · p · b} 22:58, 2 July 2023 (UTC)

Yes, doable. I will try to set aside some time in the next few days to start trimming away at the backlog on this page. -- JLaTondre (talk) 00:08, 5 July 2023 (UTC)
I am going to create a User:JL-Bot/Publishers1.cfg and User:JL-Bot/Publishers2.cfg with a small selection of publishers each so I can work on this. Once I have it working, the full split can happen. -- JLaTondre (talk) 15:36, 8 July 2023 (UTC)
@Headbomb: I've got this working. However, do you want to use Publishers.cfg/One and Publishers.cfg/Two instead (similar to how Questionable.cfg is broken out)? That would avoid having to update WP:JCW/PUBSETUP and other templates which could continue to point to Publishers.cfg as the parent with common instructions. Set it up however you want and let me know, I can easily change the page names in the bot. -- JLaTondre (talk) 17:29, 8 July 2023 (UTC)
I'm not sure what you mean by Publishers.cfg/One and Publishers.cfg/Two...? Like instead of Publishers.cfg/1 Publishers.cfg/2 it's literally Publishers.cfg/One Publishers.cfg/Two? Or like.... named sections like Publishers.cfg/A-M, Publishers.cfg/N-Z, Publishers.cfg/Others? Headbomb {t · c · p · b} 18:42, 8 July 2023 (UTC)
I interpreted your original post as two pages under User:JL-Bot/ (as per the test pages I created). I was suggesting two subpages under User:JL-Bot/Publishers.cfg/. I don't care whether they have generic names (which would allow expansion and reordering without renaming) or specific names (which would be clearer but would require renaming if future reordering needed). At this point, it is just a configuration setting so pick the names that work best for the project and let me know. -- JLaTondre (talk) 21:22, 8 July 2023 (UTC)
Oh, sure. Subpages are fine. I think /Publishers.cfg/A–M, /Publishers.cfg/N–Z, /Publishers.cfg/Others would be the clearest scheme. Headbomb {t · c · p · b} 22:23, 8 July 2023 (UTC)
Okay, I have the bot setup to use:
I also created those pages by moving the relevant settings from Publishers.cfg to the subpages. It should execute off them tonight. I will leave it to you to format the instructions, ToC, etc. as you see fit. -- JLaTondre (talk) 00:11, 9 July 2023 (UTC)
Thanks. I've updated the pages accordingly. It might need some tweak, but it's good enough for now. Headbomb {t · c · p · b} 00:51, 9 July 2023 (UTC)

Transcluding X of X total tweak

If this tweak could be done to the transcluding so-much of the total types of notes, that would be great. This way people could be taken to the full list easily. Headbomb {t · c · p · b} 16:54, 7 July 2023 (UTC)

Done. It will show up in today's run. -- JLaTondre (talk) 13:05, 8 July 2023 (UTC)

If you have time...

There's this MFD. I've been trying to kill WP:JCW/Questionable6 for ages but I'm running into ... let's call it zealous bureaucracy. Headbomb {t · c · p · b} 18:44, 8 July 2023 (UTC)

Article history in the news

Currently, JL-Bot does not understand the "|itn1date=" parameter in Template:Article history. It understands "|itndate=", but the numbers are needed if there are multiple In The News events. CMD (talk) 12:13, 2 April 2023 (UTC)

Yeah, it was not designed with more than 1 per article in mind. Do you have an example page with multiple and the recommended content list it should show up on? Thanks. JLaTondre (talk) 21:49, 3 April 2023 (UTC)
One example is Talk:North Macedonia, which has |itn1date=13 February 2019|itn1link=Special:PermanentLink/883114594|itn2date=27 March 2020. Currently it does get listed on Wikipedia:WikiProject Countries/Assessment#In the News articles, but with no date. (Some others also are getting listed with no date, because they use itn1date despite only having one event.) CMD (talk) 09:48, 4 April 2023 (UTC)
I think the advice in the template documentation is to use itndate rather than itndate1, whether or not there is an itndate2, etc.--Trystan (talk) 15:18, 30 April 2023 (UTC)
For the short term, I have updated the bot to look for itn1date as well as itndate. I ran it against Wikipedia:WikiProject Countries/Assessment to verify the change. It will show up on other pages in today's run. I have added to my to do list the ability to support multiple dates for In The News (similar to how multiple DYK dates are handled). Unfortunately, that is going to require a bit of rework so I will likely not get to it for some time. -- JLaTondre (talk) 14:47, 8 July 2023 (UTC)
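The short-term fallback can be sketched in Python as follows. This is an illustration under assumptions: `params` is a hypothetical dict of already-parsed {{Article history}} parameters, and the function name is invented.

```python
def itn_date(params):
    """Return the first In The News date, accepting either parameter name."""
    # Prefer the unnumbered form, then fall back to the numbered one.
    for key in ("itndate", "itn1date"):
        value = params.get(key, "").strip()
        if value:
            return value
    return None
```

For the Talk:North Macedonia example, this returns the first event's date (13 February 2019); the second date (itn2date) is still ignored until multiple dates are supported.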

RECOG parsing

This or that have several display issues, mostly related to the parsing of inline templates.

Search for | and you'll find things like stray |, |dyk1nom=... Headbomb {t · c · p · b} 09:28, 25 June 2023 (UTC)

Okay, thanks. I will take a look at that. Hopefully be able to get to it next week. -- JLaTondre (talk) 21:19, 28 June 2023 (UTC)
Done. This will show up in today's results. -- JLaTondre (talk) 15:23, 8 July 2023 (UTC)
Wikipedia:WikiProject Conservatism/Recognized content/DYK still shows dyk1nom etc. Some seem related to dyk2nom etc... See also [14]. Headbomb {t · c · p · b} 02:13, 9 July 2023 (UTC)
The bot caches the output of the talk page parsing. When getting category contents / template transclusions, it gets the current talk page revid and so only has to update the parsing if the page has changed. For the itn1date above, I flushed the cache for the impacted entries, but forgot to do that for the dykblurbs. I just purged them and am rerunning against the pages in that query. Should be resolved for real now. -- JLaTondre (talk) 15:38, 9 July 2023 (UTC)
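The revid-keyed caching described above can be sketched like this in Python. It is a simplified illustration, not the bot's code: the class, its method names, and the in-memory dict are all assumptions.

```python
class ParseCache:
    """Cache parsed talk page data, keyed by title and revision id."""

    def __init__(self, parse):
        self._parse = parse
        self._entries = {}  # title -> (revid, parsed result)

    def get(self, title, revid, fetch_text):
        cached = self._entries.get(title)
        # Reuse the cached result only if the page has not changed.
        if cached and cached[0] == revid:
            return cached[1]
        result = self._parse(fetch_text())
        self._entries[title] = (revid, result)
        return result

    def purge(self, title):
        """Drop an entry so the next get() re-parses the page."""
        self._entries.pop(title, None)
```

This also shows why a purge was needed here: when the parsing logic changes but the talk page does not, the revid still matches, so stale results keep being served until the entry is flushed.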

A heads up for an upcoming piece focusing on the history of JCW. Anything I missed or should highlight? Any feedback welcome. Headbomb {t · c · p · b} 02:57, 18 July 2023 (UTC)

Looks good. Here are a couple suggestions:
  • Under "How does it work, exactly?", it might be worth reiterating in the first paragraph that the results are from a Wikipedia dump and not "live". Perhaps "...from citation templates on the English Wikipedia as of the last Wikipedia backup."?
  • Templates are replaced/removed from the citations as well. So for example, {{small|text}} becomes text and {{paywall}} is removed.
-- JLaTondre (talk) 22:48, 18 July 2023 (UTC)
For the first item, I mentioned dumps again in that section. For the second, I've rephrased the 2nd bullet to mention templates. Thanks for the feedback! Headbomb {t · c · p · b} 23:23, 18 July 2023 (UTC)
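The two template cases mentioned above ({{small|text}} becoming text, {{paywall}} being removed) can be illustrated with a short Python sketch. The regular expressions and function name are assumptions, and they cover only these two templates, without nesting.

```python
import re

def clean_citation(text):
    """Unwrap {{small|...}} and drop {{paywall}} from a citation field."""
    # Replace {{small|text}} with its content.
    text = re.sub(r"\{\{\s*small\s*\|([^{}]*)\}\}", r"\1", text, flags=re.IGNORECASE)
    # Remove {{paywall}} entirely.
    text = re.sub(r"\{\{\s*paywall\s*\}\}", "", text, flags=re.IGNORECASE)
    return text.strip()
```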

Some missed runs...

I think I set them up right after JL-bot started the last RECOG run... so if you could rerun the bot on

It also skipped Wikipedia:WikiProject Women in Green/DYK, setup on the 6th, so I don't know what's wrong with that one.

Anyway, running the bot on those would be peachy! Headbomb {t · c · p · b} 11:09, 9 July 2023 (UTC)

I've also added a new parameter |sortkey=, to User:JL-Bot. My impression was that if it doesn't recognize it, it'll just skip it, but I want to confirm. These would be the affected pages. Headbomb {t · c · p · b} 11:15, 9 July 2023 (UTC)

Once it completes re-running the dykblurbs above, it will run against the Meetup pages. For Wikipedia:WikiProject Women in Green/DYK, the log shows it ran and saved results. I am wondering if the results are too big. Do you know what the maximum size of a Wikipedia page is? I know there is some limit and can research it. -- JLaTondre (talk) 15:41, 9 July 2023 (UTC)
I don't know of any such limits, but I've never seen any page above 2MB. You start running into issues above 1MB, but that's mostly making slow to load/edit pages, and template expansion limits. Those pages should be far from reaching those limits however, given WikiProject Women's RECOG reports run just fine, and they'll encompass all women-related articles, not just those created at those events. Headbomb {t · c · p · b} 15:51, 9 July 2023 (UTC)
Though it's true DYK pages get massive... bot could create /1 /2 /3 /4 subpages? I'd still be surprised if Women in Green's DYK page would exceed Wikipedia:WikiProject Women in Red/DYK's 500KB page though. Headbomb {t · c · p · b} 17:02, 9 July 2023 (UTC)
I reran Wikipedia:WikiProject Women in Green/DYK, saved the contents to a file, and manually tried to edit the page & save the results. I received the following, "ERROR: The text you have submitted is 2,094.439 kilobytes long, which is longer than the maximum of 2,048 kilobytes. It cannot be saved." The Green page specifies 7 templates & 8 categories pulling in over 600k articles and resulting in about 10.4k DYK blurbs. For comparison, Wikipedia:WikiProject Women in Red/DYK only specifies one category of 45.6k articles and results in 2.5k DYK blurbs. If you are expecting Green to be less than Red, you may want to look at the configuration of that page. -- JLaTondre (talk) 23:17, 9 July 2023 (UTC)
Yeah it'd need to be broken in two+ then... I suggest 1MB/page top. Or a per year page like .../DYK/2012 .../DYK/2013 .../DYK/2014. I guess I underestimated Category:All WikiProject Women-related pages vs Category:All WikiProject Women in Red pages. Headbomb {t · c · p · b} 02:14, 10 July 2023 (UTC)
An initial change has been made to show a message when the output exceeds the maximum page size. This will show up in tomorrow's run. Next, I will add a parameter to break up DYK into subpages based on year. -- JLaTondre (talk) 23:27, 21 July 2023 (UTC)
A |dyk-blurb-paged parameter has been implemented to break up DYK into subpages based on year. It creates a subpage per year and transcludes these pages (showing a max of 10 results) to the parent page. I have added the parameter to Wikipedia:WikiProject Women in Green/DYK and ran the bot against that page so you can see the results. -- JLaTondre (talk) 19:25, 22 July 2023 (UTC)
Looks wonderful. I cleared the 'others'... will the bot delete that page on the next run, or will it just blank it? Headbomb {t · c · p · b} 19:41, 22 July 2023 (UTC)
Neither. It will only update subpages that have content for the project. If a subpage is no longer valid, it will just be orphaned. My hope is this doesn't happen often and projects can just clean up any orphans on their own. If it becomes an issue, I will add a check, but trying to avoid unneeded code complication. I reran against that project and deleted the Other subpage. -- JLaTondre (talk) 20:11, 22 July 2023 (UTC)
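The two pieces discussed in this thread, the size check and the per-year split, can be sketched in Python as follows. This is a rough illustration, not the bot's code: the names are hypothetical, the limit is assumed to be measured in bytes of UTF-8 text with 1,024 bytes per kilobyte, and "Other" is assumed as the bucket for blurbs without a known year.

```python
from collections import defaultdict

# The 2,048 kilobyte maximum quoted in the error message above.
MAX_PAGE_BYTES = 2048 * 1024

def exceeds_page_limit(wikitext):
    """Return True if the page text is too large to be saved."""
    return len(wikitext.encode("utf-8")) > MAX_PAGE_BYTES

def group_by_year(blurbs):
    """Group (year, text) pairs into per-year subpage contents."""
    pages = defaultdict(list)
    for year, text in blurbs:
        # Blurbs with no known year go to a catch-all bucket.
        pages[str(year) if year else "Other"].append(text)
    return dict(pages)
```

Only years that actually have blurbs produce a key, which mirrors the behaviour described above: subpages with no content are simply not written, so a previously created subpage can be left orphaned.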

Logic tweak?

This edit seems to indicate that something was changed... For the better I might add! Headbomb {t · c · p · b} 15:29, 24 July 2023 (UTC)

Added handling of additional templates from this last dump. See [15]. -- JLaTondre (talk) 22:10, 24 July 2023 (UTC)
Speaking of those
WP:JCW/Publisher21#Villanova University still has a {{convert
Headbomb {t · c · p · b} 15:01, 25 July 2023 (UTC)
The Publisher and Questionable pages don't handle a midcycle update in the individual pages. I purged the cache they work from and they should update with tonight's processing run. -- JLaTondre (talk) 22:47, 26 July 2023 (UTC)
/Publisher don't seem to have been processed today... The run normally finishes at noon-ish (my time), and it finished at 5am this morning. Headbomb {t · c · p · b} 14:29, 27 July 2023 (UTC)
Sorry, it didn't handle purging the cache like I expected. I fixed it and it will run tonight. -- JLaTondre (talk) 23:12, 28 July 2023 (UTC)

Vital

Does the bot or documentation need to be updated with respect to vital articles?

All those vital article categories are now empty. Headbomb {t · c · p · b} 02:42, 27 July 2023 (UTC)

It looks like the "Category:All Wikipedia level-x vital articles" categories have been replaced by "Category:Wikipedia level-x vital articles" categories? If so, I can update the bot to use those instead. -- JLaTondre (talk) 23:34, 28 July 2023 (UTC)
Done. -- JLaTondre (talk) 21:35, 29 July 2023 (UTC)
Could you rerun it, since it purged all vital articles in Today's run? Headbomb {t · c · p · b} 22:28, 29 July 2023 (UTC)
Running... -- JLaTondre (talk) 23:06, 29 July 2023 (UTC)

new interwiki, ACE

WP:JCW/Publisher21#Polytechnic University of Catalonia has a few entries that get mangled up on account of the interwiki link. Headbomb {t · c · p · b} 22:24, 27 July 2023 (UTC)

Okay, I think I see the problem. I will try to get this in tomorrow. -- JLaTondre (talk) 00:01, 29 July 2023 (UTC)
No rush. Headbomb {t · c · p · b} 00:03, 29 July 2023 (UTC)
Done. It will show up in next run. -- JLaTondre (talk) 23:06, 29 July 2023 (UTC)

Autosigned?

There's some weird stuff going on with [16], [17], etc... Headbomb {t · c · p · b} 06:26, 30 July 2023 (UTC)

I figured out the issue, but I will not be able to work it for a bit. It is expecting the class to be in {{Vital article}}, but for these it was moved to the project banner shell. The pattern matching gets overzealous and matches the class from the autosigned further down the page. I will not be able to fix this before next week's run, but hopefully before the one after that. -- JLaTondre (talk) 10:55, 30 July 2023 (UTC)
Cool. I'll keep the note about vital being busted in the RECOG header then. Maybe adjust to partly busted. Headbomb {t · c · p · b} 13:16, 30 July 2023 (UTC)
Fixed. I removed the note from the RECOG header. -- JLaTondre (talk) 22:05, 9 August 2023 (UTC)