Module talk:Random portal component
This module does not require a rating on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Script errors
@Mr. Stradivarius: The recent changes to this module cause every portal that is missing a page to show up in Category:Pages with script errors. What should we do about this? Jackmcbarn (talk) 23:07, 4 December 2014 (UTC)
- @Jackmcbarn: It looks like catching the error and putting them in a separate tracking category would be best. — Mr. Stradivarius ♪ talk ♪ 23:21, 4 December 2014 (UTC)
- I've changed the module to catch errors in portals with non-existent content subpages and put them in Category:Portals needing attention. — Mr. Stradivarius ♪ talk ♪ 23:39, 4 December 2014 (UTC)
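The fix described above can be sketched roughly as follows. This is a Python illustration of the idea, not the module's actual Lua code (in Lua, `pcall` is the usual way to catch such an error). `MissingSubpageError`, `render_safely` and the `render` callback are hypothetical names.

```python
from typing import Callable

TRACKING_CATEGORY = "[[Category:Portals needing attention]]"


class MissingSubpageError(Exception):
    """Hypothetical error raised when a portal's content subpage does not exist."""


def render_safely(render: Callable[[str], str], subpage: str) -> str:
    """Render a content subpage, or emit a tracking category if it is missing.

    Catching the error means the portal is tracked in
    Category:Portals needing attention instead of showing a script error.
    """
    try:
        return render(subpage)
    except MissingSubpageError:
        return TRACKING_CATEGORY
```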
Subpage tracking algorithm
@BrownHairedGirl: Thank you for your recent addition of the subpage tracking code. :) I noticed that in Template:Portals with subpages count tracking category you have commented out categories for subpage counts of 51–100, 101–200, 201–500, 501–1000, and >1000, presumably because checking for them would exceed the expensive function count limit. How about using the shortcut of only checking that the boundary pages exist, instead of checking that every page exists? The algorithm would go something like this:
1. Check whether subpage #2 exists. If it exists, move to step 2. If it doesn't, put the page in Category:Random portal component with less than 2 available subpages and exit.
2. Check whether subpage #6 exists. If it exists, move to step 3. If it doesn't, put the page in Category:Random portal component with 2–5 available subpages and exit.
3. Repeat with subpages #11, #16, #21, #26, #31, #41, #51, #101, #201, and #501.
4. Finally, check whether subpage #1001 exists. If it exists, put the page in Category:Random portal component with more than 1000 available subpages. If it doesn't, put the page in Category:Random portal component with 501–1000 available subpages.
This way you use at most 13 expensive function calls, one for each boundary between categories. However, there is the drawback that the algorithm won't work properly if there are gaps in the subpage numbers. For example, if a portal has subpages #1, #3 and #4, but #2 doesn't exist, then the algorithm will put it in the "less than 2" category, when it should be in the "2–5" category. Let me know whether you think this would be an acceptable trade-off. Best — Mr. Stradivarius ♪ talk ♪ 06:17, 2 May 2019 (UTC)
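The boundary-checking proposal can be sketched in Python as follows, purely as an illustration. The `exists` callback stands in for an expensive existence check, and `boundary_category`, `BOUNDARIES` and the label strings are hypothetical, not the module's code. The function also returns how many checks it made, so both the 13-call ceiling and the gap problem are easy to see.

```python
from typing import Callable, List, Tuple

# Boundary subpage numbers from the proposal: each is the first number of
# the next category up, so at most len(BOUNDARIES) == 13 checks are made.
BOUNDARIES: List[int] = [2, 6, 11, 16, 21, 26, 31, 41, 51, 101, 201, 501, 1001]


def boundary_category(exists: Callable[[int], bool]) -> Tuple[str, int]:
    """Return (subpage-count label, number of existence checks made).

    `exists(n)` is a hypothetical stand-in for an expensive check of
    whether subpage #n exists. Only boundary pages are checked, so gaps
    below a boundary go unnoticed.
    """
    checks = 0
    lower = 1  # lowest count still possible given the checks so far
    for boundary in BOUNDARIES:
        checks += 1
        if not exists(boundary):
            if boundary == 2:
                return "less than 2", checks
            return f"{lower}–{boundary - 1}", checks
        lower = boundary
    return "more than 1000", checks


# A portal with subpages #1 to #30 stops at boundary #31 after 7 checks.
print(boundary_category(lambda n: 1 <= n <= 30))  # ('26–30', 7)
# Subpages #1, #3 and #4 only: the missing #2 hides the others.
print(boundary_category(lambda n: n in {1, 3, 4}))  # ('less than 2', 1)
```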
- Many thanks, Mr. Stradivarius. That's a very thoughtful suggestion, and yes, the expensive function count was the issue. It caused the modules to fail on big sets, leaving redlinks and error messages, and sadly triggering some dramas like MFD:State-level road portals.
- I think that as presented, it is open to gaming the system by creating only the boundary pages ... but if it was accompanied by some randomised checking of pages between the boundaries, then gaming could be avoided. E.g. if page 101 exists, then check a random sample of pages in the 51–100 range, and if any are missing, then categorise that somehow. That would probably still mean the end of the mismatch-checking, which would be a pity.
- However, the really thoughtful feedback since creating the tracking system has made me re-evaluate my initial assumptions. I started out by simply copying and hacking the code I had used for automated portals, and retained the same thresholds. But now that I have looked at the results, I am less persuaded that the higher-numbered categories are useful. I view any set over 50 as "plenty big", and am not sure that it's worth dividing that into "more than plenty big", "much more than plenty big", "shedloads more than plenty big" etc.
- I think that @Espresso Addict regards that sort of number as a set that probably needs splitting. I hope I haven't put words in EA's mouth, but maybe EA will respond to the ping and let us know what they think of breaking down the over-50 set. --BrownHairedGirl (talk) • (contribs) 20:30, 2 May 2019 (UTC)
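BrownHairedGirl's idea of randomly spot-checking pages between boundaries could look something like this. It is a hypothetical sketch: `spot_check` and its parameters are illustrative names. Each sampled page would cost one more expensive call, so the sample size trades detection against the expensive function limit.

```python
import random
from typing import Callable, List


def spot_check(
    exists: Callable[[int], bool],
    lower: int,
    upper: int,
    sample_size: int,
    rng: random.Random,
) -> List[int]:
    """Check a random sample of subpages lower..upper (inclusive).

    Returns the sampled subpage numbers that turned out to be missing.
    A non-empty result suggests gaps, or that only the boundary pages
    were created, and could go in a separate tracking category.
    """
    candidates = range(lower, upper + 1)
    k = min(sample_size, len(candidates))
    sampled = rng.sample(candidates, k)  # k extra existence checks
    return sorted(n for n in sampled if not exists(n))


rng = random.Random(2019)
# A complete portal: every sampled page in the 51–100 range exists.
print(spot_check(lambda n: 1 <= n <= 120, 51, 100, 5, rng))  # []
# Only boundaries #51 and #101 exist: every page sampled in between is missing.
print(len(spot_check(lambda n: n in {51, 101}, 52, 100, 5, rng)))  # 5
```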
- It's just a personal view, but I've found something in the 20–40 range is probably optimal for each individual portal textbox. That way it usually comes up with something different when refreshed (and if you've got two columns side by side, at least one will nearly always change), but readers stand a chance of actually seeing the content. I'm not sure dividing the tracking category above 50 is therefore all that useful, except to identify portals with over-large selections that should be split up. But that isn't a good reason to take a portal to a deletion discussion, so I'm not sure such a tracking category would be all that useful. Espresso Addict (talk) 21:10, 2 May 2019 (UTC)
- @BrownHairedGirl and Espresso Addict: I have an idea: how about using an exponential search algorithm to find the highest subpage number? Exponential searches run in log(n) time, so even if a portal has 1000 subpages the algorithm will only need to make about 20 expensive function calls. This way we can deprecate the |max= parameter; if we can find the highest subpage number from the exponential search, then we can use that for both the subpage tracking category and the random page generation, and save portal maintainers the trouble of keeping track of the number of subpages themselves. To give you an idea of how the algorithm works, consider a page with five subpages: #1, #2, #3, #4 and #5. If we start checking at subpage #1, it will work like this:
1. Check whether subpage #1 exists. It does, so we know there must be 1 or more subpages.
2. Double the subpage number. This gives us subpage #2. Subpage #2 exists, so we know there must be 2 or more subpages.
3. Double the subpage number. This gives us subpage #4. Subpage #4 exists, so we know there must be 4 or more subpages.
4. Double the subpage number. This gives us subpage #8. Subpage #8 does not exist, so we know there must be between 4 and 7 subpages.
5. Find the mid-point between 4 and 8. This gives us subpage #6. Subpage #6 does not exist, so we know there must be either 4 or 5 subpages.
6. Find the mid-point between 4 and 6. This gives us subpage #5. Subpage #5 exists, so we know there must be 5 subpages.
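The walkthrough above can be sketched in Python as a doubling phase followed by a binary search (together, an exponential search). This is an illustration only. `highest_subpage` and the `exists` callback, which stands in for an expensive existence check, are hypothetical names. The function counts its checks so the "about 20 calls" estimate can be confirmed.

```python
from typing import Callable, Tuple


def highest_subpage(exists: Callable[[int], bool]) -> Tuple[int, int]:
    """Find the highest subpage number by exponential search from #1.

    Returns (highest subpage found, number of existence checks made).
    Assumes subpages are numbered contiguously from #1; with gaps it may
    stop early. `exists(n)` is a hypothetical expensive check for subpage #n.
    """
    checks = 0

    def check(n: int) -> bool:
        nonlocal checks
        checks += 1
        return exists(n)

    if not check(1):
        return 0, checks  # no subpages at all

    # Doubling phase: #low is known to exist; keep doubling until #high is missing.
    low, high = 1, 2
    while check(high):
        low, high = high, high * 2

    # Binary phase: #low exists and #high is missing; halve the gap each time.
    while high - low > 1:
        mid = (low + high) // 2
        if check(mid):
            low = mid
        else:
            high = mid
    return low, checks


print(highest_subpage(lambda n: 1 <= n <= 5))        # (5, 6)
print(highest_subpage(lambda n: 1 <= n <= 1000))     # (1000, 20)
print(highest_subpage(lambda n: n in {1, 2, 3, 5}))  # (3, 4)
```

The first call makes the same six checks as the steps above (#1, #2, #4, #8, #6, #5). The last call shows the gap problem: with #4 missing, the search never reaches #5.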
- However, this algorithm also has the drawback of producing inaccurate results if there are gaps in the subpage numbers. For example, if there was no subpage #4 in the example above, the algorithm would stop looking for any subpages higher than #4, and would return subpage #3 as the highest. If there are gaps in subpage numbers, then this might mean that some subpages are excluded entirely from being displayed, depending on the exact subpage numbers that are missing and what number we start the search from. To find those, perhaps we can use a bot. (We should probably use a bot to get rid of all the deprecated |max= parameters anyway, if we decide that this is a good idea.) Let me know what you think. Best — Mr. Stradivarius ♪ talk ♪ 09:05, 3 May 2019 (UTC)
- Interesting idea, @Mr. Stradivarius, with some advantages and some disadvantages.
- Personally, I created these trackers primarily for immediate rough triage, to help distinguish well-developed portals from still-born portals. However, if we want to retain magazine-style portals (which @EA does, but I am much less keen on), then I think it should be a high priority to dismantle as far as possible this hideous proliferation of content-forked subpages, and adapt some of the code developed last year to simply have a list of articles and automatically extract the MOS:LEAD. So I don't want to spend much time overpolishing the forked system.--BrownHairedGirl (talk) • (contribs) 09:17, 3 May 2019 (UTC)
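As a rough illustration of the "list of articles plus automatic MOS:LEAD extraction" approach, the simplest possible version just keeps the wikitext before the first section heading. This is a hypothetical Python sketch, not the code developed for automated portals. It leaves infoboxes, hatnotes, images and references in place, which hints at why naive extracts may need a lot of cleanup before they read well in a portal box.

```python
import re

# A wikitext section heading is a whole line such as "== History ==".
HEADING = re.compile(r"^=+[^=].*=+[ \t]*$", re.MULTILINE)


def extract_lead(wikitext: str) -> str:
    """Return the text before the first section heading (the lead)."""
    match = HEADING.search(wikitext)
    lead = wikitext[: match.start()] if match else wikitext
    return lead.strip()


article = "'''Foo''' is a bar.\n\nIt is often seen.\n\n== History ==\nOld text."
print(extract_lead(article))
```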
- @BrownHairedGirl and Mr. Stradivarius: I think that if the max is out of step with the actual number of subpages, or there are gaps in the subpage list, the portal maintainer wasn't very competent and there will be other more significant flaws. I love my over-polished manual portals, but I don't know many portal maintainers (aside from the Opera project) who put in as much time as I have to creating the blurbs. However there are significant problems of quality with the lead-extract process that never got sorted (and might be harder to do than a non-programmer like me thinks). It might well be better than nothing for a low-quality unmaintained portal, but my experiments with it were not positive in trying to generate a moderate-quality one. It's very frustrating because I can see BrownHairedGirl's problem with the highly multiple-subpage model. Espresso Addict (talk) 10:07, 3 May 2019 (UTC)
Expensive parser functions
I've been trying to fix the expensive parser function limit at Portal:San Francisco Bay Area, but it isn't behaving at all as I expected. When I comment out one Random portal component call, the count drops by one, but if I comment out two, it drops by however many subpages there are. Changing the number of subpages used does nothing. I have no idea why this is happening, and some help would be appreciated. Maybe a mode that doesn't check whether the pages exist would be useful? --Trialpears (talk) 22:25, 22 September 2019 (UTC)