43 Commits

Author SHA1 Message Date
James Lu
b2d800ce61 Wikifetch: cleanup redirect handling; fix tests
I also remove the showRedirects logic as the additional text is usually far too verbose for IRC
2022-06-19 15:49:30 -07:00
James Lu
2ae51939b3 Update my email & repo link references 2019-10-11 09:58:50 -07:00
James Lu
831a0af53c Wikifetch: escape regexps as r'' strings (#75) 2019-01-05 19:40:08 -08:00
James Lu
9221d87c29 Wikifetch: skip looking at empty leading paragraphs 2018-07-19 18:04:56 +00:00
James Lu
c4c7f52541 Wikifetch: separate disambiguation results by semicolons
This enhances readability, especially when individual results already contain semicolons (e.g. places).
2018-05-10 23:52:10 -07:00
James Lu
d147207ad1 Wikifetch: ignore GPS coordinates from articles for countries, etc. 2017-11-12 01:39:28 -08:00
James Lu
400ffd7899 Wikifetch: fix quote_plus import 2017-09-07 19:20:21 -07:00
James Lu
6011742299 Wikifetch: remove Python 2 compatibility code 2017-09-01 18:09:19 -07:00
James Lu
08d8f48db5 Wikifetch: refactor text fetching, fix listing disambig results 2017-09-01 18:04:20 -07:00
James Lu
9986babd2e Wikifetch: strip inline notes in the form "text[note 1]" from IRC 2017-09-01 18:02:52 -07:00
James Lu
b6231f56ef Revert "Wikifetch: intelligently filter out <p> lines with little or no content"
This broke parsing for CJK languages (e.g. Chinese and Japanese), which don't use traditional spaces...

(but I should've known that)

This reverts commit 91cfa7acb0975fd5b5bab6e6f2c760781ccd84e2.
2017-06-03 18:30:25 -07:00
James Lu
346f72d816 Wikifetch: fix lookup of articles with symbols (e.g. "/") in their title
The normalization for the special cases was previously ignored if the query matched a "/"; why was this added in the first place?
2017-06-03 18:09:20 -07:00
James Lu
092055d491 Wikifetch: fix Wikipedia parsing again
As of 2017-06-03, Wikipedia has put its text content under a new "mw-parser-output" div, while other sites (e.g. Wikia) still have it directly under "mw-content-text".
2017-06-03 17:46:30 -07:00
James Lu
7611f0fa9c Wikifetch: fix 'random' help text syntax 2017-03-24 19:10:47 -07:00
James Lu
001b49b6c3 Wikifetch: prefer <link rel="canonical"> links again when available 2017-03-24 19:09:38 -07:00
James Lu
91cfa7acb0 Wikifetch: intelligently filter out <p> lines with little or no content
More specifically, this skips lines that have a lower word count than the search query (e.g. page titles, some navigation links).
This allows some pages on https://wiki.ubuntu.com/ to work, for example
2017-03-18 19:07:56 -07:00
James Lu
819fcc6c09 Wikifetch: add a --no-mw-parsing option in an attempt to support non-MediaWiki sites 2017-03-18 18:47:20 -07:00
James Lu
194ac4d7be Wikifetch: clarify _get_article_tree docstring 2017-03-18 18:23:00 -07:00
James Lu
a9dfb1009d Wikifetch: add a three second timeout in fetch 2017-02-04 18:28:11 -08:00
James Lu
2fbfc37f98 Wikifetch: leave a fallback reply if paragraph parsing failed 2017-01-27 18:16:00 -08:00
James Lu
2bd06a39a9 Wikifetch: return the address in _get_article_tree as well 2017-01-27 18:10:52 -08:00
James Lu
100f503783 Wikifetch: support wikimedia.org and mediawiki.org 2017-01-27 18:00:48 -08:00
James Lu
8d586dad47 Wikifetch: fix NameError on redirect parsing 2017-01-27 17:38:37 -08:00
James Lu
5bf0bd6fd5 Wikifetch: bump copyright years 2017-01-27 17:32:20 -08:00
James Lu
9f1f04d25c Wikifetch: only show "possible results" in disambiguation pages if parsing succeeds 2017-01-27 17:25:43 -08:00
James Lu
d1eea2a0a4 Wikifetch: support disambiguation parsing on Wikia 2017-01-27 17:25:24 -08:00
James Lu
3ca87bb686 Wikifetch: abstract out article fetching, fix Wikia search support 2017-01-27 17:12:41 -08:00
James Lu
482a8dd1d9 Wikifetch: remove special case for articles about years
This isn't really relevant anymore, since most years on Wikipedia have an introductory paragraph describing them now.
2017-01-27 17:10:32 -08:00
James Lu
4d487dd0e0 Wikifetch: log the URL when fetching a link fails 2017-01-26 21:28:06 -08:00
James Lu
66b3ef6d17 Wikifetch: make _wiki() return the fetched text instead of replying it directly 2016-02-28 18:20:52 -08:00
James Lu
2d0e90b2dc Wikifetch: Limit supybot.commands import to remove __builtins__.any hack 2015-11-14 16:19:11 -08:00
James Lu
4109741a01 Wikifetch: "Not found or page malformed" should be an error, not a reply 2015-11-01 11:14:32 -08:00
James Lu
03cd552a08 Wikifetch: special case for Wikimedia commons 2015-10-25 17:42:49 -07:00
James Lu
535dcc3bc3 Wikifetch: add 'random' command, fetching a random article via Special:Random 2015-10-25 17:36:14 -07:00
James Lu
54a9d6a8a6 Wikifetch: reword command description (Wikipedia article -> wiki article) 2015-10-25 17:35:45 -07:00
James Lu
08a2942d99 Wikifetch: modularize wiki fetcher, use new-style wrap, simplify URL display 2015-10-25 17:20:40 -07:00
James Lu
c50c50b1cb Wikifetch: also strip newlines from regular paragraphs
Example of problem article: https://en.wikipedia.org/wiki/2BOT_Physical_Modeling_Technologies
2015-10-25 17:07:35 -07:00
James Lu
7f5cea73ad Wikifetch: fallback to '' instead of "None" for bold text, w/ combinations of <a> tags inside <b> tags
Example for this is the article for "Fallstreak hole", where the old behavior would create incorrect results when an <a> tag is nested at the beginning of a <b> tag.

Bad:
<Lily> A Nonefallstreak hole, also known as a hole punch cloud, punch hole cloud, skypunch, canal cloud or cloud hole, is a large circular or elliptical gap that can appear in cirrocumulus or altocumulus clouds. Such holes are formed when the water temperature in the clouds is below freezing but the water has not frozen yet due to the lack of ice nucleation (see supercooled water). When ice crystals do form it will set off a (1 more message)

Better:
<Lily> A fallstreak hole, also known as a hole punch cloud, punch hole cloud, skypunch, canal cloud or cloud hole, is a large circular or elliptical gap that can appear in cirrocumulus or altocumulus clouds. Such holes are formed when the water temperature in the clouds is below freezing but the water has not frozen yet due to the lack of ice nucleation (see supercooled water). When ice crystals do form it will set off a (1 more message)
2015-10-11 17:26:31 -07:00
James Lu
be34ec21f1 Wikifetch: strip spaces in disambiguation output, preventing output corruption 2015-09-25 20:06:25 -07:00
James Lu
da2fc35da3 Wikifetch: handle disambiguation pages in a more friendly manner
Don't cut off results at 5 (many articles have more than that) and show ENTIRE entries with the article link in bold
2015-05-17 22:14:36 -07:00
James Lu
e1bf834877 Wikifetch: factorize --site checking, and don't assume Wikipedia 2015-05-17 21:51:36 -07:00
James Lu
ae84573efb Wikifetch: special case for Arch Linux's Wiki; factorize imports 2015-05-17 21:49:12 -07:00
James Lu
f6118edd74 New Wikifetch plugin, forked from GLolol/ProgVal-Supybot-plugins@efba93e28e 2015-05-17 21:43:34 -07:00