Merging duplicate sections on Wikipedia

mediawiki wikipedia

I've noticed that many paragraphs and article sections have been copied and pasted from one Wikipedia article to another, leading to excessive amounts of redundant text on Wikipedia. Do any tools, scripts, or APIs exist that would make it possible to automatically identify these duplicate sections and paragraphs (so that they can be removed)?

Best Answer

I'm afraid there's no way to do this directly through the API, since it has no query for finding text shared between articles. However, you could probably process the Wikimedia database dumps offline to find the sort of duplication you're looking for. People already doing research on Wikipedia content might also be able to help you out.
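As a rough illustration of the dump-based approach, here is a minimal sketch that hashes normalized paragraphs and reports the ones appearing in more than one article. It assumes you have already extracted the article text from a dump into a dict of `{title: plain text}`; that dict, the function names, and the `min_chars` threshold are all hypothetical and not part of any Wikimedia tool. It only catches exact copies (after whitespace and case normalization), so lightly edited copies would need a fuzzier technique such as shingling.

```python
import hashlib
import re
from collections import defaultdict


def normalize(paragraph):
    """Lowercase and collapse whitespace so trivial differences do not hide a copy."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def find_duplicate_paragraphs(articles, min_chars=80):
    """Map each paragraph shared by two or more articles to the titles containing it.

    `articles` is a hypothetical dict of {title: plain text}. Real dump
    content is wikitext and would need to be parsed into plain text first.
    """
    titles_by_hash = defaultdict(set)  # paragraph hash -> titles containing it
    text_by_hash = {}                  # paragraph hash -> normalized paragraph
    for title, text in articles.items():
        # Treat blank lines as paragraph boundaries.
        for paragraph in re.split(r"\n\s*\n", text):
            norm = normalize(paragraph)
            if len(norm) < min_chars:
                continue  # skip short paragraphs, which match by coincidence
            digest = hashlib.sha1(norm.encode("utf-8")).hexdigest()
            titles_by_hash[digest].add(title)
            text_by_hash[digest] = norm
    return {
        text_by_hash[digest]: sorted(titles)
        for digest, titles in titles_by_hash.items()
        if len(titles) > 1
    }
```

Calling `find_duplicate_paragraphs(articles)` returns a dict whose keys are the shared paragraphs and whose values are the sorted titles that contain them. Hashing keeps memory use manageable compared with storing every paragraph as a dictionary key, though a full dump would still need to be processed in a streaming fashion.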