Packages
- In this challenge we use the wikipedia package, a Python wrapper for calling the Wikipedia API. Because it is a Python package, its documentation is easy to access with the help() command within Python.
- To create the word cloud, we used the wordcloud package. It would also be possible to build your own word cloud using a powerful technique called regular expressions, but we do not use them explicitly in this challenge.
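As a rough illustration of the regular-expression route mentioned above, the sketch below counts word occurrences, which is the raw data a word cloud is drawn from. It is not the method used in this challenge; the function name `word_frequencies`, the sample sentence, and the pattern are assumptions chosen for the example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case (hypothetical helper)."""
    # [a-z']+ matches runs of ASCII letters and apostrophes; this is a
    # simplifying assumption and will split accented or hyphenated words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample text, used only to show the call.
sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # [('the', 3), ('mat', 2)]
```

A word cloud would then scale each word by its count; in practice you would probably also drop very common words such as "the" before drawing.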
Hints
- Python has a wide range of built-in functions and methods for analysing string data. Try to use this functionality rather than writing your own code, where possible.
- The API requires quite specific page names to retrieve information. The results of wikipedia.search can help you find the exact page name you are looking for.
Notes
- The relative frequency of characters is probably the simplest analysis that can be done on text data. Search for the Text Mining page to find out more on this topic.
- You are likely to find that it would take a very long time for a computer to ‘read’ all of Wikipedia using the API. What is the bottleneck?
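For the relative character frequency mentioned in these notes, a minimal sketch using only built-in string methods and the standard library might look like the following. The function name `relative_char_frequencies` and the sample string are assumptions for illustration; in the challenge, the text would come from a Wikipedia page rather than a literal.

```python
from collections import Counter


def relative_char_frequencies(text):
    """Return each letter's share of all letters in text (hypothetical helper)."""
    # str.lower and str.isalpha are built in; isalpha also accepts
    # non-ASCII letters, so accented characters are counted too.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {ch: n / total for ch, n in counts.items()}


# Made-up sample string, used only to show the call.
sample = "Hello, World"
rel = relative_char_frequencies(sample)
for ch, share in sorted(rel.items(), key=lambda item: -item[1])[:3]:
    print(ch, round(share, 2))
```

Dividing by the number of letters, rather than the full length of the string, keeps spaces and punctuation from distorting the shares.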