Python Challenge 3 Hints and Tips

Hints for Python Challenge 3

Packages

  • In this challenge we use the wikipedia package as a wrapper to call the Wikipedia API. As it is a Python package, its documentation is easy to access using the help() command within Python, for example help(wikipedia.search).
  • To create the word cloud, we used the wordcloud package. It would also be possible to build your own word cloud by counting words with regular expressions, a powerful technique for matching patterns in text. However, we do not explicitly use regular expressions in this challenge.
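
For the curious, the regular-expression route mentioned above can be sketched as follows. This is only an illustration using the standard library, not the approach taken in the solution; the function name word_frequencies and the sample sentence are made up for the example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears in `text`, ignoring case.

    Hypothetical helper for illustration: a "word" here is a run of
    letters, optionally followed by an apostrophe and more letters
    (so "don't" stays in one piece).
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


sample = "The cat sat on the mat. The mat was flat."
counts = word_frequencies(sample)

# The two most frequent words and their counts.
print(counts.most_common(2))
```

A mapping of words to counts like this is the raw material of a word cloud. If the wordcloud package is installed, its WordCloud class can draw directly from such a mapping via generate_from_frequencies, so the counting and the drawing can be kept separate.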

Hints

  • Python has a wide range of built-in commands for analysing string data. Where possible, try to use this functionality rather than writing your own code.
  • The API requires quite specific page names to retrieve the information. The results of wikipedia.search can help you find the exact page name you are looking for.
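
One possible way to combine these two hints is sketched below: search first, then pick an exact title from the results using built-in string methods. The helper names pick_title and fetch_page_text are invented for this example, and the second function assumes the wikipedia package is installed and that you have network access.

```python
def pick_title(candidates, wanted):
    """Choose a page title from a list of search results.

    Hypothetical helper: prefers a case-insensitive exact match,
    otherwise falls back to the first result (or None if empty).
    """
    target = wanted.strip().casefold()
    for title in candidates:
        if title.casefold() == target:
            return title
    return candidates[0] if candidates else None


def fetch_page_text(query):
    """Look up `query` on Wikipedia and return the page text.

    Assumes the third-party wikipedia package and a network connection.
    """
    import wikipedia  # imported here so pick_title works without it

    candidates = wikipedia.search(query)  # list of page titles
    title = pick_title(candidates, query)
    if title is None:
        raise LookupError(f"No Wikipedia page found for {query!r}")
    # auto_suggest=False asks for exactly the title we chose.
    return wikipedia.page(title, auto_suggest=False).content


# Example (needs network access):
# text = fetch_page_text("Text mining")
# print(text[:200])
```

Ambiguous names can still raise an error from the package (for example, a disambiguation error), which is another reason to inspect the search results before requesting a page.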

Notes

  • The relative frequency of characters is probably the simplest analysis that can be done on text data - search for the Text Mining page to find out more on this topic.
  • You are likely to find that it would take a very long time for a computer to ‘read’ all of Wikipedia using the API. What is the bottleneck?
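
As a rough illustration of the character-frequency idea in the first note, here is a minimal sketch that relies on built-in string methods and collections.Counter. The function name relative_char_frequency is made up for this example, and counting letters only (ignoring case, digits, spaces and punctuation) is an assumption rather than a requirement of the challenge.

```python
from collections import Counter


def relative_char_frequency(text):
    """Return each letter's share of all letters in `text`.

    Hypothetical helper: letters are lower-cased first, and anything
    that is not a letter is ignored.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders the letters from most to least frequent.
    return {ch: n / total for ch, n in counts.most_common()}


freqs = relative_char_frequency("Hello, World!")
for letter, share in freqs.items():
    print(f"{letter}: {share:.2f}")
```

The same function can be applied to the text of any page returned by the API, which makes it easy to compare letter frequencies across pages.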

Solution for Python Challenge 3