Python for Data Science – Importing table data from a web page

This is another blog post about using the Pandas package. This time, I’ll show you how to import table data from a web page. For this to work, the page we access must define the data with HTML table tags (table, tr, td). Unfortunately, many web sites no longer use “table” tags and prefer “div” tags instead, so if this code doesn’t work, check the HTML source code of the page.

For testing purposes, I’ll try to fetch exchange rates from the CNN Money International web site. There are two tables on the page, one for the exchange rates and one for the world markets.

The Python code is very simple:
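A minimal sketch of this step, under a few assumptions: the HTML string below is a made-up stand-in for the page (only the exchange rates table ID comes from this post; the second ID and all values are hypothetical), so the example runs offline. With a real page you could pass its URL to read_html instead. It also assumes an HTML parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page's HTML: two tables, as described above.
html = """
<table id="wsod_currencyExhangeRatesTable">
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>EUR</td><td>1.0850</td></tr>
  <tr><td>GBP</td><td>1.2650</td></tr>
</table>
<table id="world_markets">
  <tr><th>Index</th><th>Change</th></tr>
  <tr><td>Sample Index</td><td>0.5</td></tr>
</table>
"""

# read_html parses every <table> it finds and returns one DataFrame per table.
df_list = pd.read_html(StringIO(html))

print(len(df_list))
for df in df_list:
    print(df)
```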

I examined the HTML code of the page and saw that these tables have different IDs. The ID of the exchange rates table is “wsod_currencyExhangeRatesTable”. I use this ID to fetch only the exchange rates table:
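One way to filter by ID is the attrs parameter of read_html. The sketch below again uses a made-up HTML string in place of the real page; the second table's ID and all cell values are assumptions for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page's HTML.
html = """
<table id="wsod_currencyExhangeRatesTable">
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>EUR</td><td>1.0850</td></tr>
</table>
<table id="world_markets">
  <tr><th>Index</th><th>Change</th></tr>
  <tr><td>Sample Index</td><td>0.5</td></tr>
</table>
"""

# attrs restricts the search to tables whose HTML attributes match.
df_list = pd.read_html(
    StringIO(html),
    attrs={"id": "wsod_currencyExhangeRatesTable"},
)

print(len(df_list))
print(df_list[0])
```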

The read_html function returns a list of DataFrames even if there’s only one table. We need to use an index (i.e. df_list[0]) to access the first table.
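A small illustration of that behavior, using a made-up single-table HTML string (the values are hypothetical):

```python
from io import StringIO

import pandas as pd

# Hypothetical page with a single table.
html = """
<table>
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>EUR</td><td>1.0850</td></tr>
</table>
"""

df_list = pd.read_html(StringIO(html))

# The result is a plain Python list, even with one table on the page.
print(type(df_list))

# Index into the list to get the actual DataFrame.
df = df_list[0]
print(type(df))
```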

You probably noticed that the last column contains both the min and max values, and it would be better to split them into separate columns. Here’s the sample script:
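A sketch of one way to do the split, assuming the combined column is named “Range” and its values look like “min - max”. Both the column name and the separator are assumptions, as are the new “Min” and “Max” column names and the sample values; adjust them to match the table you actually fetch.

```python
import pandas as pd

# Hypothetical sample data; the real column name and separator may differ.
df = pd.DataFrame({
    "Currency": ["EUR", "GBP"],
    "Range": ["1.0812 - 1.0877", "1.2610 - 1.2695"],
})

# Split each "min - max" string into two parts (one column per part).
parts = df["Range"].str.split(" - ", expand=True)

# Store the parts as numeric columns and drop the combined column.
df["Min"] = parts[0].astype(float)
df["Max"] = parts[1].astype(float)
df = df.drop(columns="Range")

print(df)
```

Converting the parts with astype(float) means the new columns can be sorted and compared as numbers rather than as text.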

and its output:

So we successfully fetched the table data from a web site and parsed it. Did you see how easy it is to manipulate the columns of Pandas DataFrames? See you in the next blog post!


AWS Big Data Specialist. Oracle Certified Professional (OCP) for EBS R12, Oracle 10g and 11g. Co-author of the book "Expert Oracle Enterprise Manager 12c", published by Apress. Awarded Oracle ACE (in 2011) and Oracle ACE Director (in 2016) for continuous contributions to the Oracle user community. Founding member and vice president of the Turkish Oracle User Group (TROUG). Presented at various international conferences, including Oracle Open World.
