Static Web Scraping on a website – Day 2

Published by BrighterBees on

web scraping

What is Static Web Scraping?

Static web scraping means scraping data from a single web page using our scraping libraries. If you have read the previous post in our web scraping series, you have already done a little scraping. In that post, you scraped only the raw data (HTML content) from a web page.

We will copy the code from the previous post and use it on another website.

url = http://www.studyguideindia.com/Colleges/default.asp?cat=&State=DL&ct=&page=JNKPKPN4

Preview of the link on the web

 

Our target is to scrape the college names from this page. We will fetch the page's HTML content with the code from the previous post. If you have any problems with installation, see our installation post.

Step: 1 Import the requests module

>>> import requests

Step: 2 Import the BeautifulSoup module

>>> from bs4 import BeautifulSoup

Step: 3 Set the URL of the website

>>> url = 'http://www.studyguideindia.com/Colleges/default.asp?cat=&State=DL&ct=&page=JNKPKPN4'

Step: 4 Get the HTML content of the website and parse it

>>> raw_content = requests.get(url).content

>>> soup = BeautifulSoup(raw_content, "html.parser")

>>> print(soup)

So far we have the HTML content of the website. Now we will target the tags that hold the information we want. Press Ctrl+U in the browser to view the page source; here is a preview of the source code.

 

If you look at the source, you can see many <tr> (table row) tags with the classes alternate and alternate1, and the college names are inside them. We will continue the code from here:
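To see how BeautifulSoup treats rows like these, here is a small sketch on a made-up HTML snippet. The markup and the college names in it are assumptions that only imitate the structure described above; the real page may differ. It shows that `find` returns only the first matching row, while `find_all` returns every matching row.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the table rows described above
html = """
<table>
  <tr class="alternate"><td><a href="/c/1" title="College A">College A</a></td></tr>
  <tr class="alternate1"><td><a href="/c/2" title="College B">College B</a></td></tr>
  <tr class="alternate"><td><a href="/c/3" title="College C">College C</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first <tr class="alternate">
first_row = soup.find("tr", attrs={"class": "alternate"})
first_row_titles = [a.get("title") for a in first_row.find_all("a")]

# find_all() returns every <tr class="alternate">
all_rows = soup.find_all("tr", attrs={"class": "alternate"})
all_titles = [a.get("title") for row in all_rows for a in row.find_all("a")]

print(first_row_titles)
print(all_titles)
```

Note that the class `alternate1` is a different class name, so searching for `alternate` does not match those rows.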

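Step: 5 Get all anchor tags inside the rows with class alternate and alternate1

>>> # find_all returns every matching row; find would return only the first one

>>> stock1 = [a for tr in soup.find_all('tr', attrs={"class": "alternate"}) for a in tr.find_all('a')]

>>> stock2 = [a for tr in soup.find_all('tr', attrs={"class": "alternate1"}) for a in tr.find_all('a')]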

Step: 6 Now for the final step. The code above returns lists of anchor tags, so we will use for loops to read each college name from the title attribute of the anchor tags

>>> College_list = []

>>> for i in stock1:

...     College_list.append(i.get('title'))

>>> for j in stock2:

...     College_list.append(j.get('title'))

>>> print(College_list)

College_list now holds all the college names. You can see them by printing the list in your IDE.
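One detail worth knowing: `get('title')` returns `None` when an anchor has no title attribute, so the list can contain empty entries. The sketch below is one possible way to collect the names in a single pass and skip those entries. The function name `college_names` and the sample HTML are assumptions for illustration, and it uses BeautifulSoup's CSS selector method `select` instead of the two separate lookups from Step 5.

```python
from bs4 import BeautifulSoup


def college_names(html):
    """Return the title of every anchor inside alternate/alternate1 rows."""
    soup = BeautifulSoup(html, "html.parser")
    # One CSS selector covers both row classes, in document order
    anchors = soup.select("tr.alternate a, tr.alternate1 a")
    names = []
    for a in anchors:
        title = a.get("title")  # None if the attribute is missing
        if title:
            names.append(title.strip())
    return names


# Hypothetical sample markup; the real page may differ
sample = """
<table>
  <tr class="header"><td><a href="/" title="Home">Home</a></td></tr>
  <tr class="alternate"><td><a href="/c/1" title="College A">College A</a></td></tr>
  <tr class="alternate1"><td><a href="/c/2" title=" College B ">College B</a></td></tr>
  <tr class="alternate"><td><a href="/c/3">No title here</a></td></tr>
</table>
"""
print(college_names(sample))
```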

Result

 

In this post you have learned how to get the college names from a website by web scraping. Subscribe to get more content like this. If you have any problems with this post, comment below.

If you want to know more about web scraping, click here; for installation, click here.

