Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
186 views
in Technique[技术] by (71.8m points)

python - Web scraping after authentication using requests

I doing a webscraping script in Python 3.9.

I'd like to gather some informations from this website: https://www.matchendirect.fr/. This website is in French but I don't think it's a real problem to try to help me.

The information I need is the array displayed on mouse over in the 'Pronostics des internautes' section. The HTML code start with : <table class="table table-bordered MEDtpro">

I have recreate cookies I've on my browser to simulate my connection following answers on this article How to send cookies with urllib but it didn't work.

Here is my code:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import requests

cookies = {
    'PHPSESSID': 'a2q4evve875s1ibamiqmc93ru6',
    'c_compte_cle': '76598fbd4fe763e768dc79275c02e11f',
    'c_compte_id':'311084',
    'c_compte_pseudo':'foobar',
    'c_compte_url_image':'%2Fimage%2Fcommun%2Fmembre-med-t16.png',
    'c_coucours_promo':'3'
    }
headers = {'User-Agent': 'Mozilla/5.0'}


link = "https://www.matchendirect.fr/live-score/caen-toulouse.html"

response = requests.get(link, cookies=cookies, headers=headers)
webpage = response.text

print("Success!") if webpage.find('<table class="table table-bordered MEDtpro">')>-1 else print("Failed!")

Can someone help my with this issue?

This account is a junk account for the one who would like to test on their side.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The content you are looking for is updated by an AJAX request. You can find the data by sending the request to the AJAX URL.

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup


link = "https://www.matchendirect.fr/live-score/caen-toulouse.html"

response = requests.get(link)
soup = BeautifulSoup(response.content, "html.parser")
f_id_match = soup.find("input", {"name": "f_pronostic_id_match"})["value"]


data_response = requests.get("https://www.matchendirect.fr/cgi/ajax/liste_pronostic.php?f_id_match={}&f_id_grille=".format(f_id_match))
webpage = data_response.text


print("Success!") if webpage.find('<table class="table table-bordered MEDtpro">')>-1 else print("Failed!")

You need to find the f_id_match from the page, then send a new request to that AJAX url to find what you looking for.

Do not set cookies to emulate the browser, use the requests.Session() to create a session like browser then try to navigate around URLs


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...