Forward cookies from login page form to resource page #438

Open
mallorca2288 opened this issue Oct 11, 2024 · 6 comments

Comments

@mallorca2288

mallorca2288 commented Oct 11, 2024

Version of the custom_component

8.0.2

Configuration

  - resource: "https://plataforma.habidat.es/src/php/vecino/selectViviendas.php?list=acumulado_fecha_desde&vivienda=XXXXX"
    name: Habidat
    log_response: True
    scan_interval: 0
    form_submit:
        submit_once: True
        resource: 'https://plataforma.habidat.es'
        select: "form.login-form"
        input:
            u12: [email protected]
            c12: 'password'
        headers:
            referer: "https://plataforma.habidat.es/index.php"
            X-Requested-With: XMLHttpRequest
    headers:
        referer: "https://plataforma.habidat.es"
    sensor:
      - select: 'body'
        name: habidat
    button:
        unique_id: refrescar_habidat
        name: Refrescar habidat

Describe the bug

First of all I want to thank the developer for this amazing custom component.

Cookies that are set when loading the form page (form_page_response_cookies), and that are sent when submitting the form (form_submit_request_cookies), are not sent when requesting the resource page (page_request_cookies).

The issue I'm having is that the server sets a PHPSESSID cookie when loading the form, but this cookie is not sent when loading the resource page, so the server returns the login page again.

Everything works OK if I manually set the cookie via the headers option, but that only lasts a few minutes, until the session expires.

headers:
  Cookie: PHPSESSID=e3ffadacc4d2c58edf71c5add71a96##

Is there any way to retrieve and store the cookies from the form page and send them to the resource page? I've read all the issues related to cookies (#319, #407, #327) but I couldn't figure out what I'm doing wrong.
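
To make the question concrete, here is a minimal sketch of the step that appears to be missing: turning the Set-Cookie values received with the form page into the Cookie header for the resource page request. It uses only the Python standard library; the function name is hypothetical and this is not multiscrape's actual code.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Build a Cookie request header value from Set-Cookie response header values.

    Hypothetical helper for illustration only, not part of multiscrape.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        # load() parses "NAME=VALUE; attr=..." and keeps attributes separate
        jar.load(value)
    # Only name=value pairs go back to the server; attributes such as Path do not
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```

With a value like "PHPSESSID=abc123; path=/" this yields "PHPSESSID=abc123", which is the same shape as the header I set by hand above.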

Debug log


2024-10-11 02:53:24.549 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Starting with form-submit
2024-10-11 02:53:24.550 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Requesting page with form from: https://plataforma.habidat.es
2024-10-11 02:53:24.550 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Executing form_page-request with a GET to url: https://plataforma.habidat.es with headers: {'referer': 'https://plataforma.habidat.es/index.php', 'X-Requested-With': 'XMLHttpRequest'} and cookies: None.
2024-10-11 02:53:24.607 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_headers written to file: form_page_request_headers.txt
2024-10-11 02:53:24.861 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Response status code received: 200
2024-10-11 02:53:24.965 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_headers written to file: form_page_response_headers.txt
2024-10-11 02:53:24.965 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_cookies written to file: form_page_response_cookies.txt
2024-10-11 02:53:24.979 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_body written to file: form_page_response_body.txt
2024-10-11 02:53:24.980 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Parse page with form with BeautifulSoup parser lxml
2024-10-11 02:53:25.716 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # The page with the form parsed by BeautifulSoup has been written to file: form_page_soup.txt
2024-10-11 02:53:25.716 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Try to find form with selector form.login-form
2024-10-11 02:53:25.718 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Form looks like this: 
<form action="src/php/processLogin.php" class="login-form" method="post" name="login">
<h1 style="color:#333;">Inicio de sesión</h1>
<p>Servicio de control de consumos.</p>
<div class="row">
<div class="col-xs-12 col-md-6">
<input autocomplete="off" class="form-control form-control-solid placeholder-no-fix form-group" id="u12" name="u12" onkeydown="if(event.keyCode==13){event.preventDefault();cargarDesafioLogin();}" placeholder="Email" type="text"/>
</div>
<div class="col-xs-12 col-md-6">
<input autocomplete="off" class="form-control form-control-solid placeholder-no-fix form-group" id="c12" name="c12" onkeydown="if(event.keyCode==13){event.preventDefault();cargarDesafioLogin();}" placeholder="Contraseña" type="password"/>
</div>
</div>
<div class="row">
<div class="col-sm-12 text-right">
<input id="res40" name="res40" type="hidden"/>
<div class="forgot-password">
<a class="forget-password" id="forget-password">Solicitar contraseña</a>
</div>
<input class="btn blue" id="botonlogin" onclick="javascript:cargarDesafioLogin();" type="button" value="Entrar"/>
</div>
</div>
<input id="smt" name="smt" type="hidden" value="Habidat"/>
<input id="via" name="via" type="hidden" value="plataforma"/>
</form>
2024-10-11 02:53:25.734 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Finding all input fields in form
2024-10-11 02:53:25.734 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Found the following input fields: {'u12': None, 'c12': None, 'res40': None, None: 'Entrar', 'smt': 'Habidat', 'via': 'plataforma'}
2024-10-11 02:53:25.735 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Found form action src/php/processLogin.php and method post
2024-10-11 02:53:25.735 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Merged input fields with input data in config. Result: {'u12': '[email protected]', 'c12': 'password', 'res40': None, None: 'Entrar', 'smt': 'Habidat', 'via': 'plataforma'}
2024-10-11 02:53:25.735 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Determined the url to submit the form to: https://plataforma.habidat.es/src/php/processLogin.php
2024-10-11 02:53:25.735 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Submitting the form
2024-10-11 02:53:25.736 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Executing form_submit-request with a post to url: https://plataforma.habidat.es/src/php/processLogin.php with headers: {'referer': 'https://plataforma.habidat.es/index.php', 'X-Requested-With': 'XMLHttpRequest'} and cookies: <Cookies[<Cookie PHPSESSID=e3ffadacc4d2c58edf71c5add71a96## for plataforma.habidat.es />]>.
2024-10-11 02:53:25.893 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_body written to file: form_submit_request_body.txt
2024-10-11 02:53:25.944 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_headers written to file: form_submit_request_headers.txt
2024-10-11 02:53:25.975 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_cookies written to file: form_submit_request_cookies.txt
2024-10-11 02:53:26.435 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Response status code received: 200
2024-10-11 02:53:26.524 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_headers written to file: form_submit_response_headers.txt
2024-10-11 02:53:26.535 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_body written to file: form_submit_response_body.txt
2024-10-11 02:53:26.535 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_cookies written to file: form_submit_response_cookies.txt
2024-10-11 02:53:26.577 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Form seems to be submitted successfully (to be sure, use log_response and check file). Now continuing to retrieve target page.
2024-10-11 02:53:26.578 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Executing page-request with a GET to url: https://plataforma.habidat.es/src/php/vecino/selectViviendas.php?list=acumulado_fecha_desde&vivienda=XXXXX with headers: {'referer': 'https://plataforma.habidat.es'} and cookies: <Cookies[]>.
2024-10-11 02:53:26.652 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_headers written to file: page_request_headers.txt
2024-10-11 02:53:26.673 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_cookies written to file: page_request_cookies.txt
2024-10-11 02:53:26.878 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Response status code received: 200
2024-10-11 02:53:26.955 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_headers written to file: page_response_headers.txt
2024-10-11 02:53:26.955 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_body written to file: page_response_body.txt
2024-10-11 02:53:27.002 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_cookies written to file: page_response_cookies.txt
2024-10-11 02:53:27.005 DEBUG (MainThread) [custom_components.multiscrape.scraper] Habidat # Loading the content in BeautifulSoup.
2024-10-11 02:53:27.965 DEBUG (MainThread) [custom_components.multiscrape.scraper] Habidat # page_soup written to file: page_soup.txt
2024-10-11 02:53:27.966 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Habidat # Data successfully refreshed. Sensors will now start scraping to update.
2024-10-11 02:53:27.966 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 3.569 seconds (success: True)
2024-10-11 02:53:27.966 DEBUG (MainThread) [custom_components.multiscrape.sensor] Habidat # habidat # Start scraping to update sensor
2024-10-11 02:53:27.969 DEBUG (MainThread) [custom_components.multiscrape.scraper] Habidat # habidat # Tag selected: [REMOVED LOGIN PAGE HTML CONTENT]

form_page_response_cookies.txt
<Cookies[<Cookie PHPSESSID=e3ffadacc4d2c58edf71c5add71a96## for plataforma.habidat.es />]>

form_submit_request_cookies.txt
<Cookies[<Cookie PHPSESSID=e3ffadacc4d2c58edf71c5add71a96## for plataforma.habidat.es />]>

form_submit_response_cookies.txt
<Cookies[]>

page_request_cookies.txt
<Cookies[]>

page_response_cookies.txt
<Cookies[]>

@danieldotnl
Owner

Thanks for the detailed description! Could you check whether it works with submit_once: False?

@mallorca2288
Author

Unfortunately I see the same behaviour when changing submit_once to false.

However I've found an alternative solution to save the value of the cookies in a sensor. I'll explain how just in case it can help someone else.

In configuration.yaml I have added the following lines:

homeassistant:
  allowlist_external_dirs:
    - "/config/multiscrape/"

I have created a sensor with the Home Assistant File integration that reads the content of the form_page_response_cookies.txt file.
In my example:

File path: /config/multiscrape/habidat/form_page_response_cookies.txt
Template:

{% if value is string and (value|length>5)%}
  {% set ret = (value|regex_findall(find='PHPSESSID=(.*?) for ', ignorecase=False))[0] %}
  {% if ret is string and (ret|length>5)%}
  {{ ret }}
  {% else %}
  unknown
  {% endif %}
{% else %}
unknown
{% endif %}
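
For anyone who wants to check the pattern outside Home Assistant, the same extraction can be sketched in Python. This is only an illustration of what the template does; the function name is hypothetical, and unlike the template it returns "unknown" instead of failing when the pattern does not match at all.

```python
import re


def extract_cookie_value(file_content, name="PHPSESSID"):
    """Pull a cookie value out of the logged cookie dump, or return "unknown".

    Hypothetical helper mirroring the template above: same regex, same length check.
    """
    if not isinstance(file_content, str) or len(file_content) <= 5:
        return "unknown"
    match = re.search(rf"{re.escape(name)}=(.*?) for ", file_content)
    if match and len(match.group(1)) > 5:
        return match.group(1)
    return "unknown"
```

Fed the content of form_page_response_cookies.txt shown earlier, it returns the PHPSESSID value; fed the empty dump `<Cookies[]>`, it returns "unknown".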

This way, I have the cookie value saved in a sensor that I can use in another multiscrape call via the headers option:

headers:
   Cookie: PHPSESSID={{ states('sensor.habidat_cookie') }};

If I want to store cookies that are more than 255 characters in length (for example, Authorization cookies), I create two file sensors per cookie: one stores the first 240 characters (template {{ ret[:240] }}) and the other stores the rest (template {{ ret[240:] }}). Afterwards I merge the content of both sensors with a template sensor.
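
The split-and-merge idea can be sketched in Python as follows. The function names and the generalisation to any number of chunks are assumptions for illustration; my setup uses exactly two sensors and the slices shown above.

```python
def split_for_sensors(value, chunk_size=240):
    """Split a long cookie value into pieces of at most chunk_size characters.

    Hypothetical helper: each piece is meant to be stored in its own sensor.
    """
    return [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]


def merge_from_sensors(chunks):
    """Rebuild the original cookie value from the pieces, in order."""
    return "".join(chunks)
```

Two sensors with [:240] and [240:] are enough as long as the second piece also stays under the limit; for longer values a third slice would be needed.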

@danieldotnl, if I can make a feature request: I think being able to save the value of the cookies directly in a sensor with multiscrape would be wonderful and much simpler.

Thank you!

@pafnow

pafnow commented Dec 12, 2024

Hello, I'm running into the same issue today. Let me know if you need any more details.

In my case, there is a 302 redirect right after login, and it seems that the PHPSESSID is lost because of it: I can't find it in the files in the log_response folder.

@danieldotnl
Owner

Cookies are a snake pit :(
Sounds related to this discussion: encode/httpx#1404

@pafnow

pafnow commented Dec 16, 2024

Indeed, it seems to be a similar issue, but I haven't understood everything yet.
Apparently it depends on whether the cookies are set on the session or on the request.

How are cookies set in multiscrape?

@pafnow

pafnow commented Dec 16, 2024

Some progress on my case:

  • The server sends the PHPSESSID only once, when the first request is made; subsequent requests no longer receive the Set-Cookie header.
  • If I send a dummy header Cookie "PHPSESSID=", the server always sends me a new Set-Cookie "PHPSESSID=...".
  • With multiscrape, it seems that the headers configuration is not sent on the form_page request.
  • Therefore, if I don't catch the cookie on the first request, I cannot do anything (testing) until the server sends me a new one (timeout still to be determined).
  • Even if I'm able to catch the Set-Cookie PHPSESSID, I don't really know how to send it back on the next document_page request (maybe @mallorca2288's solution would work; I need to test it).

What would be useful in my case

  1. Being able to configure headers parameters for the form_page request.
  2. Finding a way to catch and resend the PHPSESSID cookie.

One interesting thing to note is that, in my test case, the PHPSESSID is sent even if I'm not logged in. Maybe that's because a login form is displayed at the top of each page.
What I want to highlight is that catching and saving the PHPSESSID could be done on a request before login, i.e. on the form_page request that loads the form page.
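
To illustrate the session-versus-request difference outside multiscrape, here is a small self-contained sketch using only the Python standard library. The local server is a hypothetical stand-in for the real site (it issues PHPSESSID when none is presented and echoes back the Cookie header it received), and urllib is used instead of httpx, so this shows the general mechanism rather than how multiscrape actually handles cookies.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: sets PHPSESSID, echoes the Cookie header."""

    def do_GET(self):
        received = self.headers.get("Cookie", "")
        self.send_response(200)
        if "PHPSESSID=" not in received:
            # Issue a session cookie only when the client did not present one
            self.send_header("Set-Cookie", "PHPSESSID=demo123; Path=/")
        self.end_headers()
        self.wfile.write(received.encode())

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_resource(use_cookie_jar):
    """Load /form, then /resource; return the Cookie header seen on /resource."""
    server = HTTPServer(("127.0.0.1", 0), FakeSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    handlers = [urllib.request.ProxyHandler({})]  # ignore any system proxy
    if use_cookie_jar:
        # Session-level jar: cookies from one response are replayed on later requests
        handlers.append(urllib.request.HTTPCookieProcessor(CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    try:
        opener.open(base + "/form").read()
        return opener.open(base + "/resource").read().decode()
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_resource(True) shows the cookie from the form page arriving on the resource request, while fetch_resource(False) reproduces what the logs in this issue show: the resource request goes out with no cookies at all.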
