Is your feature request related to a problem? Please describe.
Sometimes logging into my university's website fails for an unknown reason.
The site responds with HTTP 200 and takes me to an error page.
The scraper goes to the URL in resource.
It gets redirected back to the login page.
It doesn't find what it's looking for.
select_list is in use, so the scraper returns an empty list.
If I'd used select, the scraper would've thrown an exception when it couldn't find a matching tag, which would've led to the form resubmitting since resubmit_on_error is on by default. However, I need to use select_list.
Describe the solution you'd like
Add an allow_empty key under sensor. When this key has a value of false, the scraper throws an exception if no matching tags are found.
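A minimal sketch of the requested behaviour, assuming the list selector has already extracted its values as strings. The `allow_empty` parameter is the option proposed in this issue, not an existing Multiscrape setting, and `EmptySelectionError` and `join_selected_values` are hypothetical names used only for illustration:

```python
class EmptySelectionError(ValueError):
    """Hypothetical error for a list selector that matched no tags."""


def join_selected_values(values, separator=",", allow_empty=True):
    """Join the values found by a list selector into one string.

    `allow_empty` mirrors the key proposed in this issue (an assumption,
    not a current option). When it is False, an empty match raises an
    error instead of silently producing an empty string.
    """
    if not values and not allow_empty:
        raise EmptySelectionError("List selector matched no tags")
    return separator.join(values)
```

With `allow_empty=True` the function behaves as the scraper does today and returns an empty string for an empty list. With `allow_empty=False` the error could be handled the same way a failed `select` is, so that `resubmit_on_error` gets a chance to resubmit the form.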
Describe alternatives you've considered
I suppose I could write an automation that checks if the string is empty and calls the update service. However, this doesn't prevent the empty string value from getting stored in the database and showing up in the sensor's history.
_LOGGER.debug("%s # List selector selected tags: %s", log_prefix, tags)
if selector.attribute is not None:
    _LOGGER.debug(
        "%s # Try to find attributes: %s",
        log_prefix,
        selector.attribute,
    )
    values = [tag[selector.attribute] for tag in tags]
else:
    values = [tag.text for tag in tags]
value = self._separator.join(values)
_LOGGER.debug("%s # List selector csv: %s", log_prefix, value)
Debug logs after the form is submitted:
Response status code received: 200
Form seems to be submitted succesfully (to be sure, use log_response and check file). Now continuing to retrieve target page.
Request data from https://upnet.up.ac.za/psc/pscsmpra_newwin/EMPLOYEE/SA/c/UP_SF_FL_MENU.UP_OL_PAY_SSF_FL.GBL?NavColl=true
Executing page-request with a get to url: https://upnet.up.ac.za/psc/pscsmpra_newwin/EMPLOYEE/SA/c/UP_SF_FL_MENU.UP_OL_PAY_SSF_FL.GBL?NavColl=true with headers: {}
Response status code received: 200
Loading the content in BeautifulSoup.
Data succesfully refreshed. Sensors will now start scraping to update.
2024-03-10 14:39:41.586 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 0.857 seconds (success: True)
UP bursary payout dates # Setting up sensor
UP bursary payout dates # Start scraping to update sensor
UP bursary payout dates # List selector selected tags: []
UP bursary payout dates # List selector csv:
UP bursary payout dates # Final selector value: of type <class 'str'>
UP bursary payout dates # Selected:
UP bursary payout dates # Updated sensor and attributes, now adding to HA
I don't know why Multiscrape doesn't fail when it tries to navigate to the resource and gets redirected to the login page. Maybe that's the real issue here.
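One way such a redirect could be detected is sketched below. It assumes the HTTP client exposes the final URL after following redirects, and it compares that URL with the requested one. The function name `landed_elsewhere` and the example.com URLs are hypothetical, and this is not how Multiscrape currently works:

```python
from urllib.parse import urlsplit


def landed_elsewhere(requested_url: str, final_url: str) -> bool:
    """Return True if the final URL's host or path differs from the request.

    Query strings and fragments are ignored, so only a change of host
    or path counts as ending up on a different page.
    """
    requested = urlsplit(requested_url)
    final = urlsplit(final_url)
    return (requested.netloc.lower(), requested.path) != (
        final.netloc.lower(),
        final.path,
    )


# Hypothetical URLs for illustration only.
target = "https://example.com/portal/payouts?NavColl=true"
login = "https://example.com/login"
print(landed_elsewhere(target, target))
print(landed_elsewhere(target, login))
```

A check like this would let the scraper treat an HTTP 200 response from the login page as a failure rather than as a successful refresh.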
Additional context
The relevant lines of code shown above are lines 91 to 104 of ha-multiscrape/custom_components/multiscrape/scraper.py at commit 6b450df.