
Add option to throw exception when no tags are found with select_list #332

Open
glibg10b opened this issue Mar 10, 2024 · 0 comments

glibg10b commented Mar 10, 2024

Is your feature request related to a problem? Please describe.

  1. Sometimes logging into my university's website fails for an unknown reason.
  2. The site responds with HTTP 200 and takes me to an error page.
  3. The scraper navigates to the URL configured in resource.
  4. It gets redirected back to the login page.
  5. It doesn't find what it's looking for.
  6. select_list is in use, so the scraper returns an empty list.

If I'd used select, the scraper would've thrown an exception when it couldn't find a matching tag, which would've caused the form to be resubmitted, since resubmit_on_error is on by default. However, I need to use select_list.
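The asymmetry described above can be illustrated like this (these are stand-ins for the behaviour, not the actual Multiscrape internals):

```python
# Illustrative stand-ins only: a single selector that raises when
# nothing matches, versus a list selector that silently returns an
# empty list. Not the real Multiscrape API.
def select_one(tags, css):
    """Single selector: raises when nothing matches."""
    matches = [t for t in tags if t.get("css") == css]
    if not matches:
        # This exception is what would trigger resubmit_on_error.
        raise ValueError(f"No tag matched {css!r}")
    return matches[0]

def select_list(tags, css):
    """List selector: silently returns an empty list on no match."""
    return [t for t in tags if t.get("css") == css]
```

With select_list there is nothing for resubmit_on_error to react to, which is the gap this issue is about.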

Describe the solution you'd like

Add an allow_empty key under sensor. When this key has a value of false, the scraper throws an exception if no matching tags are found.
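A minimal sketch of the requested semantics, assuming a hypothetical allow_empty option (the function and parameter names are illustrative, not an existing Multiscrape API):

```python
def join_list_values(values, separator=",", allow_empty=True):
    """Join scraped values into a CSV; optionally fail instead of
    returning an empty string."""
    if not values and not allow_empty:
        # With allow_empty: false, the scraper would raise here, which
        # would trigger resubmit_on_error instead of storing "".
        raise ValueError("List selector matched no tags")
    return separator.join(values)
```

A sensor configured with allow_empty set to false would then cause the form to be resubmitted on an empty result, just as select does today.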

Describe alternatives you've considered

I suppose I could write an automation that checks whether the string is empty and calls the update service. However, this wouldn't prevent the empty string from being stored in the database and showing up in the sensor's history.

Additional context

The relevant lines of code:

if selector.is_list:
    tags = self._soup.select(selector.list)
    _LOGGER.debug("%s # List selector selected tags: %s", log_prefix, tags)
    if selector.attribute is not None:
        _LOGGER.debug(
            "%s # Try to find attributes: %s",
            log_prefix,
            selector.attribute,
        )
        values = [tag[selector.attribute] for tag in tags]
    else:
        values = [tag.text for tag in tags]
    value = self._separator.join(values)
    _LOGGER.debug("%s # List selector csv: %s", log_prefix, value)

Debug logs after the form is submitted:

Response status code received: 200
Form seems to be submitted succesfully (to be sure, use log_response and check file). Now continuing to retrieve target page.
Request data from https://upnet.up.ac.za/psc/pscsmpra_newwin/EMPLOYEE/SA/c/UP_SF_FL_MENU.UP_OL_PAY_SSF_FL.GBL?NavColl=true
Executing page-request with a get to url: https://upnet.up.ac.za/psc/pscsmpra_newwin/EMPLOYEE/SA/c/UP_SF_FL_MENU.UP_OL_PAY_SSF_FL.GBL?NavColl=true with headers: {}
Response status code received: 200
Loading the content in BeautifulSoup.
Data succesfully refreshed. Sensors will now start scraping to update.
2024-03-10 14:39:41.586 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 0.857 seconds (success: True)
UP bursary payout dates # Setting up sensor
UP bursary payout dates # Start scraping to update sensor
UP bursary payout dates # List selector selected tags: []
UP bursary payout dates # List selector csv: 
UP bursary payout dates # Final selector value:  of type <class 'str'>
UP bursary payout dates # Selected: 
UP bursary payout dates # Updated sensor and attributes, now adding to HA

I don't know why Multiscrape doesn't fail when it tries to navigate to the resource and gets redirected to the login page. Maybe that's the real issue here.
