
Add option to throw exception when no tags are found with select_list #332

Open
glibg10b opened this issue Mar 10, 2024 · 0 comments

glibg10b commented Mar 10, 2024

Is your feature request related to a problem? Please describe.

  1. Sometimes logging into my university's website fails for an unknown reason.
  2. The site responds with HTTP 200 and takes me to an error page.
  3. The scraper navigates to the URL configured in resource.
  4. It gets redirected back to the login page.
  5. It doesn't find what it's looking for.
  6. select_list is in use, so the scraper returns an empty list.

If I'd used select, the scraper would've thrown an exception when it couldn't find a matching tag, which would've caused the form to be resubmitted, since resubmit_on_error is on by default. However, I need to use select_list.
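The asymmetry described above can be illustrated like this (these are stand-ins for the behaviour, not the actual Multiscrape internals):

```python
# Illustrative stand-ins only: a single selector that raises when
# nothing matches, versus a list selector that silently returns an
# empty list. Not the real Multiscrape API.
def select_one(tags, css):
    """Single selector: raises when nothing matches."""
    matches = [t for t in tags if t.get("css") == css]
    if not matches:
        # This exception is what would trigger resubmit_on_error.
        raise ValueError(f"No tag matched {css!r}")
    return matches[0]

def select_list(tags, css):
    """List selector: silently returns an empty list on no match."""
    return [t for t in tags if t.get("css") == css]
```

With select_list there is nothing for resubmit_on_error to react to, which is the gap this issue is about.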

Describe the solution you'd like

Add an allow_empty key under sensor. When this key has a value of false, the scraper throws an exception if no matching tags are found.
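A minimal sketch of the requested semantics, assuming a hypothetical allow_empty option (the function and parameter names are illustrative, not an existing Multiscrape API):

```python
def join_list_values(values, separator=",", allow_empty=True):
    """Join scraped values into a CSV; optionally fail instead of
    returning an empty string."""
    if not values and not allow_empty:
        # With allow_empty: false, the scraper would raise here, which
        # would trigger resubmit_on_error instead of storing "".
        raise ValueError("List selector matched no tags")
    return separator.join(values)
```

A sensor configured with allow_empty set to false would then cause the form to be resubmitted on an empty result, just as select does today.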

Describe alternatives you've considered

I suppose I could write an automation that checks whether the string is empty and calls the update service. However, this wouldn't prevent the empty string from being stored in the database and showing up in the sensor's history.

Additional context

The relevant lines of code:

if selector.is_list:
    tags = self._soup.select(selector.list)
    _LOGGER.debug("%s # List selector selected tags: %s", log_prefix, tags)
    if selector.attribute is not None:
        _LOGGER.debug(
            "%s # Try to find attributes: %s",
            log_prefix,
            selector.attribute,
        )
        values = [tag[selector.attribute] for tag in tags]
    else:
        values = [tag.text for tag in tags]
    value = self._separator.join(values)
    _LOGGER.debug("%s # List selector csv: %s", log_prefix, value)

Debug logs after the form is submitted:

Response status code received: 200
Form seems to be submitted succesfully (to be sure, use log_response and check file). Now continuing to retrieve target page.
Request data from https://upnet.up.ac.za/psc/pscsmpra_newwin/EMPLOYEE/SA/c/UP_SF_FL_MENU.UP_OL_PAY_SSF_FL.GBL?NavColl=true
Executing page-request with a get to url: https://upnet.up.ac.za/psc/pscsmpra_newwin/EMPLOYEE/SA/c/UP_SF_FL_MENU.UP_OL_PAY_SSF_FL.GBL?NavColl=true with headers: {}
Response status code received: 200
Loading the content in BeautifulSoup.
Data succesfully refreshed. Sensors will now start scraping to update.
2024-03-10 14:39:41.586 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 0.857 seconds (success: True)
UP bursary payout dates # Setting up sensor
UP bursary payout dates # Start scraping to update sensor
UP bursary payout dates # List selector selected tags: []
UP bursary payout dates # List selector csv: 
UP bursary payout dates # Final selector value:  of type <class 'str'>
UP bursary payout dates # Selected: 
UP bursary payout dates # Updated sensor and attributes, now adding to HA

I don't know why Multiscrape doesn't fail when it tries to navigate to the resource and gets redirected to the login page. Maybe that's the real issue here.
