
Added Bing Search Engine, 10X Speedup, Cleaner HTML. Made architectural changes requested by JulesGM #8

Open: hitchingsh wants to merge 7 commits into master


hitchingsh

@JulesGM I have implemented all the architectural and stylistic changes you requested. This new pull request adds Bing Search, since that is what the ParlAI Blenderbot2 paper used. It also lets you limit the text returned per URL, since Blenderbot currently uses only the first 512 characters, and it can strip HTML menus out of each page. You can also return a clean summary of each web page about 10X faster, because it does not need to fetch each URL. I have updated the README with examples so you can quickly test these options. Overall, it lets the search engine return significantly higher-quality text to Blenderbot2. In case you do not have a Bing Search subscription key and want to test these options, I will send you a separate private email with the URLs of test instances for each of them, which I have deployed as Docker containers on Google Cloud. Thank you again for your time.
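
Here is a minimal sketch of the menu stripping and per-URL text limit described above. It assumes that menus live inside `<nav>`, `<header>`, `<footer>` or `<aside>` elements, and it uses only Python's standard-library `html.parser`. `MenuStrippingParser`, `clean_page_text` and `SKIP_TAGS` are hypothetical names, not the code in this pull request. The 512-character default simply mirrors the Blenderbot limit mentioned above.

```python
from html.parser import HTMLParser

# Hypothetical assumption: these tags hold menus or other non-content markup.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MenuStrippingParser(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside every skipped element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean_page_text(html, max_chars=512):
    """Return menu-free text from one page, truncated to max_chars."""
    parser = MenuStrippingParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return text[:max_chars]
```

For example, on `<nav><a href='/'>Home</a></nav><p>Blenderbot 2 uses retrieved text.</p><footer>Copyright</footer>` the function keeps only the paragraph text. A real page may use `<div class="menu">` or similar markup instead of semantic tags, so a production version would probably need more heuristics than this.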

…get stuck. Limit text returned to 2K per result. Set the time filter to return only web pages indexed within the last 30 days. These changes improve the information returned and the answers given by ParlAI.
…much cleaner data returned, added command line args
…g new class for Bing Search and removing most global variables
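
The commits above mention a 2K-per-result limit and a 30-day time filter. The sketch below shows one way those pieces could fit together. It rests on several assumptions about the Bing Web Search API v7:

- the endpoint and its `Ocp-Apim-Subscription-Key` header;
- its `freshness=Month` filter, which covers pages discovered in roughly the last 30 days;
- a JSON response whose `webPages.value` entries carry `name`, `url` and `snippet`.

The function names are hypothetical. The request is only built here, not sent. Building results from snippets avoids fetching each URL, which is one plausible source of the speedup described in this pull request.

```python
from urllib.parse import urlencode

# Assumption: Bing Web Search API v7 endpoint (not taken from this pull request).
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_request(query, subscription_key, count=5):
    """Return (url, headers) for a query limited to roughly the last 30 days."""
    params = {
        "q": query,
        "count": count,
        "freshness": "Month",  # assumed Bing filter: pages discovered in the last 30 days
    }
    url = f"{BING_ENDPOINT}?{urlencode(params)}"
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return url, headers


def snippets_from_response(response_json, max_chars=2000):
    """Build results from a Bing-style JSON response without fetching any page."""
    results = []
    for page in response_json.get("webPages", {}).get("value", []):
        results.append({
            "title": page.get("name", ""),
            "url": page.get("url", ""),
            # Cap each result's text, mirroring the 2K-per-result limit above.
            "content": page.get("snippet", "")[:max_chars],
        })
    return results
```

To send the request, you could pass `url` and `headers` to `urllib.request.Request`, decode the JSON body, and then call `snippets_from_response` on it.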
@Cuteistfox

+1


3 participants