Implement crawling controller to fetch directory URLs containing OpenAPI definitions. - Implementing Queue-Based Architecture of Downloading Index Files from Common Crawl Server Using RabbitMQ NOTE: All code is contained within the downloadAndProcessIndexFilesInBackground() function #5
base: develop
…e crawling process. Implement a crawling controller and create the Common Crawl driver.
…elay while fetching directories from cc server.
@priyanshu-kun Please move the Dummy App to a separate branch: feature/backup-dummy-app
@priyanshu-kun I have completed the initial review; please take a look.
@priyanshu-kun In order to prevent rate-limiting issues, you can explore back-off and sleep methods.
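A minimal sketch of the back-off and sleep idea suggested above, not the code in this PR. The helper names (backoffDelay, sleep, withBackoff) and the default delays are hypothetical; the operation being retried would be whatever call fetches a directory or index file from the Common Crawl server.

```javascript
// Hypothetical helpers illustrating exponential back-off; none of these names come from the PR.

// Delay for a given attempt: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Promise-based sleep built on setTimeout.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` and retries it on failure, sleeping longer after each failed attempt.
// `wait` is injectable so the retry logic can be exercised without real delays.
async function withBackoff(operation, { retries = 5, baseMs = 500, maxMs = 30000, wait = sleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of retries, give up
      await wait(backoffDelay(attempt, baseMs, maxMs));
    }
  }
  throw lastError;
}
```

A request to the Common Crawl server could then be wrapped as withBackoff(() => fetchDirectory(url)), where fetchDirectory stands in for the existing download call. Adding random jitter to the delay is a common refinement when several workers retry at once.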
… Common Crawl Server Using RabbitMQ NOTE: All code is contained within the downloadAndProcessIndexFilesInBackground() function.
  return res.badRequest('Data source not provided');
}

try {
No need for this try/catch: we already have one, and we are not doing any explicit error handling here.
This pull request introduces a new crawling controller that fetches the directory URLs containing OpenAPI definitions, which makes those definitions easier to retrieve.
This commit implements a queue-based architecture for downloading index files from the Common Crawl server, with RabbitMQ as the message broker that manages the queue. All code for downloading and processing the index files in the background is contained in the downloadAndProcessIndexFilesInBackground() function.
Moving this long-running work onto a queue keeps the server responsive and avoids overloading it. Index files are processed asynchronously, which is intended to improve performance and fault tolerance.
Encapsulating the RabbitMQ logic in downloadAndProcessIndexFilesInBackground() keeps the codebase organized and modular, making it easier to maintain and extend.
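The queue-based flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the body of downloadAndProcessIndexFilesInBackground(): the PR text does not say which RabbitMQ client is used, so the sketch assumes a channel object shaped like amqplib's (assertQueue, sendToQueue, prefetch, consume, ack, nack). The queue name, message shape, and function names are hypothetical.

```javascript
// Assumption: `channel` behaves like an amqplib channel. Queue name and message shape are made up.
const INDEX_QUEUE = 'cc-index-files';

// Producer side: publish one message per index file URL, then return immediately
// so the HTTP request that triggered the crawl is not blocked by the downloads.
async function enqueueIndexFiles(channel, indexFileUrls) {
  await channel.assertQueue(INDEX_QUEUE, { durable: true });
  for (const url of indexFileUrls) {
    const body = Buffer.from(JSON.stringify({ url }));
    channel.sendToQueue(INDEX_QUEUE, body, { persistent: true });
  }
  return indexFileUrls.length;
}

// Consumer side: process one index file at a time in the background.
// `processIndexFile` is a caller-supplied async function (download + parse).
async function startIndexFileWorker(channel, processIndexFile) {
  await channel.assertQueue(INDEX_QUEUE, { durable: true });
  await channel.prefetch(1); // at most one unacknowledged message per worker
  return channel.consume(INDEX_QUEUE, async (msg) => {
    if (msg === null) return; // consumer was cancelled by the broker
    try {
      const { url } = JSON.parse(msg.content.toString());
      await processIndexFile(url);
      channel.ack(msg); // success: remove the message from the queue
    } catch (err) {
      channel.nack(msg, false, false); // failure: reject without requeueing
    }
  });
}
```

Limiting prefetch to one message per worker is one way to avoid overloading the Common Crawl server; whether failed messages should be dropped, requeued, or routed to a dead-letter queue is a design choice this sketch leaves open.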