Implement crawling controller to fetch directory URLs containing OpenAPI definitions. - Implementing Queue-Based Architecture of Downloading Index Files from Common Crawl Server Using RabbitMQ NOTE: All code is contained within the downloadAndProcessIndexFilesInBackground() function #5

priyanshu-kun · 2023-06-22T12:09:19Z

This pull request introduces a crawling controller that fetches the directory URLs linked to OpenAPI specifications. Retrieving those directories first makes it easier to locate the OpenAPI definitions they contain.

This commit implements a queue-based architecture for downloading index files from the Common Crawl server, with RabbitMQ as the message broker that manages the queue. All of the code for downloading and processing the index files in the background lives in the downloadAndProcessIndexFilesInBackground() function.

The goal is to handle long-running operations more efficiently and scalably while keeping the server responsive and preventing overload. Because the queue lets index files be processed asynchronously, the request handler does not have to wait for downloads to finish, which should improve performance and fault tolerance.

Using RabbitMQ and encapsulating the functionality in downloadAndProcessIndexFilesInBackground() keeps the codebase organized and modular, which should make it easier to maintain and extend.
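The producer/worker split described above can be illustrated with a minimal sketch. It deliberately uses an in-memory queue as a stand-in for RabbitMQ, so it shows the pattern only and not the code inside downloadAndProcessIndexFilesInBackground(). The names `InMemoryQueue`, `runWorker`, `STOP`, and `processIndexFile` are assumptions made for illustration.

```javascript
// In-memory stand-in for a message broker. The pull request uses RabbitMQ;
// this class only mimics the publish/consume hand-off for illustration.
class InMemoryQueue {
  constructor() {
    this.messages = []; // buffered messages nobody has consumed yet
    this.waiters = [];  // consumers waiting for the next message
  }

  // Producer side: hand the message to a waiting consumer, or buffer it.
  publish(message) {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(message);
    } else {
      this.messages.push(message);
    }
  }

  // Consumer side: resolve with the next message as soon as one exists.
  next() {
    if (this.messages.length > 0) {
      return Promise.resolve(this.messages.shift());
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}

// Sentinel that tells the worker to stop (hypothetical convention).
const STOP = Symbol('stop');

// Worker loop: takes one queued item at a time and processes it.
// `processIndexFile` is a caller-supplied async function (an assumption).
async function runWorker(queue, processIndexFile) {
  const results = [];
  for (;;) {
    const message = await queue.next();
    if (message === STOP) {
      return results;
    }
    try {
      results.push(await processIndexFile(message));
    } catch (err) {
      // A failed job is recorded and the loop moves on to the next message.
      results.push({ item: message, error: err.message });
    }
  }
}
```

With this shape, a request handler would only call `queue.publish(...)` and return immediately, while `runWorker` drains the queue in the background. A real broker adds what this sketch lacks: persistence, acknowledgements, and delivery across processes.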

vinitshahdeo

@priyanshu-kun Please move the Dummy App to a separate branch - feature/backup-dummy-app

vinitshahdeo

@priyanshu-kun I have completed the initial review; please take a look.

src/server/README.md

src/server/api/constants/Constants.js

src/server/api/controllers/CrawlingController.js

src/server/package.json

vinitshahdeo · 2023-06-22T13:59:38Z

@priyanshu-kun In order to prevent rate-limiting issues, you can explore back-off and sleep methods.
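As a rough illustration of the back-off and sleep suggestion, the sketch below retries a failing async call with exponentially growing pauses. It is not the code later added in this pull request; `sleep`, `retryWithBackoff`, and the option names are hypothetical.

```javascript
// Hypothetical helpers for illustration; names are not from the pull request.

// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `task` until it succeeds or `maxAttempts` is reached. After each
// failure the wait doubles (exponential back-off), capped at `maxDelayMs`.
async function retryWithBackoff(
  task,
  { maxAttempts = 5, baseDelayMs = 500, maxDelayMs = 30000 } = {}
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) {
        break; // out of attempts, so do not sleep again
      }
      const delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      await sleep(delay);
    }
  }
  throw lastError;
}
```

A caller would wrap the request that risks rate limiting, for example `retryWithBackoff(() => fetchIndexFileUrls())`, where `fetchIndexFileUrls` stands for whatever function performs the request. Adding random jitter to `delay` is a common refinement when several workers retry at once.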

HimanshuS129 · 2023-08-05T08:27:43Z

src/server/api/controllers/CrawlingController.js

+        return res.badRequest('Data source not provided');
+      }
+
+      try {


There is no need for this try/catch: we already have one, and this block does no explicit error handling of its own.
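The point of this comment can be sketched as follows, assuming the action is already wrapped in one outer try/catch. The action shape, the `dataSource` query parameter, and the injected `fetchDirectories` function are assumptions for illustration; only `res.badRequest('Data source not provided')` comes from the diff.

```javascript
// Hypothetical controller action; `fetchDirectories` is injected so the
// sketch stays self-contained. It is not code from this pull request.
async function crawl(req, res, fetchDirectories) {
  try {
    const dataSource = req.query.dataSource;
    if (!dataSource) {
      return res.badRequest('Data source not provided');
    }
    // No inner try/catch here: a rejection from fetchDirectories falls
    // through to the single catch below, which is the only handler needed.
    const directories = await fetchDirectories(dataSource);
    return res.ok(directories);
  } catch (err) {
    return res.serverError(err);
  }
}
```

An inner try/catch earns its place only when it does something the outer one cannot, such as retrying or translating the error into a different response.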

priyanshu-kun added 8 commits April 1, 2023 18:54

Create a dummy app for openapi web search project

27ee979

fix port number

185b6d8

fix github cache

b7421c5

Created a default Sails.js server with no frontend in the src/server …

a8adced

…folder.

start implementating crawling controller - create fasade design patte…

ce75f4c

…rn for crawling.

Fix the typo in whole codebase: Fasade to Facade

47a02de

Reorganize the directory structure through refactoring and rewrite th…

3484876

…e crawling process. Implement a crawling controller and create the Common Crawl driver.

write batch processing for common crawl directories and implement a d…

c48cd21

…elay while fetching directories from cc server.

vinitshahdeo reviewed Jun 22, 2023

vinitshahdeo suggested changes Jun 22, 2023

priyanshu-kun mentioned this pull request Jun 23, 2023

Feature/backup dummy app #6

Draft

priyanshu-kun added 2 commits June 26, 2023 14:19

Implement backoff for retriving index files URLs from CC server.

a3a6319

Refactor code for improved readability and maintainability

ef9181d

priyanshu-kun mentioned this pull request Jul 1, 2023

Refactor code for improved readability and maintainability. #7

Open

priyanshu-kun marked this pull request as ready for review July 7, 2023 09:14

priyanshu-kun added 2 commits July 9, 2023 17:24

Implementing Queue-Based Architecture of Downloading Index Files from…

5427bd8

… Common Crawl Server Using RabbitMQ NOTE: All code is contained within the downloadAndProcessIndexFilesInBackground() function.

Merge branch 'postman-open-technologies:develop' into develop

58a72c3

HimanshuS129 reviewed Aug 5, 2023
