RSS Generator

A simple and lightweight RSS feed generator built with Python. Generate and manage RSS feeds from any website by scraping content using custom CSS selectors.

What is RSS?

RSS (Really Simple Syndication) is a web feed format that allows users and applications to access updates to websites in a standardized, computer-readable format. This generator is particularly useful for blog sites that do not provide a native RSS feed, enabling you to follow content updates through your preferred RSS reader.

Features

  • Generate RSS feeds from various data sources.
  • Customizable feed attributes.
  • Lightweight and easy to use.

Tech Stack

  • FastAPI: Modern, high-performance web framework for building APIs
  • Pydantic: Data validation and settings management
  • Google Cloud Firestore: NoSQL document database for cloud storage
  • BeautifulSoup4: Library for parsing HTML and extracting data
  • FeedGen: Library for generating RSS/Atom feeds
  • Uvicorn: ASGI server for serving the FastAPI application
  • Python 3.8+: Modern Python with type annotations
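
As a quick illustration of the last two pieces of the stack, the sketch below shows roughly how FeedGen turns scraped items into an RSS document. The field names and sample data are placeholders, not the project's actual schema:

from feedgen.feed import FeedGenerator

# Placeholder data standing in for items scraped with BeautifulSoup
items = [
    {"title": "First post", "link": "https://example.com/blog/first-post"},
    {"title": "Second post", "link": "https://example.com/blog/second-post"},
]

fg = FeedGenerator()
fg.title("Example Blog")
fg.link(href="https://example.com/blog", rel="alternate")
fg.description("Example blog feed")

for item in items:
    fe = fg.add_entry()
    fe.title(item["title"])
    fe.link(href=item["link"])

rss_xml = fg.rss_str(pretty=True)  # bytes containing the RSS 2.0 document
print(rss_xml.decode("utf-8"))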

Prerequisites

  • Python 3.8 or higher
  • Google Cloud account with Firestore enabled
  • Google Cloud SDK installed and configured
  • Basic understanding of CSS selectors

Installation

  1. Clone the repository:
    git clone https://github.com/pschua/rss-feed-generator.git
  2. Navigate to the project directory:
    cd rss-feed-generator
  3. Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  4. Install dependencies:
    pip install -r requirements.txt
  5. Set up Google Cloud Credentials
    # Set the application default credentials
    gcloud auth application-default login
    
    # Or set the GOOGLE_APPLICATION_CREDENTIALS environment variable
    export GOOGLE_APPLICATION_CREDENTIALS="path/to/your/service-account-key.json"
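
Once the credentials above are in place, the Firestore client picks them up automatically. A minimal sketch to verify access (the "sources" collection name is an assumption for illustration, not necessarily what the project uses):

from google.cloud import firestore

# Uses Application Default Credentials or GOOGLE_APPLICATION_CREDENTIALS
db = firestore.Client()

# List a few documents to confirm the account can read the database
for doc in db.collection("sources").limit(5).stream():
    print(doc.id, doc.to_dict())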

Required Google Cloud Permissions

Your service account will need the following roles:

  • roles/datastore.user - For Firestore access

Note

Include the roles below if you are deploying via GitHub Actions:

  • roles/run.admin - For Cloud Run deployment
  • roles/cloudbuild.builds.editor - For Cloud Build operations
  • roles/cloudscheduler.admin - For Cloud Scheduler configuration

Usage

Running the API locally

uvicorn main:app --reload

The API will be available at http://localhost:8000

API Documentation

Once running, access the auto-generated API documentation at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Creating a New Feed Source

To create a new feed source, send a POST request to /sources/ with a JSON body:

{
    "name": "Example Blog",
    "url": "https://example.com/blog",
    "selector": "article.post",
    "description": "Example blog feed"
}

Response format

{
    "name": "string",
    "url": "string",
    "selector": "string",
    "description": "string",
    "id": "string",
    "last_refreshed": "2025-04-03T11:57:48.162Z"
}
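
The same request can be made from Python (assuming the API is running locally and the requests package is available); the returned id is what you will use to fetch the feed:

import requests

payload = {
    "name": "Example Blog",
    "url": "https://example.com/blog",
    "selector": "article.post",
    "description": "Example blog feed",
}

resp = requests.post("http://localhost:8000/sources/", json=payload, timeout=10)
resp.raise_for_status()
source = resp.json()
print(source["id"])  # keep this ID; the feed is served at /feed/{id}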

Finding the selector

The selector is a CSS selector that identifies the HTML elements containing article information on a web page. To find the right selector:

  1. Open the target website in Chrome or Firefox.
  2. Right-click on an article element and select "Inspect" or "Inspect Element".
  3. Look for a container element that wraps all the article content (title, description, date).
  4. Identify a unique class or ID that all article containers share.

For example, if articles are structured like this:

<div class="articles">
  <article class="post">
    <h2 class="title">Article Title</h2>
    <div class="meta">July 1, 2023</div>
    <div class="excerpt">Article excerpt...</div>
  </article>
  <!-- More articles -->
</div>

The appropriate selector would be article.post.
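
To see what the generator has to work with, a selector like this can be applied with BeautifulSoup. The sketch below is illustrative only; the project's actual extraction logic may differ:

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/blog", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

for article in soup.select("article.post"):
    title = article.select_one("h2")          # first heading inside the container
    excerpt = article.select_one(".excerpt")  # optional; not every site has one
    print(title.get_text(strip=True) if title else "(no title)")
    print(excerpt.get_text(strip=True) if excerpt else "")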

Important

Be mindful of the websites you scrape: ensure you're complying with their terms of service and robots.txt policies.

Selector Example

The image below shows the selector you should use for the Elementor blog page:

[Image: Elementor Blog sample]

Note

This example is for illustration purposes only

It's recommended to test your selector using the browser console before adding it to your feed configuration:

// Test in browser console
document.querySelectorAll('your-selector-here').length
// Should return the number of articles on the page

Complete Workflow Example

Start the local server:

uvicorn main:app --reload

Create a new feed source:

curl -X POST http://localhost:8000/sources/ \
  -H "Content-Type: application/json" \
  -d '{"name":"Tech Blog","url":"https://example.com/tech","selector":"article.blog-post","description":"Latest tech articles"}'

Note the returned source ID (e.g., abc123). Access your RSS feed:

http://localhost:8000/feed/abc123

Add this URL to your RSS reader application.
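
To confirm the feed is valid before pointing a reader at it, you can fetch and parse it with the Python standard library (the source ID abc123 is just the example from above):

import requests
import xml.etree.ElementTree as ET

resp = requests.get("http://localhost:8000/feed/abc123", timeout=10)
resp.raise_for_status()

root = ET.fromstring(resp.content)  # RSS 2.0: <rss><channel><item>...
for item in root.findall("./channel/item"):
    print(item.findtext("title"), "-", item.findtext("link"))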

Deployment to Cloud Run

This project includes a GitHub Actions workflow (.github/workflows/deploy.yml) for deploying to Google Cloud Run. To replicate the deployment process, follow these steps:

  1. Ensure you have a Google Cloud project set up with billing enabled.
    • Ideally, this should be the same project you enabled for Cloud Firestore.
  2. Enable the Cloud Build API, Cloud Run API and Cloud Scheduler API in your Google Cloud project.
  3. Set up a service account with the necessary permissions and download the JSON key file.
    • The service account will need permissions for Cloud Build, Cloud Run, Cloud Scheduler, and Artifact Registry.
    • Having 'Viewer' access could help prevent logging issues during the build.
  4. Add the following secrets to your GitHub repository:
    • GCP_PROJECT: Your Google Cloud project ID.
    • GCP_REGION: The region where your Cloud Run service will be deployed (e.g., us-central1).
    • GCP_SA_KEY: The base64-encoded content of your service account JSON key file (one way to produce this value is shown after this list).
  5. Push your changes to the main branch (or the branch specified in the workflow). The GitHub Actions workflow will automatically build and deploy the application to Cloud Run.
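
One way to produce the GCP_SA_KEY value from the downloaded key file (the filename is just an example):

import base64

with open("service-account-key.json", "rb") as f:
    print(base64.b64encode(f.read()).decode("ascii"))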

You may also change the Cloud Scheduler's refresh interval in the GitHub Actions workflow file. It currently refreshes every 12 hours.

Once deployed, go to the Cloud Run URL to add your feed via /sources or via /docs.

The /feed/{source_id} link can be used on your RSS app (e.g. Feedly) to retrieve the articles.

Limitations

  • The script uses several heuristics to determine the elements of an RSS feed, such as scanning for h1, h2, and h3 tags for titles (see the sketch after this list). These might not work optimally for all websites.
  • The generator only pulls articles from a single page, which means it won't include older posts that require pagination.
  • The RSS feeds generated do not include images.
  • Complex dynamic websites using JavaScript to load content may not work well with the current scraping approach.
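
For context on the first limitation, the kind of title heuristic involved looks roughly like the sketch below (not the project's exact code); when a site uses unusual markup for titles, this is where extraction falls short:

from bs4 import BeautifulSoup

def guess_title(article_html):
    # Scan common heading tags in priority order and take the first match
    soup = BeautifulSoup(article_html, "html.parser")
    for tag in ("h1", "h2", "h3"):
        heading = soup.find(tag)
        if heading:
            return heading.get_text(strip=True)
    return None  # no obvious title; the feed entry may end up blank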

License

This project is licensed under the MIT License.
