Codegate Projects (repos) #454

lukehinds · 2024-12-26T17:44:57Z

projects is something that myself and @jhrozek discussed, and the more our discussion unfolded, the more it was realised how useful this could be in unlocking many features that would be highly beneficial to users. It is also an ‘inevitable’ that we will need such a feature, so sooner the better and less painful in the long run.

This document attempts to describe the potential scope of Projects, what might they look like, and what they could unlock feature-wise.

Non-Goals

Let’s start off with non-goals, as this is important. Projects would not mean a multi-tenant system with access control, hierarchy etc. If we need that (which will be much later) then a system such as Minder would be a more logical choice.

So what is a Project?

A project (in respect of this document) is essentially a means of grouping elements under a single codebase / repository.

At present, we track the following elements, which are collated together as singular object:

Prompt (chat or FIM based, either from the user or the IDE / Agent)
LLM Output
Secrets
Malicious / Deprecated Packages

These can be further broken down into pipelines, where Secrets and Malicious Packages are inserted into a flow of checks via the pipeline system.

Over time, we plan to introduce more elements, such as token count, suspicious shell patterns, and folder obfuscation etc. It’s at this point that you can start to see a large amount of data all sitting in the same pile, making for an organized mess in the dashboard.

A project is a representation of a git repository, perhaps mounted in a volume or by other means (TBD)

The first obvious benefit of Projects in CodeGate is the grouping together of elements, neatly under a parent (a repository). This will make the UX far more user-friendly than a big long ever increasing list of prompts, secrets and malicious packages.

Root Element (Repository)

The Nexus of a Project is the Repository. This is a logical component to use, for the following reasons:

Prompts are specific to a codebase / repository (outside of generic chatbot scenarios)
Outputs are specific to a codebase / repository
Files and their paths are specific to a codebase / repository
Secrets and Malicious packages are are specific to a codebase / repository

So we see we have logical grouping with a repository.

With a repository as the root element / grouping and local file access of the repository being made available to codegate, this then allows some highly useful features

Better UI Organization

Right now we lump everything in together, with separate buckets for each repo, it will be far more useful to helping users find what they need.

Project Based System Prompt

Having a code specific project, allows users to set a system prompt to a specific project (e.g., "You are an expert python developer with expertise in asynchronous networking and software security"). Of the few coding assistants that provide a customizable prompt, its global and often they are forced into adopting the IDE’s root system prompt

*Custom instructions in Cline*

Snapshot Generation

Snapshots are becoming increasingly popular, as a means of giving a model a holistic overview of the entire codebase, thereby making it produce more ‘project wide’ specific recommendations

Many of them exist, but they are all singular with no way of associating them with projects or checking their freshness:

Files-to-prompt - https://github.com/simonw/files-to-prompt
Code2Prompt - https://github.com/mufeedvh/code2prompt
https://gh-repo-dl.cottonash.com/
1filellm - https://github.com/jimmc414/1filellm
Repopack https://github.com/yamadashy/repopack
Ingest - https://github.com/sammcj/ingest

acme
├── src
│   ├── backend
│   │   └── hello_world.py
│   └── frontend
│       └── hello_world.js


### Project Files
- `acme/src/python/hello_world.py`
- `acme/src/javascript/hello_world.js`

#### acme/src/backend/hello_world.py
def main():
    print("Hello, World!")

if __name__ == "__main__":
    main()

#### acme/src/frontend/hello_world.js
console.log("Hello, World!");

Token Metering and MUXing

Token usage per prompt / conversation, with something like provider filtering can offer good insight into what is costing the user more money and what optimizations could be made.

This is just a few of the ideas that projects could bring about.

The text was updated successfully, but these errors were encountered:

github-actions bot added the needs-triage label Dec 26, 2024

lukehinds added feature-request and removed needs-triage labels Dec 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Codegate Projects (repos) #454

Codegate Projects (repos) #454

lukehinds commented Dec 26, 2024 •

edited

Loading

Codegate Projects (repos) #454

Codegate Projects (repos) #454

Comments

lukehinds commented Dec 26, 2024 • edited Loading

Non-Goals

So what is a Project?

Root Element (Repository)

Better UI Organization

Project Based System Prompt

Snapshot Generation

Token Metering and MUXing

lukehinds commented Dec 26, 2024 •

edited

Loading