Add several values for the same property #168

Open
CharlesNepote opened this issue Mar 22, 2024 · 8 comments
Labels
⭐ top issue Top issue.

Comments

@CharlesNepote
Member

CharlesNepote commented Mar 22, 2024

Reported by Alizarine on Slack (2024-03-24).

image

Indeed, the example provides a simple use case: a product could have several producer_data_issue values.

Option 1: do not change the API, manage it with the UI

With the UI, it would be possible to enter a property several times.
In the database, the different values would be stored in a single property/value pair, separated by a given delimiter (a ; for example).
The UI would have to manage the following cases:

  • allowing a property to be entered more than once, and saving each new value alongside the existing ones (a value that already exists should not be saved again)
  • handling multiple values in the list of all possible values for a given property, e.g. https://world.openfoodfacts.org/property/producer_data_issue
  • handling multiple values in the search-as-you-type feature: when the user enters a value, the UI lists all possible values for that property
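As a rough illustration of the first case above (append a value, skip it if it is already stored), here is a minimal sketch. The function name `add_value` and the `;` default are assumptions made for this example, not existing Folksonomy code.

```python
SEPARATOR = ";"  # assumed delimiter; "," is also discussed in this thread


def add_value(stored: str, new_value: str, sep: str = SEPARATOR) -> str:
    """Append new_value to a delimiter-joined string unless it is already present."""
    new_value = new_value.strip()
    # Split the stored string and drop empty fragments (e.g. from an empty string).
    values = [v.strip() for v in stored.split(sep) if v.strip()]
    if new_value and new_value not in values:
        values.append(new_value)
    return sep.join(values)


print(add_value("wrong weight", "wrong brand"))   # two values stored
print(add_value("wrong weight", "wrong weight"))  # duplicate is ignored
```

This naive version breaks as soon as a value itself contains the delimiter, which is the escaping question raised later in the thread.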

Option 2: change the API, do not change the UI

There would probably be no change at all in the UI.

@CharlesNepote CharlesNepote added the ✨ enhancement New feature or request label Mar 22, 2024
@alexgarel
Member

@cquest do you have an opinion on this (or can you share the OSM experience with it)?

@alexgarel
Member

On my side, I think option 1 is best for now.

Option 2 would:

  • either require a huge rethinking of the code, the API and the database… so I don't see it as a good option;
  • or require an upper layer of abstraction that encodes the multiple values into one value, but would require the client to re-send all values, etc., which, I fear, would feel alien to the current API.

For option 1, it should be well documented what the UI does. I think going for "," as a separator is better than ";" because it's more standard (for example it's permitted to represent parameters as list in OpenAPI).

Then you either have to handle values containing "," or reject them. If you handle them, I strongly encourage using CSV style (with quote handling), as there are a lot of libraries that deal with it and it avoids silly bugs.
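To make the CSV-style suggestion concrete, here is a small sketch using Python's standard `csv` module. The helper names `encode_values` and `decode_values` are hypothetical; only the `csv` and `io` calls are real library APIs.

```python
import csv
import io


def encode_values(values: list[str]) -> str:
    """Join values into one CSV line; values containing ',' or '"' get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(values)
    # The writer appends a line terminator that we do not want to store.
    return buffer.getvalue().rstrip("\r\n")


def decode_values(stored: str) -> list[str]:
    """Split one stored CSV line back into its individual values."""
    return next(csv.reader([stored]), [])


stored = encode_values(["big", "small, medium", 'say "hi"'])
print(stored)
print(decode_values(stored))
```

Relying on the library means quoting and quote-doubling follow the usual CSV conventions, so a value such as `small, medium` survives a round trip.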

In my opinion, the UI should clearly handle lists (one input box per value) instead of trying to guess magically, which risks leading to complicated cases (explicit is better than implicit).

Also, don't try to handle more than lists: dicts, for example, must be encoded in the property name. (One could even argue that lists could be encoded as my::value::1="value 1", my::value::2="value 2", etc., but that might be overcomplicated!)

We could even argue that list could be handled by property name: `

@alexgarel
Member

BTW, this also relates to #151

@github-actions github-actions bot added the ⭐ top issue Top issue. label Sep 16, 2024
@teolemon teolemon removed the ✨ enhancement New feature or request label Oct 19, 2024
@kirtanchandak
Contributor

I am working on this issue

@kirtanchandak
Contributor

kirtanchandak commented Mar 6, 2025

I have a question: which one should we go ahead with?

  1. Comma separated - {"k": "size", "v": "big, small"}
  2. Saving in list - {"k": "size", "v": ["small", "extrasmall", "big"]}

Comma-separated would be more suitable here, as we would not have to change the entire structure; we would only need to change how values for a given key are fetched.

@kirtanchandak
Contributor

@CharlesNepote @teolemon

@suchithh
Contributor

suchithh commented Mar 9, 2025

Hi everyone,

I don't think this is as much of an implementation problem as it is a design problem (easy to implement, hard to plan). I've been reading the discussions surrounding this issue (see #168, #151, #189, and perhaps #128) and I strongly believe that this issue (#168) should be thought about along with #189 so we don't create incompatible designs. The current solution appears to be to use namespaces (for example, assort:products-1:ja:name or assort:products-1:nutrition_facts:energy), but I actually disagree with that approach. It can lead to key sprawl and user confusion. While I understand that OFF sister sites might use this convention, these solutions seem arbitrary and hard for general users to work with, not to mention the extra implementation required to parse the response.

With this in mind, I want to kill two birds with one stone: allowing multiple values for the same key and allowing contextual flags without key sprawl. Here's my proposed solution, which addresses these challenges while maintaining backward compatibility and avoiding unique constraint violations. Please keep in mind that this deviates from OSM's approach, but I believe it is a design decision worth considering, and it could potentially help with #128.

Suggestion: Multiple Values and Contextual Flags with a Separate args Array

Our goal is to standardize how we handle keys with contextual extensions without causing key sprawl. Instead of embedding context in the key (e.g. using colon-separated strings like data_quality:robotoff_issue:product_version), I propose we split the key and its context. The base key will always be stored in the key field, and any additional context will be stored in a new JSONB column called args as an array.

Schema Changes to Support Contextual Flags

  1. Add the args Column (Aims to tackle Conventions and UIs for grouped properties-values #189)

This column will store an array of contextual flags. If a key is provided without any contextual extension, args will simply be an empty array.

ALTER TABLE folksonomy
ADD COLUMN args JSONB DEFAULT '[]'::JSONB;
  2. Data Insertion Logic

When a composite key (e.g. data_quality:robotoff_issue:product_version) is submitted:

  • The first segment (data_quality) becomes the key.
  • The remaining segments (i.e. ["robotoff_issue", "product_version"]) are stored in the args column.

For example:

  • Input: data_quality:robotoff_issue:product_version
  • Stored as:

  ```json
  {
    "product": "0055144524653",
    "key": "data_quality",
    "value": "Yes",
    "args": ["robotoff_issue", "product_version"]
  }
  ```
  • Input: data_quality:product_opener_issue
  • Stored as:

  ```json
  {
    "product": "0055144524653",
    "key": "data_quality",
    "value": "Yes",
    "args": ["product_opener_issue"]
  }
  ```
  • Input with no context, e.g. data_quality
  • Stored as:

  ```json
  {
    "product": "0055144524653",
    "key": "data_quality",
    "value": "Yes",
    "args": []
  }
  ```
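The splitting rule illustrated by these three examples could be sketched as follows. `split_composite_key` is a hypothetical helper name, and the colon separator is taken from the examples above.

```python
def split_composite_key(composite: str) -> tuple[str, list[str]]:
    """Return (base key, context args) for a colon-separated composite key."""
    key, *args = composite.split(":")
    return key, args


for raw in (
    "data_quality:robotoff_issue:product_version",
    "data_quality:product_opener_issue",
    "data_quality",
):
    key, args = split_composite_key(raw)
    print({"key": key, "args": args})
```

A key without any context simply yields an empty `args` list, matching the third example.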
  3. Unique Constraint Adjustments

The current unique constraint on (product, owner, k) prevents multiple values for the same property. With this design, we drop the old constraint and create a new one that also considers the value and args fields:

CREATE UNIQUE INDEX ON folksonomy (product, owner, key, value, args);

This ensures that a single product (and owner) can have multiple entries for the same base key, as long as the combination of value and context (in args) is unique.
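To see what such a constraint would and would not allow, here is a small self-contained sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL and stores `args` as JSON text rather than JSONB, so it only illustrates the uniqueness semantics; the `insert` helper and the sample values are made up for this example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE folksonomy (product TEXT, owner TEXT, key TEXT, value TEXT, args TEXT)"
)
# Same column list as the proposed index; SQLite stands in for PostgreSQL here.
conn.execute(
    "CREATE UNIQUE INDEX folksonomy_unique "
    "ON folksonomy (product, owner, key, value, args)"
)


def insert(product, owner, key, value, args=()):
    """Return True if the row was stored, False if it duplicates an existing row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO folksonomy VALUES (?, ?, ?, ?, ?)",
                (product, owner, key, value, json.dumps(list(args))),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert("0055144524653", "", "producer_data_issue", "wrong weight"))  # new value
print(insert("0055144524653", "", "producer_data_issue", "wrong brand"))   # second value
print(insert("0055144524653", "", "producer_data_issue", "wrong weight"))  # duplicate
```

Because `args` is part of the index, the same value with a different context would count as a distinct row in this sketch.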

Querying and API Endpoint Design

When querying, we need to handle both cases: filtering by a specific set of context flags and retrieving entries that have no context.

Suggestion 1: Querying for Specific Context using an additional args query

Suppose you want to query for the entry that specifically represents a "robotoff issue" with a "product version" context. You can design the API to accept a query parameter (e.g., args) as a comma-separated list. For example:

GET /product/{product_id}/data_quality?args=robotoff_issue,product_version

Suggestion 2: Handle splitting in the backend

In the backend, you can split keys on the colon and store the first part as key and the rest as args, preserving the current key structure.
For example, if you receive data_quality:robotoff_issue:product_version, you can split it into key = 'data_quality' and args = ['robotoff_issue', 'product_version'].
Hence, the GET request would be unchanged from how it works today:

GET /product/{product_id}/data_quality:robotoff_issue:product_version

After this, the query will:

  • Filter rows where key = 'data_quality'.
  • Use PostgreSQL’s JSONB containment operator to check that the args array contains both "robotoff_issue" and "product_version":
SELECT *
FROM folksonomy
WHERE product = '{product_id}'
  AND key = 'data_quality'
  AND args @> '["robotoff_issue", "product_version"]';

Because the containment operator ignores element order, this query works without enforcing a canonical order on args. Note that it also matches rows whose args contain additional flags beyond the ones requested. If you need an exact match (same elements only, or order matters), you need additional logic, but the containment check is typically sufficient.
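For readers less familiar with `@>`, the following sketch mimics its behavior for flat arrays of strings in plain Python. It is only an emulation to show the semantics (order ignored, extra stored elements allowed), not how PostgreSQL implements it; `jsonb_array_contains` is a made-up name.

```python
def jsonb_array_contains(stored: list[str], wanted: list[str]) -> bool:
    """Emulate `stored @> wanted` for flat JSONB arrays of strings."""
    return set(wanted) <= set(stored)


stored_args = ["robotoff_issue", "product_version"]

# Order does not matter.
print(jsonb_array_contains(stored_args, ["product_version", "robotoff_issue"]))
# A subset of the stored flags also matches.
print(jsonb_array_contains(stored_args, ["robotoff_issue"]))
# A flag that is not stored does not match.
print(jsonb_array_contains(stored_args, ["product_opener_issue"]))
```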

Handling Cases with No Args but Multiple Values for Same Key (#168)

For entries without any contextual flags (i.e., args is an empty array), the query can simply omit filtering on args. The endpoint can be designed to accept an optional args query parameter. If args is not provided, the API returns all entries for the given base key and if provided, it filters accordingly.

Example API Endpoints

1. Inserting a Property with Context

When adding or updating a property, the client can send the base key in the URL and the context in the body (or send the composite key, which is then split server-side):

POST /product/0055144524653/data_quality
Content-Type: application/json

{
  "v": "Yes",
  "args": ["robotoff_issue", "product_version"]
}

args is optional: omit it or set it to [] if there is no context.

2. Querying for a Specific Context

To query for a specific contextual combination:

GET /product/0055144524653/data_quality?args=robotoff_issue,product_version

Backend logic would:

  • Parse the args query parameter into an array: ["robotoff_issue", "product_version"].
  • Execute the query using the JSONB containment operator.
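The parsing step could look like the sketch below. `parse_args_param` is a hypothetical helper, and returning `None` for a missing parameter (meaning "do not filter on args") is an assumption of this example.

```python
from typing import Optional


def parse_args_param(raw: Optional[str]) -> Optional[list[str]]:
    """Turn the `args` query string into a list; None means no filtering."""
    if raw is None:
        return None
    # Ignore surrounding whitespace and empty fragments such as "a,,b".
    return [part.strip() for part in raw.split(",") if part.strip()]


print(parse_args_param("robotoff_issue,product_version"))
print(parse_args_param(None))
```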

3. Querying Without Context

To query entries for data_quality regardless of context:

GET /product/0055144524653/data_quality

This would return all entries where key = 'data_quality', including those with empty args.

Backward Compatibility for Single-Value API Responses

Problem: Existing users expect GET /product/{product}/{key} to return a single value.

Solution:

  • Let's keep the current endpoint behavior by returning the latest value (sorted by last_edit).
SELECT * FROM folksonomy
WHERE product = '0055144524653' AND key = 'producer_data_issue'
ORDER BY last_edit DESC
LIMIT 1;
  • Then we can add a new parameter (?all=true) that returns all values as an array.

For example:

GET /product/0055144524653/producer_data_issue?all=true

Deprecation strategy: We can document the legacy behavior as "returns the latest value" and tell new clients to use ?all=true to get all values.
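A minimal sketch of how the two response shapes could be derived from the same rows. The row layout (dicts with `v` and `last_edit`) and the function name `select_values` are assumptions for illustration only.

```python
from datetime import datetime


def select_values(rows: list[dict], all_values: bool = False):
    """Return the latest value (legacy behavior) or every value, newest first."""
    ordered = sorted(rows, key=lambda row: row["last_edit"], reverse=True)
    if all_values:
        return [row["v"] for row in ordered]
    return ordered[0]["v"] if ordered else None


rows = [
    {"v": "wrong weight", "last_edit": datetime(2025, 3, 1)},
    {"v": "wrong brand", "last_edit": datetime(2025, 3, 5)},
]
print(select_values(rows))                   # legacy: single latest value
print(select_values(rows, all_values=True))  # ?all=true: every value
```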

I'm new to OFF and I understand that there are a lot of moving parts here so I would love to hear your thoughts and feedback.

Charles' Solution 1: Delimiter-Separated Values (OSM Style)

Or we can go ahead with Option 1 where we simply use a delimiter like OSM. I genuinely think this is a great option too.

@alexgarel
Member

alexgarel commented Mar 13, 2025

@suchithh thanks for the clear proposal and the research.

I'm not sure how your idea solves the list problem.

Also, it does not solve the suggestion problem (how do you help users find the right keys?).

Also, reducing keys to one word seems to go against the initial design.

The args list you are proposing is interesting though; it's very close to my key / value proposal.

Maybe we could have something like:

my:list:property(1):deep:inside(2)

translates to key --> my:list:property:deep:inside and args --> {property: 1, inside: 2}
Or even args --> [null,null,1,null,2]
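To make this notation concrete, here is a sketch of a parser for it. The syntax (an optional non-negative integer in parentheses after a segment) is inferred from the single example above, and `parse_indexed_key` is a hypothetical name.

```python
import re

# One key segment, optionally followed by an index in parentheses, e.g. "property(1)".
SEGMENT = re.compile(r"([^()]+)(?:\((\d+)\))?")


def parse_indexed_key(raw: str):
    """Return (plain key, {segment: index}, positional index list)."""
    names, by_name, by_position = [], {}, []
    for segment in raw.split(":"):
        match = SEGMENT.fullmatch(segment)
        if match is None:
            raise ValueError(f"invalid key segment: {segment!r}")
        name, index = match.group(1), match.group(2)
        names.append(name)
        by_position.append(int(index) if index is not None else None)
        if index is not None:
            by_name[name] = int(index)
    return ":".join(names), by_name, by_position


key, args_by_name, args_by_position = parse_indexed_key("my:list:property(1):deep:inside(2)")
print(key)
print(args_by_name)
print(args_by_position)
```

The dict form is ambiguous if the same segment name appears twice in a key, which the positional form avoids.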

Projects
Status: Todo
Development

No branches or pull requests

5 participants