Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CHANGELOG: Fix handling of duplicate terms in SparsePolynomial #915

Conversation

woojucy
Copy link

@woojucy woojucy commented Jan 4, 2025

Description

In the current implementation of univariate::SparsePolynomial::from_coefficients_vec, duplicate terms with the same degree are not explicitly handled. As a result, if multiple terms with the same degree are provided, they remain as separate entries in the resulting sparse polynomial. This can lead to inconsistencies in polynomial operations and unnecessary redundancy.

Current Behavior:
Given an input like:

vec![
    (0, F::from(2)), 
    (0, F::from(1)), 
    (0, F::from(1)), 
    (3, F::from(5)),
    (3, F::from(3)),
    (4, F::from(0)),
]

The resulting polynomial contains duplicate terms without merging them.

Proposed Improvement:

  1. Sort Terms by Degree: Ensure terms are sorted by degree.
  2. Merge Duplicate Terms: For terms with the same degree, sum their coefficients into a single term.
  3. Remove Zero Coefficient Terms: Eliminate terms where the coefficient becomes zero after merging.

Proposed Implementation Example:

fn from_coefficients_vec(mut coeffs: Vec<(usize, F)>) -> Self {
    // Sort terms by degree
    coeffs.sort_by(|(c1, _), (c2, _)| c1.cmp(c2));

    // Merge duplicate terms
    let mut merged_coeffs: Vec<(usize, F)> = Vec::new();
    for (degree, coeff) in coeffs {
        if let Some((last_degree, last_coeff)) = merged_coeffs.last_mut() {
            if *last_degree == degree {
                *last_coeff += coeff; // Combine coefficients for duplicate degrees
                continue;
            }
        }
        merged_coeffs.push((degree, coeff));
    }

    // Remove zero coefficient terms
    merged_coeffs.retain(|(_, coeff)| !coeff.is_zero());

    // Create the SparsePolynomial
    Self { coeffs: merged_coeffs }
}

Expected Behavior:
For the same input:

[(0, F::from(4)), (3, F::from(8))]

The resulting polynomial will now be consistent and free from redundant terms.

Why is this change beneficial?

  • Ensures clarity and consistency in sparse polynomial representations.
  • Prevents redundant terms, which may cause unexpected results in polynomial operations.
  • Aligns with expected mathematical representations of polynomials.

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (master)
  • Linked to GitHub issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the GitHub PR explorer

@woojucy woojucy requested review from a team as code owners January 4, 2025 12:08
@woojucy woojucy requested review from z-tech, Pratyush and tyshko-rostyslav and removed request for a team January 4, 2025 12:08
@z-tech
Copy link
Contributor

z-tech commented Jan 4, 2025

Hi, thank you for raising this PR

  1. Do we think this case of inputs with duplicate terms is occurring organically?
  2. Have we considered instead that the function should reject when the vector is not already simplified?

At the least, approach 2 will force callers to become aware of situation 1, which seems to be the focus of the PR.

To do this safely, I think it would be more prudent to open a function call with a strongly defined "valid input" and then later deprecate this one so that callers can at their own pace address the potential unexpected results in polynomial operations mentioned.

What are thoughts?

@woojucy
Copy link
Author

woojucy commented Jan 6, 2025

That's a good point. We need to examine how frequently duplicate terms naturally occur.

Additionally, we should also consider the issue of "consistency" in the function's behavior. For example, comparing the two polynomial cases below:

  1. univariate::SparsePolynomial example:

    let poly = univariate::SparsePolynomial::from_coefficients_vec(vec![
        (0, Fq::from(2)), 
        (0, Fq::from(1)), 
        (0, Fq::from(1)), 
        (3, Fq::from(5)),
        (3, Fq::from(3)),
        (4, Fq::from(0)),
    ]);
  2. multivariate::SparsePolynomial example:

    let poly2 = multivariate::SparsePolynomial::from_coefficients_vec(
        1,
        vec![
            (Fq::from(0), SparseTerm::new(vec![(0, 4)])),
            (Fq::from(3), SparseTerm::new(vec![(0, 3)])),
            (Fq::from(5), SparseTerm::new(vec![(0, 3)])),
            (Fq::from(1), SparseTerm::new(vec![])),
            (Fq::from(1), SparseTerm::new(vec![])),
            (Fq::from(2), SparseTerm::new(vec![])),
        ],
    );

In the first case, it seems the function does not handle duplicate terms explicitly, while in the second case, the input is considered valid and behaves as expected. While we would expect both polynomials to behave consistently, the first function appears to lack clear input validation, resulting in inconsistency.

To address this issue, do you think introducing a new function call that enforces valid input, followed by a gradual deprecation of the current function, would be sufficient as you suggested?

I’d appreciate your thoughts on this.

@z-tech
Copy link
Contributor

z-tech commented Feb 5, 2025

That's a good point. We need to examine how frequently duplicate terms naturally occur.

Additionally, we should also consider the issue of "consistency" in the function's behavior. For example, comparing the two polynomial cases below:

  1. univariate::SparsePolynomial example:
    let poly = univariate::SparsePolynomial::from_coefficients_vec(vec![
        (0, Fq::from(2)), 
        (0, Fq::from(1)), 
        (0, Fq::from(1)), 
        (3, Fq::from(5)),
        (3, Fq::from(3)),
        (4, Fq::from(0)),
    ]);
  2. multivariate::SparsePolynomial example:
    let poly2 = multivariate::SparsePolynomial::from_coefficients_vec(
        1,
        vec![
            (Fq::from(0), SparseTerm::new(vec![(0, 4)])),
            (Fq::from(3), SparseTerm::new(vec![(0, 3)])),
            (Fq::from(5), SparseTerm::new(vec![(0, 3)])),
            (Fq::from(1), SparseTerm::new(vec![])),
            (Fq::from(1), SparseTerm::new(vec![])),
            (Fq::from(2), SparseTerm::new(vec![])),
        ],
    );

In the first case, it seems the function does not handle duplicate terms explicitly, while in the second case, the input is considered valid and behaves as expected. While we would expect both polynomials to behave consistently, the first function appears to lack clear input validation, resulting in inconsistency.

To address this issue, do you think introducing a new function call that enforces valid input, followed by a gradual deprecation of the current function, would be sufficient as you suggested?

I’d appreciate your thoughts on this.

I was thinking maybe a function is_valid_coefficients_vec would suffice? In any case, do you think that seems like reasonable place to start?

@woojucy
Copy link
Author

woojucy commented Feb 19, 2025

I fully agree with the approach of adding an is_valid_coefficients_vec function to ensure the input vector is pre-validated. I have considered the method where the function returns an error when the vector is not simplified.

Implementing this change would require modifying the return type of from_coefficients_vec. This means that all functions calling from_coefficients_vec would need changes in their signatures and additional error handling, leading to several modifications.

I would like to discuss this issue further. Are there any better ideas or conventional approaches available? If there is a consensus that this is the right direction, I am ready to proceed with implementing these changes. Your feedback will be crucial in making this decision.

@z-tech
Copy link
Contributor

z-tech commented Feb 19, 2025

It would be great if you could implement the new function is_valid_coefficients_vec. I don't think any existing functions need to be modified. This is a sequence diagram of how I see it working.

Screenshot 2025-02-19 at 20 28 39

If we do it this way we include the verification logic you've suggested and at the same time we insure backward compatibility.

What are you thoughts?

@woojucy
Copy link
Author

woojucy commented Mar 1, 2025

Originally, I considered calling the is_valid_coefficients_vec function directly within the from_coefficients_vec function. Thankfully, the function call diagram you provided made it immediately clear what you meant. Based on your feedback, I updated the commit by explicitly calling the is_valid_coefficients_vec function. This function now checks for duplicate terms and zero coefficients. Please review the updated implementation.

@z-tech
Copy link
Contributor

z-tech commented Mar 1, 2025

Thanks for this! I made some suggestions. It would be great if you could write some tests?

@woojucy
Copy link
Author

woojucy commented Mar 1, 2025

Absolutely! I've added three test cases. Could you please review them and let me know your thoughts?

However, I'm also considering whether to change how the rand_sparse_poly function calls from_coefficients_vec in the existing test code. One option could be to integrate the validity check within rand_sparse_poly without changing its output, and implement a retry mechanism. Alternatively, we could adjust the output based on the is_valid_coefficients_vec check and modify how other test codes handle these cases as well. What do you think?


#[test]
fn test_invalid_duplicate_degrees() {
let mut coeffs = vec![(1, Fr::from(2)), (1, Fr::from(3))];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut coeffs = vec![(1, Fr::from(2)), (1, Fr::from(3))];
let mut coeffs = vec![(3, Fr::from(2)), (1, Fr::from(2)), (2, Fr::from(2)), (1, Fr::from(3))];

This way you check if the sorting works as expected.

@z-tech z-tech changed the base branch from master to z-tech/fix-werror-on-must-use-annotation March 3, 2025 09:50
@z-tech
Copy link
Contributor

z-tech commented Mar 3, 2025

Thanks for this! Looks good to me.

@Pratyush Pratyush deleted the branch arkworks-rs:z-tech/fix-werror-on-must-use-annotation March 3, 2025 19:44
@Pratyush Pratyush closed this Mar 3, 2025
@z-tech
Copy link
Contributor

z-tech commented Mar 4, 2025

@woojucy we didn't mean to close. It got closed automatically when the base branch was deleted. Do you mind to reopen and base off master please? Thanks!

@woojucy
Copy link
Author

woojucy commented Mar 12, 2025

Sorry for the late response—I just got to this. I reopened the PR and rebase it onto current master branch. Let me know if there are any additional updates needed. Thanks for your patience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants