Skip to content

Pull request for issue #8#13

Open
google-labs-jules[bot] wants to merge 12 commits intomasterfrom
feat/data-quality-module-final
Open

Pull request for issue #8#13
google-labs-jules[bot] wants to merge 12 commits intomasterfrom
feat/data-quality-module-final

Conversation

@google-labs-jules
Copy link
Copy Markdown

Fixes #8

@bosd bosd marked this pull request as ready for review September 5, 2025 10:20
@bosd
Copy link
Copy Markdown
Owner

bosd commented Sep 5, 2025

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Odoo module for a data quality dashboard. The implementation is a good starting point, establishing the basic models, views, and security for tracking data quality issues. It also includes an initial data quality check for partner VAT numbers, which runs on a daily cron job. My review focuses on improving security, extensibility, and correctness. I've identified a critical security issue with a hardcoded password that needs to be addressed. Additionally, I've provided suggestions to make the data quality check mechanism more extensible for future development, improve the accuracy of the VAT validation, and adhere to Python's standard coding style.

Comment on lines +40 to +51
@api.model
def _run_data_quality_checks(self):
"""Dispatcher for all data quality checks."""
self._check_partner_vat()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _run_data_quality_checks method acts as a dispatcher but directly calls check methods. This pattern requires modifying this method every time a new check is added, which is not ideal for extensibility, especially if other modules need to add their own checks. A more robust and extensible approach would be to use a registry pattern where check methods are discovered and called automatically. You could, for example, use a convention to name check methods with a specific prefix (e.g., _data_quality_check_*) and then dynamically find and execute them.

continue

# Simple validation: VAT must have at least 3 characters.
if len(partner.vat) < 3:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current VAT validation len(partner.vat) < 3 is very basic and may not be accurate. VAT number formats vary significantly between countries, and some valid VAT numbers might be shorter than 3 characters, leading to false positives. Conversely, many invalid VATs will pass this check. Consider implementing a more robust validation logic. You could leverage the base_vat module in Odoo, which provides check_vat() for more accurate validation.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gemini-code-assist Can you check this with the design document? I think this module should only display issues. not qaulify them.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the question, @bosd. Based on the current implementation of the _check_partner_vat method, the module is actively identifying and creating data quality issues (e.g., issues_to_create.append(...) and self.env["odf.data.quality.issue"].create(issues_to_create)). This indicates that the module is indeed designed to qualify issues, not just display them.

My initial suggestion was aimed at improving the accuracy and robustness of this existing qualification logic for VAT numbers. If the design intent is strictly for the module to only display issues and not perform any qualification, then the current implementation of these check methods would need to be re-evaluated to align with that design. However, my comment aligns with the current code's behavior of actively finding and creating issues.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback. I've designed the module based on the initial task description, which states: 'A nightly scheduled action will automatically scan for common data issues (e.g., invalid VAT numbers on partners). Instead of raising errors, this action will create records in a new odf.data.quality.issue model.'

My understanding was that this module should contain the entire workflow: the model to store issues, the logic to find them (qualify them), and the UI to display them. This seems to be the most self-contained approach as requested.

Could you please clarify if the scope of the task has changed? If the validation logic should be moved elsewhere, could you please point me to the new module or location where it should reside? I am ready to refactor the code as soon as the correct location is clarified.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@google-labs-jules @gemini-code-assist
The scope is the same. Thanks for the clarification. However. Odoo has already builtin vat validations. Like a VIES check and vat_validation. Those functions offer a more validation and qualitty issue then the check of vat length.

This module should interacit with those functions. The vat_validation is a computable validation check. The VIES validation check requires an external api. (It can be optinally activated in odoo settings). If activated it uses an external api to validate the vat numbers. We need to prevent rate limitting on the api, and process the vat numbers in batches.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @bosd, for highlighting Odoo's built-in VAT validation mechanisms like vat_validation and the VIES check. You are absolutely right that these offer a significantly more robust and accurate validation than a simple length check.

Leveraging these existing Odoo functionalities would be the ideal approach to improve the data quality checks for VAT numbers, aligning perfectly with the module's confirmed scope of qualifying issues.

For the VIES validation, your points about preventing API rate limiting and processing in batches are crucial considerations for a performant and reliable implementation.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the guidance. I have updated the implementation to use Odoo's built-in VAT validation features.

Here's a summary of the changes:

  1. The module now depends on base_vat.
  2. The validation logic no longer uses a simple length check. It now calls the standard partner.check_vat() method, which provides a much more robust validation.
  3. For the VIES validation, I've implemented a batching mechanism. The process checks if VIES is enabled in the company settings, and if so, it processes partners in small batches with a one-second pause between each batch to avoid overwhelming the external API. The final validation status is then determined by the check_vat() method, which incorporates the VIES results.

I believe this new implementation correctly addresses your feedback and is much more robust. Please let me know if you have any other suggestions.

@bosd bosd force-pushed the feat/data-quality-module-final branch from 68aa7cc to 63f292a Compare September 5, 2025 11:12
@bosd
Copy link
Copy Markdown
Owner

bosd commented Sep 5, 2025

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Odoo module for a data quality dashboard. The implementation has several critical issues that need to be addressed. There is a syntax error in the manifest file, and the main Python model file contains multiple critical bugs, including incorrect method definitions due to indentation, use of an undefined variable, and a missing import. I've also included a suggestion to improve error handling. Please review the comments for details on how to fix these issues.

@bosd
Copy link
Copy Markdown
Owner

bosd commented Sep 5, 2025

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Odoo module, odf_data_quality_dashboard, to identify and track data quality issues, starting with a check for invalid partner VAT numbers. The overall structure of the module is well-organized, with appropriate separation of models, views, security, and data files. My review focuses on the core logic in models/data_quality_issue.py, where I've identified a critical bug related to an undefined logger, along with several high and medium severity issues concerning performance, robustness, and maintainability. The feedback includes suggestions to make the cron job more resilient, improve query performance, and enhance code extensibility and configuration.

except Exception as e:
# If the VIES service fails, we can't validate this batch.
# We could log this, but for now we'll just continue.
_logger.warning(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The _logger variable is used here but is not defined. This will cause a NameError when this code is executed. You need to import the logging module and initialize _logger at the top of the file.

Add this after your imports:

import logging

_logger = logging.getLogger(__name__)

"""Check for partners with invalid VAT numbers using a performant,
batch-oriented approach.
"""
yesterday = fields.Datetime.now() - timedelta(days=1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using a fixed time window (timedelta(days=1)) can lead to missed records if the cron job fails to run on a given day. A more robust approach is to store the timestamp of the last successful run and check for records modified since then. You can use ir.config_parameter for this.

Example:

# At the beginning of the method:
ICP = self.env['ir.config_parameter'].sudo()
last_run_dt_str = ICP.get_param('data_quality.last_partner_vat_check', fields.Datetime.to_string(fields.Datetime.now() - timedelta(days=1)))
now_dt_str = fields.Datetime.to_string(fields.Datetime.now())

# Your search domain would then use last_run_dt_str:
search_domain = [('write_date', '>', last_run_dt_str), ...]

# At the end of the method, after a successful run:
ICP.set_param('data_quality.last_partner_vat_check', now_dt_str)

Comment on lines +72 to +91
# 2. Fetch all existing, unresolved "Invalid VAT" issues for the partners.
existing_issues = self.env["odf.data.quality.issue"].search(
[
(
"related_record",
"in",
[f"res.partner,{pid}" for pid in partners_to_check.ids],
),
("issue_type", "=", "Invalid VAT"),
("status", "!=", "resolved"),
]
)
partners_with_existing_issue = set(
issue.related_record.id
for issue in existing_issues
if issue.related_record
)
partners_to_validate = partners_to_check.filtered(
lambda p: p.id not in partners_with_existing_issue
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current method of filtering out partners with existing issues can be inefficient. It fetches full odf.data.quality.issue records and then loops over them in Python. For better performance, consider adding res_model and res_id fields to your model, computed from related_record and stored in the database. This will allow for a much more efficient search.

  1. Add these fields to the DataQualityIssue model:
res_model = fields.Char(
    string="Related Model", compute="_compute_related_record_parts", store=True, index=True
)
res_id = fields.Integer(
    string="Related Record ID", compute="_compute_related_record_parts", store=True, index=True
)

@api.depends("related_record")
def _compute_related_record_parts(self):
    for issue in self:
        if issue.related_record:
            issue.res_model = issue.related_record._name
            issue.res_id = issue.related_record.id
        else:
            issue.res_model = False
            issue.res_id = False
  1. Then, you can replace this block with more efficient logic:
# 2. Find partner IDs that already have an unresolved "Invalid VAT" issue.
existing_issue_partner_ids = set(self.env["odf.data.quality.issue"].search([
    ("res_model", "=", "res.partner"),
    ("res_id", "in", partners_to_check.ids),
    ("issue_type", "=", "Invalid VAT"),
    ("status", "!=", "resolved"),
]).mapped("res_id"))

partners_to_validate = partners_to_check.filtered(
    lambda p: p.id not in existing_issue_partner_ids
)

Comment on lines +41 to +46
def _selection_related_record(self):
"""Return the list of models that can be checked."""
return [
("res.partner", "Partner"),
("product.product", "Product"),
]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The list of models available for data quality checks is hardcoded. This makes it difficult for other modules to add new models to the check. To improve extensibility, you could move the list to a separate "hook" method that other modules can easily extend using super().

Example:

@api.model
def _get_data_quality_models(self):
    """Hook for extensibility. Returns a list of (model_name, model_description) tuples."""
    return [
        ("res.partner", "Partner"),
        ("product.product", "Product"),
    ]

@api.model
def _selection_related_record(self):
    """Return the list of models that can be checked."""
    return self._get_data_quality_models()

"in",
[f"res.partner,{pid}" for pid in partners_to_check.ids],
),
("issue_type", "=", "Invalid VAT"),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The string 'Invalid VAT' is hardcoded here and on line 131. It's better to define this as a constant at the class level to avoid typos and improve maintainability.

For example, add this to the DataQualityIssue class:

ISSUE_TYPE_INVALID_VAT = "Invalid VAT"

Then use self.ISSUE_TYPE_INVALID_VAT in your code.

# 3a. First, run the VIES check in batches if enabled. This is the most
# reliable check, but it is remote and requires careful handling.
if self.env.company.vat_check_vies:
batch_size = 10 # Process 10 partners per batch
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The batch size is hardcoded. This value, along with the time.sleep duration on line 115, might need tuning in a production environment. It's good practice to make these values configurable via system parameters (ir.config_parameter). This allows administrators to adjust them without code changes.

Example:

ICP = self.env['ir.config_parameter'].sudo()
batch_size = int(ICP.get_param('data_quality.vies_batch_size', '10'))
sleep_interval = float(ICP.get_param('data_quality.vies_sleep_interval', '1.0'))

Comment on lines +109 to +114
except Exception as e:
# If the VIES service fails, we can't validate this batch.
# We could log this, but for now we'll just continue.
_logger.warning(
"VIES check failed for batch with error: %s", e
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Catching a broad Exception can hide unexpected errors and make debugging difficult. It's better to catch more specific exceptions if you know what to expect from button_vies_check() (e.g., connection errors). At a minimum, you should log the full traceback to aid in debugging by adding exc_info=True to the logger call.

Suggested change
except Exception as e:
# If the VIES service fails, we can't validate this batch.
# We could log this, but for now we'll just continue.
_logger.warning(
"VIES check failed for batch with error: %s", e
)
except Exception:
# If the VIES service fails, we can't validate this batch.
# We could log this, but for now we'll just continue.
_logger.warning(
"VIES check failed for batch.", exc_info=True
)

@bosd bosd force-pushed the feat/data-quality-module-final branch from be56801 to 2b81def Compare September 5, 2025 22:39
This commit introduces a new Odoo module `odf_data_quality_dashboard`.
This module provides a non-blocking, post-import data validation system
and a user interface for managing data quality issues.

This version incorporates feedback on performance, security, and code
quality:
- The validation logic is now batch-oriented to avoid N+1 query issues.
- Granular security groups (User/Manager) have been added.
- The scheduled action now runs as a dedicated, non-admin user.
- The module is correctly placed in the `modules/` directory.

The module includes:
- A new model `odf.data.quality.issue` to store data quality issues.
- A daily scheduled action that scans for partners with invalid VAT
  numbers and creates issue records.
- Kanban, form, and tree views for the `odf.data.quality.issue` model,
  grouped by status.
- A "Data Quality Dashboard" menu item.
- Security rules for the new model.

This addresses the requirement of decoupling data validation from the
import process, as outlined in section 4.2 of the Project Constitution.
@bosd bosd force-pushed the feat/data-quality-module-final branch 2 times, most recently from 21f3091 to 430c51d Compare September 5, 2025 22:45
google-labs-jules bot and others added 11 commits September 6, 2025 00:46
The VAT validation logic has been completely refactored to use Odoo's
built-in mechanisms from the `base_vat` module, replacing the previous
simple string length check.

Changes include:
- Added a dependency on the `base_vat` module.
- The validation logic now uses `partner.check_vat()` for robust checking.
- A batched process for the VIES check has been implemented to avoid
  API rate-limiting issues. The cron job now safely checks VAT numbers
  against the VIES database if the feature is enabled in settings.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@bosd bosd force-pushed the feat/data-quality-module-final branch from 430c51d to 258f574 Compare September 5, 2025 22:46
@bosd
Copy link
Copy Markdown
Owner

bosd commented Sep 5, 2025

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new module for a data quality dashboard, which is a great initiative. The initial implementation focuses on checking for invalid VAT numbers for partners, which is a common data quality issue. The code is well-structured, particularly the use of a dispatcher for checks and batch processing for performance.

I've found a critical issue that would prevent the module from working correctly, related to an undefined logger. I've also included a few suggestions for improvement regarding code simplification, configurability, data modeling, and documentation clarity. Once the critical issue is addressed, this module will be a valuable addition.

Comment on lines +1 to +5
import time
from datetime import timedelta

from odoo import api, fields, models

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The _logger variable is used on line 112, but it is not imported or defined in this file. This will raise a NameError at runtime when the VIES check fails, causing the cron job to crash. To fix this, you need to import the logging module and initialize a logger instance at the top of the file.

import logging
import time
from datetime import timedelta

from odoo import api, fields, models

_logger = logging.getLogger(__name__)

Comment on lines +20 to +22
issue_type = fields.Char(
string="Issue Type",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The issue_type field is a Char. For better data consistency and to provide a better user experience for filtering in the UI, consider changing this to a Selection field. This will prevent typos and ensure that issue types are standardized. You can define the selection in a dedicated method to keep it extensible.

Example:

    issue_type = fields.Selection(
        selection=[
            ('invalid_vat', 'Invalid VAT'),
        ],
        string="Issue Type",
    )

If you make this change, remember to update the hardcoded string 'Invalid VAT' on lines 80 and 131 to use the selection key (e.g., 'invalid_vat').

Comment on lines +62 to +67
[
("write_date", ">=", fields.Datetime.to_string(yesterday)),
("vat", "!=", False),
("vat", "!=", ""),
]
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The search domain contains two conditions to filter out partners without a VAT number: ("vat", "!=", False) and ("vat", "!=", ""). In Odoo's ORM, ("vat", '!=', False) is sufficient to exclude records where the vat field is either NULL (in the database) or an empty string. You can simplify the domain by removing the redundant condition for better readability and maintainability.

            [
                ("write_date", ">=", fields.Datetime.to_string(yesterday)),
                ("vat", "!=", False),
            ]
        )

# 3a. First, run the VIES check in batches if enabled. This is the most
# reliable check, but it is remote and requires careful handling.
if self.env.company.vat_check_vies:
batch_size = 10 # Process 10 partners per batch
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The batch size for VIES checks is hardcoded. While 10 is a reasonable default, it might not be optimal for all environments or future regulations on the VIES service. Consider making this configurable by using a system parameter (ir.config_parameter). This would allow administrators to tune the performance or adapt to service limits without changing the code.

Example of how to retrieve the parameter:

batch_size = int(self.env['ir.config_parameter'].sudo().get_param('odf_data_quality_dashboard.vies_batch_size', '10'))

Comment on lines +1 to +3
[ This file must only be present if there are very specific
installation instructions, such as installing non-python
dependencies. The audience is systems administrators. ]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file contains placeholder text indicating it should only be present for specific installation instructions. Since there are no such instructions for this module, this file could be confusing for system administrators. It would be better to either remove this file or clear its content if you plan to add instructions later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Task 1.3.1: Create odf_data_quality_dashboard Module

1 participant