
Adjust Column Value


Disclaimer:

To expedite this guide, sample code was generated using Grok AI. Some examples are untested and may contain bugs or inaccuracies. They are provided for explanatory purposes to illustrate concepts, not as production-ready solutions.

- March 13, 2025.

This is considered an advanced feature in MySqlBackup.NET, introduced in v2.3.9 (March 12, 2025). It allows users to dynamically adjust column values during export.

This is a follow-up to a pull request contributed by Tyler Pflueger (tpflueger), https://github.com/tpflueger.

Special thanks to Tyler Pflueger for the code contribution.

For more information on the code changes in this release, see the pull request at: https://github.com/MySqlBackupNET/MySqlBackup.Net/pull/113


The AdjustColumnValue delegate in ExportInformations lets you transform column values during a MySQL database export without altering the source data. Whether you need to format numbers, mask sensitive information, or apply custom logic, this feature gives you full control over the exported SQL output.

The newly added info object (class):

namespace MySqlConnector.InfoObjects
{
    public class ColumnWithValue
    {
        public string TableName { get; set; }
        public string ColumnName { get; set; }
        public string MySqlDataType { get; set; }
        public object Value { get; set; }
    }
}

This object is passed to the AdjustColumnValue delegate set on ExportInfo:

backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    // perform custom logic

    return columnInfo.Value;
};

Simple Example: Padding Numbers

Suppose you have a result table where the result column contains inconsistent numeric strings or NULL values, and you want to standardize them to 6-digit strings with leading zeros in the export:

Original Data:

INSERT INTO `result` (id, code, result) values
(1, 'A0001', null),
(2, 'B0003', '1'),
(3, 'C0003', '803');

With AdjustColumnValue, you can transform it like this:

backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    if (columnInfo.TableName == "result" && columnInfo.ColumnName == "result")
    {
        return columnInfo.Value == null ? "000000" : columnInfo.Value.ToString().PadLeft(6, '0');
    }
    return columnInfo.Value;
};

Exported Result:

INSERT INTO `result` (id, code, result) values
(1, 'A0001', '000000'),
(2, 'B0003', '000001'),
(3, 'C0003', '000803');

This simple tweak ensures consistent formatting for reporting or downstream systems. But that’s just the start—AdjustColumnValue supports any logic you can imagine, from data masking to complex lookups, all driven by the rich context of the ColumnWithValue object (table name, column name, data type, and value). Dive in to see how it can fit your needs!

Here's the full code example:

using (var conn = new MySqlConnection("your_connection_string"))
{
    conn.Open();
    using (var cmd = new MySqlCommand { Connection = conn })
    {
        var backup = new MySqlBackup(cmd);
        backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
        {
            if (columnInfo.TableName == "result" && columnInfo.ColumnName == "result")
            {
                return columnInfo.Value == null ? "000000" : columnInfo.Value.ToString().PadLeft(6, '0');
            }
            return columnInfo.Value;
        };
        backup.ExportInfo.TablesToBeExportedList = new List<string> { "result" };
        backup.ExportToFile("modified_database_dump.sql");
    }
}

Example 2: Masking Sensitive Data During Export

Scenario: A company needs to export a MySQL database for testing purposes but must protect sensitive customer information, such as email addresses, by masking them in the exported SQL dump. The original data should remain unchanged in the database, but the exported file should obscure the emails.

Code Example

using MySqlConnector;
using MySqlConnector.InfoObjects;
using System;

class Program
{
    static void Main()
    {
        // Initialize MySqlCommand (assume connection is set up)
        MySqlCommand cmd = new MySqlCommand();
        cmd.Connection = new MySqlConnection("your_connection_string");
        cmd.Connection.Open();

        // Create MySqlBackup instance
        var backup = new MySqlBackup(cmd);

        // Configure ExportInformations with AdjustColumnValue
        backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
        {
            // Check if the column is an email field
            if (columnInfo.ColumnName.Equals("email", StringComparison.OrdinalIgnoreCase))
            {
                // Mask the email address (e.g., "john@example.com" -> "jo****@e*****.com")
                string originalEmail = columnInfo.Value?.ToString();
                if (!string.IsNullOrEmpty(originalEmail))
                {
                    int atIndex = originalEmail.IndexOf('@');
                    if (atIndex > 2)
                    {
                        string masked = originalEmail.Substring(0, 2) + "****" + 
                                       "@" + originalEmail[(atIndex + 1)..(atIndex + 2)] + "*****.com";
                        return masked;
                    }
                }
                return originalEmail; // Return unchanged if masking fails
            }
            // Return original value for all other columns
            return columnInfo.Value;
        };

        // Export to a file
        backup.ExportToFile("masked_database_dump.sql");

        cmd.Connection.Close();
    }
}

Explanation

  1. Custom Logic: The AdjustColumnValue delegate is set to inspect each column value during export.
  2. If the column is named "email" (case-insensitive), it masks the email address by keeping the first two characters, replacing the rest of the local part with "****", and partially masking the domain.
  3. For all other columns, it returns the original value unchanged.
  4. Result: The exported SQL file (masked_database_dump.sql) contains masked email addresses (e.g., INSERT INTO customers VALUES (1, 'jo****@e*****.com', ...)), while the original database remains intact.

Sample Output in Exported SQL

INSERT INTO `customers` (`id`, `email`, `name`) VALUES
(1, 'jo****@e*****.com', 'John Doe'),
(2, 'ja****@e*****.com', 'Jane Smith');

Use Case Benefits

  • Data Privacy: Sensitive information is obscured in the export, making it safe for sharing with developers or testers.
  • Flexibility: The masking logic can be customized further (e.g., hashing, replacing with random data) without altering the MySqlBackup library; a hashing-based sketch follows this list.
  • Non-Invasive: The original database data isn’t modified, only the exported representation.
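
For instance, a hashing-based variant of the masking logic might look like the sketch below. In line with the disclaimer above, this is an illustrative, untested example: the "customers" table and "email" column are assumed names, and SHA-256 is used only as one possible deterministic, irreversible replacement (Convert.ToHexString requires .NET 5 or later; older frameworks can use BitConverter.ToString instead).

using System;
using System.Security.Cryptography;
using System.Text;

backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    if (columnInfo.TableName == "customers" &&
        columnInfo.ColumnName.Equals("email", StringComparison.OrdinalIgnoreCase) &&
        columnInfo.Value != null)
    {
        // Replace the email with a deterministic, irreversible hash.
        // The same input always yields the same output, so joins and
        // de-duplication still behave consistently in the test copy.
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(columnInfo.Value.ToString()));
            string hex = Convert.ToHexString(hash).Substring(0, 16).ToLowerInvariant();
            return hex + "@example.invalid";
        }
    }
    return columnInfo.Value;
};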

Customize with Any Logic

The AdjustColumnValue delegate in ExportInformations acts as a customizable hook sitting in the middle of the export pipeline, where users can inject any logic they want to transform column values during the export process. Since it’s a Func<ColumnWithValue, object>, it takes a ColumnWithValue object as input (providing table name, column name, data type, and value) and returns the object that will be written to the exported SQL. This gives users full control to implement whatever transformation or logic they need.

Example Variations of Custom Logic

To illustrate the flexibility, here are a few different examples of what you could do inside that block:

  1. Redact Specific Columns
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
    if (columnInfo.TableName == "users" && columnInfo.ColumnName == "ssn")
    {
        return "XXX-XX-XXXX"; // Mask Social Security Numbers
    }
    return columnInfo.Value; // Keep other values unchanged
};
  2. Convert Data Formats
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
    if (columnInfo.MySqlDataType.StartsWith("DATETIME") && columnInfo.Value != null)
    {
        // Reformat datetime to a custom string
        return ((DateTime)columnInfo.Value).ToString("yyyy-MM-dd");
    }
    return columnInfo.Value;
};
  3. Conditional Replacement Based on Value
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
    if (columnInfo.TableName == "products" && columnInfo.ColumnName == "stock")
    {
        int stock = Convert.ToInt32(columnInfo.Value);
        return stock < 0 ? 0 : stock; // Replace negative stock with 0
    }
    return columnInfo.Value;
};
  4. External Lookup or Computation
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
    if (columnInfo.TableName == "employees" && columnInfo.ColumnName == "department_id")
    {
        // Hypothetical external method to get department name
        string deptName = GetDepartmentNameFromId((int)columnInfo.Value);
        return deptName; // Replace ID with name
    }
    return columnInfo.Value;
};

static string GetDepartmentNameFromId(int id) => id switch
{
    1 => "HR",
    2 => "Engineering",
    _ => "Unknown"
};
  5. Complex Multi-Column Logic
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
    if (columnInfo.TableName == "orders" && columnInfo.ColumnName == "total_price")
    {
        // Simulate applying a discount based on some condition
        decimal price = Convert.ToDecimal(columnInfo.Value);
        return price > 1000 ? price * 0.9m : price; // 10% discount for orders over $1000
    }
    return columnInfo.Value;
};

Key Points

  • Freedom: You can write any valid C# code inside the delegate, from simple replacements to complex computations or external calls.
  • Return Type: The only requirement is that the logic returns an object, which QueryExpress.ConvertToSqlFormat will then convert to a SQL-compatible string. This object should match the column’s expected data type to avoid runtime errors (see the sketch after this list).
  • Scope: The logic applies to every column value in every row of every table being exported, so you typically use conditionals (e.g., if statements) to target specific cases.
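
To illustrate the Return Type point, the sketch below uses the column context, including MySqlDataType, to decide what kind of object to return. The table and column names here are hypothetical, and the formatting behavior is an assumption: the exporter is expected to emit numeric values unquoted and string values quoted, so returning a value of the column's natural type is the safest choice.

backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    // Hypothetical example: zero out a numeric column and blank out a text column.
    if (columnInfo.TableName == "accounts" && columnInfo.ColumnName == "balance")
    {
        // Numeric column (e.g., DECIMAL): return a numeric value rather than the
        // string "0", so the value keeps its numeric type in the exported SQL.
        return 0m;
    }

    if (columnInfo.TableName == "accounts" && columnInfo.ColumnName == "notes" &&
        columnInfo.MySqlDataType.StartsWith("varchar", StringComparison.OrdinalIgnoreCase))
    {
        // Text column: a string is the appropriate return type here.
        return "";
    }

    return columnInfo.Value;
};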

Practical Considerations

  • Performance: Complex logic (e.g., database queries or heavy computations) could slow down the export process if applied to many rows/columns.
  • Error Handling: If your logic throws an exception, it could halt the export unless wrapped in a try-catch block (the library itself doesn’t handle this for you); a sketch follows this list.
  • Default Behavior: If you don’t set AdjustColumnValue, it defaults to (columnInfo) => columnInfo.Value, mimicking the old code’s behavior.
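
As a rough illustration of the Error Handling point, the whole delegate body can be wrapped in a try-catch so that one problematic value falls back to the original instead of aborting the export. This is an untested sketch; the "orders"/"total_price" names and the Console logging are placeholders for whatever your application uses.

backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    try
    {
        if (columnInfo.TableName == "orders" && columnInfo.ColumnName == "total_price")
        {
            // Convert.ToDecimal can throw on unexpected values (nulls, corrupted data).
            decimal price = Convert.ToDecimal(columnInfo.Value);
            return price > 1000 ? price * 0.9m : price;
        }
        return columnInfo.Value;
    }
    catch (Exception ex)
    {
        // Placeholder logging: record the failure, then fall back to the
        // untouched value so the export can continue.
        Console.Error.WriteLine(
            $"AdjustColumnValue failed for {columnInfo.TableName}.{columnInfo.ColumnName}: {ex.Message}");
        return columnInfo.Value;
    }
};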

Real-World Use Cases: Why Developers Need to Transform Data During Export

1. Data Privacy and Security

  • Reason: To comply with data protection regulations (e.g., GDPR, HIPAA) or internal policies, sensitive information must be obscured before sharing the export with third parties (e.g., testers, analysts, or external partners).
  • Example: Masking personally identifiable information (PII) like names, emails, or Social Security numbers.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    if (columnInfo.ColumnName == "email")
        return "[email protected]";
    return columnInfo.Value;
};

2. Testing and Development

  • Reason: When creating test environments, developers often need realistic but non-sensitive data. Modifying the export allows them to generate test-friendly datasets without altering production data.
  • Example: Replacing real customer names with generic placeholders.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    if (columnInfo.TableName == "customers" && columnInfo.ColumnName == "name")
        return "TestUser" + new Random().Next(1000);
    return columnInfo.Value;
};

3. Data Formatting or Normalization

  • Reason: The exported data might need to conform to a different format or standard for compatibility with another system, reporting tool, or downstream process.
  • Example: Converting dates to a specific format or trimming strings.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    if (columnInfo.MySqlDataType.StartsWith("DATETIME") && columnInfo.Value != null)
        return ((DateTime)columnInfo.Value).ToString("yyyy-MM-dd");
    return columnInfo.Value;
};

4. Data Correction or Cleanup

  • Reason: The database might contain inconsistent or erroneous data that’s acceptable in the live system but needs correction in the export for analysis or migration purposes.
  • Example: Replacing null or invalid values with defaults.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    if (columnInfo.TableName == "products" && columnInfo.ColumnName == "price")
        return columnInfo.Value == null ? 0.0 : columnInfo.Value;
    return columnInfo.Value;
};

5. Migration to Another System

  • Reason: When exporting data for import into a different database or application, the data might need to be transformed to match the target schema or business rules (e.g., changing IDs, adjusting units, or mapping values).
  • Example: Converting a status code to a human-readable value for a system that doesn’t use codes.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    if (columnInfo.TableName == "orders" && columnInfo.ColumnName == "status")
        return columnInfo.Value switch { 0 => "Pending", 1 => "Shipped", _ => "Unknown" };
    return columnInfo.Value;
};

6. Business Logic or Reporting Needs

  • Reason: The export might serve a specific business purpose (e.g., a report or audit) that requires derived or computed values rather than raw data.
  • Example: Calculating a discounted price for a report.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    if (columnInfo.TableName == "products" && columnInfo.ColumnName == "price")
        return Convert.ToDouble(columnInfo.Value) * 0.9; // Apply 10% discount
    return columnInfo.Value;
};

Why Modify During Export Instead of Post-Processing?

  • Efficiency: Adjusting data during export eliminates the need for a separate post-processing step, saving time and reducing complexity.
  • Control: Developers can leverage the context provided by ColumnWithValue (table, column, type) to apply precise, context-aware transformations.
  • Integration: It’s built into the export workflow, ensuring consistency and avoiding external tooling dependencies.

Why Not Modify the Original Database?

  • Integrity: The database often serves as the source of truth for live applications, and altering it could disrupt operations, break dependencies, or violate data retention policies.
  • Reversibility: Export-time modifications are transient and don’t affect the source, making them safer and easier to experiment with.
  • Separation of Concerns: The database’s role is to store data, not to handle presentation or export-specific logic—that’s better suited to the export process.

Example: Handling Multiple Conditions for Different Tables and Columns

Let’s assume you have 9 specific table-column pairs where you want to apply custom transformations. Here’s how you could implement it:

Note: The examples in the rest of this guide make heavy use of tuples (composite keys) as dictionary keys and in switch patterns. More on using tuples with a dictionary: https://makolyte.com/csharp-using-a-dictionary-with-tuples/

backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
    // Use a switch expression with tuples for cleaner multi-condition logic
    return (columnInfo.TableName, columnInfo.ColumnName) switch
    {
        ("employees", "department_id") => GetDepartmentNameFromId((int)columnInfo.Value),
        ("customers", "email") => MaskEmail(columnInfo.Value?.ToString()),
        ("products", "price") => ApplyDiscount((decimal)columnInfo.Value),
        ("orders", "status") => MapStatusCode((int)columnInfo.Value),
        ("users", "username") => $"user_{columnInfo.Value}", // Prefix usernames
        ("invoices", "amount") => Math.Round((decimal)columnInfo.Value, 2), // Round to 2 decimals
        ("inventory", "quantity") => columnInfo.Value == null ? 0 : columnInfo.Value, // Default null to 0
        ("projects", "start_date") => FormatDate((DateTime)columnInfo.Value),
        ("departments", "budget") => Convert.ToDouble(columnInfo.Value) * 1.1, // Increase by 10%
        _ => columnInfo.Value // Default: return original value
    };
};

// Helper methods for transformations
static string GetDepartmentNameFromId(int id) => id switch
{
    1 => "HR",
    2 => "Engineering",
    _ => "Unknown"
};

static string MaskEmail(string email) => email != null && email.Contains("@")
    ? email.Substring(0, 2) + "****" + email[email.IndexOf('@')..]
    : email;

static decimal ApplyDiscount(decimal price) => price > 100 ? price * 0.9m : price;

static string MapStatusCode(int code) => code switch { 0 => "Pending", 1 => "Shipped", _ => "Unknown" };

static string FormatDate(DateTime date) => date.ToString("yyyy-MM-dd");

Explanation

  1. Switch Expression with Tuples
  • Instead of multiple if statements, a switch expression uses tuples (TableName, ColumnName) to match conditions concisely.
  • Each case corresponds to one of the 9 table-column pairs, and the associated logic is applied.
  • The _ (default) case ensures all unmatched columns return their original values.
  2. Helper Methods:
  • Each transformation is encapsulated in a separate method (e.g., GetDepartmentNameFromId, MaskEmail) for readability and reusability.
  • This keeps the delegate body focused on condition matching rather than implementation details.
  3. Sample Conditions:
  • "employees"."department_id": Maps ID to department name.
  • "customers"."email": Masks email addresses.
  • "products"."price": Applies a discount.
  • "orders"."status": Converts status codes to text.
  • "users"."username": Adds a prefix.
  • "invoices"."amount": Rounds to 2 decimals.
  • "inventory"."quantity": Defaults null to 0.
  • "projects"."start_date": Formats dates.
  • "departments"."budget": Increases by 10%.

Scaling to Many Conditions: Alternative Techniques

If you have even more conditions (e.g., dozens or hundreds) or want a more maintainable approach, here are some techniques to handle complexity beyond a simple switch:

1. Dictionary-Based Lookup

Use a Dictionary to map table-column pairs to their transformation functions.

var transformations = new Dictionary<(string Table, string Column), Func<object, object>>
{
    { ("employees", "department_id"), v => GetDepartmentNameFromId((int)v) },
    { ("customers", "email"), v => MaskEmail(v?.ToString()) },
    { ("products", "price"), v => ApplyDiscount((decimal)v) },
    { ("orders", "status"), v => MapStatusCode((int)v) },
    { ("users", "username"), v => $"user_{v}" },
    { ("invoices", "amount"), v => Math.Round((decimal)v, 2) },
    { ("inventory", "quantity"), v => v ?? 0 },
    { ("projects", "start_date"), v => FormatDate((DateTime)v) },
    { ("departments", "budget"), v => Convert.ToDouble(v) * 1.1 }
};

backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
    var key = (columnInfo.TableName, columnInfo.ColumnName);
    return transformations.TryGetValue(key, out var transform) 
        ? transform(columnInfo.Value) 
        : columnInfo.Value;
};

Pros: Easy to add/remove conditions by modifying the dictionary; scales well for many rules.

Cons: Slightly more setup overhead; less readable for small sets of conditions.

2. Chain of Responsibility or Rule Engine

Define a list of rules, each checking applicability and applying a transformation.

class TransformationRule
{
    public string TableName { get; set; }
    public string ColumnName { get; set; }
    public Func<object, object> Transform { get; set; }
}

var rules = new List<TransformationRule>
{
    new() { TableName = "employees", ColumnName = "department_id", Transform = v => GetDepartmentNameFromId((int)v) },
    new() { TableName = "customers", ColumnName = "email", Transform = v => MaskEmail(v?.ToString()) },
    // Add more rules...
};

backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
    foreach (var rule in rules)
    {
        if (rule.TableName == columnInfo.TableName && rule.ColumnName == columnInfo.ColumnName)
            return rule.Transform(columnInfo.Value);
    }
    return columnInfo.Value;
};

Pros: Highly extensible; can prioritize rules or add complex conditions (e.g., based on value or type).

Cons: More verbose and potentially slower due to iteration.

3. Nested Switch or If-Else

For 9 conditions, a nested structure is still manageable, though less elegant:

backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
    switch (columnInfo.TableName)
    {
        case "employees":
            if (columnInfo.ColumnName == "department_id") return GetDepartmentNameFromId((int)columnInfo.Value);
            break;
        case "customers":
            if (columnInfo.ColumnName == "email") return MaskEmail(columnInfo.Value?.ToString());
            break;
        case "products":
            if (columnInfo.ColumnName == "price") return ApplyDiscount((decimal)columnInfo.Value);
            break;
        // Add more cases...
    }
    return columnInfo.Value;
};

Pros: Simple and explicit for small sets.

Cons: Becomes unwieldy with many conditions.

Recommendation

  • For 9 Conditions: The switch expression with tuples (first example) is ideal. It’s concise, readable, and performs well for a small, fixed number of rules.
  • For 20+ Conditions: Consider the dictionary-based approach for better scalability and maintainability.
  • For Dynamic Rules: Use a rule engine if conditions might change at runtime or involve complex logic.

Each method leverages the flexibility of AdjustColumnValue, allowing you to handle multiple conditions efficiently while keeping the code organized. Choose based on your specific needs for readability, performance, and future expansion!

Performance Impact

Assume we are handling a very large database:

  • Tables: 200
  • Rows per Table: 1,000,000
  • Columns per Table: 50
  • Total Rows: 200 × 1,000,000 = 200,000,000 rows
  • Total "Cells" (Column Values): 200,000,000 × 50 = 10,000,000,000 (10 billion)
  • Total Rules for Data Transformation: 20 rules

Each "cell" represents a single column value in a row, and AdjustColumnValue is called once per cell during the export process.

Does It Have to Loop Over All 20 Rules for Every Cell?

Not necessarily! The "very long loop" can be avoided or optimized depending on the technique used. Let’s evaluate the options from the previous response in this context:

Performance Impact Comparison:

Baseline (No Rules):

  • 10 billion delegate calls with a simple return columnInfo.Value (minimal overhead).

With 20 Rules:

Switch Expression with Tuples:

backup.ExportInfo.AdjustColumnValue = (columnInfo) => (columnInfo.TableName, columnInfo.ColumnName) switch
{
    ("table1", "col1") => Transform1(columnInfo.Value),
    ("table2", "col2") => Transform2(columnInfo.Value),
    // ... 18 more cases ...
    ("table20", "col20") => Transform20(columnInfo.Value),
    _ => columnInfo.Value
};
  • The switch expression compiles to an efficient lookup that jumps to the matching case (or the default) rather than testing every rule in sequence.
  • Average case: ~1-2 comparisons per cell (the exact cost depends on how the compiler lowers the switch, but it is well optimized).
  • Total comparisons: ~10-20 billion, a massive improvement over the ~200 billion a naive loop over 20 rules would require.
  • Adds roughly seconds to minutes of extra work on modern hardware.
  • Still a single, cheap lookup per cell, but it is evaluated for all 10 billion cells.

Dictionary-Based Lookup

var transformations = new Dictionary<(string, string), Func<object, object>>
{
    { ("table1", "col1"), v => Transform1(v) },
    { ("table2", "col2"), v => Transform2(v) },
    // ... 18 more entries ...
    { ("table20", "col20"), v => Transform20(v) }
};

backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    var key = (columnInfo.TableName, columnInfo.ColumnName);
    return transformations.TryGetValue(key, out var transform) ? transform(columnInfo.Value) : columnInfo.Value;
};
  • Uses a hash-table lookup to find a matching rule: O(1) per cell (one lookup, slightly more costly than a plain comparison because of hashing).
  • Total operations: ~10 billion hash lookups, still fast (likely well under a minute of extra work).
  • The transformation function is only invoked for cells belonging to the 20 matching table-column pairs.
  • No loop over the 20 rules: each cell costs a single hash check. The dictionary is still consulted for all 10 billion cells, but each check is cheap.

List Loop:

var rules = new List<(string Table, string Column, Func<object, object> Transform)>
{
    ("table1", "col1", v => Transform1(v)),
    ("table2", "col2", v => Transform2(v)),
    // ... 18 more ...
    ("table20", "col20", v => Transform20(v))
};

backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    foreach (var rule in rules)
    {
        if (rule.Table == columnInfo.TableName && rule.Column == columnInfo.ColumnName)
            return rule.Transform(columnInfo.Value);
    }
    return columnInfo.Value;
};
  • Iterates over the list of 20 rules until a match is found or every rule has been checked.
  • Average case: ~10 comparisons per cell when a match exists; worst case: 20 comparisons per cell when no rule matches.
  • Total comparisons: ~100-200 billion, significant overhead that could add hours to the export.
  • This is the only option that genuinely loops over all 20 rules for every cell, making it the least efficient choice for large datasets.

Future Improvement

To-do:

  • Bypass AdjustColumnValue with a cached runtime boolean when it is not applicable to the current table (a caller-side workaround is sketched below).
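
Until such a table-level bypass exists inside the library, callers can approximate it themselves by short-circuiting on a pre-built set of affected tables. The sketch below is a hypothetical caller-side workaround, reusing the MaskEmail and MapStatusCode helpers from the earlier examples; for tables without rules, the per-cell cost drops to a single HashSet lookup.

// Tables that actually have transformation rules (example names).
var tablesWithRules = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "customers", "orders"
};

backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
    // Cheap short-circuit: most cells usually belong to tables with no rules at all.
    if (!tablesWithRules.Contains(columnInfo.TableName))
        return columnInfo.Value;

    return (columnInfo.TableName, columnInfo.ColumnName) switch
    {
        ("customers", "email") => MaskEmail(columnInfo.Value?.ToString()),
        ("orders", "status") => MapStatusCode((int)columnInfo.Value),
        _ => columnInfo.Value
    };
};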