-
Notifications
You must be signed in to change notification settings - Fork 107
Adjust Column Value
Disclaimer:
To expedite this guide, sample code was generated using Grok AI. Some examples are untested and may contain bugs or inaccuracies. They are provided for explanatory purposes to illustrate concepts, not as production-ready solutions.
- March 13, 2025.
This is considered an advanced feature in MySqlBackup.NET, introduced in v2.3.9 (March 12, 2025). It allows users to dynamically adjust column values during export.
This is a follow up update on a pull request performed by Tyler Pflueger (tpflueger) https://github.com/tpflueger
A special thank for code contribution of Tyler Pflueger.
For more information on the code changes on this release, please visit the pull/merge at: https://github.com/MySqlBackupNET/MySqlBackup.Net/pull/113
The AdjustColumnValue
delegate in ExportInformations
lets you transform column values during a MySQL database export without altering the source data. Whether you need to format numbers, mask sensitive information, or apply custom logic, this feature gives you full control over the exported SQL output.
The new added info object (class):
namespace MySqlConnector.InfoObjects
{
public class ColumnWithValue
{
public string TableName { get; set; }
public string ColumnName { get; set; }
public string MySqlDataType { get; set; }
public object Value { get; set; }
}
}
Which will be used with ExportInfo
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
// perform custom logic
return columnInfo.Value;
};
Simple Example: Padding Numbers
Suppose you have a result table where the result column contains inconsistent numeric strings or NULL values, and you want to standardize them to 6-digit strings with leading zeros in the export:
Original Data:
INSERT INTO `result` (id, code, result) values
(1, 'A0001', null),
(2, 'B0003', '1'),
(3, 'C0003', '803');
With AdjustColumnValue, you can transform it like this:
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
if (columnInfo.TableName == "result" && columnInfo.ColumnName == "result")
{
return columnInfo.Value == null ? "000000" : columnInfo.Value.ToString().PadLeft(6, '0');
}
return columnInfo.Value;
};
Exported Result:
INSERT INTO `result` (id, code, result) values
(1, 'A0001', '000000'),
(2, 'B0003', '000001'),
(3, 'C0003', '000803');
This simple tweak ensures consistent formatting for reporting or downstream systems. But that’s just the start—AdjustColumnValue supports any logic you can imagine, from data masking to complex lookups, all driven by the rich context of the ColumnWithValue object (table name, column name, data type, and value). Dive in to see how it can fit your needs!
Here's the full code example:
using (var conn = new MySqlConnection("your_connection_string"))
{
conn.Open();
using (var cmd = new MySqlCommand { Connection = conn })
{
var backup = new MySqlBackup(cmd);
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
if (columnInfo.TableName == "result" && columnInfo.ColumnName == "result")
{
return columnInfo.Value == null ? "000000" : columnInfo.Value.ToString().PadLeft(6, '0');
}
return columnInfo.Value;
};
backup.ExportInfo.TablesToBeExportedList = new List<string> { "result" };
backup.ExportToFile("modified_database_dump.sql");
}
}
Scenario: A company needs to export a MySQL database for testing purposes but must protect sensitive customer information, such as email addresses, by masking them in the exported SQL dump. The original data should remain unchanged in the database, but the exported file should obscure the emails.
Code Example
using MySqlConnector;
using MySqlConnector.InfoObjects;
using System;
class Program
{
static void Main()
{
// Initialize MySqlCommand (assume connection is set up)
MySqlCommand cmd = new MySqlCommand();
cmd.Connection = new MySqlConnection("your_connection_string");
cmd.Connection.Open();
// Create MySqlBackup instance
var backup = new MySqlBackup(cmd);
// Configure ExportInformations with AdjustColumnValue
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
// Check if the column is an email field
if (columnInfo.ColumnName.Equals("email", StringComparison.OrdinalIgnoreCase))
{
// Mask the email address (e.g., "[email protected]" -> "jo****@e*****.com")
string originalEmail = columnInfo.Value?.ToString();
if (!string.IsNullOrEmpty(originalEmail))
{
int atIndex = originalEmail.IndexOf('@');
if (atIndex > 2)
{
string masked = originalEmail.Substring(0, 2) + "****" +
"@" + originalEmail[(atIndex + 1)..(atIndex + 2)] + "*****.com";
return masked;
}
}
return originalEmail; // Return unchanged if masking fails
}
// Return original value for all other columns
return columnInfo.Value;
};
// Export to a file
backup.ExportToFile("masked_database_dump.sql");
cmd.Connection.Close();
}
}
Explanation
-
Custom Logic: The
AdjustColumnValue
delegate is set to inspect each column value during export. - If the column is named "email" (case-insensitive), it masks the email address by keeping the first two characters, replacing the rest of the local part with "****", and partially masking the domain.
- For all other columns, it returns the original value unchanged.
-
Result: The exported SQL file (masked_database_dump.sql) contains masked email addresses (e.g.,
INSERT INTO customers VALUES (1, 'jo****@e*****.com', ...))
, while the original database remains intact.
Sample Output in Exported SQL
INSERT INTO `customers` (`id`, `email`, `name`) VALUES
(1, 'jo****@e*****.com', 'John Doe'),
(2, 'ja****@e*****.com', 'Jane Smith');
Use Case Benefits
- Data Privacy: Sensitive information is obscured in the export, making it safe for sharing with developers or testers.
- Flexibility: The masking logic can be customized further (e.g., hashing, replacing with random data) without altering the MySqlBackup library.
- Non-Invasive: The original database data isn’t modified, only the exported representation.
The AdjustColumnValue
delegate in ExportInformations
acts as a customizable "middle-in-between" block where users can inject any logic they want to transform column values during the export process. Since it’s a Func<ColumnWithValue, object>
, it takes a ColumnWithValue
object as input (providing table name, column name, data type, and value) and returns an object that will be used in the exported SQL. This gives users full control to implement whatever transformation or logic they need.
To illustrate the flexibility, here are a few different examples of what you could do inside that block:
- Redact Specific Columns
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
if (columnInfo.TableName == "users" && columnInfo.ColumnName == "ssn")
{
return "XXX-XX-XXXX"; // Mask Social Security Numbers
}
return columnInfo.Value; // Keep other values unchanged
};
- Convert Data Formats
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
if (columnInfo.MySqlDataType.StartsWith("DATETIME") && columnInfo.Value != null)
{
// Reformat datetime to a custom string
return ((DateTime)columnInfo.Value).ToString("yyyy-MM-dd");
}
return columnInfo.Value;
};
- Conditional Replacement Based on Value
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
if (columnInfo.TableName == "products" && columnInfo.ColumnName == "stock")
{
int stock = Convert.ToInt32(columnInfo.Value);
return stock < 0 ? 0 : stock; // Replace negative stock with 0
}
return columnInfo.Value;
};
- External Lookup or Computation
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
if (columnInfo.TableName == "employees" && columnInfo.ColumnName == "department_id")
{
// Hypothetical external method to get department name
string deptName = GetDepartmentNameFromId((int)columnInfo.Value);
return deptName; // Replace ID with name
}
return columnInfo.Value;
};
static string GetDepartmentNameFromId(int id) => id switch
{
1 => "HR",
2 => "Engineering",
_ => "Unknown"
};
- Complex Multi-Column Logic
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
if (columnInfo.TableName == "orders" && columnInfo.ColumnName == "total_price")
{
// Simulate applying a discount based on some condition
decimal price = Convert.ToDecimal(columnInfo.Value);
return price > 1000 ? price * 0.9m : price; // 10% discount for orders over $1000
}
return columnInfo.Value;
};
Key Points
- Freedom: You can write any valid cs code inside the delegate, from simple replacements to complex computations or external calls.
-
Return Type: The only requirement is that the logic returns an object, which
QueryExpress.ConvertToSqlFormat
will then convert to a SQL-compatible string. This object should match the column’s expected data type to avoid runtime errors. - Scope: The logic applies to every column value in every row of every table being exported, so you typically use conditionals (e.g., if statements) to target specific cases.
Practical Considerations
- Performance: Complex logic (e.g., database queries or heavy computations) could slow down the export process if applied to many rows/columns.
- Error Handling: If your logic throws an exception, it could halt the export unless wrapped in a try-catch block (though the library itself doesn’t handle this for you).
-
Default Behavior: If you don’t set
AdjustColumnValue
, it defaults to(columnInfo) => columnInfo.Value
, mimicking the old code’s behavior.
- Reason: To comply with data protection regulations (e.g., GDPR, HIPAA) or internal policies, sensitive information must be obscured before sharing the export with third parties (e.g., testers, analysts, or external partners).
- Example: Masking personally identifiable information (PII) like names, emails, or Social Security numbers.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
if (columnInfo.ColumnName == "email")
return "[email protected]";
return columnInfo.Value;
};
- Reason: When creating test environments, developers often need realistic but non-sensitive data. Modifying the export allows them to generate test-friendly datasets without altering production data.
- Example: Replacing real customer names with generic placeholders.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
if (columnInfo.TableName == "customers" && columnInfo.ColumnName == "name")
return "TestUser" + new Random().Next(1000);
return columnInfo.Value;
};
- Reason: The exported data might need to conform to a different format or standard for compatibility with another system, reporting tool, or downstream process.
- Example: Converting dates to a specific format or trimming strings.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
if (columnInfo.MySqlDataType.StartsWith("DATETIME") && columnInfo.Value != null)
return ((DateTime)columnInfo.Value).ToString("yyyy-MM-dd");
return columnInfo.Value;
};
- Reason: The database might contain inconsistent or erroneous data that’s acceptable in the live system but needs correction in the export for analysis or migration purposes.
- Example: Replacing null or invalid values with defaults.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
if (columnInfo.TableName == "products" && columnInfo.ColumnName == "price")
return columnInfo.Value == null ? 0.0 : columnInfo.Value;
return columnInfo.Value;
};
- Reason: When exporting data for import into a different database or application, the data might need to be transformed to match the target schema or business rules (e.g., changing IDs, adjusting units, or mapping values).
- Example: Converting a status code to a human-readable value for a system that doesn’t use codes.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
if (columnInfo.TableName == "orders" && columnInfo.ColumnName == "status")
return columnInfo.Value switch { 0 => "Pending", 1 => "Shipped", _ => "Unknown" };
return columnInfo.Value;
};
- Reason: The export might serve a specific business purpose (e.g., a report or audit) that requires derived or computed values rather than raw data.
- Example: Calculating a discounted price for a report.
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
if (columnInfo.TableName == "products" && columnInfo.ColumnName == "price")
return Convert.ToDouble(columnInfo.Value) * 0.9; // Apply 10% discount
return columnInfo.Value;
};
- Efficiency: Adjusting data during export eliminates the need for a separate post-processing step, saving time and reducing complexity.
- Control: Developers can leverage the context provided by
ColumnWithValue
(table, column, type) to apply precise, context-aware transformations. - Integration: It’s built into the export workflow, ensuring consistency and avoiding external tooling dependencies.
- Integrity: The database often serves as the source of truth for live applications, and altering it could disrupt operations, break dependencies, or violate data retention policies.
- Reversibility: Export-time modifications are transient and don’t affect the source, making them safer and easier to experiment with.
- Separation of Concerns: The database’s role is to store data, not to handle presentation or export-specific logic—that’s better suited to the export process.
Let’s assume you have 9 specific table-column pairs where you want to apply custom transformations. Here’s how you could implement it:
Note: The following and the rest of C# example uses a lot tuple (multiple key) as the dictionary key. More on tuple with dictionary: https://makolyte.com/csharp-using-a-dictionary-with-tuples/
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
// Use a switch expression with tuples for cleaner multi-condition logic
return (columnInfo.TableName, columnInfo.ColumnName) switch
{
("employees", "department_id") => GetDepartmentNameFromId((int)columnInfo.Value),
("customers", "email") => MaskEmail(columnInfo.Value?.ToString()),
("products", "price") => ApplyDiscount((decimal)columnInfo.Value),
("orders", "status") => MapStatusCode((int)columnInfo.Value),
("users", "username") => $"user_{columnInfo.Value}", // Prefix usernames
("invoices", "amount") => Math.Round((decimal)columnInfo.Value, 2), // Round to 2 decimals
("inventory", "quantity") => columnInfo.Value == null ? 0 : columnInfo.Value, // Default null to 0
("projects", "start_date") => FormatDate((DateTime)columnInfo.Value),
("departments", "budget") => Convert.ToDouble(columnInfo.Value) * 1.1, // Increase by 10%
_ => columnInfo.Value // Default: return original value
};
};
// Helper methods for transformations
static string GetDepartmentNameFromId(int id) => id switch
{
1 => "HR",
2 => "Engineering",
_ => "Unknown"
};
static string MaskEmail(string email) => email != null && email.Contains("@")
? email.Substring(0, 2) + "****" + email[^(email.Length - email.IndexOf('@'))..]
: email;
static decimal ApplyDiscount(decimal price) => price > 100 ? price * 0.9m : price;
static string MapStatusCode(int code) => code switch { 0 => "Pending", 1 => "Shipped", _ => "Unknown" };
static string FormatDate(DateTime date) => date.ToString("yyyy-MM-dd");
Explanation
- Switch Expression with Tuples
- Instead of multiple if statements, a switch expression uses tuples (TableName, ColumnName) to match conditions concisely.
- Each case corresponds to one of the 9 table-column pairs, and the associated logic is applied.
- The _ (default) case ensures all unmatched columns return their original values.
- Helper Methods:
- Each transformation is encapsulated in a separate method (e.g., GetDepartmentNameFromId, MaskEmail) for readability and reusability.
- This keeps the delegate body focused on condition matching rather than implementation details.
- Sample Conditions:
- "employees"."department_id": Maps ID to department name.
- "customers"."email": Masks email addresses.
- "products"."price": Applies a discount.
- "orders"."status": Converts status codes to text.
- "users"."username": Adds a prefix.
- "invoices"."amount": Rounds to 2 decimals.
- "inventory"."quantity": Defaults null to 0.
- "projects"."start_date": Formats dates.
- "departments"."budget": Increases by 10%.
If you have even more conditions (e.g., dozens or hundreds) or want a more maintainable approach, here are some techniques to handle complexity beyond a simple switch:
Use a Dictionary to map table-column pairs to their transformation functions.
var transformations = new Dictionary<(string Table, string Column), Func<object, object>>
{
{ ("employees", "department_id"), v => GetDepartmentNameFromId((int)v) },
{ ("customers", "email"), v => MaskEmail(v?.ToString()) },
{ ("products", "price"), v => ApplyDiscount((decimal)v) },
{ ("orders", "status"), v => MapStatusCode((int)v) },
{ ("users", "username"), v => $"user_{v}" },
{ ("invoices", "amount"), v => Math.Round((decimal)v, 2) },
{ ("inventory", "quantity"), v => v ?? 0 },
{ ("projects", "start_date"), v => FormatDate((DateTime)v) },
{ ("departments", "budget"), v => Convert.ToDouble(v) * 1.1 }
};
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
var key = (columnInfo.TableName, columnInfo.ColumnName);
return transformations.TryGetValue(key, out var transform)
? transform(columnInfo.Value)
: columnInfo.Value;
};
Pros: Easy to add/remove conditions by modifying the dictionary; scales well for many rules.
Cons: Slightly more setup overhead; less readable for small sets of conditions.
Define a list of rules, each checking applicability and applying a transformation.
class TransformationRule
{
public string TableName { get; set; }
public string ColumnName { get; set; }
public Func<object, object> Transform { get; set; }
}
var rules = new List<TransformationRule>
{
new() { TableName = "employees", ColumnName = "department_id", Transform = v => GetDepartmentNameFromId((int)v) },
new() { TableName = "customers", ColumnName = "email", Transform = v => MaskEmail(v?.ToString()) },
// Add more rules...
};
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
foreach (var rule in rules)
{
if (rule.TableName == columnInfo.TableName && rule.ColumnName == columnInfo.ColumnName)
return rule.Transform(columnInfo.Value);
}
return columnInfo.Value;
};
Pros: Highly extensible; can prioritize rules or add complex conditions (e.g., based on value or type).
Cons: More verbose and potentially slower due to iteration.
For 9 conditions, a nested structure is still manageable, though less elegant:
backup.ExportInfo.AdjustColumnValue = (ColumnWithValue columnInfo) =>
{
switch (columnInfo.TableName)
{
case "employees":
if (columnInfo.ColumnName == "department_id") return GetDepartmentNameFromId((int)columnInfo.Value);
break;
case "customers":
if (columnInfo.ColumnName == "email") return MaskEmail(columnInfo.Value?.ToString());
break;
case "products":
if (columnInfo.ColumnName == "price") return ApplyDiscount((decimal)columnInfo.Value);
break;
// Add more cases...
}
return columnInfo.Value;
};
Pros: Simple and explicit for small sets.
Cons: Becomes unwieldy with many conditions.
- For 9 Conditions: The switch expression with tuples (first example) is ideal. It’s concise, readable, and performs well for a small, fixed number of rules.
- For 20+ Conditions: Consider the dictionary-based approach for better scalability and maintainability.
- For Dynamic Rules: Use a rule engine if conditions might change at runtime or involve complex logic.
Each method leverages the flexibility of AdjustColumnValue, allowing you to handle multiple conditions efficiently while keeping the code organized. Choose based on your specific needs for readability, performance, and future expansion!
Assuming that we are now trying to handle a very large database:
- Tables: 200
- Rows per Table: 1,000,000
- Columns per Table: 50
- Total Rows: 200 × 1,000,000 = 200,000,000 rows
- Total "Cells" (Column Values): 200,000,000 × 50 = 10,000,000,000 (10 billion)
- Total Rules for Data Transformation: 20 rules
Each "cell" represents a single column value in a row, and AdjustColumnValue
is called once per cell during the export process.
Does It Have to Loop Over All 20 Rules for Every Cell?
Not necessarily! The "very long loop" can be avoided or optimized depending on the technique used. Let’s evaluate the options from the previous response in this context:
Performance Impact Comparison:
- 10 billion delegate calls with a simple return columnInfo.Value (minimal overhead).
With 20 Rules:
backup.ExportInfo.AdjustColumnValue = (columnInfo) => (columnInfo.TableName, columnInfo.ColumnName) switch
{
("table1", "col1") => Transform1(columnInfo.Value),
("table2", "col2") => Transform2(columnInfo.Value),
// ... 18 more cases ...
("table20", "col20") => Transform20(columnInfo.Value),
_ => columnInfo.Value
};
- The switch expression jumps directly to the matching case (or default) using a compiled jump table or equivalent mechanism in the CLR.
- 8 Adds ~10-20 billion lightweight comparisons—manageable on modern hardware (seconds to minutes extra).
- Average case: ~1-2 comparisons per cell (depends on switch implementation, but it’s optimized by the compiler).
- Total comparisons: ~10-20 billion (a massive improvement over 200 billion).
- A single, efficient lookup per cell. Still, it’s evaluated for all 10 billion cells.
var transformations = new Dictionary<(string, string), Func<object, object>>
{
{ ("table1", "col1"), v => Transform1(v) },
{ ("table2", "col2"), v => Transform2(v) },
// ... 18 more entries ...
{ ("table20", "col20"), v => Transform20(v) }
};
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
var key = (columnInfo.TableName, columnInfo.ColumnName);
return transformations.TryGetValue(key, out var transform) ? transform(columnInfo.Value) : columnInfo.Value;
};
- Uses a hash table lookup to find a matching rule.
- Adds ~10 billion hash lookups—slightly more costly but still fast (likely under a minute extra).
- Hash lookup: O(1) time per cell (~1 operation, though slightly more costly than a simple comparison due to hashing).
- Total operations: ~10 billion hash lookups.
- Delegate invocation: Only for the 20 matching table-column pairs (negligible).
- No loop over 20 rules—each cell requires one hash check, making it highly efficient. The dictionary is checked for all 10 billion cells, but it’s fast.
var rules = new List<(string Table, string Column, Func<object, object> Transform)>
{
("table1", "col1", v => Transform1(v)),
("table2", "col2", v => Transform2(v)),
// ... 18 more ...
("table20", "col20", v => Transform20(v))
};
backup.ExportInfo.AdjustColumnValue = (columnInfo) =>
{
foreach (var rule in rules)
{
if (rule.Table == columnInfo.TableName && rule.Column == columnInfo.ColumnName)
return rule.Transform(columnInfo.Value);
}
return columnInfo.Value;
};
- Iterates over the list of 20 rules until a match is found or all are checked.
- Average case: ~10 comparisons per cell (half the list on average if no match).
- Worst case: 20 comparisons per cell.
- Total comparisons: ~100-200 billion.
- This does involve a loop over all 20 rules for every cell, making it the least efficient option for large datasets.
- Adds ~100-200 billion comparisons—significant overhead (could add hours).
To-do:
- By-pass
AdjustColumnValue
with a cached runtime boolean variable when it is not applicable to the table.