Commit d2b9084

Add --apply-global option for CSV validation (#148)
This commit adds an `--apply-global` (`-G`) option to the `validate:csv` command. When the option is enabled, global schemas (those without a `filename_pattern`) are applied to every CSV file found. The default is 'no', in which case such schemas are not paired with any file and are listed as unused.
1 parent 80191ad commit d2b9084
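
To make the pairing rule described above concrete, here is a minimal PHP sketch. It is not the project's `Utils::matchSchemaAndCsvFiles()`: the function `pairSchemasWithCsv()`, the array shape, and the sample paths are all hypothetical, and it assumes `filename_pattern` is a regular expression tested against the CSV path.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical matcher, written only to illustrate the flag's effect.
 *
 * @param string[]               $csvFiles CSV paths
 * @param array<string, ?string> $schemas  schema path => filename_pattern regex (null = global schema)
 *
 * @return array{pairs: array<string, string[]>, unused: string[]}
 */
function pairSchemasWithCsv(array $csvFiles, array $schemas, bool $applyGlobal): array
{
    $pairs  = [];
    $unused = [];

    foreach ($schemas as $schemaPath => $pattern) {
        if ($pattern === null || $pattern === '') {
            // Global schema: it takes part only when the flag is on.
            $matched = $applyGlobal ? $csvFiles : [];
        } else {
            // Schema with a pattern: keep the CSV paths the regex matches.
            $matched = array_values(array_filter(
                $csvFiles,
                static fn (string $csv): bool => preg_match($pattern, $csv) === 1,
            ));
        }

        if ($matched === []) {
            $unused[] = $schemaPath;
        } else {
            $pairs[$schemaPath] = $matched;
        }
    }

    return ['pairs' => $pairs, 'unused' => $unused];
}

// Sample data (hypothetical paths).
$csvFiles = ['./data/demo.csv'];
$schemas  = [
    './schemas/demo.yml'       => '/demo\.csv$/',
    './schemas/no_pattern.yml' => null,
];

$withoutFlag = pairSchemasWithCsv($csvFiles, $schemas, false);
$withFlag    = pairSchemasWithCsv($csvFiles, $schemas, true);

// Prints "Pairs without flag: 1, with flag: 2"
printf(
    "Pairs without flag: %d, with flag: %d\n",
    count($withoutFlag['pairs']),
    count($withFlag['pairs']),
);
```

Without the flag, the schema that has no pattern ends up in `unused`, which corresponds to the "Not used schemas" section in the expected test output further down.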

6 files changed: +120 -59 lines changed

README.md

+50-45
@@ -145,6 +145,11 @@ You can find launch examples in the [workflow demo](https://github.com/JBZoo/Csv
 # Required: true
 report: 'table'

+# Apply global schemas (without `filename_pattern`) to all CSV files found.
+# Default value: 'no'
+# Required: true
+apply-global: 'no'
+
 # Quick mode. It will not validate all rows. It will stop after the first error.
 # Default value: 'no'
 # Required: true
@@ -1430,49 +1435,50 @@ Usage:
   validate:csv [options]

 Options:
-  -c, --csv=CSV Specify the path(s) to the CSV files you want to validate.
-  This can include a direct path to a file or a directory to search with a maximum depth of 10 levels.
-  Examples: /full/path/name.csv; p/file.csv; p/*.csv; p/**/*.csv; p/**/name-*.csv; **/*.csv
-  (multiple values allowed)
-  -s, --schema=SCHEMA Specify the path(s) to the schema file(s), supporting YAML, JSON, or PHP formats.
-  Similar to CSV paths, you can direct to specific files or search directories with glob patterns.
-  Examples: /full/path/name.yml; p/file.yml; p/*.yml; p/**/*.yml; p/**/name-*.yml; **/*.yml
-  (multiple values allowed)
-  -S, --skip-schema[=SKIP-SCHEMA] Skips schema validation for quicker checks when the schema's correctness is certain.
-  Use any non-empty value or "yes" to activate
-  [default: "no"]
-  -r, --report=REPORT Determines the report's output format.
-  Available options: text, table, github, gitlab, teamcity, junit
-  [default: "table"]
-  -Q, --quick[=QUICK] Stops the validation process upon encountering the first error,
-  accelerating the check but limiting error visibility.
-  Returns a non-zero exit code if any error is detected.
-  Enable by setting to any non-empty value or "yes".
-  [default: "no"]
-  --dump-schema Dumps the schema of the CSV file if you want to see the final schema after inheritance.
-  --debug Intended solely for debugging and advanced profiling purposes.
-  Activating this option provides detailed process insights,
-  useful for troubleshooting and performance analysis.
-  --no-progress Disable progress bar animation for logs. It will be used only for text output format.
-  --mute-errors Mute any sort of errors. So exit code will be always "0" (if it's possible).
-  It has major priority then --non-zero-on-error. It's on your own risk!
-  --stdout-only For any errors messages application will use StdOut instead of StdErr. It's on your own risk!
-  --non-zero-on-error None-zero exit code on any StdErr message.
-  --timestamp Show timestamp at the beginning of each message.It will be used only for text output format.
-  --profile Display timing and memory usage information.
-  --output-mode=OUTPUT-MODE Output format. Available options:
-  text - Default text output format, userfriendly and easy to read.
-  cron - Shortcut for crontab. It's basically focused on human-readable logs output.
-  It's combination of --timestamp --profile --stdout-only --no-progress -vv.
-  logstash - Logstash output format, for integration with ELK stack.
-  [default: "text"]
-  --cron Alias for --output-mode=cron. Deprecated!
-  -h, --help Display help for the given command. When no command is given display help for the list command
-  -q, --quiet Do not output any message
-  -V, --version Display this application version
-  --ansi|--no-ansi Force (or disable --no-ansi) ANSI output
-  -n, --no-interaction Do not ask any interactive question
-  -v|vv|vvv, --verbose Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
+  -c, --csv=CSV Specify the path(s) to the CSV files you want to validate.
+  This can include a direct path to a file or a directory to search with a maximum depth of 10 levels.
+  Examples: /full/path/name.csv; p/file.csv; p/*.csv; p/**/*.csv; p/**/name-*.csv; **/*.csv
+  (multiple values allowed)
+  -s, --schema=SCHEMA Specify the path(s) to the schema file(s), supporting YAML, JSON, or PHP formats.
+  Similar to CSV paths, you can direct to specific files or search directories with glob patterns.
+  Examples: /full/path/name.yml; p/file.yml; p/*.yml; p/**/*.yml; p/**/name-*.yml; **/*.yml
+  (multiple values allowed)
+  -S, --skip-schema[=SKIP-SCHEMA] Skips schema validation for quicker checks when the schema's correctness is certain.
+  Use any non-empty value or "yes" to activate
+  [default: "no"]
+  -G, --apply-global[=APPLY-GLOBAL] Apply global schemas (without `filename_pattern`) to all CSV files found. [default: "no"]
+  -r, --report=REPORT Determines the report's output format.
+  Available options: text, table, github, gitlab, teamcity, junit
+  [default: "table"]
+  -Q, --quick[=QUICK] Stops the validation process upon encountering the first error,
+  accelerating the check but limiting error visibility.
+  Returns a non-zero exit code if any error is detected.
+  Enable by setting to any non-empty value or "yes".
+  [default: "no"]
+  --dump-schema Dumps the schema of the CSV file if you want to see the final schema after inheritance.
+  --debug Intended solely for debugging and advanced profiling purposes.
+  Activating this option provides detailed process insights,
+  useful for troubleshooting and performance analysis.
+  --no-progress Disable progress bar animation for logs. It will be used only for text output format.
+  --mute-errors Mute any sort of errors. So exit code will be always "0" (if it's possible).
+  It has major priority then --non-zero-on-error. It's on your own risk!
+  --stdout-only For any errors messages application will use StdOut instead of StdErr. It's on your own risk!
+  --non-zero-on-error None-zero exit code on any StdErr message.
+  --timestamp Show timestamp at the beginning of each message.It will be used only for text output format.
+  --profile Display timing and memory usage information.
+  --output-mode=OUTPUT-MODE Output format. Available options:
+  text - Default text output format, userfriendly and easy to read.
+  cron - Shortcut for crontab. It's basically focused on human-readable logs output.
+  It's combination of --timestamp --profile --stdout-only --no-progress -vv.
+  logstash - Logstash output format, for integration with ELK stack.
+  [default: "text"]
+  --cron Alias for --output-mode=cron. Deprecated!
+  -h, --help Display help for the given command. When no command is given display help for the list command
+  -q, --quiet Do not output any message
+  -V, --version Display this application version
+  --ansi|--no-ansi Force (or disable --no-ansi) ANSI output
+  -n, --no-interaction Do not ask any interactive question
+  -v|vv|vvv, --verbose Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
 ```
 <!-- auto-update:/validate-csv-help -->

@@ -1873,7 +1879,6 @@ It's random ideas and plans. No promises and deadlines. Feel free to [help me!](

 * **Batch processing**
 * If option `--csv` is not specified, then the STDIN is used. To build a pipeline in Unix-like systems.
-* Flag to ignore file name pattern. It's useful when you have a lot of files, and you don't want to validate the file name.

 * **Validation**
 * Multi `filename_pattern`. Support list of regexs.
@@ -1914,11 +1919,11 @@ It's random ideas and plans. No promises and deadlines. Feel free to [help me!](
 * Warnings about deprecated options and features.
 * Add option `--recomendation` to show a list of recommended rules for the schema or potential issues in the CSV file or schema. It's useful when you are not sure what rules to use.
 * Add option `--error=[level]` to show only errors with a specific level. It's useful when you have a lot of warnings and you want to see only errors.
-* S3 Storage support. Validate files in the S3 bucket? Hmm... Why not? But...
 * More examples and documentation.

 PS. [There is a file](tests/schemas/todo.yml) with my ideas and imagination. It's not valid schema file, just a draft.
 I'm not sure if I will implement all of them. But I will try to do my best.
+
 </details>

 ## Contributing

action.yml

+6
@@ -35,6 +35,10 @@ inputs:
     description: 'Report format. Available options: text, table, github, gitlab, teamcity, junit.'
     default: table
     required: true
+  apply-global:
+    description: 'Apply global schemas (without `filename_pattern`) to all CSV files found.'
+    default: no
+    required: true
   quick:
     description: 'Quick mode. It will not validate all rows. It will stop after the first error.'
     default: no
@@ -67,6 +71,8 @@ runs:
     - ${{ inputs.schema }}
     - '--report'
     - ${{ inputs.report }}
+    - '--apply-global'
+    - ${{ inputs.apply-global }}
     - '--quick'
     - ${{ inputs.quick }}
     - '--skip-schema'

src/Commands/ValidateCsv.php

+14-1
@@ -83,6 +83,13 @@ protected function configure(): void
                     '',
                 ]),
                 'no',
+            )
+            ->addOption(
+                'apply-global',
+                'G',
+                InputOption::VALUE_OPTIONAL,
+                'Apply global schemas (without `filename_pattern`) to all CSV files found.',
+                'no',
             );

         parent::configure();
@@ -94,7 +101,7 @@ protected function executeAction(): int

         $csvFilenames = $this->findFiles('csv', false);
         $schemaFilenames = $this->findFiles('schema', false);
-        $matchedFiles = Utils::matchSchemaAndCsvFiles($csvFilenames, $schemaFilenames);
+        $matchedFiles = Utils::matchSchemaAndCsvFiles($csvFilenames, $schemaFilenames, $this->isApplyGlobal());

         $this->printHeaderInfo($csvFilenames, $schemaFilenames, $matchedFiles);

@@ -112,6 +119,12 @@ protected function executeAction(): int
         );
     }

+    protected function isApplyGlobal(): bool
+    {
+        $value = $this->getOptString('apply-global');
+        return $value === '' || bool($value);
+    }
+
     private function isCheckingSchema(): bool
     {
         $value = $this->getOptString('skip-schema');

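The expression in `isApplyGlobal()` treats an empty string as "on", which is presumably how a bare `--apply-global` (given without a value) arrives from `getOptString()`. The following self-contained sketch walks through that truth table. Note that `toBool()` and `resolveApplyGlobal()` are hypothetical names, and `toBool()` is only a stand-in for the project's `bool()` helper, built on PHP's `FILTER_VALIDATE_BOOLEAN`, so edge cases may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the `bool()` helper used in the diff.
 * FILTER_VALIDATE_BOOLEAN accepts "1", "true", "on" and "yes" as true;
 * anything else yields false here.
 */
function toBool(string $value): bool
{
    return filter_var($value, FILTER_VALIDATE_BOOLEAN);
}

/**
 * Mirrors the expression `$value === '' || bool($value)`.
 * Assumption: a flag passed without a value reaches this code as ''.
 */
function resolveApplyGlobal(string $optionValue): bool
{
    return $optionValue === '' || toBool($optionValue);
}

// '' (bare flag) and 'yes' enable the option; the default 'no' does not.
foreach (['', 'yes', 'no'] as $raw) {
    printf("%-5s => %s\n", "'{$raw}'", resolveApplyGlobal($raw) ? 'enabled' : 'disabled');
}
```

The empty-string check matters because the option is declared with `InputOption::VALUE_OPTIONAL` and a default of 'no': without it, passing the flag with no value would be indistinguishable from "off" under a plain boolean cast.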
tests/Commands/ValidateCsvBasicTest.php

+3-2
@@ -250,8 +250,9 @@ public function testInvalidSchemaAndNotFoundCSV(): void
     public function testValidateOneCsvNoHeaderNegative(): void
     {
         [$actual, $exitCode] = Tools::virtualExecution('validate:csv', [
-            'csv'    => Tools::DEMO_CSV,
-            'schema' => './tests/schemas/simple_no_header.yml',
+            'csv'          => Tools::DEMO_CSV,
+            'schema'       => './tests/schemas/simple_no_header.yml',
+            'apply-global' => 'yes',
         ]);

         $expected = <<<'TXT'

tests/Commands/ValidateCsvBatchSchemaTest.php

+40-5
@@ -103,14 +103,49 @@ public function testMultiSchemaDiscovery(): void
         isSame($expected, $actual);
     }

-    public function testNoPattern(): void
+    public function testNoPatternNoApplyGlobal(): void
     {
         $optionsAsString = Tools::arrayToOptionString([
             'csv'    => './tests/fixtures/demo.csv',
-            'schema' => [
-                Tools::DEMO_YML_VALID,
-                './tests/schemas/demo_invalid_no_pattern.yml',
-            ],
+            'schema' => [Tools::DEMO_YML_VALID, './tests/schemas/demo_invalid_no_pattern.yml'],
+        ]);
+
+        [$actual, $exitCode] = Tools::virtualExecution('validate:csv', $optionsAsString);
+
+        $expected = <<<'TXT'
+            CSV Blueprint: Unknown version (PhpUnit)
+            Found Schemas : 2
+            Found CSV files : 1
+            Pairs by pattern: 1
+
+            Check schema syntax: 2
+            (1/2) OK ./tests/schemas/demo_invalid_no_pattern.yml
+            (2/2) OK ./tests/schemas/demo_valid.yml
+
+            CSV file validation: 1
+            Schema: ./tests/schemas/demo_valid.yml
+            OK ./tests/fixtures/demo.csv; Size: 123.34 MB
+
+            Summary:
+            1 pairs (schema to csv) were found based on `filename_pattern`.
+            No issues in 2 schemas.
+            No issues in 1 CSV files.
+            Not used schemas:
+            * ./tests/schemas/demo_invalid_no_pattern.yml
+
+
+            TXT;
+
+        isSame(1, $exitCode, $actual);
+        isSame($expected, $actual);
+    }
+
+    public function testNoPatternApplyGlobal(): void
+    {
+        $optionsAsString = Tools::arrayToOptionString([
+            'csv'          => './tests/fixtures/demo.csv',
+            'schema'       => [Tools::DEMO_YML_VALID, './tests/schemas/demo_invalid_no_pattern.yml'],
+            'apply-global' => 'yes',
         ]);

         [$actual, $exitCode] = Tools::virtualExecution('validate:csv', $optionsAsString);

tests/GithubActionsTest.php

+7-6
@@ -49,12 +49,13 @@ public function testGitHubActionsReadMe(): void
     {
         $inputs = yml(PROJECT_ROOT . '/action.yml')->findArray('inputs');
         $examples = [
-            'csv'         => './tests/**/*.csv',
-            'schema'      => './tests/**/*.yml',
-            'report'      => "'" . ErrorSuite::REPORT_DEFAULT . "'",
-            'quick'       => "'no'",
-            'skip-schema' => "'no'",
-            'extra'       => "'options: --ansi'",
+            'csv'          => './tests/**/*.csv',
+            'schema'       => './tests/**/*.yml',
+            'report'       => "'" . ErrorSuite::REPORT_DEFAULT . "'",
+            'apply-global' => "'no'",
+            'quick'        => "'no'",
+            'skip-schema'  => "'no'",
+            'extra'        => "'options: --ansi'",
         ];

         $expectedMessage = [
