# DSV Mender

[Maven Repository](https://mvnrepository.com/artifact/com.github.alexisjehan/dsv-mender/latest)
[Javadoc](http://www.javadoc.io/doc/com.github.alexisjehan/dsv-mender)
[Travis CI build](https://travis-ci.org/alexisjehan/dsv-mender)
[Codecov coverage](https://codecov.io/gh/alexisjehan/dsv-mender)
[License](https://github.com/alexisjehan/dsv-mender/blob/master/LICENSE.txt)

A Java 11+ library to fix malformed DSV (Delimiter-Separated Values) data automatically.

## Introduction
Like many developers, you may already have had to process input data in formats such as _CSV_ or _JSON_. Sometimes
this task becomes tricky because some values are not formatted the way they are supposed to be.
**DSV Mender** is a library that aims to handle such cases efficiently. It collects features from the valid values of
each column independently, then uses them to find the best fix for rows with invalid or missing values.

### Constraints and estimations
DSV Mender works with constraints and estimations, each associated with a specific column of the data:

* **Constraints** eliminate the candidate fixes of a malformed row that do not respect a rule, without taking previous
valid values into account at all.
For example, if the third column has to be exactly 5 characters long, then all candidates whose third value does not
have that length are discarded.

* **Estimations** collect features from valid values. When an invalid value needs to be fixed, the candidate closest to
those features is chosen.
For example, if you collect the length of valid values and find a length of 5 characters 95% of the time, then a
candidate value of 5 characters is more likely to be selected than one of 3 characters.
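The estimation idea can be illustrated with a small, self-contained Java sketch. The `LengthEstimation` class below is hypothetical and is not part of the DSV Mender API. It counts how often each length occurs among the valid values of one column, then scores a candidate value by the relative frequency of its length. As a result, a 5-character candidate outranks a 3-character one when 95% of the valid values have 5 characters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a length estimation for one column (not the DSV Mender API)
public class LengthEstimation {

    // Number of valid values seen for each length
    private final Map<Integer, Integer> counts = new HashMap<>();
    private int total = 0;

    // Collect the length feature from a valid value of the column
    public void fit(final String value) {
        counts.merge(value.length(), 1, Integer::sum);
        ++total;
    }

    // Relative frequency of the candidate's length among valid values, from 0 to 1
    public double score(final String candidate) {
        if (0 == total) {
            return 0.0;
        }
        return counts.getOrDefault(candidate.length(), 0) / (double) total;
    }

    public static void main(final String[] args) {
        final var estimation = new LengthEstimation();
        // 19 valid values of 5 characters and 1 of 3 characters: 95% have a length of 5
        for (var i = 0; i < 19; ++i) {
            estimation.fit("abcde");
        }
        estimation.fit("abc");
        System.out.println(estimation.score("12345")); // 0.95
        System.out.println(estimation.score("123"));   // 0.05
    }
}
```

A frequency score is only one possible way of measuring how "close" a candidate is to the collected features. It is used here because it maps directly onto the 95% example.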
## Getting started
To include and use DSV Mender, add the following dependency to your _Maven_ _pom.xml_ file:
```xml
<dependency>
    <groupId>com.github.alexisjehan</groupId>
    <artifactId>dsv-mender</artifactId>
    <version>1.0.0</version>
</dependency>
```

Or if you are using _Gradle_:
```groovy
dependencies {
    implementation "com.github.alexisjehan:dsv-mender:1.0.0"
}
```

The Javadoc is also available [here](http://www.javadoc.io/doc/com.github.alexisjehan/dsv-mender).

## Examples
Let's illustrate how it works step by step. Consider the following CSV data:

```csv
Release,Release date,Highlights
Java SE 9,2017-09-21,Initial release
Java SE 9.0.1,2017-10-17,October 2017 security fixes and critical bug fixes
Java SE 9.0.4,2018-01-16,Final release for JDK 9; January 2018 security fixes and critical bug fixes
Java SE 10,2018-03-20,Initial release
Java SE 10.0.1,2018-04-17,Security fixes, 5 bug fixes
Java SE 11,2018-09-25,Initial release
Java SE 11.0.1,2018-10-16,Security & bug fixes
Java SE 11.0.2,2019-01-15,Security & bug fixes
Java SE 12,Initial release
```

As you can see, some lines are malformed. The "Highlights" value of the "Java SE 10.0.1" row contains the delimiter
character, and the "Release date" value of the "Java SE 12" row is missing. Let's see how to use DSV Mender to fix them.

### Building the mender
First, you need to create a _Mender_ object suited to the input data. This requires specifying the delimiter as well as
the expected number of columns.

#### Basic configuration
The simplest way, for a first attempt, is to build a basic _Mender_, which can mend most input data:

```java
final var delimiter = ',';
final var length = 3;
final var mender = DsvMender.basic(delimiter, length);
```

#### Advanced configuration
For more accurate results, you can also build a _Mender_ with custom _Constraints_ and _Estimations_. For the example
above, we use the following constraints:

```java
final var mender = DsvMender.builder()
        .withDelimiter(',')
        .withLength(3)
        .withConstraint(value -> value.startsWith("Java SE"), 0) // values[0] must start with "Java SE"
        .withConstraint(value -> value.isEmpty() || 10 == value.length(), 1) // values[1] must be empty or have a length of 10
        .build();
```
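The following hypothetical sketch shows why these two constraints are enough for this data. It is self-contained and does not use the DSV Mender API. It assumes that candidates are produced in one of two ways, which may differ from what the library actually does:

* By re-joining two adjacent fields when a row has one value too many.
* By inserting an empty value when a row has one value too few.

Filtering the candidates of both malformed rows with the same two predicates leaves a single candidate per row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of candidate generation and constraint filtering (not the DSV Mender API)
public class CandidateSketch {

    // The two constraints of the advanced configuration, one predicate per column
    static final List<Predicate<String>> CONSTRAINTS = List.of(
            value -> value.startsWith("Java SE"),             // values[0]
            value -> value.isEmpty() || 10 == value.length(), // values[1]
            value -> true                                     // values[2]: unconstrained
    );

    // Assumption: a malformed row has exactly one value too many or one value too few
    static List<String[]> candidates(final String row, final char delimiter, final int length) {
        final var fields = row.split(Pattern.quote(String.valueOf(delimiter)), -1);
        final var result = new ArrayList<String[]>();
        if (length + 1 == fields.length) {
            // One delimiter too many: re-join the fields i and i + 1
            for (var i = 0; i < length; ++i) {
                final var values = new String[length];
                for (var j = 0; j < length; ++j) {
                    if (j < i) {
                        values[j] = fields[j];
                    } else if (j == i) {
                        values[j] = fields[i] + delimiter + fields[i + 1];
                    } else {
                        values[j] = fields[j + 1];
                    }
                }
                result.add(values);
            }
        } else if (length - 1 == fields.length) {
            // One value missing: insert an empty value at each position
            for (var i = 0; i < length; ++i) {
                final var values = new ArrayList<>(Arrays.asList(fields));
                values.add(i, "");
                result.add(values.toArray(new String[0]));
            }
        }
        return result;
    }

    // A candidate is kept only if every value respects its column constraint
    static boolean respectsConstraints(final String[] values) {
        for (var i = 0; i < values.length; ++i) {
            if (!CONSTRAINTS.get(i).test(values[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final var rows = List.of(
                "Java SE 10.0.1,2018-04-17,Security fixes, 5 bug fixes",
                "Java SE 12,Initial release");
        for (final var row : rows) {
            for (final var values : candidates(row, ',', 3)) {
                if (respectsConstraints(values)) {
                    System.out.println("\"" + String.join("\", \"", values) + "\"");
                }
            }
        }
    }
}
```

Each row yields three candidates, and only one per row passes both predicates. When several candidates survive the constraints, estimations are what would rank them.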
### Processing the data
Once your _Mender_ is built, you can process your data line by line. You do not need to check beforehand whether a line
is valid: if it is, the _Mender_ still fits its _Estimations_ on it before returning it.

```java
// `reader` is a BufferedReader over the input data and `printValues` is your own output helper
String row;
while (null != (row = reader.readLine())) {
    printValues(mender.mend(row)); // valid rows are fitted, malformed rows are mended
}
```
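The snippet above relies on a `reader` and a `printValues` helper that are not shown. The following self-contained sketch fills them in under a few assumptions:

* The mender is abstracted as a `Function<String, String[]>`. Passing `mender::mend` assumes that `mend` returns the values of a row as a `String[]`.
* The hypothetical `formatValues` helper quotes and joins values the same way as the result shown below.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of the processing loop (not part of DSV Mender)
public class ProcessSketch {

    // Quote each value and join them with ", ", as in the result shown in this README
    static String formatValues(final String[] values) {
        return Arrays.stream(values)
                .map(value -> "\"" + value + "\"")
                .collect(Collectors.joining(", "));
    }

    // Read the input line by line and format the values returned by mend for every row
    static List<String> process(final Reader input, final Function<String, String[]> mend) throws IOException {
        final var lines = new ArrayList<String>();
        try (var reader = new BufferedReader(input)) {
            String row;
            while (null != (row = reader.readLine())) {
                lines.add(formatValues(mend.apply(row)));
            }
        }
        return lines;
    }

    public static void main(final String[] args) throws IOException {
        // Stand-in for mender::mend that only splits rows, it does not mend anything
        final Function<String, String[]> split = row -> row.split(",", -1);
        final var input = new StringReader("Release,Release date,Highlights\nJava SE 9,2017-09-21,Initial release");
        process(input, split).forEach(System.out::println);
    }
}
```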
Finally, here is the result for our example:

```
"Release", "Release date", "Highlights"
"Java SE 9", "2017-09-21", "Initial release"
"Java SE 9.0.1", "2017-10-17", "October 2017 security fixes and critical bug fixes"
"Java SE 9.0.4", "2018-01-16", "Final release for JDK 9; January 2018 security fixes and critical bug fixes"
"Java SE 10", "2018-03-20", "Initial release"
"Java SE 10.0.1", "2018-04-17", "Security fixes, 5 bug fixes"
"Java SE 11", "2018-09-25", "Initial release"
"Java SE 11.0.1", "2018-10-16", "Security & bug fixes"
"Java SE 11.0.2", "2019-01-15", "Security & bug fixes"
"Java SE 12", "", "Initial release"
```

(The code of this example, among others, can be found in the _examples_ package.)

## Maven phases and goals
Compile, test and install the JAR into the local Maven repository:
```
mvn install
```

Run the JUnit 5 tests:
```
mvn test
```

Generate the Javadoc API documentation:
```
mvn javadoc:javadoc
```

Update the license headers of the source files:
```
mvn license:format
```

Generate the JaCoCo test coverage report:
```
mvn jacoco:report
```

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE.txt) file for details.