Commit b3636d9

Released 1.0.0

Author: Alexis
1 parent 3f43106

56 files changed, +2619 -3919 lines changed

.travis.yml

+15
@@ -0,0 +1,15 @@
+language: java
+
+cache:
+  directories:
+    - $HOME/.m2
+
+install: mvn install -DskipTests -Dmaven.javadoc.skip=true -Dgpg.skip
+
+jdk:
+  - oraclejdk11
+  - openjdk11
+  - openjdk12
+
+after_success:
+  - mvn jacoco:report && bash <(curl -s https://codecov.io/bash)

CHANGELOG.md

+3
@@ -0,0 +1,3 @@
+## 1.0.0 _(xxx)_
+
+First release

LICENSE → LICENSE.txt

+1 -1
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2017 Alexis Jehan
+Copyright (c) 2017-2019 Alexis Jehan
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

+97 -70
@@ -1,123 +1,150 @@
+![DSV Mender](logo.png)
 # DSV Mender
+[![Maven Central](https://img.shields.io/maven-central/v/com.github.alexisjehan/dsv-mender.svg)](https://mvnrepository.com/artifact/com.github.alexisjehan/dsv-mender/latest)
+[![Javadoc](http://www.javadoc.io/badge/com.github.alexisjehan/dsv-mender.svg)](http://www.javadoc.io/doc/com.github.alexisjehan/dsv-mender)
+[![Travis](https://img.shields.io/travis/alexisjehan/dsv-mender.svg)](https://travis-ci.org/alexisjehan/dsv-mender)
+[![Codecov](https://img.shields.io/codecov/c/github/alexisjehan/dsv-mender.svg)](https://codecov.io/gh/alexisjehan/dsv-mender)
+[![License](https://img.shields.io/github/license/alexisjehan/dsv-mender.svg)](https://github.com/alexisjehan/dsv-mender/blob/master/LICENSE.txt)
 
-A Java 8 library to fix malformed Delimiter-separated values (DSV) data automatically.
+A Java 11+ library to fix malformed DSV (Delimiter-Separated Values) data automatically.
 
 ## Introduction
-
 As many developers you may already had to treat some input data with formats such as _CSV_ or _JSON_. Sometimes that
-task could become tricky to achieve because some values are not formatted how they are supposed to be. DSV Mender is a
-library that aims to help you in such cases efficiently. Basically it collects some features from each valid column of
-the data independently to find the best solution while handling invalid or missing values.
+task could become tricky to achieve because some values are not always formatted how they are supposed to be.
+**DSV Mender** is a library that aims to help you in such cases efficiently. Basically it collects some features from
+each valid column of the data independently to find the best solution while handling invalid or missing values.
 
-### Estimations and Constraints
+### Constraints and estimations
+DSV Mender is working with a concept of constraints and estimations that are associated to specific columns of the data:
 
-DSV Mender is working with concepts of estimations and constraints that are assigned to desired columns:
+* **Constraints** eliminate some candidate possibilities of a malformed row if they do not respect a rule, without
+  taking into account previous valid values at all.
+  For example if the third column has to be exactly 5 characters long, then all candidates with a value that does not
+  will be discarded.
 
 * **Estimations** could be used to collect some features from valid values. When an invalid value need to be fixed then
   the closest generated possibility is chosen.
   For example if you collect the length of valid values and get 5 characters 95% of the time then a possible fixed-value
-  that got a length of 5 got more chances to be selected than a possibility of 3 characters.
+  that got a length of 5 got more chances to be selected than a candidate of 3 characters.
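Note on the two bullets above: the difference between a constraint and an estimation may be easier to see in code. The following is only a sketch in plain Java and is not DSV Mender's implementation. The class `CandidateSketch` and its methods `candidates`, `keep` and `closest` are hypothetical names introduced for illustration, and the sketch assumes the only repair needed is merging adjacent fragments of a row that has too many values (missing values are not handled).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration, not part of the DSV Mender API.
final class CandidateSketch {

    // Lists every way to merge adjacent fragments into exactly `length` values.
    static List<List<String>> candidates(final String[] fragments, final char delimiter, final int length) {
        final List<List<String>> result = new ArrayList<>();
        collect(fragments, 0, delimiter, length, new ArrayList<>(), result);
        return result;
    }

    private static void collect(final String[] fragments, final int start, final char delimiter,
            final int remaining, final List<String> current, final List<List<String>> result) {
        if (0 == remaining) {
            if (fragments.length == start) {
                result.add(new ArrayList<>(current)); // every fragment was consumed
            }
            return;
        }
        final StringBuilder value = new StringBuilder();
        // Keep at least one fragment for each value that is still to come.
        for (int end = start; end <= fragments.length - remaining; ++end) {
            if (end > start) {
                value.append(delimiter); // restore the delimiter swallowed by the split
            }
            value.append(fragments[end]);
            current.add(value.toString());
            collect(fragments, end + 1, delimiter, remaining - 1, current, result);
            current.remove(current.size() - 1);
        }
    }

    // Constraint: a hard rule, candidates that break it are discarded.
    static List<List<String>> keep(final List<List<String>> candidates, final Predicate<List<String>> rule) {
        return candidates.stream().filter(rule).collect(Collectors.toList());
    }

    // Estimation: prefer the candidate whose value length was seen most often in valid rows.
    static Optional<List<String>> closest(final List<List<String>> candidates, final int column,
            final Map<Integer, Integer> seenLengths) {
        return candidates.stream()
                .max(Comparator.comparingInt(
                        (List<String> c) -> seenLengths.getOrDefault(c.get(column).length(), 0)));
    }

    public static void main(final String[] args) {
        final String[] fragments = "Java SE 10.0.1,2018-04-17,Security fixes, 5 bug fixes".split(",", -1);
        final List<List<String>> all = candidates(fragments, ',', 3);
        // Constraint on the second value: empty or exactly 10 characters long.
        final List<List<String>> valid = keep(all, c -> c.get(1).isEmpty() || 10 == c.get(1).length());
        System.out.println(all.size() + " candidates, " + valid.size() + " after the constraint");
        // Estimation on the second value: a length of 10 was seen 8 times in valid rows (made-up count).
        closest(all, 1, Map.of(10, 8)).ifPresent(System.out::println);
    }
}
```

In this sketch the constraint ignores earlier rows entirely, while the estimation only ranks candidates against what was observed in valid rows, which mirrors the distinction the README draws.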
+
+## Getting started
+To include and use DSV Mender, you need to add the following dependency into your _Maven_ _pom.xml_ file:
+```xml
+<dependency>
+    <groupId>com.github.alexisjehan</groupId>
+    <artifactId>dsv-mender</artifactId>
+    <version>1.0.0</version>
+</dependency>
+```
 
-* **Constraints** unlike estimations, eliminate some possibilities if they do not respect a precise rule, without taking
-  into account valid values at all.
-  For example if the third column has to be exactly 5 characters long, then all possibilities with a value that does not
-  will be discarded.
+Or if you are using _Gradle_:
+```xml
+dependencies {
+    compile "com.github.alexisjehan:dsv-mender:1.0.0"
+}
+```
 
-## Example
+Also the Javadoc can be accessed [here](http://www.javadoc.io/doc/com.github.alexisjehan/dsv-mender).
 
+## Examples
 Let's illustrate how it works step-by-step, consider the following CSV data:
 
 ```csv
-ID,NAME,DESCRIPTION,BIRTHDAY,COUNTRY
-1,John,Hey everyone I'm the first user,1984-05-16,United Kingdom
-2,Pierre,Bonjour à tous vous allez bien ?,1992-11-26,France
-3,Pedro,Holà qué tal ?,1962-01-05,Spain
-4,Arnold,My country name contains a , in it,1974-05-30,Macedonia, Rep. of
-5,Peter,I, like, to, use, commas, between, words,1994-12-04,United States
+Release,Release date,Highlights
+Java SE 9,2017-09-21,Initial release
+Java SE 9.0.1,2017-10-17,October 2017 security fixes and critical bug fixes
+Java SE 9.0.4,2018-01-16,Final release for JDK 9; January 2018 security fixes and critical bug fixes
+Java SE 10,2018-03-20,Initial release
+Java SE 10.0.1,2018-04-17,Security fixes, 5 bug fixes
+Java SE 11,2018-09-25,Initial release
+Java SE 11.0.1,2018-10-16,Security & bug fixes
+Java SE 11.0.2,2019-01-15,Security & bug fixes
+Java SE 12,Initial release
 ```
 
-As you can see, it looks like CSV data but values are not quoted. Have a look especially to the two last lines, yeah...
-some values appear to contain some comma characters, which is used also as the delimiter. How to fix it ? Let's see how
-DSV Mender works...
+As you may see, some lines are not well-formatted. The "Java SE 10.0.1" "Highlights" column contains the delimiter
+character, and the "Java SE 12" "Release date" column is missing. Let's see how to use DSV Mender to fix it.
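Note: a naive split makes the two problems in the sample data visible, since the malformed rows do not yield the expected three values. The sketch below uses only the JDK, and `RowCheckSketch` and `countValues` are hypothetical names that do not come from DSV Mender.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of the DSV Mender API.
final class RowCheckSketch {

    // Counts the values obtained by a naive split on the delimiter (empty values are kept).
    static int countValues(final String row, final char delimiter) {
        return row.split(Pattern.quote(String.valueOf(delimiter)), -1).length;
    }

    public static void main(final String[] args) {
        final int expected = 3; // "Release", "Release date" and "Highlights"
        final List<String> rows = List.of(
                "Java SE 10,2018-03-20,Initial release",
                "Java SE 10.0.1,2018-04-17,Security fixes, 5 bug fixes",
                "Java SE 12,Initial release"
        );
        for (final String row : rows) {
            final int actual = countValues(row, ',');
            System.out.println(actual + " value(s), " + (expected == actual ? "valid" : "to mend") + ": " + row);
        }
    }
}
```

A count above the expected length points to a delimiter inside a value, and a count below it points to a missing value.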
 
 ### Building the mender
+First you need to create a _Mender_ object based on the input data. That requires to specify the delimiter string as
+well as the expected number of columns.
 
-The first thing is to configure a _Mender_ object based on the input data. You need to specify the delimiter string as
-well as the valid number of columns.
-
-#### Automatic configuration
-
-If you don't know so much about the data or you want to see how the Mender acts automatically you can use that:
+#### Basic configuration
+The lazy way, for a first attempt is to build a basic _Mender_, that can be able to mend most of input data:
 
 ```java
-final DsvMender mender = DsvMender.auto(",", 5); // delimiter and number of columns
+final var delimiter = ',';
+final var length = 3;
+final var mender = DsvMender.basic(delimiter, length);
 ```
 
 #### Advanced configuration
-
-If you know approximately how some columns need to be formatted and to get more accurate results, you would better use a
-more advanced configuration. Concerning our example the Mender could be created like this:
+For more accurate results, you can also build a _Mender_ with custom _Constraints_ and _Estimations_. For our example above
+we will use the following ones:
 
 ```java
-final DsvMender mender = DsvMender.builder(",", 5)
-    .withLengthEstimations() // Estimating the length of the value for every columns
-    .withContainsEstimations(" ") // Estimating if the value contains a space character for every columns
-    .withPatternConstraint(0, Pattern.compile("[0-9]+")) // The ID column is always numerical, not empty
-    .withLengthConstraint(3, 10) // The birthday column always contains 10 characters
+final var mender = DsvMender.builder()
+    .withDelimiter(',')
+    .withLength(3)
+    .withConstraint(value -> value.startsWith("Java SE"), 0) // values[0] must start with "Java SE"
+    .withConstraint(value -> value.isEmpty() || 10 == value.length(), 1)// values[1] must be empty or have a length of 10
     .build();
 ```
 
 ### Processing the data
-
-Before to fix, and because we configured estimations, then we first need to fit the DSV Mender with valid rows.
-
-```java
-mender.fit("1,John,Hey everyone I'm the first user,1984-05-16,United Kingdom");
-mender.fit("2,Pierre,Bonjour à tous vous allez bien ?,1992-11-26,France");
-mender.fit("3,Pedro,Holà, qué tal ?,1962-01-05,Spain");
-```
-
-Finally we can now fix invalid rows and display the result:
+Once you got your _Mender_ component built, you are able to process your data line by line. Note that you do not have to
+worry of the passed line being valid or not, if it is then the _Mender_ will still fit its _Estimations_ before to
+return it.
 
 ```java
-try {
-    Arrays.asList(mender.fix("4,Arnold,My country name contains a , in it,1974-05-30,Macedonia, Rep. of")).forEach(System.out::println);
-    System.out.println();
-    Arrays.asList(mender.fix("5,Peter,I, like, to, use, commas, between, words,1994-12-04,United States")).forEach(System.out::println);
-} catch (final MenderException e) {
-    System.err.println("ERROR: No solution has been found, try others estimations and constraints");
+String row;
+while (null != (row = reader.readLine())) {
+    printValues(mender.mend(row));
 }
 ```
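Note: the added loop relies on `reader` and `printValues`, neither of which is defined in this diff. A minimal sketch of what they could look like follows, assuming the mended values are available as a `String[]` (an assumption, since the return type of `mend` is not shown here). `ProcessingSketch` and `format` are hypothetical names, and a plain split stands in for the `mender.mend(row)` call so that the example runs without the library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical illustration, not part of the DSV Mender API.
final class ProcessingSketch {

    // Hypothetical helper: quotes each value and joins them with ", ".
    static String format(final String... values) {
        return Arrays.stream(values)
                .map(value -> "\"" + value + "\"")
                .collect(Collectors.joining(", "));
    }

    public static void main(final String[] args) throws IOException {
        final String data = "Release,Release date,Highlights\nJava SE 9,2017-09-21,Initial release\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            String row;
            while (null != (row = reader.readLine())) {
                // Stand-in for mender.mend(row): a plain split, no mending is done here.
                System.out.println(format(row.split(",", -1)));
            }
        }
    }
}
```

The quoting style matches the result listing shown in the README, where a missing value appears as `""`.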
 
-If you had properly configured the DSV Mender as described earlier, then the data should be fixed.
+Finally here is the result we got for our example:
 
-#### Notes:
-* If you don't know which row is valid or not, you should use _fitIfValid_ and _fixIfNotValid_ instead of _fit_ and
-  _fix_.
-* Even better, you can use the _DSVReader_ wrapper class that automatically fit and fix while reading from a source.
-
-More examples can be found in the _examples_ package.
-
-## Maven commands
+```
+"Release", "Release date", "Highlights"
+"Java SE 9", "2017-09-21", "Initial release"
+"Java SE 9.0.1", "2017-10-17", "October 2017 security fixes and critical bug fixes"
+"Java SE 9.0.4", "2018-01-16", "Final release for JDK 9; January 2018 security fixes and critical bug fixes"
+"Java SE 10", "2018-03-20", "Initial release"
+"Java SE 10.0.1", "2018-04-17", "Security fixes, 5 bug fixes"
+"Java SE 11", "2018-09-25", "Initial release"
+"Java SE 11.0.1", "2018-10-16", "Security & bug fixes"
+"Java SE 11.0.2", "2019-01-15", "Security & bug fixes"
+"Java SE 12", "", "Initial release"
+```
 
-### Compiling
+(You can find the code of that example among others in the "examples" package)
 
+## Maven phases and goals
+Compile, test and install the JAR in the local Maven repository:
 ```
-mvn compile
+mvn install
 ```
 
-### Running unit tests
-
+Run JUnit 5 tests:
 ```
 mvn test
 ```
 
-### Generating the Javadoc
-
+Generate the Javadoc API documentation:
 ```
 mvn javadoc:javadoc
 ```
 
-## License
+Update sources license:
+```
+mvn license:format
+```
+
+Generate the Jacoco test coverage report:
+```
+mvn jacoco:report
+```
 
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE.txt) file for details

logo.png

13.1 KB