You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[GOBBLIN-1385] Renaming references from incubator gobblin to gobblin
Renaming references to incubator gobblin to
gobblin
Removing disclamer as it not needed anymore
Closesapache#3223 from
treff7es/rename_incubator_gobblin_to_gobblin
Copy file name to clipboardexpand all lines: CHANGELOG.md
+2-2
Original file line number
Diff line number
Diff line change
@@ -1762,9 +1762,9 @@ GOBBLIN 0.6.0
1762
1762
NEW FEATURES
1763
1763
1764
1764
*[Compaction] Added M/R compaction/de-duping for hourly data
1765
-
*[Compaction] Added late data handling for hourly and daily M/R compaction: https://github.com/apache/incubator-gobblin/wiki/Compaction#handling-late-records; added support for triggering M/R compaction if late data exceeds a threshold
1765
+
*[Compaction] Added late data handling for hourly and daily M/R compaction: https://github.com/apache/gobblin/wiki/Compaction#handling-late-records; added support for triggering M/R compaction if late data exceeds a threshold
1766
1766
*[I/O] Added support for using Hive SerDe's through HiveWritableHdfsDataWriter
1767
-
*[I/O] Added the concept of data partitioning to writers: https://github.com/apache/incubator-gobblin/wiki/Partitioned-Writers
1767
+
*[I/O] Added the concept of data partitioning to writers: https://github.com/apache/gobblin/wiki/Partitioned-Writers
1768
1768
*[Runtime] Added CliLocalJobLauncher for launching single jobs from the command line.
1769
1769
*[Converters] Added AvroSchemaFieldRemover that can remove specific fields from a (possibly recursive) Avro schema.
1770
1770
*[DQ] Added new row-level policies RecordTimestampLowerBoundPolicy and AvroRecordTimestampLowerBoundPolicy for checking if a record timestamp is too far in the past.
[](https://communityinviter.com/apps/apache-gobblin/apache-gobblin)
Copy file name to clipboardexpand all lines: gobblin-docs/Getting-Started.md
+8-8
Original file line number
Diff line number
Diff line change
@@ -81,17 +81,17 @@ For this example, we will once again run the Wikipedia example. The records will
81
81
82
82
## Preliminary
83
83
84
-
Each Gobblin job minimally involves several constructs, e.g. [Source](https://github.com/apache/incubator-gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/source/Source.java), [Extractor](https://github.com/apache/incubator-gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/source/extractor/Extractor.java), [DataWriter](https://github.com/apache/incubator-gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/writer/DataWriter.java) and [DataPublisher](https://github.com/apache/incubator-gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/publisher/DataPublisher.java). As the names suggest, Source defines the source to pull data from, Extractor implements the logic to extract data records, DataWriter defines the way the extracted records are output, and DataPublisher publishes the data to the final output location. A job may optionally have one or more Converters, which transform the extracted records, as well as one or more PolicyCheckers that check the quality of the extracted records and determine whether they conform to certain policies.
84
+
Each Gobblin job minimally involves several constructs, e.g. [Source](https://github.com/apache/gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/source/Source.java), [Extractor](https://github.com/apache/gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/source/extractor/Extractor.java), [DataWriter](https://github.com/apache/gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/writer/DataWriter.java) and [DataPublisher](https://github.com/apache/gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/publisher/DataPublisher.java). As the names suggest, Source defines the source to pull data from, Extractor implements the logic to extract data records, DataWriter defines the way the extracted records are output, and DataPublisher publishes the data to the final output location. A job may optionally have one or more Converters, which transform the extracted records, as well as one or more PolicyCheckers that check the quality of the extracted records and determine whether they conform to certain policies.
85
85
86
-
Some of the classes relevant to this example include [WikipediaSource](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/java/org/apache/gobblin/example/wikipedia/WikipediaSource.java), [WikipediaExtractor](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/java/org/apache/gobblin/example/wikipedia/WikipediaExtractor.java), [WikipediaConverter](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/java/org/apache/gobblin/example/wikipedia/WikipediaConverter.java), [AvroHdfsDataWriter](https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/AvroHdfsDataWriter.java) and [BaseDataPublisher](https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/publisher/BaseDataPublisher.java).
86
+
Some of the classes relevant to this example include [WikipediaSource](https://github.com/apache/gobblin/blob/master/gobblin-example/src/main/java/org/apache/gobblin/example/wikipedia/WikipediaSource.java), [WikipediaExtractor](https://github.com/apache/gobblin/blob/master/gobblin-example/src/main/java/org/apache/gobblin/example/wikipedia/WikipediaExtractor.java), [WikipediaConverter](https://github.com/apache/gobblin/blob/master/gobblin-example/src/main/java/org/apache/gobblin/example/wikipedia/WikipediaConverter.java), [AvroHdfsDataWriter](https://github.com/apache/gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/AvroHdfsDataWriter.java) and [BaseDataPublisher](https://github.com/apache/gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/publisher/BaseDataPublisher.java).
87
87
88
-
To run Gobblin in standalone daemon mode we need a Gobblin configuration file (such as uses [application.conf](https://github.com/apache/incubator-gobblin/blob/master/conf/standalone/application.conf)). And for each job we wish to run, we also need a job configuration file (such as [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull)). The Gobblin configuration file, which is passed to Gobblin as a command line argument, should contain a property `jobconf.dir` which specifies where the job configuration files are located. By default, `jobconf.dir` points to environment variable `GOBBLIN_JOB_CONFIG_DIR`. Each file in `jobconf.dir` with extension `.job` or `.pull` is considered a job configuration file, and Gobblin will launch a job for each such file. For more information on Gobblin deployment in standalone mode, refer to the [Standalone Deployment](user-guide/Gobblin-Deployment#Standalone-Deployment) page.
88
+
To run Gobblin in standalone daemon mode we need a Gobblin configuration file (such as uses [application.conf](https://github.com/apache/gobblin/blob/master/conf/standalone/application.conf)). And for each job we wish to run, we also need a job configuration file (such as [wikipedia.pull](https://github.com/apache/gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull)). The Gobblin configuration file, which is passed to Gobblin as a command line argument, should contain a property `jobconf.dir` which specifies where the job configuration files are located. By default, `jobconf.dir` points to environment variable `GOBBLIN_JOB_CONFIG_DIR`. Each file in `jobconf.dir` with extension `.job` or `.pull` is considered a job configuration file, and Gobblin will launch a job for each such file. For more information on Gobblin deployment in standalone mode, refer to the [Standalone Deployment](user-guide/Gobblin-Deployment#Standalone-Deployment) page.
89
89
90
90
A list of commonly used configuration properties can be found here: [Configuration Properties Glossary](user-guide/Configuration-Properties-Glossary).
91
91
92
92
## Steps
93
93
94
-
* Create a folder to store the job configuration file. Put [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull) in this folder, and set environment variable `GOBBLIN_JOB_CONFIG_DIR` to point to this folder. Also, make sure that the environment variable `JAVA_HOME` is set correctly.
94
+
* Create a folder to store the job configuration file. Put [wikipedia.pull](https://github.com/apache/gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull) in this folder, and set environment variable `GOBBLIN_JOB_CONFIG_DIR` to point to this folder. Also, make sure that the environment variable `JAVA_HOME` is set correctly.
95
95
96
96
* Create a folder as Gobblin's working directory. Gobblin will write job output as well as other information there, such as locks and state-store (for more information, see the [Standalone Deployment](user-guide/Gobblin-Deployment#Standalone-Deployment) page). Set environment variable `GOBBLIN_WORK_DIR` to point to that folder.
`output.json` will contain all retrieved records in JSON format.
154
154
155
-
Note that since this job configuration file we used ([wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull)) doesn't specify a job schedule, the job will run immediately and will run only once. To schedule a job to run at a certain time and/or repeatedly, set the `job.schedule` property with a cron-based syntax. For example, `job.schedule=0 0/2 * * * ?` will run the job every two minutes. See [this link](http://www.quartz-scheduler.org/documentation/quartz-2.1.x/tutorials/crontrigger.html) (Quartz CronTrigger) for more details.
155
+
Note that since this job configuration file we used ([wikipedia.pull](https://github.com/apache/gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull)) doesn't specify a job schedule, the job will run immediately and will run only once. To schedule a job to run at a certain time and/or repeatedly, set the `job.schedule` property with a cron-based syntax. For example, `job.schedule=0 0/2 * * * ?` will run the job every two minutes. See [this link](http://www.quartz-scheduler.org/documentation/quartz-2.1.x/tutorials/crontrigger.html) (Quartz CronTrigger) for more details.
156
156
157
157
# Other Example Jobs
158
158
159
-
Besides the Wikipedia example, we have another example job [SimpleJson](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/simplejson.pull), which extracts records from JSON files and store them in Avro files.
159
+
Besides the Wikipedia example, we have another example job [SimpleJson](https://github.com/apache/gobblin/blob/master/gobblin-example/src/main/resources/simplejson.pull), which extracts records from JSON files and store them in Avro files.
160
160
161
-
To create your own jobs, simply implement the relevant interfaces such as [Source](https://github.com/apache/incubator-gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/source/Source.java), [Extractor](https://github.com/apache/incubator-gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/source/extractor/Extractor.java), [Converter](https://github.com/apache/incubator-gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/converter/Converter.java) and [DataWriter](https://github.com/apache/incubator-gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/writer/DataWriter.java). In the job configuration file, set properties such as `source.class` and `converter.class` to point to these classes.
161
+
To create your own jobs, simply implement the relevant interfaces such as [Source](https://github.com/apache/gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/source/Source.java), [Extractor](https://github.com/apache/gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/source/extractor/Extractor.java), [Converter](https://github.com/apache/gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/converter/Converter.java) and [DataWriter](https://github.com/apache/gobblin/blob/master/gobblin-api/src/main/java/org/apache/gobblin/writer/DataWriter.java). In the job configuration file, set properties such as `source.class` and `converter.class` to point to these classes.
162
162
163
-
On a side note: while users are free to directly implement the Extractor interface (e.g., WikipediaExtractor), Gobblin also provides several extractor implementations based on commonly used protocols, e.g., [KafkaExtractor](https://github.com/apache/incubator-gobblin/blob/master/gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java), [RestApiExtractor](https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/extract/restapi/RestApiExtractor.java), [JdbcExtractor](https://github.com/apache/incubator-gobblin/blob/master/gobblin-modules/gobblin-sql/src/main/java/org/apache/gobblin/source/jdbc/JdbcExtractor.java), [SftpExtractor](https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/extract/sftp/SftpExtractor.java), etc. Users are encouraged to extend these classes to take advantage of existing implementations.
163
+
On a side note: while users are free to directly implement the Extractor interface (e.g., WikipediaExtractor), Gobblin also provides several extractor implementations based on commonly used protocols, e.g., [KafkaExtractor](https://github.com/apache/gobblin/blob/master/gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java), [RestApiExtractor](https://github.com/apache/gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/extract/restapi/RestApiExtractor.java), [JdbcExtractor](https://github.com/apache/gobblin/blob/master/gobblin-modules/gobblin-sql/src/main/java/org/apache/gobblin/source/jdbc/JdbcExtractor.java), [SftpExtractor](https://github.com/apache/gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/extract/sftp/SftpExtractor.java), etc. Users are encouraged to extend these classes to take advantage of existing implementations.
0 commit comments