robinhood-jim/GenericFileDB


GenericFileDB


Common Generic DataFile DB V1.0 aims to ingest many kinds of data files (CSV, JSON, XML, Avro, ORC, Parquet, Protobuf, and Apache Arrow) and to filter, group, and order their records using plain SQL, without first loading the data into a database or a Hadoop file system. Data files can be read from the local file system, HDFS, Apache VFS, AWS S3, Google Cloud Storage, MinIO, Aliyun, Tencent COS, Baidu BOS, Huawei OBS, and other sources. Files smaller than 4 GB can be processed without being written to a temporary path; ORC, Parquet, and Arrow binary files larger than 4 GB must be downloaded first.
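To make the idea of querying file records without a database concrete, here is a minimal sketch in plain Java. It does not use the GenericFileDB API: the class `SqlLikeSketch`, the method `countByName`, and the sample rows are hypothetical names chosen for illustration, and the comma split is deliberately naive (no quoted fields). It only shows what a query such as `SELECT name, COUNT(*) FROM rows WHERE id > 1 GROUP BY name ORDER BY name` means when evaluated directly over in-memory records.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of GenericFileDB.
class SqlLikeSketch {

    // Roughly: SELECT name, COUNT(*) FROM rows WHERE id > minId GROUP BY name ORDER BY name
    // Each line is expected to look like "id,name,description".
    static Map<String, Long> countByName(List<String> csvLines, long minId) {
        return csvLines.stream()
                // Naive split: assumes no quoted or escaped commas.
                .map(line -> line.split(",", -1))
                // WHERE id > minId
                .filter(cols -> Long.parseLong(cols[0]) > minId)
                // GROUP BY name with COUNT(*); the TreeMap keeps keys sorted (ORDER BY name).
                .collect(Collectors.groupingBy(cols -> cols[1], TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1,alice,first",
                "2,bob,second",
                "3,bob,third",
                "4,alice,fourth");
        System.out.println(countByName(lines, 1L)); // prints {alice=1, bob=2}
    }
}
```

Nothing is written to a database or temporary store here; the records are filtered, grouped, and ordered as they stream through memory, which is the general behavior the description above refers to.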

Prerequisites

  • Java 11 or above
  • Maven 3.8.6 or above
  • Add the following dependency to your pom.xml:
<dependency>
    <groupId>com.robin.gfdb</groupId>
    <artifactId>core</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>

Examples

Read a CSV file through a FileSystemAccessor

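    // Describe the columns of the CSV file.
    DataCollectionMeta.Builder builder=new DataCollectionMeta.Builder();
    builder.addColumn("id", Const.META_TYPE_BIGINT,null);
    builder.addColumn("name",Const.META_TYPE_STRING,null);
    builder.addColumn("description",Const.META_TYPE_STRING,null);
        ......
    // `meta` below is the DataCollectionMeta produced from the builder in the elided steps above.
    // try-with-resources closes the reader and the file system automatically.
    try(LocalFileSystem fileSystem=LocalFileSystem.getInstance();
        AbstractFileReader reader=new CsvFileReader(meta,fileSystem)){
        fileSystem.init(meta);
        reader.init();
        // Iterate over the records and log each one.
        while(reader.hasNext()){
            var outputMap=reader.next();
            log.info("{}",outputMap);
        }
    }finally {
        // Cleanup call kept from the original example.
        CommRecordFilter.close();
    }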
