[Bug Report] JSON array is not supported in JSON column #136

Open
calorie opened this issue Jan 7, 2021 · 1 comment
Comments

calorie commented Jan 7, 2021

Environment

  • Embulk version: 0.9.23
  • embulk-output-elasticsearch version: 0.4.7 (latest)
  • embulk-base-restclient version: 0.5.5

Reproduction

https://github.com/calorie/embulk-repro

$ docker-compose run --rm embulk run json_to_es.yml
Creating embulk-repro_embulk_run ... done
2021-01-07 10:50:06.065 +0000: Embulk v0.9.23
2021-01-07 10:50:07.328 +0000 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.
2021-01-07 10:50:10.758 +0000 [INFO] (main): Gem's home and path are set by default: "/root/.embulk/lib/gems"
2021-01-07 10:50:11.828 +0000 [INFO] (main): Started Embulk v0.9.23
2021-01-07 10:50:12.009 +0000 [INFO] (0001:transaction): Loaded plugin embulk-output-elasticsearch (0.4.7)
2021-01-07 10:50:12.065 +0000 [INFO] (0001:transaction): Loaded plugin embulk-parser-jsonl (0.2.1)
2021-01-07 10:50:12.096 +0000 [INFO] (0001:transaction): Listing local files at directory '.' filtering filename by prefix 'json_payload.json'
2021-01-07 10:50:12.098 +0000 [INFO] (0001:transaction): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2021-01-07 10:50:12.118 +0000 [INFO] (0001:transaction): Loading files [./json_payload.json]
2021-01-07 10:50:12.162 +0000 [INFO] (0001:transaction): Using local thread executor with max_threads=8 / output tasks 4 = input tasks 1 * 4
2021-01-07 10:50:12.191 +0000 [INFO] (0001:transaction): Logging initialized @6740ms
2021-01-07 10:50:12.681 +0000 [INFO] (0001:transaction): Connecting to Elasticsearch version:7.10.1
2021-01-07 10:50:12.681 +0000 [INFO] (0001:transaction): Executing plugin with 'replace' mode.
2021-01-07 10:50:12.715 +0000 [INFO] (0001:transaction): Inserting data into index[test_20210107-105011]
2021-01-07 10:50:12.724 +0000 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2021-01-07 10:50:12.869 +0000 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
org.embulk.exec.PartialExecutionException: org.embulk.spi.DataException: Expected object node: [{"b":1}]
        at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(BulkLoader.java:340)
        at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:566)
        at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:35)
        at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:353)
        at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:350)
        at org.embulk.spi.Exec.doWith(Exec.java:22)
        at org.embulk.exec.BulkLoader.run(BulkLoader.java:350)
        at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:242)
        at org.embulk.EmbulkRunner.runInternal(EmbulkRunner.java:291)
        at org.embulk.EmbulkRunner.run(EmbulkRunner.java:155)
        at org.embulk.cli.EmbulkRun.runSubcommand(EmbulkRun.java:431)
        at org.embulk.cli.EmbulkRun.run(EmbulkRun.java:90)
        at org.embulk.cli.Main.main(Main.java:64)
Caused by: org.embulk.spi.DataException: Expected object node: [{"b":1}]
        at org.embulk.base.restclient.jackson.StringJsonParser.parseJsonObject(StringJsonParser.java:31)
        at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope$1.jsonColumn(JacksonAllInObjectScope.java:115)
        at org.embulk.spi.Column.visit(Column.java:56)
        at org.embulk.spi.Schema.visitColumns(Schema.java:68)
        at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope.scopeObject(JacksonAllInObjectScope.java:47)
        at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:17)
        at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:9)
        at org.embulk.base.restclient.record.ValueExporter.exportValueToBuildRecord(ValueExporter.java:14)
        at org.embulk.base.restclient.record.RecordExporter.exportRecord(RecordExporter.java:18)
        at org.embulk.base.restclient.RestClientPageOutput.add(RestClientPageOutput.java:43)
        at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:351)
        at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:291)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Error: org.embulk.spi.DataException: Expected object node: [{"b":1}]

json_to_es.yml

in:
  type: file
  path_prefix: ./repro.json
  parser:
    type: json
    columns:
    - {name: a, type: json}
out:
  type: elasticsearch
  mode: replace
  nodes:
  - {host: elasticsearch, port: 9200}
  index: test
  index_type: test

repro.json

{"a": [{"b": 1}]}

Expected

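JSON arrays should be accepted as values of a JSON-typed column.
The failure comes from the jsonColumn visitor in JacksonAllInObjectScope (JacksonAllInObjectScope.java:115 in the stack trace above), which always calls parseJsonObject. It needs to use parseJsonArray when the value is an array: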

public void jsonColumn(final Column column) {
    // TODO(dmikurube): Use jackson-datatype-msgpack.
    // See: https://github.com/embulk/embulk-base-restclient/issues/32
    if (!singlePageRecordReader.isNull(column)) {
        resultObject.set(
                column.getName(),
                jsonParser.parseJsonObject(singlePageRecordReader.getJson(column).toJson()));
    } else {
        resultObject.putNull(column.getName());
    }
}
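A minimal sketch of the change requested above, assuming the fix is to pick parseJsonObject or parseJsonArray based on the first non-whitespace character of the JSON text. The class JsonColumnDispatchSketch, the method parseObjectOrArray, and the two Function parameters are hypothetical stand-ins for the StringJsonParser methods, not embulk-base-restclient API. Scalars (numbers, strings, booleans, null) are still rejected here; parsing the text into a generic Jackson JsonNode (for example with ObjectMapper.readTree) and passing that to ObjectNode.set would presumably cover them as well.

```java
import java.util.function.Function;

/**
 * Hypothetical sketch: route a JSON column's text to an object parser or an
 * array parser depending on its top-level token. The Function arguments stand
 * in for StringJsonParser.parseJsonObject / parseJsonArray (assumption).
 */
public class JsonColumnDispatchSketch {

    static <T> T parseObjectOrArray(String json,
                                    Function<String, T> parseObject,
                                    Function<String, T> parseArray) {
        String text = json.trim();
        if (text.startsWith("{")) {
            return parseObject.apply(text);   // existing behavior for objects
        }
        if (text.startsWith("[")) {
            return parseArray.apply(text);    // new path for arrays such as [{"b":1}]
        }
        // Scalars are valid JSON too, but this sketch only handles objects and arrays.
        throw new IllegalArgumentException("Expected object or array node: " + json);
    }

    public static void main(String[] args) {
        // Stand-in parsers that only report which path was taken.
        Function<String, String> obj = s -> "object:" + s;
        Function<String, String> arr = s -> "array:" + s;

        System.out.println(parseObjectOrArray("[{\"b\":1}]", obj, arr));
        System.out.println(parseObjectOrArray("{\"a\":\"abc\"}", obj, arr));
    }
}
```

The dispatch keeps the existing object path untouched, so only columns whose value starts with `[` take the new branch.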

Ref: Twitter (Japanese text)

@hiroyuki-sato Thank you for your help.

hiroyuki-sato (Member) commented Jan 13, 2021

@dmikurube Could you take a look when you get a chance?

For future testing, here is a case that covers every kind of JSON value (number, string, object, array, boolean, and null):

test.csv

a,b,c,d,e,f
1,"\"test\"","{ \"a\":\"abc\" }","[1,2,3]",true,null
1,"\"test\"","{ \"a\":\"abc\" }","[1,2,3]",true,null
1,"\"test\"","{ \"a\":\"abc\" }","[1,2,3]",true,null
1,"\"test\"","{ \"a\":\"abc\" }","[1,2,3]",true,null

in:
  type: file
  path_prefix: test.csv
  parser:
    charset: UTF-8
    newline: LF
    type: csv
    delimiter: ','
    quote: '"'
    escape: \
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: a, type: json}
    - {name: b, type: json}
    - {name: c, type: json}
    - {name: d, type: json}
    - {name: e, type: json}
    - {name: f, type: json}
out: {type: stdout}
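To see which of these test values an object-only path such as parseJsonObject could handle, the sketch below classifies each column of one row by its top-level JSON kind. JsonKindSketch, Kind, and kindOf are hypothetical names; kindOf assumes its input is already valid JSON and looks only at the first non-whitespace character, so it is a classifier, not a validator. The strings are the cell values after CSV unquoting and unescaping. Only column c is an object, so it is the only one that can pass a check that expects an object node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonKindSketch {

    enum Kind { OBJECT, ARRAY, STRING, BOOLEAN, NULL, NUMBER }

    // Classify a (presumed valid) JSON text by its first non-whitespace character.
    static Kind kindOf(String json) {
        String t = json.trim();
        if (t.isEmpty()) {
            throw new IllegalArgumentException("empty JSON text");
        }
        switch (t.charAt(0)) {
            case '{': return Kind.OBJECT;
            case '[': return Kind.ARRAY;
            case '"': return Kind.STRING;
            case 't':
            case 'f': return Kind.BOOLEAN;
            case 'n': return Kind.NULL;
            default:  return Kind.NUMBER;   // digits or '-'
        }
    }

    public static void main(String[] args) {
        // One row of test.csv after the CSV parser removes quoting and escapes.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("a", "1");
        row.put("b", "\"test\"");
        row.put("c", "{ \"a\":\"abc\" }");
        row.put("d", "[1,2,3]");
        row.put("e", "true");
        row.put("f", "null");

        for (Map.Entry<String, String> e : row.entrySet()) {
            Kind k = kindOf(e.getValue());
            boolean objectOnlyOk = (k == Kind.OBJECT);
            System.out.println(e.getKey() + " -> " + k + ", passes object-only check: " + objectOnlyOk);
        }
    }
}
```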
