Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert DataFrames.DataFrame <-> TimeSeries.TimeArray #46

Closed
femtotrader opened this issue Jul 10, 2017 · 8 comments
Closed

Convert DataFrames.DataFrame <-> TimeSeries.TimeArray #46

femtotrader opened this issue Jul 10, 2017 · 8 comments
Labels

Comments

@femtotrader
Copy link

femtotrader commented Jul 10, 2017

Hello,

I'm looking for a way to convert DataFrames.DataFrame to (and from) TimeSeries.TimeArray using IterableTables.jl.
but it doesn't seems to work as I was expecting

TimeSeries.TimeArray -> DataFrames.DataFrame

Create TimeSeries.TimeArray

julia> using TimeSeries
julia> dat = "Date,Open,High,Low,Close,Volume
2012-01-03,11.0,11.25,10.99,11.13,45709900
2012-01-04,11.15,11.53,11.07,11.3,79725200
2012-01-05,11.33,11.63,11.24,11.59,67877500
2012-01-06,11.74,11.8,11.52,11.71,59840700
2012-01-09,11.83,11.95,11.7,11.8,53981500
2012-01-10,12.0,12.05,11.63,11.8,121750600
2012-01-11,11.74,12.18,11.65,12.07,63806000"
julia> io2 = IOBuffer(dat)
julia> ta2 = readtimearray(io2)

Convert to DataFrame

julia> using IterableTables
julia> using DataFrames
julia> DataFrame(ta2)
7×6 DataFrames.DataFrame
│ Row │ timestamp  │ Open  │ High  │ Low   │ Close │ Volume    │
├─────┼────────────┼───────┼───────┼───────┼───────┼───────────┤
│ 12012-01-0311.011.2510.9911.134.57099e7 │
│ 22012-01-0411.1511.5311.0711.37.97252e7 │
│ 32012-01-0511.3311.6311.2411.596.78775e7 │
│ 42012-01-0611.7411.811.5211.715.98407e7 │
│ 52012-01-0911.8311.9511.711.85.39815e7 │
│ 62012-01-1012.012.0511.6311.81.21751e8 │
│ 72012-01-1111.7412.1811.6512.076.3806e7

works fine but...

DataFrames.DataFrame -> TimeSeries.TimeArray

Create DataFrames.DataFrame

julia> using DataFrames
julia> dat = "Date,Open,High,Low,Close,Volume
2012-01-03,11.0,11.25,10.99,11.13,45709900
2012-01-04,11.15,11.53,11.07,11.3,79725200
2012-01-05,11.33,11.63,11.24,11.59,67877500
2012-01-06,11.74,11.8,11.52,11.71,59840700
2012-01-09,11.83,11.95,11.7,11.8,53981500
2012-01-10,12.0,12.05,11.63,11.8,121750600
2012-01-11,11.74,12.18,11.65,12.07,63806000"
julia> io = IOBuffer(dat)
julia> df = readtable(io)
julia> df[:Date] = Date(df[:Date])

Convert to TimeSeries.TimeArray

julia> using IterableTables
julia> using TimeSeries
julia> TimeArray(df[[:Date, :Open, :High, :Low, :Close]], timestamp_column=:Date)
ERROR: MethodError: no method matching TimeSeries.TimeArray(::DataFrames.DataFrame; timestamp_column=:Date)

Any idea?

Kind regards

PS:
Following JuliaStats/TimeSeries.jl#290
and femtotrader/TimeSeriesIO.jl#19

@davidanthoff
Copy link
Member

Ah, this is a bug! Should be fixed in #47 and I'll tag a new release asap. Thanks for reporting!

@femtotrader
Copy link
Author

Thanks @davidanthoff
You can add theses examples to your test bases to catch this kind of issue.

@davidanthoff
Copy link
Member

You can add theses examples to your test bases to catch this kind of issue.

Yes, that would be very useful: now tracked in #48.

@femtotrader
Copy link
Author

femtotrader commented Jul 11, 2017

That's why I opened JuliaStats/TimeSeries.jl#297
it's very convenient for tests.
All Julia tabular data structures with CSV like input (DataFrame, TimeArray...) should accept IOBuffer (not only filename)

@davidanthoff
Copy link
Member

All Julia tabular data structure with CSV like input (DataFrame, TimeArray...) should accept IOBuffer (not only filename)

There is probably a way to achieve this via the new CSVFiles.jl stuff. Essentially the idea would be that it would be enough to accept an iterable table in a constructor and use the stuff from CSVFiles.jl to convert a stream in CSV format into an iterable table. I'm tracking this in queryverse/CSVFiles.jl#5.

@davidanthoff
Copy link
Member

Once queryverse/CSVFiles.jl#6 lands, one should be able to do this:

using FileIO, CSVFiles, DataFrame


dat = """Date,Open,High,Low,Close,Volume
2012-01-03,11.0,11.25,10.99,11.13,45709900
2012-01-04,11.15,11.53,11.07,11.3,79725200
2012-01-05,11.33,11.63,11.24,11.59,67877500
2012-01-06,11.74,11.8,11.52,11.71,59840700
2012-01-09,11.83,11.95,11.7,11.8,53981500
2012-01-10,12.0,12.05,11.63,11.8,121750600
2012-01-11,11.74,12.18,11.65,12.07,63806000"""

io = IOBuffer(dat)

df = DataFrame(load(Stream(format"CSV", io)))
# Or using pipe syntax, once I merge one more thing
df = Stream(format"CSV", io) |> load() |> DataFrame

You should be able to substitute any of the IterableTables.jl sinks in for DataFrame and it should all work.

@davidanthoff
Copy link
Member

Alright, bug fix is released and tagged, so a Pkg.update() should be enough to make this work. Thanks again for reporting this.

@femtotrader
Copy link
Author

All right!

julia> TimeArray(df[[:Date, :Open, :High, :Low, :Close]], timestamp_column=:Date)
(WARNINGS)
7x4 TimeSeries.TimeArray{Float64,2,Date,Array{Float64,2}} 2012-01-03 to 2012-01-11

             Open     High     Low      Close
2012-01-03 | 11.0     11.25    10.99    11.13
2012-01-04 | 11.15    11.53    11.07    11.3
2012-01-05 | 11.33    11.63    11.24    11.59
2012-01-06 | 11.74    11.8     11.52    11.71
2012-01-09 | 11.83    11.95    11.7     11.8
2012-01-10 | 12.0     12.05    11.63    11.8
2012-01-11 | 11.74    12.18    11.65    12.07

Thanks @davidanthoff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants