AR(IMA) implementation #15

papamarkou · 2014-02-10T17:38:16Z

Hi @milktrader. I am sorry I haven't had much time to get more actively involved in TimeModels. I wanted to ask you what would be the best way to create a model hierarchy in order to implement ARIMA. This won't be useful only to those who work with time series, it will be very beneficial for MCMC too given that several summary statistics for MCMC rely on AR(p) models.

I would be happy to try implement ARIMA or to help with it. Do you have any recommendations from which language we could use some code as prototype for ARIMA or maybe at least AR?

The text was updated successfully, but these errors were encountered:

ElOceanografo · 2014-02-10T18:09:41Z

I've actually been poking at this stuff over the past couple of days, for
the first time in a long while...

R's arima() function is implemented as a state-space model, which allows it
to deal with series with missing data. That was part of my motivation for
doing the Kalman filter last year. The ar() function in R is implemented
differently, and gives you a choice of a few fitting algorithms. I've got
some skeleton code for constructing and fitting ARIMA models as state-space
models via maximum likelihood, which I can push tonight after work.

Definitely worth discussing the big-picture approach to take. I'm used to
R's time-series modeling, so that's my point of reference.

Sam

On Mon, Feb 10, 2014 at 12:38 PM, Theodore Papamarkou <
[email protected]> wrote:

Hi @milktrader https://github.com/milktrader. I am sorry I haven't had
much time to get more actively involved in TimeModels. I wanted to ask
you what would be the best way to create a model hierarchy in order to
implement ARIMA. This won't be useful only to those who work with time
series, it will be very beneficial for MCMC too given that several summary
statistics for MCMC rely on AR(p) models.

I would be happy to try implement ARIMA or to help with it. Do you have
any recommendations from which language we could use some code as prototype
for ARIMA or maybe at least AR?

Reply to this email directly or view it on GitHubhttps://github.com//issues/15
.

AndreyKolev · 2014-02-10T18:27:34Z

Hi! Of course R implementation was my first thought too, but you also can look to python implementation: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/arima_model.py

Also I'm sorry I hadn't time to merge my garch code before, I think I'll have a time to do it this week finally

papamarkou · 2014-02-10T19:13:13Z

That's great @ElOceanografo, @AndreyKolev, it would be amazing if you manage to merge arima and/or garch! There are a few points for discussion, for example if we are going to deal with missing values (I imagine yes), and therefore if we are going to work with data arrays or time arrays or float64 arrays, but these are not so urgent. As long as we have some Julia code for arima or garch to work with, we can easily change these later.

milktrader · 2014-02-10T19:28:18Z

Possibly ping @carljv about model hierarchy.

Heads up on TimeSeries. It no longer supports DataFrames/DataArrays but instead has implemented the TimeArray type.

immutable TimeArray{T,N} <: AbstractTimeArray

    timestamp::Vector{Date{ISOCalendar}}
    values::Array{T,N}
    colnames::Vector{ASCIIString}


    function TimeArray(timestamp::Vector{Date{ISOCalendar}}, values::Array{T,N}, colnames::Vector{ASCIIString})
        nrow, ncol = size(values, 1), size(values, 2)
        nrow != size(timestamp, 1) ? error("values must match length of timestamp"):
        ncol != size(colnames,1) ? error("column names must match width of array"):
        timestamp != unique(timestamp) ? error("there are duplicate dates"):
        ~(flipud(timestamp) == sort(timestamp) || timestamp == sort(timestamp)) ? error("dates are mangled"):
        flipud(timestamp) == sort(timestamp) ? 
        new(flipud(timestamp), flipud(values), colnames):
        new(timestamp, values, colnames)
    end
end

The inner constructor looks like a lot of code but simply enforces obvious invariants and orders the data from youngest to oldest. Overall, the design's goals are simplicity speed. It would be interesting to take it through the paces with time models and see where the weaknesses are.

papamarkou · 2014-02-10T19:31:58Z

Thanks @milktrader. Two questions, as I look at the TimeArray definition only now for the first time:

Does it deal with missing values?
What info does the timestamp field hold? If you can provide a very simple example of creating a timearray, this will help us use it in TimeModels.

Cheers!

milktrader · 2014-02-10T19:38:12Z

It does not yet deal with missing values. That missing piece is being explored in the DataArrays branch. The creation of a TimeArray is straightforward.

julia> mydates = [today():days(1):date(2014,3,1)]
20-element Array{Date{ISOCalendar},1}:
 2014-02-10
 2014-02-11
 2014-02-12
 2014-02-13
 2014-02-14
 2014-02-15
 ⋮
 2014-02-24
 2014-02-25
 2014-02-26
 2014-02-27
 2014-02-28
 2014-03-01

julia> myarray = [rand(20) rand(20)];

julia> mycolnames = ["r1", "r2"];

julia> mytimearray = TimeArray(mydates, myarray, mycolnames)
20x2 TimeArray{Float64,2} 2014-02-10 to 2014-03-01

             r1    r2
2014-02-10 | 0.58  0.13
2014-02-11 | 0.81  0.64
2014-02-12 | 0.18  0.07
2014-02-13 | 0.54  0.07
...
2014-02-25 | 0.42  0.99
2014-02-26 | 0.49  0.47
2014-02-27 | 0.77  0.19
2014-02-28 | 0.97  0.55
2014-03-01 | 0.45  0.72

The show method is still brittle and doesn't support long column names.

papamarkou · 2014-02-10T19:41:13Z

That's really helpful. One more question; is there any constructor that allows leaving the timestamp empty or to some default value if the user of a time series model is not so much interested in the timestamp field, but mostly or only wants to look at the values field from a modelling point of view?

milktrader · 2014-02-10T19:41:34Z

If you don't mind cloning the MarketData package you get some const objects to play with.

julia> Pkg.clone("git://github.com/JuliaQuant/MarketData.jl.git")

julia> using MarketData

julia> OHLC
16103x6 TimeArray{Float64,2} 1950-01-03 to 2013-12-31

             Open     High     Low      Close    Volume          Adj Cl
1950-01-03 | 16.66  16.66  16.66  16.66  1260000.00  16.66
1950-01-04 | 16.85  16.85  16.85  16.85  1890000.00  16.85
1950-01-05 | 16.93  16.93  16.93  16.93  2550000.00  16.93
1950-01-06 | 16.98  16.98  16.98  16.98  2010000.00  16.98
...
2013-12-24 | 1828.02  1833.32  1828.02  1833.32  1307630000.00  1833.32
2013-12-26 | 1834.96  1842.84  1834.96  1842.02  1982270000.00  1842.02
2013-12-27 | 1842.97  1844.89  1839.81  1841.40  2052920000.00  1841.40
2013-12-30 | 1841.47  1842.47  1838.77  1841.07  0.00  1841.07
2013-12-31 | 1842.61  1849.44  1842.41  1848.36  2312840000.00  1848.36

Yeah, the show method needs some work

milktrader · 2014-02-10T19:42:45Z

Dropping down to just an array is simple

julia> OHLC.values
16103x6 Array{Float64,2}:
   16.66    16.66    16.66    16.66  1.26e6       16.66
   16.85    16.85    16.85    16.85  1.89e6       16.85
   16.93    16.93    16.93    16.93  2.55e6       16.93
   16.98    16.98    16.98    16.98  2.01e6       16.98
   17.08    17.08    17.08    17.08  2.52e6       17.08
   17.03    17.03    17.03    17.03  2.16e6       17.03
    ⋮                                              ⋮
 1822.92  1829.75  1822.92  1827.99  2.85154e9  1827.99
 1828.02  1833.32  1828.02  1833.32  1.30763e9  1833.32
 1834.96  1842.84  1834.96  1842.02  1.98227e9  1842.02
 1842.97  1844.89  1839.81  1841.4   2.05292e9  1841.4
 1841.47  1842.47  1838.77  1841.07  0.0        1841.07
 1842.61  1849.44  1842.41  1848.36  2.31284e9  1848.36

papamarkou · 2014-02-10T19:47:30Z

That's good. What I had in mind is this situation; once the time series models rely on time arrays (as their input or output), the user may just have a float64 array, i.e. the values field of the time array. Then he would have to create a time array where the timestamp would be nothing or populated by sth as default, so as to then pass this time array to the model. Anyway, this thinking can be omitted for later, no reason to worry about all this from now, let's get first the models coded and we can discuss about such mattters later :)

milktrader · 2014-02-10T19:56:50Z

@AndreyKolev I've looked into the best way to merge two repositories like these and it's actually not a simple matter. Given that the original TimeModels doesn't have a deep history and the history for GARCH is relatively short, I think the best approach is as follows (but let's discuss this before someone pulls the trigger)

go to the directory where both GARCH and TimeModels repositories exist (probably your .julia directory)
make sure both repositories are up-to-date with master (git pull)
$ cd TimeModels
$ git remote add GARCH ../GARCH
$ git remote update
$ git checkout -b garch
$ git merge GARCH/master

Now you have a branch called garch in your TimeModels repository. Once this is done, whoever does this should do a PR and we can just merge it. The current TimeModels has very little code in it and nothing should get squashed that would require pesky MERGING housekeeping.

milktrader · 2014-02-10T20:01:16Z

@Scidom I've been creating dummy timestamp data doing this.

[date(1,1,1):years(1):date(variable_that_has_length,1,1)]

So if you want 100 observations with dummy timestamps, you'd do:

[date(1,1,1):years(1):date(100,1,1)]

AndreyKolev · 2014-02-11T20:20:29Z

@milktrader OK I can try it

milktrader · 2014-02-11T22:05:48Z

@AndreyKolev I just noticed we already have a branch named garch. It's probably best to delete it first. You should do it if you're going to do the merge. You can do it from the github.com interface or from command line git branch -D garch should do it.

papamarkou · 2014-02-12T17:59:48Z

Just a quick question so that I Know where we are standing with ARIMA. @ElOceanografo is ARIMA coding for now in your realm of activity? This is perfectly fine, just want to know so that we hold off from any coding in case you are/will be working on it.

ElOceanografo · 2014-02-12T18:48:16Z

In theory yes...though I'm pretty busy with other work right now. If
you're ready to dive in, go ahead; I'm not sure how much time I'll have for
it in the next ~week.

The idea for the state space ARIMA-fitting procedure (in case it's not
clear from my code) is:

-specify order of ARIMA model: p, d, q
-make "build" function that turns a vector of length p + q (i.e., the
concatenated AR and MA coefficients) into a StateSpaceModel
-make an "objective" function that Kalman-filters the data with that SSM
and returns the -log likelihood
-optimize the objective function over the values of the AR and MA
coefficients

I think we should probably have classic methods (Levinson-Durbin, OLS)
available too. I don't have any immediate plans to work on those, so
they're up for grabs if anyone wants to call them...

Sam

On Wed, Feb 12, 2014 at 12:59 PM, Theodore Papamarkou <
[email protected]> wrote:

Just a quick question so that I Know where we are standing with ARIMA.
@ElOceanografo https://github.com/ElOceanografo is ARIMA coding for now
in your realm of activity? This is perfectly fine, just want to know so
that we hold off from any coding in case you are/will be working on it.

Reply to this email directly or view it on GitHubhttps://github.com//issues/15#issuecomment-34896363
.

papamarkou · 2014-02-12T19:26:15Z

A week is more than fine - I don't think I will have time within the next 2-3 weeks. If let's say after a month none of us has made any progress, then I will try to make time to lay my hands on it.

ElOceanografo · 2014-02-12T19:27:32Z

Alright, sounds good.

On Wed, Feb 12, 2014 at 2:26 PM, Theodore Papamarkou <
[email protected]> wrote:

A week is more than fine - I don't think I will have time within the next
2-3 weeks. If let's say after a month none of us has made any progress,
then I will try to make time to lay my hands on it.

Reply to this email directly or view it on GitHubhttps://github.com//issues/15#issuecomment-34906159
.

mishrasubhajit · 2015-07-15T05:08:30Z

Hello, why different languages returns different coefficients of an AR data? The data being the same. I have checked with R, Python and MATLAB, and coefficients are different in each languages. What is reason behind this? Thanks.

milktrader · 2015-07-15T17:43:45Z

What data are you using? And are the differences substantial?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AR(IMA) implementation #15

AR(IMA) implementation #15

papamarkou commented Feb 10, 2014

ElOceanografo commented Feb 10, 2014

AndreyKolev commented Feb 10, 2014

papamarkou commented Feb 10, 2014

milktrader commented Feb 10, 2014

papamarkou commented Feb 10, 2014

milktrader commented Feb 10, 2014

papamarkou commented Feb 10, 2014

milktrader commented Feb 10, 2014

milktrader commented Feb 10, 2014

papamarkou commented Feb 10, 2014

milktrader commented Feb 10, 2014

milktrader commented Feb 10, 2014

AndreyKolev commented Feb 11, 2014

milktrader commented Feb 11, 2014

papamarkou commented Feb 12, 2014

ElOceanografo commented Feb 12, 2014

papamarkou commented Feb 12, 2014

ElOceanografo commented Feb 12, 2014

mishrasubhajit commented Jul 15, 2015

milktrader commented Jul 15, 2015

AR(IMA) implementation #15

AR(IMA) implementation #15

Comments

papamarkou commented Feb 10, 2014

ElOceanografo commented Feb 10, 2014

AndreyKolev commented Feb 10, 2014

papamarkou commented Feb 10, 2014

milktrader commented Feb 10, 2014

papamarkou commented Feb 10, 2014

milktrader commented Feb 10, 2014

papamarkou commented Feb 10, 2014

milktrader commented Feb 10, 2014

milktrader commented Feb 10, 2014

papamarkou commented Feb 10, 2014

milktrader commented Feb 10, 2014

milktrader commented Feb 10, 2014

AndreyKolev commented Feb 11, 2014

milktrader commented Feb 11, 2014

papamarkou commented Feb 12, 2014

ElOceanografo commented Feb 12, 2014

papamarkou commented Feb 12, 2014

ElOceanografo commented Feb 12, 2014

mishrasubhajit commented Jul 15, 2015

milktrader commented Jul 15, 2015