Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AR(IMA) implementation #15

Open
papamarkou opened this issue Feb 10, 2014 · 20 comments
Open

AR(IMA) implementation #15

papamarkou opened this issue Feb 10, 2014 · 20 comments

Comments

@papamarkou
Copy link
Contributor

Hi @milktrader. I am sorry I haven't had much time to get more actively involved in TimeModels. I wanted to ask you what would be the best way to create a model hierarchy in order to implement ARIMA. This won't be useful only to those who work with time series, it will be very beneficial for MCMC too given that several summary statistics for MCMC rely on AR(p) models.

I would be happy to try implement ARIMA or to help with it. Do you have any recommendations from which language we could use some code as prototype for ARIMA or maybe at least AR?

@ElOceanografo
Copy link

I've actually been poking at this stuff over the past couple of days, for
the first time in a long while...

R's arima() function is implemented as a state-space model, which allows it
to deal with series with missing data. That was part of my motivation for
doing the Kalman filter last year. The ar() function in R is implemented
differently, and gives you a choice of a few fitting algorithms. I've got
some skeleton code for constructing and fitting ARIMA models as state-space
models via maximum likelihood, which I can push tonight after work.

Definitely worth discussing the big-picture approach to take. I'm used to
R's time-series modeling, so that's my point of reference.

Sam

On Mon, Feb 10, 2014 at 12:38 PM, Theodore Papamarkou <
[email protected]> wrote:

Hi @milktrader https://github.com/milktrader. I am sorry I haven't had
much time to get more actively involved in TimeModels. I wanted to ask
you what would be the best way to create a model hierarchy in order to
implement ARIMA. This won't be useful only to those who work with time
series, it will be very beneficial for MCMC too given that several summary
statistics for MCMC rely on AR(p) models.

I would be happy to try implement ARIMA or to help with it. Do you have
any recommendations from which language we could use some code as prototype
for ARIMA or maybe at least AR?

Reply to this email directly or view it on GitHubhttps://github.com//issues/15
.

@AndreyKolev
Copy link
Contributor

Hi! Of course R implementation was my first thought too, but you also can look to python implementation: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/arima_model.py

Also I'm sorry I hadn't time to merge my garch code before, I think I'll have a time to do it this week finally

@papamarkou
Copy link
Contributor Author

That's great @ElOceanografo, @AndreyKolev, it would be amazing if you manage to merge arima and/or garch! There are a few points for discussion, for example if we are going to deal with missing values (I imagine yes), and therefore if we are going to work with data arrays or time arrays or float64 arrays, but these are not so urgent. As long as we have some Julia code for arima or garch to work with, we can easily change these later.

@milktrader
Copy link
Contributor

Possibly ping @carljv about model hierarchy.

Heads up on TimeSeries. It no longer supports DataFrames/DataArrays but instead has implemented the TimeArray type.

immutable TimeArray{T,N} <: AbstractTimeArray

    timestamp::Vector{Date{ISOCalendar}}
    values::Array{T,N}
    colnames::Vector{ASCIIString}


    function TimeArray(timestamp::Vector{Date{ISOCalendar}}, values::Array{T,N}, colnames::Vector{ASCIIString})
        nrow, ncol = size(values, 1), size(values, 2)
        nrow != size(timestamp, 1) ? error("values must match length of timestamp"):
        ncol != size(colnames,1) ? error("column names must match width of array"):
        timestamp != unique(timestamp) ? error("there are duplicate dates"):
        ~(flipud(timestamp) == sort(timestamp) || timestamp == sort(timestamp)) ? error("dates are mangled"):
        flipud(timestamp) == sort(timestamp) ? 
        new(flipud(timestamp), flipud(values), colnames):
        new(timestamp, values, colnames)
    end
end

The inner constructor looks like a lot of code but simply enforces obvious invariants and orders the data from youngest to oldest. Overall, the design's goals are simplicity speed. It would be interesting to take it through the paces with time models and see where the weaknesses are.

@papamarkou
Copy link
Contributor Author

Thanks @milktrader. Two questions, as I look at the TimeArray definition only now for the first time:

  • Does it deal with missing values?
  • What info does the timestamp field hold? If you can provide a very simple example of creating a timearray, this will help us use it in TimeModels.

Cheers!

@milktrader
Copy link
Contributor

It does not yet deal with missing values. That missing piece is being explored in the DataArrays branch. The creation of a TimeArray is straightforward.

julia> mydates = [today():days(1):date(2014,3,1)]
20-element Array{Date{ISOCalendar},1}:
 2014-02-10
 2014-02-11
 2014-02-12
 2014-02-13
 2014-02-14
 2014-02-15
 
 2014-02-24
 2014-02-25
 2014-02-26
 2014-02-27
 2014-02-28
 2014-03-01

julia> myarray = [rand(20) rand(20)];

julia> mycolnames = ["r1", "r2"];

julia> mytimearray = TimeArray(mydates, myarray, mycolnames)
20x2 TimeArray{Float64,2} 2014-02-10 to 2014-03-01

             r1    r2
2014-02-10 | 0.58  0.13
2014-02-11 | 0.81  0.64
2014-02-12 | 0.18  0.07
2014-02-13 | 0.54  0.07
...
2014-02-25 | 0.42  0.99
2014-02-26 | 0.49  0.47
2014-02-27 | 0.77  0.19
2014-02-28 | 0.97  0.55
2014-03-01 | 0.45  0.72

The show method is still brittle and doesn't support long column names.

@papamarkou
Copy link
Contributor Author

That's really helpful. One more question; is there any constructor that allows leaving the timestamp empty or to some default value if the user of a time series model is not so much interested in the timestamp field, but mostly or only wants to look at the values field from a modelling point of view?

@milktrader
Copy link
Contributor

If you don't mind cloning the MarketData package you get some const objects to play with.

julia> Pkg.clone("git://github.com/JuliaQuant/MarketData.jl.git")

julia> using MarketData

julia> OHLC
16103x6 TimeArray{Float64,2} 1950-01-03 to 2013-12-31

             Open     High     Low      Close    Volume          Adj Cl
1950-01-03 | 16.66  16.66  16.66  16.66  1260000.00  16.66
1950-01-04 | 16.85  16.85  16.85  16.85  1890000.00  16.85
1950-01-05 | 16.93  16.93  16.93  16.93  2550000.00  16.93
1950-01-06 | 16.98  16.98  16.98  16.98  2010000.00  16.98
...
2013-12-24 | 1828.02  1833.32  1828.02  1833.32  1307630000.00  1833.32
2013-12-26 | 1834.96  1842.84  1834.96  1842.02  1982270000.00  1842.02
2013-12-27 | 1842.97  1844.89  1839.81  1841.40  2052920000.00  1841.40
2013-12-30 | 1841.47  1842.47  1838.77  1841.07  0.00  1841.07
2013-12-31 | 1842.61  1849.44  1842.41  1848.36  2312840000.00  1848.36

Yeah, the show method needs some work

@milktrader
Copy link
Contributor

Dropping down to just an array is simple

julia> OHLC.values
16103x6 Array{Float64,2}:
   16.66    16.66    16.66    16.66  1.26e6       16.66
   16.85    16.85    16.85    16.85  1.89e6       16.85
   16.93    16.93    16.93    16.93  2.55e6       16.93
   16.98    16.98    16.98    16.98  2.01e6       16.98
   17.08    17.08    17.08    17.08  2.52e6       17.08
   17.03    17.03    17.03    17.03  2.16e6       17.03
                                                  
 1822.92  1829.75  1822.92  1827.99  2.85154e9  1827.99
 1828.02  1833.32  1828.02  1833.32  1.30763e9  1833.32
 1834.96  1842.84  1834.96  1842.02  1.98227e9  1842.02
 1842.97  1844.89  1839.81  1841.4   2.05292e9  1841.4
 1841.47  1842.47  1838.77  1841.07  0.0        1841.07
 1842.61  1849.44  1842.41  1848.36  2.31284e9  1848.36

@papamarkou
Copy link
Contributor Author

That's good. What I had in mind is this situation; once the time series models rely on time arrays (as their input or output), the user may just have a float64 array, i.e. the values field of the time array. Then he would have to create a time array where the timestamp would be nothing or populated by sth as default, so as to then pass this time array to the model. Anyway, this thinking can be omitted for later, no reason to worry about all this from now, let's get first the models coded and we can discuss about such mattters later :)

@milktrader
Copy link
Contributor

@AndreyKolev I've looked into the best way to merge two repositories like these and it's actually not a simple matter. Given that the original TimeModels doesn't have a deep history and the history for GARCH is relatively short, I think the best approach is as follows (but let's discuss this before someone pulls the trigger)

  1. go to the directory where both GARCH and TimeModels repositories exist (probably your .julia directory)
  2. make sure both repositories are up-to-date with master (git pull)
  3. $ cd TimeModels
  4. $ git remote add GARCH ../GARCH
  5. $ git remote update
  6. $ git checkout -b garch
  7. $ git merge GARCH/master

Now you have a branch called garch in your TimeModels repository. Once this is done, whoever does this should do a PR and we can just merge it. The current TimeModels has very little code in it and nothing should get squashed that would require pesky MERGING housekeeping.

@milktrader
Copy link
Contributor

@Scidom I've been creating dummy timestamp data doing this.

[date(1,1,1):years(1):date(variable_that_has_length,1,1)]

So if you want 100 observations with dummy timestamps, you'd do:

[date(1,1,1):years(1):date(100,1,1)]

@AndreyKolev
Copy link
Contributor

@milktrader OK I can try it

@milktrader
Copy link
Contributor

@AndreyKolev I just noticed we already have a branch named garch. It's probably best to delete it first. You should do it if you're going to do the merge. You can do it from the github.com interface or from command line git branch -D garch should do it.

@papamarkou
Copy link
Contributor Author

Just a quick question so that I Know where we are standing with ARIMA. @ElOceanografo is ARIMA coding for now in your realm of activity? This is perfectly fine, just want to know so that we hold off from any coding in case you are/will be working on it.

@ElOceanografo
Copy link

In theory yes...though I'm pretty busy with other work right now. If
you're ready to dive in, go ahead; I'm not sure how much time I'll have for
it in the next ~week.

The idea for the state space ARIMA-fitting procedure (in case it's not
clear from my code) is:

-specify order of ARIMA model: p, d, q
-make "build" function that turns a vector of length p + q (i.e., the
concatenated AR and MA coefficients) into a StateSpaceModel
-make an "objective" function that Kalman-filters the data with that SSM
and returns the -log likelihood
-optimize the objective function over the values of the AR and MA
coefficients

I think we should probably have classic methods (Levinson-Durbin, OLS)
available too. I don't have any immediate plans to work on those, so
they're up for grabs if anyone wants to call them...

Sam

On Wed, Feb 12, 2014 at 12:59 PM, Theodore Papamarkou <
[email protected]> wrote:

Just a quick question so that I Know where we are standing with ARIMA.
@ElOceanografo https://github.com/ElOceanografo is ARIMA coding for now
in your realm of activity? This is perfectly fine, just want to know so
that we hold off from any coding in case you are/will be working on it.

Reply to this email directly or view it on GitHubhttps://github.com//issues/15#issuecomment-34896363
.

@papamarkou
Copy link
Contributor Author

A week is more than fine - I don't think I will have time within the next 2-3 weeks. If let's say after a month none of us has made any progress, then I will try to make time to lay my hands on it.

@ElOceanografo
Copy link

Alright, sounds good.

On Wed, Feb 12, 2014 at 2:26 PM, Theodore Papamarkou <
[email protected]> wrote:

A week is more than fine - I don't think I will have time within the next
2-3 weeks. If let's say after a month none of us has made any progress,
then I will try to make time to lay my hands on it.

Reply to this email directly or view it on GitHubhttps://github.com//issues/15#issuecomment-34906159
.

@mishrasubhajit
Copy link

Hello, why different languages returns different coefficients of an AR data? The data being the same. I have checked with R, Python and MATLAB, and coefficients are different in each languages. What is reason behind this? Thanks.

@milktrader
Copy link
Contributor

What data are you using? And are the differences substantial?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants