From 695d4ebb87209d880ddbb25c418252e85264d603 Mon Sep 17 00:00:00 2001
From: Dmitry Selivanov <selivanov.dmitriy@gmail.com>
Date: Wed, 3 Jul 2024 14:02:19 +0800
Subject: [PATCH] cran 0.5.2

---
 DESCRIPTION              | 14 +++++++-------
 R/SoftALS.R              |  2 +-
 R/model_ScaleNormalize.R |  2 +-
 README.md                | 23 +++++++----------------
 man/ScaleNormalize.Rd    |  2 +-
 man/soft_impute.Rd       |  2 +-
 6 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index e296d87..58abdf2 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,15 +1,15 @@
 Package: rsparse
 Type: Package
 Title: Statistical Learning on Sparse Matrices
-Version: 0.5.1
+Version: 0.5.2
 Authors@R: c(
-    person("Dmitriy", "Selivanov", role=c("aut", "cre", "cph"), email="ds@rexy.ai",
+    person("Dmitriy", "Selivanov", role=c("aut", "cre", "cph"), email="selivanov.dmitriy@gmail.com",
       comment = c(ORCID = "0000-0001-5413-1506")),
     person("David", "Cortes", role="ctb"),
     person("Drew", "Schmidt", role="ctb", comment="configure script for BLAS, LAPACK detection"),
     person("Wei-Chen", "Chen", role="ctb", comment="configure script and work on linking to float package")
     )
-Maintainer: Dmitriy Selivanov <ds@rexy.ai>
+Maintainer: Dmitriy Selivanov <selivanov.dmitriy@gmail.com>
 Description: Implements many algorithms for statistical learning on
     sparse matrices - matrix factorizations, matrix completion,
     elastic net regressions, factorization machines.
@@ -27,7 +27,7 @@ Description: Implements many algorithms for statistical learning on
     (2005, )
     3) Fast Truncated Singular Value Decomposition (SVD),
     Soft-Thresholded SVD, Soft-Impute matrix completion via ALS -
     paper by Hastie, Mazumder
-    et al. (2014, )
+    et al. (2014, )
     4) Linear-Flow matrix factorization, from 'Practical linear models
     for large-scale one-class collaborative filtering' by Sedhain, Bui, Kawale et al (2016, ISBN:978-1-57735-770-4)
@@ -55,7 +55,7 @@ Suggests:
     testthat,
     covr
 StagedInstall: TRUE
-URL: https://github.com/rexyai/rsparse
-BugReports: https://github.com/rexyai/rsparse/issues
-RoxygenNote: 7.2.1
+URL: https://github.com/dselivanov/rsparse
+BugReports: https://github.com/dselivanov/rsparse/issues
+RoxygenNote: 7.3.1
 NeedsCompilation: yes
diff --git a/R/SoftALS.R b/R/SoftALS.R
index 5c2356a..9eb9f74 100644
--- a/R/SoftALS.R
+++ b/R/SoftALS.R
@@ -2,7 +2,7 @@
 #' @description Fit SoftImpute/SoftSVD via fast alternating least squares. Based on the
 #' paper by Trevor Hastie, Rahul Mazumder, Jason D. Lee, Reza Zadeh
 #' by "Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares" -
-#' \url{https://arxiv.org/pdf/1410.2596.pdf}
+#' \url{http://arxiv.org/pdf/1410.2596}
 #' @param x sparse matrix. Both CSR \code{dgRMatrix} and CSC \code{dgCMatrix} are supported.
 #' CSR matrix is preffered because in this case algorithm will benefit from multithreaded
 #' CSR * dense matrix products (if OpenMP is supported on your platform).
diff --git a/R/model_ScaleNormalize.R b/R/model_ScaleNormalize.R
index 60ba782..b858798 100644
--- a/R/model_ScaleNormalize.R
+++ b/R/model_ScaleNormalize.R
@@ -2,7 +2,7 @@
 #' @description scales input user-item interaction matrix as per eq (16) from the paper.
 #' Usage of such rescaled matrix with [PureSVD] model will be equal to running PureSVD
 #' on the scaled cosine-based inter-item similarity matrix.
-#' @references See \href{https://arxiv.org/pdf/1511.06033.pdf}{EigenRec: Generalizing PureSVD for
+#' @references See \href{http://arxiv.org/pdf/1511.06033}{EigenRec: Generalizing PureSVD for
 #' Effective and Efficient Top-N Recommendations} for details.
 #' @export
 ScaleNormalize = R6::R6Class(
diff --git a/README.md b/README.md
index f2aea63..bc5cac3 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,9 @@
 # rsparse

-[![R build status](https://github.com/rexyai/rsparse/workflows/R-CMD-check/badge.svg)](https://github.com/rexyai/rsparse/actions)
-[![codecov](https://codecov.io/gh/rexyai/rsparse/branch/master/graph/badge.svg)](https://codecov.io/gh/rexyai/rsparse/branch/master)
+[![R build status](https://github.com/rexyai/rsparse/workflows/R-CMD-check/badge.svg)](https://github.com/dselivanov/rsparse/actions)
+[![codecov](https://codecov.io/gh/rexyai/rsparse/branch/master/graph/badge.svg)](https://app.codecov.io/gh/rexyai/rsparse/branch/master)
 [![License](https://eddelbuettel.github.io/badges/GPL2+.svg)](http://www.gnu.org/licenses/gpl-2.0.html)
 [![Project Status](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#maturing)
-
 `rsparse` is an R package for statistical learning primarily on **sparse matrices** - **matrix factorizations, factorization machines, out-of-core regression**. Many of the implemented algorithms are particularly useful for **recommender systems** and **NLP**.
@@ -15,10 +14,10 @@ We've paid some attention to the implementation details - we try to avoid data c

 ### Classification/Regression

-1. [Follow the proximally-regularized leader](http://proceedings.mlr.press/v15/mcmahan11b/mcmahan11b.pdf) which allows to solve **very large linear/logistic regression** problems with elastic-net penalty. Solver uses stochastic gradient descent with adaptive learning rates (so can be used for online learning - not necessary to load all data to RAM). See [Ad Click Prediction: a View from the Trenches](https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf) for more examples.
+1. [Follow the proximally-regularized leader](http://proceedings.mlr.press/v15/mcmahan11b/mcmahan11b.pdf) which allows to solve **very large linear/logistic regression** problems with elastic-net penalty. Solver uses stochastic gradient descent with adaptive learning rates (so can be used for online learning - not necessary to load all data to RAM). See [Ad Click Prediction: a View from the Trenches](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf) for more examples.
     - Only logistic regerssion implemented at the moment
     - Native format for matrices is CSR - `Matrix::RsparseMatrix`. However common R `Matrix::CsparseMatrix` (`dgCMatrix`) will be converted automatically.
-1. [Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf) supervised learning algorithm which learns second order polynomial interactions in a factorized way. We provide highly optimized SIMD accelerated implementation.
+1. [Factorization Machines](https://cseweb.ucsd.edu/classes/fa17/cse291-b/reading/Rendle2010FM.pdf) supervised learning algorithm which learns second order polynomial interactions in a factorized way. We provide highly optimized SIMD accelerated implementation.

 ### Matrix Factorizations

@@ -32,14 +31,14 @@ See details in [Applications of the Conjugate Gradient Method for Implicit Feedb
     *
 1. **Linear-Flow** from [Practical Linear Models for Large-Scale One-Class Collaborative Filtering](http://www.bkveton.com/docs/ijcai2016.pdf). Algorithm looks for factorized low-rank item-item similarity matrix (in some sense it is similar to [SLIM](http://glaros.dtc.umn.edu/gkhome/node/774))
     *
-1. Fast **Truncated SVD** and **Truncated Soft-SVD** via Alternating Least Squares as described in [Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares](https://arxiv.org/pdf/1410.2596.pdf). Works for both sparse and dense matrices. Works on [float](https://github.com/wrathematics/float) matrices as well! For certain problems may be even faster than [irlba](https://github.com/bwlewis/irlba) package.
+1. Fast **Truncated SVD** and **Truncated Soft-SVD** via Alternating Least Squares as described in [Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares](http://arxiv.org/pdf/1410.2596). Works for both sparse and dense matrices. Works on [float](https://github.com/wrathematics/float) matrices as well! For certain problems may be even faster than [irlba](https://github.com/bwlewis/irlba) package.
     *
-1. **Soft-Impute** via fast Alternating Least Squares as described in [Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares](https://arxiv.org/pdf/1410.2596.pdf).
+1. **Soft-Impute** via fast Alternating Least Squares as described in [Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares](https://arxiv.org/pdf/1410.2596).
     *
     * with a solution in SVD form
 1. **GloVe** as described in [GloVe: Global Vectors for Word Representation](https://nlp.stanford.edu/pubs/glove.pdf).
     * This is usually used to train word embeddings, but actually also very useful for recommender systems.
-1. Matrix scaling as descibed in [EigenRec: Generalizing PureSVD for Effective and Efficient Top-N Recommendations](https://arxiv.org/pdf/1511.06033.pdf)
+1. Matrix scaling as descibed in [EigenRec: Generalizing PureSVD for Effective and Efficient Top-N Recommendations](http://arxiv.org/pdf/1511.06033)

 *********************

@@ -84,14 +83,6 @@ By default, R for Windows comes with unoptimized BLAS and LAPACK libraries, and

 **Note that syntax is these posts/slides is not up to date since package was under active development**

 1. [Slides from DataFest Tbilisi(2017-11-16)](https://www.slideshare.net/DmitriySelivanov/matrix-factorizations-for-recommender-systems)
-1. [Introduction to matrix factorization with Weighted-ALS algorithm](http://dsnotes.com/post/2017-05-28-matrix-factorization-for-recommender-systems/) - collaborative filtering for implicit feedback datasets.
-1. [Music recommendations using LastFM-360K dataset](http://dsnotes.com/post/2017-06-28-matrix-factorization-for-recommender-systems-part-2/)
-    * evaluation metrics for ranking
-    * setting up proper cross-validation
-    * possible issues with nested parallelism and thread contention
-    * making recommendations for new users
-    * complimentary item-to-item recommendations
-1. [Benchmark](http://dsnotes.com/post/2017-07-10-bench-wrmf/) against other good implementations

 Here is example of `rsparse::WRMF` on [lastfm360k](https://www.upf.edu/web/mtg/lastfm360k) dataset in comparison with other good implementations:
diff --git a/man/ScaleNormalize.Rd b/man/ScaleNormalize.Rd
index fb5ff49..66ea92a 100644
--- a/man/ScaleNormalize.Rd
+++ b/man/ScaleNormalize.Rd
@@ -9,7 +9,7 @@
 Usage of such rescaled matrix with [PureSVD] model will be equal to running PureSVD
 on the scaled cosine-based inter-item similarity matrix.
 }
 \references{
-See \href{https://arxiv.org/pdf/1511.06033.pdf}{EigenRec: Generalizing PureSVD for
+See \href{http://arxiv.org/pdf/1511.06033}{EigenRec: Generalizing PureSVD for
 Effective and Efficient Top-N Recommendations} for details.
 }
 \section{Public fields}{
diff --git a/man/soft_impute.Rd b/man/soft_impute.Rd
index f5447b7..233aaba 100644
--- a/man/soft_impute.Rd
+++ b/man/soft_impute.Rd
@@ -57,7 +57,7 @@
 components represent left, right singular vectors and singular values.
 }
 \description{
 Fit SoftImpute/SoftSVD via fast alternating least squares. Based on the
 paper by Trevor Hastie, Rahul Mazumder, Jason D. Lee, Reza Zadeh
 by "Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares" -
-\url{https://arxiv.org/pdf/1410.2596.pdf}
+\url{http://arxiv.org/pdf/1410.2596}
 }
 \examples{
 set.seed(42)
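The Soft-Impute procedure this patch cites in several places (Hastie, Mazumder, Lee, Zadeh, 2014) boils down to a fill/SVD/shrink loop: impute the missing entries from the current low-rank estimate, take an SVD, and soft-threshold the singular values. A minimal base-R sketch of that idea follows; it is an illustration, not the ALS-accelerated implementation in `rsparse::soft_impute`, and the matrix sizes, `lambda`, and iteration count are arbitrary choices:

```r
set.seed(42)
# rank-2 ground-truth matrix, 50% of entries observed
X = tcrossprod(matrix(rnorm(40), 20, 2), matrix(rnorm(30), 15, 2))
mask = matrix(runif(length(X)) < 0.5, nrow(X))

lambda = 0.1                           # soft-thresholding level (illustrative)
Z = matrix(0, nrow(X), ncol(X))        # current low-rank estimate
for (iter in 1:50) {
  Y = ifelse(mask, X, Z)               # fill unobserved entries from Z
  s = svd(Y)
  d = pmax(s$d - lambda, 0)            # shrink singular values toward zero
  Z = s$u %*% (d * t(s$v))             # Z = U diag(d) V'
}
rmse_obs = sqrt(mean((Z - X)[mask]^2)) # fit on observed entries
```

Per the cited paper, the ALS variant reaches the same fixed point without computing a dense SVD at every step, which is what makes it practical for large sparse inputs.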