#420: stri_sprintf done
gagolews committed May 24, 2021
1 parent e3e3b58 commit 87b5b39
Showing 23 changed files with 372 additions and 196 deletions.
8 changes: 4 additions & 4 deletions NEWS
@@ -3,7 +3,7 @@

## 1.6.2.9xxx (to-be >=1.6.3) (2021-xx-yy)

* TODO ... [NEW FEATURE] #420: `stri_sprintf` (alias: `stri_string_format`)
* [NEW FEATURE] #420: `stri_sprintf` (alias: `stri_string_format`)
is a Unicode-aware replacement for and enhancement of the base `sprintf`:
it adds a customised handling of `NA`s (on demand),
computing field size based on code point width,
@@ -12,12 +12,12 @@
Moreover, `stri_printf` can be used to display formatted strings
conveniently.

* [BACKWARD INCOMPATIBILITY] `%s$%` and `%stri$%` now use `stri_sprintf`
instead of `base::sprintf`.

* TODO ... [NEW FEATURE] #434: `stri_datetime_format` and `stri_datetime_parse`
is now also vectorised with respect to the `format` argument.

* [BACKWARD INCOMPATIBILITY] `%s$%` and `%stri$%` now use `stri_sprintf`
instead of `base::sprintf`.

* [INTERNAL] `stri_prepare_arg*`s have been refactored, buffer overruns
in the exception handling subsystem are now avoided.
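The NEWS entry mentions "customised handling of `NA`s (on demand)". As a rough illustration of the documented `na_string` rule (a string replaces missing values, while `NA` makes the corresponding outputs missing), here is a minimal base-R sketch. It assumes a single format string and a single character argument. `sprintf_na()` is a hypothetical helper, not part of stringi or base R, and is not how stringi implements this.

```r
# sprintf_na() is a hypothetical helper (not part of stringi or base R).
# It mimics the documented `na_string` semantics for one %s argument:
#   na_string = NA     -> missing inputs yield NA outputs
#   na_string = "text" -> missing inputs are printed as "text"
sprintf_na <- function(fmt, x, na_string = NA_character_) {
  stopifnot(is.character(fmt), length(fmt) == 1L,
            is.character(x), length(na_string) == 1L)
  miss <- is.na(x)
  # fill missing values with a placeholder so that sprintf() can run
  x[miss] <- if (is.na(na_string)) "" else na_string
  out <- sprintf(fmt, x)
  if (is.na(na_string)) out[miss] <- NA_character_  # propagate missingness
  out
}

sprintf("[%s]", c("a", NA))                          # base R: "[a]" "[NA]"
sprintf_na("[%s]", c("a", NA))                       # "[a]" NA
sprintf_na("[%s]", c("a", NA), na_string = "<NA>")   # "[a]" "[<NA>]"
```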

6 changes: 4 additions & 2 deletions R/pad.R
@@ -61,15 +61,14 @@
#' @param width integer vector giving minimal output string lengths
#' @param side [\code{stri_pad} only] single character string;
#' sides on which padding character is added
#' (\code{left}, \code{right}, or \code{both})
#' (\code{left} (default), \code{right}, or \code{both})
#' @param pad character vector giving padding code points
#' @param use_length single logical value; should the number of code
#' points be used instead of the total code point width
#' (see \code{\link{stri_width}})?
#'
#' @return These functions return a character vector.
#'
#' @rdname stri_pad
#' @examples
#' stri_pad_left('stringi', 10, pad='#')
#' stri_pad_both('stringi', 8:12, pad='*')
@@ -79,6 +78,9 @@
#' cat(stri_pad_both(c('\ud6c8\ubbfc\uc815\uc74c', # takes width into account
#' stri_trans_nfkd('\ud6c8\ubbfc\uc815\uc74c'), 'abcd'),
#' width=10), sep='\n')
#'
#' @family length
#' @rdname stri_pad
#' @export
stri_pad_both <- function(str, width = floor(0.9 * getOption("width")), pad = " ",
use_length = FALSE)
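The `use_length` parameter above distinguishes counting code points from summing their display widths. The code-point variant (roughly what `use_length = TRUE` is described to do) can be sketched in base R with `nchar(type = "chars")`. `pad_left_chars()` is a hypothetical helper, not a stringi function. Because it ignores display width, a wide character such as the emoji below (typically rendered two columns wide) will still look misaligned on a console, which is why width-based padding is the default.

```r
# pad_left_chars() is a hypothetical helper, not a stringi function.
# It counts code points (nchar type "chars"), not bytes or display width.
pad_left_chars <- function(x, width, pad = " ") {
  n <- nchar(x, type = "chars")
  paste0(strrep(pad, pmax(0L, width - n)), x)
}

x <- c("abcd", "\u00DF\u00B5\U0001F970", "abcdef")
cat(pad_left_chars(x, 6), sep = "\n")  # each line has 6 code points
```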
90 changes: 59 additions and 31 deletions R/sprintf.R
@@ -35,45 +35,56 @@
#' Format Strings
#'
#' @description
#' A Unicode-aware replacement for and enhancement of
#' \code{stri_sprintf} (synonym: \code{stri_string_format})
#' is a Unicode-aware replacement for and enhancement of
#' the built-in \code{\link[base]{sprintf}} function.
#' Moreover, \code{stri_printf} prints formatted strings.
#'
#' @details
#' Vectorized over \code{format} and all vectors passed via \code{...}.
#'
#' \code{stri_string_format} is a synonym for \code{stri_sprintf}.
#'
#' Note that \code{stri_printf} treats missing values in \code{...}
#' as \code{"NA"} strings by default.
#'
#' Note that Unicode code points may have various widths when
#' printed on the console and that, by default, the function takes that
#' into account. By changing the state of the \code{use_length}
#' argument, this function act as if each code point was of width 1.
#'
#' For \code{\%d} and \code{\%f} formats, factors are treated as integer
#' vectors (underlying codes) and so are date and time objects, etc.
#' Unicode code points may have various widths when
#' printed on the console (compare \code{\link{stri_width}}).
#' These functions, by default (see the \code{use_length} argument), take this
#' into account.
#'
#' This function is not locale sensitive. For instance, numbers are
#' always formatted in the "POSIX" style, e.g., \code{-123456.789}.
#' always formatted in the "POSIX" style, e.g., \code{-123456.789}
#' (no thousands separator, dot as a fractional separator).
#' Such a feature might be added at a later date, though.
#'
#' All arguments passed via \code{...} are evaluated. If some of them
#' are unused, a warning is generated. Too few arguments result in an error.
#'
#' Note that \code{stri_printf} treats missing values in \code{...}
#' as strings \code{"NA"} by default.
#'
#' All format specifiers supported by \code{\link[base]{sprintf}} are
#' also available here. For the formatting of integers and floating-point
#' values, currently the system \code{std::snprintf()} is called, but
#' this may change in the future. Format specifiers are normalized
#' and necessary sanity checks are performed.
#'
#' Supported conversion specifiers: \code{dioxX} (integers)
#' \code{feEgGaA} (floats) and \code{s} (character strings).
#' Supported flags: \code{-} (left-align),
#' \code{+} (force output sign or blank when \code{NaN} or \code{NA}; numeric only),
#' \code{<space>} (output minus or space for a sign; numeric only)
#' \code{0} (pad with 0s; numeric only),
#' \code{#} (alternative output of some numerics).
#'
#'
#' @param format character vector of format strings
#' @param format character vector of format strings \code{\link[base]{sprintf}}
#' @param ... vectors (coercible to integer, real, or character)
#' @param na_string single string to represent missing values;
#' if \code{NA}, missing values in \code{...}
#' result in the corresponding outputs being missing too;
#' use \code{"NA"} for compatibility with base R
#' @param inf_string single string to represent the (unsigned) infinity
#' @param na_string single string to represent the not-a-number
#' @param inf_string single string to represent the (unsigned) infinity (\code{NA} allowed)
#' @param nan_string single string to represent the not-a-number (\code{NA} allowed)
#' @param use_length single logical value; should the number of code
#' points be used when applying modifiers such as \code{\%20s}
#' instead of the total code point width (see \code{\link{stri_width}})?
#' instead of the total code point width?
#' @param file see \code{\link[base]{cat}}
#' @param sep see \code{\link[base]{cat}}
#' @param append see \code{\link[base]{cat}}
@@ -86,11 +97,23 @@
#' The other functions return a character vector.
#'
#'
#' @references
#' \code{printf} in \code{glibc},
#' \url{https://man.archlinux.org/man/printf.3}
#'
#' \code{printf} format strings -- Wikipedia,
#' \url{https://en.wikipedia.org/wiki/Printf_format_string}
#'
#' @examples
#' #...
#' stri_printf("%4s=%.3f", c("e", "e\u00b2", "\u03c0", "\u03c0\u00b2"),
#' c(exp(1), exp(2), pi, pi^2))
#'
#' x <- c("xxabcd", "xx\u0105\u0106\u0107\u0108",
#' "\u200b\u200b\u200b\u200b\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007Fabcd")
#' stri_printf("[%10s]", x) # minimum width = 10
#' stri_printf("[%-10.3s]", x) # output of max width = 3, but pad to width of 10
#' stri_printf("[%10s]", x, use_length=TRUE) # minimum number of Unicode code points = 10
#'
#' # vectorization wrt all arguments:
#' p <- runif(10)
#' stri_sprintf(ifelse(p > 0.5, "P(Y=1)=%1$.2f", "P(Y=0)=%2$.2f"), p, 1-p)
@@ -103,7 +126,17 @@
#' stri_printf("%+10.3f", c(-Inf, -0, 0, Inf, NaN, NA_real_),
#' na_string="<NA>", nan_string="\U0001F4A9", inf_string="\u221E")
#'
#' stri_sprintf("UNIX time %1$f is %1$s.", Sys.time())
#'
#' # the following do not work in sprintf()
#' stri_sprintf("%1$#- *2$.*3$f", 1.23456, 10, 3) # two asterisks
#' stri_sprintf(c("%s", "%f"), pi) # re-coercion needed
#' stri_sprintf("%1$s is %1$f UNIX time.", Sys.time()) # re-coercion needed
#' stri_sprintf(c("%d", "%s"), factor(11:12)) # re-coercion needed
#' stri_sprintf(c("%s", "%d"), factor(11:12)) # re-coercion needed
#'
#' @rdname stri_sprintf
#' @family length
#' @export
stri_sprintf <- function(
format, ...,
@@ -123,6 +156,7 @@ stri_sprintf <- function(
stri_string_format <- stri_sprintf


#' @rdname stri_sprintf
#' @export
stri_printf <- function(
format, ...,
@@ -140,14 +174,12 @@ stri_printf <- function(
cat(str, file=file, sep=sep, append=append)
}

### TODO: update


#' @title
#' C-Style Formatting with sprintf as a Binary Operator TODO: call stri_sprintf
#' C-Style Formatting with \code{\link{stri_sprintf}} as a Binary Operator
#'
#' @description
#' Provides access to base R's \code{\link[base]{sprintf}} in form of a binary
#' Provides access to \code{\link{stri_sprintf}} in form of a binary
#' operator in a way similar to Python's \code{\%} overloaded for strings.
#'
#'
@@ -158,12 +190,9 @@ stri_printf <- function(
#' \code{e1 \%s$\% atomic_vector} is equivalent to
#' \code{e1 \%s$\% list(atomic_vector)}.
#'
#' Note that \code{\link[base]{sprintf}} takes field width in bytes,
#' not Unicode code points. See Examples for a workaround.
#'
#'
#' @param e1 format strings, see \code{\link[base]{sprintf}} for syntax
#' @param e2 a list of atomic vectors to be passed to \code{\link[base]{sprintf}}
#' @param e1 format strings, see \code{\link{stri_sprintf}} for syntax
#' @param e2 a list of atomic vectors to be passed to \code{\link{stri_sprintf}}
#' or a single atomic vector
#'
#' @return
@@ -178,13 +207,12 @@ stri_printf <- function(
#' "%s='%d'" %s$% list(c("a", "b", "c"), 1)
#' "%s='%d'" %s$% list(c("a", "b", "c"), 1:3)
#'
#' # sprintf field width:
#' x <- c("abcd", "\u00DF\u00B5\U0001F970", "abcdef")
#' cat(sprintf("%s%6s%s", "-", x, "-"), sep="\n")
#' cat(sprintf("%s%s%s", "-", stringi::stri_pad(x, 6), "-"), sep="\n")
#' cat("[%6s]" %s$% x, sep="\n") # width used, not the number of bytes
#'
#' @rdname operator_dollar
#' @aliases operator_dollar oper_dollar
#' @family length
#'
#' @usage
#' e1 \%s$\% e2
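The documented rule that `e1 %s$% atomic_vector` is equivalent to `e1 %s$% list(atomic_vector)` amounts to a small wrapper around a vectorised formatter. The sketch below defines a hypothetical `%fmt%` operator on top of base `sprintf()`, so it mirrors the pre-commit behaviour (which, per the removed documentation, counted field widths in bytes) rather than the new `stri_sprintf()`-based one. It is an illustration, not stringi's implementation.

```r
# `%fmt%` is a hypothetical operator, not exported by stringi.
# A bare atomic vector on the right-hand side is wrapped in a list,
# so e1 %fmt% v behaves like e1 %fmt% list(v).
`%fmt%` <- function(e1, e2) {
  if (!is.list(e2)) e2 <- list(e2)
  do.call(sprintf, c(list(e1), e2))  # vectorised over e1 and all of e2
}

"%s='%d'" %fmt% list(c("a", "b", "c"), 1:3)  # "a='1'" "b='2'" "c='3'"
"[%s]" %fmt% c("x", "y")                      # "[x]" "[y]"
```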
8 changes: 4 additions & 4 deletions devel/sphinx/news.md
@@ -3,7 +3,7 @@

## 1.6.2.9xxx (to-be >=1.6.3) (2021-xx-yy)

* TODO ... [NEW FEATURE] #420: `stri_sprintf` (alias: `stri_string_format`)
* [NEW FEATURE] #420: `stri_sprintf` (alias: `stri_string_format`)
is a Unicode-aware replacement for and enhancement of the base `sprintf`:
it adds a customised handling of `NA`s (on demand),
computing field size based on code point width,
@@ -12,12 +12,12 @@
Moreover, `stri_printf` can be used to display formatted strings
conveniently.

* [BACKWARD INCOMPATIBILITY] `%s$%` and `%stri$%` now use `stri_sprintf`
instead of `base::sprintf`.

* TODO ... [NEW FEATURE] #434: `stri_datetime_format` and `stri_datetime_parse`
is now also vectorised with respect to the `format` argument.

* [BACKWARD INCOMPATIBILITY] `%s$%` and `%stri$%` now use `stri_sprintf`
instead of `base::sprintf`.

* [INTERNAL] `stri_prepare_arg*`s have been refactored, buffer overruns
in the exception handling subsystem are now avoided.

29 changes: 12 additions & 17 deletions devel/sphinx/rapi/operator_dollar.md
@@ -1,8 +1,8 @@
# operator\_dollar: C-Style Formatting with sprintf as a Binary Operator TODO: call stri\_sprintf
# operator\_dollar: C-Style Formatting with [`stri_sprintf`](https://stringi.gagolewski.com/rapi/stri_sprintf.html) as a Binary Operator

## Description

Provides access to base R\'s [`sprintf`](https://stat.ethz.ch/R-manual/R-patched/library/base/html/sprintf.html) in form of a binary operator in a way similar to Python\'s `%` overloaded for strings.
Provides access to [`stri_sprintf`](https://stringi.gagolewski.com/rapi/stri_sprintf.html) in form of a binary operator in a way similar to Python\'s `%` overloaded for strings.

## Usage

@@ -14,19 +14,17 @@ e1 %stri$% e2

## Arguments

| | |
|------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
| `e1` | format strings, see [`sprintf`](https://stat.ethz.ch/R-manual/R-patched/library/base/html/sprintf.html) for syntax |
| `e2` | a list of atomic vectors to be passed to [`sprintf`](https://stat.ethz.ch/R-manual/R-patched/library/base/html/sprintf.html) or a single atomic vector |
| | |
|------|--------------------------------------------------------------------------------------------------------------------------------------------|
| `e1` | format strings, see [`stri_sprintf`](https://stringi.gagolewski.com/rapi/stri_sprintf.html) for syntax |
| `e2` | a list of atomic vectors to be passed to [`stri_sprintf`](https://stringi.gagolewski.com/rapi/stri_sprintf.html) or a single atomic vector |

## Details

Vectorized over `e1` and `e2`.

`e1 %s$% atomic_vector` is equivalent to `e1 %s$% list(atomic_vector)`.

Note that [`sprintf`](https://stat.ethz.ch/R-manual/R-patched/library/base/html/sprintf.html) takes field width in bytes, not Unicode code points. See Examples for a workaround.

## Value

Returns a character vector
@@ -39,6 +37,8 @@ Returns a character vector

The official online manual of <span class="pkg">stringi</span> at <https://stringi.gagolewski.com/>

Other length: [`stri_isempty`](https://stringi.gagolewski.com/rapi/stri_isempty.html)(), [`stri_length`](https://stringi.gagolewski.com/rapi/stri_length.html)(), [`stri_numbytes`](https://stringi.gagolewski.com/rapi/stri_numbytes.html)(), [`stri_pad_both`](https://stringi.gagolewski.com/rapi/stri_pad_both.html)(), [`stri_sprintf`](https://stringi.gagolewski.com/rapi/stri_sprintf.html)(), [`stri_width`](https://stringi.gagolewski.com/rapi/stri_width.html)()

## Examples


@@ -57,14 +57,9 @@ The official online manual of <span class="pkg">stringi</span> at <https://strin
## [1] "a='1'" "b='1'" "c='1'"
"%s='%d'" %s$% list(c("a", "b", "c"), 1:3)
## [1] "a='1'" "b='2'" "c='3'"
# sprintf field width:
x <- c("abcd", "\u00DF\u00B5\U0001F970", "abcdef")
cat(sprintf("%s%6s%s", "-", x, "-"), sep="\n")
## - abcd-
## -ßµ🥰-
## -abcdef-
cat(sprintf("%s%s%s", "-", stringi::stri_pad(x, 6), "-"), sep="\n")
## - abcd-
## - ßµ🥰-
## -abcdef-
cat("[%6s]" %s$% x, sep="\n") # width used, not the number of bytes
## [ abcd]
## [ ßµ🥰]
## [abcdef]
```
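The removed example output above shows base `sprintf("%s%6s%s", ...)` leaving the second string unpadded. Counting bytes versus code points explains this, assuming (as the removed documentation states) that the base function measured field width in UTF-8 bytes:

```r
x <- c("abcd", "\u00DF\u00B5\U0001F970", "abcdef")
# UTF-8 sizes: U+00DF and U+00B5 take 2 bytes each, U+1F970 takes 4
nchar(x, type = "bytes")  # 4 8 6
nchar(x, type = "chars")  # 4 3 6
# 8 bytes already exceed a field width of 6, so no padding is added,
# even though the string has only 3 code points.
```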
4 changes: 2 additions & 2 deletions devel/sphinx/rapi/stri_datetime_add.md
@@ -63,9 +63,9 @@ Other datetime: [`stri_datetime_create`](https://stringi.gagolewski.com/rapi/str
x <- stri_datetime_now()
stri_datetime_add(x, units='months') <- 2
print(x)
## [1] "2021-07-24 11:19:55 AEST"
## [1] "2021-07-24 12:48:34 AEST"
stri_datetime_add(x, -2, units='months')
## [1] "2021-05-24 11:19:55 AEST"
## [1] "2021-05-24 12:48:34 AEST"
stri_datetime_add(stri_datetime_create(2014, 4, 20), 1, units='years')
## [1] "2015-04-20 12:00:00 AEST"
stri_datetime_add(stri_datetime_create(2014, 4, 20), 1, units='years', locale='@calendar=hebrew')
8 changes: 4 additions & 4 deletions devel/sphinx/rapi/stri_datetime_fields.md
@@ -72,14 +72,14 @@ Other datetime: [`stri_datetime_add`](https://stringi.gagolewski.com/rapi/stri_d
```r
stri_datetime_fields(stri_datetime_now())
## Year Month Day Hour Minute Second Millisecond WeekOfYear WeekOfMonth
## 1 2021 5 24 11 19 55 69 22 5
## 1 2021 5 24 12 48 34 215 22 5
## DayOfYear DayOfWeek Hour12 AmPm Era
## 1 144 2 11 1 2
## 1 144 2 0 2 2
stri_datetime_fields(stri_datetime_now(), locale='@calendar=hebrew')
## Year Month Day Hour Minute Second Millisecond WeekOfYear WeekOfMonth
## 1 5781 10 13 11 19 55 72 37 3
## 1 5781 10 13 12 48 34 219 37 3
## DayOfYear DayOfWeek Hour12 AmPm Era
## 1 248 2 11 1 1
## 1 248 2 0 2 1
stri_datetime_symbols(locale='@calendar=hebrew')$Month[
stri_datetime_fields(stri_datetime_now(), locale='@calendar=hebrew')$Month
]
8 changes: 4 additions & 4 deletions devel/sphinx/rapi/stri_datetime_format.md
@@ -181,11 +181,11 @@ Other datetime: [`stri_datetime_add`](https://stringi.gagolewski.com/rapi/stri_d

```r
stri_datetime_parse(c('2015-02-28', '2015-02-29'), 'yyyy-MM-dd')
## [1] "2015-02-28 11:19:55 AEDT" NA
## [1] "2015-02-28 12:48:34 AEDT" NA
stri_datetime_parse(c('2015-02-28', '2015-02-29'), 'yyyy-MM-dd', lenient=TRUE)
## [1] "2015-02-28 11:19:55 AEDT" "2015-03-01 11:19:55 AEDT"
## [1] "2015-02-28 12:48:34 AEDT" "2015-03-01 12:48:34 AEDT"
stri_datetime_parse('19 lipca 2015', 'date_long', locale='pl_PL')
## [1] "2015-07-19 11:19:55 AEST"
## [1] "2015-07-19 12:48:34 AEST"
stri_datetime_format(stri_datetime_now(), 'datetime_relative_medium')
## [1] "today, 11:19:55 am"
## [1] "today, 12:48:34 pm"
```
2 changes: 1 addition & 1 deletion devel/sphinx/rapi/stri_isempty.md
@@ -32,7 +32,7 @@ Returns a logical vector of the same length as `str`.

The official online manual of <span class="pkg">stringi</span> at <https://stringi.gagolewski.com/>

Other length: [`stri_length`](https://stringi.gagolewski.com/rapi/stri_length.html)(), [`stri_numbytes`](https://stringi.gagolewski.com/rapi/stri_numbytes.html)(), [`stri_width`](https://stringi.gagolewski.com/rapi/stri_width.html)()
Other length: [`%s$%`](https://stringi.gagolewski.com/rapi/%25s$%25.html)(), [`stri_length`](https://stringi.gagolewski.com/rapi/stri_length.html)(), [`stri_numbytes`](https://stringi.gagolewski.com/rapi/stri_numbytes.html)(), [`stri_pad_both`](https://stringi.gagolewski.com/rapi/stri_pad_both.html)(), [`stri_sprintf`](https://stringi.gagolewski.com/rapi/stri_sprintf.html)(), [`stri_width`](https://stringi.gagolewski.com/rapi/stri_width.html)()

## Examples

2 changes: 1 addition & 1 deletion devel/sphinx/rapi/stri_length.md
@@ -36,7 +36,7 @@ Returns an integer vector of the same length as `str`.

The official online manual of <span class="pkg">stringi</span> at <https://stringi.gagolewski.com/>

Other length: [`stri_isempty`](https://stringi.gagolewski.com/rapi/stri_isempty.html)(), [`stri_numbytes`](https://stringi.gagolewski.com/rapi/stri_numbytes.html)(), [`stri_width`](https://stringi.gagolewski.com/rapi/stri_width.html)()
Other length: [`%s$%`](https://stringi.gagolewski.com/rapi/%25s$%25.html)(), [`stri_isempty`](https://stringi.gagolewski.com/rapi/stri_isempty.html)(), [`stri_numbytes`](https://stringi.gagolewski.com/rapi/stri_numbytes.html)(), [`stri_pad_both`](https://stringi.gagolewski.com/rapi/stri_pad_both.html)(), [`stri_sprintf`](https://stringi.gagolewski.com/rapi/stri_sprintf.html)(), [`stri_width`](https://stringi.gagolewski.com/rapi/stri_width.html)()

## Examples

2 changes: 1 addition & 1 deletion devel/sphinx/rapi/stri_numbytes.md
@@ -40,7 +40,7 @@ Returns an integer vector of the same length as `str`.

The official online manual of <span class="pkg">stringi</span> at <https://stringi.gagolewski.com/>

Other length: [`stri_isempty`](https://stringi.gagolewski.com/rapi/stri_isempty.html)(), [`stri_length`](https://stringi.gagolewski.com/rapi/stri_length.html)(), [`stri_width`](https://stringi.gagolewski.com/rapi/stri_width.html)()
Other length: [`%s$%`](https://stringi.gagolewski.com/rapi/%25s$%25.html)(), [`stri_isempty`](https://stringi.gagolewski.com/rapi/stri_isempty.html)(), [`stri_length`](https://stringi.gagolewski.com/rapi/stri_length.html)(), [`stri_pad_both`](https://stringi.gagolewski.com/rapi/stri_pad_both.html)(), [`stri_sprintf`](https://stringi.gagolewski.com/rapi/stri_sprintf.html)(), [`stri_width`](https://stringi.gagolewski.com/rapi/stri_width.html)()

## Examples
