
rossellhayes/fauxnaif


fauxnaif

License: MIT

faux-naïf (/ˌfoʊ.naɪˈif/): a person who pretends to be simple or innocent

fauxnaif: an R package for simplifying data by pretending values are NA

Overview

fauxnaif provides an extension to dplyr::na_if(). Unlike dplyr’s na_if(), na_if_in() allows you to specify multiple values to be replaced with NA using a single function. fauxnaif also includes a complementary function na_if_not() to specify values to keep.
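The value-matching idea behind these two functions can be approximated in a few lines of base R. The helpers below, `na_if_in_sketch()` and `na_if_not_sketch()`, are hypothetical illustrations that handle plain values only (no formulas). They are not fauxnaif's actual implementation.

```r
# Hypothetical helpers (not part of fauxnaif), handling plain values only.

# Replace every element of `x` that matches any supplied value with NA
na_if_in_sketch <- function(x, ...) {
  values <- unlist(list(...))   # flatten scalars, vectors and lists
  x[x %in% values] <- NA
  x
}

# Keep only elements that match a supplied value; everything else becomes NA
na_if_not_sketch <- function(x, ...) {
  keep <- unlist(list(...))
  x[!(x %in% keep)] <- NA
  x
}

# dplyr::na_if() compares `x` element-wise against `y` (typically a single
# value), so removing several values with it means chaining calls.
# Here one call handles them all.
na_if_in_sketch(c("a", "", "b", "NA"), "", "NA")
# returns c("a", NA, "b", NA)

na_if_not_sketch(-1:10, 0:10)
# returns c(NA, 0:10)
```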

Installation

You can install fauxnaif from CRAN:

install.packages("fauxnaif")

Or the development version from GitHub:

# install.packages("remotes")
remotes::install_github("rossellhayes/fauxnaif")

Usage

library(dplyr)
library(fauxnaif)

The basics

Let’s say we want to remove an unwanted negative value from a vector of numbers:

-1:10
#>  [1] -1  0  1  2  3  4  5  6  7  8  9 10

We can replace -1…

… explicitly:

na_if_in(-1:10, -1)
#>  [1] NA  0  1  2  3  4  5  6  7  8  9 10

… by specifying values to keep:

na_if_not(-1:10, 0:10)
#>  [1] NA  0  1  2  3  4  5  6  7  8  9 10

… using a formula:

na_if_in(-1:10, ~ . < 0)
#>  [1] NA  0  1  2  3  4  5  6  7  8  9 10
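One way a one-sided formula such as `~ . < 0` can act as a test is to evaluate its right-hand side with `.` bound to the input vector. The base-R helpers below (`formula_to_predicate()` and `na_if_formula_sketch()`) are hypothetical; fauxnaif's own formula handling may differ.

```r
# Hypothetical helper (not part of fauxnaif): convert a one-sided formula
# like `~ . < 0` into a function of one argument, exposed as `.`.
formula_to_predicate <- function(f) {
  rhs <- f[[2]]             # the expression after `~`
  env <- environment(f)     # where the formula was created
  function(x) eval(rhs, list(. = x), env)
}

# Hypothetical helper: replace elements where the formula evaluates to TRUE
na_if_formula_sketch <- function(x, f) {
  hits <- formula_to_predicate(f)(x)
  x[hits %in% TRUE] <- NA   # NA results are treated as "don't replace"
  x
}

na_if_formula_sketch(-1:10, ~ . < 0)
# returns c(NA, 0:10)
```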

A little more complex

messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")
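Note that `42` in `messy_string` is not stored as a number: `c()` coerces every element to character, so it becomes `"42"`. Base R's `%in%` (via `match()`) also coerces both sides to a common type before comparing, which is one plausible reason a numeric range such as `1:100` can match it in the examples below. A quick check in plain base R, independent of fauxnaif:

```r
messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")

# c() coerced the number 42 to the string "42"
class(messy_string)      # "character"
messy_string[6]          # "42"

# %in% coerces both sides to a common type (here character),
# so "42" is found among 1:100
"42" %in% 1:100          # TRUE

# Flattening the list of values gives a single character vector
values <- unlist(list("", "NA", "NULL", 1:100))
messy_string %in% values
# FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
```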

We can replace unwanted values…

… one at a time:

na_if_in(messy_string, "")
#> [1] "abc"  NA     "def"  "NA"   "ghi"  "42"   "jkl"  "NULL" "mno"

… or all at once:

na_if_in(messy_string, "", "NA", "NULL", 1:100)
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
na_if_in(messy_string, c("", "NA", "NULL", 1:100))
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
na_if_in(messy_string, list("", "NA", "NULL", 1:100))
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"

… or using a clever formula:

grepl("[a-z]{3,}", messy_string)
#> [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
na_if_not(messy_string, ~ grepl("[a-z]{3,}", .))
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
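For comparison, the same cleanup can be written in base R with logical indexing. This is a sketch for illustration only, not how fauxnaif works internally.

```r
messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")

# Keep strings containing a run of 3+ lowercase letters; blank out the rest.
# Base-R counterpart of the na_if_not() call above (no fauxnaif needed).
cleaned <- messy_string
cleaned[!grepl("[a-z]{3,}", cleaned)] <- NA
cleaned
# returns c("abc", NA, "def", NA, "ghi", NA, "jkl", NA, "mno")
```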

With data frames

faux_census
#> # A tibble: 5 × 4
#>   state    age  income gender                      
#>   <chr>  <dbl>   <dbl> <chr>                       
#> 1 TX        57 9999999 Gender is a social construct
#> 2 Canada    49  149000 Male                        
#> 3 NY       557   90750 f                           
#> 4 LA         2   61000 Male                        
#> 5 TN        64 9999999 M

na_if_in() is particularly useful inside dplyr::mutate():

faux_census %>%
  mutate(
    income = na_if_in(income, 9999999),
    age    = na_if_in(age, ~ . < 18, ~ . > 120),
    state  = na_if_not(state, ~ grepl("^[A-Z]{2,}$", .)),
    gender = na_if_in(gender, ~ nchar(.) > 20)
  )
#> # A tibble: 5 × 4
#>   state   age income gender
#>   <chr> <dbl>  <dbl> <chr> 
#> 1 TX       57     NA <NA>  
#> 2 <NA>     49 149000 Male  
#> 3 NY       NA  90750 f     
#> 4 LA       NA  61000 Male  
#> 5 TN       64     NA M
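To see exactly which rule removes which value, here is a base-R version of the same four rules. It rebuilds the printed `faux_census` values as a plain data frame under the hypothetical name `census`, so it runs without fauxnaif or dplyr; it is a comparison sketch, not a replacement for the package.

```r
# Local copy of the faux_census values printed above (hypothetical name)
census <- data.frame(
  state  = c("TX", "Canada", "NY", "LA", "TN"),
  age    = c(57, 49, 557, 2, 64),
  income = c(9999999, 149000, 90750, 61000, 9999999),
  gender = c("Gender is a social construct", "Male", "f", "Male", "M"),
  stringsAsFactors = FALSE
)

# Same rules as the mutate() call, written with base-R logical indexing
census$income[census$income %in% 9999999] <- NA          # sentinel code
census$age[census$age < 18 | census$age > 120] <- NA     # implausible ages
census$state[!grepl("^[A-Z]{2,}$", census$state)] <- NA  # not an abbreviation
census$gender[nchar(census$gender) > 20] <- NA           # free-text answers

census
```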

Or you can use dplyr::across() to apply the same function to several columns at once:

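faux_census %>%
  mutate(
    # Extra arguments go inside an anonymous function: passing them through
    # across()'s `...` is deprecated as of dplyr 1.1.0
    across(age, function(x) na_if_in(x, ~ . < 18, ~ . > 120)),
    across(state, function(x) na_if_not(x, ~ grepl("^[A-Z]{2,}$", .))),
    across(where(is.character), function(x) na_if_in(x, ~ nchar(.) > 20)),
    across(everything(), function(x) na_if_in(x, 9999999))
  )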
#> # A tibble: 5 × 4
#>   state   age income gender
#>   <chr> <dbl>  <dbl> <chr> 
#> 1 TX       57     NA <NA>  
#> 2 <NA>     49 149000 Male  
#> 3 NY       NA  90750 f     
#> 4 LA       NA  61000 Male  
#> 5 TN       64     NA M
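The column-wise pattern behind `across()` can be mimicked in base R with `lapply()`. The sketch below uses a hypothetical helper, `blank_values()`, and a local copy of the printed data named `census`; it only illustrates applying one function to many columns and is not how dplyr or fauxnaif are implemented.

```r
census <- data.frame(
  state  = c("TX", "Canada", "NY", "LA", "TN"),
  age    = c(57, 49, 557, 2, 64),
  income = c(9999999, 149000, 90750, 61000, 9999999),
  gender = c("Gender is a social construct", "Male", "f", "Male", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: NA out any element equal to one of `values`
blank_values <- function(x, values) {
  x[x %in% values] <- NA
  x
}

# Apply the helper to every column, similar to across(everything(), ...).
# Assigning into `census[]` keeps the data-frame structure and column names.
census[] <- lapply(census, blank_values, values = 9999999)

# Apply a rule only to character columns, similar to
# across(where(is.character), ...)
is_chr <- vapply(census, is.character, logical(1))
census[is_chr] <- lapply(census[is_chr], function(x) {
  x[nchar(x) > 20] <- NA
  x
})

census
```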

Hex sticker fonts are Bodoni* by indestructible type* and Source Code Pro by Adobe.

Image adapted from icon made by Freepik from flaticon.com.

Please note that fauxnaif is released with a Contributor Code of Conduct.

About

Convert Values to NA
