February 18, 2023
R has quite a rich standard library[1]. It covers not just text processing, reading and working with files, and parallel computing, but also a whole load of statistical functions, including simple neural networks, additive models, survival analysis, and three whole packages for plotting and graphics (graphics for base graphics, grid, and lattice). You can read about all the functions in R's standard library in the R reference manual.
The disadvantage is that while R has a very large standard library, it doesn't really use namespaces to organize it. Don't get me wrong, R has namespaces. Every package has a namespace, and R comes with a bunch of preinstalled packages. But while some specialized statistical functions are in aptly named packages (survival for survival analysis), often a group of unrelated functions is lumped into a single namespace. This presents a significant barrier to the discoverability of these functions.
For instance, consider Python. If you want to work with paths, you know that the functions will be in the os.path module, or in the pathlib module if you prefer an object-oriented way of handling paths. Additionally, Python's documentation groups functions according to their usage. Compare this to R's reference manual linked above. So if you want to know, for instance, all the functions for text parsing, well, good luck. You will have to do a lot of contextualized searches through R's help system or find a book that does this for you. But while there are books about some popular user-created packages, for instance the R Markdown book, and books that serve as an in-depth exploration of R, such as Advanced R or R Packages, there is no in-depth exploration of the functions available in base R. This often means that a lot of people keep reinventing the wheel or coming up with complicated solutions requiring one or even multiple packages when a more performant one-liner is available in base R.
The purpose of this series is to explore functions in base R and later perhaps to create a book that will serve as additional documentation of functionalities available in base R.
R comes with a number of preinstalled packages, each labelled as either base or recommended:
packages = installed.packages() |>
    as.data.frame() |>
    subset(select=c("Package", "Priority")) |>
    unique() |>
    split(formula("~Priority"))
sapply(packages, nrow)
       base recommended
         14          15
The 14 base packages are the main workhorses of R, while the 15 recommended packages are predominantly statistical in nature. Do not expect the latest, fastest or most feature-rich implementations, but it does mean that if you need something like k-nearest neighbours, you will find it in the R standard library, which is not true for most programming languages.
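If you only want the names of the packages in one of the two groups, a shorter route is the priority argument of installed.packages(). The following is a minimal sketch; the variable names base_pkgs and recommended_pkgs are made up for illustration, and the exact set of recommended packages depends on how R was installed.

```r
# names of the installed packages with priority "base"
base_pkgs = rownames(installed.packages(priority="base"))
# names of the installed packages with priority "recommended"
# (may be empty on a minimal installation)
recommended_pkgs = rownames(installed.packages(priority="recommended"))
sort(base_pkgs)
sort(recommended_pkgs)
```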
Because of this, the number of objects varies quite a lot between packages. Here we count not only functions, but also bundled example datasets and pre-set variables. For example, T is a variable set to TRUE, not a reserved word, so you can easily change it, for instance by setting T=FALSE. So please don't use T in your scripts; keep it for interactive use only.
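The difference between T and TRUE can be checked directly. This is a small sketch; the names masked and result are only illustrative, and local() is used so that the reassignment of T does not leak into the rest of the session.

```r
# T is an ordinary variable, so it can be masked by an assignment
masked = local({
    T = FALSE      # shadows base::T inside this block only
    isTRUE(T)      # evaluates to FALSE here
})
masked

# TRUE is a reserved word, assigning to it is an error
result = tryCatch({
    TRUE = FALSE
    "assigned"
}, error=function(e) "error")
result
```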
Before we show the number of objects in each package, we define some helper functions to make the code more readable.
# get object names from a package
getObjects = function(name, all=TRUE){
    getNamespace(name) |> ls(all.names=all)
}
# count the number of objects in a package
countObjects = function(name, all=TRUE){
    getObjects(name, all=all) |> length()
}
# apply the countObjects on the `packages` objects defined above
n_objects = sapply(packages, function(x){
    x[,1] |> sapply(countObjects) |> sort(decreasing=TRUE)
})
n_objects
$base
     base     stats      grid   methods     tools     utils     tcltk  compiler
     1370      1134       872       761       733       529       313       289
grDevices  graphics  parallel   splines    stats4  datasets
      254       170       145        50        30         3

$recommended
    Matrix       nlme       mgcv   survival    lattice       MASS    cluster
       996        607        495        421        289        232         91
 codetools       boot    spatial      rpart    foreign       nnet      class
        89         84         55         51         42         41         29
KernSmooth
        27
For the packages labelled as base, the base package leads with 1370 objects. The majority of common operations are implemented in this package: file-system operations, text-parsing functions, but also mathematical functions like min and mean, or set operations. The second largest package is stats, which implements a large number of mathematical and statistical functions. The bread-and-butter statistical functions like t.test, anova and lm, the general-purpose optimization algorithm optim, and functions to work with time-series data are all included in this package. The third largest package is the grid package with 872 objects. grid is one of the alternatives to base graphics, which are implemented in the graphics package. Surprisingly, the graphics package has only 170 objects. Outside of base, stats, tools and utils, the packages start to be more specialized and also smaller. The packages utils and especially tools are already supposed to be specialized for making packages, but since making packages requires a lot of tooling, and this tooling often has quite a lot of utility, you might occasionally use functions from tools. utils, on the other hand, has a bit of everything: functions to make packages, download packages, a spellchecker, but also functions like help, head and read.table. A classic case of "if you don't know where to put it, put it into utils".
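If you are ever unsure which of these packages a function lives in, you can ask R. The sketch below uses a made-up helper called origin, which reads the name of the environment a function was defined in. It assumes the default packages (utils, stats) are attached, and it does not work for primitives such as sum, which have no environment.

```r
# look up the namespace in which a function was defined
origin = function(name){
    environmentName(environment(get(name)))
}
origins = sapply(c("head", "read.table", "t.test", "optim"), origin)
origins
```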
If we look at the packages labelled as recommended, the largest package is the Matrix package, which is the third largest of all the preinstalled packages. The Matrix package implements several kinds of sparse matrices and operations on them. This is very important in linear algebra and statistics, as the solution of many statistical models often relies on eigenvalues.
You might have noticed that I have counted the objects using ls(). This way, all objects are counted, whether they are exported or not. After all, I don't remember reading about 145 functions in the parallel documentation, so something is off.
If we look only at exported objects, the situation would be like this:
n_exported_objects = sapply(packages, function(x){
    x[,1] |> sapply(\(y){
        getNamespaceExports(y) |> length()
    }) |> sort(decreasing=TRUE)
})
n_exported_objects
$base
     base     stats   methods     tcltk     utils      grid grDevices     tools
     1370       456       371       269       221       212       119       119
 graphics  parallel    stats4   splines  compiler  datasets
       88        33        28        13         9         0

$recommended
    Matrix       mgcv    lattice       nlme       MASS   survival       boot
       367        182        144        109         78         77         36
   spatial    cluster  codetools    foreign      class      rpart KernSmooth
        25         24         19         17         15         14          7
      nnet
         7
We can see that the number of objects changes drastically, although the count for the base package remains the same. Interestingly, the package datasets doesn't have a single exported object. This is because datasets consists only of datasets, which are loaded lazily. That is, a dataset does not occupy any memory until you use it.
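The gap between all objects and exported objects can also be inspected for a single package. The following sketch does this for parallel; the variable names all_objects, exported and internal are only illustrative.

```r
# all objects defined in the namespace, exported or not
all_objects = ls(getNamespace("parallel"), all.names=TRUE)
# only the objects that the package exports
exported = getNamespaceExports("parallel")
# internal helpers: defined in the namespace but not exported
internal = setdiff(all_objects, exported)
head(sort(internal))
# exported objects are reachable with `::`, internal ones only with `:::`
"detectCores" %in% exported
```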
Now, let’s move to the core of the series, the base package.
We have explored the number of objects across preinstalled packages, but how many functions are in the base package? The core of base R?
# You can see a similar code in ?Filter examples
funs = Filter(is.function, sapply(
    ls(baseenv(), all.names=TRUE), get, baseenv()
))
length(funs)
[1] 1325
Quite a lot, 1325 functions. However, a great deal of them are S3 methods. For instance, there are 36 methods just for the S3 generic print.
These methods are very important for working seamlessly with different S3 classes. After all, no one wants to call print.foo(x) and print.bar(y) when we can just type print(z). This reduces typing for sure, but it also reduces mental load. In fact, many R users do not know about the S3 system, but the dispatch of various methods still works like magic. But enough about S3. For the purpose of this exercise, these methods are not very interesting. Maybe in the future, we will explore what kind of S3 classes are defined in base R.
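To make the dispatch idea concrete, here is a minimal sketch. The classes foo and bar and the field value are made up for illustration; only the generic print and the helpers structure, cat and sprintf come from base R.

```r
# one print method per (hypothetical) class
print.foo = function(x, ...){
    cat(sprintf("<foo: %s>\n", x$value))
    invisible(x)
}
print.bar = function(x, ...){
    cat(sprintf("<bar: %s>\n", x$value))
    invisible(x)
}
# toy objects: a list with a class attribute
x = structure(list(value=1), class="foo")
y = structure(list(value=2), class="bar")
# the same call, print(), dispatches on the class of its argument
print(x)
print(y)
```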
To filter out the S3 methods, we can use isS3method. The only issue is that this function fails for any object starting with a dot. Bummer. Objects starting with a dot are typically considered hidden objects, which should not be used unless you know what you are doing (such as .C or .Call, which are important tools when writing packages).
# this would fail:
# isS3method(".C")
visible = grep("^\\.", names(funs), value=TRUE, invert=TRUE)
normal = Filter(Negate(isS3method), visible)
normal |> length()
[1] 865
This means that to explore all the functions in the base package, we need to go through 865 functions.
There is a small caveat that we have talked about before: ls() does not distinguish whether the functions are exported or not. Luckily for us, we saw that all the functions in the base package are exported. But just to be sure:
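To get a feeling for what isS3method keeps and what it throws away, here is a small sketch using the t generic. It assumes that the stats package is attached (the default), since t.test lives there; the variable name checks is only illustrative.

```r
checks = c(
    generic    = isS3method("t"),             # a generic, not a method
    default    = isS3method("t.default"),     # method of t for the default case
    data_frame = isS3method("t.data.frame"),  # method of t for data frames
    t_test     = isS3method("t.test")         # a generic of its own, despite the dot
)
checks
```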
# helper, take a string instead of a function object
is_function = function(name, env=baseenv()){
    f = get(name, envir=env)
    is.function(f)
}
normal = getNamespaceExports("base") |>
    Filter(f=is_function) |>
    grep(pattern="^\\.", value=TRUE, invert=TRUE) |>
    Filter(f=Negate(isS3method)) |> sort()
normal |> length()
[1] 865
Using slightly different calls, we have arrived at the same number. Great!
In the following parts of the series, we will start going through the functions alphabetically. However, when we find a group of similar or related functions, such as sub and gsub, we will describe them together.
We will start with the special symbols and operators, make a small segue into the different types of functions in R, like .Primitive and .Internal, and we will have to talk a bit more about generics.
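As a small preview of that distinction, base R can already tell these kinds of functions apart. This is only a sketch using typeof and is.primitive; the choice of mean, sum and if as examples is mine.

```r
# closures are ordinary R functions with R code in their body
typeof(mean)         # "closure"
# builtins and specials are primitives implemented in C
typeof(sum)          # "builtin"
typeof(`if`)         # "special"
is.primitive(sum)    # TRUE
is.primitive(mean)   # FALSE
```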
I hope that you have found this small exploration of the preinstalled packages interesting and that you are as excited as me about continuing in this series.
We will end with a list of all the functions in the base package that we will go through in this series:
[1] "-" ":"
[3] "::" ":::"
[5] "!" "!="
[7] "(" "["
[9] "[[" "[[<-"
[11] "[<-" "{"
[13] "@" "@<-"
[15] "*" "/"
[17] "&" "&&"
[19] "%*%" "%/%"
[21] "%%" "%in%"
[23] "%o%" "%x%"
[25] "^" "+"
[27] "<" "<-"
[29] "<<-" "<="
[31] "=" "=="
[33] ">" ">="
[35] "|" "||"
[37] "~" "$"
[39] "$<-" "abbreviate"
[41] "abs" "acos"
[43] "acosh" "activeBindingFunction"
[45] "addNA" "addTaskCallback"
[47] "agrep" "agrepl"
[49] "alist" "all"
[51] "all.equal" "all.names"
[53] "all.vars" "allowInterrupts"
[55] "any" "anyDuplicated"
[57] "anyNA" "aperm"
[59] "append" "apply"
[61] "Arg" "args"
[63] "array" "arrayInd"
[65] "as.array" "as.call"
[67] "as.character" "as.complex"
[69] "as.data.frame" "as.Date"
[71] "as.difftime" "as.double"
[73] "as.environment" "as.expression"
[75] "as.factor" "as.function"
[77] "as.hexmode" "as.integer"
[79] "as.list" "as.logical"
[81] "as.matrix" "as.name"
[83] "as.null" "as.numeric"
[85] "as.numeric_version" "as.octmode"
[87] "as.ordered" "as.package_version"
[89] "as.pairlist" "as.POSIXct"
[91] "as.POSIXlt" "as.qr"
[93] "as.raw" "as.single"
[95] "as.symbol" "as.table"
[97] "as.vector" "asin"
[99] "asinh" "asNamespace"
[101] "asplit" "asS3"
[103] "asS4" "assign"
[105] "atan" "atan2"
[107] "atanh" "attach"
[109] "attachNamespace" "attr"
[111] "attr.all.equal" "attr<-"
[113] "attributes" "attributes<-"
[115] "autoload" "autoloader"
[117] "backsolve" "baseenv"
[119] "basename" "besselI"
[121] "besselJ" "besselK"
[123] "besselY" "beta"
[125] "bindingIsActive" "bindingIsLocked"
[127] "bindtextdomain" "bitwAnd"
[129] "bitwNot" "bitwOr"
[131] "bitwShiftL" "bitwShiftR"
[133] "bitwXor" "body"
[135] "body<-" "bquote"
[137] "break" "browser"
[139] "browserCondition" "browserSetDebug"
[141] "browserText" "builtins"
[143] "by" "bzfile"
[145] "c" "call"
[147] "callCC" "capabilities"
[149] "casefold" "cat"
[151] "cbind" "ceiling"
[153] "char.expand" "character"
[155] "charmatch" "charToRaw"
[157] "chartr" "check_tzones"
[159] "chkDots" "chol"
[161] "chol2inv" "choose"
[163] "class" "class<-"
[165] "clearPushBack" "close"
[167] "closeAllConnections" "col"
[169] "colMeans" "colnames"
[171] "colnames<-" "colSums"
[173] "commandArgs" "comment"
[175] "comment<-" "complex"
[177] "computeRestarts" "conditionCall"
[179] "conditionMessage" "conflictRules"
[181] "conflicts" "Conj"
[183] "contributors" "cos"
[185] "cosh" "cospi"
[187] "crossprod" "Cstack_info"
[189] "cummax" "cummin"
[191] "cumprod" "cumsum"
[193] "curlGetHeaders" "cut"
[195] "data.class" "data.frame"
[197] "data.matrix" "date"
[199] "debug" "debuggingState"
[201] "debugonce" "default.stringsAsFactors"
[203] "delayedAssign" "deparse"
[205] "deparse1" "det"
[207] "detach" "determinant"
[209] "dget" "diag"
[211] "diag<-" "diff"
[213] "difftime" "digamma"
[215] "dim" "dim<-"
[217] "dimnames" "dimnames<-"
[219] "dir" "dir.create"
[221] "dir.exists" "dirname"
[223] "do.call" "dontCheck"
[225] "double" "dput"
[227] "dQuote" "drop"
[229] "droplevels" "dump"
[231] "duplicated" "dyn.load"
[233] "dyn.unload" "dynGet"
[235] "eapply" "eigen"
[237] "emptyenv" "enc2native"
[239] "enc2utf8" "encodeString"
[241] "Encoding" "Encoding<-"
[243] "endsWith" "enquote"
[245] "env.profile" "environment"
[247] "environment<-" "environmentIsLocked"
[249] "environmentName" "errorCondition"
[251] "eval" "eval.parent"
[253] "evalq" "exists"
[255] "exp" "expand.grid"
[257] "expm1" "expression"
[259] "extSoftVersion" "factor"
[261] "factorial" "fifo"
[263] "file" "file.access"
[265] "file.append" "file.choose"
[267] "file.copy" "file.create"
[269] "file.exists" "file.info"
[271] "file.link" "file.mode"
[273] "file.mtime" "file.path"
[275] "file.remove" "file.rename"
[277] "file.show" "file.size"
[279] "file.symlink" "Filter"
[281] "Find" "find.package"
[283] "findInterval" "findPackageEnv"
[285] "findRestart" "floor"
[287] "flush" "for"
[289] "force" "forceAndCall"
[291] "formals" "formals<-"
[293] "format" "format.info"
[295] "format.pval" "formatC"
[297] "formatDL" "forwardsolve"
[299] "function" "gamma"
[301] "gc" "gc.time"
[303] "gcinfo" "gctorture"
[305] "gctorture2" "get"
[307] "get0" "getAllConnections"
[309] "getCallingDLL" "getCallingDLLe"
[311] "getConnection" "getDLLRegisteredRoutines"
[313] "getElement" "geterrmessage"
[315] "getExportedValue" "getHook"
[317] "getLoadedDLLs" "getNamespace"
[319] "getNamespaceExports" "getNamespaceImports"
[321] "getNamespaceInfo" "getNamespaceName"
[323] "getNamespaceUsers" "getNamespaceVersion"
[325] "getNativeSymbolInfo" "getOption"
[327] "getRversion" "getSrcLines"
[329] "getTaskCallbackNames" "gettext"
[331] "gettextf" "getwd"
[333] "gl" "globalCallingHandlers"
[335] "globalenv" "gregexec"
[337] "gregexpr" "grep"
[339] "grepl" "grepRaw"
[341] "grouping" "gsub"
[343] "gzcon" "gzfile"
[345] "I" "iconv"
[347] "iconvlist" "icuGetCollate"
[349] "icuSetCollate" "identical"
[351] "identity" "if"
[353] "ifelse" "Im"
[355] "importIntoEnv" "infoRDS"
[357] "inherits" "integer"
[359] "interaction" "interactive"
[361] "intersect" "intToBits"
[363] "intToUtf8" "inverse.rle"
[365] "invisible" "invokeRestart"
[367] "invokeRestartInteractively" "is.array"
[369] "is.atomic" "is.call"
[371] "is.character" "is.complex"
[373] "is.data.frame" "is.double"
[375] "is.element" "is.environment"
[377] "is.expression" "is.factor"
[379] "is.finite" "is.function"
[381] "is.infinite" "is.integer"
[383] "is.language" "is.list"
[385] "is.loaded" "is.logical"
[387] "is.matrix" "is.na"
[389] "is.na<-" "is.name"
[391] "is.nan" "is.null"
[393] "is.numeric" "is.numeric_version"
[395] "is.object" "is.ordered"
[397] "is.package_version" "is.pairlist"
[399] "is.primitive" "is.qr"
[401] "is.R" "is.raw"
[403] "is.recursive" "is.single"
[405] "is.symbol" "is.table"
[407] "is.unsorted" "is.vector"
[409] "isa" "isatty"
[411] "isBaseNamespace" "isdebugged"
[413] "isFALSE" "isIncomplete"
[415] "isNamespace" "isNamespaceLoaded"
[417] "ISOdate" "ISOdatetime"
[419] "isOpen" "isRestart"
[421] "isS4" "isSeekable"
[423] "isSymmetric" "isTRUE"
[425] "jitter" "julian"
[427] "kappa" "kronecker"
[429] "l10n_info" "La_library"
[431] "La_version" "La.svd"
[433] "labels" "lapply"
[435] "lazyLoad" "lazyLoadDBexec"
[437] "lazyLoadDBfetch" "lbeta"
[439] "lchoose" "length"
[441] "length<-" "lengths"
[443] "levels" "levels<-"
[445] "lfactorial" "lgamma"
[447] "libcurlVersion" "library"
[449] "library.dynam" "library.dynam.unload"
[451] "licence" "license"
[453] "list" "list.dirs"
[455] "list.files" "list2DF"
[457] "list2env" "load"
[459] "loadedNamespaces" "loadingNamespaceInfo"
[461] "loadNamespace" "local"
[463] "lockBinding" "lockEnvironment"
[465] "log" "log10"
[467] "log1p" "log2"
[469] "logb" "logical"
[471] "lower.tri" "ls"
[473] "make.names" "make.unique"
[475] "makeActiveBinding" "Map"
[477] "mapply" "margin.table"
[479] "marginSums" "mat.or.vec"
[481] "match" "match.arg"
[483] "match.call" "match.fun"
[485] "matrix" "max"
[487] "max.col" "mean"
[489] "mem.maxNSize" "mem.maxVSize"
[491] "memCompress" "memDecompress"
[493] "memory.profile" "merge"
[495] "message" "mget"
[497] "min" "missing"
[499] "Mod" "mode"
[501] "mode<-" "months"
[503] "mostattributes<-" "names"
[505] "names<-" "namespaceExport"
[507] "namespaceImport" "namespaceImportClasses"
[509] "namespaceImportFrom" "namespaceImportMethods"
[511] "nargs" "nchar"
[513] "ncol" "NCOL"
[515] "Negate" "new.env"
[517] "next" "NextMethod"
[519] "ngettext" "nlevels"
[521] "noquote" "norm"
[523] "normalizePath" "nrow"
[525] "NROW" "nullfile"
[527] "numeric" "numeric_version"
[529] "numToBits" "numToInts"
[531] "nzchar" "objects"
[533] "oldClass" "oldClass<-"
[535] "OlsonNames" "on.exit"
[537] "open" "options"
[539] "order" "ordered"
[541] "outer" "package_version"
[543] "packageEvent" "packageHasNamespace"
[545] "packageNotFoundError" "packageStartupMessage"
[547] "packBits" "pairlist"
[549] "parent.env" "parent.env<-"
[551] "parent.frame" "parse"
[553] "parseNamespaceFile" "paste"
[555] "paste0" "path.expand"
[557] "path.package" "pcre_config"
[559] "pipe" "plot"
[561] "pmatch" "pmax"
[563] "pmax.int" "pmin"
[565] "pmin.int" "polyroot"
[567] "pos.to.env" "Position"
[569] "pretty" "prettyNum"
[571] "print" "prmatrix"
[573] "proc.time" "prod"
[575] "prop.table" "proportions"
[577] "provideDimnames" "psigamma"
[579] "pushBack" "pushBackLength"
[581] "q" "qr"
[583] "qr.coef" "qr.fitted"
[585] "qr.Q" "qr.qty"
[587] "qr.qy" "qr.R"
[589] "qr.resid" "qr.solve"
[591] "qr.X" "quarters"
[593] "quit" "quote"
[595] "R_system_version" "R.home"
[597] "R.Version" "range"
[599] "rank" "rapply"
[601] "raw" "rawConnection"
[603] "rawConnectionValue" "rawShift"
[605] "rawToBits" "rawToChar"
[607] "rbind" "rcond"
[609] "Re" "read.dcf"
[611] "readBin" "readChar"
[613] "readline" "readLines"
[615] "readRDS" "readRenviron"
[617] "Recall" "Reduce"
[619] "reg.finalizer" "regexec"
[621] "regexpr" "registerS3method"
[623] "registerS3methods" "regmatches"
[625] "regmatches<-" "remove"
[627] "removeTaskCallback" "rep"
[629] "rep_len" "rep.int"
[631] "repeat" "replace"
[633] "replicate" "require"
[635] "requireNamespace" "restartDescription"
[637] "restartFormals" "retracemem"
[639] "return" "returnValue"
[641] "rev" "rle"
[643] "rm" "RNGkind"
[645] "RNGversion" "round"
[647] "row" "row.names"
[649] "row.names<-" "rowMeans"
[651] "rownames" "rownames<-"
[653] "rowsum" "rowSums"
[655] "sample" "sample.int"
[657] "sapply" "save"
[659] "save.image" "saveRDS"
[661] "scale" "scan"
[663] "search" "searchpaths"
[665] "seek" "seq"
[667] "seq_along" "seq_len"
[669] "seq.int" "sequence"
[671] "serialize" "serverSocket"
[673] "set.seed" "setdiff"
[675] "setequal" "setHook"
[677] "setNamespaceInfo" "setSessionTimeLimit"
[679] "setTimeLimit" "setwd"
[681] "showConnections" "shQuote"
[683] "sign" "signalCondition"
[685] "signif" "simpleCondition"
[687] "simpleError" "simpleMessage"
[689] "simpleWarning" "simplify2array"
[691] "sin" "single"
[693] "sinh" "sink"
[695] "sink.number" "sinpi"
[697] "slice.index" "socketAccept"
[699] "socketConnection" "socketSelect"
[701] "socketTimeout" "solve"
[703] "sort" "sort.int"
[705] "sort.list" "source"
[707] "split" "split<-"
[709] "sprintf" "sqrt"
[711] "sQuote" "srcfile"
[713] "srcfilealias" "srcfilecopy"
[715] "srcref" "standardGeneric"
[717] "startsWith" "stderr"
[719] "stdin" "stdout"
[721] "stop" "stopifnot"
[723] "storage.mode" "storage.mode<-"
[725] "str2expression" "str2lang"
[727] "strftime" "strptime"
[729] "strrep" "strsplit"
[731] "strtoi" "strtrim"
[733] "structure" "strwrap"
[735] "sub" "subset"
[737] "substitute" "substr"
[739] "substr<-" "substring"
[741] "substring<-" "sum"
[743] "summary" "suppressMessages"
[745] "suppressPackageStartupMessages" "suppressWarnings"
[747] "suspendInterrupts" "svd"
[749] "sweep" "switch"
[751] "sys.call" "sys.calls"
[753] "Sys.chmod" "Sys.Date"
[755] "sys.frame" "sys.frames"
[757] "sys.function" "Sys.getenv"
[759] "Sys.getlocale" "Sys.getpid"
[761] "Sys.glob" "Sys.info"
[763] "sys.load.image" "Sys.localeconv"
[765] "sys.nframe" "sys.on.exit"
[767] "sys.parent" "sys.parents"
[769] "Sys.readlink" "sys.save.image"
[771] "Sys.setenv" "Sys.setFileTime"
[773] "Sys.setlocale" "Sys.sleep"
[775] "sys.source" "sys.status"
[777] "Sys.time" "Sys.timezone"
[779] "Sys.umask" "Sys.unsetenv"
[781] "Sys.which" "system"
[783] "system.file" "system.time"
[785] "system2" "t"
[787] "table" "tabulate"
[789] "tan" "tanh"
[791] "tanpi" "tapply"
[793] "taskCallbackManager" "tcrossprod"
[795] "tempdir" "tempfile"
[797] "textConnection" "textConnectionValue"
[799] "tolower" "topenv"
[801] "toString" "toupper"
[803] "trace" "traceback"
[805] "tracemem" "tracingState"
[807] "transform" "trigamma"
[809] "trimws" "trunc"
[811] "truncate" "try"
[813] "tryCatch" "tryInvokeRestart"
[815] "typeof" "unclass"
[817] "undebug" "union"
[819] "unique" "units"
[821] "units<-" "unix.time"
[823] "unlink" "unlist"
[825] "unloadNamespace" "unlockBinding"
[827] "unname" "unserialize"
[829] "unsplit" "untrace"
[831] "untracemem" "unz"
[833] "upper.tri" "url"
[835] "UseMethod" "utf8ToInt"
[837] "validEnc" "validUTF8"
[839] "vapply" "vector"
[841] "Vectorize" "warning"
[843] "warningCondition" "warnings"
[845] "weekdays" "which"
[847] "which.max" "which.min"
[849] "while" "with"
[851] "withAutoprint" "withCallingHandlers"
[853] "within" "withRestarts"
[855] "withVisible" "write"
[857] "write.dcf" "writeBin"
[859] "writeChar" "writeLines"
[861] "xor" "xpdrows.data.frame"
[863] "xtfrm" "xzfile"
[865] "zapsmall"
[1] For the definition of a standard library, see: https://en.wikipedia.org/wiki/Standard_library