A Quick Run-Down
BB
Although there is fairly extensive documentation provided by R Core itself, I can find it a bit much to process at one time. Accordingly, I wanted to go over (some) of options for basic connections with base-R so I’d better understand what is available. This isn’t all of them by any means both due to platform specific issues (I’m on Windows) and because also covering all the options for binary connections is somewhat redundant. I also mostly stay with examples provided by the R documentation.
Standard File Connections
There are a number of different modes for connections. Write, read, (some combination of both), or append are what you will mostly use. Again, note the separate options for binary.
- “r” or “rt” – Open for reading in text mode.
- “w” or “wt” – Open for writing in text mode.
- “a” or “at” -Open for appending in text mode.
- “rb” – Open for reading in binary mode.
- “wb”- Open for writing in binary mode.
- “ab” – Open for appending in binary mode.
- “r+”, “r+b” – Open for reading and writing.
- “w+”, “w+b” – Open for reading and writing, truncating file initially.
- “a+”, “a+b” – Open for reading and appending.
Writing and reading text examples from R documentation with a few additional notes sprinkled in.
# Writing Text zz <- file("ex.data", "w") # open an output file connection cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n") close(zz) zz <- file("ex.data", "r") # read-only mode cat("One more line\n", file = zz) # can't append
## Error in cat("One more line\n", file = zz): cannot write to this connection
zz <- file("ex.data", "a") cat("One more line\n", file = zz) close(zz) rm(zz) readLines("ex.data")
## [1] "TITLE extra line" "2 3 5 7" "" ## [4] "11 13 17" "One more line"
unlink("ex.data") # deletes file, nothing left
It’s also possible to open a file for multiple options, as noted above…
## An example of a file open for reading and writing Tfile <- file("test1", "w+") c(isOpen(Tfile, "r"), isOpen(Tfile, "w")) # both TRUE
## [1] TRUE TRUE
cat("abc\ndef\n", file = Tfile) readLines(Tfile)
## [1] "abc" "def"
seek(Tfile, 0, rw = "r") # reset to beginning
## [1] 10
readLines(Tfile)
## [1] "abc" "def"
cat("ghi\n", file = Tfile) readLines(Tfile)
## [1] "ghi"
Tfile
## A connection with ## description "test1" ## class "file" ## mode "w+" ## text "text" ## opened "opened" ## can read "yes" ## can write "yes"
close(Tfile) Tfile
## A connection, specifically, 'file', but invalid.
unlink("test1") rm(Tfile) ## We can do the same thing with an anonymous file. Tfile <- file() cat("abc\ndef\n", file = Tfile) readLines(Tfile)
## [1] "abc" "def"
close(Tfile)
Text Connections
Another option is text connections…
“Text connections are another source of input. They allow R character vectors to be read as if the lines were being read from a text file. A text connection is created and opened by a call to
textConnection, which copies the current contents of the character vector to an internal buffer at the time of creation. Text connections can also be used to capture R output to a character vector…”
textConnection
zz <- textConnection(LETTERS) readLines(zz, 2)
## [1] "A" "B"
scan(zz, "", 4) # keeps the place...
## [1] "C" "D" "E" "F"
pushBack(c("aa", "bb"), zz) # literally throws it back.. scan(zz, "", 4)
## [1] "aa" "bb" "G" "H"
close(zz) rm(zz) zz <- textConnection("foo", "w") writeLines(c("testit1", "testit2"), zz) cat("testit3 ", file = zz) isIncomplete(zz)
## [1] TRUE
cat("testit4\n", file = zz) isIncomplete(zz)
## [1] FALSE
close(zz) foo
## [1] "testit1" "testit2" "testit3 testit4"
rm(foo,zz) ## test fixed-length strings zz <- file("testchar", "wb") x <- c("a", "this will be truncated", "abc") nc <- c(3, 10, 3) writeChar(x, zz, nc, eos = NULL)
## Warning in writeChar(x, zz, nc, eos = NULL): writeChar: more characters ## requested than are in the string - will zero-pad
writeChar(x, zz, eos = "\r\n") close(zz) zz <- file("testchar", "rb") readChar(zz, nc)
## [1] "a" "this will " "abc"
readChar(zz, nchar(x)+3) # need to read the terminator explicitly
## [1] "a\r\n" "this will be truncated\r\n" ## [3] "abc\r\n"
close(zz) unlink("testchar")
Compression
As the documentation notes, the .gz and .bz2 extensions are for compressed files. The space savings are trivial with small files, but obviously increase for larger files.
zz <- gzfile("ex.gz", "w") # compressed file cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n") close(zz) file.size("ex.gz")
## [1] 55
readLines(zz <- gzfile("ex.gz"))
## [1] "TITLE extra line" "2 3 5 7" "" ## [4] "11 13 17"
close(zz) unlink("ex.gz") zz # an invalid connection
## A connection, specifically, 'gzfile', but invalid.
rm(zz) zz <- bzfile("ex.bz2", "w") # bzip2-ed file cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n") close(zz) file.size("ex.bz2")
## [1] 77
zz # print() method: invalid connection
## A connection, specifically, 'bzfile', but invalid.
print(readLines(zz <- bzfile("ex.bz2")))
## [1] "TITLE extra line" "2 3 5 7" "" ## [4] "11 13 17"
close(zz) unlink("ex.bz2") rm(zz)
The difference is actually quite sizable with only a 30mb file, but obviously serious compression takes both time and computing horsepower.
x <- outer(1:2000,1:2000) object.size(x)
## 32000200 bytes
zz <- file("bigfile.data", "w") cat(x,file=zz) close(zz) file.size("bigfile.data")
## [1] 29072548
unlink("bigfile.data") rm(zz) zz <- gzfile("bigfile.gz", "w") cat(x,file=zz) close(zz) file.size("bigfile.gz")
## [1] 13187416
unlink("bigfile.gz") rm(zz) zz <- bzfile("bigfile.bz2", "w") cat(x,file=zz) close(zz) file.size("bigfile.bz2")
## [1] 11821782
unlink("bigfile.bz2") rm(zz)
URLs
It is also possible to uncompress data directly from a url, assuming it is in correct format, as the example provided by the R documentation illustrates.
## Uncompress a data file from a URL z <- gzcon(url("http://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz")) # read.table can only read from a text-mode connection. raw <- textConnection(readLines(z)) close(z) dat <- read.table(raw) close(raw) dat[1:4, ]
## V1 V2 V3 V4 V5 V6 V7 ## 1 37 86 1 0 0 2 0 ## 2 61 77 1 0 0 4 0 ## 3 1084 75 1 0 0 3 1 ## 4 1092 77 1 0 1 2 1
Sockets
One interesting, perhaps trivial, example is using socket connections to communicate between two R processes (two separate R sessions) on the same computer. Note that this prompt a firewall warning or fail if you have certain ports locked down. Also, you will want to avoid using a port that may interfere with other processes (e.g., 80).
## Two R processes communicating via non-blocking sockets # R process 1 con1 <- socketConnection(port = 6011, server = TRUE) writeLines(LETTERS, con1) close(con1) # R process 2 con2 <- socketConnection(Sys.info()["nodename"], port = 6011) # as non-blocking, may need to loop for input readLines(con2) while(isIncomplete(con2)) { Sys.sleep(1) z <- readLines(con2) if(length(z)) print(z) } close(con2)
RSQLite
There are also a number of other CRAN packages that facilitate other types of connections as well. Here, I’ll cover the RSQLite package (which really is mainly DBI) to connect to a SQLite database. Note that I want to cover web-based data connections separately as there is a fair amount to cover.
RSQLite is a very simple interface with a lot of functionality. As noted above, in order to use this package it is necessary to load the DBI package first as this is what does most of the heavy lifting.
library(DBI) library(RSQLite) datasetsDb()
## <SQLiteConnection> ## Path: C:\Users\Ben\Documents\R\win-library\3.4\RSQLite\db\datasets.sqlite ## Extensions: TRUE
db <- datasetsDb() class(db)
## [1] "SQLiteConnection" ## attr(,"package") ## [1] "RSQLite"
isS4(db)
## [1] TRUE
Krill Mueller, Hadley, and Co. were very considerate and included a bunch of data sets to play around with.
dbListTables(db)
## [1] "BOD" "CO2" "ChickWeight" ## [4] "DNase" "Formaldehyde" "Indometh" ## [7] "InsectSprays" "LifeCycleSavings" "Loblolly" ## [10] "Orange" "OrchardSprays" "PlantGrowth" ## [13] "Puromycin" "Theoph" "ToothGrowth" ## [16] "USArrests" "USJudgeRatings" "airquality" ## [19] "anscombe" "attenu" "attitude" ## [22] "cars" "chickwts" "esoph" ## [25] "faithful" "freeny" "infert" ## [28] "iris" "longley" "morley" ## [31] "mtcars" "npk" "pressure" ## [34] "quakes" "randu" "rock" ## [37] "sleep" "stackloss" "swiss" ## [40] "trees" "warpbreaks" "women"
head(dbReadTable(db,'mtcars'))
## row_names mpg cyl disp hp drat wt qsec vs am gear carb ## 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 ## 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 ## 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 ## 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 ## 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 ## 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
con <- dbConnect(SQLite()) dbWriteTable(con,'mtcars',mtcars) head(dbReadTable(con,'mtcars'))
## mpg cyl disp hp drat wt qsec vs am gear carb ## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 ## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 ## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 ## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 ## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 ## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
In order to keep rownames, it is necessary to specify it as an argument. Also, take notice that either ‘append’ or ‘overwrite’ must be changed to ‘TRUE’.
dbWriteTable(con,'mtcars',mtcars,row.names = TRUE, overwrite = TRUE) head(dbReadTable(con,'mtcars'))
## row_names mpg cyl disp hp drat wt qsec vs am gear carb ## 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 ## 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 ## 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 ## 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 ## 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 ## 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
You can both print the selected data on your console, or more usefully, save it to an R object. It will come out as a typical R data frame (but with stringsAsFactors = FALSE).
bad_mileage <- dbGetQuery(con,"SELECT * FROM mtcars WHERE mpg < 15;") class(bad_mileage)
## [1] "data.frame"
rm(bad_mileage)
If you want to make any changes to the table then use dbExecute. Note that SQLite does not support dropping columns. You will need to create a new table if you want to do that or use another database (e.g., MySQL, POSTGRES, etc.). You, can, however, insert new values into it.
dbExecute(con, "INSERT INTO mtcars (row_names, mpg) VALUES ('BIG ASS TRUCK',5);")
## [1] 1
dbGetQuery(con,"SELECT * FROM mtcars WHERE mpg = 5;")
## row_names mpg cyl disp hp drat wt qsec vs am gear carb ## 1 BIG ASS TRUCK 5 NA NA NA NA NA NA NA NA NA NA
# Now delete the row. dbExecute(con, "DELETE FROM mtcars WHERE mpg = 5;")
## [1] 1
dbGetQuery(con,"SELECT * FROM mtcars WHERE mpg = 5;")
## [1] row_names mpg cyl disp hp drat wt ## [8] qsec vs am gear carb ## <0 rows> (or 0-length row.names)
# And finally, to disconnect.. dbDisconnect(con) # You can check the status of the connection with, dbisValid(con)
## Error in dbisValid(con): could not find function "dbisValid"
# Opps, forgot about 'db', better check that.. dbIsValid(db)
## [1] TRUE
dbDisconnect(db) rm(con,db)
Lastly, what if you want to read/write to a file?
# There may be another way to persist the .db file... con = dbConnect(SQLite()) dbWriteTable(con,'mtcars',mtcars,row.names = TRUE, overwrite = TRUE) sqliteCopyDatabase(con,'test.db') dbDisconnect(con) rm(con) con = dbConnect(SQLite(), dbname="./test.db") dbListTables(con)
## [1] "mtcars"
And that is really all there is to it. SQLite is a very simple, but powerful database.