I've been learning Python since the beginning of this year. In this post, I'll briefly
compare the data structures in R with those in Python.
Array
R
Atomic vectors
one-dimensional array
contain only one data type
scalars are one-element vectors, e.g. f <- 3, g <- "US"
function c()
v <- c ( "k" , "j" , "w" , "d" , "v" )
> v [ 1 ]
[ 1 ] "k"
> v [ c ( 1 , 3 )]
[ 1 ] "k" "w"
> v [ 3 : 4 ]
[ 1 ] "w" "d"
Matrices
two-dimensional array
each element has the same mode (numeric, character or logical)
function matrix()
m <- matrix ( 1 : 6 , nrow = 2 , ncol = 3 )
> m
[, 1 ] [, 2 ] [, 3 ]
[ 1 ,] 1 3 5
[ 2 ,] 2 4 6
> m [ 2 , ]
[ 1 ] 2 4 6
> m [ 1 , 3 ]
[ 1 ] 5
Arrays
similar to matrices, but can have more than two dimensions
function array()
dim1 <- c ( "A1" , "A2" )
dim2 <- c ( "B1" , "B2" , "B3" )
dim3 <- c ( "C1" , "C2" )
arr <- array ( 1 : 12 , c ( 2 , 3 , 2 ), dimnames = list ( dim1 , dim2 , dim3 ))
> arr
, , C1
B1 B2 B3
A1 1 3 5
A2 2 4 6
, , C2
B1 B2 B3
A1 7 9 11
A2 8 10 12
> arr [ 1 , 2 , 2 ]
[ 1 ] 9
> arr [ 1 , 2 ,]
C1 C2
3 9
> arr [ 1 ,,]
C1 C2
B1 1 7
B2 3 9
B3 5 11
Python
package numpy
functions numpy.array(), numpy.arange()
In [ 1 ]: import numpy
In [ 2 ]: arr_1d = numpy . array ([ 6 , 5.2 , 2 , 7 ])
In [ 3 ]: arr_1d
Out [ 3 ]: array ([ 6. , 5.2 , 2. , 7. ])
In [ 4 ]: arr_2d = numpy . array ([[ 1 , 2 , 3 ], [ 4 , 5 , 6 ]])
In [ 5 ]: arr_2d
Out [ 5 ]:
array ([[ 1 , 2 , 3 ],
[ 4 , 5 , 6 ]])
In [ 6 ]: arr_3d = numpy . array ([[[ 1 , 2 , 3 ], [ 4 , 5 , 6 ]],[[ 7 , 8 , 9 ], [ 10 , 11 , 12 ]]])
In [ 7 ]: arr_3d
Out [ 7 ]:
array ([[[ 1 , 2 , 3 ],
[ 4 , 5 , 6 ]],
[[ 7 , 8 , 9 ],
[ 10 , 11 , 12 ]]])
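To connect the two sides, here is a minimal sketch of how the R matrix example could be reproduced with numpy. It assumes numpy is installed; the variable names (values, m, second_row, element) are illustrative and not from the original examples. Two differences matter: numpy indices start at 0 rather than 1, and numpy fills arrays row by row unless told otherwise.

```python
import numpy

# numpy.arange(1, 7) yields 1..6, like 1:6 in R (the stop value is excluded).
values = numpy.arange(1, 7)

# order="F" fills column by column, matching matrix(1:6, nrow = 2, ncol = 3) in R.
m = values.reshape(2, 3, order="F")

# Indices start at 0, so R's m[2, ] becomes m[1, :] and R's m[1, 3] becomes m[0, 2].
second_row = m[1, :]
element = m[0, 2]

print(m)
print(second_row)
print(element)
```

Without order="F", reshape would fill the rows first and give [[1, 2, 3], [4, 5, 6]], which is the layout of arr_2d above.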
List
R
an ordered collection of objects
allows you to gather a variety of objects under one name
str <- "My first list"
mtx <- matrix ( 1 : 6 , nrow = 3 )
intVtr <- c ( 5 , 7 , 32 , 19 )
strVtr <- c ( "one" , "two" )
mylist <- list ( title = str , ages = intVtr , mtx , strVtr )
> mylist
$ title
[ 1 ] "My first list"
$ ages
[ 1 ] 5 7 32 19
[[ 3 ]]
[, 1 ] [, 2 ]
[ 1 ,] 1 4
[ 2 ,] 2 5
[ 3 ,] 3 6
[[ 4 ]]
[ 1 ] "one" "two"
> mylist [[ 2 ]]
[ 1 ] 5 7 32 19
> mylist [[ "ages" ]]
[ 1 ] 5 7 32 19
Python
variable-length
can be modified in-place
created with [] or list()
methods: append(), insert(), pop(), remove(), extend(), sort()
In [ 1 ]: a_list = [ 2 , 7 , None ]
In [ 2 ]: print ( a_list )
Out [ 2 ]: [ 2 , 7 , None ]
In [ 3 ]: b_list = list (( 'foo' , 'bar' ))
In [ 4 ]: b_list [ 1 ] = 'pee'
In [ 5 ]: print ( b_list )
Out [ 5 ]: [ 'foo' , 'pee' ]
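The list methods named above can be tried out in a few lines. This is only a small sketch; the list nums and its values are made up for illustration. Each call modifies the list in place.

```python
nums = [3, 1]

nums.append(2)        # add one element at the end      -> [3, 1, 2]
nums.insert(0, 5)     # insert 5 at position 0           -> [5, 3, 1, 2]
last = nums.pop()     # remove and return the last item  -> [5, 3, 1]
nums.remove(3)        # remove the first occurrence of 3 -> [5, 1]
nums.extend([4, 0])   # append every element of a list   -> [5, 1, 4, 0]
nums.sort()           # sort in place                    -> [0, 1, 4, 5]

print(nums, last)
```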
Dataframe
R
patientId <- c ( 1 , 2 , 3 )
age <- c ( 34 , 23 , 7 )
diabetes <- c ( "Type1" , "Type2" , "Type1" )
status <- c ( "Poor" , "Excellent" , "Improved" )
patientDF <- data.frame ( patientId , age , diabetes , status )
> patientDF
patientId age diabetes status
1 1 34 Type1 Poor
2 2 23 Type2 Excellent
3 3 7 Type1 Improved
> patientDF [ 1 : 2 ]
patientId age
1 1 34
2 2 23
3 3 7
> patientDF [ 1 , ]
patientId age diabetes status
1 1 34 Type1 Poor
> patientDF [ c ( "patientId" , "age" )]
patientId age
1 1 34
2 2 23
3 3 7
> patientDF $ age
[ 1 ] 34 23 7
Python
contains an ordered collection of columns
has both a row index and a column index
package pandas
pandas.DataFrame()
import pandas as pd
data = { 'state' : [ 'Ohio' , 'Ohio' , 'Nevada' , 'Nevada' ],
'year' : [ 2000 , 2001 , 2002 , 2003 ],
'pop' : [ 1.5 , 1.7 , 3.6 , 2.7 ]}
frame = pd . DataFrame ( data , columns = [ 'year' , 'state' , 'pop' ])
In [ 1 ]: print ( frame )
Out [ 1 ]:
year state pop
0 2000 Ohio 1.5
1 2001 Ohio 1.7
2 2002 Nevada 3.6
3 2003 Nevada 2.7
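For comparison with the R selections shown earlier, here is a rough sketch of the pandas counterparts, assuming pandas is installed. It reuses the frame built above; the variable names on the left-hand side are illustrative only. Positions are 0-based, so the first row is row 0.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.7]}
frame = pd.DataFrame(data, columns=['year', 'state', 'pop'])

# First two columns by position, like patientDF[1:2] in R.
first_two_cols = frame.iloc[:, 0:2]

# First row by position, like patientDF[1, ] in R (pandas returns a Series here).
first_row = frame.iloc[0]

# Columns by name, like patientDF[c("patientId", "age")] in R.
by_name = frame[['year', 'state']]

# A single column, like patientDF$age in R.
pop_col = frame['pop']

print(first_two_cols)
print(first_row)
print(pop_col)
```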
Besides, some data structures exist in only one of the two languages:
Factors (R)
represent categorical variables: nominal (unordered) or ordinal (ordered); continuous variables stay numeric
factor()
patientId <- c ( 1 , 2 , 3 )
age <- c ( 34 , 23 , 7 )
diabetes <- c ( "Type1" , "Type2" , "Type1" )
status <- c ( "Poor" , "Excellent" , "Improved" )
diabetes <- factor ( diabetes )
status <- factor ( status , ordered = TRUE )
patientDF <- data.frame ( patientId , age , diabetes , status )
> str ( patientDF )
'data.frame' : 3 obs. of 4 variables :
$ patientId : num 1 2 3
$ age : num 34 23 7
$ diabetes : Factor w / 2 levels "Type1" , "Type2" : 1 2 1
$ status : Ord.factor w / 3 levels "Excellent" < "Improved" < .. : 3 1 2
> summary ( patientDF )
   patientId        age         diabetes       status
 Min.   :1.0   Min.   : 7.00   Type1:2   Excellent:1
 1st Qu.:1.5   1st Qu.:15.00   Type2:1   Improved :1
 Median :2.0   Median :23.00             Poor     :1
 Mean   :2.0   Mean   :21.33
 3rd Qu.:2.5   3rd Qu.:28.50
 Max.   :3.0   Max.   :34.00
Tuple (Python)
fixed-length
immutable
tuple()
In [ 1 ]: tup_int = 4 , 5 , 6
In [ 2 ]: print ( tup_int )
Out [ 2 ]: ( 4 , 5 , 6 )
In [ 3 ]: tup_str = tuple ( 'string' )
In [ 4 ]: print ( tup_str )
Out [ 4 ]: ( 's' , 't' , 'r' , 'i' , 'n' , 'g' )
In [ 5 ]: tup_str [ 0 ]
Out [ 5 ]: 's'
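What "immutable" means in practice can be shown with a short sketch. The names tup, nested and message are illustrative. Assigning to a tuple element raises a TypeError, although a mutable object stored inside a tuple can still be changed in place.

```python
tup = (4, 5, 6)

try:
    tup[0] = 10                      # tuples do not support item assignment
except TypeError as err:
    message = str(err)
    print("cannot modify a tuple:", message)

# The tuple itself is fixed, but the list it holds can still grow.
nested = ('foo', [1, 2])
nested[1].append(3)
print(nested)
```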
Dict (Python)
hash map, associative array
key-value pairs
created with {}
del statement; methods pop(), update()
empty_dict = {}
d1 = { 'a' : 'some value' , 'b' : [ 1 , 2 ]}
In [ 1 ]: print ( d1 )
Out [ 1 ]: { 'a' : 'some value' , 'b' : [ 1 , 2 ]}
d1 [ 7 ] = 'an integer'
In [ 2 ]: print ( d1 )
Out [ 2 ]: { 'a' : 'some value' , 'b' : [ 1 , 2 ], 7 : 'an integer' }
In [ 3 ]: print ( d1 [ 'b' ])
Out [ 3 ]: [ 1 , 2 ]
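The three ways of changing a dict listed above (del, pop() and update()) are not used in the example, so here is a brief sketch. It starts from the same d1 contents; the name removed and the new key 'c' are made up for illustration.

```python
d1 = {'a': 'some value', 'b': [1, 2], 7: 'an integer'}

del d1[7]                  # delete a key; nothing is returned
removed = d1.pop('b')      # delete a key and return its value

# update() merges another dict in place: existing keys are overwritten, new keys are added.
d1.update({'a': 'new value', 'c': 12})

print(d1)
print(removed)
```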
Set (Python)
unordered collection
unique elements
created with set() or a {} literal (an empty {} creates a dict, not a set)
set operations: union, intersection, difference, symmetric difference
In [ 1 ]: print ( set ([ 2 , 2 , 1 , 3 ]))
Out [ 1 ]: { 1 , 2 , 3 }
In [ 2 ]: print ({ 2 , 2 , 1 , 3 })
Out [ 2 ]: { 1 , 2 , 3 }
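The four set operations mentioned above can be written with operators. This is a small sketch with two made-up sets a and b; each operator also has a method form, such as a.union(b).

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b           # elements in either set
intersection = a & b    # elements in both sets
difference = a - b      # elements in a but not in b
sym_diff = a ^ b        # elements in exactly one of the two sets

print(union, intersection, difference, sym_diff)
```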
List, Set and Dict comprehensions (Python)
form a new list by filtering the elements of a collection
transform the elements passing the filter in one concise expression
list comprehension
list_comp = [expr for value in collection if condition]
dict comprehension
dict_comp = {key-expr : value-expr for value in collection if condition}
set comprehension
set_comp = {expr for value in collection if condition}
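The three templates above become concrete with a small example. This is only a sketch; the list words and the length filter are made up for illustration.

```python
words = ['a', 'as', 'bat', 'car', 'dove', 'python']

# List comprehension: keep words longer than two characters and upper-case them.
list_comp = [w.upper() for w in words if len(w) > 2]

# Dict comprehension: map each of those words to its length.
dict_comp = {w: len(w) for w in words if len(w) > 2}

# Set comprehension: the distinct word lengths (duplicates collapse).
set_comp = {len(w) for w in words}

print(list_comp)
print(dict_comp)
print(set_comp)
```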