How to standardize a data frame which contains both numeric and factor variables

My data frame my.data contains both numeric and factor variables. I want to standardize only the numeric variables in this data frame.

> mydata2=data.frame(scale(my.data, center=T, scale=T))

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

Could I standardize it this way? I would like to standardize columns 8, 9, 10, 11 and 12, but I think my code is wrong.

mydata=data.frame(scale(flowdis3[,c(8,9,10,11,12)], center=T, scale=T,))

Thanks in advance.


Here is one option to standardize:

mydata[] <- lapply(mydata, function(x) if (is.numeric(x)) {
  scale(x, center = TRUE, scale = TRUE)  # returns a one-column matrix with scaling attributes
} else x)                                # non-numeric columns (e.g. factors) are kept as they are
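Below is a self-contained sketch of this lapply pattern on a small, hypothetical data frame `toy` (not taken from the question). Because `scale()` returns a one-column matrix, the sketch wraps it in `as.vector()` so that each standardized column stays a plain numeric vector. Drop that wrapper if you want to keep the `scaled:center` and `scaled:scale` attributes.

```r
# Hypothetical toy data: two numeric columns and one factor column
toy <- data.frame(
  height = c(150, 160, 170, 180, 190),
  weight = c(50, 60, 65, 80, 95),
  group  = factor(c("a", "b", "a", "b", "a"))
)

# Standardize numeric columns only; factors are returned unchanged.
# Assigning into toy[] keeps the data.frame class and the column names.
toy[] <- lapply(toy, function(x) {
  if (is.numeric(x)) as.vector(scale(x, center = TRUE, scale = TRUE)) else x
})

str(toy)
```

The same idea can be restricted to chosen positions, such as columns 8 to 12 from the question: `flowdis3[8:12] <- lapply(flowdis3[8:12], function(x) as.vector(scale(x)))`. This assumes those columns are all numeric. Unlike wrapping the subset in `data.frame()`, it keeps the remaining columns in place.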

You can use the dplyr package to do this:

library(dplyr)
mydata2 %>% mutate_if(is.numeric, scale)  # apply scale() to every numeric column; other columns are untouched
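In dplyr 1.0.0 and later, the `*_if()` verbs are superseded (though still available) by `across()`. Here is a sketch of the same operation in that style. It assumes a recent dplyr is installed and uses a small, hypothetical toy data frame. `as.vector()` strips the one-column matrix that `scale()` returns.

```r
library(dplyr)

# Hypothetical toy data with numeric and factor columns
toy <- data.frame(
  age    = c(21, 19, 25, 34, 45),
  salary = c(2137.52, 1515.79, 2212.81, 2500.28, 2660),
  gender = factor(c("Female", "Female", "Male", "Female", "Male"))
)

# where(is.numeric) selects the numeric columns; each one is replaced by its z-score
toy_std <- toy %>%
  mutate(across(where(is.numeric), ~ as.vector(scale(.x))))

str(toy_std)
```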

Here are some options to consider, albeit as a late answer:

# Working environment and Memory management
rm(list = ls(all.names = TRUE))  # remove every object from the workspace
gc()                             # run garbage collection
memory.limit(size = 64935)       # Windows-only; in R >= 4.2.0 it is a stub that only warns

# Set working directory
setwd("path")                    # replace "path" with your own directory

# Example data frame
df <- data.frame("Age" = c(21, 19, 25, 34, 45, 63, 39, 28, 50, 39),
        "Name" = c("Christine","Kim","Kevin","Aishwarya","Rafel","Bettina","Joshua","Afreen","Wang","Kerubo"),
        "Salary in $" = c(2137.52, 1515.79, 2212.81, 2500.28, 2660, 4567.45, 2733, 3314, 5757.11, 4435.99),
        "Gender" = c("Female","Female","Male","Female","Male","Female","Male","Female","Male","Male"),
        "Height in cm" = c(172, 166, 191, 169, 179, 177, 181, 155, 154, 183),
        "Weight in kg" = c(60, 70, 88, 48, 71, 51, 65, 44, 53, 91),
        stringsAsFactors = TRUE)  # strings become factors (the default only before R 4.0.0)

Let's check the structure of df:

str(df)

'data.frame':  10 obs. of 6 variables:
 $ Age         : num 21 19 25 34 45 63 39 28 50 39
 $ Name        : Factor w/ 10 levels "Afreen","Aishwarya",..: 4 8 7 2 9 3 5 1 10 6
 $ Salary.in.. : num 2138 1516 2213 2500 2660 ...
 $ Gender      : Factor w/ 2 levels "Female","Male": 1 1 2 1 2 1 2 1 2 2
 $ Height.in.cm: num 172 166 191 169 179 177 181 155 154 183
 $ Weight.in.kg: num 60 70 88 48 71 51 65 44 53 91

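We can see that Age, Salary, Height and Weight are numeric, while Name and Gender are categorical (factor) variables. The factor columns in these str() listings reflect R versions before 4.0.0, where data.frame() converted strings to factors by default. On R 4.0.0 and later, add stringsAsFactors = TRUE to the data.frame() call to reproduce them. The scaling code below behaves the same either way, because is.numeric() is FALSE for both factor and character columns.

Let's scale only the numeric variables, first using base R:

1) Option (a slight modification of akrun's suggestion, computing the z-score (x - mean(x)) / sd(x) by hand):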

start_time1 <- Sys.time()
df1 <- as.data.frame(lapply(df, function(x) if (is.numeric(x)) {
  (x - mean(x)) / sd(x)
} else x))
end_time1 <- Sys.time()
end_time1 - start_time1

Time difference of 0.02717805 secs

str(df1)

'data.frame': 10 obs. of 6 variables:
 $ Age         : num -1.105 -1.249 -0.816 -0.166 0.628 ...
 $ Name        : Factor w/ 10 levels "Afreen","Aishwarya",..: 4 8 7 2 9 3 5 1 10 6
 $ Salary.in.. : num -0.787 -1.255 -0.731 -0.514 -0.394 ...
 $ Gender      : Factor w/ 2 levels "Female","Male": 1 1 2 1 2 1 2 1 2 2
 $ Height.in.cm: num -0.0585 -0.5596 1.5285 -0.309 0.5262 ...
 $ Weight.in.kg: num -0.254 0.365 1.478 -0.996 0.427 ...
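A quick way to confirm that option 1 really standardizes each numeric column (mean 0, standard deviation 1) and leaves the factors untouched is to check the result directly. This is a minimal sketch that assumes a cut-down copy of the example data (Age, Gender and Weight in kg only). The helper name `standardize_numeric` is hypothetical; it wraps the same `df[] <- lapply(...)` idiom used in the question.

```r
# Cut-down copy of the example data: two numeric columns and one factor
df <- data.frame(
  Age          = c(21, 19, 25, 34, 45, 63, 39, 28, 50, 39),
  Gender       = factor(c("Female", "Female", "Male", "Female", "Male",
                          "Female", "Male", "Female", "Male", "Male")),
  Weight.in.kg = c(60, 70, 88, 48, 71, 51, 65, 44, 53, 91)
)

# Hypothetical helper: z-score every numeric column, leave other columns alone.
# Assigning into data[] keeps the data.frame class and the column names.
standardize_numeric <- function(data) {
  data[] <- lapply(data, function(x) {
    if (is.numeric(x)) (x - mean(x)) / sd(x) else x
  })
  data
}

df1 <- standardize_numeric(df)

# Each numeric column should now have mean 0 (up to rounding error) and sd 1
num_cols <- sapply(df1, is.numeric)
round(sapply(df1[num_cols], mean), 12)
sapply(df1[num_cols], sd)

# The factor column comes back unchanged
identical(df1$Gender, df$Gender)

# Same numbers as scale(), which also centers on mean() and divides by sd()
all.equal(df1$Age, as.vector(scale(df$Age)))
```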

2) Option (akrun's approach):

start_time2 <- Sys.time()
df2 <- as.data.frame(lapply(df, function(x) if (is.numeric(x)) {
  scale(x, center = TRUE, scale = TRUE)
} else x))
end_time2 <- Sys.time()
end_time2 - start_time2

Time difference of 0.02599907 secs

str(df2)

'data.frame': 10 obs. of 6 variables:
 $ Age         : num -1.105 -1.249 -0.816 -0.166 0.628 ...
 $ Name        : Factor w/ 10 levels "Afreen","Aishwarya",..: 4 8 7 2 9 3 5 1 10 6
 $ Salary.in.. : num -0.787 -1.255 -0.731 -0.514 -0.394 ...
 $ Gender      : Factor w/ 2 levels "Female","Male": 1 1 2 1 2 1 2 1 2 2
 $ Height.in.cm: num -0.0585 -0.5596 1.5285 -0.309 0.5262 ...
 $ Weight.in.kg: num -0.254 0.365 1.478 -0.996 0.427 ...
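Option 2 calls scale(), which returns a one-column matrix with attributes. Yet str(df2) still shows plain numeric columns. The likely reason is that as.data.frame() splits each matrix into ordinary column vectors. The following sketch illustrates this on the Age values alone; the names `x`, `z` and the single-column `df2` are illustrative only.

```r
x <- c(21, 19, 25, 34, 45, 63, 39, 28, 50, 39)  # the Age column from the example

z <- scale(x)               # returns a 10 x 1 matrix, not a plain vector
dim(z)
attr(z, "scaled:center")    # the mean that was subtracted
attr(z, "scaled:scale")     # the standard deviation that was divided by

# Wrapping the lapply() result in as.data.frame(), as option 2 does, turns
# each one-column matrix into an ordinary numeric column without attributes
df2 <- as.data.frame(lapply(list(Age = x), function(v) scale(v)))
is.matrix(df2$Age)
attr(df2$Age, "scaled:center")
```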

3) Option:

start_time3 <- Sys.time()
indices <- sapply(df, is.numeric)
df3 <- df
df3[indices] <- lapply(df3[indices], scale)
end_time3 <- Sys.time()
# The original code computed end_time2 - start_time3 here, which mixes two
# different runs and printed a meaningless negative value (-59.6766 secs)
end_time3 - start_time3

str(df3)

'data.frame': 10 obs. of 6 variables:
 $ Age         : num [1:10, 1] -1.105 -1.249 -0.816 -0.166 0.628 ...
  ..- attr(*, "scaled:center")= num 36.3
  ..- attr(*, "scaled:scale")= num 13.8
 $ Name        : Factor w/ 10 levels "Afreen","Aishwarya",..: 4 8 7 2 9 3 5 1 10 6
 $ Salary.in.. : num [1:10, 1] -0.787 -1.255 -0.731 -0.514 -0.394 ...
  ..- attr(*, "scaled:center")= num 3183
  ..- attr(*, "scaled:scale")= num 1329
 $ Gender      : Factor w/ 2 levels "Female","Male": 1 1 2 1 2 1 2 1 2 2
 $ Height.in.cm: num [1:10, 1] -0.0585 -0.5596 1.5285 -0.309 0.5262 ...
  ..- attr(*, "scaled:center")= num 173
  ..- attr(*, "scaled:scale")= num 12
 $ Weight.in.kg: num [1:10, 1] -0.254 0.365 1.478 -0.996 0.427 ...
  ..- attr(*, "scaled:center")= num 64.1
  ..- attr(*, "scaled:scale")= num 16.2
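One upside of the matrix columns that option 3 keeps is that the "scaled:center" and "scaled:scale" attributes record the mean and standard deviation that were used. The sketch below shows how those attributes could be used to undo the scaling, or to put new values on the same scale. It assumes a cut-down example (Age and Gender only), and the new ages 30 and 55 are hypothetical.

```r
df <- data.frame(
  Age    = c(21, 19, 25, 34, 45, 63, 39, 28, 50, 39),
  Gender = factor(c("Female", "Female", "Male", "Female", "Male",
                    "Female", "Male", "Female", "Male", "Male"))
)

# Option 3: only the numeric columns are replaced, by one-column matrices
indices <- sapply(df, is.numeric)
df3 <- df
df3[indices] <- lapply(df3[indices], scale)

# The attributes shown by str(df3) hold the mean and sd that were used
age_center <- attr(df3$Age, "scaled:center")
age_scale  <- attr(df3$Age, "scaled:scale")

# Undo the standardization to get the original ages back
age_original <- as.vector(df3$Age) * age_scale + age_center

# Standardize new (hypothetical) ages with the same mean and sd
new_ages   <- c(30, 55)
new_ages_z <- (new_ages - age_center) / age_scale
new_ages_z
```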

4) Option (using the tidyverse and calling dplyr):

library(tidyverse)

start_time4 <- Sys.time()
df4 <- df %>% dplyr::mutate_if(is.numeric, scale)
end_time4 <- Sys.time()
end_time4 - start_time4

Time difference of 0.012043 secs

str(df4)

'data.frame': 10 obs. of 6 variables:
 $ Age         : num [1:10, 1] -1.105 -1.249 -0.816 -0.166 0.628 ...
  ..- attr(*, "scaled:center")= num 36.3
  ..- attr(*, "scaled:scale")= num 13.8
 $ Name        : Factor w/ 10 levels "Afreen","Aishwarya",..: 4 8 7 2 9 3 5 1 10 6
 $ Salary.in.. : num [1:10, 1] -0.787 -1.255 -0.731 -0.514 -0.394 ...
  ..- attr(*, "scaled:center")= num 3183
  ..- attr(*, "scaled:scale")= num 1329
 $ Gender      : Factor w/ 2 levels "Female","Male": 1 1 2 1 2 1 2 1 2 2
 $ Height.in.cm: num [1:10, 1] -0.0585 -0.5596 1.5285 -0.309 0.5262 ...
  ..- attr(*, "scaled:center")= num 173
  ..- attr(*, "scaled:scale")= num 12
 $ Weight.in.kg: num [1:10, 1] -0.254 0.365 1.478 -0.996 0.427 ...
  ..- attr(*, "scaled:center")= num 64.1
  ..- attr(*, "scaled:scale")= num 16.2
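The timings above come from single Sys.time() calls on a 10-row data frame, so the millisecond differences between options are mostly noise. A somewhat fairer comparison repeats each base-R option many times on a larger data frame. This is only a sketch: the random data, the function names `opt1`/`opt2`/`opt3` and the repetition count are all hypothetical. Dedicated packages such as microbenchmark or bench give more careful measurements than base R's system.time().

```r
set.seed(1)
n <- 1e5
# Hypothetical larger data frame: three numeric columns and one factor
big <- data.frame(
  a = rnorm(n),
  b = runif(n),
  c = rpois(n, 3),
  g = factor(sample(c("x", "y"), n, replace = TRUE))
)

# The three base-R options from above, wrapped as functions
opt1 <- function(d) as.data.frame(lapply(d, function(x) if (is.numeric(x)) (x - mean(x)) / sd(x) else x))
opt2 <- function(d) as.data.frame(lapply(d, function(x) if (is.numeric(x)) scale(x) else x))
opt3 <- function(d) {
  idx <- sapply(d, is.numeric)
  d[idx] <- lapply(d[idx], scale)
  d
}

# Average elapsed seconds per call over several repetitions
reps <- 10
timings <- sapply(list(opt1 = opt1, opt2 = opt2, opt3 = opt3), function(f) {
  system.time(for (i in seq_len(reps)) f(big))[["elapsed"]] / reps
})
timings
```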

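Which option to choose depends on the output structure you need and on speed. Options 1 and 2 return plain numeric columns, whereas options 3 and 4 store each scaled variable as a one-column matrix (num [1:10, 1]) with "scaled:center" and "scaled:scale" attributes. That matrix structure of Age, Salary, Height and Weight becomes a problem if, for example, your data are imbalanced and you want to balance them before running a classification on the scaled variables. What I mean is: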

str(df4$Age)

 num [1:10, 1] -1.105 -1.249 -0.816 -0.166 0.628 ...
 - attr(*, "scaled:center")= num 36.3
 - attr(*, "scaled:scale")= num 13.8
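Before passing a data frame to a package that expects plain int, factor or num columns, it can help to check which columns are actually one-column matrices. Here is a minimal base-R sketch, assuming a cut-down version of the example data (Age and Gender only); `matrix_cols` is an illustrative name.

```r
df <- data.frame(
  Age    = c(21, 19, 25, 34, 45, 63, 39, 28, 50, 39),
  Gender = factor(c("Female", "Female", "Male", "Female", "Male",
                    "Female", "Male", "Female", "Male", "Male"))
)

indices <- sapply(df, is.numeric)
df3 <- df
df3[indices] <- lapply(df3[indices], scale)  # option 3 keeps matrix columns

# Flag every column that is stored as a matrix instead of a plain vector
matrix_cols <- sapply(df3, is.matrix)
matrix_cols
names(df3)[matrix_cols]
dim(df3$Age)
```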

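For example, the ROSE package (for balancing data) does not accept data structures other than int, factor and num, so it throws an error.

To avoid this problem, the scaled numeric variables can be stored as plain vectors instead of one-column matrices: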

library(tidyverse)

start_time4 <- Sys.time()
df4 <- df %>% dplyr::mutate_if(is.numeric, ~ scale(.) %>% as.vector())
end_time4 <- Sys.time()
end_time4 - start_time4

Time difference of 0.01400399 secs

str(df4)

'data.frame': 10 obs. of 6 variables:
 $ Age         : num -1.105 -1.249 -0.816 -0.166 0.628 ...
 $ Name        : Factor w/ 10 levels "Afreen","Aishwarya",..: 4 8 7 2 9 3 5 1 10 6
 $ Salary.in.. : num -0.787 -1.255 -0.731 -0.514 -0.394 ...
 $ Gender      : Factor w/ 2 levels "Female","Male": 1 1 2 1 2 1 2 1 2 2
 $ Height.in.cm: num -0.0585 -0.5596 1.5285 -0.309 0.5262 ...
 $ Weight.in.kg: num -0.254 0.365 1.478 -0.996 0.427 ...
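If you prefer not to depend on dplyr, the same as.vector() fix can be applied in base R. Here is a minimal sketch, assuming a cut-down copy of the example data; `df5` is a hypothetical name chosen so it does not clash with the data frames above.

```r
df <- data.frame(
  Age          = c(21, 19, 25, 34, 45, 63, 39, 28, 50, 39),
  Gender       = factor(c("Female", "Female", "Male", "Female", "Male",
                          "Female", "Male", "Female", "Male", "Male")),
  Weight.in.kg = c(60, 70, 88, 48, 71, 51, 65, 44, 53, 91)
)

indices <- sapply(df, is.numeric)
df5 <- df
# as.vector() drops both the 10 x 1 dimension and the "scaled:*" attributes,
# so each numeric column ends up as a plain num vector
df5[indices] <- lapply(df5[indices], function(x) as.vector(scale(x)))

str(df5)
```

With dplyr 1.0.0 or later, mutate_if() is superseded by across(). The dplyr form of the same fix would then be df %>% mutate(across(where(is.numeric), ~ as.vector(scale(.x)))).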
