Code Snippets for R – Part 1

Zone Leader Sibanjan Das came across some temporal code snippets for R and thought he would share them with everyone.

There are times when we know what to do but cannot execute it immediately. For example, suppose we are working on a task that requires us to transform a categorical variable. It is easy to say that one-hot encoding or label encoding is the appropriate technique for converting categorical variables to an equivalent numeric format. However, when we start writing the code, we get stuck, and the first thing we do is search the internet for it. That is time-consuming and repetitive. Working in Machine Learning and Artificial Intelligence (AI), we should streamline our own work before we automate the world. caret is an excellent package that has most of the functions we need while working in R, but sometimes that is not enough, and occasionally we need things that are not available in caret.

So I have started putting together a few R snippets that come in handy when I work in R, and I thought I would share them with you all.

Categorical Treatment

One-hot encoding and label encoding functions for transforming categorical data.

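one_hot_encode <- function(outcome, vars, df) {
  # Note: `outcome` is not used by designTreatmentsZ; it is kept so existing calls still work
  # Load the package vtreat
  library(vtreat)
  # Create the treatment plan for the columns named in vars
  treatplan <- designTreatmentsZ(df, vars, verbose = FALSE)
  # Apply the treatment plan to the data
  temp.treat <- prepare(treatplan, df)
  # Join the treated columns with the remaining original columns
  # (drop = FALSE keeps a data frame even when only one column remains)
  temp.clean <- cbind(df[, !(names(df) %in% vars), drop = FALSE], temp.treat)
  temp.clean
}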

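label_encode <- function(vars) {
  # Convert the values to a factor (one level per distinct value)
  as.factor(vars)
}

label_encode_xgboost <- function(vars) {
  # Go through a factor first: as.numeric() on a character vector would return NA
  as.numeric(as.factor(vars))
}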
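To make the difference between the two encodings concrete, here is a minimal sketch in base R. It uses model.matrix() rather than vtreat, so it is not the same code path as one_hot_encode above, and the toy data frame and its column names are made up for illustration.

```r
# Hypothetical toy data; the column names are illustrative only.
df <- data.frame(
  id = 1:4,
  colour = c("red", "green", "blue", "green"),
  stringsAsFactors = FALSE
)

# Label encoding: map each distinct value to an integer code.
# factor() sorts the levels alphabetically: blue = 1, green = 2, red = 3.
colour_label <- as.integer(factor(df$colour))

# One-hot encoding: one 0/1 indicator column per level.
# The "- 1" drops the intercept so that every level gets its own column.
colour_onehot <- model.matrix(~ colour - 1, data = df)

print(colour_label)
print(colnames(colour_onehot))
```

Label encoding keeps a single column but imposes an arbitrary order on the levels, while one-hot encoding avoids that order at the cost of one column per level.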

Temporal Data Treatment

To use a temporal attribute in a supervised learning model, it is essential to create features from it. The time_features function below creates 11 new attributes from a temporal variable.

library(lubridate)

time_features <- function(time, col_name) {
  # Derive calendar and clock features from the temporal variable
  numeric_time   <- as.numeric(time)
  day_of_week    <- wday(time)
  day_of_month   <- mday(time)
  day_of_quarter <- qday(time)
  day_of_year    <- yday(time)
  hr_of_day      <- hour(time)
  min_of_day     <- 60 * hour(time) + minute(time)
  sec_of_day     <- 3600 * hour(time) + 60 * minute(time) + second(time)
  week_of_year   <- week(time)
  month_of_year  <- month(time)
  year           <- year(time)

  df_temp <- data.frame(
    numeric_time,
    day_of_week,
    day_of_month,
    day_of_quarter,
    day_of_year,
    hr_of_day,
    min_of_day,
    sec_of_day,
    week_of_year,
    month_of_year,
    year
  )

  # Prefix every feature name with col_name
  time_df <- setNames(df_temp, paste(col_name, names(df_temp), sep = "_"))
  return(time_df)
}
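As a sanity check on the arithmetic, a few of these features can be worked out by hand for a single timestamp. The sketch below uses base R's POSIXlt fields instead of lubridate, so it is an independent check and not the function above. The timestamp is hypothetical, and the +1 shifts assume lubridate's defaults (wday() counting from Sunday = 1, yday() counting from 1).

```r
# Hypothetical timestamp; UTC is fixed so the result does not depend on the local time zone.
ts <- as.POSIXct("2020-06-30 14:05:09", tz = "UTC")
lt <- as.POSIXlt(ts, tz = "UTC")

day_of_week  <- lt$wday + 1    # POSIXlt counts from Sunday = 0; shifted to Sunday = 1
day_of_month <- lt$mday
day_of_year  <- lt$yday + 1    # POSIXlt counts from 0; shifted to start at 1
min_of_day   <- 60 * lt$hour + lt$min
sec_of_day   <- 3600 * lt$hour + 60 * lt$min + lt$sec

# 30 June 2020 was a Tuesday, the 182nd day of a leap year.
print(c(day_of_week, day_of_month, day_of_year, min_of_day, sec_of_day))
# 3 30 182 845 50709
```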

Numerical Binning

Sometimes continuous numerical data has to be converted to discrete data. For example, algorithms such as Naive Bayes (in its categorical form) and Apriori work on discrete values. The function below uses equal-width binning to convert continuous data to a discrete format.

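equi_width_binning <- function(input, no_of_bins) {
  # Range of the input, ignoring missing values
  minimumVal <- min(input, na.rm = TRUE)
  maximumVal <- max(input, na.rm = TRUE)
  # no_of_bins + 1 equally spaced break points spanning [min, max];
  # length.out avoids the last break falling short of the maximum due to rounding
  breaks <- seq(minimumVal, maximumVal, length.out = no_of_bins + 1)
  # include.lowest = TRUE keeps the minimum value in the first bin instead of returning NA
  cut(input, breaks = breaks, include.lowest = TRUE)
}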
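To see how equal-width breaks behave, here is a small worked sketch using only base R's cut(). The input vector and the bin count are made up. One detail worth knowing is that cut() builds right-closed intervals by default, so the minimum value lands outside the first bin unless include.lowest = TRUE is set.

```r
# Hypothetical input: five values to be split into three equal-width bins.
x <- c(1, 2, 5, 7, 10)
n_bins <- 3

# Range is 10 - 1 = 9, so each bin is 3 wide and the breaks are 1, 4, 7, 10.
breaks <- seq(min(x), max(x), length.out = n_bins + 1)

# With include.lowest = TRUE the first interval is closed on both sides: [1,4].
binned <- cut(x, breaks = breaks, include.lowest = TRUE)
print(table(binned))

# Without it the first interval is (1,4], so the minimum value is not binned.
without_lowest <- cut(x, breaks = breaks)
print(is.na(without_lowest[1]))
```

Here the values 1 and 2 fall in the first bin, 5 and 7 in the second, and 10 in the third.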

This is just the beginning. We will continue creating similar modules for repetitive tasks. You can download the code from my GitHub and start using it. If you need something in R to be modularized or want to contribute, feel free to add your code to the project and help us out.

Categories: Machine Learning