Analytickast.com

Data Manipulation in R Using dplyr


Learn about the primary functions of the dplyr package and how they make it easy to transform and manipulate datasets in R.

In our previous article, we discussed the importance of data preprocessing and data management in a data science pipeline and gave a brief introduction to the dplyr R package. This article shows how dplyr's functions make it easy to transform datasets in R.

The dplyr package has five primary functions, commonly known as verbs: select, filter, mutate, summarize, and arrange. Together with helpers such as group_by and the pipe operator, these verbs cover most typical data manipulation operations, which we discuss in the sections below.

Glimpse

The glimpse function gives a transposed view of a dataset: each column is printed on its own line, together with its type and as many of its values as fit on that line.

library(dplyr)
glimpse(mtcars)

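A minimal sketch, assuming dplyr is installed: glimpse() only prints a summary and returns its input invisibly, so the data itself is left untouched. The variable name res is illustrative, and the is.double() check is an extra step not covered in the article.

```r
library(dplyr)

# Prints the dimensions, then one line per column with its name,
# its type (e.g. <dbl>) and its leading values.
res <- glimpse(mtcars)

# glimpse() returns its input invisibly and unchanged
identical(res, mtcars)

# Every mtcars column is stored as a double, which glimpse shows as <dbl>
all(vapply(mtcars, is.double, logical(1)))
```

Because of the invisible return, glimpse() can also be placed in the middle of a pipeline to peek at intermediate results.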

Select

select chooses variables (columns) by name, position, or range. For instance, select(mtcars, mpg) returns only the mpg column of the mtcars dataset.


select(mtcars, mpg:disp) returns the consecutive columns from mpg through disp (mpg, cyl, and disp).


select(mtcars, mpg:disp, -cyl) returns the same range of columns with cyl excluded, leaving mpg and disp.

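The select() calls above can be checked by inspecting the column names they return. This sketch also uses starts_with(), a selection helper from tidyselect that dplyr re-exports; it is an addition not covered in the article, and the variable names are illustrative.

```r
library(dplyr)

# A contiguous range, following mtcars' column order
range_cols <- select(mtcars, mpg:disp)
names(range_cols)   # "mpg" "cyl" "disp"

# The same range with cyl dropped via the minus sign
no_cyl <- select(mtcars, mpg:disp, -cyl)
names(no_cyl)       # "mpg" "disp"

# starts_with() matches columns by name prefix
d_cols <- select(mtcars, starts_with("d"))
names(d_cols)       # "disp" "drat"
```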

Pipe Operator 

The pipe operator (%>%) chains multiple operations together by passing the result on its left as the first argument of the function on its right. This is especially convenient when a result requires several successive operations on a dataset.

We can read mtcars %>% select(wt, mpg, disp) from left to right: take the mtcars dataset, then select the wt, mpg, and disp variables.

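To illustrate that the pipe changes how the code reads but not what it computes, this sketch writes the same two-step operation once as nested calls and once as a pipeline. The cyl == 4 condition is an arbitrary example, and filter() is covered later in this article.

```r
library(dplyr)

# Nested calls: read inside-out
nested <- select(filter(mtcars, cyl == 4), wt, mpg, disp)

# Piped: read left to right; x %>% f(y) is evaluated as f(x, y)
piped <- mtcars %>% filter(cyl == 4) %>% select(wt, mpg, disp)

identical(nested, piped)   # TRUE
nrow(piped)                # 11 four-cylinder cars
```

Since R 4.1, base R also provides a native pipe, |>, which behaves the same way for simple calls like these.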

Mutate

mutate adds new columns to a dataset. It is useful for creating attributes that are functions of other attributes in the dataset, which makes it one of the essential tools for feature creation in the data preprocessing stage.

mtcars %>% mutate(nv = wt + mpg) creates a new attribute, nv, by adding wt and mpg together.

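A sketch of mutate() creating several columns in one call, where a later expression refers to a column created earlier in the same call. The columns wt_kg and heavy are hypothetical features made up for illustration; the conversion assumes wt is recorded in 1000 lb (as the mtcars documentation states) and uses 0.4536 kg per lb as an approximation, and the 1500 kg threshold is arbitrary.

```r
library(dplyr)

cars2 <- mtcars %>%
  mutate(
    nv    = wt + mpg,            # the example from the text
    wt_kg = wt * 1000 * 0.4536,  # hypothetical: approximate weight in kg
    heavy = wt_kg > 1500         # hypothetical: uses wt_kg from the line above
  )

head(cars2[, c("wt", "nv", "wt_kg", "heavy")], 3)
```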

Filter

filter keeps the rows (cases) whose values satisfy a given condition.

mtcars %>% filter(hp > 123) returns the cars whose hp value is greater than 123.

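A sketch of combining conditions in filter(), assuming dplyr is installed. The am == 1 and cyl conditions are arbitrary examples, not taken from the article; in mtcars, am == 1 marks a manual transmission.

```r
library(dplyr)

# Comma-separated conditions are combined with AND
strong_manual <- mtcars %>% filter(hp > 123, am == 1)
nrow(strong_manual)   # 3

# | expresses OR; %in% is a compact way to test membership
either      <- mtcars %>% filter(cyl == 4 | cyl == 6)
four_or_six <- mtcars %>% filter(cyl %in% c(4, 6))
nrow(four_or_six)     # 18

# Rows where a condition evaluates to NA are dropped as well
```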

Group_by

group_by groups rows by the values of one or more columns. On its own it does not change the values in the data; it is usually followed by a summarizing function to compute aggregated values per group:

mtcars %>% filter(hp>123) %>% group_by(am)
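To see what group_by() actually does, this sketch inspects the grouped result with group_vars() and n_groups(), and uses count(), which for grouped data counts rows per group. These three functions are additions not covered in the article; grouped and counts are illustrative names.

```r
library(dplyr)

grouped <- mtcars %>% filter(hp > 123) %>% group_by(am)

# The rows are the same as after filter(); only grouping metadata is added
nrow(grouped)        # 15
group_vars(grouped)  # "am"
n_groups(grouped)    # 2, one per distinct value of am

# count() on grouped data returns one row per group with its size in column n
counts <- grouped %>% count()
counts
```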

Summarize

summarize collapses multiple values into a single value. It is most often used after group_by, in which case the output has one row per group:

mtcars %>% filter(hp>123) %>% group_by(am) %>% summarize(avg_wt=mean(wt))

This command calculates the average wt for each distinct value of am among the cars with hp > 123.

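summarize() can compute several aggregates in a single call. This sketch extends the command above with max() and n(), where n() counts the rows in each group; the extra columns max_hp and n are illustrative additions.

```r
library(dplyr)

by_am <- mtcars %>%
  filter(hp > 123) %>%
  group_by(am) %>%
  summarize(
    avg_wt = mean(wt),  # the aggregate from the text
    max_hp = max(hp),   # an extra summary computed in the same call
    n      = n()        # number of rows in each group
  )

by_am   # one row per value of am: 0 (automatic) and 1 (manual)
```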

Arrange

arrange sorts rows in ascending or descending order of one or more columns. The default is ascending order:

mtcars %>% filter(hp>123) %>% arrange(mpg)


To sort in descending order, wrap the column in desc():

mtcars %>% filter(hp>123) %>% arrange(desc(mpg))

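arrange() also accepts several sort keys: later keys break ties in earlier ones. In this sketch, sorting by cyl first and then by mpg in descending order is an arbitrary choice for illustration, and the select() step only keeps the output narrow.

```r
library(dplyr)

sorted <- mtcars %>%
  filter(hp > 123) %>%
  select(cyl, mpg, hp) %>%
  arrange(cyl, desc(mpg))  # cyl ascending, then mpg descending within each cyl

head(sorted)
```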

To learn more about dplyr, see here.

Although these tasks can also be performed with base R functions, the dplyr verbs are designed with performance in mind, are easy to work with, and share a consistent syntax. So pick up a dataset, get started with dplyr, and share your data preparation story on DZone so others can learn from it.
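For comparison with base R, this sketch computes the grouped average from the Summarize section twice: once with dplyr and once with base R's subset() and aggregate(). The base R version is one possible equivalent chosen for illustration, not the only one.

```r
library(dplyr)

# dplyr: filter, group, summarize
dplyr_res <- mtcars %>%
  filter(hp > 123) %>%
  group_by(am) %>%
  summarize(avg_wt = mean(wt))

# Base R: subset the rows, then aggregate wt by am with a formula
base_res <- aggregate(wt ~ am, data = subset(mtcars, hp > 123), FUN = mean)

# Both are ordered by am, so the averages can be compared directly
all.equal(dplyr_res$avg_wt, base_res$wt)
```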

Categories: Machine Learning
All Rights Reserved © 2020, www.analytickast.com.
