Portable Format for Analytics Using R



It is a great moment for data scientists when stakeholders approve their models for migration to production. Building a production-ready model that is seen as a valuable addition to an organization takes a lot of effort, and integrating it with the organization’s existing IT infrastructure is not easy: we have to follow rules laid down by different IT groups. Often the data science infrastructure itself is not production-ready, yet people expect the data scientist’s models to plug into their existing IT infrastructure seamlessly.

One way to do this is to build a model and let the IT and engineering teams figure out how to take it to production. Another is to translate the model into code for the production environment, for example by re-implementing a logistic regression formula or a set of decision tree rules in whatever programming language that environment supports. However, neither approach is reliable, and stabilizing the resulting code takes a lot of effort. Fortunately, there are better ways to make models production-ready (discussed in a separate article). In this article, I will focus on PFA, which makes it easy to deploy models into production.
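To make the second approach (hand-porting) concrete, here is roughly what it looks like for the iris decision tree used later in this article. This is an illustration only: score_by_hand is a hypothetical helper, and its thresholds (2.45 and 1.75) are copied by hand from the output of print(tree_model).

```r
library(rpart)

# Fit the same kind of tree that is exported to PFA later in the article
tree_model <- rpart(Species ~ ., data = iris)
print(tree_model)  # the split thresholds used below were read off this printout

# Hypothetical hand-ported scoring function: the thresholds are hard-coded
# copies of rpart's splits and must be updated by hand after every retrain
score_by_hand <- function(petal_length, petal_width) {
  ifelse(petal_length < 2.45, "setosa",
         ifelse(petal_width < 1.75, "versicolor", "virginica"))
}

hand_pred  <- score_by_hand(iris$Petal.Length, iris$Petal.Width)
rpart_pred <- as.character(predict(tree_model, newdata = iris, type = "class"))

# Share of rows on which the hand-ported rules match rpart's own predictions
mean(hand_pred == rpart_pred)
```

The hand-written rules match rpart today, but nothing ties them to the model: retrain it on new data and the hard-coded thresholds silently go stale unless someone re-copies them.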

PFA stands for Portable Format for Analytics. It was developed by the Data Mining Group (DMG), the same group behind PMML (Predictive Model Markup Language), which was the de facto standard for exchanging models and was widely used. PMML is an XML-based specification designed to represent a collection of specific-purpose, configurable statistical models. Most analytics tools can export PMML models, and some can deploy them into production. However, PMML has limitations: its support for general computation is limited, and it can only describe models from its fixed, standard set of model types.

To address these problems, the Data Mining Group introduced PFA. PFA combines portability across systems with algorithmic flexibility: a single document can cover model scoring as well as pre-processing and post-processing. A PFA document is a JSON file containing the model parameters and a scoring procedure. The scoring procedure transforms inputs into outputs by composing functions, and these compositions can range from simple models to complex ones such as neural networks.
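To show what "model parameters plus a scoring procedure" looks like in JSON, the sketch below hand-builds a tiny PFA document for a linear regression. It is an assumption-laden illustration, not what aurelius emits for a tree: the coefficients are made up, jsonlite is used only to serialize the R list, and the layout follows the PFA specification's convention of Avro types for input and output, a "cells" entry holding the parameters, and an "action" that calls the PFA library function model.reg.linear on the input and the stored parameters.

```r
library(jsonlite)

# Hand-built PFA document (illustrative coefficients, not from a fitted model):
#   score = 0.5 * x1 + 2.0 * x2 + 1.0
linear_pfa <- list(
  # Input and output are declared with Avro types
  input  = list(type = "array", items = "double"),
  output = "double",
  # Model parameters are stored in a named cell
  cells  = list(
    model = list(
      type = list(
        type   = "record",
        name   = "Model",
        fields = list(
          list(name = "coeff", type = list(type = "array", items = "double")),
          list(name = "const", type = "double")
        )
      ),
      init = list(coeff = c(0.5, 2.0), const = 1.0)
    )
  ),
  # Scoring procedure: "input" is a symbol reference, {"cell": "model"} reads the cell
  action = list(
    list(model.reg.linear = list("input", list(cell = "model")))
  )
)

# auto_unbox writes length-1 values as JSON scalars instead of one-element arrays
cat(toJSON(linear_pfa, auto_unbox = TRUE, pretty = TRUE))
```

For an input of [1, 2], a PFA engine evaluating this document would compute 0.5 × 1 + 2.0 × 2 + 1.0 = 5.5. Because the whole model is data, any system with a PFA engine can score it without R.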

Let’s see PFA in action using R. First, install the aurelius package:

install.packages("aurelius")

Next, load the packages and create a model:

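library(aurelius)  # converts R models to PFA and reads/writes .pfa files
library(rpart)     # recursive partitioning (decision trees)

# Fit a classification tree that predicts Species from the four flower measurements
tree_model <- rpart(Species ~ ., data = iris)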
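Before converting the model, it can help to record what the R model itself predicts, so the exported PFA model can later be checked against it. This sketch is an addition to the walkthrough (ref_probs is a hypothetical name); it uses rpart's standard predict() with type = "prob", which returns one probability column per species, the same kind of output requested from the PFA export with pred_type = 'prob'.

```r
library(rpart)

tree_model <- rpart(Species ~ ., data = iris)

# Reference class probabilities from R for a few rows:
# one column per species, and each row sums to 1
ref_probs <- predict(tree_model, newdata = head(iris), type = "prob")
print(ref_probs)
```

Keeping a small table like this alongside the .pfa file gives a quick regression check when the model is scored in another system.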

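Next, use the pfa() function to convert the fitted model into a PFA document. With pred_type = 'prob', the exported model returns the probability of each class rather than only the predicted class:

# Convert the rpart model into a PFA document (represented in R as a list)
pfa_tree_model <- pfa(tree_model, pred_type = 'prob')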

Then, the model can be exported to a .pfa file that can be used in other systems:

# Write the PFA document to disk as JSON so other systems can load it
write_pfa(pfa_tree_model, file = 'tree_model_exp.pfa')

To read the PFA document back into R, use the read_pfa() function:

pfa_model <- read_pfa(file("tree_model_exp.pfa"))

And that’s it!

Categories: Machine Learning
    All Rights Reserved © 2020. - www.analytickast.com .
