Portable Format for Analytics Using R

There are many ways to make data science models production-ready. In this article, I will focus on PFA, which makes it easy to deploy the models in production.

It is always a great moment for a data scientist when stakeholders approve a model’s results for migration to production. It takes a lot of effort to create a production-ready model that the organization sees as a valuable addition, and integrating that model with the organization’s existing IT infrastructure is not easy. We have to follow rules laid down by different IT groups. Often the data science infrastructure is not ready for this, yet people expect the data scientist’s models to integrate seamlessly with the existing IT infrastructure.

One way to do this is to create a model and let the IT and engineering teams figure out how to take it to production. Another way is to translate the model into programs in the production environment, for example by porting the logistic regression formula or the decision tree rules into the programming language that the production environment supports. However, neither of these approaches is reliable, and stabilizing the resulting code takes a lot of effort. To our relief, there are many ways (which we discussed in this article) to make our models production-ready. Here, I will focus on PFA, which makes it easy to deploy models into production.

PFA stands for Portable Format for Analytics. It was developed by the Data Mining Group, the same group that developed PMML (Predictive Model Markup Language). PMML was the de facto standard and was widely used. It is a specification expressed in XML that is designed to represent a collection of specific-purpose, configurable statistical models. Most analytic tools support the export of PMML models, and some tools support the deployment of PMML models into production. However, PMML has notable limitations: it offers limited support for computation, and it can represent only a fixed, standard set of models.

To address these problems, the Data Mining Group introduced PFA. PFA combines portability across systems with algorithmic flexibility, so that model scoring, pre-processing, and post-processing can all be expressed in the same document. A PFA document is a JSON file containing model parameters and a scoring procedure. The scoring procedure transforms inputs to outputs by composing functions, and it can range in complexity from simple models to complex models such as neural nets.
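To make the "JSON file with a scoring procedure" idea concrete, here is a minimal sketch of a tiny PFA document built by hand in R. It assumes the jsonlite package is installed (it is used only to serialize an R list to JSON), and the name minimal_pfa is purely illustrative. The engine it describes has no model parameters at all: it declares a numeric input, a numeric output, and an action that adds 1 to the input.

```r
# Assumption: jsonlite is available; it is used here only for serialization.
library(jsonlite)

# An illustrative, hand-written PFA document expressed as a nested R list.
# "input" and "output" declare the data types; "action" is the scoring
# procedure, here a single expression that adds 1 to the input value.
minimal_pfa <- list(
  input  = "double",
  output = "double",
  action = list(
    list(`+` = list("input", 1))
  )
)

# Serialize to JSON. auto_unbox = TRUE writes length-one vectors as scalars,
# so "double" is emitted as a string rather than a one-element array.
minimal_pfa_json <- toJSON(minimal_pfa, auto_unbox = TRUE, pretty = TRUE)
cat(minimal_pfa_json, "\n")
```

A real model exported to PFA follows the same shape, with the model parameters stored in the document and a longer action that implements the scoring logic.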

Let’s see PFA in action using R. First, install the aurelius package:

install.packages("aurelius")

Next, load the packages and create a model:

library(aurelius)
library(rpart)
data <- iris
tree_model <- rpart(Species~., data=data)

Next, use the pfa() function to create the PFA scoring engine:

pfa_tree_model <- pfa(tree_model, pred_type='prob')

Then, the model can be exported to a .pfa file that can be used in other systems:

# Export the model that can be used in other systems
write_pfa(pfa_tree_model, file = 'tree_model_exp.pfa')

To read the PFA file back into R, use the read_pfa() function:

pfa_model <- read_pfa(file("tree_model_exp.pfa"))
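As a quick sanity check, the round trip can be verified by confirming that the document read back contains the three top-level fields every PFA document needs: input, output, and action. The sketch below repeats the steps above so that it runs on its own. Writing to a temporary directory and the names pfa_path and required_fields are choices made for this example, and it assumes that read_pfa() returns a named list, which is not stated in the article.

```r
library(aurelius)
library(rpart)

# Rebuild and convert the model exactly as in the steps above.
tree_model <- rpart(Species ~ ., data = iris)
pfa_tree_model <- pfa(tree_model, pred_type = "prob")

# Write to a temporary location (an assumption of this sketch) and read back.
pfa_path <- file.path(tempdir(), "tree_model_exp.pfa")
write_pfa(pfa_tree_model, file = pfa_path)
pfa_model <- read_pfa(file(pfa_path))

# List the top-level fields of the document.
print(names(pfa_model))

# Check that the fields required of every PFA document are present.
required_fields <- c("input", "output", "action")
print(all(required_fields %in% names(pfa_model)))
```

If the check prints TRUE, the exported file is at least structurally a PFA document and is ready to be handed to a PFA-capable scoring system.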

And that’s it!

Categories: Machine Learning
