It is always a great moment for a data scientist when stakeholders approve their model results for migration to production. It takes a lot of effort to create a production-ready model that adds real value to an organization. Integrating that model with an organization's existing IT infrastructure, however, is not easy: we have to follow rules laid down by different IT groups, the data science infrastructure is often not ready, and people expect the data scientist's models to fit seamlessly into the existing IT stack.
One way to do this is to build the model and let the IT and engineering teams figure out how to take it to production. Another is to translate the model into code in the production environment, for example by porting the logistic regression formula or the decision tree rules into the programming language the production environment supports. However, neither of these approaches is reliable, and stabilizing the resulting code takes a lot of effort. Fortunately, there are several ways to make models production-ready. In this article, I will focus on PFA, which makes it easy to deploy models into production.
PFA stands for Portable Format for Analytics. It was developed by the Data Mining Group, the same group that developed PMML (Predictive Model Markup Language). PMML, an XML-based specification designed to represent a collection of specific-purpose, configurable statistical models, was the de facto standard and was widely used: most analytic tools support exporting PMML models, and some support deploying them into production. However, PMML has significant limitations: it supports only a fixed, standard set of model types and offers limited support for custom computation.
To address these problems, the Data Mining Group introduced PFA. PFA combines portability across systems with algorithmic flexibility: model scoring, pre-processing, and post-processing. A PFA document is a JSON file containing model parameters and a scoring procedure. The scoring procedure transforms inputs to outputs by composing functions that range from simple expressions to complex models such as neural nets.
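To give a feel for the format, here is a minimal hand-written PFA document (in the spirit of the Data Mining Group's introductory example, not generated by any tool) whose scoring procedure simply adds 100 to its numeric input:

```json
{
  "input": "double",
  "output": "double",
  "action": [
    {"+": ["input", 100]}
  ]
}
```

The `input` and `output` sections declare types drawn from the Avro type system, and `action` is the expression applied to each input record; real exported models follow this same structure, just with larger parameter and action sections.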
Let’s see PFA in action using R. First, install the aurelius package:
install.packages("aurelius")
Next, load the packages and create a model:
library(aurelius)
library(rpart)
data <- iris
tree_model <- rpart(Species~., data=data)
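Before exporting, it is worth sanity-checking the fitted tree. A quick sketch, assuming the tree_model fitted above on the built-in iris data:

```r
# Predicted class probabilities for the first few observations
predict(tree_model, head(data), type = "prob")

# Predicted class labels for the same observations
predict(tree_model, head(data), type = "class")
```

If the predictions look reasonable here, the exported PFA scoring engine should produce the same results in other systems.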
Next, use the pfa() function to create the PFA scoring engine:
pfa_tree_model <- pfa(tree_model, pred_type='prob')
Then, the model can be exported to a .pfa file that can be used in other systems:
# Export the model that can be used in other systems
write_pfa(pfa_tree_model, file = 'tree_model_exp.pfa')
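Because the exported file is plain JSON, you can inspect it directly with base R:

```r
# Print the raw JSON of the exported PFA document
cat(readLines("tree_model_exp.pfa"), sep = "\n")
```

You should see the input/output type declarations and the tree's parameters embedded in the scoring procedure.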
To read the PFA file back into R, use the read_pfa() function:
pfa_model <- read_pfa(file("tree_model_exp.pfa"))
And that’s it!