Important Evaluation Metrics for the ML Classifiers

Assessing the performance of a machine learning model is an essential step in any predictive modeling pipeline. Once a model is built, it must be evaluated to establish its correctness. Building a model is easy; building a useful model is hard. Assessing the usefulness of an ML model is a two-phase process. First, the model is evaluated for statistical soundness: whether its statistical assumptions hold, whether its performance is strong, and whether that performance generalizes to unseen data. This step relies on a number of evaluation metrics. Second, the model is evaluated against business requirements: do users actually get useful insights or predictions out of it?

In this article, we walk through some of the most widely used metrics for evaluating a classification model.

1. Confusion matrix: The confusion matrix is the primary tool used to validate a classifier; most model quality and accuracy metrics are derived from its values. It is a table that cross-tabulates the actual and predicted values for a classifier. For a binary classifier it typically looks like the table below.

                       Predicted positive      Predicted negative
  Actual positive      a (true positives)      b (false negatives)
  Actual negative      c (false positives)     d (true negatives)

The data in the confusion matrix have the following meaning:

  1. “a” is the number of actual positive cases correctly predicted as positive (true positives)
  2. “b” is the number of actual positive cases incorrectly predicted as negative (false negatives)
  3. “c” is the number of actual negative cases incorrectly predicted as positive (false positives)
  4. “d” is the number of actual negative cases correctly predicted as negative (true negatives)
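
To make these counts concrete, here is a minimal sketch, assuming scikit-learn (the article does not prescribe a library), that builds the matrix for a small set of made-up labels. The later sketches reuse the same labels, for which a = 3, b = 1, c = 2, and d = 4.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # actual labels (1 = positive class)
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]  # labels predicted by the classifier

# With labels ordered [0, 1], ravel() returns tn, fp, fn, tp,
# i.e. d, c, b, a in the notation above.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tp, fn, fp, tn)  # 3 1 2 4  ->  a=3, b=1, c=2, d=4
```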
  • Accuracy: Accuracy measures how often the classifier makes a correct prediction. It is the ratio of the number of correct predictions to the total number of predictions.  

    Accuracy = (a + d) / (a + b + c + d)
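
Continuing the toy example above, accuracy can be computed by hand from the counts or with scikit-learn's accuracy_score:

```python
from sklearn.metrics import accuracy_score

a, b, c, d = 3, 1, 2, 4           # counts from the confusion matrix above
print((a + d) / (a + b + c + d))  # 0.7

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(accuracy_score(y_true, y_pred))  # 0.7, same result from the raw labels
```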

  • Precision: Precision measures the proportion of predicted positive cases that are actually positive.

    Precision = a / (a + c)
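
A sketch of the same calculation, by hand and with scikit-learn's precision_score:

```python
from sklearn.metrics import precision_score

a, c = 3, 2         # true positives, false positives
print(a / (a + c))  # 0.6

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(precision_score(y_true, y_pred))  # 0.6
```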

  • Recall: Recall is also termed “sensitivity” or the “true positive rate.” It measures the proportion of actual positive cases that the classifier correctly identifies.

    Recall = a / (a + b)
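
Again for the toy labels, by hand and with scikit-learn's recall_score:

```python
from sklearn.metrics import recall_score

a, b = 3, 1         # true positives, false negatives
print(a / (a + b))  # 0.75

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(recall_score(y_true, y_pred))  # 0.75
```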

  • Misclassification rate: The misclassification rate measures how often the classifier predicts incorrectly; it is the complement of accuracy.

    Misclassification rate = (b + c) / (a + b + c + d)
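
Because the misclassification rate is simply 1 − accuracy, plain arithmetic on the toy counts suffices:

```python
a, b, c, d = 3, 1, 2, 4
misclassification_rate = (b + c) / (a + b + c + d)
print(misclassification_rate)  # 0.3, i.e. 1 - 0.7 accuracy
```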

  • Specificity: Specificity is also termed the “true negative rate.” It measures the proportion of actual negative cases that the classifier correctly identifies.

    Specificity = d / (c + d)
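
scikit-learn has no dedicated specificity function; one option, sketched below, is to compute it from the counts, or as the recall of the negative class:

```python
from sklearn.metrics import recall_score

c, d = 2, 4         # false positives, true negatives
print(d / (c + d))  # ~0.667

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
# Recall with the negative class treated as "positive" is exactly specificity.
print(recall_score(y_true, y_pred, pos_label=0))  # ~0.667
```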

  • ROC (receiver operating characteristic) curve: The ROC curve summarizes the performance of a classifier over all possible thresholds. It is plotted with sensitivity / true positive rate (TPR) on the y-axis and 1 – specificity / false positive rate (FPR) on the x-axis, with one point for each possible cut-point (threshold).
  • AUC (area under the curve): AUC is the area under the ROC curve. For an excellent classifier, sensitivity rises steeply and the area under the curve is close to 1. For a classifier equivalent to random guessing, sensitivity increases linearly with the false positive rate (1 – specificity), and the AUC is around 0.5. As a rule of thumb, the higher the AUC, the better the model; a short sketch of both computations follows.
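
A sketch of both ideas, assuming scikit-learn's roc_curve and roc_auc_score; the predicted probabilities below are made up for illustration:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.5]  # predicted P(positive)

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
print(roc_auc_score(y_true, y_score))              # ~0.917 for these scores
```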
  • Lift: Lift measures the marginal improvement in a model’s predictive ability over the average response. For example, if a marketing campaign’s average response rate is 5% but the model identifies a segment with a 10% response rate, that segment has a lift of 2 (10% / 5%), as the sketch below shows.
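
A plain-arithmetic sketch of the campaign example (the rates are the made-up figures from above):

```python
overall_response_rate = 0.05  # average response across all customers
segment_response_rate = 0.10  # response rate within the model-selected segment

lift = segment_response_rate / overall_response_rate
print(lift)                   # 2.0
```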
  • Balanced Accuracy: When the dataset is imbalanced, accuracy alone may not be a good measure for evaluating a model, because a model built on imbalanced data is biased toward the majority class. Balanced accuracy is the average of the accuracies obtained on each class, i.e. the mean of recall and specificity for a binary classifier.

    Balanced Accuracy = 0.5 * ((a / (a + b)) + (d / (c + d)))
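
For the toy labels, computed by hand and with scikit-learn's balanced_accuracy_score:

```python
from sklearn.metrics import balanced_accuracy_score

a, b, c, d = 3, 1, 2, 4
print(0.5 * ((a / (a + b)) + (d / (c + d))))  # (0.75 + 0.667) / 2 ~ 0.708

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(balanced_accuracy_score(y_true, y_pred))  # same value
```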

  • F1 Score: The F1 score is another good measure for evaluating an imbalanced classifier. It is the harmonic mean of precision (P) and recall (R), and its value lies between 0 and 1.

    F1 Score = 2 * (P * R) / (P + R)
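
Using the precision and recall obtained earlier, by hand and with scikit-learn's f1_score:

```python
from sklearn.metrics import f1_score

p, r = 0.6, 0.75            # precision and recall from the sketches above
print(2 * p * r / (p + r))  # ~0.667

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))  # ~0.667
```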

Some parts of this article are from my book, Data Science with Oracle Data Miner and Oracle R Enterprise. You can read the book to learn more about machine learning algorithms, data preprocessing techniques, and much more.
