*Learn how to implement loss functions in TensorFlow in this article by Nick McClure, a senior data scientist at PayScale with a passion for learning and advocating for analytics, machine learning, and artificial intelligence.*

Loss functions are very important for machine learning algorithms. They measure the distance between the model outputs and the target (truth) values. This article delves into various loss function implementations in TensorFlow.

Getting ready

In order to optimize your machine learning algorithms, you need to evaluate the outcomes. Evaluating outcomes in TensorFlow depends on specifying a loss function. A loss function tells TensorFlow how good or bad the predictions are, compared with the desired result. In most cases, you’ll have a set of data and a target on which to train your algorithm. The loss function compares the target with the prediction and gives a numerical distance between the two.

This article will cover the main loss functions that you can implement in TensorFlow. To see how the different loss functions operate, start a computational graph, create a session, and load matplotlib, a Python plotting library, using the following code:

import matplotlib.pyplot as plt
import tensorflow as tf

sess = tf.Session()

How to do it…

- First, look at loss functions for regression, which means predicting a continuous dependent variable. Create a sequence of predictions and a target as tensors using the following code. You'll plot the results across 500 x-values between -1 and 1 later in the article.

x_vals = tf.linspace(-1., 1., 500)
target = tf.constant(0.)

- The L2 norm loss is also known as the **Euclidean loss function**. It is just the square of the distance to the target. Here, you'll compute the loss function as if the target is zero. The L2 norm is a great loss function because it is curved near the target, and algorithms can use this fact to converge to the target more slowly the closer they get to it. You can implement this as follows:

l2_y_vals = tf.square(target - x_vals)
l2_y_out = sess.run(l2_y_vals)

- The L1 norm loss is also known as the **absolute loss function**. Instead of squaring the difference, take the absolute value. The L1 norm handles outliers better than the L2 norm because it is not as steep for larger values. One issue to be aware of is that the L1 norm is not smooth at the target, which can result in algorithms not converging well. Implement it as follows:

l1_y_vals = tf.abs(target - x_vals)
l1_y_out = sess.run(l1_y_vals)
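To make the outlier behavior concrete, here is a minimal plain-Python sketch (not TensorFlow; just the scalar math behind the two losses) showing that an outlier's L2 penalty grows quadratically while its L1 penalty grows only linearly:

```python
def l1(pred, target=0.0):
    # Absolute loss: |target - prediction|
    return abs(target - pred)

def l2(pred, target=0.0):
    # Euclidean loss: (target - prediction)^2
    return (target - pred) ** 2

# Near the target, both losses are small and comparable ...
print(l1(0.1), l2(0.1))    # 0.1 vs 0.010000000000000002
# ... but a distant outlier is penalized far more heavily by L2.
print(l1(10.0), l2(10.0))  # 10.0 vs 100.0
```

This is why a single bad data point can dominate an L2-trained model but only shifts an L1-trained one proportionally.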

- Pseudo-Huber loss is a continuous and smooth approximation to the **Huber loss function**. This loss function attempts to take the best of the L1 and L2 norms by being convex near the target and less steep for extreme values. The form depends on an extra parameter, delta, which dictates how steep it will be. Plot two forms, **delta1 = 0.25** and **delta2 = 5**, to show the difference, as follows:

delta1 = tf.constant(0.25)
phuber1_y_vals = tf.multiply(tf.square(delta1), tf.sqrt(1. + tf.square((target - x_vals)/delta1)) - 1.)
phuber1_y_out = sess.run(phuber1_y_vals)
delta2 = tf.constant(5.)
phuber2_y_vals = tf.multiply(tf.square(delta2), tf.sqrt(1. + tf.square((target - x_vals)/delta2)) - 1.)
phuber2_y_out = sess.run(phuber2_y_vals)
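A quick plain-Python check (a sketch of the same formula, outside TensorFlow) confirms the "best of both" behavior: near the target the pseudo-Huber loss tracks L2/2, and far from the target it grows linearly like L1:

```python
import math

def pseudo_huber(residual, delta):
    # delta^2 * (sqrt(1 + (residual/delta)^2) - 1), matching the code above
    return delta**2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)

# For a small residual, the loss is approximately residual^2 / 2 (L2-like).
small = 0.01
print(pseudo_huber(small, delta=1.0), small**2 / 2)
# For a large residual, the loss grows roughly linearly with slope delta (L1-like).
big = 100.0
print(pseudo_huber(big, delta=1.0))  # close to |big| - 1
```

A larger delta makes the quadratic region wider and the linear tails steeper, which is exactly the difference the two plotted curves show.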

Now, move on to loss functions for classification problems. Classification loss functions are used to evaluate loss when predicting categorical outcomes. Usually, the output of your model for a class category is a real number between 0 and 1. Choose a cutoff (0.5 is commonly chosen) and classify the outcome as being in that category if the number is above the cutoff. Here, consider the various loss functions for categorical outputs:

- You’ll need to redefine your predictions (x_vals) and target. Save the outputs and plot them in the next section:

x_vals = tf.linspace(-3., 5., 500)
target = tf.constant(1.)
targets = tf.fill([500,], 1.)
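As context for the losses below, the cutoff rule described earlier can be sketched in a few lines of plain Python (a minimal illustration using the commonly chosen 0.5 threshold):

```python
def classify(score, cutoff=0.5):
    # Assign the positive class when the model's score exceeds the cutoff.
    return 1 if score > cutoff else 0

scores = [0.1, 0.4, 0.6, 0.9]
print([classify(s) for s in scores])  # [0, 0, 1, 1]
```

The loss functions that follow measure how far the raw scores are from the targets before any such thresholding is applied.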

- Hinge loss is mostly used for support vector machines but can be used in neural networks as well. It is meant to compute a loss between two target classes, 1 and -1. In the following code, you'll use the target value 1, so the closer your predictions are to 1, the lower the loss value:

hinge_y_vals = tf.maximum(0., 1. - tf.multiply(target, x_vals))
hinge_y_out = sess.run(hinge_y_vals)
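To see the margin behavior concretely, here is the same formula as a plain-Python sketch evaluated at a few scalar predictions:

```python
def hinge(pred, target=1.0):
    # max(0, 1 - target * prediction), as in the TensorFlow code above
    return max(0.0, 1.0 - target * pred)

# Predictions at or beyond the margin (>= 1 for target 1) incur zero loss;
# predictions on the wrong side are penalized linearly.
print([hinge(x) for x in (-1.0, 0.0, 1.0, 2.0)])  # [2.0, 1.0, 0.0, 0.0]
```

The flat region past the margin is what creates the "max margin" property used by SVMs.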

- Cross-entropy loss for a binary case is also sometimes referred to as the **logistic loss function**. It comes about when you are predicting the two classes 0 or 1. You may wish to measure the distance from the actual class (0 or 1) to the predicted value, which is usually a real number between 0 and 1. To measure this distance, use the cross-entropy formula from information theory, as follows:

xentropy_y_vals = - tf.multiply(target, tf.log(x_vals)) - tf.multiply((1. - target), tf.log(1. - x_vals))
xentropy_y_out = sess.run(xentropy_y_vals)

- Sigmoid cross-entropy loss is similar to the previous loss function, except you transform the x values using the sigmoid function before you put them in the cross-entropy loss, as follows:

xentropy_sigmoid_y_vals = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_vals, labels=targets)
xentropy_sigmoid_y_out = sess.run(xentropy_sigmoid_y_vals)

- Weighted cross-entropy loss is a weighted version of the sigmoid cross-entropy loss. Provide a weight on the positive target. For example, weigh the positive target by 0.5, as follows:

weight = tf.constant(0.5)
xentropy_weighted_y_vals = tf.nn.weighted_cross_entropy_with_logits(logits=x_vals, targets=targets, pos_weight=weight)
xentropy_weighted_y_out = sess.run(xentropy_weighted_y_vals)

- Softmax cross-entropy loss operates on non-normalized outputs. This function is used to measure loss when each example belongs to exactly one target category, rather than to several at once. Because of this, the function transforms the outputs into a probability distribution via the softmax function and then computes the loss against a true probability distribution, as follows:

unscaled_logits = tf.constant([[1., -3., 10.]])
target_dist = tf.constant([[0.1, 0.02, 0.88]])
softmax_xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=unscaled_logits, labels=target_dist)
print(sess.run(softmax_xentropy))

[ 1.16012561]
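You can verify that printed value by hand with a short plain-Python sketch of the same computation (softmax normalization followed by the cross-entropy formula H(p, q) = -sum(p_i * log(q_i))):

```python
import math

logits = [1.0, -3.0, 10.0]
target_dist = [0.1, 0.02, 0.88]

# Normalize the logits into a probability distribution via softmax.
exps = [math.exp(v) for v in logits]
total = sum(exps)
softmax = [e / total for e in exps]

# Cross-entropy between the true distribution and the softmax output.
xent = -sum(p * math.log(q) for p, q in zip(target_dist, softmax))
print(round(xent, 6))  # 1.160126
```

This matches the value TensorFlow reports, up to floating-point precision.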

- Sparse softmax cross-entropy loss is the same as the previous one, except that instead of the target being a probability distribution, it is the index of the true category. Instead of a one-hot target vector (all zeros with a single 1), pass in the index of the true category, as follows:

unscaled_logits = tf.constant([[1., -3., 10.]])
sparse_target_dist = tf.constant([2])
sparse_xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=unscaled_logits, labels=sparse_target_dist)
print(sess.run(sparse_xentropy))

[ 0.00012564]
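The sparse form is mathematically equivalent to the dense form with a one-hot target, as this plain-Python sketch shows:

```python
import math

logits = [1.0, -3.0, 10.0]
true_index = 2

# log-softmax of each logit: logit - log(sum of exponentials)
total = sum(math.exp(v) for v in logits)
log_softmax = [v - math.log(total) for v in logits]

# Sparse form: pick out -log(softmax) at the true index ...
sparse_xent = -log_softmax[true_index]
# ... which equals the dense form with a one-hot target vector.
one_hot = [1.0 if i == true_index else 0.0 for i in range(len(logits))]
dense_xent = -sum(t * ls for t, ls in zip(one_hot, log_softmax))
print(sparse_xent, dense_xent)  # identical, near 0.000126
```

Passing the index directly just skips building the one-hot vector.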

How it works…

This code shows how to use matplotlib to plot the regression loss functions:

x_array = sess.run(x_vals)
plt.plot(x_array, l2_y_out, 'b-', label='L2 Loss')
plt.plot(x_array, l1_y_out, 'r--', label='L1 Loss')
plt.plot(x_array, phuber1_y_out, 'k-.', label='P-Huber Loss (0.25)')
plt.plot(x_array, phuber2_y_out, 'g:', label='P-Huber Loss (5.0)')
plt.ylim(-0.2, 0.4)
plt.legend(loc='lower right', prop={'size': 11})
plt.show()

You’ll get the following plot as output from the preceding code:

Figure 4: Plotting various regression loss functions

And here is how to use matplotlib to plot the various classification loss functions:

x_array = sess.run(x_vals)
plt.plot(x_array, hinge_y_out, 'b-', label='Hinge Loss')
plt.plot(x_array, xentropy_y_out, 'r--', label='Cross Entropy Loss')
plt.plot(x_array, xentropy_sigmoid_y_out, 'k-.', label='Cross Entropy Sigmoid Loss')
plt.plot(x_array, xentropy_weighted_y_out, 'g:', label='Weighted Cross Entropy Loss (x0.5)')
plt.ylim(-1.5, 3)
plt.legend(loc='lower right', prop={'size': 11})
plt.show()

You’ll get the following plot from the preceding code:

Figure 5: Plots of classification loss functions

There’s more…

Here is a table summarizing the different loss functions covered in this article:

| Loss function | Use | Benefits | Disadvantages |
| --- | --- | --- | --- |
| L2 | Regression | More stable | Less robust |
| L1 | Regression | More robust | Less stable |
| Pseudo-Huber | Regression | More robust and stable | One more parameter |
| Hinge | Classification | Creates a max margin for use in SVM | Unbounded loss affected by outliers |
| Cross-entropy | Classification | More stable | Unbounded loss, less robust |

The remaining classification loss functions all have to do with the type of cross-entropy loss. The cross-entropy sigmoid loss function is for use on unscaled logits and is preferred over computing the sigmoid and then the cross-entropy. This is because TensorFlow has better built-in ways to handle numerical edge cases. The same goes for softmax cross-entropy and sparse softmax cross-entropy.
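To illustrate the numerical edge cases that make the fused op preferable, here is a plain-Python sketch. It uses the identity `max(x, 0) - x * z + log(1 + exp(-|x|))` that TensorFlow's documentation gives for `sigmoid_cross_entropy_with_logits`, and contrasts it with the naive sigmoid-then-cross-entropy approach:

```python
import math

def stable_sigmoid_xent(logit, label):
    # Fused formulation: max(x, 0) - x * z + log(1 + exp(-|x|)).
    # exp() is only ever called on a non-positive argument, so it can
    # underflow harmlessly to 0 but never overflow.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def naive_sigmoid_xent(logit, label):
    # Computing the sigmoid first and then the cross-entropy loses
    # precision: for a large-magnitude logit, exp() overflows or the
    # sigmoid saturates to exactly 0 or 1, making log() blow up.
    s = 1.0 / (1.0 + math.exp(-logit))
    return -label * math.log(s) - (1.0 - label) * math.log(1.0 - s)

# Both agree for moderate logits ...
print(stable_sigmoid_xent(2.0, 1.0), naive_sigmoid_xent(2.0, 1.0))
# ... but only the fused form survives an extreme logit.
print(stable_sigmoid_xent(-800.0, 1.0))  # 800.0 -- finite and correct
# naive_sigmoid_xent(-800.0, 1.0) raises OverflowError instead.
```

The same log-sum-exp trick underlies the stability of the fused softmax cross-entropy ops.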

Most of the classification loss functions described here are for two-class predictions. This can be extended to multiple classes by summing the cross-entropy terms over each prediction/target.
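The multi-class extension just described can be sketched in plain Python (a hypothetical three-class example; the probabilities and one-hot target are illustrative):

```python
import math

predictions = [0.7, 0.2, 0.1]  # model's per-class probabilities
targets = [1.0, 0.0, 0.0]      # one-hot encoding of the true class

eps = 1e-12  # guard against log(0) for classes assigned zero probability
loss = -sum(t * math.log(p + eps) for t, p in zip(targets, predictions))
print(round(loss, 4))  # equals -log(0.7), i.e. 0.3567
```

With a one-hot target, the sum collapses to the negative log-probability the model assigned to the true class, which is exactly what the softmax cross-entropy ops compute per example.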

There are several other metrics to look at when evaluating a model. Here’s a list of some more to consider:

| Model metric | Description |
| --- | --- |
| R-squared (coefficient of determination) | For linear models, this is the proportion of variance in the dependent variable that is explained by the independent data. For models with a larger number of features, consider using adjusted R-squared. |
| Root mean squared error | For continuous models, this measures the difference between prediction and actual via the square root of the average squared error. |
| Confusion matrix | For categorical models, look at a matrix of predicted categories versus actual categories. A perfect model has all the counts along the diagonal. |
| Recall | For categorical models, this is the fraction of true positives over all actual positives. |
| Precision | For categorical models, this is the fraction of true positives over all predicted positives. |
| F-score | For categorical models, this is the harmonic mean of precision and recall. |
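The three classification metrics in the table can be computed from confusion-matrix counts in a few lines. A minimal sketch with hypothetical counts:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)  # true positives over predicted positives
recall = tp / (tp + fn)     # true positives over actual positives
f_score = 2 * precision * recall / (precision + recall)  # harmonic mean

print(precision, recall, round(f_score, 4))  # 0.8 0.888... 0.8421
```

The harmonic mean punishes imbalance: a model with high precision but low recall (or vice versa) gets a low F-score.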

*If you found this article interesting, you can explore Nick McClure's TensorFlow Machine Learning Cookbook – Second Edition to skip the theory and get the most out of TensorFlow to build production-ready machine learning models. TensorFlow Machine Learning Cookbook – Second Edition will teach you how to use TensorFlow for complex data computations and allow you to dig deeper and gain more insights into your data than ever before.*