NectarGAN API - Loss Functions

The NectarGAN API currently includes three non-standard loss functions (with more planned in the near future). These are:

Sobel Loss

Reference: nectargan.losses.losses.Sobel

Sobel loss is effectively a large-scale structure loss. It takes a real ground truth and a generator fake, converts each to grayscale, and applies a Sobel filter to each to approximate an edge map. It then compares the two results with a traditional pixel-wise loss (L1 in this implementation) to derive the final loss value.
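The pipeline above (grayscale conversion, Sobel filtering, pixel-wise comparison) can be sketched roughly as follows. This is a hypothetical illustration, not the `nectargan.losses.losses.Sobel` source; the function name, luma weights, and gradient-magnitude formulation are assumptions:

```python
import torch
import torch.nn.functional as F

def sobel_loss_sketch(y_fake: torch.Tensor, y_real: torch.Tensor) -> torch.Tensor:
    """Illustrative Sobel loss: grayscale -> Sobel edge magnitude -> L1."""
    # Convert RGB (N, 3, H, W) to grayscale using standard luma weights.
    w = torch.tensor([0.299, 0.587, 0.114], device=y_fake.device).view(1, 3, 1, 1)
    gray_fake = (y_fake * w).sum(dim=1, keepdim=True)
    gray_real = (y_real * w).sum(dim=1, keepdim=True)

    # 3x3 Sobel kernels for horizontal and vertical gradients.
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    kernels = torch.stack([kx, kx.t()]).unsqueeze(1)  # (2, 1, 3, 3)

    def edge_map(x: torch.Tensor) -> torch.Tensor:
        g = F.conv2d(x, kernels.to(x.device), padding=1)  # (N, 2, H, W)
        # Gradient magnitude; small epsilon keeps the sqrt differentiable at 0.
        return torch.sqrt(g.pow(2).sum(dim=1, keepdim=True) + 1e-8)

    # Compare the two edge maps with a pixel-wise L1 loss (mean reduction).
    return F.l1_loss(edge_map(gray_fake), edge_map(gray_real))
```

Because the edge maps discard flat color regions, this loss mostly penalizes disagreement about *where* structure is, rather than exact pixel values.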

Here is an example of a Sobel loss map[^1]:

_sobel_heatmap_example_

On the left, we have an example y_real and a y_fake generated by the NectarGAN Pix2pix implementation during training. Below each is the output of a Sobel filter applied to the respective image.

On the right is the result of an L1 loss function applied to the two Sobel maps. The mean at the bottom is the final loss value (at least in the default implementation, which uses a mean reduction on the returned L1 loss result), unweighted and normalized to (0, 1).
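The relationship between the per-pixel loss map and the final scalar comes down to the `reduction` argument of PyTorch's L1 loss. A minimal demonstration (the tensors here are arbitrary stand-ins for the two Sobel maps):

```python
import torch
import torch.nn.functional as F

a = torch.rand(1, 1, 8, 8)  # stand-in for one Sobel edge map
b = torch.rand(1, 1, 8, 8)  # stand-in for the other

# reduction='none' keeps the per-pixel loss map -- what the heatmap visualizes.
loss_map = F.l1_loss(a, b, reduction="none")  # shape (1, 1, 8, 8)

# The default 'mean' reduction collapses that map to the scalar loss value.
final_loss = F.l1_loss(a, b)  # equals loss_map.mean()
```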

Laplacian Loss

Reference: nectargan.losses.losses.Laplacian

Laplacian loss is very similar to Sobel loss, except that, instead of applying a Sobel filter to the ground truth and generator fake, it applies a Laplacian filter. This can encourage the generator to preserve more small-scale textural detail from the ground truth image. For some tasks, it can be used alongside Sobel loss as a very effective replacement for L1 loss. It can also help to dissuade the generator from averaging inputs, which can sometimes result in blurry outputs.
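Since only the filter changes relative to Sobel loss, a sketch is even simpler: one convolution with a Laplacian kernel per image, then a pixel-wise L1 comparison. Again, this is an illustrative stand-in for `nectargan.losses.losses.Laplacian`, not its actual source; the function name and the particular 4-neighbor kernel are assumptions:

```python
import torch
import torch.nn.functional as F

def laplacian_loss_sketch(y_fake: torch.Tensor, y_real: torch.Tensor) -> torch.Tensor:
    """Illustrative Laplacian loss: grayscale -> Laplacian filter -> L1."""
    # Convert RGB (N, 3, H, W) to grayscale using standard luma weights.
    w = torch.tensor([0.299, 0.587, 0.114], device=y_fake.device).view(1, 3, 1, 1)
    gray_fake = (y_fake * w).sum(dim=1, keepdim=True)
    gray_real = (y_real * w).sum(dim=1, keepdim=True)

    # 3x3 Laplacian kernel: a discrete second-derivative approximation,
    # which responds to fine texture rather than broad edges.
    k = torch.tensor([[0., 1., 0.],
                      [1., -4., 1.],
                      [0., 1., 0.]], device=y_fake.device).view(1, 1, 3, 3)

    return F.l1_loss(F.conv2d(gray_fake, k, padding=1),
                     F.conv2d(gray_real, k, padding=1))
```

The second-derivative response is what gives the Laplacian maps their "grittier" look compared with the Sobel maps: it fires on high-frequency texture that a first-derivative filter smooths over.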

Here is an example of a Laplacian loss map[^1]:

_laplacian_heatmap_example_

The layout here is the same as in the Sobel example. Note the "grittier" appearance and the less well-defined edges in the example Laplacian filter images, and also in the resulting L1 loss map when compared with that of the Sobel loss.

VGGPerceptual

Reference: nectargan.losses.losses.VGGPerceptual

> [!IMPORTANT]
> The first time you run this loss function, either by calling it yourself in your own training script, using it in the Toolbox, or by initializing the Pix2pixTrainer with a loss_subspec which includes +VGG, the VGG19 default weights (IMAGENET1K_V1) will be downloaded from PyTorch if you do not already have them installed.

I find VGGPerceptual loss to be particularly interesting, both in how it works and in the results it can produce. It takes a real ground truth and a generator fake as input, and feeds each to a pre-trained image classification model (VGG19 in this case). We don't actually care what the classifier predicts, though; you may be creating images of something the model has never seen. Instead, we extract the feature maps from the model at various depths and use a traditional pixel-wise loss (L1 in this case) to compare them. The resulting loss values for each depth are then added together to calculate the final loss.

This process gives us a cost function that behaves in a fairly unique way. It strongly encourages the generator to make images which are visually similar to the ground truth, as with traditional L1 loss. But it differs in that it punishes the generator far less harshly for small inaccuracies, allowing it some room for "creativity" and oftentimes reducing the blurring (or averaging) which is common with traditional L1 loss.

Depending on the task, this can allow the generator to produce extremely realistic results, and I find that adding it alongside traditional L1 loss can oftentimes allow the generator to produce much more visually believable images noticeably earlier in training than L1 loss alone.

[^1]: Disclaimer: While the y_fake in these examples is a real example output from the NectarGAN Pix2pix implementation, the example Sobel and Laplacian images, as well as the loss maps, were instead generated post-training in Houdini via Copernicus. The mean loss values were then derived from the resulting loss maps using VEX. Please note, however, that it is possible to extract loss maps as tensors during training natively. Please see here for more information.