
# NectarGAN – Frequently Asked Questions (FAQ)

## Sections

1. Installation and Environment
2. Dataset and Configuration
3. Training (Toolbox and CLI)
4. Toolbox Overview
5. Testing and Reviewing Results
6. ONNX Export and Deployment
7. Scheduling, Losses and Internals
8. Troubleshooting

## Installation and Environment

1) What platforms are supported?

NectarGAN has been tested on Windows. Linux support is planned. See the Getting Started page for more info.

2) What Python versions are supported?

Python >= 3.12

3) How do I install it?

  • Clone the repo: git clone https://github.com/ZacharyBork/NectarGAN.git
  • Create a fresh environment and install:
pip install .

--- or ---

conda env create -f environment.yml

Dev/testing install:

pip install -e ".[dev]"

--- or ---

pip install -r requirements.dev.txt

4) How can I verify my installation works?

Run the test suite with pytest. See here for more information.

5) I tried running the Toolbox/CLI, but it says I'm missing PyTorch?

PyTorch is not included in the core dependencies and must be installed separately. PyTorch builds are tied to specific compute platforms (CUDA versions or CPU), so choose the build that matches the system you will run NectarGAN on. Installation instructions for PyTorch can be found on their website.

[!NOTE] At this time, NectarGAN has been tested on the CUDA 12.6, CUDA 12.8, and CPU compute platforms.

## Dataset and Configuration

1) What dataset layout does paired training expect?

Your root dataset directory should be laid out as follows:

root/
├─ train/
├─ val/
└─ test/

[!NOTE] The test directory is optional. The dataset images in this directory can be used in a separate post-train model validation. See here.
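As a quick sanity check before training, a short script can verify this layout. This is an illustrative sketch, not part of NectarGAN; the directory names are the ones shown above.

```python
from pathlib import Path

def check_dataset_layout(root: str) -> list[str]:
    """Report which expected dataset subdirectories are missing under root.

    train/ and val/ are required for paired training; test/ is optional
    and only used for separate post-train model validation.
    """
    required = ["train", "val"]
    root_path = Path(root)
    return [d for d in required if not (root_path / d).is_dir()]

missing = check_dataset_layout("datasets/my_experiment")
if missing:
    print(f"Missing required dataset directories: {missing}")
```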

2) How do I set/override config values?

Toolbox: The Toolbox generates training configs dynamically. When using it, all config values can be set via the UI. See here.

CLI: The default config file used by the training scripts can be found at nectargan/config/default.json. You can open this file in any text editor to override values. See here for more information.

Custom Training Script: There are a number of ways to manage configuration options when writing a custom training script. See here for more information.
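For a custom script, one minimal approach (a sketch, not NectarGAN's API — the `override` helper is hypothetical) is to load the config as a dict and replace nested values using the same dotted paths this FAQ references, such as `config.dataloader.direction` or `config.train.loss`:

```python
from copy import deepcopy

def override(config: dict, dotted_key: str, value) -> dict:
    """Return a copy of config with one nested value replaced.

    dotted_key uses the dotted paths referenced in this FAQ,
    e.g. 'dataloader.direction' or 'train.generator.learning_rate'.
    """
    result = deepcopy(config)
    node = result
    keys = dotted_key.split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result

# Example defaults (a stand-in for nectargan/config/default.json):
config = {"dataloader": {"direction": "AtoB"}}
config = override(config, "dataloader.direction", "BtoA")
print(config["dataloader"]["direction"])  # BtoA
```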

3) Can I train on grayscale or multi-channel images?

This can be controlled by setting the value of in_channels on the UNet and PatchGAN models: 1 for grayscale, 3 for RGB. I haven't tested multi-channel inputs extensively, so you may encounter problems; I will investigate and fix this if necessary in a future release. ONNX model conversion currently only supports 3 channels. See here and here for more information on ONNX conversion.

4) How do I select the mapping direction (A->B vs B->A)?

By setting the value of config.dataloader.direction in your config file. Valid values are AtoB and BtoA. See here for more information regarding dataset loading.
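For example, the relevant fragment of the config file might look like this (key path as referenced in this FAQ; surrounding keys omitted):

```json
{
  "dataloader": {
    "direction": "AtoB"
  }
}
```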

## Training (Toolbox and CLI)

1) How do I start Pix2pix training from the CLI?

See here.

2) What loss setups are available?

There are four pre-built loss subspecs for the Pix2pix model objective:

- basic
- basic+vgg (adds VGG19 perceptual)
- extended (adds L2, Sobel, Laplacian)
- extended+vgg

See the documentation on loss functions and the Pix2pix objective function for more information on the behaviour of these loss subspecs.
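The relationship between the subspecs can be summarized as a table of components. Note the contents of "basic" are an assumption here (the standard Pix2pix objective: adversarial GAN loss plus L1); the authoritative definitions live in the loss-function documentation linked above.

```python
# Components of each pre-built loss subspec, per the list above.
# "basic" = GAN + L1 is assumed from the standard Pix2pix objective.
LOSS_SUBSPECS = {
    "basic":        ["GAN", "L1"],
    "basic+vgg":    ["GAN", "L1", "VGG19"],
    "extended":     ["GAN", "L1", "L2", "Sobel", "Laplacian"],
    "extended+vgg": ["GAN", "L1", "L2", "Sobel", "Laplacian", "VGG19"],
}
```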

3) How can I control loss weights?

Toolbox: On the Training panel, under the loss tab, you will find sliders for the various loss functions.

CLI: In the config file, under config.train.loss, you can set weights for the various loss functions by changing the corresponding values.

Custom Training Script: See the LossManager documentation for information on setting loss weights and applying weight schedules.

4) Can I resume training from a checkpoint?

Toolbox: Yes. On the Training tab, check the Continue Train checkbox and input the epoch you would like to load. When training begins, Toolbox will look for the config file for the given epoch in the experiment directory and load it, if present, to continue training.

CLI: Yes. Just use the -f/--config_file flag and pass it the path to the config file for the epoch you would like to resume training from. See the CLI documentation for more information.

Custom Training Script: See here.

5) How are learning rates scheduled?

Toolbox: On the Training panel, you can set a learning rate schedule with the Epochs, Epochs Decay, and Initial and Target learning rate. See here for more info.

CLI: Epochs, Epochs Decay, and Initial and Target learning rate can all be set in the config file, under config.train.generator.learning_rate and config.train.discriminator.learning_rate.

Custom Training Script: See here for more information on scheduling.
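These four values define the shape of the schedule: hold the initial rate for Epochs, then move toward the target rate over Epochs Decay. Here is a sketch of one common interpretation (linear decay, as in the original Pix2pix training regime); the exact schedule functions NectarGAN applies are described in the scheduling documentation:

```python
def lr_at_epoch(epoch: int, epochs: int, epochs_decay: int,
                lr_initial: float, lr_target: float) -> float:
    """Hold lr_initial for `epochs` epochs, then decay linearly
    toward lr_target over the following `epochs_decay` epochs."""
    if epoch < epochs:
        return lr_initial
    progress = min((epoch - epochs) / max(epochs_decay, 1), 1.0)
    return lr_initial + progress * (lr_target - lr_initial)

# Epochs=100, Epochs Decay=100, initial 2e-4, target 0.0:
print(lr_at_epoch(150, 100, 100, 2e-4, 0.0))  # halfway through decay: 1e-4
```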

6) What input resolution can I use?

Any reasonable resolution. Higher resolutions take longer to train, are often harder to train stably, and require more memory, which is ultimately the limiting factor. Individual training images (each image, not the combined pair) should have a 1:1 aspect ratio and a power-of-two resolution. Popular choices are 256x256 and 512x512, or sometimes higher if the batch size is kept low.

Just be sure to use a reasonable number of layers for your chosen resolution. Higher resolutions generally require a higher layer count for both generator and discriminator to produce acceptable results. But too many layers will cause training to fail, as the tensor will be downsampled too far before hitting the bottleneck.

See here for more information.
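The failure mode described above is easy to check arithmetically: each downsampling layer halves the spatial resolution, so the input must survive that many halvings before reaching the bottleneck. A quick sketch (illustrative, not NectarGAN code):

```python
def bottleneck_size(resolution: int, layers: int) -> int:
    """Spatial size of a square input after `layers` halvings."""
    return resolution >> layers

# A 256x256 input supports at most 8 downsampling layers (256 -> 1):
print(bottleneck_size(256, 8))  # 1
# A 9th layer would shrink the tensor past 1x1, and training would fail:
print(bottleneck_size(256, 9))  # 0
```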

7) I’m seeing checkerboard artifacts. How can I reduce them?

These artifacts are incredibly common with pixel-to-pixel GAN models. Sometimes training longer will help, or simply increasing the decay epochs to give the model longer to settle. You can also try using the transposed convolution upsampling method for the generator (see here), or using the Residual UNet block instead of the standard UNet block. Both of these solutions are frequently able to reduce or completely remove the checkerboarding.

If none of the above solutions work, you may also try adjusting your loss values. Sometimes adding L2 or VGG Perceptual, and/or reducing L1, can also help to eliminate this artifacting.

## Toolbox Overview

1) What are the Toolbox sections and shortcuts?

See here.

2) Where do outputs go and how are experiments versioned?

Experiments are exported to the Output Root, in directories named after the Experiment Name. These directories are versioned automatically (see here). Output Root and Experiment Name are set in different places depending on how training is initiated:

Toolbox: Both can be set on the Experiment panel.

CLI: Both are set in the config file under config.common.

Custom Training Script: Same as CLI.

3) Can I change UI performance vs feedback rate?

Yes. When training with the Toolbox, the actual training happens in a separate thread from the UI, and that thread periodically sends updates back to the UI. Doing this too frequently is performance intensive and slows training significantly, though you can configure it to send an update every iteration if you want maximum feedback.

This update frequency can be changed on the Settings panel via the Training Update Rate value. It can be changed at runtime, so you can decrease it briefly to get a better look at the model's output, then increase it again to restore your original training speed.
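Conceptually, the update rate acts as a simple modulus on the training loop. This is an illustrative sketch, not the Toolbox's actual implementation:

```python
def should_update_ui(iteration: int, update_rate: int) -> bool:
    """Emit a UI update only every `update_rate` iterations.

    A rate of 1 reports every iteration (maximum feedback, slowest
    training); larger values trade feedback for throughput.
    """
    return iteration % max(update_rate, 1) == 0

updates = [i for i in range(10) if should_update_ui(i, 4)]
print(updates)  # [0, 4, 8]
```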

## Testing and Reviewing Results

1) How do I test a trained model?

Toolbox: Using the Testing panel. See here.

CLI: See here for information on model testing via CLI.

Custom Testing Script: See the Tester class documentation.

2) How do I review my experiments?

The Toolbox Review panel offers an easy way to review the results of your model test. See here and here for more information.

3) How do I log losses during training?

Toolbox: On the Training panel, under the Loss tab, you can enable Log Losses During Training and configure the logging behaviour.

CLI: Using the --log_losses flag. See here.

Custom Training Script: See the LossManager documentation.

## ONNX Export and Deployment

1) How do I export to ONNX?

Toolbox: On the Utilities panel, you will find a set of tools that allow you to convert your models to run on the ONNX runtime, and to test your converted models. See here for more information.

CLI: This is currently not supported.

Custom Script: See the ONNX tools documentation.

2) Why do I get instance-norm warnings when exporting?

See here.

3) Can I test an exported ONNX model inside the Toolbox?

Yes, see here.

4) Does ONNX export support non-RGB inputs?

No, currently the ONNXConverter only supports 3 channels (RGB).

## Scheduling, Losses and Internals

1) What is the LossManager and why use it?

The LossManager is a comprehensive module for managing everything related to model loss during training. It is highly flexible and configurable, and allows you to easily and accurately manage complex objective functions.

See the Loss Manager documentation for more information.

2) What are “loss specs”?

Loss specs are drop-in objective functions which you can pre-define and feed into a LossManager, allowing you to more carefully build, track, and reuse objectives from model to model.

See the Loss Spec documentation for more information, and the Pix2pix Objective Function for an example of a loss spec.

3) How do schedules integrate with training?

NectarGAN offers a generic Scheduler, and a wrapper around the native PyTorch learning rate scheduler called TorchScheduler. This allows you to use the same Schedule Functions for each.

The TorchScheduler is predominantly used for learning rates, to take advantage of the inherent integration with the PyTorch optimizers. The generic Scheduler is predominantly used for loss weight scheduling, though you can use it for whatever you want in your own models.

See here to get started with scheduling in NectarGAN.
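To illustrate the shared schedule-function idea, here is a sketch of a linear schedule function that could drive either a loss weight (via the generic Scheduler) or a learning rate (via TorchScheduler). The function shape here is illustrative, not NectarGAN's actual signature:

```python
def linear_schedule(step: int, total_steps: int,
                    start: float, end: float) -> float:
    """Linearly interpolate from start to end over total_steps."""
    t = min(step / max(total_steps, 1), 1.0)
    return start + t * (end - start)

# Ramp a loss weight from 0.0 up to 10.0 over 200 steps:
print(linear_schedule(100, 200, 0.0, 10.0))  # 5.0
```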

## Troubleshooting

1) Training is slow in the Toolbox. Any tips?

Increase Training Update Rate (see here), and avoid very low dump frequencies for loss logs, as these can potentially cause lag spikes.

2) I can’t find my outputs/checkpoints. Where are they?

All files related to a given training session will be exported to your Output Root directory, in a subdirectory named after your current Experiment Name. These directories will be automatically versioned, so be sure to look for the latest one.

3) CLI won’t see my config. What’s used by default?

If the -f flag isn't used, the training/testing scripts will instead use the default config file located at nectargan/config/default.json.