Examples


public class HuberLossRegressionExample

Language: C#

Example: GPU-Aware Huber Loss for Robust Regression Training

HuberLoss is a robust loss function that behaves like squared error for small residuals and like absolute error for large ones:

  • Quadratic (L2) loss for small errors: smooth gradients
  • Linear (L1) loss for large errors: robustness to outliers

GPU-Aware Implementation Features:

  • Uses AbsAsync for element-wise absolute values
  • Uses WhereAsync for conditional selection between loss regions
  • Uses LessEqualAsync and GreaterEqualAsync for comparisons
  • All operations stay on GPU when using TorchSharp backend
  • Automatic CPU fallback for non-GPU tensors

Key GPU Operations Used:

  • AbsAsync: Element-wise absolute value
  • WhereAsync: Conditional tensor selection (ternary operator)
  • LessEqualAsync, GreaterEqualAsync: Boolean comparisons
  • SubtractAsync, MultiplyAsync: Element-wise arithmetic
  • SumAsync, MeanAsync: Tensor reduction

public class HuberLossMathematicalReference

Language: C#

Mathematical Background on Huber Loss

For each sample i, the Huber loss is defined as:

L(y_i, ŷ_i) =
0.5 * (y_i - ŷ_i)² if |y_i - ŷ_i| ≤ δ [Quadratic region]
δ * (|y_i - ŷ_i| - 0.5*δ) if |y_i - ŷ_i| > δ [Linear region]

Gradient (d/dŷ):

g_i =
-(y_i - ŷ_i) if |y_i - ŷ_i| ≤ δ [Quadratic gradient]
-δ * sign(y_i - ŷ_i) if |y_i - ŷ_i| > δ [Linear gradient]

Properties:

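  1. Continuous and differentiable: The two regions agree in value and first derivative at the δ boundary
  2. Continuous gradients: No jump at the boundary, and the gradient magnitude never exceeds δ
  3. Outlier robust: Linear penalty limits impact of large errors
  4. Hyperparameter δ: Sets the error size at which the loss switches from quadratic to linear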
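
Property 1 can be checked directly by evaluating both branches of the definitions above at the boundary. The short derivation below writes r for the residual y_i - ŷ_i, a symbol introduced here only for brevity:

```latex
% r = y_i - \hat{y}_i, evaluated at the boundary |r| = \delta
\begin{align*}
\text{Quadratic value:} \quad
  & \tfrac{1}{2} r^2 \Big|_{|r|=\delta} = \tfrac{1}{2}\delta^2 \\
\text{Linear value:} \quad
  & \delta\left(|r| - \tfrac{1}{2}\delta\right)\Big|_{|r|=\delta}
    = \delta \cdot \tfrac{1}{2}\delta = \tfrac{1}{2}\delta^2 \\
\text{Quadratic gradient at } r = \delta: \quad
  & -r = -\delta \\
\text{Linear gradient at } r = \delta: \quad
  & -\delta\,\operatorname{sign}(r) = -\delta
\end{align*}
```

Both the loss value and the gradient match at the boundary; the same holds at r = -δ with the gradient sign flipped.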

GPU Implementation Strategy:

  1. Compute difference: diff = predictions - targets (SubtractAsync)
  2. Absolute value: absDiff = |diff| (AbsAsync)
  3. Compare: condition = absDiff ≤ δ (LessEqualAsync)
  4. Compute both loss regions: smallLoss = 0.5 * diff², largeLoss = δ * (absDiff - 0.5 * δ) (MultiplyAsync for scaling)
  5. Select based on condition: loss = condition ? smallLoss : largeLoss (WhereAsync)
  6. Reduce: sum or mean across batch (SumAsync, MeanAsync)
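
The six steps above can be sketched on plain arrays. This is a CPU-only illustration of the arithmetic, not the library's implementation: the class and method names are hypothetical, and the GPU operations (SubtractAsync, AbsAsync, and so on) are replaced by ordinary scalar code because their signatures are not shown here.

```csharp
using System;
using System.Linq;

// CPU-only sketch of the six steps on plain arrays.
// HuberLossSketch and its methods are illustrative names, not library API.
public static class HuberLossSketch
{
    // Returns the per-sample Huber loss (steps 1 to 5).
    public static double[] ElementWise(double[] predictions, double[] targets, double delta)
    {
        if (predictions.Length != targets.Length)
            throw new ArgumentException("predictions and targets must have the same length.");

        var loss = new double[predictions.Length];
        for (int i = 0; i < predictions.Length; i++)
        {
            double diff = predictions[i] - targets[i];           // step 1: difference
            double absDiff = Math.Abs(diff);                      // step 2: absolute value
            bool isSmall = absDiff <= delta;                      // step 3: compare with delta
            double smallLoss = 0.5 * diff * diff;                 // step 4: quadratic region
            double largeLoss = delta * (absDiff - 0.5 * delta);   // step 4: linear region
            loss[i] = isSmall ? smallLoss : largeLoss;            // step 5: select per element
        }
        return loss;
    }

    // Step 6: mean reduction across the batch (throws on an empty batch).
    public static double Mean(double[] predictions, double[] targets, double delta)
        => ElementWise(predictions, targets, delta).Average();
}
```

Computing both regions for every element and then selecting mirrors the tensor formulation, where a branch per element is not available and WhereAsync picks between two precomputed tensors.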

Gradients use the same conditional selection. Because diff = predictions - targets, the signs below match the gradient formulas given earlier:

  1. smallGrad = diff
  2. largeGrad = δ * sign(diff)
  3. grad = condition ? smallGrad : largeGrad
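
The gradient selection can be sketched the same way on plain arrays. As before, this is an assumption-laden illustration: HuberGradientSketch is a hypothetical name, and the result is the per-sample gradient before any reduction (for a mean-reduced loss, each entry would additionally be divided by the batch size).

```csharp
using System;

// CPU-only sketch of the per-sample Huber gradient with respect to the prediction.
// HuberGradientSketch is an illustrative name, not library API.
public static class HuberGradientSketch
{
    public static double[] ElementWise(double[] predictions, double[] targets, double delta)
    {
        if (predictions.Length != targets.Length)
            throw new ArgumentException("predictions and targets must have the same length.");

        var grad = new double[predictions.Length];
        for (int i = 0; i < predictions.Length; i++)
        {
            double diff = predictions[i] - targets[i];
            bool isSmall = Math.Abs(diff) <= delta;

            double smallGrad = diff;                     // quadratic region
            // Math.Sign returns -1, 0, or 1 and throws on NaN input.
            double largeGrad = delta * Math.Sign(diff);  // linear region

            grad[i] = isSmall ? smallGrad : largeGrad;
        }
        return grad;
    }
}
```

The result is the residual clipped to the range [-δ, δ], which is why a single large error cannot dominate an update.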

public class HuberLossDeltaComparison

Language: C#

Example: Comparing different delta values for Huber Loss

The delta parameter sets the error size at which the loss switches from quadratic to linear, and so controls robustness:

  • Small delta (0.1 - 0.5): Very robust to outliers, more similar to MAE
  • Medium delta (1.0 - 2.0): Good balance, general purpose
  • Large delta (5.0+): More similar to MSE, less robust to outliers

With a GPU backend, the comparison and selection steps run on the device, so trying several delta values does not require copying tensors back to the host.
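
A minimal sketch of how delta changes the penalty on a single large residual. The class, the method names, and any residual passed in are made up for illustration; the half squared error is printed as the reference that the Huber loss reaches once delta is at least as large as the residual.

```csharp
using System;

// Sketch: how delta changes the loss assigned to one residual.
// HuberDeltaSketch is an illustrative name, not library API.
public static class HuberDeltaSketch
{
    // Huber loss for one residual (prediction minus target).
    public static double Loss(double residual, double delta)
    {
        double abs = Math.Abs(residual);
        return abs <= delta
            ? 0.5 * residual * residual       // quadratic region
            : delta * (abs - 0.5 * delta);    // linear region
    }

    // Prints the loss for each delta next to the half squared error.
    public static void PrintComparison(double residual, double[] deltas)
    {
        Console.WriteLine($"half squared error: {0.5 * residual * residual}");
        foreach (double delta in deltas)
            Console.WriteLine($"delta={delta}: huber={Loss(residual, delta)}");
    }
}
```

Calling `HuberDeltaSketch.PrintComparison(8.0, new[] { 0.5, 1.0, 5.0, 10.0 })` shows the penalty on the outlier growing with delta until it equals the half squared error.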

public class HuberLossOptimizerIntegration

Language: C#

Example: Integration with Optimizers

HuberLoss gradients can be passed directly to GPU-aware optimizers such as:

  • SGD with momentum
  • Adam / AdamW
  • RMSProp

All optimizer steps use the same GPU operations infrastructure.
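
To make the loss-to-optimizer hand-off concrete, here is a hedged sketch that fits a one-dimensional linear model with plain full-batch gradient descent (no momentum) driven by Huber gradients. Everything in it is an assumption for illustration: the class name, the model y ≈ w * x + b, and the scalar CPU arithmetic standing in for the library's optimizer and GPU operations.

```csharp
using System;

// Sketch: full-batch gradient descent on a 1-D linear model using Huber gradients.
// HuberSgdSketch is an illustrative name; it does not use the library's optimizers.
public static class HuberSgdSketch
{
    // Per-sample gradient of the Huber loss with respect to the prediction.
    private static double HuberGrad(double diff, double delta)
        => Math.Abs(diff) <= delta ? diff : delta * Math.Sign(diff);

    public static (double W, double B) Fit(
        double[] x, double[] y, double delta, double learningRate, int epochs)
    {
        if (x.Length != y.Length || x.Length == 0)
            throw new ArgumentException("x and y must be non-empty and of equal length.");

        double w = 0.0, b = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++)
        {
            double gradW = 0.0, gradB = 0.0;
            for (int i = 0; i < x.Length; i++)
            {
                double diff = (w * x[i] + b) - y[i];  // prediction minus target
                double g = HuberGrad(diff, delta);    // dLoss/dPrediction
                gradW += g * x[i];                    // chain rule: dPrediction/dw = x
                gradB += g;                           // chain rule: dPrediction/db = 1
            }

            // Mean-reduced gradient step.
            w -= learningRate * gradW / x.Length;
            b -= learningRate * gradB / x.Length;
        }
        return (w, b);
    }
}
```

The optimizer only ever sees the gradient, so swapping in momentum, Adam, or RMSProp changes the update rule but not how the Huber gradient is produced.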

public class HuberLossCpuFallback

Language: C#

Example: Fallback Behavior for CPU-Only Tensors

The GPU-aware implementation includes automatic fallback:

  • If gpuOps is null: Uses CPU-based computation
  • If tensor is not GPU-compatible: Automatically falls back to CPU
  • If backend is CPU-only: Still works, just slower

This lets HuberLoss run with or without a GPU backend.
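
The fallback rules above might look roughly like the following dispatch. This is a sketch under stated assumptions: IHypotheticalGpuOps, CanHandle, and MeanHuber are invented here to stand in for the real gpuOps object, whose actual interface is not shown in this section.

```csharp
using System;

// Hypothetical stand-in for the real gpuOps object; not the library's interface.
public interface IHypotheticalGpuOps
{
    bool CanHandle(double[] data);
    double MeanHuber(double[] predictions, double[] targets, double delta);
}

// Sketch of the null / incompatible-tensor fallback to CPU computation.
public static class HuberFallbackSketch
{
    public static double MeanLoss(
        double[] predictions, double[] targets, double delta, IHypotheticalGpuOps gpuOps)
    {
        // Use the GPU path only when ops are present and accept both inputs.
        if (gpuOps != null && gpuOps.CanHandle(predictions) && gpuOps.CanHandle(targets))
            return gpuOps.MeanHuber(predictions, targets, delta);

        // Otherwise fall back to the CPU computation.
        return CpuMeanLoss(predictions, targets, delta);
    }

    private static double CpuMeanLoss(double[] predictions, double[] targets, double delta)
    {
        if (predictions.Length != targets.Length || predictions.Length == 0)
            throw new ArgumentException("Inputs must be non-empty and of equal length.");

        double total = 0.0;
        for (int i = 0; i < predictions.Length; i++)
        {
            double abs = Math.Abs(predictions[i] - targets[i]);
            total += abs <= delta ? 0.5 * abs * abs : delta * (abs - 0.5 * delta);
        }
        return total / predictions.Length;
    }
}
```

Passing `null` for `gpuOps` exercises the CPU path, which returns the same value the GPU path is expected to produce.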