Examples
Content:
public class HuberLossRegressionExample
Language: C#
Example: GPU-Aware Huber Loss for Robust Regression Training
HuberLoss is a robust loss function that combines the best of both worlds:
- Quadratic (L2) loss for small errors: smooth gradients
- Linear (L1) loss for large errors: robustness to outliers
GPU-Aware Implementation Features:
- Uses AbsAsync for element-wise absolute values
- Uses WhereAsync for conditional selection between loss regions
- Uses LessEqualAsync and GreaterEqualAsync for comparisons
- All operations stay on GPU when using TorchSharp backend
- Automatic CPU fallback for non-GPU tensors
Key GPU Operations Used:
- AbsAsync: Element-wise absolute value
- WhereAsync: Conditional tensor selection (ternary operator)
- LessEqualAsync, GreaterEqualAsync: Boolean comparisons
- SubtractAsync, MultiplyAsync: Element-wise arithmetic
- SumAsync, MeanAsync: Tensor reduction
public class HuberLossMathematicalReference
Language: C#
Mathematical Background on Huber Loss
For each sample i, the Huber loss is defined as:
L(y_i, ŷ_i) =
    0.5 * (y_i - ŷ_i)²            if |y_i - ŷ_i| ≤ δ   [Quadratic region]
    δ * (|y_i - ŷ_i| - 0.5*δ)     if |y_i - ŷ_i| > δ   [Linear region]
Gradient (d/dŷ):
g_i =
    -(y_i - ŷ_i)                  if |y_i - ŷ_i| ≤ δ   [Quadratic gradient]
    -δ * sign(y_i - ŷ_i)          if |y_i - ŷ_i| > δ   [Linear gradient]
Properties:
- Continuous: The loss and its first derivative are both continuous at the δ boundary
- Smooth gradients: The gradient has no jump at the boundary, and its magnitude never exceeds δ
- Outlier robust: Linear penalty limits impact of large errors
- Hyperparameter δ: Controls the trade-off
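The piecewise definition and its gradient can be checked with a small scalar reference. This is a minimal CPU sketch, not the library's GPU path; the class and method names (HuberReference, Loss, Gradient) are hypothetical and chosen only for illustration.

```csharp
using System;

// Plain CPU reference for the piecewise definition above.
// Hypothetical helper, not part of the library's API.
public static class HuberReference
{
    // Per-sample loss for target y and prediction yHat; delta is assumed positive.
    public static double Loss(double y, double yHat, double delta)
    {
        double r = Math.Abs(y - yHat);
        return r <= delta
            ? 0.5 * r * r                  // quadratic region
            : delta * (r - 0.5 * delta);   // linear region
    }

    // Derivative of Loss with respect to yHat.
    public static double Gradient(double y, double yHat, double delta)
    {
        double e = y - yHat;
        return Math.Abs(e) <= delta
            ? -e                           // quadratic gradient
            : -delta * Math.Sign(e);       // linear gradient
    }
}
```

At |y - ŷ| = δ both branches of Loss evaluate to 0.5 * δ², and both branches of Gradient have magnitude δ, which is the continuity property listed above.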
GPU Implementation Strategy:
- Compute difference: diff = predictions - targets (SubtractAsync)
- Absolute value: absDiff = |diff| (AbsAsync)
- Compare: condition = absDiff ≤ δ (LessEqualAsync)
- Compute both loss regions (MultiplyAsync for scaling)
- Select based on condition: loss = condition ? smallLoss : largeLoss (WhereAsync)
- Reduce: sum or mean across batch (SumAsync, MeanAsync)
Gradients use the same conditional selection. With diff = predictions - targets, the gradient with respect to the predictions is:
- smallGrad = diff
- largeGrad = δ * sign(diff)
- grad = condition ? smallGrad : largeGrad
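The steps above can be sketched with plain arrays standing in for tensors. This is an assumption-laden illustration: the names HuberPipelineSketch, Compute and Where are hypothetical, everything runs synchronously on the CPU, and the real SubtractAsync, AbsAsync, LessEqualAsync and WhereAsync signatures are not reproduced here. The sketch assumes mean reduction, so the gradients are scaled by 1/n.

```csharp
using System;
using System.Linq;

// Array-based stand-in for the tensor pipeline; helper names are hypothetical
// and only mirror the steps listed above.
public static class HuberPipelineSketch
{
    // Returns the mean Huber loss and the gradient of that mean w.r.t. predictions.
    public static (float Loss, float[] Grad) Compute(float[] predictions, float[] targets, float delta)
    {
        if (predictions.Length != targets.Length)
            throw new ArgumentException("Shapes must match.");
        int n = predictions.Length;

        float[] diff = predictions.Zip(targets, (p, t) => p - t).ToArray();  // subtract
        float[] absDiff = diff.Select(d => MathF.Abs(d)).ToArray();          // abs
        bool[] condition = absDiff.Select(a => a <= delta).ToArray();        // less-equal

        // Both regions are evaluated for every element, then one is selected.
        float[] smallLoss = diff.Select(d => 0.5f * d * d).ToArray();
        float[] largeLoss = absDiff.Select(a => delta * (a - 0.5f * delta)).ToArray();
        float[] loss = Where(condition, smallLoss, largeLoss);

        float[] smallGrad = diff;
        float[] largeGrad = diff.Select(d => delta * MathF.Sign(d)).ToArray();
        float[] grad = Where(condition, smallGrad, largeGrad)
            .Select(g => g / n)   // mean reduction scales each gradient by 1/n
            .ToArray();

        return (loss.Sum() / n, grad);
    }

    // Element-wise ternary selection, the array analogue of a tensor "where".
    private static float[] Where(bool[] condition, float[] ifTrue, float[] ifFalse) =>
        condition.Select((c, i) => c ? ifTrue[i] : ifFalse[i]).ToArray();
}
```

Evaluating both regions and selecting afterwards avoids per-element branching, which is the usual reason to prefer a where-style selection on a GPU.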
public class HuberLossDeltaComparison
Language: C#
Example: Comparing different delta values for Huber Loss
The delta parameter is crucial for controlling robustness:
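- Small delta (0.1 - 0.5): Most errors fall in the linear region, so the loss behaves like a scaled MAE and is very robust to outliers
- Medium delta (1.0 - 2.0): Good balance, general purpose
- Large delta (5.0+): Most errors fall in the quadratic region, so the loss behaves like MSE and is less robust to outliers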
GPU Implementation ensures these comparisons are fast and stay on device.
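One way to see the effect of delta is to look at how strongly a single sample can pull on the model. The sketch below is a hypothetical helper (HuberDeltaSketch is not a library class) that works on a residual r = y - ŷ. It shows that the gradient magnitude of a sample is |r| in the quadratic region and is capped at delta beyond it.

```csharp
using System;

// Hypothetical helper for comparing delta values on individual residuals.
public static class HuberDeltaSketch
{
    // Per-sample Huber loss expressed on the residual r = y - yHat.
    public static double Loss(double residual, double delta)
    {
        double r = Math.Abs(residual);
        return r <= delta ? 0.5 * r * r : delta * (r - 0.5 * delta);
    }

    // Gradient magnitude: |r| in the quadratic region, capped at delta beyond it.
    public static double GradientMagnitude(double residual, double delta) =>
        Math.Min(Math.Abs(residual), delta);

    // Prints how an inlier and an outlier are weighted under several delta values.
    public static void PrintComparison()
    {
        double[] deltas = { 0.5, 1.0, 5.0, 20.0 };
        double inlier = 0.3, outlier = 10.0;   // illustrative residuals, not real data
        foreach (double delta in deltas)
        {
            Console.WriteLine(
                $"delta={delta}: inlier |g|={GradientMagnitude(inlier, delta)}, " +
                $"outlier |g|={GradientMagnitude(outlier, delta)}");
        }
    }
}
```

Because the outlier's pull is bounded by delta, lowering delta reduces its influence relative to the inliers, while a delta larger than every residual reproduces the MSE gradient.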
public class HuberLossOptimizerIntegration
Language: C#
Example: Integration with Optimizers
HuberLoss gradients integrate seamlessly with GPU-aware optimizers:
- SGD with momentum
- Adam / AdamW
- RMSProp
All optimizer steps use the same GPU operations infrastructure.
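As a rough illustration of the hand-off from loss gradient to optimizer, the following CPU sketch fits a one-parameter model ŷ = w * x with SGD plus momentum. All names here (MomentumSgdSketch, HuberLinearModelSketch, GradientW, Fit) are hypothetical, the learning rate and momentum are arbitrary choices, and the momentum update shown is one common convention rather than the library's optimizer.

```csharp
using System;

// Hypothetical optimizer; a CPU sketch of the hand-off, not the library's implementation.
public sealed class MomentumSgdSketch
{
    private readonly double _learningRate;
    private readonly double _momentum;
    private double[] _velocity = Array.Empty<double>();

    public MomentumSgdSketch(double learningRate, double momentum)
    {
        _learningRate = learningRate;
        _momentum = momentum;
    }

    // v = momentum * v + g, then p = p - lr * v.
    public void Step(double[] parameters, double[] gradients)
    {
        if (_velocity.Length != parameters.Length)
            _velocity = new double[parameters.Length];
        for (int i = 0; i < parameters.Length; i++)
        {
            _velocity[i] = _momentum * _velocity[i] + gradients[i];
            parameters[i] -= _learningRate * _velocity[i];
        }
    }
}

public static class HuberLinearModelSketch
{
    // Gradient of the mean Huber loss w.r.t. w for the model yHat = w * x.
    public static double GradientW(double w, double[] x, double[] y, double delta)
    {
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            double diff = w * x[i] - y[i];
            // Huber gradient w.r.t. the prediction: diff clipped to [-delta, delta].
            double g = Math.Max(-delta, Math.Min(delta, diff));
            sum += g * x[i];   // chain rule through yHat = w * x
        }
        return sum / x.Length;
    }

    // Runs a fixed number of optimizer steps starting from w = 0.
    public static double Fit(double[] x, double[] y, double delta, int steps)
    {
        var optimizer = new MomentumSgdSketch(learningRate: 0.05, momentum: 0.9);
        double[] w = { 0.0 };
        for (int s = 0; s < steps; s++)
            optimizer.Step(w, new[] { GradientW(w[0], x, y, delta) });
        return w[0];
    }
}
```

The optimizer only ever sees a gradient array, so swapping the loss (or the optimizer) does not change the other side of the interface.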
public class HuberLossCpuFallback
Language: C#
Example: Fallback Behavior for CPU-Only Tensors
The GPU-aware implementation includes automatic fallback:
- If gpuOps is null: Uses CPU-based computation
- If tensor is not GPU-compatible: Automatically falls back to CPU
- If backend is CPU-only: Still works, just slower
This ensures HuberLoss works in any environment.
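The fallback rules above amount to a guard in front of the GPU path. The sketch below is only a guess at the shape of that dispatch: IGpuOpsSketch, CanHandle and MeanHuber are hypothetical stand-ins for the real GPU operations object (referred to as gpuOps above), whose actual interface is not shown in this document.

```csharp
using System;

// Hypothetical interface standing in for the GPU operations object.
public interface IGpuOpsSketch
{
    bool CanHandle(float[] data);   // hypothetical capability check
    float MeanHuber(float[] predictions, float[] targets, float delta);
}

public static class HuberFallbackSketch
{
    // gpuOps may be null, in which case the CPU path is used.
    public static float Compute(float[] predictions, float[] targets, float delta, IGpuOpsSketch gpuOps)
    {
        bool useGpu = gpuOps != null
            && gpuOps.CanHandle(predictions)
            && gpuOps.CanHandle(targets);

        return useGpu
            ? gpuOps.MeanHuber(predictions, targets, delta)
            : ComputeOnCpu(predictions, targets, delta);
    }

    // Straightforward loop; same result as the GPU path is expected to produce.
    public static float ComputeOnCpu(float[] predictions, float[] targets, float delta)
    {
        if (predictions.Length != targets.Length)
            throw new ArgumentException("Shapes must match.");
        float sum = 0f;
        for (int i = 0; i < predictions.Length; i++)
        {
            float r = MathF.Abs(predictions[i] - targets[i]);
            sum += r <= delta ? 0.5f * r * r : delta * (r - 0.5f * delta);
        }
        return sum / predictions.Length;
    }
}
```

Passing null for gpuOps, or an implementation whose CanHandle returns false, exercises the CPU branch with the same inputs.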