Examples

Content:

public class HuberLossRegressionExample

Language: C#

Example: GPU-Aware Huber Loss for Robust Regression Training

HuberLoss is a robust loss function that is quadratic for small errors and linear for large ones:

Quadratic (L2) loss for small errors: smooth gradients

Linear (L1) loss for large errors: robustness to outliers

GPU-Aware Implementation Features:

Uses AbsAsync for element-wise absolute values

Uses WhereAsync for conditional selection between loss regions

Uses LessEqualAsync and GreaterEqualAsync for comparisons

All operations stay on the GPU when the TorchSharp backend is used

Automatic CPU fallback for non-GPU tensors

Key GPU Operations Used:

AbsAsync: Element-wise absolute value

WhereAsync: Conditional tensor selection (element-wise ternary operator)

LessEqualAsync, GreaterEqualAsync: Boolean comparisons

SubtractAsync, MultiplyAsync: Element-wise arithmetic

SumAsync, MeanAsync: Tensor reduction

public class HuberLossMathematicalReference

Language: C#

Mathematical Background on Huber Loss

For each sample i, the Huber loss is defined as:

L(y_i, ŷ_i) =
0.5 * (y_i - ŷ_i)² if |y_i - ŷ_i| ≤ δ [Quadratic region]
δ * (|y_i - ŷ_i| - 0.5*δ) if |y_i - ŷ_i| > δ [Linear region]

Gradient (d/dŷ):

g_i =
-(y_i - ŷ_i) if |y_i - ŷ_i| ≤ δ [Quadratic gradient]
-δ * sign(y_i - ŷ_i) if |y_i - ŷ_i| > δ [Linear gradient]
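The two piecewise definitions above can be written directly as scalar code. The following is a minimal CPU reference sketch, not the library's GPU code path; the class name `HuberReference` and its method names are hypothetical and chosen only for illustration.

```csharp
using System;

// Plain CPU reference for the per-sample formulas (hypothetical helper, not part of HuberLoss).
public static class HuberReference
{
    // Loss for one sample: quadratic when |y - yHat| <= delta, linear otherwise.
    public static double Loss(double y, double yHat, double delta)
    {
        double r = y - yHat;
        double a = Math.Abs(r);
        return a <= delta
            ? 0.5 * r * r
            : delta * (a - 0.5 * delta);
    }

    // Derivative of the per-sample loss with respect to the prediction yHat.
    public static double Gradient(double y, double yHat, double delta)
    {
        double r = y - yHat;
        return Math.Abs(r) <= delta
            ? -r
            : -delta * Math.Sign(r);
    }
}
```

For example, with δ = 1, `HuberReference.Loss(3.0, 2.5, 1.0)` falls in the quadratic region and evaluates to 0.125, while `HuberReference.Loss(3.0, 0.0, 1.0)` falls in the linear region and evaluates to 2.5.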

Properties:

Continuous: The two regions meet at the δ boundary, and the loss is differentiable there

Smooth gradients: The gradient has no jump at the boundary (only the second derivative changes)

Outlier robust: Linear penalty limits impact of large errors

Hyperparameter δ: Sets the error magnitude at which the loss switches from quadratic to linear
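The behavior at the boundary can be checked by hand. Writing r = y_i - ŷ_i (a shorthand introduced here, not used elsewhere in this page) and evaluating both regions at r = δ:

```latex
\begin{aligned}
\text{value:}\quad
  & \tfrac{1}{2} r^2 \Big|_{r=\delta} = \tfrac{1}{2}\delta^2,
  \qquad
  \delta\left(|r| - \tfrac{1}{2}\delta\right)\Big|_{r=\delta}
    = \delta \cdot \tfrac{1}{2}\delta = \tfrac{1}{2}\delta^2 \\
\text{slope in } r\text{:}\quad
  & \frac{d}{dr}\,\tfrac{1}{2} r^2 \Big|_{r=\delta} = \delta,
  \qquad
  \frac{d}{dr}\,\delta\left(|r| - \tfrac{1}{2}\delta\right)\Big|_{r=\delta}
    = \delta\,\operatorname{sign}(r) = \delta \\
\text{second derivative:}\quad
  & 1 \ \text{for } |r| < \delta,
  \qquad
  0 \ \text{for } |r| > \delta
\end{aligned}
```

Both the value and the slope agree at the boundary, so the loss is continuously differentiable; the second derivative drops from 1 to 0 there. The case r = -δ is symmetric.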

GPU Implementation Strategy:

Compute difference: diff = predictions - targets, i.e. ŷ - y (SubtractAsync)

Absolute value: absDiff = |diff| (AbsAsync)

Compare: condition = absDiff ≤ δ (LessEqualAsync)

Compute both loss regions (MultiplyAsync for scaling)

Select based on condition: loss = condition ? smallLoss : largeLoss (WhereAsync)

Reduce: sum or mean across batch (SumAsync, MeanAsync)

Gradients follow the same pattern of conditional selection. Because diff = predictions - targets, the signs agree with the gradient formulas given earlier:

smallGrad = diff

largeGrad = δ * sign(diff)

grad = condition ? smallGrad : largeGrad
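The strategy above can be sketched on plain arrays to make the order of operations concrete. This is a synchronous CPU stand-in, not the GPU implementation: the class `HuberPipelineSketch` and its private `Where` helper are hypothetical, and the comments only indicate which GPU operation the text associates with each step.

```csharp
using System;
using System.Linq;

// CPU stand-in for the pipeline (hypothetical names; the real code path uses the async GPU ops).
public static class HuberPipelineSketch
{
    public static (double MeanLoss, double[] Grad) Compute(
        double[] predictions, double[] targets, double delta)
    {
        if (predictions.Length != targets.Length)
            throw new ArgumentException("predictions and targets must have the same length.");

        // diff = predictions - targets (SubtractAsync in the text)
        double[] diff = predictions.Zip(targets, (p, t) => p - t).ToArray();
        // absDiff = |diff| (AbsAsync)
        double[] absDiff = diff.Select(d => Math.Abs(d)).ToArray();
        // condition = absDiff <= delta (LessEqualAsync)
        bool[] condition = absDiff.Select(a => a <= delta).ToArray();

        // Both regions are computed for every element (MultiplyAsync for scaling)
        double[] smallLoss = diff.Select(d => 0.5 * d * d).ToArray();
        double[] largeLoss = absDiff.Select(a => delta * (a - 0.5 * delta)).ToArray();

        // Element-wise selection (WhereAsync), then mean reduction (MeanAsync)
        double[] loss = Where(condition, smallLoss, largeLoss);
        double meanLoss = loss.Average();

        // Per-sample gradients with respect to the predictions (not divided by batch size)
        double[] smallGrad = diff;
        double[] largeGrad = diff.Select(d => delta * Math.Sign(d)).ToArray();
        double[] grad = Where(condition, smallGrad, largeGrad);

        return (meanLoss, grad);
    }

    // Element-wise ternary: result[i] = condition[i] ? ifTrue[i] : ifFalse[i]
    private static double[] Where(bool[] condition, double[] ifTrue, double[] ifFalse)
    {
        var result = new double[condition.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = condition[i] ? ifTrue[i] : ifFalse[i];
        return result;
    }
}
```

Computing both regions for every element and selecting afterwards avoids per-element branching, which is the usual reason for structuring the computation this way on a GPU.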

public class HuberLossDeltaComparison

Language: C#

Example: Comparing different delta values for Huber Loss

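The delta parameter sets the error magnitude at which the loss switches from quadratic to linear, and so controls robustness:

Small delta (0.1 - 0.5): Most errors fall in the linear region; more similar to MAE, very robust to outliers

Medium delta (1.0 - 2.0): Good balance, general purpose

Large delta (5.0+): Most errors fall in the quadratic region; more similar to MSE, less robust to outliers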

The GPU implementation keeps these comparisons on the device.
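The effect of delta can be seen on a small set of residuals containing one outlier. The sketch below is a plain CPU illustration with hypothetical names (`HuberDeltaSweep`, `LargestResidualShare`) and made-up residual values; it measures how much of the total loss comes from the largest residual at each delta.

```csharp
using System;
using System.Linq;

// Hypothetical helper for comparing delta values on a fixed set of residuals.
public static class HuberDeltaSweep
{
    // Per-sample Huber loss for a residual (prediction minus target, or the reverse).
    public static double HuberLoss(double residual, double delta)
    {
        double a = Math.Abs(residual);
        return a <= delta ? 0.5 * residual * residual : delta * (a - 0.5 * delta);
    }

    // Fraction of the total loss contributed by the largest-magnitude residual.
    // The loss grows with |residual|, so the largest loss belongs to that residual.
    public static double LargestResidualShare(double[] residuals, double delta)
    {
        double[] losses = residuals.Select(r => HuberLoss(r, delta)).ToArray();
        double total = losses.Sum();
        return total == 0.0 ? 0.0 : losses.Max() / total;
    }

    public static void PrintSweep()
    {
        double[] residuals = { 0.2, -0.4, 0.1, 8.0 };   // illustrative data, last value is the outlier
        foreach (double delta in new[] { 0.1, 1.0, 10.0 })
        {
            double share = LargestResidualShare(residuals, delta);
            Console.WriteLine($"delta={delta}: outlier share of total loss = {share:F3}");
        }
    }
}
```

For these residuals, the outlier's own loss is 7.5 at delta = 1.0 and 32 at delta = 10.0, and its share of the total loss rises as delta increases: a larger delta gives the outlier more influence, as MSE would.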

public class HuberLossOptimizerIntegration

Language: C#

Example: Integration with Optimizers

HuberLoss gradients can be passed directly to GPU-aware optimizers, including:

SGD with momentum

Adam / AdamW

RMSProp

All optimizer steps use the same GPU operations infrastructure.
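To show how a Huber gradient feeds an optimizer update, here is a minimal CPU sketch of SGD with momentum fitting a one-parameter model y ≈ w * x. It does not use the library's optimizer classes; `HuberSgdSketch`, `Fit`, and the default hyperparameters are assumptions made for illustration.

```csharp
using System;

// Hypothetical sketch: SGD with momentum driven by Huber gradients, on plain arrays.
public static class HuberSgdSketch
{
    // Fits y ≈ w * x and returns the learned weight w.
    public static double Fit(double[] x, double[] y, double delta,
        double learningRate = 0.01, double momentum = 0.9, int epochs = 2000)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("x and y must have the same length.");

        double w = 0.0;
        double velocity = 0.0;

        for (int epoch = 0; epoch < epochs; epoch++)
        {
            double gradW = 0.0;
            for (int i = 0; i < x.Length; i++)
            {
                double diff = w * x[i] - y[i];   // prediction - target
                // Huber gradient with respect to the prediction
                double g = Math.Abs(diff) <= delta ? diff : delta * Math.Sign(diff);
                gradW += g * x[i];               // chain rule through prediction = w * x
            }
            gradW /= x.Length;                   // mean reduction over the batch

            // Momentum update: accumulate velocity, then move the weight
            velocity = momentum * velocity - learningRate * gradW;
            w += velocity;
        }
        return w;
    }
}
```

As a usage example, `HuberSgdSketch.Fit(new[] { 1.0, 2.0, 3.0, 4.0, 5.0 }, new[] { 2.0, 4.0, 6.0, 8.0, 100.0 }, 1.0)` fits data that follows y = 2x except for one outlier. With δ = 1 the Huber minimizer for this data is w = 13/6 ≈ 2.17, whereas an ordinary least-squares fit would give w = 560/55 ≈ 10.18.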

public class HuberLossCpuFallback

Language: C#

Example: Fallback Behavior for CPU-Only Tensors

The GPU-aware implementation includes automatic fallback:

If gpuOps is null: Uses CPU-based computation

If tensor is not GPU-compatible: Automatically falls back to CPU

If backend is CPU-only: Still works, just slower

As a result, HuberLoss runs whether or not a GPU is available.
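The fallback rules above amount to a guard in front of each operation. The sketch below shows that pattern for a single operation; the interface `IAcceleratedOps`, its members, and `FallbackSketch` are hypothetical stand-ins and do not reflect the library's actual types or signatures.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for an accelerated backend; not the library's interface.
public interface IAcceleratedOps
{
    bool CanHandle(double[] data);    // e.g. the data is in a GPU-compatible form
    double[] Abs(double[] data);
}

public static class FallbackSketch
{
    // Uses the accelerated backend when one is supplied and accepts the data,
    // otherwise computes the same result on the CPU.
    public static double[] Abs(double[] data, IAcceleratedOps gpuOps)
    {
        if (gpuOps != null && gpuOps.CanHandle(data))
            return gpuOps.Abs(data);

        // CPU fallback: same result, computed element by element
        return data.Select(d => Math.Abs(d)).ToArray();
    }
}
```

Calling `FallbackSketch.Abs(values, null)` takes the CPU branch, which corresponds to the "gpuOps is null" case listed above.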