I’d treat the model ID as the calibration boundary. The rationale: conformal coverage guarantees hold only when calibration and deployment examples are exchangeable under the same scoring setup. Once the underlying model changes, the score distribution can shift, so I would not let a new model silently inherit the old calibration guarantee. Rolling windows are useful for drift monitoring and threshold refreshes, but for compliance I’d keep calibration sets versioned by model + prompt + task + data distribution.
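A minimal sketch of what I mean, assuming split conformal prediction with scalar nonconformity scores; the key structure (model + prompt + task version tuple) and the function names here are illustrative, not from any particular library:

```python
import math

def conformal_threshold(cal_scores, alpha, cal_key):
    """Split-conformal threshold from calibration nonconformity scores.

    Uses the finite-sample-corrected rank ceil((n+1)*(1-alpha)), and tags
    the result with the calibration key so a model change invalidates it.
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        raise ValueError("calibration set too small for this alpha")
    threshold = sorted(cal_scores)[rank - 1]
    return {"threshold": threshold, "key": cal_key}

def deployment_threshold(calibrated, deploy_key):
    """Refuse to reuse calibration across a model/prompt/task change."""
    if calibrated["key"] != deploy_key:
        raise RuntimeError(
            f"calibration key {calibrated['key']} != deployment key "
            f"{deploy_key}; recalibrate before serving"
        )
    return calibrated["threshold"]
```

The point of the key check is that a model swap becomes a hard failure rather than a silent inheritance of the old threshold; recalibration against the new model's score distribution is forced.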
