xaib.metrics.feature_importance

class xaib.metrics.feature_importance.covariate_regularity.CovariateRegularity(ds: Dataset, model: Model, *args: Any, **kwargs: Any)[source]

Covariate Regularity using entropy over explanations

This measures how comprehensible the explanations are on average. Simpler explanations are considered better. Comprehensibility is measured as the average Shannon entropy over batch-normalized explanations.

The lower the better

  • Worst case: a constant explainer that gives every feature the same importance, equal to 1/N where N is the number of features

  • Best case: a constant explainer that gives one feature the maximum value and all others zero

compute(expl: Explainer, batch_size: int = 1, expl_kwargs: Dict[Any, Any] | None = None)[source]
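
A minimal sketch of the idea behind this metric, not the library's implementation: assuming explanations arrive as a (n_samples, n_features) array of importances and each explanation is normalized to a probability-like vector (the exact normalization used by xaib may differ), the metric is the mean Shannon entropy.

    import numpy as np

    def covariate_regularity_sketch(explanations: np.ndarray) -> float:
        """Mean Shannon entropy of normalized explanations (lower is better)."""
        abs_expl = np.abs(explanations)
        # Normalize each explanation to a probability-like vector
        probs = abs_expl / (abs_expl.sum(axis=1, keepdims=True) + 1e-12)
        # Shannon entropy per sample, averaged over the batch
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        return float(entropy.mean())

Under this sketch a uniform 1/N explanation attains the maximum entropy log N (worst case) and a one-hot explanation attains zero (best case), matching the cases listed above.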



class xaib.metrics.feature_importance.label_difference.LabelDifference(ds: Dataset, model: Model, *args: Any, **kwargs: Any)[source]

Obtain explanations for all targets and compare them. In the binary case a single metric can be computed: the difference between the explanations of the positive and negative targets. In the multiclass case, each explanation can be compared with the explanations of all other classes, yielding several metrics that can be averaged.

The greater the better

  • Worst case: the same explanations for different targets, i.e. a constant explainer

  • Best case: different explanations for different targets

compute(expl: Explainer, batch_size: int = 1, expl_kwargs: Dict[Any, Any] | None = None) None[source]
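
A hypothetical sketch of the comparison, not the library's code: assuming a dict mapping each target label to a (n_samples, n_features) array of explanations for that label, and taking RMSE as the distance (an assumption; the docstring only speaks of a "difference"), the pairwise distances are averaged.

    import numpy as np
    from itertools import combinations

    def label_difference_sketch(expl_per_target: dict) -> float:
        """Average pairwise RMSE between explanations of different targets (higher is better)."""
        distances = []
        for a, b in combinations(sorted(expl_per_target), 2):
            # Distance between explanations of two targets on the same samples
            diff = expl_per_target[a] - expl_per_target[b]
            distances.append(np.sqrt(np.mean(diff ** 2)))
        return float(np.mean(distances))

In the binary case only one pair exists, so the result reduces to the distance between the positive and negative explanations.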



class xaib.metrics.feature_importance.other_disagreement.OtherDisagreement(ds: Dataset, model: Model, *args: Any, **kwargs: Any)[source]

Measures how distant the explanations of this particular method are from the explanations of all other methods on the same data points. Average RMSE is used as the metric. The lower the better

compute(expl: Explainer, batch_size: int = 1, expls: List[Explainer] | None = None, expl_kwargs: Dict[Any, Any] | None = None, expls_kwargs: List[Dict[Any, Any]] | None = None) None[source]
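
A rough sketch of the described computation, under the assumption that all explainers produce (n_samples, n_features) arrays for the same data points; the function name and arguments are illustrative only.

    import numpy as np

    def other_disagreement_sketch(expl: np.ndarray, other_expls: list) -> float:
        """Average RMSE between this method's explanations and those of other methods (lower is better)."""
        rmses = [np.sqrt(np.mean((expl - other) ** 2)) for other in other_expls]
        return float(np.mean(rmses))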



class xaib.metrics.feature_importance.model_randomization_check.ModelRandomizationCheck(ds: Dataset, model: Model, noisy_model: Model, **kwargs: Any)[source]

Model randomization check is a sanity check. To ensure that the model influences the explanations, the model is changed, and the explanations are expected to change with it. This check uses a random model baseline instead of the same model with randomized internal states. Explanations on the original data are obtained with this randomized model and compared with explanations obtained with the original model, using average RMSE over the whole dataset. The further the original explanations are from the explanations on the randomized model, the better.

The greater the better

  • Worst case: the explanations are the same, i.e. a constant explainer

  • Best case: reached when the explanations are opposite and the distance between them is maximized. The problem with this kind of metric is its maximization: more different explanations under a randomized model do not mean that the explainer is more correct, so maximizing it seems redundant.

It is difficult to define a best-case explainer here, since the metric has no maximum value.

compute(expl: Explainer, batch_size: int = 1, expl_kwargs: Dict[Any, Any] | None = None, expl_noisy_kwargs: Dict[Any, Any] | None = None) None[source]
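
As a sketch only (the averaging details in the library may differ): given explanations produced with the original model and with the randomized baseline on the same inputs, the metric is their average RMSE, which the check expects to be large.

    import numpy as np

    def model_randomization_sketch(expl_original: np.ndarray, expl_randomized: np.ndarray) -> float:
        """Average per-sample RMSE between original-model and randomized-model explanations (higher is better)."""
        per_sample_rmse = np.sqrt(np.mean((expl_original - expl_randomized) ** 2, axis=1))
        return float(per_sample_rmse.mean())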



class xaib.metrics.feature_importance.small_noise_check.SmallNoiseCheck(ds: Dataset, noisy_ds: Dataset, model: Model, *args: Any, **kwargs: Any)[source]

Apply noise of small magnitude to the input data. Obtain original and perturbed explanations, compare them using RMSE and average over the dataset.

The lower the better

  • Worst case: the explanations are changed drastically by small variations in the input

  • Best case: no variation at all, so a constant explainer achieves the best result

compute(expl: Explainer, batch_size: int = 1, expl_kwargs: Dict[Any, Any] | None = None) None[source]
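
A minimal sketch under stated assumptions: `explain_fn` below is a stand-in callable mapping a batch of inputs to (n_samples, n_features) importances and is not part of the xaib API (in the library the noise comes from the separate noisy_ds dataset); the noise scale is illustrative.

    import numpy as np

    def small_noise_check_sketch(inputs: np.ndarray, explain_fn, noise_scale: float = 0.01) -> float:
        """Average RMSE between explanations on original and slightly-noised inputs (lower is better)."""
        noisy_inputs = inputs + np.random.normal(scale=noise_scale, size=inputs.shape)
        expl_original = explain_fn(inputs)
        expl_noisy = explain_fn(noisy_inputs)
        per_sample_rmse = np.sqrt(np.mean((expl_original - expl_noisy) ** 2, axis=1))
        return float(per_sample_rmse.mean())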



class xaib.metrics.feature_importance.sparsity.Sparsity(ds: Dataset, model: Model, *args: Any, **kwargs: Any)[source]

Using the Gini index as a measure of sparsity of a single explanation, its average over explanations can serve as the sparsity of the explainer.

The greater the better

  • Worst case: achieved by a constant explainer that gives every feature the same importance, equal to 1/N where N is the number of features; this yields the lowest Gini index and hence the worst sparsity

  • Best case: a constant explainer that gives one feature the maximum value and all others zero, which is the most unequal distribution and hence the sparsest explanation possible

compute(expl: Explainer, batch_size: int = 1, expl_kwargs: Dict[Any, Any] | None = None) None[source]
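
A sketch of the Gini-index idea (the library's exact estimator may differ): for a non-negative importance vector the Gini index is 0 for a uniform vector and approaches 1 when all mass sits on one feature; the metric averages it over the batch.

    import numpy as np

    def gini_index(values: np.ndarray) -> float:
        """Gini index of a non-negative importance vector (0 = uniform, close to 1 = one spike)."""
        sorted_vals = np.sort(np.abs(values))
        n = len(sorted_vals)
        cumulative = np.cumsum(sorted_vals)
        if cumulative[-1] == 0:
            return 0.0
        # Standard Gini formula over the sorted values
        return float((n + 1 - 2 * np.sum(cumulative) / cumulative[-1]) / n)

    def sparsity_sketch(explanations: np.ndarray) -> float:
        """Average Gini index over a (n_samples, n_features) batch of explanations (higher is better)."""
        return float(np.mean([gini_index(e) for e in explanations]))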