Track a scikit-learn experiment
With Cascade you can track any ML experiment, and the workflow for a given library usually stays the same from one experiment to the next.
Cascade respects the tools ML engineers use every day and was built to simplify repetitive work. That is why it includes a scikit-learn integration, which makes experiment tracking much simpler.
Everything is located in cascade.utils.sklearn.
from cascade.utils.sklearn import SkModel
from sklearn.feature_selection import SelectKBest
from sklearn.svm import SVC
The SkModel class accepts a list of Pipeline blocks from scikit-learn.
Anything you can put into a pipeline can be passed as a list to the Cascade wrapper.
Note
Remember to pass the list with the blocks keyword; it will not work without it!
k_best = 2
model = SkModel(
    blocks=[
        SelectKBest(k=k_best),
        SVC(),
    ],
    k=k_best,
)
Notice how k_best was passed both to the transform and to the wrapper.
This is how SkModel gets the parameters to track. You can pass anything else you want to be tracked as parameters in the same way. This may change to automatic parameter tracking in future versions.
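Until tracking is automatic, one way to avoid typing every value twice is to read the parameters back from the blocks themselves. The sketch below uses only scikit-learn's get_params(), which every estimator provides. The tracked dictionary and its ClassName__param key scheme are illustrative choices, not part of Cascade.

```python
from sklearn.feature_selection import SelectKBest
from sklearn.svm import SVC

blocks = [SelectKBest(k=2), SVC()]

# Collect each block's constructor parameters, prefixed with its class name.
# deep=False skips the parameters of any nested estimators.
tracked = {}
for block in blocks:
    prefix = type(block).__name__
    for name, value in block.get_params(deep=False).items():
        tracked[f"{prefix}__{name}"] = value

print(tracked["SelectKBest__k"])  # the k passed to SelectKBest above
print(tracked["SVC__kernel"])     # the kernel SVC uses by default
```

Selected entries of such a dictionary could then be passed to SkModel alongside blocks, the way k is passed above. Some values, such as score_func, are function objects, so it is assumed here that you would filter the dictionary down to plain numbers and strings first.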
After creating the wrapper, you can use it as you would use a scikit-learn estimator.
from sklearn.datasets import load_iris
iris = load_iris()
For example, fit looks like this.
model.fit(iris.data, iris.target)
And predict looks like this. Nothing unusual.
model.predict([iris.data[0]]), iris.target[0]
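For comparison, the same two blocks can be chained in plain scikit-learn and driven with the same fit and predict calls. This sketch does not involve Cascade at all; that SkModel builds an equivalent pipeline internally is an assumption, not something stated above.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

iris = load_iris()

# The same two blocks, chained with scikit-learn's own pipeline helper.
pipeline = make_pipeline(SelectKBest(k=2), SVC())
pipeline.fit(iris.data, iris.target)

# predict expects a 2D input, hence the one-element list around the first sample.
prediction = pipeline.predict([iris.data[0]])
print(prediction, iris.target[0])
```

What the wrapper adds on top of this is the tracking: parameters, metrics and saving.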
Cascade makes it convenient to use metrics in evaluation.
You can create a metric either with an alias such as "acc" or "f1", or with the name of a function you would import from the sklearn.metrics module.
from cascade.utils.sklearn import SkMetric
metrics = [
    SkMetric("acc"),
    SkMetric("f1", average="macro"),
    SkMetric("precision_score", average="macro"),
    SkMetric("recall_score", average="macro"),
]
To evaluate a model, pass the data and a list of metric objects.
model.evaluate(iris.data, iris.target, metrics=metrics)
evaluate does not return anything; instead, it fills the metrics list inside the model.
from pprint import pprint
pprint(model.metrics)
[SkMetric(name=acc, value=0.9533333333333334, created_at=2024-09-16 19:06:05.354980+00:00),
SkMetric(name=f1, value=0.9532912954992826, created_at=2024-09-16 19:06:05.355031+00:00),
SkMetric(name=precision_score, value=0.9543690619563763, created_at=2024-09-16 19:06:05.355048+00:00),
SkMetric(name=recall_score, value=0.9533333333333333, created_at=2024-09-16 19:06:05.355060+00:00)]
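To see what these values measure, the same four metrics can be computed with scikit-learn alone. This sketch assumes that the "acc" and "f1" aliases correspond to accuracy_score and f1_score, and that SkModel behaves like a plain pipeline of its blocks. Neither is confirmed by the text above, so the numbers may not match exactly.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

iris = load_iris()
pipeline = make_pipeline(SelectKBest(k=2), SVC())
pipeline.fit(iris.data, iris.target)

# Evaluate on the same data the model was fitted on, as in the walkthrough.
predicted = pipeline.predict(iris.data)

scores = {
    "acc": accuracy_score(iris.target, predicted),
    "f1": f1_score(iris.target, predicted, average="macro"),
    "precision_score": precision_score(iris.target, predicted, average="macro"),
    "recall_score": recall_score(iris.target, predicted, average="macro"),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Because iris has the same number of samples in every class, macro-averaged recall equals accuracy, which is why those two values in the output above agree to within floating-point rounding.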
Now, to track all the results, we can save the model to a line.
from cascade.lines import ModelLine
line = ModelLine("sklearn_demo", model_cls=SkModel)
line.save(model)