mlots.models package¶

Module contents¶

class mlots.models.AnnoyClassifier(n_neighbors=5, mac_neighbors=None, metric='euclidean', metric_params=None, n_trees=-1, n_jobs=-1, random_seed=1992)¶

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: AnnoyClassifier

This class represents an Annoy model with the MAC/FAC ("many are called, few are chosen") strategy.

Parameters

n_neighbors (int (default 5)) – The n (or k) neighbors to consider for classification.
mac_neighbors (int (default None)) –
Number of neighbors to consider for MAC stage. If None, n_neighbors are used for classification directly.

If an int, the classification is done in two stages:
MAC stage: A candidate set of size ‘mac_neighbors’ is retrieved using ‘metric’.

FAC stage: n_neighbors from the candidate set are selected using DTW and used for classification.
metric (str (default "euclidean")) – The distance metric to be employed for Annoy. Check annoy library for allowed metrics.
metric_params (dict() (default None)) –
The parameters of the DTW for FAC stage.

Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

See tslearn.metrics for more details.
n_trees (int (default -1)) – The number of RPTrees to create for Annoy. If n_trees=-1, it creates as many RPTs as possible.
n_jobs (int (default -1)) – The number of CPU threads to use to build Annoy. -1 to use all the available threads.
random_seed (int (default 1992)) – The initial seed to be used by random function.

Returns

object – AnnoyClassifier class with the parameters supplied.

Return type

self

See also

annoy.AnnoyIndex: The underlying annoy module.
tslearn.metrics.dtw: The underlying dtw function.

Examples

>>> from mlots.models import AnnoyClassifier
>>> model = AnnoyClassifier(n_neighbors=9, random_seed=42)
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.7880794701986755
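
The two-stage MAC/FAC classification described for mac_neighbors can be illustrated with a minimal, standard-library sketch. It is not the mlots implementation: the MAC stage here is a brute-force Euclidean scan rather than an Annoy index, the DTW is unconstrained, and the names euclidean, dtw and mac_fac_predict are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Cheap distance used for the MAC (candidate retrieval) stage."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW used for the FAC (re-ranking) stage."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[len(a)][len(b)])


def mac_fac_predict(query, X_train, y_train, n_neighbors, mac_neighbors):
    # MAC stage: shortlist `mac_neighbors` candidates with the cheap metric.
    order = sorted(range(len(X_train)), key=lambda i: euclidean(query, X_train[i]))
    candidates = order[:mac_neighbors]
    # FAC stage: re-rank the shortlist with DTW and keep `n_neighbors`.
    candidates.sort(key=lambda i: dtw(query, X_train[i]))
    votes = Counter(y_train[i] for i in candidates[:n_neighbors])
    return votes.most_common(1)[0][0]


X_train = [[0, 0, 1, 0, 0], [0, 1, 0, 0, 0], [1, 1, 1, 1, 1], [1, 1, 0, 1, 1]]
y_train = ["spike", "spike", "flat", "flat"]
label = mac_fac_predict([0, 0, 0, 1, 0], X_train, y_train, n_neighbors=1, mac_neighbors=3)
print(label)
```

The expensive DTW distance is only computed for the mac_neighbors shortlisted series, which is the point of the strategy.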

fit(X_train, y_train)¶

This is the fit function for the AnnoyClassifier model.

Parameters

X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.

Returns

object – AnnoyClassifier class with train data fitted.

Return type

self

predict(X_test)¶

This is the predict function for AnnoyClassifier model.

Parameters: X_test (ndarray) – The test data for the prediction.
Returns: y_hat – The predicted labels of the test samples.
Return type: array

class mlots.models.HNSWClassifier(n_neighbors=1, mac_neighbors=None, space='l2', max_elements=10, M=5, ef_construction=100, ef_Search=50, metric_params=None, random_seed=1992, n_jobs=-1)¶

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: HNSWClassifier

This class represents the HNSW model from hnswlib combined with the MAC/FAC strategy.

Parameters

n_neighbors (int (default 1)) – The n (or k) neighbors to consider for classification.
mac_neighbors (int (default None)) –
Number of neighbors to consider for MAC stage. If None, n_neighbors are used for classification directly.

If an int, the classification is done in two stages:
MAC stage: A candidate set of size ‘mac_neighbors’ is retrieved using HNSW with the supplied ‘space’.

FAC stage: n_neighbors from the candidate set are selected using DTW and used for classification.
space (str (default "l2")) – The distance metric to be employed for HNSW. Check hnswlib library for allowed metrics.
max_elements (int (default 10)) – The maximum number of elements that can be stored in the structure.
M (int (default 5)) – The maximum number of outgoing connections in the graph.
ef_construction (int (default 100)) – Controls the tradeoff between construction time and accuracy. Bigger ef_construction leads to longer construction, but better index quality.
ef_Search (int (default 50)) – The size of the dynamic list for the nearest neighbors in HNSW. A higher ef leads to a more accurate but slower search. The value of ef can be anything between k and the size of the dataset, where k = n_neighbors if mac_neighbors is None, and k = mac_neighbors if mac_neighbors is an int.
metric_params (dict() (default None)) –
The parameters of the DTW for FAC stage.

Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

See tslearn.metrics for more details.
n_jobs (int (default -1)) – The number of CPU threads to use. -1 to use all the available threads.
random_seed (int (default 1992)) – The initial seed to be used by random function.
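
The relationship between ef_Search, n_neighbors and mac_neighbors stated above can be written as a small helper. This is only a sketch: resolve_ef is a hypothetical function, and whether mlots itself clamps the value this way is an assumption.

```python
def resolve_ef(ef_search, n_neighbors, mac_neighbors, n_train):
    """Clamp a requested ef to the range [k, n_train] described above."""
    # k is the number of neighbors actually requested from the index:
    # mac_neighbors when the MAC stage is enabled, n_neighbors otherwise.
    k = n_neighbors if mac_neighbors is None else mac_neighbors
    if k > n_train:
        raise ValueError("cannot request more neighbors than training samples")
    return max(k, min(ef_search, n_train))


print(resolve_ef(50, n_neighbors=5, mac_neighbors=None, n_train=1000))
print(resolve_ef(50, n_neighbors=5, mac_neighbors=80, n_train=1000))
```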

Returns

object – HNSWClassifier class with the parameters supplied.

Return type

self

See also

hnswlib.Index: The underlying hnsw module.
tslearn.metrics.dtw: The underlying dtw function.

Examples

>>> from mlots.models import HNSWClassifier
>>> model = HNSWClassifier(n_neighbors=5, mac_neighbors=30, metric_params={"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 23})
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.8344370860927153
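
To show what the "sakoe_chiba_radius" entry of metric_params controls, here is a minimal pure-Python DTW restricted to a Sakoe-Chiba band. It is a sketch for equal-length series only and is not the tslearn implementation; the function name dtw_sakoe_chiba is hypothetical.

```python
import math


def dtw_sakoe_chiba(a, b, radius=None):
    """DTW between two equal-length series; radius=None means no constraint."""
    n = len(a)
    if len(b) != n:
        raise ValueError("this sketch only handles equal-length series")
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells with |i - j| <= radius lie inside the band.
        lo = 1 if radius is None else max(1, i - radius)
        hi = n if radius is None else min(n, i + radius)
        for j in range(lo, hi + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][n])


a = [0, 0, 1, 0, 0, 0]
b = [0, 0, 0, 0, 1, 0]
narrow = dtw_sakoe_chiba(a, b, radius=0)  # radius 0 reduces to the Euclidean distance
wide = dtw_sakoe_chiba(a, b, radius=2)    # band wide enough to align the two spikes
print(narrow, wide)
```

A small radius makes DTW cheaper but limits how far the two series may be shifted against each other; a larger radius allows more warping at a higher cost.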

fit(X_train=None, y_train=None)¶

This is the fit function for HNSWClassifier model.

Parameters

X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.

Returns

object – HNSWClassifier class with train data fitted.

Return type

self

predict(X_test)¶

This is the predict function for HNSWClassifier model.

Parameters: X_test (ndarray) – The test data for the prediction.
Returns: y_hat – The predicted labels of the test samples.
Return type: array

class mlots.models.NSWClassifier(f: int = 1, m: int = 1, k: int = 1, metric: str = 'euclidean', metric_params=None, random_seed: int = 1992)¶

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: Navigable Small Worlds

This class represents the NSW model.

Parameters

f (int (default 1)) – The maximum number of friends a node can have or connect to.
m (int (default 1)) – The number of search iterations in the network.
k (int (default 1)) – The number of neighbors to consider for classification.
metric (str (default "euclidean")) – The distance metric/measure to be employed. Can be one from the list: euclidean, dtw, lb_keogh
metric_params (dict() (default None)) –
The parameters of the metric being employed. Example: For metric = "dtw", the metric_params can be:

{"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

See tslearn.metrics for more details.
random_seed (int (default 1992)) – The initial seed to be used by random function.

corpus¶

Stores all the nodes in the network. The keys are the indices of the nodes and the values are the node objects of the Node class.

Type: dict()

Returns: object – NSW class with the parameters supplied.
Return type: self

See also

sortedcollections.ValueSortedDict: The data structure that stores the connected neighbours of a node in the corpus.
tslearn.metrics: The underlying library for dtw and lb_keogh distance measures.

Examples

>>> from mlots.models import NSWClassifier
>>> nsw = NSWClassifier(f=1, k=5, m=9, metric="euclidean")
>>> nsw.fit(X_train, y_train)
>>> nsw.score(X_test, y_test)
0.7086092715231788
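
The roles of f, m and k can be illustrated with a generic navigable-small-world sketch. It is not the mlots implementation: the graph is a plain dict of sets, links are made bidirectional (so a node can end up with more than f friends), and the names greedy_search, build_nsw and predict are hypothetical.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_search(graph, data, query, m, k, rng):
    """Run m greedy walks from random entry points; return the k closest visited nodes."""
    visited = {}
    for _ in range(m):
        current = rng.choice(list(graph))
        visited[current] = euclidean(query, data[current])
        while True:
            for nb in graph[current]:
                if nb not in visited:
                    visited[nb] = euclidean(query, data[nb])
            best = min(graph[current], key=visited.get, default=current)
            if visited[best] >= visited[current]:
                break  # local minimum: no friend is closer to the query
            current = best
    return sorted(visited, key=visited.get)[:k]


def build_nsw(data, f, m, rng):
    """Insert points one by one, linking each to up to f nodes found by search."""
    graph = {0: set()}
    for idx in range(1, len(data)):
        friends = greedy_search(graph, data, data[idx], m, f, rng)
        graph[idx] = set(friends)
        for fr in friends:
            graph[fr].add(idx)  # assumption: links are bidirectional
    return graph


def predict(graph, data, labels, query, m, k, rng):
    """Majority vote over the k approximate nearest neighbors."""
    nns = greedy_search(graph, data, query, m, k, rng)
    return Counter(labels[i] for i in nns).most_common(1)[0][0]


rng = random.Random(1992)
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
labels = ["a", "a", "a", "b", "b", "b"]
graph = build_nsw(data, f=2, m=3, rng=rng)
pred = predict(graph, data, labels, [5.1, 5.0], m=3, k=3, rng=rng)
print(pred)
```

Because the search is approximate, larger values of f and m generally trade build and query time for a better chance of finding the true nearest neighbors.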

fit(X_train, y_train, dist_mat=None)¶

This is the fit function for NSW model.

Parameters

X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.
dist_mat (ndarray (default None)) – [Optional] Pre-computed distance matrix for X_train vs X_train

Returns

object – NSW class with train data fitted.

Return type

self

kneighbors(X_test=None, dist_mat=None, return_prediction=False)¶

This is the kneighbors function for NSW model. The kneighbors are fetched for the test samples.

Parameters

X_test (ndarray) – The test data for the prediction.
dist_mat (ndarray (default None)) – [Optional] Pre-computed distance matrix for X_test vs X_train
return_prediction (bool (default False)) – If True, the function returns kneighbors and predictions (nns and y_hat)

Returns

nns (ndarray) – The kneighbors of the test samples.
y_hat (array) – The predicted labels of the test samples. Returned only if return_prediction is True.

predict(X_test, dist_mat=None)¶

This is the predict function for NSW model.

Parameters

X_test (ndarray) – The test data for the prediction.
dist_mat (ndarray (default None)) – [Optional] Pre-computed distance matrix for X_test vs X_train

Returns

y_hat – The predicted labels of the test samples.

Return type

array
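
The optional dist_mat arguments of fit, kneighbors and predict take pre-computed distances. The sketch below shows the shapes involved, assuming that "X_test vs X_train" means rows index test samples and columns index train samples; that orientation is an assumption, and pairwise_euclidean is a hypothetical helper. The nested lists would need to be converted to an ndarray before being passed on.

```python
import math


def pairwise_euclidean(A, B):
    """Return a len(A) x len(B) matrix with entry [i][j] = distance(A[i], B[j])."""
    return [
        [math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) for b in B]
        for a in A
    ]


X_train = [[0, 0], [3, 4], [6, 8]]
X_test = [[0, 0], [3, 4]]

train_mat = pairwise_euclidean(X_train, X_train)  # shape (n_train, n_train), for fit
test_mat = pairwise_euclidean(X_test, X_train)    # shape (n_test, n_train), for predict
print(len(train_mat), len(train_mat[0]))
print(len(test_mat), len(test_mat[0]))
```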

class mlots.models.RidgeClassifier(alpha=1.0, *, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, class_weight=None, solver='auto', random_state=None)¶

Bases: sklearn.linear_model._base.LinearClassifierMixin, sklearn.linear_model._ridge._BaseRidge

Classifier using Ridge regression.

This classifier first converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).

See also

Ridge: Ridge regression.
RidgeClassifierCV: Ridge classifier with built-in cross validation.

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifier
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifier().fit(X, y)
>>> clf.score(X, y)
0.9595...
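
The idea described above (encode the targets as {-1, 1}, fit a ridge regression, predict from the sign of the output) can be sketched for a binary problem with a single feature using the closed-form solution. This is an illustration, not the scikit-learn code; fit_ridge_classifier and predict_ridge_classifier are hypothetical names.

```python
def fit_ridge_classifier(x, y, alpha=1.0):
    """Fit a one-feature ridge classifier; y must contain exactly two class labels."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch only handles binary problems")
    # Step 1: encode the two classes as -1 and +1.
    t = [1.0 if label == classes[1] else -1.0 for label in y]
    # Step 2: ridge regression on centered data (the intercept is not penalized).
    x_mean = sum(x) / len(x)
    t_mean = sum(t) / len(t)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxt = sum((xi - x_mean) * (ti - t_mean) for xi, ti in zip(x, t))
    coef = sxt / (sxx + alpha)
    intercept = t_mean - coef * x_mean
    return classes, coef, intercept


def predict_ridge_classifier(model, x):
    classes, coef, intercept = model
    # Step 3: the sign of the regression output picks the class.
    return [classes[1] if coef * xi + intercept > 0 else classes[0] for xi in x]


x = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
y = ["low", "low", "low", "high", "high", "high"]
model = fit_ridge_classifier(x, y, alpha=1.0)
preds = predict_ridge_classifier(model, [1.5, 8.5])
print(preds)
```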

property classes_¶

fit(X, y, sample_weight=None)¶

Fit Ridge classifier model.

Parameters

X ({ndarray, sparse matrix} of shape (n_samples, n_features)) – Training data.
y (ndarray of shape (n_samples,)) – Target values.
sample_weight (float or ndarray of shape (n_samples,), default=None) –
Individual weights for each sample. If given a float, every sample will have the same weight.

New in version 0.17: sample_weight support to Classifier.

Returns

self – Instance of the estimator.

Return type

object

class mlots.models.RidgeClassifierCV(alphas=(0.1, 1.0, 10.0), *, fit_intercept=True, normalize=False, scoring=None, cv=None, class_weight=None, store_cv_values=False)¶

Bases: sklearn.linear_model._base.LinearClassifierMixin, sklearn.linear_model._ridge._BaseRidgeCV

Ridge classifier with built-in cross-validation.

See glossary entry for cross-validation estimator.

By default, it performs Leave-One-Out Cross-Validation. Currently, only the n_features > n_samples case is handled efficiently.

See also

Ridge: Ridge regression.
RidgeClassifier: Ridge classifier.
RidgeCV: Ridge regression with built-in cross validation.

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
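
Leave-one-out selection of alpha can be sketched naively by refitting the model once per held-out sample. This only illustrates the idea for a single feature with targets already encoded as -1 and +1; an efficient implementation would avoid the refits, and the names ridge_fit, loo_error and select_alpha are hypothetical.

```python
def ridge_fit(x, t, alpha):
    """Closed-form one-feature ridge regression with an unpenalized intercept."""
    x_mean = sum(x) / len(x)
    t_mean = sum(t) / len(t)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxt = sum((xi - x_mean) * (ti - t_mean) for xi, ti in zip(x, t))
    coef = sxt / (sxx + alpha)
    return coef, t_mean - coef * x_mean


def loo_error(x, t, alpha):
    """Mean squared leave-one-out error for one value of alpha."""
    total = 0.0
    for i in range(len(x)):
        # Hold out sample i, fit on the rest, score on the held-out sample.
        x_rest = x[:i] + x[i + 1:]
        t_rest = t[:i] + t[i + 1:]
        coef, intercept = ridge_fit(x_rest, t_rest, alpha)
        total += (coef * x[i] + intercept - t[i]) ** 2
    return total / len(x)


def select_alpha(x, t, alphas=(0.1, 1.0, 10.0)):
    """Return the alpha with the lowest leave-one-out error."""
    return min(alphas, key=lambda a: loo_error(x, t, a))


x = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
t = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
best_alpha = select_alpha(x, t)
print(best_alpha)
```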

property classes_¶

fit(X, y, sample_weight=None)¶

Fit Ridge classifier with cv.

Parameters

X (ndarray of shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of features. When using GCV, will be cast to float64 if necessary.
y (ndarray of shape (n_samples,)) – Target values. Will be cast to X’s dtype if necessary.
sample_weight (float or ndarray of shape (n_samples,), default=None) – Individual weights for each sample. If given a float, every sample will have the same weight.

Returns

self

Return type

object

class mlots.models.kNNClassifier(n_neighbors=5, mac_neighbors=None, weights='uniform', mac_metric='euclidean', metric_params=None, n_jobs=-1)¶

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: kNNClassifier

This class represents a k-NN classifier with the MAC/FAC strategy.

Parameters

n_neighbors (int (default 5)) – The n (or k) neighbors to consider for classification.
mac_neighbors (int (default None)) –
Number of neighbors to consider for MAC stage. If None, n_neighbors are used for classification directly.

If an int, the classification is done in two stages:
MAC stage: A candidate set of size ‘mac_neighbors’ is retrieved using ‘mac_metric’.

FAC stage: n_neighbors from the candidate set are selected using DTW and used for classification.
weights (str (default "uniform")) – The weighting scheme of the distances. Options: “uniform” or “distance”
mac_metric (str (default "euclidean")) – The distance metric to be employed for MAC stage. Check tslearn.neighbors.KNeighborsTimeSeriesClassifier for allowed metrics.
metric_params (dict() (default None)) –
The parameters of the DTW for FAC stage.

Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

See tslearn.metrics for more details.
n_jobs (int (default -1)) – The number of CPU threads to use. -1 to use all the available threads.

Returns

object – kNNClassifier class with the parameters supplied.

Return type

self

See also

tslearn.neighbors.KNeighborsTimeSeriesClassifier: The underlying k-NN module for time-series data.
tslearn.metrics.dtw: The underlying dtw function.

Examples

>>> from mlots.models import kNNClassifier
>>> model = kNNClassifier(n_neighbors=5)
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.7814569536423841
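
The difference between the two weights options can be sketched as follows. The sketch assumes the usual k-NN convention that "distance" weights each neighbor by the inverse of its distance; vote is a hypothetical helper, not part of mlots.

```python
from collections import defaultdict


def vote(neighbor_labels, neighbor_distances, weights="uniform"):
    """Combine neighbor labels into one prediction."""
    scores = defaultdict(float)
    for label, dist in zip(neighbor_labels, neighbor_distances):
        if weights == "uniform":
            scores[label] += 1.0
        elif weights == "distance":
            # Closer neighbors count more; an exact match dominates.
            scores[label] += float("inf") if dist == 0 else 1.0 / dist
        else:
            raise ValueError("weights must be 'uniform' or 'distance'")
    return max(scores, key=scores.get)


labels = ["a", "b", "b"]
dists = [0.1, 2.0, 4.0]
uniform_pred = vote(labels, dists, "uniform")    # majority of the three neighbors
distance_pred = vote(labels, dists, "distance")  # the very close neighbor outweighs the others
print(uniform_pred, distance_pred)
```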

fit(X_train, y_train)¶

This is the fit function for kNNClassifier model.

Parameters

X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.

Returns

object – kNNClassifier class with train data fitted.

Return type

self

predict(X_test)¶

This is the predict function for kNNClassifier model.

Parameters: X_test (ndarray) – The test data for the prediction.
Returns: y_hat – The predicted labels of the test samples.
Return type: array

class mlots.models.kNNClassifier_CustomDist(n_neighbors=5, mac_neighbors=None, weights='uniform', mac_metric='lb_keogh', metric_params=None, n_jobs=-1)¶

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: kNNClassifier_CustomDist

This is a class that represents kNNClassifier_CustomDist model with MAC/FAC strategy.

Parameters

n_neighbors (int (default 5)) – The n (or k) neighbors to consider for classification.
mac_neighbors (int (default None)) –
Number of neighbors to consider for MAC stage. If None, n_neighbors are used for classification directly.

If an int, the classification is done in two stages:
MAC stage: A candidate set of size ‘mac_neighbors’ is retrieved using ‘mac_metric’.

FAC stage: n_neighbors from the candidate set are selected using DTW and used for classification.
weights (str (default "uniform")) – The weighting scheme of the distances. Options: “uniform” or “distance”
mac_metric (str (default "lb_keogh")) –
The distance metric to be employed for MAC stage.

Options:

"lb_keogh",

any allowed distance measures for sklearn.neighbors.KNeighborsClassifier,

or, a callable distance function.

If mac_metric = “lb_keogh”, provide “radius” parameter for it in metric_params.
metric_params (dict() (default None)) –
The parameters of the DTW for FAC stage.

Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

Check tslearn.neighbors.KNeighborsTimeSeriesClassifier model for allowed metrics.
n_jobs (int (default -1)) – The number of CPU threads to use. -1 to use all the available threads.

Returns

object – kNNClassifier_CustomDist class with the parameters supplied.

Return type

self

See also

sklearn.neighbors.KNeighborsClassifier: The underlying k-NN module for MAC stage with custom distance measure.
tslearn.neighbors.KNeighborsTimeSeriesClassifier: The underlying k-NN module for FAC stage with dtw.
tslearn.metrics.dtw: The underlying dtw function.

Examples

>>> from mlots.models import kNNClassifier_CustomDist
>>> model = kNNClassifier_CustomDist(mac_metric="lb_keogh", mac_neighbors=20, metric_params={"radius": 23})
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.7748344370860927
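
To show what the "radius" parameter of "lb_keogh" does, here is a minimal pure-Python version of the LB_Keogh lower bound for equal-length series. It is a sketch and not the tslearn implementation; the function name lb_keogh is reused here only for readability.

```python
import math


def lb_keogh(query, candidate, radius):
    """LB_Keogh lower bound between two equal-length series."""
    if len(query) != len(candidate):
        raise ValueError("this sketch only handles equal-length series")
    n = len(candidate)
    total = 0.0
    for i, q in enumerate(query):
        # Envelope of the candidate over the window [i - radius, i + radius].
        window = candidate[max(0, i - radius): min(n, i + radius + 1)]
        upper, lower = max(window), min(window)
        # Only the part of the query that leaves the envelope is penalized.
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return math.sqrt(total)


query = [0, 0, 1, 0, 0, 0]
candidate = [0, 0, 0, 0, 1, 0]
bounds = [lb_keogh(query, candidate, r) for r in (0, 1, 2)]
print(bounds)
```

With radius 0 the envelope collapses onto the candidate and the bound equals the Euclidean distance; a wider radius loosens the envelope, so the bound can only decrease. This cheap bound is what makes it suitable for the MAC stage before DTW is applied in the FAC stage.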

fit(X_train, y_train)¶

This is the fit function for kNNClassifier_CustomDist model.

Parameters

X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.

Returns

object – kNNClassifier_CustomDist class with train data fitted.

Return type

self

predict(X_test)¶

This is the predict function for kNNClassifier_CustomDist model.

Parameters: X_test (ndarray) – The test data for the prediction.
Returns: y_hat – The predicted labels of the test samples.
Return type: array