mlots.models package

Module contents

class mlots.models.AnnoyClassifier(n_neighbors=5, mac_neighbors=None, metric='euclidean', metric_params=None, n_trees=-1, n_jobs=-1, random_seed=1992)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: AnnoyClassifier

This is a class that represents the Annoy model combined with the MAC/FAC strategy.

Parameters
  • n_neighbors (int (default 5)) – The n (or k) neighbors to consider for classification.

  • mac_neighbors (int (default None)) –

    Number of neighbors to consider for MAC stage. If None, n_neighbors are used for classification directly.

    If an int is given, the classification proceeds in two stages:

    MAC stage: a candidate set of size ‘mac_neighbors’ is retrieved using ‘metric’.

    FAC stage: n_neighbors from the candidate set are used for classification using DTW.

  • metric (str (default "euclidean")) – The distance metric to be employed for Annoy. Check annoy library for allowed metrics.

  • metric_params (dict() (default None)) –

    The parameters of the DTW for FAC stage.

    Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

    See tslearn.metrics for more details.

  • n_trees (int (default -1)) – The number of random projection trees (RPTrees) to build for Annoy. If n_trees=-1, as many RPTrees as possible are created.

  • n_jobs (int (default -1)) – The number of CPU threads to use to build Annoy. -1 to use all the available threads.

  • random_seed (int (default 1992)) – The initial seed to be used by the random number generator.

Returns

object – AnnoyClassifier class with the parameters supplied.

Return type

self

See also

annoy.AnnoyIndex

The underlying annoy module.

tslearn.metrics.dtw

The underlying dtw function.

Examples

>>> from mlots.models import AnnoyClassifier
>>> model = AnnoyClassifier(n_neighbors=9, random_seed=42)
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.7880794701986755
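
A fuller sketch of the two-stage MAC/FAC configuration described above. The data here is purely hypothetical (random arrays standing in for equal-length time series); only the documented constructor parameters and the fit/predict/score methods are used.

>>> import numpy as np
>>> from mlots.models import AnnoyClassifier
>>> # hypothetical data: 100 train / 50 test series of length 60
>>> rng = np.random.RandomState(1992)
>>> X_train, X_test = rng.randn(100, 60), rng.randn(50, 60)
>>> y_train, y_test = rng.randint(0, 2, 100), rng.randint(0, 2, 50)
>>> # MAC stage: 30 candidates from the Annoy 'euclidean' index;
>>> # FAC stage: the candidates are re-ranked with constrained DTW
>>> # and the best 9 vote on the label.
>>> model = AnnoyClassifier(n_neighbors=9, mac_neighbors=30,
...                         metric_params={"global_constraint": "sakoe_chiba",
...                                        "sakoe_chiba_radius": 1},
...                         random_seed=42)
>>> model.fit(X_train, y_train)
>>> y_hat = model.predict(X_test)
>>> acc = model.score(X_test, y_test)
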
fit(X_train, y_train)

This is the fit function for the AnnoyClassifier model.

Parameters
  • X_train (ndarray) – The train data to be fitted.

  • y_train (array) – The true labels of X_train data.

Returns

object – AnnoyClassifier class with train data fitted.

Return type

self

predict(X_test)

This is the predict function for AnnoyClassifier model.

Parameters

X_test (ndarray) – The test data for the prediction.

Returns

y_hat – The predicted labels of the test samples.

Return type

array

class mlots.models.HNSWClassifier(n_neighbors=1, mac_neighbors=None, space='l2', max_elements=10, M=5, ef_construction=100, ef_Search=50, metric_params=None, random_seed=1992, n_jobs=-1)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: HNSWClassifier

This is a class that represents the HNSW model from hnswlib combined with the MAC/FAC strategy.

Parameters
  • n_neighbors (int (default 1)) – The n (or k) neighbors to consider for classification.

  • mac_neighbors (int (default None)) –

    Number of neighbors to consider for MAC stage. If None, n_neighbors are used for classification directly.

    If an int is given, the classification proceeds in two stages:

    MAC stage: a candidate set of size ‘mac_neighbors’ is retrieved using HNSW with the supplied ‘space’.

    FAC stage: n_neighbors from the candidate set are used for classification using DTW.

  • space (str (default "l2")) – The distance metric to be employed for HNSW. Check hnswlib library for allowed metrics.

  • max_elements (int (default 10)) – The maximum number of elements that can be stored in the structure.

  • M (int (default 5)) – The maximum number of outgoing connections in the graph.

  • ef_construction (int (default 100)) – Controls the tradeoff between construction time and accuracy. Bigger ef_construction leads to longer construction, but better index quality.

  • ef_Search (int (default 50)) – The size of the dynamic list for the nearest neighbors in HNSW. A higher ef leads to a more accurate but slower search. The value of ef can be anything between k and the size of the dataset, where k = n_neighbors if mac_neighbors is None, and k = mac_neighbors otherwise.

  • metric_params (dict() (default None)) –

    The parameters of the DTW for FAC stage.

    Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

    See tslearn.metrics for more details.

  • n_jobs (int (default -1)) – The number of CPU threads to use. -1 to use all the available threads.

  • random_seed (int (default 1992)) – The initial seed to be used by the random number generator.

Returns

object – HNSWClassifier class with the parameters supplied.

Return type

self

See also

hnswlib.Index

The underlying hnsw module.

tslearn.metrics.dtw

The underlying dtw function.

Examples

>>> from mlots.models import HNSWClassifier
>>> model = HNSWClassifier(n_neighbors=5, mac_neighbors=30, metric_params={"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 23})
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.8344370860927153
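
A hedged sketch of how the HNSW graph parameters interact with the MAC/FAC settings, reusing the X_train/X_test arrays from the AnnoyClassifier sketch above. Setting max_elements to the training-set size and keeping ef_Search >= k (here k = mac_neighbors) are assumptions drawn from the parameter descriptions above, not requirements verified against the implementation.

>>> from mlots.models import HNSWClassifier
>>> # k = mac_neighbors = 30 in the MAC stage, so ef_Search is kept >= 30;
>>> # max_elements is sized to the training set (assumption).
>>> model = HNSWClassifier(n_neighbors=5, mac_neighbors=30, space="l2",
...                        max_elements=len(X_train), M=16,
...                        ef_construction=100, ef_Search=50,
...                        metric_params={"global_constraint": "sakoe_chiba",
...                                       "sakoe_chiba_radius": 23})
>>> model.fit(X_train, y_train)
>>> y_hat = model.predict(X_test)
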
fit(X_train=None, y_train=None)

This is the fit function for HNSWClassifier model.

Parameters
  • X_train (ndarray) – The train data to be fitted.

  • y_train (array) – The true labels of X_train data.

Returns

object – HNSWClassifier class with train data fitted.

Return type

self

predict(X_test)

This is the predict function for HNSWClassifier model.

Parameters

X_test (ndarray) – The test data for the prediction.

Returns

y_hat – The predicted labels of the test samples.

Return type

array

class mlots.models.NSWClassifier(f: int = 1, m: int = 1, k: int = 1, metric: str = 'euclidean', metric_params=None, random_seed: int = 1992)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: Navigable Small Worlds

This is a class that represents the NSW (Navigable Small World) model.

Parameters
  • f (int (default 1)) – The maximum number of friends a node can have or connect to.

  • m (int (default 1)) – The number of iterations of search in the network.

  • k (int (default 1)) – The number of neighbors to consider for classification.

  • metric (str (default "euclidean")) – The distance metric/measure to be employed. Can be one of: euclidean, dtw, or lb_keogh.

  • metric_params (dict() (default None)) –

    The parameters of the metric being employed. Example: For metric = “dtw”, the metric_params can be:

    {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

    See tslearn.metrics for more details.

  • random_seed (int (default 1992)) – The initial seed to be used by the random number generator.

corpus

It stores all the nodes in the network. The keys are the indices of the nodes and the values are the corresponding Node objects.

Type

dict()

Returns

object – NSW class with the parameters supplied.

Return type

self

See also

sortedcollections.ValueSortedDict

The data structure that stores the connected neighbours of a node in the corpus.

tslearn.metrics

The underlying library for dtw and lb_keogh distance measures.

Examples

>>> from mlots.models import NSWClassifier
>>> nsw = NSWClassifier(f=1, k=5, m=9, metric="euclidean")
>>> nsw.fit(X_train, y_train)
>>> nsw.score(X_test, y_test)
0.7086092715231788
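
A minimal sketch of an NSW network built directly on DTW rather than on the Euclidean distance, assuming X_train/y_train/X_test are NumPy arrays of equal-length time series; the parameter values are illustrative, not tuned.

>>> from mlots.models import NSWClassifier
>>> nsw_dtw = NSWClassifier(f=5, m=3, k=5, metric="dtw",
...                         metric_params={"global_constraint": "sakoe_chiba",
...                                        "sakoe_chiba_radius": 1},
...                         random_seed=1992)
>>> nsw_dtw.fit(X_train, y_train)
>>> y_hat = nsw_dtw.predict(X_test)
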
fit(X_train, y_train, dist_mat=None)

This is the fit function for NSW model.

Parameters
  • X_train (ndarray) – The train data to be fitted.

  • y_train (array) – The true labels of X_train data.

  • dist_mat (ndarray (default None)) – [Optional] Pre-computed distance matrix for X_train vs X_train

Returns

object – NSW class with train data fitted.

Return type

self

kneighbors(X_test=None, dist_mat=None, return_prediction=False)

This is the kneighbors function for NSW model. The kneighbors are fetched for the test samples.

Parameters
  • X_test (ndarray) – The test data for the prediction.

  • dist_mat (ndarray (default None)) – [Optional] Pre-computed distance matrix for X_test vs X_train

  • return_prediction (bool (default False)) – If True, the function returns kneighbors and predictions (nns and y_hat)

Returns

  • nns (ndarray) – The kneighbors of the test samples.

  • y_hat (array) – The predicted labels of the test samples.
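
A brief sketch of both calling conventions, assuming nsw has already been fitted as in the class example above; the exact content of nns (e.g. corpus indices per test sample) is whatever the model returns.

>>> nns = nsw.kneighbors(X_test)                                 # neighbors only
>>> nns, y_hat = nsw.kneighbors(X_test, return_prediction=True)  # neighbors and labels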

predict(X_test, dist_mat=None)

This is the predict function for NSW model.

Parameters
  • X_test (ndarray) – The test data for the prediction.

  • dist_mat (ndarray (default None)) – [Optional] Pre-computed distance matrix for X_test vs X_train

Returns

y_hat – The predicted labels of the test samples.

Return type

array
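
A hedged sketch of supplying a pre-computed dist_mat, using scipy for the distance computation. The assumption here is that the matrix should be computed with the same measure the model was built with (euclidean, as in the class example above).

>>> from scipy.spatial.distance import cdist
>>> dist_mat = cdist(X_test, X_train, metric="euclidean")   # X_test vs X_train
>>> y_hat = nsw.predict(X_test, dist_mat=dist_mat)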

class mlots.models.RidgeClassifier(alpha=1.0, *, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, class_weight=None, solver='auto', random_state=None)

Bases: sklearn.linear_model._base.LinearClassifierMixin, sklearn.linear_model._ridge._BaseRidge

Classifier using Ridge regression.

This classifier first converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).

Read more in the User Guide.

Parameters
  • alpha (float, default=1.0) – Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or LinearSVC.

  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).

  • normalize (bool, default=False) – This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • max_iter (int, default=None) – Maximum number of iterations for conjugate gradient solver. The default value is determined by scipy.sparse.linalg.

  • tol (float, default=1e-3) – Precision of the solution.

  • class_weight (dict or 'balanced', default=None) –

    Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • solver ({'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga'}, default='auto') –

    Solver to use in the computational routines:

    • ’auto’ chooses the solver automatically based on the type of data.

    • ’svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. More stable for singular matrices than ‘cholesky’.

    • ’cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.

    • ’sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).

    • ’lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.

    • ’sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its unbiased and more flexible version named SAGA. Both methods use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

      New in version 0.17: Stochastic Average Gradient descent solver.

      New in version 0.19: SAGA solver.

  • random_state (int, RandomState instance, default=None) – Used when solver == ‘sag’ or ‘saga’ to shuffle the data. See Glossary for details.

coef_

Coefficient of the features in the decision function.

coef_ is of shape (1, n_features) when the given problem is binary.

Type

ndarray of shape (1, n_features) or (n_classes, n_features)

intercept_

Independent term in decision function. Set to 0.0 if fit_intercept = False.

Type

float or ndarray of shape (n_targets,)

n_iter_

Actual number of iterations for each target. Available only for sag and lsqr solvers. Other solvers will return None.

Type

None or ndarray of shape (n_targets,)

classes_

The class labels.

Type

ndarray of shape (n_classes,)

See also

Ridge

Ridge regression.

RidgeClassifierCV

Ridge classifier with built-in cross validation.

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifier
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifier().fit(X, y)
>>> clf.score(X, y)
0.9595...
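
A small sketch illustrating the one-versus-all note above on a multiclass problem; load_iris is used purely as a convenient example dataset, and class_weight="balanced" is shown only to exercise that documented option.

>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import RidgeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> clf = RidgeClassifier(alpha=1.0, class_weight="balanced").fit(X, y)
>>> n_classes, n_features = clf.coef_.shape   # one coefficient row per class
>>> y_hat = clf.predict(X)
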
property classes_
fit(X, y, sample_weight=None)

Fit Ridge classifier model.

Parameters
  • X ({ndarray, sparse matrix} of shape (n_samples, n_features)) – Training data.

  • y (ndarray of shape (n_samples,)) – Target values.

  • sample_weight (float or ndarray of shape (n_samples,), default=None) –

    Individual weights for each sample. If given a float, every sample will have the same weight.

    New in version 0.17: sample_weight support to Classifier.

Returns

self – Instance of the estimator.

Return type

object

class mlots.models.RidgeClassifierCV(alphas=(0.1, 1.0, 10.0), *, fit_intercept=True, normalize=False, scoring=None, cv=None, class_weight=None, store_cv_values=False)

Bases: sklearn.linear_model._base.LinearClassifierMixin, sklearn.linear_model._ridge._BaseRidgeCV

Ridge classifier with built-in cross-validation.

See glossary entry for cross-validation estimator.

By default, it performs Leave-One-Out Cross-Validation. Currently, only the n_features > n_samples case is handled efficiently.

Read more in the User Guide.

Parameters
  • alphas (ndarray of shape (n_alphas,), default=(0.1, 1.0, 10.0)) – Array of alpha values to try. Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or LinearSVC.

  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

  • normalize (bool, default=False) – This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.

  • scoring (string, callable, default=None) – A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).

  • cv (int, cross-validation generator or an iterable, default=None) –

    Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the efficient Leave-One-Out cross-validation

    • integer, to specify the number of folds.

    • CV splitter,

    • An iterable yielding (train, test) splits as arrays of indices.

    Refer to the User Guide for the various cross-validation strategies that can be used here.

  • class_weight (dict or 'balanced', default=None) –

    Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

  • store_cv_values (bool, default=False) – Flag indicating if the cross-validation values corresponding to each alpha should be stored in the cv_values_ attribute (see below). This flag is only compatible with cv=None (i.e. using Leave-One-Out Cross-Validation).

cv_values_

Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor). This attribute exists only when store_cv_values is True.

Type

ndarray of shape (n_samples, n_targets, n_alphas), optional

coef_

Coefficient of the features in the decision function.

coef_ is of shape (1, n_features) when the given problem is binary.

Type

ndarray of shape (1, n_features) or (n_targets, n_features)

intercept_

Independent term in decision function. Set to 0.0 if fit_intercept = False.

Type

float or ndarray of shape (n_targets,)

alpha_

Estimated regularization parameter.

Type

float

best_score_

Score of base estimator with best alpha.

New in version 0.23.

Type

float

classes_

The class labels.

Type

ndarray of shape (n_classes,)

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifierCV
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
>>> clf.score(X, y)
0.9630...
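
A short sketch of inspecting the fitted attributes documented above; store_cv_values=True is only valid with the default cv=None (Leave-One-Out).

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifierCV
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1],
...                         store_cv_values=True).fit(X, y)
>>> best_alpha = clf.alpha_          # regularization strength selected by LOO-CV
>>> best_score = clf.best_score_     # score of the base estimator with that alpha
>>> cv_errors = clf.cv_values_       # see cv_values_ above for the expected shape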

See also

Ridge

Ridge regression.

RidgeClassifier

Ridge classifier.

RidgeCV

Ridge regression with built-in cross validation.

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.

property classes_
fit(X, y, sample_weight=None)

Fit Ridge classifier with cv.

Parameters
  • X (ndarray of shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of features. When using GCV, will be cast to float64 if necessary.

  • y (ndarray of shape (n_samples,)) – Target values. Will be cast to X’s dtype if necessary.

  • sample_weight (float or ndarray of shape (n_samples,), default=None) – Individual weights for each sample. If given a float, every sample will have the same weight.

Returns

self

Return type

object

class mlots.models.kNNClassifier(n_neighbors=5, mac_neighbors=None, weights='uniform', mac_metric='euclidean', metric_params=None, n_jobs=-1)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: kNNClassifier

This is a class that represents the kNNClassifier model combined with the MAC/FAC strategy.

Parameters
  • n_neighbors (int (default 5)) – The n (or k) neighbors to consider for classification.

  • mac_neighbors (int (default None)) –

    Number of neighbors to consider for MAC stage. If None, n_neighbors are used for classification directly.

    If an int is given, the classification proceeds in two stages:

    MAC stage: a candidate set of size ‘mac_neighbors’ is retrieved using ‘mac_metric’.

    FAC stage: n_neighbors from the candidate set are used for classification using DTW.

  • weights (str (default "uniform")) – The weighting scheme of the distances. Options: “uniform” or “distance”

  • mac_metric (str (default "euclidean")) – The distance metric to be employed for MAC stage. Check tslearn.neighbors.KNeighborsTimeSeriesClassifier for allowed metrics.

  • metric_params (dict() (default None)) –

    The parameters of the DTW for FAC stage.

    Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

    See tslearn.metrics for more details.

  • n_jobs (int (default -1)) – The number of CPU threads to use. -1 to use all the available threads.

Returns

object – kNNClassifier class with the parameters supplied.

Return type

self

See also

tslearn.neighbors.KNeighborsTimeSeriesClassifier

The underlying k-NN module for time-series data.

tslearn.metrics.dtw

The underlying dtw function.

Examples

>>> from mlots.models import kNNClassifier
>>> model = kNNClassifier(n_neighbors=5)
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.7814569536423841
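
A fuller hedged sketch of the MAC/FAC configuration for kNNClassifier, assuming X_train/y_train/X_test are NumPy arrays of equal-length time series; parameter values are illustrative.

>>> from mlots.models import kNNClassifier
>>> # MAC stage: 30 euclidean neighbors via the underlying
>>> # tslearn KNeighborsTimeSeriesClassifier; FAC stage: the best 5
>>> # of those under constrained DTW vote on the label.
>>> model = kNNClassifier(n_neighbors=5, mac_neighbors=30,
...                       weights="distance", mac_metric="euclidean",
...                       metric_params={"global_constraint": "sakoe_chiba",
...                                      "sakoe_chiba_radius": 1},
...                       n_jobs=-1)
>>> model.fit(X_train, y_train)
>>> y_hat = model.predict(X_test)
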
fit(X_train, y_train)

This is the fit function for kNNClassifier model.

Parameters
  • X_train (ndarray) – The train data to be fitted.

  • y_train (array) – The true labels of X_train data.

Returns

object – kNNClassifier class with train data fitted.

Return type

self

predict(X_test)

This is the predict function for kNNClassifier model.

Parameters

X_test (ndarray) – The test data for the prediction.

Returns

y_hat – The predicted labels of the test samples.

Return type

array

class mlots.models.kNNClassifier_CustomDist(n_neighbors=5, mac_neighbors=None, weights='uniform', mac_metric='lb_keogh', metric_params=None, n_jobs=-1)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

NAME: kNNClassifier_CustomDist

This is a class that represents the kNNClassifier_CustomDist model combined with the MAC/FAC strategy.

Parameters
  • n_neighbors (int (default 5)) – The n (or k) neighbors to consider for classification.

  • mac_neighbors (int (default None)) –

    Number of neighbors to consider for MAC stage. If None, n_neighbors are used for classification directly.

    If an int is given, the classification proceeds in two stages:

    MAC stage: a candidate set of size ‘mac_neighbors’ is retrieved using ‘mac_metric’.

    FAC stage: n_neighbors from the candidate set are used for classification using DTW.

  • weights (str (default "uniform")) – The weighting scheme of the distances. Options: “uniform” or “distance”

  • mac_metric (str (default "lb_keogh")) –

    The distance metric to be employed for MAC stage.

    Options:

    "lb_keogh",

    any allowed distance measures for sklearn.neighbors.KNeighborsClassifier,

    or, a callable distance function.

    If mac_metric = “lb_keogh”, provide the “radius” parameter for it in metric_params.

  • metric_params (dict() (default None)) –

    The parameters of the DTW for FAC stage.

    Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}

    Check tslearn.neighbors.KNeighborsTimeSeriesClassifier model for allowed metrics.

  • n_jobs (int (default -1)) – The number of CPU threads to use. -1 to use all the available threads.

Returns

object – kNNClassifier_CustomDist class with the parameters supplied.

Return type

self

See also

sklearn.neighbors.KNeighborsClassifier

The underlying k-NN module for MAC stage with custom distance measure.

tslearn.neighbors.KNeighborsTimeSeriesClassifier

The underlying k-NN module for FAC stage with dtw.

tslearn.metrics.dtw

The underlying dtw function.

Examples

>>> from mlots.models import kNNClassifier_CustomDist
>>> model = kNNClassifier_CustomDist(mac_metric="lb_keogh", mac_neighbors=20, metric_params={"radius": 23})
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.7748344370860927
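
A hedged sketch of passing a callable distance function as mac_metric, which the options above allow. The callable signature (two 1-D arrays in, one float out) follows the convention of the underlying sklearn.neighbors.KNeighborsClassifier; the manhattan helper below is purely illustrative.

>>> import numpy as np
>>> from mlots.models import kNNClassifier_CustomDist
>>> def manhattan(a, b):
...     return float(np.sum(np.abs(a - b)))
>>> model = kNNClassifier_CustomDist(n_neighbors=5, mac_neighbors=20,
...                                  mac_metric=manhattan,
...                                  metric_params={"global_constraint": "sakoe_chiba",
...                                                 "sakoe_chiba_radius": 1})
>>> model.fit(X_train, y_train)
>>> y_hat = model.predict(X_test)
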
fit(X_train, y_train)

This is the fit function for kNNClassifier_CustomDist model.

Parameters
  • X_train (ndarray) – The train data to be fitted.

  • y_train (array) – The true labels of X_train data.

Returns

object – kNNClassifier_CustomDist class with train data fitted.

Return type

self

predict(X_test)

This is the predict function for kNNClassifier_CustomDist model.

Parameters

X_test (ndarray) – The test data for the prediction.

Returns

y_hat – The predicted labels of the test samples.

Return type

array