mlots.models package
Module contents
class mlots.models.AnnoyClassifier(n_neighbors=5, mac_neighbors=None, metric='euclidean', metric_params=None, n_trees=-1, n_jobs=-1, random_seed=1992)
Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin
NAME: AnnoyClassifier
This class represents an Annoy model combined with the MAC/FAC strategy.
Parameters
n_neighbors (int (default 5)) – The n (or k) neighbors to consider for classification.
mac_neighbors (int (default None)) –
Number of neighbors to consider for the MAC stage. If None, n_neighbors are used for classification directly.
If int, the classification is done in two stages:
MAC stage: A candidate set of size ‘mac_neighbors’ is returned using ‘metric’.
FAC stage: n_neighbors from the candidate set are used for classification using DTW.
metric (str (default "euclidean")) – The distance metric to be employed for Annoy. See the annoy library for the allowed metrics.
metric_params (dict (default None)) –
The parameters of the DTW for the FAC stage.
Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}
See tslearn.metrics for more details.
n_trees (int (default -1)) – The number of random projection trees (RPTrees) to build for Annoy. If n_trees=-1, it creates as many RPTrees as possible.
n_jobs (int (default -1)) – The number of CPU threads used to build the Annoy index. Use -1 to use all available threads.
random_seed (int (default 1992)) – The seed for the random number generator.
Returns
self – AnnoyClassifier instance with the parameters supplied.
Return type
object
See also
annoy.AnnoyIndex
The underlying annoy module.
tslearn.metrics.dtw
The underlying dtw function.
Examples
>>> from mlots.models import AnnoyClassifier
>>> model = AnnoyClassifier(n_neighbors=9, random_seed=42)
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.7880794701986755
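The two-stage MAC/FAC strategy described above can be sketched in pure Python. This is an illustrative sketch, not the mlots implementation: the helper names (euclidean, dtw, mac_fac_predict) are hypothetical, the MAC stage here is a brute-force Euclidean scan where AnnoyClassifier would query its Annoy index, and the DTW assumes equal-length series and follows the tslearn convention of returning the square root of the summed squared differences along the warping path, restricted to a Sakoe-Chiba band.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Cheap MAC-stage distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b, radius=None):
    """DTW with an optional Sakoe-Chiba band (assumes equal-length series)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Cells outside the band are never reachable.
            if radius is not None and abs(i - j) > radius:
                continue
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def mac_fac_predict(query, X_train, y_train, n_neighbors=1, mac_neighbors=None, radius=None):
    """Hypothetical MAC/FAC classifier for a single query series."""
    # MAC stage: rank all training series by the cheap distance.
    order = sorted(range(len(X_train)), key=lambda i: euclidean(query, X_train[i]))
    if mac_neighbors is None:
        # Single stage: classify directly from the cheap-distance neighbors.
        top = order[:n_neighbors]
    else:
        # FAC stage: re-rank only the candidate set with banded DTW.
        candidates = order[:mac_neighbors]
        top = sorted(candidates, key=lambda i: dtw(query, X_train[i], radius))[:n_neighbors]
    # Majority vote among the selected neighbors.
    return Counter(y_train[i] for i in top).most_common(1)[0][0]


X_train = [[0, 0, 0, 1, 0], [0.4, 0.4, 0.4, 0.4, 0.4]]
y_train = ["late", "flat"]
query = [0, 1, 0, 0, 0]
print(mac_fac_predict(query, X_train, y_train))  # flat
print(mac_fac_predict(query, X_train, y_train, mac_neighbors=2, radius=2))  # late
```

In the toy data, Euclidean distance alone prefers the flat series, while the DTW refinement recognizes that the query is the same spike shifted in time.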
fit(X_train, y_train)
This is the fit function for the AnnoyClassifier model.
Parameters
X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.
Returns
self – AnnoyClassifier instance with the train data fitted.
Return type
object
predict(X_test)
This is the predict function for the AnnoyClassifier model.
Parameters
X_test (ndarray) – The test data for the prediction.
Returns
y_hat – The predicted labels of the test samples.
Return type
array
class mlots.models.HNSWClassifier(n_neighbors=1, mac_neighbors=None, space='l2', max_elements=10, M=5, ef_construction=100, ef_Search=50, metric_params=None, random_seed=1992, n_jobs=-1)
Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin
NAME: HNSWClassifier
This class represents an HNSW model from hnswlib combined with the MAC/FAC strategy.
Parameters
n_neighbors (int (default 1)) – The n (or k) neighbors to consider for classification.
mac_neighbors (int (default None)) –
Number of neighbors to consider for the MAC stage. If None, n_neighbors are used for classification directly.
If int, the classification is done in two stages:
MAC stage: A candidate set of size ‘mac_neighbors’ is returned using HNSW with the supplied ‘space’.
FAC stage: n_neighbors from the candidate set are used for classification using DTW.
space (str (default "l2")) – The distance space to be employed for HNSW. See the hnswlib library for the allowed spaces.
max_elements (int (default 10)) – The maximum number of elements that can be stored in the structure.
M (int (default 5)) – The maximum number of outgoing connections in the graph.
ef_construction (int (default 100)) – Controls the tradeoff between construction time and accuracy. A larger ef_construction leads to longer construction but better index quality.
ef_Search (int (default 50)) – The size of the dynamic list of nearest neighbors used during HNSW search. A higher ef leads to more accurate but slower search. The value of ef can be anything between k and the size of the dataset, where k = n_neighbors if mac_neighbors is None, and k = mac_neighbors if mac_neighbors is an int.
metric_params (dict (default None)) –
The parameters of the DTW for the FAC stage.
Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}
See tslearn.metrics for more details.
n_jobs (int (default -1)) – The number of CPU threads to use. Use -1 to use all available threads.
random_seed (int (default 1992)) – The seed for the random number generator.
Returns
self – HNSWClassifier instance with the parameters supplied.
Return type
object
See also
hnswlib.Index
The underlying hnsw module.
tslearn.metrics.dtw
The underlying dtw function.
Examples
>>> from mlots.models import HNSWClassifier
>>> model = HNSWClassifier(n_neighbors=5, mac_neighbors=30, metric_params={"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 23})
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.8344370860927153
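The ef_Search description implies a simple validity rule: k is n_neighbors in single-stage mode and mac_neighbors in two-stage mode, and ef should lie between k and the dataset size. The hypothetical helpers below (not part of mlots or hnswlib) only spell out that rule; they do not reproduce how HNSWClassifier handles an out-of-range value.

```python
def search_k(n_neighbors, mac_neighbors=None):
    """Number of neighbors requested from the HNSW index (hypothetical helper)."""
    # Single stage: the index returns the final n_neighbors directly.
    # Two stages: the index returns the MAC candidate set instead.
    return n_neighbors if mac_neighbors is None else mac_neighbors


def check_ef(ef, n_neighbors, mac_neighbors, n_samples):
    """Return k, or raise ValueError unless k <= ef <= n_samples."""
    k = search_k(n_neighbors, mac_neighbors)
    if not k <= ef <= n_samples:
        raise ValueError(f"ef={ef} must lie between k={k} and n_samples={n_samples}")
    return k


# Settings from the example above, with a hypothetical training set of 100 series.
print(check_ef(50, n_neighbors=5, mac_neighbors=30, n_samples=100))  # 30
```

For the example settings (n_neighbors=5, mac_neighbors=30), k is 30, so the default ef_Search=50 satisfies the rule as long as the training set holds at least 50 series.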
fit(X_train=None, y_train=None)
This is the fit function for the HNSWClassifier model.
Parameters
X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.
Returns
self – HNSWClassifier instance with the train data fitted.
Return type
object
predict(X_test)
This is the predict function for the HNSWClassifier model.
Parameters
X_test (ndarray) – The test data for the prediction.
Returns
y_hat – The predicted labels of the test samples.
Return type
array
class mlots.models.NSWClassifier(f: int = 1, m: int = 1, k: int = 1, metric: str = 'euclidean', metric_params=None, random_seed: int = 1992)
Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin
NAME: Navigable Small Worlds
This class represents a Navigable Small World (NSW) model.
Parameters
f (int (default 1)) – The maximum number of friends (connected nodes) a node can have.
m (int (default 1)) – The number of search iterations to run in the network.
k (int (default 1)) – The number of neighbors to consider for classification.
metric (str (default "euclidean")) – The distance metric/measure to be employed. One of "euclidean", "dtw" or "lb_keogh".
metric_params (dict (default None)) –
The parameters of the metric being employed. For example, for metric="dtw", metric_params can be:
{"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}
See tslearn.metrics for more details.
random_seed (int (default 1992)) – The seed for the random number generator.
corpus
Stores all the nodes in the network. The keys are the node indices and the values are the node objects of the Node class.
Type
dict
Returns
self – NSWClassifier instance with the parameters supplied.
Return type
object
See also
sortedcollections.ValueSortedDict
The data structure that stores the connected neighbors of a node in the corpus.
tslearn.metrics
The underlying library for dtw and lb_keogh distance measures.
Examples
>>> from mlots.models import NSWClassifier
>>> nsw = NSWClassifier(f=1, k=5, m=9, metric="euclidean")
>>> nsw.fit(X_train, y_train)
>>> nsw.score(X_test, y_test)
0.7086092715231788
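An NSW graph is typically built by inserting nodes one at a time and linking each new node to its closest existing nodes, found with greedy searches; a query then runs m greedy searches from random entry points and pools the results. The sketch below follows that general scheme under simplifying assumptions: Euclidean distance only, a plain dict of adjacency sets instead of the Node class and ValueSortedDict used by mlots, no cap on the degree of older nodes, and hypothetical function names. It is not the mlots implementation.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_search(graph, data, query, entry):
    """Move to the closest friend while that improves the distance to query."""
    current, current_d = entry, dist(query, data[entry])
    while True:
        best, best_d = current, current_d
        for friend in graph[current]:
            d = dist(query, data[friend])
            if d < best_d:
                best, best_d = friend, d
        if best == current:  # local minimum reached
            return current
        current, current_d = best, best_d


def multi_search(graph, data, query, m, k, rng):
    """Run m greedy searches from random entry points; return the k closest pooled nodes."""
    pooled = set()
    for _ in range(m):
        entry = rng.choice(sorted(graph))
        local_min = greedy_search(graph, data, query, entry)
        pooled.add(local_min)
        pooled.update(graph[local_min])
    return sorted(pooled, key=lambda i: dist(query, data[i]))[:k]


def build_nsw(data, f, m, seed=1992):
    """Insert nodes one by one, linking each new node to up to f close existing nodes."""
    rng = random.Random(seed)
    graph = {}
    for idx, point in enumerate(data):
        friends = multi_search(graph, data, point, m, f, rng) if graph else []
        graph[idx] = set(friends)
        for friend in friends:
            graph[friend].add(idx)  # links are undirected
    return graph


def nsw_predict(graph, data, labels, query, m, k, seed=1992):
    """Majority vote over the k neighbors found by m greedy searches."""
    rng = random.Random(seed)
    nns = multi_search(graph, data, query, m, k, rng)
    return Counter(labels[i] for i in nns).most_common(1)[0][0]


data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
labels = ["a", "a", "a", "b", "b", "b"]
graph = build_nsw(data, f=2, m=3)
print(nsw_predict(graph, data, labels, [0.05], m=3, k=3))  # a
print(nsw_predict(graph, data, labels, [5.15], m=3, k=3))  # b
```

Restarting from several random entry points (the role m plays here) reduces the chance that a single greedy walk stops at a poor local minimum.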
fit(X_train, y_train, dist_mat=None)
This is the fit function for the NSW model.
Parameters
X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.
dist_mat (ndarray (default None)) – [Optional] Pre-computed distance matrix for X_train vs X_train.
Returns
self – NSWClassifier instance with the train data fitted.
Return type
object
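A precomputed X_train vs X_train matrix for the optional dist_mat argument can be built as sketched below. The helper name pairwise_matrix is hypothetical, and the layout (entry [i][j] holds the distance between X_train[i] and X_train[j]) is our assumption, since the docstring does not spell it out. Precomputing presumably pays off for expensive measures such as DTW when the same training set is fitted more than once.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_matrix(A, B, metric=euclidean):
    """Return a len(A) x len(B) matrix whose entry [i][j] is metric(A[i], B[j])."""
    return [[metric(a, b) for b in B] for a in A]


X_train = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
# X_train vs X_train: square, symmetric, with a zero diagonal.
train_dist = pairwise_matrix(X_train, X_train)
print(train_dist)  # [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
```

Since the docstring asks for an ndarray, the nested list would be converted with numpy.asarray before being passed as dist_mat.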
kneighbors(X_test=None, dist_mat=None, return_prediction=False)
This is the kneighbors function for the NSW model. It fetches the k nearest neighbors of the test samples.
Parameters
X_test (ndarray) – The test data for the prediction.
dist_mat (ndarray (default None)) – [Optional] Pre-computed distance matrix for X_test vs X_train.
return_prediction (bool (default False)) – If True, the function returns both the kneighbors and the predictions (nns and y_hat).
Returns
nns (ndarray) – The kneighbors of the test samples.
y_hat (array) – The predicted labels of the test samples (only if return_prediction=True).
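To illustrate the shapes involved with a precomputed X_test vs X_train matrix, the brute-force sketch below reads the k nearest neighbors and a majority-vote prediction straight off the matrix. It assumes rows index test samples and columns index training samples (our assumption), and the function name is hypothetical. NSWClassifier itself searches its graph rather than scanning every column, so its neighbors can differ from an exhaustive scan.

```python
from collections import Counter


def kneighbors_from_matrix(dist_mat, y_train, k, return_prediction=False):
    """Return neighbor indices per test row, plus majority-vote labels if requested."""
    nns = []
    for row in dist_mat:
        # Indices of the k smallest distances in this test row.
        nns.append(sorted(range(len(row)), key=row.__getitem__)[:k])
    if not return_prediction:
        return nns
    y_hat = [Counter(y_train[j] for j in idx).most_common(1)[0][0] for idx in nns]
    return nns, y_hat


# Two test samples (rows) against three training samples (columns).
test_dist = [[0.5, 2.0, 0.7],
             [3.0, 0.1, 0.2]]
y_train = ["a", "b", "b"]
print(kneighbors_from_matrix(test_dist, y_train, k=1, return_prediction=True))
# ([[0], [1]], ['a', 'b'])
```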
predict(X_test, dist_mat=None)
This is the predict function for the NSW model.
Parameters
X_test (ndarray) – The test data for the prediction.
dist_mat (ndarray (default None)) – [Optional] Pre-computed distance matrix for X_test vs X_train.
Returns
y_hat – The predicted labels of the test samples.
Return type
array
class mlots.models.RidgeClassifier(alpha=1.0, *, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, class_weight=None, solver='auto', random_state=None)
Bases: sklearn.linear_model._base.LinearClassifierMixin, sklearn.linear_model._ridge._BaseRidge
Classifier using Ridge regression.
This classifier first converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).
Read more in the User Guide.
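The target conversion described above can be illustrated with a small pure-Python sketch. The helper encode_targets is hypothetical and only mirrors the description: one {-1, 1} column in the binary case and one column per class in the multiclass case, which matches the coef_ shapes listed below.

```python
def encode_targets(y):
    """Map labels to {-1, 1} regression targets (hypothetical helper)."""
    classes = sorted(set(y))
    if len(classes) == 2:
        # Binary case: a single regression target, +1 for the second class.
        return classes, [[1 if label == classes[1] else -1] for label in y]
    # Multiclass case: one regression target per class.
    return classes, [[1 if label == c else -1 for c in classes] for label in y]


print(encode_targets(["cat", "dog", "cat"]))
# (['cat', 'dog'], [[-1], [1], [-1]])
print(encode_targets([0, 1, 2]))
# ([0, 1, 2], [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
```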
Parameters
alpha (float, default=1.0) – Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or LinearSVC.
fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize (bool, default=False) – This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.
copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.
max_iter (int, default=None) – Maximum number of iterations for the conjugate gradient solver. The default value is determined by scipy.sparse.linalg.
tol (float, default=1e-3) – Precision of the solution.
class_weight (dict or 'balanced', default=None) –
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
solver ({'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga'}, default='auto') –
Solver to use in the computational routines:
'auto' chooses the solver automatically based on the type of data.
'svd' uses a Singular Value Decomposition of X to compute the Ridge coefficients. More stable for singular matrices than 'cholesky'.
'cholesky' uses the standard scipy.linalg.solve function to obtain a closed-form solution.
'sparse_cg' uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than 'cholesky' for large-scale data (possibility to set tol and max_iter).
'lsqr' uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
'sag' uses a Stochastic Average Gradient descent, and 'saga' uses its unbiased and more flexible version named SAGA. Both methods use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
New in version 0.17: Stochastic Average Gradient descent solver.
New in version 0.19: SAGA solver.
random_state (int, RandomState instance, default=None) – Used when solver == 'sag' or 'saga' to shuffle the data. See Glossary for details.
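The "balanced" formula can be checked by hand. The sketch below computes n_samples / (n_classes * count) per class with collections.Counter instead of np.bincount, a substitution of ours so that it also works for non-integer labels; balanced_weights is a hypothetical name.

```python
from collections import Counter


def balanced_weights(y):
    """Per-class weights n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# 4 samples, 2 classes: class 0 -> 4 / (2 * 3), class 1 -> 4 / (2 * 1)
print(balanced_weights([0, 0, 0, 1]))  # {0: 0.6666666666666666, 1: 2.0}
```

With these weights each class contributes the same total weight (here 3 * 2/3 = 1 * 2 = 2), which is what counteracts the class imbalance.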
coef_
Coefficient of the features in the decision function.
coef_ is of shape (1, n_features) when the given problem is binary.
Type
ndarray of shape (1, n_features) or (n_classes, n_features)
intercept_
Independent term in decision function. Set to 0.0 if fit_intercept = False.
Type
float or ndarray of shape (n_targets,)
n_iter_
Actual number of iterations for each target. Available only for sag and lsqr solvers. Other solvers will return None.
Type
None or ndarray of shape (n_targets,)
classes_
The class labels.
Type
ndarray of shape (n_classes,)
See also
Ridge
Ridge regression.
RidgeClassifierCV
Ridge classifier with built-in cross validation.
Notes
For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
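With one-versus-all training, each class gets its own linear score, and the predicted label is the class with the largest score; in the binary case there is a single score, and, to our understanding of scikit-learn's linear classifiers, a positive score selects the second class. The hypothetical sketch below applies that decision rule to hand-written coef_ and intercept_ values; it does not fit anything.

```python
def decision_function(x, coef, intercept):
    """One score per row of coef: w . x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(coef, intercept)]


def predict_label(x, coef, intercept, classes):
    scores = decision_function(x, coef, intercept)
    if len(scores) == 1:
        # Binary: positive score -> classes[1], otherwise classes[0].
        return classes[1] if scores[0] > 0 else classes[0]
    # Multiclass: pick the class whose one-versus-all score is highest.
    return classes[max(range(len(scores)), key=scores.__getitem__)]


coef = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
intercept = [0.0, 0.0, 0.5]
print(predict_label([2.0, 1.0], coef, intercept, ["a", "b", "c"]))  # a
print(predict_label([-1.0, -1.0], coef, intercept, ["a", "b", "c"]))  # c
```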
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifier
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifier().fit(X, y)
>>> clf.score(X, y)
0.9595...
property classes_
fit(X, y, sample_weight=None)
Fit Ridge classifier model.
Parameters
X ({ndarray, sparse matrix} of shape (n_samples, n_features)) – Training data.
y (ndarray of shape (n_samples,)) – Target values.
sample_weight (float or ndarray of shape (n_samples,), default=None) –
Individual weights for each sample. If given a float, every sample will have the same weight.
New in version 0.17: sample_weight support to Classifier.
Returns
self – Instance of the estimator.
Return type
object
class mlots.models.RidgeClassifierCV(alphas=(0.1, 1.0, 10.0), *, fit_intercept=True, normalize=False, scoring=None, cv=None, class_weight=None, store_cv_values=False)
Bases: sklearn.linear_model._base.LinearClassifierMixin, sklearn.linear_model._ridge._BaseRidgeCV
Ridge classifier with built-in cross-validation.
See glossary entry for cross-validation estimator.
By default, it performs Leave-One-Out Cross-Validation. Currently, only the n_features > n_samples case is handled efficiently.
Read more in the User Guide.
Parameters
alphas (ndarray of shape (n_alphas,), default=(0.1, 1.0, 10.0)) – Array of alpha values to try. Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or LinearSVC.
fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
normalize (bool, default=False) – This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.
scoring (string, callable, default=None) – A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).
cv (int, cross-validation generator or an iterable, default=None) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the efficient Leave-One-Out cross-validation
integer, to specify the number of folds.
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.
Refer to the User Guide for the various cross-validation strategies that can be used here.
class_weight (dict or 'balanced', default=None) –
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
store_cv_values (bool, default=False) – Flag indicating if the cross-validation values corresponding to each alpha should be stored in the cv_values_ attribute (see below). This flag is only compatible with cv=None (i.e. using Leave-One-Out Cross-Validation).
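Leave-One-Out selection of alpha can be spelled out by brute force for a tiny one-feature ridge regression without intercept, whose weight has the closed form w = sum(x*y) / (sum(x*x) + alpha). This is only a conceptual sketch with hypothetical helper names: RidgeClassifierCV uses an efficient Leave-One-Out computation rather than refitting n times, and it works on the {-1, 1}-encoded targets of a multi-feature problem. The loo_splits generator yields splits in the (train, test) index form that the cv parameter describes.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge weight for y ~ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def loo_splits(n):
    """Yield (train, test) index lists, leaving one sample out each time."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def loo_mse(xs, ys, alpha):
    """Mean squared error of the held-out predictions over all LOO splits."""
    errors = []
    for train, test in loo_splits(len(xs)):
        w = fit_ridge_1d([xs[j] for j in train], [ys[j] for j in train], alpha)
        i = test[0]
        errors.append((ys[i] - w * xs[i]) ** 2)
    return sum(errors) / len(errors)


def best_alpha(xs, ys, alphas=(0.1, 1.0, 10.0)):
    # Keep the alpha with the lowest mean squared LOO error.
    return min(alphas, key=lambda a: loo_mse(xs, ys, a))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 4.0]  # noiseless y = x, so less shrinkage wins
print(best_alpha(xs, ys))  # 0.1
```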
cv_values_
Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor). This attribute exists only when store_cv_values is True.
Type
ndarray of shape (n_samples, n_targets, n_alphas), optional
coef_
Coefficient of the features in the decision function.
coef_ is of shape (1, n_features) when the given problem is binary.
Type
ndarray of shape (1, n_features) or (n_targets, n_features)
intercept_
Independent term in decision function. Set to 0.0 if fit_intercept = False.
Type
float or ndarray of shape (n_targets,)
alpha_
Estimated regularization parameter.
Type
float
best_score_
Score of base estimator with best alpha.
New in version 0.23.
Type
float
classes_
The class labels.
Type
ndarray of shape (n_classes,)
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifierCV
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
>>> clf.score(X, y)
0.9630...
See also
Ridge
Ridge regression.
RidgeClassifier
Ridge classifier.
RidgeCV
Ridge regression with built-in cross validation.
Notes
For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
property classes_
fit(X, y, sample_weight=None)
Fit Ridge classifier with cv.
Parameters
X (ndarray of shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of features. When using GCV, will be cast to float64 if necessary.
y (ndarray of shape (n_samples,)) – Target values. Will be cast to X’s dtype if necessary.
sample_weight (float or ndarray of shape (n_samples,), default=None) – Individual weights for each sample. If given a float, every sample will have the same weight.
Returns
self
Return type
object
class mlots.models.kNNClassifier(n_neighbors=5, mac_neighbors=None, weights='uniform', mac_metric='euclidean', metric_params=None, n_jobs=-1)
Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin
NAME: kNNClassifier
This class represents a kNNClassifier model with the MAC/FAC strategy.
Parameters
n_neighbors (int (default 5)) – The n (or k) neighbors to consider for classification.
mac_neighbors (int (default None)) –
Number of neighbors to consider for the MAC stage. If None, n_neighbors are used for classification directly.
If int, the classification is done in two stages:
MAC stage: A candidate set of size ‘mac_neighbors’ is returned using ‘mac_metric’.
FAC stage: n_neighbors from the candidate set are used for classification using DTW.
weights (str (default "uniform")) – The weighting scheme of the distances. Options: "uniform" or "distance".
mac_metric (str (default "euclidean")) – The distance metric to be employed for the MAC stage. See tslearn.neighbors.KNeighborsTimeSeriesClassifier for the allowed metrics.
metric_params (dict (default None)) –
The parameters of the DTW for the FAC stage.
Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}
See tslearn.metrics for more details.
n_jobs (int (default -1)) – The number of CPU threads to use. Use -1 to use all available threads.
Returns
self – kNNClassifier instance with the parameters supplied.
Return type
object
See also
tslearn.neighbors.KNeighborsTimeSeriesClassifier
The underlying k-NN module for time-series data.
tslearn.metrics.dtw
The underlying dtw function.
Examples
>>> from mlots.models import kNNClassifier
>>> model = kNNClassifier(n_neighbors=5)
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.7814569536423841
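The weights option changes how the selected neighbors vote. The sketch below assumes the usual scikit-learn/tslearn meaning of "distance" (each neighbor votes with the inverse of its distance); the zero-distance rule, where exact matches win outright, mirrors scikit-learn's behavior and is an assumption for mlots. The function name weighted_vote is hypothetical.

```python
from collections import defaultdict


def weighted_vote(neighbor_labels, neighbor_dists, weights="uniform"):
    """Return the winning label among the selected neighbors."""
    scores = defaultdict(float)
    if weights == "uniform":
        for label in neighbor_labels:
            scores[label] += 1.0
    elif weights == "distance":
        if any(d == 0 for d in neighbor_dists):
            # Exact matches get weight 1, all other neighbors weight 0.
            for label, d in zip(neighbor_labels, neighbor_dists):
                scores[label] += 1.0 if d == 0 else 0.0
        else:
            for label, d in zip(neighbor_labels, neighbor_dists):
                scores[label] += 1.0 / d
    else:
        raise ValueError("weights must be 'uniform' or 'distance'")
    return max(scores, key=scores.get)


labels = ["a", "b", "b"]
dists = [0.1, 1.0, 1.0]
print(weighted_vote(labels, dists))  # b (2 votes vs 1)
print(weighted_vote(labels, dists, weights="distance"))  # a (10.0 vs 2.0)
```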
fit(X_train, y_train)
This is the fit function for the kNNClassifier model.
Parameters
X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.
Returns
self – kNNClassifier instance with the train data fitted.
Return type
object
predict(X_test)
This is the predict function for the kNNClassifier model.
Parameters
X_test (ndarray) – The test data for the prediction.
Returns
y_hat – The predicted labels of the test samples.
Return type
array
class mlots.models.kNNClassifier_CustomDist(n_neighbors=5, mac_neighbors=None, weights='uniform', mac_metric='lb_keogh', metric_params=None, n_jobs=-1)
Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin
NAME: kNNClassifier_CustomDist
This class represents a kNNClassifier_CustomDist model with the MAC/FAC strategy.
Parameters
n_neighbors (int (default 5)) – The n (or k) neighbors to consider for classification.
mac_neighbors (int (default None)) –
Number of neighbors to consider for the MAC stage. If None, n_neighbors are used for classification directly.
If int, the classification is done in two stages:
MAC stage: A candidate set of size ‘mac_neighbors’ is returned using ‘mac_metric’.
FAC stage: n_neighbors from the candidate set are used for classification using DTW.
weights (str (default "uniform")) – The weighting scheme of the distances. Options: "uniform" or "distance".
mac_metric (str or callable (default "lb_keogh")) –
The distance metric to be employed for the MAC stage.
Options:
"lb_keogh",
any distance measure allowed by sklearn.neighbors.KNeighborsClassifier,
or a callable distance function.
If mac_metric="lb_keogh", provide its "radius" parameter in metric_params.
metric_params (dict (default None)) –
The parameters of the DTW for the FAC stage.
Example: {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": 1}
See tslearn.neighbors.KNeighborsTimeSeriesClassifier for the allowed metrics.
n_jobs (int (default -1)) – The number of CPU threads to use. Use -1 to use all available threads.
Returns
self – kNNClassifier_CustomDist instance with the parameters supplied.
Return type
object
See also
sklearn.neighbors.KNeighborsClassifier
The underlying k-NN module for MAC stage with custom distance measure.
tslearn.neighbors.KNeighborsTimeSeriesClassifier
The underlying k-NN module for FAC stage with dtw.
tslearn.metrics.dtw
The underlying dtw function.
Examples
>>> from mlots.models import kNNClassifier_CustomDist
>>> model = kNNClassifier_CustomDist(mac_metric="lb_keogh", mac_neighbors=20, metric_params={"radius": 23})
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
0.7748344370860927
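LB_Keogh works as a cheap MAC-stage filter because it never exceeds DTW computed under a Sakoe-Chiba band of the same radius. The sketch below builds the upper and lower envelope of the candidate series and sums the squared amounts by which the query escapes it. It assumes equal-length series and the square-root convention used by tslearn; the function names are hypothetical and not part of mlots or tslearn.

```python
import math


def envelope(series, radius):
    """Lower and upper envelope over a sliding window of +/- radius."""
    n = len(series)
    windows = [series[max(0, i - radius):min(n, i + radius + 1)] for i in range(n)]
    return [min(w) for w in windows], [max(w) for w in windows]


def lb_keogh(query, candidate, radius):
    """Lower bound on banded DTW: distance from the query to the candidate's envelope."""
    lower, upper = envelope(candidate, radius)
    total = 0.0
    for q, lo, up in zip(query, lower, upper):
        if q > up:
            total += (q - up) ** 2
        elif q < lo:
            total += (q - lo) ** 2
    return math.sqrt(total)


def dtw_band(a, b, radius):
    """DTW restricted to a Sakoe-Chiba band (equal-length series)."""
    n, m = len(a), len(b)
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(i - j) <= radius:
                d = (a[i - 1] - b[j - 1]) ** 2
                cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


q = [0, 1, 0, 0, 0]
c = [0, 0, 0, 1, 0]
print(lb_keogh(q, c, radius=2), dtw_band(q, c, radius=2))  # 0.0 0.0
```

In a MAC stage, candidates would be ranked by this bound first, and only the best mac_neighbors of them would be passed on to full DTW.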
fit(X_train, y_train)
This is the fit function for the kNNClassifier_CustomDist model.
Parameters
X_train (ndarray) – The train data to be fitted.
y_train (array) – The true labels of X_train data.
Returns
self – kNNClassifier_CustomDist instance with the train data fitted.
Return type
object
predict(X_test)
This is the predict function for the kNNClassifier_CustomDist model.
Parameters
X_test (ndarray) – The test data for the prediction.
Returns
y_hat – The predicted labels of the test samples.
Return type
array