partycls package
partycls is a Python package for cluster analysis of systems of interacting particles. By grouping particles that share similar structural or dynamical properties, partycls enables rapid and unsupervised exploration of the system’s relevant features. It provides descriptors suitable for applications in condensed matter physics, such as structural analysis of disordered or partially ordered materials, and integrates the necessary tools of unsupervised learning into a streamlined workflow.
Subpackages
- partycls.core package
- partycls.descriptors package
- Submodules
- partycls.descriptors.averaged_bo module
LocallyAveragedBondOrientationalDescriptor
LocallyAveragedBondOrientationalDescriptor.trajectory
LocallyAveragedBondOrientationalDescriptor.active_filters
LocallyAveragedBondOrientationalDescriptor.dimension
LocallyAveragedBondOrientationalDescriptor.grid
LocallyAveragedBondOrientationalDescriptor.features
LocallyAveragedBondOrientationalDescriptor.groups
LocallyAveragedBondOrientationalDescriptor.verbose
LocallyAveragedBondOrientationalDescriptor.neighbors_boost
LocallyAveragedBondOrientationalDescriptor.max_num_neighbors
LocallyAveragedBondOrientationalDescriptor.name
LocallyAveragedBondOrientationalDescriptor.symbol
LocallyAveragedBondOrientationalDescriptor.compute()
LechnerDellagoDescriptor
- partycls.descriptors.ba module
BondAngleDescriptor
BondAngleDescriptor.trajectory
BondAngleDescriptor.active_filters
BondAngleDescriptor.dimension
BondAngleDescriptor.grid
BondAngleDescriptor.features
BondAngleDescriptor.groups
BondAngleDescriptor.verbose
BondAngleDescriptor.neighbors_boost
BondAngleDescriptor.max_num_neighbors
BondAngleDescriptor.name
BondAngleDescriptor.symbol
BondAngleDescriptor.dtheta
BondAngleDescriptor.compute()
BondAngleDescriptor.normalize()
- partycls.descriptors.bo module
BondOrientationalDescriptor
BondOrientationalDescriptor.trajectory
BondOrientationalDescriptor.active_filters
BondOrientationalDescriptor.dimension
BondOrientationalDescriptor.grid
BondOrientationalDescriptor.features
BondOrientationalDescriptor.groups
BondOrientationalDescriptor.verbose
BondOrientationalDescriptor.neighbors_boost
BondOrientationalDescriptor.max_num_neighbors
BondOrientationalDescriptor.name
BondOrientationalDescriptor.symbol
BondOrientationalDescriptor.orders
BondOrientationalDescriptor.compute()
SteinhardtDescriptor
- partycls.descriptors.compactness module
CompactnessDescriptor
CompactnessDescriptor.trajectory
CompactnessDescriptor.active_filters
CompactnessDescriptor.dimension
CompactnessDescriptor.features
CompactnessDescriptor.groups
CompactnessDescriptor.verbose
CompactnessDescriptor.neighbors_boost
CompactnessDescriptor.max_num_neighbors
CompactnessDescriptor.name
CompactnessDescriptor.symbol
CompactnessDescriptor.compute()
CompactnessDescriptor.tetrahedra()
TongTanakaDescriptor
- partycls.descriptors.coordination module
CoordinationDescriptor
CoordinationDescriptor.trajectory
CoordinationDescriptor.active_filters
CoordinationDescriptor.dimension
CoordinationDescriptor.grid
CoordinationDescriptor.features
CoordinationDescriptor.groups
CoordinationDescriptor.verbose
CoordinationDescriptor.neighbors_boost
CoordinationDescriptor.max_num_neighbors
CoordinationDescriptor.name
CoordinationDescriptor.symbol
CoordinationDescriptor.total
CoordinationDescriptor.partial
CoordinationDescriptor.compute()
- partycls.descriptors.descriptor module
StructuralDescriptor
StructuralDescriptor.trajectory
StructuralDescriptor.active_filters
StructuralDescriptor.dimension
StructuralDescriptor.grid
StructuralDescriptor.features
StructuralDescriptor.groups
StructuralDescriptor.verbose
StructuralDescriptor.neighbors_boost
StructuralDescriptor.max_num_neighbors
StructuralDescriptor.accept_nans
StructuralDescriptor.n_samples
StructuralDescriptor.n_features
StructuralDescriptor.average
StructuralDescriptor.add_filter()
StructuralDescriptor.clear_filters()
StructuralDescriptor.clear_all_filters()
StructuralDescriptor.group_size()
StructuralDescriptor.group_fraction()
StructuralDescriptor.get_group_property()
StructuralDescriptor.dump()
StructuralDescriptor.compute()
StructuralDescriptor.normalize()
StructuralDescriptor.discard_nans()
DummyDescriptor
- partycls.descriptors.dscribe module
- partycls.descriptors.radial module
RadialDescriptor
RadialDescriptor.trajectory
RadialDescriptor.active_filters
RadialDescriptor.dimension
RadialDescriptor.grid
RadialDescriptor.features
RadialDescriptor.groups
RadialDescriptor.verbose
RadialDescriptor.name
RadialDescriptor.symbol
RadialDescriptor.n_shells
RadialDescriptor.bounds
RadialDescriptor.dr
RadialDescriptor.compute()
RadialDescriptor.normalize()
- partycls.descriptors.radial_bo module
RadialBondOrientationalDescriptor
RadialBondOrientationalDescriptor.trajectory
RadialBondOrientationalDescriptor.active_filters
RadialBondOrientationalDescriptor.dimension
RadialBondOrientationalDescriptor.grid
RadialBondOrientationalDescriptor.features
RadialBondOrientationalDescriptor.groups
RadialBondOrientationalDescriptor.verbose
RadialBondOrientationalDescriptor.neighbors_boost
RadialBondOrientationalDescriptor.max_num_neighbors
RadialBondOrientationalDescriptor.name
RadialBondOrientationalDescriptor.symbol
RadialBondOrientationalDescriptor.orders
RadialBondOrientationalDescriptor.bounds
RadialBondOrientationalDescriptor.dr
RadialBondOrientationalDescriptor.distance_grid
RadialBondOrientationalDescriptor.compute()
BoattiniDescriptor
- partycls.descriptors.smoothed_ba module
SmoothedBondAngleDescriptor
SmoothedBondAngleDescriptor.trajectory
SmoothedBondAngleDescriptor.active_filters
SmoothedBondAngleDescriptor.dimension
SmoothedBondAngleDescriptor.grid
SmoothedBondAngleDescriptor.features
SmoothedBondAngleDescriptor.groups
SmoothedBondAngleDescriptor.verbose
SmoothedBondAngleDescriptor.neighbors_boost
SmoothedBondAngleDescriptor.max_num_neighbors
SmoothedBondAngleDescriptor.name
SmoothedBondAngleDescriptor.symbol
SmoothedBondAngleDescriptor.compute()
- partycls.descriptors.smoothed_bo module
SmoothedBondOrientationalDescriptor
SmoothedBondOrientationalDescriptor.trajectory
SmoothedBondOrientationalDescriptor.active_filters
SmoothedBondOrientationalDescriptor.dimension
SmoothedBondOrientationalDescriptor.grid
SmoothedBondOrientationalDescriptor.features
SmoothedBondOrientationalDescriptor.groups
SmoothedBondOrientationalDescriptor.verbose
SmoothedBondOrientationalDescriptor.neighbors_boost
SmoothedBondOrientationalDescriptor.max_num_neighbors
SmoothedBondOrientationalDescriptor.name
SmoothedBondOrientationalDescriptor.symbol
SmoothedBondOrientationalDescriptor.compute()
- partycls.descriptors.tetrahedrality module
TetrahedralDescriptor
TetrahedralDescriptor.trajectory
TetrahedralDescriptor.active_filters
TetrahedralDescriptor.dimension
TetrahedralDescriptor.features
TetrahedralDescriptor.groups
TetrahedralDescriptor.verbose
TetrahedralDescriptor.neighbors_boost
TetrahedralDescriptor.max_num_neighbors
TetrahedralDescriptor.name
TetrahedralDescriptor.symbol
TetrahedralDescriptor.compute()
Submodules
partycls.cell module
Simulation cell.
This class is inspired by the framework atooms authored by Daniele Coslovich.
- class partycls.cell.Cell(side, periodic=None)[source]
Bases:
object
Orthorhombic cell.
- side
List of lengths for the sides of the cell.
- Type
- periodic
Periodicity of the cell on each axis.
- Type
- Parameters
Example
>>> c = Cell([2.0, 2.0, 2.0], periodic=[True, True, True])
- property volume
Volume of the cell.
partycls.clustering module
Clustering algorithms.
- class partycls.clustering.Clustering(n_clusters=2, n_init=1, backend=None)[source]
Bases:
object
Base class for clustering methods.
If a scikit-learn compatible backend is available, it will be used within Strategy.
- Parameters
n_clusters (int, default: 2) – Requested number of clusters.
n_init (int, default: 1) – Number of times the clustering will be run with different seeds.
backend (scikit-learn compatible backend, default: None) – Backend used for the clustering method. If provided, it must be an object implementing a scikit-learn compatible interface, with a fit method and a labels_ attribute. Duck typing is assumed.
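The duck-typing requirement can be pictured with a toy backend. The class ThresholdBackend below is hypothetical and is provided neither by partycls nor by scikit-learn. It only demonstrates the minimal interface described above: a fit method that sets a labels_ attribute.

```python
class ThresholdBackend:
    """Toy clustering backend: splits samples on the mean of their first feature."""

    def __init__(self):
        self.labels_ = None

    def fit(self, X):
        # Threshold on the mean of the first feature (illustrative only).
        first = [row[0] for row in X]
        threshold = sum(first) / len(first)
        self.labels_ = [int(value > threshold) for value in first]
        return self


backend = ThresholdBackend().fit([[0.1], [0.2], [0.9], [1.1]])
print(backend.labels_)
```

Any object exposing these two members satisfies the interface stated for the backend parameter; whether a given backend is useful for clustering is a separate matter.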
- fit(X)[source]
Run a scikit-learn compatible clustering backend (if available) on X.
Subclasses implementing a specific clustering algorithm must override this method.
- Parameters
X (numpy.ndarray) – Dataset matrix for which to compute the clusters.
- Return type
None
- property fractions
numpy.ndarray with the fractions of particles in each cluster.
- property populations
numpy.ndarray with the number of particles in each cluster.
- centroids(X)[source]
Central feature vector of each cluster.
Each object in the dataset over which the clustering was performed is assigned a discrete label. This label represents the index of the nearest cluster center to which this object belongs. The centroid (i.e. the cluster center) is thus the average feature vector of all the objects in the cluster.
Cluster memberships of the objects are stored in the labels attribute. Coordinates of the centroids can then be calculated for an arbitrary dataset X, provided it has the same shape as the original dataset used for the clustering.
- Parameters
X (numpy.ndarray) – Array of features (dataset) for which to compute the centroids.
- Returns
C_k – Cluster centroids. C_k[n] contains the coordinates of the n-th cluster center.
- Return type
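As a dependency-free sketch of the definition above (the centroid is the average feature vector of a cluster's members), the hypothetical helper cluster_centroids below groups rows by label and averages them column by column. It is not the partycls implementation, which operates on numpy arrays.

```python
from collections import defaultdict


def cluster_centroids(X, labels):
    """Average feature vector of each cluster, ordered by cluster label."""
    groups = defaultdict(list)
    for row, label in zip(X, labels):
        groups[label].append(row)
    result = []
    for label in sorted(groups):
        members = groups[label]
        # Column-wise mean over all members of the cluster.
        result.append([sum(col) / len(members) for col in zip(*members)])
    return result


X = [[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [12.0, 4.0]]
labels = [0, 0, 1, 1]
C_k = cluster_centroids(X, labels)
print(C_k)
```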
- class partycls.clustering.KMeans(n_clusters=2, n_init=1)[source]
Bases:
Clustering
KMeans clustering.
This class relies on the class KMeans from the machine learning package scikit-learn. An instance of sklearn.cluster.KMeans is created when calling the fit method, and is then accessible through the backend attribute for later use. See scikit-learn’s documentation for more information on the original class.
- Parameters
- fit(X)[source]
Run the K-Means algorithm on X. The predicted labels are updated in the attribute labels of the current instance of KMeans.
- Parameters
X (numpy.ndarray) – Dataset matrix for which to compute the clusters.
- Return type
None
- class partycls.clustering.GaussianMixture(n_clusters=2, n_init=1)[source]
Bases:
Clustering
Gaussian Mixture.
This class relies on the class GaussianMixture from the machine learning package scikit-learn. An instance of sklearn.mixture.GaussianMixture is created when calling the fit method, and is then accessible through the backend attribute for later use. See scikit-learn’s documentation for more information on the original class.
- Parameters
- fit(X)[source]
Run the expectation-maximization algorithm on X using a mixture of Gaussians. The predicted labels are updated in the attribute labels of the current instance of GaussianMixture.
- Parameters
X (numpy.ndarray) – Dataset matrix for which to compute the clusters.
- Return type
None
- class partycls.clustering.CommunityInference(n_clusters=2, n_init=1)[source]
Bases:
Clustering
Community Inference is a hard clustering method based on information theory. See https://doi.org/10.1063/5.0004732 (Paret et al.) for more details.
- Parameters
- fit(X)[source]
Run the community inference algorithm on X, where X is an instance of StructuralDescriptor with a normalize method. Otherwise, X is converted to a dummy descriptor.
- Parameters
X (StructuralDescriptor) – Descriptor on which the community inference algorithm will be run.
- Return type
None
partycls.dim_reduction module
Dimensionality reduction techniques (linear and non-linear), to be performed on a dataset stored in a numpy array.
- class partycls.dim_reduction.PCA(n_components=None, *, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)[source]
Bases:
PCA
- symbol = 'pca'
- full_name = 'Principal Component Analysis (PCA)'
- reduce(X)[source]
Project the input features onto a reduced space using principal component analysis.
- Parameters
X (numpy.ndarray) – Features in the original space.
- Returns
Features in the reduced space.
- Return type
- class partycls.dim_reduction.TSNE(n_components=2, *, perplexity=30.0, early_exaggeration=12.0, learning_rate='warn', n_iter=1000, n_iter_without_progress=300, min_grad_norm=1e-07, metric='euclidean', metric_params=None, init='warn', verbose=0, random_state=None, method='barnes_hut', angle=0.5, n_jobs=None, square_distances='deprecated')[source]
Bases:
TSNE
- symbol = 'tsne'
- full_name = 't-distributed Stochastic Neighbor Embedding (t-SNE)'
- reduce(X)[source]
Project the input features onto a reduced space using t-distributed stochastic neighbor embedding.
- Parameters
X (numpy.ndarray) – Features in the original space.
- Returns
Features in the reduced space.
- Return type
- class partycls.dim_reduction.LocallyLinearEmbedding(*, n_neighbors=5, n_components=2, reg=0.001, eigen_solver='auto', tol=1e-06, max_iter=100, method='standard', hessian_tol=0.0001, modified_tol=1e-12, neighbors_algorithm='auto', random_state=None, n_jobs=None)[source]
Bases:
LocallyLinearEmbedding
- symbol = 'lle'
- full_name = 'Locally Linear Embedding (LLE)'
- reduce(X)[source]
Project the input features onto a reduced space using locally linear embedding.
- Parameters
X (numpy.ndarray) – Features in the original space.
- Returns
Features in the reduced space.
- Return type
- class partycls.dim_reduction.AutoEncoder(layers=(100, 2, 100), activation='relu', solver='adam', alpha=0.0001)[source]
Bases:
MLPRegressor
- symbol = 'ae'
- full_name = 'Neural-Network Auto-Encoder (AE)'
- property n_components
Number of nodes at the level of the bottleneck layer (i.e. dimension after reduction).
- reduce(X)[source]
Project the input features onto a reduced space using a neural network autoencoder. The dimension of the reduced space is the number of nodes in the bottleneck layer.
- Parameters
X (numpy.ndarray) – Features in the original space.
- Returns
Features in the reduced space.
- Return type
partycls.feature_scaling module
Feature scaling techniques, to be performed on a dataset stored in a numpy array.
- class partycls.feature_scaling.ZScore(*, copy=True, with_mean=True, with_std=True)[source]
Bases:
StandardScaler
- symbol = 'zscore'
- full_name = 'Z-Score'
- scale(X)[source]
Standardize features by removing the mean and scaling to unit variance.
- Parameters
X (numpy.ndarray) – Original features.
- Returns
Scaled features.
- Return type
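To make the standardization concrete, here is a minimal pure-Python sketch of a z-score transform. The function zscore is hypothetical; it assumes the population standard deviation is used and that constant columns are left unscaled.

```python
from statistics import fmean, pstdev


def zscore(X):
    """Standardize each column to zero mean and unit (population) variance."""
    columns = list(zip(*X))
    means = [fmean(col) for col in columns]
    # Population standard deviation; constant columns are left unscaled.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]


scaled = zscore([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
print(scaled)
```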
- class partycls.feature_scaling.MinMax(feature_range=(0, 1), *, copy=True, clip=False)[source]
Bases:
MinMaxScaler
- symbol = 'minmax'
- full_name = 'Min-Max'
- scale(X)[source]
Transform features by scaling each feature to a given range (default is \([0,1]\)).
- Parameters
X (numpy.ndarray) – Original features.
- Returns
Scaled features.
- Return type
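The range scaling can be sketched in a few lines of plain Python. The function minmax below is hypothetical and only mirrors the idea: each column is mapped linearly so that its minimum and maximum land on the bounds of feature_range.

```python
def minmax(X, feature_range=(0.0, 1.0)):
    """Map each column linearly onto feature_range."""
    lo, hi = feature_range
    columns = list(zip(*X))
    mins = [min(col) for col in columns]
    # Constant columns get a span of 1.0 to avoid dividing by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        [lo + (hi - lo) * (x - mn) / sp for x, mn, sp in zip(row, mins, spans)]
        for row in X
    ]


print(minmax([[1.0, -2.0], [3.0, 0.0], [5.0, 2.0]]))
```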
- class partycls.feature_scaling.MaxAbs(*, copy=True)[source]
Bases:
MaxAbsScaler
- symbol = 'maxabs'
- full_name = 'Max-Abs'
- scale(X)[source]
Scale each feature by its maximum absolute value.
- Parameters
X (numpy.ndarray) – Original features.
- Returns
Scaled features.
- Return type
- class partycls.feature_scaling.Robust(*, with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0), copy=True, unit_variance=False)[source]
Bases:
RobustScaler
- symbol = 'robust'
- full_name = 'Robust'
- scale(X)[source]
Scale features using statistics that are robust to outliers.
- Parameters
X (numpy.ndarray) – Original features.
- Returns
Scaled features.
- Return type
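A sketch of what "robust to outliers" means in practice: each column is centered on its median and divided by its interquartile range, matching the default quantile_range=(25.0, 75.0) in the signature above. The function robust_scale is hypothetical, and the choice of the "inclusive" quantile method is an assumption made for this illustration.

```python
from statistics import median, quantiles


def robust_scale(X):
    """Center each column on its median and scale by its interquartile range."""
    scaled_columns = []
    for col in zip(*X):
        q1, _, q3 = quantiles(col, n=4, method="inclusive")
        iqr = (q3 - q1) or 1.0  # leave constant columns unscaled
        center = median(col)
        scaled_columns.append([(x - center) / iqr for x in col])
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*scaled_columns)]


data = [[1.0], [2.0], [3.0], [4.0], [100.0]]
print(robust_scale(data))
```

The outlier at 100.0 does not affect how the other four values are scaled, which would not be the case with a z-score.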
partycls.helpers module
Various helper functions for visualization, cluster analysis, etc.
- partycls.helpers.AMI(labels_true, labels_pred, *, average_method='arithmetic')
Adjusted Mutual Information between two clusterings.
Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It accounts for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared. For two clusterings \(U\) and \(V\), the AMI is given as:
AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [avg(H(U), H(V)) - E(MI(U, V))]
This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.
This metric is furthermore symmetric: switching \(U\) (labels_true) with \(V\) (labels_pred) will return the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known.
Be mindful that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.
Read more in the User Guide.
- Parameters
labels_true (int array, shape = [n_samples]) – A clustering of the data into disjoint subsets, called \(U\) in the above formula.
labels_pred (int array-like of shape (n_samples,)) – A clustering of the data into disjoint subsets, called \(V\) in the above formula.
average_method (str, default='arithmetic') – How to compute the normalizer in the denominator. Possible options are ‘min’, ‘geometric’, ‘arithmetic’, and ‘max’.
New in version 0.20.
Changed in version 0.22: The default value of average_method changed from ‘max’ to ‘arithmetic’.
- Returns
ami – The AMI returns a value of 1 when the two partitions are identical (i.e. perfectly matched). Random partitions (independent labellings) have an expected AMI around 0 on average, and hence the score can be negative. The value is in adjusted nats (based on the natural logarithm).
- Return type
float (upper-limited by 1.0)
See also
adjusted_rand_score
Adjusted Rand Index.
mutual_info_score
Mutual Information (not adjusted for chance).
Examples
Perfect labelings are both homogeneous and complete, hence have score 1.0:
>>> from sklearn.metrics.cluster import adjusted_mutual_info_score
>>> adjusted_mutual_info_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> adjusted_mutual_info_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0
If class members are completely split across different clusters, the assignment is totally incomplete, hence the AMI is null:
>>> adjusted_mutual_info_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0
References
- partycls.helpers.ARI(labels_true, labels_pred)
Rand index adjusted for chance.
The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings.
The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
The adjusted Rand index is thus ensured to have a value close to 0.0 for random labeling independently of the number of clusters and samples and exactly 1.0 when the clusterings are identical (up to a permutation).
ARI is a symmetric measure:
adjusted_rand_score(a, b) == adjusted_rand_score(b, a)
Read more in the User Guide.
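The adjustment scheme above can be sketched with the common pair-counting form of the index, computed from the contingency table of the two labelings. The function adjusted_rand_index is a hypothetical, dependency-free illustration and not the scikit-learn implementation that partycls exposes as ARI.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from pair counts over the contingency table of two labelings."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))
    # Number of sample pairs grouped together in both labelings, in the
    # first labeling only, and in the second labeling only.
    same_both = sum(comb(c, 2) for c in pairs.values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / comb(n, 2)
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        # Degenerate case: both labelings are a single cluster,
        # or both consist of singletons only.
        return 1.0
    return (same_both - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))
```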
- Parameters
labels_true (int array, shape = [n_samples]) – Ground truth class labels to be used as a reference
labels_pred (array-like of shape (n_samples,)) – Cluster labels to evaluate
- Returns
ARI – Similarity score between -1.0 and 1.0. Random labelings have an ARI close to 0.0. 1.0 stands for perfect match.
- Return type
Examples
Perfectly matching labelings have a score of 1, even when the label values are permuted:
>>> from sklearn.metrics.cluster import adjusted_rand_score
>>> adjusted_rand_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0
Labelings that assign all class members to the same clusters are complete but may not always be pure, hence penalized:
>>> adjusted_rand_score([0, 0, 1, 2], [0, 0, 1, 1])
0.57...
ARI is symmetric, so labelings that have pure clusters with members coming from the same classes but unnecessary splits are penalized:
>>> adjusted_rand_score([0, 0, 1, 1], [0, 0, 1, 2])
0.57...
If class members are completely split across different clusters, the assignment is totally incomplete, hence the ARI is very low:
>>> adjusted_rand_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0
References
- Hubert1985
L. Hubert and P. Arabie, Comparing Partitions, Journal of Classification 1985 https://link.springer.com/article/10.1007%2FBF01908075
- Steinley2004
D. Steinley, Properties of the Hubert-Arabie adjusted Rand index, Psychological Methods 2004
- wk
https://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index
See also
adjusted_mutual_info_score
Adjusted Mutual Information.
- partycls.helpers.silhouette_samples(X, labels, *, metric='euclidean', **kwds)[source]
Compute the Silhouette Coefficient for each sample.
The Silhouette Coefficient is a measure of how well samples are clustered with samples that are similar to themselves. Clustering models with a high Silhouette Coefficient are said to be dense, where samples in the same cluster are similar to each other, and well separated, where samples in different clusters are not very similar to each other.
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). Note that the Silhouette Coefficient is only defined if the number of labels satisfies 2 <= n_labels <= n_samples - 1.
This function returns the Silhouette Coefficient for each sample.
The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters.
Read more in the User Guide.
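The per-sample definition (b - a) / max(a, b) can be written out directly for a small dataset. The function silhouette_sketch below is a hypothetical pure-Python illustration with a Euclidean metric. It assumes at least two clusters and assigns 0.0 to samples in single-member clusters, which is a convention chosen for this sketch.

```python
from math import dist


def silhouette_sketch(X, labels):
    """Silhouette Coefficient (b - a) / max(a, b) of each sample."""
    scores = []
    for i, (x, own) in enumerate(zip(X, labels)):
        # Distances from sample i to every other sample, grouped by cluster.
        by_cluster = {}
        for j, (y, other) in enumerate(zip(X, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(dist(x, y))
        if own not in by_cluster:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(by_cluster[own]) / len(by_cluster[own])
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != own)
        scores.append((b - a) / max(a, b))
    return scores


s = silhouette_sketch([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
print(s)
```

For these two well-separated groups, every coefficient is close to 1, in line with the interpretation given above.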
- Parameters
X (array-like of shape (n_samples_a, n_samples_a) if metric == "precomputed" or (n_samples_a, n_features) otherwise) – An array of pairwise distances between samples, or a feature array.
labels (array-like of shape (n_samples,)) – Label values for each sample.
metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances(). If X is the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.
**kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.
- Returns
silhouette – Silhouette Coefficients for each sample.
- Return type
array-like of shape (n_samples,)
References
- partycls.helpers.silhouette_score(X, labels, *, metric='euclidean', sample_size=None, random_state=None, **kwds)[source]
Compute the mean Silhouette Coefficient of all samples.
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the distance between a sample and the nearest cluster that the sample is not a part of. Note that the Silhouette Coefficient is only defined if the number of labels satisfies 2 <= n_labels <= n_samples - 1.
This function returns the mean Silhouette Coefficient over all samples. To obtain the values for each sample, use silhouette_samples().
The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar.
Read more in the User Guide.
- Parameters
X (array-like of shape (n_samples_a, n_samples_a) if metric == "precomputed" or (n_samples_a, n_features) otherwise) – An array of pairwise distances between samples, or a feature array.
labels (array-like of shape (n_samples,)) – Predicted labels for each sample.
metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise.pairwise_distances. If X is the distance array itself, use metric="precomputed".
sample_size (int, default=None) – The size of the sample to use when computing the Silhouette Coefficient on a random subset of the data. If sample_size is None, no sampling is used.
random_state (int, RandomState instance or None, default=None) – Determines random number generation for selecting a subset of samples. Used when sample_size is not None. Pass an int for reproducible results across multiple function calls. See Glossary.
**kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.
- Returns
silhouette – Mean Silhouette Coefficient for all samples.
- Return type
References
- partycls.helpers.show_matplotlib(system, color, view='top', palette=None, cmap='viridis', outfile=None, linewidth=0.5, alpha=1.0, show=False)[source]
Make a snapshot of the system using matplotlib. The figure is returned for further customization or visualization in Jupyter notebooks.
- Parameters
system (System) – The system to visualize.
color (str) – Particle property to use for color coding, e.g. "species", "label".
view (str, default: "top") – View type, i.e. face of the box to show. Only works for a 3D system.
palette (list, default: None) – List of colors when coloring particles according to a discrete property, such as "species" or "label". A default palette will be used if not specified.
cmap (str, default: "viridis") – Name of a matplotlib colormap to use when coloring particles according to a continuous property such as "velocity" or "energy". The list of available colormaps can be found in matplotlib.cm.cmaps_listed.
outfile (str, default: None) – Output filename to save the snapshot. Default is to not save.
linewidth (float, default: 0.5) – Line width.
alpha (float, default: 1.0) – Transparency parameter.
show (bool, default: False) – Show the snapshot when calling the function.
- Returns
fig – Figure of the snapshot.
- Return type
- partycls.helpers.show_ovito(system, color, view='top', palette=None, cmap='viridis', outfile=None, size=(640, 480), zoom=True)[source]
Make a snapshot of the system using Ovito. The image is returned for further customization or visualization in Jupyter notebooks.
- Parameters
system (System) – The system to visualize.
color (str) – Particle property to use for color coding, e.g. "species", "label".
view (str, default: "top") – View type, i.e. face of the box to show. Only works for a 3D system.
palette (list, default: None) – List of colors when coloring particles according to a discrete property, such as "species" or "label". Colors must be expressed in RGB format through tuples (e.g. palette=[(0,0,1), (1,0,0)]). A default palette will be used if not specified.
cmap (str, default: "viridis") – Name of a matplotlib colormap to use when coloring particles according to a continuous property such as "velocity" or "energy". The list of available colormaps can be found in matplotlib.cm.cmaps_listed.
outfile (str, default: None) – Output filename to save the snapshot. The default is to not save.
size (tuple, default: (640, 480)) – Size of the image to render.
zoom (bool, default: True) – Zoom on the simulation box.
- Returns
Rendered image.
- Return type
Image
- partycls.helpers.show_3dmol(system, color, palette=None)[source]
Visualize the system using 3dmol. The py3Dmol view is returned for further customization or visualization in Jupyter notebooks.
- Parameters
system (System) – The system to visualize.
color (str) – Particle property to use for color coding, e.g. "species", "label". This property must be a string or an integer.
palette (list, default: None) – List of colors when coloring particles according to a discrete property, such as "species" or "label". A default palette will be used if not specified.
- Raises
ValueError – If the color parameter refers to a float particle property.
- Returns
view – py3Dmol view.
- Return type
py3Dmol.view
- partycls.helpers.merge_clusters(weights, n_clusters_min=2, epsilon_=1e-15)[source]
Merge clusters into n_clusters_min new clusters, using an entropy criterion and the probabilities that particles initially belong to each of the original clusters.
See https://doi.org/10.1198/jcgs.2010.08111 (Baudry et al.).
- Parameters
weights (list) – Probabilities that each particle belongs to each cluster. If there are \(N\) particles, then the length of the list (or first dimension of the array) must be \(N\). If there are \(K\) original clusters, the length of each element of weights (or the second dimension of the array) must be \(K\). weights[i][k] (list) or weights[i,k] (array) is the probability that particle i belongs to cluster k before merging. For each particle, sum(weights[i]) is equal to 1.
n_clusters_min (int, default: 2) – Final number of clusters after merging.
epsilon_ (float, default: 1e-15) – Small number (close to zero). This is needed as a replacement for zero when computing a logarithm to avoid errors.
- Returns
new_weights (numpy.ndarray) – New weights after merging. Same shape and interpretation as the weights input parameter.
new_labels (list) – New discrete labels based on the weights after merging.
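The merging procedure itself follows Baudry et al. and is not reproduced here. The sketch below only illustrates two ingredients mentioned above: why a small epsilon_ is needed when taking logarithms of probabilities that may be exactly zero, and how discrete labels can be read off a table of weights. Both helper names are hypothetical, and taking the most probable cluster as the label is an assumption.

```python
from math import log


def hard_labels(weights):
    """Index of the most probable cluster for each particle."""
    return [max(range(len(w)), key=w.__getitem__) for w in weights]


def classification_entropy(weights, epsilon_=1e-15):
    """Total entropy -sum_i sum_k w_ik * log(w_ik) of the soft assignments."""
    # epsilon_ replaces exact zeros so that log() never receives 0.
    return -sum(w * log(max(w, epsilon_)) for row in weights for w in row)


weights = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]
print(hard_labels(weights))
print(classification_entropy(weights))
```

Without the epsilon_ guard, the last row would trigger a math domain error on log(0.0).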
- partycls.helpers.sort_clusters(labels, centroids, func=<function shannon_entropy>)[source]
Make a consistent labeling of the clusters based on their centroids by computing an associated numerical value as sorting criterion. By default, the labeling is based on the Shannon entropy of each cluster.
- Parameters
labels (list) – Original labels.
centroids (numpy.ndarray) – Cluster centroids.
func (function, default: shannon_entropy) – Function used to associate a numerical value to each cluster, to be used as sorting criterion. This function must accept a list or a one-dimensional array as parameter (this parameter being the coordinates of a given centroid).
- Returns
new_labels (list) – New labels based on centroid entropies.
new_centroids (numpy.ndarray) – Centroids arranged in order of descending entropies.
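The relabeling can be sketched as follows. Both functions are hypothetical stand-ins: the exact form of the Shannon entropy used by partycls is not given here, so the version below (entropy of the centroid after normalizing it to sum to 1) is an assumption.

```python
from math import log


def shannon_entropy(p):
    """Shannon entropy of a vector after normalizing it to sum to 1 (assumed form)."""
    total = sum(p)
    return -sum((x / total) * log(x / total) for x in p if x > 0)


def sort_clusters_sketch(labels, centroids, func=shannon_entropy):
    """Relabel clusters so that label 0 has the largest func(centroid)."""
    # Rank the original cluster indices by decreasing value of func(centroid).
    order = sorted(range(len(centroids)), key=lambda k: func(centroids[k]), reverse=True)
    new_index = {old: new for new, old in enumerate(order)}
    new_labels = [new_index[label] for label in labels]
    new_centroids = [centroids[old] for old in order]
    return new_labels, new_centroids


centroids = [[1.0, 0.0, 0.0], [0.2, 0.3, 0.5]]
new_labels, new_centroids = sort_clusters_sketch([0, 0, 1], centroids)
print(new_labels, new_centroids)
```

Sorting on a property of the centroids rather than on the arbitrary label values is what makes the labeling reproducible across independent clustering runs.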
partycls.particle module
Point particles in a cartesian reference frame.
This class is inspired by the framework atooms authored by Daniele Coslovich.
- class partycls.particle.Particle(position=None, species='A', label=-1, radius=0.5, nearest_neighbors=None)[source]
Bases:
object
A particle is defined by its position, its type, and additional attributes like a radius, a cluster label, a list of neighbors, etc.
- position
The position of the particle.
- Type
- Parameters
position (list, default: None) – The position of the particle. If not given, it will be set to [0.0, 0.0, 0.0].
species (str, default: "A") – Particle type / species.
label (int, default: -1) – Cluster label of the particle. Default is -1 (i.e. not belonging to any cluster).
radius (float, default: 0.5) – Particle radius.
nearest_neighbors (list, default: None) – Indices of the particle’s nearest neighbors in the System.
Examples
>>> p = Particle([0.0, 0.0, 0.0], species='A', radius=0.4)
>>> p = Particle([1.5, -0.3, 3.2], species='B', nearest_neighbors=[12, 34, 68])
partycls.system module
The physical system at hand.
The system of interest in classical atomistic simulations is composed of interacting point particles, usually enclosed in a simulation cell.
This class is inspired by the framework atooms authored by Daniele Coslovich.
- class partycls.system.System(particle=None, cell=None)[source]
Bases:
object
A system is composed of a collection of particles that lie within an orthorhombic cell.
- nearest_neighbors_cutoffs
List of nearest neighbors cutoffs for each pair of species in the system.
- Type
- Parameters
Examples
>>> p = [Particle(position=[0.0, 0.0, 0.0], species='A'), Particle(position=[1.0, 1.0, 1.0], species='B')]
>>> c = Cell([5.0, 5.0, 5.0])
>>> sys = System(particle=p, cell=c)
- property nearest_neighbors_method
Method used to identify the nearest neighbors of all the particles in the system. Should be one of "fixed", "sann" or "voronoi".
- property n_dimensions
Number of spatial dimensions, guessed from the length of self.particle[0].position.
- property density
Number density of the system.
It will raise a ValueException if self.cell is None.
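As a rough illustration of what the number density means for an orthorhombic cell, the sketch below divides the particle count by the product of the cell sides. The function `number_density` is hypothetical and not part of partycls, and raising `ValueError` for a missing cell is an assumption of this sketch.

```python
from math import prod

def number_density(n_particles, cell_sides):
    """Number of particles per unit volume of an orthorhombic cell.

    Hypothetical helper for illustration; not the partycls implementation.
    """
    if cell_sides is None:
        # Assumption: a missing cell is reported with a ValueError.
        raise ValueError("a cell is required to compute the density")
    return n_particles / prod(cell_sides)

rho = number_density(125, [5.0, 5.0, 5.0])
print(rho)  # 1.0
```

The same function works for a two-dimensional cell, since the "volume" is simply the product of however many sides are given.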
- property distinct_species
Sorted numpy.ndarray of all the distinct species in the system.
- property pairs_of_species
List of all the possible pairs of species.
- property pairs_of_species_id
List of all the possible pairs of species ID.
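A minimal sketch of how pairs of species relate to a flat list of cutoffs, assuming that pairs are ordered (so that two species give four pairs) and enumerated in sorted order. The helper `ordered_pairs` and the dictionary `cutoff_of` are hypothetical names used only for illustration.

```python
from itertools import product

def ordered_pairs(species):
    """All ordered pairs of the distinct species, in sorted order.

    Assumption: (A, B) and (B, A) count as two different pairs.
    """
    distinct = sorted(set(species))
    return list(product(distinct, repeat=2))

pairs = ordered_pairs(["B", "A", "A", "B", "A"])
# One cutoff per pair, in the same order as `pairs`.
cutoffs = [1.5, 1.4, 1.4, 1.3]
cutoff_of = dict(zip(pairs, cutoffs))
print(pairs)
print(cutoff_of[("A", "B")])
```

With this ordering, a system with n distinct species needs a list of n² cutoffs.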
- property chemical_fractions
numpy.ndarray with the chemical fractions of each species in the system.
- get_property(what, subset=None)[source]
Return a numpy.ndarray with the system property specified by what. If what is a particle property, return the property for all particles in the system, or for a given subset of particles specified by subset.
- Parameters
what (str) –
Requested system property. what must be of the form "particle.<attribute>" or "cell.<attribute>". The following particle aliases are accepted:
'position': 'particle.position'
'pos': 'particle.position'
'position[0]': 'particle.position[0]'
'pos[0]': 'particle.position[0]'
'x': 'particle.position[0]'
'position[1]': 'particle.position[1]'
'pos[1]': 'particle.position[1]'
'y': 'particle.position[1]'
'position[2]': 'particle.position[2]'
'pos[2]': 'particle.position[2]'
'z': 'particle.position[2]'
'species': 'particle.species'
'spe': 'particle.species'
'label': 'particle.label'
'mass': 'particle.mass'
'radius': 'particle.radius'
'nearest_neighbors': 'particle.nearest_neighbors'
'neighbors': 'particle.nearest_neighbors'
'neighbours': 'particle.nearest_neighbors'
'voronoi_signature': 'particle.voronoi_signature'
'signature': 'particle.voronoi_signature'
subset (str, default: None) – Subset of particles for which the property must be dumped. Must be of the form "particle.<attribute>" unless "<attribute>" is an alias. The default is None (all particles will be included). This is ignored if what is a cell property.
- Returns
to_dump – Array of the requested system property.
- Return type
Examples
>>> traj = Trajectory('trajectory.xyz')
>>> sys = traj[0]
>>> pos_0 = sys.get_property('position')
>>> spe_0 = sys.get_property('species')
>>> sides = sys.get_property('cell.side')
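The alias mechanism can be pictured as a dictionary lookup that expands a short name into its fully qualified form. The sketch below is only an illustration under that assumption: `ALIASES` holds a subset of the documented aliases, and `resolve` is a hypothetical helper, not a partycls function.

```python
# A subset of the documented aliases, for illustration only.
ALIASES = {
    "position": "particle.position",
    "pos": "particle.position",
    "x": "particle.position[0]",
    "y": "particle.position[1]",
    "z": "particle.position[2]",
    "spe": "particle.species",
    "neighbours": "particle.nearest_neighbors",
    "signature": "particle.voronoi_signature",
}

def resolve(what):
    """Expand an alias; leave fully qualified names untouched."""
    if what.startswith(("particle.", "cell.")):
        return what
    # Unknown names are returned unchanged in this sketch.
    return ALIASES.get(what, what)

print(resolve("x"))          # particle.position[0]
print(resolve("cell.side"))  # cell.side
```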
- set_property(what, value, subset=None)[source]
Set a system property what to value. If what is a particle property, set the property for all the particles in the system or for a given subset of particles specified by subset.
- Parameters
what (str) – Name of the property to set. This is considered to be a particle property by default, unless it starts with "cell", e.g. "cell.side".
value (int, float, list, or numpy.ndarray) – Value(s) of the property to set. An instance of int or float will set the same value for all concerned particles. An instance of list or numpy.ndarray will assign a specific value to each particle. In this case, the size of value must match the number of concerned particles.
subset (str, default: None) – Particles for which the property must be set. The default is None. This is ignored if what is a cell property.
- Return type
None
Examples
>>> sys.set_property('mass', 1.0)
>>> sys.set_property('radius', 0.5, "species == 'A'")
>>> labels = [0, 1, 0] # 3 particles in the subset
>>> sys.set_property('label', labels, "species == 'B'")
>>> sys.set_property('cell.side[0]', 2.0)
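The difference between a scalar and a list-like value can be sketched as follows. This is a simplified model, not the partycls implementation: particles are plain dictionaries, `set_particle_property` is a hypothetical helper, and the subset is given as a Python callable rather than as a string expression.

```python
def set_particle_property(particles, name, value, subset=None):
    """Assign `value` to property `name` for the selected particles.

    A scalar is repeated for every selected particle; a list must
    provide exactly one entry per selected particle.
    """
    selected = [p for p in particles if subset is None or subset(p)]
    if isinstance(value, (int, float)):
        values = [value] * len(selected)
    else:
        values = list(value)
        if len(values) != len(selected):
            raise ValueError("one value per selected particle is required")
    for particle, v in zip(selected, values):
        particle[name] = v

particles = [{"species": s} for s in ["A", "B", "B", "B"]]
set_particle_property(particles, "mass", 1.0)
set_particle_property(particles, "label", [0, 1, 0],
                      subset=lambda p: p["species"] == "B")
print([p.get("label") for p in particles])  # [None, 0, 1, 0]
```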
- compute_nearest_neighbors(method, cutoffs)[source]
Compute the nearest neighbors for all the particles in the system using the provided method. Neighbors are stored in the nearest_neighbors particle property. Available methods are:
'fixed': use fixed cutoffs for each pair of species in the system.
'sann': solid-angle based nearest neighbor algorithm (see https://doi.org/10.1063/1.4729313).
'voronoi': radical Voronoi tessellation method (uses particles’ radii) (see https://doi.org/10.1016/0022-3093(82)90093-X).
- Parameters
method (str, default: None) – Method to identify the nearest neighbors. Must be one of 'fixed', 'sann', or 'voronoi'.
cutoffs (list) – List containing the cutoff distances for each pair of species in the system (for methods 'fixed' and 'sann'). For method 'sann', cutoffs are required as a first guess to identify the nearest neighbors. Leave None for method 'voronoi'.
- Return type
None
Examples
>>> sys.compute_nearest_neighbors('fixed', [1.5, 1.4, 1.4, 1.3])
>>> sys.compute_nearest_neighbors('sann', [1.5, 1.4, 1.4, 1.3])
>>> sys.compute_nearest_neighbors('voronoi', None)
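To make the idea behind the 'fixed' method concrete, here is a naive sketch that marks two particles as neighbors when their distance does not exceed the cutoff of their pair of species. It is not the partycls implementation: `fixed_cutoff_neighbors` is a hypothetical name, cutoffs are passed as a dictionary keyed by species pair rather than as a flat list, and periodic boundary conditions are ignored.

```python
from math import dist

def fixed_cutoff_neighbors(positions, species, cutoffs):
    """Neighbor indices of each particle, using one cutoff per species pair.

    Brute-force O(N^2) search without periodic boundaries.
    """
    neighbors = [[] for _ in positions]
    for i, (r_i, s_i) in enumerate(zip(positions, species)):
        for j, (r_j, s_j) in enumerate(zip(positions, species)):
            if i != j and dist(r_i, r_j) <= cutoffs[(s_i, s_j)]:
                neighbors[i].append(j)
    return neighbors

positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
species = ["A", "B", "A"]
cutoffs = {("A", "A"): 1.5, ("A", "B"): 1.4, ("B", "A"): 1.4, ("B", "B"): 1.3}
print(fixed_cutoff_neighbors(positions, species, cutoffs))  # [[1], [0], []]
```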
- compute_voronoi_signatures()[source]
Compute the Voronoi signatures of all the particles in the system using the radical Voronoi tessellation method (see https://doi.org/10.1016/0022-3093(82)90093-X).
Particle radii must be set using the set_property method if the original trajectory file does not contain such information.
Creates a voronoi_signature property for the particles.
- Return type
None
- show(backend='matplotlib', color='species', **kwargs)[source]
Show a snapshot of the system and color particles according to an arbitrary property, such as species, cluster label, etc. Current visualization backends are "matplotlib", "ovito" and "3dmol".
- Parameters
backend (str, default: "matplotlib") – Name of the backend to use for visualization.
color (str, default: "species") – Name of the particle property to use as basis for coloring the particles. This property must be defined for all the particles in the system.
**kwargs – Additional keyword arguments (backend-dependent).
- Raises
ValueError – In case of unknown backend.
- Return type
backend-dependent
Examples
>>> sys.show(frame=0, color='label', backend='3dmol')
>>> sys.show(frame=1, color='energy', backend='matplotlib', cmap='viridis')
partycls.trajectory module
Physical trajectory.
This class is inspired by the framework atooms authored by Daniele Coslovich.
- class partycls.trajectory.Trajectory(filename, fmt=None, backend=None, top=None, additional_fields=None, first=0, last=None, step=1)[source]
Bases:
object
A trajectory is composed of one or several frames, each frame being an instance of System. Trajectory instances are iterable. By default, only the positions and particle types are read from the trajectory file. Additional particle properties in the file can be read using the additional_fields parameter.
- additional_fields
List of additional particle properties that were extracted from the original trajectory file.
- Type
- Parameters
filename (str) – Path to the trajectory file to read.
fmt (str, default: "xyz") – Format of the trajectory. Needed when using "atooms" as a backend.
backend (str, default: None) – Name of a third-party package to use as backend when reading the input trajectory. Currently supports "atooms" and "mdtraj".
top (str, mdtraj.Trajectory, or mdtraj.Topology, default: None) – Topology information. Needed when using "mdtraj" as backend on a trajectory file whose format requires topology information. See MDTraj documentation for more information.
additional_fields (list, optional, default: None) – Additional fields (i.e. particle properties) to read from the trajectory. Not all trajectory formats allow for additional fields.
first (int, default: 0) – Index of the first frame to consider in the trajectory. Starts at zero.
last (int, default: None) – Index of the last frame to consider in the trajectory. Default is the last frame.
step (int, default: 1) – Step between each frame to consider in the trajectory. For example, if step=2, one out of every two frames is read.
Examples
>>> traj = Trajectory('trajectory.xyz', additional_fields=['mass'])
>>> traj = Trajectory('trajectory.dat', fmt='lammps', backend='atooms')
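The effect of first, last and step on which frames are read can be sketched with a plain range. The helper `selected_frames` is hypothetical, and the sketch assumes that last is an inclusive index, which the parameter description suggests but does not state outright.

```python
def selected_frames(n_frames, first=0, last=None, step=1):
    """Indices of the frames kept from a file containing `n_frames` frames.

    Assumption: `last` is inclusive and defaults to the final frame.
    """
    if last is None:
        last = n_frames - 1
    return list(range(first, last + 1, step))

print(selected_frames(10, step=2))                   # [0, 2, 4, 6, 8]
print(selected_frames(10, first=2, last=6, step=2))  # [2, 4, 6]
```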
- property nearest_neighbors_method
Method used to identify the nearest neighbors of all the particles in the trajectory. Should be one of "auto", "fixed", "sann" or "voronoi".
- property nearest_neighbors_cutoffs
List of cutoffs that delimit the first coordination shell. Cutoffs are usually defined on the basis of the first minimum of the partial radial distribution function of each pair of species, \(g_{\alpha\beta}(r)\). The list must have the same length as the number of pairs of species in the system. Pairs are ordered, so 2 species yield 4 possible pairs, 3 species yield 9 pairs, etc.
- remove(frame)[source]
Remove the system at position frame from the trajectory.
- Parameters
frame (int) – Index of the frame to remove from the trajectory.
- Return type
None
- get_property(what, subset=None)[source]
Return a list of numpy.ndarrays with the system property specified by what. The list size is the number of systems in the trajectory.
- Parameters
what (str) –
Requested system property. what must be of the form "particle.<attribute>" or "cell.<attribute>". The following particle aliases are accepted:
'position': 'particle.position'
'pos': 'particle.position'
'position[0]': 'particle.position[0]'
'pos[0]': 'particle.position[0]'
'x': 'particle.position[0]'
'position[1]': 'particle.position[1]'
'pos[1]': 'particle.position[1]'
'y': 'particle.position[1]'
'position[2]': 'particle.position[2]'
'pos[2]': 'particle.position[2]'
'z': 'particle.position[2]'
'species': 'particle.species'
'spe': 'particle.species'
'label': 'particle.label'
'mass': 'particle.mass'
'radius': 'particle.radius'
'nearest_neighbors': 'particle.nearest_neighbors'
'neighbors': 'particle.nearest_neighbors'
'neighbours': 'particle.nearest_neighbors'
'voronoi_signature': 'particle.voronoi_signature'
'signature': 'particle.voronoi_signature'
subset (str, optional, default: None) – Subset of particles for which the property must be dumped. Must be of the form "particle.<attribute>" unless "<attribute>" is an alias. The default is None (all particles will be included). This is ignored if what is a cell property.
- Returns
to_dump – List of the requested system property with length equal to the number of frames in the trajectory. Each element of the list is a numpy.ndarray of the requested system property.
- Return type
Examples
>>> traj = Trajectory('trajectory.xyz')
>>> pos = traj.get_property('position')
>>> spe = traj.get_property('species')
>>> sides = traj.get_property('cell.side')
- set_property(what, value, subset=None)[source]
Set a property what to value for all the particles in the trajectory or for a given subset of particles specified by subset.
- Parameters
what (str) – Name of the property to set. This is considered to be a particle property by default, unless it starts with "cell", e.g. "cell.side".
value (int, float, list, numpy.ndarray) – Value(s) of the property to set. An instance of int or float will set the same value for all concerned particles. An instance of list or numpy.ndarray will assign a specific value to each particle. In this case, the shape of value must match the number of frames in the trajectory and the number of concerned particles.
subset (str, default: None) – Particles to which the property must be set. The default is None. This is ignored if what is a cell property.
- Return type
None
Examples
>>> traj.set_property('mass', 1.0)
>>> traj.set_property('radius', 0.5, subset="species == 'A'")
>>> labels = [[0, 1, 0], # 2 frames, 3 particles in the subset
...           [1, 1, 0]]
>>> traj.set_property('label', labels, subset="species == 'B'")
- compute_nearest_neighbors(method=None, cutoffs=None, dr=0.1)[source]
Compute the nearest neighbors for all the particles in the trajectory using the provided method. Neighbors are stored in the nearest_neighbors particle property. Available methods are:
'auto': read neighbors from the trajectory file, if explicitly requested with the additional_fields argument in the constructor.
'fixed': use fixed cutoffs for each pair of species in the trajectory.
'sann': solid-angle based nearest neighbor algorithm (see https://doi.org/10.1063/1.4729313).
'voronoi': radical Voronoi tessellation method (uses particles’ radii) (see https://doi.org/10.1016/0022-3093(82)90093-X).
- Parameters
method (str, default: None) – Method to identify the nearest neighbors. Must be one of 'auto', 'fixed', 'sann', or 'voronoi'. None defaults to 'auto'. If method is 'auto', neighbors are read directly from the trajectory file, if specified with the additional_fields argument in the constructor. If no neighbors are found, falls back to method='fixed' instead.
cutoffs (list, default: None) – List containing the cutoff distances for each pair of species in the trajectory (for methods 'fixed' and 'sann'). If None, cutoffs will be computed automatically. For method 'sann', cutoffs are required as a first guess to identify the nearest neighbors.
dr (float, default: 0.1) – Radial grid spacing \(\Delta r\) for computing the cutoffs on the basis of the first minimum of each partial radial distribution function in the trajectory, if cutoffs are not provided.
- Return type
None
Examples
>>> traj.compute_nearest_neighbors(method='fixed', cutoffs=[1.5, 1.4, 1.4, 1.3])
>>> traj.compute_nearest_neighbors(method='sann', cutoffs=[1.5, 1.4, 1.4, 1.3])
>>> traj.compute_nearest_neighbors(method='voronoi')
- set_nearest_neighbors_cutoff(s_a, s_b, rcut, mirror=True)[source]
Set the nearest-neighbor cutoff for the pair of species (s_a, s_b). The cutoff of the mirror pair (s_b, s_a) is set automatically if the mirror parameter is True (default). Writes to the nearest_neighbors_cutoffs list attribute.
- Parameters
- Return type
None
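The mirror behavior can be pictured with a flat list of cutoffs aligned with a list of ordered species pairs. The function `set_cutoff` below is a hypothetical stand-in written for illustration; it takes the pairs and cutoffs explicitly instead of reading them from a trajectory.

```python
def set_cutoff(cutoffs, pairs, s_a, s_b, rcut, mirror=True):
    """Update the cutoff of (s_a, s_b), and of (s_b, s_a) if `mirror` is set."""
    cutoffs[pairs.index((s_a, s_b))] = rcut
    if mirror:
        cutoffs[pairs.index((s_b, s_a))] = rcut

pairs = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]
cutoffs = [1.5, 1.4, 1.4, 1.3]

set_cutoff(cutoffs, pairs, "A", "B", 1.45)
print(cutoffs)  # [1.5, 1.45, 1.45, 1.3]

set_cutoff(cutoffs, pairs, "B", "A", 1.2, mirror=False)
print(cutoffs)  # [1.5, 1.45, 1.2, 1.3]
```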
- compute_nearest_neighbors_cutoffs(dr=0.1)[source]
Compute the nearest neighbors cutoffs on the basis of the first minimum of the partial radial distribution function \(g_{\alpha\beta}(r)\) between each pair of species \((\alpha,\beta)\) in the trajectory. Sets the nearest_neighbors_cutoffs list attribute.
- Parameters
dr (float, default: 0.1) – Bin width \(\Delta r\) for the radial grid used to compute the partial radial distribution functions \(g_{\alpha\beta}(r)\).
- Return type
None
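One simple way to locate the first minimum of a tabulated \(g_{\alpha\beta}(r)\) is to find the first peak and then the first local minimum that follows it. The sketch below does this on plain lists; `first_minimum` is a hypothetical helper that assumes smooth, noise-free data, and it is not necessarily how partycls performs the search.

```python
def first_minimum(r, g):
    """Return the r value of the first local minimum after the first peak.

    Returns None if no peak or no subsequent minimum is found.
    """
    peak = None
    for i in range(1, len(g) - 1):
        if g[i] > g[i - 1] and g[i] >= g[i + 1]:
            peak = i
            break
    if peak is None:
        return None
    for i in range(peak + 1, len(g) - 1):
        if g[i] < g[i - 1] and g[i] <= g[i + 1]:
            return r[i]
    return None

# Toy data on a grid with spacing 0.5 (made-up values, for illustration).
r = [0.5, 1.0, 1.5, 2.0, 2.5]
g = [0.0, 2.5, 0.6, 1.2, 1.0]
print(first_minimum(r, g))  # 1.5
```

A smaller bin width gives a finer estimate of the cutoff but a noisier histogram, which is why real data usually needs some smoothing before this kind of search.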
- compute_voronoi_signatures()[source]
Compute the Voronoi signatures of all the particles in the trajectory using the radical Voronoi tessellation method (see https://doi.org/10.1016/0022-3093(82)90093-X).
Particle radii must be set using the set_property method if the original trajectory file does not contain such information.
Creates a voronoi_signature property for the particles.
- Return type
None
- show(frames=None, backend='matplotlib', color='species', **kwargs)[source]
Show the frames at the indices given in frames and color particles according to an arbitrary property, such as species, cluster label, etc. Current visualization backends are "matplotlib", "ovito", and "3dmol".
- Parameters
frames (list, default: None) – Indices of the frames to show. The default is None (shows all frames).
backend (str, default: "matplotlib") – Name of the backend to use for visualization.
color (str, default: "species") – Name of the particle property to use as basis for coloring the particles. This property must be defined for all the particles in the system.
**kwargs – Additional keyword arguments (backend-dependent).
- Raises
ValueError – In case of unknown backend.
- Return type
Backend-dependent
Examples
>>> traj.show(frames=[0,1,2], color='label', backend='3dmol')
>>> traj.show(frames=[0,1], color='energy', backend='matplotlib', cmap='viridis')
>>> traj[0].show() # use the iterability of Trajectory objects
- write(output_path, fmt='xyz', backend=None, additional_fields=None, precision=6)[source]
Write the current trajectory to a file.
- Parameters
output_path (str) – Name of the output trajectory file.
fmt (str, default: "xyz") – Format of the output trajectory file.
backend (str, default: None) – Name of a third-party package to use when writing the output trajectory.
additional_fields (list, default: None) – Additional fields (i.e. particle properties) to write in the output trajectory. Not all trajectory formats allow for additional fields. The default is to not write any additional particle property.
precision (int, default: 6) – Number of decimals when writing the output trajectory.
- Raises
If backend=None and fmt is not recognized natively.
If backend is unknown.
- Return type
None
partycls.workflow module
Workflow for clustering analysis.
A workflow is a procedure that goes through various steps (some of which are optional) to perform a structural clustering on a trajectory.
- class partycls.workflow.Workflow(trajectory, descriptor='gr', scaling=None, dim_reduction=None, clustering='kmeans')[source]
Bases:
object
A Workflow is a clustering procedure that goes through the following steps:
compute a structural descriptor on a given trajectory ;
(optional) apply a feature scaling on the previously computed structural features ;
(optional) apply a dimensionality reduction on the (raw/scaled) features ;
run a clustering algorithm to partition particles into structurally different clusters ;
- trajectory
The trajectory file as read by the Trajectory class.
- Type
- descriptor
Structural descriptor associated to the trajectory.
- Type
- dim_reduction
Dimensionality reduction method.
- Type
- clustering
Clustering method.
- Type
- output_metadata
Dictionary that controls the writing process and the properties of all the output files.
- Type
- features
Raw features as computed by the associated structural descriptor. Initial value is None if features were not computed.
- Type
- scaled_features
Features after being rescaled by a feature scaling method. Equal to None if no scaling is applied to the features.
- Type
- reduced_features
Features in the reduced space after applying a dimensionality reduction technique. Equal to None if no reduction is applied to the features.
- Type
- naming_convention
Base name for output files. Default is "{filename}.{code}.{descriptor}.{clustering}", where each tag will be replaced by its value in the current instance of Workflow (e.g. "traj.xyz.partycls.gr.kmeans").
Base name can be changed using any combination of the available tags:
{filename}
{code}
{descriptor}
{scaling}
{dim_reduction}
{clustering}
Example: "{filename}_descriptor-{descriptor}_scaling-{scaling}.{code}".
- Type
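The tag substitution in the naming convention behaves like ordinary string templating. The sketch below reproduces it with str.format, which is an assumption about the mechanism rather than a statement about the partycls internals; the tag values in `tags` are illustrative.

```python
# Illustrative tag values; "zscore" and "pca" are arbitrary choices.
tags = {
    "filename": "traj.xyz",
    "code": "partycls",
    "descriptor": "gr",
    "scaling": "zscore",
    "dim_reduction": "pca",
    "clustering": "kmeans",
}

default = "{filename}.{code}.{descriptor}.{clustering}"
custom = "{filename}_descriptor-{descriptor}_scaling-{scaling}.{code}"

# Tags that do not appear in a template are simply ignored by str.format.
print(default.format(**tags))  # traj.xyz.partycls.gr.kmeans
print(custom.format(**tags))   # traj.xyz_descriptor-gr_scaling-zscore.partycls
```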
- Parameters
trajectory (Trajectory) – An instance of Trajectory or a path to trajectory file to read, or an instance of a class with compatible interface.
descriptor (StructuralDescriptor, default: "gr") –
An instance of StructuralDescriptor, the short name of a descriptor (str), or an instance of a class with compatible interface. See the descriptor_db class attribute for compatible strings. Examples:
"gr": radial distribution of particles around a central particle.
"ba": angular distribution of pairs of nearest neighbors of a central particle.
"bo": Steinhardt bond-orientational order parameter.
"ld": Lechner-Dellago bond-orientational order parameter.
scaling (method, default: None) –
Feature scaling method. See the scaling_db class attribute for compatible strings. Examples:
"zscore": standardize features by removing the mean and scaling to unit variance.
"minmax": scale and translate each feature individually such that it is in the given range on the training set, e.g. between zero and one.
"maxabs": scale and translate each feature individually such that the maximal absolute value of each feature in the training set will be 1.
"robust": remove the median and scale the data according to the specified quantile range (default is between 25th quantile and 75th quantile).
dim_reduction (method, default: None) –
Dimensionality reduction method. See the dim_reduction_db class attribute for compatible strings. Examples:
"pca": Principal Component Analysis
"tsne": t-distributed Stochastic Neighbor Embedding
"lle": Locally Linear Embedding
"ae": neural network Auto-Encoder
clustering (Clustering, default: 'kmeans') –
Clustering algorithm. See the clustering_db class attribute for compatible strings. Examples:
"kmeans": K-Means algorithm
"gmm": Gaussian Mixture Model
"cinf": Community Inference (see https://doi.org/10.1063/5.0004732).
Example
>>> wf = Workflow('trajectory.xyz', descriptor='ba', scaling='zscore')
>>> wf.run()
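For readers unfamiliar with the scaling options, the sketch below spells out what "zscore" and "minmax" do to a single feature column. It is a plain-Python illustration with hypothetical function names; it assumes the population standard deviation for the z-score and does not guard against constant columns.

```python
from statistics import fmean, pstdev

def zscore(column):
    """Remove the mean and scale to unit variance (population std)."""
    mu = fmean(column)
    sigma = pstdev(column)  # zero for a constant column: not handled here
    return [(x - mu) / sigma for x in column]

def minmax(column, low=0.0, high=1.0):
    """Map the column linearly onto the range [low, high]."""
    cmin, cmax = min(column), max(column)
    return [low + (x - cmin) * (high - low) / (cmax - cmin) for x in column]

feature = [1.0, 2.0, 3.0]
print(zscore(feature))
print(minmax(feature))  # [0.0, 0.5, 1.0]
```

Scaling matters when the features of a descriptor span very different ranges, since distance-based clustering would otherwise be dominated by the largest ones.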
- descriptor_db = {'ang': <class 'partycls.descriptors.ba.BondAngleDescriptor'>, 'angular': <class 'partycls.descriptors.ba.BondAngleDescriptor'>, 'ba': <class 'partycls.descriptors.ba.BondAngleDescriptor'>, 'bo': <class 'partycls.descriptors.bo.BondOrientationalDescriptor'>, 'boattini': <class 'partycls.descriptors.radial_bo.BoattiniDescriptor'>, 'boo': <class 'partycls.descriptors.bo.BondOrientationalDescriptor'>, 'bop': <class 'partycls.descriptors.bo.BondOrientationalDescriptor'>, 'compact': <class 'partycls.descriptors.compactness.CompactnessDescriptor'>, 'coord': <class 'partycls.descriptors.coordination.CoordinationDescriptor'>, 'coordination': <class 'partycls.descriptors.coordination.CoordinationDescriptor'>, 'gr': <class 'partycls.descriptors.radial.RadialDescriptor'>, 'labo': <class 'partycls.descriptors.averaged_bo.LocallyAveragedBondOrientationalDescriptor'>, 'ld': <class 'partycls.descriptors.averaged_bo.LechnerDellagoDescriptor'>, 'lechner dellago': <class 'partycls.descriptors.averaged_bo.LechnerDellagoDescriptor'>, 'lechner-dellago': <class 'partycls.descriptors.averaged_bo.LechnerDellagoDescriptor'>, 'rad': <class 'partycls.descriptors.radial.RadialDescriptor'>, 'radial': <class 'partycls.descriptors.radial.RadialDescriptor'>, 'rbo': <class 'partycls.descriptors.radial_bo.RadialBondOrientationalDescriptor'>, 'rboo': <class 'partycls.descriptors.radial_bo.RadialBondOrientationalDescriptor'>, 'rbop': <class 'partycls.descriptors.radial_bo.RadialBondOrientationalDescriptor'>, 'sbo': <class 'partycls.descriptors.smoothed_bo.SmoothedBondOrientationalDescriptor'>, 'sboo': <class 'partycls.descriptors.smoothed_bo.SmoothedBondOrientationalDescriptor'>, 'sbop': <class 'partycls.descriptors.smoothed_bo.SmoothedBondOrientationalDescriptor'>, 'steinhardt': <class 'partycls.descriptors.bo.SteinhardtDescriptor'>, 'tetra': <class 'partycls.descriptors.tetrahedrality.TetrahedralDescriptor'>, 'tong tanaka': <class 
'partycls.descriptors.compactness.TongTanakaDescriptor'>, 'tong-tanaka': <class 'partycls.descriptors.compactness.TongTanakaDescriptor'>}
- clustering_db = {'cinf': <class 'partycls.clustering.CommunityInference'>, 'community inference': <class 'partycls.clustering.CommunityInference'>, 'community-inference': <class 'partycls.clustering.CommunityInference'>, 'gaussian mixture': <class 'partycls.clustering.GaussianMixture'>, 'gaussian-mixture': <class 'partycls.clustering.GaussianMixture'>, 'gm': <class 'partycls.clustering.GaussianMixture'>, 'gmm': <class 'partycls.clustering.GaussianMixture'>, 'inference': <class 'partycls.clustering.CommunityInference'>, 'k-means': <class 'partycls.clustering.KMeans'>, 'kmeans': <class 'partycls.clustering.KMeans'>}
- scaling_db = {'max-abs': <class 'partycls.feature_scaling.MaxAbs'>, 'maxabs': <class 'partycls.feature_scaling.MaxAbs'>, 'min-max': <class 'partycls.feature_scaling.MinMax'>, 'minmax': <class 'partycls.feature_scaling.MinMax'>, 'robust': <class 'partycls.feature_scaling.Robust'>, 'standard': <class 'partycls.feature_scaling.ZScore'>, 'z-score': <class 'partycls.feature_scaling.ZScore'>, 'zscore': <class 'partycls.feature_scaling.ZScore'>}
- dim_reduction_db = {'ae': <class 'partycls.dim_reduction.AutoEncoder'>, 'auto-encoder': <class 'partycls.dim_reduction.AutoEncoder'>, 'autoencoder': <class 'partycls.dim_reduction.AutoEncoder'>, 'lle': <class 'partycls.dim_reduction.LocallyLinearEmbedding'>, 'pca': <class 'partycls.dim_reduction.PCA'>, 't-sne': <class 'partycls.dim_reduction.TSNE'>, 'tsne': <class 'partycls.dim_reduction.TSNE'>}
- property labels
Clustering labels.
- property fractions
Fraction of particles in each cluster.
- property populations
Number of particles in each cluster.
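The relation between labels, populations and fractions can be sketched with a counter over a flat list of labels. The variable names and the toy labels below are illustrative only; the workflow exposes these quantities as properties.

```python
from collections import Counter

# Toy cluster labels for six particles (made-up data).
labels = [0, 1, 0, 0, 1, 2]

counts = Counter(labels)
# Number of particles in each cluster, ordered by cluster label.
populations = [counts[k] for k in sorted(counts)]
# Fraction of particles in each cluster.
fractions = [n / len(labels) for n in populations]

print(populations)  # [3, 2, 1]
print(fractions)
```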
- property centroids
Centroid of each cluster.
- run()[source]
Compute the clustering and write the output files according to the defined workflow:
compute the descriptor
(optional) apply feature scaling
(optional) apply dimensionality reduction
compute the clustering
(optional) write the output files
- Raises
ValueError – If a community inference clustering is attempted with feature scaling or dimensionality reduction.
- Return type
None
- set_output_metadata(what, **kwargs)[source]
Change the output properties.
- Parameters
what (str) –
Type of output file to change. Must be one of:
"trajectory"
"log"
"centroids"
"labels"
"dataset"
**kwargs – Keyword arguments (specific to each type of file)
- Return type
None
Examples
>>> wf = Workflow('trajectory.xyz')
>>> wf.set_output_metadata('log', enable=False) # do not write the log file
>>> wf.set_output_metadata('trajectory', filename='awesome_trajectory.xyz') # change the default output name
>>> wf.set_output_metadata('dataset', enable=True, precision=8) # write the dataset and change the writing precision to 8 digits
- write_trajectory(filename=None, fmt='xyz', backend=None, additional_fields=None, precision=6, **kwargs)[source]
Write the trajectory file with cluster labels (default) and other additional fields (if any).
- Parameters
filename (str, default: None) – Filename of the output trajectory. Uses a default naming convention if not specified. The default is None.
fmt (str, default: "xyz") – Output trajectory format.
backend (str, default: None) – Name of the backend to use to write the trajectory. Must be either None, "atooms" or "mdtraj".
additional_fields (list, default: None) – Additional fields (i.e. particle properties) to write in the output trajectory. Note that all the Particle objects should have the specified properties as attributes.
precision (int, default: 6) – Number of decimals when writing the output trajectory.
- Return type
None
Examples
>>> wf = Workflow('trajectory.xyz')
>>> wf.write_trajectory(fmt='rumd')
>>> wf.write_trajectory(additional_fields=['particle.mass']) # `Particle` must have the attribute `mass`.
>>> wf.write_trajectory(filename='my_custom_name', precision=8)
- write_log(filename=None, precision=6, **kwargs)[source]
Write a log file with all relevant information about the workflow. The log file can be written only if the workflow has been run at least once with the method Workflow.run.
- write_centroids(filename=None, precision=6, **kwargs)[source]
Write the coordinates of the clusters’ centroids using the raw features from the descriptor (i.e. neither scaled nor reduced).
- write_labels(filename=None, **kwargs)[source]
Write the clusters’ labels only.
- Parameters
filename (str, default: None) – Filename of the labels file. Uses a default naming convention if not specified.
- Return type
None