Principal Component Analysis

Principal Component Analysis (PCA) is a dimensionality reduction technique that simplifies complex datasets by identifying principal components—directions of maximum variance in the data. These components are linear combinations of the original properties, ranked by their importance in explaining the variance.

For example, PCA can highlight geochemical trends from core samples in magmatic settings. A principal component with high silica can indicate samples of granitic compositions, while another component rich in iron may indicate magma mixing. Various components combined can help geologists identify magmatic processes or compositional patterns.

Pre-processing

Before proceeding with the PCA, the data are normalized if the option is activated. Normalization is done by subtracting the mean and dividing it by the standard deviation of each property.

Data points containing any properties with no-data value are excluded from the PCA.

Warning

If all points contain one no-data value, PCA cannot be performed. There should be at least as many valid data points (no empty data) as there are properties.

Interface

General Parameters

The general parameters controlling the Principal Component Analysis are shown in the figure below.

../../_images/ui_general3.png

The options are described as follows:

Object selection

  • Client: The object containing the data to be analyzed.

  • Data: The data to be analyzed. Users can select as many Float data properties as desired.

Pretrained model

  • Object containing weights: The object containing the weights of a pre-trained PCA model. If a model is provided, the application will use the weights to transform the data.

  • Pretrained model file (optional): If users have a pretrained model, they can load it here. The data the model was trained on must be present in the source object.

PCA parameters

  • Number of components (optional): The number of principal components to compute. Must be an integer greater than 0.

  • Explained variance (default = 0.8): The total variance to be explained by the principal components. This must be a float between 0 and 1, representing the percentage of total variance. This option is used only if Number of components is not set.

Output names

  • Output name (optional): The name of the output group. If not specified, the output will default to the input data name appended with “PCA”.

  • Save model (optional): If activated, users have to define the name of the model to save in the source object.

Optional Parameters

Advanced controls on the Scikit-Learn PCA algorithm are available as optional parameters, as shown in the figure below.

../../_images/ui_optional3.png
  • Standardize (optional): If checked, the data will be normalized before performing Principal Component Analysis (PCA).

  • Whiten (optional): If checked, the components are scaled to unit variance. Whitening removes correlations between features but may reduce interpretability.

  • SVD solver (default=’auto’): Determines the algorithm used for computing the principal components. Options include:

    • auto: Chooses the solver based on data size.

    • full: Uses a full Singular Value Decomposition (SVD) approach.

    • arpack: Uses a truncated SVD.

    • randomized: Uses a fast, approximate randomized SVD.

  • Tolerance (default=0.0): The tolerance for the singular values computed during the SVD.

  • Iterated power (optional, default=0): The number of iterations for the randomized SVD solver.

  • Oversamples (default=10): The number of additional random vectors to sample the range of the matrix.

  • Power iteration normalizer (default=’auto’): The method to normalize the power iterations. Options include:

    • auto: Chooses the normalizer based on the solver.

    • QR: Uses the QR decomposition (a mathematical decomposition method for matrices).

    • LU: Uses the LU decomposition (another method for matrix factorization).

  • random_state (optional): Controls the randomness of the solver when svd_solver = randomized. Pass an integer for reproducibility.

Once the parameters are selected, press OK to run the analysis.

Results

The application create a new group, as presented in the figure below. This group is named as defined in the Output name parameter and the new data created are name “Output name” + PC_n where n is the number of the component.

../../_images/group_results.png

If the option is activated, the saved weights are stored under the Client as “defined name” + .pkl.

Tutorial

The following video presents a tutorial on how to use the PCA application.

  • Open the application.

  • Select the object containing the data.

  • Select the data to be analyzed.

  • Define the number of components or the explained variance.

  • (optional) Choose the name to save the model.

  • Run the application.

  • Inspect the computed principal components.

To run a pretrained model

  • Select the object containing the data.

  • Select the data to be analyzed.

  • Select the object containing the weights.

  • Select the pretrained model file.

  • Run the application.

../../_images/pca.gif