import React from 'react';
import '../../styles/subsection.css';
import Header from '../../components/Header';
import Footer from '../../components/Footer';
import { Link } from 'react-router-dom';
import 'katex/dist/katex.min.css';
import { InlineMath, BlockMath } from 'react-katex';

// Image imports
import LinearTransformations from '../../media/LinearAlgebra/lineartransformation.png';
import nmf from '../../media/LinearAlgebra/nmf.png';

function LinearAlgebra() {
    return (
        <div className="subsubsection-container">
            <Header />
            <div className="side-nav-container">
                <aside className="subsubsection-side-nav">
                    <a href="#linbasics">Data Structures</a>
                    <a href="#lintrans">Transformations</a>
                    <a href="#spaces">Vector Spaces</a>
                    <a href="#eigen">Eigenvalues & Eigenvectors</a>
                    <a href="#comps">Decomposition & Factorization</a>
                </aside>
            </div>
            
            <main className="subsubsection-content">
                <div className="titles"><h1>Linear Algebra</h1></div>
                <section id="corelinalg" className="code-cleaned">

                    <p className="subsubsection-paragraph"></p>

                    <p className='subsubsection-paragraph'>
                        Linear algebra is a branch of mathematics that revolves around vectors, matrices, and systems of linear equations. At its heart, it studies linear transformations
                        between vector spaces, which are fundamental in representing geometric transformations, solving systems of linear equations, and much more. 
                        Linear algebra introduces concepts such as vector spaces, matrices, determinants, eigenvalues, and eigenvectors, which are pivotal in understanding multi-dimensional data. If 
                        some of these concepts are new to you, you might want to go back and review them before you venture forward. That said, I will be providing a short primer on many of these
                        topics.
                    </p>

                    <p className='subsubsection-paragraph'>
                        Linear algebra is at the core of natural language processing. Words, sentences, and documents can be represented as vectors in high-dimensional spaces. 
                        Techniques like Word2Vec or GloVe embed words into vectors, capturing semantic meanings based on their context. Matrices come into play when we deal with large datasets or 
                        corpora, for example as term-document matrices or co-occurrence matrices. Eigenvalues and eigenvectors become relevant in techniques like Latent Semantic Analysis that 
                        reduce dimensionality and capture latent relationships between terms. Furthermore, linear transformations are at the core of neural networks, which are now the backbone of 
                        modern NLP. They help in transforming input data (text) into meaningful representations, allowing algorithms to understand and generate human-like text. In essence, 
                        linear algebra provides the mathematical foundation upon which many NLP techniques and models are built, making it an indispensable tool in the NLP toolkit. If much of this sounds
                        like jargon, don't worry -- the meaning behind these terms will become clear as you work through the website!
                    </p>

                </section>
                
                <section id="linbasics" className="code-cleaned">
                    <h2>Data Structures</h2>
                    <p className="subsubsection-paragraph"></p>

                    <h4>Scalars</h4>
                    <p className="subsubsection-paragraph">
                        A scalar is a single numerical value. Mathematically, a scalar can be a real number <InlineMath math="\alpha \in \mathbb{R}" /> or a 
                        complex number <InlineMath math="\alpha \in \mathbb{C}" />. 
                    </p>

                    <p className="subsubsection-paragraph">
                        Normally, I would provide some examples of how each of these concepts is relevant to NLP, but for some it should be fairly obvious; if it's not, I'll just echo what 
                        I said before about going back and reviewing things.
                    </p>


                    <section id="vectors">
                        <h4>Vectors</h4>
                        <p className="subsubsection-paragraph">
                            A vector is an ordered array of numbers. These numbers can represent various quantities. In mathematical notation, a column vector with three elements can be represented as:
                            <BlockMath math="\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}" />
                        </p>
                        <p className="subsubsection-paragraph">
                            Vectors are fundamental in NLP as they allow us to represent words, sentences, or entire documents as points in a high-dimensional space. 
                            As we'll learn later, this representation is known as an "embedding". For example, using methods like Word2Vec, words are mapped to vectors such that the distance 
                            between two vectors is smaller (equivalently, their similarity is higher) if those words are similar in some sense.
                        </p>

                    </section>
                    <section id="matrices">
                        <h4>Matrices</h4>
                        <p className="subsubsection-paragraph">
                            A matrix is a two-dimensional array of numbers arranged in rows and columns. It can be thought of as a collection of vectors stacked either horizontally or vertically. 
                            Mathematically, a matrix with three rows and three columns can be represented as:
                            <BlockMath math="\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{bmatrix}" />
                        </p>
                        <p className="subsubsection-paragraph">
                            Notice that we are continuing down the path of generalization. Scalars make up vectors, which make up matrices. The matrix is a standard data structure and you have most 
                            definitely encountered this object if you have ever worked with data. For example, any dataset with rows and columns is, in fact, (representable by) a matrix. You will see
                            that different kinds of matrices show up in the underlying mathematics of various techniques, for example the weight matrix in neural networks. 
                        </p>
                    </section>

                    <section id="tensors">
                        <h4>Tensors</h4>
                        <p className="subsubsection-paragraph">
                            A tensor can be thought of as a generalization of scalars, vectors, and matrices. It is a multi-dimensional array of numbers. While a scalar is 0-dimensional, 
                            a vector is 1-dimensional, and a matrix is 2-dimensional, a tensor can be 3-dimensional or higher (or lower). Mathematically, a 3-dimensional tensor can be visualized as 
                            a cube of numbers, represented as:
                            <BlockMath math="\mathbf{X} = \begin{bmatrix} 
                                \begin{bmatrix} x_{111} & x_{112} \\ x_{121} & x_{122} \end{bmatrix} & 
                                \begin{bmatrix} x_{211} & x_{212} \\ x_{221} & x_{222} \end{bmatrix} &
                                \begin{bmatrix} x_{311} & x_{312} \\ x_{321} & x_{322} \end{bmatrix}
                            \end{bmatrix}" />
                        </p>
                        <p className="subsubsection-paragraph">
                            The first subscript represents the matrix within the tensor in question and the next two subscripts position the scalars within the matrix as in the previous section. 
                            Tensors are fundamental in modern NLP, particularly with deep learning models. They are used to represent and process data in frameworks like TensorFlow and PyTorch. 
                            For instance, in a convolutional neural network (CNN), the input image or text is often represented as a tensor to account for height, width, and multiple channels or 
                            embeddings. Similarly, in transformers like BERT or GPT, tensors manage the input sequences, attention scores, and output predictions, facilitating operations across 
                            multiple dimensions simultaneously.
                        </p>
                    </section>




                </section>


                <section id="lintrans" className="code-cleaned">
                    <h2>Transformations</h2>

                    <h4>Linear Transformations</h4>
                    <p className="subsubsection-paragraph">
                        Linear transformations refer to functions that map one vector to another in such a way that both vector addition and scalar multiplication are preserved. Mathematically, 
                        for any vectors <InlineMath math="\mathbf{v}" /> and <InlineMath math="\mathbf{w}" />, and scalar <InlineMath math="c" />, a linear transformation <InlineMath math="T" /> satisfies:
                        <BlockMath math="T(\mathbf{v} + \mathbf{w}) = T(\mathbf{v}) + T(\mathbf{w})" />
                        <BlockMath math="T(c \mathbf{v}) = c T(\mathbf{v})" />
                    </p>

                    <p className="subsubsection-paragraph">Basically, if the above rules are satisfied, we call it a linear transformation. Geometrically, a linear transformation keeps 
                    straight lines straight and leaves the origin fixed, but it may stretch, flip, or rotate our object. For example, if we had a vector 
                    <BlockMath math="\mathbf{v} = \begin{bmatrix} v_1 \\ v_2  \end{bmatrix}" /> 
                    and we multiplied this object by the scalar <InlineMath math="c = -1" />, the resulting vector: 
                    <BlockMath math="c\mathbf{v} = \begin{bmatrix} -v_1 \\ -v_2  \end{bmatrix}" /> 
                    will have the same length but will point in the opposite direction, i.e. it is flipped through the origin. </p>

                    <div className="flex-container"><img src={LinearTransformations} alt="A line before and after being flipped through the origin by a linear transformation" className="image-small"/></div>
                    
                    
                    <p className="subsubsection-paragraph">In the image above, you can see that the object has remained a line but it has just been flipped through the origin. 
                    
                    </p>
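
                    <p className="subsubsection-paragraph">To connect the picture back to the definition, here is a short check that the flip used above, written 
                    as <InlineMath math="T(\mathbf{v}) = -\mathbf{v}" />, satisfies both conditions. The vector <InlineMath math="\mathbf{w}" /> and the scalar <InlineMath math="c" /> are arbitrary:
                    <BlockMath math="T(\mathbf{v} + \mathbf{w}) = -(\mathbf{v} + \mathbf{w}) = (-\mathbf{v}) + (-\mathbf{w}) = T(\mathbf{v}) + T(\mathbf{w})" />
                    <BlockMath math="T(c \mathbf{v}) = -(c \mathbf{v}) = c(-\mathbf{v}) = c T(\mathbf{v})" />
                    Both rules hold for every choice of vectors and scalar, so the flip is a linear transformation.
                    </p>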

                    <h4>Non-Linear Transformations</h4>
                    <p className="subsubsection-paragraph">
                    Non-linear transformations are mappings from one vector space to another, or within the same vector space, that do not preserve vector addition and scalar 
                    multiplication. A non-linear transformation <InlineMath math="T" /> violates at least one of the linear transformation conditions, i.e.:
                    <BlockMath math="T(\mathbf{v} + \mathbf{w}) \neq T(\mathbf{v}) + T(\mathbf{w})" /> <center>Or</center>
                    <BlockMath math="T(c \mathbf{v}) \neq c T(\mathbf{v})" />
                    for at least one choice of vectors <InlineMath math="\mathbf{v}" />, <InlineMath math="\mathbf{w}" /> or scalar <InlineMath math="c" />.
                    </p>

                    <p className="subsubsection-paragraph">In essence, non-linear transformations can distort the space in which the vectors reside, such as bending, twisting, or any change 
                    that doesn't maintain straight lines or uniform scaling. For instance, consider a vector 

                    <BlockMath math="\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}" />

                    a non-linear transformation might map it to 

                    <BlockMath math="T(\mathbf{v}) = \begin{bmatrix} \sin(v_1) \\ \sin(v_2) \end{bmatrix}" />

                    which does not preserve the linear relationships between its components. To see why this is not a linear transformation algebraically, we can consider the definition of a 
                    linear transformation; first, the property of additivity:</p>
                    
                    <div className="math-container">
                    <BlockMath math="T(\mathbf{v} + \mathbf{w}) = T\left(\begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix}\right) = \begin{bmatrix} \sin(v_1 + w_1) \\ \sin(v_2 + w_2) \end{bmatrix}" />
                    <BlockMath math="T(\mathbf{v}) + T(\mathbf{w}) = \begin{bmatrix} \sin(v_1) \\ \sin(v_2) \end{bmatrix} + \begin{bmatrix} \sin(w_1) \\ \sin(w_2) \end{bmatrix} = \begin{bmatrix} \sin(v_1) + \sin(w_1) \\ \sin(v_2) + \sin(w_2) \end{bmatrix}" />
                    <BlockMath math="T(\mathbf{v} + \mathbf{w}) \neq T(\mathbf{v}) + T(\mathbf{w})" />
                    </div>

                    <p className="subsubsection-paragraph">And next, the property of scalar multiplication:</p>

                    <div className="math-container">
                    <BlockMath math="T(c \mathbf{v}) = T\left(\begin{bmatrix} c v_1 \\ c v_2 \end{bmatrix}\right) = \begin{bmatrix} \sin(c v_1) \\ \sin(c v_2) \end{bmatrix}" />
                    <BlockMath math="c T(\mathbf{v}) = c \begin{bmatrix} \sin(v_1) \\ \sin(v_2) \end{bmatrix} = \begin{bmatrix} c \sin(v_1) \\ c \sin(v_2) \end{bmatrix}" />
                    <BlockMath math="T(c \mathbf{v}) \neq c T(\mathbf{v})" />
                    </div>
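
                    <p className="subsubsection-paragraph">To make the two inequalities concrete, here is a quick numerical check with values chosen purely for illustration. Looking at the first 
                    component only, take <InlineMath math="v_1 = w_1 = \pi/2" /> and <InlineMath math="c = 2" />:</p>

                    <div className="math-container">
                    <BlockMath math="\sin(v_1 + w_1) = \sin(\pi) = 0 \quad \text{but} \quad \sin(v_1) + \sin(w_1) = 1 + 1 = 2" />
                    <BlockMath math="\sin(c v_1) = \sin(\pi) = 0 \quad \text{but} \quad c \sin(v_1) = 2 \cdot 1 = 2" />
                    </div>

                    <p className="subsubsection-paragraph">Equality can still hold for particular inputs (for example when both vectors are the zero vector), but a single failing case like this one 
                    is enough to rule out linearity.</p>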

                    <p className="subsubsection-paragraph">These kinds of transformations allow us tons of flexibility if we were building a model to fit data. For example, if we had data that was cyclical over the year, we would probably
                    not want to use a linear model on such data but, as you might have guessed, linear models are limited to whatever flexibility that linear transformations allow. Models that have
                    non-linear components are much better at handling data that has some inherent non-linearities (which is most data). Neural networks have this in the form of activation functions as
                    you will learn later.
                    </p>


                </section>

                <section id="spaces" className="code-cleaned">
                    <h2>Vector Spaces</h2>
                    <p className="subsubsection-paragraph"></p>

                    <h4>Definitions</h4>
                    <p className="subsubsection-paragraph">Okay okay, I have mentioned vector spaces a couple of times, but what exactly are they? Well, the mathematical definition is as follows:

                    A <strong>vector space</strong> is a set <InlineMath math="V" /> along with two operations that adhere to specific rules. The elements of <InlineMath math="V" /> are called vectors. The two operations are:
                    <ol>
                        <li><strong>Vector addition</strong>: An operation that takes two vectors <InlineMath math="\mathbf{u}" /> and <InlineMath math="\mathbf{v}" /> in <InlineMath math="V" /> and 
                        assigns to them a third vector <InlineMath math="\mathbf{u} + \mathbf{v}" /> in <InlineMath math="V" />.</li>
                        <li><strong>Scalar multiplication</strong>: An operation that takes a scalar <InlineMath math="c" /> and a vector <InlineMath math="\mathbf{v}" /> in <InlineMath math="V" /> 
                        and assigns to them a vector <InlineMath math="c\mathbf{v}" /> in <InlineMath math="V" />.</li>
                    </ol>
                    Notice that these are the same two operations that linear transformations preserve in the previous section. 
                    These operations must satisfy the following axioms, for all vectors <InlineMath math="\mathbf{u}, \mathbf{v}, \mathbf{w}" /> in <InlineMath math="V" /> and all 
                    scalars <InlineMath math="a, b" />:
                    <ol>
                        <li><strong>Closure under addition</strong>: <InlineMath math="\mathbf{u} + \mathbf{v}" /> is in <InlineMath math="V" />.</li>
                        <li><strong>Commutativity of addition</strong>: <InlineMath math="\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}" />.</li>
                        <li><strong>Associativity of addition</strong>: <InlineMath math="(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})" />.</li>
                        <li><strong>Existence of additive identity</strong>: There exists an element <InlineMath math="\mathbf{0}" /> in <InlineMath math="V" />, such that <InlineMath math="\mathbf{v} + \mathbf{0} = \mathbf{v}" /> for all <InlineMath math="\mathbf{v}" /> in <InlineMath math="V" />.</li>
                        <li><strong>Existence of additive inverses</strong>: For each <InlineMath math="\mathbf{v}" /> in <InlineMath math="V" />, there exists 
                        a <InlineMath math="-\mathbf{v}" /> in <InlineMath math="V" /> such that <InlineMath math="\mathbf{v} + (-\mathbf{v}) = \mathbf{0}" />.</li>
                        <li><strong>Closure under scalar multiplication</strong>: <InlineMath math="a\mathbf{v}" /> is in <InlineMath math="V" />.</li>
                        <li><strong>Associativity of scalar multiplication</strong>: <InlineMath math="a(b\mathbf{v}) = (ab)\mathbf{v}" />.</li>
                        <li><strong>Identity of scalar multiplication</strong>: <InlineMath math="1\mathbf{v} = \mathbf{v}" />.</li>
                        <li><strong>Distributive properties</strong>: <InlineMath math="a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}" /> and <InlineMath math="(a + b)\mathbf{v} = a\mathbf{v} + b\mathbf{v}" />.</li>
                    </ol>
                </p>
                <p className="subsubsection-paragraph">
                    Intuitively, a vector space can be considered as a collection of objects (vectors) that can be scaled and added together in a consistent and structured way. In this space:
                    <ul>
                        <li>Vectors can represent various mathematical and physical concepts like points in space, functions, or more abstract entities.</li>
                        <li>The rules for vector addition and scalar multiplication align with our basic understanding of these operations from algebra.</li>
                        <li>The axioms ensure predictable and structured behavior of vector addition and scalar multiplication.</li>
                    </ul>
                    Basically, vector spaces are just like the "setting" in which many operations can take place and these operations must adhere to certain axioms (outlined above).
                </p>
                <p className="subsubsection-paragraph">
                    Examples of vector spaces include:
                    <ul>
                        <li>The set of all two-dimensional vectors, with vector addition corresponding to component-wise addition and scalar multiplication scaling the vectors.</li>
                        <li>The set of all polynomials of a certain degree or less.</li>
                        <li>Function spaces, where vectors are functions.</li>
                    </ul>
                    Vector spaces provide a framework for dealing with linear equations, transformations, and complex structures like inner product spaces and Hilbert spaces -- just ignore all 
                    this if you don't know it. More importantly, vector spaces play a crucial role in NLP, such as in semantic analysis and representation of words, 
                    phrases, and documents. One of the most significant applications of vector spaces in NLP is through word embeddings. Word embeddings are a type of word representation that 
                    allows words with similar meaning to have a similar representation. They are essentially vector space models where each word in a language is assigned to a high-dimensional 
                    vector (typically in a space with hundreds of dimensions). 
                </p>

                <h4>Span & Linear Independence</h4>
                <p className="subsubsection-paragraph">
                    The concepts of <strong>span</strong> and <strong>linear independence</strong> are fundamental in understanding vector spaces.
                </p>
                <p className="subsubsection-paragraph">
                    <strong>Span:</strong> The span of a set of vectors in a vector space is the set of all possible linear combinations of those vectors. Mathematically, if we have a set of 
                    vectors <InlineMath math="\{ \mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_n \}" />, the span is the set of all vectors that can be 
                    expressed as <InlineMath math="a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + ... + a_n\mathbf{v}_n" />, where <InlineMath math="a_1, a_2, ..., a_n" /> are scalars. The span 
                    represents all the points that can be reached within a vector space by linearly combining the original set of vectors. Linear combinations are important in their own 
                    right, so make sure you know what they are: any expression of the form

                    <BlockMath math="a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + ... + a_n\mathbf{v}_n" /> where <InlineMath math="a_1, a_2, ..., a_n" /> are scalars, is a linear combination.
                </p>
                <p className="subsubsection-paragraph">
                    <strong>Linear Independence:</strong> A set of vectors is said to be linearly independent if no vector in the set can be written as a linear combination of the others. In formal terms, 
                    a set of vectors <InlineMath math="\{ \mathbf{u}_1, \mathbf{u}_2, ..., \mathbf{u}_m \}" /> is linearly independent if the only solution to 
                    the equation <InlineMath math="c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + ... + c_m\mathbf{u}_m = \mathbf{0}" /> (where <InlineMath math="\mathbf{0}" /> is the 
                    zero vector) is <InlineMath math="c_1 = c_2 = ... = c_m = 0" />. If the equation also has a solution in which at least one <InlineMath math="c_i" /> is non-zero, then the vectors are linearly dependent.
                </p>
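                <p className="subsubsection-paragraph">
                    As a small worked example (the vectors are picked just for illustration), the vectors
                    <BlockMath math="\mathbf{u}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 2 \\ 4 \end{bmatrix}" />
                    are linearly dependent, because a combination with non-zero coefficients reaches the zero vector:
                    <BlockMath math="2\mathbf{u}_1 - \mathbf{u}_2 = \begin{bmatrix} 2 - 2 \\ 4 - 4 \end{bmatrix} = \mathbf{0}" />
                    Their span is therefore only a line through the origin. By contrast, for the vectors
                    <BlockMath math="\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}" />
                    the equation <InlineMath math="c_1\mathbf{e}_1 + c_2\mathbf{e}_2 = \mathbf{0}" /> forces <InlineMath math="c_1 = c_2 = 0" />, so they are linearly independent and their span 
                    is the whole two-dimensional plane.
                </p>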
                <p className="subsubsection-paragraph">
                    These concepts are crucial in understanding the structure and dimension of vector spaces which, by extension, is relevant to NLP.  In particular, having linearly independent 
                    features (e.g., word embeddings or other linguistic features) is crucial for reducing redundancy in data representation. It helps in building more efficient and effective models.
                    Moreover, techniques like Principal Component Analysis (PCA) or Singular Value Decomposition (SVD) are used in NLP to reduce the dimensionality of feature spaces 
                    (like high-dimensional word vectors) while retaining as much information as possible (you'll learn more about these very soon). These techniques rely on the concept of linear 
                    independence to identify the most informative axes or dimensions.
                </p>




                <h4>Orthogonality & Orthonormality</h4>
                <p className="subsubsection-paragraph">
                    Orthogonality and Orthonormality are key concepts in linear algebra with significant implications in various fields, including NLP.
                </p>
                <p className="subsubsection-paragraph">
                    <strong>Orthogonality:</strong> Two vectors are orthogonal if their dot product (i.e. the sum of the element-wise products of the two vectors) is zero. 
                    In mathematical terms, vectors <InlineMath math="\mathbf{u}" /> and <InlineMath math="\mathbf{v}" /> are 
                    orthogonal if <InlineMath math="\mathbf{u} \cdot \mathbf{v} = 0" />. In a geometric sense, this means they are at right angles 
                    to each other. Orthogonality is a critical concept in vector spaces, indicating a kind of independence between vectors.
                </p>
                <p className="subsubsection-paragraph">
                    <strong>Orthonormality:</strong> A set of vectors is orthonormal if the vectors in the set are mutually orthogonal and each vector is of unit length (norm equals 1). 
                    Mathematically, for any two vectors <InlineMath math="\mathbf{u}" /> and <InlineMath math="\mathbf{v}" /> in an orthonormal 
                    set, <InlineMath math="\mathbf{u} \cdot \mathbf{v} = 0" /> if <InlineMath math="\mathbf{u} \neq \mathbf{v}" /> and <InlineMath math="\|\mathbf{u}\| = 1" />.
                </p>
                <p className="subsubsection-paragraph">
                    In NLP, these concepts are relevant in:
                    <ul>
                        <li><strong>Word Embeddings:</strong> Orthogonal or orthonormal vectors can represent word embeddings in a way that reduces redundancy and improves the interpretability of the embeddings. This helps in tasks like semantic analysis, where the goal is to understand and process the meanings of words and phrases.</li>
                        <li><strong>Dimensionality Reduction:</strong> Techniques like PCA (Principal Component Analysis) often aim for orthonormality in the transformed feature space to maintain the 
                        independence of features and reduce dimensionality without losing significant information.</li>
                        <li><strong>Model Optimization:</strong> In designing and optimizing NLP models, ensuring orthogonal or orthonormal features can lead to more stable and efficient training processes,
                         as these features are less likely to contain redundant information.</li>
                    </ul>
                </p>


                <h4>Inner Products & Cosine Similarity</h4>
                <p className="subsubsection-paragraph">
                    The concepts of <strong>inner products</strong> and <strong>cosine similarity</strong> are crucial in linear algebra and have significant applications in NLP.
                </p>
                <p className="subsubsection-paragraph">
                    <strong>Inner Products:</strong> An inner product is a generalization of the dot product. In a vector space, the inner product between two 
                    vectors <InlineMath math="\mathbf{u}" /> and <InlineMath math="\mathbf{v}" /> is a scalar that provides a measure of how much one vector extends in the direction of another. 
                    It is often written as <InlineMath math="\langle \mathbf{u}, \mathbf{v} \rangle" />, and it can be used to define lengths (norms) and angles 
                    between vectors. For real vectors with <InlineMath math="n" /> components, the standard inner product is the dot product:

                    <BlockMath math="\langle \mathbf{u}, \mathbf{v} \rangle = u_1v_1 + u_2v_2 + \ldots + u_nv_n" />
                    where <InlineMath math="\mathbf{u} = (u_1, u_2, \ldots, u_n)" /> and <InlineMath math="\mathbf{v} = (v_1, v_2, \ldots, v_n)" />.
                </p>
                <p className="subsubsection-paragraph">
                    <strong>Cosine Similarity:</strong> Cosine similarity is a measure of similarity between two vectors that is derived from the inner product. It is calculated as the cosine of the angle 
                    between the two vectors, which can be found using the dot product and the magnitudes (norms) of the vectors. Cosine similarity is especially useful when the magnitude of the 
                    vectors is not of interest but rather their orientation or direction. Mathematically, it is expressed as 
                    
                    <BlockMath math="\cos(\theta) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}" />
                </p>
                <p className="subsubsection-paragraph">
                    By the way, in case you were wondering, the <b>norm</b> of a vector <InlineMath math="\mathbf{v}" /> is the measure of its magnitude and is a function of the inner product:
                    <BlockMath math="\| \mathbf{v} \| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}" />
                    This represents the length of vector <InlineMath math="\mathbf{v}" /> in Euclidean space. </p>
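                <p className="subsubsection-paragraph">
                    Here is a tiny worked example tying the inner product, the norm, and cosine similarity together. The two vectors are made up for illustration and are not real word embeddings. Take
                    <BlockMath math="\mathbf{u} = (1, 0), \quad \mathbf{v} = (1, 1)" />
                    Then
                    <BlockMath math="\langle \mathbf{u}, \mathbf{v} \rangle = 1 \cdot 1 + 0 \cdot 1 = 1, \quad \|\mathbf{u}\| = \sqrt{1^2 + 0^2} = 1, \quad \|\mathbf{v}\| = \sqrt{1^2 + 1^2} = \sqrt{2}" />
                    <BlockMath math="\cos(\theta) = \frac{1}{1 \cdot \sqrt{2}} \approx 0.707" />
                    which corresponds to an angle of 45 degrees. Replacing <InlineMath math="\mathbf{v}" /> with <InlineMath math="(10, 10)" /> changes its norm but leaves the cosine similarity 
                    unchanged, which is why this measure is useful when only direction matters.
                </p>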
                    
                    <p className="subsubsection-paragraph">In NLP, these concepts are important for:
                    <ul>
                        <li><strong>Measuring Semantic Similarity:</strong> Cosine similarity is extensively used in NLP to measure the similarity between different word embeddings, capturing how closely related two words or documents are in terms of their meaning or context.</li>
                        <li><strong>Text Classification and Clustering:</strong> By using inner product and cosine similarity, machine learning models can classify or cluster texts based on their semantic similarity, aiding in tasks like document retrieval, sentiment analysis, and topic modeling.</li>
                        <li><strong>Word Embedding Optimization:</strong> These measures help in refining and assessing the quality of word embeddings, ensuring that words with similar meanings have similar vector representations.</li>
                    </ul>
                    We will see that cosine similarity is extremely popular and shows up numerous times, particularly when comparing embeddings -- it's just such a common sense way of comparing two 
                    vectors within a vector space!
                </p>


                <h4>Gram-Schmidt Process</h4>
                <p className="subsubsection-paragraph">
                    The <strong>Gram-Schmidt Process</strong> is a fundamental method in linear algebra used for orthogonalizing a set of vectors in an inner product space, and it plays a key role in 
                    constructing orthonormal bases in vector spaces.
                </p>
                <p className="subsubsection-paragraph">
                    This process transforms a set of linearly independent vectors into a set of orthogonal vectors that span the same space. It begins by taking the first vector as it is. 
                    For each subsequent vector, the process removes the components that are parallel to the already constructed vectors. 
                    This is achieved through projection and subtraction: a vector is projected onto each previously constructed vector, and these projections are subtracted from the original vector to ensure orthogonality. 
                    If an orthonormal set is wanted, each resulting vector is finally divided by its norm. </p>

                    <p className="subsubsection-paragraph">
                    Mathematically, for a set of vectors <InlineMath math="\{ \mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_n \}" />, the Gram-Schmidt Process generates a set of orthogonal 
                    vectors <InlineMath math="\{ \mathbf{u}_1, \mathbf{u}_2, ..., \mathbf{u}_n \}" /> where 
                    <BlockMath math="\mathbf{u}_1 = \mathbf{v}_1" /> 
                    and for <InlineMath math="i > 1" />, <BlockMath math="\mathbf{u}_i = \mathbf{v}_i - \sum_{j=1}^{i-1} \text{proj}_{\mathbf{u}_j} \mathbf{v}_i" />
                    with <BlockMath math="\text{proj}_{\mathbf{u}} \mathbf{v} = \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle} \mathbf{u}" /> representing 
                    the projection of <InlineMath math="\mathbf{v}" /> onto <InlineMath math="\mathbf{u}" />.
                </p>
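                <p className="subsubsection-paragraph">
                    As a worked example of the formulas above, with two vectors chosen only for illustration, let
                    <BlockMath math="\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}" />
                    Then <InlineMath math="\mathbf{u}_1 = \mathbf{v}_1" /> and
                    <BlockMath math="\mathbf{u}_2 = \mathbf{v}_2 - \frac{\langle \mathbf{v}_2, \mathbf{u}_1 \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle} \mathbf{u}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1/2 \\ -1/2 \end{bmatrix}" />
                    A quick check confirms orthogonality: <InlineMath math="\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = 1/2 - 1/2 = 0" />. Dividing each vector by its norm 
                    (<InlineMath math="\|\mathbf{u}_1\| = \sqrt{2}" /> and <InlineMath math="\|\mathbf{u}_2\| = 1/\sqrt{2}" />) gives an orthonormal pair, which I'll 
                    call <InlineMath math="\mathbf{e}_1" /> and <InlineMath math="\mathbf{e}_2" /> here:
                    <BlockMath math="\mathbf{e}_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{e}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}" />
                </p>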
                <p className="subsubsection-paragraph">
                    Intuitively, this process can be thought of as a way to straighten and align vectors so that they are perpendicular to each other, much like organizing a set of tools in 
                    orthogonal directions for efficiency and ease of access. This organization makes it easier to understand their relationships and utilize them effectively in various mathematical
                     operations. Or, another way to put it is that we have some linearly independent vectors <InlineMath math="\{ \mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_n \}" /> and we want to find another 
                     set of vectors <InlineMath math="\{ \mathbf{u}_1, \mathbf{u}_2, ..., \mathbf{u}_n \}" /> that forms an orthogonal (or, after normalization, orthonormal) basis for the same span, 
                     i.e. you can get the original vectors back as linear combinations of the new vectors.
                </p>
                <p className="subsubsection-paragraph">
                    In NLP, the Gram-Schmidt Process is most relevant when working in high-dimensional spaces such as those of word embeddings, where it can be used to orthogonalize a set of 
                    feature directions. Orthogonal features do not overlap in the directions they cover, which reduces redundancy and can help a model make use of what each feature contributes 
                    on its own. Orthogonal and orthonormal bases also underpin the dimensionality reduction techniques covered later on this page, which matter in NLP for managing 
                    computational cost while retaining as much of the original information as possible. More generally, whenever an NLP task calls for a structured, orthogonal feature set,
                     the Gram-Schmidt Process offers a systematic method for constructing an orthonormal basis for it.
                </p>



                </section>

                <section id="eigen" className="code-cleaned">
                <h2>Eigenvalues & Eigenvectors</h2>
                <p className="subsubsection-paragraph">
                    Eigenvalues and Eigenvectors are key concepts in linear algebra with significant implications in various fields, including NLP.
                </p>

                <h4>Definitions</h4>
                <p className="subsubsection-paragraph">
                    An <strong>eigenvalue</strong> of a matrix <InlineMath math="A" /> is a scalar <InlineMath math="\lambda" /> such that there exists a non-zero 
                    vector <InlineMath math="\mathbf{v}" />, known as an eigenvector, which satisfies the equation

                    <BlockMath math="A\mathbf{v} = \lambda\mathbf{v}" />
                    
                    This relationship implies that the action of the matrix on the eigenvector <InlineMath math="\mathbf{v}" /> results in a vector that is a scalar multiple of <InlineMath math="\mathbf{v}" />.
                </p>
                <p className="subsubsection-paragraph">
                    <strong>Eigenvectors</strong> of a matrix <InlineMath math="A" /> are non-zero vectors that, upon multiplication by <InlineMath math="A" />, stay on the same line through the origin: they are only scaled, not rotated. 
                    The corresponding eigenvalue <InlineMath math="\lambda" /> is the scale factor. A negative eigenvalue also reverses the vector's direction along that line.
                </p>
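                <p className="subsubsection-paragraph">
                    To make the definition concrete, here is a small example with a matrix picked only for illustration:
                    <BlockMath math="A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad A \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad A \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 1 \begin{bmatrix} 1 \\ -1 \end{bmatrix}" />
                    So <InlineMath math="(1, 1)" /> is an eigenvector with eigenvalue <InlineMath math="3" />, and <InlineMath math="(1, -1)" /> is an eigenvector with 
                    eigenvalue <InlineMath math="1" />. The same eigenvalues come out of the characteristic equation
                    <BlockMath math="\det(A - \lambda I) = (2 - \lambda)^2 - 1 = 0" />
                    whose roots are <InlineMath math="\lambda = 1" /> and <InlineMath math="\lambda = 3" />.
                </p>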

                <h4>Geometry</h4>
                <p className="subsubsection-paragraph">
                    Geometrically, an eigenvector of a matrix represents a direction along which the transformation (described by the matrix) only scales vectors and does not rotate them.
                     The eigenvalue gives the scale factor of this stretching or compressing. In two dimensions, this can be visualized as vectors on a plane getting stretched or 
                     compressed along specific lines determined by the eigenvectors. A vector on one of these lines stays on it, and is flipped to the opposite side of the origin only if the eigenvalue is negative.
                </p>

                <p className="subsubsection-paragraph">
                    In NLP, eigenvalues and eigenvectors are used in techniques like Latent Semantic Analysis (LSA) and Principal Component Analysis (PCA) for dimensionality 
                    reduction and semantic space construction. By identifying principal components in word vector spaces, these concepts help in extracting meaningful semantic relationships 
                    from large text corpora, which can improve the efficiency and accuracy of various NLP tasks. As an aside, the prefix "eigen" comes from German, where it means "own" or "characteristic".
                </p>


                </section>

                <section id="comps" className="code-cleaned">
                    <h2>Decompositions & Factorizations</h2>
                    <p className="subsubsection-paragraph"></p>

                    <h4>Singular Value Decomposition (SVD)</h4>
                    <p className="subsubsection-paragraph">
                        Singular Value Decomposition, or SVD, is a fundamental concept in linear algebra and has widespread applications in statistics, signal processing, and machine learning, 
                        including NLP. It provides a way to decompose a matrix into simpler, constituent parts, which can be very useful for analysis and data compression. 
                    </p>

                    <p className="subsubsection-paragraph">
                        The SVD of a matrix <InlineMath math="A" /> is given by:
                        <BlockMath math="A = U \Sigma V^*" />
                        where <InlineMath math="A" /> is an <InlineMath math="m \times n" /> matrix. The components of the SVD are:
                        <ul>
                            <li><b>U:</b> An <InlineMath math="m \times m" /> orthogonal matrix (unitary if <InlineMath math="A" /> is complex), where the columns of <InlineMath math="U" />, known as left-singular vectors, are eigenvectors of <InlineMath math="AA^*" />.</li>
                            <li><b><InlineMath math="\Sigma"/>:</b> An <InlineMath math="m \times n" /> rectangular diagonal matrix. The diagonal entries of <InlineMath math="\Sigma" /> are known as singular values. The non-zero singular values are the 
                            square roots of the non-zero eigenvalues of <InlineMath math="A^*A" /> and <InlineMath math="AA^*" />, which the two matrices share. These values are non-negative and are usually arranged in descending order.</li>
                            <li><b>V* (V conjugate transpose):</b> An <InlineMath math="n \times n" /> orthogonal matrix (unitary if <InlineMath math="A" /> is complex). The columns of <InlineMath math="V" />, called right-singular vectors, are 
                            eigenvectors of <InlineMath math="A^*A" />. The asterisk (*) represents the conjugate transpose (or simply the transpose for real-valued matrices).</li>
                        </ul>
                        If all of this sounds like gibberish, here is a more intuitive explanation: the columns of the first matrix, <InlineMath math="U"/>, are ordered so that the first column is more 
                        "important" than the ones following it, and the columns paired with non-zero singular values provide a basis for the column space of <InlineMath math="A" />. The diagonal entries of <InlineMath math="\Sigma"/> are decreasing, which lets us 
                        approximate the original matrix by ignoring some of the trailing singular values in <InlineMath math="\Sigma"/>... you can see how this would help with reducing dimensionality. The last object in 
                        our decomposition, <b>V*</b>, provides the weights that combine the basis and importance from the prior two components. For example, the first column of <b>V*</b> tells us how much 
                        we need of each singular value in <InlineMath math="\Sigma"/> and its matching left-singular vector in <b>U</b> to make the first column of <InlineMath math="A" />. The exact interpretation of each of these components is context dependent. 
                        This decomposition is guaranteed to exist. The singular values are unique, and for a real matrix whose non-zero singular values are all distinct, the corresponding singular vectors are unique up to a sign flip (+/-). 
                    </p>
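                    <p className="subsubsection-paragraph">
                        The idea of dropping trailing singular values can be written out explicitly. As a sketch, let <InlineMath math="\mathbf{u}_i" /> and <InlineMath math="\mathbf{v}_i" /> denote 
                        the <InlineMath math="i" />-th columns of <InlineMath math="U" /> and <InlineMath math="V" />, let <InlineMath math="\sigma_i" /> be the <InlineMath math="i" />-th singular value, 
                        and let <InlineMath math="r" /> be the number of non-zero singular values (this indexing notation is introduced here only for illustration). The decomposition is then a sum of 
                        rank-one pieces, and keeping only the first <InlineMath math="k" /> of them gives an approximation of <InlineMath math="A" />:
                        <BlockMath math="A = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^*, \qquad A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^*, \quad k \le r" />
                        For instance, with a diagonal matrix chosen for simplicity,
                        <BlockMath math="A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \quad \Rightarrow \quad A_1 = 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 0 \end{bmatrix}" />
                        so the rank-one approximation keeps the dominant direction and discards only the part scaled by the smaller singular value, <InlineMath math="\sigma_2 = 1" />.
                    </p>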

                    <p className="subsubsection-paragraph">
                        SVD is incredibly valuable for its ability to provide insight into the structure of a matrix. It's often used in data reduction, noise reduction, and for solving linear 
                        systems. In the context of NLP, SVD is a core component of algorithms like Latent Semantic Analysis (LSA), where it helps in identifying patterns and concepts hidden in 
                        large text corpora by reducing the dimensions of word-feature matrices while preserving the most critical information. You'll also see its relation to PCA in 
                        the next section.
                    </p>

                    <h4>Principal Component Analysis</h4>
                    <p className="subsubsection-paragraph">
                        Principal Component Analysis, or PCA, is a statistical procedure used in machine learning and data science to reduce the complexity of high-dimensional data while retaining its main trends and patterns. It is particularly useful for exploratory data analysis and for building predictive models.
                    </p>

                    <p className="subsubsection-paragraph">
                        PCA transforms the original data into a new set of variables, the principal components, which are orthogonal (uncorrelated), and which capture the maximum variance in the data.
                        Mathematically, PCA involves the following steps:
                            <ol>
                                <li><strong>Standardization:</strong> Given a dataset with variables <InlineMath math="X_1, X_2, ..., X_n" />, each variable is standardized to have zero mean and unit variance. This is essential to compare variables on the same scale.</li>
                                <li><strong>Covariance Matrix:</strong> Compute the covariance matrix of the standardized data. The covariance matrix <InlineMath math="C" /> reflects how changes in one variable are associated with changes in another.</li>
                                <li><strong>Eigenvalue Decomposition:</strong> The eigenvalues and eigenvectors of the covariance matrix are computed. The eigenvectors represent the directions of maximum variance, and the eigenvalues represent the magnitude of this variance in each principal component.</li>
                                <li><strong>Projection:</strong> The standardized data points are projected onto the subspace spanned by the leading principal components. This is done by multiplying the standardized data matrix by the matrix whose columns are the chosen eigenvectors.</li>
                            </ol>
                    </p>
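                    <p className="subsubsection-paragraph">
                        In symbols, the steps above can be sketched as follows. The names are assumptions introduced for illustration: <InlineMath math="Z" /> is the standardized data matrix 
                        with <InlineMath math="N" /> rows (one per observation), <InlineMath math="\mathbf{w}_i" /> and <InlineMath math="\lambda_i" /> are the eigenvectors and eigenvalues of the 
                        covariance matrix <InlineMath math="C" />, and <InlineMath math="W_k" /> is the matrix whose columns are the <InlineMath math="k" /> eigenvectors with the largest eigenvalues.
                        <BlockMath math="C = \frac{1}{N-1} Z^T Z, \qquad C \mathbf{w}_i = \lambda_i \mathbf{w}_i, \qquad T = Z W_k" />
                        Each row of <InlineMath math="T" /> holds the coordinates of one observation in the <InlineMath math="k" />-dimensional principal component space, and the 
                        ratio <InlineMath math="\lambda_i / \sum_j \lambda_j" /> is the fraction of the total variance captured by component <InlineMath math="i" />.
                    </p>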

                    <p className="subsubsection-paragraph">
                        PCA is closely related to Singular Value Decomposition. In fact, PCA can be performed through SVD. When SVD is applied to the standardized data 
                        matrix <InlineMath math="X" /> (one row per observation), it decomposes <InlineMath math="X" /> into three matrices <InlineMath math="U, \Sigma, V^*" />, where <InlineMath math="X = U \Sigma V^*" />. 
                        The columns of <InlineMath math="V" /> (right-singular vectors) are the principal directions of <InlineMath math="X" />. The squared singular values 
                        in <InlineMath math="\Sigma" /> are the eigenvalues of <InlineMath math="X^TX" />, which equals the covariance 
                        matrix of <InlineMath math="X" /> up to a factor of <InlineMath math="1/(N-1)" /> when <InlineMath math="X" /> is standardized and has <InlineMath math="N" /> rows. Therefore, PCA via SVD gives a numerically robust and computationally efficient method for 
                        dimensionality reduction, since the covariance matrix never has to be formed explicitly.
                    </p>
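                    <p className="subsubsection-paragraph">
                        The link between the two can be checked in a few lines. This is a sketch that assumes <InlineMath math="X" /> is real, already centered, and 
                        has <InlineMath math="N" /> rows, one per observation (<InlineMath math="N" />, <InlineMath math="\sigma_i" />, and <InlineMath math="\lambda_i" /> are symbols introduced here for illustration). 
                        Substituting <InlineMath math="X = U \Sigma V^T" /> and using <InlineMath math="U^T U = I" /> gives
                        <BlockMath math="X^T X = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T" />
                        <BlockMath math="C = \frac{1}{N-1} X^T X = V \left( \frac{\Sigma^T \Sigma}{N-1} \right) V^T \quad \Rightarrow \quad \lambda_i = \frac{\sigma_i^2}{N-1}" />
                        so the columns of <InlineMath math="V" /> are eigenvectors of the covariance matrix, and each eigenvalue <InlineMath math="\lambda_i" /> is the squared singular 
                        value <InlineMath math="\sigma_i^2" /> divided by <InlineMath math="N-1" />. The projected data follows directly as <InlineMath math="XV = U \Sigma" />.
                    </p>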

                    <p className="subsubsection-paragraph">
                        PCA is widely used for exploratory data analysis and dimensionality reduction in machine learning. It helps in visualizing high-dimensional data and removing multicollinearity. 
                         In NLP, it is often applied to high-dimensional representations such as word embeddings, making the data 
                         easier to work with and visualize, and in some cases improving the performance of downstream machine learning algorithms.
                    </p>

                    <h4>Matrix Factorization</h4>

                    <p className="subsubsection-paragraph">
                        Non-negative Matrix Factorization (NMF) is a group of algorithms in multivariate analysis where a data matrix is factorized into usually two matrices, with the 
                        constraint that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect and is useful for data with inherent 
                        non-negativity like images and text data.
                    </p>


                    <p className="subsubsection-paragraph">
                        Given a non-negative matrix <InlineMath math="V" />, NMF aims to find two non-negative matrices <InlineMath math="W" /> and <InlineMath math="H" /> such that:
                        <BlockMath math="V \approx W \times H" />
                        Here, <InlineMath math="V" /> is typically an <InlineMath math="n \times m" /> matrix, where <InlineMath math="n" /> is the number of features 
                        and <InlineMath math="m" /> is the number of samples. The matrix <InlineMath math="W" /> is an <InlineMath math="n \times k" /> matrix, 
                        and <InlineMath math="H" /> is a <InlineMath math="k \times m" /> matrix, where <InlineMath math="k" /> is the number of components (or latent features) we choose to extract.
                    </p>

                    <p className="subsubsection-paragraph"> <div className="flex-container"> <img src={nmf} alt="Broken" className="image-small" style={{ width: '70%' }}/></div></p>
                   
                    <p className="subsubsection-paragraph">
                        The goal of NMF is to approximate the original matrix <InlineMath math="V" /> with the product of <InlineMath math="W" /> and <InlineMath math="H" />, under the constraint 
                        that all elements in <InlineMath math="V" />, <InlineMath math="W" />, and <InlineMath math="H" /> are non-negative. This constraint leads to a parts-based representation
                         because it allows only additive, not subtractive, combinations.
                    </p>
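                    <p className="subsubsection-paragraph">
                        The text above does not say how <InlineMath math="W" /> and <InlineMath math="H" /> are found. One common formulation, given here as an illustrative assumption rather than 
                        the only option, is to minimize the squared Frobenius norm of the reconstruction error subject to the non-negativity constraints:
                        <BlockMath math="\min_{W \ge 0,\, H \ge 0} \lVert V - WH \rVert_F^2" />
                        A well-known way to approach this problem is the pair of multiplicative update rules due to Lee and Seung, where <InlineMath math="\odot" /> is the element-wise product and 
                        the fractions are element-wise divisions:
                        <BlockMath math="H \leftarrow H \odot \frac{W^T V}{W^T W H}, \qquad W \leftarrow W \odot \frac{V H^T}{W H H^T}" />
                        Because every factor in these updates is non-negative, <InlineMath math="W" /> and <InlineMath math="H" /> stay non-negative as long as they are initialized that way.
                    </p>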


                    <p className="subsubsection-paragraph">
                        In data analysis, NMF is used for dimensionality reduction and feature extraction, similar to PCA and SVD. However, its non-negativity constraint often makes the results
                         more interpretable. In NLP, NMF is particularly useful in topic modeling, where it helps in discovering the latent topics in a corpus of text. With words as the rows (features) and documents as the columns (samples) 
                         of <InlineMath math="V" />, as laid out above, each column of <InlineMath math="W" /> is a component (or topic) that can be viewed, after normalization, as a distribution over words, while each column of <InlineMath math="H" /> represents the composition of the topics in a
                          document.
                    </p>


                </section>

                
                
                <div className="subsubsection-navigation">
                    <Link to="/foundations/python">← Python</Link>
                    <Link to="/foundations/calc">Calculus →</Link>
                </div>
            </main>
            
            <Footer />
        </div>
    );
}

export default LinearAlgebra;
