线性代数 · Interview Prep

🧩 Matrix Partitioning and Decomposition: Complete Summary

Matrix partitioning refers to dividing or decomposing a matrix either by structure (into blocks) or by algebraic factorization (into special matrices).

It’s fundamental in linear algebra, optimization, and data science.

1️⃣ Structural Partitioning (Block Matrices)

Given a matrix $A \in \mathbb{R}^{m \times n}$, it can be partitioned as:

$$A =

\begin{bmatrix}

A_{11} & A_{12} \

A_{21} & A_{22}

\end{bmatrix},

\quad A_{ij} \text{ are submatrices.}$$

Common Forms

Row-wise partition:

$$A = [A_1^\top, A_2^\top, \dots, A_k^\top]^\top$$

Column-wise partition:

$$A = [A_1, A_2, \dots, A_k]$$

Block-diagonal:

A =

\begin{bmatrix}

A_{11} & 0 & 0 \

0 & A_{22} & 0 \

0 & 0 & A_{33}

\end{bmatrix}

Often used in covariance matrices or sparse systems.

Hierarchical (H-Matrix):

Recursively subdivides $A$ into smaller blocks, each approximated by low-rank matrices to accelerate inversion or multiplication.

Example: Schur Complement

If $A_{22}$ is invertible,

A / A_{22} = A_{11} - A_{12}A_{22}^{-1}A_{21}.

Used in block LU decomposition and Gaussian elimination.

2️⃣ Algebraic Matrix Decomposition

These express $A$ as products or sums of structured matrices for easier computation or interpretation.

(1) LU Decomposition

$$A = LU$$

$L$: lower-triangular

$U$: upper-triangular

→ Solving linear systems, determinant computation.

(2) QR Decomposition

$$A = QR$$

$Q$: orthogonal ($Q^\top Q = I$)

$R$: upper-triangular

→ Least-squares, numerical stability.

(3) Cholesky Decomposition

For symmetric positive-definite $A$:

$$A = LL^\top$$

→ Covariance matrices, Kalman filters.

(4) Eigenvalue Decomposition

$$A = P D P^{-1}$$

$D$: diagonal of eigenvalues

$P$: eigenvectors

→ Spectral analysis, diagonalization.

(5) Singular Value Decomposition (SVD)

$$A = U \Sigma V^\top$$

$U,V$: orthogonal

$\Sigma$: diagonal with non-negative singular values

→ PCA, compression, recommender systems.

(6) Polar Decomposition

$$A = QP$$

$Q$: orthogonal

$P$: symmetric positive semidefinite

→ Analogous to magnitude-direction in complex numbers.

(7) Low-Rank Factorization

$$A \approx UV^\top,\quad U\in\mathbb{R}^{m\times r},;V\in\mathbb{R}^{n\times r},;r\ll\min(m,n)$$

→ Used in large-scale learning, recommender systems, and matrix completion.

(8) Nonnegative Matrix Factorization (NMF)

$$A \approx WH, \quad W,H \ge 0$$

→ Topic modeling, interpretable features, signal separation.

(9) Block LU / QR Factorization

Combines structure and decomposition:

$$ A = \begin{bmatrix} A_{11} & A_{12} \ A_{21} & A_{22} \end{bmatrix}

\begin{bmatrix}

I & 0 \

A_{21}A_{11}^{-1} & I

\end{bmatrix}

\begin{bmatrix}

A_{11} & A_{12} \

0 & S

\end{bmatrix},

where $$S = A_{22} - A_{21}A_{11}^{-1}A_{12}.$$

(10) Tensor and Multiway Decompositions

For multidimensional (tensor) data:

\text{Tucker: } \mathcal{X} = \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3,

\quad

\text{CP: } \mathcal{X} = \sum_{r=1}^R a_r \otimes b_r \otimes c_r.

→ Used in multimodal, spatiotemporal, and recommendation tasks.

3️⃣ Summary Table

Type	Form	Applications
Structural (block)	$A = \begin{bmatrix}A_{11}&A_{12}\A_{21}&A_{22}\end{bmatrix}$	Schur complement, sparse systems
Hierarchical	H-matrix	Fast inversion (PDEs)
LU / QR / Cholesky	$A=LU,\ A=QR,\ A=LL^\top$	Linear systems, least squares
Eigen / SVD	$A=PDP^{-1},\ A=U\Sigma V^\top$	PCA, spectral analysis
Low-rank / NMF	$A\approx UV^\top,\ A\approx WH$	Recommender systems, topic modeling
Block factorization	Block LU / QR	Structured numerical algorithms
Tensor decomposition	Tucker / CP	Multimodal and temporal data

4️⃣ Key Insights

Structural partitioning exploits sparsity and modularity.

Algebraic decomposition provides interpretable transformations and stability.

Modern algorithms often combine both:

block-sparse structures + low-rank approximations.

Would you like me to generate a Notion-friendly diagram (ASCII or Mermaid) showing how matrix partitioning and decomposition relate (e.g., a tree or flowchart)?

这两个问题同样是线性代数在面试中考察“矩阵性质直觉”的经典题目。

下面我为你逐一拆解：

问题一：的矩阵，如果秩 (Rank) 不到，有什么性质？

设矩阵为 $A$，已知。

这个条件最核心的推论在于 “零空间 (Null Space/Kernel)” 的维度非常大。

1. 零空间维度超过一半 (Rank-Nullity Theorem)

根据秩-零化度定理（Rank-Nullity Theorem）：

因为，所以：

结论：矩阵 $A$ 的零空间（也就是满足 $Ax=0$ 的解空间）的维度比它的列空间（值域）还要大。这意味着 $A$ 把空间中“超过一半”的向量都压缩成了零。

2. 两个这样的矩阵相乘必然有非零零空间交集

这是一个进阶性质，常考。如果你有两个 $n \times n$ 矩阵 $A$ and $B$，且它们的秩都小于 $n/2$。

那么它们的零空间必然相交（除了零向量以外）。

理由：两个子空间的维度之和。在 $n$ 维空间中，维度之和大于 $n$ 的两个子空间必然有非平凡的交集。

3. 特征值性质

0 是重数很高的特征值：$A$ 至少有 $n - \text{rank}(A)$ 个特征值为 0（代数重数至少这么大，几何重数即为零空间维数）。

也就是说，矩阵 $A$ 的特征值中，超过一半都是 0。

4. 幂零性 (可能的陷阱)

虽然零空间很大，但这不代表 $A^2 = 0$ 或者 $A$ 是幂零矩阵。

反例： $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ (3x3矩阵)。

秩为 2（不满足 < 1.5），不好意思换个例子。

$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ (3x3矩阵)。

秩为 1 (< 1.5)。$A^2 = A \neq 0$。

所以，秩很小只能说明它是一个很“扁”的变换，但不一定是幂零的。

问题二：如果 $M M^T = M$，矩阵 $M$ 的秩有什么特性？

这个方程信息量非常大，它直接锁定了 $M$ 的几何形态。

推导步骤：

1. 证明 $M$ 是对称矩阵 (Symmetric)

观察方程右边是 $M$，左边是 $M M^T$。

我们知道对于任何实矩阵，$A A^T$ 永远是对称矩阵（因为 $(A A^T)^T = (A^T)^T A^T = A A^T$）。

既然 $M$ 等于一个对称矩阵，那么 $M$ 必须是对称的。

$$M = M^T$$

2. 证明 $M$ 是幂等矩阵 (Idempotent)

既然 $M = M^T$，我们可以把原方程 $M M^T = M$ 中的 $M^T$ 替换为 $M$。

$$M \cdot M = M \implies M^2 = M$$

3. 几何意义：正交投影矩阵 (Orthogonal Projection Matrix)

同时满足对称性 ($M=M^T$) 和幂等性 ($M^2=M$) 的实矩阵，在线性代数中被称为正交投影矩阵。

它表示将向量垂直投影到某个子空间上。

4. 秩的特性

对于投影矩阵（以及所有的幂等矩阵），有一个非常漂亮的性质：

秩 (Rank) 等于迹 (Trace)。

$$\text{rank}(M) = \text{trace}(M)$$

为什么？（证明微操）

因为 $M^2 = M$， $M$ 的特征值 $\lambda$ 必须满足 $\lambda^2 = \lambda$。

方程 $\lambda^2 - \lambda = 0$ 的根只有 0 和 1。

对于对称矩阵，它是可以对角化的。所以 $M$ 相似于一个对角矩阵 $D$，对角线上只有 0 和 1。

秩 (Rank) 等于非零特征值的个数（也就是 1 的个数）。

迹 (Trace) 等于所有特征值之和（也就是 1 的个数，因为 0 不贡献）。

所以，$\text{rank}(M) = \text{trace}(M)$。

总结回答：

如果 $M M^T = M$，则 $M$ 是一个正交投影矩阵。它的秩具有以下特性：

$\text{rank}(M) = \text{trace}(M)$（秩等于主对角线元素之和）。

$\text{rank}(M)$ 等于特征值为 1 的个数（其余特征值全为 0）。

$M$ 是半正定的。

如果，这种矩阵在数学上被称为 对合矩阵 (Involutory Matrix)。

它有很多非常好的性质，面试中如果问到，可以从代数性质、特征值、几何意义和投影关系这四个维度来回答：

1. 它是自身的逆矩阵 (Self-Inverse)

这是最直接的性质。

由 $A \cdot A = I$ 可知：

$$A^{-1} = A$$

这也意味着 $A$ 一定是可逆的（非奇异的）。

2. 特征值 (Eigenvalues) 只能是 1 或 -1

设是 $A$ 的特征值，对应的特征向量为 $x$。

两边同时左乘 $A$：

因为 $A^2 = I$，所以：

因为 $x \neq 0$，所以 $\lambda^2 = 1$。

结论：$\lambda \in \{1, -1\}$。

行列式：$\det(A) = \pm 1$。

3. 一定可对角化 (Diagonalizable)

这是面试中的高分回答点。

矩阵 $A$ 的零化多项式是 $x^2 - 1 = (x-1)(x+1)$。

因为这个多项式的根（$1$ 和 $-1$）是互不相同的单根 (Distinct roots)。

根据线性代数定理：如果一个矩阵的最小多项式没有重根，那么它一定可以对角化。

推论：$A$ 相似于一个对角矩阵 $D$，对角线上只有 $1$ 和 $-1$。

$$A = S \begin{pmatrix} I_k & 0 \\ 0 & -I_{n-k} \end{pmatrix} S^{-1}$$

4. 几何意义：反射 (Reflection)

如果 $A \neq I$ 且 $A \neq -I$，那么 $A$ 代表某种坐标变换下的**反射（镜像）**操作。

它把空间分成了两个互补的子空间：

$V_1$（特征值1对应的空间）：在这个空间上的向量保持不变 ($Ax=x$)。
$V_{-1}$（特征值-1对应的空间）：在这个空间上的向量方向反转 ($Ax=-x$)。

$A$ 的作用就是沿着 $V_1$ 将向量“反射”过 $V_{-1}$。

5. 与投影矩阵 (Projection) 的紧密关系

这对解决上一题提到的 $M^2=M$ 很有帮助。对合矩阵和投影矩阵是一一对应的。

如果我们令：

$$P = \frac{I + A}{2}$$

那么 $P$ 就是一个投影矩阵（$P^2 = P$）。

证明：

$$P^2 = (\frac{I+A}{2})^2 = \frac{I + 2A + A^2}{4} = \frac{I + 2A + I}{4} = \frac{2I + 2A}{4} = \frac{I+A}{2} = P$$

反之亦然：

如果 $P$ 是投影矩阵，那么 $A = 2P - I$ 就是对合矩阵。

直观理解：

$A = 2P - I$ 意味着把投影的部分放大2倍再减去原向量，或者理解为：保留分量（$P$的部分）不变，把剩下的分量（$I-P$的部分）取反。这就是反射。

线性代数

🧩 Matrix Partitioning and Decomposition: Complete Summary

1️⃣ Structural Partitioning (Block Matrices)

Common Forms

Example: Schur Complement

2️⃣ Algebraic Matrix Decomposition

(1) LU Decomposition

(2) QR Decomposition

(3) Cholesky Decomposition

(4) Eigenvalue Decomposition

(5) Singular Value Decomposition (SVD)

(6) Polar Decomposition

(7) Low-Rank Factorization

(8) Nonnegative Matrix Factorization (NMF)

(9) Block LU / QR Factorization

$$ A = \begin{bmatrix} A_{11} & A_{12} \ A_{21} & A_{22} \end{bmatrix}

(10) Tensor and Multiway Decompositions

3️⃣ Summary Table

4️⃣ Key Insights

问题一： 的矩阵，如果秩 (Rank) 不到 ，有什么性质？

问题二：如果 $M M^T = M$，矩阵 $M$ 的秩有什么特性？

1. 它是自身的逆矩阵 (Self-Inverse)

2. 特征值 (Eigenvalues) 只能是 1 或 -1

3. 一定可对角化 (Diagonalizable)

4. 几何意义：反射 (Reflection)

5. 与投影矩阵 (Projection) 的紧密关系

问题一：的矩阵，如果秩 (Rank) 不到，有什么性质？