Matrices

Matrix operations
Properties of matrix operations
Properties that do not hold
Transpose, trace, inverse
The theorem about transposes
The theorem about traces
The theorem about inverses

Matrices and matrix operations

A matrix is a rectangular array of numbers. The numbers in the array are called entries.

Examples. Here are three matrices:

[ 2	3	4 ]
[ 1	0	5 ]

[ 0

0 ]

[ 2 ]

[ 3 ]

The size of a matrix is the pair of numbers: the number of rows and the number of columns. The matrices above have sizes (2,3), (1,4), (2,1), respectively.

A matrix with one row is called a row-vector. A matrix with one column is called a column-vector. In the example above the second matrix is a row-vector, the third one is a column-vector. The entry of a matrix A which stays in the i-th row and j-th column will be usually denoted by A_ij or A(i,j).

A matrix with n rows and n columns is called a square matrix of size n.

Discussing matrices, we shall call numbers scalars. In some cases one can view scalars as 1x1-matrices.

Matrices were introduced first in the middle of 19-th century by W. Hamilton and A. Cayley. Following Cayley, we are going to describe an arithmetic where the role of numbers is played by matrices.

Motivation.

In order to solve an equation

ax=b

with a not equal 0 we just divide b by a and get x. We want to solve systems of linear equations in a similar manner. Instead of the scalar a we shall have a matrix of coefficients of the system of equations, that is the array of the coefficients of the unknowns (i.e. the augmented matrix without the last column). Instead of x we shall have a vector of unknowns and instead of b we shall have the vector of right sides of the system.

In order to do that we must learn how to multiply and divide matrices.

But first we need to learn when two matrices are equal, how to add two matrices and how to multiply a matrix by a scalar.

Two matrices are called equal if they have the same size and their corresponding entries are equal.
The sum of two matrices A and B of the same size (m,n) is the matrix C of size (m,n) such that C(i,j)=A(i,j)+B(i,j) for every i and j.

Example.

[ 2	3	4 ]
[ 1	2	4 ]
[ 0	0	1 ]

[ 1	2	1 ]
[ 2	2	3 ]
[ 0	0	2 ]

[ 3	5	5 ]
[ 3	4	7 ]
[ 0	0	3 ]

In order to multiply a matrix by a scalar, one has to multiply all entries of the matrix by this scalar.

Example:

3 *

[ 2	3	1	2 ]
[ 3	1	0	1 ]
[ 1	2	2	3 ]

[ 6	9	3	6 ]
[ 9	3	0	3 ]
[ 3	6	6	9 ]

The product of a row-vector v of size (1, n) and a column vector u of size (n,1) is the sum of products of corresponding entries: uv=u(1)v(1)+u(2)v(2)+...+u(n)v(n)

Example:

                        [ 3 ]
            [1, 2, 3] * [ 4 ] =1*3 + 2*4 + 3*1 = 3+8+3=14
                        [ 1 ]

Example:

                        [ x ]
            [2, 4, 3] * [ y ] = 2x + 4y + 3z
                        [ z ]

As you see, we can represent the left side of a linear equation as a product of two matrices. The product of arbitrary two matrices which we shall define next will allow us to represent the left side of any system of equations as a product of two matrices.

Let A be a matrix of size (m,n), let B be a matrix of size (n,k) (that is the number of columns in A is equal to the number of rows in B. We can subdivide A into a column of m row-vectors of size (1,n). We can also subdivide B into a row of k column-vectors of size (n,1):

          r₁          r₂       A =...      B=[c₁ c₂ ... c_k]
          r_m

Then the product of A and B is the matrix C of size (m,k) such that

C(i,j)=r_ic_j

(C(i,j) is the product of the row-vector r_i and the column-vector c_j).

Matrices A and B such that the number of columns of A is not equal to the number of rows of B cannot be multiplied.

Example:

[ 2	3	4 ]
[ 1	2	3 ]

[ 1	2 ]
[ 3	4 ]
[ 5	6 ]

[ 31	40 ]
[ 22	28 ]

Example:

[ 2	3	4 ]
[ 1	2	3 ]

[ x ]

[ y ]

[ z ]

[ 2x + 3y + 4z ]

[ x + 2y + 3z ]

You see: we can represent the left part of a system of linear equations as a product of a matrix and a column-vector. The whole system of linear equations can thus be written in the following form:

A*v = b

where A is the matrix of coefficients of the system -- the array of coefficients of the left side (do not mix with the augmented matrix), v is the column-vector of unknowns, b is the column vector of the right sides (constants).

Properties of matrix operations.

The addition is commutative and associative: A+B=B+A, A+(B+C)=(A+B)+C
The product is associative: A(BC)=(AB)C
The product is distributive with respect to the addition: A(B+C)=AB+AC, (A+B)C=AC+BC
The multiplication by scalar is distributive with respect to the addition of matrices: a(B+C)=aB+aC
The product by a scalar is distributive with respect to the addition of scalars: (a+b)C=aC+bC
a(bC)=(ab)C; a(BC)=(aB)C; (Ab)C=A(bC)
1 A=A; 0 A=0 (here the 0 on the left is the number zero, the 0 on the right is the zero matrix of the same size as A, that is a matrix all of whose entries are 0
0+A=A+0=A (here 0 is the zero matrix of the same size as A).
0A=0 (these two zeroes are matrices of appropriate sizes).
Let I_n denote the identity matrix of order n that is a square matrix of order n with 1s on the main diagonal and zeroes everywhere else. Then for every m by n matrix A the product of A and I_n is A and the product of I_m and A is A.

The following properties of matrix operations do not hold:

For every square matrices A and B we have AB=BA:

Example:

A =

[ 1	2 ]
[ 0	1 ]

B =

[ 1	0 ]
[ 2	1 ]

Indeed,

AB =

[ 5	2 ]
[ 2	1 ]

BA =

[ 1	2 ]
[ 2	5 ]

For every matrices A, B, C, if AB=AC and A is not a zero matrix then B=C.

Example:

A =

[ 1	0 ]
[ 0	0 ]

B =

[ 0	0 ]
[ 1	0 ]

C =

[ 0	0 ]
[ 0	1 ]

Then AB=AC=0 but B and C are not equal. Notice that this example shows also that a product of two non-zero matrices can be zero.

Transpose, trace, inverse

There are three other important operations on matrices.

If A is any m by n matrix then the transpose of A, denoted by A^T, is defined to be the n by m matrix obtained by interchanging the rows and columns of A, that is the first column of A^T is the first row of A, the second column of A^T is the second row of A, etc.

Example.
The transpose of

[ 2	3 ]
[ 4	5 ]
[ 6	7 ]

[ 2	4	6 ]
[ 3	5	7 ]

If A is a square matrix of size n then the sum of the entries on the main diagonal of A is called the trace of A and is denoted by tr(A).

Example.
The trace of the matrix

[ 1	2	3 ]
[ 4	5	6 ]
[ 7	8	9 ]

is equal to 1+5+9=15

A square matrix of size n A is called invertible if there exists a square matrix B of the same size such that AB = BA = I_n, the identity matrix of size n. In this case B is called the inverse of A.

Examples. 1. The matrix I_n is invertible. The inverse matrix is I_n: I_n times I_n is I_n because I_n is the identity matrix.

2. The matrix A

[ 1	3 ]
[ 0	1 ]

is invertible. Indeed the following matrix B:

[ 1	-3 ]
[ 0	1 ]

is the inverse of A since A*B=I₂=B*A.

3. The zero matrix O is not invertible. Indeed, if O*B=I_n then O=O*B=I_n which is impossible.

4. A matrix A with a zero row cannot be invertible because in this case for every matrix B the product A*B will have a zero row but I_n does not have zero rows.

5. The following matrix A:

[ 1	2	3 ]
[ 3	4	5 ]
[ 4	6	8 ]

is not invertible. Indeed, suppose that there exists a matrix B:

[ a	b	c ]
[ d	e	f ]
[ g	h	i ]

such that A*B=I₃. The corresponding entries of A*B and I₃ must be equal, so we get the following system of nine linear equations with nine unknowns:

a+2d+3g = 1	;	the (1,1)-entry
b+2e+3h = 0	;	the (1,2)-entry
c+2f+3i = 0	;	the (1,3)-entry
3a+4d+5g = 0
3b+4e+5h = 1
3c+4f+5i = 0
4a+6d+8g = 0
4b+6e+8h = 0
4c+6f+8i = 1

This system does not have a solution which can be shown with the help of Maple.

Now we are going to prove some theorems about transposes, traces and inverses.

Theorem.

(A^T)^T=A, that is the transpose of the transpose of A is A (the operation of taking the transpose is an involution).
(A+B)^T=A^T+B^T, the transpose of a sum is the sum of transposes.
(kA)^T=kA^T.
(AB)^T=B^TA^T, the transpose of a product is the product of the transposes in the reverse order.

Theorem.

tr(A+B)=tr(A)+tr(B)
tr(kA)=k tr(A)
tr(A^T)=tr(A)
tr(AB)=tr(BA)

Theorem.

If B and C are inverses of A then B=C. Thus we can speak about the inverse of a matrix A, A^-1.
If A is invertible and k is a non-zero scalar then kA is invertible and (kA)^-1=1/k A^-1.
If A and B are invertible then AB is invertible and

(AB)^-1=B^-1 A^-1

that is the inverse of the product is the product of inverses in the opposite order. In particular

(Aⁿ)^-1=(A^-1)ⁿ.

(A^T)^-1=(A^-1)^T, the inverse of the transpose is the transpose of the inverse.
If A is invertible then (A^-1)^-1=A.

The proofs of 2, 4, 5 are left as exercises.

Notice that using inverses we can solve some systems of linear equations just in the same way we solve the equation ax=b where a and b are numbers. Suppose that we have a system of linear equations with n equations and n unknowns. Then as we know, this system can be represented in the form Av=b where A is the matrix of the system, v is the column-vector of unknowns, b is the column-vector of the right sides of the equations. The matrix A is a square matrix. Suppose that it has an inverse A^-1. Then we can multiply both sides of the equation A v = b by A^-1 on the left. Using the associativity, the fact that A^-1 A=I and the fact that Iv=v, we get: v=A^-1b. This is the solution of our system.