1. Code-based Cryptography: Lecture Notes
1.1 Introduction
A linear code $\mathcal{C}$ of length $n$ and dimension $k$ over the finite field $\mathbb{F}_q$ (an $[n,k]$ code) is a $k$-dimensional linear subspace of $\mathbb{F}_q^n$; its rate is $R \stackrel{\text{def}}{=} k/n$.

To represent an $[n,k]$ code we use a generator matrix, i.e., a matrix $\mathbf{G} \in \mathbb{F}_q^{k \times n}$ whose rows form a basis of $\mathcal{C}$:
$$\mathcal{C} = \left\{ \mathbf{m}\mathbf{G} \,:\, \mathbf{m} \in \mathbb{F}_q^{k} \right\}.$$

Conversely, any matrix $\mathbf{G} \in \mathbb{F}_q^{k \times n}$ of rank $k$ is the generator matrix of some $[n,k]$ code.
A subspace of a vector space is a nonempty subset that satisfies the requirements for a vector space: linear combinations stay in the subspace. (i) If we add any vectors $\mathbf{x}$ and $\mathbf{y}$ in the subspace, then $\mathbf{x} + \mathbf{y}$ is in the subspace. (ii) If we multiply any vector $\mathbf{x}$ in the subspace by any scalar $c$, then $c\mathbf{x}$ is in the subspace.
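To make this concrete, here is a minimal sketch over $\mathbb{F}_2$ (the binary case, chosen for simplicity): it enumerates the codewords generated by a small matrix $\mathbf{G}$ and checks closure under addition. The matrix and the helper name `encode` are arbitrary choices for the illustration.

```python
# A toy [4, 2] binary code: C = { m G : m in F_2^2 } has 2^2 = 4 codewords,
# and it is closed under addition (any sum of codewords is again a codeword).
import itertools

G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]                      # generator matrix, k = 2, n = 4

def encode(m, G):
    """Codeword m G over F_2."""
    return tuple(sum(m[i] * G[i][j] for i in range(len(G))) % 2
                 for j in range(len(G[0])))

codewords = {encode(m, G) for m in itertools.product([0, 1], repeat=2)}
print(sorted(codewords))                # (0,0,0,0), (0,1,0,1), (1,0,1,1), (1,1,1,0)

# Closure: the coordinate-wise sum mod 2 of two codewords is a codeword.
assert all(tuple((a + b) % 2 for a, b in zip(c1, c2)) in codewords
           for c1 in codewords for c2 in codewords)
```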
The dual code of an $[n,k]$ code $\mathcal{C}$ is the $[n, n-k]$ code
$$\mathcal{C}^{\perp} \stackrel{\text{def}}{=} \left\{ \mathbf{x} \in \mathbb{F}_q^n \,:\, \langle \mathbf{x}, \mathbf{c}\rangle = 0 \ \text{for all } \mathbf{c} \in \mathcal{C} \right\}.$$
Its generator matrix $\mathbf{H} \in \mathbb{F}_q^{(n-k)\times n}$ is the parity-check matrix of $\mathcal{C}$.

The linear code $\mathcal{C}$ is entirely described by its parity-check matrix:
$$\mathcal{C} = \left\{ \mathbf{c} \in \mathbb{F}_q^n \,:\, \mathbf{H}\mathbf{c}^{\top} = \mathbf{0} \right\}.$$
Conversely, any matrix $\mathbf{H} \in \mathbb{F}_q^{(n-k)\times n}$ of rank $n-k$ is the parity-check matrix of some $[n,k]$ code.
Solutions to $\mathbf{H}\mathbf{x}^{\top} = \mathbf{0}$ form the nullspace of $\mathbf{H}$. Elimination simplifies a system of linear equations without changing the solutions, i.e., if $\mathbf{H}$ is row reduced to $\mathbf{R}$, then $\mathbf{H}\mathbf{x}^{\top} = \mathbf{0}$ and $\mathbf{R}\mathbf{x}^{\top} = \mathbf{0}$ give the same solutions $\mathbf{x}$. Therefore, the nullspace of $\mathbf{H}$ is the same as the nullspace of $\mathbf{R}$. Since $\mathbf{H}$ has rank $n-k$, there are $n-k$ pivot variables and $k$ free variables, so the nullspace has dimension $k$. Setting one free variable to $1$ and the others to $0$, and reading the pivot values off $\mathbf{R}$, gives $k$ special solutions to $\mathbf{H}\mathbf{x}^{\top} = \mathbf{0}$; these vectors form a basis of the nullspace.
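The following sketch carries out this computation over $\mathbb{F}_2$ on a toy matrix: row reduce, read off pivot and free columns, and build one special solution per free variable. The matrix $\mathbf{H}$ and the helper name `rref_gf2` are illustrative choices, not taken from the notes.

```python
# Sketch (binary case, toy matrix): row reduce H over F_2, identify pivot and
# free columns, and build the "special solutions" that form a basis of the
# nullspace {x : H x^T = 0}.
import numpy as np

def rref_gf2(H):
    """Reduced row echelon form over F_2; returns (R, pivot_columns)."""
    R = H.copy() % 2
    pivots, row = [], 0
    for col in range(R.shape[1]):
        candidates = [r for r in range(row, R.shape[0]) if R[r, col] == 1]
        if not candidates:
            continue
        R[[row, candidates[0]]] = R[[candidates[0], row]]   # swap pivot row up
        for r in range(R.shape[0]):                          # eliminate the column
            if r != row and R[r, col] == 1:
                R[r] ^= R[row]
        pivots.append(col)
        row += 1
    return R, pivots

H = np.array([[1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1]])                # rank 2, so the nullspace has dimension 3
R, pivots = rref_gf2(H)
free = [c for c in range(H.shape[1]) if c not in pivots]

# One special solution per free variable: set that free variable to 1,
# the other free variables to 0, and read the pivot values from R.
basis = []
for f in free:
    x = np.zeros(H.shape[1], dtype=int)
    x[f] = 1
    for i, p in enumerate(pivots):
        x[p] = R[i, f]
    basis.append(x)

for x in basis:
    assert not np.any(H @ x % 2), "each special solution must satisfy Hx = 0"
print(np.array(basis))                         # 3 basis vectors of the nullspace
```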
Left multiplication by an invertible matrix does not change the code (it amounts to a change of basis): if $\mathbf{S} \in \mathbb{F}_q^{k\times k}$ is invertible, $\mathbf{S}\mathbf{G}$ generates the same code as $\mathbf{G}$, and similarly for the parity-check matrix.

If $\mathbf{G} = \left(\mathbf{I}_k \mid \mathbf{A}\right)$ is in systematic form, then $\mathbf{H} = \left(-\mathbf{A}^{\top} \mid \mathbf{I}_{n-k}\right)$ is a parity-check matrix of the code, since $\mathbf{G}\mathbf{H}^{\top} = \mathbf{A} - \mathbf{A} = \mathbf{0}$.

Let $\mathbf{y} \in \mathbb{F}_q^n$. The syndrome of $\mathbf{y}$ with respect to $\mathbf{H}$ is $\mathbf{s}^{\top} \stackrel{\text{def}}{=} \mathbf{H}\mathbf{y}^{\top}$. If $\mathbf{y} = \mathbf{c} + \mathbf{e}$ with $\mathbf{c} \in \mathcal{C}$, then $\mathbf{H}\mathbf{y}^{\top} = \mathbf{H}\mathbf{e}^{\top}$: the syndrome only depends on the error.
Syndrome Decoding: Given $\mathbf{H} \in \mathbb{F}_q^{(n-k)\times n}$ of rank $n-k$, $\mathbf{s} \in \mathbb{F}_q^{n-k}$ and an integer $t$, find $\mathbf{e} \in \mathbb{F}_q^n$ of Hamming weight $t$ such that $\mathbf{H}\mathbf{e}^{\top} = \mathbf{s}^{\top}$.
Any solver of syndrome decoding can be turned in polynomial time into an algorithm solving Noisy Codeword Decoding:
Given $\mathbf{G} \in \mathbb{F}_q^{k\times n}$ of rank $k$, $\mathbf{y} \in \mathbb{F}_q^n$ and an integer $t$, find $\mathbf{e}$ with weight $t$ such that $\mathbf{y} = \mathbf{m}\mathbf{G} + \mathbf{e}$ for some $\mathbf{m} \in \mathbb{F}_q^k$.
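The reduction can be made concrete with a small sketch over $\mathbb{F}_2$: from a noisy-codeword instance $(\mathbf{G}, \mathbf{y}, t)$ we form a parity-check matrix, hand the syndrome $\mathbf{H}\mathbf{y}^{\top}$ to a syndrome-decoding solver, and subtract the returned error. The brute-force solver below stands in for any syndrome-decoding algorithm, the generator matrix is taken in systematic form so that $\mathbf{H}$ is immediate, and all parameters are toy values.

```python
# Sketch of the reduction (binary, toy parameters): a syndrome-decoding oracle
# is used to solve noisy-codeword decoding.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k, t = 10, 4, 1

# Systematic generator matrix G = (I_k | A) and the matching parity check H = (A^T | I_{n-k}).
A = rng.integers(0, 2, size=(k, n - k))
G = np.hstack([np.eye(k, dtype=int), A])
H = np.hstack([A.T, np.eye(n - k, dtype=int)])
assert not np.any(G @ H.T % 2)                     # G H^T = 0 over F_2

# Noisy codeword instance: y = mG + e with |e| = t.
m = rng.integers(0, 2, size=k)
e = np.zeros(n, dtype=int); e[rng.choice(n, size=t, replace=False)] = 1
y = (m @ G + e) % 2

def syndrome_decode(H, s, t):
    """Brute-force syndrome-decoding oracle: find e with |e| = t and H e^T = s^T."""
    length = H.shape[1]
    for support in itertools.combinations(range(length), t):
        cand = np.zeros(length, dtype=int); cand[list(support)] = 1
        if np.array_equal(H @ cand % 2, s):
            return cand
    return None

# Reduction: the syndrome of y equals the syndrome of the error, H y^T = H e^T.
s = H @ y % 2
e_found = syndrome_decode(H, s, t)
c = (y - e_found) % 2                              # y - e is a codeword
assert not np.any(H @ c % 2)
print("recovered error:", e_found, "matches planted error:", np.array_equal(e_found, e))
```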
Hardness of the Decoding Problem: From the telecommunications side, the decoding problem should be easy to solve. From the security side, the decoding problem should be hard to solve. All the subtlety lies in the inputs of the problem. There exist some codes and decoding distances for which the problem is easy to solve. The existence of codes that are easy to decode is at the foundation of code-based cryptography.
The decoding problem is hard in the worst case (it is NP-complete), which means that we cannot hope to solve the decoding problem in polynomial time for all inputs.
The decoding problem is believed to be hard on average, which means that for well-chosen parameters (rate and decoding distance), almost all codes are intractable to decode.
The NP-completeness of a problem is a nice property for cryptographic use, but it is not a panacea ensuring hardness: it is quite possible that the security of a cryptosystem formally relies on the difficulty of an NP-complete problem while breaking the scheme only amounts to solving that problem on a subset of inputs for which it is easy.
Instead of considering the hardness of particular inputs, we focus on the average-case algorithmic hardness of this decoding problem. Assume a probabilistic algorithm $\mathcal{A}$ solves the decoding problem; the relevant measure of its cost is its running time divided by its success probability, taken over the random choice of the inputs and of its internal coins.

The Decoding Problem is a problem parametrized by the field size $q$, the code rate $R = k/n$ and the relative decoding distance $\tau = t/n$; the code length $n$ plays the role of the asymptotic parameter.
1.2 Random Codes
Random codes are codes whose parity-check matrix or generator matrix is drawn uniformly at random.
In cryptography, negligible functions are used to describe the probability of events that occur with only vanishingly small probability. We say a function $f : \mathbb{N} \to \mathbb{R}$ is negligible if for every polynomial $p$ there exists $n_0$ such that $|f(n)| < \frac{1}{p(n)}$ for all $n \geq n_0$.

The introduction of the polynomial $p$ in the definition of negligible functions serves to formalize the idea that the function being considered becomes increasingly smaller as the input size grows; it establishes a quantitative measure for the rate at which the function approaches zero. The use of $p$ allows for generality in expressing the notion of negligibility: different contexts and applications may require different rates of decay for a function to be considered negligible, and quantifying over every polynomial $p$ provides a flexible framework for capturing these variations.
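A tiny numerical illustration of the definition (the sample values of $n$ and the polynomials $n^{10}$ and $n^{4}$ are arbitrary choices): $2^{-n}$ eventually drops below $1/p(n)$ for every polynomial $p$, while $1/n^3$ already fails against $p(n) = n^4$.

```python
# 2^(-n) is negligible: it eventually drops below 1/p(n) for every polynomial p
# (shown here for p(n) = n^10). 1/n^3 is not: it never drops below 1/n^4.
for n in (10, 50, 100, 200):
    print(n, 2.0 ** -n < 1.0 / n ** 10, 1.0 / n ** 3 < 1.0 / n ** 4)
```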
As all the parameters are functions of the code length $n$, we consider the asymptotic regime where $k = Rn$ and $t = \tau n$ for a fixed rate $R \in (0,1)$ and a fixed relative distance $\tau \in (0,1)$.

The $q$-ary entropy function is defined for $x \in (0,1)$ as
$$h_q(x) \stackrel{\text{def}}{=} x\log_q(q-1) - x\log_q x - (1-x)\log_q(1-x),$$
extended by continuity with $h_q(0) = 0$; for $q = 2$ it is the usual binary entropy function $h$. It is equal to the entropy (in base $q$) of a random variable over $\mathbb{F}_q$ that takes the value $0$ with probability $1-x$ and each non-zero value with probability $x/(q-1)$. It can be verified that $h_q$ is increasing on $[0, 1-1/q]$ and that $h_q(1-1/q) = 1$.

Let $S_w \stackrel{\text{def}}{=} \{\mathbf{x} \in \mathbb{F}_q^n : |\mathbf{x}| = w\}$ denote the sphere of radius $w$ centred at $\mathbf{0}$, where $|\cdot|$ is the Hamming weight; its size is $\binom{n}{w}(q-1)^w$. Asymptotically, we have
$$\frac{1}{n}\log_q\left(\binom{n}{\tau n}(q-1)^{\tau n}\right) \xrightarrow[n \to \infty]{} h_q(\tau).$$
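A quick numerical check of this asymptotic in the binary case $q = 2$, where $h_2$ is the usual binary entropy; the value $\tau = 0.11$ is an arbitrary sample choice.

```python
# Check that (1/n) * log2 binom(n, tau*n) approaches h(tau) as n grows (q = 2).
from math import comb, log2

def h(x):
    """Binary entropy function."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

tau = 0.11
for n in (100, 1000, 10000):
    print(n, log2(comb(n, round(tau * n))) / n, h(tau))
```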
Probabilities in what follows are taken over the uniform draw of the parity-check matrix $\mathbf{H} \in \mathbb{F}_q^{(n-k)\times n}$.
Defining a random code with a generator or a parity-check matrix is just a matter of personal taste; it does not change the average hardness.
Lemma 2.2.3. Given a fixed non-zero $\mathbf{x} \in \mathbb{F}_q^n$ and a fixed $\mathbf{s} \in \mathbb{F}_q^{n-k}$, we have
$$\mathbb{P}_{\mathbf{H}}\left(\mathbf{H}\mathbf{x}^{\top} = \mathbf{s}^{\top}\right) = \frac{1}{q^{n-k}},$$
where $\mathbf{H}$ is drawn uniformly at random in $\mathbb{F}_q^{(n-k)\times n}$.

This lemma gives the probability over all random codes that a fixed non-zero word $\mathbf{x}$ has syndrome $\mathbf{s}$; in particular, taking $\mathbf{s} = \mathbf{0}$, it gives the probability that $\mathbf{x}$ belongs to the code.

Proof. We assume without loss of generality that the first coordinate $x_1$ of $\mathbf{x}$ is non-zero. Since the entries of $\mathbf{H}$ are uniform and independent, each row $\mathbf{h}_i$ gives an inner product $\langle \mathbf{h}_i, \mathbf{x} \rangle = h_{i,1}x_1 + \sum_{j \geq 2} h_{i,j}x_j$ that is uniform over $\mathbb{F}_q$: for any fixed $h_{i,2},\dots,h_{i,n}$, the map $h_{i,1} \mapsto h_{i,1}x_1$ is a bijection of $\mathbb{F}_q$. The $n-k$ rows being independent, the probability that $\langle \mathbf{h}_i, \mathbf{x}\rangle = s_i$ for every $i$ is $q^{-(n-k)}$. □
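A quick Monte Carlo check of the lemma in the binary case $q = 2$; the word $\mathbf{x}$, the target syndrome and the parameters are arbitrary toy choices.

```python
# For a fixed non-zero x and a fixed syndrome s, the fraction of uniformly
# random H with H x^T = s^T should be close to 1/2^(n-k).
import numpy as np

rng = np.random.default_rng(1)
n, k, trials = 12, 5, 100_000

x = np.zeros(n, dtype=int); x[:3] = 1                        # fixed non-zero word
s = rng.integers(0, 2, size=n - k)                           # fixed target syndrome

H = rng.integers(0, 2, size=(trials, n - k, n), dtype=np.int8)   # one random H per trial
hits = np.sum(np.all(H @ x % 2 == s, axis=1))

print(hits / trials, 2.0 ** -(n - k))                        # both close to 1/128 ~ 0.0078
```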
Question: given a random code with parity-check matrix $\mathbf{H}$ and a syndrome $\mathbf{s}$, how many errors $\mathbf{e}$ of weight $t$ satisfy $\mathbf{H}\mathbf{e}^{\top} = \mathbf{s}^{\top}$?

Let $N$ denote this number of solutions. This number is crucial to know; the reason is as follows.

A trivial algorithm to solve the decoding problem is to draw a uniformly random word $\mathbf{e}$ of weight $t$ and check whether $\mathbf{H}\mathbf{e}^{\top} = \mathbf{s}^{\top}$, repeating until success. If there is only one solution, then the success probability of one draw is $\frac{1}{\binom{n}{t}(q-1)^t}$. If there are $N$ solutions, then the success probability is $\frac{N}{\binom{n}{t}(q-1)^t}$.

It is therefore crucial to know the value of $N$.
Solution: compute the expectation of $N$ over the random draw of $\mathbf{H}$.

The expectation of $N$ is
$$\mathbb{E}[N] = \sum_{\mathbf{e}\,:\,|\mathbf{e}| = t} \mathbb{P}_{\mathbf{H}}\left(\mathbf{H}\mathbf{e}^{\top} = \mathbf{s}^{\top}\right) = \frac{\binom{n}{t}(q-1)^t}{q^{n-k}},$$
where the sum runs over the $\binom{n}{t}(q-1)^t$ words of weight $t$, and where the last equality follows from Lemma 2.2.3.

Quantities such as $\binom{n}{t}(q-1)^t$ and $q^{n-k}$ are exponential in $n$: with $t = \tau n$ and $k = Rn$,
$$\frac{1}{n}\log_q \mathbb{E}[N] = h_q(\tau) - (1-R) + o(1).$$
The average number of solutions of the decoding problem is therefore exponentially large when $h_q(\tau) > 1-R$, and it drops below one on average when $h_q(\tau) < 1-R$.
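A small empirical sanity check of the expectation formula in the binary case, with toy parameters and a brute-force count of the weight-$t$ solutions; all values are illustrative.

```python
# The average number of weight-t solutions of H e^T = s^T over random (H, s)
# should be close to binom(n, t) / 2^(n-k).
import itertools
from math import comb
import numpy as np

rng = np.random.default_rng(2)
n, k, t, trials = 12, 6, 3, 1_000

weight_t = [np.array(v) for v in itertools.product([0, 1], repeat=n) if sum(v) == t]

total = 0
for _ in range(trials):
    H = rng.integers(0, 2, size=(n - k, n))
    s = rng.integers(0, 2, size=n - k)
    total += sum(np.array_equal(H @ e % 2, s) for e in weight_t)

print(total / trials, comb(n, t) / 2 ** (n - k))   # empirical vs. predicted average
```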
The expected minimum distance of a random code is given by the so-called Gilbert-Varshamov distance, i.e., the minimum distance of a random code of rate $R$ concentrates around
$$d_{\text{GV}} \stackrel{\text{def}}{=} n\, h_q^{-1}(1-R),$$
where $h_q^{-1}$ denotes the inverse of $h_q$ taken on $[0, 1-1/q]$.

Proof (sketch). The minimum Hamming weight of a code is defined as the smallest Hamming weight of all non-zero codewords; for a linear code it coincides with the minimum distance. It can be verified, using Lemma 2.2.3, that the expected number of non-zero codewords of weight at most $w$ is about $\binom{n}{w}(q-1)^w / q^{n-k}$. This quantity is $o(1)$ when $w/n < h_q^{-1}(1-R)$, so by a union bound a random code has, with high probability, no non-zero codeword of weight significantly below $d_{\text{GV}}$; it is exponentially large when $w/n > h_q^{-1}(1-R)$, and a second-moment argument shows that such codewords then indeed exist with high probability. □
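The sketch below compares a finite-length version of the Gilbert-Varshamov distance with the minimum weight of a few random binary codes. The finite-length definition used here (the largest $d$ with $\sum_{i<d}\binom{n}{i} \leq 2^{n-k}$) is an assumption of the sketch; at such small lengths the match with the asymptotic formula is only rough.

```python
# Compare d_GV with the empirical minimum weight of random binary codes (toy sizes).
import itertools
from math import comb
import numpy as np

def d_gv(n, k):
    """Largest d such that binom(n,0) + ... + binom(n,d-1) <= 2^(n-k)."""
    total, d = 0, 0
    while d <= n and total + comb(n, d) <= 2 ** (n - k):
        total += comb(n, d)
        d += 1
    return d

n, k = 24, 12
rng = np.random.default_rng(3)
print("d_GV(n, k) =", d_gv(n, k))

for _ in range(3):
    G = rng.integers(0, 2, size=(k, n))
    weights = [int((np.array(m) @ G % 2).sum())
               for m in itertools.product([0, 1], repeat=k)]
    print("min weight of a random [24, 12] code:", min(w for w in weights if w > 0))
```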
1.3 Information Set Decoding Algorithms
Prange's approach: using linearity.
We are looking for some codeword $\mathbf{c} \in \mathcal{C}$ at Hamming distance $t$ from the received word $\mathbf{y}$, or equivalently for an error $\mathbf{e}$ of weight $t$ such that $\mathbf{H}\mathbf{e}^{\top} = \mathbf{s}^{\top}$.

This idea comes from the fact that in the nullspace of $\mathbf{H}$ (i.e., in the code $\mathcal{C}$) there are $k$ free variables, and once their positions are fixed, the $n-k$ remaining coordinates of a codeword are uniquely determined.

Let $\mathcal{I} \subseteq \{1,\dots,n\}$ with $|\mathcal{I}| = k$. Saying that $\mathcal{I}$ is an information set means that the positions outside $\mathcal{I}$ are pivot variables and those in $\mathcal{I}$ are free variables; equivalently, a codeword is uniquely determined by its restriction to $\mathcal{I}$.

Prange's idea is to pick some random information set, compute the unique codeword that agrees with $\mathbf{y}$ on those coordinates, and then check whether the calculated codeword has distance $t$ from $\mathbf{y}$.
The algorithm:
1. Pick the information set: Let $\mathcal{I} \subseteq \{1,\dots,n\}$ be a random set of size $k$. If $\mathbf{H}_{\bar{\mathcal{I}}}$, the submatrix of $\mathbf{H}$ formed by the columns outside $\mathcal{I}$, is not full-rank (i.e., if $\mathcal{I}$ is not an information set), pick another set $\mathcal{I}$.
2. Linear algebra: Calculate $\mathbf{H}_{\bar{\mathcal{I}}}^{-1}$ by Gaussian elimination.
3. Test step: Pick $\mathbf{e}_{\mathcal{I}}$ according to distribution $\mathcal{D}$, and let $\mathbf{e}_{\bar{\mathcal{I}}}$ be the unique vector such that $\mathbf{H}\mathbf{e}^{\top} = \mathbf{s}^{\top}$, namely $\mathbf{e}_{\bar{\mathcal{I}}}^{\top} = \mathbf{H}_{\bar{\mathcal{I}}}^{-1}\left(\mathbf{s}^{\top} - \mathbf{H}_{\mathcal{I}}\,\mathbf{e}_{\mathcal{I}}^{\top}\right)$.
4. If $|\mathbf{e}| = t$, output $\mathbf{e}$; otherwise, go back to step 1.
Picking $\mathbf{e}_{\mathcal{I}}$ in step 3 is like fixing $k$ unknowns to the value that we want. Suppose we want to find a solution of small Hamming weight; then fixing $\mathbf{e}_{\mathcal{I}}$ to the zero vector would be helpful. If we want to find $\mathbf{e}$ with large weight, fixing $\mathbf{e}_{\mathcal{I}}$ to a non-zero (high-weight) vector will improve the success probability. Therefore, we define the distribution $\mathcal{D}$ over $\mathbb{F}_q^k$ as follows, using the fact that the determined part $\mathbf{e}_{\bar{\mathcal{I}}}$ behaves roughly like a uniform vector of length $n-k$, whose typical weight is $\frac{q-1}{q}(n-k)$:

- If $t \leq \frac{q-1}{q}(n-k)$, $\mathcal{D}$ only outputs $\mathbf{0}$.
- If $\frac{q-1}{q}(n-k) < t \leq \frac{q-1}{q}(n-k) + k$, $\mathcal{D}$ outputs uniform vectors of weight $t - \frac{q-1}{q}(n-k)$ (rounded to an integer).
- If $t > \frac{q-1}{q}(n-k) + k$, $\mathcal{D}$ outputs uniform vectors of weight $k$.
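The following sketch implements Prange's algorithm over $\mathbb{F}_2$ in the low-weight regime, so $\mathcal{D}$ always outputs $\mathbf{0}$ and the whole weight is carried by the redundancy part. The parameters and the planted instance are toy choices, and the Gaussian elimination is written naively for readability rather than speed.

```python
# Prange's algorithm (binary, low-weight regime): repeat until the error found
# on the redundancy positions has weight exactly t.
import numpy as np

rng = np.random.default_rng(4)
n, k, t = 24, 12, 3

# Random instance with a planted error of weight t.
H = rng.integers(0, 2, size=(n - k, n))
e_true = np.zeros(n, dtype=int); e_true[rng.choice(n, size=t, replace=False)] = 1
s = H @ e_true % 2

def gauss_solve(M, b):
    """Solve M z = b over F_2 for square invertible M (returns None otherwise)."""
    A = np.concatenate([M % 2, b.reshape(-1, 1)], axis=1)
    m = M.shape[0]
    for col in range(m):
        piv = next((r for r in range(col, m) if A[r, col]), None)
        if piv is None:
            return None
        A[[col, piv]] = A[[piv, col]]
        for r in range(m):
            if r != col and A[r, col]:
                A[r] ^= A[col]
    return A[:, -1]

iterations = 0
while True:
    iterations += 1
    info = rng.choice(n, size=k, replace=False)          # candidate information set
    redundancy = np.setdiff1d(np.arange(n), info)        # the n-k remaining positions
    z = gauss_solve(H[:, redundancy], s)                 # e is 0 on info, so H_red e_red^T = s^T
    if z is None:                                        # not an information set, retry
        continue
    if z.sum() == t:                                     # weight test
        e = np.zeros(n, dtype=int); e[redundancy] = z
        break

assert np.array_equal(H @ e % 2, s) and e.sum() == t
print("found a weight-t solution after", iterations, "iterations")
```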
Rough Analysis:
Assume, heuristically, that for each choice of the information set and of $\mathbf{e}_{\mathcal{I}}$, the determined part $\mathbf{e}_{\bar{\mathcal{I}}}$ behaves like a uniformly random vector of $\mathbb{F}_q^{n-k}$. The probability that one iteration outputs a vector of weight $t$ is then about
$$P_{\text{succ}} \approx \frac{\binom{n-k}{t-w}(q-1)^{t-w}}{q^{n-k}},$$
where $w$ is the weight of the vector output by $\mathcal{D}$. Since each iteration costs only polynomial time (essentially one Gaussian elimination), the expected running time is $1/P_{\text{succ}}$ up to polynomial factors; by carefully choosing $w$, i.e., the distribution $\mathcal{D}$, this quantity is minimized, which gives the exponential complexity of Prange's algorithm.
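A numerical evaluation of this rough estimate in the binary case with $w = 0$; the rate and relative distance below are arbitrary sample values.

```python
# With q = 2 and w = 0, the expected number of iterations is about
# 2^(n-k) / binom(n-k, t); its exponent per code bit approaches
# (1-R) * (1 - h(tau/(1-R))).
from math import comb, log2

def h(x):
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

R, tau = 0.5, 0.11
for n in (100, 1000, 10000):
    k, t = round(R * n), round(tau * n)
    exponent = ((n - k) - log2(comb(n - k, t))) / n   # log2 of expected iterations, per bit
    print(n, exponent, (1 - R) * (1 - h(tau / (1 - R))))
```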
Precise Analysis:
....
Dumer's approach: collision search
Dumer takes advantage of the birthday paradox to improve the search.
Lemma 3.2.1. Let $\mathcal{L}_1$ and $\mathcal{L}_2$ be two lists of elements drawn uniformly and independently at random from a set of size $N$. The expected number of collisions, i.e., of pairs $(x_1, x_2) \in \mathcal{L}_1 \times \mathcal{L}_2$ with $x_1 = x_2$, is $\frac{|\mathcal{L}_1|\,|\mathcal{L}_2|}{N}$.
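The following toy experiment illustrates the collision count behind the birthday paradox: two lists of uniform elements from a set of size $N$ share about $|\mathcal{L}_1||\mathcal{L}_2|/N$ equal pairs on average. The list and set sizes are arbitrary illustrative values.

```python
# Empirical average number of colliding pairs between two uniform lists.
from collections import Counter
import numpy as np

rng = np.random.default_rng(5)
N, size1, size2, trials = 2**12, 300, 400, 200

total = 0
for _ in range(trials):
    c1 = Counter(rng.integers(0, N, size=size1).tolist())
    c2 = Counter(rng.integers(0, N, size=size2).tolist())
    total += sum(c1[v] * c2[v] for v in c1)     # number of colliding pairs

print(total / trials, size1 * size2 / N)        # both should be about 29.3
```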