Paper notes on code-based cryptography

paper reading

1. Code-based Cryptography: Lecture Notes

1.1 Introduction

A linear code $C$ of length $n$ and dimension $k$ over $\mathbb{F}_q$, i.e., an $[n,k]_q$-code, is a subspace of $\mathbb{F}_q^n$ of dimension $k$.

To represent an $[n,k]_q$-code, we may take any basis of it, namely a set of $k$ linearly independent vectors. We use them as rows to form the generator matrix $G \in \mathbb{F}_q^{k \times n}$; then $C$ can be written as $C = \{c = mG : m \in \mathbb{F}_q^k\}$.

Conversely, any matrix $G \in \mathbb{F}_q^{k \times n}$ of rank $k$ defines an $[n,k]_q$-code.
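As a quick illustration (my own example, not from the notes), here is a minimal numpy sketch over $\mathbb{F}_2$: the rows of a rank-$k$ matrix $G$ generate a code with exactly $q^k$ codewords. The specific $G$ below is a systematic generator matrix of the $[7,4]_2$ Hamming code.

```python
import itertools
import numpy as np

# Systematic generator matrix [I_4 | A] of the [7,4] Hamming code over F_2.
G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]], dtype=int)

# The code C = {mG : m in F_2^4} has 2^4 = 16 distinct codewords since rank(G) = 4.
C = {tuple((np.array(m) @ G) % 2) for m in itertools.product([0, 1], repeat=4)}
assert len(C) == 16
```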

A subspace of a vector space is a nonempty subset that satisfies the requirements for a vector space, i.e., linear combinations stay in the subspace: (i) if we add any vectors $x$ and $y$ in the subspace, then $x + y$ is in the subspace; (ii) if we multiply any vector $x$ in the subspace by any scalar $c$, then $cx$ is in the subspace.

The dual code of an $[n,k]_q$-code $C$ is defined as $C^\perp = \{c' \in \mathbb{F}_q^n : \forall c \in C,\ \langle c, c' \rangle = 0\}$. Since every $c \in C$ has the form $mG$, $C^\perp$ is the nullspace of $G$, and it is an $[n, n-k]_q$-code.

Its generator matrix is the parity-check matrix of $C$, i.e., $H \in \mathbb{F}_q^{(n-k) \times n}$, whose rows form a basis of the nullspace of $G$.

The linear code $C$ can also be written as $C = \{c \in \mathbb{F}_q^n : cH^\top = 0\}$.

Conversely, any matrix $H \in \mathbb{F}_q^{(n-k) \times n}$ of rank $n-k$ defines an $[n,k]_q$-code.

Solutions to $Gx = 0$ form the nullspace of $G$.

Elimination simplifies a system of linear equations without changing its solutions: if $G$ is row-reduced to $G' = [I_k \mid A]$, then $Gx = 0$ and $G'x = 0$ give the same solutions $x$. Therefore, the nullspace of $G$ is the same as the nullspace of $G'$.

Since $G$ has rank $k$, there are $k$ pivot variables and $n-k$ free variables, so the nullspace has dimension $n-k$. Setting each free variable to $1$ in turn (and the others to $0$) yields $n-k$ special solutions to $Gx = 0$, and these form a basis of the nullspace.

Left multiplication by an invertible matrix $S$ does not change the code ($SG$ generates the same code, $SH$ checks the same code); it just gives another basis.

If $\mathrm{RREF}(G) = [I_k \mid A]$, then $H = [-A^\top \mid I_{n-k}]$, and $GH^\top = 0$.
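A short numpy check of this construction (a sketch reusing the $[7,4]_2$ Hamming example above; the `% q` reduction makes the arithmetic work for any prime $q$):

```python
import numpy as np

q, k, n = 2, 4, 7
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=int)

G = np.hstack([np.eye(k, dtype=int), A])               # RREF form [I_k | A]
H = np.hstack([(-A.T) % q, np.eye(n - k, dtype=int)])  # H = [-A^T | I_{n-k}]

# Every codeword mG then has zero syndrome: it suffices that G H^T = 0 (mod q).
assert np.all((G @ H.T) % q == 0)
```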

Let $H \in \mathbb{F}_q^{(n-k) \times n}$. The syndrome of $y \in \mathbb{F}_q^n$ with respect to $H$ is defined as $yH^\top$. Any element of $\mathbb{F}_q^{n-k}$ is called a syndrome.

Syndrome Decoding: Given $H \in \mathbb{F}_q^{(n-k) \times n}$ of rank $n-k$, $s \in \mathbb{F}_q^{n-k}$, and $t \in \{0,\dots,n\}$, find $e$ of Hamming weight $t$ such that $eH^\top = s$.

Any solver of syndrome decoding can be turned in polynomial time into an algorithm solving Noisy Codeword Decoding:

Given $G \in \mathbb{F}_q^{k \times n}$ of rank $k$, $y \in \mathbb{F}_q^n$, and $t \in \{0,\dots,n\}$, find $e$ of Hamming weight $t$ such that $y = mG + e$ for some $m \in \mathbb{F}_q^k$.
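The reduction is simple enough to sketch in code. The following is my own illustration for $q = 2$, assuming $G$ is already in systematic form $[I_k \mid A]$ (the general case first row-reduces $G$) and assuming a hypothetical oracle `sd_oracle(H, s, t)` that solves Syndrome Decoding. The key point: since $GH^\top = 0$, the syndrome of $y = mG + e$ depends only on $e$.

```python
import numpy as np

def parity_check_from_generator(G):
    """For G = [I_k | A] over F_2, return H = [A^T | I_{n-k}] with G H^T = 0.
    (Over F_2, -A^T = A^T.)"""
    k, n = G.shape
    A = G[:, k:]
    return np.hstack([A.T % 2, np.eye(n - k, dtype=int)])

def decode_noisy_codeword(G, y, t, sd_oracle):
    """Solve Noisy Codeword Decoding with one call to a Syndrome Decoding solver."""
    k, _ = G.shape
    H = parity_check_from_generator(G)
    s = (y @ H.T) % 2        # y = mG + e gives y H^T = e H^T, since G H^T = 0
    e = sd_oracle(H, s, t)   # assumed oracle: weight-t vector with e H^T = s
    c = (y - e) % 2          # the codeword mG
    return c[:k], e          # G systematic => m is the first k coordinates of c
```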

Hardness of the Decoding Problem: From the telecommunications side, the decoding problem should be easy to solve; from the security side, it should be hard to solve. All the subtlety lies in the inputs of the problem: there exist codes and decoding distances for which the problem is easy to solve. The existence of codes that are easy to decode is at the foundation of code-based cryptography.

  • The decoding problem is hard in the worst case (it is NP-complete), which means that we cannot hope to solve it in polynomial time for all inputs.

  • The decoding problem is hard on average, which means that for a well-chosen $t$, it is intractable for almost all codes.

The NP-completeness of a problem is a nice property for cryptographic use, but it is not a panacea for ensuring hardness: it is quite possible that the security of a cryptosystem relies on the difficulty of solving an NP-complete problem, while breaking the scheme only amounts to solving the problem on a subset of inputs for which it is easy.

Instead of considering hardness as a function of the inputs, we focus on the algorithmic hardness of the decoding problem. Assume a probabilistic algorithm $\mathcal{A}$ solves the decoding problem with success probability $\varepsilon$, and a single run of this algorithm costs time $T$. Then the algorithm solves the decoding problem in average time $T/\varepsilon$ (its complexity).

$$\varepsilon = \mathbb{P}\left(\mathcal{A}\left(H,\ s = xH^\top,\ t\right) = e \ \text{ s.t. } |e| = t \text{ and } eH^\top = s\right)$$

The Decoding Problem is parametrized by $n$ and two functions of $n$: the rate $R = k/n$ and the relative distance $\tau = t/n$, i.e., $\mathrm{DP}(n,q,R,\tau)$. It can be solved in polynomial time as soon as $\tau \in \left[(1-R)\frac{q-1}{q},\ R + (1-R)\frac{q-1}{q}\right]$. For other $\tau$, the best known algorithms run in time exponential in $n$ (proof in Section 1.3).

1.2 Random Codes

A random code is one whose parity-check matrix (or generator matrix) is drawn uniformly at random.

In cryptography, negligible functions are often used to describe the probability of events that occur with vanishingly small probability. We say a function $f(n)$ is negligible, written $f(n) \in \mathrm{negl}(n)$, if for every polynomial $p(n)$, $|f(n)| < 1/|p(n)|$ for all sufficiently large $n$. For example, $f(n) = q^{-n}$ is negligible.

The introduction of the polynomial $p(n)$ in the definition serves to formalize the idea that the function becomes smaller than any inverse polynomial as the input size $n$ grows. It establishes a quantitative measure for the rate at which the function approaches zero.

The use of p(n) allows for generality in expressing the notion of negligibility. Different contexts and applications may require different rates of decay for a function to be considered negligible. The choice of p(n) provides a flexible framework for capturing these variations.

As all the parameters are functions of $n$, our asymptotic results will always hold for $n \to +\infty$.

The $q$-ary entropy is defined as

$$h_q : x \in [0,1] \mapsto -x\log_q\left(\frac{x}{q-1}\right) - (1-x)\log_q(1-x)$$

It is equal to the entropy of a random variable $e$ over $\mathbb{F}_q$ distributed like the error of a $q$-ary symmetric channel with crossover probability $x$.

It can be verified that $h_q$ is an increasing function over $\left[0, \frac{q-1}{q}\right]$ and a decreasing function over $\left[\frac{q-1}{q}, 1\right]$.

Let $S_t$ denote the set of vectors in $\mathbb{F}_q^n$ of Hamming weight $t$; then $|S_t| = \binom{n}{t}(q-1)^t$. We also have

$$q^{n\left(h_q(\tau) + O\left(\frac{\log_q n}{n}\right)\right)} \leq \binom{n}{t}(q-1)^t \leq q^{\,n h_q(\tau)}$$

Asymptotically, we have

$$\frac{1}{n}\log_q\left(\binom{n}{t}(q-1)^t\right) = h_q(\tau) + O\left(\frac{\log_q n}{n}\right)$$
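A small numerical check of this asymptotic (my own sketch; `math.lgamma` evaluates the logarithm of the large binomial coefficient):

```python
import math

def h_q(x, q=2):
    """q-ary entropy: -x log_q(x/(q-1)) - (1-x) log_q(1-x)."""
    s = 0.0
    if x > 0:
        s -= x * math.log(x / (q - 1), q)
    if x < 1:
        s -= (1 - x) * math.log(1 - x, q)
    return s

# (1/n) log_q( binom(n,t) (q-1)^t ) versus h_q(t/n), here for q = 2:
n, t, q = 1000, 110, 2
log_size = (math.lgamma(n + 1) - math.lgamma(t + 1) - math.lgamma(n - t + 1)
            + t * math.log(q - 1)) / (n * math.log(q))
print(log_size, h_q(t / n, q))  # ~0.495 vs ~0.500: equal up to O(log_q(n)/n)
```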

We write $\mathbb{P}_X(A)$ for the probability of event $A$ over the randomness of $X$. We choose a random code by picking a parity-check matrix $H \in \mathbb{F}_q^{(n-k) \times n}$ uniformly at random.

Defining DP with a generator or a parity-check matrix is just a matter of personal taste; it does not change the average-case hardness.

Lemma 2.2.3. Given $s \in \mathbb{F}_q^{n-k}$ and $y \in \mathbb{F}_q^n$ such that $y \neq 0$, we have, for $H$ uniformly distributed over $\mathbb{F}_q^{(n-k) \times n}$,

$$\mathbb{P}_H\left(yH^\top = s\right) = \frac{1}{q^{n-k}}$$

This lemma gives the probability, over the choice of a random code, that a fixed non-zero word $y$ reaches a given syndrome $s$ with respect to that code.

Proof. We assume $y_1 = 1$; then we are looking for the probability of the following event:

$$\forall i \in \{1,\dots,n-k\}: \quad h_{i,1} = s_i - \sum_{j=2}^{n} h_{i,j}\, y_j$$

For each $i$, the entry $h_{i,1}$ is uniform over the $q$ elements of $\mathbb{F}_q$ and independent of the right-hand side, since the $h_{i,j}$ are independent and equidistributed; hence each of the $n-k$ equations holds independently with probability $1/q$. This concludes the proof.
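A quick Monte Carlo sanity check of the lemma (my own sketch, with arbitrary small parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
q, n, k = 3, 10, 4
y = rng.integers(0, q, size=n); y[0] = 1   # a fixed non-zero word
s = rng.integers(0, q, size=n - k)         # a fixed target syndrome

# Empirical P_H(y H^T = s) over fresh uniform H, vs 1/q^{n-k} = 1/3^6 ~ 0.00137.
trials = 200_000
hits = sum(np.array_equal((y @ rng.integers(0, q, size=(n - k, n)).T) % q, s)
           for _ in range(trials))
print(hits / trials, q ** -(n - k))
```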

Question: given a random code $C$ and a fixed vector $y \in \mathbb{F}_q^n$, how many codewords $c \in C$ do we expect at Hamming distance $t$ from $y$, i.e., $y = c + e$ with $|e| = t$? Or equivalently, given a parity-check matrix $H$ of our random code $C$ and a fixed syndrome $s$, how many error vectors $e$ of Hamming weight $t$ do we expect to reach the syndrome $s$, i.e., $eH^\top = s$?

Let $N_t(C,s) = |\{e \in S_t : eH^\top = s\}|$, a random variable that gives the number of solutions of DP with input $(H, s)$, where $s \in \mathbb{F}_q^{n-k}$ is fixed. In particular, $N_t(C, 0)$ gives the number of codewords $c \in C$ of Hamming weight $t$.

This number is crucial to know, the reason is as follows.

A trivial way to solve DP is to pick a random error vector $e \in S_t$:

  • If there is only one solution $e$, then our success probability is $\frac{1}{\binom{n}{t}(q-1)^t}$.

  • If there are $N$ solutions, then the success probability is $\frac{N}{\binom{n}{t}(q-1)^t}$.

It is therefore crucial to know the value of N to be able to predict the running time of our algorithm.

Solution:

The expectation of $N_t(C,s)$ is $\binom{n}{t}(q-1)^t / q^{n-k}$. Then

$$\frac{1}{n}\log_q\left(\frac{\binom{n}{t}(q-1)^t}{q^{n-k}}\right) = h_q(\tau) - (1-R) + O\left(\frac{\log_q n}{n}\right)$$

where $0 \leq \tau, R \leq 1$. Since $h_q(0) = 0$ and $h_q(1) = \log_q(q-1)$, $N_t(C,s)$ is expected to be exponentially small for $\tau$ below one value $\tau^-$, and potentially also above a second value $\tau^+$ in the case that $1-R \geq \log_q(q-1)$. The analytic expressions of $\tau^-$ and $\tau^+$ are

$$\tau^- = g_q^-(1-R), \qquad \tau^+ = g_q^+(1-R) \quad \text{when } R \leq 1 - \log_q(q-1)$$

where $g_q^-$ (resp. $g_q^+$) denotes the inverse of $h_q$ over $\left[0, \frac{q-1}{q}\right]$ (resp. $\left[\frac{q-1}{q}, 1\right]$).

The quantities $\tau^-$ and $\tau^+$ give the boundaries between which we expect $N_t(C,s)$ to be exponentially large.

The average number of solutions of $\mathrm{DP}(n,q,R,\tau)$ over the input distribution is given by

$$1 + \frac{\binom{n}{t}(q-1)^t - 1}{q^{n-k}}$$
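The following sketch (my own, repeating the `h_q` helper so it runs standalone) inverts $h_q$ numerically to get $\tau^-$, and evaluates the average number of solutions below, near, and above $\tau^- n$, for $q = 2$ and $R = 1/2$:

```python
import math
from math import comb

def h_q(x, q=2):
    s = 0.0
    if x > 0:
        s -= x * math.log(x / (q - 1), q)
    if x < 1:
        s -= (1 - x) * math.log(1 - x, q)
    return s

def g_q_minus(y, q=2, iters=60):
    """Invert h_q on [0, (q-1)/q] (where it is increasing) by bisection."""
    lo, hi = 0.0, (q - 1) / q
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h_q(mid, q) < y else (lo, mid)
    return lo

q, n, k = 2, 100, 50
print(g_q_minus(1 - k / n, q))      # tau^- ~ 0.11 when R = 1/2, q = 2
for t in (5, 11, 25):               # below, near, and above tau^- * n = 11
    avg = 1 + (comb(n, t) * (q - 1) ** t - 1) / q ** (n - k)
    print(t, avg)                   # ~1.0, ~1.13, ~2e8 solutions on average
```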


The expected minimum distance of a random code is given by the so-called Gilbert-Varshamov distance $t_{GV}(q,n,k)$, which is defined as the largest integer $t$ such that

$$\sum_{l=0}^{t} \binom{n}{l}(q-1)^l \leq q^{n-k}$$

Proof (sketch). $\sum_{l=0}^{t} \binom{n}{l}(q-1)^l / q^{n-k}$ is the expected number of codewords in a ball of radius $t$ centered around a codeword, so $t_{GV}$ is the largest radius for which this expected number is at most $1$.

The minimum Hamming weight of a code is defined as the smallest Hamming weight of all non-zero codewords.

It can be verified that

$$\frac{t_{GV}(q,n,k)}{n} = \tau^- + o(1)$$
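This is easy to check numerically (my own sketch): compute $t_{GV}$ straight from the definition and compare $t_{GV}/n$ with $\tau^- \approx 0.11$ for $q = 2$, $R = 1/2$.

```python
from math import comb

def t_GV(q, n, k):
    """Largest t such that sum_{l=0}^{t} binom(n,l)(q-1)^l <= q^(n-k)."""
    bound, total, t = q ** (n - k), 0, -1
    while total + comb(n, t + 1) * (q - 1) ** (t + 1) <= bound:
        total += comb(n, t + 1) * (q - 1) ** (t + 1)
        t += 1
    return t

for n in (100, 1000, 10000):
    print(n, t_GV(2, n, n // 2) / n)   # ratios approach tau^- ~ 0.11
```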

1.3 Information Set Decoding Algorithms

Prange's approach: using linearity.

We are looking for some codeword $c \in C$ at distance $t$ from $y$. Since $C$ is a $k$-dimensional subspace of $\mathbb{F}_q^n$, there exists some set of positions of size $k$, called an information set, whose coordinates uniquely determine every codeword.

This idea comes from the fact that the code is the nullspace of $H$: the system $cH^\top = 0$ leaves $k$ free variables, and once their positions are fixed, their values determine the whole codeword.

Let $C$ be an $[n,k]$-code and $J \subseteq \{1,\dots,n\}$ be of size $k$. Then

$$J \text{ is an information set for } C \iff G_J \text{ is invertible} \iff H_{\bar{J}} \text{ is invertible}$$

where $G$ is any generator matrix of $C$, $H$ is any parity-check matrix of $C$, and $G_J$ (resp. $H_{\bar{J}}$) denotes the restriction of $G$ (resp. $H$) to the columns indexed by $J$ (resp. by the complement $\bar{J}$).

That $J$ is an information set means that, in the RREF of $G$, the pivot columns lie in $J$; the $k$ coordinates indexed by $J$ can be chosen freely and then determine the coordinates in $\bar{J}$.

Prange's idea is to pick some random information set, compute the unique codeword determined by the coordinates of $y$ on it, and then check whether the computed codeword is at distance $t$ from $y$.

The algorithm:

  1. Pick the information set: Let $J \subseteq \{1,\dots,n\}$ be a random set of size $k$. If $H_{\bar{J}} \in \mathbb{F}_q^{(n-k) \times (n-k)}$ is not full-rank, pick another set $J$.

  2. Linear Algebra. Calculate $\left(H_{\bar{J}}^\top\right)^{-1}$.

  3. Test step: Pick $e_J = x \in \mathbb{F}_q^k$ according to the distribution $D_t$, and let $e \in \mathbb{F}_q^n$ be such that

$$e_{\bar{J}} = \left(s - xH_J^\top\right)\left(H_{\bar{J}}^\top\right)^{-1}$$

If $|e| \neq t$, go back to step 1; otherwise $e$ is a solution.

Picking $x$ in step 3 is like fixing $k$ of the unknowns to values of our choice. If we want to find a solution $e$ of small Hamming weight, fixing $x$ to the zero vector helps; if we want $e$ of large weight, fixing $x$ to a non-zero vector improves the success probability. This is why we define the distribution $D_t$ (a code sketch of the whole algorithm follows its definition below).

$D_t$ is defined as

  • If $t < \frac{q-1}{q}(n-k)$, $D_t$ only outputs $0 \in \mathbb{F}_q^k$.

  • If $t \in \left(\frac{q-1}{q}(n-k),\ k + \frac{q-1}{q}(n-k)\right)$, $D_t$ outputs uniform vectors of weight $t - \frac{q-1}{q}(n-k)$.

  • If $t > k + \frac{q-1}{q}(n-k)$, $D_t$ outputs uniform vectors of weight $k$.
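To make the three steps concrete, here is a minimal, unoptimized sketch of Prange's algorithm over $\mathbb{F}_2$ (my own code, not from the notes). For simplicity it always takes $x = 0$, i.e., the regime $t < \frac{q-1}{q}(n-k)$ of $D_t$; handling the other regimes would mean drawing $x$ from $D_t$ and computing $e_{\bar{J}} = (s - xH_J^\top)(H_{\bar{J}}^\top)^{-1}$ instead.

```python
import numpy as np

rng = np.random.default_rng(1)

def solve_gf2(M, b):
    """Solve the square system M z = b over F_2; return None if M is singular."""
    r = M.shape[0]
    A = np.concatenate([M % 2, b.reshape(-1, 1) % 2], axis=1).astype(int)
    for col in range(r):
        piv = next((i for i in range(col, r) if A[i, col]), None)
        if piv is None:
            return None                       # column has no pivot: singular
        A[[col, piv]] = A[[piv, col]]         # swap the pivot row into place
        for i in range(r):
            if i != col and A[i, col]:
                A[i] ^= A[col]                # eliminate column `col` elsewhere
    return A[:, -1]                           # A[:, :r] is now the identity

def prange(H, s, t, max_iters=100_000):
    """Prange's ISD over F_2 with e_J = x = 0: find e of weight t, e H^T = s."""
    r, n = H.shape                            # r = n - k
    k = n - r
    for _ in range(max_iters):
        perm = rng.permutation(n)             # step 1: random candidate set J
        J, Jbar = perm[:k], perm[k:]
        e_bar = solve_gf2(H[:, Jbar], s)      # steps 2-3: e_Jbar H_Jbar^T = s
        if e_bar is None:
            continue                          # H_Jbar singular: pick another J
        if e_bar.sum() == t:                  # step 3: test the weight
            e = np.zeros(n, dtype=int)
            e[Jbar] = e_bar
            return e
    return None
```

For instance, planting a random weight-$t$ error `e_star` and calling `prange(H, (e_star @ H.T) % 2, t)` returns some weight-$t$ preimage of the syndrome (not necessarily `e_star` itself).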

Rough Analysis:

Assume $s$ is uniformly distributed over $\mathbb{F}_q^{n-k}$, which is equivalent to assuming that $t/n$ belongs to $[\tau^-, \tau^+]$. The expected Hamming weight of $e$ is given by

$$\mathbb{E}(|e|) = |x| + \frac{q-1}{q}(n-k)$$

where $\frac{q-1}{q}(n-k)$ is the expected value of the binomial distribution $\mathcal{B}\left(n-k, \frac{q-1}{q}\right)$: each coordinate of $e_{\bar{J}}$ is uniform over $\mathbb{F}_q$, hence non-zero with probability $\frac{q-1}{q}$.

Since $x \in \mathbb{F}_q^k$, we can easily reach any expected weight in

$$\left(\frac{q-1}{q}(n-k),\ k + \frac{q-1}{q}(n-k)\right)$$

by carefully choosing $|x| \in \{0,\dots,k\}$. This explains why DP is easy to solve in this interval.
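As a concrete instance (my numbers, not from the notes): take $q = 2$, $n = 1000$, $k = 500$, so $\frac{q-1}{q}(n-k) = 250$. Choosing $|x| = 0$ gives $\mathbb{E}(|e|) = 250$, while $|x| = k = 500$ gives $\mathbb{E}(|e|) = 750$; every $t \in (250, 750)$, i.e., $\tau \in (0.25, 0.75)$, is thus reachable, which is exactly the easy interval $\left[(1-R)\frac{q-1}{q},\ R + (1-R)\frac{q-1}{q}\right]$ from Section 1.1 at $R = 1/2$.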

Precise Analysis:

....

Dumer's algorithm takes advantage of the birthday paradox to improve the search.

Lemma 3.2.1. Let $L_1$, $L_2$ be two lists of $L$ random and independent elements of $\mathbb{F}_q^r$. We have