This code is part of the family of 2-dimensional codes, it's in fact a code of several lines which can encode
up to 2700 bytes what explain its name "Portable Document File". The encoding is done in two
stages : first the datas are converted into "codeword" (High level encoding) then those are
converted to bars and spaces patterns. (Low level encoding) In addition, a multi-level error correction system
is included, it allows to reconstruct badly printed, erased, fuzzy or torn off datas. In the
remainder of this presentation, the expression "codeword" will be shortened to CW and
"Reed-Solomon code" to RS.
The general structure.
- The width of the finest bar is called the module.
- Thereafter a bar module is symbolized by "1" and a space module by "0".
- The code consists of 3 to 90 rows.
- A line is composed of 1 to 30 columns of datas and its width goes from 90 to 583 modules with the margins.
-
Maximum number of CW per bar codes : 928 including 925 for datas. (1 for the length
descriptor and 2 at least for the correction of errors.)
- If necessary a mechanism called "Macro PDF417" allows to distribute more datas on several bar codes.
- There are 929 CW including 900 for the datas, they are numbered from 0 to 928.
- The errors correction levels goes from 0 to 8. The correction comprises 2 (on level 0) to 512 (on level 8) RS.
-
The row consists of a start character, a left side CW, 1 to 30 CW of datas, a right side CW
and a stop character. There must be a white margin of at least 2 modules on each side.
-
Filling MCs (We will use for example "900") can be inserted between the data MCs and the
correction RS, these must be at the end.
-
The first MC indicates the total number of MCs of the code including
the datas, the filling MCs and itself but excluding the correction RSs.
-
Example of a code with 14 MC of data, a 15th MC to indicate
the total, one filling MC and 4 correction RS. (Level 1)
D
é
b
u
t
|
G1 |
D15 |
D14 |
Dr1 |
F
i
n
|
G2 |
D13 |
D12 |
Dr2 |
G3 |
D11 |
D10 |
Dr3 |
G4 |
D9 |
D8 |
Dr4 |
G5 |
D7 |
D6 |
Dr5 |
G6 |
D5 |
D4 |
Dr6 |
G7 |
D3 |
D2 |
Dr7 |
G8 |
D1 |
D0 |
Dr8 |
G9 |
C3 |
C2 |
Dr9 |
G10 |
C1 |
C0 |
Dr10 |
- D15 = length descriptor (16 in this sample)
- D0 = padding
- D1 to D14 = datas
- G1 to G10 = left side CW
- Dr1 to Dr10 = right side CW
- C0 to C3 = error correction, level 1
Low level encoding.
-
Each CW is made of 17 modules, containing 4 bars and 4 spaces (Code name comes from there !) and it
start by a bar. Bars and spaces width is 1 to 6 modules. (Except for start and stop characters)
Sample :
soit la formulation suivante : 111 0 1 0 1 000 111 0000
- Start character is : 11111111 0 1 0 1 0 1 000
- Stop character is : 1111111 0 1 000 1 0 1 00 1 (Here there is a 5th bar, thus 18 modules.)
- There are 3 distinct tables for encoding the 929 CW.
- The 3 tables giving the patterns of the 929 MC begin like this :
First table |
Second table |
Third table |
Exhaustive tables file :
|
111 0 1 0 1 0 111 000000 |
11111 0 1 0 1 0 11 00000 |
11 0 1 0 1 0 11111 00000 |
1111 0 1 0 1 0 1111 0000 |
111111 0 1 0 1 0 111 000 |
111 0 1 0 1 0 111111 000 |
11111 0 1 0 1 0 11111 00 |
1111 0 1 0 1 00 1 000000 |
1 0 1 0 1 00 1111 000000 |
111 0 1 0 1 00 111 00000 |
11111 0 1 0 1 00 11 0000 |
11 0 1 0 1 00 11111 0000 |
..... |
..... |
..... |
-
Each row use only one encoding table, this table will be used again 3 rows further. Sample : Row 1 --> table 1,
row 2 --> table 2, row 3 --> table 3, row 4 --> table 1, .....etc. We have the formula : table number
= ( ( row number MOD 3 ) * 3 )
High level encoding.
Errors detection and correction.
- Error detection system use 2 CWs and correction system use some between 2 and 510
-
The correction system is based on "
Reed Solomon codes
" which enjoy the math students and terrify others ...
-
Number of CWs to add depend of the correction level used, because of the limit to 928 CWs in a bar code
(1 of which for the sum of CWs) the maximum level is limited by the number of data CWs. The number of CWs
that the error correction algorithm can reconstitute is equal to the number of CWs required by the
correction system : this is not magic but it's mathematical !
Level
|
Number of CWs required by the correction system,
2 of which for the detection ( 2level + 1)
|
Maximum number of data CWs
|
0 |
2 |
925 |
1 |
4 |
923 |
2 |
8 |
919 |
3 |
16 |
911 |
4 |
32 |
895 |
5 |
64 |
863 |
6 |
128 |
799 |
7 |
256 |
671 |
8 |
512 |
415 |
- The advisable correction level depend on the number of data CWs :
Number of data CWs |
Advisable level |
1 Ã 40 |
2 |
41 Ã 160 |
3 |
161 Ã 320 |
4 |
321 Ã 863 |
5 |
-
Reed Solomon codes are based on a polynomial equation where x power is 2s+1 with s =
error correction level used. For sample with the level 1 we use an equation like this :
a + bx + cx2 + dx3 + x4 The numbers a, b, c and d are the factors of
the polynomial equation.
-
For information the equation is : (x - 3)(x - 32)(x - 33).....(x -
3k) ( with k = 2s+1) We develop the polynomial equation and we apply a MOD 929
on each factor. These factors have been pre-computed for the 8 equations corresponding to the 8
correction levels. You can see the
factors file.
We can also calculate them.
Here in Basic the algorithm to calculate the coefficients :
Let
s the correction level used,
k = 2
s+1 the number of RS.
Now, still in Basic, the algorithm for calculating RS codes.
Find all the RS code calculations of the different 2D barcodes in Visual Basic 6.
It is a ZIP file without installation :
|
Those who have read and understood the codes of Reed Solomon will find themselves there ; for the few
ignorant who have not understood everything (like me!) It will be enough to apply the "recipe" by using
the codes obtained in the reverse order (last to first).
Bar code making.
Since we can create the bar code pattern it remains us to draw it on the
screen and to print it on a paper sheet. Two approaches are possible :
-
The graphic method where each bar is "drawn" like a solid rectangle. This method makes it possible to
calculate the width of each bar to the nearest pixel and to work on multiples of the width of a pixel of
the used device. This gives good accuracy especially if the device has a low density as is the case of
screens and inkjet printers. This method require special programming routines and does not allow to make
bar codes with a current software.
-
The special font in which each character is replaced by a bar code. This method allow the use of any
software like a text processing or a spreadsheet. (For example LibreOffice, the free clone of MSoffice !)
The scale settings according to the selected size can bring small misshaping of the bar drawing. With
a laser printer there's no probs.
It seems there is no free font for PDF417 on the net. I've decided consequently to draw this font and to
propose it for download. Because there's 929 CWs with 3 alternatives for each one obligate us to divide
the 17 bits of a CW in several parts. But divide by 17 ... hmmm ... a trick is necessary. If we considere
that the first bit is always "1" and the last one "0", we can imagine a separator like "01" and the remain
is only 15 bits in each CW; we can then divide these 15 bits into 3 parts. There will be then 2
5
= 32 possible groups affected to 32 characters of the font. The encoding software is in charge to transform
the 3 groups of each CW into a 3 characters string.
The font will contain thus the following
characters :
-
Character A (ASCII : 65) to F (ASCII :70) and a (ASCII
: 97) to z (ASCII :122) for the 32 groups having a formula "00000" to "11111"
- Character * (ASCII : 42) for the separator having a formula "01"
-
Character + (ASCII : 43) for the row start character without its last 0, formula
"11111111 0 1 0 1 0 1 00"
- Character - (ASCII : 45) for the end row character without its first 1, formula
"111111 0 1 000 1 0 1 00 1"
The string to send to the printer will look something like this : +*gfA*jhD*BAl*gCt*hjk*- and
this for each row.
The "pdf417.ttf" font
This font contains the 35 patterns describe above. The start row and end row codes embed a 2 modules
margin. The height is equal to 3 modules which is the most usual size.
Copy this file
in the font directory,
often named : C:\WINDOWS\FONTS
|
Encoding a pdf417 bar code
The software will evolve with 4 steps :
- Compaction of the datas into the CWs by using of the different methods and trying to optimize.
- Calculation of the correction CWs according to the selected security level.
- Splitting by rows and addition of left side and right side CWs.
-
Transformation of each CW into a 3 characters string and addition
of separators, start row character, end row character and carriage returns.
Because of the interaction between the different compaction modes and
the different sub-modes of the "text" mode it's difficult to make
a 100% optimization. Thus the software will split the string into "parts"
having the type "numeric", "text" or "byte" afterwards
it will change some parts for an other mode if the overload due to switch CWs is
greater than the compaction gain. We'll can't make allowance for all the parameters
like paddings, gain due to perfect multiple of 6, uni-character switch
or the switchs between the different sub-modes of the text mode : one would need for that
several thousands of programming lines.
Another difficulty comes from the size of the wielded numbers, for example the "Modulo 900" operation
applied on a 44 digits number generate an unavoidable overload error; the software will have to carry out
this calculation step by step.
A small program to test all that
|
Here is a small program
written with Visual Basic 6.
The setup file copy the
program, the Visual Basic
dependencies, the source
files and the font.
Setup file :
|
ZIP file without setup :
|
|
The PDF417$ function have about 500 lines, I thus don't reproduce it here, you can retrieve it in the
"form1.frm" file which is with the above program ; with setup program, the "form1.frm" file is in the
program directory, "sources" sub-directory.
The function call is like this :
result$ =
pdf417$(String$, Security%, ColNb%, ErrCode%)
with the string to encode in String$, correction level
in Security% (-1 = auto.), ColNb% for the number of data columns in a row (<1 = auto.) and ErrCode% for
retrieving an eventual error code. these 3 last parameters are not required and are passed by references ;
after the function return they contains the really used values. Values of ErrCode% after the function
return :
- 1 : String$ is empty
- 2 : String$ have too many datas, we go beyond the 928 CWs.
- 3 : Number of CWs per row too small, we go beyond 90 rows.
-
10 : The security level has being lowers not to exceed the 928 CWs. (It's
not an error, only a warning.)
It is enough now to display or to print the chains result$ with the pdf417 font for example in a text
processing. The Word users will be able to even integrate the pdf417$ function in a macro to automate the
treatment. To achieve all the treatments in a single function, I had to use "Gosub"
instead of functions with parameters ; I can already hear the programming esthetes screaming sacrilegious !