This code is part of the family of 2-dimensional codes, it can encode up to 2335 characters on a very small surface. The encoding is done in two stages : first the datas are converted to 8 bits "codeword" (High level encoding) then those are converted to small black and white squares. (Low level encoding) Moreover an error correction system is included, it allows to reconstitute badly printed, erased, fuzzy or torn off datas. In the continuation of this talk, the term "codeword" will be abbreviated to CW and Reed-Solomon code in RS. There are also micro QR codes not studied here.

The general structure.

The symbol is a square or rectangular array made with rows and columns. Each cell is a small square black for a bit set to 1 and white for a bit set to 0. The dimension of the square is named the module.
The colors can be inverted : white on black.
Extended Channel Interpretation (ECI) protocol provides a method to specify particular interpretations on byte values or to identify a particular page code.
The default ECI code is 000003 which designate the Latin alphabet ISO 8859-1.
There are two datamatrix standard : ECC 000-140 and ECC 200. Only ECC 200 can be used for a new project. This study is only dedicated to the ECC 200.
A symbol consists of one or several data regions. Each region has a one module wide perimeter.
Independently of the number of region, there is one, and only one, mapping matrix. Le size of the matrix is : "region size" x "number of region"
Example for the 36x36 symbol : "16x16" x "2x2" ---> matrix size is 32x32
Second example for the 16x48 symbol : "14x22" x "1x2" ---> matrix size is 14x44
The number of rows and the number of columns (including the perimeter) are always even (Odd for ECC 000-140 !)
If necessary a mechanism allows to distribute more datas onse veral symbols. (Up to 16)
The error correction mechanism is based on Reed-Solomon codes.
For square symbol of 48 x 48 and less, Reed-Solomon codes are appended after the datas; for other symbols they are interleaved : datas are divided in blocks.
Each symbol size has its own number of Reed-Solomon codes.
The total number of CW per symbol is equal to the number of cells in the matrix divided by 8 (Without decimal part)
The 8 bits of each CW are placed in the matrix in order from left to right and top to bottom; certain CW are split in order to fill the matrix.
A quiet zone of 1 module (minimum) is required on the 4 sides.
Be careful not to confuse the concept of regions which is a graphic division of the symbol and the concept of blocks which is a logical division of the data.

A 2 x 2 regions code :

Low level encoding.

Thereafter we'll use operators : + --> addition, x --> multiplication, \ --> integer division, MOD --> remainder of the integer division.
There are 24 sizes of square symbol and 6 sizes of rectangular symbol. The following array give basic values for each symbol size.

Symbol size Rows x columns	Number of data region (H x V)	Number of Reed Solomon CW	Number of block
Square symbols
10x10	1	5	1
12x12	1	7	1
14x14	1	10	1
16x16	1	12	1
18x18	1	14	1
20x20	1	18	1
22x22	1	20	1
24x24	1	24	1
26x26	1	28	1
32x32	2x2	36	1
36x36	2x2	42	1
40x40	2x2	48	1
44x44	2x2	56	1
48x48	2x2	68	1
52x52	2x2	2 x 42	2
64x64	4x4	2 x 56	2
72x72	4x4	4 x 36	4
80x80	4x4	4 x 48	4
88x88	4x4	4 x 56	4
96x96	4x4	4 x 68	4
104x104	4x4	6 x 56	6
120x120	6x6	6 x 68	6
132x132	6x6	8 x 62	8
144x144	6x6	10 x 62	8
Rectangular symbols
8x18	1	7	1
8x32	2	11	1
12x26	1	14	1
12x36	1x2	18	1
16x36	1x2	24	1
16x48	1x2	28	1

Other interesting values can be computed from these values. This is done with a spreadsheet :
Each region has a one module wide perimeter. Left and lower sides are entirely black, right and top sides are made up of alternating black and white squares.

Each CW is placed in the matrix (If there are several regions, they are assembled to form an unique matrix) on 45 degree parallel diagonal lines and the left top corner is always as shown below :
In this image, we can remark than CW nr. 2, 5 and 6 have a regular shape. CW nr. 1, 3, 4 are truncated and the remain of these CW is reported on the other side of the symbol. Here is the entire placement of the 8 x 8 matrix : You can remark on this image that the bit 8 of each CW is under the 45 degree parallel diagonal lines. Corner and border conditions are very intricate and different for each matrix size, fortunately Datamatrix standard give us an algorithm in order to make the placement.

High level encoding.

The hight level encoding support six compaction mode, ASCII mode is divided in 3 sub-mode :

Compaction mode	Données à encoder	Rate compaction
ASCII	ASCII character 0 to 127	1 byte per CW
ASCII extended	ASCII character 128 to 255	0.5 byte per CW
ASCII numeric	ASCII digits	2 byte per CW
C40	Upper-case alphanumeric	1.5 byte per CW
TEXT	Lower-case alphanumeric	1.5 byte per CW
X12	ANSI X12	1.5 byte per CW
EDIFACT	ASCII character 32 to 94	1.33 byte per CW
BASE 256	ASCII character 0 to 255	1 byte per CW

The default character encodation method is ASCII. Some special CWs allow to switch between the encoding methods.

Codeword	Data or function
1 to 128	ASCII datas
129	Padding
130 to 229	Pair of digits : 00 to 999
230	Switch to C40 method
231	Switch to Base 256 method
232	FNC1 character
233	Structure of several symbols
234	Reader programming
235	Shift to extended ASCII for one character
236	Macro
237	Macro
238	Switch to ANSI X12 method
239	Switch to TEXT method
240	Switch to EDIFACT method
241	Extended Channel Interpretation character
254	If ASCII method is in force : End of datas, next CWs are CW of padding If other method is in force : Switch back to ASCII method or indicate end of datas.

If the symbol is not full, pad CWs are required. After the last data CW, the 254 CW indicates the end of the datas or the return to ASCII method. First padding CW is 129 and next padding CWs are computed with the 253-state algorithm.

The 253-state algorithm.
Let P the number of data CWs from the beginning of datas, R a pseudo random number and CW the required pad CW.
R = ((149 * P) MOD 253) + 1
CW = (129 + R) MOD 254

The ASCII mode. This mode has 3 ways to encode character :

ASCII character in the range 0 to 127
CW = "ASCII value" + 1
Extended ASCII character in the range 128 to 255
A first CW with the value 235 and a second CW with the value : "ASCII value" - 127
Pair of digits 00, 01, 02 ..... 99
CW = "Pair of digits numerical value" + 130

C40, TEXT and X12 modes.
C40 and TEXT modes are similar : only uppercase and lowercase characters are inverted.
In these modes 3 data characters are compacted in 2 CWs. In C40 and TEXT modes 3 shift characters allow to indicate an other character set for the next character.
The 16 bits value of a CW pair is computed as following :
Value = C1 * 1600 + C2 * 40 + C3 + 1 with C1, C2 and C3 the 3 character values to compact. 254 CW indicate a return to the ASCII method exept if this mode allows to fill completely the symbol.
In C40 and TEXT mode a pad character with 0 value can be added at the 2 last characters in order to form a pair of CW.
If it remains to encode only one character in C40 or TEXT mode or 2 character in X12 mode ; it(they) must be encoded with ASCII method but if a single free CW remain in the symbol before data correction CWs, it is assumed that this CW is encoded using ASCII method without using the 254 CW.
"Upper Shift" character enable to encode extended ASCII character.

Value	Basic set for C40	Basic set for TEXT	"Shift 1" set	"Shift 2" set	"Shift 3" set for C40	"Shift 3" set for TEXT	Set for X12
0	Shift 1	Shift 1	NUL (0)	! (33)	‘ (96)	‘ (96)	CR (13)
1	Shift 2	Shift 2	SOH (1)	" (34)	a (97)	A (65)	* (42)
2	Shift 3	Shift 3	STX (2)	# (35)	b (98)	B (66)	> (62)
3	Espace (32)	Espace (32)	ETX (3)	$ (36)	c (99)	C (67)	Espace (32)
4	0 (48)	0 (48)	EOT (4)	% (37)	d (100)	D (68)	0 (48)
5	1 (49)	1 (49)	ENQ (5)	& (38)	e (101)	E (69)	1 (49)
6	2 (50)	2 (50)	ACK (6)	' (39)	f (102)	F (70)	2 (50)
7	3 (51)	3 (51)	BEL (7)	( (40)	g (103)	G (71)	3 (51)
8	4 (52)	4 (52)	BS (8)	) (41)	h (104)	H (72)	4 (52)
9	5 (53)	5 (53)	HT (9)	* (42)	I (105)	I (73)	5 (53)
10	6 (54)	6 (54)	LF (10)	+ (43)	j (106)	J (74)	6 (54)
11	7 (55)	7 (55)	VT (11)	, (44)	k (107)	K (75)	7 (55)
12	8 (56)	8 (56)	FF (12)	- (45)	l (108)	L (76)	8 (56)
13	9 (57)	9 (57)	CR (13)	. (46)	m (109)	M (77)	9 (57)
14	A (65)	a (97)	SO (14)	/ (47)	n (110)	N (78)	A (65)
15	B (66)	b (98)	SI (15)	: (58)	o (111)	O (79)	B (66)
16	C (67)	c (99)	DLE (16)	; (59)	p (112)	P (80)	C (67)
17	D (68)	d (100)	DC1 (17)	< (60)	q (113)	Q (81)	D (68)
18	E (69)	e (101)	DC2 (18)	= (61)	r (114)	R (82)	E (69)
19	F (70)	f (102)	DC3 (19)	>(62)	s (115)	S (83)	F (70)
20	G (71)	g (103)	DC4 (20)	? (63)	t (116)	T (84)	G (71)
21	H (72)	h (104)	NAK (21)	@ (64)	u (117)	U (85)	H (72)
22	I (73)	I (105)	SYN (22)	[ (91)	v (118)	V (86)	I (73)
23	J (74)	j (106)	ETB (23)	\ (92)	w (119)	W (87)	J (74)
24	K (75)	k (107)	CAN (24)	] (93)	x (120)	X (88)	K (75)
25	L (76)	l (108)	EM (25)	^ (94)	y (121)	Y (89)	L (76)
26	M (77)	m (109)	SUB (26)	_ (95)	z (122)	Z (90)	M (77)
27	N (78)	n (110)	ESC (27)	FNC1	{ (123)	{ (123)	N (78)
28	O (79)	o (111)	FS (28)		\| (124)	\| (124)	O (79)
29	P (80)	p (112)	GS (29)		} (125)	} (125)	P (80)
30	Q (81)	q (113)	RS (30)	Upper Shift	~ (126)	~ (126)	Q (81)
31	R (82)	r (114)	US (31)		DEL (127)	DEL (127)	R (82)
32	S (83)	s (115)					S (83)
33	T (84)	t (116)					T (84)
34	U (85)	u (117)					U (85)
35	V (86)	v (118)					V (86)
36	W (87)	w (119)					W (87)
37	X (88)	x 120)					X (88)
38	Y (89)	y (121)					Y (89)
39	Z (90)	z (122)					Z (90)

Sample, sequence to encode with C40 method : Ab

The 3 characters are : 14, 02, 02
14 * 1600 + 2 * 40 + 2 + 1 = 22 483
CW₁ = 22 483 \ 256 = 87
CW₂ = 22 483 MOD 256 = 211
The sequence is consequently : 87, 211

Extended characters are encoded as follows :

Generate code "1" to switch to set 2, then the code 30 which is the "upper shift" code.
Substract 128 from the ASCII value of the character to encode; we obtains a not-extended character.
Encode normally this character with changing the set if necessary.

Example in C40:
Character E (203) : 203 - 128 = 75 which is "K", line 24 of the basic set. Sequence : 1 30 24
Character e (235) : 235 - 128 = 107 which is "k", line 11 of the "shift 3" set. Sequence : 1 30 2 11

EDIFACT mode.
In this mode 4 data characters are compacted in 3 CWs. Each EDIFACT character is coded with 6 bits which are the 6 last bits of the ASCII value.

EDIFACT value	>ASCII value character	Comment
0 to 30	64 to 94	EDIFACT value = ASCII value - 64
31		End of datas, return to ASCII mode
32 to 63	32 to 63	EDIFACT value = ASCII value

Sample, sequence to encode with EDIFACT method : ABC!

The 4 EDIFACT character values are : 1, 2, 3, 33
1 * 262144 + 2 * 4096 + 3 * 64 + 33 = 270 561
CW₁ = 270 561 \ 65536 = 4
CW₂ = ( 270 561 MOD 65536 ) \ 256 = 32
CW₃ = 270 561 MOD 256 = 225
The sequence is consequently : 4, 32, 225

"Base 256" mode.
This mode can encode any byte.
After the 231 CW which switch to "base 256" mode, there is a length field. This field is build with 1 or 2 bytes.
Let N the number of data to encode :

If N < 250 a single byte is used, its value is N (from 0 to 249)
If N >= 250 two bytes are used, the value of the first one is : (N \ 250) + 249 (Values from 250 to 255) and the value of the second one is N MOD 250 (Values from 0 to 249).
If N finishes the filling of the symbol : the value of the byte is 0.

Moreover each CW (including the length field) must be computed with the 255-state algorithm.

The 255-state algorithm.
Let P the number of data CWs from the beginning of datas, R a pseudo random number, V the base 256 CW value and CW the required CW.
R = ((149 * P) MOD 255) + 1
CW = (V + R) MOD 256

Errors detection and correction.

The system of correction is based on the codes of " Reed Solomon " which make the joy of the mathematicians and the terror of the others ...
The number of correction CWs depend of the matrix size. (See table ), more exactly it depend of the bloc size.
Reed Solomon codes are based on a polynomial equation where x power is the number of error correction CWs used. or sample with the 8 x 8 matrix we use an equation like this : x⁵ + ax⁴ + bx³ + cx² + dx + e. The numbers a, b, c, d and e are the factors of the polynomial equation.

For information the equation is : (x - 2)(x - 2²)(x - 2³).....(x - 2^k) We develop the polynomial equation with Galois arithmetic on each factor...
There is 16 Reed Solomon block size (See table ) : 5, 7, 10, 11, 12, 14, 18, 20, 24, 28, 36, 42, 48, 56, 62, 68. The factors of these 16 polynomial equations have been pre-computed. You can see the factors file. We can also calculate them. But first a small explanation on the operations in a Galois body.

Arithmetic operations in a Galois body of characteristic 2.

The sum and the difference are the same function : the exclusive OR function.
A + B = A - B = A Xor B
The multiplication is more complicated ; first we need to create 2 tables containing the Logs and Antilogs of the body according to its "modulo" (Here 301) :

Now we can calculate Mult(a, b) = Alog((Log(a) + Log(b)) Mod 255)

Here is now in Basic the algorithm to calculate the factors :

And now, still in Basic, the algorithm for calculating correction CWs.

Find all the RS code calculations of the different 2D barcodes in Visual Basic 6.
It is a ZIP file without installation :

Those who have read and understood the codes of Reed Solomon will find themselves there ; for the few ignorant who have not understood everything (like me!) It will be enough to apply the "recipe" by using the codes obtained in the reverse order (last to first).

Bar code making.

Now that we know how to create a barcode pattern, we still have to draw it on the screen and print it on paper. Two approaches are possible :

The graphic method where each bar is "drawn" like a solid rectangle. This method makes it possible to calculate the width of each bar to the nearest pixel and to work on multiples of the width of a pixel of the used device. This gives good accuracy especially if the device has a low density as is the case of screens and inkjet printers. This method require special programming routines and does not allow to make bar codes with a current software.
The special font in which each character is replaced by a bar code. This method allow the use of any software like a text processing or a spreadsheet. (For example LibreOffice, the free clone of MSoffice !) The scale settings according to the selected size can bring small misshaping of the bar drawing. With a laser printer there's no probs.

It seems there is no free fontfor datamatrix barcode on the net. I've decided consequently to draw this font and to propose it for download. Since each symbol have an even row number and an even column number, I put in each character of the font 4 modules (2 rows and 2 columns). In this manner we have 16 combinations assigned to the 16 first capital letters.
If we give a value at each dot of this 2 x 2 matrix like this :

1	2
4	8

The ASCII value of the character associated to a given matrix is the sum each dot value + 65 (65 = A = no dot !)

The " datamatrix.ttf " font

This font contains the 16 character A (ASCII : 65) to P (ASCII : 80)

Copy this file
in the font directory,
often named :
C:\WINDOWS\FONTS

Encoding a datamatrix barcode

The software will evolve with 4 steps :

Compaction of the datas into the CWs by using of the different methods and trying to optimize and if necessary addition of padding.
Calculation of the correction CWs according to the matrix size, and splitting by blocks if necessary
Placement of the CWs in the matrix according to the algorithm supplied by datamatrix standard.
Splitting by regions and addition of the region perimeters.
Transformation of each row pair into a characters string. The string length is number of module / 2

Because of the interaction between the different compaction modes it's difficult to make a 100% optimization. Thus the software will split the string into "parts" having the type "numeric", "text" or "byte" afterwards it will change some parts for an other mode if the overload due to switch CWs is greater than the compaction gain. We'll can't make allowance for all the parameters like paddings, ...

WORK IN PROGRESS

A small program to test all that

WORK IN PROGRESS

Do you like this page ?

Is it useful to you ?

Index des mots-clés

The Datamatrix code

The general structure.

Low level encoding.

High level encoding.

Errors detection and correction.

Bar code making.

The " datamatrix.ttf " font

Encoding a datamatrix barcode

WORK IN PROGRESS

A small program to test all that

WORK IN PROGRESS

Click here !