APPENDIX E
Encoding and Decoding Algorithms for Multimedia Objects
The algorithm for encoding binary images, compatible with GEDCOM transmission, is similar to an encoding scheme that would be used in creating a hexadecimal representation, but it uses a base-64 number representation rather than base-16 number representation.
This algorithm converts multimedia images, represented as binary numbers, into a sequence of characters that does not contain any of the ASCII control characters. The conversion also eliminates special characters such as "@", which has special meaning to GEDCOM.
The encoding routine converts a binary multimedia file segment of 1 to 54 bytes in length into an encoded GEDCOM line value of 4 to 72 bytes in length. This encoded value becomes the <ENCODED_MULTIMEDIA_LINE> used in the MULTIMEDIA_RECORD (see page *).
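The size relationship can be checked with a little arithmetic: every group of up to 3 input bytes yields exactly 4 output characters, which is consistent with the 4 to 72 character range given for the return value. The following is a minimal sketch; the function name `encoded_length` is hypothetical and not part of the specification.

```python
def encoded_length(segment_length: int) -> int:
    """Return the number of encoded characters for a binary segment.

    Assumes the segment is 1 to 54 bytes long and that a short final
    chunk is padded up to 3 bytes before being encoded.
    """
    if not 1 <= segment_length <= 54:
        raise ValueError("segment must be 1 to 54 bytes long")
    groups = (segment_length + 2) // 3  # round up to whole 3-byte groups
    return 4 * groups                   # each group becomes 4 characters
```

Under these assumptions, a 1-byte segment gives 4 characters and a 54-byte segment gives 72.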
The algorithm accomplishes its goal using the following steps:
1. Each 3 bytes (24 bits) of the binary 1 to 54 byte segment is divided into four 6-bit values. Each 6-bit value is stored in an 8-bit character whose hexadecimal representation is between 0x00 and 0x3F (0 to 63 decimal).
2. Each of the 4 new characters represents an Encoding key which is used to obtain the new replacement character from an Encoding Table included in this appendix.
3. Exception processing may be required for the last 3-byte chunk of the 1 to 54 byte segment, which may contain only 0, 1, or 2 bytes. The action depends on the number of bytes retrieved:
a. 0 bytes: Pad the last 3 characters with 0xFF. The conversion is complete.
b. 1 byte: Pad the last two bytes with 0xFF, then complete steps 1 and 2 above.
c. 2 bytes: Pad the last byte with 0xFF, then complete steps 1 and 2 above.
4. Repeat until all characters in the received line value have been substituted. The return value of new encoded characters should contain from 4 to 72 characters. The length of the return value will always be a multiple of 4.
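The encoding steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the names `ALPHABET` and `encode_segment` are hypothetical, the 24 bits are assumed to be split most-significant bits first, and the "0 bytes" case is read as producing no additional output.

```python
import string

# Replacement characters in Encoding Table order: key 0x00 is ".", key 0x3F is "z".
ALPHABET = "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase


def encode_segment(segment: bytes) -> str:
    """Encode a 1 to 54 byte binary segment into 4 to 72 characters."""
    if not 1 <= len(segment) <= 54:
        raise ValueError("segment must be 1 to 54 bytes long")
    out = []
    for i in range(0, len(segment), 3):
        chunk = segment[i:i + 3]
        # Step 3: pad a short final chunk with 0xFF up to 3 bytes.
        chunk += b"\xff" * (3 - len(chunk))
        # Step 1: treat the 3 bytes as one 24-bit group (assumed big-endian).
        group = int.from_bytes(chunk, "big")
        # Steps 1 and 2: take four 6-bit keys and look up each replacement.
        for shift in (18, 12, 6, 0):
            out.append(ALPHABET[(group >> shift) & 0x3F])
    return "".join(out)
```

For example, under these assumptions the single byte 0x41 is padded to 0x41 0xFF 0xFF, which splits into the keys 0x10, 0x1F, 0x3F, 0x3F and encodes as `ETzz`.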
The decoding routine converts the encoded line value back into the original binary multimedia file segment.
The decoding algorithm can be accomplished in the following steps:
1. Each encoded multimedia line segment is divided into sets of 4 (8-bit) characters.
2. Each of these characters becomes a decoding key used to look up a corresponding character from the Decoding Table. A new (24-bit) group is formed by concatenating the low-order 6 bits from each of the 4 characters obtained from the decoding table.
3. Divide the new 24-bit group created by step 2 into three 8-bit characters and append them to the stream of characters being built as the decoded result.
4. Processing ends when the 0xFF padded bytes are encountered.
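The decoding steps above can be sketched in Python in the same spirit. The names `DECODE` and `decode_line` are hypothetical. Step 4 is interpreted here as removing at most two trailing 0xFF bytes from the final group. That interpretation is an assumption: padding cannot be told apart from genuine data that happens to end in 0xFF, so a real implementation may need additional information to recover the exact original length.

```python
import string

# Replacement characters in Encoding Table order, inverted for decoding.
ALPHABET = "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase
DECODE = {ch: value for value, ch in enumerate(ALPHABET)}


def decode_line(encoded: str) -> bytes:
    """Decode an encoded line value back into binary bytes."""
    if len(encoded) % 4 != 0:
        raise ValueError("encoded length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(encoded), 4):      # step 1: sets of 4 characters
        group = 0
        for ch in encoded[i:i + 4]:
            if ch not in DECODE:
                raise ValueError(f"character {ch!r} is not valid")
            # Step 2: concatenate the low-order 6 bits of each looked-up value.
            group = (group << 6) | DECODE[ch]
        out += group.to_bytes(3, "big")       # step 3: three 8-bit characters
    # Step 4 (assumed reading): drop up to two trailing 0xFF pad bytes.
    for _ in range(2):
        if out and out[-1] == 0xFF:
            out.pop()
    return bytes(out)
```

With this sketch, `decode_line("ETzz")` yields the single byte 0x41, and a character from the "not valid" ranges, such as "@", raises an error.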
Encoding Table
Encoding key | Replacement character | Encoding key | Replacement character | |
---|---|---|---|---|
0x00 | 0x2E . | 0x23 | 0x58 X | |
0x01 | 0x2F / | 0x24 | 0x59 Y | |
0x02 | 0x30 0 | 0x25 | 0x5A Z | |
0x03 | 0x31 1 | ---- | -- | |
0x04 | 0x32 2 | 0x26 | 0x61 a | |
0x05 | 0x33 3 | 0x27 | 0x62 b | |
0x06 | 0x34 4 | 0x28 | 0x63 c | |
0x07 | 0x35 5 | 0x29 | 0x64 d | |
0x08 | 0x36 6 | 0x2A | 0x65 e | |
0x09 | 0x37 7 | 0x2B | 0x66 f | |
0x0A | 0x38 8 | 0x2C | 0x67 g | |
0x0B | 0x39 9 | 0x2D | 0x68 h | |
---- | ---- | 0x2E | 0x69 i | |
0x0C | 0x41 A | 0x2F | 0x6A j | |
0x0D | 0x42 B | 0x30 | 0x6B k | |
0x0E | 0x43 C | 0x31 | 0x6C l | |
0x0F | 0x44 D | 0x32 | 0x6D m | |
0x10 | 0x45 E | 0x33 | 0x6E n | |
0x11 | 0x46 F | 0x34 | 0x6F o | |
0x12 | 0x47 G | 0x35 | 0x70 p | |
0x13 | 0x48 H | 0x36 | 0x71 q | |
0x14 | 0x49 I | 0x37 | 0x72 r | |
0x15 | 0x4A J | 0x38 | 0x73 s | |
0x16 | 0x4B K | 0x39 | 0x74 t | |
0x17 | 0x4C L | 0x3A | 0x75 u | |
0x18 | 0x4D M | 0x3B | 0x76 v | |
0x19 | 0x4E N | 0x3C | 0x77 w | |
0x1A | 0x4F O | 0x3D | 0x78 x | |
0x1B | 0x50 P | 0x3E | 0x79 y | |
0x1C | 0x51 Q | 0x3F | 0x7A z | |
0x1D | 0x52 R | |||
0x1E | 0x53 S | |||
0x1F | 0x54 T | |||
0x20 | 0x55 U | |||
0x21 | 0x56 V | |||
0x22 | 0x57 W |
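
The Encoding Table is regular enough to be generated rather than typed in: the replacement characters are ".", "/", the digits, the uppercase letters, and the lowercase letters, in that order. The sketch below builds the table that way; `ENCODING_TABLE` and `replacement_character` are hypothetical names used only for illustration.

```python
import string

# Keys 0x00-0x01: "." and "/"; 0x02-0x0B: digits;
# 0x0C-0x25: uppercase letters; 0x26-0x3F: lowercase letters.
ENCODING_TABLE = (
    "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase
)


def replacement_character(key: int) -> str:
    """Return the replacement character for a 6-bit encoding key."""
    if not 0x00 <= key <= 0x3F:
        raise ValueError("encoding key must be between 0x00 and 0x3F")
    return ENCODING_TABLE[key]
```

Spot checks against the table: key 0x0C maps to "A" and key 0x3F maps to "z".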
Decoding Table
Decoding key | Replacement character | Decoding key | Replacement character | |
---|---|---|---|---|
0x2E . | 0x00 | 0x57 W | 0x22 | |
0x2F / | 0x01 | 0x58 X | 0x23 | |
0x30 0 | 0x02 | 0x59 Y | 0x24 | |
0x31 1 | 0x03 | 0x5A Z | 0x25 | |
0x32 2 | 0x04 | 0x5B - 0x60 | not valid | |
0x33 3 | 0x05 | 0x61 a | 0x26 | |
0x34 4 | 0x06 | 0x62 b | 0x27 | |
0x35 5 | 0x07 | 0x63 c | 0x28 | |
0x36 6 | 0x08 | 0x64 d | 0x29 | |
0x37 7 | 0x09 | 0x65 e | 0x2A | |
0x38 8 | 0x0A | 0x66 f | 0x2B | |
0x39 9 | 0x0B | 0x67 g | 0x2C | |
0x3A - 0x40 | not valid | 0x68 h | 0x2D | |
0x41 A | 0x0C | 0x69 i | 0x2E | |
0x42 B | 0x0D | 0x6A j | 0x2F | |
0x43 C | 0x0E | 0x6B k | 0x30 | |
0x44 D | 0x0F | 0x6C l | 0x31 | |
0x45 E | 0x10 | 0x6D m | 0x32 | |
0x46 F | 0x11 | 0x6E n | 0x33 | |
0x47 G | 0x12 | 0x6F o | 0x34 | |
0x48 H | 0x13 | 0x70 p | 0x35 | |
0x49 I | 0x14 | 0x71 q | 0x36 | |
0x4A J | 0x15 | 0x72 r | 0x37 | |
0x4B K | 0x16 | 0x73 s | 0x38 | |
0x4C L | 0x17 | 0x74 t | 0x39 | |
0x4D M | 0x18 | 0x75 u | 0x3A | |
0x4E N | 0x19 | 0x76 v | 0x3B | |
0x4F O | 0x1A | 0x77 w | 0x3C | |
0x50 P | 0x1B | 0x78 x | 0x3D | |
0x51 Q | 0x1C | 0x79 y | 0x3E | |
0x52 R | 0x1D | 0x7A z | 0x3F | |
0x53 S | 0x1E | |||
0x54 T | 0x1F | |||
0x55 U | 0x20 | |||
0x56 V | 0x21 |
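
The Decoding Table is the inverse of the Encoding Table, with gaps for the characters marked "not valid". One possible representation, sketched below with hypothetical names `DECODING_TABLE` and `decoding_value`, is a 128-entry list indexed by character code in which every entry outside the table holds `None`.

```python
import string

ALPHABET = "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase

# One slot per 7-bit character code; None marks codes absent from the table.
DECODING_TABLE = [None] * 128
for value, ch in enumerate(ALPHABET):
    DECODING_TABLE[ord(ch)] = value


def decoding_value(ch: str):
    """Return the 6-bit value for an encoded character, or None if not valid."""
    code = ord(ch)
    return DECODING_TABLE[code] if code < 128 else None
```

In this sketch the codes 0x3A to 0x40 (which include "@") and 0x5B to 0x60 map to `None`, matching the "not valid" rows of the table.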
Copyright © 1987, 1989, 1992, 1993, 1995 by The Church of Jesus Christ of Latter-day Saints. This document may be copied for purposes of review or programming of genealogical software, provided this notice is included. All other rights reserved.
Disclaimer: This HTML version of the GEDCOM 5.5 specification should be equivalent to the LDS WordPerfect original. In the conversion process I have tried not to break anything; however, the LDS original should always be considered the definitive version.
Clive Stubbings, October 2000