INTERNET-DRAFT
D. Endler
Dalen Knowledge Systems
March-2013
Rev. April-2013


Embedded Extension Identifier Specification for the
AL-KINDI Implementation Version 1 of the RC4 File Format Specification


Contents

   1. Introduction
   2. Licensing
   3. The embedded extension identifier description
   4. Contents of the file fields
   6. RC4 with embedded extension identifiers file example
   5. References


1 - INTRODUCTION
----------------

A file in the RC4 format can record metadata through its extension
identifiers. This metadata can hold useful information about the
plaintext file, which can be used in its decryption. However, the RC4
extension identifiers are stored in plaintext, so they can be read
without the encryption key. With this data it is possible, for example,
to compare an encrypted file with a plaintext file and come to the
conclusion that they have the same data, even without the decryption
key.

That is why it is necessary to safeguard the information about the
encrypted file (such as the original file name, file size, data hash
and creation date) by not recording its metadata as plaintext.

The purpose of the embedded extension identifiers is to protect the
metadata by encrypting it together with the file, while still using the
AL-KINDI Implementation Version 1 of the RC4 File Format Specification.

The April 2013 revision of this specification includes the embedded
extension FILE-MIME-TYPE.


2 - LICENSING
-------------

Copyright (C) 2013 Dalen Knowledge Systems.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
copy of the license can be found at .


3 - THE EMBEDDED EXTENSION IDENTIFIER DESCRIPTION
-------------------------------------------------

To store an encrypted file with embedded extension identifiers, the
reserved extension identifier "FILE-CONTENT" of the RC4 formatted file
must have the value "ENCRYPTED EMB".

The embedded extension identifiers are included within the field
file_data as defined in the RC4 File Format Specification. They are
included in it before being encrypted. The field plaintext_data is
defined as being the field file_data before the encryption.

The format of the field file_data complies with the following rules:

   file_data                     = Cipher(plaintext_data)
   plaintext_data                = embedded_signature
                                   *embedded_extension_identifier
                                   end_of_extension_identifier
                                   plaintext_file_data
   embedded_signature            = "EMBEDDED"
   embedded_extension_identifier = id_size id_content
   id_size                       = SHORTINT
   id_content                    = id_name ["=" id_value] 1*<OCTET 0x00>
   id_name                       = 1*TEXT
   id_value                      = 1*TEXT
   end_of_extension_identifier   = 2<OCTET 0x00>
   plaintext_file_data           = 1*(2^64)OCTET
   SHORTINT                      = <2OCTET representing the integer
                                   value (1st OCTET) + 256 * (2nd OCTET)>
   TEXT                          =
   CTL                           =
   OCTET                         =

   file_data
      This is the file data field, as defined in the RC4 File Format
      Specification.

   Cipher()
      This is the function that encrypts the plaintext data into the
      ciphertext.

   plaintext_data
      This is the text to be encrypted. It includes all the embedded
      extension identifiers as well as the plaintext file data.

   embedded_signature
      The sequence of OCTETs 0x45 0x4D 0x42 0x45 0x44 0x44 0x45 0x44,
      resulting in the text "EMBEDDED".
      When the file is decrypted, if the plaintext_data does not start
      with this sequence, it is very likely that the decryption key is
      incorrect and the decryption process must be aborted.

   embedded_extension_identifier
      A pair of "name" and "value" that provides information about the
      file. This sequence can be repeated as many times as needed. It
      is composed of the id_size and the id_content.

   id_size
      The number of OCTETs of the corresponding id_content field
      (including the ending OCTETs 0x00). It has 2 OCTETs representing
      an integer value, in which the least significant OCTET (LSB)
      comes first.

   id_content
      The extension identifier content, having the id_name and the
      optional corresponding id_value (both in TEXT format). Binary
      content for the identifier value must be encoded to TEXT using
      some text data encoding (e.g. Base64, described in [RFC 4648]).

   end_of_extension_identifier
      This field indicates that there are no further extension
      identifiers. A software tool must read extension identifiers
      until it finds one with id_size equal to 0, which is the
      end_of_extension_identifier field.

   plaintext_file_data
      This is the original file data, before encryption.

All embedded extension identifiers are optional for the file. The
identifiers that can be used are the following:

   FILE-SIZE:
      Contains the size of the original file used to produce the
      encrypted file. This embedded extension identifier contains the
      length of the field plaintext_file_data. It must be in the
      format:

         1NOZERODIGIT *DIGIT
         NOZERODIGIT = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' |
                       '9'
         DIGIT       = '0' | NOZERODIGIT

      (e.g. "FILE-SIZE=10234").

   FILE-MIME-TYPE:
      This indicates the format of the file data. The value must be in
      the Multipurpose Internet Mail Extensions (MIME) format, as
      described in [RFC 2045], [RFC 2046] and [RFC 2047]. If not
      present, then the value "application/octet-stream" must be
      assumed.

      (e.g. "FILE-MIME-TYPE=text/plain").

   FILE-NAME:
      This is the original name of the file that has been encrypted.
      It must be in the format:

         1*TEXT

      (e.g. "FILE-NAME=secretFile.txt"). The file name shall not
      include the path.

   FILE-DATE:
      This indicates the modification date and time of the original
      file. The time zone must be the local time and the value must be
      in the format:

         year '-' month '-' day 'T' hour ':' minute ':' second
         year   = NOZERODIGIT 3DIGIT
         month  = <2DIGIT, from '01' to '12'>
         day    = <2DIGIT, from '01' to '31'. The values '29', '30' and
                  '31' must be in accordance with the corresponding
                  month.>
         hour   = <2DIGIT, from '00' to '23'>
         minute = <2DIGIT, from '00' to '59'>
         second = <2DIGIT, from '00' to '60'. The value '60' is used
                  only in the case of a leap second.>

      (e.g. "FILE-DATE=2012-08-20T21:15:04"). Refer to [RFC 3339] for
      date and time formats.


4 - CONTENTS OF THE FILE FIELDS
-------------------------------

When using the embedded extension identifiers, the following rules must
be followed for the contents of the RC4 formatted file:

   data_size
      The data_size field is no longer used. It must contain the value
      zero. The embedded extension identifier FILE-SIZE is used
      instead.

   sha256_checksum
      The sha256_checksum field value must be calculated as specified
      in the RC4 File Format Specification, but its value must be
      encrypted using the same encryption method and key used to
      encrypt the file. The SHA-256 value must be encrypted as if the
      content of this field were placed immediately after the contents
      of the file. If the encryption algorithm divides the file into
      blocks of the same size, then the encryption of the SHA-256 value
      must start on a new block, after the file is padded. If the
      encrypted SHA-256 value is longer than 256 bits, then the OCTETs
      in excess must be discarded. On the other hand, if it is shorter
      than 256 bits, the missing bits must be padded with zeros.

   extension_identifier
      The reserved extension identifier "FILE-NAME" is no longer used,
      and should not be included in the file.
      The embedded extension identifier FILE-NAME is used instead.

      The reserved extension identifier "FILE-MIME-TYPE" is no longer
      used, and should not be included in the file. The embedded
      extension identifier FILE-MIME-TYPE is used instead.

      The reserved extension identifier "FILE-CONTENT" must be filled
      with the value "ENCRYPTED EMB".

All the other fields follow the original rules as described in the RC4
File Format Specification.


6 - RC4 WITH EMBEDDED EXTENSION IDENTIFIERS FILE EXAMPLE
--------------------------------------------------------

The hexdump below illustrates an example of the content of an encrypted
file in the RC4 File Format using embedded extension identifiers. Note
that there is no information about the file size or the file name, as
they are embedded and encrypted together with the data.

00000000: 52 43 34 01 41 4C 2D 4B 49 4E 44 49 96 0B 9F 79  RC4.AL-KINDI...y
00000010: CE AE 2C 54 86 6F B6 C2 09 72 7D E5 2E 58 7E 06  ..,T.o...r}..X~.
00000020: 0E 90 42 21 91 43 64 CA B1 DB 7A 7B 00 00 00 00  ..B!.Cd...z{....
00000030: 00 00 00 00 1F 00 43 52 45 41 54 45 44 2D 42 59  ......CREATED-BY
00000040: 3D 41 45 53 20 47 65 6E 65 72 61 74 6F 72 20 76  =AES Generator v
00000050: 31 2E 31 30 00 18 00 43 52 45 41 54 45 44 2D 44  1.10...CREATED-D
00000060: 41 54 45 3D 32 30 31 32 2D 30 38 2D 32 30 00 16  ATE=2012-08-20..
00000070: 00 43 52 45 41 54 45 44 2D 54 49 4D 45 3D 32 31  .CREATED-TIME=21
00000080: 3A 31 35 3A 30 34 00 1B 00 46 49 4C 45 2D 43 4F  :15:04...FILE-CO
00000090: 4E 54 45 4E 54 3D 45 4E 43 52 59 50 54 45 44 20  NTENT=ENCRYPTED 
000000A0: 45 4D 42 00 2A 00 55 55 49 44 3D 38 32 61 61 32  EMB.*.UUID=82aa2
000000B0: 36 34 38 2D 64 36 33 64 2D 34 34 32 61 2D 37 62  648-d63d-442a-7b
000000C0: 61 65 2D 35 36 38 39 31 66 63 32 64 62 66 38 00  ae-56891fc2dbf8.
000000D0: 2C 00 4B 45 59 2D 49 44 3D 39 65 62 62 34 61 35  ,.KEY-ID=9ebb4a5
000000E0: 63 2D 39 37 37 65 2D 34 31 35 36 2D 36 30 34 34  c-977e-4156-6044
000000F0: 2D 33 62 35 64 39 33 30 65 37 38 31 36 00 13 00  -3b5d930e7816...
00000100: 4D 45 54 48 4F 44 3D 41 45 53 2D 32 35 36 20 43  METHOD=AES-256 C
00000110: 42 43 00 80 00 00 00 00 00 00 00 00 00 00 00 00  BC..............
00000120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000190: 00 00 00 00 00 00 00 9A C3 D5 61 A4 77 DD 9F F3  ..........a.w...
000001A0: 55 81 4D 20 1E 81 21 F4 9E 1A E7 15 8B 7E 12 27  U.M ..!......~.'
000001B0: 01 B1 D0 06 9E 86 BB B5 8C 7A 3E FF AC F0 9C BE  .........z>.....
000001C0: 40 5B FE FC 0D B0 90 07 49 E7 BB 0C D9 7F FB B2  @[......I.......
000001D0: 9D 0C 51 50 6B 51 E8 A9 DC F6 6D B9 63 9F 3C 13  ..QPkQ....m.c.<.
000001E0: 90 E4 90 A5 6D 9D CA 9C 23 BD 1D 4B D8 B2 10 45  ....m...#..K...E
000001F0: 53 48 06 C8 3F 74 61 91 4C BA 5B 86 51 99 A1 D0  SH..?ta.L.[.Q...
00000200: 81 E4 BC 44 18 55 E8 99 1C FE D2 9D 11 37 78 94  ...D.U.......7x.
00000210: 9E E1 7C 86 FE 89 9E 53 35 96 87 40 EB 4A 8F 69  ..|....S5..@.J.i
00000220: 05 6B B1 FF 2F 05 92 D3 9A 91 F2 C3 34 FC B6 D8  .k../.......4...
00000230: 5D 2C 43 A1 C3 E8 1C 10 A6 B9 BB 9A CF 2E 92 8E  ],C.............
00000240: 18 41 25 23 EE 8F B3 04 BA E9 0D 49 9D AF 4C FF  .A%#.......I..L.
00000250: 55 5E CC D4 4F 71 D5 0C 3B F9 EB C3 7C 34 65 5F  U^..Oq..;...|4e_
00000260: 57 17 99 7A 0B 94 FA 4B 6E 1B 8C 82 0A 77 64 6B  W..z...Kn....wdk
00000270: B8 1E 67 F5 07 AE 74 58 78 CF 6D 66 AF 65 C1 F8  ..g...tXx.mf.e..
00000280: 81 B1 E1 33 FE B2 79 5F 81 F3 0D 9A 21 7A 55 CC  ...3..y_....!zU.
00000290: 1A 29 CD 90 15 C5 C9 29 31 9B F0 18 13 DE 59 00  .).....)1.....Y.
000002A0: 9B 70 44 30 13 72 FE 26 3A F0 8A C7 5C 09 E0 E7  .pD0.r.&:...\...
000002B0: 84 C9 86 22 B2 3E A4 59 3B 94 39 E6 8E 33 D8 57  ...".>.Y;.9..3.W
000002C0: 58 8D E0 C5 35 83 D0 61 51 52 21 1D 89 00 5B 1F  X...5..aQR!...[.
000002D0: 00 FF DB 06 A4 4A A0 05 35 EE DE CA 45 49 24 64  .....J..5...EI$d
000002E0: 7D 18 D3 D1 02 95 12 8B 5C 99 DF D3 8C 73 3B A9  }.......\....s;.
000002F0: D1 E1 0B 56 2F 5E 7F 77 7F 36 73 61 47 A3 9C F7  ...V/^.w.6saG...
00000300: 20 87 D0 77 AF 47 FA 9F 2F A3 D9 14 AF E5 18 C1   ..w.G../.......
00000310: 62 0D 7B D1 FF CB 95 89 51 1C 0D 90 07 5F A7 C7  b.{.....Q...._..
00000320: 6C C3 1A FE 27 6D 36                             l...'m6


5 - REFERENCES
--------------

[RFC 3174]        Eastlake 3rd, D., Jones, P., "US Secure Hash
                  Algorithm 1 (SHA1)", RFC 3174, September 2001.

[RFC 3339]        Klyne, G., Newman, C., "Date and Time on the
                  Internet: Timestamps", RFC 3339, July 2002.

[RFC 4648]        Josefsson, S., "The Base16, Base32, and Base64 Data
                  Encodings", RFC 4648, SJD, October 2006.

[RFC 6234]        Eastlake 3rd, D., Hansen, T., "US Secure Hash
                  Algorithms (SHA and SHA-based HMAC and HKDF)",
                  RFC 6234, May 2011.

[RC4 File Format] Endler, D., "RC4 File Format Specification -
                  AL-KINDI Implementation Version 1", October 2012.
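

Illustrative sketch (non-normative). The plaintext_data layout described
in section 3 (signature, length-prefixed identifiers, a zero id_size as
the end marker, then the file data) might be assembled and read back
roughly as shown below. The function names build_plaintext_data and
parse_plaintext_data are hypothetical and not part of this
specification. The sketch also assumes that TEXT can be handled as
ASCII and that each id_content ends with exactly one 0x00 OCTET. The
Cipher() step is deliberately left out.

```python
import struct

# embedded_signature: OCTETs 0x45 0x4D 0x42 0x45 0x44 0x44 0x45 0x44
SIGNATURE = b"EMBEDDED"


def build_plaintext_data(identifiers, file_data):
    """Assemble plaintext_data from (name, value) pairs and the file data.

    Hypothetical helper; a value of None produces a name-only identifier.
    """
    out = bytearray(SIGNATURE)
    for name, value in identifiers:
        content = name.encode("ascii")  # assumption: TEXT treated as ASCII
        if value is not None:
            content += b"=" + value.encode("ascii")
        content += b"\x00"  # assumption: a single terminating 0x00 OCTET
        # id_size is a SHORTINT: 2 OCTETs, least significant OCTET first
        out += struct.pack("<H", len(content))
        out += content
    out += b"\x00\x00"  # end_of_extension_identifier (id_size equal to 0)
    out += file_data
    return bytes(out)


def parse_plaintext_data(plaintext_data):
    """Split decrypted plaintext_data into (identifiers, plaintext_file_data)."""
    if not plaintext_data.startswith(SIGNATURE):
        # Per section 3, this most likely means the decryption key is wrong.
        raise ValueError("EMBEDDED signature not found; aborting")
    pos = len(SIGNATURE)
    identifiers = {}
    while True:
        if pos + 2 > len(plaintext_data):
            raise ValueError("truncated id_size")
        (id_size,) = struct.unpack_from("<H", plaintext_data, pos)
        pos += 2
        if id_size == 0:  # end_of_extension_identifier reached
            break
        if pos + id_size > len(plaintext_data):
            raise ValueError("truncated id_content")
        # id_size counts the ending 0x00 OCTETs, so strip them here
        content = plaintext_data[pos:pos + id_size].rstrip(b"\x00")
        pos += id_size
        name, sep, value = content.decode("ascii").partition("=")
        identifiers[name] = value if sep else None
    return identifiers, plaintext_data[pos:]


# Example round trip (no encryption involved)
example = build_plaintext_data(
    [("FILE-NAME", "secretFile.txt"), ("FILE-SIZE", "5")], b"hello"
)
parsed_ids, parsed_data = parse_plaintext_data(example)
```

In this sketch the signature check happens before any identifier is
read, so a wrong key is rejected without interpreting the remaining
OCTETs as lengths.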