INTERNET-DRAFT                                                 D. Endler
                                                 Dalen Knowledge Systems
                                                            October-2012
                                                         Rev. March-2013
                                                         Rev. April-2013
                                                           Rev. May-2013


                     RC4 File Format Specification
                   AL-KINDI Implementation Version 1


Contents

   1. Introduction
   2. Licensing
   3. The RC4 File Format Description
   4. Reserved Extension Identifiers
   5. Using the RC4 File Format to Store Keys
   6. RC4 File Examples
   7. References
   8. GNU Free Documentation License


1. INTRODUCTION
---------------

This specification describes the RC4 File Format for version 1 of the
"AL-KINDI" implementation.

The RC4 File Format was originally created to store files encrypted
with the RC4 algorithm. That original purpose fell into disuse because
the RC4 algorithm has several security weaknesses when used to encrypt
files.

The RC4 File Format remains well suited to storing encrypted data in
general, so it can also be used to store data encrypted with other
algorithms, such as the Advanced Encryption Standard (AES).

Version 1 of the RC4 File Format includes the data_size field, which
makes the format usable when the encrypted output differs in size from
the input.

The March 2013 revision includes references to the Embedded Extension
Identifier Specification, described in [RC4 File Format EMB].

The April 2013 revision includes the reserved extension identifier
FILE-MIME-TYPE, which defines the MIME type of the file data.


2. LICENSING
------------

Copyright (C) 2012, 2013 Dalen Knowledge Systems.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
copy of the license can be found at .

The RC4 File Format is inspired by the AESCrypt file format.

AESCrypt Copyright (C) 2006, 2007, 2008 Paul E. Jones
All Rights Reserved.


3. THE RC4 FILE FORMAT DESCRIPTION
----------------------------------

The RC4 File Format is described using the augmented Backus-Naur Form
(BNF) defined in section 2.1 of [RFC 2616]. The file format complies
with the following rules:

   rc4_file                    = signature
                                 sha256_checksum
                                 data_size
                                 *extension_identifier
                                 [special_last_extension]
                                 end_of_extension_identifier
                                 file_data

   signature                   = stamp version implementation
   stamp                       = "RC4"
   version                     = OCTET
   implementation              = "AL-KINDI"
   sha256_checksum             = 32OCTET
   data_size                   = LONGINT

   extension_identifier        = id_size id_content
   id_size                     = SHORTINT
   id_content                  = id_name ["=" id_value]
                                 1*<the OCTET 0x00>
   id_name                     = 1*TEXT
   id_value                    = 1*TEXT

   special_last_extension      = last_id_size last_id_content
   last_id_size                = SHORTINT
   last_id_content             = 1*128<the OCTET 0x00>

   end_of_extension_identifier = 2<the OCTET 0x00>

   file_data                   = 1*(2^64)OCTET

   LONGINT  = <4SHORTINT representing the integer value
               (1st SHORTINT) + 2^16 * (2nd SHORTINT) +
               2^32 * (3rd SHORTINT) + 2^48 * (4th SHORTINT)>
   SHORTINT = <2OCTET representing the integer value
               (1st OCTET) + 256 * (2nd OCTET)>
   OCTET    = <any 8-bit sequence of data>
   TEXT     = <any OCTET except CTLs>
   CTL      = <any US-ASCII control character
               (octets 0 - 31) and DEL (127)>

signature

   This is the initial sequence of OCTETs that forms the signature of
   an RC4 formatted file. A software tool must use it to verify that
   the file complies with this specification. It is 12 OCTETs long and
   is composed of the stamp, version and implementation fields.
   Developers who modify the file format described in this
   specification must keep the size of this field unchanged, replacing
   only the contents of the version and implementation fields with
   their own values.
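
As an illustration of the SHORTINT and LONGINT rules and of the
signature check described above, the following minimal Python sketch
decodes both integer types and validates the 12-OCTET signature. The
names read_shortint, read_longint, parse_signature, STAMP,
IMPLEMENTATION and SIGNATURE_SIZE are illustrative assumptions and are
not defined by this specification.

```python
# Illustrative sketch only; all names here are hypothetical helpers.
SIGNATURE_SIZE = 12           # stamp (3) + version (1) + implementation (8)
STAMP = b"RC4"
IMPLEMENTATION = b"AL-KINDI"


def read_shortint(buf: bytes, offset: int = 0) -> int:
    """Decode a SHORTINT: (1st OCTET) + 256 * (2nd OCTET)."""
    return buf[offset] + 256 * buf[offset + 1]


def read_longint(buf: bytes, offset: int = 0) -> int:
    """Decode a LONGINT: four SHORTINTs, least significant first."""
    return sum(
        read_shortint(buf, offset + 2 * i) << (16 * i) for i in range(4)
    )


def parse_signature(header: bytes) -> int:
    """Check the 12-OCTET signature and return the version number."""
    if len(header) < SIGNATURE_SIZE:
        raise ValueError("file too short to hold a signature")
    if header[0:3] != STAMP:
        raise ValueError("stamp is not 'RC4'")
    if header[4:12] != IMPLEMENTATION:
        raise ValueError("implementation is not 'AL-KINDI'")
    return header[3]
```

Applied to the first 12 OCTETs of the example in the section "RC4 FILE
EXAMPLES" (52 43 34 01 41 4C 2D 4B 49 4E 44 49), parse_signature
returns 1. A tool that also accepts other implementations would
compare the implementation field against its own list instead of
rejecting the file outright.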

stamp

   The sequence of OCTETs 0x52 0x43 0x34, which yields the text "RC4".

version

   An OCTET representing the version of the RC4 File Format used to
   create this file. This is the version number of this particular
   implementation of the RC4 File Format. Other implementations have
   their own version numbering.

implementation

   The sequence of OCTETs 0x41 0x4C 0x2D 0x4B 0x49 0x4E 0x44 0x49,
   which yields the text "AL-KINDI". This is the name of the
   implementation of the file format. Developers who wish to modify
   the file format described in this specification should change this
   name to one of their own, provided that it fills exactly 8 OCTETs.

sha256_checksum

   If this file contains encrypted data, this field contains the
   SHA-256 hash of the original file used to produce this encrypted
   file. If this file contains a key or any other plaintext, this
   field contains the SHA-256 hash of the file data itself.

   The SHA-256 hash must be calculated using the US Secure Hash
   Algorithm described in [RFC 6234] (see also [RFC 3174]).

   A software tool can use this value to verify the integrity of a
   decrypted file by comparing it with the SHA-256 hash computed from
   the file after decryption.

   For example, the sequence:

      0x7F 0x83 0xB1 0x65 0x7F 0xF1 0xFC 0x53
      0xB9 0x2D 0xC1 0x81 0x48 0xA1 0xD6 0x5D
      0xFC 0x2D 0x4B 0x1F 0xA3 0xD6 0x77 0x28
      0x4A 0xDD 0xD2 0x00 0x12 0x6D 0x90 0x69

   is the SHA-256 hash of the text "Hello World!".

   If the file is encrypted using the ARK data format, this field
   shall be filled with 32 OCTETs of value 0x00, because the SHA-256
   checksum is already embedded in the file_data field (see the ARK
   data format in [ARK]).

   Refer to the Embedded Extension Identifier Specification
   [RC4 File Format EMB] for the specific contents of the
   sha256_checksum field for the "ENCRYPTED EMB" file type.

data_size

   If this file contains encrypted data, this field contains the size
   of the original file used to produce this encrypted file.
   Note that the size of the encrypted file can be different from the
   size of the original file. If this file contains a key or any other
   plaintext, this field contains the size of the data itself.

   If the file is encrypted using the ARK data format, this field
   shall be filled with the value zero (see the ARK data format in
   [ARK]).

   Refer to the Embedded Extension Identifier Specification
   [RC4 File Format EMB] for the specific contents of the data_size
   field for the "ENCRYPTED EMB" file type.

   The data_size shall be stored in little-endian order.

extension_identifier

   A "name" and "value" pair that provides information about the file.
   This sequence can be repeated as many times as needed. It is
   composed of the id_size and the id_content.

id_size

   The number of OCTETs in the corresponding id_content field
   (including the ending OCTETs 0x00). It is stored as 2 OCTETs
   representing an integer value, in little-endian order.

id_content

   The extension identifier content, consisting of the id_name and the
   optional corresponding id_value (both in TEXT format). Some
   identifier names are reserved and have special meanings. See the
   section "RESERVED EXTENSION IDENTIFIERS" for more details.

   Binary content for the identifier value must be encoded as TEXT
   using a text data encoding (e.g. Base64, described in [RFC 4648]).

special_last_extension

   A special last extension is defined that has no name and serves
   only as a "container" for extensions added after the RC4 file is
   initially created. Such an extension avoids the need to read and
   re-write the entire file in order to add a small extension.
   Software tools that create RC4 files should insert a 128-octet
   "container" extension. Developers may then insert extensions into
   this "container" area and reduce the size of the "container" as
   necessary. If larger extensions are added, or the "container" area
   is filled entirely, then the entire file must be read and
   re-written to add further extensions.
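
The extension_identifier layout described above (a little-endian
id_size that counts the ending 0x00, followed by the id_content) can
be sketched in Python as follows. The helper names encode_extension
and decode_extension are hypothetical. The sketch assumes that TEXT is
restricted to ASCII and that a single ending OCTET 0x00 is written, as
in the example hexdump in the section "RC4 FILE EXAMPLES". It handles
a single named record only, not the "container" or the end marker.

```python
from typing import Optional, Tuple


def encode_extension(name: str, value: Optional[str] = None) -> bytes:
    """Build id_size + id_content for one extension identifier."""
    text = name if value is None else f"{name}={value}"
    # Assumption: ASCII TEXT and exactly one ending OCTET 0x00.
    content = text.encode("ascii") + b"\x00"
    if len(content) > 0xFFFF:
        raise ValueError("id_content does not fit in a SHORTINT id_size")
    # id_size is a SHORTINT in little-endian order.
    return len(content).to_bytes(2, "little") + content


def decode_extension(
    buf: bytes, offset: int = 0
) -> Tuple[str, Optional[str], int]:
    """Return (id_name, id_value or None, offset of the next record)."""
    size = int.from_bytes(buf[offset:offset + 2], "little")
    content = buf[offset + 2:offset + 2 + size]
    if len(content) < size:
        raise ValueError("truncated extension identifier")
    text = content.rstrip(b"\x00").decode("ascii")
    name, sep, value = text.partition("=")
    return name, (value if sep else None), offset + 2 + size
```

For instance, encode_extension("KEY-INDEX", "0") yields the OCTETs
0C 00 followed by the text "KEY-INDEX=0" and one OCTET 0x00, which
matches the KEY-INDEX record in the example hexdump.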

   The last_id_size field contains the size of this container, and the
   whole last_id_content field must be filled with the OCTET 0x00.

end_of_extension_identifier

   This field indicates that there are no further extension
   identifiers. A software tool must keep reading extension
   identifiers until it finds an extension_identifier whose id_size
   equals 0, which is the end_of_extension_identifier field.

file_data

   The file data. The data comprises all the content of the file from
   this point to the end of the file (EOF).

   Refer to the ARK Data Format for Archiving in an RC4 Formatted File
   [ARK] for more information about the file_data contents for the
   "ENCRYPTED ARK" file type.

   Refer to the Embedded Extension Identifier Specification
   [RC4 File Format EMB] for more information about the file_data
   contents for the "ENCRYPTED EMB" file type.


4. RESERVED EXTENSION IDENTIFIERS
---------------------------------

All the reserved names are optional for the file. The reserved
identifiers are the following:

CREATED-BY:
   This is a developer-defined text string that identifies the
   software product, manufacturer, or other useful information (such
   as the software version).

CREATED-DATE:
   This indicates the date on which the file was created. The format
   of the date string is YYYY-MM-DD (e.g. 2012-08-20).

CREATED-TIME:
   This indicates the time at which the file was created. The time
   string uses the 24-hour format HH:MM:SS (e.g. 21:15:04). The time
   zone must be UTC+00:00.

FILE-CONTENT:
   The content type of the file. The possible values are:

   - "PLAINTEXT" for unencrypted data (e.g. a key file);
   - "ENCRYPTED" for encrypted data;
   - "ENCRYPTED EMB" for encrypted data with embedded extension
     identifiers (see [RC4 File Format EMB]);
   - "ENCRYPTED ARK" for encrypted ARK Data Format (see [ARK]).

FILE-NAME:
   This is the name of the original file that has been encrypted.
   If the file contents are encrypted either with embedded extension
   identifiers (see [RC4 File Format EMB]) or with the ARK Data Format
   (see [ARK]), then this identifier shall be omitted.

FILE-MIME-TYPE:
   This indicates the format of the file data. The value must be in
   the Multipurpose Internet Mail Extensions (MIME) format, as
   described in [RFC 2045], [RFC 2046] and [RFC 2047]. If it is not
   present, then the value "application/octet-stream" must be assumed.
   If the file contents are encrypted in the ARK Data Format, this
   identifier must have either the value "application/x-ark" or the
   value "application/x-ark-zlib" (see [ARK]). If the file contents
   are encrypted with embedded extension identifiers, then this
   identifier shall be omitted (see [RC4 File Format EMB]).

KEY-INDEX:
   This is the index of the key used. It is useful for identifying
   which key was used to encrypt the file when a list of keys is
   stored in the system. This identifier shall be used only when the
   software tool does not provide identification for the keys. If
   identification for the keys is available, the "KEY-ID" extension
   identifier shall be used instead.

KEY-ID:
   This is the identification of the key used (usually the UUID of the
   key). A software tool can use it to find the correct key to decrypt
   the file.

METHOD:
   This is a text that identifies the algorithm used to encrypt the
   file. A software tool can use it to decrypt the file with the same
   algorithm that was used to encrypt it.

UUID:
   This is the Universally Unique IDentifier for the file, in the
   format "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX", as described in
   section 3 of [RFC 4122].

These reserved identifiers shall be used by software tools that read
from and write to an RC4 file to perform specific actions.


5. USING THE RC4 FILE FORMAT TO STORE KEYS
------------------------------------------

The RC4 File Format can also be used to store encryption keys.
In this case, the following rules apply:

   - The data_size field contains the size of the key (in OCTETs).
   - The file_data field contains the key itself.
   - The sha256_checksum field contains the SHA-256 hash of the key
     data.
   - The reserved extension identifiers FILE-NAME, KEY-INDEX and
     KEY-ID are not used.


6. RC4 FILE EXAMPLES
--------------------

The hexdump below shows an example of the content of an encrypted file
in the RC4 File Format.

00000000: 52 43 34 01 41 4C 2D 4B 49 4E 44 49 7F 83 B1 65  RC4.AL-KINDI...e
00000010: 7F F1 FC 53 B9 2D C1 81 48 A1 D6 5D FC 2D 4B 1F  ...S.-..H..].-K.
00000020: A3 D6 77 28 4A DD D2 00 12 6D 90 69 0C 00 00 00  ..w(J....m.i....
00000030: 00 00 00 00 1F 00 43 52 45 41 54 45 44 2D 42 59  ......CREATED-BY
00000040: 3D 41 45 53 20 67 65 6E 65 72 61 74 6F 72 20 76  =AES generator v
00000050: 31 2E 31 30 00 18 00 43 52 45 41 54 45 44 2D 44  1.10...CREATED-D
00000060: 41 54 45 3D 32 30 31 32 2D 30 38 2D 32 30 00 16  ATE=2012-08-20..
00000070: 00 43 52 45 41 54 45 44 2D 54 49 4D 45 3D 32 31  .CREATED-TIME=21
00000080: 3A 31 35 3A 30 34 00 14 00 46 49 4C 45 2D 4E 41  :15:04...FILE-NA
00000090: 4D 45 3D 68 65 6C 6C 6F 2E 74 78 74 00 17 00 46  ME=hello.txt...F
000000A0: 49 4C 45 2D 43 4F 4E 54 45 4E 54 3D 45 4E 43 52  ILE-CONTENT=ENCR
000000B0: 59 50 54 45 44 00 0C 00 4B 45 59 2D 49 4E 44 45  YPTED...KEY-INDE
000000C0: 58 3D 30 00 2C 00 4B 45 59 2D 49 44 3D 31 62 31  X=0.,.KEY-ID=1b1
000000D0: 36 39 63 39 36 2D 33 37 31 64 2D 34 62 64 64 2D  69c96-371d-4bdd-
000000E0: 37 34 65 37 2D 64 37 35 65 66 34 32 33 37 30 65  74e7-d75ef42370e
000000F0: 39 00 2A 00 55 55 49 44 3D 39 30 62 30 32 62 31  9.*.UUID=90b02b1
00000100: 36 2D 36 65 38 66 2D 34 30 34 35 2D 37 39 36 31  6-6e8f-4045-7961
00000110: 2D 30 66 30 37 34 62 36 38 30 33 34 37 00 13 00  -0f074b680347...
00000120: 4D 45 54 48 4F 44 3D 41 45 53 2D 32 35 36 20 43  METHOD=AES-256 C
00000130: 42 43 00 80 00 00 00 00 00 00 00 00 00 00 00 00  BC..............
00000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000001A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000001B0: 00 00 00 00 00 00 00 51 A6 17 67 BD 58 63 9A 1F  .......Q..g.Xc..
000001C0: 7D 4C 16 47 7E 26 32 37 1D 3C 5A C1 B6 09 6D 58  }L.G~&27.<Z...mX


7. REFERENCES
-------------

[RFC 2045]  Freed, N., Borenstein, N., "Multipurpose Internet Mail
            Extensions (MIME) Part One: Format of Internet Message
            Bodies", RFC 2045, November 1996.

[RFC 2046]  Freed, N., Borenstein, N., "Multipurpose Internet Mail
            Extensions (MIME) Part Two: Media Types", RFC 2046,
            November 1996.

[RFC 2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
            Part Three: Message Header Extensions for Non-ASCII
            Text", RFC 2047, November 1996.

[RFC 2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
            Masinter, L., Leach, P., Berners-Lee, T., "Hypertext
            Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC 3174]  Eastlake 3rd, D., Jones, P., "US Secure Hash Algorithm 1
            (SHA1)", RFC 3174, September 2001.

[RFC 4122]  Leach, P., Mealling, M., Salz, R., "A Universally Unique
            IDentifier (UUID) URN Namespace", RFC 4122, July 2005.

[RFC 4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
            Encodings", RFC 4648, SJD, October 2006.

[RFC 6234]  Eastlake 3rd, D., Hansen, T., "US Secure Hash Algorithms
            (SHA and SHA-based HMAC and HKDF)", RFC 6234, May 2011.

[RC4 File Format EMB]
            Endler, D., "Embedded Extension Identifier Specification
            for the AL-KINDI Implementation Version 1 of the RC4 File
            Format Specification", March 2013.

[ARK]       Endler, D., "The ARK Data Format for Archiving in an RC4
            Formatted File", May 2013.

[AL-KINDI]  Abu Yusuf Ya'qub ibn 'Ishaq as Sabbah al-Kindi was a
            pioneer in cryptanalysis and devised several new methods
            of breaking ciphers.
August 20 is the Day of the Cryptic Secret