File Index
Mapping content hashes to human-readable filenames
The File Index is a component of the Aqua Tree that maps content hashes to human-readable filenames. It provides context about what each hash represents, making Aqua Trees more understandable and navigable.
Purpose
When content is hashed (documents, images, data files), the resulting hash is a cryptographic string that provides no information about the original content. The file index solves this by maintaining a mapping between:
- Content hashes: SHA3-256 hashes of file content
- Filenames: Human-readable names that describe the content
This enables users and applications to understand what content each hash represents without needing to store or transmit the actual content.
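As a minimal sketch of how an entry might come into being, the snippet below hashes some content with SHA3-256 and records the result under a filename. It assumes Node.js, whose built-in crypto module exposes the sha3-256 digest on builds linked against OpenSSL 1.1.1 or later. The helper names contentHash and addToFileIndex are hypothetical, not part of any Aqua SDK, and hashing the raw bytes is an assumption about how content hashes are derived.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: SHA3-256 over the raw bytes, hex-encoded with a 0x prefix.
function contentHash(bytes) {
  return "0x" + createHash("sha3-256").update(bytes).digest("hex");
}

// Hypothetical helper: record a filename for some content in a tree-like object.
function addToFileIndex(aquaTree, filename, bytes) {
  const hash = contentHash(bytes);
  // Create file_index on demand, since a tree may not carry one yet.
  aquaTree.file_index = { ...(aquaTree.file_index ?? {}), [hash]: filename };
  return hash;
}

const tree = { revisions: {} };
const hash = addToFileIndex(tree, "test.txt", Buffer.from("hello"));
console.log(hash, "->", tree.file_index[hash]);
```

Only the hash and the name end up in the tree; the content itself stays wherever it is stored.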
Structure
The file index is a simple key-value object:
```json
{
  "file_index": {
    "0xe1bcaa92b0ea2f0eb1f046ca4fc877f26726e5bec8b1a5cf25504a29bc4e0f28": "document.pdf",
    "0x9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08": "test.txt",
    "0x3b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9": "image.png"
  }
}
```
Field Format
| Component | Description | Format |
|---|---|---|
| Key | Content hash | Hex string: the prefix 0x followed by 64 hex characters (SHA3-256) |
| Value | Filename | String, typically includes file extension |
Hash Sources
File index entries can reference hashes from several sources:
1. Object Revision Payloads
When an object revision contains content:
```json
{
  "revisions": {
    "0xrev_hash...": {
      "payload": {
        "hash": "0xe1bcaa92b0ea2f0eb1f046ca4fc877f26726e5bec8b1a5cf25504a29bc4e0f28",
        "payload_type": "application/pdf",
        "descriptor": "Contract Document"
      }
    }
  },
  "file_index": {
    "0xe1bcaa92b0ea2f0eb1f046ca4fc877f26726e5bec8b1a5cf25504a29bc4e0f28": "contract.pdf"
  }
}
```
2. Linked External Trees
When link revisions reference other Aqua Trees:
```json
{
  "revisions": {
    "0xlink_hash...": {
      "revision_type": "0x1c3e5a7b9d2f4e6a8c0b1d3f5e7a9c2b4d6e8f0a1c3e5a7b9d2f4e6a8c0b1d3f",
      "links": [
        {
          "verification_hash": "0xexternal_tree_hash...",
          "content_hash": "0x3b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9"
        }
      ]
    }
  },
  "file_index": {
    "0x3b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9": "component_tree.aqua.json"
  }
}
```
3. Template Content
When template revisions define schemas:
```json
{
  "file_index": {
    "0xtemplate_hash...": "user_credential_schema.json"
  }
}
```
Complete Example
Here's a complete Aqua Tree showing how file_index integrates:
```json
{
  "revisions": {
    "0x742b74c87ccd7bfc76eaec416027a0bc039b59b9c2d452ea55a5c0e9b0e3f08e": {
      "revision_type": "0x742b74c87ccd7bfc76eaec416027a0bc039b59b9c2d452ea55a5c0e9b0e3f08e",
      "nonce": "0x3fa8b1c2d3e4f5a67b8c9d0e1f2a3b4c",
      "local_timestamp": 1704067200,
      "version": "https://aqua-protocol.org/docs/v4/schema",
      "method": "scalar",
      "hash_type": "FIPS_202-SHA3-256",
      "payload": {
        "payload_type": "application/pdf",
        "hash": "0x9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
        "hash_type": "FIPS_202-SHA3-256",
        "descriptor": "Sales Contract"
      }
    },
    "0xsig_hash...": {
      "revision_type": "0x8e5b2f9c4d3a1e7b6c8f9d0e2a5b3c4d1e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b",
      "previous_hash": "0x742b74c87ccd7bfc76eaec416027a0bc039b59b9c2d452ea55a5c0e9b0e3f08e",
      "signature_type": "eip191",
      "signature": "0x...",
      "wallet_address": "0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb8"
    }
  },
  "file_index": {
    "0x9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08": "sales_contract_2024.pdf"
  }
}
```
Benefits
1. Human Readability
Users can understand what content is in the tree without decoding hashes:
```json
"file_index": {
  "0xabc...": "proposal.docx",
  "0xdef...": "budget.xlsx",
  "0x123...": "diagram.png"
}
```
Instead of just seeing cryptographic hashes, users see meaningful filenames.
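One way an application might surface these names is to label each payload with its filename when listing revisions. The sketch below is illustrative only: describePayloads is a hypothetical helper, the "Test File" descriptor is made up, and the fallback to a shortened hash is a design choice rather than protocol behaviour.

```javascript
// Hypothetical helper: pair each payload hash in a tree with its filename, if any.
function describePayloads(aquaTree) {
  const index = aquaTree.file_index ?? {};
  return Object.values(aquaTree.revisions ?? {})
    .filter((rev) => rev.payload?.hash) // skip revisions without content
    .map((rev) => {
      const hash = rev.payload.hash;
      // Fall back to a shortened hash when no filename is recorded.
      const label = index[hash] ?? `${hash.slice(0, 10)}...`;
      return `${rev.payload.descriptor ?? "(no descriptor)"}: ${label}`;
    });
}

const tree = {
  revisions: {
    "0xrev_a...": {
      payload: {
        hash: "0xe1bcaa92b0ea2f0eb1f046ca4fc877f26726e5bec8b1a5cf25504a29bc4e0f28",
        descriptor: "Contract Document",
      },
    },
    "0xrev_b...": {
      payload: {
        hash: "0x9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
        descriptor: "Test File",
      },
    },
    "0xsig_hash...": { signature_type: "eip191" },
  },
  file_index: {
    "0xe1bcaa92b0ea2f0eb1f046ca4fc877f26726e5bec8b1a5cf25504a29bc4e0f28": "contract.pdf",
  },
};

for (const line of describePayloads(tree)) console.log(line);
// Contract Document: contract.pdf
// Test File: 0x9f86d081...
```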
2. Content Discovery
Applications can list available content:
```javascript
const filenames = Object.values(aquaTree.file_index);
console.log("Available files:", filenames);
// Output: ["proposal.docx", "budget.xlsx", "diagram.png"]
```
3. Reverse Lookup
Find the hash for a known filename:
```javascript
function findHashByFilename(aquaTree, filename) {
  return Object.entries(aquaTree.file_index)
    .find(([_, name]) => name === filename)?.[0];
}

const hash = findHashByFilename(tree, "contract.pdf");
```
4. Linked Tree Context
When trees link to external trees, the file index identifies them:
```jsonc
"file_index": {
  "0xhash1...": "main_document.pdf",
  "0xhash2...": "appendix_a.aqua.json", // External tree
  "0xhash3...": "appendix_b.aqua.json"  // External tree
}
```
Best Practices
Descriptive Filenames
Use clear, descriptive names with appropriate extensions:
```jsonc
// Good
"file_index": {
  "0xabc...": "employment_contract_john_doe_2024.pdf",
  "0xdef...": "diploma_computer_science_2024.pdf"
}

// Avoid
"file_index": {
  "0xabc...": "file1.dat",
  "0xdef...": "doc.bin"
}
```
Include Extensions
Always include file extensions to indicate content type:
```jsonc
"file_index": {
  "0xabc...": "document.pdf", // PDF document
  "0xdef...": "image.png",    // PNG image
  "0x123...": "data.json",    // JSON data
  "0x456...": "archive.zip",  // Compressed archive
  "0x789...": "video.mp4"     // Video file
}
```
Unique Names
Ensure filenames are unique within a tree to avoid ambiguity:
```jsonc
// Good - unique names
"file_index": {
  "0xabc...": "contract_v1.pdf",
  "0xdef...": "contract_v2.pdf"
}

// Avoid - duplicate names
"file_index": {
  "0xabc...": "contract.pdf",
  "0xdef...": "contract.pdf" // Ambiguous!
}
```
External Tree Naming
For linked external trees, use the .aqua.json extension:
```json
"file_index": {
  "0xhash1...": "component_a.aqua.json",
  "0xhash2...": "dependency_b.aqua.json"
}
```
Optional Nature
While the file index is a standard component of Aqua Trees, entries are optional:
- Not all hashes need file index entries
- The tree remains valid without file_index
- Applications can function using hashes alone
However, including file index entries significantly improves usability.
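Because entries are optional, a consumer may want to know which referenced hashes still lack a name. The following sketch assumes the revision shapes shown earlier (payload.hash and links[].content_hash); unnamedHashes is a hypothetical helper, and it treats a missing file_index the same as an empty one.

```javascript
// Hypothetical helper: content hashes referenced by revisions that have no filename.
function unnamedHashes(aquaTree) {
  const index = aquaTree.file_index ?? {}; // file_index may be absent entirely
  const referenced = new Set();
  for (const rev of Object.values(aquaTree.revisions ?? {})) {
    if (rev.payload?.hash) referenced.add(rev.payload.hash);
    for (const link of rev.links ?? []) referenced.add(link.content_hash);
  }
  return [...referenced].filter((hash) => index[hash] === undefined);
}

// Placeholder hashes, shortened for readability.
const partlyNamed = {
  revisions: {
    "0xrev1...": { payload: { hash: "0xabc..." } },
    "0xrev2...": { links: [{ content_hash: "0xdef..." }] },
  },
  file_index: { "0xabc...": "proposal.docx" },
};
const noIndex = { revisions: partlyNamed.revisions };

console.log(unnamedHashes(partlyNamed)); // [ '0xdef...' ]
console.log(unnamedHashes(noIndex));     // [ '0xabc...', '0xdef...' ]
```

Both trees are usable as they stand; the second simply offers nothing but hashes to display.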
Validation
The file index should be validated for:
1. Hash Format
All keys must be valid hex strings:
```typescript
function isValidHash(hash: string): boolean {
  return /^0x[0-9a-fA-F]{64}$/.test(hash);
}
```
2. Referenced Hashes
Hashes in file_index should appear in revisions:
```javascript
function validateFileIndex(aquaTree) {
  const referencedHashes = new Set();

  // Collect all content hashes from revisions
  Object.values(aquaTree.revisions).forEach(rev => {
    if (rev.payload?.hash) {
      referencedHashes.add(rev.payload.hash);
    }
    if (rev.links) {
      rev.links.forEach(link => referencedHashes.add(link.content_hash));
    }
  });

  // Check file_index entries (file_index itself is optional)
  for (const hash of Object.keys(aquaTree.file_index ?? {})) {
    if (!referencedHashes.has(hash)) {
      console.warn(`Orphaned file_index entry: ${hash}`);
    }
  }
}
```
3. Filename Validity
Filenames should not contain invalid characters:
```typescript
function isValidFilename(filename: string): boolean {
  // Reject control characters (including null bytes) and path separators
  return !/[\x00-\x1f\x7f\/\\]/.test(filename);
}
```
Use Cases
Document Management Systems
Track document names across revisions:
```json
"file_index": {
  "0xv1...": "proposal_draft_v1.docx",
  "0xv2...": "proposal_draft_v2.docx",
  "0xfinal...": "proposal_final.docx"
}
```
Multi-File Projects
Reference multiple files in a project:
```json
"file_index": {
  "0xreadme...": "README.md",
  "0xcode...": "main.rs",
  "0xconfig...": "config.toml",
  "0xdocs...": "documentation.pdf"
}
```
Supply Chain Tracking
Identify product-related documents:
```json
"file_index": {
  "0xcert...": "organic_certification.pdf",
  "0xinspect...": "quality_inspection_report.pdf",
  "0xship...": "shipping_manifest.pdf"
}
```
Credential Systems
Map credential hashes to recipient identifiers:
```json
"file_index": {
  "0xdiploma...": "diploma_john_doe_2024.pdf",
  "0xtranscript...": "transcript_john_doe_2024.pdf"
}
```
Storage Considerations
Size Impact
Each file index entry adds approximately 100-150 bytes:
- Hash (0x prefix plus 64 hex characters): 66 bytes
- Filename: 20-50 bytes typical
- JSON overhead: ~20 bytes
For large trees, this is minimal compared to revision data.
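The estimate can be checked for a concrete entry. This sketch measures the UTF-8 size of one key-value pair in compact JSON (no indentation or spaces), so it lands slightly below the range above, which presumably allows for pretty-printing whitespace. entryBytes is a hypothetical helper.

```javascript
// Hypothetical helper: UTF-8 bytes one entry occupies in compact JSON,
// counting the quotes, the colon, and one trailing comma.
function entryBytes(hash, filename) {
  const encoded = `${JSON.stringify(hash)}:${JSON.stringify(filename)},`;
  return new TextEncoder().encode(encoded).length;
}

const hash = "0x9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08";
console.log(entryBytes(hash, "sales_contract_2024.pdf")); // 95
```

Here the hash contributes 68 bytes including its quotes, the 23-character filename 25, and the colon and comma one byte each.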
Compression
File index compresses well with gzip due to repetitive patterns:
- Hash prefixes (0x)
- Common file extensions (.pdf, .json)
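How much gzip saves in practice depends on the data, since the hash digits themselves are close to random. A rough way to measure it, assuming Node.js and its built-in zlib and crypto modules, is to compress a synthetic index; the filenames and the syntheticIndex helper below are made up for the experiment.

```javascript
const { createHash } = require("node:crypto");
const { gzipSync } = require("node:zlib");

// Build a synthetic file index: made-up filenames keyed by SHA3-256 hashes
// of those names (stand-ins for real content hashes).
function syntheticIndex(count) {
  const index = {};
  for (let i = 0; i < count; i++) {
    const name = `report_${i}.pdf`;
    const hash = "0x" + createHash("sha3-256").update(name).digest("hex");
    index[hash] = name;
  }
  return index;
}

const raw = Buffer.from(JSON.stringify({ file_index: syntheticIndex(500) }));
const compressed = gzipSync(raw);
console.log(`raw: ${raw.length} bytes, gzip: ${compressed.length} bytes`);
```

Running the same measurement on a real tree gives a better picture than any synthetic data.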
Related Documentation
- Aqua Tree - Complete tree structure
- Object Revision - Revisions containing content hashes
- Link Revision - Linking to external trees
- Template Revision - Schema definitions
