Upload

Uploading files involves chunked file transfers to multiple blobbers using erasure coding (EC). This approach ensures fault tolerance, redundancy, and efficient retrieval.

The system uses an n:p EC ratio, where:

  • n = Number of data blobbers (storing actual data)

  • p = Number of parity blobbers (storing error-correction data)

For example, with a 4:2 ratio a file is split into 4 data shards plus 2 parity shards, and any 4 of the 6 blobbers are sufficient to reconstruct it.

Fig1: Upload Flow

This method provides faster upload/download speeds through parallel processing and ensures data reliability.
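The parallel fan-out described above can be sketched as follows. Note that `uploadShard` is a hypothetical stand-in for the real per-blobber upload call, shown only to illustrate dispatching all n + p shard uploads concurrently:

```go
package main

import (
	"fmt"
	"sync"
)

// uploadShard is a hypothetical stand-in for the real per-blobber
// upload call; in the real system this would send the shard to the
// blobber's HTTP endpoint.
func uploadShard(blobberID int, shard []byte) error {
	return nil
}

// uploadAll dispatches all n+p shard uploads in parallel and
// collects one result per blobber.
func uploadAll(shards [][]byte) []error {
	errs := make([]error, len(shards))
	var wg sync.WaitGroup
	for i, s := range shards {
		wg.Add(1)
		go func(i int, s []byte) {
			defer wg.Done()
			errs[i] = uploadShard(i, s)
		}(i, s)
	}
	wg.Wait()
	return errs
}

func main() {
	shards := make([][]byte, 6) // e.g. n = 4 data + p = 2 parity
	errs := uploadAll(shards)
	fmt.Println(len(errs)) // 6
}
```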

1. Upload Process

The upload mechanism follows a chunked approach for handling large files efficiently. Here is the step-by-step process:

  1. Chunk Preparation

    • Read the file in units of n × 64 KB, so that each chunk yields n fragments of 64 KB each.

    • If the last fragment is smaller, pad it with null bytes.

  2. Erasure Encoding

    • Compute p parity-shards using Reed-Solomon encoding.

    • This ensures data redundancy and fault tolerance.

  3. Shard Distribution

    • Each shard (data or parity) is sent to its assigned blobber.

    • n + p upload requests are created.

  4. Temporary Storage on Blobbers

    • Blobbers store chunks in a temporary directory.

    • Subsequent fragments are appended to the same file.

  5. Chunk Upload Confirmation

    • Once all fragments are uploaded successfully, the file is considered fully uploaded.
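Steps 1–2 above can be sketched as follows. This is a simplified illustration: the fragment splitting and zero-padding match the description above, but the single XOR parity fragment stands in for the p parity shards the real system derives with Reed-Solomon coding:

```go
package main

import "fmt"

const fragmentSize = 64 * 1024 // 64 KB

// splitChunk splits one chunk into n data fragments of 64 KB each,
// zero-padding the last fragment if the chunk runs short.
func splitChunk(chunk []byte, n int) [][]byte {
	fragments := make([][]byte, n)
	for i := 0; i < n; i++ {
		frag := make([]byte, fragmentSize) // zero-filled, so padding is implicit
		start := i * fragmentSize
		if start < len(chunk) {
			end := start + fragmentSize
			if end > len(chunk) {
				end = len(chunk)
			}
			copy(frag, chunk[start:end])
		}
		fragments[i] = frag
	}
	return fragments
}

// xorParity computes a single parity fragment as a stand-in;
// the real system computes p parity shards with Reed-Solomon coding.
func xorParity(fragments [][]byte) []byte {
	parity := make([]byte, fragmentSize)
	for _, f := range fragments {
		for i, b := range f {
			parity[i] ^= b
		}
	}
	return parity
}

func main() {
	chunk := make([]byte, 100*1024) // 100 KB, n = 2 → last fragment is padded
	frags := splitChunk(chunk, 2)
	fmt.Println(len(frags), len(frags[0]), len(frags[1])) // 2 65536 65536
}
```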

2. Merkle Root Calculation

The Merkle root ensures data integrity and is computed independently for each blobber. Let's walk through the Merkle root calculation process:

  • Chunk Division: Each 64 KB chunk is split into 64-byte blocks, creating 1024 blocks.

  • Leaf Hash Calculation: For each index i (0–1023), gather the i-th 64-byte block of every chunk (continuous data); if the file has m chunks, each leaf covers m blocks. The hash of these m blocks forms the Merkle tree leaf.

  • Merkle Tree Construction: The Merkle root is derived from these 1024 leaf hashes.

  • Unique Blobber Merkle Roots: Since each blobber stores different shards, their Merkle roots differ.
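The tree construction over one chunk can be sketched as follows, with one leaf per 64-byte block (SHA-256 is used here for illustration; the production system's hash function may differ):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const (
	blockSize = 64   // 64-byte blocks
	numLeaves = 1024 // 64 KB / 64 B
)

// merkleRoot builds the root over the 1024 leaf hashes of one 64 KB chunk.
func merkleRoot(chunk []byte) [32]byte {
	// Leaf hashes: one per 64-byte block.
	level := make([][32]byte, numLeaves)
	for i := 0; i < numLeaves; i++ {
		level[i] = sha256.Sum256(chunk[i*blockSize : (i+1)*blockSize])
	}
	// Pairwise-hash upward until a single root remains.
	for len(level) > 1 {
		next := make([][32]byte, len(level)/2)
		for i := range next {
			next[i] = sha256.Sum256(append(level[2*i][:], level[2*i+1][:]...))
		}
		level = next
	}
	return level[0]
}

func main() {
	chunk := make([]byte, numLeaves*blockSize)
	root := merkleRoot(chunk)
	fmt.Printf("%x\n", root[:4])
}
```

Since each blobber holds a different shard, running this over each blobber's data yields a different root per blobber, as noted above.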

3. Challenge Issuance from Smart Contract

Challenges ensure file integrity and availability by periodically verifying blobber-stored data. Here is how challenges are issued.

  1. Block Count Tracking

    • Each uploaded file has num_of_blocks, representing the total number of 64 KB chunks stored per blobber.

    • A directory's num_of_blocks is the sum of its children's num_of_blocks.

  2. Random Block Selection for Challenge

    • The network selects a random block_num using:

      block_num = rand.New(rand.NewSource(random_seed)).Int63n(total_num_of_blocks)
    • The random seed is generated by the blockchain.

  3. Fetching Object Path

    • The challenge queries GetObjectPath() to find:

      • File reference

      • Merkle root

      • Write markers history

  4. Blobber’s Response to Challenge

    • Check if a challenge exists for the blobber.

    • Retrieve the required block.

    • Recalculate the Merkle tree.

    • Submit verification proof to the validator.

  5. Validation and Rewards

    • Validators verify the challenge response.

    • If successful:

      • Blobber earns rewards from the challenge pool.

    • If failed:

      • Stake is penalized (if blobber fails multiple challenges).

4. Observations & Optimization Strategies

To improve efficiency, dynamic configurations can be considered:

File Size | Block Size | Merkle Chunk Size | Optimizations
----------|------------|-------------------|----------------------------
< 1 KB    | 1 KB       | 1 Byte            | Fewer Merkle leaves
> 1 GB    | 50 MB      | 50 KB             | Reduced computational load

By dynamically adjusting chunk sizes, the system can optimize performance based on:

  • File size

  • Data-shards count

  • Token economy impact

  • Computational efficiency
