Blobber Repair Protocol

The Blobber Repair Protocol ensures that all blobbers within a decentralized storage allocation maintain a consistent and up-to-date version of stored data.

This process becomes necessary when:

A blobber misses a commit, leading to data inconsistencies.
A new blobber is added or an existing one is replaced in the storage allocation.

Decentralized storage uses Reed-Solomon erasure coding, which allows data to be recovered as long as at least data_shards blobbers hold the correct data. Each out-of-sync blobber therefore reduces the number of further failures the allocation can absorb. The repair process brings all blobbers back to the same allocation root, restoring data consistency and integrity across the network.
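The recovery condition can be illustrated with a small Python sketch. It assumes an allocation of data_shards + parity_shards blobbers, each holding one shard; the function name allocation_health and the parity_shards parameter are illustrative assumptions, not an existing SDK API.

```python
def allocation_health(data_shards: int, parity_shards: int, consistent: int) -> dict:
    """Summarize Reed-Solomon recoverability for one allocation (illustrative sketch).

    Assumes the allocation has data_shards + parity_shards blobbers, one shard each,
    and that `consistent` blobbers currently hold the correct, up-to-date shard.
    """
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("need at least one data shard and a non-negative parity count")
    total = data_shards + parity_shards
    if not 0 <= consistent <= total:
        raise ValueError("consistent must be between 0 and the number of blobbers")
    return {
        "blobbers": total,
        # Any data_shards correct shards are enough to rebuild the data,
        # so up to parity_shards blobbers may be missing or out of date.
        "tolerable_failures": parity_shards,
        "recoverable": consistent >= data_shards,
        # Out-of-sync blobbers that the repair protocol should bring back in line.
        "needing_repair": total - consistent,
    }
```

For example, an allocation with 4 data shards and 2 parity shards spans 6 blobbers; with 5 consistent blobbers the data is still recoverable, and one blobber needs repair.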

Problem Statement

Decentralized storage relies on independent blobbers to store erasure-encoded data. However, inconsistencies can arise due to:

Missed Commits: A blobber might fail to receive or store a data update, leading to incomplete or outdated data.
Blobber Addition or Replacement: New blobbers start with an empty or outdated state and need to catch up with the latest allocation root.
Data Integrity Concerns: Ensuring that all blobbers store the correct and latest version of data is crucial to preventing corruption or loss.

To resolve these challenges, a structured repair process is necessary to synchronize all blobbers to the same allocation root.

Repair Process

The repair process runs in four stages to identify and synchronize missing or outdated data across blobbers.

1. Allocation Root Consensus

The client fetches the allocation roots from all participating blobbers.
Blobbers are grouped into sets based on their allocation roots.
The largest set of blobbers sharing the same allocation root is considered the master set. It must contain at least data_shards blobbers, since fewer cannot reconstruct the data.
Blobbers not in the master set are identified as secondary blobbers requiring repair.
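A minimal Python sketch of the consensus step follows. It assumes the client has already fetched each blobber's allocation root into a dict keyed by a hypothetical blobber ID; find_master_set is an illustrative name, and the tie-breaking rule is an assumption the text does not specify.

```python
from collections import defaultdict


def find_master_set(roots: dict[str, str], data_shards: int):
    """Pick the master set from the allocation roots reported by each blobber.

    roots maps a (hypothetical) blobber ID to the allocation root it reported.
    Returns (master_root, master_ids, secondary_ids), with IDs sorted.
    """
    if not roots:
        raise ValueError("no allocation roots were fetched")

    # Group blobbers that reported the same allocation root.
    groups: dict[str, list[str]] = defaultdict(list)
    for blobber_id, root in roots.items():
        groups[root].append(blobber_id)

    # The largest group becomes the master set. Ties are broken by the root
    # string only to keep the result deterministic (an assumption).
    master_root, master_ids = max(groups.items(), key=lambda kv: (len(kv[1]), kv[0]))

    # Fewer than data_shards matching blobbers cannot reconstruct the data.
    if len(master_ids) < data_shards:
        raise RuntimeError("no allocation root is shared by data_shards blobbers")

    # Every blobber outside the master set is a secondary blobber needing repair.
    secondary_ids = sorted(b for b, r in roots.items() if r != master_root)
    return master_root, sorted(master_ids), secondary_ids
```

With six blobbers where four report the same root and data_shards is 4, those four form the master set and the remaining two are secondary.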

2. File Synchronization Using a Lead Blobber

A lead blobber is selected from each set to act as a representative.
Each lead blobber returns its list of files page by page.
The client then diffs the master lead's list against each secondary lead's list and classifies the discrepancies:
- Missing Files: Files present in the master set but absent in secondary blobbers.
- Extra Files: Files stored by secondary blobbers but missing in the master set.
- Modified Files: Files with mismatched file hashes, indicating outdated versions that need updating.
Based on the differences, file operations are queued for synchronization.
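The diff in step 2 could look like the following sketch, assuming each lead blobber's paginated listing arrives as pages of (path, file hash) pairs. collect_listing and diff_listings are hypothetical helpers, and the operation noted beside each category is an assumption; a production client might diff page by page instead of loading both listings into memory.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class FileDiff:
    missing: list[str] = field(default_factory=list)   # on master only (assumed: upload)
    extra: list[str] = field(default_factory=list)     # on secondary only (assumed: delete)
    modified: list[str] = field(default_factory=list)  # hash mismatch (assumed: re-upload)


def collect_listing(pages: Iterable[list[tuple[str, str]]]) -> dict[str, str]:
    """Merge paginated (path, file_hash) pages from a lead blobber into one dict."""
    listing: dict[str, str] = {}
    for page in pages:
        listing.update(page)
    return listing


def diff_listings(master: dict[str, str], secondary: dict[str, str]) -> FileDiff:
    """Classify discrepancies between the master lead's and a secondary lead's listing."""
    diff = FileDiff()
    for path, file_hash in master.items():
        if path not in secondary:
            diff.missing.append(path)
        elif secondary[path] != file_hash:
            diff.modified.append(path)
    diff.extra = [path for path in secondary if path not in master]
    # Sort so the queued operations come out in a stable, path-ordered sequence.
    for paths in (diff.missing, diff.extra, diff.modified):
        paths.sort()
    return diff
```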

3. Repair Execution

Repair operations are processed in batches for high throughput.
Files requiring repair are downloaded from the master set and uploaded to secondary blobbers.
Pipelining streams data directly from the master set to the secondary blobbers instead of staging it on the client's disk, reducing disk writes and improving performance.
The repair process iterates through all files until the dataset is fully synchronized.
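Batching and pipelining can be sketched in Python as follows. download and upload are hypothetical callables standing in for the client's real transfer code: download(path) yields chunks from the master set, and upload(path, chunks) consumes them for a secondary blobber. The batch size and thread pool are assumptions, not documented parameters.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator

# Hypothetical transfer hooks standing in for the client's real network code.
Download = Callable[[str], Iterator[bytes]]     # streams a file from the master set
Upload = Callable[[str, Iterable[bytes]], int]  # writes a stream to a secondary blobber


def repair_file(path: str, download: Download, upload: Upload) -> int:
    # The upload pulls each chunk from the download generator as it is produced,
    # so the file is never written to local disk in between.
    return upload(path, download(path))


def repair_in_batches(paths: list[str], download: Download, upload: Upload,
                      batch_size: int = 8) -> dict[str, int]:
    """Repair files batch by batch; files within a batch are transferred in parallel."""
    results: dict[str, int] = {}
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            futures = {p: pool.submit(repair_file, p, download, upload) for p in batch}
        # Leaving the with-block waits for the whole batch; result() re-raises failures.
        for path, future in futures.items():
            results[path] = future.result()
    return results
```

Using a generator keeps memory bounded to roughly one chunk per in-flight file, which is the point of streaming rather than downloading whole files first.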

4. Ensuring Synchronization

Once the repair operations are completed, the allocation root is updated for all blobbers.
The client re-verifies that all blobbers now share the same allocation root.
When every root matches, all blobbers in the allocation hold the same data and the inconsistency is resolved.
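The re-verification in step 4, combined with a bounded re-repair loop, might look like this sketch. fetch_roots and repair are hypothetical callables, and the max_repairs limit is an assumption; the function returns the blobbers still out of sync, so an empty list means the allocation is synchronized.

```python
from typing import Callable


def verify_sync(expected_root: str,
                fetch_roots: Callable[[], dict[str, str]],
                repair: Callable[[list[str]], None],
                max_repairs: int = 2) -> list[str]:
    """Check that every blobber reports expected_root, re-repairing laggards.

    Returns the blobbers still out of sync; an empty list means success.
    """
    lagging: list[str] = []
    for attempt in range(max_repairs + 1):
        roots = fetch_roots()
        lagging = sorted(b for b, root in roots.items() if root != expected_root)
        if not lagging:
            return []
        # Trigger another repair round unless the retry budget is exhausted.
        if attempt < max_repairs:
            repair(lagging)
    return lagging
```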

Security & Data Integrity Measures

Merkle proofs are used to verify file integrity after repair. The final allocation root hash is validated against the expected consensus hash, and if a repaired blobber does not match the expected state, further verification or re-repair is triggered.
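The Merkle-proof check can be illustrated with a small binary Merkle tree. This sketch assumes SHA-256 and duplicates the last node on odd-sized levels; it is not necessarily the hash function or tree layout the blobbers actually use. merkle_proof returns (sibling hash, sibling-is-left) pairs from leaf to root, and verify_proof recomputes the root from a leaf and its proof.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    # Duplicate the last node when a level has an odd number of nodes.
    if len(level) % 2:
        level = level + [level[-1]]
    return [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_hash(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Return (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    level = [_hash(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    # Recompute the path to the root, placing each sibling on the correct side.
    node = _hash(leaf)
    for sibling, sibling_is_left in proof:
        node = _hash(sibling + node) if sibling_is_left else _hash(node + sibling)
    return node == root
```

A production scheme would usually also hash leaves and internal nodes with distinct prefixes so that an internal node cannot be passed off as a leaf.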

The Blobber Repair Protocol is a critical component of decentralized storage, ensuring data consistency, integrity, and availability.

By using allocation root consensus, parallel processing, and direct streaming, the repair process ensures that all blobbers remain synchronized, even in the face of failures, updates, or missing commits.

