Chaos Testing

This page describes how chaos testing is performed on the testnet network.

Chaos testing here means taking sharders, miners, blobbers, validators and authorizers down (stop) and bringing them back up (start). The current chaos script is available at https://github.com/0chain/0helm/blob/staging/values/testnet/common/chaostest.sh

Chaos script setup

  • node_ip --> public IPs of the servers that participate in the network.

    # Node ips to run chaos testing
    node_ip=("65.108.0.162" "116.202.51.26" "65.21.193.99" "157.90.0.215" "168.119.148.69" "65.21.199.153" "65.21.143.178" "65.21.143.177" "65.21.237.179" "65.108.15.39" "65.108.6.184" "65.21.237.91" "65.21.237.189" "65.21.233.118" "65.108.4.213" "65.108.13.224" "85.10.209.249" "65.108.13.216" "157.90.1.10" "65.108.15.41" "65.108.15.40" "46.4.94.241" "23.88.66.180" "23.88.66.178" "94.130.163.35" "168.119.79.45" "157.90.32.234" "168.119.78.209" "168.119.5.25")

  • progress_timer() --> sets the time gap between taking services down (stop) and bringing them back up (start). The gap can be increased or decreased by changing the secs variable.

    #To display the time count of 5 minutes.
    progress_timer() {
        secs=$((5 * 60))
        echo -ne "Time in seconds to wait $secs\033[0K\r"
        while [ $secs -gt 0 ]; do
            # echo -ne "wait $secs\033[0K\r"
            sleep 1
            : $((secs--))
        done
    }

  • for loop_count in {1..18}; do --> this line decides how long the chaos test runs. With the progress_timer() function above, a single iteration of the loop takes 10+ minutes, so 18 iterations run for roughly 3 hours.

  • rand=($(shuf -i 0-29 -n 8)) --> randomly selects 8 servers out of 30. The selected numbers are used as indices into node_ip, so the range should only cover indices that exist in that array. To select more or fewer servers, replace 8 with a custom number, but it should be less than 30.

  • ssh -t "${node_ip[$node]}" "docker stop sharder1_cassandra_1 sharder1_postgres_1 sharder1_sharder_1 miner1_redis_1 miner1_redis_txns_1 miner1_miner_1 miner2_redis_1 miner2_redis_txns_1 miner2_miner_1 miner3_redis_1 miner3_redis_txns_1 miner3_miner_1 blobber01_postgres_1 blobber01_validator_1 blobber01_blobber_1" --> connects to each selected server over ssh and stops the sharder, miner, blobber and related containers.

    #Scale down and up miner/sharders one by one randomly.
    for loop_count in {1..18}; do
        rand=($(shuf -i 0-29 -n 8))
        for node in ${rand[@]}; do
            echo -e "\n \e[93m ===================================== Putting down miners/sharders on node ${node_ip[$node]}. ======================================  \e[39m"
            ssh -t "${node_ip[$node]}" "docker stop sharder1_cassandra_1 sharder1_postgres_1 sharder1_sharder_1 miner1_redis_1 miner1_redis_txns_1 miner1_miner_1 miner2_redis_1 miner2_redis_txns_1 miner2_miner_1 miner3_redis_1 miner3_redis_txns_1 miner3_miner_1 blobber01_postgres_1 blobber01_validator_1 blobber01_blobber_1"
        done

  • ssh -t "${node_ip[$node]}" "docker start sharder1_cassandra_1 sharder1_postgres_1 sharder1_sharder_1 miner1_redis_1 miner1_redis_txns_1 miner1_miner_1 miner2_redis_1 miner2_redis_txns_1 miner2_miner_1 miner3_redis_1 miner3_redis_txns_1 miner3_miner_1 blobber01_postgres_1 blobber01_validator_1 blobber01_blobber_1" --> connects to each selected server over ssh and starts the sharder, miner, blobber and related containers again.

    #Scale down and up miner/sharders one by one randomly.
    for loop_count in {1..18}; do
        rand=($(shuf -i 0-29 -n 8))
        for node in ${rand[@]}; do
            echo -e "\n \e[93m ===================================== Bringing up miners/sharders on node ${node_ip[$node]}. ======================================  \e[39m"
            ssh -t "${node_ip[$node]}" "docker start sharder1_cassandra_1 sharder1_postgres_1 sharder1_sharder_1 miner1_redis_1 miner1_redis_txns_1 miner1_miner_1 miner2_redis_1 miner2_redis_txns_1 miner2_miner_1 miner3_redis_1 miner3_redis_txns_1 miner3_miner_1 blobber01_postgres_1 blobber01_validator_1 blobber01_blobber_1"
        done
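The pieces described above (timer, random selection, stop and start over ssh) can be combined as in the following minimal sketch. It is not the actual chaostest.sh: the names DRY_RUN, WAIT_SECS, CONTAINERS, pick_nodes, node_docker and chaos_round are hypothetical, the container list is shortened for illustration, and the index range is derived from the number of nodes instead of being hard-coded. It assumes bash and GNU shuf, and it defaults to a dry run that only prints the ssh commands.

```shell
#!/usr/bin/env bash
# Sketch only. All names below except progress_timer are hypothetical.
DRY_RUN="${DRY_RUN:-1}"          # 1 = print the ssh commands instead of running them
WAIT_SECS="${WAIT_SECS:-300}"    # gap between stop and start, in seconds
CONTAINERS="sharder1_sharder_1 miner1_miner_1 blobber01_blobber_1"  # shortened list

# Variant of progress_timer that takes the wait length as an argument
# and shows a live countdown on stderr.
progress_timer() {
    local secs="${1:-300}"
    while [ "$secs" -gt 0 ]; do
        printf 'Time in seconds to wait %s\033[0K\r' "$secs" >&2
        sleep 1
        secs=$((secs - 1))
    done
}

# Print $1 distinct random indices that are valid for an array of length $2.
pick_nodes() {
    local count="$1" total="$2"
    shuf -i "0-$((total - 1))" -n "$count"
}

# Run "docker <action> <containers>" on one host, or print it in dry-run mode.
node_docker() {
    local host="$1" action="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh $host docker $action $CONTAINERS"
    else
        ssh -t "$host" "docker $action $CONTAINERS"
    fi
}

# One chaos round: stop a random subset, wait, start the same subset, wait.
chaos_round() {
    local count="$1"; shift
    local nodes=("$@")
    local picked=($(pick_nodes "$count" "${#nodes[@]}"))
    local idx
    for idx in "${picked[@]}"; do node_docker "${nodes[$idx]}" stop; done
    progress_timer "$WAIT_SECS"
    for idx in "${picked[@]}"; do node_docker "${nodes[$idx]}" start; done
    progress_timer "$WAIT_SECS"
}
```

For example, `WAIT_SECS=0 chaos_round 2 192.0.2.1 192.0.2.2 192.0.2.3` prints two stop commands followed by two start commands for the same hosts, without connecting to any server. Reusing the same selection for stop and start is a design choice of this sketch, so that every node that was stopped in a round is started again in that round.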

CI/CD & GitHub Actions

A GitHub Actions cron job starts the chaos test every 4 hours. It can also be triggered manually if needed. The workflow is at https://github.com/0chain/0helm/blob/staging/.github/workflows/choas_test_testnetset.yaml NOTE: If the chaos script is changed, trigger the chaos workflow manually from the staging branch first, so that the cron job picks up the latest changes on its next scheduled run.

Enable chaos tests along with network deployment.

Enable chaos manually or via cronjob

If a testnet redeployment is already running when the chaos cron job is scheduled, the chaos job waits until the testnet has been deployed (together with chaos, if it was enabled during redeployment).

The cron job can also be triggered manually. Use the staging branch when starting the test manually.
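As an alternative to the web interface, a manual run from the staging branch could be started from the command line with the GitHub CLI. This is a sketch under assumptions: it assumes `gh` is installed and authenticated, and that the workflow accepts manual (workflow_dispatch) runs. The function name trigger_chaos and the DRY_RUN switch are hypothetical; by default the command is only printed.

```shell
#!/usr/bin/env bash
# Sketch: trigger the chaos workflow by hand from the staging branch.
# trigger_chaos and DRY_RUN are hypothetical names, not part of the repository.
trigger_chaos() {
    local repo="0chain/0helm"
    local workflow="choas_test_testnetset.yaml"   # file name as spelled in the repository
    local ref="staging"
    local cmd=(gh workflow run "$workflow" --repo "$repo" --ref "$ref")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd[*]}"      # dry run: print the command only
    else
        "${cmd[@]}"           # real run: requires gh authentication
    fi
}
```

Calling `DRY_RUN=0 trigger_chaos` would submit the run. Leaving DRY_RUN unset only prints the command so it can be reviewed first.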
