crosscheck


A moderate cross-check done already

Cloud computing resources vary widely in performance, price and lifetime. To save cost, I had to keep looking for inexpensive resources and create instances on a daily basis. The total number of instances created exceeded five hundred. I performed the following checks as a minimal verification of the resources:

  • Every time a new instance was created, a benchmark program that counts a small subset was executed, and the run time and the result were recorded. If the answer was wrong, that resource was never used. I did see a few such resources. Notably, some versions of the Docker environment with multiple RTX-4090s produced wrong answers due to a failure of inter-GPU atomic transactions. I therefore avoided having multiple 4090s work together and used them separately instead.
  • As a postmortem verification, independent re-counting has been done for every instance at a sampling rate of once per day. If any wrong result was found, all results produced by that resource were considered unreliable.
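
The two checks above can be sketched roughly as follows. This is only an illustration, not the code actually used: the `Instance` class and the `run_benchmark`, `expected` and `recount` parameters are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical record for one cloud instance."""
    name: str
    accepted: bool = False
    reliable: bool = True
    results: dict = field(default_factory=dict)  # work-unit id -> count


def admit(instance, run_benchmark, expected):
    """Accept an instance only if its benchmark count equals the known answer."""
    instance.accepted = (run_benchmark() == expected)
    return instance.accepted


def spot_check(instance, sampled_ids, recount):
    """Re-count the sampled work units independently.

    A single mismatch marks every result from this instance as unreliable.
    """
    for uid in sampled_ids:
        if recount(uid) != instance.results[uid]:
            instance.reliable = False
            break
    return instance.reliable
```

The taint rule is deliberately coarse: a sampled mismatch does not say which other results are wrong, so everything from that instance is set aside.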

The full cross-check in progress

If every subtotal is calculated twice and the two results match, the counts should be considered correct (provided the code is correct). The re-counting is in progress and was 8% complete as of 2023/09/05.
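
As a minimal sketch of this comparison, assuming each pass is stored as a mapping from a subtotal identifier to its count (the function name and the data layout are assumptions made for illustration, not the actual bookkeeping):

```python
def cross_check(first_pass, second_pass):
    """Compare two independent counting passes.

    Returns the identifiers whose counts agree, those that disagree,
    and the fraction of first-pass subtotals re-counted so far.
    """
    confirmed, mismatched = set(), set()
    for key, count in second_pass.items():
        if key not in first_pass:
            continue  # nothing to compare against yet
        if first_pass[key] == count:
            confirmed.add(key)
        else:
            mismatched.add(key)
    checked = len(confirmed) + len(mismatched)
    progress = checked / len(first_pass) if first_pass else 0.0
    return confirmed, mismatched, progress
```

A mismatch only shows that at least one of the two runs is wrong, so a disagreeing subtotal would need a third count to decide which value to keep.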

Errors Found (updated on 2023.09.05)

During the thorough cross-check, it was discovered that a portion of the results generated by one instance was incorrect. The instance ran with two RTX-4090s for 60 hours and generated 3,771 sub-subtotals. Of these 3,771 sub-subtotals, only 11 were incorrect, and the errors occurred sporadically over a period of 5.5 hours. All incorrect results were generated by only one of the two RTX-4090s. It is unlikely that these errors are due to logical flaws or coding mistakes. Hardware defects or instability are the most probable causes.

Correcting these errors increased the count by 864 (36×24).

While these errors have not damaged my confidence in the logic and the code I used, it is possible that other errors of a similar nature are still contained in the results. Therefore, the results should be considered unconfirmed until the thorough cross-check is completed.

crosscheck.1694010154.txt.gz · Last modified: 2023/09/06 23:22 by mino