crosscheck

Differences

This shows you the differences between two versions of the page.

crosscheck [2023/09/06 23:22] – [The full cross-check in progress] mino
crosscheck [Unknown date] (current) – removed - external edit (Unknown date) 127.0.0.1
Line 1: Line 1:
-==== A moderate cross-check done already ====
  
-Computing resources available in clouds differ widely in performance, price, and lifetime. To save cost, I had to continuously seek out inexpensive resources and create instances on a daily basis. The total number of instances created exceeded five hundred. I performed the following checks as a minimal verification of the resources:
- 
-  * Every time a new instance was created, a benchmark program that counts a small subset was executed, and the run time and the result were recorded. If the answer was wrong, that resource was never used. I actually encountered a few such resources; notably, some versions of the Docker environment with multiple RTX-4090s produced wrong answers due to a failure of inter-GPU atomic transactions. I therefore avoided having multiple 4090s work together and used them separately instead.
-  * As a postmortem verification, an independent re-count was done for every instance at a sampling rate of once per day. If any wrong result was found, all results produced by that resource were considered unreliable.
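
The two checks above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code actually used: `Instance`, `admit_instance`, `audit_instance`, the work-unit ids, and the reference value `KNOWN_ANSWER` are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical reference count for the small benchmark subset (assumption).
KNOWN_ANSWER = 123456


@dataclass
class Instance:
    name: str
    trusted: bool = False
    # Maps a (hypothetical) work-unit id to the count this instance reported.
    results: Dict[str, int] = field(default_factory=dict)


def admit_instance(inst: Instance, run_benchmark: Callable[[], int]) -> bool:
    """Accept a new instance only if its benchmark count matches the known answer."""
    inst.trusted = run_benchmark() == KNOWN_ANSWER
    return inst.trusted


def audit_instance(inst: Instance, recounts: Dict[str, int]) -> bool:
    """Compare independently re-counted samples with the instance's own results.

    A single mismatch marks everything the instance produced as unreliable.
    """
    for unit_id, recount in recounts.items():
        if inst.results.get(unit_id) != recount:
            inst.trusted = False
            return False
    return True
```

In this sketch a failed audit does not delete anything; it only clears the `trusted` flag, so that all work from that instance can be queued for re-counting elsewhere.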
-   
-==== The full cross-check in progress ==== 
- 
-If every subtotal is calculated twice and the two results match, the counts can be considered correct (provided the code is correct). The re-counting is in progress and was 8% complete as of 2023/09/05.
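
A minimal sketch of such a comparison is shown below, assuming each pass is stored as a mapping from a subtotal id to its count. The function name `cross_check` and the data layout are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def cross_check(
    first: Dict[str, int], second: Dict[str, int]
) -> Tuple[List[str], List[str], float]:
    """Compare a first counting pass with a (possibly partial) second pass.

    Returns the ids whose counts match, the ids whose counts differ, and the
    fraction of first-pass subtotals that have been re-counted so far.
    """
    confirmed: List[str] = []
    mismatched: List[str] = []
    for unit_id, count in first.items():
        if unit_id not in second:
            continue  # not re-counted yet
        if second[unit_id] == count:
            confirmed.append(unit_id)
        else:
            mismatched.append(unit_id)
    progress = (len(confirmed) + len(mismatched)) / len(first) if first else 0.0
    return confirmed, mismatched, progress
```

A mismatch alone does not say which of the two passes is wrong, so a mismatched subtotal would need a third, independent count to settle it.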
- 
-==== Errors Found (updated on 2023.09.05) ==== 
- 
-During the thorough cross-check, a portion of the results generated by one instance was found to be incorrect. The instance ran with two RTX-4090s for 60 hours and generated 3,771 sub-subtotals. Of these, only 11 were incorrect, and the errors occurred sporadically over a period of 5.5 hours. All incorrect results were generated by only one of the two RTX-4090s. It is unlikely that these errors are due to logical flaws or coding mistakes; hardware defects or instability are the most probable causes.
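
To put these figures in proportion, the short calculation below derives the error rate and the average throughput from the numbers quoted above. Only the variable names are introduced here; the values come from the text.

```python
total_sub_subtotals = 3771  # produced by the instance over the whole run
incorrect = 11              # found wrong by the cross-check
run_hours = 60              # total run time of the instance

error_rate = incorrect / total_sub_subtotals
per_hour = total_sub_subtotals / run_hours

print(f"error rate: {error_rate:.2%}")                       # about 0.29%
print(f"throughput: {per_hour:.2f} sub-subtotals per hour")  # 62.85
```

An error rate this low suggests that a once-per-day sample could easily miss such a fault, which is consistent with it being found only by the full cross-check.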
- 
-Correcting these errors increased the total count by 864 (36 × 24).
- 
-While these errors have not damaged my confidence in the logic and the code I used, other errors of a similar nature may still be present in the results. Therefore, the results should be considered unconfirmed until the thorough cross-check is completed.
- 
-  
crosscheck.1694010154.txt.gz · Last modified: 2023/09/06 23:22 by mino
