doublecheck
Differences
This shows you the differences between two versions of the page.
| doublecheck [2023/11/08 12:33] – [Preliminary double-check done already] mino | doublecheck [2024/05/13 18:10] (current) – [The thorough double-check in progress] mino | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ==== Preliminary double-check done already ==== | ==== Preliminary double-check done already ==== | ||
| - | Computing resources available in clouds are diverse in their performance, | + | Computing resources available in clouds are diverse in their performance, |
| * Every time a new instance was created, a benchmark program that counts a small subset was executed, and the time and the result were recorded. If the answer was wrong, that resource was never used. I actually saw a few such resources; notably, some versions of the Docker environment with multiple RTX-4090s produced wrong answers due to a failure of inter-GPU atomic transactions. I avoided having multiple 4090s work together and used them separately instead. | * Every time a new instance was created, a benchmark program that counts a small subset was executed, and the time and the result were recorded. If the answer was wrong, that resource was never used. I actually saw a few such resources; notably, some versions of the Docker environment with multiple RTX-4090s produced wrong answers due to a failure of inter-GPU atomic transactions. I avoided having multiple 4090s work together and used them separately instead. | ||
| - | * As a postmortem verification, | + | * As a postmortem verification, |
| | | ||
| ==== The thorough double-check in progress ==== | ==== The thorough double-check in progress ==== | ||
| - | If every subtotal is calculated twice and the two results match, the counts should be considered correct (provided the code is correct). The re-counting is in progress and has been done up to 10% as of 2023/11/07. | + | If every subtotal is calculated twice and the two results match, the counts should be considered correct (provided the code is correct). The re-counting is in progress and is 70% completed |
| - | ==== Errors Found (updated on 2023.09.07) | + | ==== Errors Found ==== |
| - | During the thorough double-check, | + | === updated on 2024.02.17 === |
| - | Correcting these errors increased | + | Another erroneous instance was found. It ran on an RTX-4090 for about one month and produced about 19,000 sub-subtotals, of which 6 were incorrect. All of the incorrect results were produced in the last hour of the instance's lifetime. After the erroneous behavior, the GPU of the instance became unusable with an error message of " |
| - | While these errors have not damaged my confidence in the logic and the code I used, it is possible that errors | + | As the result |
| + | === updated on 2023.09.07 === | ||
| + | During the thorough double-check, | ||
| + | |||
| + | As a result of the correction, the number increased by 960 (40 x 24). | ||
| + | |||
| + | While these errors have not damaged my confidence in the logic and the code used in the calculation, | ||
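
The thorough double-check described above (every subtotal is calculated twice and the two results are compared) can be sketched roughly as follows. This is only an illustration under assumptions, not the code actually used for the count: the dictionary layout, the subtotal keys, and the function name `compare_runs` are all hypothetical. A mismatch only says that at least one of the two runs is wrong, so the sketch flags the subtotal for another calculation and does not pick a winner.

```python
from typing import Dict, List, Tuple


def compare_runs(
    first: Dict[str, int], second: Dict[str, int]
) -> Tuple[List[str], List[str], float]:
    """Compare two independent runs of the same subtotals.

    `first` holds the original results, `second` holds the re-count so far.
    Returns (mismatched keys, keys not yet re-counted, fraction re-counted).
    """
    # Subtotals computed in both runs whose values disagree.
    mismatched = sorted(
        key for key in first if key in second and first[key] != second[key]
    )
    # Subtotals that the second run has not reached yet.
    pending = sorted(key for key in first if key not in second)
    # Progress of the re-count, as a fraction of all subtotals.
    progress = (len(first) - len(pending)) / len(first) if first else 1.0
    return mismatched, pending, progress


# Hypothetical data: four subtotals, three of them already re-counted.
first_run = {"a": 100, "b": 250, "c": 40, "d": 7}
second_run = {"a": 100, "b": 251, "c": 40}

mismatched, pending, progress = compare_runs(first_run, second_run)
print("needs another calculation:", mismatched)
print("not yet re-counted:", pending)
print("re-count progress:", progress)
```

Any key reported as mismatched would be calculated again, ideally on a different resource, before the total is corrected.
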
doublecheck.1699414411.txt.gz · Last modified: 2023/11/08 12:33 by mino
