Hi @hsnyder.

Sorry for the delay in replying to your previous post. I’ve reinstalled the OS (we now use Ubuntu 18 LTS) and also updated CUDA to the newest version (11.4). Unfortunately, things are still going badly. We’ve tried running many Ab-initio reconstructions, and all of them failed with the message " ====== Job process terminated abnormally." Below I paste the last 30 lines of the logs from two of them:

1st log:

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

========= sending heartbeat

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

========= sending heartbeat

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty

========= main process now complete.

========= monitor process now complete.

2nd log:

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log

return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:890: RuntimeWarning: invalid value encountered in true_divide

frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log

return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:890: RuntimeWarning: invalid value encountered in true_divide

frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log

return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:890: RuntimeWarning: invalid value encountered in true_divide

frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log

return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:890: RuntimeWarning: invalid value encountered in true_divide

frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log

return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:890: RuntimeWarning: invalid value encountered in true_divide

frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log

return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:890: RuntimeWarning: invalid value encountered in true_divide

frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log

return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax

/home/michal/Apps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:890: RuntimeWarning: invalid value encountered in true_divide

frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]

========= main process now complete.

========= monitor process now complete.

As you can see, the warnings differ from run to run even though the data are the same. What is more, we still have many problems when running other job types (on the machine with the GTX card all of them run smoothly without any errors); please see the attached screenshot.
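
In case it helps with reading the second log: as far as I understand, both messages are ordinary NumPy floating-point warnings, not crashes in themselves. Below is a minimal sketch that reproduces them outside cryoSPARC. The two expressions are copied from the source lines quoted in the log; the function names, the choice of `vmax`, and the input values are my own assumptions for illustration and are not cryoSPARC code.

```python
import warnings

import numpy as np


def weighted_logsumexp(a, wa, b, wb):
    # Expression taken from the logsumexp.py line in the log.
    # Using the element-wise maximum for vmax is an assumption.
    vmax = np.maximum(a, b)
    return np.log(wa * np.exp(a - vmax) + wb * np.exp(b - vmax)) + vmax


def normalized_correlation(AB, AA, BB):
    # Expression taken from the sigproc.py line in the log.
    return AB / np.sqrt(AA * BB)


def collect_warnings(func, *args):
    # Run func and return its result together with any warning messages.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(*args)
    return result, [str(w.message) for w in caught]


if __name__ == "__main__":
    zeros = np.array([0.0])
    ones = np.array([1.0])

    # Both weights zero: the argument of log is 0.
    print(collect_warnings(weighted_logsumexp, zeros, 0.0, zeros, 0.0))

    # Zero power in one half-map: the division is 0 / 0.
    print(collect_warnings(normalized_correlation, zeros, zeros, ones))
```

With both weights equal to zero the first function returns `-inf` and emits "divide by zero encountered in log". With zero power in one input the second returns `nan` and emits an "invalid value encountered" warning. So the warnings suggest that zeros (or already invalid values) reach these two lines, but they do not show where those values come from.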

We’ve also noticed that computations on the RTX 3090 are surprisingly slow compared to the GTX 1070.

Marcel