Skip to content

Conversation

@kba
Copy link
Member

@kba kba commented May 3, 2022

contains #314 (but with ocrd_cis in the main venv), #315, #316 plus the regular updates to the processor projects.

stweil and others added 11 commits April 27, 2022 09:48
Move ocrd_detectron2 to headless-tf1 to avoid conflicts in the
main virtual environment.

Signed-off-by: Stefan Weil <[email protected]>
Move ocrd_cis to headless-tf1 to avoid a conflict with ocrd_calamari.

Signed-off-by: Stefan Weil <[email protected]>
Makefile uses GNU parallel semaphores not only for git but also for pip,
but the old rule only cleaned the former ones.

Signed-off-by: Stefan Weil <[email protected]>
Signed-off-by: Stefan Weil <[email protected]>
Copy link
Collaborator

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@stweil
Copy link
Collaborator

stweil commented May 3, 2022

CircleCI fails again because https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 is broken. That's unrelated to the changes here, and all we can do is wait and hope that it will work again soon. Maybe repeating the check is sufficient.

@kba
Copy link
Member Author

kba commented May 3, 2022

CircleCI fails again because https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 is broken. That's unrelated to the changes here, and all we can do is wait and hope that it will work again soon. Maybe repeating the check is sufficient.

Yeah, I am debugging this right now. The docker image for the CUDA docker images have been updated yesterday, I assume with a fix to the issue with the signatures. However, our ocrd/core-cuda image needs rebuilding, which currently fails

apt-get -y install --no-install-recommends cuda-runtime-10-0 cuda-runtime-10-1 cuda-runtime-10-2 cuda-runtime-11-0 cuda-runtime-11-1 cuda-runtime-11-3 libcudnn7
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libcudnn7

@stweil
Copy link
Collaborator

stweil commented May 3, 2022

@kba, it is fixed by running curl https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub | sudo apt-key add -.

@kba
Copy link
Member Author

kba commented May 3, 2022

curl https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub | sudo apt-key add -.

The problem is that we need to fix it in the ocrd/core-cuda image, which I am trying to rebuild. That is in turn based on the updated nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04, which does not have the GPG key problem but a new one:

apt-get -y install --no-install-recommends cuda-runtime-10-0 cuda-runtime-10-1 cuda-runtime-10-2 cuda-runtime-11-0 cuda-runtime-11-1 cuda-runtime-11-3 libcudnn7             
Reading package lists...                                                                                                                                    
Building dependency tree...                                                                                                                                                  
Reading state information...                                                                                                                                                 
E: Unable to locate package libcudnn7  

@stweil
Copy link
Collaborator

stweil commented May 3, 2022

That problem is gone as soon as you have installed the key from https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub and run apt update.

@kba
Copy link
Member Author

kba commented May 3, 2022

That problem is gone as soon as you have installed the key from https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub and run apt update.

As I said, that GPG key problem does not occur with the docker image, that has been fixed upstream AFAICT. However, this updated nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04 does not contain libcudnn7 anymore, only libcudnn8. Neither @bertsky nor myself were sure why we even had that in. So I removed it, rebuilt and uploaded ocrd/core-cuda and restarted the build nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04 and let's all cross fingers that this fixes the build and does not cause new problems 🤞

@kba
Copy link
Member Author

kba commented May 3, 2022

It failed again, but this time at the make check stage: https://app.circleci.com/pipelines/github/OCR-D/ocrd_all/794/workflows/a912a236-0452-43c5-b3e0-d67fa69c25c0/jobs/975 If anybody sees the problem, I'd appreciate any hints. Why is tensorflow 2.4.4 installed into the tf1 sub-venv?

@bertsky
Copy link
Collaborator

bertsky commented May 3, 2022

By updating sbb_binarization to its current head, you dragged in TF2 into headless-tf1.

@bertsky
Copy link
Collaborator

bertsky commented May 3, 2022

@kba I therefore suggest moving sbb_binarization and eynollah into top level venv (and removing their tf1nvidia recipe line).

@bertsky
Copy link
Collaborator

bertsky commented May 3, 2022

Alas,

tensorflow 2.5.0 has requirement tensorflow-estimator<2.6.0,>=2.5.0rc0, but you have tensorflow-estimator 2.4.0.

The problem again comes from the fixed (freezed / pip-tool generated) requirements.txt in ocrd_pc_segmentation. TF 2.6 requires tensorflow-estimator 2.6, but ocr4all-pixel-classifier insists on TF 2.5.

We have a few choices here:

  1. asking @crater2150 to update his ocr4all-pixel-classifier/requirements.in once again – in this case removing the upper end in
tensorflow >= 2.0.0, <= 2.5.0
  1. moving ocrd_pc_segmentation to its own dedicated venv (which will cost at least 2 GB extra, all the benefits of the recent changes go to waste)
  2. disabling ocrd_pc_segmentation for the time being, since AFAICT no-one has actually been using it in OCR-D (please correct me, @chreul @maxnth)

@cneud
Copy link
Member

cneud commented May 3, 2022

My prefered solution would be (3) for the benefit of being able to flexibly make a new ocrd_all release quickly without depending on ocrd_pc_segmentation. We can re-activate it later if we see the need or a fix has been implemented. The drawback is that we then also have to make this clear in the documentation (it seems to have only this mention though - and we have to remove sbb-textline-detector there also anyway).

@bertsky
Copy link
Collaborator

bertsky commented May 3, 2022

Okay, so just to see how we would fare with that choice in CI, I added a commit disabling ocrd_pc_segmentation for now. (We can drop it from the PR if it does not help or there is no consensus.)

Besides the WF guide, there's also a markdown checkbox to be unchecked in the README here.

@bertsky
Copy link
Collaborator

bertsky commented May 3, 2022

Ah, limitless joy – here comes another surprise from the TF trolls for us:

ImportError: cannot import name 'LayerNormalization'

(failing in ocrd-calamari-recognize, ocrd-anybaseocr-layout-analysis, ocrd-anybaseocr-tiseg, ocrd-eynollah-segment, ocrd-sbb-binarize)
I'll investigate ways out of this via SSH...

@bertsky
Copy link
Collaborator

bertsky commented May 3, 2022

I can immediately see multiple problems here:

  • eynollah and sbb_binarization depend on tensorflow-gpu (newest: v2.4.4), while the other TF2 modules use tensorflow (newest: v2.6.2). pip is still dumb-ass and allows both but cripples the variant that gets installed first.
  • sbb_binarization still constraints h5py < 3 which conflicts with h5py requirements for TF2 (and should only be relevant for TF1 IIRC)
  • ocrd_anybaseocr still depends on keras (v2.6.0), which should not be necessary in TF2

@stweil
Copy link
Collaborator

stweil commented May 3, 2022

  • sbb_binarization still constraints h5py < 3 which conflicts with h5py requirements for TF2 (and should only be relevant for TF1 IIRC)

That's the current show stopper, see my comment there.

@cneud
Copy link
Member

cneud commented May 3, 2022

  • eynollah and sbb_binarization depend on tensorflow-gpu (newest: v2.4.4), while the other TF2 modules use tensorflow (newest: v2.6.2). pip is still dumb-ass and allows both but cripples the variant that gets installed first.
  • sbb_binarization still constraints h5py < 3 which conflicts with h5py requirements for TF2 (and should only be relevant for TF1 IIRC)

Oops. We'll look into those asap.

@stweil
Copy link
Collaborator

stweil commented May 3, 2022

eynollah and sbb_binarization depend on tensorflow-gpu, while the other TF2 modules use tensorflow

It seems that meanwhile it is possible to install both tensorflow and tensorflow-gpu (with compatible version 2.x.y) from PyPI at the same time without any conflict, but I am not sure whether that is reasonable or makes a difference.

@stweil
Copy link
Collaborator

stweil commented May 3, 2022

Allthough it would be good if ocrd_pc_segmentation could relax the upper limit tensorflow<=2.5.0, we can currently build with tensorflow==2.5.0. So there is no need to remove it. I removed the commit which did that.

@stweil stweil force-pushed the update-2022-05-03 branch from 29bb998 to 520f101 Compare May 3, 2022 20:42
@bertsky
Copy link
Collaborator

bertsky commented May 3, 2022

eynollah and sbb_binarization depend on tensorflow-gpu, while the other TF2 modules use tensorflow

It seems that meanwhile it is possible to install both tensorflow and tensorflow-gpu (with compatible version 2.x.y) from PyPI at the same time without any conflict, but I am not sure whether that is reasonable or makes a difference.
Allthough it would be good if ocrd_pc_segmentation could relax the upper limit tensorflow<=2.5.0, we can currently build with tensorflow==2.5.0. So there is no need to remove it.

No, it does not – I just explained it above!

I removed the commit which did that.

...and one step backwards, again – I'm out.

@bertsky
Copy link
Collaborator

bertsky commented May 3, 2022

force-push removing while others are already at the job – @stweil!!!

@chreul
Copy link

chreul commented May 4, 2022

[...] ocr4all-pixel-classifier [...]
3. disabling ocrd_pc_segmentation for the time being, since AFAICT no-one has actually been using it in OCR-D (please correct me, @chreul @maxnth)

ocr4all-pixel-classifier is not directly related to ocr4all. we do not use ocrd_pc_segmentation and do not intend to do so in the future.

@stweil
Copy link
Collaborator

stweil commented May 4, 2022

I don't object removing OCR-D processors which nobody uses. Technically there is no need to remove ocrd_pc_segmentation.

@kba, I suggest to merge this PR – either with or without that processor.

@kba
Copy link
Member Author

kba commented May 4, 2022

I don't object removing OCR-D processors which nobody uses. Technically there is no need to remove ocrd_pc_segmentation.

@kba, I suggest to merge this PR – either with or without that processor.

On it.

@bertsky
Copy link
Collaborator

bertsky commented May 4, 2022

Technically there is no need to remove ocrd_pc_segmentation.

There is!

See above.

@kba kba merged commit b61f8a1 into master May 4, 2022
@stweil stweil deleted the update-2022-05-03 branch May 4, 2022 07:06
@stweil
Copy link
Collaborator

stweil commented May 4, 2022

All Docker builds in CircleCI now passed again. The longest run took 52 minutes, so was well below one hour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants