
ubi task not starting #9

Closed
ThomasBlock opened this issue Jan 30, 2024 · 14 comments

@ThomasBlock

I managed to get computing-provider running, but the UBI tasks are not starting. What might be missing?

My node setup:
Swan1 = go-computing-provider, public ip address, kubernetes, no gpu
Swan2 = kubernetes, gpu A4000
Swan3 = kubernetes, gpu RTX 4060TI ( not in official support list? )

computing-provider ubi-task list 
TASK ID	TASK TYPE	ZK TYPE    	TRANSACTION HASH	STATUS 	REWARD	CREATE TIME         
42     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-29 07:23:26	
61     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-29 20:04:07	
66     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-29 22:04:08	
69     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 00:04:07	
72     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 02:04:07	
75     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 04:04:07	
78     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 06:04:07	
81     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 08:04:07	
84     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 10:04:07

One idea is that I have too many CPU spaces, so I don't have enough resources to run the UBI tasks. But I checked this and also killed some tasks:

computing-provider task list
TASK UUID    	TASK TYPE	WALLET ADDRESS	SPACE UUID   	SPACE NAME       	STATUS  
...4175204b63	CPU      	0x273...e40F3 	...7e1249bd88	Finder           	Running	
...12ac3e540d	CPU      	0xB87...E405f 	...adc3915681	NothnqDego       	Running	
...93e2992f27	CPU      	0x195...554d1 	...3b6016cfed	pac-manTR        	Running	
...0472b892e9	CPU      	0x195...554d1 	...27c6305c31	Protext          	Running	
...66a59f77cc	CPU      	0x78A...579f3 	...9499b09b96	Scorpius97-Pacman	Running	
...e1eebd130d	CPU      	0x195...554d1 	...2c1157e6a0	Protektoria1     	Running	
...14d33dd1c1	CPU      	0x0Ad...54229 	...360a898ee2	Justinhulu       	Running	
...9cf95fb8bb	CPU      	0x8F3...bd69f 	...c53221a17f	Kikisweet        	Running	
...ebae28699e	CPU      	0x273...e40F3 	...a886138565	shape            	Running	
...71b5077678	CPU      	0x05E...d3C31 	...9435a5f8c7	SvoRa            	Running	
...d9aafe3005	CPU      	0x7BD...431B8 	...bc703ce7d8	bunalisebastian  	Running	
...99e344fcfc	CPU      	0x0Ad...54229 	...0169c4c478	Robbie122        	Running	

Here is my config:

cat ~/cp/config.toml 
[API]
UbiTask = true 
Port = 8085
MultiAddress = "/ip4/XXX/tcp/8085" 
Domain = "XXX"
NodeName = "XXX"
RedisUrl = "redis://127.0.0.1:6379"
RedisPassword = ""
[UBI]
UbiTask = true 
UbiEnginePk = "0xB5aeb540B4895cd024c1625E146684940A849ED9"
UbiUrl ="https://ubi-task.swanchain.io/v1"  
[LOG]
CrtFile = "/home/user/ssl/server.crt"
KeyFile = "/home/user/ssl/server.key"
[HUB]
ServerUrl = "https://orchestrator-api.swanchain.io"
AccessToken = "XXX"
WalletAddress = "XXX"
BalanceThreshold= 0.3
[MCS]
ApiKey = "XXX"
BucketName = "XXX"
Network = "polygon.mumbai" 
FileCachePath = "/tmp"
[Registry]
ServerAddress = "192.168.128.71:5000" 
UserName = ""
Password = ""
[RPC]
SWAN_TESTNET ="https://saturn-rpc.swanchain.io"    
SWAN_MAINNET= ""                                   
[CONTRACT]
SWAN_CONTRACT="0x91B25A65b295F0405552A4bbB77879ab5e38166c"
SWAN_COLLATERAL_CONTRACT="0xB8D9744b46C1ABbd02D62a7eebF193d83965ba39" 
cat fil-c2.env 
FIL_PROOFS_PARAMETER_CACHE="/var/tmp/filecoin-proof-parameters"
RUST_GPU_TOOLS_CUSTOM_GPU="NVIDIA RTX A4000:6144" #Shading Units

ls /var/tmp/filecoin-proof-parameters
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk
...

Here are the logs regarding UBI:

[GIN] 2024/01/30 - 10:04:07 | 200 |  312.427145ms |   38.104.153.43 | GET      "/api/v1/computing/cp"
time="2024-01-30 10:04:07.711" level=info msg="receive ubi task received: {ID:84 Name:1000-0-7-170 Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.multichain.storage/ipfs/QmVnWcUMh4X8YG86BJqd7icf6bEdqAtrU48eMk2XuVH3jo Signature:0xe19f3fb145e581f6f169b5176292a7c8f394a1ea996cc8c2f1f0e8bb13f7556673dd9b06d8b7b3788bd6e69cb9b99692833c55f92
0d765e941cf50465321808c01 Resource:0xc000d5c880}" func=DoUbiTask file="cp_service.go:546"
time="2024-01-30 10:04:07.711" level=info msg="ubi task sign verifing, task_id: 84,  type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:585"
time="2024-01-30 10:04:08.253" level=error msg="nodeName: %s, error: %+vswan1invalid character '.' looking for beginning of value" func=GetNodeGpuSummary file="k8s_service.go:527"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1257"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: remainderCpu: -2, remainderMemory: 0.00, remainderStorage: 293.00" func=checkResourceAvailableForUbi file="cp_service.go:1258"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1257"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: remainderCpu: 1, remainderMemory: 43.00, remainderStorage: 1413.00" func=checkResourceAvailableForUbi file="cp_service.go:1258"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1257"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: remainderCpu: 5, remainderMemory: 47.00, remainderStorage: 1413.00" func=checkResourceAvailableForUbi file="cp_service.go:1258"
time="2024-01-30 10:04:08.253" level=info msg="gpuName: NVIDIA-A4000, nodeGpu: map[:0 kubernetes.io/os:0], nodeGpuSummary: map[swan2:map[NVIDIA-A4000:1] swan3:map[NVIDIA-4060-Ti:1]]" func=checkResourceAvailableForUbi file="cp_service.go:1269"
time="2024-01-30 10:04:08.254" level=warning msg="ubi task id: 84, type: GPU, not found a resources available" func=DoUbiTask file="cp_service.go:639"
[GIN-debug] [WARNING] Headers were already written. Wanted to override status code 500 with 200
[GIN] 2024/01/30 - 10:04:08 | 500 |  543.238653ms |   38.104.153.43 | POST     "/api/v1/computing/cp/ubi"
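The "not found a resources available" warning above appears to come from the GPU-name lookup: because GetNodeGpuSummary failed to parse the node data (the `invalid character '.'` error), the per-node GPU map contains only stray label keys (`map[:0 kubernetes.io/os:0]`), so the requested `NVIDIA-A4000` is never matched. A minimal sketch of that kind of lookup, with hypothetical names (not the actual go-computing-provider code):

```go
package main

import "fmt"

// findNodeWithGpu returns the first node whose GPU map lists the requested
// GPU name with at least one free unit. When parsing fails upstream and the
// map holds only junk label keys, the lookup misses and the task is rejected.
func findNodeWithGpu(gpuName string, nodeGpus map[string]map[string]int) (string, bool) {
	for node, gpus := range nodeGpus {
		if count, ok := gpus[gpuName]; ok && count > 0 {
			return node, true
		}
	}
	return "", false
}

func main() {
	// Healthy summary, as in the nodeGpuSummary part of the log line.
	healthy := map[string]map[string]int{
		"swan2": {"NVIDIA-A4000": 1},
		"swan3": {"NVIDIA-4060-Ti": 1},
	}
	node, ok := findNodeWithGpu("NVIDIA-A4000", healthy)
	fmt.Println(node, ok)

	// Broken per-node map, as in `nodeGpu: map[:0 kubernetes.io/os:0]`.
	broken := map[string]map[string]int{
		"swan1": {"": 0, "kubernetes.io/os": 0},
	}
	_, ok = findNodeWithGpu("NVIDIA-A4000", broken)
	fmt.Println(ok)
}
```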

There are no pods created for UBI:

kubectl get po -A
NAMESPACE                                       NAME                                                           READY   STATUS             RESTARTS          AGE
ingress-nginx                                   ingress-nginx-admission-create-mh72w                           0/1     Completed          0                 5d23h
ingress-nginx                                   ingress-nginx-admission-patch-2d9rc                            0/1     Completed          0                 5d23h
ingress-nginx                                   ingress-nginx-controller-7fcc98f6bc-5phzb                      1/1     Running            1 (3d1h ago)      5d23h
kube-system                                     calico-kube-controllers-74d5f9d7bb-p2t7w                       1/1     Running            2 (33h ago)       6d
kube-system                                     calico-node-5cns9                                              1/1     Running            4 (16h ago)       23h
kube-system                                     calico-node-99rh4                                              1/1     Running            5 (16h ago)       6d
kube-system                                     calico-node-fbxhw                                              1/1     Running            1 (3d1h ago)      6d
kube-system                                     coredns-5dd5756b68-bvj46                                       1/1     Running            1 (3d1h ago)      6d1h
kube-system                                     coredns-5dd5756b68-sss5q                                       1/1     Running            1 (3d1h ago)      6d1h
kube-system                                     etcd-swan1                                                     1/1     Running            3 (3d1h ago)      6d1h
kube-system                                     kube-apiserver-swan1                                           1/1     Running            4 (33h ago)       6d1h
kube-system                                     kube-controller-manager-swan1                                  1/1     Running            12 (5m3s ago)     6d1h
kube-system                                     kube-proxy-lq975                                               1/1     Running            3 (3d1h ago)      6d1h
kube-system                                     kube-proxy-nttzs                                               1/1     Running            6 (16h ago)       6d
kube-system                                     kube-proxy-pz5g8                                               1/1     Running            4 (16h ago)       23h
kube-system                                     kube-scheduler-swan1                                           1/1     Running            13 (5m2s ago)     6d1h
kube-system                                     nvidia-device-plugin-daemonset-2wgbt                           1/1     Running            1 (3d1h ago)      6d
kube-system                                     nvidia-device-plugin-daemonset-df6lz                           1/1     Running            7 (17h ago)       23h
kube-system                                     nvidia-device-plugin-daemonset-shzdv                           1/1     Running            4 (16h ago)       5d23h
kube-system                                     resource-exporter-ds-fs5b5                                     1/1     Running            4 (16h ago)       4d23h
kube-system                                     resource-exporter-ds-mzgn7                                     0/1     CrashLoopBackOff   553 (4m54s ago)   46h
kube-system                                     resource-exporter-ds-whwkj                                     1/1     Running            92 (17h ago)      23h
ns-0x05eecd336633a443a5679e47797374fbb4cd3c31   deploy-fe29213a-f795-43d4-a7f6-6f9435a5f8c7-585c5f6854-tk84l   1/1     Running            0                 11h
ns-0x0ad8a3fdd123ef21ccccb6433bc555f67f154229   deploy-58b4b3cd-bcd7-4079-8b3c-7e360a898ee2-569f8d94c4-czn9m   1/1     Running            0                 16h
ns-0x0ad8a3fdd123ef21ccccb6433bc555f67f154229   deploy-74a62385-7b95-4d6d-9054-830169c4c478-7b9d7f6cd-bcgmw    1/1     Running            0                 16h
ns-0x195de990f6c8930194dd62ac21ceee04cf8554d1   deploy-0c103c54-e988-4112-afa8-933b6016cfed-7c5f959474-l5m5z   1/1     Running            0                 16h
ns-0x195de990f6c8930194dd62ac21ceee04cf8554d1   deploy-18a93794-f3a1-448b-ad3d-0927c6305c31-57c59c6856-vtv6d   1/1     Running            0                 16h
ns-0x195de990f6c8930194dd62ac21ceee04cf8554d1   deploy-3b84bed5-d2a2-400f-bd5b-972c1157e6a0-7c9cfb46d4-r89dr   1/1     Running            0                 16h
ns-0x2733c8521c1b80939415bf521775769cdabe40f3   deploy-70f77042-33e6-4e55-a3d6-ca7e1249bd88-69f5c56558-9n6ks   1/1     Running            0                 10h
ns-0x2733c8521c1b80939415bf521775769cdabe40f3   deploy-ed283275-cbfd-4248-a101-59a886138565-784856b97b-gn88p   1/1     Running            0                 10h
ns-0x3aa50e86b3ac589bf3a9b9d3f90bb6801611e8ed   deploy-994339f1-ff37-453c-8234-5df2f8e732a3-7d8999c64b-fgmcd   1/1     Running            0                 5m53s
ns-0x78a170be72f00f8a49538bc9895377984a8579f3   deploy-bd3691ee-c404-4973-8326-919499b09b96-7b4f6cb7cb-9flhd   1/1     Running            0                 51m
ns-0x7bdd1675943d8980facd61bb1253789a806431b8   deploy-96ea9115-4ac6-477e-984e-80bc703ce7d8-55f689c96c-88cfs   1/1     Running            0                 15h
ns-0x8f3d04858ba5da1f18500be92ce74fb2a61bd69f   deploy-5a601a1d-dacf-4b42-bd64-98c53221a17f-867c49d6f7-hpp8t   1/1     Running            0                 15h
ns-0xb87a6b7ed42a331cc4ba85df42063668cdfe405f   deploy-e2ba5400-1983-4062-a429-86adc3915681-68dffd47db-8w7ph   1/1     Running            0                 16h
tigera-operator                                 tigera-operator-94d7f7696-kgx6q                                1/1     Running            28 (5m3s ago)     6d

@Normalnoise
Collaborator

Can you provide the file list in /var/tmp/filecoin-proof-parameters?

@ThomasBlock
Author

Can you provide the file list in /var/tmp/filecoin-proof-parameters?

Yes, I downloaded them with Filecoin. Is that okay? (see swanchain/ubi-benchmark#1)

ls /var/tmp/filecoin-proof-parameters
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-92180959e1918d26350b8e6cfe217bbdd0a2d8de51ebec269078b364b715ad63.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-fb9e095bebdd77511c0269b967b4d87ba8b8a525edaa0e165de23ba454510194.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-8-2-102e1444a7e9a97ebf1e3d6855dcc77e66c011ea66f936d9b2c508f87f2f83a7.vk
v28-fil-inner-product-v1.srs
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0170db1f394b35d995252228ee359194b13199d259380541dc529fb0099096b0.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-50c7368dea9593ed0989e70974d28024efa9d156d585b7eea1be22b2e753f331.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-5294475db5237a2e83c3e52fd6c2b03859a1831d45ed08c4f35dbf9a803165a9.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-2627e4006b67f99cef990c0a47d5426cb7ab0a0ad58fc1061547bf2d28b09def.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-b62098629d07946e9028127e70295ed996fe3ed25b0f9f88eb610a0ab4385a3c.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-032d3138d22506ec0082ed72b2dcba18df18477904e35bafee82b3793b06832f.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-ecd683648512ab1765faa2a5f14bab48f676e633467f0aa8aad4b55dcb0652bb.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.params
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-2-sha256_hasher-96f1b4a04c5c51e4759bbf224bbc2ef5a42c7100f16ec0637123f16a845ddfb2.vk

@Normalnoise
Collaborator

can you provide your computing-provider version?

@ThomasBlock
Author

ThomasBlock commented Jan 30, 2024

can you provide your computing-provider version?

I compiled it:

VERSION:
0.4.1+git.0067c20

edit:
git checkout fea-ubi-task

0.4.1+git.428777c

@sonic-chain

You can test whether the ubi-task environment is installed correctly by following this document:
https://docs.swanchain.io/orchestrator/as-a-computing-provider/computing-provider-setup/faq#q-how-can-i-verify-if-my-computing-provider-is-set-up-to-receive-ubi-tasks.
While the container is running, you can view the pod logs to troubleshoot errors.

@ThomasBlock
Author

You can test whether the ubi-task environment is installed correctly according to the following document: https://docs.swanchain.io/orchestrator/as-a-computing-provider/computing-provider-setup/faq#q-how-can-i-verify-if-my-computing-provider-is-set-up-to-receive-ubi-tasks. While the container is running, you can view the pod logs to troubleshoot errors.

Yes, thank you for the feedback. This somehow works: the pod is created and finishes. It was deleted quite quickly, so I could no longer read the logs, but in the task list it still seems unfinished (and is still labeled as a "CPU" task).

curl -k --location --request POST 'https://***/api/v1/computing/cp/ubi' ...
{"status":"success","code":200,"data":"success"}
time="2024-01-30 12:42:05.157" level=info msg="receive ubi task received: {ID:1 Name:test-ubi Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5 Signature:0x13cb4547123ddc947aaebf9e4b2026fe1115390bbaa32f3579fe966fc1cc1cf05bc3e2d2516f86e65c370d879ad052805a6ea343fe7fed35d981c49870b12d3e01 Resource:0xc0007d2040}" func=DoUbiTask file="cp_service.go:547"
time="2024-01-30 12:42:05.158" level=info msg="ubi task sign verifing, task_id: 1,  type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:586"
time="2024-01-30 12:42:05.812" level=error msg="nodeName: %s, error: %+vswan1invalid character '.' looking for beginning of value" func=StatisticalSources file="k8s_service.go:347"
time="2024-01-30 12:42:05.830" level=error msg="nodeName: %s, error: %+vswan1invalid character '.' looking for beginning of value" func=GetNodeGpuSummary file="k8s_service.go:527"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: needCpu: 2, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1269"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: remainderCpu: -4, remainderMemory: 12.00, remainderStorage: 293.00" func=checkResourceAvailableForUbi file="cp_service.go:1270"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: needCpu: 2, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1269"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: remainderCpu: 19, remainderMemory: 61.00, remainderStorage: 1588.00" func=checkResourceAvailableForUbi file="cp_service.go:1270"
time="2024-01-30 12:42:05.831" level=info msg="gpuName: NVIDIA-A4000, nodeGpu: map[:0 kubernetes.io/os:0], nodeGpuSummary: map[swan2:map[NVIDIA-A4000:1] swan3:map[NVIDIA-4060-Ti:1]]" func=checkResourceAvailableForUbi file="cp_service.go:1281"
[GIN] 2024/01/30 - 12:42:05 | 200 |  673.749131ms | 212.102.118.102 | POST     "/api/v1/computing/cp/ubi"
kubectl get po -A
NAMESPACE                                       NAME                                                           READY   STATUS             RESTARTS          AGE
...
ubi-task-1                                      fil-c2-512m-1-8cbzj                                            0/1     Completed          0                 76s
computing-provider ubi-task list 
TASK ID	TASK TYPE	ZK TYPE    	TRANSACTION HASH	STATUS 	REWARD	CREATE TIME         
...	
84     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 10:04:07	
1      	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 12:42:05

@ThomasBlock
Author

ThomasBlock commented Jan 30, 2024

Here is the log:

kubectl logs -f -n ubi-task-2 fil-c2-512m-2-vxz9j
2024-01-30T11:51:28.427Z	INFO	ubi-bench	ubi-bench/main.go:96	Starting ubi-bench
2024-01-30T11:51:28.427Z	INFO	ubi-bench	ubi-bench/main.go:565	json param file of c1: /var/tmp/fil-c2-param/test-ubi.json
2024-01-30T11:51:28.427Z	WARN	ubi-bench	ubi-bench/main.go:113	reading input file:
    main.glob..func4
        /opt/ubi-benchmark/cmd/ubi-bench/main.go:568
  - open /var/tmp/fil-c2-param/test-ubi.json: no such file or directory

for this request:

--data-raw '{
    "id": 2,
    "name": "test-ubi",
    "type": 1,
    "zk_type": "fil-c2-512M",
    "input_param": "https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5",
    "resource": {"cpu": "2", "gpu": "1", "memory": "5.00 GiB", "storage": "1.00 GiB"},
    "signature": "0x4d8d7efb7e77c8c0c7f8a92ee9f9bfc9eb5a0bec9a00544312d6b4d680914cf53088de6d3747e361629c6c80b431596e294720a661a1fd9214b5e1d109c1a3e100"
}'

@ThomasBlock
Author

Same for an official UBI task:

computing-provider ubi-task list 
TASK ID	TASK TYPE	ZK TYPE    	TRANSACTION HASH	STATUS 	REWARD	CREATE TIME         
42     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-29 07:23:26	
61     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-29 20:04:07	
66     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-29 22:04:08	
69     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 00:04:07	
72     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 02:04:07	
75     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 04:04:07	
78     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 06:04:07	
81     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 08:04:07	
84     	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 10:04:07	
1      	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 12:42:05	
2      	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 12:51:27	
96     	CPU      	fil-c2-512M	                	running	10.00 	2024-01-30 14:57:18	
103    	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 16:57:18	
107    	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 18:57:18	
112    	CPU      	fil-c2-512M	                	running	0.0   	2024-01-30 20:57:18
[GIN] 2024/01/30 - 20:57:18 | 200 |  221.216482ms |   38.104.153.43 | GET      "/api/v1/computing/cp"
time="2024-01-30 20:57:18.463" level=info msg="receive ubi task received: {ID:112 Name:1000-0-7-196 Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.multichain.storage/ipfs/Qme28CvgAXj244mZwCt17xCdXCn19U7S18R5ribFzxfnp6 Signature:0x8e9e723061609a62462be4f2fff185ab6960730bf76ad62d0eeb4028ebedfb2b0f106ca94e44c17eeb3a9f31039b467e858056db39453db71166eb1b8fc5b14000 Resource:0xc000562240}" func=DoUbiTask file="cp_service.go:547"
time="2024-01-30 20:57:18.464" level=info msg="ubi task sign verifing, task_id: 112,  type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:586"
kubectl logs -f -n ubi-task-112 fil-c2-512m-112-lxz6j
2024-01-30T19:57:19.746Z	INFO	ubi-bench	ubi-bench/main.go:96	Starting ubi-bench
2024-01-30T19:57:19.746Z	INFO	ubi-bench	ubi-bench/main.go:565	json param file of c1: /var/tmp/fil-c2-param/1000-0-7-196.json
2024-01-30T19:57:19.746Z	WARN	ubi-bench	ubi-bench/main.go:113	reading input file:
    main.glob..func4
        /opt/ubi-benchmark/cmd/ubi-bench/main.go:568
  - open /var/tmp/fil-c2-param/1000-0-7-196.json: no such file or directory

@sonic-chain

sonic-chain commented Feb 1, 2024

  • Delete the ubi-worker image:
    docker rmi -f filswan/ubi-worker:v1.0
  • Restart the service using computing-provider version v0.4.2

@ThomasBlock
Author

  • Delete the ubi-worker image:
    docker rmi -f filswan/ubi-worker:v1.0
  • Restart the service using computing-provider version v0.4.2

When containerd is involved, we need these commands:

ctr -n k8s.io images list | grep ubi
ctr -n k8s.io images remove docker.io/filswan/ubi-worker:v1.0

But still no luck for me. Here is a new error: @Normalnoise

kubectl describe po -n ubi-task-8
Name:             fil-c2-512m-8-k6mj7
Namespace:        ubi-task-8
Priority:         0
Service Account:  default
Node:             swan2/192.168.128.72
Start Time:       Thu, 01 Feb 2024 19:51:17 +0100
Labels:           batch.kubernetes.io/controller-uid=7377996f-6097-4c2e-bb50-deb357498e15
                  batch.kubernetes.io/job-name=fil-c2-512m-8
                  controller-uid=7377996f-6097-4c2e-bb50-deb357498e15
                  job-name=fil-c2-512m-8
Annotations:      cni.projectcalico.org/containerID: 9cfc930be8766f62c53d81507264b5ecea254ed5083ba5dbd072b1ba2944f46b
                  cni.projectcalico.org/podIP: 
                  cni.projectcalico.org/podIPs: 
Status:           Succeeded
IP:               172.16.177.91
IPs:
  IP:           172.16.177.91
Controlled By:  Job/fil-c2-512m-8
Containers:
  fil-c2-512m-8keoxr:
    Container ID:  containerd://fd24822abb6def8cf12037b34910b8d3b1ea4db583f849fe5b1ee2e2f6674db0
    Image:         filswan/ubi-worker:v1.0
    Image ID:      docker.io/filswan/ubi-worker@sha256:e1c9498b3911e7a028dbe0b908754c367c789bf8c0e2b9bd793895993ae96c84
    Port:          <none>
    Host Port:     <none>
    Command:
      ubi-bench
      c2
      /var/tmp/fil-c2-param/test-ubi.json
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 01 Feb 2024 19:51:17 +0100
      Finished:     Thu, 01 Feb 2024 19:51:17 +0100
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                4
      ephemeral-storage:  2Gi
      memory:             10Gi
      nvidia.com/gpu:     1
    Requests:
      cpu:                2
      ephemeral-storage:  1Gi
      memory:             5Gi
      nvidia.com/gpu:     1
    Environment:
      RUST_GPU_TOOLS_CUSTOM_GPU:  NVIDIA RTX A4000:6144
      RECEIVE_PROOF_URL:          https://swan1:8085/api/v1/computing/cp/receive/ubi
      TASKID:                     8
      TASK_TYPE:                  1
      ZK_TYPE:                    fil-c2-512M
      NAME_SPACE:                 ubi-task-8
      PARAM_PATH:                 /share/cp/zk-pool/fil-c2-512M/test-ubi
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sx9sk (ro)
      /var/tmp/fil-c2-param from fil-c2-input-volume (rw)
      /var/tmp/filecoin-proof-parameters from proof-params (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  proof-params:
    Type:          HostPath (bare host directory volume)
    Path:          /var/tmp/filecoin-proof-parameters
    HostPathType:  
  fil-c2-input-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /share/cp/zk-pool/fil-c2-512M/test-ubi
    HostPathType:  
  kube-api-access-sx9sk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Pulled   38s   kubelet  Container image "filswan/ubi-worker:v1.0" already present on machine
  Normal  Created  38s   kubelet  Created container fil-c2-512m-8keoxr
  Normal  Started  38s   kubelet  Started container fil-c2-512m-8keoxr
ls /share/cp/zk-pool/fil-c2-512M/test-ubi
test-ubi.json
ls /var/tmp/filecoin-proof-parameters
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk
...
 kubectl logs -f -n ubi-task-8                                      fil-c2-512m-8-k6mj7
2024-02-01T18:51:17.799Z	INFO	ubi-bench	ubi-bench/main.go:96	Starting ubi-bench
2024-02-01T18:51:17.799Z	INFO	ubi-bench	ubi-bench/main.go:556	get param from mcs url: 
2024-02-01T18:51:17.799Z	WARN	ubi-bench	ubi-bench/main.go:113	error making request to mcs url: Get "": unsupported protocol scheme ""
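The `unsupported protocol scheme ""` error above is what Go's HTTP client returns when it is handed an empty URL: the log shows `get param from mcs url:` with nothing after it, so the old worker image fetched `""`. A hypothetical validation helper illustrating that failure mode (names are made up; this is not the actual ubi-bench code):

```go
package main

import (
	"fmt"
	"net/url"
)

// validateParamURL rejects URLs that http.Get would fail on with
// `unsupported protocol scheme`. An empty MCS URL parses without error
// but has no scheme, which is exactly the case seen in the pod log.
func validateParamURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported protocol scheme %q", u.Scheme)
	}
	return nil
}

func main() {
	fmt.Println(validateParamURL(""))                        // fails: empty scheme
	fmt.Println(validateParamURL("https://example.com/ipfs")) // ok
}
```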

@sonic-chain

sonic-chain commented Feb 2, 2024

You need to pull the code, compile it, and then restart the CP service:

git clone https://github.com/swanchain/go-computing-provider.git
cd go-computing-provider && git checkout v0.4.2
make && make install

@ThomasBlock
Author

Ah okay, so you update the code without further increasing the version number, I see. Now we are one step further and have a new problem.

computing-provider -v
computing-provider version 0.4.2+git.24931a7
kubectl logs -f -n ubi-task-10                                     fil-c2-512m-10-c5bml
2024-02-02T11:19:32.347Z	INFO	ubi-bench	ubi-bench/main.go:96	Starting ubi-bench
2024-02-02T11:19:32.347Z	INFO	ubi-bench	ubi-bench/main.go:556	get param from mcs url: https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-8-2-102e1444a7e9a97ebf1e3d6855dcc77e66c011ea66f936d9b2c508f87f2f83a7.vk is ok
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0170db1f394b35d995252228ee359194b13199d259380541dc529fb0099096b0.vk is ok
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.vk is ok
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk is ok
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-032d3138d22506ec0082ed72b2dcba18df18477904e35bafee82b3793b06832f.vk is ok
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-ecd683648512ab1765faa2a5f14bab48f676e633467f0aa8aad4b55dcb0652bb.vk is ok
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.vk is ok
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.vk is ok
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-2-sha256_hasher-96f1b4a04c5c51e4759bbf224bbc2ef5a42c7100f16ec0637123f16a845ddfb2.vk is ok
2024-02-02T11:19:32.895Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk is ok
2024-02-02T11:19:32.895Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-5294475db5237a2e83c3e52fd6c2b03859a1831d45ed08c4f35dbf9a803165a9.vk is ok
2024-02-02T11:19:32.894Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-92180959e1918d26350b8e6cfe217bbdd0a2d8de51ebec269078b364b715ad63.vk is ok
2024-02-02T11:19:32.895Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk is ok
2024-02-02T11:19:32.895Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.vk is ok
2024-02-02T11:19:32.895Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.vk is ok
2024-02-02T11:19:32.895Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-b62098629d07946e9028127e70295ed996fe3ed25b0f9f88eb610a0ab4385a3c.vk is ok
2024-02-02T11:19:32.895Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-fb9e095bebdd77511c0269b967b4d87ba8b8a525edaa0e165de23ba454510194.vk is ok
2024-02-02T11:19:32.895Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-50c7368dea9593ed0989e70974d28024efa9d156d585b7eea1be22b2e753f331.vk is ok
2024-02-02T11:19:32.897Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.vk is ok
2024-02-02T11:19:32.897Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-2627e4006b67f99cef990c0a47d5426cb7ab0a0ad58fc1061547bf2d28b09def.vk is ok
2024-02-02T11:19:33.175Z	INFO	paramfetch	[email protected]/paramfetch.go:209	Parameter file /var/tmp/filecoin-proof-parameters/v28-fil-inner-product-v1.srs is ok
2024-02-02T11:19:33.175Z	INFO	paramfetch	[email protected]/paramfetch.go:233	parameter and key-fetching complete
2024-02-02T11:19:33.176 INFO filecoin_proofs::api::seal > seal_commit_phase2:start: SectorId(0)
2024-02-02T11:19:33.176 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]
2024-02-02T11:19:33.176 INFO filecoin_proofs::caches > no params in memory cache for STACKED[536870912]
2024-02-02T11:19:33.176 INFO storage_proofs_core::parameter_cache > parameter set identifier for cache: layered_drgporep::PublicParams{ graph: stacked_graph::StackedGraph{expansion_degree: 8 base_graph: drgraph::BucketGraph{size: 16777216; degree: 6; hasher: poseidon_hasher} }, challenges: LayerChallenges { layers: 2, max_count: 2 }, tree: merkletree-poseidon_hasher-8-0-0 }
2024-02-02T11:19:33.177 INFO storage_proofs_core::parameter_cache > ensuring that all ancestor directories for: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params" exist
2024-02-02T11:19:33.177 INFO storage_proofs_core::parameter_cache > checking cache_path: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params" for parameters
2024-02-02T11:19:33.177 INFO storage_proofs_core::parameter_cache > Verify production parameters is false
2024-02-02T11:19:33.252 INFO storage_proofs_core::parameter_cache > read parameters from cache "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params" 
2024-02-02T11:19:33.253 INFO bellperson::groth16::prover::native > Bellperson 0.26.0 is being used!
2024-02-02T11:19:35.352 INFO bellperson::groth16::prover::native > synthesis time: 2.099376862s
2024-02-02T11:19:35.352 INFO bellperson::groth16::prover::native > starting proof timer
2024-02-02T11:19:35.513 INFO bellperson::gpu::locks > GPU is available for FFT!
2024-02-02T11:19:35.513 INFO bellperson::gpu::locks > BELLPERSON_GPUS_PER_LOCK fallback to single lock mode
2024-02-02T11:19:35.579 INFO ec_gpu_gen::fft > FFT: 1 working device(s) selected. 
2024-02-02T11:19:35.579 INFO ec_gpu_gen::fft > FFT: Device 0: NVIDIA RTX A4000
2024-02-02T11:19:35.579 INFO bellperson::gpu::locks > GPU FFT kernel instantiated!
2024-02-02T11:19:37.174 INFO bellperson::gpu::locks > GPU is available for Multiexp!
2024-02-02T11:19:37.174 INFO bellperson::gpu::locks > BELLPERSON_GPUS_PER_LOCK fallback to single lock mode
2024-02-02T11:19:37.174 INFO bellperson::gpu::multiexp > Multiexp: CPU utilization: 0.
2024-02-02T11:19:37.175 INFO ec_gpu_gen::multiexp > Multiexp: 1 working device(s) selected.
2024-02-02T11:19:37.175 INFO ec_gpu_gen::multiexp > Multiexp: Device 0: NVIDIA RTX A4000 (Chunk-size: 91400704)
2024-02-02T11:19:37.175 INFO bellperson::gpu::locks > GPU Multiexp kernel instantiated!
2024-02-02T11:19:40.300 INFO bellperson::gpu::locks > GPU is available for Multiexp!
2024-02-02T11:19:40.300 INFO bellperson::gpu::locks > BELLPERSON_GPUS_PER_LOCK fallback to single lock mode
2024-02-02T11:19:40.300 INFO bellperson::gpu::multiexp > Multiexp: CPU utilization: 0.
2024-02-02T11:19:40.300 INFO ec_gpu_gen::multiexp > Multiexp: 1 working device(s) selected.
2024-02-02T11:19:40.300 INFO ec_gpu_gen::multiexp > Multiexp: Device 0: NVIDIA RTX A4000 (Chunk-size: 44132059)
2024-02-02T11:19:40.300 INFO bellperson::gpu::locks > GPU Multiexp kernel instantiated!
2024-02-02T11:19:40.682 INFO bellperson::groth16::prover::native > prover time: 5.329874841s
2024-02-02T11:19:40.711 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]-verifying-key
2024-02-02T11:19:40.711 INFO filecoin_proofs::caches > no params in memory cache for STACKED[536870912]-verifying-key
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > parameter set identifier for cache: layered_drgporep::PublicParams{ graph: stacked_graph::StackedGraph{expansion_degree: 8 base_graph: drgraph::BucketGraph{size: 16777216; degree: 6; hasher: poseidon_hasher} }, challenges: LayerChallenges { layers: 2, max_count: 2 }, tree: merkletree-poseidon_hasher-8-0-0 }
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > ensuring that all ancestor directories for: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" exist
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > checking cache_path: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" for verifying key
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > Verify production parameters is false
2024-02-02T11:19:40.713 INFO storage_proofs_core::parameter_cache > read verifying key from cache "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" 
2024-02-02T11:19:40.721 INFO filecoin_proofs::api::seal > verify_seal:start: SectorId(0)
2024-02-02T11:19:40.721 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]-verifying-key
2024-02-02T11:19:40.721 INFO filecoin_proofs::caches > found params in memory cache for STACKED[536870912]-verifying-key
2024-02-02T11:19:40.722 INFO filecoin_proofs::api::seal > verify_seal:finish: SectorId(0)
2024-02-02T11:19:40.722 INFO filecoin_proofs::api::seal > seal_commit_phase2:finish: SectorId(0)
time="2024-02-02 11:19:40.757" level=error msg="Failed send a request, error: Post \"https://swan1:8085/api/v1/computing/cp/receive/ubi\": dial tcp: lookup swan1 on 10.96.0.10:53: no such host" func=func4 file="main.go:644"
2024-02-02T11:19:40.757Z	WARN	ubi-bench	ubi-bench/main.go:113	Post "https://swan1:8085/api/v1/computing/cp/receive/ubi": dial tcp: lookup swan1 on 10.96.0.10:53: no such host

The IP address mentioned in the error is indeed wrong and appears nowhere in the cluster:

kubectl get po -A -o wide
NAMESPACE                                       NAME                                                           READY   STATUS      RESTARTS        AGE     IP               NODE    NOMINATED NODE   READINESS GATES
ingress-nginx                                   ingress-nginx-admission-create-mh72w                           0/1     Completed   0               9d      <none>           swan1   <none>           <none>
ingress-nginx                                   ingress-nginx-admission-patch-2d9rc                            0/1     Completed   0               9d      <none>           swan1   <none>           <none>
ingress-nginx                                   ingress-nginx-controller-7fcc98f6bc-5phzb                      1/1     Running     1 (6d2h ago)    9d      172.16.100.76    swan1   <none>           <none>
kube-system                                     calico-kube-controllers-74d5f9d7bb-p2t7w                       1/1     Running     4 (12h ago)     9d      172.16.100.74    swan1   <none>           <none>
kube-system                                     calico-node-5cns9                                              1/1     Running     4 (3d17h ago)   4d      192.168.128.73   swan3   <none>           <none>
kube-system                                     calico-node-99rh4                                              1/1     Running     6 (3d ago)      9d      192.168.128.72   swan2   <none>           <none>
kube-system                                     calico-node-fbxhw                                              1/1     Running     1 (6d2h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     calico-node-ss6qg                                              1/1     Running     1 (2d15h ago)   2d16h   192.168.128.75   swan5   <none>           <none>
kube-system                                     coredns-5dd5756b68-bvj46                                       1/1     Running     1 (6d2h ago)    9d      172.16.100.73    swan1   <none>           <none>
kube-system                                     coredns-5dd5756b68-sss5q                                       1/1     Running     1 (6d2h ago)    9d      172.16.100.75    swan1   <none>           <none>
kube-system                                     etcd-swan1                                                     1/1     Running     3 (6d2h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     kube-apiserver-swan1                                           1/1     Running     5 (12h ago)     9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     kube-controller-manager-swan1                                  1/1     Running     29 (11h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     kube-proxy-lq975                                               1/1     Running     3 (6d2h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     kube-proxy-nttzs                                               1/1     Running     7 (3d ago)      9d      192.168.128.72   swan2   <none>           <none>
kube-system                                     kube-proxy-pz5g8                                               1/1     Running     4 (3d17h ago)   4d      192.168.128.73   swan3   <none>           <none>
kube-system                                     kube-proxy-v546g                                               1/1     Running     1 (2d15h ago)   2d16h   192.168.128.75   swan5   <none>           <none>
kube-system                                     kube-scheduler-swan1                                           1/1     Running     31 (11h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     nvidia-device-plugin-daemonset-2wgbt                           1/1     Running     1 (6d2h ago)    9d      172.16.100.78    swan1   <none>           <none>
kube-system                                     nvidia-device-plugin-daemonset-98njt                           1/1     Running     1 (2d15h ago)   2d16h   172.16.41.146    swan5   <none>           <none>
kube-system                                     nvidia-device-plugin-daemonset-df6lz                           1/1     Running     7 (3d17h ago)   4d      172.16.59.75     swan3   <none>           <none>
kube-system                                     nvidia-device-plugin-daemonset-shzdv                           1/1     Running     5 (3d ago)      9d      172.16.177.102   swan2   <none>           <none>
kube-system                                     resource-exporter-ds-4z55l                                     1/1     Running     0               16h     172.16.100.94    swan1   <none>           <none>
kube-system                                     resource-exporter-ds-6gwt2                                     1/1     Running     0               16h     172.16.177.99    swan2   <none>           <none>
kube-system                                     resource-exporter-ds-7fxgn                                     1/1     Running     0               16h     172.16.59.77     swan3   <none>           <none>
kube-system                                     resource-exporter-ds-tt7hk                                     1/1     Running     0               16h     172.16.41.180    swan5   <none>           <none>
ns-0x066af13bc249371c72939e793157ae05cbbcc981   deploy-a74cb467-da69-45f0-a559-c6f71e41cdd9-6c9bb7bb85-44xwz   1/1     Running     0               9m33s   172.16.59.80     swan3   <none>           <none>
ns-0x20445a11e5c6309579387e47564e29a174c02eb7   deploy-9a5980f2-46c2-4e5f-b660-32287f69434c-5f9c96c68c-z7szn   1/1     Running     0               2d15h   172.16.41.149    swan5   <none>           <none>
ns-0x2733c8521c1b80939415bf521775769cdabe40f3   deploy-70f77042-33e6-4e55-a3d6-ca7e1249bd88-69f5c56558-9n6ks   1/1     Running     0               3d11h   172.16.100.87    swan1   <none>           <none>
ns-0x45bcb503b0b85eb6ee6a1490aa64065597897502   deploy-2ce4655b-d152-4086-855e-4d3e9a141683-68fcc49f89-lvjc8   1/1     Running     0               2d3h    172.16.41.157    swan5   <none>           <none>
ns-0x45bcb503b0b85eb6ee6a1490aa64065597897502   deploy-705a9a33-ab71-448e-8878-647fdf49ddd0-87b9fb9f5-9jhxl    1/1     Running     0               2d3h    172.16.41.159    swan5   <none>           <none>
ns-0x5a37e272299581edb615c1483fae4af7801b91b9   deploy-6cc79c0e-d7a0-4b90-8a5b-745924d7592c-5ffc69ffc7-h5ljr   1/1     Running     0               17h     172.16.177.95    swan2   <none>           <none>
ns-0x66e91a773df9d1966ca7615179d86d8b0740cfe2   deploy-689c6d8a-9e8d-4761-b96f-099a7567bebb-bf65cb6d7-5xjst    1/1     Running     0               23h     172.16.41.164    swan5   <none>           <none>
ns-0x80a6c6848dff59dc333b2cb791b7856d303c0433   deploy-61dc252f-1b8a-4f9e-876b-482c352e7c20-7b7f44d66f-w67db   1/1     Running     0               19h     172.16.177.86    swan2   <none>           <none>
ns-0x82d9125d91b90a94b251a1ec9dd5af43a9bb6e4a   deploy-0fd8a4ac-3c03-451e-91d6-e546f7c45f9a-84f9457bcf-6xhf5   1/1     Running     0               23h     172.16.41.166    swan5   <none>           <none>
ns-0x82d9125d91b90a94b251a1ec9dd5af43a9bb6e4a   deploy-cdc2cf82-0bdc-4032-b20b-58ef3ba726f7-7968864b66-x4rt9   1/1     Running     0               23h     172.16.41.165    swan5   <none>           <none>
ns-0xf7cbba96282d30b01d4a9de0701bd2dadf74a8ff   deploy-c0485940-96ff-4c4e-96ae-0ffc0012d02a-754bb9bc58-7brks   1/1     Running     0               20h     172.16.177.79    swan2   <none>           <none>
tigera-operator                                 tigera-operator-94d7f7696-kgx6q                                1/1     Running     50 (11h ago)    9d      192.168.128.72   swan2   <none>           <none>

@ThomasBlock

ThomasBlock commented Feb 2, 2024

The same happens for the official ubi task, by the way:

 kubectl describe po -n ubi-task-574
Name:             fil-c2-512m-574-dsbzd
Namespace:        ubi-task-574
Priority:         0
Service Account:  default
Node:             swan2/192.168.128.72
Start Time:       Fri, 02 Feb 2024 13:30:19 +0100
Labels:           batch.kubernetes.io/controller-uid=d5703705-b1bc-4826-b741-52ed5c3b0a46
                  batch.kubernetes.io/job-name=fil-c2-512m-574
                  controller-uid=d5703705-b1bc-4826-b741-52ed5c3b0a46
                  job-name=fil-c2-512m-574
Annotations:      cni.projectcalico.org/containerID: 4b1fdae9dd48562356512ee155d351958d85af2a76e59a139c229b099ac54e64
                  cni.projectcalico.org/podIP: 
                  cni.projectcalico.org/podIPs: 
Status:           Succeeded
IP:               172.16.177.116
IPs:
  IP:           172.16.177.116
Controlled By:  Job/fil-c2-512m-574
Containers:
  fil-c2-512m-574fcugq:
    Container ID:  containerd://053ad3e8c4abd913dab965518f003f5b75b377abced535af60adaa1cfa2f7fac
    Image:         filswan/ubi-worker:v1.0
    Image ID:      docker.io/filswan/ubi-worker@sha256:e1c9498b3911e7a028dbe0b908754c367c789bf8c0e2b9bd793895993ae96c84
    Port:          <none>
    Host Port:     <none>
    Command:
      ubi-bench
      c2
      /var/tmp/fil-c2-param/1000-0-7-612.json
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 02 Feb 2024 13:30:19 +0100
      Finished:     Fri, 02 Feb 2024 13:30:29 +0100
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                2
      ephemeral-storage:  2Gi
      memory:             10Gi
      nvidia.com/gpu:     1
    Requests:
      cpu:                1
      ephemeral-storage:  1Gi
      memory:             5Gi
      nvidia.com/gpu:     1
    Environment:
      RUST_GPU_TOOLS_CUSTOM_GPU:  NVIDIA RTX A4000:6144,NVIDIA GeForce RTX 4060 Ti:4352
      RECEIVE_PROOF_URL:          https://swan1:8085/api/v1/computing/cp/receive/ubi
      TASKID:                     574
      TASK_TYPE:                  1
      ZK_TYPE:                    fil-c2-512M
      NAME_SPACE:                 ubi-task-574
      PARAM_URL:                  https://286cb2c989.acl.multichain.storage/ipfs/QmcVwLYXHCar7Hg2wBiYwoY3jtxz7SF3hkYu6SmA7DRco5
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8scgp (ro)
      /var/tmp/filecoin-proof-parameters from proof-params (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  proof-params:
    Type:          HostPath (bare host directory volume)
    Path:          /var/tmp/filecoin-proof-parameters
    HostPathType:  
  kube-api-access-8scgp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Pulled   72s   kubelet  Container image "filswan/ubi-worker:v1.0" already present on machine
  Normal  Created  72s   kubelet  Created container fil-c2-512m-574fcugq
  Normal  Started  72s   kubelet  Started container fil-c2-512m-574fcugq

swan1 is the correct hostname, so DNS resolution inside the pod does not work. Maybe the URL should be changed to the external domain name, or just to the IP address. How can I change RECEIVE_PROOF_URL?

`192.168.128.71 swan1` is in /etc/hosts on each host.
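As a side note, pods do not inherit the node's /etc/hosts; lookups inside a pod go through the cluster resolver (CoreDNS, the 10.96.0.10 seen in the error). A minimal sketch of the lookup the worker effectively performs before POSTing the proof (function name and usage are illustrative, not the actual ubi-bench code):

```python
import socket
from urllib.parse import urlparse

def callback_host_resolves(receive_proof_url: str) -> bool:
    """Return True if the host in RECEIVE_PROOF_URL resolves with the
    resolver visible to this process (inside a pod, that is CoreDNS)."""
    host = urlparse(receive_proof_url).hostname
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        # This is the "no such host" case seen in the ubi-bench log.
        return False

# Inside a pod that only knows CoreDNS, a bare hostname like "swan1"
# fails unless CoreDNS (or the pod spec) is taught about it.
print(callback_host_resolves("https://swan1:8085/api/v1/computing/cp/receive/ubi"))
```

Running this on a node where /etc/hosts contains swan1 succeeds, while the same code inside a pod fails, which matches the log above.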

I tried:

nano cp/fil-c2.env
RECEIVE_PROOF_URL="http://192.168.128.71:8085/api/v1/computing/cp/receive/ubi"

but then computing-provider says:
W0202 13:40:17.230078 1853483 warnings.go:70] spec.template.spec.containers[0].env[2]: hides previous definition of "RECEIVE_PROOF_URL"

    Environment:
      RUST_GPU_TOOLS_CUSTOM_GPU:  NVIDIA RTX A4000:6144,NVIDIA GeForce RTX 4060 Ti:4352
      RECEIVE_PROOF_URL:          http://192.168.128.71:8085/api/v1/computing/cp/receive/ubi
      RECEIVE_PROOF_URL:          https://swan1:8085/api/v1/computing/cp/receive/ubi
      TASKID:                     12
      TASK_TYPE:                  1
      ZK_TYPE:                    fil-c2-512M
      NAME_SPACE:                 ubi-task-12
      PARAM_URL:                  https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5
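The listing above shows RECEIVE_PROOF_URL twice because the value from cp/fil-c2.env is injected in addition to the one the provider sets itself; with duplicate names in a container's env list, a later entry hides an earlier one (as the kubelet warning says), so the hard-coded swan1 URL still takes effect. An illustrative fragment of the generated job spec (container name hypothetical):

```
containers:
  - name: fil-c2-512m
    env:
      - name: RECEIVE_PROOF_URL   # from cp/fil-c2.env (hidden)
        value: "http://192.168.128.71:8085/api/v1/computing/cp/receive/ubi"
      - name: RECEIVE_PROOF_URL   # set by the provider (wins)
        value: "https://swan1:8085/api/v1/computing/cp/receive/ubi"
```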

@ThomasBlock

Here is how I solved the DNS problem:

kubectl edit cm coredns -n kube-system

Then add this server block to the Corefile:

swan1. {
    hosts {
        192.168.128.71 swan1
    }
}

kubectl rollout restart deployment coredns -n kube-system
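An equivalent fix, assuming a stock CoreDNS Corefile, is to put the hosts plugin inside the existing `.:53` block with `fallthrough`, so every other name still goes through the normal kubernetes/forward chain (only the relevant portion shown; your default Corefile may differ):

```
.:53 {
    errors
    health
    hosts {
        192.168.128.71 swan1
        fallthrough
    }
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
```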

The ubi task is now starting, but I am not connected to the hub ( #12 (comment) ).
