Could not load weights for finetune (likely because you are finetuning a previously finetuned network). Attempting to finetune from a full finetune model file. #129
Comments
Hi @yuan0821 Sorry for the delay in getting back to you on your issues. You are probably getting this error because the shapes in the weights file do not match the model. I would compare the shapes of the weights you are trying to load with the shapes of the layers of the model you are loading them into. You can do this by putting a debugger before … Here is a deeper technical reason for the issue:
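A minimal sketch of the shape comparison suggested above, in plain Python. `find_shape_mismatches` is a hypothetical helper, not part of dannce, and the layer names and shapes are made up for illustration. In practice `model_shapes` would be filled from the model's layers (for example via Keras `layer.get_weights()`) and `file_shapes` from the HDF5 weights file.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    model_shapes: Dict[str, List[Shape]],
    file_shapes: Dict[str, List[Shape]],
) -> List[str]:
    """Describe every layer whose weight shapes differ between model and file."""
    problems = []
    for name, expected in model_shapes.items():
        if name not in file_shapes:
            problems.append(f"{name}: missing from weights file")
            continue
        found = file_shapes[name]
        if list(expected) != list(found):
            problems.append(f"{name}: model expects {expected}, file has {found}")
    return problems


# Hypothetical shapes, for illustration only (kernel, then bias).
model_shapes = {
    "conv3d": [(3, 3, 3, 9, 64), (64,)],
    "conv3d_1": [(3, 3, 3, 64, 64), (64,)],
}
file_shapes = {
    "conv3d": [(3, 3, 3, 18, 64), (64,)],
    "conv3d_1": [(3, 3, 3, 64, 64), (64,)],
}

for line in find_shape_mismatches(model_shapes, file_shapes):
    print(line)
```

A mismatch in the first convolution's kernel usually points at the number of input channels, which depends on settings such as n_views, mono and n_channels_in.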
Hi @data-hound And please let me know if you see any problems with the crop, mono channel, or n_views settings. Thank you very much.
Hi @yuan0821 As for putting the debugger on line 1348, you can simply add this line … Mono channel and n_views look good to me so far. As for the crop parameter, you may want to compare the input shape expected by the model with the input shape produced by the generator. This can be done by calling the …
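One possible way to do the model-versus-generator comparison, sketched in plain Python. `input_shapes_match` is a hypothetical helper, and the channel counts below are made up. The expected shape follows the Keras convention of `None` for the batch dimension (as reported by `model.input_shape`), and the actual shape would be the shape of one batch taken from the generator.

```python
from typing import Optional, Sequence


def input_shapes_match(
    expected: Sequence[Optional[int]],
    actual: Sequence[int],
) -> bool:
    """Compare an expected shape (None means any size) with a concrete batch shape."""
    if len(expected) != len(actual):
        return False
    return all(e is None or e == a for e, a in zip(expected, actual))


# Hypothetical values: a 64-voxel grid with 9 channels expected by the model,
# and a batch of 4 volumes with only 3 channels coming out of the generator.
expected_shape = (None, 64, 64, 64, 9)
batch_shape = (4, 64, 64, 64, 3)

print(input_shapes_match(expected_shape, batch_shape))  # False
```

If the two disagree only in the last dimension, the channel-related settings are the first thing to check; if they disagree in the spatial dimensions, nvox is the more likely culprit.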
Hi @data-hound
Awesome!
Hi @data-hound
Hi @data-hound I think `custom_objects={...}` is already passed to the `load_model` function, as the picture shows.
`(Pdb) weightspath
(Pdb) from tensorflow.keras.models import Model, load_model
(Pdb) model = load_model(weightspath, custom_objects={"ops": ops, "slice_input": slice_input, "mask_nan_keep_loss": losses.mask_nan_keep_loss, "mask_nan_l1_loss": losses.mask_nan_l1_loss, "euclidean_distance_3D": losses.euclidean_distance_3D, "centered_euclidean_distance_3D": losses.centered_euclidean_distance_3D}, compile=False)`
packages in environment at E:\anaconda\envs\tf26:
Name Version Build Channel
absl-py 0.15.0 pypi_0 pypi
Hi @yuan0821 I see your problem. I tried loading the (local copy of the) same weights and I got
Hi @data-hound Could you please share some available 3/4/5/6 mono camera weight files? Thank you so much.
Hi @yuan0821 From the issue that you linked, you can try downloading the MAX 3-cam weights and running with them. As mentioned in the same comment, you can fine-tune an AVG network starting from either of the weight files. Let me know if that lets you move forward.
Hi @data-hound
from tensorflow.keras.models import load_model
model = load_model(weightspath, custom_objects={"ops": ops, "slice_input": slice_input, "mask_nan_keep_loss": losses.mask_nan_keep_loss, "mask_nan_l1_loss": losses.mask_nan_l1_loss, "euclidean_distance_3D": losses.euclidean_distance_3D, "centered_euclidean_distance_3D": losses.centered_euclidean_distance_3D})
model.summary()
(Pdb) model.summary()

| Layer (type) | Output Shape | Param # | Connected to |
| --- | --- | --- | --- |
| input_1 (InputLayer) | [(None, 64, 64, 64, | 0 | |
| conv3d (Conv3D) | (None, 64, 64, 64, 6 | 6976 | input_1[0][0] |
| instance_normalization (Instanc | (None, 64, 64, 64, 6 | 2 | conv3d[0][0] |
| activation (Activation) | (None, 64, 64, 64, 6 | 0 | instance_normalization[0][0] |
| conv3d_1 (Conv3D) | (None, 64, 64, 64, 6 | 110656 | activation[0][0] |
| instance_normalization_1 (Insta | (None, 64, 64, 64, 6 | 2 | conv3d_1[0][0] |
| activation_1 (Activation) | (None, 64, 64, 64, 6 | 0 | instance_normalization_1[0][0] |
| max_pooling3d (MaxPooling3D) | (None, 32, 32, 32, 6 | 0 | activation_1[0][0] |
| conv3d_2 (Conv3D) | (None, 32, 32, 32, 1 | 221312 | max_pooling3d[0][0] |
| instance_normalization_2 (Insta | (None, 32, 32, 32, 1 | 2 | conv3d_2[0][0] |
| activation_2 (Activation) | (None, 32, 32, 32, 1 | 0 | instance_normalization_2[0][0] |
| conv3d_3 (Conv3D) | (None, 32, 32, 32, 1 | 442496 | activation_2[0][0] |
| instance_normalization_3 (Insta | (None, 32, 32, 32, 1 | 2 | conv3d_3[0][0] |
| activation_3 (Activation) | (None, 32, 32, 32, 1 | 0 | instance_normalization_3[0][0] |
| max_pooling3d_1 (MaxPooling3D) | (None, 16, 16, 16, 1 | 0 | activation_3[0][0] |
| conv3d_4 (Conv3D) | (None, 16, 16, 16, 2 | 884992 | max_pooling3d_1[0][0] |
| instance_normalization_4 (Insta | (None, 16, 16, 16, 2 | 2 | conv3d_4[0][0] |
| activation_4 (Activation) | (None, 16, 16, 16, 2 | 0 | instance_normalization_4[0][0] |
| conv3d_5 (Conv3D) | (None, 16, 16, 16, 2 | 1769728 | activation_4[0][0] |
| instance_normalization_5 (Insta | (None, 16, 16, 16, 2 | 2 | conv3d_5[0][0] |
| activation_5 (Activation) | (None, 16, 16, 16, 2 | 0 | instance_normalization_5[0][0] |
| max_pooling3d_2 (MaxPooling3D) | (None, 8, 8, 8, 256) | 0 | activation_5[0][0] |
| conv3d_6 (Conv3D) | (None, 8, 8, 8, 512) | 3539456 | max_pooling3d_2[0][0] |
| instance_normalization_6 (Insta | (None, 8, 8, 8, 512) | 2 | conv3d_6[0][0] |
| activation_6 (Activation) | (None, 8, 8, 8, 512) | 0 | instance_normalization_6[0][0] |
| conv3d_7 (Conv3D) | (None, 8, 8, 8, 512) | 7078400 | activation_6[0][0] |
| instance_normalization_7 (Insta | (None, 8, 8, 8, 512) | 2 | conv3d_7[0][0] |
| activation_7 (Activation) | (None, 8, 8, 8, 512) | 0 | instance_normalization_7[0][0] |
| conv3d_transpose (Conv3DTranspo | (None, 16, 16, 16, 2 | 1048832 | activation_7[0][0] |
| concatenate (Concatenate) | (None, 16, 16, 16, 5 | 0 | conv3d_transpose[0][0] |
| conv3d_8 (Conv3D) | (None, 16, 16, 16, 2 | 3539200 | concatenate[0][0] |
| instance_normalization_8 (Insta | (None, 16, 16, 16, 2 | 2 | conv3d_8[0][0] |
| activation_8 (Activation) | (None, 16, 16, 16, 2 | 0 | instance_normalization_8[0][0] |
| conv3d_9 (Conv3D) | (None, 16, 16, 16, 2 | 1769728 | activation_8[0][0] |
| instance_normalization_9 (Insta | (None, 16, 16, 16, 2 | 2 | conv3d_9[0][0] |
| activation_9 (Activation) | (None, 16, 16, 16, 2 | 0 | instance_normalization_9[0][0] |
| conv3d_transpose_1 (Conv3DTrans | (None, 32, 32, 32, 1 | 262272 | activation_9[0][0] |
| concatenate_1 (Concatenate) | (None, 32, 32, 32, 2 | 0 | conv3d_transpose_1[0][0] |
| conv3d_10 (Conv3D) | (None, 32, 32, 32, 1 | 884864 | concatenate_1[0][0] |
| instance_normalization_10 (Inst | (None, 32, 32, 32, 1 | 2 | conv3d_10[0][0] |
| activation_10 (Activation) | (None, 32, 32, 32, 1 | 0 | instance_normalization_10[0][0] |
| conv3d_11 (Conv3D) | (None, 32, 32, 32, 1 | 442496 | activation_10[0][0] |
| instance_normalization_11 (Inst | (None, 32, 32, 32, 1 | 2 | conv3d_11[0][0] |
| activation_11 (Activation) | (None, 32, 32, 32, 1 | 0 | instance_normalization_11[0][0] |
| conv3d_transpose_2 (Conv3DTrans | (None, 64, 64, 64, 6 | 65600 | activation_11[0][0] |
| concatenate_2 (Concatenate) | (None, 64, 64, 64, 1 | 0 | conv3d_transpose_2[0][0] |
| conv3d_12 (Conv3D) | (None, 64, 64, 64, 6 | 221248 | concatenate_2[0][0] |
| instance_normalization_12 (Inst | (None, 64, 64, 64, 6 | 2 | conv3d_12[0][0] |
| activation_12 (Activation) | (None, 64, 64, 64, 6 | 0 | instance_normalization_12[0][0] |
| conv3d_13 (Conv3D) | (None, 64, 64, 64, 6 | 110656 | activation_12[0][0] |
| instance_normalization_13 (Inst | (None, 64, 64, 64, 6 | 2 | conv3d_13[0][0] |
| activation_13 (Activation) | (None, 64, 64, 64, 6 | 0 | instance_normalization_13[0][0] |

Total params: 22,398,940
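As a rough cross-check on the summary above, the number of input channels of a Conv3D layer can be backed out from its parameter count. This sketch assumes 3×3×3 kernels with a bias term; the output-shape column is truncated in the printout, so the filter counts of 64 and 128 are also inferred rather than read off. `conv3d_in_channels` is a hypothetical helper.

```python
from typing import Tuple


def conv3d_in_channels(
    n_params: int,
    filters: int = 64,
    kernel: Tuple[int, int, int] = (3, 3, 3),
) -> int:
    """Infer input channels from n_params = kx*ky*kz*c_in*filters + filters."""
    kx, ky, kz = kernel
    kernel_params = n_params - filters  # subtract the bias terms
    per_input_channel = kx * ky * kz * filters
    if kernel_params <= 0 or kernel_params % per_input_channel != 0:
        raise ValueError("parameter count does not fit the assumed kernel and filters")
    return kernel_params // per_input_channel


# Sanity checks on inner layers, whose input width is known from the layer before.
print(conv3d_in_channels(110656))               # conv3d_1 -> 64
print(conv3d_in_channels(221312, filters=128))  # conv3d_2 -> 64

# First convolution of the loaded file.
print(conv3d_in_channels(6976))                 # conv3d -> 4
```

Under those assumptions the first convolution in this file takes 4 input channels, which is the number to compare against the input of the network being built for fine-tuning.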
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Hi @data-hound Thank you so much for your time!! I have been struggling with this problem for a long time.
Hi @yuan0821 At the point where you got this message from, can you please try the following:
About this, I could not find a problem with n_views, mono, or n_channels_in, although these are the usual suspects. (Can you please paste the parameters in a more readable format?)
hi @data-hound
And I found I had to duplicate the cameras (videos, params, sync, label3dData) to match the weight file so that dannce-predict could run smoothly.
Hi, could anyone do me a favor and check what is causing this error?
I use 3 mono cameras for recording, and fine-tune with the 3-camera AVG mono network.
Thank you so much!! @spoonsso
(tf25) F:\testdannce120\dannce\demo\new919>dannce-train .\dannce_config_919.yaml
2022-10-24 10:33:51.991035: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
io_config not found in io.yaml file, falling back to main config
extension not found in io.yaml file, falling back to main config
crop_height not found in io.yaml file, falling back to main config
crop_width not found in io.yaml file, falling back to main config
n_channels_in not found in io.yaml file, falling back to main config
new_n_channels_out not found in io.yaml file, falling back to main config
camnames not found in io.yaml file, falling back to main config
mono not found in io.yaml file, falling back to main config
n_views not found in io.yaml file, falling back to main config
batch_size not found in io.yaml file, falling back to main config
epochs not found in io.yaml file, falling back to main config
net_type not found in io.yaml file, falling back to main config
train_mode not found in io.yaml file, falling back to main config
num_validation_per_exp not found in io.yaml file, falling back to main config
vol_size not found in io.yaml file, falling back to main config
nvox not found in io.yaml file, falling back to main config
max_num_samples not found in io.yaml file, falling back to main config
dannce_finetune_weights not found in io.yaml file, falling back to main config
com_train_dir set to: .\COM\train_results\
com_predict_dir set to: .\COM\predict_results\
dannce_train_dir set to: .\DANNCE\train_results\
dannce_predict_dir set to: .\DANNCE\predict_results\
exp set to: [{'label3d_file': './testlabel3d_dannce.mat'}]
io_config set to: io.yaml
extension set to: .avi
crop_height set to: [0, 2240]
crop_width set to: [0, 2048]
n_channels_in set to: 1
new_n_channels_out set to: 16
camnames set to: ['Camera1', 'Camera2', 'Camera3']
mono set to: True
n_views set to: 3
batch_size set to: 4
epochs set to: 10
net_type set to: AVG
train_mode set to: finetune
num_validation_per_exp set to: 0
vol_size set to: 120
nvox set to: 64
max_num_samples set to: 100
dannce_finetune_weights set to: F:\testdannce120\dannce\weight\avg_3\
base_config set to: .\dannce_config_919.yaml
viddir set to: videos
n_channels_out set to: 20
sigma set to: 10
verbose set to: 1
net set to: None
gpu_id set to: 0
immode set to: vid
mirror set to: False
loss set to: mask_nan_keep_loss
num_train_per_exp set to: None
metric set to: ['euclidean_distance_3D']
lr set to: 0.001
augment_hue set to: False
augment_brightness set to: False
augment_hue_val set to: 0.05
augment_bright_val set to: 0.05
augment_rotation_val set to: 5
data_split_seed set to: None
valid_exp set to: None
com_fromlabels set to: False
medfilt_window set to: None
com_file set to: None
new_last_kernel_size set to: [3, 3, 3]
n_layers_locked set to: 2
vmin set to: None
vmax set to: None
interp set to: nearest
depth set to: False
comthresh set to: 0
weighted set to: False
com_method set to: median
cthresh set to: None
channel_combo set to: None
predict_mode set to: torch
rotate set to: True
augment_continuous_rotation set to: False
drop_landmark set to: None
use_npy set to: False
rand_view_replace set to: True
n_rand_views set to: 0
multi_gpu_train set to: False
heatmap_reg set to: False
heatmap_reg_coeff set to: 0.01
save_pred_targets set to: False
start_batch set to: 0
vid_dir_flag set to: None
chunks set to: None
lockfirst set to: None
load_valid set to: None
raw_im_h set to: None
raw_im_w set to: None
n_instances set to: 1
start_sample set to: None
write_npy set to: None
dannce_predict_model set to: None
expval set to: None
com_thresh set to: None
cam3_train set to: None
debug_volume_tifdir set to: None
downfac set to: None
from_weights set to: None
dannce_predict_vol_tifdir set to: None
Setting vid_dir_flag to True.
Setting extension to .avi.
Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0])}.
Setting n_channels_in to 3.
Setting raw_im_h to 2560.
Setting raw_im_w to 2560.
Setting expval to True.
Setting net to finetune_AVG.
Setting maxbatch to 25.
Setting start_batch to 0.
Setting vmin to -60.0.
Setting vmax to 60.0.
Fine-tuning from F:\testdannce120\dannce\weight\avg_3\weights_multigpu.30-8.19468_singleGPU.hdf5
Experiment 0 using videos in .\videos
Experiment 0 using camnames: ['Camera1', 'Camera2', 'Camera3']
{'0_Camera1': array([0]), '0_Camera2': array([0]), '0_Camera3': array([0])}
./testlabel3d_dannce.mat
Experiment 0 using com3d: ./testlabel3d_dannce.mat
Removed 0 samples from the dataset because they either had COM positions over cthresh, or did not have matching sampleIDs in the COM file
Using 13 samples total.
Using the following cameras: ['Camera1', 'Camera2', 'Camera3']
TRAIN EXPTS: [0]
None
None
Loading training data into memory. This can take a while to seek through large sets of video. This process is much faster if the frame indices are sorted in ascending order in your label data file.
Loading new video: .\videos\Camera1\0.avi for 0_Camera1
Loading new video: .\videos\Camera2\0.avi for 0_Camera2
Loading new video: .\videos\Camera3\0.avi for 0_Camera3
Loading validation data into memory
Using default n_rand_views augmentation with 3 views and with replacement
To disable n_rand_views augmentation, set it to None in the config.
Initializing Network...
2022-10-24 10:34:05.555438: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2022-10-24 10:34:05.585390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2022-10-24 10:34:05.585661: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2022-10-24 10:34:05.586291: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2022-10-24 10:34:05.586364: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2022-10-24 10:34:05.586429: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2022-10-24 10:34:05.587624: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2022-10-24 10:34:05.587703: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2022-10-24 10:34:05.587769: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2022-10-24 10:34:05.587833: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2022-10-24 10:34:05.587922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-10-24 10:34:05.588292: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-24 10:34:05.680864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2022-10-24 10:34:05.681225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-10-24 10:34:06.267527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-24 10:34:06.267717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2022-10-24 10:34:06.269133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2022-10-24 10:34:06.269805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7433 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:41:00.0, compute capability: 8.6)
WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
Number of devices: 1
NUM CAMERAS: 3
using instance normalization
E:\anaconda\envs\tf25\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
"The `lr` argument is deprecated, use `learning_rate` instead.")
`Correcting mismatch in layer name, model: image_input, weights: input_1
Correcting mismatch in layer name, model: conv3d, weights: conv3d_1
Correcting mismatch in layer name, model: instance_normalization, weights: instance_normalization_1
Correcting mismatch in layer name, model: activation, weights: activation_1
Correcting mismatch in layer name, model: conv3d_1, weights: conv3d_2
Correcting mismatch in layer name, model: instance_normalization_1, weights: instance_normalization_2
Correcting mismatch in layer name, model: activation_1, weights: activation_2
Correcting mismatch in layer name, model: max_pooling3d, weights: max_pooling3d_1
Correcting mismatch in layer name, model: conv3d_2, weights: conv3d_3
Correcting mismatch in layer name, model: instance_normalization_2, weights: instance_normalization_3
Correcting mismatch in layer name, model: activation_2, weights: activation_3
Correcting mismatch in layer name, model: conv3d_3, weights: conv3d_4
Correcting mismatch in layer name, model: instance_normalization_3, weights: instance_normalization_4
Correcting mismatch in layer name, model: activation_3, weights: activation_4
Correcting mismatch in layer name, model: max_pooling3d_1, weights: max_pooling3d_2
Correcting mismatch in layer name, model: conv3d_4, weights: conv3d_5
Correcting mismatch in layer name, model: instance_normalization_4, weights: instance_normalization_5
Correcting mismatch in layer name, model: activation_4, weights: activation_5
Correcting mismatch in layer name, model: conv3d_5, weights: conv3d_6
Correcting mismatch in layer name, model: instance_normalization_5, weights: instance_normalization_6
Correcting mismatch in layer name, model: activation_5, weights: activation_6
Correcting mismatch in layer name, model: max_pooling3d_2, weights: max_pooling3d_3
Correcting mismatch in layer name, model: conv3d_6, weights: conv3d_7
Correcting mismatch in layer name, model: instance_normalization_6, weights: instance_normalization_7
Correcting mismatch in layer name, model: activation_6, weights: activation_7
Correcting mismatch in layer name, model: conv3d_7, weights: conv3d_8
Correcting mismatch in layer name, model: instance_normalization_7, weights: instance_normalization_8
Correcting mismatch in layer name, model: activation_7, weights: activation_8
Correcting mismatch in layer name, model: conv3d_transpose, weights: conv3d_transpose_1
Correcting mismatch in layer name, model: concatenate, weights: concatenate_1
Correcting mismatch in layer name, model: conv3d_8, weights: conv3d_9
Correcting mismatch in layer name, model: instance_normalization_8, weights: instance_normalization_9
Correcting mismatch in layer name, model: activation_8, weights: activation_9
Correcting mismatch in layer name, model: conv3d_9, weights: conv3d_10
Correcting mismatch in layer name, model: instance_normalization_9, weights: instance_normalization_10
Correcting mismatch in layer name, model: activation_9, weights: activation_10
Correcting mismatch in layer name, model: conv3d_transpose_1, weights: conv3d_transpose_2
Correcting mismatch in layer name, model: concatenate_1, weights: concatenate_2
Correcting mismatch in layer name, model: conv3d_10, weights: conv3d_11
Correcting mismatch in layer name, model: instance_normalization_10, weights: instance_normalization_11
Correcting mismatch in layer name, model: activation_10, weights: activation_11
Correcting mismatch in layer name, model: conv3d_11, weights: conv3d_12
Correcting mismatch in layer name, model: instance_normalization_11, weights: instance_normalization_12
Correcting mismatch in layer name, model: activation_11, weights: activation_12
Correcting mismatch in layer name, model: conv3d_transpose_2, weights: conv3d_transpose_3
Correcting mismatch in layer name, model: concatenate_2, weights: concatenate_3
Correcting mismatch in layer name, model: conv3d_12, weights: conv3d_13
Correcting mismatch in layer name, model: instance_normalization_12, weights: instance_normalization_13
Correcting mismatch in layer name, model: activation_12, weights: activation_13
Correcting mismatch in layer name, model: conv3d_13, weights: conv3d_14
Correcting mismatch in layer name, model: instance_normalization_13, weights: instance_normalization_14
Correcting mismatch in layer name, model: activation_13, weights: activation_14
Could not load weights for finetune (likely because you are finetuning a previously finetuned network). Attempting to finetune from a full finetune model file.
Traceback (most recent call last):
File "f:\testdannce120\dannce\dannce\interface.py", line 1120, in dannce_train
*fargs
File "f:\testdannce120\dannce\dannce\engine\nets.py", line 1129, in finetune_AVG
model = renameLayers(model, weightspath)
File "f:\testdannce120\dannce\dannce\engine\nets.py", line 1348, in renameLayers
model.load_weights(weightspath, by_name=True)
File "E:\anaconda\envs\tf25\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2324, in load_weights
f, self.layers, skip_mismatch=skip_mismatch)
File "E:\anaconda\envs\tf25\lib\site-packages\tensorflow\python\keras\saving\hdf5_format.py", line 768, in load_weights_from_hdf5_group_by_name
layer, weight_values, original_keras_version, original_backend)
File "E:\anaconda\envs\tf25\lib\site-packages\tensorflow\python\keras\saving\hdf5_format.py", line 404, in preprocess_weights_for_loading
weights[0] = np.transpose(weights[0], (3, 2, 0, 1))
File "<array_function internals>", line 6, in transpose
File "E:\anaconda\envs\tf25\lib\site-packages\numpy\core\fromnumeric.py", line 653, in transpose
return _wrapfunc(a, 'transpose', axes)
File "E:\anaconda\envs\tf25\lib\site-packages\numpy\core\fromnumeric.py", line 58, in _wrapfunc
return bound(*args, **kwds)
ValueError: axes don't match array
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\anaconda\envs\tf25\Scripts\dannce-train-script.py", line 33, in <module>
sys.exit(load_entry_point('dannce', 'console_scripts', 'dannce-train')())
File "f:\testdannce120\dannce\dannce\cli.py", line 66, in dannce_train_cli
dannce_train(params)
File "f:\testdannce120\dannce\dannce\interface.py", line 1126, in dannce_train
*fargs
File "f:\testdannce120\dannce\dannce\engine\nets.py", line 1205, in finetune_fullmodel_AVG
for layer in model.layers[1].layers:
AttributeError: 'Conv3D' object has no attribute 'layers'`
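The first traceback ends in `np.transpose(weights[0], (3, 2, 0, 1))`, a four-axis permutation. `np.transpose` raises `ValueError: axes don't match array` whenever the number of axes given differs from the number of dimensions of the array, so the kernel being loaded was presumably not 4-D (a Conv3D kernel is 5-D). The sketch below is a pure-Python stand-in for that length check, not NumPy's or Keras's actual code, and the shapes are hypothetical.

```python
from typing import Sequence, Tuple


def transposed_shape(shape: Sequence[int], axes: Sequence[int]) -> Tuple[int, ...]:
    """Return the shape after permuting axes, with a NumPy-like length check."""
    if len(axes) != len(shape):
        raise ValueError("axes don't match array")
    return tuple(shape[a] for a in axes)


# A 4-D kernel (hypothetical shape) can be permuted with four axes.
print(transposed_shape((3, 3, 9, 64), (3, 2, 0, 1)))  # (64, 9, 3, 3)

# A 5-D kernel (hypothetical shape) cannot.
try:
    transposed_shape((3, 3, 3, 9, 64), (3, 2, 0, 1))
except ValueError as err:
    print(f"ValueError: {err}")
```

The second traceback is then the fallback path failing: it iterates over `model.layers[1].layers`, but here `model.layers[1]` is a plain `Conv3D` layer, which has no `layers` attribute.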