Could not load weights for finetune (likely because you are finetuning a previously finetuned network). Attempting to finetune from a full finetune model file. #129

Open
yuan0821 opened this issue Oct 24, 2022 · 18 comments

Comments

@yuan0821

Hi, could anyone help me figure out what is causing this error?
I record with 3 mono cameras and fine-tune from the 3-camera AVG mono network.

Thank you so much!! @spoonsso

(tf25) F:\testdannce120\dannce\demo\new919>dannce-train .\dannce_config_919.yaml 2022-10-24 10:33:51.991035: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll io_config not found in io.yaml file, falling back to main config extension not found in io.yaml file, falling back to main config crop_height not found in io.yaml file, falling back to main config crop_width not found in io.yaml file, falling back to main config n_channels_in not found in io.yaml file, falling back to main config new_n_channels_out not found in io.yaml file, falling back to main config camnames not found in io.yaml file, falling back to main config mono not found in io.yaml file, falling back to main config n_views not found in io.yaml file, falling back to main config batch_size not found in io.yaml file, falling back to main config epochs not found in io.yaml file, falling back to main config net_type not found in io.yaml file, falling back to main config train_mode not found in io.yaml file, falling back to main config num_validation_per_exp not found in io.yaml file, falling back to main config vol_size not found in io.yaml file, falling back to main config nvox not found in io.yaml file, falling back to main config max_num_samples not found in io.yaml file, falling back to main config dannce_finetune_weights not found in io.yaml file, falling back to main config com_train_dir set to: .\COM\train_results\ com_predict_dir set to: .\COM\predict_results\ dannce_train_dir set to: .\DANNCE\train_results\ dannce_predict_dir set to: .\DANNCE\predict_results\ exp set to: [{'label3d_file': './testlabel3d_dannce.mat'}] io_config set to: io.yaml extension set to: .avi crop_height set to: [0, 2240] crop_width set to: [0, 2048] n_channels_in set to: 1 new_n_channels_out set to: 16 camnames set to: ['Camera1', 'Camera2', 'Camera3'] mono set to: True n_views set to: 3 batch_size set to: 4 epochs set to: 10 net_type set to: AVG train_mode 
set to: finetune num_validation_per_exp set to: 0 vol_size set to: 120 nvox set to: 64 max_num_samples set to: 100 dannce_finetune_weights set to: F:\testdannce120\dannce\weight\avg_3\ base_config set to: .\dannce_config_919.yaml viddir set to: videos n_channels_out set to: 20 sigma set to: 10 verbose set to: 1 net set to: None gpu_id set to: 0 immode set to: vid mirror set to: False loss set to: mask_nan_keep_loss num_train_per_exp set to: None metric set to: ['euclidean_distance_3D'] lr set to: 0.001 augment_hue set to: False augment_brightness set to: False augment_hue_val set to: 0.05 augment_bright_val set to: 0.05 augment_rotation_val set to: 5 data_split_seed set to: None valid_exp set to: None com_fromlabels set to: False medfilt_window set to: None com_file set to: None new_last_kernel_size set to: [3, 3, 3] n_layers_locked set to: 2 vmin set to: None vmax set to: None interp set to: nearest depth set to: False comthresh set to: 0 weighted set to: False com_method set to: median cthresh set to: None channel_combo set to: None predict_mode set to: torch rotate set to: True augment_continuous_rotation set to: False drop_landmark set to: None use_npy set to: False rand_view_replace set to: True n_rand_views set to: 0 multi_gpu_train set to: False heatmap_reg set to: False heatmap_reg_coeff set to: 0.01 save_pred_targets set to: False start_batch set to: 0 vid_dir_flag set to: None chunks set to: None lockfirst set to: None load_valid set to: None raw_im_h set to: None raw_im_w set to: None n_instances set to: 1 start_sample set to: None write_npy set to: None dannce_predict_model set to: None expval set to: None com_thresh set to: None cam3_train set to: None debug_volume_tifdir set to: None downfac set to: None from_weights set to: None dannce_predict_vol_tifdir set to: None Setting vid_dir_flag to True. Setting extension to .avi. Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0])}. Setting n_channels_in to 3. 
Setting raw_im_h to 2560. Setting raw_im_w to 2560. Setting expval to True. Setting net to finetune_AVG. Setting maxbatch to 25. Setting start_batch to 0. Setting vmin to -60.0. Setting vmax to 60.0. Fine-tuning from F:\testdannce120\dannce\weight\avg_3\weights_multigpu.30-8.19468_singleGPU.hdf5 Experiment 0 using videos in .\videos Experiment 0 using camnames: ['Camera1', 'Camera2', 'Camera3'] {'0_Camera1': array([0]), '0_Camera2': array([0]), '0_Camera3': array([0])} ./testlabel3d_dannce.mat Experiment 0 using com3d: ./testlabel3d_dannce.mat Removed 0 samples from the dataset because they either had COM positions over cthresh, or did not have matching sampleIDs in the COM file Using 13 samples total. Using the following cameras: ['Camera1', 'Camera2', 'Camera3'] TRAIN EXPTS: [0] None None Loading training data into memory. This can take a while to seek through large sets of video. This process is much faster if the frame indices are sorted in ascending order in your label data file. Loading new video: .\videos\Camera1\0.avi for 0_Camera1 Loading new video: .\videos\Camera2\0.avi for 0_Camera2 Loading new video: .\videos\Camera3\0.avi for 0_Camera3 Loading validation data into memory Using default n_rand_views augmentation with 3 views and with replacement To disable n_rand_views augmentation, set it to None in the config. Initializing Network... 
2022-10-24 10:34:05.555438: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll 2022-10-24 10:34:05.585390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s 2022-10-24 10:34:05.585661: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll 2022-10-24 10:34:05.586291: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll 2022-10-24 10:34:05.586364: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll 2022-10-24 10:34:05.586429: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll 2022-10-24 10:34:05.587624: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll 2022-10-24 10:34:05.587703: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll 2022-10-24 10:34:05.587769: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll 2022-10-24 10:34:05.587833: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll 2022-10-24 10:34:05.587922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2022-10-24 10:34:05.588292: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild 
TensorFlow with the appropriate compiler flags. 2022-10-24 10:34:05.680864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s 2022-10-24 10:34:05.681225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2022-10-24 10:34:06.267527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix: 2022-10-24 10:34:06.267717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0 2022-10-24 10:34:06.269133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N 2022-10-24 10:34:06.269805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7433 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:41:00.0, compute capability: 8.6) WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled. Number of devices: 1 NUM CAMERAS: 3 using instance normalization E:\anaconda\envs\tf25\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead. "The `lr` argument is deprecated, use `learning_rate` instead.")

`Correcting mismatch in layer name, model: image_input, weights: input_1
Correcting mismatch in layer name, model: conv3d, weights: conv3d_1
Correcting mismatch in layer name, model: instance_normalization, weights: instance_normalization_1
Correcting mismatch in layer name, model: activation, weights: activation_1
Correcting mismatch in layer name, model: conv3d_1, weights: conv3d_2
Correcting mismatch in layer name, model: instance_normalization_1, weights: instance_normalization_2
Correcting mismatch in layer name, model: activation_1, weights: activation_2
Correcting mismatch in layer name, model: max_pooling3d, weights: max_pooling3d_1
Correcting mismatch in layer name, model: conv3d_2, weights: conv3d_3
Correcting mismatch in layer name, model: instance_normalization_2, weights: instance_normalization_3
Correcting mismatch in layer name, model: activation_2, weights: activation_3
Correcting mismatch in layer name, model: conv3d_3, weights: conv3d_4
Correcting mismatch in layer name, model: instance_normalization_3, weights: instance_normalization_4
Correcting mismatch in layer name, model: activation_3, weights: activation_4
Correcting mismatch in layer name, model: max_pooling3d_1, weights: max_pooling3d_2
Correcting mismatch in layer name, model: conv3d_4, weights: conv3d_5
Correcting mismatch in layer name, model: instance_normalization_4, weights: instance_normalization_5
Correcting mismatch in layer name, model: activation_4, weights: activation_5
Correcting mismatch in layer name, model: conv3d_5, weights: conv3d_6
Correcting mismatch in layer name, model: instance_normalization_5, weights: instance_normalization_6
Correcting mismatch in layer name, model: activation_5, weights: activation_6
Correcting mismatch in layer name, model: max_pooling3d_2, weights: max_pooling3d_3
Correcting mismatch in layer name, model: conv3d_6, weights: conv3d_7
Correcting mismatch in layer name, model: instance_normalization_6, weights: instance_normalization_7
Correcting mismatch in layer name, model: activation_6, weights: activation_7
Correcting mismatch in layer name, model: conv3d_7, weights: conv3d_8
Correcting mismatch in layer name, model: instance_normalization_7, weights: instance_normalization_8
Correcting mismatch in layer name, model: activation_7, weights: activation_8
Correcting mismatch in layer name, model: conv3d_transpose, weights: conv3d_transpose_1
Correcting mismatch in layer name, model: concatenate, weights: concatenate_1
Correcting mismatch in layer name, model: conv3d_8, weights: conv3d_9
Correcting mismatch in layer name, model: instance_normalization_8, weights: instance_normalization_9
Correcting mismatch in layer name, model: activation_8, weights: activation_9
Correcting mismatch in layer name, model: conv3d_9, weights: conv3d_10
Correcting mismatch in layer name, model: instance_normalization_9, weights: instance_normalization_10
Correcting mismatch in layer name, model: activation_9, weights: activation_10
Correcting mismatch in layer name, model: conv3d_transpose_1, weights: conv3d_transpose_2
Correcting mismatch in layer name, model: concatenate_1, weights: concatenate_2
Correcting mismatch in layer name, model: conv3d_10, weights: conv3d_11
Correcting mismatch in layer name, model: instance_normalization_10, weights: instance_normalization_11
Correcting mismatch in layer name, model: activation_10, weights: activation_11
Correcting mismatch in layer name, model: conv3d_11, weights: conv3d_12
Correcting mismatch in layer name, model: instance_normalization_11, weights: instance_normalization_12
Correcting mismatch in layer name, model: activation_11, weights: activation_12
Correcting mismatch in layer name, model: conv3d_transpose_2, weights: conv3d_transpose_3
Correcting mismatch in layer name, model: concatenate_2, weights: concatenate_3
Correcting mismatch in layer name, model: conv3d_12, weights: conv3d_13
Correcting mismatch in layer name, model: instance_normalization_12, weights: instance_normalization_13
Correcting mismatch in layer name, model: activation_12, weights: activation_13
Correcting mismatch in layer name, model: conv3d_13, weights: conv3d_14
Correcting mismatch in layer name, model: instance_normalization_13, weights: instance_normalization_14
Correcting mismatch in layer name, model: activation_13, weights: activation_14
Could not load weights for finetune (likely because you are finetuning a previously finetuned network). Attempting to finetune from a full finetune model file.
Traceback (most recent call last):
  File "f:\testdannce120\dannce\dannce\interface.py", line 1120, in dannce_train
    *fargs
  File "f:\testdannce120\dannce\dannce\engine\nets.py", line 1129, in finetune_AVG
    model = renameLayers(model, weightspath)
  File "f:\testdannce120\dannce\dannce\engine\nets.py", line 1348, in renameLayers
    model.load_weights(weightspath, by_name=True)
  File "E:\anaconda\envs\tf25\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2324, in load_weights
    f, self.layers, skip_mismatch=skip_mismatch)
  File "E:\anaconda\envs\tf25\lib\site-packages\tensorflow\python\keras\saving\hdf5_format.py", line 768, in load_weights_from_hdf5_group_by_name
    layer, weight_values, original_keras_version, original_backend)
  File "E:\anaconda\envs\tf25\lib\site-packages\tensorflow\python\keras\saving\hdf5_format.py", line 404, in preprocess_weights_for_loading
    weights[0] = np.transpose(weights[0], (3, 2, 0, 1))
  File "<array_function internals>", line 6, in transpose
  File "E:\anaconda\envs\tf25\lib\site-packages\numpy\core\fromnumeric.py", line 653, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "E:\anaconda\envs\tf25\lib\site-packages\numpy\core\fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
ValueError: axes don't match array

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\anaconda\envs\tf25\Scripts\dannce-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('dannce', 'console_scripts', 'dannce-train')())
  File "f:\testdannce120\dannce\dannce\cli.py", line 66, in dannce_train_cli
    dannce_train(params)
  File "f:\testdannce120\dannce\dannce\interface.py", line 1126, in dannce_train
    *fargs
  File "f:\testdannce120\dannce\dannce\engine\nets.py", line 1205, in finetune_fullmodel_AVG
    for layer in model.layers[1].layers:
AttributeError: 'Conv3D' object has no attribute 'layers'`

@data-hound
Collaborator

Hi @yuan0821

Sorry for the delay in getting back to your issues.

You are probably getting this error because the shapes in the weights file do not match the model. I would compare the shapes of the weights you are trying to load with the shapes of the layers in the model you are loading them into. You can do this by putting a debugger before
"f:\testdannce120\dannce\dannce\engine\nets.py", line 1348

Here is the deeper technical reason for the issue:
The second error says that model.layers[1] is a Conv3D layer and not a nested model. When fine-tuning a previously fine-tuned model, the code expects a nested model at model.layers[1]. However, when the first attempt to load the weights fails, the code automatically falls back to parsing the weights as if they came from a previously fine-tuned model (which is not the case here). So the real cause of the failure is that the weights are not being loaded properly. (This misleading error message will be addressed in the next release.)
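The fallback behaviour described above can be illustrated with a minimal, dependency-free sketch. The names `FakeConv3D`, `load_by_name`, and `finetune` are hypothetical stand-ins, not the actual dannce functions; the point is only that Python keeps the first exception on `__context__`, so the `ValueError` from the weight load is the one to investigate.

```python
class FakeConv3D:
    """Hypothetical stand-in for a plain layer: it has no `.layers` attribute."""


def load_by_name(kernel_shape):
    # Stand-in for the failing weight load. The traceback shows a four-axis
    # transpose failing, which is what happens when the array is not 4-D.
    if len(kernel_shape) != 4:
        raise ValueError("axes don't match array")
    return "loaded"


def finetune(kernel_shape, layers):
    try:
        return load_by_name(kernel_shape)
    except ValueError:
        # Fallback: assume a previously fine-tuned (nested) model.
        # With a plain layer at index 1 this raises AttributeError.
        return [sub for sub in layers[1].layers]


try:
    finetune((3, 3, 3, 3, 64), [object(), FakeConv3D()])
except AttributeError as err:
    # The original failure is preserved as the implicit exception context.
    root_cause = err.__context__
    print(type(err).__name__, "raised while handling", repr(root_cause))
```

When reading a chained traceback like the one in the first post, the block above "During handling of the above exception" is therefore the root cause.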

@yuan0821
Author

Hi @data-hound
Thank you so much for your help.
How do I put the debugger before line 1348 of nets.py and run it?

Also, please let me know if you see any problems with the crop, mono channel, or n_views settings. Thank you very much.

@data-hound
Collaborator

Hi @yuan0821

To put the debugger at line 1348, simply add the line import pdb; pdb.set_trace() just before it. This has worked for me for all kinds of debugging so far. Please let me know if you run into any problems with this. After adding the debugger line, I suggest reinstalling dannce (using pip install -e .) and then running the command as usual.

Mono channel and n_views look good to me so far. For the crop parameters, you may want to compare the input shape expected by the model with the input shape produced by the generator. You can do this by calling __getitem__(<batch_num>) on the generator object while in the debugger in interface.py. But I would check the weight shapes first, before checking the crop.
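As a rough sketch of where the breakpoint goes: `rename_layers_sketch` below is a hypothetical stand-in for the function around nets.py line 1348, not the real dannce code. In normal use you would only add `import pdb; pdb.set_trace()` and type commands at the `(Pdb)` prompt; the scripted `pdb.Pdb` instance is an assumption made here so the example can run unattended.

```python
import io
import pdb


def rename_layers_sketch(weightspath, debugger=None):
    """Hypothetical stand-in for the function around nets.py line 1348."""
    if debugger is None:
        # What you would actually add to nets.py: an interactive breakpoint.
        pdb.set_trace()
    else:
        # Scripted variant so this sketch can run without a terminal.
        debugger.set_trace()
    # Stand-in for: model.load_weights(weightspath, by_name=True)
    return "loaded " + weightspath


# Feed the debugger the commands you would otherwise type by hand:
# print the variable, then continue execution.
commands = io.StringIO("p weightspath\nc\n")
output = io.StringIO()
scripted = pdb.Pdb(stdin=commands, stdout=output, nosigint=True, readrc=False)

result = rename_layers_sketch("weights.hdf5", debugger=scripted)
print(output.getvalue())
```

At the interactive prompt, `p <name>` prints a variable, `n` steps to the next line, and `c` continues running.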

@yuan0821
Author

Hi @data-hound
After adding import pdb; pdb.set_trace() before model.load_weights(weightspath, by_name=True), I got the output below in the terminal. Please advise on the next step for checking the weight shapes. Thank you so much!!
Screenshot 2023-01-31 12:50:46 AM
Screenshot 2023-01-31 12:50:33 AM

@data-hound
Collaborator

Awesome!
So, you need to compare the shapes of the layers in the model you have built at this point with the shapes of the weights you are loading. In the debugger console, you should be able to create a model from the weights file:
(Pdb) from tensorflow import keras
(Pdb) mdl = keras.models.load_model(weightspath)
Then you can call .summary() on both models and compare the layers that do not match.
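Reading two long summaries side by side is error-prone, so here is a small sketch of automating the comparison. It assumes you have already collected a name-to-shape dictionary for each model, for example with `{w.name: tuple(w.shape) for w in model.weights}` in the debugger; the helper `shape_mismatches` and the shapes below are hypothetical and chosen only for illustration.

```python
def shape_mismatches(model_shapes, file_shapes):
    """Return {name: (model_shape, file_shape)} for weights whose shapes differ.

    Names that appear in only one of the two dictionaries are ignored.
    """
    mismatched = {}
    for name, model_shape in model_shapes.items():
        file_shape = file_shapes.get(name)
        if file_shape is not None and tuple(file_shape) != tuple(model_shape):
            mismatched[name] = (tuple(model_shape), tuple(file_shape))
    return mismatched


# Hypothetical shapes, for illustration only.
current_model = {
    "conv3d/kernel:0": (3, 3, 3, 3, 64),
    "conv3d/bias:0": (64,),
}
weights_file = {
    "conv3d/kernel:0": (3, 3, 3, 9, 64),
    "conv3d/bias:0": (64,),
}

for name, (in_model, in_file) in shape_mismatches(current_model, weights_file).items():
    print(f"{name}: model {in_model} vs weights file {in_file}")
```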

@yuan0821
Author

Hi @data-hound
I am not sure how to find the shapes of the weights. Please see the picture for details. Thank you very much.

Screenshot 2023-01-31 1:35:18 AM

@data-hound
Collaborator

Hi @yuan0821
Which branch are you on?
You would probably need to check the equivalent of this line

custom_objects={
in your branch and copy the custom_objects dictionary into the load_model function.

@yuan0821
Author

Hi @data-hound
I was in the branch of
def finetune_fullmodel_AVG(...

I think custom_objects={ is already in the load_model call, as the picture shows.

image

@yuan0821
Author

yuan0821 commented Feb 1, 2023

(tf26) F:\dannce\demo\mouse_4>dannce-train dannce_mouse_config_4.yaml io_config not found in io.yaml file, falling back to main config new_n_channels_out not found in io.yaml file, falling back to main config downfac not found in io.yaml file, falling back to main config extension not found in io.yaml file, falling back to main config batch_size not found in io.yaml file, falling back to main config n_views not found in io.yaml file, falling back to main config mono not found in io.yaml file, falling back to main config epochs not found in io.yaml file, falling back to main config net_type not found in io.yaml file, falling back to main config train_mode not found in io.yaml file, falling back to main config num_validation_per_exp not found in io.yaml file, falling back to main config vol_size not found in io.yaml file, falling back to main config nvox not found in io.yaml file, falling back to main config max_num_samples not found in io.yaml file, falling back to main config dannce_finetune_weights not found in io.yaml file, falling back to main config predict_mode not found in io.yaml file, falling back to main config com_train_dir set to: .\COM\train_results\ com_predict_dir set to: .\COM\predict_results\ dannce_train_dir set to: .\DANNCE\train_results\AVG\ dannce_predict_dir set to: .\DANNCE\predict_results\ exp set to: [{'label3d_file': 'F:\\dannce\\demo\\mouse_4\\aftercom_label3D_dannce.mat'}] io_config set to: io.yaml new_n_channels_out set to: 16 downfac set to: 4 extension set to: .avi batch_size set to: 1 n_views set to: 4 mono set to: True epochs set to: 3 net_type set to: AVG train_mode set to: finetune num_validation_per_exp set to: 0 vol_size set to: 120 nvox set to: 64 max_num_samples set to: 100 dannce_finetune_weights set to: F:\testdannce120\dannce\weight\avg_5\ predict_mode set to: torch base_config set to: dannce_mouse_config_4.yaml viddir set to: videos crop_height set to: None crop_width set to: None camnames set to: None n_channels_out set 
to: 20 sigma set to: 10 verbose set to: 1 net set to: None gpu_id set to: 0 immode set to: vid mirror set to: False loss set to: mask_nan_keep_loss num_train_per_exp set to: None metric set to: ['euclidean_distance_3D'] lr set to: 0.001 augment_hue set to: False augment_brightness set to: False augment_hue_val set to: 0.05 augment_bright_val set to: 0.05 augment_rotation_val set to: 5 data_split_seed set to: None valid_exp set to: None com_fromlabels set to: False medfilt_window set to: None com_file set to: None new_last_kernel_size set to: [3, 3, 3] n_layers_locked set to: 2 vmin set to: None vmax set to: None interp set to: nearest depth set to: False comthresh set to: 0 weighted set to: False com_method set to: median cthresh set to: None channel_combo set to: None rotate set to: True augment_continuous_rotation set to: False drop_landmark set to: None use_npy set to: False rand_view_replace set to: True n_rand_views set to: 0 multi_gpu_train set to: False heatmap_reg set to: False heatmap_reg_coeff set to: 0.01 save_pred_targets set to: False start_batch set to: 0 n_channels_in set to: None vid_dir_flag set to: None chunks set to: None lockfirst set to: None load_valid set to: None raw_im_h set to: None raw_im_w set to: None n_instances set to: 1 start_sample set to: None write_npy set to: None dannce_predict_model set to: None expval set to: None com_thresh set to: None cam3_train set to: None debug_volume_tifdir set to: None from_weights set to: None dannce_predict_vol_tifdir set to: None Using the following *dannce.mat files: .\aftercom_label3D_dannce.mat Setting vid_dir_flag to True. Setting extension to .avi. Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0]), 'Camera4': array([0])}. Setting n_channels_in to 3. Setting raw_im_h to 2560. Setting raw_im_w to 2560. Setting expval to True. Setting net to finetune_AVG. Setting crop_height to [0, 2560]. Setting crop_width to [0, 2560]. Setting maxbatch to 100. 
Setting start_batch to 0. Setting vmin to -60.0. Setting vmax to 60.0. Fine-tuning from F:\testdannce120\dannce\weight\avg_5\weights_multigpu-v9.11-11.99217_singleGPU.hdf5 Experiment 0 using videos in F:\dannce\demo\mouse_4\videos Experiment 0 using camnames: ['Camera1', 'Camera2', 'Camera3', 'Camera4'] {'0_Camera1': array([0]), '0_Camera2': array([0]), '0_Camera3': array([0]), '0_Camera4': array([0])} F:\dannce\demo\mouse_4\aftercom_label3D_dannce.mat Experiment 0 using com3d: F:\dannce\demo\mouse_4\aftercom_label3D_dannce.mat Removed 0 samples from the dataset because they either had COM positions over cthresh, or did not have matching sampleIDs in the COM file Using 79 samples total. Using the following cameras: ['Camera1', 'Camera2', 'Camera3', 'Camera4'] TRAIN EXPTS: [0] None None Loading training data into memory. This can take a while to seek through large sets of video. This process is much faster if the frame indices are sorted in ascending order in your label data file. Loading new video: F:\dannce\demo\mouse_4\videos\Camera1\0.avi for 0_Camera1 Loading new video: F:\dannce\demo\mouse_4\videos\Camera2\0.avi for 0_Camera2 Loading new video: F:\dannce\demo\mouse_4\videos\Camera3\0.avi for 0_Camera3 Loading new video: F:\dannce\demo\mouse_4\videos\Camera4\0.avi for 0_Camera4 Loading validation data into memory Using default n_rand_views augmentation with 4 views and with replacement To disable n_rand_views augmentation, set it to None in the config. Initializing Network... Number of devices: 1 NUM CAMERAS: 4 using instance normalization E:\anaconda\envs\tf26\lib\site-packages\keras\optimizer_v2\optimizer_v2.py:355: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
warnings.warn(
Correcting mismatch in layer name, model: image_input, weights: input_1
Correcting mismatch in layer name, model: conv3d, weights: conv3d_1
Correcting mismatch in layer name, model: instance_normalization, weights: instance_normalization_1
Correcting mismatch in layer name, model: activation, weights: activation_1
Correcting mismatch in layer name, model: conv3d_1, weights: conv3d_2
Correcting mismatch in layer name, model: instance_normalization_1, weights: instance_normalization_2
Correcting mismatch in layer name, model: activation_1, weights: activation_2
Correcting mismatch in layer name, model: max_pooling3d, weights: max_pooling3d_1
Correcting mismatch in layer name, model: conv3d_2, weights: conv3d_3
Correcting mismatch in layer name, model: instance_normalization_2, weights: instance_normalization_3
Correcting mismatch in layer name, model: activation_2, weights: activation_3
Correcting mismatch in layer name, model: conv3d_3, weights: conv3d_4
Correcting mismatch in layer name, model: instance_normalization_3, weights: instance_normalization_4
Correcting mismatch in layer name, model: activation_3, weights: activation_4
Correcting mismatch in layer name, model: max_pooling3d_1, weights: max_pooling3d_2
Correcting mismatch in layer name, model: conv3d_4, weights: conv3d_5
Correcting mismatch in layer name, model: instance_normalization_4, weights: instance_normalization_5
Correcting mismatch in layer name, model: activation_4, weights: activation_5
Correcting mismatch in layer name, model: conv3d_5, weights: conv3d_6
Correcting mismatch in layer name, model: instance_normalization_5, weights: instance_normalization_6
Correcting mismatch in layer name, model: activation_5, weights: activation_6
Correcting mismatch in layer name, model: max_pooling3d_2, weights: max_pooling3d_3
Correcting mismatch in layer name, model: conv3d_6, weights: conv3d_7
Correcting mismatch in layer name, model: instance_normalization_6, weights: instance_normalization_7
Correcting mismatch in layer name, model: activation_6, weights: activation_7
Correcting mismatch in layer name, model: conv3d_7, weights: conv3d_8
Correcting mismatch in layer name, model: instance_normalization_7, weights: instance_normalization_8
Correcting mismatch in layer name, model: activation_7, weights: activation_8
Correcting mismatch in layer name, model: conv3d_transpose, weights: conv3d_transpose_1
Correcting mismatch in layer name, model: concatenate, weights: concatenate_1
Correcting mismatch in layer name, model: conv3d_8, weights: conv3d_9
Correcting mismatch in layer name, model: instance_normalization_8, weights: instance_normalization_9
Correcting mismatch in layer name, model: activation_8, weights: activation_9
Correcting mismatch in layer name, model: conv3d_9, weights: conv3d_10
Correcting mismatch in layer name, model: instance_normalization_9, weights: instance_normalization_10
Correcting mismatch in layer name, model: activation_9, weights: activation_10
Correcting mismatch in layer name, model: conv3d_transpose_1, weights: conv3d_transpose_2
Correcting mismatch in layer name, model: concatenate_1, weights: concatenate_2
Correcting mismatch in layer name, model: conv3d_10, weights: conv3d_11
Correcting mismatch in layer name, model: instance_normalization_10, weights: instance_normalization_11
Correcting mismatch in layer name, model: activation_10, weights: activation_11
Correcting mismatch in layer name, model: conv3d_11, weights: conv3d_12
Correcting mismatch in layer name, model: instance_normalization_11, weights: instance_normalization_12
Correcting mismatch in layer name, model: activation_11, weights: activation_12
Correcting mismatch in layer name, model: conv3d_transpose_2, weights: conv3d_transpose_3
Correcting mismatch in layer name, model: concatenate_2, weights: concatenate_3
Correcting mismatch in layer name, model: conv3d_12, weights: conv3d_13
Correcting mismatch in layer name, model: instance_normalization_12, weights: instance_normalization_13
Correcting mismatch in layer name, model: activation_12, weights: activation_13
Correcting mismatch in layer name, model: conv3d_13, weights: conv3d_14
Correcting mismatch in layer name, model: instance_normalization_13, weights: instance_normalization_14
Correcting mismatch in layer name, model: activation_13, weights: activation_14

f:\dannce\dannce\engine\nets.py(1406)renameLayers()
-> model.load_weights(weightspath, by_name=True)
Then I started debugging as suggested above:

`(Pdb) weightspath
'F:\testdannce120\dannce\weight\avg_5\weights_multigpu-v9.11-11.99217_singleGPU.hdf5'

(Pdb) from tensorflow.keras.models import Model, load_model
(Pdb) load_model(weightspath)
*** ValueError: bad marshal data (unknown type code)

(Pdb) model = load_model(weightspath, custom_objects={"ops": ops,b"slice_input": slice_input, "mask_nan_keep_loss": losses.mask_nan_keep_loss, "mask_nan_l1_loss": losses.mask_nan_l1_loss,"euclidean_distance_3D": losses.euclidean_distance_3D,"centered_euclidean_distance_3D": losses.centered_euclidean_distance_3D,}, compile=False,)
*** ValueError: bad marshal data (unknown type code)`

@yuan0821
Author

yuan0821 commented Feb 1, 2023

# packages in environment at E:\anaconda\envs\tf26:
#
# Name Version Build Channel

absl-py 0.15.0 pypi_0 pypi
aom 3.5.0 h63175ca_0 conda-forge
astunparse 1.6.3 pypi_0 pypi
attr 0.3.2 pypi_0 pypi
attrs 22.2.0 pypi_0 pypi
bzip2 1.0.8 h8ffe710_4 conda-forge
ca-certificates 2022.12.7 h5b45459_0 conda-forge
cachetools 5.3.0 pypi_0 pypi
certifi 2022.12.7 pypi_0 pypi
charset-normalizer 3.0.1 pypi_0 pypi
clang 5.0 pypi_0 pypi
contourpy 1.0.7 pypi_0 pypi
cudatoolkit 11.1.1 hb074779_11 conda-forge
cudnn 8.1.0.77 h3e0f4f4_0 conda-forge
cycler 0.11.0 pypi_0 pypi
dannce 1.2.0 dev_0
dill 0.3.6 pypi_0 pypi
expat 2.5.0 h1537add_0 conda-forge
ffmpeg 5.1.2 gpl_h5b1d025_106 conda-forge
flatbuffers 1.12 pypi_0 pypi
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.14.2 hbde0cde_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.38.0 pypi_0 pypi
freetype 2.12.1 h546665d_1 conda-forge
gast 0.4.0 pypi_0 pypi
google-auth 2.16.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.51.1 pypi_0 pypi
h5py 3.1.0 pypi_0 pypi
idna 3.4 pypi_0 pypi
imageio 2.8.0 pypi_0 pypi
imageio-ffmpeg 0.4.8 pypi_0 pypi
importlib-metadata 6.0.0 pypi_0 pypi
keras 2.6.0 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
libffi 3.4.2 h8ffe710_5 conda-forge
libiconv 1.17 h8ffe710_0 conda-forge
libopus 1.3.1 h8ffe710_1 conda-forge
libpng 1.6.39 h19919ed_0 conda-forge
libsqlite 3.40.0 hcfcfb64_0 conda-forge
libxml2 2.10.3 hc3477c8_0 conda-forge
libzlib 1.2.13 hcfcfb64_4 conda-forge
markdown 3.4.1 pypi_0 pypi
markupsafe 2.1.2 pypi_0 pypi
matplotlib 3.6.3 pypi_0 pypi
multiprocess 0.70.14 pypi_0 pypi
networkx 3.0 pypi_0 pypi
numpy 1.19.5 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
opencv-python 4.7.0.68 pypi_0 pypi
openh264 2.3.1 h63175ca_1 conda-forge
openssl 3.0.7 hcfcfb64_2 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
packaging 23.0 pypi_0 pypi
pillow 9.4.0 pypi_0 pypi
pip 23.0 pyhd8ed1ab_0 conda-forge
protobuf 3.20.3 pypi_0 pypi
psutil 5.9.4 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
python 3.8.16 h4de0772_1_cpython conda-forge
python-dateutil 2.8.2 pypi_0 pypi
pywavelets 1.4.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
requests 2.28.2 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-image 0.19.3 pypi_0 pypi
scipy 1.10.0 pypi_0 pypi
setuptools 67.1.0 pypi_0 pypi
six 1.15.0 pypi_0 pypi
svt-av1 1.4.1 h63175ca_0 conda-forge
tensorboard 2.11.2 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow 2.6.0 pypi_0 pypi
tensorflow-estimator 2.11.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
tifffile 2023.1.23.1 pypi_0 pypi
tk 8.6.12 h8ffe710_0 conda-forge
torch 1.9.1+cu111 pypi_0 pypi
torchaudio 0.9.1 pypi_0 pypi
torchvision 0.10.1+cu111 pypi_0 pypi
typing-extensions 3.7.4.3 pypi_0 pypi
ucrt 10.0.22621.0 h57928b3_0 conda-forge
urllib3 1.26.14 pypi_0 pypi
vc 14.3 hb6edc58_10 conda-forge
vs2015_runtime 14.34.31931 h4c5c07a_10 conda-forge
werkzeug 2.2.2 pypi_0 pypi
wheel 0.38.4 pyhd8ed1ab_0 conda-forge
wrapt 1.12.1 pypi_0 pypi
x264 1!164.3095 h8ffe710_2 conda-forge
x265 3.5 h2d74725_3 conda-forge
xz 5.2.6 h8d14728_0 conda-forge
zipp 3.12.0 pypi_0 pypi

@data-hound
Collaborator

Hi @yuan0821, I see your problem. I tried loading my local copy of the same weights and got the bad marshal data message as well. Do you have any other 5-cam weights that you can try?

@yuan0821
Author

yuan0821 commented Feb 2, 2023

Hi @data-hound
I do not have any other trained weight file. The file I used was the one provided by DANNCE; the link is in #62.

Could you please share some available 3/4/5/6 mono camera weight files? Thank you so much.

@data-hound
Collaborator

Hi @yuan0821

From the issue that you linked, you can try downloading the MAX 3-cam weights and running with those. As mentioned in the same comment, you can finetune an AVG network starting from either of the weight files.

Let me know if that lets you move forward.

@yuan0821
Author

yuan0821 commented Feb 2, 2023

Hi @data-hound

from tensorflow.keras.models import load_model

model = load_model(weightspath, custom_objects={"ops": ops,b"slice_input": slice_input, "mask_nan_keep_loss": losses.mask_nan_keep_loss, "mask_nan_l1_loss": losses.mask_nan_l1_loss,"euclidean_distance_3D": losses.euclidean_distance_3D,"centered_euclidean_distance_3D": losses.centered_euclidean_distance_3D,})

model.summary()

(Pdb) model.summary()
Model: "model"


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) [(None, 64, 64, 64, 0


conv3d (Conv3D) (None, 64, 64, 64, 6 6976 input_1[0][0]


instance_normalization (Instanc (None, 64, 64, 64, 6 2 conv3d[0][0]


activation (Activation) (None, 64, 64, 64, 6 0 instance_normalization[0][0]


conv3d_1 (Conv3D) (None, 64, 64, 64, 6 110656 activation[0][0]


instance_normalization_1 (Insta (None, 64, 64, 64, 6 2 conv3d_1[0][0]


activation_1 (Activation) (None, 64, 64, 64, 6 0 instance_normalization_1[0][0]


max_pooling3d (MaxPooling3D) (None, 32, 32, 32, 6 0 activation_1[0][0]


conv3d_2 (Conv3D) (None, 32, 32, 32, 1 221312 max_pooling3d[0][0]


instance_normalization_2 (Insta (None, 32, 32, 32, 1 2 conv3d_2[0][0]


activation_2 (Activation) (None, 32, 32, 32, 1 0 instance_normalization_2[0][0]


conv3d_3 (Conv3D) (None, 32, 32, 32, 1 442496 activation_2[0][0]


instance_normalization_3 (Insta (None, 32, 32, 32, 1 2 conv3d_3[0][0]


activation_3 (Activation) (None, 32, 32, 32, 1 0 instance_normalization_3[0][0]


max_pooling3d_1 (MaxPooling3D) (None, 16, 16, 16, 1 0 activation_3[0][0]


conv3d_4 (Conv3D) (None, 16, 16, 16, 2 884992 max_pooling3d_1[0][0]


instance_normalization_4 (Insta (None, 16, 16, 16, 2 2 conv3d_4[0][0]


activation_4 (Activation) (None, 16, 16, 16, 2 0 instance_normalization_4[0][0]


conv3d_5 (Conv3D) (None, 16, 16, 16, 2 1769728 activation_4[0][0]


instance_normalization_5 (Insta (None, 16, 16, 16, 2 2 conv3d_5[0][0]


activation_5 (Activation) (None, 16, 16, 16, 2 0 instance_normalization_5[0][0]


max_pooling3d_2 (MaxPooling3D) (None, 8, 8, 8, 256) 0 activation_5[0][0]


conv3d_6 (Conv3D) (None, 8, 8, 8, 512) 3539456 max_pooling3d_2[0][0]


instance_normalization_6 (Insta (None, 8, 8, 8, 512) 2 conv3d_6[0][0]


activation_6 (Activation) (None, 8, 8, 8, 512) 0 instance_normalization_6[0][0]


conv3d_7 (Conv3D) (None, 8, 8, 8, 512) 7078400 activation_6[0][0]


instance_normalization_7 (Insta (None, 8, 8, 8, 512) 2 conv3d_7[0][0]


activation_7 (Activation) (None, 8, 8, 8, 512) 0 instance_normalization_7[0][0]


conv3d_transpose (Conv3DTranspo (None, 16, 16, 16, 2 1048832 activation_7[0][0]


concatenate (Concatenate) (None, 16, 16, 16, 5 0 conv3d_transpose[0][0]
activation_5[0][0]


conv3d_8 (Conv3D) (None, 16, 16, 16, 2 3539200 concatenate[0][0]


instance_normalization_8 (Insta (None, 16, 16, 16, 2 2 conv3d_8[0][0]


activation_8 (Activation) (None, 16, 16, 16, 2 0 instance_normalization_8[0][0]


conv3d_9 (Conv3D) (None, 16, 16, 16, 2 1769728 activation_8[0][0]


instance_normalization_9 (Insta (None, 16, 16, 16, 2 2 conv3d_9[0][0]


activation_9 (Activation) (None, 16, 16, 16, 2 0 instance_normalization_9[0][0]


conv3d_transpose_1 (Conv3DTrans (None, 32, 32, 32, 1 262272 activation_9[0][0]


concatenate_1 (Concatenate) (None, 32, 32, 32, 2 0 conv3d_transpose_1[0][0]
activation_3[0][0]


conv3d_10 (Conv3D) (None, 32, 32, 32, 1 884864 concatenate_1[0][0]


instance_normalization_10 (Inst (None, 32, 32, 32, 1 2 conv3d_10[0][0]


activation_10 (Activation) (None, 32, 32, 32, 1 0 instance_normalization_10[0][0]


conv3d_11 (Conv3D) (None, 32, 32, 32, 1 442496 activation_10[0][0]


instance_normalization_11 (Inst (None, 32, 32, 32, 1 2 conv3d_11[0][0]


activation_11 (Activation) (None, 32, 32, 32, 1 0 instance_normalization_11[0][0]


conv3d_transpose_2 (Conv3DTrans (None, 64, 64, 64, 6 65600 activation_11[0][0]


concatenate_2 (Concatenate) (None, 64, 64, 64, 1 0 conv3d_transpose_2[0][0]
activation_1[0][0]


conv3d_12 (Conv3D) (None, 64, 64, 64, 6 221248 concatenate_2[0][0]


instance_normalization_12 (Inst (None, 64, 64, 64, 6 2 conv3d_12[0][0]


activation_12 (Activation) (None, 64, 64, 64, 6 0 instance_normalization_12[0][0]


conv3d_13 (Conv3D) (None, 64, 64, 64, 6 110656 activation_12[0][0]


instance_normalization_13 (Inst (None, 64, 64, 64, 6 2 conv3d_13[0][0]


activation_13 (Activation) (None, 64, 64, 64, 6 0 instance_normalization_13[0][0]

Total params: 22,398,940
Trainable params: 22,398,940
Non-trainable params: 0
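
A side note on reading the summary above: the Param # column can be used to back out how many input channels the first Conv3D layer in the weight file expects. The sketch below is only arithmetic. It assumes 3x3x3 kernels and 64 filters in the first two convolutions (the Output Shape column is truncated, so 64 is an assumption, although the count for conv3d_1 is consistent with it). The helper name `conv3d_in_channels` is made up for this illustration.

```python
def conv3d_in_channels(n_params, filters, kernel=(3, 3, 3)):
    """Infer input channels from a Conv3D parameter count.

    Assumes the layer has one bias per filter, so
    n_params = kd * kh * kw * in_channels * filters + filters.
    """
    kernel_volume = kernel[0] * kernel[1] * kernel[2]
    weights = n_params - filters  # remove the bias terms
    in_channels, remainder = divmod(weights, kernel_volume * filters)
    if remainder:
        raise ValueError("parameter count does not fit the assumed kernel/filters")
    return in_channels


# Consistency check on conv3d_1, which follows a 64-filter layer.
print(conv3d_in_channels(110656, 64))  # 64

# First layer of the summary (conv3d, 6976 parameters).
print(conv3d_in_channels(6976, 64))  # 4
```

Under these assumptions the first layer in the file takes 4 input channels. Comparing that number with the input channels of the model built from the config is one more thing that could be checked; nothing in the thread confirms that this is the cause.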

@yuan0821
Author

yuan0821 commented Feb 2, 2023

(tfnew_25) F:\dannce\demo\mouse_4>dannce-train dannce_mouse_config_4.yaml 2023-02-03 02:38:31.020282: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll io_config not found in io.yaml file, falling back to main config new_n_channels_out not found in io.yaml file, falling back to main config downfac not found in io.yaml file, falling back to main config extension not found in io.yaml file, falling back to main config batch_size not found in io.yaml file, falling back to main config n_views not found in io.yaml file, falling back to main config mono not found in io.yaml file, falling back to main config n_channels_in not found in io.yaml file, falling back to main config epochs not found in io.yaml file, falling back to main config net_type not found in io.yaml file, falling back to main config train_mode not found in io.yaml file, falling back to main config num_validation_per_exp not found in io.yaml file, falling back to main config vol_size not found in io.yaml file, falling back to main config nvox not found in io.yaml file, falling back to main config max_num_samples not found in io.yaml file, falling back to main config dannce_finetune_weights not found in io.yaml file, falling back to main config predict_mode not found in io.yaml file, falling back to main config com_train_dir set to: .\COM\train_results\ com_predict_dir set to: .\COM\predict_results\ dannce_train_dir set to: .\DANNCE\train_results\AVG\ dannce_predict_dir set to: .\DANNCE\predict_results\ exp set to: [{'label3d_file': 'F:\\dannce\\demo\\mouse_4\\aftercom_label3D_dannce.mat'}] io_config set to: io.yaml new_n_channels_out set to: 16 downfac set to: 4 extension set to: .avi batch_size set to: 1 n_views set to: 4 mono set to: True n_channels_in set to: 1 epochs set to: 3 net_type set to: AVG train_mode set to: finetune num_validation_per_exp set to: 0 vol_size set to: 120 nvox set to: 64 max_num_samples set to: 100 
dannce_finetune_weights set to: F:\testdannce120\dannce\weight\max_3\ predict_mode set to: torch base_config set to: dannce_mouse_config_4.yaml viddir set to: videos crop_height set to: None crop_width set to: None camnames set to: None n_channels_out set to: 20 sigma set to: 10 verbose set to: 1 net set to: None gpu_id set to: 0 immode set to: vid mirror set to: False loss set to: mask_nan_keep_loss num_train_per_exp set to: None metric set to: ['euclidean_distance_3D'] lr set to: 0.001 augment_hue set to: False augment_brightness set to: False augment_hue_val set to: 0.05 augment_bright_val set to: 0.05 augment_rotation_val set to: 5 data_split_seed set to: None valid_exp set to: None com_fromlabels set to: False medfilt_window set to: None com_file set to: None new_last_kernel_size set to: [3, 3, 3] n_layers_locked set to: 2 vmin set to: None vmax set to: None interp set to: nearest depth set to: False comthresh set to: 0 weighted set to: False com_method set to: median cthresh set to: None channel_combo set to: None rotate set to: True augment_continuous_rotation set to: False drop_landmark set to: None use_npy set to: False rand_view_replace set to: True n_rand_views set to: 0 multi_gpu_train set to: False heatmap_reg set to: False heatmap_reg_coeff set to: 0.01 save_pred_targets set to: False start_batch set to: 0 vid_dir_flag set to: None chunks set to: None lockfirst set to: None load_valid set to: None raw_im_h set to: None raw_im_w set to: None n_instances set to: 1 start_sample set to: None write_npy set to: None dannce_predict_model set to: None expval set to: None com_thresh set to: None cam3_train set to: None debug_volume_tifdir set to: None from_weights set to: None dannce_predict_vol_tifdir set to: None Using the following *dannce.mat files: .\aftercom_label3D_dannce.mat Setting vid_dir_flag to True. Setting extension to .avi. Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0]), 'Camera4': array([0])}. 
Setting n_channels_in to 3. Setting raw_im_h to 2560. Setting raw_im_w to 2560. Setting expval to True. Setting net to finetune_AVG. Setting crop_height to [0, 2560]. Setting crop_width to [0, 2560]. Setting maxbatch to 100. Setting start_batch to 0. Setting vmin to -60.0. Setting vmax to 60.0. Fine-tuning from F:\testdannce120\dannce\weight\max_3\weights_multigpu.30-0.00002.hdf5 Experiment 0 using videos in F:\dannce\demo\mouse_4\videos Experiment 0 using camnames: ['Camera1', 'Camera2', 'Camera3', 'Camera4'] {'0_Camera1': array([0]), '0_Camera2': array([0]), '0_Camera3': array([0]), '0_Camera4': array([0])} F:\dannce\demo\mouse_4\aftercom_label3D_dannce.mat Experiment 0 using com3d: F:\dannce\demo\mouse_4\aftercom_label3D_dannce.mat Removed 0 samples from the dataset because they either had COM positions over cthresh, or did not have matching sampleIDs in the COM file Using 79 samples total. Using the following cameras: ['Camera1', 'Camera2', 'Camera3', 'Camera4'] TRAIN EXPTS: [0] None None Loading training data into memory. This can take a while to seek through large sets of video. This process is much faster if the frame indices are sorted in ascending order in your label data file. Loading new video: F:\dannce\demo\mouse_4\videos\Camera1\0.avi for 0_Camera1 Loading new video: F:\dannce\demo\mouse_4\videos\Camera2\0.avi for 0_Camera2 Loading new video: F:\dannce\demo\mouse_4\videos\Camera3\0.avi for 0_Camera3 Loading new video: F:\dannce\demo\mouse_4\videos\Camera4\0.avi for 0_Camera4 Loading validation data into memory Using default n_rand_views augmentation with 4 views and with replacement To disable n_rand_views augmentation, set it to None in the config. Initializing Network... 
2023-02-03 02:40:19.357817: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll 2023-02-03 02:40:19.358100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s 2023-02-03 02:40:19.358229: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll 2023-02-03 02:40:19.358377: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll 2023-02-03 02:40:19.358522: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll 2023-02-03 02:40:19.358657: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll 2023-02-03 02:40:19.360112: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll 2023-02-03 02:40:19.360165: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll 2023-02-03 02:40:19.360327: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll 2023-02-03 02:40:19.360469: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll 2023-02-03 02:40:19.360676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2023-02-03 02:40:19.361168: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild 
TensorFlow with the appropriate compiler flags.
2023-02-03 02:40:19.454925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2023-02-03 02:40:19.455123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-02-03 02:40:19.894613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-02-03 02:40:19.894699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2023-02-03 02:40:19.895197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2023-02-03 02:40:19.895584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7433 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:41:00.0, compute capability: 8.6)
WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
Number of devices: 1
NUM CAMERAS: 4
using instance normalization
E:\anaconda\envs\tfnew_25\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
"The `lr` argument is deprecated, use `learning_rate` instead.")
Now you are 1179 net
Correcting mismatch in layer name, model: image_input, weights: input_1
now you are 1410
Could not load weights for finetune (likely because you are finetuning a previously finetuned network). Attempting to finetune from a full finetune model file.
Now you are 1256 net
Traceback (most recent call last):
File "f:\testdannce120\dannce\dannce\interface.py", line 1120, in dannce_train
*fargs
File "f:\testdannce120\dannce\dannce\engine\nets.py", line 1180, in finetune_AVG
model = renameLayers(model, weightspath)
File "f:\testdannce120\dannce\dannce\engine\nets.py", line 1411, in renameLayers
model.load_weights(weightspath, by_name=True)
File "E:\anaconda\envs\tfnew_25\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2324, in load_weights
f, self.layers, skip_mismatch=skip_mismatch)
File "E:\anaconda\envs\tfnew_25\lib\site-packages\tensorflow\python\keras\saving\hdf5_format.py", line 768, in load_weights_from_hdf5_group_by_name
layer, weight_values, original_keras_version, original_backend)
File "E:\anaconda\envs\tfnew_25\lib\site-packages\tensorflow\python\keras\saving\hdf5_format.py", line 404, in preprocess_weights_for_loading
weights[0] = np.transpose(weights[0], (3, 2, 0, 1))
File "<array_function internals>", line 6, in transpose
File "E:\anaconda\envs\tfnew_25\lib\site-packages\numpy\core\fromnumeric.py", line 660, in transpose
return _wrapfunc(a, 'transpose', axes)
File "E:\anaconda\envs\tfnew_25\lib\site-packages\numpy\core\fromnumeric.py", line 57, in _wrapfunc
return bound(*args, **kwds)
ValueError: axes don't match array

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\anaconda\envs\tfnew_25\Scripts\dannce-train-script.py", line 33, in <module>
sys.exit(load_entry_point('dannce', 'console_scripts', 'dannce-train')())
File "f:\testdannce120\dannce\dannce\cli.py", line 66, in dannce_train_cli
dannce_train(params)
File "f:\testdannce120\dannce\dannce\interface.py", line 1126, in dannce_train
*fargs
File "f:\testdannce120\dannce\dannce\engine\nets.py", line 1259, in finetune_fullmodel_AVG
for layer in model.layers[1].layers:
AttributeError: 'Conv3D' object has no attribute 'layers'
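
For reference, the first ValueError in the traceback above can be reproduced with NumPy alone. The traceback shows `np.transpose` being called with the four axes `(3, 2, 0, 1)`, while a Conv3D kernel has five dimensions. The kernel shape below is a made-up stand-in, not taken from the weight file, and this sketch only shows how the error message arises, not why Keras reached that code path.

```python
import numpy as np

# Stand-in for a Conv3D kernel: (depth, height, width, in_channels, filters).
# The sizes are illustrative assumptions.
kernel = np.zeros((3, 3, 3, 4, 64))

try:
    # Four axes for a five-dimensional array, as in the traceback.
    np.transpose(kernel, (3, 2, 0, 1))
    message = None
except ValueError as err:
    message = str(err)

print(kernel.ndim, message)
```

The number of axes passed to `np.transpose` has to equal the number of dimensions of the array, so any 5-D kernel that reaches that line fails in this way.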

@yuan0821
Author

yuan0821 commented Feb 2, 2023

Hi @data-hound
I used the max_3 cam weight file for dannce training, but it still does not work.
I wonder if the problem is n_views, mono, or n_channels_in?

Thank you so much for your time! I have been struggling with this problem for a long time.

@data-hound
Collaborator

Hi @yuan0821

At the point where you got the model summary, can you please try the following:

  • Load the model from the weights file as model_temp = load_model(...).
  • You already have the model created from code as model (don't overwrite it).
  • Next, check whether the weights and the layer names are the same in both. I see that the names are being changed for all the layers, which, as far as I know, should not happen if you are loading models with the same architecture.
  • To check the output shape of layer i, use model.layers[i].output.shape, or use a list comprehension, [layer.output.shape for layer in model.layers], and compare the lists between model_temp and model.
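
The comparison described in the list above could be sketched as follows. This is a minimal sketch, not part of DANNCE: `layer_signature`, `diff_layers`, and `fake_model` are hypothetical helper names, and the stand-in models are built from `SimpleNamespace` so the snippet runs without TensorFlow. The layer names image_input and input_1 come from the log; the shapes are invented for the demo. With real Keras models, pass model and model_temp directly (this assumes each layer has a single output).

```python
from types import SimpleNamespace


def layer_signature(model):
    """Return (name, output shape) for every layer, in order."""
    return [(layer.name, tuple(layer.output.shape)) for layer in model.layers]


def diff_layers(model, model_temp):
    """List positions where layer names or output shapes differ."""
    sig_a = layer_signature(model)
    sig_b = layer_signature(model_temp)
    mismatches = [
        (i, a, b) for i, (a, b) in enumerate(zip(sig_a, sig_b)) if a != b
    ]
    if len(sig_a) != len(sig_b):
        mismatches.append(("layer count", len(sig_a), len(sig_b)))
    return mismatches


def fake_model(specs):
    """Stand-in exposing only .layers, .name and .output.shape (demo only)."""
    layers = [
        SimpleNamespace(name=name, output=SimpleNamespace(shape=shape))
        for name, shape in specs
    ]
    return SimpleNamespace(layers=layers)


# Illustrative shapes only; they are not taken from the weight file.
model = fake_model([
    ("image_input", (None, 64, 64, 64, 12)),
    ("conv3d", (None, 64, 64, 64, 64)),
])
model_temp = fake_model([
    ("input_1", (None, 64, 64, 64, 4)),
    ("conv3d", (None, 64, 64, 64, 64)),
])

mismatches = diff_layers(model, model_temp)
for entry in mismatches:
    print(entry)
```

An empty result means the two models agree on layer names and output shapes; it says nothing about the weight values themselves, which would need a separate check.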

About whether the problem is n_views, mono, or n_channels_in: I could not find a problem with them, although these are the usual suspects. (Can you please paste the parameters in a more readable format?)
There is one other bug that could cause this issue: layers can get renamed in the code because of a format conversion. If that is the case, there is a fix for it in the new release.

@yuan0821
Author

yuan0821 commented Feb 8, 2023

Hi @data-hound
I haven't fixed the bug yet, but I can run dannce-train for my three-camera system if I set my params like this:

  1. n_views: 6, using the MONO 6-cam AVG pre-trained network (https://github.com/spoonsso/dannce/blob/master/demo/markerless_mouse_1/DANNCE/weights/weights.rat.AVG.6cam.hdf5)

  2. n_views: 5, using the AVG/MAX 5-cam pre-trained network.

  3. However, the 3-cam pre-trained network (provided by DANNCE) didn't work for my project.

I also found that I had to duplicate the cameras (videos, params, sync, label3dData) to match the weight file so that dannce-predict could run smoothly.
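
As a rough illustration of the duplication idea above, the snippet below pads a three-camera list up to the number of views a weight file expects by repeating cameras in order. `pad_camnames` is a hypothetical helper, not a DANNCE function, and it only shows which camera would be repeated in which slot. The videos, params, sync, and label3d data would still have to be duplicated as described, and how the duplicates should be named is not stated in the thread.

```python
from itertools import cycle, islice


def pad_camnames(camnames, n_views):
    """Repeat camera names in order until there are n_views entries."""
    if not camnames:
        raise ValueError("need at least one camera")
    if n_views < len(camnames):
        raise ValueError("n_views is smaller than the number of cameras")
    return list(islice(cycle(camnames), n_views))


padded = pad_camnames(["Camera1", "Camera2", "Camera3"], 6)
print(padded)
# ['Camera1', 'Camera2', 'Camera3', 'Camera1', 'Camera2', 'Camera3']
```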
