Commit 9f32a11

Update troubleshooting_runs.md (#205)
Troubleshoot `IsADirectoryError`
1 parent fb8d6fa commit 9f32a11

1 file changed: +1 -0 lines changed

documentation/DCP-documentation/troubleshooting_runs.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ Services/behaviors that are as expected and/or not relevant for diagnosing a pro
| Jobs moving to dead messages | Your perinstance logs have an IOError indicating that an .h5 batchfile does not exist | No outcome/saved files on S3 | N/A | Your job is configured for using a batchfile and no batchfile exists for your project | 1) Create a batch file and make sure that it is in the appropriate directory 2) Make sure that you have set your batch file location correctly in your jobs 3) If using run_batch_general.py for job creation, make sure that you passed the `--use-batch` flag |
| No jobs are pulled from the queue | No logs are created | No outputs are written to S3 | Machines made in EC2 but they remain nameless. | A nameless machine means that the Dockers are not placed on the machines. 1) There is a mismatch in your DCP config file. OR 2) You haven't set up permissions correctly. OR 3) Dockers are not being made in ECS | 1) Confirm that the MEMORY matches the MACHINE_TYPE set in your config. Confirm that there are no typos in your DOCKERHUB_TAG set in your config. 2) Check that you have set up permissions correctly for the user or role that you have set in your config under AWS_PROFILE. Confirm that your `ecsInstanceRole` is able to access the S3 bucket where your `ecsconfigs` have been uploaded. 3) Check in ECS that you see `Registered container instances`. |
| Jobs moving to dead messages | Your perinstance logs have an IOError indicating that CellProfiler cannot open your pipeline | No outputs are written to S3 | N/A | You have a corrupted pipeline | Check if you can open your pipeline locally. It may have been corrupted on upload or it may have an error within the pipeline itself. |
+ | Jobs moving to dead messages | CloudWatch logs for the plate show `IsADirectoryError: [Errno 21] Is a directory:` | No outputs are written to S3 | N/A | There may be an issue with job submission | Check whether there are stray spaces between the plate names in the `python run_batch_general.py` command that you passed (see the sketch below) |
| N/A | "== ERR move failed:An error occurred (SlowDown) when calling the PutObject operation (reached max retries: 4): Please reduce your request rate." Error may not show initially and may become more prevalent with time. | N/A | N/A | Too many jobs are finishing too quickly creating a backlog of jobs waiting to upload to S3. | You can 1) check out fewer machines at a time, 2) check out smaller machines and run fewer copies of DCP at the same time, or 3) group jobs in larger groupings (e.g. by Plate instead of Well or Site). If this happens because you have many jobs finishing at the same time (but not finishing very rapidly such that it's not creating an increasing backlog) you can increase SECONDS_TO_START in config.py so there is more separation between jobs finishing. |
| N/A | "/home/ubuntu/bucket: Transport endpoint is not connected" | S3 cannot be accessed by fleet. | N/A | S3FS has stochastically dropped/failed to connect. | Perform your run without using S3FS by setting DOWNLOAD_FILES = TRUE in your config.py. Note that, depending upon your job and machine setup, you may need to increase the size of your EBS volume to account for the files being downloaded. |
| Jobs moving to dead messages | "SSL: certificate subject name (*.s3.amazonaws.com) does not match target host name 'xxx.yyy.s3.amazonaws.com'" | S3 cannot be accessed by fleet. | N/A | S3FS fails to mount if your bucket name has a dot (.) in it. | You can bypass S3FS usage by setting DOWNLOAD_FILES = TRUE in your config.py. Note that, depending upon your job and machine setup, you may need to increase the size of your EBS volume to account for the files being downloaded. Alternatively, you can make your own DCP Docker and edit run-worker.sh to `use_path_request_style`. If your region is not us-east-1 you also need to specify `endpoint`. See S3FS documentation for more information. |
