TODO

Documentation
- Add links to papers.
- Add links to project files.
- Make new notebook showing differences in cropped images that result from jpeg compression.
Organization and Refactoring
- Move all plotting functions to visualize.py
- ~~Move create_label_metrics into create_image_metrics?~~ Keep separate.
- Don't use multiple folders for jupyter notebooks. Just makes it hard to find.
- ~~Create a new SDCTile class that represents a single tile from SDCImage.~~ Cancelled. Too slow.
Performance
- Parallelize the creation of image_metrics.
- Parallelize the creation of overlap_scores.
Write unit tests
- ...for most everything in utils.py
- Use design pattern: Each test should have 3 parts: Arrange, Act, Assert
- Verify rotate image {1, 2, 3} x 90 degrees in pytorch?
- make sure len(md5hashes) == len(set(md5hashes)), otherwise we'll have to make our hashes longer.
Net upgrades
- Try using BatchNorm, or LayerNorm
- Try an all convolutional net. Remove max pooling layers and add stride 2 (or 3?) to convolutions
- Try using inputs with shape (BATCH_SIZE, 2, 3, 256, 256) instead of (BATCH_SIZE, 6, 256, 256)?
Importance sampling
- Implement UCB like function?
- Implement TF-IDF like function?
- Use bmh hamming distance with as_scores=True as initial guess to weights?
Features
- Implement an efficient way to lower (or raise) the overlap matches threshold.
- Implement efficient way to remove truth from auto when they are added manually.
- Skip image pairs that have already been verified as duplicates.
- Find out which remaining tile pairs have masks but aren't in dup_truth.
- Provide examples of false positives and false negatives for each of the following:
  - image hashes
  - image entropy
  - image histograms
  - grey-level co-occurrence matrix
  - ship overlap
- Use decision trees to show performance of DupNet over other metrics.
Dup Tool
- Upgrade the dup_tool to use Bokeh or Plotly for interactions and rendering.
- Handle StopIteration exception on last record read.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TODO.md

TODO.md

TODO

Files

TODO.md

Latest commit

History

TODO.md

File metadata and controls

TODO