Improve Handling of Noise Points in Clustering Algorithms (Fixes #152) #200
Closed
minimalProviderAgentMarket wants to merge 2 commits into lucasimi:main from
Add NoiseHandlingClustering wrapper class to provide control over how noise points (labeled as -1) are handled during clustering. The wrapper supports three modes:
- 'singleton': Convert each noise point to its own cluster
- 'drop': Keep noise points labeled as -1
- 'group': Group all noise points into a single cluster

Update mapper_connected_components documentation to clarify noise point behavior and add comprehensive unit tests for the new functionality. This change enables more flexible handling of outliers and noise points in different Mapper applications.
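The three modes described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it assumes the wrapped clusterer follows the scikit-learn convention of a fit(X) method that sets a labels_ attribute, and the constructor signature, the 'drop' default, and the ToyClusterer stand-in are all hypothetical.

```python
class NoiseHandlingClustering:
    """Hypothetical wrapper that relabels noise points (-1) after clustering."""

    MODES = ("singleton", "drop", "group")

    def __init__(self, clustering, noise_handling="drop"):
        # Assumption: 'drop' is the default, matching the pre-existing behavior.
        if noise_handling not in self.MODES:
            raise ValueError(
                f"noise_handling must be one of {self.MODES}, got {noise_handling!r}"
            )
        self.clustering = clustering
        self.noise_handling = noise_handling

    def fit(self, X, y=None):
        self.clustering.fit(X)
        # Work on a copy so the wrapped clusterer's labels stay untouched.
        labels = list(self.clustering.labels_)
        # First unused label; 0 when every point is noise or the input is empty.
        next_label = max(labels, default=-1) + 1
        if self.noise_handling == "singleton":
            for i, label in enumerate(labels):
                if label == -1:
                    labels[i] = next_label
                    next_label += 1
        elif self.noise_handling == "group":
            labels = [next_label if label == -1 else label for label in labels]
        # 'drop' leaves the -1 labels as they are.
        self.labels_ = labels
        return self


class ToyClusterer:
    """Stand-in for a real clusterer such as DBSCAN: returns fixed labels."""

    def __init__(self, labels):
        self._labels = labels

    def fit(self, X, y=None):
        self.labels_ = list(self._labels)
        return self


raw = [0, -1, 1, -1, 0]
X = [[float(i)] for i in range(len(raw))]
results = {
    mode: NoiseHandlingClustering(ToyClusterer(raw), mode).fit(X).labels_
    for mode in NoiseHandlingClustering.MODES
}
print(results)
```

With these toy labels, 'singleton' gives each noise point a fresh label, 'group' gives both noise points the same new label, and 'drop' returns the labels unchanged.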
Author
Aider: Update Summary: Enhancements to Noise Handling in Clustering
Context: Addressed issue #152 regarding the handling of points classified as noise by clustering algorithms, specifically in relation to the …
Key Changes Implemented:
Test Results:
These improvements enhance the flexibility and accuracy of noise handling in clustering, supporting better data analysis and representation.
Optimize noise handling in clustering implementation
- Improve NoiseHandlingClustering performance with numpy operations
- Add validation for noise_handling parameter values
- Add detailed performance implications documentation
- Enhance code readability with clearer variable names
- Add debug output and more thorough tests for noise handling
- Fix potential edge case when all points are noise
- Use array copying to preserve original cluster labels

The changes focus on making the noise handling more robust and efficient while maintaining the same functionality. The use of numpy operations replaces list comprehensions for better performance with large datasets.
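To make the points in this commit message concrete, here is one way the relabeling step could be vectorized with numpy. It is only a sketch under assumptions: the function name relabel_noise and the 'drop' default are hypothetical, and the PR's actual implementation may differ. It covers the parameter validation, the copy of the input labels, and the case where every point is noise.

```python
import numpy as np

VALID_MODES = ("singleton", "drop", "group")


def relabel_noise(labels, noise_handling="drop"):
    """Hypothetical helper: return a relabeled copy of `labels`."""
    if noise_handling not in VALID_MODES:
        raise ValueError(
            f"noise_handling must be one of {VALID_MODES}, got {noise_handling!r}"
        )
    # Copy so the caller's original cluster labels are preserved.
    relabeled = np.array(labels, dtype=int, copy=True)
    noise_mask = relabeled == -1
    if noise_handling == "drop" or not noise_mask.any():
        return relabeled
    # If every point is noise, max() is -1, so new labels start at 0.
    first_new = relabeled.max() + 1
    if noise_handling == "singleton":
        # One fresh label per noise point, assigned in a single vectorized step.
        relabeled[noise_mask] = first_new + np.arange(noise_mask.sum())
    else:  # "group"
        relabeled[noise_mask] = first_new
    return relabeled


original = np.array([-1, 0, 0, -1, 1])
print(relabel_noise(original, "singleton"))  # noise points get labels 2 and 3
print(relabel_noise(original, "group"))      # both noise points get label 2
print(relabel_noise([-1, -1, -1], "singleton"))  # all-noise input
print(original)  # unchanged, because the function works on a copy
```

The boolean mask replaces a Python-level loop over the labels, which is where the performance gain on large datasets would be expected to come from.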
Owner
This PR is generated by a bot. I'm closing this.
Pull Request Description
Title: Improve Handling of Points Clustered as Noise
Related Issue: Fixes #152
Issue URL: Improve the handling of points clustered as noise
Summary
This pull request addresses issue #152, which focuses on enhancing the handling of noise points produced by clustering algorithms, specifically DBSCAN. Previously, noise points were labeled as -1 and removed from further analysis, which could lead to loss of potentially valuable information. This update introduces an improved mechanism for managing these noise points, allowing for various user-defined approaches.
Changes Made
Created NoiseHandlingClustering Class:
… -1 and are not considered in analysis.
Updated Functionality:
The mapper_connected_components function was modified to incorporate the new clustering strategies, while ensuring that the default behavior remains intact for backward compatibility.
Testing and Validation:
… -1 labels for noise points are preserved when selected.
Next Steps
Further testing will be conducted with larger datasets to ensure consistent performance and functionality across varied scenarios. Feedback from team members regarding additional test cases or potential edge cases is welcome.
Thank you for considering this enhancement to improve the handling of noise points in our clustering implementations.
Please let me know if there are any questions or if further adjustments are needed for this pull request!