118 commits
f476219
Setting up GitHub Classroom Feedback
github-classroom[bot] Oct 29, 2024
92a699b
chore: upload source code
Sbeom12 Oct 29, 2024
30b75ec
chore: add .gitignore
gsgh3016 Oct 29, 2024
fb25323
chore: update git ignore
Sbeom12 Oct 30, 2024
5967a65
chore: upload venv requirements
Sbeom12 Oct 30, 2024
557005f
feat: checking ASCII ratio, Uppercase Ratio and etc
Sbeom12 Oct 30, 2024
712f9a2
chore: add wandb directory and temporary model pt files
gsgh3016 Oct 31, 2024
89b2963
chore: exclude result csv file
gsgh3016 Oct 31, 2024
33bd5ba
feat: add MPS option for macOS
gsgh3016 Oct 31, 2024
696bf1f
feat: re-push baseline code for MPS
gsgh3016 Oct 31, 2024
7be3e5a
feat: check noise data
seohyeon0677 Nov 1, 2024
2c29cf5
chore: add save file
seohyeon0677 Nov 1, 2024
4b56f0a
feat: token-based EDA
gsgh3016 Nov 1, 2024
f26552f
feat: detect noise data
canolayoo78 Nov 1, 2024
cd8f2f3
feat: add cleanlab test result
chell9999 Nov 3, 2024
b9427e3
Merge pull request #15 from boostcampaitech7/feature/token-eda
gsgh3016 Nov 4, 2024
1da424b
chore: add cleanlab in requirements.txt
gsgh3016 Nov 4, 2024
31f9801
feat: add PCA + K-Means clustering
gsgh3016 Nov 4, 2024
2dbf77f
feat: last version for find ascii
Sbeom12 Nov 4, 2024
8327463
Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…
Sbeom12 Nov 4, 2024
0a9c5f1
chore: update requirements
Sbeom12 Nov 4, 2024
d982092
chore: add BM25 module
gsgh3016 Nov 4, 2024
940e9c6
WIP: setting BM25 experiment
gsgh3016 Nov 4, 2024
54c7bdf
chore: update: gitignore
Sbeom12 Nov 4, 2024
f845022
WIP: check label clustering
gsgh3016 Nov 4, 2024
ac33cfc
chore: exclude __pycache__
gsgh3016 Nov 4, 2024
41bca2a
docs: create README for module usage
gsgh3016 Nov 4, 2024
a98b3be
refactor: modulize noise detecting
gsgh3016 Nov 4, 2024
ba8c409
rename: change file extension of README for markdown md
gsgh3016 Nov 4, 2024
2e5d6c8
feat: modulize filter/constants.py for easy usage
gsgh3016 Nov 4, 2024
b4b2ef1
fix: add pre-processing for filtering
gsgh3016 Nov 4, 2024
b386376
chore: exclude .pyc files
gsgh3016 Nov 4, 2024
114265f
feat: analyse sparse embedding + PCA + ML based clustering
gsgh3016 Nov 4, 2024
db125f4
docs: add more explanation
gsgh3016 Nov 4, 2024
7ab42e1
rename: data_filter_pipeline -> ML_based_clustering
gsgh3016 Nov 4, 2024
e54bc74
fix: supplement ASCII filter logic
gsgh3016 Nov 4, 2024
e5cb3ae
style: add comment for review
gsgh3016 Nov 4, 2024
9595ab3
refactor: use constants for literal string column name
gsgh3016 Nov 4, 2024
b6bb920
rename: cleanlab_ch to cleanlab_ch_vol1
chell9999 Nov 4, 2024
fd09ab8
add: cleanlab_ch_vol2
chell9999 Nov 4, 2024
be0fad5
Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…
chell9999 Nov 4, 2024
76c9736
test: upload cleanlab_ch_vol2 correct version
chell9999 Nov 4, 2024
0d0d6f1
chore: create automatic module checker
gsgh3016 Nov 5, 2024
13e9760
chore: extract minimun dependent modules
gsgh3016 Nov 5, 2024
cf1d09e
Merge branch 'main' into feature/19
chell9999 Nov 5, 2024
990c336
Merge pull request #20 from boostcampaitech7/feature/19
chell9999 Nov 5, 2024
9fc9275
feat: relabel clean data
gsgh3016 Nov 5, 2024
b738f9a
HOTFIX: resolve branch divergence
gsgh3016 Nov 5, 2024
38eeb40
chore: fix dependencies conflict
gsgh3016 Nov 5, 2024
5f64aa1
feat: find field and normalize the ascii problem
Sbeom12 Nov 5, 2024
d069bde
feat: add K-Means clustering
gsgh3016 Nov 5, 2024
ac37faf
fix: change regular expression
gsgh3016 Nov 5, 2024
abb3ae0
Merge pull request #21 from boostcampaitech7/feature/14
gsgh3016 Nov 5, 2024
0d96e43
Merge pull request #24 from boostcampaitech7/main
gsgh3016 Nov 5, 2024
8a09f15
chore: exclude .DS_Store
gsgh3016 Nov 5, 2024
8571f10
feat: add LLM agent for memorizing categories
gsgh3016 Nov 5, 2024
5e1499d
chore: fix pip install for working in local
gsgh3016 Nov 5, 2024
fae5f10
chore: update dependencies
gsgh3016 Nov 5, 2024
312b8ab
chore: update gitignore for exclude model train cache files
gsgh3016 Nov 5, 2024
5a9e3a1
feat: pipeline clustering result
gsgh3016 Nov 5, 2024
0682f7a
chore: exclude log
gsgh3016 Nov 5, 2024
fbbeef1
chore: exclude log files
gsgh3016 Nov 5, 2024
62c5768
feat: update experiment with gemma2
gsgh3016 Nov 5, 2024
fd18cde
Merge pull request #23 from boostcampaitech7/feature/19
gsgh3016 Nov 5, 2024
371bf14
chore:update requirments
Sbeom12 Nov 6, 2024
70dd5b5
chore:update requirements
Sbeom12 Nov 6, 2024
58f2b95
feat: autorizing server environment setting
Sbeom12 Nov 6, 2024
9fdb08d
chore: update make_env
Sbeom12 Nov 6, 2024
766329f
feat: letter to number target mapping
canolayoo78 Nov 6, 2024
a867c5f
feat: EDA gemma2 clustering result
gsgh3016 Nov 6, 2024
1bf644a
style: add comment for experiment result
gsgh3016 Nov 6, 2024
3208466
Merge pull request #26 from boostcampaitech7/feature/25
gsgh3016 Nov 6, 2024
ce28d73
rename: src to utils
gsgh3016 Nov 6, 2024
5fdd108
chore: add dynamic checking feature
gsgh3016 Nov 6, 2024
2cdb4ff
chore: update modules using auto_dependencies.sh
gsgh3016 Nov 6, 2024
fbbd8a6
chore: create 3rd party module storage checker
gsgh3016 Nov 6, 2024
40c5d65
feat: create data_augmentation module
gsgh3016 Nov 6, 2024
6c2124b
fix: clearify import
gsgh3016 Nov 6, 2024
0087da9
fix: commit error
jduck301 Nov 6, 2024
3f9b710
Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…
jduck301 Nov 6, 2024
c7535c9
feat: monitor augmentation log
gsgh3016 Nov 6, 2024
a06344e
feat: augment data and fix STS checking
gsgh3016 Nov 6, 2024
8c62302
feat: rtt
Sbeom12 Nov 7, 2024
e2342c8
Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…
Sbeom12 Nov 7, 2024
36ebebd
🦄 refactor(baseline_code): modulize baseline (#27)
jduck301 Nov 7, 2024
ca76aa1
🌈 style(baseline code): minor styling
jduck301 Nov 7, 2024
89e415f
🧪 test: relabelling
jduck301 Nov 7, 2024
a945e00
🌈 style(target_mapping notebook): remove outputs
jduck301 Nov 7, 2024
382bc08
feat: add synonym switching augmentation
gsgh3016 Nov 7, 2024
544ad5b
feat: generate data by llm
Sbeom12 Nov 7, 2024
1d5f7ba
Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…
Sbeom12 Nov 7, 2024
2ff0f26
docs: create README
gsgh3016 Nov 7, 2024
5ad72bd
feat: concat train data
canolayoo78 Nov 8, 2024
82b36c5
remove: src/target_mapping.ipynb
canolayoo78 Nov 8, 2024
0d9c0de
remove: mapping.py
canolayoo78 Nov 8, 2024
49acc9f
chore: add noise data separator class
seohyeon0677 Nov 8, 2024
4c0aea3
Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…
seohyeon0677 Nov 8, 2024
ef1db0f
feat: text cleaner class
seohyeon0677 Nov 8, 2024
f505a77
comment: fix comments
canolayoo78 Nov 8, 2024
1658580
feat: data aug by sh
seohyeon0677 Nov 8, 2024
0bbf782
Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…
seohyeon0677 Nov 8, 2024
cb9099a
chore: add class annotation
seohyeon0677 Nov 8, 2024
3f974c2
feat: make config files and refactor: data_classificaition
Sbeom12 Nov 8, 2024
148249c
chore:update data_config
Sbeom12 Nov 8, 2024
1cc930f
chore: delete unused file
Sbeom12 Nov 8, 2024
1be43c8
chore: delete unused file
Sbeom12 Nov 8, 2024
053568b
chore: update config files and classification
Sbeom12 Nov 8, 2024
bf8afdb
feat: main.ipynb
Sbeom12 Nov 8, 2024
c36dd7b
refactroing: find_field and classify the news article
Sbeom12 Nov 8, 2024
ae4ae7e
docs: design README and sub-documents' structure
gsgh3016 Nov 8, 2024
5e86d7e
fix: change competition rule document path
gsgh3016 Nov 8, 2024
31535a7
Merge pull request #28 from boostcampaitech7/feature/6
gsgh3016 Nov 8, 2024
d621fe0
Merge pull request #30 from boostcampaitech7/docs/29
gsgh3016 Nov 8, 2024
537f0ed
chore: find_field
Sbeom12 Nov 8, 2024
37dacee
chore: update main
Sbeom12 Nov 8, 2024
ada952d
Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…
Sbeom12 Nov 8, 2024
60eed23
docs: update README
chell9999 Nov 10, 2024
a03d58a
Docs: Update README.md
Sbeom12 Nov 10, 2024
18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
data-centric-NLP/
data/
level2_data_centric/
code/wandb/
output/
*.csv
*.pyc
level2_datacentric/
.git/
.github/
*__pycache__*
.DS_Store
logs/
noisy_text_model/
results/
wandb/
*.out
*.log
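As a rough illustration of how the patterns above behave, the sketch below approximates a subset of them in Python. It is not Git's actual matching algorithm: `is_ignored` and `PATTERNS` are hypothetical names introduced here, only three pattern shapes are handled (a bare directory name ending in `/`, a root-anchored directory path such as `code/wandb/`, and a slash-free glob), and every input is assumed to be a file path relative to the repository root.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns from the .gitignore diff above.
PATTERNS = [
    "data/", "code/wandb/", "*.csv", "*.pyc",
    "*__pycache__*", ".DS_Store", "logs/", "*.log",
]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Approximate check (hypothetical helper, not Git's real algorithm).

    `path` is assumed to be a POSIX-style file path relative to the repo root.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            name = pattern.rstrip("/")
            if "/" in name:
                # A slash in the middle anchors the pattern to the repo root.
                if path.startswith(name + "/"):
                    return True
            elif name in parts[:-1]:
                # A bare directory name matches a directory at any depth.
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # A slash-free glob is tried against every path component.
            return True
    return False


for candidate in ["data/train.csv", "code/wandb/run-1/config.yaml", "src/main.py"]:
    print(candidate, is_ignored(candidate))
```

For an authoritative answer on a real checkout, `git check-ignore -v <path>` reports which pattern, if any, causes a path to be ignored.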