krazyjoy/AI_CUP

Workflow

architecture

Meaning of Melspectrogram

Studies have shown that humans do not perceive frequencies on a linear scale. We are better at detecting differences between lower frequencies than between higher ones. For example, we can easily tell 500 Hz from 1000 Hz, but we can hardly tell 10,000 Hz from 10,500 Hz, even though the distance within each pair is the same. In 1937, Stevens, Volkmann, and Newman proposed a unit of pitch such that equal distances in pitch sound equally distant to the listener. This is called the mel scale. Frequencies in hertz are mapped to the mel scale by a nonlinear, roughly logarithmic function, and a melspectrogram is a spectrogram whose frequency axis uses this scale.
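As a rough illustration of the Hz-to-mel mapping, the sketch below uses one widely used closed form, $m = 2595 \log_{10}(1 + f/700)$. The README does not say which variant of the mel formula the project relies on, so the formula and the function names `hz_to_mel` and `mel_to_hz` are assumptions made for this example.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mels with m = 2595 * log10(1 + f / 700) (assumed variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 500 Hz gap spans far more mels at low frequencies than at high ones.
low_gap = hz_to_mel(1000) - hz_to_mel(500)
high_gap = hz_to_mel(10500) - hz_to_mel(10000)
print(f"500 -> 1000 Hz: {low_gap:.1f} mel")
print(f"10000 -> 10500 Hz: {high_gap:.1f} mel")
```

The low-frequency pair ends up several times farther apart in mels than the high-frequency pair, which matches the perceptual argument above.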

Melspectrogram Feature Process

load audio
$\quad \quad \downarrow$
convert to melspectrogram
(n_mels=256, fmin=0, fmax=14000)
$\quad \quad \downarrow$
map magnitudes to decibel scale
(applied to np.abs(stft))
$\quad \quad \downarrow$
resize to (256, 512)
(from skimage.transform import resize)
$\quad \quad \downarrow$
stack to 3 channels (for CNN)
np.stack((stft_db, stft_db, stft_db))
$\quad \quad \downarrow$
append each sample to list
$\quad \quad \downarrow$
final array shape: (nsamples, 3, 256, 512)
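The steps above can be sketched as follows. This is a minimal sketch, not the repository's actual code: it assumes `librosa` is used for loading and for the melspectrogram, and that the decibel step is `librosa.power_to_db`; neither is confirmed by the README. The helper names (`load_mel_db`, `resize_spectrogram`, `to_three_channels`, `build_batch`) are hypothetical.

```python
import numpy as np


def load_mel_db(path, n_mels=256, fmin=0, fmax=14000):
    """Load one audio file and return a dB-scaled melspectrogram.

    Assumption: librosa is the audio library. It is imported lazily so the
    NumPy-only helpers below work without it.
    """
    import librosa

    # sr=None keeps the native sample rate; fmax=14000 presumes sr >= 28 kHz.
    y, sr = librosa.load(path, sr=None)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=n_mels, fmin=fmin, fmax=fmax
    )
    # Assumption: power spectrogram converted to dB relative to its maximum.
    return librosa.power_to_db(mel, ref=np.max)


def resize_spectrogram(spec_db, shape=(256, 512)):
    """Resize a 2-D spectrogram to a fixed (height, width)."""
    from skimage.transform import resize

    return resize(spec_db, shape)


def to_three_channels(spec_db):
    """Repeat a (H, W) spectrogram along a new leading axis -> (3, H, W)."""
    return np.stack((spec_db, spec_db, spec_db))


def build_batch(spectrograms):
    """Stack per-sample (H, W) arrays into (nsamples, 3, H, W)."""
    return np.array([to_three_channels(s) for s in spectrograms])
```

A typical use would be `build_batch([resize_spectrogram(load_mel_db(p)) for p in paths])`, where `paths` is a hypothetical list of audio file paths.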

A sample melspectrogram

Medical Record Processing

all columns $\rightarrow$ subset of relevant columns

'ID' 'Sex' 'Age' 'Narrow pitch range'
'Decreased volume' 'Fatigue' 'Dryness' 'Lumping'
'heartburn' 'Choking' 'Eye dryness' 'PND'
'Smoking' 'PPD' 'Drinking' 'frequency'
'Diurnal pattern' 'Onset of dysphonia' 'Noise at work' 'Occupational vocal demand'
'Head injury' 'CVA' 'Voice handicap index - 10' 'Disease category'
  • 'Disease category' is the classification target (label) column
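A minimal sketch of the column selection, assuming the medical records are held in a pandas DataFrame. The function name `select_relevant`, the toy data, and the choice to drop 'ID' from the model inputs are assumptions for illustration; the README only lists the columns that are kept.

```python
import pandas as pd

# Column names copied from the list above.
RELEVANT_COLUMNS = [
    "ID", "Sex", "Age", "Narrow pitch range",
    "Decreased volume", "Fatigue", "Dryness", "Lumping",
    "heartburn", "Choking", "Eye dryness", "PND",
    "Smoking", "PPD", "Drinking", "frequency",
    "Diurnal pattern", "Onset of dysphonia", "Noise at work",
    "Occupational vocal demand",
    "Head injury", "CVA", "Voice handicap index - 10", "Disease category",
]
LABEL_COLUMN = "Disease category"


def select_relevant(records: pd.DataFrame):
    """Keep the relevant columns and split them into features and labels."""
    subset = records[RELEVANT_COLUMNS]
    # Assumption: 'ID' only identifies the sample and is not a model input.
    features = subset.drop(columns=["ID", LABEL_COLUMN])
    labels = subset[LABEL_COLUMN]
    return features, labels


# Toy frame with one extra column that should be discarded.
toy = pd.DataFrame({col: [0, 1] for col in RELEVANT_COLUMNS})
toy["Unused column"] = ["a", "b"]
features, labels = select_relevant(toy)
print(features.shape, labels.name)
```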

CNN Model

The model summary for the custom CNN model:

cnn model summary

  • optimizer: nadam
  • minimum lr: 1e-8
  • loss: categorical cross entropy
  • validation ratio: 5%
  • callbacks: reduce the learning rate when validation loss plateaus; early stopping after 10 epochs
  • save checkpoint: True
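The callback behaviour listed above can be illustrated without any deep learning framework. The sketch below is a plain-Python stand-in, not the project's code: in Keras the same roles are played by `ReduceLROnPlateau` and `EarlyStopping`, but the README does not name the framework. The class name `PlateauMonitor`, the reduction `factor`, and `lr_patience` are assumptions; only the `1e-8` floor and the 10-epoch stop come from the settings above.

```python
class PlateauMonitor:
    """Track validation loss, shrink the learning rate, and signal early stop."""

    def __init__(self, lr, factor=0.5, lr_patience=3, min_lr=1e-8,
                 stop_patience=10):
        self.lr = lr
        self.factor = factor              # assumed reduction factor
        self.lr_patience = lr_patience    # assumed epochs before reducing lr
        self.min_lr = min_lr              # floor from the README (1e-8)
        self.stop_patience = stop_patience  # 10 epochs, from the README
        self.best = float("inf")
        self.epochs_since_best = 0
        self.plateau_epochs = 0

    def update(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.epochs_since_best = 0
            self.plateau_epochs = 0
        else:
            self.epochs_since_best += 1
            self.plateau_epochs += 1
            if self.plateau_epochs >= self.lr_patience:
                # Reduce the learning rate but never below the floor.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.plateau_epochs = 0
        return self.epochs_since_best >= self.stop_patience


monitor = PlateauMonitor(lr=1e-3)
# One improving epoch followed by ten epochs with no improvement.
stop_flags = [monitor.update(1.0) for _ in range(11)]
```

In a training loop, `monitor.update(val_loss)` would be called once per epoch, with the optimizer's learning rate set from `monitor.lr` and the loop exited when it returns `True`.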

stft training under 2d cnn model

DNN Model

The model summary for the DNN model:

dnn model summary

  • hidden layers: 3
  • activation function: sigmoid (hidden nodes), softmax (categorical prediction)
  • loss: categorical cross entropy
  • optimizer: adam
  • metrics: accuracy
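To make the architecture above concrete, here is a NumPy-only forward pass with three sigmoid hidden layers, a softmax output, and the categorical cross-entropy loss. It is a sketch under stated assumptions: the layer widths (64, 32, 16), the 22 input features, and the 5 output classes are hypothetical, since the README gives none of these numbers, and no training step is shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(layer_sizes, seed=0):
    """Create (weights, bias) pairs for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(x, params):
    """Sigmoid on every hidden layer, softmax on the output layer."""
    h = x
    for weights, bias in params[:-1]:
        h = sigmoid(h @ weights + bias)
    weights, bias = params[-1]
    return softmax(h @ weights + bias)


def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1)))


# Hypothetical sizes: 22 inputs, hidden layers of 64/32/16, 5 classes.
params = init_params([22, 64, 32, 16, 5])
x = np.random.default_rng(1).normal(size=(4, 22))
y = np.eye(5)[[0, 1, 2, 3]]  # one-hot labels for 4 toy samples
probs = forward(x, params)
loss = categorical_cross_entropy(y, probs)
```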

Results: UAR and Confusion Matrix

| testing data | UAR   |
|--------------|-------|
| public       | 0.687 |
| private      | 0.543 |
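UAR (unweighted average recall) is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The sketch below computes it from a confusion matrix; the function names and the toy labels are illustrative and are not taken from the repository.

```python
import numpy as np


def confusion_matrix(y_true, y_pred):
    """Rows are true classes, columns are predicted classes."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[t], index[p]] += 1
    return matrix


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over classes that occur in y_true."""
    matrix = confusion_matrix(np.asarray(y_true), np.asarray(y_pred))
    support = matrix.sum(axis=1)
    present = support > 0
    recalls = np.diag(matrix)[present] / support[present]
    return float(recalls.mean())


# Toy example: class 0 recall is 2/3, class 1 recall is 1/1.
toy_true = [0, 0, 0, 1]
toy_pred = [0, 0, 1, 1]
toy_uar = unweighted_average_recall(toy_true, toy_pred)
```

In this toy case plain accuracy is 3/4, while UAR is the average of 2/3 and 1, which shows how the minority class carries the same weight as the majority class.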

test public results test private results
