Nengo: Feeding audio data into a neural network as input

Let's use audio data to drive the activity of neurons!

First, I loaded an audio file with librosa.

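import librosa
import librosa.display
import IPython.display
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.font_manager as fm

import nengo  # needed by the network code later in this post
import tensorflow as tf

%matplotlib inline

audio_path = '4_3273.wav'

# librosa.load resamples to 22050 Hz by default and returns floating-point samples
y, sr = librosa.load(audio_path)

# Plot the raw waveform
plt.figure()
plt.plot(y)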

time_duration = int(librosa.get_duration(y=y, sr=sr) * 1000)
data_length = y.shape[0]
print("time duration: " + str(time_duration) + " ms")
print("sampling rate: " + str(sr))
print("data length: " + str(data_length))
 

time duration: 3529 ms
sampling rate: 22050
data length: 77824

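The clip is about 3.5 seconds long; at a sampling rate of 22050 Hz, that gives 77824 samples in total.

In other words, one second of audio is stored as 22050 samples.

Each sample therefore represents about 0.0000453 seconds of audio (0.0000453 * 22050 ≈ 1).

The Nengo simulator, by contrast, runs with a time step dt of 0.001 seconds by default.

So at that step size, Nengo cannot step through the audio one 0.0000453-second sample at a time.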
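To make the mismatch concrete, the numbers printed above can be plugged into a quick calculation. This is only a sketch of the arithmetic: `dt` is assumed to be Nengo's default step of 0.001 s, and the variable names are just for illustration.

```python
sr = 22050           # sampling rate reported by librosa above
data_length = 77824  # number of samples reported above
dt = 0.001           # assumed: Nengo's default simulation time step (seconds)

sample_period = 1.0 / sr              # seconds covered by one audio sample
samples_per_step = sr * dt            # audio samples that fall inside one step
n_steps = int(data_length / sr / dt)  # whole simulation steps needed for the clip

print(f"sample period: {sample_period:.7f} s")
print(f"samples per simulation step: {samples_per_step:.2f}")
print(f"simulation steps for the whole clip: {n_steps}")
```

Roughly 22 audio samples fall inside every simulation step, so any input function that returns a single value per step has to drop or summarize most of them.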

Since all I want for now is to test whether the network can take audio as input,

I wrote a simple input function.

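def voice_input(t):
    # t is the simulation time in seconds; convert it to whole milliseconds
    ms = int(round(t * 1000))
    # Pick the sample at that time, clamped so the index never runs past the end
    idx = int((data_length / time_duration) * ms)
    return y[min(idx, data_length - 1)]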

Nengo calls this function at every simulation step, passing in the current simulation time t in seconds.

That is, 0.001, 0.002, 0.003, and so on.
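To see which samples such a function actually picks, here is a small stand-alone sketch. It uses dummy values instead of the real audio, assumes the default 1 ms step, and maps time to a sample index directly via the sampling rate; `sample_at` is a made-up helper name, not part of Nengo or librosa.

```python
sr = 22050                             # assumed sampling rate, as in the post
samples = [i / sr for i in range(sr)]  # stand-in "audio": 1 s of dummy values
dt = 0.001                             # assumed Nengo default time step (s)

def sample_at(t):
    """Return (index, value) of the stand-in sample closest to time t (seconds)."""
    idx = min(int(round(t * sr)), len(samples) - 1)  # clamp to the valid range
    return idx, samples[idx]

# Indices picked for the first five simulation steps
picked = [sample_at(step * dt)[0] for step in range(1, 6)]
print(picked)
```

The indices jump by about 22 each step, which means only about one sample in 22 ever reaches the network.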

Now let's design a network that takes this input.

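model = nengo.Network()
with model:
    # Node that emits the audio sample for the current simulation time
    voice = nengo.Node(voice_input)
    A = nengo.Ensemble(80, dimensions=1)
    nengo.Connection(voice, A, synapse=0.01)
    
    # Probes: decoded output, raw input, spikes, and membrane voltages
    p = nengo.Probe(A, synapse=0.01)
    inp_p = nengo.Probe(voice)
    spike_p = nengo.Probe(A.neurons)
    voltage_p = nengo.Probe(A.neurons, 'voltage')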

It is a very simple network: a 1-dimensional input connected to neuron population A.

I used 80 neurons, a number picked more or less arbitrarily.

Shall we look at how the 80 neurons fire?

from nengo.utils.ensemble import tuning_curves

with nengo.Simulator(model) as sim:
    # Plot each neuron's firing rate as a function of the input value
    plt.figure()
    plt.plot(*tuning_curves(A, sim))
    plt.xlabel("input value")
    plt.ylabel("firing rate")
    plt.title(str(nengo.LIF()))
    sim.run(3)

The neurons fire nicely.

with nengo.Simulator(model) as sim:
    # Stop 1 ms short of the clip length so voice_input stays within the data
    sim.run(int(time_duration-1) / 1000)

Running it and looking at the results..

from nengo.utils.matplotlib import rasterplot

# Compare the raw input with the decoded output of A
plt.figure()
plt.plot(sim.trange(), sim.data[inp_p], 'r', label="Input")
plt.plot(sim.trange(), sim.data[p], label="A output")
plt.xlim(0, 5)
plt.legend()

# Spike raster of the 80 neurons
plt.figure()
rasterplot(sim.trange(), sim.data[spike_p])
plt.xlim(0, 5);

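Fail!!!! ^^;;

Feeding the raw WAV samples in directly seems to be asking a bit too much..

Even when zooming in on part of the plot, you can see that the output cannot keep up with the input waveform.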
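One plausible reason, offered as a back-of-the-envelope check rather than a diagnosis of the run above: the network sees only one sample per 1 ms step, and `synapse=0.01` applies a first-order low-pass filter with a 10 ms time constant. The sketch below computes the highest frequency a 1 ms step can represent and how strongly a continuous filter of that kind attenuates a few frequencies. The test frequencies are arbitrary, and `lowpass_gain` is a helper name made up for this illustration.

```python
import math

dt = 0.001   # assumed Nengo default time step (s)
tau = 0.01   # synapse time constant used in the network above (s)

# Highest frequency representable when sampling once per step (Nyquist limit)
nyquist_hz = 1.0 / (2.0 * dt)

def lowpass_gain(freq_hz, tau):
    """Amplitude gain of a continuous first-order low-pass filter at freq_hz."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * tau) ** 2)

print(f"Nyquist limit at dt={dt}: {nyquist_hz:.0f} Hz")
for f in (10, 100, 440):  # arbitrary example frequencies
    print(f"{f:4d} Hz -> gain {lowpass_gain(f, tau):.3f}")
```

Under these assumptions, anything in the audible range is either above the Nyquist limit or heavily smoothed by the synapse, so a slowly varying representation of the sound is a more natural input.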

So how about using a mel spectrogram instead?

nMels = 80
# Newer librosa versions require the signal to be passed as a keyword argument
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=nMels)

# librosa.logamplitude was removed from librosa; power_to_db is its replacement
log_S = librosa.power_to_db(S, ref=np.max)
plt.figure(figsize=(12, 4))
librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
plt.title('mel power spectrogram')
plt.colorbar(format='%+02.0f dB')
plt.tight_layout()
plt.show()

def half_normalize(S):
    # Maps the dB range [S.min(), 0] onto [-1, 1]
    return (S - S.min()/2) / (-S.min()/2)

def simple_normalize(S):
    # Maps the dB range [S.min(), 0] onto [0, 1]
    return (S - S.min()) / (-S.min())

norm_S = half_normalize(log_S)
#norm_S = simple_normalize(log_S)

plt.figure(figsize=(12, 4))
librosa.display.specshow(norm_S, sr=sr, x_axis='time', y_axis='mel')
plt.title('norm mel power spectrogram')
# The normalized values are no longer in dB, so the unit is dropped from the label
plt.colorbar(format='%+0.2f')
plt.tight_layout()
plt.show()

I deliberately rescaled the mel spectrogram values to the range -1 to 1.
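A quick way to check what the two normalization functions do is to run them on a handful of made-up dB values. The sketch below re-implements them for plain Python lists, so it is a stand-in for the NumPy versions above and not the same code; the input values are hypothetical. The -1 to 1 range is presumably convenient because Nengo ensembles represent values within a radius of 1 by default.

```python
# Hypothetical dB values standing in for log_S (the maximum is 0 dB by construction)
log_values = [-80.0, -60.0, -40.0, -20.0, 0.0]

def half_normalize(values):
    """Map [min, 0] onto [-1, 1], mirroring half_normalize above."""
    half = min(values) / 2
    return [(v - half) / (-half) for v in values]

def simple_normalize(values):
    """Map [min, 0] onto [0, 1], mirroring simple_normalize above."""
    lo = min(values)
    return [(v - lo) / (-lo) for v in values]

print(half_normalize(log_values))
print(simple_normalize(log_values))
```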

As with the earlier input function, I wrote an input function that outputs the mel spectrogram.

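transposed_norm_S = norm_S.transpose()  # shape: (frames, mel bands)
# Approximate duration of one spectrogram frame in ms, rounded up
frame_size = int(time_duration/transposed_norm_S.shape[0])+1

print(transposed_norm_S.shape)

def voice_input(t):
    ms = int(round(t * 1000))
    # Clamp so the frame index can never run past the last frame
    frame_num = min(int(ms / frame_size), transposed_norm_S.shape[0] - 1)
    return transposed_norm_S[frame_num]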
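For reference, the frame duration can also be derived from the STFT parameters instead of being estimated from the clip length. The sketch below assumes `melspectrogram` ran with librosa's defaults (`hop_length=512`, `center=True`), which the post does not state explicitly; `frame_index` is a hypothetical helper.

```python
sr = 22050           # sampling rate from the post
data_length = 77824  # number of samples from the post
hop_length = 512     # assumed: librosa's default hop between frames

# With center=True (assumed default), librosa yields 1 + len(y) // hop_length frames
n_frames = 1 + data_length // hop_length
frame_ms = 1000.0 * hop_length / sr  # duration of one hop in milliseconds

def frame_index(t):
    """Hypothetical helper: spectrogram frame covering simulation time t (seconds)."""
    return min(int(t * sr / hop_length), n_frames - 1)

print(n_frames, round(frame_ms, 2))
print(frame_index(0.001), frame_index(1.0), frame_index(3.529))
```

If the spectrogram really has this many frames, one hop lasts a little over 23 ms, so the 24 ms estimate used above is close but makes the spectrogram play back slightly slower than the audio.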

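I build the network the same way, but this time I set dimensions to 80 so that the 80 neurons can each receive a different input.

It is also possible to give each neuron a different input with a 1-dimensional ensemble, but I stuck to what I roughly know for now..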
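A side note on what a high-dimensional ensemble does with its input, as I understand it: each neuron is driven by the dot product of the input vector with that neuron's encoder, and Nengo draws encoders at random from the unit hypersphere by default. So with random encoders every neuron sees a mixture of all mel bands, not a single band. The toy sketch below uses 4 dimensions in place of 80 and made-up input values; `unit_vector` and `drive` are illustrative helpers, not Nengo functions.

```python
import math
import random

random.seed(0)
n_dims = 4                  # small stand-in for the 80 mel bands
x = [0.9, -0.2, 0.1, -0.7]  # made-up input vector, one value per band

def unit_vector(n):
    """Random direction on the unit hypersphere (stand-in for a random encoder)."""
    v = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def drive(encoder, x):
    """Dot product of encoder and input: the neuron's drive before gain and bias."""
    return sum(e * xi for e, xi in zip(encoder, x))

random_encoder = unit_vector(n_dims)    # mixes all bands
one_hot_encoder = [1.0, 0.0, 0.0, 0.0]  # listens to band 0 only

print("random encoder drive :", drive(random_encoder, x))
print("one-hot encoder drive:", drive(one_hot_encoder, x))
```

If one neuron per mel band is the goal, options worth checking in the Nengo documentation are the `encoders` argument of `nengo.Ensemble` and connecting directly to `ensemble.neurons`.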

neuron_number = 80
with nengo.Network() as net:
    # The node now outputs an 80-dimensional vector (one value per mel band)
    voice = nengo.Node(output=voice_input)
    neuronsEns = nengo.Ensemble(neuron_number, dimensions=neuron_number, max_rates=([100] * neuron_number))
    
    nengo.Connection(voice, neuronsEns, synapse=0.01)
    
    # Probes: raw input, decoded output, spikes, and membrane voltages
    voice_probe = nengo.Probe(voice)
    neurons_probe = nengo.Probe(neuronsEns, synapse=0.01)
    spike_probe = nengo.Probe(neuronsEns.neurons)
    voltage_probe = nengo.Probe(neuronsEns.neurons, 'voltage')

with nengo.Simulator(net) as sim:
    sim.run(int(time_duration) / 1000)

Let's plot the results.

plt.figure(figsize=(20, 4))
plt.subplot(1, 2, 1)
librosa.display.specshow(norm_S, sr=sr, x_axis='time', y_axis='mel')
 
plt.subplot(1, 2, 2)
rasterplot(sim.trange(), sim.data[spike_probe])
plt.xlabel('time [s]');

It looks like I have, very roughly, put together a network that takes a spectrogram as input ^^;;

I should read the Nengo documentation more carefully..

License: Attribution, NonCommercial


Banana Media Lab
