sample.psycho module¶
Classes and functions related to psychoacoustic models
- class sample.psycho.GammatoneFilter(f: float = 1, n: int = 4, bandwidth: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None, t_c: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None, phi: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None, normalize: bool = False, a: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None)¶
Bases: object
Gammatone filter
- Parameters:
f (float) – Center frequency. Default is 1
n (int) – Filter order. Default is 4
bandwidth (callable or float) – Filter bandwidth in Hz. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then the ERB (erb()) is used
t_c (callable or float) – Leading time in seconds. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then use the group delay (non-causal filter)
phi (callable or float) – Phase in radians at time t=0. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then use a phase value coherent with the leading time
normalize (bool) – If True, then normalize the IR so that \(|k(t)|^2 = f_s\)
a (callable or float) – Scale parameter. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then do not rescale the IR
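To make the roles of f, n, bandwidth, phi and a concrete, here is a minimal sketch of the textbook gammatone impulse response, a * t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*f*t + phi). The function names and the exact way the bandwidth b enters the exponent are assumptions made for illustration; the class may scale the bandwidth differently and additionally shifts time by the leading time t_c, which is omitted here.

```python
import math

def gammatone_envelope(t, n=4, b=125.0, a=1.0):
    """Textbook gammatone envelope (hypothetical helper, zero for t < 0)."""
    if t < 0:
        return 0.0
    return a * t ** (n - 1) * math.exp(-2 * math.pi * b * t)

def gammatone_ir(t, f=1000.0, n=4, b=125.0, phi=0.0, a=1.0, analytic=False):
    """Envelope times a cosine, or times a complex exponential if analytic."""
    env = gammatone_envelope(t, n, b, a)
    phase = 2 * math.pi * f * t + phi
    if analytic:
        return env * complex(math.cos(phase), math.sin(phase))
    return env * math.cos(phase)

# For this textbook form the envelope peaks at t = (n - 1) / (2*pi*b)
t_peak = (4 - 1) / (2 * math.pi * 125.0)
```

With the analytic oscillator, the real part is the ordinary IR and the absolute value is the envelope.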
- property a: float¶
Scale parameter
- property bandwidth: float¶
Filter bandwidth in Hz
- envelope(t: ndarray, out: Optional[float] = None, **kwargs) ndarray ¶
Envelope function for the IR
- Parameters:
t (array) – Time axis
out (array) – Optional. Array to use for storing results
**kwargs – Keyword arguments for _envelope()
- Returns:
The IR envelope function evaluated at t
- Return type:
array
- property group_delay: float¶
Group delay of the gammatone_filter in seconds (accounting for the leading time)
- ir(t: Optional[float] = None, fs: Optional[float] = None, analytic: bool = False, out: Optional[ndarray] = None, **kwargs) ndarray ¶
Filter IR (scaled)
- Parameters:
t (array) – Time axis
fs (float) – Sample frequency
analytic (bool) – If True, use a complex exponential as oscillator, instead of a cosine
out (array) – Optional. Array to use for storing results
**kwargs – Keyword arguments for ir_size()
- Returns:
The wave function evaluated at t
- Return type:
array
- ir_size(fs: float = 1, **kwargs) int ¶
Suggested IR size in samples, based on the t60
- Parameters:
fs (float) – Sample frequency
**kwargs – Keyword arguments for t60()
- Returns:
Suggested IR size
- Return type:
int
- property phi: float¶
Initial phase in radians
- property raw_group_delay: float¶
Raw group delay of the gammatone_filter in seconds (without accounting for the leading time)
- t60(steps: int = 32, n_starts: int = 16, initial_range: Optional[float] = None, floor: float = -60, warn_th: Optional[float] = 0.001) float ¶
Numerically compute the t60 for the IR envelope, i.e. the time instant at which the IR envelope goes 60 dB below the envelope peak
- Parameters:
steps (int) – Dichotomic search steps for t60 computation
n_starts (int) – Number of restarts for determining the initial search range before raising an exception
initial_range (float) – Width of the initial search range. In case the t60 is not in the range, the width is doubled n_starts times
floor (float) – Threshold for the t60 in decibels. Default is -60
warn_th (float) – If not None, then raise an exception if the amplitude at the found t60 value is not within warn_th dB from the target value (floor)
- Returns:
The t60 value
- Return type:
float
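The dichotomic search described above can be sketched as follows. This is an illustrative reimplementation, not the method's actual code: it assumes the textbook envelope t^(n-1) * exp(-2*pi*b*t), starts the search at the envelope peak, and reuses the parameter names steps, n_starts, initial_range and floor only to mirror the documentation. The default value of initial_range is an assumption.

```python
import math

def envelope_db(t, n=4, b=125.0):
    """Textbook gammatone envelope in dB relative to its peak (t > 0)."""
    t_peak = (n - 1) / (2 * math.pi * b)

    def log_env(x):
        return (n - 1) * math.log(x) - 2 * math.pi * b * x

    return 20 / math.log(10) * (log_env(t) - log_env(t_peak))

def t60_bisect(n=4, b=125.0, floor=-60.0, steps=32,
               initial_range=0.01, n_starts=16):
    """Find the time after the peak where the envelope reaches `floor` dB."""
    lo = (n - 1) / (2 * math.pi * b)  # envelope peak: 0 dB
    width = initial_range
    # Widen the range until it contains the threshold crossing
    for _ in range(n_starts):
        if envelope_db(lo + width, n, b) <= floor:
            break
        width *= 2
    else:
        raise RuntimeError("t60 not found within the search range")
    hi = lo + width
    # Dichotomic (bisection) search on the decaying part of the envelope
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if envelope_db(mid, n, b) > floor:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t60_value = t60_bisect()
```

A check in the spirit of warn_th is to evaluate envelope_db(t60_value) and compare it with floor.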
- property t_c: float¶
Leading time in seconds
- wavefun(t: float, analytic: bool = False, out: Optional[ndarray] = None) ndarray ¶
Filter wave function for the IR (non-scaled)
- Parameters:
t (array) – Time axis
analytic (bool) – If True, use a complex exponential as oscillator, instead of a cosine
out (array) – Optional. Array to use for storing results
- Returns:
The wave function evaluated at t
- Return type:
array
- class sample.psycho.GammatoneFilterbank(filters: Optional[Iterable[GammatoneFilter]] = None, freqs: Optional[Sequence[float]] = None, n_filters: Optional[int] = None, flim: Tuple[float, float] = (20, 20000), freq_transform: Tuple[Callable[[float], float], Callable[[float], float]] = (hz2cams, cams2hz), **kwargs)¶
Bases: object
Bank of gammatone filters
- Parameters:
filters (iterable of GammatoneFilter) – Filters that make up the bank. If None (default), then build filters using other arguments
freqs – The center frequencies of the gammatone filters. If None (default), then decide frequencies using other arguments
n_filters (int) – Number of gammatone filters. If None (default), then decide number of filters using other arguments
flim (float, float) – Limits for the frequency response of the gammatone filters
freq_transform – Pair of callables that implement transformations from and to Hertz, respectively. The center frequencies of the gammatone filters will be chosen linearly between flim[0] and flim[1] in the transformed space. Default is hz2cams(), cams2hz() for linear spacing on the ERB-rate scale
**kwargs – Keyword arguments for GammatoneFilter
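The freq_transform mechanism can be illustrated with a short sketch: map the limits into the transformed space, space the points linearly there, and map them back to Hertz. The helper center_frequencies is hypothetical, it assumes both limits are included as center frequencies, and it uses a logarithm/exponential pair as a stand-in transform rather than hz2cams()/cams2hz().

```python
import math

def center_frequencies(n_filters, flim=(20.0, 20000.0),
                       freq_transform=(math.log, math.exp)):
    """Linearly spaced points in the transformed space, mapped back to Hz.

    Hypothetical helper: requires n_filters >= 2 and includes both limits.
    """
    forward, inverse = freq_transform
    lo, hi = forward(flim[0]), forward(flim[1])
    step = (hi - lo) / (n_filters - 1)
    return [inverse(lo + i * step) for i in range(n_filters)]

# With a logarithmic transform the spacing is geometric: one octave per step here
centers = center_frequencies(4, flim=(100.0, 800.0))
```

Swapping in a different pair of callables changes the spacing without touching the rest of the logic, which is presumably the reason the transform is passed as a pair.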
- class PrecomputedIRBank(parent: GammatoneFilterbank, fs: float, analytic: bool = False, **kwargs)¶
Bases: object
Precomputed IR bank for a GammatoneFilterbank
- Parameters:
parent (GammatoneFilterbank) – Gammatone filterbank to render
fs (float) – Sample frequency
analytic (bool) – If True, then the IRs are complex-valued. Convolving the complex IRs is faster than convolving the real IRs and then computing the analytic signal of the cochleagram. The resulting cochleagram will be complex. The real part will be the ordinary cochleagram. The absolute value will be the AM envelope of the cochleagram
- convolve(x: ndarray, method: Optional[str] = None, stride: Optional[int] = None)¶
Convolve the IRs and organize the outputs in an aligned matrix
- Parameters:
x (array) – Input signal
method (str) – Convolution method (either "auto", "fft", "direct", or "overlap-add")
stride (int) – Time-step for output signal. Can’t be used in conjunction with method
- Returns:
Cochleagram, will be complex if the IRs are analytic
- Return type:
matrix
- convolve(x: ndarray, fs: float, analytic: Optional[str] = None, method: Optional[str] = None, **kwargs)¶
Filter the input with the filterbank and produce a cochleagram
- Parameters:
x (array) – Input signal
fs (float) – Sample frequency
analytic (str) – Compute the analytic signal of the cochleagram:
if "input", then compute the analytic signal of the input (fast, accurate in the middle, bad boundary conditions)
if "ir" (suggested), then compute the analytic signal of the IRs (fast, tends to underestimate amplitude, good boundary conditions)
if "output", then compute the analytic signal of the output (slowest, most accurate)
postprocess (callable) – If not None, then apply this function to the cochleagram matrix. Default is hwr(), if the cochleagram is real, otherwise it is None
method (str) – Convolution method (either "auto", "fft", "direct", or "overlap-add")
stride (int) – Time-step for output signal. Can’t be used in conjunction with method
- Returns:
Cochleagram
- Return type:
matrix
- precompute(fs: float, analytic: bool = False) PrecomputedIRBank ¶
Precompute IRs for this filterbank
- Parameters:
fs (float) – Sample frequency
analytic (bool) – If True, compute a complex IR bank
- Returns:
Precomputed IR bank
- Return type:
PrecomputedIRBank
- sample.psycho.a_weighting(f: float, db: bool = True, out: Optional[ndarray] = None) float ¶
A-weighting weights for input frequencies, as defined in “Electroacoustics - Sound level meters - Part 1: Specifications” (IEC 61672-1, 2013)
- Parameters:
f (array) – Frequency values in Hertz
db (bool) – If True (default), return the gain to apply in dB with reference at 1 kHz (a_weighting(1000) = 0)
out (array) – Optional. Array to use for storing results
- Returns:
A-weights
- Return type:
array
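For reference, the A-weighting curve from the standard can be sketched in a few lines. The function names below are illustrative, and the normalization is computed explicitly at 1 kHz so that the gain there is exactly 0 dB, matching the property stated above; whether the library uses the same numerical route is an assumption.

```python
import math

def ra(f):
    """Unnormalized A-weighting magnitude response at frequency f (Hz)."""
    f2 = f * f
    return (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )

def a_weighting_db(f):
    """A-weighting gain in dB, referenced to 1 kHz."""
    return 20 * math.log10(ra(f)) - 20 * math.log10(ra(1000.0))

# Low frequencies are strongly attenuated relative to 1 kHz
gain_100hz = a_weighting_db(100.0)
```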
- sample.psycho.bark2hz(b, out: Optional[ndarray] = None, *, mode: str = 'traunmuller')¶
Convert Bark to Hertz
- Parameters:
b – Frequency value(s) in Bark
mode (str) – Name of the Bark definition (traunmuller, or wang)
out (array) – Optional. Array to use for storing results
- Returns:
Frequency value(s) in Hertz
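As an illustration of what such a conversion pair looks like, here is a sketch of the basic Traunmüller formula and its algebraic inverse. It is assumed, not confirmed, that the traunmuller mode corresponds to this formula; the low-end and high-end corrections from the original publication are omitted, and the function names are hypothetical.

```python
def hz2bark_traunmuller(f):
    """Basic Traunmüller approximation: Hertz to Bark (no end corrections)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark2hz_traunmuller(z):
    """Algebraic inverse of the formula above: Bark to Hertz."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

z_1k = hz2bark_traunmuller(1000.0)      # about 8.53 Bark
f_back = bark2hz_traunmuller(z_1k)      # recovers 1000 Hz
```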
- sample.psycho.cams2hz(c: float, out: Optional[ndarray] = None, *, degree: str = 'quadratic') float ¶
Convert Cams (ERB-rate scale) to Hertz. The default definition is quadratic
- Parameters:
c (array) – Frequency value(s) in Cams
degree (str) – Name of the ERB definition (linear, quadratic)
out (array) – Optional. Array to use for storing results
- Returns:
Frequency value(s) in Hz
- Return type:
array
- sample.psycho.cochleagram(x: Sequence[float], fs: Optional[float] = None, filterbank: Optional[Union[GammatoneFilterbank, PrecomputedIRBank]] = None, analytic: Optional[str] = None, method: Optional[str] = None, stride: Optional[int] = None, **kwargs)¶
Compute the cochleagram for the signal
- Parameters:
x (array) – Array of audio samples
fs (float) – Sampling frequency
filterbank (GammatoneFilterbank) – Filterbank object, or precomputed IRs. If unspecified, it will be built from **kwargs
postprocessing (callable) – If not None, then apply this function to the cochleagram matrix. Default is hwr(), if the cochleagram is real, otherwise it is None
analytic (str) – Compute the analytic signal of the cochleagram:
if "input", then compute the analytic signal of the input (fast, accurate in the middle, bad boundary conditions)
if "ir" (suggested), then compute the analytic signal of the IRs (fast, tends to underestimate amplitude, good boundary conditions)
if "output", then compute the analytic signal of the output (slowest, most accurate)
method (str) – Convolution method (either "auto", "fft", "direct", or "overlap-add")
stride (int) – Time-step for output signal. Can’t be used in conjunction with method
**kwargs – Keyword arguments for GammatoneFilterbank
- Returns:
Cochleagram matrix (filter x time) and the array of center frequencies
- Return type:
matrix, array
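The idea behind a cochleagram computed with analytic IRs can be sketched with plain NumPy: build one complex gammatone IR per center frequency, convolve the signal with each, and stack the results as a (filter x time) matrix. Everything below is an illustrative assumption (helper names, the linear ERB used as bandwidth, unit-energy normalization, the fixed IR duration); it does not reproduce the library's alignment of the outputs or its convolution methods.

```python
import numpy as np

def gammatone_bank_irs(freqs, fs, n=4, duration=0.05):
    """Complex (analytic) textbook gammatone IRs, one row per frequency."""
    t = np.arange(int(duration * fs)) / fs
    irs = []
    for f in freqs:
        b = 24.7 * (4.37 * f / 1000 + 1)  # linear ERB as bandwidth (assumption)
        ir = t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.exp(2j * np.pi * f * t)
        irs.append(ir / np.sqrt(np.sum(np.abs(ir) ** 2)))  # unit energy
    return np.array(irs)

def cochleagram_sketch(x, fs, freqs):
    """Convolve the signal with every IR; rows are filters, columns are time."""
    irs = gammatone_bank_irs(freqs, fs)
    return np.array([np.convolve(x, ir, mode="same") for ir in irs])

fs = 8000
t = np.arange(fs // 4) / fs
x = np.sin(2 * np.pi * 440 * t)          # 440 Hz test tone
freqs = [220.0, 440.0, 880.0]
coch = cochleagram_sketch(x, fs, freqs)  # complex matrix
envelope = np.abs(coch)                  # AM envelope per band
```

The band centered on the tone carries most of the energy; taking the real part instead of the absolute value gives the ordinary, oscillating band signals.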
- sample.psycho.erb(f: float, out: Optional[ndarray] = None, *, degree: str = 'quadratic') float ¶
Definition of equivalent rectangular bandwidth by Moore and Glasberg, “Suggested formulae for calculating auditory-filter bandwidths and excitation patterns”
- Parameters:
f (array) – Frequency value(s) in Hertz
degree (str) – Name of the ERB definition (linear, quadratic)
out (array) – Optional. Array to use for storing results
- Returns:
Equivalent rectangular bandwidths at the given frequencies
- Return type:
array
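The two published ERB formulas that the degree names most plausibly refer to are sketched below: the quadratic fit from Moore and Glasberg (1983) and the later linear fit from Glasberg and Moore (1990). That the library uses exactly these constants is an assumption, and the function names are illustrative.

```python
def erb_quadratic(f):
    """Moore & Glasberg (1983) quadratic fit: ERB in Hz for f in Hz."""
    khz = f / 1000.0
    return 6.23 * khz ** 2 + 93.39 * khz + 28.52

def erb_linear(f):
    """Glasberg & Moore (1990) linear fit: ERB in Hz for f in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

# The two definitions are close at 1 kHz (roughly 128 Hz and 133 Hz)
erb_q_1k = erb_quadratic(1000.0)
erb_l_1k = erb_linear(1000.0)
```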
- sample.psycho.hwr(a: ndarray, th: float = 0, out: Optional[ndarray] = None)¶
Half-wave rectification
- Parameters:
a (array) – Input signal
th (float) – Threshold. Default is 0
out (array) – Optional. Array to use for storing results
- Returns:
Half-wave rectified copy of input signal
- Return type:
array
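A minimal sketch of half-wave rectification with NumPy follows. It assumes that values below the threshold are clipped to the threshold itself, which coincides with the usual definition when th is 0; the behavior of hwr() for non-zero thresholds is not specified here, so treat that part as a guess.

```python
import numpy as np

def hwr_sketch(a, th=0.0):
    """Return a rectified copy: values below th are replaced by th (assumption)."""
    return np.maximum(a, th)

signal = np.array([-1.0, 0.5, -0.25, 2.0])
rectified = hwr_sketch(signal)  # negative samples become 0, input is untouched
```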
- sample.psycho.hz2bark(f, out: Optional[ndarray] = None, *, mode: str = 'traunmuller')¶
Convert Hertz to Bark
- Parameters:
f – Frequency value(s) in Hertz
mode (str) – Name of the Bark definition (zwicker, traunmuller, or wang)
out (array) – Optional. Array to use for storing results
- Returns:
Frequency value(s) in Bark
- sample.psycho.hz2cams(f: float, out: Optional[ndarray] = None, *, degree: str = 'quadratic') float ¶
Convert Hertz to Cams (ERB-rate scale). The default definition is quadratic
- Parameters:
f (array) – Frequency value(s) in Hertz
degree (str) – Name of the ERB definition (linear, quadratic)
out (array) – Optional. Array to use for storing results
- Returns:
Frequency value(s) in Cams
- Return type:
array
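To show what a Hertz/Cams pair looks like, here is a sketch of the widely used linear ERB-rate formula (Glasberg and Moore, 1990) with its inverse. It is assumed to correspond to degree="linear"; the constants of the default quadratic definition are not given in this page and are not reproduced. Function names are illustrative.

```python
import math

def hz2cams_linear(f):
    """ERB-rate (Cams) from Hertz, linear-ERB definition."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

def cams2hz_linear(c):
    """Inverse of the formula above: Hertz from Cams."""
    return (10.0 ** (c / 21.4) - 1.0) / 0.00437

cams_1k = hz2cams_linear(1000.0)      # about 15.6 Cams
hz_back = cams2hz_linear(cams_1k)     # recovers 1000 Hz
```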
- sample.psycho.hz2mel(f, out: Optional[ndarray] = None, *, mode: str = 'default')¶
Convert Hertz to Mel
- Parameters:
f – Frequency value(s) in Hertz
mode (str) – Name of the Mel definition (default, fant)
out (array) – Optional. Array to use for storing results
- Returns:
Frequency value(s) in Mel
- sample.psycho.mel2hz(m, out: Optional[ndarray] = None, *, mode: str = 'default')¶
Convert Mel to Hertz
- Parameters:
m – Frequency value(s) in Mel
mode (str) – Name of the Mel definition (default, fant)
out (array) – Optional. Array to use for storing results
- Returns:
Frequency value(s) in Hertz
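Two common Mel definitions can be sketched as below: O'Shaughnessy's formula, assumed here to be what mode="default" refers to, and Fant's base-2 formula, assumed to match mode="fant". Function names are illustrative.

```python
import math

def hz2mel_oshaughnessy(f):
    """Mel from Hertz: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel2hz_oshaughnessy(m):
    """Inverse of hz2mel_oshaughnessy."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hz2mel_fant(f):
    """Mel from Hertz: 1000 * log2(1 + f / 1000)."""
    return 1000.0 * math.log2(1.0 + f / 1000.0)

def mel2hz_fant(m):
    """Inverse of hz2mel_fant."""
    return 1000.0 * (2.0 ** (m / 1000.0) - 1.0)

# Both definitions map 1000 Hz to (approximately) 1000 Mel
mel_default_1k = hz2mel_oshaughnessy(1000.0)
mel_fant_1k = hz2mel_fant(1000.0)
```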
- sample.psycho.mel_spectrogram(x: Sequence[float], stft_kws: Optional[Dict[str, Any]] = None, **kwargs)¶
Compute the mel-spectrogram of an audio signal. The STFT is computed internally with scipy.signal.stft()
- Parameters:
x (array) – Array of audio samples
stft_kws – Keyword arguments for scipy.signal.stft()
**kwargs – Keyword arguments for stft2mel()
- Returns:
The array of center frequencies, the array of time-steps, and the Mel-spectrogram matrix (filter x time)
- Return type:
array, array, matrix
- sample.psycho.mel_triangular_filterbank(freqs: Sequence[float], n_filters: Optional[int] = None, bandwidth: Optional[Callable[[float], float]] = None, flim: Optional[Sequence[float]] = None, freq_transform: Tuple[Callable[[float], float], Callable[[float], float]] = (hz2mel, mel2hz))¶
Compute a frequency-domain triangular filterbank. Specify at least one of n_filters, bandwidth, or flim
- Parameters:
freqs (array) – Frequency axis for frequency-domain filters
n_filters (int) – Number of filters. If None (default), infer from other arguments
bandwidth (callable) – Bandwidth function that maps a center frequency to the -3 dB bandwidth of the filter at that frequency. If None (default), then one filter’s -inf dB cutoff frequencies will be the center frequencies of the previous and the next filter (50% overlapping filters). In this case, the frequency limits flim include the lower cutoff frequency of the first filter and the higher cutoff frequency of the last filter. If a function is provided, then the frequency limits flim are only the center frequencies of the filters
flim (array) – Corner/center frequencies for the filters. If n_filters and bandwidth are both None, flim must have 2 more elements than the number of desired filters. If None, then it will be set to the first and last elements of freqs
freq_transform – Pair of callables that implement transformations from and to Hertz, respectively. If n_filters is not None, the center frequencies of the triangular filters will be chosen linearly between freqs[0] and freqs[1] in the transformed space. Default is hz2mel(), mel2hz() for linear spacing on the Mel scale
- Returns:
The triangular filterbank matrix (filter x frequency) and the array of center frequencies
- Return type:
matrix, array
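The 50%-overlap case (bandwidth left as None) can be sketched as follows: n_filters + 2 corner points are spaced linearly on the Mel scale, and each triangle rises from the previous corner to its center and falls to the next corner. The helper name, the Mel formula and the unit peak height are assumptions for illustration; the library's normalization may differ.

```python
import numpy as np

def hz2mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel2hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def triangular_filterbank_sketch(freqs, n_filters, flim=None):
    """Return (filter x frequency) weights and the center frequencies."""
    freqs = np.asarray(freqs, dtype=float)
    if flim is None:
        flim = (freqs[0], freqs[-1])
    # Corners include the lower cutoff of the first filter
    # and the upper cutoff of the last one
    corners = mel2hz(np.linspace(hz2mel(flim[0]), hz2mel(flim[1]), n_filters + 2))
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, center, hi = corners[i], corners[i + 1], corners[i + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank, corners[1:-1]

freq_axis = np.linspace(0.0, 4000.0, 257)
bank, centers = triangular_filterbank_sketch(freq_axis, n_filters=10)
```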
- sample.psycho.stft2mel(stft: Sequence[Sequence[complex]], freqs: Sequence[float], filterbank: Optional[Sequence[Sequence[float]]] = None, power: Optional[float] = 2, **kwargs)¶
Compute the mel-spectrogram from a STFT
- Parameters:
stft (matrix) – STFT matrix (frequency x time)
freqs (array) – Frequency axis for stft
filterbank (matrix) – Filterbank matrix. If unspecified, it will be computed with mel_triangular_filterbank()
power (float) – Power for magnitude computation before frequency-domain filtering. After filtering, the inverse power is computed for consistency. Default is 2. If None, then filter the complex STFT matrix
**kwargs – Keyword arguments for mel_triangular_filterbank()
- Returns:
Mel-spectrogram matrix (filter x time) and the array of center frequencies (only if filterbank is unspecified, otherwise None)
- Return type:
matrix, array
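The role of the power parameter can be sketched as follows: raise the STFT magnitude to the given power, apply the filterbank as a matrix product over the frequency axis, then take the inverse power. The helper name is hypothetical and the snippet only illustrates the description above, not the function's actual code.

```python
import numpy as np

def apply_filterbank_sketch(stft, filterbank, power=2.0):
    """Filter a (frequency x time) STFT with a (filter x frequency) bank."""
    if power is None:
        # No magnitude computation: filter the complex STFT directly
        return filterbank @ stft
    return (filterbank @ (np.abs(stft) ** power)) ** (1.0 / power)

# Tiny example: 5 frequency bins, 2 time frames, 2 filters
stft = np.array([[3 + 4j, 1j], [0, 0], [1, 2], [1, 2], [0, 0]])
filterbank = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.5, 0.5, 0.0]])
mel_spec = apply_filterbank_sketch(stft, filterbank)
```

With power=2 the filters combine energies, so a filter that averages two bins of equal magnitude returns that same magnitude.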