sample.psycho module

Classes and functions related to psychoacoustic models

class sample.psycho.GammatoneFilter(f: float = 1, n: int = 4, bandwidth: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None, t_c: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None, phi: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None, normalize: bool = False, a: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None)

Bases: object

Gammatone filter

Parameters:

f (float) – Center frequency. Default is 1
n (int) – Filter order. Default is 4
bandwidth (callable or float) – Filter bandwidth in Hz. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then the ERB (erb()) is used
t_c (callable or float) – Leading time in seconds. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then use the group-delay (non-causal filter)
phi (callable or float) – Phase in radians at time t=0. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then use a phase value coherent with the leading time
normalize (bool) – If True, then normalize the IR so that \(|k(t)|^2 = f_s\)
a (callable or float) – Scale parameter. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then do not rescale IR
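
The parameters above correspond to the terms of the textbook gammatone impulse response, g(t) = a t^(n-1) exp(-2 pi b t) cos(2 pi f t + phi), where b is the bandwidth. The following is a minimal standalone sketch of that formula. It does not call sample.psycho: the function name gammatone_ir is hypothetical, and the leading time t_c and the normalize option are left out, so the library's own IR may differ.

```python
import math


def gammatone_ir(t, f=1000.0, n=4, bandwidth=125.0, phi=0.0, a=1.0):
    """Textbook gammatone IR at time t (seconds); zero for t < 0.

    Hypothetical helper, not the library's implementation: the leading
    time t_c and the normalize option are deliberately left out.
    """
    if t < 0:
        return 0.0
    envelope = a * t ** (n - 1) * math.exp(-2 * math.pi * bandwidth * t)
    return envelope * math.cos(2 * math.pi * f * t + phi)


fs = 16000  # sample frequency in Hz (assumed for the example)
ir = [gammatone_ir(i / fs) for i in range(512)]
```

The order n shapes the attack of the envelope, while the bandwidth sets how fast the response decays.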

property a: float

Scale parameter

property bandwidth: float

Filter bandwidth in Hz

envelope(t: ndarray, out: Optional[float] = None, **kwargs) → ndarray

Envelope function for the IR

Parameters:

t (array) – Time axis
out (array) – Optional. Array to use for storing results
**kwargs – Keyword arguments for _envelope()

Returns:

The IR envelope function evaluated at t

Return type:

array

property group_delay: float

Group delay of the gammatone filter in seconds (accounting for the leading time)

ir(t: Optional[float] = None, fs: Optional[float] = None, analytic: bool = False, out: Optional[ndarray] = None, **kwargs) → ndarray

Filter IR (scaled)

Parameters:

t (array) – Time axis
fs (float) – Sample frequency
analytic (bool) – If True, use a complex exponential as oscillator, instead of a cosine
out (array) – Optional. Array to use for storing results
**kwargs – Keyword arguments for ir_size()

Returns:

The scaled IR evaluated at t

Return type:

array

ir_size(fs: float = 1, **kwargs) → int

Suggested IR size in samples, based on the t60

Parameters:

fs (float) – Sample frequency
**kwargs – Keyword arguments for t60()

Returns:

Suggested IR size

Return type:

int

property phi: float

Initial phase in radians

property raw_group_delay: float

Raw group delay of the gammatone filter in seconds (without accounting for the leading time)

t60(steps: int = 32, n_starts: int = 16, initial_range: Optional[float] = None, floor: float = -60, warn_th: Optional[float] = 0.001) → float

Numerically compute the t60 for the IR envelope, i.e. the time instant at which the IR envelope goes 60 dB below the envelope peak

Parameters:

steps (int) – Dichotomic search steps for t60 computation
n_starts (int) – Number of restarts for determining the initial search range before raising an exception
initial_range (float) – Width of the initial search range. In case the t60 is not in the range, the width is doubled n_starts times
floor (float) – Threshold for the t60 in decibel. Default is -60
warn_th (float) – If not None, then raise an exception if the amplitude at the found t60 value is not within warn_th dB from the target value (floor)

Returns:

The t60 value

Return type:

float
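
The search described above can be sketched as a range-doubling phase followed by a dichotomic (bisection) search on the envelope in dB. This is a standalone illustration under stated assumptions, not the library's code: the names envelope_db and t60_bisect are hypothetical, the envelope is the textbook t^(n-1) exp(-2 pi b t) with n > 1, and the initial range of 1/bandwidth is an arbitrary choice.

```python
import math


def envelope_db(t, n=4, bandwidth=125.0):
    """Gammatone envelope t^(n-1) exp(-2 pi b t) in dB relative to its peak."""
    t_peak = (n - 1) / (2 * math.pi * bandwidth)  # instant of the envelope peak
    log_env = (n - 1) * math.log(t) - 2 * math.pi * bandwidth * t
    log_peak = (n - 1) * math.log(t_peak) - 2 * math.pi * bandwidth * t_peak
    return 20 * (log_env - log_peak) / math.log(10)


def t60_bisect(n=4, bandwidth=125.0, floor=-60.0, steps=32, n_starts=16):
    """Find the time after the peak where the envelope crosses floor dB."""
    lo = (n - 1) / (2 * math.pi * bandwidth)  # start at the peak
    width = 1.0 / bandwidth  # assumed initial search range
    for _ in range(n_starts):
        if envelope_db(lo + width, n, bandwidth) <= floor:
            break
        width *= 2  # target not bracketed yet: double the range
    else:
        raise RuntimeError("could not bracket the t60")
    hi = lo + width
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if envelope_db(mid, n, bandwidth) > floor:
            lo = mid  # still above the floor: move right
        else:
            hi = mid  # at or below the floor: move left
    return 0.5 * (lo + hi)


t60 = t60_bisect()
```

Bisection is valid here because the envelope decreases monotonically after its peak, so the floor is crossed exactly once.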

property t_c: float

Leading time in seconds

wavefun(t: float, analytic: bool = False, out: Optional[ndarray] = None) → ndarray

Filter wave function for the IR (non-scaled)

Parameters:

t (array) – Time axis
analytic (bool) – If True, use a complex exponential as oscillator, instead of a cosine
out (array) – Optional. Array to use for storing results

Returns:

The wave function evaluated at t

Return type:

array

class sample.psycho.GammatoneFilterbank(filters: Optional[Iterable[GammatoneFilter]] = None, freqs: Optional[Sequence[float]] = None, n_filters: Optional[int] = None, flim: Tuple[float, float] = (20, 20000), freq_transform: Tuple[Callable[[float], float], Callable[[float], float]] = (hz2cams, cams2hz), **kwargs)

Bases: object

Bank of gammatone filters

Parameters:

filters (iterable of GammatoneFilter) – Filters that make up the bank. If None (default), then build filters using other arguments
freqs – The center frequencies of the gammatone filters. If None (default), then decide frequencies using other arguments
n_filters (int) – Number of gammatone filters. If None (default), then decide number of filters using other arguments
flim (float, float) – Limits for the frequency response of the gammatone filters
freq_transform – Pair of callables that implement transformations from and to Hertz, respectively. The center frequencies of the gammatone filters will be chosen linearly between flim[0] and flim[1] in the transformed space. Default is hz2cams(), cams2hz() for linear spacing on the ERB-rate scale
**kwargs – Keyword arguments for GammatoneFilter
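
As an illustration of how flim and freq_transform interact, the sketch below spaces center frequencies linearly in the transformed domain and maps them back to Hertz. It is a standalone example, not the library's code: the helper names are hypothetical, the transform pair is the linear-ERB variant of the ERB-rate scale (Glasberg and Moore, 1990) rather than the library's default, and placing the first and last centers exactly on flim is an assumption.

```python
import math


def hz2cams_linear(f):
    """ERB-rate scale, linear-ERB variant (assumed for this example)."""
    return 21.4 * math.log10(0.00437 * f + 1)


def cams2hz_linear(c):
    """Inverse of hz2cams_linear."""
    return (10 ** (c / 21.4) - 1) / 0.00437


def center_frequencies(n_filters, flim=(20.0, 20000.0),
                       freq_transform=(hz2cams_linear, cams2hz_linear)):
    """Center frequencies equally spaced in the transformed domain."""
    forward, inverse = freq_transform
    lo, hi = forward(flim[0]), forward(flim[1])
    step = (hi - lo) / (n_filters - 1)  # requires n_filters >= 2
    return [inverse(lo + i * step) for i in range(n_filters)]


freqs = center_frequencies(8)
```

Equal steps on the transformed axis give centers that are dense at low frequencies and sparse at high frequencies once mapped back to Hertz.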

class PrecomputedIRBank(parent: GammatoneFilterbank, fs: float, analytic: bool = False, **kwargs)

Bases: object

Precomputed IR bank for a GammatoneFilterbank

Parameters:

parent (GammatoneFilterbank) – Gammatone filterbank to render
fs (float) – Sample frequency
analytic (bool) – If True, then the IRs are complex-valued. Convolving the complex IRs is faster than convolving the real IRs and then computing the analytic signal of the cochleagram. The resulting cochleagram will be complex. The real part will be the ordinary cochleagram. The absolute value will be the AM envelope of the cochleagram
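
The relation between the complex output, its real part, and its absolute value can be checked on a single band. The sketch below is a standalone illustration, not the library's code: analytic_gammatone and convolve are hypothetical helpers, the IR is the textbook gammatone with a complex exponential oscillator, and the convolution is a plain direct sum.

```python
import cmath
import math


def analytic_gammatone(t, f=1000.0, n=4, bandwidth=125.0):
    """Complex gammatone IR: the cosine is replaced by exp(j 2 pi f t)."""
    envelope = t ** (n - 1) * math.exp(-2 * math.pi * bandwidth * t)
    return envelope * cmath.exp(2j * math.pi * f * t)


def convolve(x, h):
    """Direct (full) convolution of two sequences."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


fs = 8000  # sample frequency in Hz (assumed for the example)
h = [analytic_gammatone(i / fs) for i in range(128)]
x = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(256)]
y = convolve(x, h)                 # complex band signal
band = [v.real for v in y]         # same as filtering with the real IR
am_envelope = [abs(v) for v in y]  # amplitude envelope of the band
```

Because convolution is linear and the input is real, the real part of the complex result equals the output of the real-valued IR, so a single complex convolution yields both the band signal and its envelope.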

convolve(x: ndarray, method: Optional[str] = None, stride: Optional[int] = None)

Convolve the IRs and organize the outputs in an aligned matrix

Parameters:

x (array) – Input signal
method (str) – Convolution method (either "auto", "fft", "direct", or "overlap-add")
stride (int) – Time-step for output signal. Can’t be used in conjunction with method

Returns:

Cochleagram, will be complex if the IRs are analytic

Return type:

matrix

convolve(x: ndarray, fs: float, analytic: Optional[str] = None, method: Optional[str] = None, **kwargs)

Filter the input with the filterbank and produce a cochleagram

Parameters:

x (array) – Input signal
fs (float) – Sample frequency
analytic (str) –
Compute the analytic signal of the cochleagram:
- if "input", then compute the analytic signal of the input (fast, accurate in the middle, bad boundary conditions)
- if "ir" (suggested), then compute the analytic signal of the IRs (fast, tends to underestimate amplitude, good boundary conditions)
- if "output", then compute the analytic signal of the output (slowest, most accurate)
postprocess (callable) – If not None, then apply this function to the cochleagram matrix. Default is hwr(), if the cochleagram is real, otherwise it is None
method (str) – Convolution method (either "auto", "fft", "direct", or "overlap-add")
stride (int) – Time-step for output signal. Can’t be used in conjunction with method

Returns:

Cochleagram

Return type:

matrix

precompute(fs: float, analytic: bool = False) → PrecomputedIRBank

Precompute IRs for this filterbank

Parameters:

fs (float) – Sample frequency
analytic (bool) – If True, compute a complex IR bank

Returns:

Precomputed IR bank

Return type:

PrecomputedIRBank

sample.psycho.a_weighting(f: float, db: bool = True, out: Optional[ndarray] = None) → float

A-weighting weights for input frequencies, as defined in IEC 61672-1, “Electroacoustics - Sound level meters - Part 1: Specifications” (2013)

Parameters:

f (array) – Frequency values in Hertz
db (bool) – If True (default), return the gain to apply in dB, referenced to 1 kHz (a_weighting(1000) = 0)
out (array) – Optional. Array to use for storing results

Returns:

A-weights

Return type:

array
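
For reference, the A-weighting curve is commonly written as a rational function of frequency with corner frequencies at 20.6, 107.7, 737.9 and 12194 Hz. The sketch below evaluates that closed form for a scalar frequency and normalizes it to 0 dB at 1 kHz. It is a standalone illustration: the name a_weighting_db is hypothetical, and the library may compute or normalize the curve differently.

```python
import math


def a_weighting_db(f):
    """A-weighting gain in dB for f > 0 Hz, referenced to 0 dB at 1 kHz."""
    def response(freq):
        f2 = freq * freq
        num = 12194.0 ** 2 * f2 * f2
        den = ((f2 + 20.6 ** 2)
               * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
               * (f2 + 12194.0 ** 2))
        return num / den
    # Dividing by the response at 1 kHz fixes the reference gain
    return 20 * math.log10(response(f) / response(1000.0))


gain_100 = a_weighting_db(100.0)  # strongly attenuated low frequency
```

Normalizing by the response at 1 kHz plays the same role as the constant offset of about 2 dB that appears in tabulated versions of the formula.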

sample.psycho.bark2hz(b, out: Optional[ndarray] = None, *, mode: str = 'traunmuller')

Convert Bark to Hertz

Parameters:

b – Frequency value(s) in Bark
mode (str) – Name of the Bark definition (traunmuller or wang)
out (array) – Optional. Array to use for storing results

Returns:

Frequency value(s) in Hertz

sample.psycho.cams2hz(c: float, out: Optional[ndarray] = None, *, degree: str = 'quadratic') → float

Convert Cams (ERB-rate scale) to Hertz. The quadratic ERB definition is used by default

Parameters:

c (array) – Frequency value(s) in Cams
degree (str) – Name of the ERB definition (linear, quadratic)
out (array) – Optional. Array to use for storing results

Returns:

Frequency value(s) in Hz

Return type:

array

sample.psycho.cochleagram(x: Sequence[float], fs: Optional[float] = None, filterbank: Optional[Union[GammatoneFilterbank, PrecomputedIRBank]] = None, analytic: Optional[str] = None, method: Optional[str] = None, stride: Optional[int] = None, **kwargs)

Compute the cochleagram for the signal

Parameters:

x (array) – Array of audio samples
fs (float) – Sampling frequency
filterbank (GammatoneFilterbank) – Filterbank object, or precomputed IRs. If unspecified, it will be specified using **kwargs
postprocessing (callable) – If not None, then apply this function to the cochleagram matrix. Default is hwr(), if the cochleagram is real, otherwise it is None
analytic (str) –
Compute the analytic signal of the cochleagram:
- if "input", then compute the analytic signal of the input (fast, accurate in the middle, bad boundary conditions)
- if "ir" (suggested), then compute the analytic signal of the IRs (fast, tends to underestimate amplitude, good boundary conditions)
- if "output", then compute the analytic signal of the output (slowest, most accurate)
method (str) – Convolution method (either "auto", "fft", "direct", or "overlap-add")
stride (int) – Time-step for output signal. Can’t be used in conjunction with method
**kwargs – Keyword arguments for GammatoneFilterbank

Returns:

Cochleagram matrix (filter x time) and the array of center frequencies

Return type:

matrix, array

sample.psycho.erb(f: float, out: Optional[ndarray] = None, *, degree: str = 'quadratic') → float

Definition of equivalent rectangular bandwidth by Moore and Glasberg, “Suggested formulae for calculating auditory-filter bandwidths and excitation patterns”

Parameters:

f (array) – Frequency value(s) in Hertz
degree (str) – Name of the ERB definition (linear, quadratic)
out (array) – Optional. Array to use for storing results

Returns:

Equivalent rectangular bandwidths at the given frequencies

Return type:

array
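
The two degree options plausibly correspond to the two widely used ERB formulas: the quadratic fit from the Moore and Glasberg paper cited above, and the later linear fit by Glasberg and Moore (1990). The sketch below writes both for a frequency in Hertz. It is a standalone illustration with hypothetical function names, and the exact coefficients used by the library are an assumption.

```python
def erb_quadratic(f):
    """Quadratic ERB fit (Moore and Glasberg, 1983); f in Hz, result in Hz."""
    khz = f / 1000.0
    return 6.23 * khz ** 2 + 93.39 * khz + 28.52


def erb_linear(f):
    """Linear ERB fit (Glasberg and Moore, 1990); f in Hz, result in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


erb_q = erb_quadratic(1000.0)
erb_l = erb_linear(1000.0)
```

At 1 kHz the two fits differ by only a few Hertz, and they diverge more noticeably towards the upper end of the audible range.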

sample.psycho.hwr(a: ndarray, th: float = 0, out: Optional[ndarray] = None)

Half-wave rectification

Parameters:

a (array) – Input signal
th (float) – Threshold. Default is 0
out (array) – Optional. Array to use for storing results

Returns:

Half-wave rectified copy of input signal

Return type:

array
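
With the default threshold of 0, half-wave rectification keeps the positive samples and zeroes the negative ones. The sketch below shows only that default case on a plain list. The name half_wave_rectify is hypothetical, and the th parameter is left out because its exact semantics are not stated here.

```python
def half_wave_rectify(signal):
    """Return a copy of signal with negative samples replaced by zero."""
    return [max(v, 0.0) for v in signal]


rectified = half_wave_rectify([0.5, -0.25, 1.0, -1.0])
```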

sample.psycho.hz2bark(f, out: Optional[ndarray] = None, *, mode: str = 'traunmuller')

Convert Hertz to Bark

Parameters:

f – Frequency value(s) in Hertz
mode (str) – Name of the Bark definition (zwicker, traunmuller, or wang)
out (array) – Optional. Array to use for storing results

Returns:

Frequency value(s) in Bark
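
The three mode names match three well-known Bark approximations. The sketch below writes them for scalar inputs, together with the closed-form inverses of the Traunmüller and Wang variants; the Zwicker variant has no closed-form inverse. This is a standalone illustration: the function names are hypothetical, the low- and high-end corrections sometimes applied to the Traunmüller formula are omitted, and the library's formulas may differ.

```python
import math


def hz2bark_traunmuller(f):
    """Traunmüller approximation (without end corrections)."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark2hz_traunmuller(z):
    """Closed-form inverse of hz2bark_traunmuller."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def hz2bark_wang(f):
    """Wang, Sekey and Gersho approximation."""
    return 6.0 * math.asinh(f / 600.0)


def bark2hz_wang(z):
    """Closed-form inverse of hz2bark_wang."""
    return 600.0 * math.sinh(z / 6.0)


def hz2bark_zwicker(f):
    """Zwicker and Terhardt approximation (no closed-form inverse)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


z_1k = hz2bark_traunmuller(1000.0)
```

All three variants place 1 kHz near 8.5 Bark and differ mainly at the extremes of the frequency range.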

sample.psycho.hz2cams(f: float, out: Optional[ndarray] = None, *, degree: str = 'quadratic') → float

Convert Hertz to Cams (ERB-rate scale). The quadratic ERB definition is used by default

Parameters:

f (array) – Frequency value(s) in Hertz
degree (str) – Name of the ERB definition (linear, quadratic)
out (array) – Optional. Array to use for storing results

Returns:

Frequency value(s) in Cams

Return type:

array
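
The ERB-rate scale associated with the quadratic ERB fit of Moore and Glasberg (1983) has a closed form and a closed-form inverse. The sketch below writes both for scalar inputs. It is a standalone illustration: the function names are hypothetical, and whether the library uses exactly these coefficients is an assumption.

```python
import math


def hz2cams_quadratic(f):
    """ERB-rate (Cams) for f in Hz, quadratic-ERB variant."""
    khz = f / 1000.0
    return 11.17 * math.log((khz + 0.312) / (khz + 14.675)) + 43.0


def cams2hz_quadratic(c):
    """Inverse mapping, from Cams back to Hz."""
    y = math.exp((c - 43.0) / 11.17)
    return 1000.0 * (14.675 * y - 0.312) / (1.0 - y)


c_1k = hz2cams_quadratic(1000.0)
```

The inverse follows from solving y = (f + 0.312) / (f + 14.675) for f, with f expressed in kHz.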

sample.psycho.hz2mel(f, out: Optional[ndarray] = None, *, mode: str = 'default')

Convert Hertz to Mel

Parameters:

f – Frequency value(s) in Hertz
mode (str) – Name of the Mel definition (default, fant)
out (array) – Optional. Array to use for storing results

Returns:

Frequency value(s) in Mel
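
Two common Mel definitions are the O'Shaughnessy formula and the Fant formula. The sketch below writes both, with their inverses, for scalar inputs. It is a standalone illustration: the function names are hypothetical, and mapping the library's default mode to the O'Shaughnessy formula is an assumption.

```python
import math


def hz2mel_oshaughnessy(f):
    """O'Shaughnessy formula (assumed here to be the 'default' mode)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel2hz_oshaughnessy(m):
    """Inverse of hz2mel_oshaughnessy."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz2mel_fant(f):
    """Fant formula: 1000 Hz maps to 1000 Mel."""
    return 1000.0 * math.log2(1.0 + f / 1000.0)


def mel2hz_fant(m):
    """Inverse of hz2mel_fant."""
    return 1000.0 * (2.0 ** (m / 1000.0) - 1.0)


mel_1k = hz2mel_oshaughnessy(1000.0)
```

Both definitions are close to linear below 1 kHz and logarithmic above it.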

sample.psycho.mel2hz(m, out: Optional[ndarray] = None, *, mode: str = 'default')

Convert Mel to Hertz

Parameters:

m – Frequency value(s) in Mel
mode (str) – Name of the Mel definition (default, fant)
out (array) – Optional. Array to use for storing results

Returns:

Frequency value(s) in Hertz

sample.psycho.mel_spectrogram(x: Sequence[float], stft_kws: Optional[Dict[str, Any]] = None, **kwargs)

Compute the mel-spectrogram of a signal from its STFT

Parameters:

x (array) – Array of audio samples
stft_kws – Keyword arguments for scipy.signal.stft()
**kwargs – Keyword arguments for stft2mel()

Returns:

The array of center frequencies, the array of time-steps, and the Mel-spectrogram matrix (filter x time)

Return type:

array, array, matrix

sample.psycho.mel_triangular_filterbank(freqs: Sequence[float], n_filters: Optional[int] = None, bandwidth: Optional[Callable[[float], float]] = None, flim: Optional[Sequence[float]] = None, freq_transform: Tuple[Callable[[float], float], Callable[[float], float]] = (hz2mel, mel2hz))

Compute a frequency-domain triangular filterbank. Specify at least one of n_filters, bandwidth, or flim

Parameters:

freqs (array) – Frequency axis for frequency-domain filters
n_filters (int) – Number of filters. If None (default), infer from other arguments
bandwidth (callable) – Bandwidth function that maps a center frequency to the -3 dB bandwidth of the filter at that frequency. If None (default), then one filter’s -inf dB cutoff frequencies will be the center frequencies of the previous and the next filter (50% overlapping filters). In this case, the frequency limits flim include the lower cutoff frequency of the first filter and the higher cutoff frequency of the last filter. If a function is provided, then the frequency limits flim are only the center frequencies of the filters
flim (array) – Corner/center frequencies for the filters. If n_filters and bandwidth are both None, flim must have 2 more elements than the number of desired filters. If None, then it will be set to the first and last elements of freqs
freq_transform – Pair of callables that implement transformations from and to Hertz, respectively. If n_filters is not None, the center frequencies of the triangular filters will be chosen linearly between freqs[0] and freqs[1] in the transformed space. Default is hz2mel(), mel2hz() for linear spacing on the Mel scale

Returns:

The triangular filterbank matrix (filter x frequency) and the array of center frequencies

Return type:

matrix, array
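
The 50% overlapping layout described for bandwidth=None can be sketched from a list of corner frequencies: each filter rises from the previous corner to its center and falls to the next corner. The code below is a standalone illustration, not the library's implementation: the name triangular_filterbank is hypothetical, the corner frequencies are given explicitly, and the peaks are left at 1 without any area normalization.

```python
def triangular_filterbank(freqs, corners):
    """Rows are triangles peaking at corners[1:-1], zero at the neighbours."""
    bank = []
    for lo, mid, hi in zip(corners, corners[1:], corners[2:]):
        row = []
        for f in freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank, list(corners[1:-1])


freqs = [i * 31.25 for i in range(257)]  # frequency axis, 0 to 8000 Hz
corners = [0.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
bank, centers = triangular_filterbank(freqs, corners)
```

Six corner frequencies yield four filters, which matches the rule that the corner list has 2 more elements than the number of filters.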

sample.psycho.stft2mel(stft: Sequence[Sequence[complex]], freqs: Sequence[float], filterbank: Optional[Sequence[Sequence[float]]] = None, power: Optional[float] = 2, **kwargs)

Compute the mel-spectrogram from a STFT

Parameters:

stft (matrix) – STFT matrix (frequency x time)
freqs (array) – Frequency axis for stft
filterbank (matrix) – Filterbank matrix. If unspecified, it will be computed with mel_triangular_filterbank()
power (float) – Power for magnitude computation before frequency-domain filtering. After filtering, the inverse power is applied for consistency. Default is 2. If None, then filter the complex stft matrix
**kwargs – Keyword arguments for mel_triangular_filterbank()

Returns:

Mel-spectrogram matrix (filter x time) and the array of center frequencies (only if filterbank is unspecified, otherwise None)

Return type:

matrix, array
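
The role of power can be sketched as three steps: raise the STFT magnitudes to power, apply the filterbank along the frequency axis, then apply the inverse power. The code below is a standalone illustration on nested lists, not the library's implementation: the name stft_to_bands is hypothetical, and the power=None case (filtering the complex matrix) is not covered.

```python
def stft_to_bands(stft, filterbank, power=2.0):
    """Apply a (filter x frequency) bank to a (frequency x time) STFT."""
    n_frames = len(stft[0])
    out = []
    for weights in filterbank:
        row = []
        for t in range(n_frames):
            # Weighted sum of |STFT|^power over the frequency bins
            acc = sum(w * abs(stft[k][t]) ** power
                      for k, w in enumerate(weights))
            row.append(acc ** (1.0 / power))  # back to magnitude units
        out.append(row)
    return out


stft = [[3 + 4j, 0j], [0j, 1j]]        # 2 frequency bins x 2 frames
filterbank = [[1.0, 0.0], [0.5, 0.5]]  # 2 filters x 2 frequency bins
bands = stft_to_bands(stft, filterbank)
```

With power=2 the filters sum energies rather than amplitudes, and the final square root brings the result back to the scale of a magnitude spectrogram.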