sample.psycho module

Classes and functions related to psychoacoustic models

class sample.psycho.GammatoneFilter(f: float = 1, n: int = 4, bandwidth: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None, t_c: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None, phi: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None, normalize: bool = False, a: Optional[Union[float, Callable[[GammatoneFilter], float]]] = None)

Bases: object

Gammatone filter

  • f (float) – Center frequency. Default is 1

  • n (int) – Filter order. Default is 4

  • bandwidth (callable or float) – Filter bandwidth in Hz. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then the ERB (erb()) is used

  • t_c (callable or float) – Leading time in seconds. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then use the group-delay (non-causal filter)

  • phi (callable or float) – Phase in radians at time t=0. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then use a phase value coherent with the leading time

  • normalize (bool) – If True, then normalize the IR so that \(|k(t)|^2 = f_s\)

  • a (callable or float) – Scale parameter. If callable, it must accept a single argument of type GammatoneFilter. If None (default), then do not rescale IR
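As a rough illustration of how these parameters combine, the sketch below evaluates the textbook gammatone impulse response g(t) = a t^(n-1) exp(-2 pi b t) cos(2 pi f t + phi), where b is the bandwidth. This is the standard form from the literature and not necessarily the exact expression used by GammatoneFilter: the leading time t_c and the normalization are left out, and the name gammatone_sample is hypothetical.

```python
import cmath
import math


def gammatone_sample(t, f=1000.0, n=4, bandwidth=125.0, phi=0.0, a=1.0,
                     analytic=False):
    """Textbook gammatone IR at time t (seconds); zero for t < 0."""
    if t < 0:
        return 0j if analytic else 0.0
    # Gamma-shaped envelope: a * t^(n-1) * exp(-2*pi*b*t)
    env = a * t ** (n - 1) * math.exp(-2.0 * math.pi * bandwidth * t)
    phase = 2.0 * math.pi * f * t + phi
    if analytic:
        # Complex exponential oscillator
        return env * cmath.exp(1j * phase)
    # Cosine oscillator
    return env * math.cos(phase)


t = 0.004
real_ir = gammatone_sample(t)
complex_ir = gammatone_sample(t, analytic=True)
envelope = t ** 3 * math.exp(-2.0 * math.pi * 125.0 * t)
```

In this sketch the real part of the complex-valued sample equals the real-valued sample, and its absolute value equals the envelope.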

property a: float

Scale parameter

property bandwidth: float

Filter bandwidth in Hz

envelope(t: ndarray, out: Optional[float] = None, **kwargs) ndarray

Envelope function for the IR

  • t (array) – Time axis

  • out (array) – Optional. Array to use for storing results

  • **kwargs – Keyword arguments for _envelope()


The IR envelope function evaluated at t

Return type:

ndarray

property group_delay: float

Group delay of the gammatone filter in seconds (accounting for the leading time)

ir(t: Optional[float] = None, fs: Optional[float] = None, analytic: bool = False, out: Optional[ndarray] = None, **kwargs) ndarray

Filter IR (scaled)

  • t (array) – Time axis

  • fs (float) – Sample frequency

  • analytic (bool) – If True, use a complex exponential as oscillator, instead of a cosine

  • out (array) – Optional. Array to use for storing results

  • **kwargs – Keyword arguments for ir_size()


The wave function evaluated at t

Return type:

ndarray

ir_size(fs: float = 1, **kwargs) int

Suggested IR size in samples, based on the t60

  • fs (float) – Sample frequency

  • **kwargs – Keyword arguments for t60()


Suggested IR size

Return type:

int

property phi: float

Initial phase in radians

property raw_group_delay: float

Raw group delay of the gammatone filter in seconds (without accounting for the leading time)

t60(steps: int = 32, n_starts: int = 16, initial_range: Optional[float] = None, floor: float = -60, warn_th: Optional[float] = 0.001) float

Numerically compute the t60 for the IR envelope, i.e. the time instant at which the IR envelope goes 60 dB below the envelope peak

  • steps (int) – Dichotomic search steps for t60 computation

  • n_starts (int) – Number of restarts for determining the initial search range before raising an exception

  • initial_range (float) – Width of the initial search range. In case the t60 is not in the range, the width is doubled n_starts times

  • floor (float) – Threshold for the t60 in decibel. Default is -60

  • warn_th (float) – If not None, then raise an exception if the amplitude at the found t60 value is not within warn_th dB from the target value (floor)
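The search described by these parameters can be sketched as follows. This is only an illustration under stated assumptions: it uses the textbook envelope t^(n-1) exp(-2 pi b t) with n > 1, measures time from the start of the IR, and the names envelope_db and t60_bisect are hypothetical. The library's own procedure may differ in its details.

```python
import math


def envelope_db(t, n=4, bandwidth=125.0):
    """Envelope level at time t, in dB relative to the envelope peak."""
    t_peak = (n - 1) / (2.0 * math.pi * bandwidth)

    def log_env(x):
        # Natural log of t^(n-1) * exp(-2*pi*b*t)
        return (n - 1) * math.log(x) - 2.0 * math.pi * bandwidth * x

    return 20.0 / math.log(10.0) * (log_env(t) - log_env(t_peak))


def t60_bisect(n=4, bandwidth=125.0, floor=-60.0, steps=32, n_starts=16,
               initial_range=None):
    """Time at which the envelope falls to `floor` dB below its peak."""
    t_peak = (n - 1) / (2.0 * math.pi * bandwidth)
    width = t_peak if initial_range is None else initial_range
    lo = t_peak
    # Double the search range until it brackets the threshold crossing
    for _ in range(n_starts):
        hi = lo + width
        if envelope_db(hi, n, bandwidth) <= floor:
            break
        width *= 2.0
    else:
        raise RuntimeError("t60 not found in the search range")
    # Dichotomic search: the envelope decreases monotonically after its peak
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if envelope_db(mid, n, bandwidth) > floor:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t60 = t60_bisect()
```

Each dichotomic step halves the bracketing interval, so 32 steps shrink it by a factor of 2^32.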


The t60 value

Return type:

float

property t_c: float

Leading time in seconds

wavefun(t: float, analytic: bool = False, out: Optional[ndarray] = None) ndarray

Filter wave function for the IR (non-scaled)

  • t (array) – Time axis

  • analytic (bool) – If True, use a complex exponential as oscillator, instead of a cosine

  • out (array) – Optional. Array to use for storing results


The wave function evaluated at t

Return type:

ndarray

class sample.psycho.GammatoneFilterbank(filters: Optional[Iterable[GammatoneFilter]] = None, freqs: Optional[Sequence[float]] = None, n_filters: Optional[int] = None, flim: Tuple[float, float] = (20, 20000), freq_transform: Tuple[Callable[[float], float], Callable[[float], float]] = (hz2cams, cams2hz), **kwargs)

Bases: object

Bank of gammatone filters

  • filters (iterable of GammatoneFilter) – Filters that make up the bank. If None (default), then build filters using other arguments

  • freqs – The center frequencies of the gammatone filters. If None (default), then decide frequencies using other arguments

  • n_filters (int) – Number of gammatone filters. If None (default), then decide number of filters using other arguments

  • flim (float, float) – Limits for the frequency response of the gammatone filters

  • freq_transform – Pair of callables that convert from Hertz to the transformed scale and back to Hertz, respectively. The center frequencies of the gammatone filters will be chosen linearly between flim[0] and flim[1] in the transformed space. Default is hz2cams(), cams2hz() for linear spacing on the ERB-rate scale

  • **kwargs – Keyword arguments for GammatoneFilter
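The spacing rule for the center frequencies can be sketched as below. The sketch assumes that both limits are included as center frequencies and that at least two filters are requested, and it uses the linear ERB-rate formula of Glasberg and Moore (1990) as a stand-in transform; the library default is the quadratic definition. All function names are hypothetical.

```python
import math


def hz2cams_linear(f):
    """Hertz to Cams, linear ERB definition (Glasberg and Moore, 1990)."""
    return 21.4 * math.log10(4.37e-3 * f + 1.0)


def cams2hz_linear(c):
    """Cams to Hertz, inverse of hz2cams_linear."""
    return (10.0 ** (c / 21.4) - 1.0) / 4.37e-3


def center_frequencies(n_filters, flim=(20.0, 20000.0),
                       freq_transform=(hz2cams_linear, cams2hz_linear)):
    """Frequencies equally spaced in the transformed space (n_filters >= 2)."""
    forward, backward = freq_transform
    lo, hi = forward(flim[0]), forward(flim[1])
    step = (hi - lo) / (n_filters - 1)
    # Space the points linearly in the transformed domain, then map back to Hz
    return [backward(lo + i * step) for i in range(n_filters)]


freqs = center_frequencies(8)
```

The resulting frequencies are evenly spaced in Cams, so their spacing in Hertz grows with frequency.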

class PrecomputedIRBank(parent: GammatoneFilterbank, fs: float, analytic: bool = False, **kwargs)

Bases: object

Precomputed IR bank for a GammatoneFilterbank

  • parent (GammatoneFilterbank) – Gammatone filterbank to render

  • fs (float) – Sample frequency

  • analytic (bool) – If True, then the IRs are complex-valued. Convolving the complex IRs is faster than convolving the real IRs and then computing the analytic signal of the cochleagram. The resulting cochleagram will be complex. The real part will be the ordinary cochleagram. The absolute value will be the AM envelope of the cochleagram

convolve(x: ndarray, method: Optional[str] = None, stride: Optional[int] = None)

Convolve the IRs and organize the outputs in an aligned matrix

  • x (array) – Input signal

  • method (str) – Convolution method (either "auto", "fft", "direct", or "overlap-add")

  • stride (int) – Time-step for output signal. Can’t be used in conjunction with method


Cochleagram, will be complex if the IRs are analytic

Return type:


convolve(x: ndarray, fs: float, analytic: Optional[str] = None, method: Optional[str] = None, **kwargs)

Filter the input with the filterbank and produce a cochleagram

  • x (array) – Input signal

  • fs (float) – Sample frequency

  • analytic (str) –

    Compute the analytic signal of the cochleagram:

    • if "input", then compute the analytic signal of the input (fast, accurate in the middle, bad boundary conditions)

    • if "ir" (suggested), then compute the analytic signal of the IRs (fast, tends to underestimate amplitude, good boundary conditions)

    • if "output", then compute the analytic signal of the output (slowest, most accurate)

  • postprocess (callable) – If not None, then apply this function to the cochleagram matrix. Default is hwr(), if the cochleagram is real, otherwise it is None

  • method (str) – Convolution method (either "auto", "fft", "direct", or "overlap-add")

  • stride (int) – Time-step for output signal. Can’t be used in conjunction with method



Return type:


precompute(fs: float, analytic: bool = False) PrecomputedIRBank

Precompute IRs for this filterbank

  • fs (float) – Sample frequency

  • analytic (bool) – If True, compute a complex IR bank


Precomputed IR bank

Return type:

PrecomputedIRBank

sample.psycho.a_weighting(f: float, db: bool = True, out: Optional[ndarray] = None) float

A-weighting weights for input frequencies, as specified in “Electroacoustics - Sound level meters - Part 1: Specifications” (2013)

  • f (array) – Frequency values in Hertz

  • db (bool) – If True (default), return the gain to apply in dB with reference at 1 kHz (a_weighting(1000) = 0)

  • out (array) – Optional. Array to use for storing results
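For reference, the A-weighting curve is commonly written in closed form with four corner frequencies. The scalar sketch below uses that commonly published expression and normalizes by the response at 1 kHz, so that the gain at 1000 Hz is exactly 0 dB. The function names are hypothetical, and returning a linear gain for db=False is an assumption about the library's behaviour.

```python
import math


def _ra(f):
    """Unnormalized A-weighting magnitude response at frequency f (Hz)."""
    f2 = f * f
    num = 12194.217 ** 2 * f2 * f2
    den = ((f2 + 20.598997 ** 2)
           * math.sqrt((f2 + 107.65265 ** 2) * (f2 + 737.86223 ** 2))
           * (f2 + 12194.217 ** 2))
    return num / den


def a_weighting_sketch(f, db=True):
    """A-weighting gain at f, referenced to 1 kHz."""
    gain = _ra(f) / _ra(1000.0)
    # Assumption: db=False yields the linear gain instead of decibels
    return 20.0 * math.log10(gain) if db else gain


gain_100hz = a_weighting_sketch(100.0)
```

With these constants, the gain at 100 Hz comes out close to -19.1 dB.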



Return type:

float

sample.psycho.bark2hz(b, out: Optional[ndarray] = None, *, mode: str = 'traunmuller')

Convert Bark to Hertz

  • b – Frequency value(s) in Bark

  • mode (str) – Name of the Bark definition (traunmuller, or wang)

  • out (array) – Optional. Array to use for storing results


Frequency value(s) in Hertz

sample.psycho.cams2hz(c: float, out: Optional[ndarray] = None, *, degree: str = 'quadratic') float

Convert Cams to Hertz (inverse of the ERB-rate scale)

  • c (array) – Frequency value(s) in Cams

  • degree (str) – Name of the ERB definition (linear, quadratic)

  • out (array) – Optional. Array to use for storing results


Frequency value(s) in Hz

Return type:

float

sample.psycho.cochleagram(x: Sequence[float], fs: Optional[float] = None, filterbank: Optional[Union[GammatoneFilterbank, PrecomputedIRBank]] = None, analytic: Optional[str] = None, method: Optional[str] = None, stride: Optional[int] = None, **kwargs)

Compute the cochleagram for the signal

  • x (array) – Array of audio samples

  • fs (float) – Sampling frequency

  • filterbank (GammatoneFilterbank) – Filterbank object, or precomputed IRs. If unspecified, it will be specified using **kwargs

  • postprocessing (callable) – If not None, then apply this function to the cochleagram matrix. Default is hwr(), if the cochleagram is real, otherwise it is None

  • analytic (str) –

    Compute the analytic signal of the cochleagram:

    • if "input", then compute the analytic signal of the input (fast, accurate in the middle, bad boundary conditions)

    • if "ir" (suggested), then compute the analytic signal of the IRs (fast, tends to underestimate amplitude, good boundary conditions)

    • if "output", then compute the analytic signal of the output (slowest, most accurate)

  • method (str) – Convolution method (either "auto", "fft", "direct", or "overlap-add")

  • stride (int) – Time-step for output signal. Can’t be used in conjunction with method

  • **kwargs – Keyword arguments for GammatoneFilterbank


Cochleagram matrix (filter x time) and the array of center frequencies

Return type:

matrix, array
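To make the processing chain concrete, here is a dependency-free sketch of a real-valued cochleagram: one gammatone IR per center frequency, direct convolution with the input, then half-wave rectification. It is not the library's implementation. The IRs use the textbook gammatone form with a bandwidth of 1.019 times the linear ERB (a common choice in the literature), they are neither normalized nor delay-compensated, and all names are hypothetical.

```python
import math


def erb_linear(f):
    """Linear ERB definition (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37e-3 * f + 1.0)


def gammatone_ir(fc, fs, n=4, duration=0.05):
    """Sampled textbook gammatone IR centered at fc."""
    b = 1.019 * erb_linear(fc)  # assumed bandwidth rule
    return [
        (k / fs) ** (n - 1)
        * math.exp(-2.0 * math.pi * b * k / fs)
        * math.cos(2.0 * math.pi * fc * k / fs)
        for k in range(int(duration * fs))
    ]


def convolve(x, h):
    """Direct (time-domain) linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def cochleagram_sketch(x, fs, freqs):
    """Return a (filter x time) matrix and the center frequencies."""
    rows = []
    for fc in freqs:
        y = convolve(x, gammatone_ir(fc, fs))[:len(x)]
        # Half-wave rectification of the real-valued filter output
        rows.append([max(v, 0.0) for v in y])
    return rows, list(freqs)


fs = 8000.0
x = [math.sin(2.0 * math.pi * 1000.0 * k / fs) for k in range(800)]
coch, centers = cochleagram_sketch(x, fs, [500.0, 1000.0, 2000.0])
energies = [sum(v * v for v in row) for row in coch]
```

For the 1 kHz test tone, the channel centered at 1 kHz carries the most energy.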

sample.psycho.erb(f: float, out: Optional[ndarray] = None, *, degree: str = 'quadratic') float

Definition of equivalent rectangular bandwidth by Moore and Glasberg, “Suggested formulae for calculating auditory-filter bandwidths and excitation patterns”

  • f (array) – Frequency value(s) in Hertz

  • degree (str) – Name of the ERB definition (linear, quadratic)

  • out (array) – Optional. Array to use for storing results
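The two definitions named by degree are usually given in the literature as follows. The scalar sketch below uses the published coefficients: the quadratic form is from the Moore and Glasberg paper cited above, and the linear form is from Glasberg and Moore (1990). Whether erb() uses exactly these coefficients is an assumption, and the function names are hypothetical.

```python
def erb_linear(f):
    """Linear ERB in Hz for a frequency f in Hz."""
    return 24.7 * (4.37e-3 * f + 1.0)


def erb_quadratic(f):
    """Quadratic ERB in Hz for a frequency f in Hz."""
    khz = f / 1000.0
    return 6.23 * khz ** 2 + 93.39 * khz + 28.52


erb_lin_1k = erb_linear(1000.0)
erb_quad_1k = erb_quadratic(1000.0)
```

At 1 kHz the two definitions give about 132.6 Hz and 128.1 Hz, respectively.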


Equivalent rectangular bandwidths at the given frequencies

Return type:

float

sample.psycho.hwr(a: ndarray, th: float = 0, out: Optional[ndarray] = None)

Half-wave rectification

  • a (array) – Input signal

  • th (float) – Threshold. Default is 0

  • out (array) – Optional. Array to use for storing results
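A minimal list-based sketch of the operation, assuming that samples below the threshold th are replaced by th (the library works on arrays and may treat the threshold differently). The name hwr_sketch is hypothetical.

```python
def hwr_sketch(a, th=0.0):
    """Return a rectified copy of a; values below th are set to th."""
    return [max(v, th) for v in a]


signal = [-1.0, 0.5, 2.0, -0.25]
rectified = hwr_sketch(signal)
```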


Half-wave rectified copy of input signal

Return type:


sample.psycho.hz2bark(f, out: Optional[ndarray] = None, *, mode: str = 'traunmuller')

Convert Hertz to Bark

  • f – Frequency value(s) in Hertz

  • mode (str) – Name of the Bark definition (zwicker, traunmuller, or wang)

  • out (array) – Optional. Array to use for storing results


Frequency value(s) in Bark
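The three mode names match well-known Bark formulas from the literature. The scalar sketch below uses those published expressions; whether the library uses them verbatim (for example, with Traunmüller's corrections at the ends of the scale, omitted here) is an assumption, and the function names are hypothetical. The Zwicker formula has no closed-form inverse, which may be why bark2hz() lists only traunmuller and wang.

```python
import math


def hz2bark_sketch(f, mode="traunmuller"):
    """Hertz to Bark using common published formulas."""
    if mode == "traunmuller":
        return 26.81 * f / (1960.0 + f) - 0.53
    if mode == "wang":
        return 6.0 * math.asinh(f / 600.0)
    if mode == "zwicker":
        return (13.0 * math.atan(7.6e-4 * f)
                + 3.5 * math.atan((f / 7500.0) ** 2))
    raise ValueError(f"unknown mode: {mode}")


def bark2hz_sketch(b, mode="traunmuller"):
    """Bark to Hertz, for the modes with a closed-form inverse."""
    if mode == "traunmuller":
        return 1960.0 * (b + 0.53) / (26.28 - b)
    if mode == "wang":
        return 600.0 * math.sinh(b / 6.0)
    raise ValueError(f"no closed-form inverse for mode: {mode}")


bark_1k = hz2bark_sketch(1000.0)
```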

sample.psycho.hz2cams(f: float, out: Optional[ndarray] = None, *, degree: str = 'quadratic') float

Convert Hertz to Cams (ERB-rate scale)

  • f (array) – Frequency value(s) in Hertz

  • degree (str) – Name of the ERB definition (linear, quadratic)

  • out (array) – Optional. Array to use for storing results
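For the quadratic definition, the ERB-rate scale is commonly written as 11.17 ln((F + 0.312) / (F + 14.675)) + 43.0, with F in kHz. The sketch below implements that published formula and its algebraic inverse for scalars. That hz2cams() and cams2hz() use exactly these constants is an assumption, and the function names are hypothetical.

```python
import math


def hz2cams_quadratic(f):
    """Hertz to Cams, quadratic ERB definition (f in Hz)."""
    khz = f / 1000.0
    return 11.17 * math.log((khz + 0.312) / (khz + 14.675)) + 43.0


def cams2hz_quadratic(c):
    """Cams to Hertz, algebraic inverse of hz2cams_quadratic."""
    r = math.exp((c - 43.0) / 11.17)
    return 1000.0 * (14.675 * r - 0.312) / (1.0 - r)


cams_1k = hz2cams_quadratic(1000.0)
hz_back = cams2hz_quadratic(cams_1k)
```

Converting 1000 Hz to Cams and back recovers the original frequency up to rounding error.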


Frequency value(s) in Cams

Return type:

float

sample.psycho.hz2mel(f, out: Optional[ndarray] = None, *, mode: str = 'default')

Convert Hertz to Mel

  • f – Frequency value(s) in Hertz

  • mode (str) – Name of the Mel definition (default, fant)

  • out (array) – Optional. Array to use for storing results


Frequency value(s) in Mel
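Two Mel formulas are widespread in the literature: 2595 log10(1 + f/700), attributed to O'Shaughnessy, and 1000 log2(1 + f/1000), attributed to Fant. The scalar sketch below implements both together with their inverses. Mapping the mode name default to the first formula is an assumption, and the function names are hypothetical.

```python
import math


def hz2mel_sketch(f, mode="default"):
    """Hertz to Mel using common published formulas."""
    if mode == "default":  # assumed to be the O'Shaughnessy formula
        return 2595.0 * math.log10(1.0 + f / 700.0)
    if mode == "fant":
        return 1000.0 * math.log2(1.0 + f / 1000.0)
    raise ValueError(f"unknown mode: {mode}")


def mel2hz_sketch(m, mode="default"):
    """Mel to Hertz, inverse of hz2mel_sketch."""
    if mode == "default":
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    if mode == "fant":
        return 1000.0 * (2.0 ** (m / 1000.0) - 1.0)
    raise ValueError(f"unknown mode: {mode}")


mel_1k = hz2mel_sketch(1000.0)
mel_1k_fant = hz2mel_sketch(1000.0, mode="fant")
```

Both formulas map 1000 Hz to approximately 1000 Mel.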

sample.psycho.mel2hz(m, out: Optional[ndarray] = None, *, mode: str = 'default')

Convert Mel to Hertz

  • m – Frequency value(s) in Mel

  • mode (str) – Name of the Mel definition (default, fant)

  • out (array) – Optional. Array to use for storing results


Frequency value(s) in Hertz

sample.psycho.mel_spectrogram(x: Sequence[float], stft_kws: Optional[Dict[str, Any]] = None, **kwargs)

Compute the mel-spectrogram from a STFT

  • x (array) – Array of audio samples

  • stft_kws – Keyword arguments for scipy.signal.stft()

  • **kwargs – Keyword arguments for stft2mel()


The array of center frequencies, the array of time-steps, and the Mel-spectrogram matrix (filter x time)

Return type:

array, array, matrix

sample.psycho.mel_triangular_filterbank(freqs: Sequence[float], n_filters: Optional[int] = None, bandwidth: Optional[Callable[[float], float]] = None, flim: Optional[Sequence[float]] = None, freq_transform: Tuple[Callable[[float], float], Callable[[float], float]] = (hz2mel, mel2hz))

Compute a frequency-domain triangular filterbank. Specify at least one of n_filters, bandwidth, or flim

  • freqs (array) – Frequency axis for frequency-domain filters

  • n_filters (int) – Number of filters. If None (default), infer from other arguments

  • bandwidth (callable) – Bandwidth function that maps a center frequency to the -3 dB bandwidth of the filter at that frequency. If None (default), then each filter’s -inf dB cutoff frequencies will be the center frequencies of the previous and the next filter (50% overlapping filters). In this case, the frequency limits flim include the lower cutoff frequency of the first filter and the higher cutoff frequency of the last filter. If a function is provided, then the frequency limits flim are only the center frequencies of the filters

  • flim (array) – Corner/center frequencies for the filters. If n_filters and bandwidth are both None, flim must contain 2 more elements than the number of desired filters. If None, then it will be set to the first and last elements of freqs

  • freq_transform – Pair of callables that convert from Hertz to the transformed scale and back to Hertz, respectively. If n_filters is not None, the center frequencies of the triangular filters will be chosen linearly between freqs[0] and freqs[1] in the transformed space. Default is hz2mel(), mel2hz() for linear spacing on the Mel scale


The triangular filterbank matrix (filter x frequency) and the array of center frequencies

Return type:

matrix, array
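The default layout (bandwidth left as None) can be sketched as follows: n_filters + 2 corner points are equally spaced on the Mel scale between the two limits, and each triangle rises from the previous corner to its center and falls to the next corner. This is a list-based illustration with hypothetical names, using the 2595 log10(1 + f/700) Mel formula as an assumed transform. It is not the library's code.

```python
import math


def hz2mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel2hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_filterbank(freqs, n_filters, flim):
    """Return a (filter x frequency) matrix and the center frequencies."""
    lo, hi = hz2mel(flim[0]), hz2mel(flim[1])
    step = (hi - lo) / (n_filters + 1)
    # n_filters + 2 corner points, equally spaced in Mel
    corners = [mel2hz(lo + i * step) for i in range(n_filters + 2)]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = corners[i - 1], corners[i], corners[i + 1]
        row = []
        for f in freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising edge
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank, corners[1:-1]


freqs = [10.0 * i for i in range(401)]  # 0 Hz to 4000 Hz
bank, centers = triangular_filterbank(freqs, 4, (0.0, 4000.0))
```

Because neighbouring triangles share their corner points, the filter gains sum to one at every frequency between the first and the last center.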

sample.psycho.stft2mel(stft: Sequence[Sequence[complex]], freqs: Sequence[float], filterbank: Optional[Sequence[Sequence[float]]] = None, power: Optional[float] = 2, **kwargs)

Compute the mel-spectrogram from a STFT

  • stft (matrix) – STFT matrix (frequency x time)

  • freqs (array) – Frequency axis for stft

  • filterbank (matrix) – Filterbank matrix. If unspecified, it will be computed with mel_triangular_filterbank()

  • power (float) – Power for magnitude computation before frequency-domain filtering. After filtering, the inverse power is computed for consistency. Default is 2. If None, then filter the complex stft matrix

  • **kwargs – Keyword arguments for mel_triangular_filterbank()


Mel-spectrogram matrix (filter x time) and the array of center frequencies (only if filterbank is unspecified, otherwise None)

Return type:

matrix, array
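The role of power can be sketched as follows: the STFT magnitudes are raised to power, weighted by each filter row, summed over frequency, and the result is raised to 1/power. This is a nested-list illustration with the hypothetical name apply_filterbank, and it does not cover the power=None case.

```python
def apply_filterbank(stft, filterbank, power=2.0):
    """Filter a (frequency x time) STFT with a (filter x frequency) bank."""
    n_frames = len(stft[0])
    out = []
    for row in filterbank:
        out_row = []
        for t in range(n_frames):
            # Weighted sum of |STFT|^power over the frequency bins
            acc = sum(w * abs(stft[k][t]) ** power
                      for k, w in enumerate(row))
            # Undo the power so the result is on the magnitude scale again
            out_row.append(acc ** (1.0 / power))
        out.append(out_row)
    return out


stft = [[3 + 4j, 0j], [5j, 2 + 0j]]  # 2 frequency bins, 2 time frames
filterbank = [[1.0, 0.0], [0.5, 0.5]]  # 2 filters over 2 bins
mel = apply_filterbank(stft, filterbank)
```

With power set to 2, each output value is the square root of a weighted sum of bin powers.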