This step-by-step guide for the Android app Room Acoustics Meter, more precisely for reading frequency response, phase, impulse response, and related quantities. is intended to provide an introduction to the field. As with any simple guide to a complex topic sometimes counterexamples can be thought up. The goal is to help with measurements of a typical living room to "get your foot in the door" in the first place. For smaller rooms such as car hifi or larger rooms such as event halls, the conditions assumed here do not apply. At the end, some measurement curves of private living rooms are interpreted for illustration.

Instructions for performing the measurement

The prerequisite of the following sections is that the data originate from a correctly executed measurement. Before starting the measurement, the elementary rules for setting up loudspeakers should in turn have been taken into account.

Hifi apps guide you step by step through the measurement. It is possible to decide how many microphone positions should be measured. In an average living room, where the listening positions are no more than 2 m apart, two or three measurements two or three measurements often already give an overview of which measures make sense. The microphone should be located where the listeners' ears would otherwise be.

What will be better with more measurements?

If possible, several setups of the speakers and listening positions should be compared. For each setup again at least 2 to 3 measurements should be made. Sometimes significant improvements already arise from changes in the range of 50 cm.

A better, but also more complex way is a so-called sound field measurement: Instead of rounding within a measurement over adjacent frequencies (smoothing) several (over 10) measurements in a range of perhaps 50 cm are made and the (less strongly rounded) frequency responses are interpreted together. The result will certainly not knock down the results of other measurements, but will provide additional precision for fine-tuning a sound processor or equalizer.

More microphone positions naturally make the result more precise when more listening positions are involved. For rating the listening experience, measurements are generally taken only at (potential) listening positions. Measurements at other microphone positions have an indirect benefit: If there is more interest, a complete matrix of the room, e.g. at ear level with 1x1 m grid, can be recorded. In this way, one obtains a deeper understanding of the entire wave field in the room and can plan specific damping measures and the positioning of sound sources and listening positions.

Hifi-Apps are written in such a way that, with little effort, a high number of measurements can also be made: Determination of the microphone position and saving the results are automated. The Android devices can easily be carried around the room, since there is no need for the external power supply as well as the external audio interface.

When you start the app, you will be prompted to connect the Android device to the playback system via cable, if possible. Practical experiences for different types of connections:

Bluetooth: You should not lump all protocols and devices together. However, it does happen that frequencies above approx. 4 kHz are lost during measurement. Then please don't give in to the temptation "... you can actually still use it".
Jack plug: Works well, but some devices filter out very high and very low frequencies. filtered out, see here. The low bass range below 40 Hz can be distorted by this, but matching subwoofers works without problems in practice.
USB: The best solution. With USB C, the analog headphone signal can also be output via this connector (depending on the device). Where the D/A converter is located depends on the configuration, but the measurement results do not significantly depend on the converter used. Caution: Some devices do not switch reliably to the external source. At the beginning of the measurement a microphone test is offered together with the possibility to explicitly set input and output. If a measurement microphone is used this step should not be skipped! The display should response clearly when tapping with the finger on the microphone's basket. Otherwise, the internal microphone may still be in operation.

Initially, it is also possible to work without measurement microphone, see here. This introduction assumes that the measurement was made with an omnidirectional microphone. This has become established because different measurements would otherwise hardly be comparable. For those interested, there is of course nothing to stop them from experimenting anyway, e.g. with so-called 3D measurement methods [Protheroe 2013]: For the app, it does not matter where the data come from. In principle, directional information is of course useful in practice: The search for interfering reflections can and should be supported by clapping and (directional) listening (see below).

In the following steps, the app guides you dialog-based until the measurement is started. The Farina algorithm used is relatively insensitive to ambient noise during the measurement. It In practice, it is even used shortly before the start of a concert with a waiting audience. Nevertheless, if necessary it is a good idea to bring loudly rattling objects under control and repeat the measurement.

The measurement must be performed at moderate volume: The measurement signal puts considerably more strain on the ears and loudspeakers than music of the same volume. Some professional acousticians wear hearing protection for every measurement.

Setup settings (for advanced users)

In the setup menu the duration of the log sweep and its cutoff frequencies can be set. Since the sweep is faded in and out slowly for technical reasons (Blackman windowing 0.05 s leadin, 0.005 s leadout), the frequency range that can be used later is somewhat narrower. 20 Hz as start is sufficient to find resonances in rooms up to 8 m length. The upper cutoff frequency helps with various length measurements from the sound runtime, 20 kHz are proven in practice. Whether the loudspeaker covers a wider frequency range according to the manufacturer does not matter for the important part of the measured values.

The duration determines the precision of the measurement. It should be long enough to give interfering room resonances a chance to resonate and thus be found in the results. To excite a room resonance, the sweep must be long enough in its frequency range. A good start is 2 seconds. Example for technicians: For a room mode with 300 ms rise time and 1/3 octave bandwidth vs. 10 octaves for the sweep, you get 300 ms * 1/(1/3) * 10 ≃ 10 seconds. This is the highest value that can be set. If someone deliberately uses short sweeps to create optically nice straight frequency responses with the accompanying poor resolution, they should probably look around for another field of activity.

The sample rate can be specified in the setup menu. The app receives the offered values from the Android system of the device, as well as the information "native" for a certain value, usually 48,000 per second. This value should not be changed without concrete reasons (see below in the section "Impulse response"). High values ("a lot helps a lot") are not helpful per se.

Evaluation of the frequency response

By "frequency response", (FR) and SPL (Sound Pressure Level) is meant here the sum frequency response, i.e. the sound pressure with which a certain frequency at some future time arrives at the microphone after it has been fed into the system. Thus, both sound emitted directly from the loudspeaker and reflections from the walls, etc. are included. To get an overview of the room properties, this is a good start.

Side thoughts on the sum frequency response and

The view of the sum frequency response is unusual at first: Compared to the manufacturer's loudspeaker measurements, much larger fluctuations occur due to the room properties. Manufacturer's loudspeaker measurements usually show only the direct sound - the user's room is, of course, not known at the time of the measurement. Sometimes the measurement is really done outdoors. The sum frequency response in a listening room has hardly anything to do with the characteristics of the loudspeaker. Both frequency responses have their justification and are reflected in the listening impression: A balanced behavior in direct sound can be seen as a basic requirement, similar to the way the light source in a projector must provide white light. In Background knowledge is described, which sonic characteristics are to be expected with which characteristics of the room. Our sense of hearing is apparently able to separate the first direct sound from the loudspeaker and the reflections in such a way that not all these fluctuations are important.
The sum frequency response should be oriented to a target curve that drops a few dB to higher frequencies [Møller 1974], [Toole 2015], [Web search "Target Curve spl" or "House curve spl"]. The app offers the possibility to show some established target curves.

Prerequisite for the interpretation are correctly performed measurements (see above). The results should resemble the curves shown here.

Unfiltered	1/24 octav smoothing	1/8 octav smoothing	1/3 octav smoothing
Click images to enlarge

Sum frequency response of two mid-range floorstanding loudspeakers (Quadral Chromnium Style 50) without sound correction in a 5m x 7m room. The microphone positions of all measurements are less than 20 cm apart, i.e. within a range of ear distance and head movements. The upper blue curves and lower red curves are from the left and right channels, respectively. The curves have a shift of 20 dB each. The measurement was made with a calibrated Audix TM-1 measurement microphone connected to a Samsung SM-T510 tablet via a Shure X2u XLR-to-USB adapter. The wide light blue curve is the preferred target curve of trained listeners according to [Toole 2015].

Die selben Daten, ohne Shift.

Left column - (almost) universal: For an intuition where the data comes from and what accuracy can be expected, it certainly doesn't hurt to look at a few unfiltered frequency responses at the beginning. Later this step can be skipped. The extreme fluctuations above about 300 kHz are caused by reflections. In open air measurements and in the near field of the loudspeaker they are very reduced or not visible at all. The strength of this comb filter effect gives an idea of the respective uncertainty in the results: It is debatable to what extent the auditory sense is oriented to the peaks or to the averages. In the case of large fluctuations, a drop in treble after averaging can therefore be a consequence of the rounding method and completely meaningless for the auditory impression. In the example measurements below, these raw data are therefore also attached.

"1/... Octave smoothing" columns - (almost) universal: Comparing (in the lower diagrams) first the different measurements (each of a stereo channel), one sees that the values diverge at about 300-400 Hz (i.e. wavelengths of 1 m). This is plausible because for standing waves maximum and node are 1/4 of the wavelength apart, i.e. in the range in which the microphone was moved. The uncertainties are about 5 dB.
Zwischen eng (≪ 1/3 Oktave) benachbarten Frequenzen innerhalb jeder einzelnen Messung treten ebenfalls Schwankungen von ungefähr 5 dB auf. Obviously, above about 300 Hz, the wavelength-dependent gains and cancellations due to small changes in microphone position are very comparable to those due to small changes in frequency. In the range below 300 Hz the measurements are more similar. However, the individual measurement curves fluctuate more strongly at some points. Whether this is caused by reflections or room modes cannot always be seen in the sum frequency response.

These effects occur with practically all sum frequency responses of normal living rooms. With a higher direct sound component (see below), the fluctuation range can fall below 5 dB, which is desirable. With favorably placed (and driven) subwoofers or fewer reflections due to a more suitable floor plan, the peaks in the range below 300 Hz should be much more moderate. The additional effort for sound field measurements, i.e. averaging over many closely neighboring microphone positions, is only worthwhile in rare cases. Averaging over neighboring frequencies leads to very similar results in the upper frequency range, and in the lower frequency range the measurements are similar anyway. Some sound engineers, for example, use this method only for a final last adjustment of the sound correction for a specific listening position.

Qualitatively generalizable: Both the 5 dB fluctuations of neighboring frequencies and the much more pronounced room modes or reflections in the bass range show that the room still has "some room for improvement" acoustically, but is not "catastrophic". As with many wave patterns in nature, it is the overall picture that counts; the individual fluctuations have little significance as long as they are evenly distributed. Basically, not every reflection is bad. However, the frequency response smoothed over 1/3 octave should definitely have fluctuations below 10 or maximum 15 dB. If they are evenly distributed, they are less of a problem. Setups in which a few frequencies stand out very prominently should be subjected to a listening test. Positioning of loudspeakers, listening positions and damping with a balanced response will probably sound better.

The peak at 35 Hz indicates a room mode. Although the speakers reach their limit in this range, it is clearly evident. Measures: From this you can already guess that the use of a single subwoofer is probably pointless. The 35 Hz mode, together with the peaks just below 100 Hz, will unbearably dominate the sound image and when trying to prevent this by clever setups, some other room mode will certainly be played. (Currently, 4 subwoofers are therefore in use, which are controlled via a sound processor).
Another reason can be a subwoofer of the lower price range: Its resonances may have been designed to be strong and correspondingly narrow-band - with the goal of producing as much noise per money as possible.
The narrow peak at just under 70 Hz in the left channel and the broader peak at 80-100 Hz in the right channel are audible as a growly, muddy bass. 1/4 of the wavelength here is about 1 m. Measures: Moving speakers or listening positions by 0.5...1 m may provide some help. Unlike the "subwoofer range", where damping can be replaced by additional cleverly controlled speakers, damping would actually be required here. However, for this frequency range, the material cost is quite high. If the effect is similar for all listening positions, an equalizer could be tried.
The emphasis on the range up to 300 Hz creates a listening experience "as if there was a blanket covering it". Only close to the saturation level do the speakers "open up and become live-like". Measures: Like the previous point, this suggests a moderate electronic lowering of this range.
The lowering of the treble from 1500 Hz may also be due to the measurement or rounding method (see above). The values can change significantly if the tweeters are aligned with the microphone or the distance between the loudspeaker and the microphone is smaller. Depending on the target curve, room characteristics and personal experience in handling the measurement equipment, it should not be corrected blindly. If one were to increase the level here by 15 dB, the result would be (briefly, until the ribbon tweeters burn out) a terrible sound image.
Floor bounce is not noticeable here, but can be found further down in the sample measurements. They can clearly disturb the auditory sensation. If the frequency response shows an m-shaped curve, it can be checked as follows whether a reflection is the reason.
- Open the overlay with SET
- Switch Sie REFLECTION to ON.
- Use the 3 sliders ↣ ↕ ↑ to position the calculated reflectance curve over the measured values.
Nur only if this is possible, it is a reflection. For technicians: Otherwise, the respective frequency maxima have no integer ratio of wavelength to detour. More.
Downward fluctuations in the bass range up to about 150 Hz are probably due to mutually cancelling waves. Basically, missing frequencies are considerably less noticeable in the listening experience than boosts. At least as long as the system is not used for mixing your own productions and these are then overcompensated. Attempts to eliminate them by moving the speakers usually creates new fluctuations. However, it is definitely worth comparing several measurements and finding "the least evil".
The evaluation of the entire 1-octave smoothed response assumes that a) the microphone has been calibrated. Above about 5 kHz, at least some experience should have been gained with different arrangements of microphone to tweeter. Measures: The frequency response can be compared with the different target curves (see above) and a house curve can be selected.
Regardless of the previous point, the entire 1-octave smoothed curve can be used to compare tendencies - e.g. whether different channels (Right Left, Surround) show significant differences. This also works without a calibrated microphone. Of course, a microphone with a directional pattern must not favor any loudspeaker. As described above, the measured values have nothing to do with the frequency response of the loudspeaker.

The broader increases and decreases in the frequency response can be assigned to certain sound characters. This can contribute to the understanding of a listening test. If neutrality is not the goal, it can be incorporated into corrective measures. Search engine queries "sound character vs. frequency response" yield interesting findings, some of which are summarized here from the author's personal point of view:

All frequencies in Hz	20-40	40-200	200-500	500-2k	2k-6k	6k-12k	12k-20k
Important to	Feel vibrations	100: beginning of the deepest human voices 150: warmth of sound, body of voices	200: foundation of the human voice	sensitive area of hearing. For guitars and voices	most sensitive area of hearing. Clarity, proximity of voices.	brilliance, openness. Cymbals, strings	for sparkling brilliant transparent treble. Cymbals.
Sound at boost +10 dB	Depending on listening experience and taste	40-60 boomy, 100: bass muddy, but 60-150: boost for fullness in the bass range, fat sound, 125-250: "as if there's a blanket over it"	300: potted... 600: dull, honky, hollow	500-2k "old kitchen radio". 1k: nasal, thin, 2k: thin, harsh	2k-6k: "megaphone sound". 5k: sharp, hissing	7k: unpleasant "s" and "t" sounds, web search: "de-esser"
Sound at -10 dB lowering	Too bad	150: bass seems disconnected	250 bass seems hollow, 450: voices seem hollow	1k: lower about 5 dB to "de-mulmerize"	2k: clarity lacks 3k: "attack" Missing, cloying, romanticized 4k: finesse lacks		.
Blauert			300-600: front	700-1800: behind	2,2k-6k: front	8k-11k: above	11k-17k: behind

The sensations in lines 2-4 do not satisfy any scientific requirement, they are certainly not systematically reproducible. If single frequencies are indicated, a range of 1 to 2 octaves FWHM is meant. The last line refers to the so-called Blauert bands: By raising certain frequency ranges, one can create the auditory impression that the sound comes from the front, above, behind or below [Blauert 1974, Wikipedia].

Side thoughts about smoothing

The question which smoothing is the most meaningful for what fills many forum pages. Basically (for sound field measurements - see above) can be averaged over several measurements, or the averaging is done over the frequency response, where again there are several methods. Before reading, one's own intuition should be created by switching (especially between 1/3 and 1/8 octave smoothing) what is to be seen in each case. As a basis, the raw data should also be viewed. Both the average sound pressure and the maximum sound pressure of each small frequency range go into the perceived loudness.

For the listening impression, the trends in the superposition of direct and reflected signals at the microphone or ear are important. Even in the smoothed data, the resulting individual peaks should not be overinterpreted: Their position also changes depending on the microphone location. In addition, listeners change their sitting position and have about 14 cm distance between the eardrums. It is important that this does not change the sound too much. There are two basic ways to realize this:

durch einen hohen Direktschallanteil. Dafür gibt es in der Studiotechnik Nahfeldmonitore. Der Hörabstand ist meist unter 2 m. Je stärker der Raum gedämmt ist, desto unkritischer ist der Hörabstand. Hornlautsprecher ermöglichen ebenfalls größere Hörabstände.
durch Ergodizität, d.h. eine gute Durchmischung, so dass möglichst alle Frequenzen zu allen Zeiten aus den richtigen Richtungen reflektiert werden. Rund- oder dipolartig abstrahlende Lautsprecher sorgen sicherlich "für einen guten Anfang", wenn eine solche Schallverteilung bevorzugt ist.

Selten erwähnt wird die Kanalgleichheit. Sie hat mindestens die gleiche Priorität wie der Frequenzgang: Ein System bei dem Sänger nicht, oder schlimmer an einer völlig falschen Stelle geortet werden, kann nicht ernsthaft als "eingemessen" bezeichnet werden. Die Kanalgleichheit hat mit der Gleichheit der Frequenzgänge des rechten und linken Kanals zu tun, viele weitere Effekte gehen aber ebenso ein. Anfangs sind Hörtests sicher der bessere Weg zur Wertung.

Auswertung von Phase, Delay, Gruppenlaufzeit

Während der Frequenzgang zeigt, wie stark eine bestimmte Frequenz wiedergegeben wird, zeigen diese Kurven in verschiedenen Formen, wann sie wiedergegeben wird. Nicht alle Frequenzen kommen immer gleich schnell an: Ein Beispiel aus der Natur ist das Knacken von Eis auf einem zugefrorenem See: Aus einiger Entfernung ändert sich das "Knack"-Geräusch zu einer Art "Piu", weil die höheren Frequenzen von Eis schneller übertragen werden. Auch beim Einschlag eines Blitzes kommt der tieffrequente Donner oft später als der Knall des Einschlags an. Im Gegensatz zum brechenden Eis, dessen Verhalten sich physikalisch nachvollziehen lässt, spielen hier allerdings viele nicht genau bekannte Faktoren zusammen. Bei entfernten Explosionen ist der zeitliche Verlauf beispielsweise völlig anders.

Das Verhalten von Schall in Wohnräumen entspricht eher dem zweiten Beispiel. Nur Teilaspekte scheinen manchmal durch Konzepte wie Bassreflexkanäle erklärbar. Ohne Maßnahmen an der Raumakustik braucht man sich wohl nicht damit zu beschäftigen. Danach sicherlich im High End Bereich, ansonsten vielleicht. Ein Rezept zum Verständnis der Kurven steht bereits an anderer Stelle.

Evaluation of the impulse response

Roughly speaking, impulse response (IR) is what you hear when a sharp bang is played. It characterizes echoes, reflections, reverberation, etc. in the listening room as well as the reaction of the loudspeaker cabinets including the associated electronics. Instead of the bang, Hifi-Apps play a more tolerable logsweep and reconstruct the data of the bang from it.

Before dealing with the measured impulse response, it should be intuitively clear what it is all about. For this, you can clap your hands in different rooms and hear how the reverb decays. The ideal listening room must be neither "completely dead" nor reverberant like a bathroom. The reverb should also be evenly distributed in the room (e.g. not bounce back and forth between facing walls). With a little practice, this can be heard well. It can help to turn your head in different directions while clapping.

The measured impulse response shows the time course of the sound pressure or a closely related quantity. The x-axis always shows the time, in the case of hi-fi apps optionally also as a product with the speed of sound in meters. The curve can be divided into three parts:

The peak on the left shows the direct sound. This is the sound that reaches the microphone without detours. Vividly it is the first and most intensive part of the imaginary bang. Its shape comes from the transient of certain parts of the loudspeaker and would be similar in an anechoic chamber. It does not necessarily have to be a single peak, but several oscillations next to each other should not be overinterpreted. Not every spike in this range provides useful information, there are also computational artifacts [Usher 2010].
Early reflections are often relatively isolated, faithful repetitions of the direct sound. They can also be visible in the frequency response and are already described here.
Late reverberation. The individual reflections go by more and more reflections into an ergodic state: Descriptively, the loudest part no longer jumps back and forth in space, but has spread out. Except for the loudness of the reverberation, nothing changes anymore. The auditory sense perceives the latter as more pleasant. This is why diffusers are used to distribute sound.

While direct sound is a single event that can be well identified, the separation between early reflections and late reverberation is less obvious: They may overlap or there may be a gap "transition time" between the two [Defrance 2009]. The following figure always shows the same 4 impulse responses or a selection of them in different representations. Practical goal could be to bring the bad light blue/beige impulse response into the range of the good dark blue/red one by damping, placing diffusors, changing listening places etc.

Impulsantwort	Zoom auf Anfangsbereich	Lineare Darstellung	zugehöriger Frequenzgang
Bilder zum Vergrössern anklicken

Spalte 1: Vergleich der Impulsantworten eins leerem quadratischen Raums und eines gedämmten Hörraums, jeweils linker und rechter Kanal. Spalte 2: Ausschnitt aus dem ersten Bild, die beiden prominentesten Peaks von a) mit Werten markiert. Spalte 3: Lineare Darstellung der selben Messung, zur Übersichtlichkeit nur die linken Kanäle (blau). Spalte 4: Zugehöriger Frequenzgang von a). Die roten Kurven sind durch berechnete Werte: Sie zeigen die Schallintensität bei Überlagerung zweiter um den jeweils angegebenen Betrag verzögerten Wellen.

Spalte 1: Wie oben (Zeitachse verlängert 500 ms), zusammen mit Schroeder Kurven und davon abgenommenen Werten für - EDT (Early Decay Time gem. ISO 3382), d.h. Dauer des Abfalls vom dB-Wert bei t=0 bis -10 dB, multipliziert mit 6 - T20, d.h. die Dauer des Abfalls von -5 dB auf -20 dB, multipliziert mit 4. Die Schroederkurve startet hier bei -10 ms, deshalb ist der Wert bei t=0 etwas unter 0 dB. Dadurch ändern sich die Ergebnisse nicht. Alle Nachhallzeiten beziehen sich auf einheitlich 60 dB Abfall, d.h. die abgelesenen Werte für 20 dB bzw. 15 dB sind bereits mit 3 bzw. 4 multipliziert. Spalte 2 und 3: gefilterte Impulsantworten von jeweils 2 ausgewählten Oktavbändern mit Schroederkurven. Spalte 2: ungedämmter Raum (hellblau), Spalte 3: gedämmt (dunkelblau). Erkennbar ist die starke Methodenabhängigkeit der T60 Werte: Im Bassbereich des ungedämmten Raumes sind die Schwankungen so stark, dass eine Festlegung des Nachhalls unabhängig von der Messmethode sinnlos erscheint. Der Anstieg bei ca. 300 ms ist psychoakustisch sicher völlig anders zu beurteilen als der Beginn der Kurve. Die Entscheidung ihn in die Bestimmung von T60 aufzunehmen ist damit subjektiv. Vergleicht der EDT Werte beider Spalten zeigt, dass auch hier Ausreißer auftreten können, die das Hörerlebnis wohl nicht widerspiegeln. Spalte 4: Die in Terz-Schritten gefilterten Impulsantworten mit Schroederkurven und T20 Werten. Wie für Spalte 2 erläutert, sollten den Werten für den Bassbereich nicht naiv vertraut werden.

Hellblau: Die Impulsantwort des leeren Raums ist erwartungsgemäß so schlecht, dass die Einzelheiten kaum interpretiert werden müssen. Trotzdem - Spalte 2: In der Vergrößerung stechen insbesondere der Einschwingvorgang in den ersten 3 ms und die Peaks danach heraus. Die ersten beiden Peaks wurde mit der Schall Laufzeit gekennzeichnet. Offenbar gibt es zwei Schall spiegelnde Flächen, die den Schall mit 2,84 bzw. 4,56 m Umweg von linken Lautsprecher zum Mikrofon reflektieren. Die akustische Behandlung könnte mit der Suche nach diesen Flächen per Schnur-Methode und ihrer Behandlung beginnen. Um den Einschwingvorgang am Anfang etwas zu verteilen, könnte an Diffusoren gedacht werden. Andererseits zeigt die untere Bildreihe, dass die Nachhallzeit viel zu lang ist, was mehr für Dämmung spricht. Vermutlich wäre jede Behandlung eine Verbesserung.

Tolerances of 10 to 20 cm are normal for length specifications, and correspondingly more for large speakers, because the exact location of the sound source is often unclear due to the interaction of several drivers. Similar discrepancies between room size and measured delays should not cause too much concern either. It is probably enough to know that the acoustic size of a room can be up to 20% larger than its architectural size. Especially if the walls are wood or plasterboard and resonate (web search "sound at media boundaries" for more).

There are several possibilities for the Y axis of the impulse response. Basically in the first step you can choose between linear and logarithmic representation. Since the sound pressure in the impulse response of a living room falls logarithmically over long distances, the latter is almost self-evident. The decrease appears as a straight line and deviations from it become recognizable. The linear representation is closer to the measured raw data. Accordingly, technical details such as the interaction of different loudspeaker chassis are better visible. To view parts with small amplitude, the curve can be enlarged with two-finger gestures.

There is a fundamental problem with both representations: the energy arriving at the microphone for a time period is closely related to the area under the corresponding piece of curve, depending on the representation. Ideally, it is concentrated on a single narrow high peak. In the real world, however, its high-frequency and low-frequency components sooner or later diverge. The high-frequency parts, however, naturally remain limited to a short period of time. Consequently, they are represented by high narrow peaks. The low-frequency components cover a comparable area, but this area extends over a much longer period of time. Therefore they can hardly be seen, although they are much more important for the auditory sense. In forum discussions on room acoustics, a raw or slightly smoothed logarithmic representation is usually posted.

On the other hand, the peaks are a useful indicator to identify individual reflections as accurately as possible. They cannot simply be "smoothed away". So to understand the system, you should switch back and forth between the representations. The app offers the following options:

Linear representations
[RAW] shows the raw data, as in the examples above
[AVG] shows the smoothed raw data. The sharp peaks caused by high frequency components disappear, thus reflections without high frequency components become better recognizable. The bandwidth of the smoothing can be adjusted with the "Bandwidth" slider. This representation should only be used after viewing the unsmoothed data. Otherwise, different reflections could melt together due to the "smearing" of the peaks. The EBU 3276 standard, for example, focuses attention on spikes in the unsmoothed impulse response.
[STEP] shows the step response, i.e. the integral of [RAW], i.e. the area collected so far under the curve RAW when it is scanned from left to right. In an ideal system, it would jump from 0 to 1 at time 0 and stay there. In a real system, significant oscillations are visible, but they irritate the sense of hearing surprisingly little. The advantage of this representation is that reflections with and without high frequency components also look similar here, if they transfer similar amounts of energy. An example can be found in the documentation of the ETC (see below).
[ETC]: The Energy Time Curve, also Impulse Response Envelope is documented here. It shows the instantaneous energy (more precisely: the Hilbert Transform) of the sound field measured by the microphone. In combination with other representations it can help to detect unwanted reflections.
Logarithmic representations
[RAW] wie bei "Lineare Darstellungen". Siehe Text weiter oben für den Unterschied linear / logarithmisch. Der begrenzte Platz kleiner Bildschirme kann besser genutzt werden, wenn der untere dB Wert begrenzt wird. Dazu muss in der zentralen Ansicht der App (Hauptmenü) geöffnet werden. Die Einstellung erfolgt unter SETUP/"Range for sound pressure [dB]".
[MAX] Zeigt die Maximalwerte von RAW (in 0.5 ms Intervallen). Dadurch wird verhindert, dass Ausschläge nach unten Bildschirm-Platz verbrauchen. Dadurch lassen sich auf kleinen Bildschirmen und vielen Kurven darstellen.
[ETC] wie bei "Lineare Darstellungen".
T60 Method Comparison
[RAW], [MAX], [ETC] arbeiten wie in "Logarithmische Darstellungen"
Schröder Kurve: zeigt die gesamte Energie, die noch im Schallfeld steckt. Sie macht nachvollziehbar, woher die berechneten Nachhallzeiten und deren Abweichungen bei verschiedenen Methoden kommen. Siehe Text weiter oben.
Die Nachhallzeit in Sekunden zeigt an, wie schnell der Hall in einem Raum unter die Hörschwelle (-60 dB) abklingt. Hifi-Apps verwendet die Schröderkurve zur Berechnung. Je nach Methode werden zwei Punkte dieser Kurve ausgesucht und mit einer Geraden verbunden. Die Nachhallzeit ist dann die Zeit, die diese Gerade braucht, um 60 dB "Höhenunterschied" zu verbinden. Sie hängt also davon ab, welche Punkte in der Kurve jeweils vergleichen werden. Für gute Lokalisation ist eher der Anfang der Kurve wichtig, für einen angenehmen Raumeindruck spätere Teile. Entsprechend gibt es mehrere Normen, die in der App ausgewählt werden können. Die App bietet zusätzlich an, den Start- und Endpunkt frei zu wählen. Grundlage ist entweder der dB-Wert, also die Y-Koordinate oder der Zeit-Wert, also die X-Koordinate. In allen Fällen kann sowohl die Schröderkurve als auch die verbindende Gerade eingeblendet werden. Auf diese Weise erhält man nicht nur den Wert für die Nachhallzeit, sondern ein umfassenderes Bild wo er herkommt.

Die Button-Zeile darunter [T=0] [FILTER]... öffnet verschiedene Einstellmöglichkeiten, die aus Platzgründen bei Start ausgeblendet sind. Ihre Dokumentation wird in der App gezeigt: Wenn [HINTS] aktiviert ist, erscheint zu dem jeweils letzten berührten Bedienelement eine Kurzanleitung. [FILTER] ist weiter unten beschrieben.

Waterfall

Der "normale" Summenfrequenzgang zeigt, wie stark jede Frequenz wiedergegeben wird, wobei egal ist, wann sie ankommt. Im Wasserfalldiagramm ist wie beim Summenfrequenzgang die Amplitude gegen die Frequenz aufgetragen. Allerdings wird die ankommende Schallintensität in Zeitabschnitte zerhackt und so auf mehrere Kurven verteilt. Diese werden ab jetzt FR(t) genannt, was darstellen soll, dass es sich um zeitabhängige Frequenzgänge (Frequency Responses) handelt. Der erste FR(t) ist also beispielsweise der Frequenzgang für die ersten 5 ms, der nächste für die nächsten 5 ms usw. Dadurch wird besser erkennbar, ob einzelne Unregelmäßigkeiten im Summenfrequenzgang z.B. von kurzen, intensiven Reflexionen oder weniger intensiven aber dafür längeren Resonanzen kommen. Sowohl die Hörsamkeit (also ob überhaupt Maßnahmen dagegen nötig sind) als auch die Maßnahmen selber hängen davon ab.

Die Ausgabe entsteht, indem der Frequenzgang nicht aus der gesamten Impulsantwort berechnet wird, sondern jeweils aus dem Abschnitt, der für FR(t) zeitlich untersucht werden soll. Jeder FR(t) entsteht also aus einem bestimmten Abschnitt der Impulsantwort um die Zeit t herum.

Selbst ein ideales Wasserfalldiagramm in einem absolut reflexionsfreien Raum würde sich nicht auf FR(t=0) beschränken: Würde man im Beispiel die Breite der untersuchten Zeitabschnitte auch auf 5 ms (= 1/200 Hz) setzen, könnten Frequenzen unterhalb von 200 Hz schlichtweg nicht definiert werden, da keine volle Periode in diesen Zeitabschnitt passt. Dieses Heisenberg-Limit verhindert, dass es schnelle Bass Arien gibt. Es kann nicht durch geschickte Algebra überwunden werden.

Aus diesem Grunde muss immer ein Kompromiss aus Zeit- und Frequenzgenauigkeit gewählt werden. Dafür gibt es zwei etablierte Methoden [IRZU FFT WAVELETS]:

FFT
Im Overlay für die Einstellungen wird die Anzahl der Kurven zusammen mit der zeitlichen Position der ersten und letzten Kurve gezeigt. Klick auf [CHANGE] ermöglicht die Änderung. Dazu wird die Ansicht temporär gegen die Impulsantwort ausgetauscht. Man sieht jetzt die einzelnen Abschnitte der Impulsantwort, aus denen die FR(t) berechnet werden. Die Einstellungen für das Bild oben: Hier sind 20 stark überlappende Abschnitte gewählt, von denen der erste bei -10 ms, also 10 ms vor dem Haupt-Peak der Impulsantwort anfängt. Der letzte beginnt 500 ms später. Der Kompromiss zwischen Genauigkeit von Zeit oder Frequenz wird durch die Breite der einzelnen Abschnitte festgelegt. Hierfür dienen die beiden letzten Slider "FFT Rise" und "FFT Top+Fall". Die Summe ihrer Werte (hier also 15 ms + 204 ms) ergibt einen Anhaltspunkt für die Frequenzauflösung, die hier bei wenigen Hz liegen sollte. Die Abschnitte sind also sehr breit gewählt und überlappen auch entsprechend stark. Entsprechend schlecht ist auch die zeitliche Auflösung. Dafür ist die schmalbandige Mode bei 110 Hz im Bild oben gut erkennbar.
Wavelet
Diese Darstellung passt wesentlich besser zum Ausgabe-Modus [Spektrogramm] und ist hier nur für Vergleichszwecke mit den FFT-Daten implementiert. Anschaulich legt die Methode auf viele Stellen der Impulsantwort jeweils relativ kurze Wellenzüge mit wenigen Schwingungen (Wavelets). Die Impulsantwort wird also auch Stückweise abgetastet. Diese Abtastung wird aber mehrmals wiederholt, wobei die Frequenz der Wavelets geändert wird. Ihre Form bleibt dabei gleich, d.h. bei halbierter Frequenz verdoppelt sich ihre Länge. Dadurch werden für die gesamte Impulsantwort jetzt nur noch halb so viele Wavelets gebraucht. Das hat den Vorteil, dass bei hohen Frequenzen, wo eine hohe Zeitauflösung möglich ist, viele Abtastungen in entsprechend kurzen Zeitabschnitten erfolgen. Bei tiefen Frequenzen werden die Zeitabschnitte dagegen lang genug um die Frequenzauflösung (pro Oktave) beizubehalten.
Beispiel: Das Wavelet hat eine Länge von 5 Wellenzügen. Setzt man seine Frequenz auf 40 Hz, deckt es einen Bereich von 0.125 Sekunden ab. Mit etwas Überlapp genügen also ungefähr 10 Wavelets um das 40-Hz Band der Schallverteilung in der ersten Sekunde zu untersuchen. Dabei ist theoretisch eine Genauigkeit von bis zu 1 / 0.125 s = 8 Hz, also grob 1/3 Oktave möglich. Für 80 Hz benötigt am 20 Wavelets, erhält so die doppelte Zeitauflösung mit Genauigkeit 16 Hz, also wieder 1/3 Oktave. Bei 800 Hz können 200 Wavelets, bei 8000 Hz 2000 Wavelets verwendet werden. Die Frequenzauflösung bleibt bei 1/3 Okt, die zeitliche Auflösung wird immer besser.
Bei der App reduzieren sich die Einstellungen also theoretisch auf die Genauigkeit in Oktaven und Beginn und Ende des abzutastenden Zeitbereichs. In der Praxis muss das Wasserfalldiagramm aber Kurven mit festem Zeitabstand liefern, deshalb bleiben Anzahl der Kurven und Auflösung einstellbar. Die Wavelets werden dann "entgegen ihrer natürlichen Auflösung" auf diese Kurve "gezwungen". Es bleibt der Vorteil, dass sehr kleine Zeitabschnitte des Hochtonbereichs aufgelöst werden. Bei der Spektrogramm-Darstellung entfällt dieser Zwang und die Wavelets können ihre Vorteile erheblich besser ausspielen.

Spektrogramm

The spectrogram is similar in content to the waterfall diagram. It uses different colors instead of different curves. In both diagrams, one axis is for frequency. In the spectrogram, the other axis is for time and the color is for intensity. In the waterfall diagram, the other axis is for intensity and different curves are shown for different times. With these differences in mind, the documentation is the same as the waterfall diagram.

T60

In the Evaluation of Impulse Response section, we explained how hi-fi apps measure reverberation time T60 and presented various filtered impulse responses with associated T60. The calculations here are identical, but go one step further: the reverberation time is calculated with third-octave or octave spacing for many frequencies and displayed as a function of frequency. This allows a direct comparison with common standards. These can be selected and displayed. To calculate the standard values, the room volume must be specified.

The calculated reverberation times in the bass range must not be blindly trusted. As shown in the section "Evaluation of the impulse response", the filtered impulse response there is subject to considerable fluctuations of various period durations. These result from a mixture of chaotic room acoustics (below the Schroeder frequency), which strongly depend on the microphone position, etc., and filter effects. Ultimately, a single T60 value can never fully describe the highly complex behavior of a listening room in this frequency range. Only if its behavior is stable, however, comparative measurements and different evaluations, it can serve as a reference point. In [Zehner Ringversuch] measurements with different 13 acoustics software packages, operated by sound engineers with mostly far more than 10 years of professional experience, are compared.

It is much easier to attenuate high frequencies than low ones. Low frequencies, however, play an important role in word understanding. Accordingly, the reverberation time in the lower frequency range must be kept under control as well as possible. When DIN 18041 was drawn up, compromises were made between acoustic quality and feasibility for the lower frequency range [Fuchs 2019]. In case of doubt, it is therefore better if the measured values lie further down in the range of the standard.

If the reverberation time in the low frequency range is too long, the cause can be investigated in the spectrogram or waterfall diagram. A room mode should be found in the respective frequency band. The elimination in an existing building is difficult. Some approaches can be found here.

Room Acoustics

This representation can be understood as a parameterized sound pressure map. Sound pressure maps show how loud it is at the individual listening positions. In private rooms, however, this is not very important; the volume is more or less the same everywhere. The problem is the resonances and reflections that can create a different auditory image in each place. The sound pressure in particular for a certain frequency is therefore much more decisive here. This is exactly where the representation comes in: The sound pressure map is supplemented by a controller with which the frequency can be tuned:

Rohdaten	44 Hz (λ = 7,8 m)	140 Hz (λ = 2,5 m)	170 Hz (λ = 2,0 m)
Bilder zum Vergrößern anklicken

Spalte 1: Rohdaten aus den Frequenzgängen (1/3 Okt geglättet). 1: Umschalter "Alle Frequenzen" vs. "Raummoden": für das Verhalten des Sliders zur Frequenzeinstellung daneben 2: Setup zum Umschalten zwischen Darstellung der Rohdaten und als Heatmap etc. 3: Auswahl der Kanäle. Spalte 2, 3, 4: Heatmap bei verschiedenen Frequenzen. Der schmale schwarz - weiß wechselnde Streifen am rechten Bildrand stellt die Wellenlänge dar.

For the example, measurements were taken at 28 places. The graphic shows the floor plan of the room, the red areas are loud for the respective frequency, the green areas are quiet. At 44 Hz and 140 Hz, clear room resonances build up, once transversely and once longitudinally. The 44 Hz resonance does not behave "textbook": on the right side it is barely visible. The room has two thin glass doors there, which are obviously acoustically favorable. The next steps should be to investigate which other loudspeaker setups or drive units improve the picture. It is clearly recognizable that the green / blue area can hardly be improved by more power: For the required increase of the sound pressure by approx. 10 dB the power would have to be increased tenfold, which would make the bass in other room and frequency ranges (and outside the room) unbearable.

Measurement details

The measurement should extend over the entire floor space of the room in a grid at ear level. (Later during the evaluation, the diagram thus shows the floor plan of the room). The app automatically determines the microphone positions based on the sound propagation times. This makes the horizontal modes with their respective maxima visible. The more clearly the maximum is visible at a certain point, the more effective is the damping at exactly this point. This gives you concrete comparative values of what attenuation can do at a certain point and you are less dependent on intuition and guesswork.

As a mesh width 1 m has proven itself. Smaller scale fluctuations should be understood as trends as described above. So you can save yourself the trouble of measuring them individually.

The measurement is performed separately on the right and left channel. If subwoofers are used, they should be set "as usual". Even with stereo recordings without a separate bass (LFE) channel, the bass range can be (partially) monophonic mixed. This can cause amplifications and cancellations between both loudspeakers, which would be detected by further measurements, where several loudspeakers are driven at the same time. According to my personal experiences one ends however thereby fast in the "curse of the many parameters", if one does not follow a firm procedure. A suggestion for this can be found in the documentation for the app "Subwoofer Optimizer".

Evaluation details

Above the diagram there is a slider for selecting the frequency. The "All Frequencies" vs. "Room Modes" toggle to the left controls the behavior of the slider. "All frequencies" allows the free selection of the frequency, "Room modes" offers some "especially suspicious" frequencies with large level differences depending on the room position.

Falls mehrere Kanäle (R, L...) oder mehrere Lautsprecheraufstellungen gemessen wurden, wird im unteren Bereich eine entsprechende Auswahlmöglichkeit eingeblendet:

All speakers (max. SPL of ...) zeigt einen ersten Überblick. Raummoden sind erkennbar, egal durch welchen Lautsprecher in welcher Position sie angeregt werden. Die App verwendet für jeden Mikrofonplatz alle Messungen (rechter und linker Kanal, ggf. mehrere Lautsprecheraufstellungen) und zeigt den maximal gemessenen Schalldruck an. (Die Messungen sind automatisch gegeneinander eingepegelt.)
All positions of spkr. ... funktioniert wie der erste Punkt, beschränkt sich aber auf den jeweiligen Kanal (nur rechter und nur linker Kanal). So kann ermittelt werden, ob bestimmte Moden durch einen einzelnen Lautsprecher angeregt werden.
Single pos. of spkr. ... beschränkt sich schließlich auf eine Aufstellung eines bestimmten Lautsprechers. So kann ermittelt werden, ob bestimmte Moden durch einen einzelnen Lautsprecher in einer bestimmten Aufstellung angeregt werden.

The impulse response - technical background

Frequency response and impulse response are Fourier transformed to each other. In the examples above, filters were used in both the frequency and time domains. The Fourier and Hilbert transforms of these filters produce artifacts that must not be mixed with the physical properties of the system under study: Filters in the frequency domain can be seen as a kind of transient in the temporal domain and vice versa. This is called filter ringing and also occurs in linear systems, so technically it has a different origin than "normal ringing" due to nonlinearities.

The examples were generated in the app by replacing the measured microphone signal with the logsweep. This was used to simulate an ideal system that accurately reproduces the logsweep. Filtering of the impulse response is done in the [IMP RESP] view by activating [FILTER]. In some cases the logsweep was additionally edited by wave editor (see tables).

The following table shows artifacts caused by different sample rates and limiting the frequency response at 20 kHz:

Impulse response	Zoom to initial region	Linear representation	corresponding frequency response
Click images to enlarge

Column 1: The filtering of the impulse response is done in the [IMP RESP] view by activating [FILTER]. Column 2: Impulse response for input = logsweep. Sections with enlarged time axis and different sample rates. Column 3: Analog picture at lower frequency using a filter in the app. Column 4: Logarithmic plot of the same filtered data with reverberation time.

The impulse response shows in the normal view hardly recognizable, but after zooming of the time axis clear deflections with approx. 0.05 ms period duration, thus roughly 20 kHz. These do not depend on the sample rate. It can only be seen that the image becomes clearer at 192 kHz. The unsmoothed Hilbert transform (ETC for Energy Time Curve) forms the envelope.

For comparison, several filters with different slopes were set at 50 Hz. Here, an analogous picture appears in the corresponding time scale. Thus, the visible artifact is most likely due to the frequency response being limited at 20 kHz. An ideal filter, which would cut off abruptly at the angular frequency $\omega_g$, would have the impulse response $\sin(\omega_g t)/t$ and the zero-crossings would be a bit further out at 10 ms (web search: si-filter, Küpfmüller lowpass). The filters used here have a finite bandwidth and therefore decay faster. A roloff bandwidth of 1/3 octave allows the impulse response to decay to a non-interfering level after 5 to 10 periods. To cover this additional 1/3 octave, the measurement data would have to go up to 25 kHz.

Conclusion: Vibrations in the range of the Roloff frequency are artifacts and no indication for defects or inaccuracies in the used equipment or Consequences of whimsical "graded reflective" objects. If they occur during the examination of reflections one should classify them accordingly and not be bothered. For elimination, a Roloff well above the 20 kHz range would be necessary. Since the usual measurements do not provide any data there, this would only be possible with computational tricks.

The fourth image shows the same data logarithmically (in dB) and the corresponding T60 value. Even if the impulse response used has a reverberation time close to zero, the calculated reverberation time after filtering is not zero. In the figure, T20 (i.e. T60 determined from a drop of 20 dB) has the value 150 ms at 1 octave bandwidth and 50 Hz corner frequency. At a bandwidth of 1/6 octave, the value increases to 327 ms. In practice, if the filters are set too narrowly, the filter ringing can reach the order of magnitude of the useful signal [Goertz 2020]. This must not be forgotten when investigating narrow frequency bands.

Different speakers and rooms

	Beschreibung	Frequency und Implse Response - Bild zum Vergrössern anklicken	Hifi-Apps Ergebnis und Kommentar
PC R5x5	Kleine PC Lautsprecher auf einem Tisch in einem nahezu ungedämmten quadratischen Raum (5m x 5m).		2019-08-29_INT.PCSPKR.WZ5X5 (Deutsch)
PC OFFICE	Kleine PC Lautsprecher im Büro.		2019-08-13_TAB_PC_SPKR_CABLE_CONNECTED LIST_2019-08-13_TAB_PC_SPKR_BT_CONNECTED
GAUS + ABSORBER	Modularer Eigenbau mit Plasma Hochtönern (Magnat MP 02) - Dämmung mit aktustischen Vorhägen.		LIST_2019-08-19_INT.GAUS.VORH.OFFEN LIST_2019-08-19_INT.GAUS.VORH.ZU LIST_2019-08-19_AUDIX.GAUS.VORH.OFFEN LIST_2019-08-19_AUDIX.GAUS.VORH.ZU.
ML + ABSORBER	Dipol Lautsprecher (Martin Logan ESL 9)		2019-09-22_ML_INT_XXX
QU50 LIV	Standlautsprecher (Quadral Chromnium Style 50) in 5m x 7m grossem durchschnittlich möbliertem Raum		LIST_2019-06-11_AUDIX.WZ.STR-GTN LIST_2019-06-11_INT.WZ.STR-GTN.
QU30 BA	Kein worst case Szenario: Quadral Chromnium Style 30 in einem Badezimmer.		LIST_2019-10-17_INT_QUADRAL_BAD
QU30 R5x5	Künstliche Provokation von Bodenreflexionen (rechter Lautsprecher zum Boden geneigt)		LIST_2019-10-17_AUDIX_W5_FB. LIST_2019-10-17_INTW5_FLOORBOUNCE.
MARTION	Hornlautsprecher MARTION Bullfrog aktiv		LIST_2019-10-30_MARTION_BULLFROG

Literatur

[Blauert 1974, Wikipedia] Wikipedia Artikel "Blauertsche Bänder": https://de.wikipedia.org/wiki/Blauertsche_B%C3%A4nder

[Defrance 2009] G. Defrance, L. Daudet and J-D. Polack: Using Matching Pursuit for estimating mixing time within Room ImpulseResponses. DOI:10.3813/AAA.918239 https://www.institut-langevin.espci.fr/IMG/pdf/defrance2009.pdf

[Fuchs 2019] Helmut Fuchs, Vortrag TU-Berlin 2019.

[Goertz 2020] Anselm Goertz: Seminat "Studioakustik und Monitorlautsprecher" 2020 TU-Berlin.

[IRZU FFT WAVELETS] IRZU, Research - Category: FFT & WAVELETS https://irzu.org/category/research/fft-wavelets/

[Møller 1974] Møller, Henning: Relevant loudspeaker tests in studios inHi-Fi dealers' demo rooms in the home etc. using 1/3 octave, pink-weighted, random noise. 47th Audio Engineering Society Convention, 1974-02-26/29, Copenhagen (Denmark), https://www.bksv.com/media/doc/17-197.pdf

[Toole 2015] Toole, Floyd E: The Measurement and Calibration of Sound Reproducing Systems, in: Journal of the Audio Engineering Society 63(7/8):512-541, August 2015, http://www.aes.org/e-lib/download.cfm/17839.pdf?ID=17839

[Usher 2010] John Usher: "An improved method to determine the onset timings of reflections in an acoustic impulse response". The Journal of the Acoustical Society of America 127, EL172 (2010); https://doi.org/10.1121/1.3361042 https://asa.scitation.org/doi/10.1121/1.3361042

[Zehner Ringversuch] Markus Zehner: Ringversuch Nachhallzeit http://www.zehner.ch/lab/ringversuch4.html

[Protheroe 2013] Daniel Protheroe, Bernard Guillemin: "3D impulse response measurements of spaces using aninexpensive microphone array". Toronto, Canada International Symposium on Room Acoustics 2013 http://www.iris.co.nz/media/14459/ISRA2013.pdf

Understand frequency response, impulse response, phase, etc.