UPC-TALP database of isolated meeting-room acoustic events

Full Official Name: UPC-TALP database of isolated meeting-room acoustic events
Submission date: Jan. 24, 2014, 4:32 p.m.

This database was produced within the CHIL Project (Computers in the Human Interaction Loop), an Integrated Project (IP 506909) under the European Commission's Sixth Framework Programme. It contains a set of isolated acoustic events that occur in a meeting-room environment and that were recorded for the CHIL Acoustic Event Detection (AED) task. The recorded sounds do not overlap in time. The database can be used as training material for AED technologies and for testing AED algorithms in quiet environments without temporal sound overlap.

The database contains the signals of 23 audio channels, with corresponding labels, out of the 84 channels used in the whole CHIL task. The 23 channels are: 12 microphones of the 3 T-shaped clusters, 4 tabletop omnidirectional microphones, and 7 channels of the Mark III array. Data was recorded at 44.1 kHz with 24-bit precision and then converted to 16-bit raw little-endian format. All channels were synchronized.

Approximately 60 sounds per sound class were recorded. The same ten participants (5 men and 5 women) took part in every session, and there are 3 sessions per participant. In each session, a participant occupied a different one of 7 fixed positions in the room and had to produce the complete set of sounds twice. Each participant was given a script indicating the order in which the events were to be produced. Almost every event was preceded and followed by a pause of several seconds. During all recordings, two or three additional people were inside the room to make the scenario more realistic. All sounds were produced individually, except “applause” and several “laugh” events, which were produced together by the people inside the room. The annotation was done manually. The database is stored on 3 DVDs (one session per DVD).
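The 16-bit raw little-endian format mentioned above has no header, so the sample width, byte order, and sampling rate must be supplied by the reader. The following is a minimal sketch of how one channel might be decoded, assuming signed samples and one channel per file; neither is stated in this description. The function names `decode_pcm16le`, `read_raw_channel`, and `duration_seconds` are hypothetical and are not part of the database distribution.

```python
import struct

SAMPLE_RATE_HZ = 44_100  # sampling rate stated in the description
BYTES_PER_SAMPLE = 2     # 16-bit samples


def decode_pcm16le(raw: bytes) -> tuple:
    """Decode headerless 16-bit little-endian PCM bytes into integers.

    Assumption: samples are signed (struct format 'h').
    """
    if len(raw) % BYTES_PER_SAMPLE:
        raise ValueError("byte count is not a multiple of the sample size")
    n_samples = len(raw) // BYTES_PER_SAMPLE
    # '<' forces little-endian byte order with the standard 2-byte short.
    return struct.unpack(f"<{n_samples}h", raw)


def read_raw_channel(path: str) -> tuple:
    """Read one raw file, assumed to hold a single audio channel."""
    with open(path, "rb") as f:
        return decode_pcm16le(f.read())


def duration_seconds(samples) -> float:
    """Duration of a single-channel sample sequence at 44.1 kHz."""
    return len(samples) / SAMPLE_RATE_HZ
```

Because all channels were synchronized, the same sample index should refer to the same instant in every channel decoded this way, provided the single-channel assumption holds.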
The following table summarizes the content of the DVDs and shows the number of annotated acoustic events in each session:
<table border="0" width="100%" cellspacing="0" cellpadding="2" class="infoBoxContents">
<tr align=center><td align=left><strong>Event type</strong></td><td><strong>Session 1</strong></td><td><strong>Session 2</strong></td><td><strong>Session 3</strong></td></tr>
<tr align=center><td align=left>Knock (door, table)</td><td>15</td><td>18</td><td>17</td></tr>
<tr align=center><td align=left>Door open</td><td>20</td><td>20</td><td>20</td></tr>
<tr align=center><td align=left>Door close</td><td>20</td><td>21</td><td>20</td></tr>
<tr align=center><td align=left>Steps</td><td>28</td><td>24</td><td>21</td></tr>
<tr align=center><td align=left>Chair moving</td><td>23</td><td>28</td><td>25</td></tr>
<tr align=center><td align=left>Spoon (cup jingle)</td><td>23</td><td>21</td><td>24</td></tr>
<tr align=center><td align=left>Paper work (listing, wrapping)</td><td>31</td><td>29</td><td>24</td></tr>
<tr align=center><td align=left>Key jingle</td><td>21</td><td>21</td><td>23</td></tr>
<tr align=center><td align=left>Keyboard typing</td><td>21</td><td>25</td><td>20</td></tr>
<tr align=center><td align=left>Phone ringing/Music</td><td>37</td><td>36</td><td>43</td></tr>
<tr align=center><td align=left>Applause</td><td>20</td><td>20</td><td>20</td></tr>
<tr align=center><td align=left>Cough</td><td>22</td><td>22</td><td>21</td></tr>
<tr align=center><td align=left>Laugh</td><td>22</td><td>21</td><td>21</td></tr>
<tr align=center><td align=left>Unknown</td><td>38</td><td>46</td><td>42</td></tr>
</table>
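The per-session counts in the table can be tallied to obtain the number of annotated events per class and per session, for instance when planning a training and test split. The sketch below copies the counts from the table; the variable names are illustrative only.

```python
# Annotated event counts per class, copied from the table:
# (Session 1, Session 2, Session 3)
EVENT_COUNTS = {
    "Knock (door, table)": (15, 18, 17),
    "Door open": (20, 20, 20),
    "Door close": (20, 21, 20),
    "Steps": (28, 24, 21),
    "Chair moving": (23, 28, 25),
    "Spoon (cup jingle)": (23, 21, 24),
    "Paper work (listing, wrapping)": (31, 29, 24),
    "Key jingle": (21, 21, 23),
    "Keyboard typing": (21, 25, 20),
    "Phone ringing/Music": (37, 36, 43),
    "Applause": (20, 20, 20),
    "Cough": (22, 22, 21),
    "Laugh": (22, 21, 21),
    "Unknown": (38, 46, 42),
}

# Total annotated events per class, summed over the three sessions.
class_totals = {name: sum(counts) for name, counts in EVENT_COUNTS.items()}

# Total annotated events per session, summed over all classes.
session_totals = [sum(column) for column in zip(*EVENT_COUNTS.values())]

if __name__ == "__main__":
    for name, total in class_totals.items():
        print(f"{name:32s} {total:4d}")
    print("Per-session totals:", session_totals)
```

The class totals can then be compared with the figure of approximately 60 sounds per class given in the description; the "Knock (door, table)" row, for example, sums to 50 annotated events.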

Right Holder(s)