The CHIL 2004 Evaluation Package was produced within the CHIL project (Computers in the Human Interaction Loop), an Integrated Project (IP 506909) under the European Commission's Sixth Framework Programme. The objective of the project is to create environments in which computers serve humans who focus on interacting with other humans, as opposed to having to attend to and be preoccupied with the machines themselves. Instead of computers operating in an isolated manner, and humans [thrust] in the loop [of computers], we will put Computers in the Human Interaction Loop (CHIL).

In this context, the CHIL project produced the CHIL Seminars. These are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. During the talks, the following were recorded: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker's voice and ambient sounds.

The CHIL Seminars have been compiled into four packages, according to the evaluations for which they were created and used:
- CHIL 2004 Evaluation Package (catalogue reference ELRA-E0009)
- CHIL 2005 Evaluation Package (catalogue reference ELRA-E0010)
- CHIL 2006 Evaluation Package (catalogue reference ELRA-E0017)
- CHIL 2007 Evaluation Package (catalogue reference ELRA-E0033)

In the CHIL 2004 Evaluation Package, the whole set of recordings amounts to almost 6 hours of audio and more than 2 hours of video. The language is European English spoken by non-native speakers. The recordings comprise videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker's voice and background sounds.
The database consists of:
1) Audio and video recordings: 10 seminars (7 recorded from October to December 2003 and 3 recorded in June 2004).
2) Annotations: video annotations made on 1 out of every 10 pictures in sequence, for each of the 4 cameras.
3) Transcriptions: transcriptions in both TRS and STMUID formats.