Informational masking of speech depends on masker spectro-temporal variation but not on its coherence

These datasets comprise listeners’ transcriptions of sentence-length speech analogues for Experiments 1 and 2 of the article of the same title (Roberts and Summers, 2020; Journal of the Acoustical Society of America). There are two spreadsheets for each experiment: one comprising keyword scores and one comprising phoneme scores. Each spreadsheet comprises a summary worksheet and the raw data for each listener. The summary worksheet contains aggregated scores (keywords correct by tight scoring, or phoneme scores; see below) for each listener in each condition, with relevant demographic information. Subsequent worksheets comprise the raw data for each listener and stimulus.

For the spreadsheet containing the keyword scores, the raw data comprise: (a) the stimulus presented [Column heading: Stimulus], (b) the stimulus sentence [Text], (c) the listener’s response [Transcription], (d) the condition number for which the stimulus was presented [Condition], (e) the number of times the listener heard the stimulus (always once in this experiment), (f) the number of keywords in the stimulus, (g) the loose score (the number of keywords reported for which the stem of the word is correct; e.g., “type”, “types”, and “typed” would all be marked correct for keyword “typing”; the loose score was not analysed but is included for completeness), and (h) the tight score (only exactly reported keywords are marked as correct; homonyms are accepted). The mean scores, (i) loose or (j) tight, for each condition are computed by dividing the number of keywords reported correctly across all sentences in the condition (6 sentences/condition) by the total number of keywords in those sentences.
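The condition mean described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (correct keywords pooled over the sentences in a condition, divided by the pooled keyword count); the function name `condition_means`, the tuple layout, and the example values are assumptions made for this sketch and are not taken from the spreadsheets.

```python
from collections import defaultdict

def condition_means(rows):
    """Return {condition: pooled proportion of keywords correct}.

    rows: iterable of (condition, n_keywords, n_correct) tuples, one per
    sentence. This layout is hypothetical, not the spreadsheet's own.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, n_keywords, n_correct in rows:
        correct[condition] += n_correct
        total[condition] += n_keywords
    # Pool over sentences first, then divide once per condition.
    return {c: correct[c] / total[c] for c in total}

# Made-up scores for six sentences in one condition (illustration only).
example_rows = [
    (1, 3, 2), (1, 4, 4), (1, 3, 0),
    (1, 3, 3), (1, 4, 1), (1, 3, 2),
]
means = condition_means(example_rows)  # 12 correct out of 20 keywords
```

Because the counts are pooled before dividing, sentences with more keywords carry more weight; the result generally differs from the average of the six per-sentence proportions.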
For the spreadsheet containing the phoneme scores, the raw data comprise: (a) the stimulus presented [Column heading: Stimulus], (b) the stimulus sentence [Column heading: Text], (c) the listener’s response [Transcription], (d) the condition number for which the stimulus was presented [Condition], (e) the number of times the listener heard the stimulus (always once in this experiment), (f) the number of correct phonemes in the transcription, (g) the number of phonemes in the transcription that were not in the stimulus and had to be deleted to provide an optimal alignment between stimulus and transcription [deletions], (h) the number of phonemes in the transcription that were not in the stimulus and had to be substituted to provide an optimal alignment [substitutions], (i) the number of phonemes that needed to be inserted into the transcription to provide an optimal alignment [insertions], (j) the total number of phonemes in the stimulus [Num Phonemes], and (k) the % correct phonemes (100 * [Correct phonemes]/[Num Phonemes]). The mean proportion of phonemes correct for each condition (l) is computed by dividing the number of correct phonemes reported across all sentences in the condition (6 sentences/condition) by the total number of phonemes in those sentences.
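To make the alignment counts concrete, the sketch below scores one transcription against one stimulus using a standard minimum-edit-distance alignment with unit costs. It is an assumption that this resembles the scoring used for the datasets: the description above says only that an optimal alignment was found, so the cost scheme, the tie-breaking rule (diagonal moves preferred), the function name `align_counts`, and the phoneme symbols are all illustrative.

```python
def align_counts(stimulus, transcription):
    """Count matches and edits in one minimum-edit alignment of phoneme lists."""
    n, m = len(stimulus), len(transcription)
    # cost[i][j] = fewest edits needed to turn transcription[:j] into stimulus[:i]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # every stimulus phoneme must be inserted
    for j in range(1, m + 1):
        cost[0][j] = j  # every transcribed phoneme must be deleted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(stimulus[i - 1] != transcription[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # insertion
                cost[i][j - 1] + 1,             # deletion
            )

    # Trace one optimal path back from the corner, preferring diagonal moves.
    counts = {"correct": 0, "substitutions": 0, "insertions": 0, "deletions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(stimulus[i - 1] != transcription[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                counts["substitutions" if mismatch else "correct"] += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["insertions"] += 1  # stimulus phoneme missing from transcription
            i -= 1
        else:
            counts["deletions"] += 1   # extra phoneme in the transcription
            j -= 1
    return counts

# Made-up phoneme strings (illustration only, not from the datasets).
stimulus = ["k", "a", "t"]
transcription = ["k", "a", "p", "s"]
counts = align_counts(stimulus, transcription)
percent_correct = 100 * counts["correct"] / len(stimulus)
```

With unit costs, several alignments can share the same minimum cost yet contain different numbers of correct phonemes, so a different tie-breaking rule could yield different counts for some transcriptions. In any alignment of this kind, correct + substitutions + insertions equals the number of phonemes in the stimulus.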