wav to midi conversion
Automatic Music Transcription
Bibliography (1970-1998)

The bibliography accumulated here may still be of interest to those working in the field of Automatic Music Transcription.

Keywords: automatic music transcription, wav to midi conversion, fundamental frequency detection, pitch tracking, sound segmentation, note onset detection, beat induction, rhythm recognition, content-based audio retrieval, score following, expressive performance extraction, music perception.


Abbreviations used in the entries below:

CMJ: Computer Music Journal
IEEE ASSP: IEEE Transactions on Acoustics, Speech and Signal Processing
JAES: Journal of the Audio Engineering Society
JASA: Journal of the Acoustical Society of America
JNMR: Journal of New Music Research
MP: Music Perception
ICASSP: International Conference on Acoustics, Speech and Signal Processing
ICMC: International Computer Music Conference
ICMPC: International Conference on Music Perception and Cognition
IJCAI: International Joint Conference on Artificial Intelligence
ACM: Association for Computing Machinery
CCRMA: Center for Computer Research in Music and Acoustics
IRCAM: Institut de Recherche et Coordination Acoustique/Musique
MIT: Massachusetts Institute of Technology


Agon, C., G. Assayag, J. Fineberg and C. Rueda (1994). Kant: a critique of pure quantification, ICMC’94, pp. 52-59.

Allen, P.E. and R.B. Dannenberg (1990). Tracking musical beats in real time, ICMC’90, pp. 140-143.

Andre-Obrecht, R. (1988). A new statistical approach for the automatic segmentation of continuous speech signals, IEEE ASSP 36(1).

Askenfelt, A. (1976). Automatic notation of played music (status report), STL-QPSR 1/1976, pp. 1-11.

Askenfelt, A. and K. Elenius (1977). Editor and search programs for music, STL-QPSR, 4/1977, pp. 9-12.

Askenfelt, A. (1979). Automatic notation of played music: the VISA project, Fontes Artis Musicae, Vol. XXVI/2, pp. 109-120.

Avitsur, E. (1993). WATER: A workstation for automatic transcription of ethnic recordings, Computing in Musicology 9, p. 77.

Bagshaw, P.C., S.M. Hiller and M.A. Jack (1993). Enhanced pitch tracking and the processing of f0 contours for computer aided intonation teaching, EuroSpeech’93, pp. 1003-1006.

Bilmes, J. (1993). Timing is of the essence: perceptual and computational techniques for representing, learning, and reproducing expressive timing in percussive rhythm, M.Sc. thesis, MIT Media Laboratory.

Blackburn, S. and D. DeRoure (1998). A tool for content based navigation of music, ACM Multimedia’98 - Electronic Proceedings.

Bobrek, M. (1996). Polyphonic music segmentation using wavelet based pre-structured filter banks with improved time-frequency resolution, Ph.D. Dissertation, 1996.

Bobrek, M. and D.B. Koch (1997). A Macintosh based system for polyphonic music transcription, ESEAM’97, The First Electronic Scientific and Engineering Applications of the Macintosh Conference, 1997.

Bregman, A.S. (1990). Auditory Scene Analysis: the Perceptual Organisation of Sound, MIT Press.

Brown, G.J. and M. Cooke (1994). Perceptual grouping of musical sounds: a computational model, JNMR 23(1), pp. 107-132.

Brown, J.C. and M. Puckette (1989). Calculation of a “narrowed” autocorrelation function, JASA 85(4), pp. 1595-1601.

Brown, J.C., (1991). Calculation of a Constant Q Spectral Transform, JASA 89, pp. 425-434.

Brown, J.C. and M.S. Puckette (1993). A high resolution fundamental frequency determination based on phase changes of the Fourier transform, JASA 94(2), pp. 662-667.

Brown, J.C. (1993). Determination of the meter of musical scores by autocorrelation, JASA 94(4), pp. 1953-1957.

Brown, J.C. and K.V. Vaughn (1996). Pitch center of stringed instrument vibrato tones, JASA 100, pp. 1728-1735.

Cagle, R.T. (1996). Music to MIDI: progress towards the automatic transcription of multi-timbral musical signals into standard MIDI files, Ph.D. Dissertation, University of Tennessee, 1996.

Cagle, R.T. and D.B. Koch (1997). Progress towards the automatic transcription of musical recordings into standard MIDI files, Electronic Scientific and Engineering Applications of the Macintosh Conference ESEAM’97.

Calway, A. (1989). The multiresolution Fourier transform: a general purpose tool for image analysis, PhD Thesis, Department of Computer Science, The University of Warwick, UK.

Cambouropoulos, E. (1998). Musical parallelism and melodic segmentation, Proceedings of the XIIth Colloquium on Musical Informatics, pp. 111-114.

Carey, M., E.S. Parris and G.D. Tattersall (1997). Pitch Estimation of Singing for Re-Synthesis and Musical Transcription, EuroSpeech’97, pp. 887-890.

Carreras, F., M. Leman and D. Petrolino (1998). Extraction of music harmonic information using schema-based decomposition, Proceedings of the XIIth Colloquium on Musical Informatics, pp. 115-118.

Casajus-Quiros, F.J. and P. Fernandez-Cid (1994). Real-time loose-harmonic matching fundamental frequency estimation for musical signals, ICASSP’94, pp. 221-224.

Cerveau, L. (1994). Segmentation de phrases musicales à partir de la fréquence fondamentale [Segmentation of musical phrases based on the fundamental frequency], Mémoire DEA ATIAM, Université Paris 6.

Chafe, C., B. Mont-Reynaud and L. Rush (1982) Toward an intelligent editor of digital audio: Recognition of musical constructs, CMJ, 6(1), pp. 30-41.

Chafe, C., D. Jaffe, K. Kashima, B. Mont-Reynaud and J. Smith (1985). Techniques for note identification in polyphonic music, ICMC’85, pp. 399-405.

Chafe, C. and D. Jaffe (1986). Source separation and note identification in polyphonic music, ICASSP’86.

Chilton, E.H.S. and B.G. Evans (1987). Performance comparison of five pitch determination algorithms on the linear prediction residual of speech, EuroSpeech’87, pp. 403-406.

Chowning, J.M., L. Rush, B. Mont-Reynaud, C. Chafe, A. Schloss and J. Smith (1984). Intelligent systems for the analysis of digitized acoustic signals, Final report, Technical Report STAN-M-15, Stanford University Department of Music.

Chowning, J.M. and B. Mont-Reynaud (1986). Intelligent analysis of composite acoustic signals, Technical Report STAN-M-36, Stanford University Department of Music.

Clynes, M. (1987). What can a musician learn about music performance from newly discovered microstructure principles (PM and PAS)?, In A. Gabrielsson (ed.) Action and Perception in Rhythm and Music, Royal Swedish Academy of Music, 55.

Cook, P.R. (1995). An investigation of singer pitch deviation as a function of pitch and dynamics, Thirteenth International Congress of Phonetic Sciences, pp. 202-205.

Coüasnon, B. and B. Rétif (1995). Using a grammar for a reliable full score recognition system, ICMC’95, pp. 187-194.

Coyle, E.J. and I. Shmulevich (1998). A system for machine recognition of music patterns, ICASSP’98, pp. 3597-3600.

d’Alessandro, C. and M. Castellengo (1991). Études, par la synthèse, de la perception du vibrato vocal dans les transitions de notes [Studies, by synthesis, of the perception of vocal vibrato in note transitions], Bulletin d’Audiophonologie 7, pp. 551-564.

d’Alessandro, C. and M. Castellengo (1994). The pitch of short duration vibrato tones, JASA 95(3), pp. 1617-1630.

Dannenberg, R.B. and B. Mont-Reynaud (1987). Following an improvisation in real time, ICMC’87, pp. 241-248.

Desain, P. and H. Honing (1989). Quantization of musical time: A connectionist approach, CMJ 13(3).

Desain, P. and H. Honing (1993). Time functions function better as functions of multiple times, CMJ 16(2), pp. 17-34.

Desain, P. (1993). A connectionist and a traditional AI quantizer, symbolic versus sub-symbolic models of rhythm perception, Contemporary Music Review 9, pp. 239-254 (http://www.nici.kun.nl/mmm/publications/list.html).

Desain, P. and H. Honing (1994). Foot-tapping: a brief introduction to beat induction, ICMC’94, pp. 78-79 (http://www.nici.kun.nl/mmm/publications/list.html).

Desain, P. and H. Honing (1994). Rule-based models of initial beat induction and an analysis of their behavior, ICMC’94, pp. 80-82 (http://www.nici.kun.nl/mmm/publications/list.html).

Desain, P. and H. Honing (1994). Does expressive timing in music performance scale proportionally with tempo, Psychological Research 56, pp. 285-292 (http://www.nici.kun.nl/mmm/publications/list.html).

Desain, P. (1995). A (de)composable theory of rhythm perception, MP 9, pp. 439-454.

Desain, P. and H. Honing (1995). Towards algorithmic descriptions of continuous modulations of musical parameters, ICMC’95, pp. 393-395 (http://www.nici.kun.nl/mmm/publications/list.html).

Desain, P. and H. Honing (1996). Modeling continuous aspects of music performance: vibrato and portamento, ICMPC’96 (http://www.nici.kun.nl/mmm/publications/list.html).

Desain, P. and al. (1997). Robust score performance matching: taking advantage of structural information, ICMC’97, pp. 337-340 (http://www.nici.kun.nl/mmm/publications/list.html).

Di Federico, R. and G. Borin (1998). An improved pitch synchronous sinusoidal analysis-synthesis method for voice and quasi-harmonic sounds, Proceedings of the XIIth Colloquium on Musical Informatics, pp. 215-218.

Dixon, S.E. and D.M.W. Powers (1996). The characterization, separation and transcription of complex acoustic signals, Proceedings of the 6th Australian International Conference on Speech Science and Technology, pp. 73-78.

Dixon, S. (1996). A dynamic modeling approach to music recognition, ICMC’96, pp. 83-86.

Dixon, S. (1996). Multiphonic note identification, Proceedings of the 19th Australasian Computer Science Conference (? Australian Computer Science Communications 18(1)), pp. 318-323.

Dixon, S. (1997). Beat induction and rhythm recognition, Proceedings of the Australian Joint Conference on Artificial Intelligence, pp. 311-320.

Dolson, M. (1986). The phase vocoder, CMJ 10(4), pp. 14-27.

Doval, B. (1994). Estimation de la Fréquence Fondamentale des Signaux Sonores [Fundamental frequency estimation of audio signals], Thèse de doctorat de l’Université Paris VI.

Drake, C. and C. Palmer (1993). Accent structures in music performance, MP 10(3), pp. 343-378.

Drioli, C. and G. Borin (1998). Automatic recognition of musical events and attributes in singing, Proceedings of the XIIth Colloquium on Musical Informatics, pp. 17-20.

Ellis, D.P.W. (1996). Prediction-driven computational auditory scene analysis, Ph.D. Thesis, MIT. (http://sound.media.mit.edu/papers.html#dpwe).

Fernandez-Cid, P. and F.J. Casajus-Quiros (1998). Multi-pitch estimation for polyphonic musical signals, ICASSP’98, pp. 3565-3568.

Foote, J.T. (1997). Content based retrieval of music and audio, Proceedings of SPIE, Vol 3229, pp. 138-147, (http://www.fxpal.com/people/foote/papers/index.htm).

Foote, J.T. (1997). An overview of audio information retrieval, ACM - Springer, Multimedia Systems, (http://www.fxpal.com/people/foote/papers/index.htm).

Forsberg, J. (1997). Automatic conversion of sound to the MIDI-format, M.Sc. thesis, Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm.

Forsberg, J. (1998). Automatic conversion of sound to the MIDI-format, TMH-QPSR, 1-2/1998, pp. 53-60.

Foster, S. (1982). A pitch synchronous segmenter for musical signals, ICASSP’82.

Foster, S. and A.J. Rockmore (1982). Signal processing for the analysis of musical sound, ICASSP’82, pp. 89-92.

Foster, S., W.A. Schloss and A.J. Rockmore (1982). Toward an intelligent editor of digital audio: Signal processing methods, CMJ, 6(1), pp. 42-51.

Ghias, A., J. Logan, D. Chamberlin, B.C. Smith (1995). Query by humming - Musical information retrieval in an audio database, ACM Multimedia’95 - Electronic Proceedings, (http://www.cs.cornell.edu/Info/Faculty/bsmith/query-by-humming.html).

Gold, B. and L. Rabiner (1969). Parallel processing techniques for estimating pitch periods of speech in the time domain, JASA, 46(2), pp. 442-448.

Goldstein, J.L., A. Gerson, P. Srulovicz and M. Furst (1978). Verification of the optimal probabilistic basis of aural processing in pitch of complex tones, JASA 63(2), pp. 486-497.

Gordon, J.W. (1987). The perceptual attack transients in musical tones, JASA 82(1), pp. 88-105.

Gordon, J.W. (). Perception of attack transients in musical tones, Technical Report STAN-M-17, Department of Music, Stanford University.

Goto, M. and Y. Muraoka (1995). A real-time beat tracking system for audio signals, ICMC’95, pp. 171-174.

Goto, M. and Y. Muraoka (1996). Beat tracking based on multiple-agent architecture - a real-time beat tracking system for audio signals, Proceedings of the 2nd International Conference on Multiagent Systems, pp. 103-110, (http://staff.aist.go.jp/m.goto/publications.html).

Goto, M. and Y. Muraoka (1997). Issues in evaluating beat tracking systems, IJCAI’97 Workshop on Issues in Artificial Intelligence and Music - Evaluation and Assessment, pp. 9-16, (http://staff.aist.go.jp/m.goto/publications.html)

Goto, M. and Y. Muraoka (1997). Real-time rhythm tracking for drumless audio signals - chord change detection for musical decisions, IJCAI’97 Workshop on CASA, pp. 135-144, (http://staff.aist.go.jp/m.goto/publications.html).

Goto, M. and Y. Muraoka (1998). An audio-based real-time beat tracking system and its applications, ICMC’98, pp. 17-20, (http://staff.aist.go.jp/m.goto/publications.html).

Goto, M. and Y. Muraoka (1998). Music understanding at the beat level - real-time beat tracking for audio signals, In Readings in CASA (eds. Rosenthal, D. and H. Okuno), Erlbaum, Mahwah, NJ, pp. 157-176.

Grassi, M. (1998). Mistuned scales, Proceedings of the XIIth Colloquium on Musical Informatics, pp. 228-231.

Grubb, L. and R. Dannenberg (1994). Automating ensemble performance, ICMC’94, pp. 63-69.

Handel, S. (1989). Listening: An Introduction to the perception of Auditory Events. MIT Press, Cambridge, Massachusetts.

Hashimoto, S., H. Qi and D. Chang (1996). Sound database retrieved by sound, ICMC’96, pp. 121-123.

Hawley (1993). Structure out of Sound, Ph.D. thesis, MIT.

Hermes, D. (1988). Measurement of pitch by subharmonic summation, JASA 83(1), pp. 257-264.

Hess, W. (1983). Pitch Determination of Speech Signals. Springer-Verlag, New York.

Honing, H. (1995). The vibrato problem, comparing two solutions, CMJ 19(3).

Inoue, W., S. Hashimoto and S. Ohteru (1993). A computer music system for human singing, ICMC’93, pp. 150-153.

Iwamiya, S., T. Miyakura and N. Satoh (1989). Perceived pitch of complex FM-AM tones, ICMPC’89, pp. 431-436.

Kageyama, T., K. Mochizuki and Y. Takashima (1993). Melody retrieval with humming, ICMC’93, pp. 349-351.

Kapadia, J.H. (1995). Automatic recognition of musical notes, M.Sc. thesis, University of Toledo.

Kapadia, J.H. and J.F. Hemdal (1995). Automatic recognition of musical notes, JASA 98(5), p. 2957.

Kashino, K. and H. Tanaka (1993). A sound source separation system with the ability of automatic tone modeling, ICMC’93, pp. 248-255.

Kashino, K., K. Nakadai, T. Kinoshita, H. Tanaka (1995). Application of Bayesian probability network to music scene analysis, Working notes of the IJCAI’95 Computational Audio Scene Analysis workshop.

Kashino, K., K. Nakadai, T. Kinoshita, H. Tanaka (1995). Organization of Hierarchical Perceptual Sounds, IJCAI’95, pp. 158-164.

Kashino, K. and H. Murase (1998). Music Recognition using note transition context, ICASSP’98, pp. 3593-3596.

Katayose, H. and S. Inokuchi (1989). The Kansei music system, CMJ 13(4), pp. 72-77.

Katayose, H., T. Kanamori, K. Kamei, Y. Nagashima, K. Sato, S. Inokuchi and S. Simura (1993). Virtual performer, ICMC’93, pp. 138-145.

Katayose, H. and S. Inokuchi (1993). Learning performance rules in a music interpretation system, Computers and Humanities 27(1), pp. 31-40.

Katayose, H. and S. Inokuchi (1995). A model of pattern processing for music, ICMC’95, pp. 505-506.

Keislar, D., T. Blum, J. Wheaton and E. Wold (1995). Audio analysis for content-based retrieval, ICMC’95, pp. 199-202.

King, J.-B. and Y. Horii (1993). Vocal matching of frequency modulation in synthesised vowels, Journal of Voice 7, pp. 151-159.

Klapuri, A. (1997). Automatic Transcription of Music, M.Sc. Thesis, Department of Information Technology, Tampere University of Technology, Finland. (http://www.cs.tut.fi/~klap/iiro/contents.html).

Klapuri, A. (1998). Number theoretical means of resolving a mixture of several harmonic sounds, Proceedings of the European Signal Processing Conference EUSIPCO’98, (http://www.cs.tut.fi/~klap/iiro/).

Klapuri, A. (1999). Sound onset detection by applying psychoacoustic knowledge, ICASSP’99, (http://www.cs.tut.fi/~klap/iiro/).

Kronland-Martinet, R., J. Morlet and A. Grossmann (1987). Analysis of sound patterns through wavelet transforms, International Journal of Pattern Recognition and Artificial Intelligence, 2, pp. 97-126.

Krumhansl, C.L. (1990). Cognitive Foundations of Musical Pitch. Oxford University Press, New York.

Kuhn, W.B. (1990). A real-time pitch recognition algorithm for music applications, CMJ 14(3), pp. 60-71.

Large, E.W. and J.F. Kolen (1994). Resonance and the perception of musical meter, Connection Science 6, pp. 177-208.

Large, E.W. (1995). Beat tracking with a nonlinear oscillator, IJCAI Workshop on Artificial Intelligence and Music.

Lee, C.S. (1986). The rhythmic interpretation of single musical sequences: towards a perceptual model, In Musical Structure and Cognition (ed. Howell, P., I. Cross and R.West), pp. 53-69.

Lerdahl, F. and R. Jackendoff (1983). A Generative Theory of Tonal Music. MIT Press, Cambridge, Massachusetts.

Longuet-Higgins, H.C. (1976). Perception of melodies, Nature 263/5579, pp. 646-653.

Longuet-Higgins, H.C. (1978). The perception of music, Interdisciplinary Science Reviews 3(2), pp. 148-156.

Longuet-Higgins, H.C. and C.-S. Lee (1982). The perception of musical rhythms, Perception 11, pp. 115-128.

Longuet-Higgins, H.C. and C.S. Lee (1984). The rhythmic interpretation of monophonic music, MP 1, pp. 424-441.

Longuet-Higgins, H.C. (1987). Mental Processes, MIT Press.

Lunney, H.W.M. (1974). Time as heard in speech and music, Nature 249, p. 592.

Maher, R.C. (1989). An Approach for the Separation of Voices in Composite Musical Signals, Ph.D. thesis, University of Illinois, Urbana-Champaign.

Maher, R.C. (1990). Evaluation of a Method for Separating Digitized Duet Signals, JAES 38(12), pp. 956-979.

Marcus, S.M. (1981) Acoustic determinants of perceptual center (P-center) location, Perception and Psychophysics 30(3), pp. 247-256.

Markel, J.D. and A.H. Gray Jr. (1976). Linear Prediction of Speech, Springer-Verlag, New York.

Marolt, M. (1997). A music transcription system based on multiple-agents architecture, Proceedings of Multimedia and Hypermedia Systems Conference MIPRO’97 Opatija, Croatia, (http://lgm.fri.uni-lj.si/~matic/).

Marolt, M. (1998). Feedforward neural networks for piano music transcription, Proceedings of the XIIth Colloquium on Musical Informatics, pp. 240-243.

Martin, K.D. (1996). A Blackboard System for Automatic Transcription of Simple Polyphonic Music. MIT Media Laboratory Perceptual Computing Section Technical Report No. 385, (http://xenia.media.mit.edu/~kdm//professional.html).

Martin, K.D. (1996). Automatic transcription of simple polyphonic music: robust front end processing. MIT Media Laboratory Perceptual Computing Section Technical Report No. 399, (http://xenia.media.mit.edu/~kdm//professional.html).

Martin, K.D., E.D. Scheirer and B.L. Vercoe (1998). Musical context analysis through models of audition, Proceedings ACM Multimedia Workshop on Content Processing for Multimedia Applications, (http://xenia.media.mit.edu/~kdm//professional.html).

McAdams, S. and A. Bregman (1979). Hearing musical streams, CMJ 3(4), pp. 26-43.

McAdams, S. (1996). Audition: cognitive psychology of music in The Mind-Brain Continuum (Eds. R. Llinas, P. Churchland), MIT Press, 1996, pp. 251-279.

McNab, R., L.A. Smith and I.H. Witten (1995). Signal processing for melody transcription, Working paper 95/22, University of Waikato, Hamilton, New Zealand.

McNab, R. (1996). Interactive applications of music transcription, M.Sc. thesis, University of Waikato - New Zealand.

McNab, R.J., L.A. Smith, I.H. Witten, C.L. Henderson and S.J. Cunningham (1996). Towards the digital music library: tune retrieval from acoustic input, Proceedings of ACM Digital Libraries’96, pp. 11-18.

McNab, R.J., L.A. Smith, D. Bainbridge and I.H. Witten (1997). The New Zealand digital library melody index, D-Lib Magazine (http://www.dlib.org/dlib/may97/meldex/05witten.html).

Medan, Y., E. Yair and D. Chazan (1991). Super resolution pitch determination of speech signals, IEEE ASSP 39(1), pp. 40-48.

Meddis, R. and M.J. Hewitt (1991). Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: pitch identification, JASA 89(6), pp. 2866-2882.

Mellinger, D.K. and B. Mont-Reynaud (1991). Sound explorer: A workbench for investigating source separation, ICMC’91, pp. 90-94.

Mellinger, D.K. (1991). Event Formation and Separation in Musical Sounds. Ph.D. thesis, Dept. of Computer Science, Stanford University, (ftp://ccrma-ftp.stanford.edu/pub/Publications/Theses).

Michon, J.A. (1964). Studies on subjective duration: I. Differential sensitivity in the perception of repeated temporal intervals, Acta Psychologica 22, pp. 441-450.

Mitchell, T.M. (1997). Machine Learning. McGraw-Hill International Editions.

Moelants, D. and C. Rampazzo (1997). A computer system for the automatic detection of perceptual onsets in a musical signal, In KANSEI, the technology of emotion (ed. Camurri, A.), pp. 140-146.

Mont-Reynaud, B. (1985). Problem-solving Strategies in a Music Transcription System, IJCAI’85, pp. 916-918.

Mont-Reynaud, B. and M. Goldstein (1985). On finding rhythmic patterns in musical lines, ICMC’85, pp. 391-397.

Mont-Reynaud, B. and D.K. Mellinger (1989). A computational model of source separation by frequency co-modulation, Proceedings of the First International Conference on Music Perception and Cognition, pp. 99-102.

Mont-Reynaud, B. and E. Gresset (1990). PRISM: Pattern recognition in sound and music, ICMC’90, pp. 153-155.

Mont-Reynaud, B. (1992). Machine hearing research at CCRMA: An overview, CCRMA Research Overview, Department of Music, Stanford University, pp. 24-32, (ftp://ccrma-ftp.stanford.edu/pub/Publications/).

Moore (ed.) (1995). Hearing. Handbook of Perception and Cognition (2nd edition), Academic Press Inc.

Moore, B., B. Glasberg and T. Baer (1997). A model for the prediction of thresholds, loudness and partial loudness, JAES 45(4), pp. 224-240.

Moorer, J.A. (1975). On the segmentation and analysis of continuous musical sound by digital computer, Ph.D. thesis, Department of Computer Science, Stanford University.

Moorer, J.A. (1977). On the transcription of musical sound by computer, CMJ, 1(4), pp. 32-38.

Moorer, J.A. (1978). The use of the linear prediction of speech in computer music applications, JAES, 27(3), pp. 134-140.

Moorer, J.A. (1984). Algorithm design for real-time audio signal processing, ICASSP’84, pp. 12.B.3.1-12.B.3.4.

Moreno, E.I. (1992). The existence of unexplored dimensions of pitch: Expanded chromas, ICMC’92.

Morton, J., S.M. Marcus and C. Frankish (1976). Perceptual centers (P-centers). Psychological Review 83(5), pp. 405-408.

Nakajima, Y., G. ten Hoopen and R. van der Wilk (1991). A new illusion of time perception, MP 8, pp. 431-448.

Nakamura, Y. and S. Inokuchi (1979). Music information processing system in application to comparative musicology, IJCAI’79, pp. 633-635.

Ng, K., R. Boyle and D. Cooper (1996). Automatic detection of tonality using note distribution, JNMR 25(4), pp. 369-381.

Niihara, T. and S. Inokuchi (1986). Transcription of sung song, ICASSP’86, pp. 1277-1280.

Noll, A.M. (1967). Cepstrum pitch determination, JASA 41(2), pp. 293-309.

Nunn, D. (1994). Source separation and transcription of polyphonic music.

Parncutt, R. (1994). A perceptual model of pulse salience and metrical accent in musical rhythms, MP 11, pp. 409-464.

Patterson, R.D. and J. Holdsworth (1990). An introduction to auditory sensation processing, In HAM HAP 1(1).

Pearson, E.R.S. and R.G. Wilson (1990). Musical event detection from audio signals within a multiresolution framework, ICMC’90, pp. 156-158.

Pearson, E.R.S. (1995). The multiresolution Fourier transform and its application to polyphonic audio analysis, Technical Report CC-RR-282, University of Warwick.

Phillips, M.S. (1985). A feature-based time domain pitch tracker, JASA 77, S9-S10(k).

Pielemeier, W.J. and G.H. Wakefield (1996). A high-resolution time-frequency representation for musical instrument signals, JASA 99(4), pp. 2382-2396.

Pierce, J.R. (1991). Periodicity and pitch perception, JASA, 90, pp. 1889-1893.

Piszczalski, M. (1977). Automatic music transcription, CMJ 1(4), pp. 24-31.

Piszczalski, M. and B. Galler (1979). Predicting musical pitch from component frequency ratios, JASA 66(3), pp. 710-720.

Piszczalski, M. and B. Galler (1979). Computer Analysis and Transcription of Performed Music: a Project Approach, Computers and the Humanities 13, pp. 195-206.

Piszczalski, M., B. Galler, R. Bossemeyer, M. Hatamian and F. Looft (1981). Performed music: analysis, synthesis, and display by computer, JAES 29(1/2), pp. 38-46.

Piszczalski, M. (1986). A computational model for music transcription, Ph.D. thesis, University of Michigan.

Pollastri, E. (1998). Melody-retrieval based on pitch-tracking and string-matching methods, Proceedings of the XIIth Colloquium on Musical Informatics.

Povel, D.-J. and H. Okkenman (1981). Accents in equitone sequences, Perception & Psychophysics 30(6), pp. 565-572.

Povel, D.-J. and P. Essens (1985). Perception of temporal patterns, MP 2, pp. 411-440.

Prame, E. (1994). Measurements of the vibrato rate of ten singers, JASA 96(4), pp. 1979-1984.

Pressing, J. and P. Lawrence (1993). Transcribe: A comprehensive autotranscription program, ICMC’93, pp. 343-345.

Privosnik, M. and M. Marolt (1998). A system for automatic transcription of music based on multiple agents architecture, Proceedings of MELECON’98, Tel Aviv, Israel, pp. 169-172.

Proakis, J. and D. Manolakis (1996). Digital Signal Processing. 3rd ed. Englewood Cliffs, NJ: Prentice-Hall.

Puckette, M. (1995). Score following using the sung voice, ICMC’95, pp. 175-178.

Rabiner, L.R. and B. Gold (1975). Theory and Applications of Digital Signal Processing. Englewood Cliffs: Prentice-Hall.

Rabiner, L.R., M.J. Cheng, A.E. Rosenberg and C.A. McGonegal (1976). A comparative performance study of several pitch detection algorithms, IEEE Transactions on Acoustics, Speech and Signal Processing, 24(5), pp. 399-418.

Rabiner, L.R. (1977). On the use of autocorrelation analysis for pitch detection, IEEE Transactions on Acoustics, Speech and Signal Processing, 25(1), pp. 24-33.

Raskinis, G. (1998). Preprocessing of folk song acoustic records for transcription into music scores, Informatica 9(3), pp. 343-364.

Raskinis, G. (2000). Automatic Transcription of Lithuanian Folk Songs, PhD. Thesis, Vytautas Magnus University, Kaunas, Lithuania.

Remmel, M., I. Ruutel, J. Sarv and R. Sule (1975). Automatic notation of one-voiced song, Academy of Sciences of the Estonian SSR, Institute of Language and Literature, Preprint KKI-4, (Ed. Ü. Tedre), Tallinn, Estonia.

Repp, B.H. (1994). On determining the basic tempo of an expressive music performance, Psychology of Music 22, pp. 157-167.

Roads, C. (1996). The Computer Music Tutorial. MIT Press, Cambridge, Massachusetts.

Roberts, S.C. and M. Greenhough (1995). Rhythmic pattern processing using a self-organising neural network, ICMC’95, pp. 412-419.

Rodet, X. and S. Rossignol (1998). Automatic characterization of musical signals: feature extraction and temporal segmentation, ACM Multimedia’98.

Rolland, P.-Y. (1998). Découverte automatique de régularités dans les séquences et application à l’analyse musicale [Automatic discovery of regularities in sequences and application to music analysis], Thèse de doctorat de l’Université Paris VI.

Rolland, P.-Y., G. Raskinis and J.-G. Ganascia (1999). Musical Content-Based Retrieval: an Overview of the Melodiscov Approach and System, ACM Multimedia’99, pp. 81-84.

Rosenthal, D. (1992). Emulation of human rhythm perception, CMJ 16, pp. 64-76.

Rosenthal, D. (1992). Intelligent rhythm tracking, ICMC’92, pp. 227-230.

Rosenthal, D. (1992). Machine rhythm: computer emulation of human rhythm perception, MIT Media Laboratory, Ph.D. Thesis.

Rosenthal, D., M. Goto and Y. Muraoka (1994). Rhythm tracking using multiple hypotheses, ICMC’94, pp. 85-88, (http://staff.aist.go.jp/m.goto/publications.html).

Rossignol, S. (1997). Segmentation - Extraction du vibrato [Segmentation and vibrato extraction], premier rapport d’activité de thèse, January 1997.

Rossignol, S., X. Rodet, J. Soumagne, J.-L. Colette and P. Depalle. (1998). Feature extraction and temporal segmentation of acoustic signals, ICMC’98, (http://mediatheque.ircam.fr/articles/textes/Rossignol98a/).

Scarborough, D.L., B.O. Miller and J.A. Jones (1989). Connectionist models for tonal analysis, CMJ 13(3), pp. 49-55.

Schafer, R.W. and L.R. Rabiner (1973). A digital signal processing approach to interpolation. Proc. IEEE 61, pp. 692-702.

Schloss, A.W. (1985). On the automatic transcription of percussive music: From acoustic signal to high-level analysis. Ph.D. Thesis, Department of Hearing and Speech, Stanford University.

Schroeder, M.R. (1968). Period histogram and product spectrum: new methods for fundamental frequency measurement, JASA, 43(4), pp. 829-834.

Secrest, B.G. and G.R. Doddington (1982). Postprocessing techniques for voice pitch trackers, ICASSP’82, pp. 172-175.

Secrest, B.G. and G.R. Doddington (1983). An integrated pitch tracking algorithm for speech systems, ICASSP’83, pp. 1352-1355.

Scheirer, E.D. (1995). Extracting expressive performance information from recorded music, M.Sc. thesis, Program in Media Arts and Sciences, MIT, (http://sound.media.mit.edu/papers.html#eds).

Scheirer, E.D. (1995). Using musical knowledge to extract expressive performance information from audio recordings, IJCAI’95 Workshop on Computational Auditory Scene Analysis, pp. 153-160, (http://sound.media.mit.edu/papers.html#eds).

Scheirer, E.D. (1996). Bregman’s chimerae: music perception as auditory scene analysis, 4th ICMPC, (http://sound.media.mit.edu/papers.html#eds).

Scheirer, E.D. (1997). Pulse tracking with a pitch tracker, Proceedings of the 1997 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, (http://sound.media.mit.edu/papers.html#eds).

Scheirer, E.D. (1998). Tempo and beat analysis of acoustic musical signals, JASA 103(1), pp. 588-601.

Seashore, C.E. (1967). Psychology of Music, New York: Dover.

Shepard, R.N. (1982). Structural representation of musical pitch, In D. Deutsch (Ed.), The Psychology of Music, New York: Academic Press, pp. 343-390.

Shepard, R.N. and D.S. Jordan (1984). Auditory illusions demonstrating that tones are assimilated to an internalized musical scale, Science 226, pp. 1333-1334.

Shmulevich, I. and E.J. Coyle (1997). Establishing the tonal context for musical pattern recognition, Proceedings of the 1997 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

Shmulevich, I. and D. Povel (1998). Rhythm complexity measures for music pattern recognition, Proceedings of the IEEE Workshop on Multimedia Signal Processing.

Shuttleworth, T. and R.G. Wilson (1993). Note Recognition in Polyphonic Music using Neural Networks, Technical Report CS-RR-252, University of Warwick (ftp://ftp.dcs.warwick.ac.uk/reports/rr/252/).

Shuttleworth, T. and R.G. Wilson (1995). A neural network for triad classification, ICMC’95, pp. 428-431.

Smith, L.S. (1994). Sound segmentation using onsets and offsets, Journal of New Music Research 23, pp. 11-23.

Smith, L.S. (1993). Temporal localisation and simulation of sounds using onsets and offsets, CCCN Technical Report CCCN-16, University of Stirling.

Stainsby, T. (1996). A system for the separation of simultaneous musical audio signals, ICMC’96, pp. 75-78.

Stautner, J. (1983). Analysis and Synthesis of Music Using the Auditory Transform (also cited as “The auditory transform”, 1982), M.Sc. thesis, Department of Electrical Engineering and Computer Science, MIT.

Sterian, A. and G.H. Wakefield (1996). Robust automated music transcription systems, ICMC’96, pp. 219-221.

Steedman, M.J. (1977). The perception of musical rhythm and metre, Perception 6, pp. 555-569.

Strawn, J.M. (1980). Approximations and syntactic analysis of amplitude and frequency functions for digital sound synthesis, CMJ 4(3), pp. 3-24.

Sundberg, J. and P. Tjernlund (1970). A computer program for the notation of played music, STL-QPSR 2-3/1970, pp. 46-49.

Sundberg, J. (1987). The Science of the Singing Voice, Northern Illinois University Press, DeKalb, Illinois.

Sundberg, J. (1991). The Science of Musical Sounds, Academic Press.

Tait, C. (1995). Audio analysis for rhythmic structure, ICMC’95, pp. 590-591.

Tait, C. and W. Findlay (1996). Wavelet analysis for onset detection, ICMC’96, pp. 500-503.

Tanguiane, A. (1991). Criterion of data complexity in rhythm recognition, ICMC’91, pp. 559-562.

Tanguiane, A.S. (1993). Artificial Perception and Music, Springer-Verlag.

Tanguiane, A. (1994). A principle of correlativity of perception and its application to music recognition, MP 11(4), pp. 465-502.

Taylor, I.J. and M. Greenhough (1995). Neural network pitch tracking over the pitch continuum, ICMC’95, pp. 432-435.

Terhardt, E. (1974). Pitch, consonance, and harmony, JASA 55(5), pp. 1061-1069.

Thomassen, J.M. (1982). Melodic accent: experiments and a tentative model, JASA 71(6), pp. 1596-1605.

Todd, N. (1994). The auditory “primal sketch”: a multiscale model of rhythmic grouping, JNMR 23, pp. 25-70.

Toiviainen, P. (1998). Intelligent jazz accompanist: a real-time system for recognizing, following, and accompanying musical improvisations, Proceedings of the XIIth Colloquium on Musical Informatics, pp. 101-104.

Tuerk, C.M. (1990). A Text-to-Speech system based on NETtalk, M.Sc. Thesis, Engineering Department, Cambridge University, (http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/speech/systems/pt/0.html).

Uitdenbogerd, A.L. and J. Zobel (1998). Manipulation of music for melody matching, ACM Multimedia’98 - Electronic Proceedings.

Vantomme, J.D. (1995). The induction of musical structure using correlation, ICMC’95, pp. 585-586.

Vantomme, J.D. (1995). Score following by temporal pattern, CMJ 19(3), pp. 50-59.

Vercoe, B.L. (1994). Perceptually-based music pattern recognition and response, ICMPC’94.

Wightman, F. (1973). The pattern transformation model of pitch, JASA 54(2), pp. 407-416.

Wilson, R.G. and E.R.S. Pearson (1989). A multiresolution signal representation and its application to the analysis of musical signals, ICMC’89.

Wilson, R.G. and T. Shuttleworth (1995). The recognition of musical structures using neural networks, IJCAI’95.

Wöhrmann, R. and L. Solbach (1995). Preprocessing for the automated transcription of polyphonic music: linking wavelet theory and auditory filtering, ICMC’95, pp. 396-399.

Wold, E., T. Blum, D. Keislar and J. Wheaton (1996). Content-based classification, search, and retrieval of audio, IEEE Multimedia, 3(3), pp. 27-36, (ftp://ftp-db.deis.unibo.it/pub/ibartolini/Courses/Papers/CBClassSrch&RetrOfAudio.pdf).

Zwicker, E. (1977). Procedure for calculating loudness of temporally variable sounds, JASA 62(3), pp. 675-682.



     Copyright © 2002 Gailius Raskinis. All rights reserved.