ARCON Corporation
Subjective Test Methods

To ensure that a voice coding algorithm will meet the requirements of its intended users, the test plan must be multi-dimensional in design. Four significant characteristics of the performance of a digital speech coding algorithm are:

1. Intelligibility - The degree to which the transmitted speech is interpreted and, therefore, understood as the talker intended.

Below are brief descriptions of some of the test methods implemented at ARCON to evaluate these facets. Our facility is flexible enough to provide a thorough evaluation of communication systems under virtually any test. All of these tests, with the exception of communicability, utilize prerecorded speech material from multiple talkers. This material, as processed by the system under test, is evaluated by the various test methodologies. Depending on the methodology, the recorded material may contain scripted words, phrases, sentences or paragraphs.

Diagnostic Rhyme Test (DRT) - An ANSI-standardized method for evaluating intelligibility [1]. Listeners are required to choose which word of a rhyming pair they perceived. The words differ only in their leading consonant. The word pairs have been chosen such that six binary attributes of speech intelligibility are measured in their present and absent states. This attribute profile provides a diagnostic capability to the test. The following references present the development, implementation and application of the DRT: [1] [36] [40] [41] [42] [44] [45] [46] [47] [49].

Absolute Category Rating (ACR) - A standardized method for evaluating subjective quality. The ACR is the test method recommended by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) [15]. This test produces the well-known Mean Opinion Score (MOS) on a five-point scale. The following references present the methodology, standard practices and application of the ACR: [2] [3] [4] [5] [14] [15] [16] [17] [18] [19] [20] [21] [43] [45].

Degraded Category Rating (DCR) - An alternative to the ACR used for evaluation of the relative quality of speech systems when the ACR resolution or linearity is limited.
A comparative judgment is made between the system under test and a reference system. The degradation between the reference system and the system under test is rated on a five-point scale. This method is often used with low quality coders, coders operating in severe environments, or where the impairment is small. The ITU-T Standard can be found in [15]. The following references present the methodology, standard practices and application of the DCR: [6] [14] [15] [16] [17].

Comparison Category Rating (CCR) - Forced Choice Paired Comparison (A/B test) - A measure of the personal opinion of a group of listeners regarding the preference of one voice system over another. This method is useful in the validation of an implementation of an algorithm standard. Since direct comparisons between systems are made, this test methodology has a high "face validity".

Speaker Recognizability - [45] [47]

Communicability - A rating of user acceptance of a system in a two-way conversational setting under actual service conditions as reproduced in the laboratory. Its measurement requires the use of non-scripted conversational speech. The conversation is typically driven by some task. The task can be chosen to be representative of the voice system's application or to structure the vocabulary during the communication. Task loading is often used to introduce stress as a test variable. The acoustic environments at both sides of the communication can be controlled along with the transducers being used. The transmission channel or channels between the parties can be modeled to simulate various error conditions and/or delays. Voice over Internet Protocol (VoIP) systems can be evaluated in a controlled manner. The following references introduce some of the research and methodologies associated with Communicability: [35] [37] [38] [48]. A communicability test was developed by ARCON Corporation. This test, the ARCON Communicability Exercise (ACE), is detailed in [35].
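The Mean Opinion Score produced by the ACR test described earlier is, as the name says, the arithmetic mean of the listeners' category ratings for a test condition. A minimal Python sketch follows. The category labels are those of the ITU-T P.800 listening-quality scale; the function name, the sample ratings, and the normal-approximation 95% confidence interval are illustrative assumptions and not a description of ARCON's own scoring procedure.

```python
from math import sqrt
from statistics import mean, stdev

# Five-point listening-quality scale as given in ITU-T P.800.
ACR_SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    `ratings` is a list of integer votes (1..5) collected for one test
    condition. The interval uses a normal approximation (1.96 standard
    errors), which is an assumption made for illustration only.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    for r in ratings:
        if r not in ACR_SCALE:
            raise ValueError(f"rating {r!r} is not on the five-point scale")
    mos = mean(ratings)
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


if __name__ == "__main__":
    # Hypothetical votes from five listeners for a single condition.
    votes = [4, 4, 5, 3, 4]
    mos, ci = mean_opinion_score(votes)
    print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of votes the interval is wide, which is one reason listening tests pool ratings from multiple talkers and many listeners before comparing systems.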
Multi-Speaker Communicability - A rating of user acceptance of a multi-speaker conferencing system in a realistic and dynamic setting. Current voice communication systems include conferencing capabilities with multiple talkers. These systems may range from a simple voice conference bridge to a complex multi-media conferencing suite that includes voice. A test method has been developed by ARCON Corporation specifically for evaluating the performance of conferencing systems.

SUBJECTIVE TEST METHODOLOGY REFERENCES

STANDARDS - Over the years, several national and international groups have standardized subjective test methodologies and practices. The following reference list is separated by organization.

ANSI - American National Standards Institute
[1] ANSI S3.2-1989, "Method for Measuring the Intelligibility of Speech Over Communication Systems", New York: American Standards Association.

IEEE - Institute of Electrical and Electronics Engineers
[2] IEEE Subcommittee on Subjective Measurements, "IEEE recommended practices for speech quality measurements", IEEE Transactions on Audio and Electroacoustics, V17, pp. 227-246, 1969.

ITU-T - International Telecommunication Union, Telecommunication Standardization Sector; formerly the CCITT
[3] ITU-T "Handbook on Telephonometry", Geneva, 1992, ISBN 92-61-04911-7.

IEC - International Electrotechnical Commission
[28] IEC Publication 1260: 1995, Electroacoustics – Octave-band and fractional-octave-band filters.

ISO - International Organization for Standardization
[31] ISO 266: 1975, Acoustics – Preferred frequencies for measurements.

PUBLISHED TECHNICAL REPORTS AND PAPERS
[35] E.W. Kreamer, J.D. Tardelli, P.D. Gatewood, and J. LeBlanc, "An Investigation of the Feasibility of Using the ARCON Communicability Exercise (ACE) for Communication System Evaluations," ARCON Corp. Final Report, Dec. 1995.

ADDITIONAL BIBLIOGRAPHY

B. W. Anderson and J. P. Kalb, "English Verification of the STI Method for Estimating Speech Intelligibility of a Communications Channel," Journal of the Acoustical Society of America, 81, 1982-1985, (1987).
BERANEK L.L.: Noise and Vibration Control, McGraw-Hill, pp. 564-566, 1971.
BOYD I., SOUTHCOTT C.B.: A speech codec for the Skyphone service, British Telecom Technology Journal, Vol. 6, No. 2, April 1988.
P. T. Brady, "A Technique for Investigating On-Off Patterns of Speech," The Bell System Technical Journal, Vol. 44, pp. 1-22, 1965.
P. T. Brady, "A Statistical Analysis of On-Off Patterns in 16 Conversations," The Bell System Technical Journal, Vol. 47, pp. 73-91, 1968.
L.W. Butler and L. Kiddle, "The Rating of Delta Sigma Modulating Systems with Constant Errors, Burst Errors, and Tandem Links in a Free Conversation Test Using the Reference Speech Link," Signals Research and Development Establishment, Ministry of Technology, Christchurch, Hants., Rpt No. 69014, Feb. 1969.
D. R. Carl, "Developing Faculty to Use Videoconferencing to Deliver University Credit Courses Over Cable and Satellite," Canadian Journal of Educational Communication, Vol. 15, pp. 235-250, 1986.
A. Chapanis, R. N. Parrish, R. B. Ochsman, and G. D. Weeks, "Studies in Interactive Communication I. The Effects of Four Communication Modes on the Behavior of Teams During Cooperative Problem-Solving," Human Factors, Vol. 14, pp. 487-509, 1972.
A. Chapanis, R. N. Parrish, R. B. Ochsman, and G. D. Weeks, "Studies in Interactive Communication II. The Effects of Four Communication Modes on the Linguistic Performance of Teams During Cooperative Problem Solving," Human Factors, Vol. 19, pp. 101-126, 1977.
E. D. Chapple, "The Interaction Chronograph: Its Evolution and Present Application," Personnel, Vol. 25, pp. 295-307, 1949.
CLARINGBOLD P.J.: The within-animal bioassay with quantal responses, Journal of the Royal Statistical Society, Series B, Volume 18, No. 1, pp. 133-137, 1956.
COLEMAN A., GLEISS N., SOTSCHECK J., USAI P., SCHEUERMANN H.: Subjective performance evaluation of the RPE-LTP codec for the Pan-European cellular digital mobile radio system, IEEE Globecom ’89, Dallas, Texas, 27-30 November 1989.
COLEMAN A., GLEISS N., USAI P.: A Subjective Testing Methodology for Evaluating Medium Rate Codecs for Digital Mobile Applications, Speech Communication, Vol. 7, pp. 151-166, June 1988.
COMBESCURE P. et al: Quality evaluation of speech coded at 32 kbit/s by means of degradation category ratings, Proc. ICASSP 82 (International Conference on Acoustics, Speech and Signal Processing), Vol. 2, Paris, May 1982.
CROWE D.P.: Selection of Voice Codec for the Aeronautical Satellite Service, European Conference on Speech Communication and Technology, Vol. 2, S37, pp. 320-323, September 1989.
DAUMER W.R., CAVANAUGH J.R.: A subjective comparison of selected digital codecs for speech, Bell System Technical Journal, Vol. 57, No. 9, November 1978.
DIMOLITSAS S., CORCORAN F., RAVISHANKAR C.: Correlation between headphone and telephone handset listener opinion scores for single-stimulus voice coder performance assessments, IEEE Signal Processing Letters, Vol. 2, No. 3, March 1995.
F. G. Eisler, Psycholinguistics: Experiments in Spontaneous Speech. New York: Academic Press, 1968.
J. W. Forgie, C. E. Feehrer and P. L. Weene, Voice Conferencing Technology Program: Final Report, Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, 1979.
GABRIELSSON A.: Statistical treatment of data from listening tests on sound-reproducing systems, Report TA No. 92, KTH Karolinska Institutet, Department of Technical Audiology, S-10044 Stockholm, Sweden, November 1979. IEC Publication 268-13, Annex 3, subclause 3.3 (a condensed version).
GOODMAN D.J., NASH R.D.: Subjective quality of the same speech transmission conditions in seven different countries, Proc. ICASSP 82 (International Conference on Acoustics, Speech and Signal Processing), Vol. 2, Paris, May 1982.
HOTH D.F.: Room noise spectra at subscribers’ telephone locations, J.A.S.A., Volume 12, pp. 499-504, April 1941.
T. B. Horrel and T. Jacobson, RASTI Measurements: Demonstration of Different Applications,
J. Jaffe and S. Feldstein, Rhythms of Dialogue. New York: Academic Press, 1970.
J. Jaffe, L. Cassotta and S. Feldstein, "Markovian Model of Time Patterns in Speech," Science, Vol. 144, pp. 884-886, 1964.
S. Johnson, "Interactive Teaching: Breaking Television Viewing Habits," The Distance Education Network Report, Vol. 21, pp. 4-6, 1988.
E. T. Klemmer, "Human Factors Problems in Satellite Telephoning," Human Factors, Vol. xxx, pp. 475-480, 1966.
R. M. Krauss and P. D. Bricker, "Effects of Transmission Delay and Access Delay on the Efficiency of Verbal Communication," Journal of the Acoustical Society of America, Vol. 41, pp. 286-292, 1966.
K. D. Kryter, The Effects of Noise on Man, 2nd ed., Academic Press, Orlando, FL, 1985, pp. 343-380.
M. A. Mack and B. Gold, "The Intelligibility of Non-Vocoded and Vocoded Semantically Anomalous Sentences," Lincoln Laboratories, Boston, MA., Technical Report 703, 26 July 1985.
G. A. Miller, "Speaking in General. Review of J.H. Greenberg (Ed.), Universals of Language", Contemporary Psychology, Vol. 8, pp. 417-418, 1963.
MODENA G., COLEMAN A., USAI P., COVERDALE P.: Subjective performance evaluation of the 7 kHz audio coder, IEEE Global Telecommunications Conference 1986 (Globecom ’86), Houston, Texas, 1-4 December 1986.
T. J. Moore and R. L. McKinley, "Proposed Test Procedures to Evaluate Audio Performance of Seek Talk Advanced Development Model Radio Systems," AFMRL-TR-80-100, July 1980.
R. L. Moreland and J. M. Levine, "Socialization in Small Groups: Temporal Changes in Individual-Group Relations," Advances in Experimental Social Psychology, Vol. 15, pp. 137-192, 1982.
I.E. Morley & G.M. Stephenson, "Interpersonal and Interparty Exchange: A Laboratory Simulation of an Industrial Negotiation at the Plant Level." British Journal of Psychology, Vol. 60, 543-545, 1969.
I.E. Morley & G.M. Stephenson, "Formality in Experimental Negotiations: A Validation Study." British Journal of Psychology, Vol. 60, 383, 1970.
F. A. Muckler & S. A. Seven, "Selecting Performance Measures: "Objective" versus "Subjective" Measurement." Human Factors, Vol. 34, 441-455, 1992.
A. C. Norwine and O. J. Murphy, "Characteristic Time Intervals in Telephonic Conversation," Bell System Technical Journal, Vol. 17, pp. 281-291, 1938.
G. Pask, "Review of Conversation Theory and a Protologic (or Protolanguage), LP," ECTJ, Vol. 32, pp. 3-40, 1984.
G. Pask and D. Gregory, "Conversational Systems," In J. Zeidner (Ed.), Human Productivity Enhancement, Vol. 2, New York: Praeger, 1987.
D. B. Pisoni, L. M. Manous and M. J. Dedina, "Comprehension of Natural and Synthetic Speech: II Effects of Predictability on the Verification of Sentences Controlled for Intelligibility," Speech Research Laboratory, Department of Psychology, Indiana University, 1986.
R.L. Pratt, "Assessing the Intelligibility and Acceptability of Voice Communication Systems," Royal Signals and Radar Establishment, Malvern, England, 1985 IEEE Military Comm. Conference.
R. L. Pratt, I. H. Flindell and A. J. Belyavin, "Assessing the Intelligibility and Acceptability of Voice Communication Systems," Royal Signals and Radar Establishment, Malvern, England, Report No. 87003, June 1987.
B. M. Reed, "Conversations" as a Theme for the Design of Telecommunications Based Knowledge Systems. Unpublished Manuscript, 1988.
D. L. Richards, Telecommunication by Speech, Butterworth, London, 1973.
RICHARDS D.L., BARNES G.J.: Pay-off between quantizing distortion and injected circuit noise, Proc. ICASSP 82 (International Conference on Acoustics, Speech and Signal Processing), Vol. 2, Paris, May 1982.
H.J.M. Steeneken and T. Houtgast, "A Physical Method for Measuring Speech-Transmission Quality," Journal of the Acoustical Society of America, 67, 318-326 (1980).
STEVENS S.S.: Psychophysics – Introduction to its perceptual, neural and social prospects, John Wiley and Sons, 1975.
J. D. Tardelli, C. M. Sims, P. A. LaFollette and P. D. Gatewood, "Research and Development for Digital Voice Processors," R86-01W, ARCON Corporation, 30 May 1986; and Final Report 01 January 1984 to 14 February 1986.
J. Tierney and H. Schecter, "The Lincoln Laboratory-Aerospace Medical Research Laboratory Digital Speech Test Facility," MIT Lincoln Laboratory, Technical Report 683, May 1984.
TUKEY J.W.: The problem of multiple comparisons, Ditton, Princeton University, Ed. 1953.
W.D. Voiers, "Exploratory Research on the Feasibility of a Practical and Realistic Test of Speech Communicability", Dynastat Inc., Final Report, April 1978.
E. Wachsler and J. D. Tardelli, "Speech Processing Facility System and DRT Software," R82-01W, ARCON Corporation, 20 April 1982; and Final Report 1 May 1980 to 31 December 1981.
J. C. Webster, "A Compendium of Speech Testing Material and Typical Noise Spectra for Use in Evaluating Communications Equipment," Naval Electronics Laboratory Center, San Diego, CA., September 1972.
WHEDDON C., LINGGARD R.: Speech and Language Processing, Chapman and Hall, 1990.
W. W. Wierwille, M. Rahimi, & J. G. Casali, "Evaluation of 16 Measures of Mental Workload Using a Simulated Flight Test Emphasizing Mediational Activity," Human Factors, Vol. 27, 489-502, 1985.
BT Laboratories: Enhanced Equivalent-Q Rating Algorithm, ETSI/TM/TM5/TCH-HS Document TD 93/126, December 1993.
Network Speech Processing Program: Semiannual Report, October 1976 - 31 March 1977, Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, August 1977.
Network Speech Processing Program: Annual Report, October 1976 - September 1977, Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, March 1978.
CCITT Question 24/XII, Contribution COM XII-120, Noise inside light motor vehicles, study period 1981-1984.
CCITT Question 24/XII, Contribution COM XII-134, Internal vehicle noise spectra, study period 1981-1984.
CCITT Contribution COM XII-208, Comparison of the results of vehicle noise submitted by France and BT, study period 1981-1984.
CCIR Document 11/17, Subjective assessment of the quality of television pictures (EBU), study period 1978-1982.
CCITT Contribution COM XII-79: Specification for an Intermediate Reference System, Study Period 1973-1976.
CCITT Contribution COM XII-104: Recapitulation and analysis of the results of subjective and objective Loudness Rating measurements carried out with eleven telephone systems by the CCITT Laboratory, Study Period 1973-1976.
CCITT Report of the meeting of Working Party XVIII/2 (Speech processing), COM XVIII-R 28, Annex 1, pp. 13-39, December 1983.
CCITT, COM XII-147, Swedish Telecommunication Administration Report: Subjective test on candidate codecs for mobile radio, February 1987.
CCITT, COM XII-68, European JEG: Subjective testing methodology for the evaluation of low-bit rate codecs for mobile radio, May 1986.
ITU-T Contribution COM 15-20: Transmission quality of interconnected PSTN-digital cellular networks, COMSAT, Study Period 1993-1996.
Recommendation ITU-R BS1116, "Subjective assessment of audio systems with small impairments including multichannel sound systems", Geneva, October 1997.
Copyright © 2004 ARCON Corp. All rights reserved.
This page last updated on 11/10/2004