Voice technology 語音技術
Now we’re talking 輕而易“語”
Voice technology is making computers less daunting and more accessible
有了語音技術,電腦不再令人敬而遠之,反而更加平易近人
ANY sufficiently advanced technology, noted Arthur C. Clarke, a British science-fiction writer, is indistinguishable from magic. The fast-emerging technology of voice computing proves his point. Using it is just like casting a spell: say a few words into the air, and a nearby device can grant your wish.
英國科幻小說作家亞瑟·克拉克(Arthur C. Clarke)曾經指出,任何科技只要先進到足夠的程度,就和魔法沒有區別。迅速興起的語音電腦證明了他的觀點。它用起來就像是變魔法:對著空氣說句話,附近的智能設備就會幫你如愿以償。
The Amazon Echo, a voice-driven cylindrical computer that sits on a table top and answers to the name Alexa, can call up music tracks and radio stations, tell jokes, answer trivia questions and control smart appliances; even before Christmas it was already resident in about 4% of American households. Voice assistants are proliferating in smartphones, too: Apple’s Siri handles over 2bn commands a week, and 20% of Google searches on Android-powered handsets in America are input by voice. Dictating e-mails and text messages now works reliably enough to be useful. Why type when you can talk?
亞馬遜智能音箱(Amazon Echo)是一種聲控筒狀臺式電腦,聽到“阿麗夏”(Alexa)這個名字,它就會做出反應,挑選歌曲,選擇電臺,講笑話,回答各種瑣碎問題,還能控制智能設備;甚至早在圣誕節到來之前,它就已經入住了4%的美國家庭。語音智能助手還擴張到了智能手機行業:蘋果(Apple)的語音助手西里(Siri)每周要處理超過20億個指令;在美國,安卓手機的谷歌搜索指令有20%都是通過語音發布的。電郵和短信的語音輸入技術已經發展得足夠穩定,十分好用了。說話就能解決,何必還去打字?
This is a huge shift. Simple though it may seem, voice has the power to transform computing, by providing a natural means of interaction. Windows, icons and menus, and then touchscreens, were welcomed as more intuitive ways to deal with computers than entering complex keyboard commands. But being able to talk to computers abolishes the need for the abstraction of a “user interface” at all. Just as mobile phones were more than existing phones without wires, and cars were more than carriages without horses, so computers without screens and keyboards have the potential to be more useful, powerful and ubiquitous than people can imagine today.
這種轉變非同小可。盡管語音技術看起來如此簡單,但通過提供自然的交流方式,它具備著改變電腦的力量。從窗口、圖標和菜單,再到觸屏技術,這些和電腦打交道的方式更加直觀,比輸入復雜鍵盤指令更受歡迎。但是,一旦能夠與電腦交談,就不存在將“用戶界面”抽象出來的必要了。就像手機不光是沒有線的電話,汽車不光是沒有馬的馬車,沒有屏幕和鍵盤的電腦也有潛力變得更好用、更強大、更無所不在,超乎今人的想象。
Voice will not wholly replace other forms of input and output. Sometimes it will remain more convenient to converse with a machine by typing rather than talking (Amazon is said to be working on an Echo device with a built-in screen). But voice is destined to account for a growing share of people’s interactions with the technology around them, from washing machines that tell you how much of the cycle they have left to virtual assistants in corporate call-centres. However, to reach its full potential, the technology requires further breakthroughs, and a resolution of the tricky questions it raises around the trade-off between convenience and privacy.
語音技術并不會完全替代其他形式的輸入與輸出。有時候,要和機器聊天,打字仍舊比語音更方便(據說,亞馬遜正在研發一款帶有內置屏幕的Echo設備)。不過,從告訴你剩余洗衣時間還有多長的洗衣機,到企業呼叫中心的虛擬助手,作為人們與周邊科技互動的方式,語音注定會越來越受青睞。然而,要充分發揮其潛力,語音技術還需要進一步突破,解決好由此產生的微妙問題,拿捏好便利性與隱私權之間的平衡。
Alexa, what is deep learning?
阿麗夏,深度學習是啥?
Computer-dictation systems have been around for years. But they were unreliable and required lengthy training to learn a specific user’s voice. Computers’ new ability to recognise almost anyone’s speech dependably without training is the latest manifestation of the power of “deep learning”, an artificial-intelligence technique in which a software system is trained using millions of examples, usually culled from the internet. Thanks to deep learning, machines now nearly equal humans in transcription accuracy, computerised translation systems are improving rapidly and text-to-speech systems are becoming less robotic and more natural-sounding. Computers are, in short, getting much better at handling natural language in all its forms (see Technology Quarterly).
電腦聽寫系統已經伴隨我們好些年了。但是,它們性能不穩定,需要經過長時間訓練,才能識別特定用戶的語音。無需訓練就能可靠地識別幾乎任何人的講話,這是電腦的新功能,也是“深度學習”能力的最新印證。“深度學習”是一種人工智能技術,該技術可讓某種軟件系統接收上百萬次案例訓練,這些案例往往是從網絡上精選出來的。現在,有了深度學習技術,機器在轉錄準確性上已經幾乎與人無異,電腦翻譯系統正在飛速發展,文本轉語音系統的機器人腔越來越少,更加接近自然人聲。簡言之,電腦對各種自然語言的處理能力都今非昔比了。
Although deep learning means that machines can recognise speech more reliably and talk in a less stilted manner, they still don’t understand the meaning of language. That is the most difficult aspect of the problem and, if voice-driven computing is truly to flourish, one that must be overcome. Computers must be able to understand context in order to maintain a coherent conversation about something, rather than just responding to simple, one-off voice commands, as they mostly do today (“Hey, Siri, set a timer for ten minutes”). Researchers in universities and at companies large and small are working on this very problem, building “bots” that can hold more elaborate conversations about more complex tasks, from retrieving information to advising on mortgages to making travel arrangements. (Amazon is offering a $1m prize for a bot that can converse “coherently and engagingly” for 20 minutes.)
盡管深度學習意味著機器能更加可靠地識別人聲,發音也不再生硬,但是機器依然無法理解語言的意思。這是語音技術中最困難的一點,真要想蓬勃發展,這是聲控電腦所必須克服的問題。電腦必須能夠理解語境,才能就某個話題展開連貫對話,而不是像今天常見的,電腦只對簡單的、一次性的語音指令做出回應(“嘿,西里,幫我設個10分鐘的計時器”)。在高校和大大小小的公司中,研究員們正在研究這個問題,設計能夠針對復雜任務進行細致對話的機器人,從檢索信息,到提供房產按揭建議,再到安排行程等。(亞馬遜設置了100萬美元獎金,獎勵能夠連貫地愉快聊天20分鐘的機器人。)
When spells replace spelling
動口代替動手
Consumers and regulators also have a role to play in determining how voice computing develops. Even in its current, relatively primitive form, the technology poses a dilemma: voice-driven systems are most useful when they are personalised, and are granted wide access to sources of data such as calendars, e-mails and other sensitive information. That raises privacy and security concerns.
在語音電腦的發展問題上,消費者和監管者也發揮著一定的決定作用。盡管目前來說,語音技術尚處于相對原始的發展階段,但它已然讓人們陷入兩難:聲控系統的個性化程度越高,允許接觸的私人日程、電郵和其他敏感信息越豐富,則發揮的用處也越大。這引發了人們對隱私和安全問題的擔憂。
To further complicate matters, many voice-driven devices are always listening, waiting to be activated. Some people are already concerned about the implications of internet-connected microphones listening in every room and from every smartphone. Not all audio is sent to the cloud—devices wait for a trigger phrase (“Alexa”, “OK, Google”, “Hey, Cortana”, or “Hey, Siri”) before they start relaying the user’s voice to the servers that actually handle the requests—but when it comes to storing audio, it is unclear who keeps what and when.
讓問題更加復雜的是,許多聲控設備一直都處于監聽狀態,等待被語音指令激活。聯網的麥克風監聽著每個房間,每一部智能手機,已經有人在擔心這一切意味著什么了。并非所有的音頻都發送到了云端:設備要等到“一聲令下”(“阿麗夏”,“好,谷歌”,“嘿,微軟小娜”,或“嘿,西里”)之后,才開始將用戶語音傳達給實際處理語音請求的服務器。但是,說到儲存音頻,就難說誰會在什么時候保留什么錄音了。
Police investigating a murder in Arkansas, which may have been overheard by an Amazon Echo, have asked the company for access to any audio that might have been captured. Amazon has refused to co-operate, arguing (with the backing of privacy advocates) that the legal status of such requests is unclear. The situation is analogous to Apple’s refusal in 2016 to help FBI investigators unlock a terrorist’s iPhone; both cases highlight the need for rules that specify when and what intrusions into personal privacy are justified in the interests of security.
警方在阿肯色州調查的一樁謀殺案中,亞馬遜智能音箱有可能聽到了兇殺過程,于是警方要求公司提供該智能設備可能獲取的任何音頻資料。亞馬遜公司卻拒絕合作,理由(受到了隱私擁護者們的支持)是這種要求是否合法尚不明確。無獨有偶,2016年,蘋果公司也拒絕協助FBI調查員解鎖一名恐怖分子的蘋果手機;這兩個案例都突出了明確法規的必要性,出于安全利益考慮,對個人隱私的何時何種侵擾屬于合法,應該得到明確。
Consumers will adopt voice computing even if such issues remain unresolved. In many situations voice is far more convenient and natural than any other means of communication. Uniquely, it can also be used while doing something else (driving, working out or walking down the street). It can extend the power of computing to people unable, for one reason or another, to use screens and keyboards. And it could have a dramatic impact not just on computing, but on the use of language itself. Computerised simultaneous translation could render the need to speak a foreign language irrelevant for many people; and in a world where machines can talk, minor languages may be more likely to survive. The arrival of the touchscreen was the last big shift in the way humans interact with computers. The leap to speech matters more.
盡管這些問題尚未解決,消費者們仍舊愿意接受語音電腦。在許多情況下,語音比其他交流方式要方便、自然得多。與眾不同的是,當你使用它時,還可以同時做其他事情(開車,健身或在街上走路)。語音可以讓由于種種原因不能使用屏幕和鍵盤的人們感受到電腦的力量。它不僅可能給電腦帶來驚人影響,還可能影響語言使用本身。電腦同聲傳譯可能讓許多人不必會說外語;在一個機器可以講話的世界中,小語種生存下去的可能性更高。觸屏的到來是人類與電腦互動模式的上一次重大轉變。語音的飛躍有過之而無不及。
原文出處:經濟學人網站
譯者:linda10030
本譯文僅供個人研習、欣賞語言之用,謝絕任何轉載及用于任何商業用途。本譯文所涉法律后果均由本人承擔。本人同意簡書平臺在接獲有關著作權人的通知后,刪除文章。