Li Yanhong speech transcript: Why is Baidu Brain "the core of the core"? | Baidu World Congress

Lei Feng Network, September 1, 2016: The Baidu World Congress opened today in Beijing. After a year of repeated controversy, Baidu's annual conference is an occasion to show its strength and determination. What Li Yanhong said today will define Baidu for the next year, the next three years, or even longer. Lei Feng Network excerpts his speech at the Baidu World Congress below.

A brief summary of Li Yanhong's speech:

He believes that artificial intelligence is Baidu's core in the era after the mobile Internet. Three years ago, "Baidu Brain" already had the intelligence level of a two- to three-year-old child. Today Baidu also has super-large-scale computing resources, China's largest GPU cluster, and the richest search, image, video, and location data in China, bar none. So what level has Baidu Brain grown to?

The following is the full text of Li Yanhong's speech (abridged by Lei Feng Network without changing the original meaning):

Until now, the development of the Internet has gone through two important phases: first the PC Internet phase, and second the mobile Internet phase of the last four or five years. China's 700 million Internet users now use smartphones, and Internet penetration has exceeded 50% of the population. The development of the Internet can no longer be driven by the demographic dividend.

China's economy has now entered a "new normal", and the state is advocating "Internet Plus", in the hope that the efficiency of the Internet can promote the development of other industries. The entire Chinese economy needs to rely on the Internet, yet the growth of the Internet itself has hit a bottleneck. The crude, unchecked growth of the past is gone, and practitioners in the Internet industry feel a deep sense of crisis.

The next act after the mobile Internet is artificial intelligence. For Baidu, this is the core of the core, and Baidu has invested a great deal of research and development effort in it. What does it mean for Baidu? Summed up in four Chinese characters, it is "Baidu Brain". Three years ago, Baidu introduced the "Baidu Brain" concept. At that time, Baidu Brain already had the intelligence level of a child about two or three years old.

Many people ask me today: how old is Baidu Brain now? This is quite difficult to answer, because the way Baidu Brain develops is very different from the normal development of a human brain.

Baidu Brain consists of three parts. The first is algorithms: ultra-large-scale neural networks with trillion-scale parameters, trained on hundreds of billions of samples and hundreds of billions of features. The second is computing power: hundreds of thousands of servers make up the physical body of Baidu Brain, and a few years ago Baidu began to build China's largest GPU cluster, based on GPUs rather than traditional CPU servers. GPUs were widely used in gaming in the early years and are well suited to deep learning computation. The third is data: web-page data for the entire web, more than a decade of accumulated search data, and billions of image, video, and location records. With these, Baidu Brain can start working.

What can Baidu Brain do? It has four capabilities: voice, images, natural language understanding, and user portraits.

The first, voice, has entered a relatively mature stage.

Baidu Brain's speech recognition is now in its second generation and is based on deep learning. Its accuracy can reach 97%, which exceeds the human ability to recognize speech.

Where exactly can such capabilities help us? For example, many companies have their own telesales department. Turnover in the industry is high, and sales staff need long training before they can start work. Even after training, and even among experienced salespeople, sales efficiency varies from person to person. With Baidu's speech recognition capability, however, a new salesperson can perform at the level of the best salespeople from the first day on the job:

When a new salesperson calls a potential customer, every sentence the customer says and every question the customer asks is recognized by Baidu Brain in real time and displayed on the salesperson's computer screen. The screen shows not only the customer's question but also, in real time, the best way to answer it. This makes the new salesperson's work much simpler: by following the screen, he can basically reach the level of the best salespeople.
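The flow described above (recognize what the customer said, then show a suggested answer on screen) can be sketched in a few lines. This is only an illustration under stated assumptions: the speech has already been transcribed to text, and the `PLAYBOOK` table, its keywords, and the `suggest_reply` function are hypothetical names invented here. The speech does not describe how Baidu actually matches questions to answers; simple keyword overlap stands in for that step.

```python
import re
from typing import Optional

# Hypothetical playbook: keywords a customer might say, mapped to the
# reply that would be shown on the salesperson's screen.
PLAYBOOK = {
    ("price", "cost", "expensive"): "Explain the pricing tiers and the current discount.",
    ("contract", "cancel", "term"): "Clarify the contract length and cancellation policy.",
    ("support", "help", "service"): "Describe the support hours and response times.",
}


def suggest_reply(transcript: str) -> Optional[str]:
    """Return the reply whose keywords overlap most with the transcribed sentence.

    Returns None when no keyword matches at all.
    """
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    best_reply, best_score = None, 0
    for keywords, reply in PLAYBOOK.items():
        score = len(words.intersection(keywords))
        if score > best_score:
            best_reply, best_score = reply, score
    return best_reply


if __name__ == "__main__":
    # Each recognized sentence would be passed in as it arrives.
    print(suggest_reply("How much does it cost? It sounds expensive."))
```

A production system would presumably replace the keyword table with a learned model, but the shape of the loop (transcribe, match, display) stays the same.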

There are many application scenarios for speech recognition. Everyone can imagine based on their own industry background.

Speech capabilities are divided into two directions, speech synthesis and speech recognition.

Speech synthesis converts text into speech and reads it in a natural human voice, rather than the flat, emotionless machine voice of the past. Baidu now handles 250 million speech synthesis requests per day. After emotional speech synthesis went online, the average time that Baidu users spend listening to novels read aloud grew from 0.69 hours per day to 2.21 hours.

Today, speech synthesis can also be personalized to imitate the way any person you like speaks. Baidu Maps has a Li Yanhong navigation voice pack. In fact, I never said those words; they were synthesized from my everyday speech.

We synthesized the voice of Leslie Cheung (Zhang Guorong), who passed away 13 years ago. Synthesizing his voice was relatively difficult because little Mandarin material is available. The original recordings of him preserved in film, television, and radio were modeled and then synthesized with emotional speech synthesis technology.

Anyone who records 50 sentences as required, which takes about 30 minutes, can use Baidu Brain's speech synthesis technology to simulate their own voice. Everyone can have their own voice model.

How is the second capability, image recognition, implemented?

Technically, it works by extracting the key feature points of a human face and finding the points that differ most between faces; together these make up a person's facial features. When a person's expression changes, these facial features do not change.
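To make the idea of expression-independent facial features concrete, here is a toy sketch. It is not Baidu's method, which the speech does not detail and which presumably relies on learned deep features. The assumptions are all invented for illustration: landmarks arrive as 2D points, the names in `RIGID_POINTS` mark points that move little with expression, and faces are compared by distance ratios within a fixed `tolerance`.

```python
import math

# Hypothetical landmark names: points that move little when the expression changes.
RIGID_POINTS = ["left_eye", "right_eye", "nose_bridge", "nose_tip"]


def signature(landmarks):
    """Pairwise distances between rigid landmarks, divided by the eye-to-eye distance.

    Dividing by the eye distance makes the signature independent of image scale.
    """
    scale = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    sig = []
    for i, a in enumerate(RIGID_POINTS):
        for b in RIGID_POINTS[i + 1:]:
            sig.append(math.dist(landmarks[a], landmarks[b]) / scale)
    return sig


def same_person(face_a, face_b, tolerance=0.05):
    """Treat two faces as the same person if every distance ratio is within tolerance."""
    return all(
        abs(x - y) <= tolerance
        for x, y in zip(signature(face_a), signature(face_b))
    )


if __name__ == "__main__":
    neutral = {"left_eye": (30, 40), "right_eye": (70, 40),
               "nose_bridge": (50, 45), "nose_tip": (50, 60),
               "mouth_left": (38, 80), "mouth_right": (62, 80)}
    # Same face, smiling: only the mouth corners move.
    smiling = dict(neutral, mouth_left=(34, 76), mouth_right=(66, 76))
    print(same_person(neutral, smiling))
```

Because the mouth corners are not part of the signature, the smiling and neutral versions of the same face compare as equal, which is the property the paragraph above describes.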

In addition to face recognition, Baidu often encounters image recognition needs in other scenarios. Computer vision, or image recognition technology, is the last mile for driverless cars. Over the past year, Baidu has put a lot of effort into improving how its unmanned vehicles drive on urban roads. While driving, Baidu's unmanned vehicles can sense other vehicles, road signs, and a variety of obstacles on the road. Each object is given a unique number so that the vehicle can tell it apart, and this recognition is done by Baidu Brain. The latest KITTI evaluation results, from August this year, show Baidu's unmanned vehicles ranking first in vehicle detection, with vehicle tracking the fourth of six indicators.

Next is augmented reality, where advertisers can link real-world products with the scenes they wish to present to consumers. L'Oreal has worked with Baidu so that its shampoo can be accurately recognized and then used to interact with users.

The third capability is natural language processing, which is currently not as mature as image recognition.

Duer, launched at last year's Baidu World Congress, is a personal assistant. The way users communicate with Duer is already very different from traditional search: 56% of interactions take place through speech or images. Voice and image interaction is gradually overtaking traditional text input and becoming the mainstream way for people to express their needs.

In addition, the core technology here is communicating in natural human language, although Duer cannot yet understand every sentence. During this year's Olympic Games, the Duer robot partnered with the well-known commentator Yang Yi to commentate on the first men's basketball quarter-final of the Rio Olympics.

The last capability is user portraits.

From a large amount of user behavior data, Baidu Brain can describe the basic characteristics of the fan community of the famous actor Hu Ge. In the portrait of Hu Ge's fans, we look at interests and preferences such as film, music, public welfare, and travel. The most prominent interest turns out to be travel, not film, television, or music, which differs from what most people would expect.
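A group portrait of this kind can be thought of as an aggregation over per-user interest tags. The sketch below is a minimal illustration, assuming each user has already been assigned a list of interest tags; the `group_portrait` function, the user IDs, and the tags are made-up toy data, not Baidu's data or pipeline.

```python
from collections import Counter


def group_portrait(user_interests):
    """Return (tag, share of users) pairs for a group, most common tag first.

    user_interests maps a user ID to that user's list of interest tags.
    """
    counts = Counter()
    for tags in user_interests.values():
        counts.update(set(tags))  # count each tag at most once per user
    total = len(user_interests)
    return [(tag, n / total) for tag, n in counts.most_common()]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    fans = {
        "u1": ["travel", "film"],
        "u2": ["travel", "music"],
        "u3": ["travel", "public welfare"],
        "u4": ["film", "travel"],
    }
    for tag, share in group_portrait(fans):
        print(f"{tag}: {share:.0%}")
```

Ranking tags by the share of users who hold them is what lets an unexpected interest, such as travel in the example from the speech, surface at the top of the portrait.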
