Cloud and terminal dual scenarios: three proprietary chips each show their strengths

AI (artificial intelligence) has been developing for decades, advancing through repeated cycles of expectation, disappointment, progress, and renewed expectation. According to a report released by CCID Consulting, the global artificial intelligence market reached US$29.3 billion in 2016, and we expect it to reach US$120 billion by 2020, with a compound growth rate of approximately 20%. Artificial intelligence chips are an important part of this market. Based on data from companies such as Nvidia, AMD, Xilinx, and Google, we estimate that the artificial intelligence chip market reached US$2.388 billion in 2016, or 8.15% of the global artificial intelligence market, and that it will reach US$14.616 billion by 2020, or approximately 12.18% of the global artificial intelligence market. The market for artificial intelligence chips is therefore extremely large.

Chips carry the algorithms and are the commanding heights of competition

Artificial intelligence is built on algorithms, and deep learning is currently the most popular artificial intelligence algorithm. Deep learning, also called deep neural networks (DNN), evolved from earlier artificial neural network (ANN) models. Such models are typically visualized as graph models in computer science, and the "depth" of deep learning refers to the number of layers in the graph model and the number of nodes in each layer. The complexity of neural networks keeps increasing: from the earliest single neurons, to AlexNet (8 network layers) proposed in 2012, to ResNet (up to 152 network layers) proposed in 2015, complexity has grown geometrically from generation to generation, and the demand for processor computing power has exploded accordingly. Deep learning thus brings a dramatic increase in computational load and places far higher demands on computing hardware.

Deep learning algorithms involve two processes: "training" and "inference". Put simply, artificial intelligence must first obtain its various parameters through "training" on big data and then pass these parameters to the "inference" stage to produce the final result. The types of neural network operations required for "training" and "inference" differ. Neural network computation divides into forward computation (including matrix multiplication, convolution, and recurrent layers) and backward updates (mainly gradient calculations), both of which contain large numbers of parallel operations. "Training" requires forward computation plus backward updates, while "inference" is mainly forward computation; in general, training is far more computationally intensive than inference. As a rule, cloud artificial intelligence hardware is responsible for "training + inference", while terminal artificial intelligence hardware is responsible only for "inference".

"Training" requires the support of big data and a high degree of flexibility, and is therefore usually performed in the "cloud" (i.e., on servers). The artificial intelligence training process starts from a massive data set and a chosen deep learning model. Each model has internal parameters that must be flexibly adjusted so that the model learns the data; this parameter adjustment can be framed as an optimization problem, and adjusting the parameters is equivalent to optimizing under specific constraints. This is what is called "training".
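To make the training/inference split concrete, the sketch below trains a single dense layer by gradient descent (forward computation plus backward gradient update) and then runs inference with the learned parameters (forward computation only). It is a minimal NumPy illustration with toy data, not any particular production framework.

```python
import numpy as np

# Minimal sketch: one dense layer trained with gradient descent, illustrating
# the "forward + backward" cost of training versus the "forward only" cost of
# inference. Shapes and data are toy values.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 100))          # 64 samples, 100 features
y = rng.normal(size=(64, 1))            # regression targets
W = rng.normal(size=(100, 1)) * 0.01    # trainable parameters
lr = 0.01

# --- Training: forward computation + backward (gradient) update ---
for step in range(100):
    pred = X @ W                        # forward: matrix multiplication
    grad = X.T @ (pred - y) / len(X)    # backward: gradient of the MSE loss
    W -= lr * grad                      # parameter update

# --- Inference: forward computation only, using the trained W ---
x_new = rng.normal(size=(1, 100))
prediction = x_new @ W                  # no gradients, no updates
```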
After a cloud server has collected users' big data, it relies on its powerful computing resources and proprietary hardware to carry out the training process and extract the corresponding trained parameters. Because deep learning training requires massive data sets and enormous amounts of computation, it also places higher demands on the server: the cloud AI server platform of the future will need the ability to handle data at very large scale, streamlined parallelism, multi-threading, and high memory bandwidth.

The "inference" process can be performed either in the cloud (server side) or in the terminal (product side). Once the model has been trained, the trained model (mainly the various parameters obtained through training) is applied to concrete scenarios such as image recognition, speech recognition, and text translation. This inference process consists mainly of large numbers of multiply-accumulate matrix operations with a high degree of parallelism. Unlike training, however, inference is relatively fixed and does not require the support of big data, so in addition to the server side it can also be implemented in the terminal: the parameters required for "inference" are "trained" in the cloud and then periodically downloaded to and updated on the terminal.

Traditional CPUs lack the computing power; new-architecture chips support AI

The core chip determines the infrastructure and development ecosystem of a computing platform. Because the deep learning required by AI demands a high degree of internal parallelism, large amounts of floating-point computation, and heavy matrix operations, the traditional CPU-based computing architecture cannot fully satisfy the need for high-performance parallel computing (HPC) in artificial intelligence, and dedicated chips with architectures designed for artificial intelligence are required.

Dedicated hardware acceleration is the mainstream direction for new-architecture chips. There are currently two development paths for optimizing processor chips for artificial intelligence: (1) continue the traditional computing architecture and accelerate the hardware's computing capability, represented by GPUs, FPGAs, and ASICs (TPUs, NPUs, etc.), which act as accelerators under the control of the CPU and specialize in the various operations related to artificial intelligence; (2) completely overturn the traditional computing architecture and increase computing capability by simulating the neuron structure of the human brain, represented by IBM's TrueNorth chip. Due to constraints in technology and the underlying hardware, the second path is still at an early stage of R&D and has no prospect of large-scale commercial application for now. From the perspective of technological maturity and commercial feasibility, we believe that accelerated computing on AI-specific hardware will remain the market mainstream for the next five years or more.

Cloud and terminal dual scenarios: three proprietary chips each show their strengths

We group the application scenarios for artificial intelligence hardware into two categories: cloud scenarios and terminal scenarios. The cloud mainly refers to the server side, including public clouds, private clouds, data centers, and other business areas; the terminal mainly refers to devices at the product end, covering applications such as security, automotive, mobile phones, smart speakers, and robots.
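As a concrete picture of terminal-side inference under this cloud-trains, terminal-infers split, the sketch below runs one dense layer written as explicit multiply-accumulate (MAC) operations. The parameter values here are random placeholders standing in for weights that, in a real deployment, the cloud would periodically push down to the device.

```python
import numpy as np

# Sketch of terminal-side inference. In practice W and b would be loaded from
# a parameter bundle periodically pushed down by the cloud (e.g. np.load(...));
# here they are random placeholders so the sketch is self-contained.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 128))          # trained weights, (outputs, inputs)
b = rng.normal(size=(10,))              # trained biases

def dense_forward(x, W, b):
    """Forward pass of one dense layer, written as explicit
    multiply-accumulate operations to show where the work is."""
    out = np.zeros(W.shape[0])
    for i in range(W.shape[0]):         # each output element ...
        acc = b[i]
        for j in range(W.shape[1]):     # ... accumulates over all inputs
            acc += W[i, j] * x[j]       # one multiply-accumulate
        out[i] = acc
    return out

x = rng.normal(size=(128,))             # stand-in for on-device features
y = dense_forward(x, W, b)              # inference = forward computation only
```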
Because the efficiency of an algorithm is closely tied to the choice of underlying hardware, the hardware requirements of the "cloud" (server side) and the "terminal" (product side) also differ. Besides the CPU, artificial intelligence currently relies on three types of specialized core chips: the GPU, the FPGA, and the ASIC.

GPU: the all-round "decathlete" that landed first, the top pick in both cloud and terminal. The GPU (Graphics Processing Unit), also known as the graphics processor, was previously used as a microprocessor dedicated to image computation. Compared with CPUs, GPUs are better suited to complex mathematical and geometric calculations, especially parallel operations, which matches AI deep learning algorithms that contain large numbers of parallel operations. GPUs have therefore been given a new mission in the age of artificial intelligence: they became the first choice for artificial intelligence hardware and were the first to land in a variety of cloud and terminal scenarios. Today the GPU is the main chip for AI "training" in the cloud, has been the first to land in the terminal fields of security and automotive, and is the most widely used and most flexible AI hardware.

FPGA: the "Transformer", the best choice before the algorithm is finalized. The field-programmable gate array (FPGA) is a "universal chip" that users can repeatedly reprogram according to their own needs. Once programmed, it functions like an ASIC (application-specific integrated circuit), with high efficiency and low power consumption. At the same time, the flexibility of programmability leaves considerable redundancy in the circuit, so its cost cannot match that of an ASIC and its operating frequency cannot be very high (usually below 500 MHz). Compared with GPUs, FPGAs have a power-consumption advantage; compared with ASICs, they have shorter development cycles and more flexible programmability. FPGAs thrive in the gap between "application explosion" and "ASIC mass production", striking a good trade-off between efficiency and flexibility, and their ability to "race against time" gives them a clear advantage before algorithms are finalized. In today's cloud data center business, FPGAs are expected to see a wave of adoption after GPUs thanks to their flexibility and capacity for deep optimization; in terminal smart security, some vendors are already using FPGAs for AI hardware acceleration.

ASIC: the "specialist player" that wins on efficiency, and the best choice for AI chips in the future. The ASIC (Application-Specific Integrated Circuit) refers here specifically to processor chips designed for AI applications with dedicated architectures. The dazzling variety of chips that have emerged in recent years, such as TPUs, NPUs, VPUs, and BPUs, are all essentially ASICs. In performance, die area, power consumption, and other respects, the ASIC is superior to the GPU and the FPGA, and in the long term, both in the cloud and in the terminal, the ASIC represents the future of AI chips. However, while AI algorithms are still booming and iterating rapidly, the ASIC's long development cycle, need for low-level hardware design, and low flexibility mean that its development is currently slower than that of the GPU and the FPGA.
In this report, we analyze in detail the current applications, development prospects, and possible shifts of the three proprietary AI chips across the two application scenarios of cloud and terminal.

Cloud scenario: the GPU ecosystem leads; multiple chips will complement and coexist in the future

Core conclusions: Processors suited to parallel computing, such as GPUs and TPUs, will become the main devices supporting artificial intelligence computation; they will both compete and coexist over the long term, and can cooperate with each other to a certain extent. FPGAs are expected to take on more roles in data center services, existing mainly as an effective supplement in the cloud. The CPU's role will shrink, but it will still serve as the control center. A chip's future prospects depend on its ecosystem; we expect the market to consolidate around a few mainstream software frameworks, forming a cloud-side multi-chip collaboration pattern of CPU + GPU/TPU + FPGA (optional).

(1) Relying on big data, technology giants deploy AI cloud platforms along different technology paths

Major technology giants are vigorously deploying artificial intelligence on the basis of their cloud platforms. Cloud computing is divided into three layers: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). Infrastructure sits at the bottom, the platform in the middle, and software at the top; IaaS provides off-site servers, storage, and networking hardware. Big data provides the information source for artificial intelligence, while cloud computing provides the platform. Key artificial intelligence technologies achieved their breakthroughs against the backdrop of cloud computing and the growing maturity of big data. The major technology giants are optimistic that artificial intelligence will develop toward the cloud, and have deployed artificial intelligence systems on their own cloud platforms in order to mine the value of the big data deposited there.

(2) A hundred-billion-dollar cloud service market gives AI chips huge development potential

The cloud service market is worth on the order of a hundred billion US dollars, and the cloud computing hardware market is correspondingly huge. The market size of cloud computing is expanding steadily. According to Gartner's statistics, the typical cloud service market represented by IaaS, PaaS, and SaaS reached US$52.24 billion in 2015, a growth rate of 20.6%, and is expected to reach US$143.53 billion by 2020, a compound annual growth rate of roughly 22%. Among these, the IaaS market will reach US$61.5 billion by 2020, accounting for 43% of the total cloud computing market. The cloud computing hardware market therefore has enormous room to grow; cloud computing and artificial intelligence are closely tied together by all kinds of acceleration algorithms, and future cloud computing hardware will be inseparable from AI chip acceleration.

Cloud AI chips have great development potential. Based on the financial data of Nvidia and AMD, we expect the GPU data center business to reach a market size of about US$5 billion by 2020. At the same time, based on FPGA vendors such as Xilinx and Altera, we expect the FPGA data center business to reach US$2 billion by 2020. Adding the emerging ASIC cloud market, we expect the cloud AI chip market to reach US$10.568 billion by 2020; AI chips will become an important part of cloud computing, with huge development potential.
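As a quick arithmetic check on the Gartner projection cited above, the implied compound annual growth rate over the five years from 2015 to 2020 can be worked out directly from the two endpoint figures:

```python
# Compound annual growth rate implied by the Gartner figures cited above
start, end, years = 52.24, 143.53, 5      # US$ billions, 2015 -> 2020
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")                      # ~22.4%, consistent with the quoted ~22%
```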
(3) Status quo of cloud chips: GPUs lead, with FPGAs and ASICs following

Cloud AI chips build on big data, and their core task is "training". The defining feature of the cloud is "big data + cloud computing": relying on big data, users can perform full-scale data analysis and data mining, extract all kinds of data features, and combine them with artificial intelligence algorithms in cloud computing, producing a wide variety of server-side "AI+" applications. The AI chip is the hardware responsible for accelerating the various complex algorithms of artificial intelligence. Because the amount of computation involved is so large, the CPU architecture has proven unable to satisfy artificial intelligence algorithms that must handle massive parallel computation, and chips better suited to parallel computing are needed; hence GPUs, FPGAs, TPUs, and other chips have emerged as the times require. In the cloud, AI chips can take on both the "training" and the "inference" processes of artificial intelligence.

The current state of cloud chips: GPUs dominate the cloud artificial intelligence market, ASICs represented by the TPU are so far used only within the closed-loop ecosystems of the giants, and FPGAs are developing rapidly in data center services. GPU application development cycles are short, costs are relatively low, and the technology ecosystem is mature; mainstream companies around the world, such as Google, Microsoft, Amazon, and Alibaba, all use GPUs for AI computation. Besides using large numbers of GPUs, Google has been working hard to develop its own dedicated ASIC chip: compared with GPUs, the TPU it introduced in May this year consumes 60% less power with a 40% smaller chip area, which better meets its enormous AI computing requirements. However, because artificial intelligence algorithms are iterating rapidly, the TPU is currently used only by Google itself; as TensorFlow matures, the TPU may also be supplied externally, but its general applicability still has a long way to go. Baidu and other vendors are actively adopting FPGAs for cloud acceleration in their data center services. The FPGA can be seen as a key transition from the GPU to the ASIC: compared with the GPU, it allows deeper hardware-level optimization; compared with the ASIC, it is more flexible under continuous iterative evolution and has a shorter development time. ASICs with architectures dedicated to AI have already demonstrated better performance and power consumption and are expected to become the mainstream of artificial intelligence hardware in the future.

(4) Cloud GPUs: the mainstream cloud AI chip, with a clear first-mover advantage

Development status: the GPU is naturally suited to parallel computing and is currently the most widely used chip in cloud AI. Companies that are heavily engaged in artificial intelligence currently all use GPUs for acceleration. According to Nvidia's official figures, more than 19,000 companies were developing deep learning projects with Nvidia in 2016, compared with 1,500 in 2014. IT giants such as Baidu, Google, Facebook, and Microsoft all use NVIDIA GPUs to accelerate their artificial intelligence projects. GPUs are currently the most widely used chips in cloud AI deep learning scenarios, and thanks to their well-developed programming environment they are expected to retain their first-mover advantage.
GPUs are therefore expected to remain strong in the future. The GPU chip architecture grew out of image processing and has powerful parallel computing capability. The GPU (Graphics Processing Unit), also known as the graphics processor, is a microprocessor previously used in personal computers, workstations, game consoles, and mobile devices (such as tablets and smartphones) exclusively for image computation. Like the CPU it is programmable, but it is better suited than the CPU to complex mathematical and geometric calculations, especially parallel operations; with its highly parallel internal structure, it processes graphics data and complex algorithms more efficiently than the CPU.

The GPU differs significantly from the CPU in structure and is better suited to parallel computing. Comparing the two structures, most of the CPU's die area is devoted to control logic and registers, whereas the GPU devotes more of its area to ALUs (Arithmetic Logic Units) for data processing rather than to data caching and flow control, making it suitable for parallel processing of dense data. When the CPU performs a computing task it processes only one piece of data at a time, with no true parallelism, whereas the GPU has many processor cores and can process multiple data items in parallel at the same time.

Compared with the CPU, the GPU holds an absolute performance advantage in the AI field. Deep learning neural network training requires a high degree of internal parallelism, large amounts of floating-point computation, and heavy matrix operations, all of which the GPU can provide; at the same accuracy, it offers faster processing than a traditional CPU, with less server investment and lower power consumption. At the GPU Technology Conference in San Jose, California on May 11, 2017, NVIDIA released the Tesla V100, built on Volta, currently the most powerful GPU computing architecture; it uses TSMC's 12 nm FFN process, integrates 21 billion transistors, and in deep learning processing is equivalent to 250 CPUs.
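To illustrate the data-parallel point above in code, the sketch below runs the same large matrix multiplication once on the CPU with NumPy and, if a CUDA-capable GPU and the CuPy library happen to be available, once more on the GPU. The matrix size is arbitrary and the timing is only indicative, not a benchmark.

```python
import time
import numpy as np

# Illustrative comparison of the same matrix multiplication on the CPU (NumPy)
# and on a GPU (CuPy), assuming a CUDA-capable GPU and CuPy are installed.
A = np.random.rand(4096, 4096).astype(np.float32)
B = np.random.rand(4096, 4096).astype(np.float32)

t0 = time.time()
C_cpu = A @ B                               # CPU: a few cores work through the data
print(f"CPU matmul: {time.time() - t0:.3f} s")

try:
    import cupy as cp
    A_gpu, B_gpu = cp.asarray(A), cp.asarray(B)   # copy data to GPU memory
    t0 = time.time()
    C_gpu = A_gpu @ B_gpu                   # GPU: thousands of ALUs in parallel
    cp.cuda.Device(0).synchronize()         # wait for the asynchronous kernel
    print(f"GPU matmul: {time.time() - t0:.3f} s")
except Exception:
    print("CuPy or a usable CUDA GPU is not available; skipping the GPU run.")
```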
