When Hongzhi Gao was young, he lived with his family in Gansu, a province in northwestern China on the edge of the Tengger Desert. When he thinks back to his childhood, he remembers the constant, steady wind blowing grit outside their house. For most months of the year, it took less than a minute outdoors for sand to fill every gap and work its way into his pockets, his boots, and his mouth. The monotony of the desert stayed with him for years, and at university he turned this memory into the idea of building a machine that could bring plants into the desert landscape.
Efforts to halt desertification, the process that turns fertile land into desert, have mainly relied on expensive manual labor. Hongzhi has developed a robot that uses deep learning to automate tree planting, from identifying optimal spots to planting saplings and watering them. Although he had no prior experience with AI, Hongzhi, then an undergraduate, used Baidu’s deep learning platform PaddlePaddle to combine different modules into a robot with better object recognition capabilities than comparable machines already on the market. It took Hongzhi and his friends less than a year to develop and commission the final product.
Hongzhi’s desert robot is a telling example of the increasing accessibility of artificial intelligence.
Today, more than four million developers use Baidu’s open source AI technology to create solutions that can improve the lives of people in their communities, and many of them have little to no technical expertise in the field. “Over the next decade, AI will drive changes in every area of our society and transform the way industries and companies work. The technology will expand the human experience by immersing us deeper in the digital world,” said Robin Li, CEO of Baidu, at Baidu Create 2021, an AI developer conference.
As AI enters a new chapter in its evolution, Baidu CTO Haifeng Wang has identified two key trends shaping the industry’s path. AI will continue to mature and grow in technical complexity. At the same time, deployment costs and barriers to entry will fall, benefiting both companies developing AI-powered solutions at scale and software developers exploring the world of AI.
Merging knowledge and data with deep learning
The integration of knowledge and data with deep learning has significantly improved the efficiency and accuracy of AI models. Since 2011, Baidu’s AI infrastructure has been collecting new information and integrating it into a large-scale knowledge graph. Currently, this knowledge graph contains more than 550 billion facts, covering all aspects of daily life and industry-specific topics, including manufacturing, pharmaceuticals, law, financial services, technology, and media and entertainment.
This knowledge graph, together with massive amounts of data, forms the building blocks of Baidu’s newly released pre-trained language model PCL-BAIDU Wenxin (ERNIE 3.0 Titan version). The model outperforms language models without knowledge graphs on 60 natural language processing (NLP) tasks, including reading comprehension, text classification, and semantic similarity.
Learning across modalities
Cross-modal learning is a new area of AI research that aims to improve machines’ cognitive understanding and better mimic human adaptive behavior. Examples of research in this area include automatic text-to-image synthesis, in which a model is trained to generate images from textual descriptions alone, and algorithms designed to understand visual content and express that understanding in words. The challenge in these tasks is for machines to establish semantic connections across different types of data (e.g., images and text) and to understand the dependencies between them.
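To make the idea of a knowledge graph “fact” more concrete: knowledge graphs commonly represent facts as subject-relation-object triples. The sketch below is a toy in-memory triple store written in Python under that assumption; the class and method names (`KnowledgeGraph`, `add`, `query`) are hypothetical, and it does not reflect how Baidu’s knowledge graph is actually built or stored.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """One subject-relation-object triple."""
    subject: str
    relation: str
    obj: str


class KnowledgeGraph:
    """Hypothetical toy triple store, for illustration only."""

    def __init__(self) -> None:
        self._facts: set[Fact] = set()
        # Index facts by subject so lookups don't scan every triple.
        self._by_subject: dict[str, set[Fact]] = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        fact = Fact(subject, relation, obj)
        self._facts.add(fact)  # sets silently ignore duplicate facts
        self._by_subject[subject].add(fact)

    def query(self, subject: str, relation: str | None = None) -> list[str]:
        """Return objects linked to `subject`, optionally filtered by relation."""
        # .get() on a defaultdict does not create a new empty entry.
        return sorted(
            f.obj
            for f in self._by_subject.get(subject, set())
            if relation is None or f.relation == relation
        )

    def __len__(self) -> int:
        return len(self._facts)


kg = KnowledgeGraph()
kg.add("PaddlePaddle", "developed_by", "Baidu")
kg.add("Robin Li", "ceo_of", "Baidu")
kg.add("Shouguang", "located_in", "Shandong Province")
kg.add("PaddlePaddle", "developed_by", "Baidu")  # duplicate, not stored twice
print(len(kg))  # 3
print(kg.query("PaddlePaddle", "developed_by"))  # ['Baidu']
```

A graph with hundreds of billions of facts would need distributed storage and far richer indexing; the point here is only the triple structure that such systems build on.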
The next step for AI is to merge technologies such as computer vision, speech recognition, and natural language processing into a single multimodal system.
On this front, Baidu has introduced a variant of its NLP models that combines language and visual semantic understanding. Real-world applications for this type of model include digital avatars that perceive their surroundings as humans do and provide customer support for companies, as well as algorithms that can “draw” artworks and compose poems based on their understanding of the artworks they generate.
There are even more creative and impactful potential applications for this technology. Because the PaddlePaddle platform can build semantic connections between vision and language, a group of master’s students in China was able to create a dictionary that helps preserve endangered languages in regions such as Yunnan and Guangxi by making them easier to translate into simplified Chinese.
Integrating AI across software, hardware, and industry-specific use cases
As AI systems are used to solve increasingly complex and industry-specific problems, more emphasis is being placed on optimizing software (deep learning frameworks) and hardware (AI chips) as a whole, rather than weighing factors such as computing power, power consumption, and latency for each component individually.
Meanwhile, significant innovation is taking place at the platform level of Baidu’s AI infrastructure, where third-party developers use deep learning capabilities to create applications tailored to specific use cases. The PaddlePaddle platform offers a number of APIs to support AI applications in emerging fields such as quantum computing, life sciences, computational fluid dynamics, and molecular dynamics.
AI is also finding practical uses in traditional industries. In Shouguang, a city in Shandong Province, for example, AI is being used to streamline the fruit and vegetable industry: it takes only two people and an app to manage dozens of vegetable greenhouses.
That is remarkable, says Wang: “Despite the increasing complexity of AI technology, the open source deep learning platform brings processors and applications together like an operating system, lowering the barriers to entry for companies and individuals who want to integrate AI into their business.”
Lower barriers to entry for developers and end users
On the technology front, pre-training large models such as PCL-BAIDU Wenxin (ERNIE 3.0 Titan version) has resolved many bottlenecks common to traditional models. For example, these multipurpose models make it possible to perform different types of downstream NLP tasks, such as text classification and question answering, within a single model, whereas in the past each type of task had to be solved by a separate model.
PaddlePaddle also offers a number of developer-friendly tools, such as model compression technologies, for customizing the general-purpose models for more specific use cases. The platform provides an officially supported library of more than 400 industrial models, ranging from large to small, that are only a fraction of the size of the general-purpose models but can achieve comparable performance, reducing model development and deployment costs.
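The article does not say which compression techniques PaddlePaddle’s tools use. As one generic illustration, int8 quantization stores each 32-bit floating-point weight as an 8-bit integer plus a shared scale factor, shrinking storage roughly fourfold at the cost of a small rounding error. The pure-Python sketch below is an assumption made for illustration, not PaddlePaddle’s API; the function names are hypothetical and the weight values are arbitrary.

```python
from __future__ import annotations

from array import array


def quantize_int8(weights: list[float]) -> tuple[array, float]:
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest-magnitude weight to ±127; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q: array, scale: float) -> list[float]:
    """Recover approximate float weights from the int8 values."""
    return [scale * v for v in q]


# Arbitrary example weights stored as 32-bit floats ("f" typecode).
weights = array("f", [0.42, -1.27, 0.05, 0.9, -0.33, 1.1, -0.76, 0.0])
q, scale = quantize_int8(list(weights))
restored = dequantize(q, scale)

print(weights.itemsize * len(weights), "bytes as float32")  # 32 on typical platforms
print(q.itemsize * len(q), "bytes as int8")  # 8
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_err:.4f}")  # at most about scale / 2
```

Production toolchains usually add refinements such as per-channel scales and calibration of activations, and often combine quantization with other techniques such as pruning or distillation; this sketch only shows why a compressed model can be a fraction of the original size.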
Today, Baidu’s open source deep learning technology supports a community of more than four million AI developers, who have collectively developed 476,000 models and contributed to the AI-driven transformation of 157,000 companies and institutions. The examples above are the result of innovations at every level of Baidu’s AI infrastructure, which integrates technologies such as speech recognition, computer vision, AR/VR, knowledge graphs, and large-model pre-training, bringing machines one step closer to perceiving the world as humans do.
In its current state, AI has reached a level of maturity that enables it to perform remarkable tasks. For example, the recent launch of Metaverse XiRang would not have been possible without PaddlePaddle’s platform, which creates digital avatars that let participants around the world connect from their devices. In the future, breakthroughs in areas such as quantum computing could significantly improve the performance of the metaverse. This shows how Baidu’s various offerings are interwoven and interdependent.
In a few years, AI will be at the core of the human experience, becoming to our society what steam power, electricity, and the Internet were to previous generations. As AI becomes more complex, developers like Hongzhi will work more like artists and designers, with the creative freedom to explore use cases that were previously possible only in theory. The sky is the limit.
This content was created by Baidu. It was not written by the editorial staff of the MIT Technology Review.