

Chinese AI Model Emu3 Handles Text, Image, Video Seamlessly

Source: Science and Technology Daily | 2024-12-17 15:44:35 | Author: Gong Qian

On October 21, the Beijing Academy of Artificial Intelligence (BAAI), a Chinese non-profit organization engaged in AI R&D, released Emu3, a multimodal AI model that seamlessly integrates text, image, and video modalities into a single, unified framework.

The BAAI research team said Emu3 is expected to be used in scenario applications such as robot brains, autonomous driving, multimodal dialogue and inference.

Emu3, based solely on next-token prediction, proves that next-token prediction can be a powerful paradigm for multimodal models.

Existing multimodal AI models are mostly designed for specific tasks, each with its own architecture and methods. For instance, in the field of video generation, many developers use the Diffusion Transformer (DiT) architecture, reportedly adopted by Sora. Other models are likewise specialized: Stable Diffusion for text-to-image synthesis, Sora for text-to-video generation, and GPT-4V for image-to-text generation.

In contrast to these models, which combine isolated skills rather than possessing an inherently unified ability, Emu3 eliminates the need for diffusion or compositional approaches. By tokenizing images, text, and videos into a shared discrete space, BAAI was able to train a single transformer from scratch across all three modalities.
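To make the idea concrete, here is a minimal illustrative sketch (not BAAI's actual code) of how tokens from different modalities can be mapped into one shared discrete vocabulary, so that a single autoregressive model only ever has to predict the next token. The vocabulary sizes and helper names are assumptions for illustration only.

```python
# Assumed, illustrative vocabulary sizes -- not Emu3's real configuration.
TEXT_VOCAB = 50_000   # ids 0 .. 49_999 reserved for text tokens
IMAGE_VOCAB = 32_768  # codes produced by a VQ-style visual tokenizer

def to_unified(token_id: int, modality: str) -> int:
    """Map a modality-local token id into one shared vocabulary
    by offsetting image codes past the text-id range."""
    if modality == "text":
        return token_id
    if modality == "image":
        return TEXT_VOCAB + token_id
    raise ValueError(f"unknown modality: {modality}")

def build_sequence(text_ids, image_ids):
    """Concatenate modalities into a single token stream; a single
    transformer can then be trained with plain next-token prediction."""
    return ([to_unified(t, "text") for t in text_ids]
            + [to_unified(i, "image") for i in image_ids])

seq = build_sequence([5, 17], [0, 1023])
# Next-token-prediction training pairs: (context, target) at every position.
pairs = [(seq[:k], seq[k]) for k in range(1, len(seq))]
```

Because every modality lives in the same token space, the training objective is identical everywhere in the sequence, which is what lets one architecture cover generation and perception tasks alike.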

Emu3 outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship models such as SDXL and LLaVA.

In September, BAAI open-sourced the key technologies and models of Emu3, including the chat model and the generation model after supervised fine-tuning.

Emu3 has been receiving rave reviews from overseas developers. "For researchers, a new opportunity has emerged to explore multimodality through a unified architecture, eliminating the need to combine complex diffusion models with large language models. This approach is akin to the transformative impact of transformers in vision-related tasks," AI consultant Muhammad Umair said on social media platform Meta.

While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which have been dominated by diffusion models such as Stable Diffusion and compositional approaches such as CLIP combined with large language models.

Raphael Mansuy, co-founder of QuantaLogic, an AI agent platform, thinks that Emu3 has significant implications for AI development. Mansuy wrote on X that Emu3's success suggests several key insights: next-token prediction is a viable path to general multimodal AI; there is potential for simpler, more scalable model architectures; and the dominance of diffusion and compositional approaches is now being challenged.

Editor: GONG Qian

