{"id":12129,"date":"2025-08-09T09:16:11","date_gmt":"2025-08-09T00:16:11","guid":{"rendered":"https:\/\/aireviewirush.com\/?p=12129"},"modified":"2025-08-09T09:16:12","modified_gmt":"2025-08-09T00:16:12","slug":"shengshu-expertise-launches-vidar-multi-view-bodily-ai-coaching-mannequin","status":"publish","type":"post","link":"https:\/\/aireviewirush.com\/?p=12129","title":{"rendered":"ShengShu Technology launches Vidar multi-view physical AI training model"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<div id=\"attachment_584898\" style=\"width: 780px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-584898\" class=\"size-full wp-image-584898\" src=\"https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/AdobeStock_1425967720-copy.jpg\" alt=\"An AI image. The Vidar embodied AI model uses simulated worlds instead of physical training data.\" width=\"770\" height=\"494\" srcset=\"https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/AdobeStock_1425967720-copy.jpg 770w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/AdobeStock_1425967720-copy-300x192.jpg 300w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/AdobeStock_1425967720-copy-150x96.jpg 150w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/AdobeStock_1425967720-copy-768x493.jpg 768w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/AdobeStock_1425967720-copy-368x236.jpg 368w\" sizes=\"auto, (max-width: 770px) 100vw, 770px\"\/><\/p>\n<p id=\"caption-attachment-584898\" class=\"wp-caption-text\">The Vidar embodied AI model from ShengShu uses simulated worlds instead of physical training data. Source: Adobe Stock, Vectorhub by ice<\/p>\n<\/div>\n<p>ShengShu Technology Co. 
yesterday launched its multi-view physical AI training model, Vidar, which stands for \u201cvideo diffusion for action reasoning.\u201d Using Vidu\u2019s capabilities in semantic and video understanding, Vidar uses a limited set of physical data to simulate a robot\u2019s decision-making in real-world environments, said the company.<\/p>\n<p>\u201cVidar offers a radically different approach to training embodied AI models,\u201d said ShengShu Technology. \u201cJust as Tesla focuses on vision-based training and Waymo leans into lidar, the industry is exploring divergent paths to physical AI.\u201d<\/p>\n<p>Founded in March 2023, ShengShu Technology specializes in the development of multimodal large language models (LLMs). The Beijing-based company said it delivers mobility-as-a-service (MaaS) and software-as-a-service (SaaS) products for smarter, faster, and more scalable content creation.<\/p>\n<p>With its flagship video-generation platform <a href=\"https:\/\/www.vidu.com\/\" target=\"_blank\" rel=\"noopener\">Vidu<\/a>, ShengShu said it has reached users in more than 200 countries and regions around the world, spanning fields including interactive entertainment, advertising, film, animation, cultural tourism, and more.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_53 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title \" >Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\" role=\"button\"><label for=\"item-6a29a459ed836\" ><span class=\"\"><span style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" 
height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input aria-label=\"Toggle\" aria-label=\"item-6a29a459ed836\"  type=\"checkbox\" id=\"item-6a29a459ed836\"><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/aireviewirush.com\/?p=12129\/#Vidar_simulated_coaching_to_speed_up_robotic_growth\" title=\"Vidar simulated training to accelerate robot development\">Vidar simulated training to accelerate robot development<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/aireviewirush.com\/?p=12129\/#Vidar_a_framework_for_scalable_embodied_intelligence\" title=\"Vidar a framework for scalable embodied intelligence\">Vidar a framework for scalable embodied intelligence<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/aireviewirush.com\/?p=12129\/#ShengShu_marks_milestones_in_multimodal_AI\" title=\"ShengShu marks milestones in multimodal AI\">ShengShu marks milestones in multimodal AI<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Vidar_simulated_coaching_to_speed_up_robotic_growth\"><\/span>Vidar simulated training to accelerate robot development<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>\u201cWhile some companies train physical <a href=\"https:\/\/www.therobotreport.com\/category\/design-development\/ai-cognition\/\" target=\"_blank\" rel=\"noopener\">AI<\/a> by embedding models into real-world robots and collecting data through the physical interactions that their robots encounter, it\u2019s a method that\u2019s costly, hardware-dependent, and difficult to scale,\u201d said ShengShu Technology. \u201cOthers rely on purely simulated training, but this often lacks the variability and edge-case data needed for real-world deployment.\u201d<\/p>\n<p>Vidar takes a different approach, the company claimed. It combines limited physical training data with generative video to make predictions and generate new hypothetical scenarios, creating a multi-view <a href=\"https:\/\/www.therobotreport.com\/category\/software-simulation\/\" target=\"_blank\" rel=\"noopener\">simulation<\/a> featuring lifelike training environments, all within a virtual space. This allows for more robust, scalable training without the time, cost, or limitations of physical-world data collection, explained ShengShu.<\/p>\n<p>Built on top of the Vidu generative video model, Vidar can perform dual-arm manipulation tasks with multi-view video prediction and even respond to natural-language voice commands after fine-tuning. <a href=\"https:\/\/embodiedfoundation.github.io\/vidar_anypos\" target=\"_blank\" rel=\"noopener\">The model<\/a> effectively serves as a digital brain for real-world action, said the company.<\/p>\n<p>Using Vidu\u2019s generative video engine, Vidar generates large-scale simulations to reduce dependency on physical data, while maintaining the complexity and richness needed to train real-world-capable AI agents. 
ShengShu said Vidar can extrapolate a generalized series of robot actions and tasks from only 20 minutes of training data. The company asserted that&#8217;s between 1\/80 and 1\/1,200 of the data needed to train industry-leading models including RDT and \u03c00.5.<\/p>\n<p>ShengShu said Vidar\u2019s core innovation lies in its modular two-stage learning architecture. Unlike traditional methods that merge perception and control, Vidar decouples them into two distinct stages for greater flexibility and scalability.<\/p>\n<p>In the upstream stage, large-scale general video data and moderate-scale embodied video data are used to train Vidu\u2019s model for perceptual understanding.<\/p>\n<p>In the second, downstream stage, a task-agnostic model called AnyPos turns that visual understanding into actionable motor commands for robots. This separation makes it significantly easier and faster to train and deploy AI across different types of robots, while lowering costs and increasing scalability.<\/p>\n<div id=\"attachment_584894\" style=\"width: 780px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-584894\" class=\"size-full wp-image-584894\" src=\"https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-1.jpg\" alt=\"Vidar can reduce the amount of training data needed to train AI models, says ShengShu Technology.\" width=\"770\" height=\"491\" srcset=\"https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-1.jpg 770w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-1-300x191.jpg 300w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-1-150x96.jpg 150w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-1-768x490.jpg 768w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-1-268x170.jpg 268w, 
https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-1-368x235.jpg 368w\" sizes=\"auto, (max-width: 770px) 100vw, 770px\"\/><\/p>\n<p id=\"caption-attachment-584894\" class=\"wp-caption-text\">Vidar is designed to reduce the amount of training data needed to train AI models. Source: ShengShu Technology.<\/p>\n<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Vidar_a_framework_for_scalable_embodied_intelligence\"><\/span>Vidar a framework for scalable embodied intelligence<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Vidar follows a scalable training framework inspired by the language and image foundation models of the past decade of AI breakthroughs. ShengShu said its three-tiered data pyramid, spanning large-scale generic video, embodied video data, and robot-specific examples, makes for a more flexible system, reducing traditional data bottlenecks.<\/p>\n<p>Built on the U-ViT architecture, which explores the fusion of diffusion models and transformer architectures for a wide range of multimodal generation tasks, Vidar harnesses long-term temporal modeling and multi-angle video consistency to power physically grounded decision-making.<\/p>\n<p>This design supports rapid transfer from simulation to real-world deployment, which ShengShu said is crucial for robotics in dynamic environments. It also minimizes engineering complexity, according to the company.<\/p>\n<p>ShengShu said Vidar can facilitate robotics adoption across multiple sectors. From home assistants and eldercare to smart manufacturing and medical robotics, the model enables fast adaptation to new environments and multi-task scenarios, all with minimal data, it added.<\/p>\n<p>Vidar creates an AI-native path for robotics development that&#8217;s efficient, scalable, and cost-effective, ShengShu claimed. 
By transforming general video into actionable robotic intelligence, the company said its model can bridge the gap between visual understanding and embodied agency.<\/p>\n<div id=\"attachment_584895\" style=\"width: 1230px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-584895\" class=\"size-full wp-image-584895\" src=\"https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-2.jpg\" alt=\"Vidar has a modular learning architecture, according to ShengShu Technology.\" width=\"1220\" height=\"139\" srcset=\"https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-2.jpg 1220w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-2-300x34.jpg 300w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-2-1024x117.jpg 1024w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-2-150x17.jpg 150w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-2-768x88.jpg 768w, https:\/\/www.therobotreport.com\/wp-content\/uploads\/2025\/08\/Vidar-2-368x42.jpg 368w\" sizes=\"auto, (max-width: 1220px) 100vw, 1220px\"\/><\/p>\n<p id=\"caption-attachment-584895\" class=\"wp-caption-text\">Vidar has a modular learning architecture. Source: ShengShu Technology<\/p>\n<\/div>\n<h2><span class=\"ez-toc-section\" id=\"ShengShu_marks_milestones_in_multimodal_AI\"><\/span>ShengShu marks milestones in multimodal AI<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Vidar builds on the rapid momentum of the Vidu video foundation model, said ShengShu. 
The company listed statistics since Vidu\u2019s debut:<\/p>\n<ul>\n<li>Vidu reached 1 million users within one month<\/li>\n<li>Surpassed 10 million users in just three months<\/li>\n<li>Generated over 100 million videos by Month 4<\/li>\n<li>Reference-to-video generation exceeded 100 million by Month 8<\/li>\n<li>Total generated videos now top 300 million<\/li>\n<\/ul>\n<p>As ShengShu continues to expand the frontiers of multimodal AI, Vidar represents the next frontier, bringing generalization, generativity, and embodiment into one unified system.<\/p>\n<p><strong>Editor\u2019s note:<\/strong>\u00a0<a href=\"https:\/\/www.robobusiness.com\/\" target=\"_blank\" rel=\"noopener\">RoboBusiness<\/a>\u00a02025, which will be on Oct. 15 and 16 in Santa Clara, Calif., will include tracks on\u00a0<a href=\"https:\/\/www.therobotreport.com\/nvidia-vp-deepu-talla-to-discuss-physical-ai-at-robobusiness\/\" target=\"_blank\" rel=\"noopener\">physical AI <\/a>and <a href=\"https:\/\/robobdtw2025.mapyourshow.com\/8_0\/explore\/session-gallery.cfm?sessiontrack=Humanoids&amp;sessiontype=RoboBusiness%20Breakout~~~~RoboBusiness%20Keynote~~~~RoboBusiness%20Networking%20Opportunity\" target=\"_blank\" rel=\"noopener\">humanoid<\/a>\u00a0robots. 
Registration is now open.<\/p>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>The Vidar embodied AI model from ShengShu uses simulated worlds instead of physical training data. Source: Adobe Stock, Vectorhub by ice ShengShu Technology Co. 
yesterday launched its multi-view physical AI training model, Vidar, which stands for \u201cvideo diffusion for action reasoning.\u201d Using Vidu\u2019s capabilities in semantic and video understanding, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":12131,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[],"class_list":["post-12129","post","type-post","status-publish","format-standard","has-post-thumbnail","category-robotics"],"_links":{"self":[{"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/posts\/12129","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12129"}],"version-history":[{"count":1,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/posts\/12129\/revisions"}],"predecessor-version":[{"id":12130,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/posts\/12129\/revisions\/12130"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/media\/12131"}],"wp:attachment":[{"href":"https:\/\/aireviewirush.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12129"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=12129"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=12129"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}