{"id":26069,"date":"2026-04-29T00:17:23","date_gmt":"2026-04-28T15:17:23","guid":{"rendered":"https:\/\/aireviewirush.com\/?p=26069"},"modified":"2026-04-29T00:17:23","modified_gmt":"2026-04-28T15:17:23","slug":"gradient-based-planning-for-world-fashions-at-longer-horizons","status":"publish","type":"post","link":"https:\/\/aireviewirush.com\/?p=26069","title":{"rendered":"Gradient-based planning for world models at longer horizons"},"content":{"rendered":"<div style=\" \">\n<div style=\"display: flex; flex-direction: column; align-items: center; gap: 1em; margin-bottom: 1.5em;\">\n  <img decoding=\"async\" src=\"https:\/\/bair.berkeley.edu\/static\/blog\/grasp\/ballnav_demo.gif\" alt=\"BallNav demo\" style=\"max-width: 60%;\"\/><br \/>\n  <img decoding=\"async\" src=\"https:\/\/bair.berkeley.edu\/static\/blog\/grasp\/pusht_zoomout.gif\" alt=\"Push-T demo\" style=\"max-width: 90%;\"\/>\n<\/div>\n<p><strong>By <a href=\"https:\/\/michael-psenka.github.io\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Michael Psenka<\/a>, <a href=\"https:\/\/ai.meta.com\/people\/1148536089838617\/michael-rabbat\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Mike Rabbat<\/a>, <a href=\"https:\/\/a1k12.github.io\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Aditi Krishnapriyan<\/a>, <a href=\"https:\/\/yann.lecun.com\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Yann LeCun<\/a>, <a href=\"https:\/\/amirbar.net\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Amir Bar<\/a><\/strong><\/p>\n<p><strong>GRASP<\/strong> is a new gradient-based planner for learned dynamics (a \u201cworld model\u201d) that makes long-horizon planning practical by (1) lifting the trajectory into virtual states so optimization is parallel 
across time, (2) adding stochasticity directly to the state iterates for exploration, and (3) reshaping gradients so actions get clean signals while we avoid brittle \u201cstate-input\u201d gradients through high-dimensional vision models.<\/p>\n<p>Large, learned world models are becoming increasingly capable. They can predict long sequences of future observations in high-dimensional visual spaces and generalize across tasks in ways that were hard to imagine a few years ago. As these models scale, they start to look less like task-specific predictors and more like general-purpose simulators.<\/p>\n<p>But having a powerful predictive model isn&#8217;t the same as being able to use it effectively for control\/learning\/planning. In practice, long-horizon planning with modern world models remains fragile: optimization becomes ill-conditioned, non-greedy structure creates bad local minima, and high-dimensional latent spaces introduce subtle failure modes.<\/p>\n<p>In this blog post, I describe the problems that motivated this project and our approach to addressing them: why planning with modern world models can be surprisingly fragile, why long horizons are the real stress test, and what we changed to make gradient-based planning much more robust.<\/p>\n<hr\/>\n<blockquote>\n<p>This blog post discusses work done with Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar (* denotes equal advisorship), where we propose GRASP.<\/p>\n<\/blockquote>\n<hr\/>\n<h2 id=\"what-is-a-world-model\"><span class=\"ez-toc-section\" id=\"What%E2%80%99s_a_world_mannequin\"><\/span>What is a world model?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>These days, the term \u201cworld model\u201d is quite overloaded: depending on the context, it can mean either an explicit dynamics model or some implicit, reliable internal state that a generative model relies on (e.g., when an LLM generates chess moves, whether there is some internal representation of the board). 
We give our loose working definition below.<\/p>\n<p>Suppose you take actions <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-a3c54d08fc60a73c5415736b15212106_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"a_t in mathcal{A}\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"52\" style=\"vertical-align: -3px;\"\/> and observe states <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-8f69e990aa1d6ebdae714e913745d6fb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"s_t in mathcal{S}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"48\" style=\"vertical-align: -3px;\"\/> (images, latent vectors, proprioception). A <strong>world model<\/strong> is a learned model that, given the current state and a sequence of future actions, predicts what will happen next. Formally, it defines a predictive distribution over the next state, conditioned on a window of observed states <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-88b4b5b7d2b439238442a7aa5e01cb86_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"s_{t-h:t}\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"41\" style=\"vertical-align: -3px;\"\/> and the current action <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-b5f8b679d5b1ba8241ec391b34717ef0_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"a_t\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"14\" style=\"vertical-align: -3px;\"\/>:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 19px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" 
src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-95106aed300ef1fe3763ed3e7a4c35a2_l3.png\" height=\"19\" width=\"148\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[P_theta(s_{t+1} mid s_{t-h:t},; a_t)]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>that approximates the environment\u2019s true conditional <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-68d611727a003284884c7b755480587a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"P(s_{t+1} mid s_{t-h:t},; a_t)\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"143\" style=\"vertical-align: -5px;\"\/>. For this blog post, we\u2019ll assume a Markovian model <em>P<\/em><sub>\u03b8<\/sub>(<em>s<\/em><sub><em>t<\/em>+1<\/sub> | <em>s<sub>t<\/sub><\/em>, <em>a<sub>t<\/sub><\/em>) for simplicity (all results here can be extended to the more general case), and when the model is deterministic it reduces to a map over states:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 19px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-d6e5828483a4880c1cfe7bad44d78973_l3.png\" height=\"19\" width=\"129\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[s_{t+1} = F_theta(s_t, a_t).]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>In practice, the state <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-e86795deb37ff5f5055e741b17eb25d7_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"s_t\" title=\"Rendered 
by QuickLaTeX.com\" height=\"11\" width=\"13\" style=\"vertical-align: -3px;\"\/> is usually a learned latent representation (e.g., encoded from pixels), so the model operates in a (theoretically) compact, differentiable space. The key point is that a world model gives you a <em>differentiable simulator<\/em>: you can roll it forward under hypothetical action sequences and backpropagate through the predictions.<\/p>\n<hr\/>\n<h2 id=\"planning-choosing-actions-by-optimizing-through-the-model\"><span class=\"ez-toc-section\" id=\"Planning_selecting_actions_by_optimizing_by_means_of_the_mannequin\"><\/span>Planning: choosing actions by optimizing through the model<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Given a start state <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-7730c5a8c9bd4b8291e5231082f6b9c6_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"s_0\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"15\" style=\"vertical-align: -3px;\"\/> and a goal <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-d208fd391fa57c168dc0f151de829fee_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"g\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: -4px;\"\/>, the simplest planner chooses an action sequence <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-17f8a318ee1fb3a1f9da386869647837_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathbf{a}=(a_0,dots,a_{T-1})\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"141\" style=\"vertical-align: -5px;\"\/> by rolling out the model and minimizing the terminal error:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 28px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span 
class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-6dce57a108f3a4db376233c1e545dfd6_l3.png\" height=\"28\" width=\"357\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[min_{mathbf{a}} ; | s_T(mathbf{a}) - g |_2^2, quad text{where } s_T(mathbf{a}) = mathcal{F}_{theta}^{T}(s_0,mathbf{a}).]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>Here we use <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-6b9827b4af48fd65e7e83838bfd859f6_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathcal{F}^T\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"25\" style=\"vertical-align: -1px;\"\/> as shorthand for the full rollout through the world model (dependence on the model parameters <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-356a08e839ab6974a16448e16e56745d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"theta\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: 0px;\"\/> is implicit):<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 22px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-a46e0710d3438d8afddd4362bc5e468a_l3.png\" height=\"22\" width=\"389\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[mathcal{F}_{theta}^{T}(s_0, mathbf{a}) = F_theta(F_theta(cdots F_theta(s_0, a_0), cdots, a_{T-2}), a_{T-1}).]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>For short horizons and low-dimensional systems, this can work reasonably well. 
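<\/p>
<p>To make the rollout objective concrete, here is a minimal sketch of such a shooting-style planner in plain Python. It is an illustration under stated assumptions, not the GRASP implementation: the world model is a hypothetical scalar linear map <code>F(s, a) = ALPHA * s + a<\/code> rather than a learned network, so the backward pass through the <em>T<\/em>-step rollout can be written out by hand (a real implementation would use automatic differentiation through <em>F<\/em><sub>\u03b8<\/sub>). All function names are hypothetical.<\/p>

```python
# Hypothetical stand-in for F_theta: a scalar linear model F(s, a) = ALPHA * s + a,
# so dF/ds = ALPHA and dF/da = 1. A learned world model would replace this.
ALPHA = 0.9


def world_model(s, a):
    # One step of the toy dynamics: s_{t+1} = F(s_t, a_t).
    return ALPHA * s + a


def rollout(s0, actions):
    # Full rollout F^T(s0, a): apply the model T times in sequence.
    s = s0
    for a in actions:
        s = world_model(s, a)
    return s


def shooting_grads(s0, actions, goal):
    # Gradient of (s_T(a) - g)^2 with respect to each action a_t.
    # Chain rule through the rollout: d s_T / d a_t = ALPHA ** (T - 1 - t) * dF/da.
    T = len(actions)
    residual = rollout(s0, actions) - goal
    return [2.0 * residual * ALPHA ** (T - 1 - t) for t in range(T)]


def plan_by_shooting(s0, goal, horizon, lr=0.1, steps=500):
    # Plain gradient descent on the action sequence, starting from all zeros.
    actions = [0.0] * horizon
    for _ in range(steps):
        grads = shooting_grads(s0, actions, goal)
        actions = [a - lr * g for a, g in zip(actions, grads)]
    return actions
```

<p>For example, <code>plan_by_shooting(0.0, 1.0, horizon=10)<\/code> returns ten actions whose rollout lands on the goal. Each action gradient carries the factor <code>ALPHA ** (T - 1 - t)<\/code>, the scalar version of a product of <em>T<\/em> \u2212 1 \u2212 <em>t<\/em> state Jacobians: with <code>|ALPHA|<\/code> below 1 the gradients for the earliest actions shrink exponentially in <em>T<\/em>, and above 1 they blow up.<\/p>
<p>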
But as horizons grow and models become larger and more expressive, its weaknesses are amplified.<\/p>\n<p>So why doesn\u2019t this just work at scale?<\/p>\n<hr\/>\n<h2 id=\"why-long-horizon-planning-is-hard-even-when-everything-is-differentiable\"><span class=\"ez-toc-section\" id=\"Why_long-horizon_planning_is_tough_even_when_all_the_things_is_differentiable\"><\/span>Why long-horizon planning is hard (even when everything is differentiable)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>There are two separate pain points that apply to world models in general, plus a third that is specific to learned, deep learning-based models.<\/p>\n<h3 id=\"1-long-horizon-rollouts-create-deep-ill-conditioned-computation-graphs\"><span class=\"ez-toc-section\" id=\"1_Lengthy-horizon_rollouts_create_deep_ill-conditioned_computation_graphs\"><\/span>1) Long-horizon rollouts create deep, ill-conditioned computation graphs<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Those familiar with backpropagation through time (BPTT) may notice that we\u2019re differentiating through a model applied to itself repeatedly, which leads to the <strong>exploding\/vanishing gradients<\/strong> problem. In particular, if we take derivatives (note that we\u2019re differentiating vector-valued functions, which yields Jacobians that we denote by <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-b87824c6689dbd809ac8ebfdf787d8df_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"D_x (cdots)\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"60\" style=\"vertical-align: -5px;\"\/>) with respect to earlier actions (e.g. 
<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-d093ecd207b3f9d9b1c9cf0692b77d01_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"a_0\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"16\" style=\"vertical-align: -3px;\"\/>):<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 52px;\"><em>D<\/em><sub><em>a<\/em><sub>0<\/sub><\/sub>\u2131<sub>\u03b8<\/sub><sup><em>T<\/em><\/sup>(<em>s<\/em><sub>0<\/sub>, <strong>a<\/strong>) = (\u220f<sub><em>t<\/em>=1<\/sub><sup><em>T<\/em>\u22121<\/sup> <em>D<sub>s<\/sub>F<\/em><sub>\u03b8<\/sub>(<em>s<sub>t<\/sub><\/em>, <em>a<sub>t<\/sub><\/em>)) <em>D<\/em><sub><em>a<\/em><sub>0<\/sub><\/sub><em>F<\/em><sub>\u03b8<\/sub>(<em>s<\/em><sub>0<\/sub>, <em>a<\/em><sub>0<\/sub>).<\/p>\n<p>Because this is a product of <em>T<\/em> \u2212 1 state Jacobians, its extreme singular values (and hence its conditioning) scale roughly exponentially with the horizon <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-f9ed275b0bf1633b7ee83b78fcc28273_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"T\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"13\" style=\"vertical-align: 0px;\"\/>:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 24px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-7954df6601577938b25130705f47a19b_l3.png\" height=\"24\" width=\"312\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[sigma_{text{max\/min}}(D_{a_0}mathcal{F}_{theta}^{T}) sim sigma_{text{max\/min}}(D_s F_theta)^{T-1},]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>resulting in exploding or vanishing gradients.<\/p>\n<h3 id=\"2-the-landscape-is-non-greedy-and-full-of-traps\"><span class=\"ez-toc-section\" 
id=\"2_The_panorama_is_non-greedy_and_stuffed_with_traps\"><\/span>2) The landscape is non-greedy and full of traps<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>At short horizons, the greedy solution, where we move straight toward the goal at every step, is often good enough. If you only need to plan a few steps ahead, the optimal trajectory usually doesn\u2019t deviate much from \u201chead toward <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-d208fd391fa57c168dc0f151de829fee_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"g\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: -4px;\"\/>\u201d at each step.<\/p>\n<p>As horizons grow, two things happen. First, longer tasks are more likely to require <em>non-greedy<\/em> behavior: going around a wall, repositioning before pushing, backing up to take a better path. And the longer the horizon, the more of these non-greedy steps tend to be needed. 
Second, the optimization space itself grows with the horizon: <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-4bbae29ae0131c6238e884beb47ed8bf_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathrm{dim}(mathcal{A} times cdots times mathcal{A}) = Tmathrm{dim}(mathcal{A})\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"229\" style=\"vertical-align: -5px;\"\/>, which further expands the room for local minima in the optimization problem.<\/p>\n<p><figure style=\"text-align: center;\">\n  <img decoding=\"async\" src=\"https:\/\/bair.berkeley.edu\/static\/blog\/grasp\/loss-landscape.jpg\" alt=\"Loss landscape\" style=\"max-width: 80%;\"\/><figcaption><em>Distance to the goal along the optimal path is non-monotonic, and the resulting loss landscape can be rough.<\/em><\/figcaption><\/figure>\n<\/p>\n<hr\/>\n<h2 id=\"a-long-horizon-fix-lifting-the-dynamics-constraint\"><span class=\"ez-toc-section\" id=\"A_protracted-horizon_repair_lifting_the_dynamics_constraint\"><\/span>A long-horizon fix: lifting the dynamics constraint<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Suppose we treat the dynamics constraint <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-02d5b78079d86b85e769749919de58d4_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"s_{t+1} = F_{theta}(s_t, a_t)\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"124\" style=\"vertical-align: -5px;\"\/> as a soft constraint, and instead optimize the following penalty function over both the actions <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-fe019867f244e7b47fe6ab35584574ab_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"(a_0,ldots,a_{T-1})\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"106\" style=\"vertical-align: 
-5px;\"\/> and the states <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-1f06604430c41599fcbbbbb0372853f4_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"(s_0,ldots,s_T)\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"86\" style=\"vertical-align: -5px;\"\/>:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 52px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-863f11a3d6371cdc4342477c54c6f78f_l3.png\" height=\"52\" width=\"510\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[min_{mathbf{s},mathbf{a}} mathcal{L}(mathbf{s}, mathbf{a}) = sum_{t=0}^{T-1} big|F_theta(s_t,a_t) - s_{t+1}big|_2^2, quad text{with } s_0 text{ fixed and } s_T=g.]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>This is also sometimes called <em>collocation<\/em> in the planning\/robotics literature. Note that the lifted formulation shares the same <em>global<\/em> minimizers as the original rollout objective (assuming the goal is reachable, both are zero exactly when the trajectory is dynamically feasible). 
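<\/p>
<p>A minimal sketch of the lifted objective, under the same kind of assumption: a hypothetical scalar toy model <code>F(s, a) = ALPHA * s + a<\/code> stands in for <em>F<\/em><sub>\u03b8<\/sub>, and the intermediate states are optimized jointly with the actions by plain gradient descent, with <em>s<\/em><sub>0<\/sub> held fixed and <em>s<sub>T<\/sub><\/em> pinned to the goal. This is only the penalty formulation above, not GRASP itself (no state noise, no gradient reshaping); the straight-line state initialization, step size, and function names are arbitrary choices for the toy.<\/p>

```python
# Hypothetical stand-in for F_theta: the scalar toy model F(s, a) = ALPHA * s + a,
# so dF/ds = ALPHA and dF/da = 1.
ALPHA = 0.9


def world_model(s, a):
    return ALPHA * s + a


def rollout(s0, actions):
    s = s0
    for a in actions:
        s = world_model(s, a)
    return s


def defects(states, actions):
    # states = [s_0, ..., s_T]. Defect t, F(s_t, a_t) - s_{t+1}, touches only local
    # variables, so the T model calls are mutually independent (a batched version
    # could evaluate them in parallel across time; here it is a plain loop).
    return [world_model(states[t], actions[t]) - states[t + 1] for t in range(len(actions))]


def lifted_loss(states, actions):
    # Collocation penalty: sum over t of (F(s_t, a_t) - s_{t+1})^2.
    return sum(d * d for d in defects(states, actions))


def lifted_grads(states, actions):
    d = defects(states, actions)
    # dL/da_t = 2 * d_t * dF/da = 2 * d_t: one local term, no product over time.
    g_a = [2.0 * d_t for d_t in d]
    g_s = [0.0] * len(states)
    for t, d_t in enumerate(d):
        g_s[t] += 2.0 * d_t * ALPHA  # s_t enters defect t through F
        g_s[t + 1] -= 2.0 * d_t      # s_{t+1} enters defect t directly
    return g_s, g_a


def plan_by_collocation(s0, goal, horizon, lr=0.1, steps=2000):
    # Assumed initialization: states on a straight line from s0 to goal, zero actions.
    states = [s0 + (goal - s0) * t / horizon for t in range(horizon + 1)]
    actions = [0.0] * horizon
    for _ in range(steps):
        g_s, g_a = lifted_grads(states, actions)
        actions = [a - lr * g for a, g in zip(actions, g_a)]
        # s_0 stays fixed and s_T stays pinned to the goal; only interior states move.
        for t in range(1, horizon):
            states[t] -= lr * g_s[t]
    return states, actions
```

<p>Running <code>plan_by_collocation(0.0, 1.0, horizon=10)<\/code> drives the lifted loss to numerical zero, and re-rolling the resulting actions through <code>rollout<\/code> reaches the goal: a global minimizer of the lifted problem yields a global minimizer of the rollout objective. Because <code>dF\/da = 1<\/code> for this toy model, each action gradient is exactly twice its local defect, with no product of Jacobians over time.<\/p>
<p>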
But the optimization landscapes are very different, and we get two immediate benefits:<\/p>\n<ul>\n<li>Each world model evaluation <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-7643455d4474b4d26cbf516bd6dbd863_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"F_{theta}(s_t,a_t)\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"69\" style=\"vertical-align: -5px;\"\/> depends only on local variables, so all <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-f9ed275b0bf1633b7ee83b78fcc28273_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"T\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"13\" style=\"vertical-align: 0px;\"\/> terms can be computed <em>in parallel across time<\/em>, giving a large speed-up for longer horizons, and<\/li>\n<li>You no longer backpropagate through a single deep <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-f9ed275b0bf1633b7ee83b78fcc28273_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"T\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"13\" style=\"vertical-align: 0px;\"\/>-step composition to get a learning signal, since the earlier product of Jacobians is replaced by a sum of local, single-step terms, e.g.:<\/li>\n<\/ul>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 19px;\"><em>D<\/em><sub><em>a<\/em><sub>0<\/sub><\/sub>\u2112 = 2(<em>F<\/em><sub>\u03b8<\/sub>(<em>s<\/em><sub>0<\/sub>, <em>a<\/em><sub>0<\/sub>) \u2212 <em>s<\/em><sub>1<\/sub>)<sup>\u22a4<\/sup><em>D<\/em><sub><em>a<\/em><sub>0<\/sub><\/sub><em>F<\/em><sub>\u03b8<\/sub>(<em>s<\/em><sub>0<\/sub>, <em>a<\/em><sub>0<\/sub>).<\/p>\n<p>Being able to 
optimize states directly also helps with exploration, as we can temporarily pass through unphysical regions to find the optimal plan:<\/p>\n<p><figure style=\"text-align: center;\">\n  <img decoding=\"async\" src=\"https:\/\/bair.berkeley.edu\/static\/blog\/grasp\/ballnav_demo.gif\" alt=\"Collocation planning in BallNav\" style=\"max-width: 60%;\"\/><figcaption><em>Collocation-based planning lets us directly perturb states and explore midpoints more effectively.<\/em><\/figcaption><\/figure>\n<\/p>\n<p>However, there is no free lunch. Especially for deep learning-based world models, there is a critical issue that makes the above optimization quite difficult in practice.<\/p>\n<h2 id=\"an-issue-for-deep-learning-based-world-models-sensitivity-of-state-input-gradients\"><span class=\"ez-toc-section\" id=\"A_problem_for_deep_learning-based_world_fashions_sensitivity_of_state-input_gradients\"><\/span>An issue for deep learning-based world models: sensitivity of state-input gradients<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The <strong>tl;dr<\/strong> of this section is: directly optimizing states through a deep learning-based <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-c502880341922d69202070782fbc9ab3_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"18\" style=\"vertical-align: -3px;\"\/> is extremely brittle, \u00e0 la <em>adversarial robustness<\/em>. 
Even if you train your world model in a lower-dimensional state space, the training process leaves the landscape very sharp around unseen states, whether that means an entirely unseen state or just a step in a normal\/orthogonal direction to the data manifold.<\/p>\n<h3 id=\"adversarial-robustness-and-the-dimpled-manifold-model\"><span class=\"ez-toc-section\" id=\"Adversarial_robustness_and_the_%E2%80%9Cdimpled_manifold%E2%80%9D_mannequin\"><\/span>Adversarial robustness and the \u201cdimpled manifold\u201d model<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Adversarial robustness originally looked at classification models <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-9cd4886e5f6ad5b59898d6a66094ee7b_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"f_theta : mathbb{R}^{wtimes h times c} to mathbb{R}^K\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"144\" style=\"vertical-align: -4px;\"\/>, and showed that by following the gradient of a particular logit <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-fd14444ec7e338a8be9f26d1e8fc6e3c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"nabla f_theta^k\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"32\" style=\"vertical-align: -5px;\"\/> from a base image <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-ede05c264bba0eda080918aaa09c4658_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"x\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> (not of class <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-3422b6bb5c160593658b7c39425d9880_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" 
alt=\"k\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: 0px;\"\/>), you didn&#8217;t have to move far along <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-135dec93ed3ba20d5a0e6949123f416a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"x' = x + epsilonnabla f_theta^k\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"110\" style=\"vertical-align: -5px;\"\/> to make <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-6a20c3b1d68ef800563a48d91b7289d5_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"f_theta\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"16\" style=\"vertical-align: -4px;\"\/> classify <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-b5e2f0a82567597c38101d0774b3fa68_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"x'\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"14\" style=\"vertical-align: 0px;\"\/> as <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-3422b6bb5c160593658b7c39425d9880_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"k\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: 0px;\"\/> (<a href=\"https:\/\/arxiv.org\/abs\/1312.6199\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Szegedy et al., 2014<\/a>; <a href=\"https:\/\/arxiv.org\/abs\/1412.6572\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Goodfellow et al., 2015<\/a>):<\/p>\n<p><figure style=\"text-align: center;\">\n  <img decoding=\"async\" src=\"https:\/\/bair.berkeley.edu\/static\/blog\/grasp\/adversarial_animated.gif\" alt=\"Adversarial example\" 
style=\"max-width: 70%;\"\/><figcaption><em>Depiction of the classic example from Goodfellow et al. (2015).<\/em><\/figcaption><\/figure>\n<\/p>\n<p>Later work has painted a geometric picture of what\u2019s going on: for data near a low-dimensional manifold <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-8b54f3b7741c19693e1e9d187786f082_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathcal{M}\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"20\" style=\"vertical-align: -1px;\"\/>, the training process controls behavior in tangential directions but does not regularize behavior in orthogonal directions, which leads to sensitive behavior there (<a href=\"https:\/\/arxiv.org\/pdf\/1812.00740\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Stutz et al., 2019<\/a>). Stated another way: <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-6a20c3b1d68ef800563a48d91b7289d5_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"f_theta\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"16\" style=\"vertical-align: -4px;\"\/> has a reasonable Lipschitz constant when only directions tangential to the data manifold <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-8b54f3b7741c19693e1e9d187786f082_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathcal{M}\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"20\" style=\"vertical-align: -1px;\"\/> are considered, but can have very high Lipschitz constants in normal directions. 
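<\/p>\n<p>The following is a hypothetical toy of our own, not taken from the cited papers, that makes this asymmetry concrete. It uses a linear model on 2-D inputs whose training data all lie on the line x2 = 0. The training loss cannot tell apart a model that is flat in the normal direction from one that is very steep there.<\/p>\n
```python
# Hypothetical toy, not from the post or the cited papers: training
# inputs lie on the 1-D manifold M = {(x1, 0)} inside R^2, targets y = x1.
data = [((x1, 0.0), x1) for x1 in (-2.0, -1.0, 0.0, 1.0, 2.0)]


def f(w, x):
    # Linear model f(x) = w1 * x1 + w2 * x2.
    return w[0] * x[0] + w[1] * x[1]


def train_loss(w):
    # Mean squared error on the on-manifold training data only.
    return sum((f(w, x) - y) ** 2 for x, y in data) / len(data)


smooth = (1.0, 0.0)  # flat in the normal direction x2
sharp = (1.0, 50.0)  # very steep in the normal direction x2

# Both models fit the training data exactly: every training input has
# x2 = 0, so the loss never depends on w2.
losses = (train_loss(smooth), train_loss(sharp))

# Equal-length steps from an on-manifold point: along M and normal to M.
x, delta = (1.0, 0.0), 0.1
x_tangent = (x[0] + delta, x[1])
x_normal = (x[0], x[1] + delta)
for name, w in (('smooth', smooth), ('sharp', sharp)):
    print(name, 'tangent:', f(w, x_tangent) - f(w, x),
          'normal:', f(w, x_normal) - f(w, x))

# For a linear model the input gradient is w itself, so a gradient step
# x + eps * grad f on the sharp model points almost entirely along x2.
grad = sharp
normal_share = abs(grad[1]) / (grad[0] ** 2 + grad[1] ** 2) ** 0.5
print('normal share of the gradient:', round(normal_share, 4))
```
<p>Both weight vectors reach zero training loss. Plain gradient descent on this loss would never change w2, because every training input has x2 = 0. The steep model, however, moves its output 50 times more for a normal step than for an equally long tangential one. Its input gradient also points almost entirely off the manifold, which is exactly the direction an adversarial step follows.<\/p>\n<p>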
In fact, it often benefits the model to be sharper in these normal directions, so it can fit more complicated functions more precisely.<\/p>\n<p><figure style=\"text-align: center;\">\n  <img decoding=\"async\" src=\"https:\/\/bair.berkeley.edu\/static\/blog\/grasp\/manifold_adversarial.gif\" alt=\"Adversarial perturbations leave the data manifold\" style=\"max-width: 70%;\"\/><br \/><\/figure>\n<\/p>\n<p>As a result, such adversarial examples are extremely common even for a single given model. Further, this is not just a computer vision phenomenon; adversarial examples also appear in LLMs (<a href=\"https:\/\/arxiv.org\/abs\/1908.07125\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Wallace et al., 2019<\/a>) and in RL (<a href=\"https:\/\/arxiv.org\/abs\/1905.10615\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Gleave et al., 2019<\/a>).<\/p>\n<p>While there are methods to train more adversarially robust models, there is a known trade-off between model performance and adversarial robustness (<a href=\"https:\/\/arxiv.org\/pdf\/1805.12152\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">Tsipras et al., 2019<\/a>): especially in the presence of many weakly-correlated variables, the model <em>must<\/em> be sharper to achieve higher performance. Indeed, most modern training algorithms, whether in computer vision or LLMs, do not train adversarial sensitivity out. 
Thus, at least until deep learning sees a major regime change, <strong>this is a problem we\u2019re stuck with<\/strong>.<\/p>\n<h3 id=\"why-is-adversarial-robustness-an-issue-for-world-model-planning\"><span class=\"ez-toc-section\" id=\"Why_is_adversarial_robustness_a_problem_for_world_mannequin_planning\"><\/span>Why is adversarial robustness a problem for world model planning?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Consider a single component of the dynamics loss we\u2019re optimizing in the lifted-state approach:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 31px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-c0125bf4e6fc638ffbe447c238df12b1_l3.png\" height=\"31\" width=\"210\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[min_{s_t, a_t, s_{t+1}} |F_theta(s_t, a_t) - s_{t+1}|_2^2]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>Let\u2019s further focus on just the base (input) state:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 29px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-0dc167be31acb8f939fc7634fc9b3296_l3.png\" height=\"29\" width=\"185\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[min_{s_t} |F_theta(s_t, a_t) - s_{t+1}|_2^2.]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>Since world models are usually trained on state\/action trajectories <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-56946ab92a181c0c5f0d12c3eccf56e9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"(s_1, a_1, s_2, a_2, ldots)\" title=\"Rendered by QuickLaTeX.com\" 
height=\"19\" width=\"130\" style=\"vertical-align: -5px;\"\/>, the state-data manifold for <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-c502880341922d69202070782fbc9ab3_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"18\" style=\"vertical-align: -3px;\"\/> has a dimensionality bounded in terms of the action space:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 19px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-fc6404df2a87ce26da2fffd7519bba15_l3.png\" height=\"19\" width=\"267\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[mathrm{dim}(mathcal{M}_s) le mathrm{dim}(mathcal{A}) + 1 + mathrm{dim}(mathcal{R}),]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>where <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-ab41b2e51b8afd5fbfb9ae4f9fd1e881_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathcal{R}\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"15\" style=\"vertical-align: 0px;\"\/> is some optional space of augmentations (e.g. translations\/rotations). 
Thus, we can usually expect <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-4a48166dd2d2d629c4f8d493fbb01868_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathrm{dim}(mathcal{M}_s)\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"71\" style=\"vertical-align: -5px;\"\/> to be much lower than <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-d865a64aa254d26f90c3f0c3bd85ac3b_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathrm{dim}(mathcal{S})\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"55\" style=\"vertical-align: -5px;\"\/>, and therefore: <strong>it is very easy to find adversarial examples that hack any state to any other desired state.<\/strong><\/p>\n<p>As a result, the dynamics optimization<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 52px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-b32171577f474230feb8469c32d3c3e4_l3.png\" height=\"52\" width=\"181\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[sum_{t=0}^{T-1} big|F_theta(s_t,a_t) - s_{t+1}big|_2^2]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>feels extremely \u201csticky,\u201d as the base points <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-e86795deb37ff5f5055e741b17eb25d7_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"s_t\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"13\" style=\"vertical-align: -3px;\"\/> can easily trick <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-c502880341922d69202070782fbc9ab3_l3.png\" 
class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"18\" style=\"vertical-align: -3px;\"\/> into thinking it\u2019s already reached its local goal.<sup><a href=\"http:\/\/bair.berkeley.edu\/blog\/2026\/04\/20\/grasp\/#fn1\" id=\"ref1\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">1<\/a><\/sup><\/p>\n<p><figure style=\"text-align: center;\">\n  <img decoding=\"async\" src=\"https:\/\/bair.berkeley.edu\/static\/blog\/grasp\/pusht_adversarial.gif\" alt=\"Adversarial world model example\" style=\"max-width: 70%;\"\/><br \/><\/figure>\n<\/p>\n<hr\/>\n<div id=\"fn1\" style=\"font-size: 0.88em; margin: 0.75em 0; padding-left: 1em; border-left: 3px solid #ddd; color: #5f5f5f;\">\n<p><strong>1.<\/strong> This adversarial robustness issue, while particularly bad for lifted-state approaches, isn&#8217;t unique to them. Even for serial optimization methods that optimize through the full rollout map <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-6b9827b4af48fd65e7e83838bfd859f6_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathcal{F}^T\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"25\" style=\"vertical-align: -1px;\"\/>, it&#8217;s possible to reach unseen states, where it is very easy for a component normal to the data manifold to be fed into the sensitive normal directions of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-0ff50bc1113354825aa0a8ee24984eeb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"D_s F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"40\" style=\"vertical-align: -3px;\"\/>. 
The action Jacobian\u2019s chain-rule expansion is<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 52px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-3ce71c40a5f7f6ce0dc00795502ef2c8_l3.png\" height=\"52\" width=\"243\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[Bigl(prod_{t=1}^T D_s F_theta(s_t, a_t)Bigr) D_{a_0}F_theta(s_0, a_0).]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>Consider what happens if any stage of the product has a component normal to the data manifold: the sensitive state Jacobians can amplify that component and corrupt the action gradient. <a href=\"http:\/\/bair.berkeley.edu\/blog\/2026\/04\/20\/grasp\/#ref1\" style=\"color: #4d6b92;\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">\u21a9<\/a><\/p>\n<\/div>\n<hr\/>\n<h3 id=\"our-fix\"><span class=\"ez-toc-section\" id=\"Our_repair\"><\/span>Our fix<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>This is where our new planner GRASP comes in. 
The main observation: while <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-0ff50bc1113354825aa0a8ee24984eeb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"D_s F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"40\" style=\"vertical-align: -3px;\"\/> is untrustworthy and adversarial, the action space is usually low-dimensional and exhaustively covered in training. This means <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-c2f1132e9862e1f0b7226021ef92e3e4_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"D_a F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"41\" style=\"vertical-align: -3px;\"\/> is actually reasonable to optimize through and doesn\u2019t suffer from the adversarial robustness issue!<\/p>\n<p><figure style=\"text-align: center;\">\n  <img decoding=\"async\" src=\"https:\/\/bair.berkeley.edu\/static\/blog\/grasp\/network_diagram.jpg\" alt=\"Network diagram showing high-dim state vs low-dim action\" style=\"max-width: 65%;\"\/><figcaption><em>The action input is usually lower-dimensional and densely covered in training (the model has seen every action direction), so action gradients are much better behaved.<\/em><\/figcaption><\/figure>\n<\/p>\n<p>At its core, <strong>GRASP builds a first-order lifted-state \/ collocation-based planner that depends only on action Jacobians through the world model.<\/strong> We thus exploit the differentiability of learned world models <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-c502880341922d69202070782fbc9ab3_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"18\" style=\"vertical-align: -3px;\"\/> without falling victim to the inherent sensitivity 
of the state Jacobians <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-0ff50bc1113354825aa0a8ee24984eeb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"D_s F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"40\" style=\"vertical-align: -3px;\"\/>.<\/p>\n<h2 id=\"grasp-gradient-relaxed-stochastic-planner\"><span class=\"ez-toc-section\" id=\"GRASP_Gradient_RelAxed_Stochastic_Planner\"><\/span>GRASP: Gradient <strong>RelAxed<\/strong> <strong>S<\/strong>tochastic <strong>P<\/strong>lanner<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>As noted before, we start with the collocation planning objective, where we lift the states and relax the dynamics into a penalty:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 52px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-863f11a3d6371cdc4342477c54c6f78f_l3.png\" height=\"52\" width=\"510\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[min_{mathbf{s},mathbf{a}} mathcal{L}(mathbf{s}, mathbf{a}) = sum_{t=0}^{T-1} big|F_theta(s_t,a_t) - s_{t+1}big|_2^2, quad text{with } s_0 text{ fixed and } s_T=g.]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>We then make two key additions.<\/p>\n<h2 id=\"ingredient-1-exploration-by-noising-the-state-iterates\"><span class=\"ez-toc-section\" id=\"Ingredient_1_Exploration_by_noising_the_state_iterates\"><\/span>Ingredient 1: Exploration by noising the <strong>state iterates<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Even with a smoother objective, planning is nonconvex. 
We introduce exploration by injecting Gaussian noise into the <strong>virtual state updates<\/strong> during optimization.<\/p>\n<p>A simple version:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 19px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-75ff21e0a372181401e643f6a15de711_l3.png\" height=\"19\" width=\"339\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[s_t leftarrow s_t - eta_s nabla_{s_t}mathcal{L} + sigma_{text{state}} xi, qquad xisimmathcal{N}(0,I).]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>Actions are still updated by non-stochastic gradient descent:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 16px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-cba1022bc8ad5d65da23bfac98316acb_l3.png\" height=\"16\" width=\"141\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[a_t leftarrow a_t - eta_a nabla_{a_t}mathcal{L}.]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>The state noise helps you \u201chop\u201d between basins in the lifted space, while the actions remain guided by gradients. 
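<\/p>\n<p>The two update rules can be sketched on a hypothetical scalar toy world model, F(s, a) = s + tanh(a), which stands in for a learned network. This is an illustration, not the GRASP implementation. The horizon, goal, step sizes and noise scale below are arbitrary choices, not values from the paper. Only the free intermediate states receive Gaussian noise, while the actions take plain gradient steps on the lifted loss.<\/p>\n
```python
import math
import random


# Hypothetical 1-D toy world model F(s, a) = s + tanh(a), a stand-in for a
# learned network chosen so that its partial derivatives are simple.
def F(s, a):
    return s + math.tanh(a)


def dF_ds(s, a):
    return 1.0


def dF_da(s, a):
    return 1.0 - math.tanh(a) ** 2


def lifted_loss_and_grads(states, actions):
    # Lifted collocation loss sum_t (F(s_t, a_t) - s_{t+1})^2 with
    # states = [s_0, ..., s_T] and actions = [a_0, ..., a_{T-1}].
    T = len(actions)
    res = [F(states[t], actions[t]) - states[t + 1] for t in range(T)]
    loss = sum(r * r for r in res)
    g_a = [2 * res[t] * dF_da(states[t], actions[t]) for t in range(T)]
    g_s = [0.0] * (T + 1)  # s_0 and s_T = g are fixed: no gradient
    for t in range(1, T):
        # s_t is the input of residual t and the target of residual t-1.
        g_s[t] = 2 * res[t] * dF_ds(states[t], actions[t]) - 2 * res[t - 1]
    return loss, g_s, g_a


def noisy_step(states, actions, eta_s, eta_a, sigma_state, rng):
    # One update: noisy gradient descent on the free states,
    # plain (noise-free) gradient descent on the actions.
    _, g_s, g_a = lifted_loss_and_grads(states, actions)
    new_states = list(states)
    for t in range(1, len(actions)):
        xi = rng.gauss(0.0, 1.0)  # xi ~ N(0, 1)
        new_states[t] = states[t] - eta_s * g_s[t] + sigma_state * xi
    new_actions = [a - eta_a * g for a, g in zip(actions, g_a)]
    return new_states, new_actions


# Usage: plan T = 8 steps from s_0 = 0 to the goal g = 3.
rng = random.Random(0)
T, s0, goal = 8, 0.0, 3.0
states = [s0] + [0.0] * (T - 1) + [goal]
actions = [0.0] * T
initial_loss = lifted_loss_and_grads(states, actions)[0]
for _ in range(500):
    states, actions = noisy_step(states, actions, 0.1, 0.1, 0.01, rng)
final_loss = lifted_loss_and_grads(states, actions)[0]
print(initial_loss, final_loss)
```
<p>The endpoints s_0 and s_T = g stay fixed, matching the constraints of the collocation objective. On this toy, the noisy state iterates still drive the lifted loss well below its starting value.<\/p>\n<p>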
We found that noising the states specifically (as opposed to the actions) strikes a good balance between exploration and the ability to find sharper minima.<sup><a href=\"http:\/\/bair.berkeley.edu\/blog\/2026\/04\/20\/grasp\/#fn2\" id=\"ref2\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">2<\/a><\/sup><\/p>\n<hr\/>\n<div id=\"fn2\" style=\"font-size: 0.88em; margin: 0.75em 0; padding-left: 1em; border-left: 3px solid #ddd; color: #5f5f5f;\">\n<p><strong>2.<\/strong> Because we only noise the states (and not the actions), the resulting dynamics are not exactly Langevin dynamics. <a href=\"http:\/\/bair.berkeley.edu\/blog\/2026\/04\/20\/grasp\/#ref2\" style=\"color: #4d6b92;\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">\u21a9<\/a><\/p>\n<\/div>\n<hr\/>\n<h2 id=\"ingredient-2-reshape-gradients-stop-brittle-state-input-gradients-keep-action-gradients\"><span class=\"ez-toc-section\" id=\"Ingredient_2_Reshape_gradients_cease_brittle_state-input_gradients_hold_motion_gradients\"><\/span>Ingredient 2: Reshape gradients: stop brittle state-input gradients, keep action gradients<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>As discussed, the sensitive pathway is the gradient that flows <em>into the state input<\/em> of the world model, <span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-0ff50bc1113354825aa0a8ee24984eeb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"D_s F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"40\" style=\"vertical-align: -3px;\"\/><\/span>. 
The most straightforward first attempt is to simply stop state gradients into <span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-c502880341922d69202070782fbc9ab3_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"F_{theta}\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"18\" style=\"vertical-align: -3px;\"\/><\/span> directly:<\/p>\n<ul>\n<li>Let <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-d6a08548d66ef341e02d54f6937ad037_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"bar{s}_t\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"13\" style=\"vertical-align: -3px;\"\/> be the same value as <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-e86795deb37ff5f5055e741b17eb25d7_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"s_t\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"13\" style=\"vertical-align: -3px;\"\/>, but with gradients stopped.<\/li>\n<\/ul>\n<p>Define the <strong>stop-gradient dynamics loss<\/strong>:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 52px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-cd8e96c9d63f8a87db71c295647be60c_l3.png\" height=\"52\" width=\"284\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[mathcal{L}_{text{dyn}}^{text{sg}}(mathbf{s},mathbf{a}) = sum_{t=0}^{T-1} big|F_theta(bar{s}_t, a_t) - s_{t+1}big|_2^2.]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>This alone doesn&#8217;t work. Notice that the states now only follow the previous state\u2019s step, with nothing forcing the base states to chase the next ones. 
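<\/p>\n<p>The effect of the stop-gradient can be seen on a hypothetical scalar toy with bounded steps, F(s, a) = s + tanh(a). This is our own stand-in for the learned world model, not code from the paper. The sketch evaluates the gradients of the dynamics loss, with and without the state-input path, at a configuration where every intermediate state sits at s_0 = 0.<\/p>\n
```python
import math


# Hypothetical 1-D toy world model with bounded per-step motion:
# F(s, a) = s + tanh(a), so a single step moves the state by less than 1.
def F(s, a):
    return s + math.tanh(a)


def dF_ds(s, a):
    return 1.0


def dF_da(s, a):
    return 1.0 - math.tanh(a) ** 2


def dyn_state_grads(states, actions, stop_grad):
    # Gradient of sum_t (F(s_t, a_t) - s_{t+1})^2 w.r.t. the free states.
    T = len(actions)
    g_s = [0.0] * (T + 1)
    for t in range(1, T):
        # Target path: s_t is pulled toward F(s_{t-1}, a_{t-1}).
        g = 2 * (states[t] - F(states[t - 1], actions[t - 1]))
        if not stop_grad:
            # Input path through D_s F; the stop-gradient removes it.
            res = F(states[t], actions[t]) - states[t + 1]
            g += 2 * res * dF_ds(states[t], actions[t])
        g_s[t] = g
    return g_s


def dyn_action_grads(states, actions):
    # Identical with or without the stop-gradient: only D_a F is used.
    return [
        2 * (F(states[t], actions[t]) - states[t + 1]) * dF_da(states[t], actions[t])
        for t in range(len(actions))
    ]


# Every free state parked at s_0 = 0, all actions 0, goal g = 3.
T, goal = 6, 3.0
states = [0.0] * T + [goal]
actions = [0.0] * T
full = dyn_state_grads(states, actions, stop_grad=False)
sg = dyn_state_grads(states, actions, stop_grad=True)
g_a = dyn_action_grads(states, actions)
print('full-loss state grads:', full)
print('stop-gradient state grads:', sg)
print('action grads:', g_a)
```
<p>Under the stop-gradient, no intermediate state receives a gradient at this configuration, and only the final action does. The full loss still moves s_{T-1} toward the goal, but only through D_s F. In this toy, tanh keeps each step below 1, so the final action alone cannot close a gap of 3.<\/p>\n<p>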
As a result, there are trivial minima in which the states simply stay at the origin and only the final action tries to reach the goal in a single step.<\/p>\n<h3 id=\"dense-goal-shaping\"><span class=\"ez-toc-section\" id=\"Dense_aim_shaping\"><\/span>Dense goal shaping<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>We can view the above issue as the goal\u2019s signal being cut off entirely from earlier states. One way to fix this is to simply add a dense goal term across the whole prediction horizon:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 52px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-8bf9d85259fdc485efc9660032128d29_l3.png\" height=\"52\" width=\"263\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[mathcal{L}_{text{goal}}^{text{sg}}(mathbf{s},mathbf{a}) = sum_{t=0}^{T-1} big|F_theta(bar{s}_t, a_t) - gbig|_2^2.]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>In normal settings this would over-bias toward the greedy solution of directly chasing the goal. In our setting, however, it is balanced by the stop-gradient dynamics loss\u2019s bias toward feasible dynamics. 
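<\/p>\n<p>The effect of the dense goal term can be checked on the same kind of hypothetical scalar toy, F(s, a) = s + tanh(a). This sketch is our own illustration. The weight gamma on the goal term is chosen arbitrarily. It evaluates the gradients of the combined stop-gradient objective at a trivial-minimum configuration, once without the goal term and once with it.<\/p>\n
```python
import math


# Hypothetical 1-D toy world model standing in for F_theta.
def F(s, a):
    return s + math.tanh(a)


def dF_da(s, a):
    return 1.0 - math.tanh(a) ** 2


def grasp_sg_grads(states, actions, goal, gamma):
    # Gradients of L_dyn^sg + gamma * L_goal^sg, where states[t] is treated
    # as stop_gradient(s_t) wherever it is fed into F. Note that no state
    # derivative of F is ever needed.
    T = len(actions)
    g_s = [0.0] * (T + 1)
    g_a = [0.0] * T
    for t in range(T):
        pred = F(states[t], actions[t])
        dyn_res = pred - states[t + 1]
        goal_res = pred - goal
        # Both terms reach a_t only through D_a F.
        g_a[t] = 2 * (dyn_res + gamma * goal_res) * dF_da(states[t], actions[t])
        # The only remaining state gradient: s_{t+1} as the dynamics target.
        g_s[t + 1] = 2 * (states[t + 1] - pred)
    g_s[T] = 0.0  # s_T = g is fixed
    return g_s, g_a


# A trivial-minimum configuration: free states at s_0 = 0, all actions 0.
T, goal = 6, 3.0
states = [0.0] * T + [goal]
actions = [0.0] * T
g_s0, g_a0 = grasp_sg_grads(states, actions, goal, gamma=0.0)
g_s1, g_a1 = grasp_sg_grads(states, actions, goal, gamma=0.5)
print('action grads without goal term:', g_a0)
print('action grads with goal term:', g_a1)
```
<p>Without the goal term, only the final action moves. With it, every action is pushed toward the goal, while the state gradients at this configuration remain zero. Nothing is differentiated through the state input of the model.<\/p>\n<p>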
The final objective is then as follows:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 24px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-d1f4f133db4d9c011c84ac420e31def9_l3.png\" height=\"24\" width=\"267\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[mathcal{L}(mathbf{s},mathbf{a}) = mathcal{L}_{text{dyn}}^{text{sg}}(mathbf{s},mathbf{a}) + gamma , mathcal{L}_{text{goal}}^{text{sg}}(mathbf{s},mathbf{a}).]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>The result is a planning objective whose gradients never flow through the state input of the world model.<\/p>\n<hr\/>\n<h2 id=\"periodic-sync-briefly-return-to-true-rollout-gradients\"><span class=\"ez-toc-section\" id=\"Periodic_%E2%80%9Csync%E2%80%9D_briefly_return_to_true_rollout_gradients\"><\/span>Periodic \u201csync\u201d: briefly return to true rollout gradients<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The lifted stop-gradient objective is great for <strong>fast, guided exploration<\/strong>, but it\u2019s still an approximation of the original serial rollout objective.<\/p>\n<p>So every <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-0efc4f0539590861f708da095a045355_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"K_{text{sync}}\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"41\" style=\"vertical-align: -6px;\"\/> iterations, GRASP runs a short refinement phase:<\/p>\n<ol>\n<li>Roll out from <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-7730c5a8c9bd4b8291e5231082f6b9c6_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"s_0\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"15\" style=\"vertical-align: -3px;\"\/> using 
the current actions <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-91d6e96c2498d9d1364efcb44c9f1efe_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"mathbf{a}\" title=\"Rendered by QuickLaTeX.com\" height=\"9\" width=\"10\" style=\"vertical-align: -1px;\"\/>, and take a few small gradient steps on the original serial loss:<\/li>\n<\/ol>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 22px;\"><span class=\"ql-right-eqno\"> \u00a0 <\/span><span class=\"ql-left-eqno\"> \u00a0 <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/ql-cache\/quicklatex.com-c542432dde5b7f9184dd1c220097eea2_l3.png\" height=\"22\" width=\"237\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"[mathbf{a} leftarrow mathbf{a} - eta_{text{sync}},nabla_{mathbf{a}},|s_T(mathbf{a})-g|_2^2.]\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>The lifted-state optimization still provides the core of the optimization, while this refinement step helps keep states and actions grounded in real trajectories. This refinement step can of course be replaced with a serial planner of your choice (e.g. CEM). The core idea is to retain some of the benefit of the full-path synchronization of serial planners, while still relying mostly on the advantages of lifted-state planning.<\/p>\n<hr\/>\n<h2 id=\"how-grasp-addresses-long-range-planning\"><span class=\"ez-toc-section\" id=\"How_GRASP_addresses_long-range_planning\"><\/span>How GRASP addresses long-range planning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Collocation-based planners offer a natural fix for long-horizon planning, but this optimization is quite difficult through modern world models because of adversarial robustness issues. 
<em>GRASP proposes a simple solution for a smoother collocation-based planner, alongside steady stochasticity for exploration<\/em>. As a result, longer-horizon planning not only succeeds more often, but also finds those successes faster:<\/p>\n<p><figure style=\"text-align: center; margin: 1.25em 0;\">\n  <img decoding=\"async\" src=\"https:\/\/bair.berkeley.edu\/static\/blog\/grasp\/pusht_zoomout.gif\" alt=\"Push-T planning demo\" style=\"max-width: 90%; height: auto;\"\/><figcaption style=\"font-size: 0.95em; margin-top: 0.5em;\"><em>Push-T demo: longer-horizon planning with GRASP.<\/em><\/figcaption><\/figure>\n<\/p>\n<div class=\"grasp-results-table\" style=\"overflow-x: auto; margin: 1em 0;\">\n<table>\n<thead>\n<tr>\n<th>Horizon<\/th>\n<th>CEM<\/th>\n<th>GD<\/th>\n<th>LatCo<\/th>\n<th><strong>GRASP<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>H=40<\/td>\n<td><strong>61.4%<\/strong> \/ 35.3s<\/td>\n<td>51.0% \/ 18.0s<\/td>\n<td>15.0% \/ 598.0s<\/td>\n<td>59.0% \/ <strong>8.5s<\/strong><\/td>\n<\/tr>\n<tr>\n<td>H=50<\/td>\n<td>30.2% \/ 96.2s<\/td>\n<td>37.6% \/ 76.3s<\/td>\n<td>4.2% \/ 1114.7s<\/td>\n<td><strong>43.4%<\/strong> \/ <strong>15.2s<\/strong><\/td>\n<\/tr>\n<tr>\n<td>H=60<\/td>\n<td>7.2% \/ 83.1s<\/td>\n<td>16.4% \/ 146.5s<\/td>\n<td>2.0% \/ 231.5s<\/td>\n<td><strong>26.2%<\/strong> \/ <strong>49.1s<\/strong><\/td>\n<\/tr>\n<tr>\n<td>H=70<\/td>\n<td>7.8% \/ 156.1s<\/td>\n<td>12.0% \/ 103.1s<\/td>\n<td>0.0% \/ \u2014<\/td>\n<td><strong>16.0%<\/strong> \/ <strong>79.9s<\/strong><\/td>\n<\/tr>\n<tr>\n<td>H=80<\/td>\n<td>2.8% \/ 132.2s<\/td>\n<td>6.4% \/ 161.3s<\/td>\n<td>0.0% \/ \u2014<\/td>\n<td><strong>10.4%<\/strong> \/ <strong>58.9s<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p style=\"text-align: center; margin-top: 0.75em;\"><em>Push-T results. Success rate (%) \/ median time to success. Bold = best in row. 
Note that the median time to success tends to be biased upward as the success rate increases; GRASP is faster nonetheless, despite its higher success rate.<\/em><\/p>\n<hr\/>\n<h2 id=\"whats-next\"><span class=\"ez-toc-section\" id=\"What%E2%80%99s_subsequent\"><\/span>What\u2019s next?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>There is still plenty of work to be done on modern world model planners. We want to exploit the gradient structure of learned world models, and collocation (lifted-state optimization) is a natural approach to long-horizon planning. Still, it\u2019s crucial to understand the typical gradient structure here: smooth and informative action gradients, and brittle state gradients. We view GRASP as an initial iteration of such planners.<\/p>\n<p>There are several natural and interesting next steps:<\/p>\n<ul>\n<li>Extending GRASP to diffusion-based world models, where deeper latent timesteps can be viewed as smoothed versions of the world model itself.<\/li>\n<li>Using more sophisticated optimizers and noising strategies.<\/li>\n<li>Integrating GRASP into either a closed-loop system or RL policy learning for adaptive long-horizon planning.<\/li>\n<\/ul>\n<p>I genuinely think it\u2019s an exciting time to be working on world model planners. It\u2019s a funny sweet spot where the background literature (planning and control overall) is incredibly mature and well-developed, but the current setting (pure planning optimization over modern, large-scale world models) is still heavily underexplored. 
But once we figure out all the right ideas, world model planners will likely become as commonplace as RL.<\/p>\n<hr\/>\n<p>For more details, read the <a href=\"https:\/\/arxiv.org\/pdf\/2602.00475\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">full paper<\/a> or visit the <a href=\"https:\/\/www.michaelpsenka.io\/grasp\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">project website<\/a>.<\/p>\n<hr\/>\n<h2 id=\"citation\"><span class=\"ez-toc-section\" id=\"Quotation\"><\/span>Citation<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div class=\"language-bibtex highlighter-rouge\">\n<div class=\"highlight\">\n<pre class=\"highlight\"><code><span class=\"nc\">@article<\/span><span class=\"p\">{<\/span><span class=\"nl\">psenka2026grasp<\/span><span class=\"p\">,<\/span>\n  <span class=\"na\">title<\/span><span class=\"p\">=<\/span><span class=\"s\">{Parallel Stochastic Gradient-Based Planning for World Models}<\/span><span class=\"p\">,<\/span>\n  <span class=\"na\">author<\/span><span class=\"p\">=<\/span><span class=\"s\">{Michael Psenka and Michael Rabbat and Aditi Krishnapriyan and Yann LeCun and Amir Bar}<\/span><span class=\"p\">,<\/span>\n  <span class=\"na\">year<\/span><span class=\"p\">=<\/span><span class=\"s\">{2026}<\/span><span class=\"p\">,<\/span>\n  <span class=\"na\">eprint<\/span><span class=\"p\">=<\/span><span class=\"s\">{2602.00475}<\/span><span class=\"p\">,<\/span>\n  <span class=\"na\">archivePrefix<\/span><span class=\"p\">=<\/span><span class=\"s\">{arXiv}<\/span><span class=\"p\">,<\/span>\n  <span class=\"na\">primaryClass<\/span><span class=\"p\">=<\/span><span class=\"s\">{cs.LG}<\/span><span class=\"p\">,<\/span>\n  <span class=\"na\">url<\/span><span class=\"p\">=<\/span><span class=\"s\">{https:\/\/arxiv.org\/abs\/2602.00475}<\/span>\n<span 
class=\"p\">}<\/span>\n<\/code><\/pre>\n<\/div>\n<\/div>\n<hr\/>\n<p>This article was originally published on the <a href=\"https:\/\/bair.berkeley.edu\/blog\/2026\/04\/20\/grasp\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow external noopener noreferrer\">BAIR blog<\/a>, and appears here with the authors\u2019 permission.<\/p>\n<hr class=\"xh2\"\/>\n<div class=\"pdxv  printhide\">\n<p><a href=\"https:\/\/robohub.org\/author\/bairblog\" data-wpel-link=\"internal\" target=\"_blank\" rel=\"noopener\"><br \/>\n<img decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/uploads\/2017\/11\/BAIR-220x220.jpeg\" class=\"grayscale0\" style=\"float:left; margin: 0px 1.5em 0em 0px;  width:100px  \" alt=\"\"><br \/>\n<\/a><\/p>\n<div class=\"minitext xh75 pdx\" style=\"min-height: 100px; \">\n<p><a href=\"https:\/\/robohub.org\/author\/bairblog\" data-wpel-link=\"internal\" target=\"_blank\" rel=\"noopener\">BAIR Blog<\/a><br \/>\nis the official blog of the Berkeley Artificial Intelligence Research (BAIR) Lab. <\/p>\n<\/div>\n<\/div>\n<div class=\"pdxv  printshow\">\n<p><img decoding=\"async\" src=\"https:\/\/robohub.org\/wp-content\/uploads\/2017\/11\/BAIR-220x220.jpeg\" class=\"grayscale\" style=\"float:left; margin: 0px 1.5em 0em 0px;  width:100px  \" alt=\"\"><\/p>\n<p>     <br \/>\nBAIR Blog <br \/>\nis the official blog of the Berkeley Artificial Intelligence Research (BAIR) Lab. 
<br \/>\n    \n<\/p>\n<\/div>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>By Michael Psenka, Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, Amir Bar GRASP is a new gradient-based planner for learned dynamics (a \u201cworld model\u201d) that makes long-horizon planning practical by (1) lifting the trajectory into virtual states so optimization is parallel across time, (2) adding stochasticity directly to the state iterates for exploration, and (3) [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":26071,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[],"class_list":{"0":"post-26069","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-robotics"},"_links":{"self":[{"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/posts\/26069","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=26069"}],"version-history":[{"count":1,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/posts\/26069\/revisions"}],"predecessor-version":[{"id":26070,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/posts\/26069\/revisions\/26070"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=\/wp\/v2\/media\/26071"}],"wp:attachment":[{"href":"https:\/\/aireviewirush.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=26069"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aireviewirush.com\/index
.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=26069"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aireviewirush.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=26069"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}