Deploying Small Language Models at Scale with AWS IoT Greengrass and Strands Agents

January 12, 2026


Modern manufacturers face an increasingly complex challenge: implementing intelligent decision-making systems that respond to real-time operational data while maintaining security and performance standards. The volume of sensor data and the operational complexity demand AI-powered solutions that process information locally for immediate responses, while leveraging cloud resources for complex tasks.

The industry is at a critical juncture where edge computing and AI converge. Small Language Models (SLMs) are lightweight enough to run on constrained GPU hardware, yet powerful enough to deliver context-aware insights. Unlike Large Language Models (LLMs), SLMs fit within the power and thermal limits of industrial PCs or gateways. This makes them ideal for factory environments where resources are limited and reliability is paramount. For the purpose of this blog post, assume an SLM has roughly 3 to 15 billion parameters.
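The size argument above can be made concrete with a back-of-the-envelope sketch. It estimates how much memory the weights alone occupy at a few nominal bit widths, under these assumptions:

- Only the weights are counted.
- 1 GB is 10^9 bytes.
- The KV cache, activations, and runtime overhead are ignored, so real memory use on the device will be higher.
- Quantized GGUF variants usually spend slightly more than their nominal bits per weight.

```python
# Back-of-the-envelope estimate: weight memory = parameters x bits per weight / 8.
# Assumption: weights only; KV cache, activations and runtime overhead are ignored.


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # The 3B and 15B endpoints match the SLM range assumed in this post.
    for params in (3, 15):
        for bits in (16, 8, 4):
            print(f"{params}B params @ {bits}-bit ~ {weight_memory_gb(params, bits):.1f} GB")
```

By this estimate, a 3B-parameter model at 4 bits needs about 1.5 GB for its weights, and a 15B-parameter model needs about 7.5 GB. This helps explain why quantized GGUF files are commonly used on gateway-class GPUs.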

This blog focuses on Open Platform Communications Unified Architecture (OPC-UA) as a representative manufacturing protocol. OPC-UA servers provide standardized, real-time machine data that SLMs running at the edge can consume. This enables operators to query equipment status, interpret telemetry, or access documentation instantly, even without cloud connectivity.

AWS IoT Greengrass enables this hybrid pattern by deploying SLMs, alongside AWS Lambda functions, directly to OPC-UA gateways. Local inference ensures responsiveness for safety-critical tasks. Meanwhile, the cloud handles fleet-wide analytics, multi-site optimization, or model retraining under stronger security controls.

This hybrid approach opens possibilities across industries:

- Automakers could run SLMs in vehicle compute units for natural voice commands and an enhanced driving experience.
- Energy providers could process SCADA sensor data locally in substations.
- In gaming, SLMs could run on players' devices to power companion AI.
- Beyond manufacturing, higher education institutions could use SLMs for personalized learning, proofreading, research assistance, and content generation.

In this blog, we look at how to deploy SLMs to the edge seamlessly and at scale using AWS IoT Greengrass.

The solution uses AWS IoT Greengrass to deploy and manage SLMs on edge devices, with Strands Agents providing local agent capabilities. The services used include:

AWS IoT Greengrass: An open-source edge runtime and cloud service that lets you deploy, manage, and monitor device software.
AWS IoT Core: A service that lets you connect IoT devices to the AWS Cloud.
Amazon Simple Storage Service (Amazon S3): A highly scalable object storage service that lets you store and retrieve any amount of data.
Strands Agents: A lightweight Python framework for running multi-agent systems using cloud and local inference.

We demonstrate the agent capabilities in the code sample using an industrial automation scenario. The sources of industrial data are two:

- An OPC-UA simulator that defines a factory consisting of an oven and a conveyor belt.
- A set of maintenance runbooks.

This solution can be extended to other use cases by using other agentic tools.

The following diagram shows the high-level architecture:

AWS IoT Greengrass workflow for edge-based language model deployment using Strands Agents and Ollama

The user uploads a model file in GPT-Generated Unified Format (GGUF) to an Amazon S3 bucket that the AWS IoT Greengrass devices can access.
The devices in the fleet receive a file download job. The S3FileDownloader component processes this job and downloads the model file from the S3 bucket to the device. S3FileDownloader can handle large files, which is typically necessary because SLM model files often exceed the native Greengrass component artifact size limits.
The GGUF model file is loaded into Ollama when the Strands Agents component makes its first call to Ollama. GGUF is a binary file format for storing LLMs, and Ollama is software that loads the GGUF model file and runs inference. The model name is specified in the component's recipe.yaml file.
The user sends a query to the local agent by publishing a payload to a device-specific agent topic on the AWS IoT MQTT broker.
After receiving the query, the component uses the model-agnostic orchestration capabilities of the Strands Agents SDK. The Orchestrator Agent perceives the query and reasons about which information sources it needs. It then calls the appropriate specialized agents (Documentation Agent, OPC-UA Agent, or both) to gather the data before formulating a response.
If the query relates to information that can be found in the documentation, the Orchestrator Agent calls the Documentation Agent.
The Documentation Agent finds the information in the provided documents and returns it to the Orchestrator Agent.
If the query relates to current or historical machine data, the Orchestrator Agent calls the OPC-UA Agent.
The OPC-UA Agent queries the OPC-UA server based on the user query and returns the server data to the Orchestrator Agent.
The Orchestrator Agent forms a response based on the collected information. The Strands Agents component publishes the response to a device-specific agent response topic on the AWS IoT MQTT broker.
The Strands Agents SDK lets the system work with locally deployed foundation models through Ollama at the edge. It keeps the option to switch to cloud-based models, such as those in Amazon Bedrock, when connectivity is available.
The AWS IAM Greengrass service role provides access to the S3 resource bucket so that models can be downloaded to the device.
The AWS IoT certificate attached to the IoT thing allows the Strands Agents component to receive and publish MQTT payloads through AWS IoT Core.
The Greengrass component writes its logs to the local file system. Optionally, you can enable Amazon CloudWatch Logs to monitor the component's operation in the CloudWatch console.
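The query-handling steps above can be sketched in plain Python to show the message flow. Each specialized agent's answer goes to the subagent topic, and the combined answer goes to the response topic. This is a simplified illustration, not the component's code. The assumptions are:

- The two sub-agents are hypothetical stubs.
- The keyword routing in `choose_agents` is a deterministic stand-in for the reasoning that the Orchestrator Agent delegates to the SLM.
- `publish` is any callable you supply, for example a wrapper around an MQTT client.

The topic names and payload fields mirror the examples in the testing section of this post.

```python
import json
import time
from typing import Callable, Dict, List

# Hypothetical sub-agent stubs. In the real component these are Strands Agents
# backed by the local SLM; here they only return canned strings.
def documentation_agent(query: str) -> str:
    return f"[docs] manual excerpt relevant to: {query}"


def opcua_agent(query: str) -> str:
    return f"[opc-ua] live readings relevant to: {query}"


SUB_AGENTS: Dict[str, Callable[[str], str]] = {
    "documentation": documentation_agent,
    "opcua": opcua_agent,
}


def choose_agents(query: str) -> List[str]:
    """Naive keyword routing; the real Orchestrator Agent lets the SLM decide."""
    q = query.lower()
    chosen = []
    if any(word in q for word in ("manual", "runbook", "procedure", "maintenance")):
        chosen.append("documentation")
    if any(word in q for word in ("status", "temperature", "speed", "current", "history")):
        chosen.append("opcua")
    return chosen or ["documentation", "opcua"]  # no clear match: ask both


def handle_query(thing: str, query: str, publish: Callable[[str, str], None]) -> str:
    """Call the chosen sub-agents, publish each answer, then publish the final response."""
    findings = []
    for name in choose_agents(query):
        answer = SUB_AGENTS[name](query)
        findings.append(answer)
        publish(f"things/{thing}/agent/subagent", json.dumps(
            {"agent": name, "query": query, "response": answer, "timestamp": time.time()}))
    final = " ".join(findings)  # the real orchestrator has the SLM synthesize an answer
    publish(f"things/{thing}/agent/response", json.dumps(
        {"query": query, "response": final, "timestamp": time.time(), "status": "success"}))
    return final


if __name__ == "__main__":
    handle_query("MyThingName", "What is the status of the oven?", lambda t, p: print(t, p))
```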

Before starting this walkthrough, ensure you have:

In this post, you will:

Deploy Strands Agents as an AWS IoT Greengrass component.
Download SLMs to edge devices.
Test the deployed agent.


Component deployment

First, deploy the StrandsAgentGreengrass component to your edge device.

Clone the guidance repository:

git clone https://github.com/aws-solutions-library-samples/guidance-for-deploying-ai-agents-to-device-fleets-using-aws-iot-greengrass.git
cd guidance-for-deploying-ai-agents-to-device-fleets-using-aws-iot-greengrass

Use the Greengrass Development Kit (GDK) to build and publish the component.

To publish the component, you need to modify the region and bucket values in the gdk-config.json file. The recommended artifact bucket value is greengrass-artifacts. If the bucket does not already exist, GDK creates one named in the greengrass-artifacts-<region>-<account-id> format. For more information, refer to the Greengrass Development Kit CLI configuration file documentation. After modifying the bucket and region values, run the following commands to build and publish the component:

gdk component build
gdk component publish
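The gdk-config.json edit described above can also be scripted, which helps when you publish from CI or to several regions. This is a minimal sketch with these assumptions:

- It assumes the standard GDK layout, in which each entry under `"component"` has a `"publish"` object with `"bucket"` and `"region"` keys. Compare it against the repository's actual file before use.
- The component name `com.strands.agent.greengrass` in the demo is assumed from the log file name used later in this post.
- `eu-west-1` and `123456789012` are placeholder values.

```python
import json
import tempfile
from pathlib import Path


def set_publish_target(config_path: str, bucket: str, region: str) -> None:
    """Set publish.bucket and publish.region for every component in gdk-config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    for component in config["component"].values():
        publish = component.setdefault("publish", {})
        publish["bucket"] = bucket
        publish["region"] = region
    path.write_text(json.dumps(config, indent=2) + "\n")


def expected_bucket_name(bucket: str, region: str, account_id: str) -> str:
    """Artifact bucket name in the <bucket>-<region>-<account-id> format GDK uses."""
    return f"{bucket}-{region}-{account_id}"


if __name__ == "__main__":
    # Demo on a throwaway file; point set_publish_target at the repo's real gdk-config.json.
    sample = {"component": {"com.strands.agent.greengrass": {"publish": {"bucket": "", "region": ""}}}}
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "gdk-config.json"
        cfg.write_text(json.dumps(sample))
        set_publish_target(str(cfg), "greengrass-artifacts", "eu-west-1")
        print(cfg.read_text())
    print(expected_bucket_name("greengrass-artifacts", "eu-west-1", "123456789012"))
```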

The component will appear in the AWS IoT Greengrass Components console. To deploy it to your devices, refer to the Deploy your component documentation.

After the deployment, the component runs on the device. It includes Strands Agents, an OPC-UA simulation server, and sample documentation. Strands Agents uses the Ollama server as the SLM inference engine. The component provides OPC-UA and documentation tools that the agent uses to retrieve the simulated real-time data and the sample equipment manuals.
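To check on the device that Ollama can serve the model independently of the agent, you can call Ollama's local HTTP API directly. This is a standard-library sketch with these assumptions:

- Ollama listens on its default endpoint, `http://localhost:11434`.
- The placeholder model name must be replaced with the name configured in the component's recipe.yaml.

It uses the non-streaming form of Ollama's `/api/generate` endpoint.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if Ollama runs elsewhere on the device.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

On the device, `generate("<model-name>", "Reply with one word: ready?")` should return text once the model is loaded. The first call can be slow because Ollama loads the GGUF file at that point. Connection failures surface as `urllib.error.URLError`, which is a subclass of `OSError`.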

To test the component on an Amazon EC2 instance, you can use the IoTResources.yaml AWS CloudFormation template. It deploys a GPU instance with the necessary software installed and also creates the resources for running Greengrass. After the stack is deployed, a Greengrass core device appears in the AWS IoT Greengrass console. The CloudFormation template is in the source/cfn folder of the repository. To learn how to deploy a CloudFormation stack, see the Create a stack from the CloudFormation console documentation.

Downloading the model file

The component needs a model file in GGUF format, which Ollama uses as the SLM. Copy the model file to the /tmp/destination/ folder on the edge device. If you use the default ModelGGUFName parameter in the component's recipe.yaml file, the model file must be named model.gguf.

If you don't have a model file in GGUF format, you can download one from Hugging Face, for example Qwen3-1.7B-GGUF. In a real-world application, this can be a fine-tuned model that solves specific business problems in your use case.
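Before the agent makes its first call to Ollama, it can be worth confirming that the file in place is really a GGUF file and not, say, an HTML error page saved by a failed download. A GGUF file starts with the 4-byte magic value `GGUF`. The sketch below checks only that header. It assumes the default path and file name described above and does not validate the rest of the file.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files begin with these four bytes


def looks_like_gguf(path: str) -> bool:
    """Return True if the file exists and starts with the GGUF magic value."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    model_path = "/tmp/destination/model.gguf"  # default location and name from this post
    print(model_path, "OK" if looks_like_gguf(model_path) else "missing or not GGUF")
```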

(Optional) Use S3FileDownloader to download model files

To manage model distribution to edge devices at scale, you can use the S3FileDownloader AWS IoT Greengrass component. It supports automatic retry and resume, which makes it well suited to deploying large files over unreliable connections. Model files can be large, and device connectivity is unreliable in many IoT use cases, so this component can help you deploy models to your device fleets reliably.
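The retry-and-resume idea can be illustrated with a generic sketch. This is not S3FileDownloader's implementation. After a failure, the download continues from the last byte received instead of starting over, and backs off exponentially between consecutive failures. `fetch_range` is a hypothetical callable; against S3, it would typically issue an HTTP Range request starting at the given offset.

```python
import time
from typing import Callable


def resumable_download(
    fetch_range: Callable[[int], bytes],
    total_size: int,
    max_attempts: int = 5,
    base_delay: float = 0.5,
) -> bytes:
    """Download total_size bytes, resuming from the last received offset after failures.

    fetch_range(offset) is a hypothetical callable that returns the next chunk
    starting at `offset`, or raises OSError when the connection drops.
    """
    data = bytearray()
    attempts = 0
    while len(data) < total_size:
        try:
            chunk = fetch_range(len(data))  # resume from the bytes already received
            if not chunk:
                raise OSError("empty chunk")
            data.extend(chunk)
            attempts = 0  # progress made, so reset the retry budget
        except OSError:
            attempts += 1
            if attempts >= max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempts - 1))  # exponential backoff
    return bytes(data)
```

Resetting the attempt counter after every successful chunk means a long download over a flaky link fails only on repeated consecutive errors, not on the total number of drops.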

After deploying the S3FileDownloader component to your device, publish the following payload to the things/<MyThingName>/download topic using the AWS IoT MQTT test client. The file is downloaded from the Amazon S3 bucket and placed in the /tmp/destination/ folder on the edge device:

{
    "jobId": "filedownload",
    "s3Bucket": "<ModelFileBucket>",
    "key": "model.gguf"
}

If you used the CloudFormation template provided in the repository, you can use the S3 bucket that the template created. You can find the bucket name in the outputs of the CloudFormation stack.
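If you trigger downloads from a script rather than the MQTT test client, the topic and payload can be built programmatically. This sketch mirrors the example payload above. The rejection of `/`, `+`, and `#` in the thing name is a defensive check added here, because those characters would change the meaning of the MQTT topic.

```python
import json


def download_job(thing_name: str, bucket: str, key: str = "model.gguf",
                 job_id: str = "filedownload") -> tuple[str, str]:
    """Return (topic, payload) for an S3FileDownloader job, matching the example above."""
    if any(ch in thing_name for ch in "/+#"):
        raise ValueError("thing name must not contain '/', '+' or '#'")
    topic = f"things/{thing_name}/download"
    payload = json.dumps({"jobId": job_id, "s3Bucket": bucket, "key": key})
    return topic, payload
```

The resulting pair could then be passed to any MQTT publisher. One example is the `publish` operation of the AWS IoT data plane API, such as boto3's `iot-data` client.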

Testing the local agent

Once the deployment is complete and the model is downloaded, you can test the agent through the AWS IoT Core MQTT test client:

Subscribe to the things/<MyThingName>/# topic to view the agent's responses.
Publish a test query to the input topic things/<MyThingName>/agent/query:

{
    "query": "What's the status of the conveyor belt?"
}
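Subscribing to a single wildcard topic works because of MQTT's topic-filter rules:

- `+` matches exactly one topic level.
- `#`, which is only valid as the last level, matches the parent level and everything below it.

The following sketch implements those rules for ordinary topics; special `$`-prefixed topics are not handled. It shows why `things/<MyThingName>/#` receives messages on both the response and subagent topics.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter.

    '+' matches exactly one level; '#' (last level only) matches the parent
    level and any number of levels below it.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


if __name__ == "__main__":
    sub = "things/MyThingName/#"
    for t in ("things/MyThingName/agent/response", "things/MyThingName/agent/subagent",
              "things/MyThingName/agent/query", "things/OtherThing/agent/response"):
        print(t, topic_matches(sub, t))
```

One side effect is that the same subscription also delivers your own query message on .../agent/query, so expect to see it echoed in the test client.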

You should receive responses on multiple topics:
1. Final response topic (things/<MyThingName>/agent/response), which contains the final response of the Orchestrator Agent:

{
    "query": "What's the status of the oven?",
    "response": "The oven is currently operating at 802.2°F (slightly above the setpoint of 800.0°F), with heating active...",
    "timestamp": 1757677413.6358254,
    "status": "success"
}

2. Sub-agent responses topic (things/<MyThingName>/agent/subagent), which contains the responses from intermediate agents such as the OPC-UA Agent and the Documentation Agent:

{
    "agent": "opc factory",
    "query": "Get current oven status",
    "response": "**Oven Status Report:**\n- **Current Temperature:** 802.2°F...",
    "timestamp": 1757677323.443954
}
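When consuming these messages outside the test client, you can separate final answers from intermediate ones by topic suffix. This sketch assumes only the topic names and payload fields shown in the two examples above. Any other topic delivered by a `#` subscription is labeled `other`, and missing fields fall back to `?` or an empty string.

```python
import json
from typing import Any, Dict


def classify_message(topic: str, payload: str) -> Dict[str, Any]:
    """Parse an agent message and tag it as 'final', 'subagent', or 'other' by topic suffix."""
    message = json.loads(payload)
    if topic.endswith("/agent/response"):
        kind = "final"
    elif topic.endswith("/agent/subagent"):
        kind = "subagent"
    else:
        kind = "other"  # e.g. your own query echoed back by a '#' subscription
    return {**message, "kind": kind}


def summarize(topic: str, payload: str) -> str:
    """One-line, human-readable summary of an agent message."""
    msg = classify_message(topic, payload)
    if msg["kind"] == "final":
        return f"FINAL [{msg.get('status', '?')}] {msg.get('response', '')}"
    if msg["kind"] == "subagent":
        return f"{msg.get('agent', '?')}: {msg.get('response', '')}"
    return f"other: {topic}"
```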

The agent processes your query using the local SLM. It responds based on both the simulated OPC-UA data and the equipment documentation stored locally.

For demonstration purposes, we use the AWS IoT Core MQTT test client as a simple interface to communicate with the local device. In production, Strands Agents can run entirely on the device itself, eliminating the need for any cloud interaction.

Monitoring the component

To monitor the component's operation, connect remotely to your AWS IoT Greengrass device and view the component logs:

sudo tail -f /greengrass/v2/logs/com.strands.agent.greengrass.log

This shows the agent's real-time operation, including model loading, query processing, and response generation. You can learn more about Greengrass logging in the Monitor AWS IoT Greengrass logs documentation.
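For a quick health check that goes beyond watching the log scroll past, you can count log lines by severity. This is a minimal sketch with these assumptions:

- Each relevant line contains its level in square brackets, such as `[INFO]` or `[ERROR]`. Check this against your actual log format.
- The log path matches the one above.
- Reading the file may require elevated permissions, as the `sudo` in the previous command suggests.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: the level appears in square brackets, e.g. "[INFO]" or "[ERROR]".
LEVEL_RE = re.compile(r"\[(TRACE|DEBUG|INFO|WARN|ERROR)\]")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log lines per level; lines without a recognizable level are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    path = "/greengrass/v2/logs/com.strands.agent.greengrass.log"
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            print(dict(count_levels(f)))
    except OSError as exc:  # missing file or insufficient permissions
        print(f"Could not read {path}: {exc}")
```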

To delete the resources created in this post, go to the AWS IoT Greengrass console:

Go to Deployments, choose the deployment that you used to deploy the component, then revise the deployment to remove the Strands Agents component.
If you deployed the S3FileDownloader component, remove it from the deployment in the same way.
Go to Components, choose the Strands Agents component, and choose 'Delete version' to delete the component.
If you created the S3FileDownloader component, delete it in the same way.
If you deployed the CloudFormation stack to run the demo on an EC2 instance, delete the stack from the AWS CloudFormation console. Note that the EC2 instance incurs hourly charges until it is stopped or terminated.
If you no longer need the Greengrass core device, delete it from the Core devices section of the Greengrass console.
After deleting the Greengrass core device, delete the IoT certificate attached to the core thing. To find the thing's certificate, follow these steps in the AWS IoT Things console:
- Choose the IoT thing created in this guide.
- Open the Certificates tab and choose the attached certificate.
- Choose Actions, then choose Deactivate and Delete.

In this post, we showed how to run an SLM locally using Ollama, integrated through Strands Agents on AWS IoT Greengrass. This workflow demonstrated how lightweight AI models can be deployed and managed on constrained hardware while benefiting from cloud integration for scale and monitoring. Using OPC-UA as our manufacturing example, we highlighted how SLMs at the edge let operators query equipment status, interpret telemetry, and access documentation in real time, even with limited connectivity. The hybrid model ensures that critical decisions happen locally, while complex analytics and retraining are handled securely in the cloud.

This architecture can be extended into a hybrid cloud-edge AI agent system, in which edge AI agents (using AWS IoT Greengrass) integrate seamlessly with cloud-based agents (using Amazon Bedrock). This enables distributed collaboration. Edge agents handle real-time, low-latency processing and immediate actions, while cloud agents handle complex reasoning, data analytics, model refinement, and orchestration.

About the authors

Ozan Cihangir is a Senior Prototyping Engineer at the AWS Specialists & Partners Group. He helps customers build innovative solutions for their emerging technology projects in the cloud.

Luis Orus is a senior member of the AWS Specialists & Partners Group, where he has held multiple roles, from building high-performing teams at global scale to helping customers innovate and experiment quickly through prototyping.

Amir Majlesi leads the EMEA prototyping team within the AWS Specialists & Partners Group. He has extensive experience helping customers accelerate cloud adoption, expedite their path to production, and foster a culture of innovation. Through rapid prototyping methodologies, Amir enables customer teams to build cloud-native applications, with a focus on emerging technologies such as Generative and Agentic AI, Advanced Analytics, Serverless, and IoT.

Jaime Stewart focused his Solutions Architect Internship within the AWS Specialists & Partners Group on edge inference with SLMs. Jaime is currently pursuing an MSc in Artificial Intelligence.
