
Using Reinforcement Learning to simulate robot Inverse Kinematics in Unity

Process to develop an inverse kinematic model of a robot arm in given scenarios using reinforcement learning in Unity with the ML-Agents package. The ML training task is to move the robot arm so that it reaches the target object from any starting state. The arm must not go below the ground, and the robot must not collide with its own parts.





Workflow development

 
CONTROL:
Robot arm data sheet - Training statistics

INPUTS:
Geometrical model of the robot (links, tree structure, etc.) - Environment assets (obstacles, targets…) - Requirements on the operation to execute

OUTPUT:
.onnx brain able to create a trajectory to reach a given target

RESOURCE:
Unity 3D - ML-Agents - PyTorch
 

Workflow building-blocks

Activities Overview
A1
Environment set up
Description: Add all the assets that are included in the scene. For the robot, define its structure, the relationships between the different axes, and the limits of movement. Additionally, define the position and orientation of the objects in the Unity scene, and ensure that each physical object has a Collider or Rigidbody so that collisions are registered. When using methods like OnTriggerEnter for collision detection, the isTrigger flag of the collider must be enabled (see the sketch after this activity block).
Input: 3D Assets (Robot arm, pick object, etc.)
Output: Pick and place scenario
Control: Robot behavior
Resource: 3D CAD Model, URDF File
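
To illustrate the collision set-up described above, here is a minimal C# sketch that registers contacts between a robot link and its surroundings. The tags "Ground" and "Target", the script name, and its placement on a link are assumptions for illustration, not part of the original project:

```csharp
using UnityEngine;

// Minimal collision-registration sketch for one robot link.
// Assumes the ground and target objects carry the (hypothetical) tags
// "Ground" and "Target", that the collider on this link has isTrigger
// enabled, and that a Rigidbody is present so Unity fires trigger events.
public class LinkCollisionReporter : MonoBehaviour
{
    void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("Ground"))
            Debug.Log(name + " touched the ground");
        else if (other.CompareTag("Target"))
            Debug.Log(name + " reached the target");
    }
}
```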
A2
Agent set up
Description: For this task the robot has two objectives:
1) Avoid collision with the ground and with its own parts.
2) Reach the target object.
To achieve this, it is necessary to set up the learning parameters:
- Initialization: select the robot and add the BehaviorParameters and DecisionRequester components. Set the maximum number of steps per episode depending on the complexity of the task to be performed by the agent.
- Observations: Create the episodes and, at the beginning of each episode, place the robot and the target in random positions within an area reachable by the robot. The observations should provide the agent with the information it needs to make informed decisions: the current position and orientation of the robot, the location of the target, and so on. These observations are added using the AddObservation() method within the CollectObservations() method.
- Actions: Define the actions that can be performed by the agent. In this case, the only action is a rotation of the robot’s axes to reach a given goal. The actions are processed in the OnActionReceived() method.
- Rewards: Build the reward system for the agent. This is crucial to drive the agent’s learning, as it signals whether the agent is doing something desired or not. Within the OnActionReceived() method, after processing the actions, a reward or penalty should be assigned using AddReward() or SetReward(). A positive reward should be given when the robot performs desirable actions (e.g., moving towards the target), while penalties (negative rewards) should be used when the robot performs undesirable actions (e.g., moving away from the target or exceeding movement limits).
- Decisions and Learning: Configure how and when the agent makes decisions. The decision period is set in the DecisionRequester component; it defines how many simulation steps are taken before the agent requests a new action. A higher value gives the agent more time to observe the consequences of an action before requesting the next one. A minimal agent script illustrating these pieces is sketched after this activity block.
Input: ML-Agents release 12
Output: Agent set up and C# Scripts
Control: Behavioral expectation
Resource: Unity 3D, ML Documentation, Visual Studio
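
Bringing these points together, here is a minimal sketch of such an agent using the ML-Agents release 12 API. The class and field names, the joint rotation axis, the reward scales, and the success threshold are all illustrative assumptions; the number of continuous actions configured in BehaviorParameters would have to match the number of joints:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Illustrative agent sketch: joint layout, axes and reward values are assumptions.
public class RobotArmAgent : Agent
{
    public Transform target;          // target object to reach
    public Transform[] joints;        // rotating axes of the arm
    public Transform endEffector;     // tip of the arm
    public float rotationSpeed = 60f; // degrees per second (assumed value)

    public override void OnEpisodeBegin()
    {
        // Place the target at a random position within an assumed reachable area.
        target.localPosition = Random.insideUnitSphere * 0.5f + Vector3.up * 0.5f;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(target.localPosition);      // where the target is
        sensor.AddObservation(endEffector.localPosition); // where the tip is
        foreach (var joint in joints)
            sensor.AddObservation(joint.localRotation);   // current joint poses
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // One continuous action per joint: a small rotation around its axis
        // (a single shared axis is assumed here for simplicity).
        for (int i = 0; i < joints.Length; i++)
            joints[i].Rotate(Vector3.forward,
                actions.ContinuousActions[i] * rotationSpeed * Time.deltaTime);

        // Dense shaping reward: the closer the tip is to the target, the better.
        float distance = Vector3.Distance(endEffector.position, target.position);
        AddReward(-distance * 0.01f);

        if (distance < 0.05f) // success threshold (assumed)
        {
            SetReward(1f);
            EndEpisode();
        }
    }

    void OnTriggerEnter(Collider other)
    {
        // Strong penalty when the arm touches the ground (tag is an assumption).
        if (other.CompareTag("Ground"))
        {
            SetReward(-1f);
            EndEpisode();
        }
    }
}
```

The DecisionRequester component added during initialization then calls RequestDecision() automatically every decision period, so no manual decision logic is needed in the script.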
A3
Training
Description:
The training process with ML-Agents in Unity starts with the configuration of the trainer_config.yaml file. This file defines the type of trainer, hyperparameters such as batch size, buffer size and learning rate, and the neural network configuration, including the number of hidden units and layers (an illustrative configuration is sketched after this activity block). Before starting the training, a “robot farm” is created, which is basically a scene containing multiple copies of the agent; this accelerates the learning process.
To start training, it is necessary to have the ML-Agents Python server installed and activated. This server communicates with Unity to facilitate the training process. During training, erratic movements of the robot will likely be observed at first, but over time these movements should improve. The process can be monitored through TensorBoard.
Input: trainer_config.yaml file, Robot farm and Python server
Output: Robot behavior
Control: Parameter settings
Resource: TensorBoard, Unity 3D
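
For reference, a minimal trainer_config.yaml in the release 12 schema might look like the sketch below; the behavior name "RobotArm" and every value are illustrative assumptions, not tuned settings:

```yaml
behaviors:
  RobotArm:               # must match the Behavior Name in BehaviorParameters
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128   # neurons per hidden layer
      num_layers: 2       # number of hidden layers
    max_steps: 500000     # total training steps before the run stops
```

Training would then typically be launched with mlagents-learn trainer_config.yaml --run-id=robot_arm before pressing Play in Unity, and monitored with tensorboard --logdir results.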
A4
Results of training
Description: Once the training is completed, the trained neural network model is saved and ready to be used. A brain folder is created within the results folder in the Unity project directory. Inside this folder is the .onnx file, which can simply be dragged and dropped into the Model field of the Behavior Parameters component of the robot. Additionally, inference mode must be selected to activate the brain and enable the agent to execute tasks based on its acquired knowledge.
Input: Training results
Output: .onnx brain able to create a trajectory to reach a given target
Control: Achievement of the task, i.e. the robot is able to reach a given target object
Resource: Unity 3D, PyTorch, Visual Studio

Results

A pick and place scenario was developed. Several brains were trained, achieving effective behaviours relative to the given requirements. However, the movements of the digital twin may lack the precision and efficiency needed to be used as they are.

Conclusion

The models developed are flexible and can be quickly adapted to minor changes in the scenario. Improvements can be made in the training and in the treatment of the output data to obtain more accurate behaviours.



This project is an amazing way to explore a subject and have the possibility to exchange with international students about the details of our work. Furthermore, I love the idea of developing knowledge that can be useful to other students in the future.

