Continuing from where I left off - if you haven't read my earlier DeepRacer blog post, read it first: click here.
I updated my AWS account's limits with a limit increase request in Service Quotas, and after a day or two I was able to run G/P family instances on demand - just the minimum of 4 cores. That's all I need, though: a GPU and a few cores.
Why on-demand, you might ask? The cost of running training in the DeepRacer console for extended periods of time is just too much for my wallet: 3.50 USD/hour, compared to 0.558 USD/hour for a g4dn.xlarge.
So I take on a little more hassle setting up the environment, but I get faster iterations (at least it feels that way) and a cheaper price.
With spot instances you can go even cheaper, but I like being in control of when the instance dies - just don't forget and leave it running.
```shell
git clone https://github.com/aws-deepracer-community/deepracer-for-cloud
cd deepracer-for-cloud/bin/
./prepare.sh
./init.sh -c aws -a gpu
source activate.sh
cd ..
```
Edit system.env and run.env
Edit your reward function and training parameters.
```shell
dr-start-training -q -w
```
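For reference, here's a minimal reward function sketch of the kind that goes into the repo's custom_files directory (the layout as I recall it from deepracer-for-cloud - check your checkout). It does nothing clever, just rewards staying near the centerline:

```python
def reward_function(params):
    """Toy reward: stay close to the track centerline.

    `track_width` and `distance_from_center` are standard DeepRacer
    input parameters; the tier widths below are arbitrary choices.
    """
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Reward tiers based on how far from the centerline the car is
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track
    return float(reward)
```

In my case the real reward functions were race-line based, but the plumbing looks the same.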
- Only train at slow speed
- Train multiple tracks
- Don't overtrain
At first I didn't really understand why slow - don't we want it to drive as fast as possible? But here's where the real world and the virtual one differ. The car only has a throttle from 0 to 100; training at lower speeds in the virtual environment gets the model driving around the track a lot faster. And in the real world, if you give the car 100% throttle, it'll drive at the car's max speed anyway.
I started training models with reward functions similar to those I used on the real-world track: setting a race line and having the car follow it.
I did that for 2-3 tracks - a short training cycle, then hop to the next track. Maybe 8-12 hours per model.
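Hopping tracks in deepracer-for-cloud is mostly a run.env edit plus continuing from the previous model. The sketch below uses variable names as I remember them from the project's docs, so double-check against your copy:

```shell
# run.env (sketch - verify variable names against the deepracer-for-cloud docs)
DR_WORLD_NAME=reinvent_base                    # which track to train on
DR_LOCAL_S3_MODEL_PREFIX=rl-model-track2       # where this training run is stored
DR_LOCAL_S3_PRETRAINED=True                    # continue from an earlier model...
DR_LOCAL_S3_PRETRAINED_PREFIX=rl-model-track1  # ...namely this one
DR_TRAIN_ALTERNATE_DRIVING_DIRECTION=True      # flip driving direction between episodes
```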
They looked like they'd get around the track, at maybe around 11 seconds per lap on the virtual track.
I had the pleasure of taking trips to Berlin and Stockholm, neither for DeepRacing - I presented in Berlin and manned our company stand in Stockholm.
In Berlin there was a track and I had my models prepared so I could shine in the real world. Or so I thought.
The track was the original track that DeepRacer started off from.
I tried out multiple models, but in the end had to accept that my models were nowhere close to making it around the track. Maybe too sophisticated an approach.
Thanks to DBro for the short but enlightening talk at trackside, and for some additional hints on which way to continue. Networking is the best part of these events.
I had hoped there would be a track at the AWS Summit in Stockholm too - but sadly there wasn't one this year.
- Have high entropy for your models
- Discrete action space
- Alternate driving directions
First of all, entropy plays a significant role in balancing exploration and exploitation during training. Setting "high entropy" for real-world models means encouraging more exploration during the learning process. In the real world, track conditions, lighting, and obstacles vary, and exploration lets the model learn robust policies that can adapt to different situations.
By encouraging exploration, high entropy helps the model generalize its learned policies to handle novel or unseen situations. It prevents the model from becoming overly specialized to a specific track or set of conditions, making it more adaptable to different environments.
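To make this concrete, here's a small sketch (plain Python, not DeepRacer code) of what entropy measures: how "spread out" a policy's action probabilities are. A higher entropy bonus in the training loss pushes the policy toward the uniform, exploratory end of this scale:

```python
import math

def policy_entropy(probs):
    """Shannon entropy of a discrete action distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A near-deterministic policy explores very little...
print(policy_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.17
# ...while a uniform policy over 4 actions is maximally exploratory.
print(policy_entropy([0.25, 0.25, 0.25, 0.25]))  # ln(4), ~1.39
```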
This is also where running alternate tracks and directions comes into play - to generalize the model. It'll know more than just one optimally lit virtual track where it needs to drive the exact race line.
I also now understand how the action space affects the model and why a discrete action space makes more sense for real-world models. I need to look into how to build an optimal one for real-world applications, as it's not just math for the perfect line on one track.
Discrete Action Space:
- Finite set of predefined actions.
- Precise control and interpretability.
- A large grid of actions plus sparse rewards can require more training samples.
Continuous Action Space:
- Continuous range of actions.
- Smooth control and flexibility.
- Can require fewer training samples in principle, but exploration is challenging.
- Infinite possibilities for modeling complex behaviors.
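To make the discrete option concrete, here's a small Python sketch that enumerates a steering x speed grid. The dict keys mirror what I understand DeepRacer's model_metadata.json expects, and the values are hypothetical - tune the grid for your own track and car:

```python
# Sketch of a discrete action space as a steering x speed grid.
steering_angles = [-30, -15, 0, 15, 30]  # degrees
speeds = [0.6, 0.9]                      # m/s; train slow for the real car

action_space = [
    {"index": i, "steering_angle": angle, "speed": speed}
    for i, (angle, speed) in enumerate(
        (a, s) for a in steering_angles for s in speeds
    )
]

print(len(action_space))  # 5 angles x 2 speeds = 10 actions
```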
Driving both ways just gives the model more input, more possibilities to learn. And you never know which way you'll need to run a track in the future.
Then just train a bit on the track in question, to make sure your model works on that exact track.
To really ace models in the physical world, you need to have a track - period. It's really hard to show up with multiple pretrained models and go through them one by one to find out if one drives OK.
I feel I didn't have enough time to train my models, even though for the physical car I'm told one should only train models briefly - it's not a multi-day training session on one track like the virtual circuits are.
I do have a track and a car available at our company office, so I'm one of the lucky ones with the possibility of putting in loads more time: setting it up, calibrating the car, testing it live. That's what I'll write more about before re:Invent this year - I aim to participate there with some new models.