Optimization with Offline Reinforcement Learning

We showed that when you are early in your digitalization journey, where you only have access to manipulated variables (e.g. sugar feed rate) and the outcome (e.g. yield), you can achieve about a 7% improvement with Bayesian Optimization. In this blog post, we show that a better result can be achieved if you have access to more real-time measurements.

For the penicillin fermentation simulation, in addition to real-time sensor measurements such as pH and temperature, we also have access to Raman spectroscopy. Raman spectroscopy can be used to estimate penicillin concentration accurately in real time (as opposed to discrete offline samples), and concentration maps directly to yield. In other words, the additional information we can obtain is process parameters such as pH and temperature, as well as the estimated yield during the batch.
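To make the calibration step concrete, here is a minimal sketch of fitting a model from spectra to concentration on synthetic data. The data shapes, noise level, and the plain least-squares fit are all assumptions for illustration; real Raman calibration typically uses chemometric models such as partial least squares on measured spectra.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 spectra with 50 wavenumber channels each, and the
# offline-assayed penicillin concentration for each spectrum.
n_samples, n_channels = 500, 50
true_weights = rng.normal(size=n_channels)          # unknown in practice
spectra = rng.normal(size=(n_samples, n_channels))
concentration = spectra @ true_weights + rng.normal(scale=0.01, size=n_samples)

# Fit a linear calibration model by least squares.
coef, *_ = np.linalg.lstsq(spectra, concentration, rcond=None)

# Apply the model to a new spectrum to estimate concentration in real time.
new_spectrum = rng.normal(size=n_channels)
predicted_concentration = new_spectrum @ coef
```

Once calibrated, the model turns each in-batch spectrum into a concentration estimate, which is what gives us the estimated yield during the batch.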

Machine learning thrives when it is given more and better data. The experiment is set up the same way as in the Bayesian Optimization post: we have a budget of 10 batches, and the goal is to come up with the best recipe. While Bayesian Optimization optimizes iteratively, taking feedback from every batch, here we simply run all 10 batches with random recipes for the sake of collecting data.
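A minimal sketch of this random data-collection step, with hypothetical recipe variables and bounds (the simulator's actual manipulated variables and ranges may differ):

```python
import random

# Hypothetical recipe bounds; the real simulator exposes manipulated
# variables such as sugar feed rate over the course of the batch.
BOUNDS = {
    "sugar_feed_rate": (5.0, 80.0),    # assumed range
    "temperature": (297.0, 302.0),     # assumed range, K
}

def sample_recipe(rng):
    """Draw one recipe uniformly at random within the bounds."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

rng = random.Random(42)
# A 10-batch budget: unlike Bayesian Optimization, no feedback is used
# between batches; all recipes are sampled up front.
recipes = [sample_recipe(rng) for _ in range(10)]
```

Each sampled recipe is then run through the simulator, and the logged trajectories (sensor readings, Raman-estimated yield) form the offline dataset.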

Two offline reinforcement learning approaches are applied here. The first is inspired by Conservative Q-Learning (CQL) and MOPO; for lack of a better term, let's call it offline Q-learning (OQL). The basic idea is to reduce the overestimation of Q-values while remaining optimistic about extrapolation. The second is inspired by the Decision Transformer, which treats the problem as conditional sequence modeling (CSM).
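To make the conservatism idea concrete, here is a tabular sketch of a Q-learning update with a CQL-style regularizer on synthetic transitions. The state/action spaces, rewards, and `cql_weight` value are all made up for illustration; the actual approach uses function approximation rather than a table.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
Q = np.zeros((n_states, n_actions))

# Hypothetical logged transitions (state, action, reward, next_state)
# standing in for the trajectories collected from the 10 batches.
dataset = [
    (int(rng.integers(n_states)), int(rng.integers(n_actions)),
     float(rng.normal()), int(rng.integers(n_states)))
    for _ in range(200)
]

alpha, gamma, cql_weight = 0.1, 0.9, 1.0
for _ in range(50):
    for s, a, r, s2 in dataset:
        # Standard TD update toward the Bellman target.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        # CQL-style regularizer: gradient of logsumexp(Q[s]) - Q[s, a]
        # pushes Q down for actions the dataset never took in state s,
        # counteracting overestimation on out-of-distribution actions.
        grad = softmax(Q[s])
        grad[a] -= 1.0
        Q[s] -= alpha * cql_weight * grad
```

The learned greedy policy (`Q.argmax(axis=1)`) then favors actions whose value is supported by the logged data rather than by optimistic extrapolation.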

Below, we compare the improvement achieved by the three approaches.

Hyperparameter tuning for offline reinforcement learning algorithms is quite tricky due to the lack of robust offline evaluation methods. The results shown use hyperparameters either taken directly from the papers or chosen based on our experience.

Let's look at the recipes each approach proposes.

The plots suggest that the three approaches learn quite different recipes. It would be very interesting for subject-matter experts to study these recipes and potentially improve them further!

If you are interested in applying Machine Learning to optimize your processes or would like to learn how Quartic does that, please feel free to reach out!