Abstract

Engineering problems that are modeled using sophisticated mathematical methods, or that are characterized by expensive-to-conduct tests or experiments, are constrained by limited budgets or finite computational resources. Moreover, practical scenarios in industry impose restrictions, based on logistics and preference, on the manner in which experiments can be conducted. For example, material supply may enable only a handful of experiments in a single shot, or, in the case of computational models, one may face significant wait times on shared computational resources. In such scenarios, one usually resorts to performing experiments in a manner that maximizes one’s state of knowledge while satisfying the above-mentioned practical constraints. Sequential design of experiments (SDOE) is a popular suite of methods that has yielded promising results in recent years across different engineering and practical problems. A common strategy that leverages Bayesian formalism is Bayesian SDOE, which usually works best in the one-step-ahead, or myopic, scenario of selecting a single experiment at each step of a sequence of experiments. In this work, we aim to extend the SDOE strategy to query the experiment or computer code at a batch of inputs. To this end, we leverage deep reinforcement learning (RL)-based policy gradient methods to propose batches of queries that are selected taking into account the entire budget in hand. The algorithm retains the sequential nature inherent in SDOE while incorporating elements of task-based reward from the domain of deep RL. A unique capability of the proposed methodology is that, once it is trained, it can be applied to multiple tasks, for example, optimization of a function. We demonstrate the performance of the proposed algorithm on a synthetic problem and a challenging high-dimensional engineering problem.
