Abstract
The control of complex systems is often challenging due to high-dimensional nonlinear models, unmodeled phenomena, and parameter uncertainty. The increasing ubiquity of sensors measuring such systems and increased computational resources has led to an interest in purely data-driven control methods, particularly using the Koopman operator. In this paper, we elucidate the construction of a linear predictor based on a sequence of time realizations of observables drawn from a data archive of different trajectories combined with subspace identification methods for linear systems. This approach is free of any predefined set of basis functions but instead depends on the time realization of these basis functions. The prediction and control are demonstrated with examples. The basis functions can be constructed using time-delayed coordinates of the outputs, enabling the application to purely data-driven systems. The paper thus shows the link between Koopman operator-based control methods and classical subspace identification methods. The approach in this paper can be extended to adaptive online learning and control.
1 Introduction
Controlling complex autonomous systems, such as swimming robots, off-road vehicles, wind turbines, transportation networks, and power grids poses significant challenges due to high-dimensional nonlinear models, unmodeled phenomena, and parameter uncertainties. The increasing ubiquity of sensors has led to an abundance of data from such systems, which combined with improved computational resources has led to an interest in purely data-driven methods for modeling, analysis, and control of such systems [1–3]. These data-driven approaches have increasingly been based on linear operators associated with nonlinear dynamical systems. The Perron–Frobenius operator propagates the densities in the state space of a dynamical system, and the Koopman operator propagates the observables along the flow in the state space of a dynamical system, see for example Ref. [4]. By considering the evolution of functions of the states rather than the states themselves, these operator-based methods transform a nonlinear system into an infinite-dimensional linear system, which can be approximated by a finite-dimensional linear system such as in Refs. [5,6]. Additionally, many efforts have been made recently to extend these results from autonomous dynamical systems to input–output systems, where the operator approximation must account for the effect of a control input to predict the output variables. For example, extended dynamic mode decomposition (EDMD) coupled with model predictive control (MPC) such as in Ref. [7] or with LPV modes such as in Ref. [8] and variations thereof have become popular. More recently, ideas from subspace identification (SSID) for linear systems, such as from Refs. [9,10] have been used by Refs. [11,12] to develop a method based on identifying the significant modes in the evolution of an observable along a single trajectory and then using Gaussian processes (GP) as a mapping to this identified lifted state. The current paper builds on these ideas of subspace identification and clarifies and extends them to include observation data from multiple trajectories and further shows that basis functions for the observables can be defined using only data from output variables.
The main contributions of this paper are as follows. We demonstrate the relation between a Koopman operator associated with a nonlinear system and subspace identification methods of linear systems. We identify the relation between the approximate Koopman operator that is calculated using techniques like EDMD versus subspace methods. In the approach using subspace methods, the approximate Koopman operator that is calculated propagates time realizations of observables. Therefore, output data from a single trajectory do not (usually) sample the state space dynamics, as they are difficult to ensure that a nonlinear system is sufficiently excited, an assumption that underlies the subspace identification methods with a single data record, see for example Ref. [13]. We show that using multiple data records improves the accuracy of prediction. Multiple data records have been shown to be useful even in subspace identification for linear systems in Ref. [14] since a sufficiently long single data record may not be available in practice and, more importantly, the observables could be projected on any set of basis functions, such as Gaussian processes. Usually, the basis functions are functions of the state, and in purely data-driven systems, such functions cannot be constructed since explicit knowledge of the states is not available nor are different observables realized or sampled. This paper shows that the basis functions can be constructed using time-delayed coordinates of the outputs, enabling the application to purely data-driven systems.
The rest of the paper is organized as follows. In Sec. 2, we describe the approximations of the different “versions” of the Koopman operator and in particular the approximation that propagates time realizations of the observables. In Sec. 3, we formulate the subspace identification problem for the linearized dynamics using multiple data records. In Sec. 4, we review GP as basis functions (of state variable) for observables and in Sec. 4.2, we reformulate the GP as basis functions of time-delayed observables. Section 5 prediction is obtain using GP-SSID and demonstrate numerical results in Sec. 6. Section 7 conlcudes the paper.
2 Formulation for Koopman Operator Based Control

Commutative diagram showing the propagation of the observable g(x) by the operator , the propagation of the projected observables Ψ(x) by the projected operator and their time realization zt by the operator K

(a) Training data set for the model with multiple data records and (b) training data for the single trajectory case. Points are the training data for the GP and the lines show the full trajectories.
3 Koopman Operator Via Observables From Multiple Data Records
3.1 System Matrix Calculation Using SSID.
4 Gaussian Processes as Basis Functions
The SSID procedure outlined in the previous section for identifying the lifted state z and system matrices provides a method for obtaining the system matrices from input–output data collected from the system. However, it is also necessary to construct a mapping from the original states, xt (or partial observation yt) to the lifted state. For this, we employ Gaussian process regression, based on the idea discussed in Sec. 2 that the lifted states zt obtained in subspace identification can be viewed as evaluations at a particular xt of a dictionary function, which is itself a realization of a random variable in a function space. In Sec. 4.1, we give the procedure when measurements of the full state observable are available and in Sec. 4.2, we consider the case where only a partial observation is available.
4.1 Gaussian Processes as Basis Functions of Full State Observable.
4.2 Gaussian Processes as Basis Functions of Time-Delayed Observables.
5 Prediction Using GP-SSID
6 Results
6.1 Linear Predictor Without Control.
To compare the prediction of the proposed GP-SSID algorithm to the traditional EDMD algorithm, see for instance, Ref. [5], we use the same data from the multiple trajectory data set to construct the Koopman operator using the EDMD method with the choice of 25 radial basis functions and the states themselves, which gives us the total number of basis functions r = 27. We then compare predictions using 500 randomly sampled initial points in the state space, enabling us to predict a 2-second trajectory for each initial point. The average root mean squared error for the SSID model trained with multiple trajectories is 0.0728, while for the single trajectory model, it is 1.2093, and for the K-EDMD method average root-mean-squared error (RMSE) is 0.5424. Figures 3(a)–3(c) show one of the prediction example with an initial state at [ −1.1, − 1]. Figure 3(d) illustrates the increase in RMSE as the prediction time increases.
![Comparison of the prediction of multiple trajectory model (solid line), single trajectory model (dash-dotted line), and Koopman EDMD model (dotted line) with the solution obtained by direct integration of the nonlinear system (dash-squared line) (24) for the initial state [ − 1.1, − 1]](https://asmedc.silverchair-cdn.com/asmedc/content_public/journal/lettersdynsys/3/3/10.1115_1.4063604/1/m_aldsc_3_3_031009_f003.png?Expires=1742296224&Signature=oCt-LyvHrfdecSNDYJzO~01eI2dcj9q3yvMrhp5NTZU1akRWOPfjD79s8Cx~W7p82WVfEolu~1GVQ63FkkQBPMVFZPCqpaBOCfRW3aHFEbNoeUu~Y5cD-aY6jhAOI5OOIP8uyFnVpw7kePGU-LhvpIh3zGcZ0b51rWoG-3aih3SnkbGoxC4EsKJ81bC5koAX-PnoiQiHpQxdm-KjafTI4Q6BUD2uhdEjBk0c36fa6r21vaEydiAwp~ojenN8f54f~-lKkuY3YvAgYhVzJhmM-v5JBIJd8mkdInHoflPw6hB1jmheyEEW2E1--E~I5qZf8Zws9VgLbjRW2U-4~p83qA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Comparison of the prediction of multiple trajectory model (solid line), single trajectory model (dash-dotted line), and Koopman EDMD model (dotted line) with the solution obtained by direct integration of the nonlinear system (dash-squared line) (24) for the initial state [ − 1.1, − 1]
![Comparison of the prediction of multiple trajectory model (solid line), single trajectory model (dash-dotted line), and Koopman EDMD model (dotted line) with the solution obtained by direct integration of the nonlinear system (dash-squared line) (24) for the initial state [ − 1.1, − 1]](https://asmedc.silverchair-cdn.com/asmedc/content_public/journal/lettersdynsys/3/3/10.1115_1.4063604/1/m_aldsc_3_3_031009_f003.png?Expires=1742296224&Signature=oCt-LyvHrfdecSNDYJzO~01eI2dcj9q3yvMrhp5NTZU1akRWOPfjD79s8Cx~W7p82WVfEolu~1GVQ63FkkQBPMVFZPCqpaBOCfRW3aHFEbNoeUu~Y5cD-aY6jhAOI5OOIP8uyFnVpw7kePGU-LhvpIh3zGcZ0b51rWoG-3aih3SnkbGoxC4EsKJ81bC5koAX-PnoiQiHpQxdm-KjafTI4Q6BUD2uhdEjBk0c36fa6r21vaEydiAwp~ojenN8f54f~-lKkuY3YvAgYhVzJhmM-v5JBIJd8mkdInHoflPw6hB1jmheyEEW2E1--E~I5qZf8Zws9VgLbjRW2U-4~p83qA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Comparison of the prediction of multiple trajectory model (solid line), single trajectory model (dash-dotted line), and Koopman EDMD model (dotted line) with the solution obtained by direct integration of the nonlinear system (dash-squared line) (24) for the initial state [ − 1.1, − 1]
6.2 Model Predictive Control With Koopman Operator.
We gather training data sets shown in Fig. 4; for the first model, we use multiple data records by simulating 100 randomly sampled initial states forward for 500 time-steps with random control inputs u ∈ [−3, 3]. For the second model, we propagate an initial condition for 8000 time-steps with a random control input of a similar range. Subsequently, we construct mosaic-Hankel matrices from the collected data, identify the linear system matrices, and train the basis function (Gaussian Processes) for both models, as discussed in Secs. 3 and 4. The size of the lifted states selected is r = 35 for both the models.

Training data set for the Duffing oscillator (25). Star points are the training data for the GPs, and the circular points show the full trajectories: (a) multiple data records and (b) single trajectory.

Training data set for the Duffing oscillator (25). Star points are the training data for the GPs, and the circular points show the full trajectories: (a) multiple data records and (b) single trajectory.
![MPC results for linear model constructed using single trajectory data. The initial state x0 = [ − 0.7160, − 0.9789] and the target state is [2, 0].](https://asmedc.silverchair-cdn.com/asmedc/content_public/journal/lettersdynsys/3/3/10.1115_1.4063604/1/m_aldsc_3_3_031009_f005.png?Expires=1742296224&Signature=utSLhbyg1MTBEJwGUkyee~AvlYGwvzRkZNtRKXVx00riMMV2ML4U25RzLakbC3FdlVB9d8nSKWGMepmML4BSkeVk0JFAksZTYm13BggyIDjcY40J5aIUZP7znrAVtOSqk~WlUWH8r83dLcGZ51x1hPtqgtD41jtrZy9Eo1Izhi7Cs~GPTq5f-ozMZQXkFmprFledoZ1xA0uzPSDRp~xxqws2A0HW~9DOIdee9AiRluXe6e0FacORtQQ5PSzq4dUhioSS9orSpSz9~-UbBZncEOm0x9GRRtyCiypqdyj-pJuUpT26U76ZEZok83U6kABlcI0rzmHi1SiVj9zwkXIwiQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
MPC results for linear model constructed using single trajectory data. The initial state x0 = [ − 0.7160, − 0.9789] and the target state is [2, 0].
![Comparison of the MPC results from the linear models constructed using GP-SSID with multiple data records (solid line) and K-EDMD (dotted line) with model-based nonlinear MPC solutions (dash-dotted line). The initial state x0 = [−0.7160, −0.9789] and the target state is [2, 0].](https://asmedc.silverchair-cdn.com/asmedc/content_public/journal/lettersdynsys/3/3/10.1115_1.4063604/1/m_aldsc_3_3_031009_f006.png?Expires=1742296224&Signature=oSVciVwc3v2RHDjr70WKLR4ECBlWi-EB4LY3WoK12SSnyoafLCsISIHUFMhO~S6E0K8MUkreENnCGBbyj2nXrgbUmxZsEIBkGbeH6q-uBW0FRcEvALi85f-RPZH0cVcNoAIiIi3vZ1mHOUvv5-y2eZgB0RlP0jSQbwwNRMhEufpcm53U3SQuaqbqPVM2Cc3hcIC6BkIwmCbVDwd9vnXMzj0rYwmLsL3Diqg-SEhQLnSoCZKKocrLJ6btBbFCFoJgO6aXMta6MPgqWanJlE4rpFjcaGnlMdjNg0TYEVRN~efNzCCCO6jB2s~Du3sxkRyJMMY0aW2D5ypBf9DCKv4FLg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Comparison of the MPC results from the linear models constructed using GP-SSID with multiple data records (solid line) and K-EDMD (dotted line) with model-based nonlinear MPC solutions (dash-dotted line). The initial state x0 = [−0.7160, −0.9789] and the target state is [2, 0].
![Comparison of MPC results with control horizon Nc = 50 obtained from the linear models constructed using GP-SSID with multiple data records (in solid line) and K-EDMD (dotted line). The initial state x0 = [−0.7160, − 0.9789] and the target state is [2, 0].](https://asmedc.silverchair-cdn.com/asmedc/content_public/journal/lettersdynsys/3/3/10.1115_1.4063604/1/m_aldsc_3_3_031009_f007.png?Expires=1742296224&Signature=mTUWjjqDQ7jIsaUAmQW~1T7hVqZiAsXQlB4gIKcCgQ2jsgDs~i4kFZ8-VOTVUTggVIIDq29VQgdN2ZKRMC7FwEqmPTfC8TAiVvfNr3X130y06kUFKtM87LDMHuxs2JzGNlChAJI75bGb4D3lTrTswkdy0i7Zlb1MtZATYlXxC~UJSOfdkB~JR6l3Auc~VLAmpIHuBs1A-5zwhaDTJ07YuYlCZJp99nan0eKTx20ZIB~ELXVEXYPdf3AKyZQRE~XFCHnUca8~NQEQQQS0lD6x2CLucJ1kqqeqncwhgFbLYO42gHyfkjNXQHoHZCmlMhxmplPW-7tIvM1rKsCM8vPPXA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Comparison of MPC results with control horizon Nc = 50 obtained from the linear models constructed using GP-SSID with multiple data records (in solid line) and K-EDMD (dotted line). The initial state x0 = [−0.7160, − 0.9789] and the target state is [2, 0].
To assess the effectiveness of our proposed methodology in contrast to established data-driven techniques for addressing the presented optimal control problem, we leverage the K-EDMD algorithm as outlined in Ref. [5]. We employ this algorithm to construct a linear system using the same multiple trajectory dataset. This construction involves using 36 radial basis functions alongside the states themselves, leading to generating a Koopman operator with a total of r = 38 basis functions. Subsequently, we apply an MPC formulation, as detailed in Sec. 6.2, tailored to the linear system derived through the K-EDMD algorithm. The results of our MPC solutions are illustrated in Fig. 6. Notably, our proposed methodology exhibits faster convergence compared to the K-EDMD approach. Furthermore, the terminal error is larger in the case of the K-EDMD algorithm.
While the GP-SSID based Koopman linear MPC is well-suited for data-driven systems, we also compare this approach with a purely model-based nonlinear model predictive control approach with a similar cost function. We observe that the model-based controller exhibits faster convergence and achieves a lower cost when compared to the data-driven Koopman operator-based linear controllers. However, the computational time required for linear MPC based on both the GP-SSID and K-EDMD approaches is lower in comparison to the computational demands of the nonlinear model-based controller for the same time-step δt = 0.01 s. As an illustration of the difference in computational time, for the example shown in Fig. 6, GP-SSID and K-EDMD methods take about 101 s and 105 s, respectively, and the nonlinear MPC takes 1108 s.
The GP-SSID method consistently outperforms the K-EDMD algorithm in terms of prediction accuracy, as illustrated in Fig. 3 and while it seems that the MPC implementations of both approaches (as shown in Fig. 6) yield similar solutions, this similarity in performance is a result of using a one-step control horizon, continually updating the initial conditions. Nonetheless, it’s essential to note that the one-step control horizon also entails a higher computational cost and may make online implementation infeasible. Therefore we evaluate the performance of both algorithms with a larger control horizon of Nc = 50. For the example considered in Fig. 7 when Nc is increased to 50 the computational time for both GP-SSID and K-EDMD approaches is about 1.7 s although the terminal error with the K-EDMD approach grows compared to the GP-SSID approach.
6.3 Model Predictive Control With Koopman Operator for Partially Observed State.
7 Conclusion
In this paper, we have explored connections between the classical system identification technique of subspace identification and recently developed methods for system identification based on constructing a finite-dimensional approximation of the Koopman operator. In particular, we have shown that these methods can be directly related by considering the latent state identified through SSID to be an evaluation of a set of dictionary functions at a particular state of the underlying nonlinear system. With this formulation, we have shown that for nonlinear systems, it is necessary to consider multiple data records, as a single trajectory does not sufficiently explore the state space in most cases. This is indicated by improved prediction and control performance. We also show that the same formulation can be applied in cases where only partial state measurements are available.
Footnote
Paper presented at the 2023 Modeling, Estimation, and Control Conference (MECC 2023), Lake Tahoe, NV, Oct. 2–5. Paper No. MECC2023-159.
Acknowledgment
This work was supported by Grant No. 2021612 from the National Science Foundation and Grant No. 13204704 from the Office of Naval Research.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.