
Cost of Energy Optimised by Reinforcement Learning (CEORL)

Programme: Control Systems

Status: Completed

Stage: 3

Lead contractor: MaxSim Ltd

Sub-contractor(s):
University of Edinburgh
Mocean Energy Ltd
Caelulum Ltd
Wave Conundrums Consulting
Aquaharmonics Inc.
David Pizer
Marine Systems Modelling
REOptimize Systems
Quoceant Ltd
Pelagic Innovation Ltd

Overview

Wave energy converters (WECs) need to attract forces in order to capture power, but they also need to avoid forces in order to survive extreme events. This can be considered the 'force contradiction'. The 'risk contradiction' involves the costs of risk and uncertainty: the increased cost of capital, and the costs of reduced reliability, availability and survivability. Lowering these risks usually involves increased capital costs.

Reinforcement Learning (RL) is a leading machine learning method with the potential to bypass these contradictions. It learns the probabilistic relationship between chosen actions and the device behaviour or 'state', building a 'map' from actions to device response; desirable states are assigned positive 'rewards' and undesirable states negative ones. RL derives the best long-term control strategy by maximising the expected sum of future rewards. It is robust to sensor errors, delay and drift, as well as to unidentified non-linear response. Because the map is built for each individual device, it accounts for manufacturing, installation and operational differences. RL would also enable subsystems that have been developed independently to be integrated into a single WEC. It can address the force contradiction by deciding when to limit forces and when to maximise capture, and the risk contradiction by including the inherent uncertainties in the mapping and decision-making process.
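
As an illustration of these mechanics, the sketch below shows a minimal tabular Q-learning loop: the table Q is the learnt 'map' from state-action pairs to expected long-term reward. Everything here is illustrative; the toy dynamics and reward stand in for a WEC simulator, and none of the names or numbers come from the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discretisation: 16 device states, 4 power take-off (PTO) actions
N_STATES, N_ACTIONS = 16, 4
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1

# Q[s, a] estimates the expected sum of discounted future rewards for taking
# action a in state s -- the learnt 'map' from actions to device response.
Q = np.zeros((N_STATES, N_ACTIONS))

def step(state, action):
    """Stand-in for the device or simulator. A real WEC reward would credit
    captured power and give negative rewards to undesirable (high-load) states."""
    next_state = (state + action + int(rng.integers(3))) % N_STATES  # toy dynamics
    reward = 1.0 if next_state == 0 else -0.01                       # toy reward
    return next_state, reward

state = int(rng.integers(N_STATES))
for _ in range(20_000):
    # epsilon-greedy: mostly follow the current best policy, sometimes explore
    if rng.random() < EPSILON:
        action = int(rng.integers(N_ACTIONS))
    else:
        action = int(Q[state].argmax())
    next_state, reward = step(state, action)
    # temporal-difference update: move Q towards reward + discounted future value
    Q[state, action] += ALPHA * (reward + GAMMA * Q[next_state].max() - Q[state, action])
    state = next_state

print("Greedy action in each state:", Q.argmax(axis=1))
```

Because the update uses only observed transitions and rewards, the same loop works without any model of the device dynamics, which is the property exploited below.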

The initial basis of the project, in its early stages, was to use as a base case the device of WEC developer Aquaharmonics, winner of the US Wave Energy Prize in 2016. Part of their winning formula was a simple WEC with control used to maximise performance while limiting extreme loads, and RL may be a promising technique for extending that approach to more realistic conditions.

As the project progressed into Stage 2, CEORL identified that the approach had the potential to overcome the following challenges for WEC control:

  • Absence of adequate models: model-predictive control is only as good as its model, which needs bespoke development for each device type. Model-free RL does not require a model.
  • Need for wave-by-wave control: sophisticated wave-by-wave control could improve the economic viability of WECs by balancing the competing requirements of high energy capture and low loads. Large capture widths require large forces, but large forces increase operational costs through both peak and fatigue loads. Finding the correct balance between these competing requirements is likely to lead to the lowest levelised cost of energy (LCOE). RL is rewarded for learning control policies that trade off these competing requirements to find the policies that give the lowest LCOE (a sketch of such a reward term follows this list).
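
To make the trade-off concrete, the sketch below shows the shape a per-step reward term might take, crediting captured power while penalising fatigue and peak loads. All names, weightings and the force rating are invented for illustration; they are not project values.

```python
def lcoe_proxy_reward(power_kw: float, pto_force_kn: float,
                      force_rating_kn: float = 500.0,
                      fatigue_weight: float = 1e-6,
                      peak_penalty: float = 5.0) -> float:
    """Illustrative per-step reward trading energy capture against load costs."""
    reward = power_kw                                   # credit captured power
    reward -= fatigue_weight * abs(pto_force_kn) ** 3   # fatigue-damage proxy
    if abs(pto_force_kn) > force_rating_kn:             # hard penalty past rating
        reward -= peak_penalty * (abs(pto_force_kn) - force_rating_kn)
    return reward
```

Tuning such weights amounts to encoding the cost model: heavier load penalties push the learnt policy towards shedding forces, lighter ones towards maximising capture.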

In Stage 3, the aims of the project evolved: to show that RL-derived policies can either double the energy capture or halve the loads compared with the baseline control, by testing an RL-derived control policy on a physical model in a wave tank, with power taken off by an electrical generator. Testing would provide evidence of policy transferability: that a policy learnt on a numerical model can be transferred to a physical system, and that training in one set of conditions produces a policy that remains effective in different conditions (sketched below). The main project ambitions, however, are to de-risk the R&D process by gaining a better understanding of opportunities and limitations, and to build confidence and interest in the results within the wave energy community.
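
One way to read the transferability claim is as an evaluation protocol: freeze a trained policy and score it, alongside a baseline controller, in sea states it never saw during training. The harness below is purely illustrative; the rollout is a stand-in for the tank or numerical model, and the policies and numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def rollout(controller, hs, tp, steps=500):
    """Stand-in for one run in the tank or numerical model; returns total energy.
    hs = significant wave height (m), tp = peak period (in time steps)."""
    energy = 0.0
    for t in range(steps):
        wave = hs * np.sin(2 * np.pi * t / tp) + 0.1 * rng.standard_normal()
        energy += controller(wave) * wave            # toy power-capture proxy
    return energy

rl_policy = lambda w: float(np.tanh(2.0 * w))        # stand-in for the frozen RL policy
baseline = lambda w: 0.5 * w                         # stand-in baseline (linear damping)

train_sea = (2.0, 9.0)                               # condition used for training
held_out = [(1.0, 6.0), (3.0, 12.0)]                 # unseen conditions

for hs, tp in [train_sea] + held_out:
    rl = np.mean([rollout(rl_policy, hs, tp) for _ in range(10)])
    bl = np.mean([rollout(baseline, hs, tp) for _ in range(10)])
    print(f"Hs={hs} m, Tp={tp} steps: RL {rl:.1f} vs baseline {bl:.1f}")
```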

The limitations of the RL approach were also investigated, including limits to policy transferability, difficulties with particular types of sensor, and practical implementation problems, in order to support the planning of any future R&D effort by a WEC developer. The project also developed intellectual property on the process of deriving a policy for a particular WEC, including an assessment of whether hardware-in-the-loop (HIL) testing is a useful step in that process.

Stage 1

July 2018

The Stage 1 Public Report for the MaxSim "Cost of Energy Optimised by Reinforcement Learning" project includes a description of the technology, scope of work, achievements and recommendations for further work.

The goal is to apply reinforcement learning (RL) algorithms to learn good control policies for specific wave energy converters (WECs). We plan for our methods to be applicable to all types of WEC. Even though the same RL algorithms could be used for training, the control policies they produce will clearly be specific to each WEC type. A longer-term goal would be to learn individual control policies for individual WECs of the same type; the policies could change as the machine experiences wear, minor faults, biofouling or repairs. Policies would also be specific to the development level, which affects the complexity and accuracy of the information available about levelised cost of energy (LCOE), as well as the commercial focus and targets of the device developer.

CorPower Ocean AB was an additional subcontractor in Stage 1.

Stage 2

September 2024

The Stage 2 Public Report for the MaxSim "Cost of Energy Optimised by Reinforcement Learning" project includes a description of the technology, scope of work, achievements and recommendations for further work.

The CEORL (Cost of Energy Optimised by Reinforcement Learning) project has been investigating the use of reinforcement learning (RL) to develop control policies for wave energy converters (WECs). Stage 2 demonstrated encouraging results: a step change in energy capture while at the same time reducing peak loads. Improving both is necessary to reduce the levelised cost of energy (LCOE) of wave power.

Stage 3

September 2024

The Stage 3 Public Report for the MaxSim "Cost of Energy Optimised by Reinforcement Learning" project includes a description of the technology, scope of work, achievements and recommendations for further work.

The file also includes the public report from a follow-on project, which investigated applying RL directly to learn on real WECs. The activities are presented, along with the advantages of the approach and the residual challenges to its implementation.
