How quality data enables deeper insights into wind farm performance

How is your data integrity? Where did your data go?

Data is the foundation of any wind farm analytics, benchmarking or reporting. At Clir, we talk a lot about data integrity, of which one key aspect is simply data coverage: how much data is available? How much data is missing?

Since the ability to understand an asset and drive improvements is based on the quality of site data, the answers to these questions can significantly influence how results from analytics should be interpreted.

With Clir Portfolio, we’ve implemented smart logic for the automatic labeling and categorization of missing data periods based on all available information. Based on our experience, we have also established benchmarks of best practices for data coverage. This enables us to get the most accurate results from our clients' data.

The problem with missing SCADA data

Any period of missing data at a turbine can be categorized in one of two ways:

The turbine was producing power while data was missing.
The turbine was not producing power while the data was missing.

Causes of missing data include:

Turbines are not operating, and are disconnected from power and communication.
Turbines are operating but are disconnected from communication with the server. Data may be logged by the turbine for future backfill.
Turbines are operating and in communication with the server but the server is not logging data correctly.
Turbines are operating and in communication with the server, and the data was logged by the server, but the data was later lost or overwritten.

Wind farm data coverage is typically 95% or more. This may sound high enough that data coverage isn’t a problem, but consider the following example.

A turbine operated for a year. Data coverage was 97%. Time-based turbine availability was 95% based on the available data.

Quantity	Days
Full period length	365
Time with data	354
Time with missing data	11
Time turbine was known to be online	336.3
Time turbine was known to be offline	17.7

For the period when we have data, turbine availability was 95%, but what is the impact of the missing 3% of data? What was actual turbine availability? This depends on the turbine state during the missing data periods. Two possible scenarios are as follows:

Scenario	A	B
Description	The turbine was online and producing during the missing data periods	The turbine was offline and not producing during the missing data periods
Time online (days)	347.3	336.3
Time offline (days)	17.7	28.7
Turbine availability	95.2%	92.2%

The missing 3% of the data introduces an uncertainty of about 3 percentage points in the turbine availability, which is significant. Implications include:

A 3% difference when calculating gross production during an energy yield assessment.
A 3% difference when benchmarking availability against the broader industry.
In the eyes of an owner or operator, 95.2% is a lot better than 92.2%.
Optimization: in scenario B there is much more opportunity to improve turbine availability than in scenario A.

Scenarios A and B represent the range of what could have actually happened at the turbine. The true answer is likely scenario C, which is somewhere in between: the turbine was operating and producing power during some of the missing data periods and offline for the rest of the missing data periods.
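The arithmetic behind scenarios A, B and C can be written out as a short sketch. The function names and the `operating_share` parameter are illustrative assumptions, not part of Clir's software; the inputs are the 97% coverage and 95% observed availability from the example above.

```python
def availability_range(period_days, coverage, observed_availability):
    """Return (lower, upper) bounds on time-based availability.

    lower: turbine offline for all missing data (scenario B)
    upper: turbine online for all missing data (scenario A)
    """
    days_with_data = period_days * coverage
    days_missing = period_days - days_with_data
    known_online = days_with_data * observed_availability
    lower = known_online / period_days
    upper = (known_online + days_missing) / period_days
    return lower, upper


def availability_given_share(period_days, coverage, observed_availability,
                             operating_share):
    """Scenario C: assume a fraction `operating_share` (0 to 1) of the
    missing time was spent online. This parameter is hypothetical."""
    lower, upper = availability_range(period_days, coverage,
                                      observed_availability)
    return lower + operating_share * (upper - lower)


lower, upper = availability_range(365, 0.97, 0.95)
print(f"Scenario B (lower bound): {lower:.2%}")
print(f"Scenario A (upper bound): {upper:.2%}")
print(f"Scenario C, half of gaps online: "
      f"{availability_given_share(365, 0.97, 0.95, 0.5):.2%}")
```

The gap between the two bounds equals the share of missing data, which is why 3% of missing data translates into roughly 3 percentage points of uncertainty in availability.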

Standard industry approaches to missing data

Clir’s software and services are regularly used to support wind farm transactions, namely buying and selling shares of projects. Through this role, we regularly see how missing data is treated by consultants across the industry.

One common assumption is that periods of available data are representative of periods of unavailable data. This is an easy assumption to make because it implies we don’t need to worry about missing data. One major problem with this approach is that missing data periods are often correlated with outages: data is often missing precisely because the turbine is offline and disconnected from power or communications while being serviced. This assumption therefore introduces an upward bias on turbine availability.

Another approach used by the industry is to manually investigate each period of missing data using log books, monthly reports or interviews with operators to understand the turbine state. Although this works, it is not feasible at scale. It is not uncommon for there to be dozens or even hundreds of intermittent periods of missing data at a wind farm in a year.

Clir’s approach to missing data

Unfortunately, when data is missing at the source, we can’t get it back. On the positive side, there’s a lot we can do to address this problem with the data that is typically ingested.

Clir’s software and data model facilitate the ingestion, standardization and application of any data tag or data feed. There are often dozens or even hundreds of tags in the 10-minute turbine SCADA data, some of which, when read just before or just after a gap, help ascertain the turbine state during the missing period. The following occurs during the automatic enrichment process when new data is ingested:

Periods of missing turbine SCADA data are initially identified and labeled. The period is assigned a performance category of ‘information unavailable.’
A wide range of ancillary data sources are considered to categorize the turbine state during each period of missing data. These data sources include counter tags in the turbine SCADA, substation SCADA and sales meter data.
Based on the ancillary data, each missing data period is then assigned a performance category of either ‘suspected operational’ or ‘suspected shutdown.’ If the turbine state cannot be determined from ancillary data, the performance category remains ‘information unavailable.’
The performance category then feeds into the reporting, analytics and benchmarking carried out by Clir’s software, such that availability and other metrics can be accurately and robustly calculated.
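The steps above can be illustrated with a minimal sketch. It is not Clir's implementation: it assumes a single hypothetical ancillary source, a cumulative energy counter read on either side of each gap, and the function names and the `min_delta_kwh` threshold are invented for illustration. A counter that advanced only shows the turbine produced at some point during the gap, so the label is a suspicion rather than a measurement.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=10)  # 10-minute SCADA resolution


def find_gaps(timestamps, step=STEP):
    """Return (last_seen, next_seen) pairs that bracket missing records."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]


def classify_gap(counter_before, counter_after, min_delta_kwh=1.0):
    """Label a gap using a cumulative energy counter (hypothetical tag)."""
    if counter_before is None or counter_after is None:
        return "information unavailable"
    delta = counter_after - counter_before
    if delta < 0:
        # Counter reset or rollover: state cannot be inferred from it.
        return "information unavailable"
    if delta >= min_delta_kwh:
        return "suspected operational"
    return "suspected shutdown"


# Illustrative records: timestamp -> cumulative energy counter in kWh.
records = {
    datetime(2024, 1, 1, 0, 0): 1000.0,
    datetime(2024, 1, 1, 0, 10): 1150.0,
    datetime(2024, 1, 1, 1, 0): 1900.0,   # after a gap; counter advanced
    datetime(2024, 1, 1, 1, 10): 2050.0,
    datetime(2024, 1, 1, 3, 0): 2050.0,   # after a gap; counter unchanged
}

labels = {
    (start, end): classify_gap(records[start], records[end])
    for start, end in find_gaps(records)
}
for (start, end), label in labels.items():
    print(start, "->", end, ":", label)
```

In practice several ancillary sources (substation SCADA, sales meter data) would be combined, and a gap would stay ‘information unavailable’ whenever none of them is conclusive.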

What does improved labeling of missing data periods tell us about data integrity?

If the turbine is operating and producing power during most or all periods of missing data, then data integrity is considered poor. There is a problem somewhere along the way with the transmission, logging or storing of turbine SCADA data. Actual turbine availability is similar to what’s indicated by available data only, in line with Scenario A.

If the turbine is not operational during most or all missing data periods then in this regard, data integrity is considered good. Data is missing when the turbine is disconnected from power or communications. Actual turbine availability is significantly lower than what’s indicated by available data only, in line with Scenario B.

How does your farm’s data integrity compare to others?

At any project, we can look at how frequently the turbine was operating during missing data periods to evaluate data integrity. These results are then benchmarked against results from a peer group of wind farms to provide further insights into data integrity. This is presented to clients through our market insights reports.
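One way to express this measure is the share of classified missing time during which the turbine was suspected to be operating. The sketch below is an assumption about how such a figure could be computed, not the formula used in Clir's reports; in particular, leaving ‘information unavailable’ periods out of the denominator is a choice made here for illustration, and the sample durations are invented.

```python
def operational_share(gaps):
    """Fraction of classified missing time labeled 'suspected operational'.

    gaps: iterable of (duration_hours, category) tuples.
    Returns None when no gap could be classified.
    """
    totals = {}
    for hours, category in gaps:
        totals[category] = totals.get(category, 0.0) + hours
    operational = totals.get("suspected operational", 0.0)
    shutdown = totals.get("suspected shutdown", 0.0)
    classified = operational + shutdown
    if classified == 0:
        return None
    return operational / classified


# Illustrative gaps only; not data from any real farm.
example_gaps = [
    (6.0, "suspected operational"),
    (18.0, "suspected shutdown"),
    (4.0, "information unavailable"),
]
share = operational_share(example_gaps)
print(f"Share of classified missing time with turbine operating: {share:.0%}")
```

A high share points to losses in transmission, logging or storage; a low share suggests data is mostly missing during genuine outages.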

The figure below shows, for a set of eleven wind farms with the same turbine manufacturer over a recent year, the percentage of missing data during which the turbine was actually operational. Results vary significantly by farm.

At Farm C, the vast majority of missing data occurred during turbine outages, indicating good data integrity and little room for improvement. Missing data periods should mostly be counted against turbine availability.
At Farm F, the vast majority of missing data occurred while the turbine was operational, indicating relatively poor integrity and significant room for improvement. Missing data periods should not be counted against turbine availability.

Figure: Percentage of missing data periods during which the turbine was operational, by farm.

Benefits of improved data integrity

Some farms experience more frequent periods of missing turbine SCADA data than others. The extent to which the turbines are actually operating and producing power during these missing data periods varies significantly by farm. Clir supports wind farm owners in improving data integrity by identifying, quantifying and categorizing periods of missing data.

In the market insights reports, available through Clir Portfolio, we grade data practices to help clients become best-in-class for data quality and coverage. This enables more robust and accurate turbine performance metrics and lower-uncertainty energy yield assessment results. The resulting increase in P90 can be used to support debt optimization and improved financial returns for the project.

Thanks to Thomas Broatch, Intermediate Software Developer, for the implementation work of this new feature.