Periodically fetches publicly available forecast and measurement information and stores it in Redis streams.
The AIT RDP Data Crawler is mainly driven by configurable sources and sinks that access remote systems. This section describes the main configuration options.
Interface Type: Source
Type Name: data_crawler.sources.weatherbit.CurrentWeather
Description: The current weather source fetches the current state estimates from the Weatherbit API. Weatherbit does not recommend using these values for archival purposes.
Parameters
api key: The API key to access the Weatherbit API. This key is required for all Weatherbit APIs.
latitude: The latitude of the location to be queried.
longitude: The longitude of the location to be queried.
Interface Type: Source
Type Name: data_crawler.sources.weatherbit.HourlyForecasts
Description: The hourly forecast source fetches the hourly forecast data from the Weatherbit API.
Parameters
api key: The API key to access the Weatherbit API. This key is required for all Weatherbit APIs.
latitude: The latitude of the location to be queried.
longitude: The longitude of the location to be queried.
horizon hours: The number of hours into the future to be queried. The maximum value returned by the API is 240 hours, which is also the default.
Interface Type: Source
Type Name: data_crawler.sources.yr_no.LocationForecast
Description: The location forecast source fetches the numerical weather prediction data from the met.no API for a single location.
Parameters
latitude: The latitude of the location to be queried.
longitude: The longitude of the location to be queried.
altitude: The altitude of the location to be queried. If none is given, the default ground altitude as determined by the API is used.
contact address: A contact address that is sent along with the API request. This is required by the met.no API and should be a valid email address. The address is used to contact you in case of problems with the API.
Interface Type: Source
Type Name: data_crawler.sources.zamg.MeasurementStationData
Description: The measurement station data source fetches live and historic measurements from the Geosphere measurement stations. There are two dedicated endpoint types: TAWES and climate. The former returns real-time information with less quality control and a shorter historic timeframe; the latter returns quality-controlled measurements. Note that the two endpoints use different station IDs and therefore cannot easily be exchanged. Please consult the climate data and TAWES data documentation for further details on the data sources and station IDs.
Parameters
station id: The station ID to be queried. Note that the station ID differs between the TAWES and climate endpoints.
initial history: The past duration to fetch data from. After the initial query, only new samples will be returned. Defaults to 48h.
endpoint: The name of the endpoint to be used. Either TAWES or climate. The default is climate.
data points: A list of data points to be fetched. The data point nomenclature corresponds to the Geosphere naming.
Interface Type: Source
Type Name: data_crawler.sources.zamg.NumericalWeatherPredictionData
Description: The numerical weather prediction data source fetches weather forecasts from the Geosphere API. The source supports two endpoints: a standard numerical weather prediction endpoint that returns a single value for each observation, and an ensemble forecast endpoint that additionally returns several percentiles.
Parameters:
latitude: The latitude of the location to be queried.
longitude: The longitude of the location to be queried.
endpoint: The name of the endpoint to be used. Either NWP or ensemble. The default is NWP.
data points: A list of data points to be fetched. The data point nomenclature corresponds to the Geosphere naming and not the AIT RDP names. Please consult the Geosphere documentation for further details. By default, all supported data points are added.
Interface Type: Source
Type Name: data_crawler.sources.knmi.WeatherStationsKNMI
Description: The KNMI weather stations source fetches live and historic measurements from the KNMI weather stations. An API key is required to access the dataset.
Parameters
api_key: The API key to access the KNMI API.
stations: A list of station IDs or a dict-based configuration having an id attribute listing the station IDs.
initial_history: The past duration to fetch data from. After the initial query, only new samples will be returned. The default history for KNMI weather stations is 12h.
drop_missing_observations: Drop observations that do not contain any valid values. By default, all returned observations are included, even if they contain only NaN values.
Interface Type: Source and Sink
Type Name: data_crawler.sources.modbus.ModbusTCP and data_crawler.sinks.modbus.ModbusTCP
Description: The Modbus TCP source and sink fetch data from a Modbus TCP server and write dedicated message fields back. Currently, sink and source are separate and each maintains its own Modbus TCP connection. If this is an issue (e.g., due to single-connection servers), please open a ticket on GitHub.
Parameters:
address: The address of the Modbus TCP server.
port: The port of the Modbus TCP server. Defaults to 502.
register_spec: The register specification to be used. This is a list of dictionaries that define the register addresses and types. Instead of defining the register specification directly, an external table (e.g., a CSV file) can also be referenced. The following fields are supported:
  register_start: The start address of the register. This field is required; however, for some rows it may intentionally be left empty or NaN. In that case, the register is fetched in one block with the previous one and consecutive addressing is assumed.
  name: The name of the data point to be fetched. This is a required field. The name is used to create the message field of the output message.
  data_type: The data type of the register. This is also a required field and determines the number of consecutive registers to be read or written. The following data types are supported:
    int16: 16-bit signed integer
    uint16: 16-bit unsigned integer
    int32: 32-bit signed integer
    uint32: 32-bit unsigned integer
    float16: 16-bit float
    float32: 32-bit float
    float64: 64-bit float
    stringN: N-character ASCII string (e.g., string16 for 16 characters). If the string is null-terminated before the maximum number of characters, it is shortened. Similarly, it is trimmed if it is padded with space characters.
    bool: Boolean value
  register_type: The Modbus type of the register. This is a required field and determines the access method (holding register, input register, etc.). The following types are supported:
    holdingreg/holdingregister/h: Holding register
    inputreg/inputregister/i: Input register
    coils/c: Coils
    discreteinput/d: Discrete input
  unit: An optional unit description of the data point. This is not used for the Modbus TCP interface, but may be useful for documentation purposes.
  scaling: An optional scaling factor to be applied to the data point. If an integer is scaled by a double, a double message field is output. Otherwise, the original data type is kept.
  used: Boolean flag that indicates whether the register is used or not. By default, it is set to true and the register is therefore included in the output.
  unit_id: The device ID of the Modbus TCP server. By default, 1 is assumed.
  description: A textual description of the data point. This is not used for the Modbus TCP interface, but may be useful for documentation purposes as well.
  mode: The access mode (r for reading, w for writing, and rw for read/write). By default, r is assumed. The mode modifier can be used to reference the same register description both for a source and a sink.
  byte_order: The byte order of the Modbus TCP server. By default, big-endian is assumed. > indicates big-endian, < indicates little-endian.
  word_order: The word order of the Modbus TCP server. By default, big-endian is assumed. > indicates big-endian, < indicates little-endian.
Example: The following configuration snippet defines a Modbus TCP source that reads power values from an electricity meter:
submeters.modbus.F3_104:
  type: "data_crawler.sources.modbus.ModbusTCP"
  source parameter:
    register spec: !table/csv
      path: "modbus/PAC2200_modbus_registers.csv"
    address: "10.0.3.104"
  polling:
    frequency: 10s
  redis:
    stream: "measurements.submeters.modbus.F1"
    tags:
      location_code: "GG2"
      data_provider: "PAC2200"
      device_name: "F3_104"
with the following register specification in modbus/PAC2200_modbus_registers.csv:
unit_id;Register_start;Register_end;Name;Data_type;Unit;Register_type;Scaling
1;63;72;S_tot;SINGLE;VA;i;1
;;;P_tot;SINGLE;W;i;1
;;;Q_tot;SINGLE;var;i;1
;;;PF_tot;SINGLE;-;i;1
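To illustrate how data_type, byte_order, word_order, and scaling interact, the following Python sketch decodes raw 16-bit register values. It is not the crawler's implementation: the helper names (register_count, decode, apply_scaling) are hypothetical, and it assumes that a stringN value packs two ASCII characters per register and that word order applies to numeric types only.

```python
import struct

# struct format characters for the numeric data types listed above.
_NUMERIC_FORMATS = {
    "int16": "h", "uint16": "H",
    "int32": "i", "uint32": "I",
    "float16": "e", "float32": "f", "float64": "d",
}


def register_count(data_type):
    """Number of consecutive 16-bit registers occupied by one value."""
    if data_type == "bool":
        return 1
    if data_type.startswith("string"):
        n_chars = int(data_type[len("string"):])
        return (n_chars + 1) // 2  # assumption: two characters per register
    return struct.calcsize(">" + _NUMERIC_FORMATS[data_type]) // 2


def _to_bytes(registers, byte_order):
    # ">" keeps the high byte of each register first, "<" swaps the two bytes.
    return b"".join(struct.pack(byte_order + "H", reg) for reg in registers)


def decode(registers, data_type, byte_order=">", word_order=">"):
    """Decode a list of 16-bit register values into a Python value."""
    registers = list(registers)
    if data_type == "bool":
        return bool(registers[0])
    if data_type.startswith("string"):
        n_chars = int(data_type[len("string"):])
        raw = _to_bytes(registers, byte_order)[:n_chars]
        # Cut at the first null byte, then trim trailing space padding.
        return raw.split(b"\x00", 1)[0].decode("ascii").rstrip(" ")
    if word_order == "<":
        registers.reverse()  # least significant word was transmitted first
    raw = _to_bytes(registers, byte_order)
    return struct.unpack(">" + _NUMERIC_FORMATS[data_type], raw)[0]


def apply_scaling(value, scaling=None):
    """int * float yields a float; int * int stays an int."""
    return value if scaling is None else value * scaling


print(decode([0x3F80, 0x0000], "float32"))                  # big-endian words
print(decode([0x0000, 0x3F80], "float32", word_order="<"))  # swapped words
print(decode([0x4142, 0x2020, 0x0000], "string6"))          # padded string
```

Normalising the registers to one big-endian byte string first keeps the numeric decoding to a single struct.unpack call, regardless of the configured byte and word order.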
Interface Type: Source
Type Name: data_crawler.sources.entsoe_da.ENTSOEDATransparency
Description: The ENTSO-E day-ahead market prices source fetches the day-ahead market prices from the ENTSO-E Transparency Platform. It requires registering on the platform and obtaining an API token via the portal.
Parameters:
api_key: The API key to access the ENTSO-E API.
day_ahead_prices: The day-ahead market price configurations to be fetched. Each entry contains a dict with the following attributes:
  country_code: The country code of the market area to be queried (e.g., AT for Austria).
  timezone: The timezone of the market prices to be queried. This information is used to determine the beginning and end of the next day.
  resolution: The resolution of the market prices to be queried. Currently, MIN_15, MIN_30, and MIN_60 are supported.
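The role of the timezone attribute can be sketched in Python as follows. This is only an illustration under stated assumptions, not the crawler's code: the function names next_day_window and slot_count are hypothetical, and the mapping from resolution names to minutes is inferred from the names MIN_15, MIN_30, and MIN_60.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumed meaning of the resolution names, in minutes.
RESOLUTION_MINUTES = {"MIN_15": 15, "MIN_30": 30, "MIN_60": 60}


def next_day_window(now_utc, tz_name):
    """Return (start, end) of the next local calendar day as UTC datetimes."""
    tz = ZoneInfo(tz_name)
    local_today = now_utc.astimezone(tz).date()
    start = datetime.combine(local_today + timedelta(days=1), time.min, tzinfo=tz)
    end = datetime.combine(local_today + timedelta(days=2), time.min, tzinfo=tz)
    # Converting to UTC makes the duration correct across DST changes.
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def slot_count(start_utc, end_utc, resolution):
    """Number of price slots between two UTC instants at a given resolution."""
    step = timedelta(minutes=RESOLUTION_MINUTES[resolution])
    return (end_utc - start_utc) // step


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
start, end = next_day_window(now, "Europe/Vienna")
print(start, end, slot_count(start, end, "MIN_15"))
```

Because local midnight is resolved in the configured timezone, a day on which daylight saving time starts or ends spans 23 or 25 hours rather than 24, and the number of price slots changes accordingly.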