Getting Started
Define information structure to be monitored
Before using opmonlib it is important to understand and define what needs to be monitored. Monitorable objects can then be captured in a schema file to create C++ structs using moo. Documentation and instructions on generating schema data structures and using moo can be found here:
Examples of how to write schemas can be found here. In general, each .jsonnet file contains definitions of types, and the objects to be monitored using these types. Each schema file will generate a C++ header file containing the structures which hold the monitoring data, as defined in the .jsonnet file. Typically each module needs only one struct to hold its monitoring information; however, it is possible to create multiple nested structs within a schema, which are filled by the same module, as demonstrated by the timing module.
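To give a rough feel for what a schema-generated header provides, the sketch below shows a hand-written approximation of such a struct. The namespace examplemoduleinfo and both field names are hypothetical, and a real moo-generated header may be laid out differently and contain more than is shown here.

```cpp
#include <cstdint>

// Hypothetical stand-in for a schema-generated header.
// The namespace and field names are illustrative assumptions only.
namespace examplemoduleinfo {

struct Info
{
  std::uint64_t packets = 0;     // cumulative number of packets seen
  std::uint64_t new_packets = 0; // packets seen since the previous readout
};

} // namespace examplemoduleinfo
```

The point of the sketch is only that the monitoring data ends up as a plain struct with fixed, named fields that a module can fill directly.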
Filling and collecting structures
Once the information structures have been generated, we need to fill them with data and collect the information for monitoring.
For opmonlib to collect the relevant information from the DAQ module, define the public get_info() function in the DAQ module header file as the following:
void get_info(opmonlib::InfoCollector& ci, int level) override;
The InfoCollector class is used to collect the information structures using the add function. The level argument defines the level of information the user wants to monitor (e.g. default, detailed, debug, etc.). Default information is defined as level=0. An example of how the get_info function is implemented in FakeCardReader.hpp in the readout module:
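void
FakeCardReader::get_info(opmonlib::InfoCollector& ci, int level=0) {
  fakecardreaderinfo::Info fcr;
  // Running total: read without resetting.
  fcr.packets = m_packet_count_tot.load();
  // Packets since the previous call: read and reset to 0 in one step.
  fcr.new_packets = m_packet_count.exchange(0);
  // Hand the filled struct to the collector.
  ci.add(fcr);
}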
fcr is filled with the relevant data members, and then added to the InfoCollector for monitoring. In this case, filling and collecting are implemented in the same function call.
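The difference between the two counters in the example comes from how they are read: load() only reads, while exchange(0) reads and resets. The following is a minimal standalone sketch of that pattern, assuming the counters are std::atomic integers. CounterInfo and PacketCounter are hypothetical names used for illustration and are not part of opmonlib or the readout module.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical info struct standing in for a schema-generated type.
struct CounterInfo
{
  std::uint64_t packets = 0;
  std::uint64_t new_packets = 0;
};

// Hypothetical module that keeps a running total and a per-interval counter.
class PacketCounter
{
public:
  // Called from the data path for every packet.
  void on_packet()
  {
    ++m_packet_count_tot;
    ++m_packet_count;
  }

  // Called from the monitoring path.
  CounterInfo snapshot()
  {
    CounterInfo info;
    info.packets = m_packet_count_tot.load();      // read only, keeps growing
    info.new_packets = m_packet_count.exchange(0); // read and reset atomically
    return info;
  }

private:
  std::atomic<std::uint64_t> m_packet_count_tot{ 0 };
  std::atomic<std::uint64_t> m_packet_count{ 0 };
};
```

Using exchange(0) rather than a separate load() followed by a store avoids losing packets that arrive between the read and the reset.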
It is important at this point to consider whether it is necessary to separate filling from collecting. For example, opmonlib may call get_info() for information to be collected at a faster rate than the hardware can handle. In this case, one may want to separate the filling and collecting of information into two separate threads.
The timing module provides an example of how one can do this using its InfoGatherer shown here. InfoGatherer provides a template class which can fill different types of information structures. Examples of this are implemented in TimingHardwareManagerPDI.hpp:
// monitoring
InfoGatherer<pdt::timingmon::TimingPDIMasterTLUMonitorData> m_master_monitor_data_gatherer;
virtual void gather_master_monitor_data(InfoGatherer<pdt::timingmon::TimingPDIMasterTLUMonitorData>& gatherer);
InfoGatherer<pdt::timingmon::TimingEndpointFMCMonitorData> m_endpoint_monitor_data_gatherer;
virtual void gather_endpoint_monitor_data(InfoGatherer<pdt::timingmon::TimingEndpointFMCMonitorData>& gatherer);
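The general idea of separating filling from collecting can be sketched as follows. This is not the timing module's InfoGatherer; CachedGatherer and all of its members are hypothetical names. A background thread refreshes a cached copy of the data at its own pace, and the monitoring thread only copies that cache.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper illustrating the two-thread pattern.
template<class Data>
class CachedGatherer
{
public:
  CachedGatherer(std::function<Data()> read_hardware, std::chrono::milliseconds period)
    : m_read_hardware(std::move(read_hardware))
    , m_period(period)
  {}

  ~CachedGatherer() { stop(); }

  // Start the filling thread; does nothing if it is already running.
  void start()
  {
    if (m_thread.joinable()) {
      return;
    }
    m_running = true;
    m_thread = std::thread([this] {
      while (m_running) {
        Data fresh = m_read_hardware(); // potentially slow hardware access
        {
          std::lock_guard<std::mutex> lock(m_mutex);
          m_cache = std::move(fresh);
          m_valid = true;
        }
        std::this_thread::sleep_for(m_period);
      }
    });
  }

  void stop()
  {
    m_running = false;
    if (m_thread.joinable()) {
      m_thread.join();
    }
  }

  // Cheap call for the monitoring thread.
  // Returns false until the first hardware read has completed.
  bool collect(Data& out)
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (!m_valid) {
      return false;
    }
    out = m_cache;
    return true;
  }

private:
  std::function<Data()> m_read_hardware;
  std::chrono::milliseconds m_period;
  std::atomic<bool> m_running{ false };
  std::mutex m_mutex;
  Data m_cache{};
  bool m_valid = false;
  std::thread m_thread;
};
```

With this arrangement a get_info() implementation would call collect() and add the result to the InfoCollector only when it returns true, so the rate at which monitoring is requested no longer determines how often the hardware is read.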
Dynamic structures
The structures generated using the opmonlib schema cannot contain dynamic structures, i.e. sequences and maps are not allowed.
The idea is that dynamic information breaks the logic of the monitoring: it breaks the link between the source of information (source_id) and the information itself.
In order to preserve the structure of the opmon information and still publish dynamic information, a module needs to publish sub-component monitoring information and attach it to its parent.
This is done by creating a generic InfoCollector object that can be populated with static schema content, and then adding that InfoCollector object to the parent.
Pseudo code is:
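void Nested::example::get_info(opmonlib::InfoCollector& ci, int level)
{
  // parent block, added directly to the collector passed in
  parentinfo::Info par_info;
  par_info.counter = ...
  ci.add( par_info );
  // daughter block, filled into its own temporary collector
  opmonlib::InfoCollector tmp_ic;
  daughterinfo::Info info;
  info.daughter_counter = ...
  tmp_ic.add( info );
  // attach the daughter collector to the parent under a name
  ci.add( "daughter_name", tmp_ic );
}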
This results in two opmon blocks.
The first is of type parentinfo::Info and is associated with a source_id decided by upper-level code; let's assume it is going to be "parent.id".
The second block is of type daughterinfo::Info and its source_id will be "parent.id.daughter_name".
Of course, any number of InfoCollector objects can be attached to a parent, effectively turning this procedure into having a dynamic structure.
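The naming rule described above can be illustrated with a small mock. MockCollector is a hypothetical class written only for this illustration; it is not opmonlib::InfoCollector and records type names instead of real data. It assumes that a child's source_id is the parent's id followed by a dot and the child name, as in the "parent.id.daughter_name" example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical mock that only tracks (relative path, type name) pairs.
class MockCollector
{
public:
  using Entries = std::vector<std::pair<std::string, std::string>>;

  // Record an info block, identified here by its type name, at this level.
  void add(const std::string& type_name)
  {
    m_entries.emplace_back(std::string(), type_name);
  }

  // Attach a child collector under the given name, prefixing its paths.
  void add(const std::string& child_name, const MockCollector& child)
  {
    for (const auto& entry : child.m_entries) {
      const std::string path =
        entry.first.empty() ? child_name : child_name + "." + entry.first;
      m_entries.emplace_back(path, entry.second);
    }
  }

  // Resolve relative paths against the id chosen by upper-level code.
  Entries publish(const std::string& root_id) const
  {
    Entries out;
    for (const auto& entry : m_entries) {
      const std::string id =
        entry.first.empty() ? root_id : root_id + "." + entry.first;
      out.emplace_back(id, entry.second);
    }
    return out;
  }

private:
  Entries m_entries; // (path relative to this collector, type name)
};
```

Adding a parent block and one named child, then publishing with the root id "parent.id", yields one entry with id "parent.id" and one with id "parent.id.daughter_name". Each attached child gets its own id, which is how a variable number of sub-components can be published while every individual block keeps a static schema.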
Examples of this procedure can be seen in dfmodule, in the way the DFO publishes information related to the dfapplications: DFO side and Daughter side.
The links point to the code itself; here are the important parts:
// DFO side: the parent that contains a number of dynamic subcomponents
// in a map<string, TriggerRecordBuilderData> called m_dataflow_availability
void DataFlowOrchestrator::get_info(opmonlib::InfoCollector& ci, int level) {
  for (auto& [name, app] : m_dataflow_availability) {
    opmonlib::InfoCollector tmp_ic;
    app.get_info(tmp_ic, level);
    ci.add(name, tmp_ic);
  }
}
// daughter side
void TriggerRecordBuilderData::get_info(opmonlib::InfoCollector& ci, int /*level*/) {
  // daughter schema-generated object
  dfapplicationinfo::Info info;
  info.completed_trigger_records = m_complete_counter.exchange(0);
  info.waiting_time = m_complete_microsecond.exchange(0);
  info.min_completion_time = m_min_complete_time.exchange(std::numeric_limits<int64_t>::max());
  info.max_completion_time = m_max_complete_time.exchange(0);
  // fill metrics for pending TDs
  info.min_time_since_assignment = std::numeric_limits<decltype(info.min_time_since_assignment)>::max();
  info.max_time_since_assignment = 0;
  info.total_time_since_assignment = 0;
  ci.add(info);
}
Testing
The configuration of opmonlib is currently managed through two environment variables: DUNEDAQ_OPMON_INTERVAL and DUNEDAQ_OPMON_LEVEL. These can be seen in Application.cpp:
setenv("DUNEDAQ_OPMON_INTERVAL", "10",0);
setenv("DUNEDAQ_OPMON_LEVEL", "1",0);
DUNEDAQ_OPMON_INTERVAL sets the interval in seconds between successive calls to get_info (currently set to 10 seconds), and DUNEDAQ_OPMON_LEVEL allows the user to define the level for get_info (currently set to 1).
Note: To disable operational monitoring set the interval to 0 seconds.
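The third argument of the POSIX setenv call in the snippet above is the overwrite flag. With a value of 0, the call only provides a default and leaves any value already exported by the user untouched. The sketch below illustrates this behaviour. How the application itself reads the variables is not shown in this page, so env_int and set_opmon_defaults are hypothetical helpers written for illustration.

```cpp
#include <cstdlib>
#include <exception>
#include <stdlib.h> // POSIX setenv
#include <string>

// Hypothetical helper: read an integer setting from the environment,
// falling back to a default when the variable is unset or cannot be
// parsed as an integer.
inline int env_int(const char* name, int fallback)
{
  const char* raw = std::getenv(name);
  if (raw == nullptr) {
    return fallback;
  }
  try {
    return std::stoi(raw);
  } catch (const std::exception&) {
    return fallback;
  }
}

// Hypothetical wrapper around the two calls shown above.
// overwrite == 0 means an existing value is kept as it is.
inline void set_opmon_defaults()
{
  setenv("DUNEDAQ_OPMON_INTERVAL", "10", 0);
  setenv("DUNEDAQ_OPMON_LEVEL", "1", 0);
}
```

In practice this means that exporting either variable in the shell before starting the application takes precedence over the defaults, which is convenient when testing a module at a different interval or level.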
Last git commit to the markdown source of this page:
Author: Marco Roda
Date: Fri Mar 24 15:44:15 2023 +0000
If you see a problem with the documentation on this page, please file an Issue at https://github.com/DUNE-DAQ/opmonlib/issues