


action #157615

Updated by okurz 4 months ago


     2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished 
     2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished 
     2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/ --state masked --exclude ""':  
     2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/ --state failed --exclude ""':  
     2024-03-20T16:23:32Z E! [telegraf] Error running agent: input plugins recorded 2 errors 
     telegraf errors 
     2024-03-20T16:23:31Z E! [inputs.x509_cert] Error in plugin: cannot get SSL cert '': dial tcp: lookup i/o timeout 
     2024-03-20T16:23:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors 
     telegraf errors 
 ++ grep ' E! ' salt_post_deploy_checks.log 
     2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished 
     2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished 
     2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/ --state masked --exclude ""':  
     2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/ --state failed --exclude ""':  
     2024-03-20T16:23:32Z E! [telegraf] Error running agent: input plugins recorded 2 errors 
     2024-03-20T16:23:31Z E! [inputs.x509_cert] Error in plugin: cannot get SSL cert '': dial tcp: lookup i/o timeout 
     2024-03-20T16:23:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors 

 ## Suggestions 
 1. Understand why and where `` times out. The cause could be the general telegraf timeout in the pipeline, the timeout for the execution of the script itself (from telegraf.conf), or another place. Adjust the timeout to match the expected runtime or fix the script to complete faster. schort-server only has 1 VM core, so consider configuring the hypervisor to use at least 2 cores 
 2. "Error killing process: os: process already finished" might just be a consequence of the above 
 3. "Error in plugin: cannot get SSL cert '': dial tcp: lookup i/o timeout" could possibly be covered with some retrying. Investigate what the real error message means, ask (or if that does not work invest in coal-powered ) or something 
 4. If we cannot solve these problems, consider excluding them from CI execution to avoid false positives. However, consider the impact of doing this first!
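
A minimal sketch for suggestions 1 and 3, assuming Python is available on the host. It is not taken from the salt states or the telegraf configuration. `timed_run` and `retry` are hypothetical helper names, and the command in the usage part is a placeholder for the real script, whose name is not shown in the log above. The idea is to first measure how long the command really takes compared to a candidate timeout, and to wrap flaky network checks in a few retries with a growing delay.

```python
import subprocess
import sys
import time


def timed_run(cmd, timeout_s):
    """Run cmd and return (elapsed_seconds, timed_out).

    Hypothetical helper: shows whether a command fits into a given timeout.
    """
    start = time.monotonic()
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timed_out = False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        timed_out = True
    return time.monotonic() - start, timed_out


def retry(check, attempts=3, delay_s=1.0):
    """Call check() until it succeeds or attempts are used up.

    Only OSError is retried (DNS lookup failures and socket timeouts are
    subclasses of it); the last error is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay_s * attempt)  # linear backoff between attempts


if __name__ == "__main__":
    # Placeholder command, replace with the real script invocation
    elapsed, timed_out = timed_run([sys.executable, "-c", "pass"], timeout_s=5)
    print(f"elapsed={elapsed:.2f}s timed_out={timed_out}")
```

Running the real command this way on the affected host, once with the configured timeout and once with a generous one, should show whether the script itself is slow on a single core or whether the timeout is simply too tight.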
