Linux – Systemd exits from bash scripts that execute commands that result in failure, instead of continuing

Tags: bash, coreos, linux, systemd

I'm trying to get a script running on CoreOS that pushes my system log to Logentries. Because instances on AWS don't have internet connectivity immediately after being spun up, I've put the command in a while loop so that it retries.

Run from the command line, the while loop works just fine. But when systemd runs the script, it exits immediately when netcat times out, so it never gets a chance to try again.

Is there a way to get systemd to be less aggressive about exiting the script?

systemd output (the script never gets to "sleeping netcat"):

Jul 23 22:26:21 core-01 systemd[1]: Starting Push journal logs to logentries.com...
Jul 23 22:26:21 core-01 systemd[1]: Started Push journal logs to logentries.com.
Jul 23 22:26:21 core-01 bash[880]: trying netcat
Jul 23 22:26:31 core-01 bash[880]: Ncat: Connection timed out.

journal2logentries.sh

#!/usr/bin/env bash
token=logentriestoken
while true
do
  echo 'trying netcat'
  journalctl -o short -f | awk -v token="$token" '{ print token, $0; fflush(); }' | ncat --ssl --ssl-verify data.logentries.com 20000
  echo 'sleeping netcat'
  sleep 30s
done

logentries.service

[Unit] 
Description=Push journal logs to logentries.com 
After=systemd-journald.service
After=systemd-networkd.service

[Service]
Restart=always
ExecStart=/bin/bash /home/core/journal2logentries.sh

[Install]
WantedBy=multi-user.target

Update:

It seems that the real issue is that when netcat dies, systemd thinks that the /bin/sh process is still running. Note: the URL is intentionally incorrect for testing.

logentries.service - Push journal logs to logentries.com
   Loaded: loaded (/etc/systemd/system/logentries.service; disabled)
   Active: active (running) since Mon 2014-07-28 17:12:04 UTC; 1min 48s ago
 Main PID: 16305 (sh)
   CGroup: /system.slice/logentries.service
           ├─16305 /bin/sh -c journalctl -o short -f | awk -v token=token_here '{ print token, $0; fflush(); }' | ncat --ssl --ssl-verify -vv ogentries.com 20000
           ├─16306 journalctl -o short -f
           └─16307 awk -v token=80b4b3b6-1315-4b76-ac69-f530c1dec47f { print token, $0; fflush(); }

Jul 28 17:12:04 ip-172-31-19-155.us-west-2.compute.internal systemd[1]: logentries.service holdoff time over, scheduling restart.
Jul 28 17:12:04 ip-172-31-19-155.us-west-2.compute.internal systemd[1]: Stopping Push journal logs to logentries.com...
Jul 28 17:12:04 ip-172-31-19-155.us-west-2.compute.internal systemd[1]: Starting Push journal logs to logentries.com...
Jul 28 17:12:04 ip-172-31-19-155.us-west-2.compute.internal systemd[1]: Started Push journal logs to logentries.com.
Jul 28 17:12:04 ip-172-31-19-155.us-west-2.compute.internal sh[16305]: Ncat: Version 6.40 ( http://nmap.org/ncat )
Jul 28 17:12:04 ip-172-31-19-155.us-west-2.compute.internal sh[16305]: Ncat: Could not resolve hostname "ogentries.com": Name or service not known. QUITTING.
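
A likely explanation for the status output above, offered as a sketch rather than a confirmed diagnosis: the shell waits for every stage of a pipeline, and a stage that writes to the pipe only notices that its reader is gone the next time it tries to write. In the sketch below the commands are hypothetical stand-ins: the left-hand group plays journalctl and awk (silent for two seconds, then writes a line), and `true` plays an ncat that quits straight away.

```shell
#!/usr/bin/env bash
# Stand-ins (assumptions, not the real commands):
#   left side  = journalctl/awk, quiet for 2 s and then writing one line
#   right side = ncat, exiting immediately
start=$SECONDS
{ sleep 2; echo 'late line'; } | true
stage_status=("${PIPESTATUS[@]}")   # must be read right after the pipeline
elapsed=$(( SECONDS - start ))

# The pipeline only returns once the producer has finished as well,
# even though the consumer exited at once.
echo "pipeline returned after ${elapsed}s"
echo "producer status: ${stage_status[0]}, consumer status: ${stage_status[1]}"
```

With `journalctl -f` on a quiet system the wait could be arbitrarily long, which would match the shell, journalctl and awk all still being listed in the CGroup after ncat has quit.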

Best Answer

I switched from pipes to process substitution.

http://paraf.in/abs-guide/process-sub.html

https://stackoverflow.com/a/18360260/136408
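
A minimal sketch of why process substitution probably helps here, using hypothetical stand-in commands: bash does not wait for the process inside `<( ... )`, so the command line finishes as soon as the reading command exits. As before, `true` plays an ncat that quits immediately and the substituted process plays a quiet journalctl/awk.

```shell
#!/usr/bin/env bash
# Stand-ins (assumptions, not the real commands):
#   <( ... ) = journalctl/awk, quiet for 2 s and then writing one line
#   true     = ncat, exiting immediately
start=$SECONDS
true < <(sleep 2; echo 'late line')
procsub_elapsed=$(( SECONDS - start ))

# bash waits only for `true`; the substituted process is left running
# in the background and is not waited for.
echo "process substitution returned after ${procsub_elapsed}s"
```

Under systemd, the exit of the main process then marks the service as finished, so `Restart=always` can take effect. With the default `KillMode=control-group`, any leftover processes in the service's cgroup should be cleaned up before the restart.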

Here is the unit file I came up with:

logentries.service

[Unit]
Description=Push journal logs to logentries.com
After=systemd-journald.service
After=systemd-networkd.service

[Service]
Restart=always
RestartSec=30s
ExecStart=/bin/bash -c "ncat --ssl --ssl-verify data.logentries.com 20000 < <(awk -v token=token_here '{ print token, $0; fflush(); }' < <(journalctl -o short -f))"

[Install]
WantedBy=multi-user.target