Mesos → Completed Tasks in Sandbox
In the stdout file, I can see the killTask signal:

```
Received killTask for task sources.b4e2c8e6-5b42-11e7-aec0-024227901b13
```

A full snapshot of the stdout file follows. You can see that even after receiving the killTask signal my process is still running, i.e. my process does not terminate itself.
```
2017-06-27 14:16:08,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:18,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:38,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:48,337 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:58,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:08,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:18,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:38,334 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:48,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:08,334 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:18,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:38,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:48,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:08,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:18,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:38,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:48,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:08,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:18,334 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:38,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:48,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
Received killTask for task sources.b4e2c8e6-5b42-11e7-aec0-024227901b13
2017-06-27 14:20:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
```
A full snapshot of the stderr file follows:
```
I0627 19:42:51.959991 7613 fetcher.cpp:533] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":true,"value":"file:\/\/\/etc\/docker.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0\/frameworks\/0e528b66-37aa-4d7a-933e-4638aabf494a-0000\/executors\/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13\/runs\/219c102b-28ae-41d5-b98f-11829315119e"}
I0627 19:42:51.963241 7613 fetcher.cpp:444] Fetching URI 'file:///etc/docker.tar.gz'
I0627 19:42:51.963279 7613 fetcher.cpp:285] Fetching directly into the sandbox directory
I0627 19:42:51.963295 7613 fetcher.cpp:222] Fetching URI 'file:///etc/docker.tar.gz'
I0627 19:42:51.964923 7613 fetcher.cpp:207] Copied resource '/etc/docker.tar.gz' to '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e/docker.tar.gz'
I0627 19:42:52.070482 7613 fetcher.cpp:123] Extracted '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e/docker.tar.gz' into '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e'
I0627 19:42:52.070533 7613 fetcher.cpp:582] Fetched 'file:///etc/docker.tar.gz' to '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e/docker.tar.gz'
I0627 19:42:56.096325 7643 exec.cpp:162] Version: 1.3.0
I0627 19:42:56.101958 7647 exec.cpp:237] Executor registered on agent 632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   221  100   138  100    83   8657   5207 --:--:-- --:--:-- --:--:--  9200
E0627 19:51:03.219312 7652 process.cpp:951] Failed to accept socket: future discarded
```
The messages "Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap." and "Failed to accept socket: future discarded" seem to be the culprits that are killing my container.
My question is: who is killing my container after 5 to 10 minutes, again and again?
I also updated the /etc/default/grub file with

```
GRUB_CMDLINE_LINUX_DEFAULT="cgroup_enable=memory swapaccount=1"
```

and rebooted my system, but there was no progress.
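One thing worth confirming after such a change: on Ubuntu, editing /etc/default/grub has no effect until `update-grub` regenerates the boot configuration and the machine reboots. A minimal sketch for checking whether the running kernel actually picked up the flags (the `check_cmdline` helper is hypothetical; the files and commands are standard on Ubuntu):

```shell
# Hypothetical helper: succeeds only if both boot flags appear in the
# given kernel command line string.
check_cmdline() {
  case "$1" in
    *cgroup_enable=memory*swapaccount=1*) return 0 ;;
    *swapaccount=1*cgroup_enable=memory*) return 0 ;;
    *) return 1 ;;
  esac
}

# After editing /etc/default/grub, regenerate the grub config and reboot:
#   sudo update-grub && sudo reboot
#
# Then verify against the command line the kernel was actually booted with:
#   check_cmdline "$(cat /proc/cmdline)" && echo "flags active" || echo "flags missing"
```

If the flags are missing from /proc/cmdline after the reboot, the `update-grub` step was most likely skipped and the swap-limit warning will keep appearing.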
Any insights on this issue?
My Ubuntu VMware configuration is:

- Cores assigned: 3
- Memory: 6 GB
- HDD: 32 GB

I am running only one container, yet it still exits after a few minutes.
[EDIT: Adding stderr file contents from the Mesos UI at /var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284]

Adding the stderr file content of another job:
```
I0628 10:15:45.951104 4357 fetcher.cpp:533] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/29df799b-4797-41df-a005-465f211d286b-S0","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":true,"value":"file:\/\/\/etc\/docker.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/29df799b-4797-41df-a005-465f211d286b-S0\/frameworks\/0e528b66-37aa-4d7a-933e-4638aabf494a-0000\/executors\/sources.a634642c-5bbc-11e7-ba8b-024239f32c24\/runs\/1bda209c-c2b8-4bb5-a41b-26361e00a284"}
I0628 10:15:45.953835 4357 fetcher.cpp:444] Fetching URI 'file:///etc/docker.tar.gz'
I0628 10:15:45.953881 4357 fetcher.cpp:285] Fetching directly into the sandbox directory
I0628 10:15:45.953974 4357 fetcher.cpp:222] Fetching URI 'file:///etc/docker.tar.gz'
I0628 10:15:45.956663 4357 fetcher.cpp:207] Copied resource '/etc/docker.tar.gz' to '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284/docker.tar.gz'
I0628 10:15:46.061069 4357 fetcher.cpp:123] Extracted '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284/docker.tar.gz' into '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284'
I0628 10:15:46.061148 4357 fetcher.cpp:582] Fetched 'file:///etc/docker.tar.gz' to '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284/docker.tar.gz'
I0628 10:15:49.898803 4389 exec.cpp:162] Version: 1.3.0
I0628 10:15:49.903390 4390 exec.cpp:237] Executor registered on agent 29df799b-4797-41df-a005-465f211d286b-S0
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   221  100   138  100    83   5385   3239 --:--:-- --:--:-- --:--:-- 11500
W0628 10:15:49.903390 4389 logging.cpp:91] RAW: Received signal SIGTERM from process 3287 of user 0; exiting
```
No new logs were created in the /var/log/mesos-master.ERROR file today.
Content of the /var/log/mesos-master.WARNING file:

```
Log file created at: 2017/06/28 10:04:56
Running on machine: ubuntu
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0628 10:04:56.387049 3193 authenticator.cpp:512] No credentials provided, authentication requests will be refused
W0628 10:14:56.617103 3221 master.cpp:2011] Agent 632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0 (ubuntu) did not re-register within 10mins after master failover; marking it unreachable
```
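For context on the warning above: the 10-minute window matches the Mesos master's agent re-registration timeout after a failover; agents that do not reconnect in time are marked unreachable, and their tasks can be reported lost. If agents are legitimately slow to reconnect, the master can be started with a longer window. A sketch only, using the flag name as documented for Mesos 1.x (verify it against your Mesos version before relying on it):

```
# Hedged sketch: lengthen the window the master waits for agents to
# re-register after failover before marking them unreachable.
mesos-master --work_dir=/var/lib/mesos --agent_reregister_timeout=20mins
```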
The content of the /var/log/mesos-slave.WARNING file is the same as that of the mesos-slave.ERROR file.
Content of the /var/log/mesos-slave.ERROR file:

```
Log file created at: 2017/06/28 10:05:00
Running on machine: ubuntu
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0628 10:05:00.712286 3287 shell.hpp:107] Command 'hadoop version 2>&1' failed; this is the output: sh: 1: hadoop: not found
E0628 10:24:45.502921 3326 slave.cpp:4496] Failed to update resources for container 1bda209c-c2b8-4bb5-a41b-26361e00a284 of executor 'sources.a634642c-5bbc-11e7-ba8b-024239f32c24' running task sources.a634642c-5bbc-11e7-ba8b-024239f32c24 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/4469/cgroup: Failed to open file: No such file or directory
E0628 10:33:45.789072 3327 slave.cpp:4496] Failed to update resources for container 858170ce-0775-48be-8c85-3a1dbf320569 of executor 'sources.e7e069ed-5bbd-11e7-ba8b-024239f32c24' running task sources.e7e069ed-5bbd-11e7-ba8b-024239f32c24 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/5215/cgroup: Failed to open file: No such file or directory
```
I observed that the message "Failed to read /proc/5215/cgroup: Failed to open file: No such file or directory" appears only when a container/task gets killed, whereas these files exist for the currently running containers. Thanks.
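That observation is consistent with how /proc works: /proc/&lt;pid&gt;/cgroup exists only while the process is alive, so the slave's read fails for a task that has already been killed, which makes this error a symptom of the kill rather than its cause. A small illustration (my own sketch, not from the Mesos code; assumes a Linux /proc filesystem):

```shell
# A live process (this shell itself) has a readable cgroup file.
test -r "/proc/$$/cgroup" && echo "live process: cgroup file is readable"

# A process that has exited and been reaped no longer has one.
sleep 0.1 &          # start a short-lived background process
bg=$!
wait "$bg"           # let it exit and be reaped
test -e "/proc/$bg/cgroup" || echo "exited process: cgroup file is gone"
```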
Best Answer
It seems that Marathon relies on the user for the implementation of health checks, i.e. if we provide a health check in the app config, we have to implement it ourselves. I removed all the health checks that I had provided in the application configuration. After that, Marathon shows the app health as unknown, but Marathon (specifically the mesos-slave) no longer kills the task.
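For readers hitting the same behaviour: the kills come from the `healthChecks` block of the Marathon app definition; once `maxConsecutiveFailures` is exceeded, Marathon asks Mesos to kill the task. A minimal illustration of such a block, hedged: the app id is a guess based on the task names in the logs above, and the `/health` path and the numeric values are made-up examples, while the field names follow Marathon's app-definition API:

```json
{
  "id": "/sources",
  "healthChecks": [
    {
      "protocol": "HTTP",
      "path": "/health",
      "portIndex": 0,
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ]
}
```

The alternative to deleting the block is to make the application actually serve the endpoint the check probes, so the task reports healthy instead of unknown.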