I experienced the same issue in very much the same environment. I finally tracked down the problem to a messed-up OSD UUID. What gave it away was the following line in the MON log (not the OSD log!):
... mon.minion-001@0(leader).osd e75 preprocess_boot from osd.0 10.208.66.2:6800/3427 clashes with existing osd: different fsid (ours: 71b33e7f-b464-4ba9-96b3-8c814921fea2 ; theirs: 5401be6f-b4ff-42ef-8531-78ee73772d5b)
I resolved the problem by manually removing the OSD, destroying its file system, and re-creating it from scratch. How the problem came about in the first place is something I still have to track down.
Given that I used Puppet to set up the OSDs, and that the reason it went wrong is probably particular to my environment, the issue you are experiencing is likely a different one, but it may be worth checking your MON log anyway. You will need to enable debugging on the MON first, by adding something like this to ceph.conf:
[mon]
debug mon = 9
The message in question is logged at level 7, so this gives you some more details without making everything terribly chatty.
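If you would rather not edit ceph.conf and restart the mon, you can usually raise the log level at runtime instead; a sketch, assuming the mon name minion-001 from the log excerpt above and the default log path:

```shell
# Raise mon debug logging to level 9 at runtime (no restart needed).
# "minion-001" is the mon name from the log excerpt above; substitute your own.
ceph tell mon.minion-001 injectargs '--debug-mon 9'

# Watch the MON log for the tell-tale preprocess_boot line:
tail -f /var/log/ceph/ceph-mon.minion-001.log | grep -i 'clashes with existing osd'
```

Remember to set it back afterwards (e.g. injectargs '--debug-mon 1/5'), or the log will stay chatty.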
@LoicDachary: wouldn't it make sense to log this error/warning message at level 0? I would certainly have spotted this issue earlier had it been logged right away.
I've finally sorted this out but the documentation regarding this is fairly obscure so I'll answer my own question. It appears the host which went down had also filled up its disk, which is why it was behaving differently to the other two hosts and why its mon wasn't starting up. I solved that by clearing old logs and unnecessary packages. That then meant the three hosts behaved identically because all three mons could start up.
To troubleshoot the cluster, I found the easiest place to start is getting the mon_status of each monitor. I use cephadm, so the commands below are for Docker containers; in a "normal" setup you would instead run sudo ceph tell mon.s64-ceph mon_status.
ceph --admin-daemon /run/ceph/9ea4d206-baec-11ea-b970-2165cf493db2/ceph-mon.<mon_name>.asok mon_status
That will give you something like:
{
  "name": "s64-ceph",
  "rank": 0,
  "state": "leader",
  "election_epoch": 25568,
  "quorum": [
    0,
    1
  ],
  "quorum_age": 17,
  "features": {
    "required_con": "2449958747315978244",
    "required_mon": [
      "kraken",
      "luminous",
      "mimic",
      "osdmap-prune",
      "nautilus",
      "octopus"
    ],
    "quorum_con": "4540138292836696063",
    "quorum_mon": [
      "kraken",
      "luminous",
      "mimic",
      "osdmap-prune",
      "nautilus",
      "octopus"
    ]
  },
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": {
    "epoch": 5,
    "fsid": "9ea4d206-baec-11ea-b970-2165cf493db2",
    "modified": "2020-07-15T12:13:10.390355Z",
    "created": "2020-06-30T16:15:22.596364Z",
    "min_mon_release": 15,
    "min_mon_release_name": "octopus",
    "features": {
      "persistent": [
        "kraken",
        "luminous",
        "mimic",
        "osdmap-prune",
        "nautilus",
        "octopus"
      ],
      "optional": []
    },
    "mons": [
      {
        "rank": 0,
        "name": "s64-ceph",
        "public_addrs": {
          "addrvec": [
            {
              "type": "v2",
              "addr": "10.2.64.2:3300",
              "nonce": 0
            },
            {
              "type": "v1",
              "addr": "10.2.64.2:6789",
              "nonce": 0
            }
          ]
        },
        "addr": "10.2.64.2:6789/0",
        "public_addr": "10.2.64.2:6789/0",
        "priority": 0,
        "weight": 0
      },
      {
        "rank": 1,
        "name": "s63-ceph",
        "public_addrs": {
          "addrvec": [
            {
              "type": "v2",
              "addr": "10.2.63.2:3300",
              "nonce": 0
            },
            {
              "type": "v1",
              "addr": "10.2.63.2:6789",
              "nonce": 0
            }
          ]
        },
        "addr": "10.2.63.2:6789/0",
        "public_addr": "10.2.63.2:6789/0",
        "priority": 0,
        "weight": 0
      },
      {
        "rank": 2,
        "name": "s65-ceph",
        "public_addrs": {
          "addrvec": [
            {
              "type": "v2",
              "addr": "10.2.65.2:3300",
              "nonce": 0
            },
            {
              "type": "v1",
              "addr": "10.2.65.2:6789",
              "nonce": 0
            }
          ]
        },
        "addr": "10.2.65.2:6789/0",
        "public_addr": "10.2.65.2:6789/0",
        "priority": 0,
        "weight": 0
      }
    ]
  },
  "feature_map": {
    "mon": [
      {
        "features": "0x3f01cfb8ffadffff",
        "release": "luminous",
        "num": 1
      }
    ],
    "client": [
      {
        "features": "0x27018fb86aa42ada",
        "release": "jewel",
        "num": 1
      }
    ]
  }
}
If you look at the quorum field, it only lists two out of three monitors as in the quorum. This is because s65-ceph was the one whose disk had filled up and whose mon wouldn't start up. When you do get the third host's mon up, it will show all three monitors are in the quorum.
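To compare all three monitors' views of the quorum at once, a small loop can help; a sketch, using the mon names from the monmap above (adjust for your cluster, and note that jq must be installed):

```shell
# Print each mon's name, state and quorum view side by side.
# Mon names are the ones from the monmap above; substitute your own.
for m in s64-ceph s63-ceph s65-ceph; do
  echo "== mon.$m =="
  sudo ceph tell mon.$m mon_status | jq '{name, state, quorum}'
done
```

A monitor that is down won't answer ceph tell at all, so for that one you have to query the admin socket on its host, as shown earlier.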
Ordinarily, Ceph should be able to run (albeit not in a healthy state) with only 2/3 monitors up, because 2/3 is a majority and the remaining monitors can still form a quorum. That was not the case here, though. If you examine the journal on each host (look for lines containing "calling for election"), you may see what I saw: the monitors were calling elections very frequently, about every 5-10 seconds. They were switching leaders before the cluster became available to clients again, which is why the cluster always appeared to be down.
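A quick way to gauge the election churn is to grep the journal on each host; a sketch (the unit name in the second command assumes cephadm-style systemd units, ceph-&lt;fsid&gt;@mon.&lt;name&gt;, with the fsid and mon name from my cluster):

```shell
# Count election calls in the last hour; dozens of hits indicates churn.
sudo journalctl --since "1 hour ago" | grep -c "calling for election"

# Or follow the mon unit live (cephadm-style unit name; adjust fsid and mon name):
sudo journalctl -f -u 'ceph-9ea4d206-baec-11ea-b970-2165cf493db2@mon.s64-ceph.service'
```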
When troubleshooting, I keep Glances open, and I noticed very high RAM utilisation along with network and disk read/write spikes whenever the mons held an election. That made me suspect the frequent monitor switching was causing the high I/O, and that paging was making the I/O problem worse. I found one blog post which seemed to support this.
I can't add more RAM to any of the hosts to test that, but I did find that if a monitor is very slow, the other monitors will call for elections. In my case the HDDs aren't fast enough for constant monitor switching (read: frequent bursts of random reads/writes): once a monitor had been elected leader it would write to its HDD for a few seconds and be extremely unresponsive while doing so. The other monitors would then call for an election, the next leader would hit the same problem, and the cycle continued, in a sort of positive feedback loop.
I eventually found a parameter called mon_lease, which defaults to 5.0 seconds. It controls how long the other monitors will wait for a given monitor to respond before calling for elections again. Five seconds is fine on reasonably fast servers, but my cluster runs much slower because it's built from three very old recycled laptops. I set mon_lease to 30 s with the command below, which made the frequent switching go away; I'm also not running much software on top of Ceph, so I'm not concerned about reads/writes timing out during a mon switch. After changing mon_lease, wait a few minutes and THEN check your journal logs: you should find none of the hosts are constantly switching monitors. Make sure your cluster is working as expected, and ideally reboot all Ceph hosts to confirm everything will also work on the next boot.
ceph --admin-daemon /run/ceph/9ea4d206-baec-11ea-b970-2165cf493db2/ceph-mon.s64-ceph.asok config set mon_lease 30.0
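Note that a value set through the admin socket does not survive a daemon restart. On Nautilus and later you can persist it in the central config database instead; a sketch (verify with ceph config get afterwards):

```shell
# Persist the longer lease for all mons across restarts.
sudo ceph config set mon mon_lease 30.0

# Verify what a running mon actually uses (mon name from my cluster; adjust).
sudo ceph config get mon.s64-ceph mon_lease
```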
I hope my answer helps someone avoid the same misfortunes with Ceph; leave a comment if you need clarification.
Best Answer
I did eventually work out what was wrong. I had to manually change 'type host' to 'type osd' in our crushmap, which is different from Spongman's suggestion.
After starting the RGW, I found that the radosgw process was owned by "root" rather than "ceph", and "ceph -s" also reported "100.000% pgs not active".
I searched for "100.000% pgs not active" and the post https://www.cnblogs.com/boshen-hzb/p/13305560.html describes the fix: change 'type host' to 'type osd' in the CRUSH map. After that, "ceph -s" showed HEALTH_OK, the radosgw process ran as "ceph", and the RGW web service was listening on port 7480.
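For reference, the usual way to make that 'type host' to 'type osd' change is to decompile, edit and re-inject the CRUSH map; a sketch of the standard procedure (the file names are arbitrary):

```shell
# Export and decompile the current CRUSH map.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Edit crushmap.txt: in the relevant rule, change
#   step chooseleaf firstn 0 type host
# to
#   step chooseleaf firstn 0 type osd
# (This allows PGs to place replicas on different OSDs of the same host,
#  which is what you need when you have fewer hosts than replicas.)

# Recompile and inject the modified map.
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin
```

Be aware this trades away host-level failure isolation: losing one host can then lose multiple replicas of the same PG.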