*: rewrite snapshot merging and sending for v3 #3970
Conversation
Where does the buffer size of 128 come from? If it is arbitrary, we should add a brief comment.
|
New code looks cleaner. Looking forward to seeing more updates. I left some comments. |
|
@gyuho I cleaned up a bunch of things. Now it is reviewable. |
|
Thanks, I will have a look |
Does this mean that it forwards to the leader?
This (changing To to 0) means the message will be dropped; see line 823 too. We should probably document it.
We drop the original snapshot message since etcdserver will send it later, once it has the merged snapshot.
Oh got it. Yeah I couldn't find documentation on that.
It is documented somewhere in rafthttp. :(
|
@xiang90 I read all the changes and they make sense to me, but it's a big change. Do you plan any more updates on this PR? Otherwise, I will run an actual cluster with this code tomorrow to double-check. |
It will fail since there is an issue with boltdb. I will update you with the details tomorrow. |
No. I do not. |
|
Got it. What was the issue with boltdb? Code-wise, LGTM. |
|
@gyuho I will update you with more details about boltdb tomorrow or the day after tomorrow. |
|
Got it. Thanks! I will keep reviewing this to understand more. |
|
@gyuho The issue with boltdb is that its readers might block writers. So when sending out a large snapshot (which opens a long-running read txn in bolt), writes might get blocked. This is what we do not want to see. We have a pending PR at boltdb/bolt#432 |
|
Other than that, feel free to test it out. What I have done is set up a 3-node cluster with v3 enabled, then use the benchmark tool to generate enough puts to trigger a snapshot while killing nodes to force snapshot sending. |
|
Cool, I will test it out as well. |
|
I tested this with benchmark tool and it works well. LGTM. |
|
@gyuho Did you trigger a snapshot sending? |
|
Yes |
|
@gyuho Great. I will keep this open until tomorrow night. You can try to play with it more. |
|
Sure, thanks! |
|
BTW, this is not snapshot sending. This is just etcdserver taking a snapshot. You have to kill an etcd member and force the etcd leader to compact its log. Then, when the killed member comes back up, the leader no longer has the required log entries to send to it (due to compaction), so the leader has to send a snapshot to that member. You will see a log line like `recovering from incoming snapshot` on the other side. |
|
@xiang90 +1. Thanks! |
|
@xiang90 I tested that scenario, and I checked that it triggers log compaction. Please go over this log. |
|
And I ran it with this command: |
|
@gyuho Yea. That is what I expected. The log is still a little bit noisy. We need to improve it over time. |
|
@xiang90 Yeah other than that, code works great. Thanks, |
|
1million_first.txt @xiang90 So I tried to observe the blocking behavior in boltdb. The first case is with 1 million keys, without boltdb/bolt#432. The second case is with 1 million keys, with boltdb/bolt#432. Do you have any suggestions on how to benchmark the blocking behavior? Please let me know. |
|
@gyuho There should not be any perf difference. You should try to create a case where you can observe the blocking behavior. |
|
OK, I will try that. Thanks!
|
|
@gyuho My suggestion would be: try with boltdb itself first.
Then you can understand why we need this in etcd. We may have a read routine (the snapshot sender, which can take up to a minute to send a large snapshot) that blocks writes for tens of seconds. |
|
OK, got it. Will try. Thanks!
|
I am rewriting the snapshot merging and sending logic for v3 (#3666).
The previous approach was complicated and introduced a few unnecessary components (snapshot_store in both etcdserver and transport), and the coordination between these components was error-prone. We have fixed a few bugs, but deadlocks still lurk. Moreover, the current snapshot-creating logic is wrong: it does not stop the apply loop while getting the v3 snapshot.
The previous complexity came from delaying the snapshot merging to the final transportation phase. I moved the merging to the very beginning, in etcdserver, which makes things a lot simpler, mainly because etcdserver has full control of the workflow and does not require any coordination.
This pull request tries to simplify that logic. It is currently a work in progress, but the approach works and is significantly simpler.
The snapshot sending logic should be as simple as: