Forward-sync to HDFS? (OR continue an incomplete hdfs upload?)

hadoop, hdfs, rsync, synchronization

Anyone have a good suggestion for doing a forward sync to HDFS? ("forward-sync" in contrast to "bi-directional sync")

Basically, I have a large number of files that I want to put into HDFS. The set is so large that I'll often, say, lose connectivity before the upload finishes. What I would like to do is simply "resume" the upload. However, hadoop fs -put will just upload the whole directory again (or complain if it already exists).

Anyone have a good way to continue an incomplete hdfs upload?

Best Answer

If you're running a new enough Hadoop, you could mount HDFS using FUSE and just use rsync.

It might also be possible to load the files into a local-only HDFS and then use distcp to copy them to the target cluster.
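
If neither option is available, the "resume" behaviour asked about can be approximated by uploading only the files that are missing on HDFS or whose size differs from the local copy. The following is a rough sketch rather than a tested tool. It assumes the hadoop command is on the PATH, that `hadoop fs -ls -R` prints the usual eight columns (permissions, replication, owner, group, size, date, time, path), and that the installed version supports `-mkdir -p` and `-put -f`. The function names are made up for illustration, and a size match is only a weak check that a file arrived intact.

```python
import os
import posixpath
import subprocess


def local_sizes(root):
    """Map each file under root to its size, keyed by '/'-separated relative path."""
    sizes = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            sizes[rel] = os.path.getsize(full)
    return sizes


def parse_listing(text, remote_root):
    """Turn `hadoop fs -ls -R` output into {relative path: size}, files only."""
    prefix = remote_root.rstrip("/") + "/"
    sizes = {}
    for line in text.splitlines():
        # Assumed columns: perms, replication, owner, group, size, date, time, path
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue  # skip the "Found N items" header and directory entries
        path = fields[7]
        if path.startswith(prefix):
            sizes[path[len(prefix):]] = int(fields[4])
    return sizes


def plan_upload(local, remote):
    """Return relative paths that are missing remotely or differ in size."""
    return sorted(rel for rel, size in local.items() if remote.get(rel) != size)


def resume_upload(local_root, remote_root):
    """Upload only the files that plan_upload() reports as outstanding."""
    listing = subprocess.run(
        ["hadoop", "fs", "-ls", "-R", remote_root],
        capture_output=True, text=True,
    )
    # A failed listing is treated as "nothing uploaded yet".
    remote = parse_listing(listing.stdout, remote_root) if listing.returncode == 0 else {}
    for rel in plan_upload(local_sizes(local_root), remote):
        dest = posixpath.join(remote_root, rel)
        subprocess.run(["hadoop", "fs", "-mkdir", "-p", posixpath.dirname(dest)], check=True)
        # -f overwrites a remote file whose size did not match.
        subprocess.run(["hadoop", "fs", "-put", "-f", os.path.join(local_root, rel), dest], check=True)
```

Calling resume_upload("/local/data", "/user/me/data") (both paths are placeholders) after a dropped connection would then skip everything that already matches by size and re-send the rest. This starts one hadoop process per file, so it will be slow for very many small files.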
