Meant by “streaming data access” in HDFS

filesystemshadoophdfsstreaming

According to the HDFS Architecture page HDFS was designed for "streaming data access". I'm not sure what that means exactly, but would guess it means an operation like seek is either disabled or has sub-optimal performance. Would this be correct?

I'm interested in using HDFS for storing audio/video files that need to be streamed to browser clients. Most of the streams will be start to finish, but some could have a high number of seeks.

Maybe there is another file system that could do this better?

Best Answer

Streaming just implies that it can offer you a constant bitrate above a certain threshhold when transferring the data, as opposed to having the data come in in bursts or waves.

If HDFS is laid out for streaming, it will probably still support seek, with a bit of overhead it requires to cache the data for a constant stream.

Of course, depending on system and network load, your seeks might take a bit longer.

Related Topic