Can anyone explain to me what a byte stream actually contains? Does it contain bytes (hex data), binary data, or only English letters? I am also confused about the term "raw data". If someone asked me to "reverse the 4-byte data", should I assume the data is hex code or binary code?
Related Solutions
A streaming app is an app that consumes a stream of data.
A stream of data is transmitted data formatted so that it is useful even when incomplete. Because a consumer does not need the complete transmission to make use of the data, consumers can join and leave at any time. Streaming also allows transmission to be continuous, though it may start and stop on demand. This is the model that broadcast radio and television follow.
This contrasts with file transfers that may be meaningless to consume until the transfer has been completed.
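To make the contrast concrete, here is a minimal Python sketch under stated assumptions: a hypothetical `sensor_readings` generator stands in for data arriving over a network, and a hypothetical `running_max` consumer acts on each value as it arrives instead of waiting for the whole transfer. The names are illustrative only.

```python
from typing import Iterable, Iterator, List, Optional

def sensor_readings() -> Iterator[int]:
    """Hypothetical source: yields readings one at a time, like data arriving over a network."""
    for value in [3, 7, 2, 9, 4]:
        yield value

def running_max(stream: Iterable[int]) -> Iterator[int]:
    """Consume the stream incrementally; every output is useful before the stream ends."""
    current: Optional[int] = None
    for value in stream:
        current = value if current is None else max(current, value)
        yield current

# The consumer can "leave" early and still hold meaningful partial results.
partial: List[int] = []
for m in running_max(sensor_readings()):
    partial.append(m)
    if len(partial) == 3:
        break
print(partial)  # [3, 7, 7]
```

A file transfer, by contrast, would correspond to calling `list(sensor_readings())` first and only then starting work.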
Java streams allow consumption of partial data but do nothing to transfer data over a network on their own.
And, as with any popular buzzword, money is being spent to make it seem like more than it is.
Kafka deals in ordered logs of atomic messages. You can view it as something like the pub/sub mode of a message broker, but with strict ordering (within a partition) and the ability to replay or seek to any point in the stream's past that is still retained on disk (which could be forever).
Kafka's flavor of streaming stands in contrast to remote procedure calls (RPC) such as Thrift or HTTP, and to batch processing as in the Hadoop ecosystem. Unlike RPC, components communicate asynchronously: hours or days may pass between when a message is sent and when the recipient wakes up and acts on it. There could be many recipients at different points in time, or maybe no one will ever bother to consume a message. Multiple producers can produce to the same topic without any knowledge of the consumers. Kafka does not know whether you are subscribed, or whether a message has been consumed. A message is simply committed to the log, where any interested party can read it.
Unlike batch processing, you're interested in individual messages, not just giant collections of them (though it's not uncommon to archive Kafka messages into Parquet files on HDFS and query them as Hive tables).
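The log model described above can be illustrated with a toy, in-memory sketch. This is not Kafka's client API; `TopicLog`, `append`, and `read_from` are hypothetical names. It only shows the core idea: the log is append-only and ordered, each consumer keeps its own offset, and anyone can replay from any retained position without the log tracking who has read what.

```python
from typing import List

class TopicLog:
    """Toy append-only log; a hypothetical stand-in for a single Kafka partition."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def append(self, message: str) -> int:
        """Producers append; the returned offset is the message's position in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset: int, max_count: int = 10) -> List[str]:
        """Consumers read from any offset; the log itself does not record who read what."""
        return self._messages[offset:offset + max_count]

log = TopicLog()
for event in ["signup", "login", "purchase", "logout"]:
    log.append(event)

# Consumer A reads everything and advances its own offset.
fast_offset = 0
batch = log.read_from(fast_offset)
fast_offset += len(batch)  # consumer A is now caught up at offset 4

# Consumer B independently seeks back and replays part of the history.
replay = log.read_from(1, max_count=2)
print(batch)   # ['signup', 'login', 'purchase', 'logout']
print(replay)  # ['login', 'purchase']
```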
Case 1: Kafka does not preserve any particular temporal relationship between producer and consumer. It's a poor fit for streaming video because Kafka is allowed to slow down, speed up, move in fits and starts, etc. For streaming media, we want to trade away overall throughput in exchange for low and, more importantly, stable latency (otherwise known as low jitter). Kafka also takes great pains never to lose a message. With streaming video, we typically use UDP and are content to drop a frame here and there to keep the video running. The SLA on a Kafka-backed process is typically seconds to minutes when healthy, and hours to days when unhealthy. The SLA on streaming media is in the tens of milliseconds.
Netflix could use Kafka to move frames around in an internal system that transcodes terabytes of video per hour and saves it to disk, but not to ship them to your screen.
Case 2: Absolutely. We use Kafka this way at my employer.
Case 3: You can use Kafka for this kind of thing, and we do, but you are paying some unnecessary overhead to preserve ordering. Since you don't care about order, you could probably squeeze more performance out of another system. If your company already maintains a Kafka cluster, though, it is probably best to reuse it rather than take on the maintenance burden of another messaging system.
Best Answer
Byte streams contain, well, bytes. A byte is 8 bits, each of which is a 1 or a 0. Interpreted as an unsigned number, a byte can hold any value from 0 to 255 (which, I may add, is no coincidence: each of the 4 numbers in an IPv4 address is one byte, which is why they always range from 0 to 255). Byte streams are usually higher-level interfaces that hide an underlying byte array, often used as a circular buffer: the writer fills up the buffer, waits for a reader to empty it, and then fills it again.
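A short Python sketch of the points above. The first part shows that a byte holds a value from 0 to 255 and that an IPv4 address is just four such bytes (using the standard `ipaddress` module). The second part is a deliberately simplified, hypothetical `RingBuffer` class showing how a fixed-size circular buffer can sit behind a byte stream; real stream implementations differ in detail.

```python
import ipaddress

# A byte is 8 bits, so its unsigned value ranges from 0b00000000 to 0b11111111.
assert 0b11111111 == 255
assert bytes([0, 255]) == b"\x00\xff"  # bytes() raises ValueError for values outside 0..255

# An IPv4 address is 4 bytes; each dotted number is one byte.
octets = list(ipaddress.IPv4Address("192.168.0.1").packed)
print(octets)  # [192, 168, 0, 1]

class RingBuffer:
    """Hypothetical fixed-size circular byte buffer: a writer fills it, a reader drains it."""

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._start = 0  # index of the oldest unread byte
        self._size = 0   # number of unread bytes currently stored

    def write(self, data: bytes) -> int:
        """Store as many bytes as fit; return how many were accepted."""
        written = 0
        for b in data:
            if self._size == self._capacity:
                break  # buffer full: the writer has to wait for the reader
            end = (self._start + self._size) % self._capacity
            self._buf[end] = b
            self._size += 1
            written += 1
        return written

    def read(self, n: int) -> bytes:
        """Return up to n bytes in the order they were written, freeing that space."""
        n = min(n, self._size)
        out = bytes(self._buf[(self._start + i) % self._capacity] for i in range(n))
        self._start = (self._start + n) % self._capacity
        self._size -= n
        return out

ring = RingBuffer(4)
print(ring.write(b"ABCDEF"))  # 4  (only 4 bytes fit)
print(ring.read(3))           # b'ABC'
print(ring.write(b"EF"))      # 2  (these wrap around into the freed slots)
print(ring.read(10))          # b'DEF'
```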
What the heck does that represent? Well, it could be a text file, an image, or a live video stream; what it means depends entirely on the context of whoever is reading it. Hex is just another notation for the same bytes: 0xFF, 255, and 11111111 are the same value written in hexadecimal, decimal, and binary. Hex is often more convenient for working with bytes because each byte is exactly two hex digits, but the underlying data is identical.
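To illustrate that the meaning lives in the reader, this Python sketch takes four arbitrary bytes (chosen here for illustration) and views them as decimal numbers, hex, bits, ASCII text, and a single 32-bit integer. Nothing about the bytes changes between views.

```python
data = bytes([72, 105, 33, 10])  # four raw bytes; what they "mean" depends on the reader

assert 0xFF == 255 == 0b11111111  # hex, decimal, and binary are just notations

print(list(data))                        # [72, 105, 33, 10]  as decimal numbers
print(data.hex())                        # 4869210a           as hex (2 digits per byte)
print([format(b, "08b") for b in data])  # ['01001000', '01101001', '00100001', '00001010']
print(repr(data.decode("ascii")))        # 'Hi!\n'            as ASCII text
print(int.from_bytes(data, "big"))       # 1214849290         as one big-endian 32-bit integer
```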
When you refer to raw data, you usually mean byte data. The data comes without a tag saying "I am an image file!" Usually you deal with raw data only when you don't care what the data represents as a whole. For example, to convert an image to black and white, I might read the image's raw pixel data and, for every 3 bytes read (representing the pixel's red, green, and blue values), add their numeric values, divide by 3, and write that result 3 times. Essentially, I'd be averaging a pixel's red, green, and blue values and making the equivalent gray pixel from that. When you operate on data "byte by byte" like this, you don't really care about the big picture, so to speak.
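A minimal sketch of that grayscale averaging, assuming the input is tightly packed 24-bit RGB pixel data with no file header. Real image formats such as PNG or JPEG are compressed and would need to be decoded first. `to_grayscale` is a hypothetical helper name.

```python
def to_grayscale(raw_rgb: bytes) -> bytes:
    """Average each (R, G, B) triple and write the result three times.

    Assumes raw_rgb is uncompressed, headerless, 3 bytes per pixel.
    """
    if len(raw_rgb) % 3 != 0:
        raise ValueError("raw RGB data length must be a multiple of 3")
    out = bytearray()
    for i in range(0, len(raw_rgb), 3):
        r, g, b = raw_rgb[i], raw_rgb[i + 1], raw_rgb[i + 2]
        gray = (r + g + b) // 3  # integer average, always within 0..255
        out += bytes([gray, gray, gray])
    return bytes(out)

pixels = bytes([255, 0, 0, 30, 60, 90])  # a pure red pixel followed by a bluish one
print(list(to_grayscale(pixels)))  # [85, 85, 85, 60, 60, 60]
```

Integer division truncates the average; weighted formulas that account for how the eye perceives each color are also common, but the plain average matches the description above.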
Or perhaps you want to save a file in a database that asks you to insert its "raw data" into a BLOB column. This simply means handing the database the file's contents as one large byte array, which it can store and manage. When you retrieve that value, you get back exactly the same byte array you originally provided. If that data was a file, then you, the programmer, must reinterpret those bytes just as you would when reading the file itself, one byte at a time.
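As a sketch of the BLOB round trip, this uses Python's built-in `sqlite3` module with an in-memory database. The table name `files` and the sample bytes are hypothetical; in a real program the bytes would typically come from `open(path, "rb").read()`.

```python
import sqlite3

# Stand-in for a file's contents; normally read with open(path, "rb").read().
file_bytes = b"\x00\x01\x7f\x80\xfe\xff"

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the example
conn.execute("CREATE TABLE files (name TEXT, content BLOB)")
conn.execute("INSERT INTO files VALUES (?, ?)", ("example.bin", file_bytes))

(stored,) = conn.execute(
    "SELECT content FROM files WHERE name = ?", ("example.bin",)
).fetchone()
conn.close()

print(type(stored).__name__)  # bytes
print(stored == file_bytes)   # True: the database returns exactly the bytes it was given
```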
If someone asked you to "reverse the 4-byte data", I would assume it refers to big-endian vs. little-endian interpretation of numbers: big-endian stores a number starting with its most significant byte, and little-endian starts with its least significant byte. It does not matter whether a number is represented as big-endian or little-endian, only that every system reading the number interprets it consistently.
This doesn't mean the individual byte values (or their hex representation) change; only the order in which the 4 bytes combine into a number is reversed. Say you have 0x01, 0x02, 0x03, and 0x04. Reversed, you'd have 0x04, 0x03, 0x02, 0x01. The receiving system presumably reads the 4 bytes in the opposite order, so because you have already reversed them, it interprets the value exactly as intended in the original raw data.
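Here is the 0x01 to 0x04 example in Python, using the built-in `int.from_bytes` and the standard `struct` module (`>I` is big-endian unsigned 32-bit, `<I` is little-endian). It shows that the same 4 bytes give different numbers depending on byte order, and that reversing the bytes lets a little-endian reader recover the big-endian value.

```python
import struct

raw = bytes([0x01, 0x02, 0x03, 0x04])

big = int.from_bytes(raw, "big")        # most significant byte first
little = int.from_bytes(raw, "little")  # least significant byte first
print(hex(big), hex(little))            # 0x1020304 0x4030201

# "Reversing the 4-byte data" swaps the byte order; each byte's value is unchanged.
reversed_raw = raw[::-1]                # b'\x04\x03\x02\x01'

# A little-endian reader of the reversed bytes sees the originally intended value.
assert int.from_bytes(reversed_raw, "little") == big

# struct gives the same result with explicit byte-order format codes.
assert struct.unpack(">I", raw)[0] == struct.unpack("<I", reversed_raw)[0] == 0x01020304
```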
I hope that explains it!