An address in a cached system has up to three parts: tag, set and offset.
Since the given system is byte-addressable and a cache line is two words (eight bytes), the offset portion of the address requires 3 bits (2^3 = 8).
A direct mapped cache has no set associativity. Or, if you will, it may be regarded as a collection of sets, each of which holds only one block. So no set-to-block associative lookup is required, and the set field of the address can be called a block field. This field directly determines the block to which the address maps, hence the "direct mapped" designation. The tag is used to determine whether a given block in the cache is a "hit" for the address or holds data for some other address. (Under set association, by contrast, the tag is used to search through a set of blocks for a hit: the set elements are associated with addresses via the tag field.)
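To make the lookup concrete, here is a minimal sketch of a direct mapped cache that tracks only valid bits and tags. It assumes the parameters used in this answer (four blocks, eight-byte lines); the class name DirectMappedCache and its access method are hypothetical illustrations, and no data storage is modelled.

```python
OFFSET_BITS = 3            # assumed 8-byte lines
INDEX_BITS = 2             # assumed 4 blocks
NUM_BLOCKS = 1 << INDEX_BITS


class DirectMappedCache:
    """Hypothetical model: one (valid, tag) slot per block, no data."""

    def __init__(self) -> None:
        # Each "set" holds exactly one block, so a flat list suffices.
        self.valid = [False] * NUM_BLOCKS
        self.tags = [0] * NUM_BLOCKS

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, install the block and return False."""
        index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)  # block field selects the slot
        tag = addr >> (OFFSET_BITS + INDEX_BITS)          # remaining high-order bits
        if self.valid[index] and self.tags[index] == tag:
            return True
        # Miss: this block now holds data for the new address.
        self.valid[index] = True
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
print(cache.access(0x000))  # False: cold miss
print(cache.access(0x004))  # True: same 8-byte line
print(cache.access(0x020))  # False: same block index 0, different tag
print(cache.access(0x000))  # False: evicted by 0x020
```

The last two accesses show the cost of having no associativity: two addresses with the same block field evict each other even though the other three blocks are unused.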
The cache has four blocks because it holds eight words and each block is a pair of words. So the set/block part of the address requires two bits.
The remaining bits are tag bits. Since the memory space is 4 KB (let us assume there is no virtual memory), addresses are 12 bits wide (2^12 = 4096), and so there are 12 - 3 - 2 = 7 tag bits.
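The arithmetic above can be sketched as a pair of helper functions: one derives the field widths from the memory size, line size and number of sets, and the other splits an address into tag, set and offset. The function names are hypothetical, and all sizes are assumed to be powers of two.

```python
def log2_exact(n: int) -> int:
    """Number of bits needed to index n items; n must be a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def field_widths(memory_bytes: int, line_bytes: int, num_sets: int) -> tuple[int, int, int]:
    """Return (tag_bits, set_bits, offset_bits) for a byte-addressable memory."""
    address_bits = log2_exact(memory_bytes)
    offset_bits = log2_exact(line_bytes)
    set_bits = log2_exact(num_sets)
    return address_bits - set_bits - offset_bits, set_bits, offset_bits


def split_address(addr: int, set_bits: int, offset_bits: int) -> tuple[int, int, int]:
    """Split addr into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    set_index = (addr >> offset_bits) & ((1 << set_bits) - 1)
    tag = addr >> (offset_bits + set_bits)
    return tag, set_index, offset


tag_bits, set_bits, offset_bits = field_widths(4096, 8, 4)
print(tag_bits, set_bits, offset_bits)              # 7 2 3
print(split_address(0xABC, set_bits, offset_bits))  # (85, 3, 4)
```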
Note that if the set size were 256 bits (the whole 32-byte cache, i.e. all four blocks), the cache would be fully associative rather than direct mapped: the entire cache is one big set of blocks, so there is no set field in an address, only a tag and an offset. The tag is used to search the entire cache for a hit. Under set association, the additional set field restricts the search to an indexed set, which in a direct mapped cache holds just a single block.
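The trade-off between set bits and tag bits as associativity grows can be illustrated with a short sketch. It assumes the same 12-bit addresses, 32-byte cache and 8-byte lines as above; widths_for_ways is a hypothetical helper. At four ways the single set covers the whole cache, which is the fully associative case with zero set bits.

```python
def log2_exact(n: int) -> int:
    """Number of bits needed to index n items; n must be a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def widths_for_ways(address_bits: int, cache_bytes: int, line_bytes: int,
                    ways: int) -> tuple[int, int, int]:
    """Return (tag_bits, set_bits, offset_bits) for a `ways`-way cache."""
    num_blocks = cache_bytes // line_bytes
    if num_blocks % ways:
        raise ValueError("ways must divide the number of blocks")
    num_sets = num_blocks // ways
    offset_bits = log2_exact(line_bytes)
    set_bits = log2_exact(num_sets)
    return address_bits - set_bits - offset_bits, set_bits, offset_bits


for ways in (1, 2, 4):
    print(ways, widths_for_ways(12, 32, 8, ways))
# 1 (7, 2, 3)   direct mapped
# 2 (8, 1, 3)   2-way set associative
# 4 (9, 0, 3)   fully associative: no set field
```

Every bit removed from the set field moves into the tag, because the offset width depends only on the line size.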
Best Answer
1) If the cache is direct-mapped (i.e. 1-way) with 16 words, its size (excluding tags) is 16*4 = 64 bytes, assuming 4-byte words. The line length does not matter.
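A quick check of why the line length drops out: for a 1-way cache of 16 words, every way of dividing those words into lines and sets gives the same data size. This sketch assumes 4-byte words; direct_mapped_data_bytes is a hypothetical name.

```python
WORD_BYTES = 4  # assumption: 32-bit words


def direct_mapped_data_bytes(total_words: int, line_words: int) -> int:
    """Data capacity (no tags) of a 1-way cache holding total_words words."""
    if total_words % line_words:
        raise ValueError("line length must divide the cache size")
    num_sets = total_words // line_words  # 1-way: one line per set
    return WORD_BYTES * line_words * num_sets


for line_words in (1, 2, 4, 8, 16):
    print(line_words, direct_mapped_data_bytes(16, line_words))  # always 64
```

The line length only changes how the same 64 bytes are grouped, which affects the offset and index widths but not the capacity.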
2) There is a mismatch between "A address bits" and "tag size in Bytes".
Each tag entry contains:
A valid bit.
The high-order address bits not used as the set index or line offset.
Usually some history bits for the various cache replacement algorithms (LRU, PLRU), used on multi-way caches.
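The contents listed above can be added up per entry once the number of address bits is known. In this sketch, the address width is a parameter precisely because the question does not give it; the 32- and 16-bit values in the example, the 4-byte word size, and the name tag_entry_bits are assumptions for illustration. Replacement history is modelled as a plain per-entry bit count, since how LRU/PLRU state is stored varies between designs.

```python
def log2_exact(n: int) -> int:
    """Number of bits needed to index n items; n must be a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def tag_entry_bits(address_bits: int, line_words: int, num_sets: int,
                   history_bits: int = 0, word_bytes: int = 4) -> int:
    """Bits per tag entry: valid bit + tag field + replacement history (assumed per entry)."""
    offset_bits = log2_exact(line_words * word_bytes)
    index_bits = log2_exact(num_sets)
    tag_bits = address_bits - index_bits - offset_bits
    return 1 + tag_bits + history_bits


# Hypothetical 1-way, 16-word cache: 4-word lines, 4 sets.
print(tag_entry_bits(32, 4, 4))  # 27 with 32-bit addresses
print(tag_entry_bits(16, 4, 4))  # 11 with 16-bit addresses
```

The two results differ only because the address width differs, which is why the tag size cannot be computed without it.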
The size of each cache way, in bytes, is 4 (bytes per word) * L (line size in words) * S (number of sets).
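The formula translates directly into code, and multiplying by the number of ways gives the total data capacity. The function names are hypothetical and 4-byte words are assumed, as in the formula.

```python
def way_size_bytes(line_words: int, num_sets: int, word_bytes: int = 4) -> int:
    """Size of one way: bytes per word * L (line size in words) * S (sets)."""
    return word_bytes * line_words * num_sets


def cache_data_bytes(ways: int, line_words: int, num_sets: int,
                     word_bytes: int = 4) -> int:
    """Total data capacity (no tags) of a `ways`-way cache."""
    return ways * way_size_bytes(line_words, num_sets, word_bytes)


print(way_size_bytes(2, 8))       # 64: 1-way, 2-word lines, 8 sets
print(cache_data_bytes(2, 2, 4))  # 64: 2-way, 2-word lines, 4 sets
```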
Even if question 2) is about the cache described in question 1), you cannot answer without knowing the number of address bits.