I have found over the years that except in speed-critical or multi-master applications, it's actually easier to bit-bang an I2C master than to try to use the I2C facilities built into many chips.
Note that if a device uses clock stretching, any time you release SCK, you must wait for it to actually go high. For simplicity, such delays are omitted from the following descriptions, but should be included if appropriate in your "release_SCK()" routine.
To start an I2C transaction, release SCK (if it isn't already) and, if the data line is low, assert SCK (drive it low), release SDA (if it isn't already), and release the SCK. Repeat this process up to nine times until SDA is high. If SDA is still low after nine repetitions, the bus is unusable.
To output each byte (including the address byte), assert SDA, and then for each bit repeat the sequence (assert SCK; set SDA high or low to match next bit of data; release SCK) eight times. After the last bit, assert SCK, release SDA, and release SCK. If SDA is low, a slave is acknowledging; if SDA is high, no slave is acknowledging and the transaction should be aborted.
When all output is complete, assert SCK, then SDA, and then release SCK, then SDA.
To input each byte, assert SDA, then release SCK if it isn't already (it will be for the first byte, but not others). Then reassert SCK, release SDA, and repeat the sequence (release SCK, read data bit, assert SCK) eight times. Note that at the end of this sequence, unlike when outputting a byte, SCK will be left asserted.
When all input is complete, release SDA (it should already already be released) and SCK.
Note that because the clock is left asserted after inputting each byte, it's not necessary to specify whether the byte should be ack'ed or nak'ed. If you read another byte, the last byte read will be nak'ed. If you terminate the read, it will be nak'ed.
Start; send address; write one byte, finish
SCK - -__-__-__-__-__-__-__-__-__-- -__-__-__-__-__-__-__-__-__--- -__--
SDA(M) - __777666555444333222111___--- --777666555444333222111000---- --__-
SDA(S) - -------------------------??AA A------------------------??AAA A----
Start; send address; read two bytes; finish
SCK - -__-__-__-__-__-__-__-__-__--- -__--_--_--_--_--_--_--_--__ -__--_--_--_--_--_--_--_--__ _-
SDA(M) - __777666555444333222111------- __-------------------------- __-------------------------- --
SDA(S) - -------------------------??AAA A??77?66?55?44?33?22?11?00?? -??77?66?55?44?33?22?11?00?? ?-
Best Answer
Disregarding start, restart and stop conditions, that transaction is 6 bytes times 9 bits. If the PEC is optional, it's 5 bytes * 9 bits = 45 bits. The start, restart, and stop conditions may not equal a bit in duration, but can be roughly approximated as equaling a bit. So I count 48 bits, and in practice there is some software overhead.
So your estimation of 49 bits is approximately right, you can read a 16-bit word from the chip about 2040 times per second, or about every 0.49 milliseconds.
Keep in mind that this is the theoretical maximum throughput of the bus. Any other things that the software does will slow it down, and also if the slave needs to stretch the clock to slow down the communications.