Cell phone size OLED displays are driven much the same way as cell phone size LCDs. The manufacturers try to keep the interface similar to reduce the engineering effort required to switch technologies, which is also why LCDs, in turn, are driven similarly to old CRT displays. These interfaces (for the smaller displays, anyway) are generally parallel interfaces with 3 clocks. The parallel data bus is as wide as the color depth of the display. A 24 bit display, for example, has a 24 bit wide color bus, with 8 bits each for red, green and blue. The RGB values represent the brightness of each color for an individual pixel, and just about any color can be formed by varying their intensities.

The 3 clocks are the pixel clock, the horizontal clock and the vertical (or frame) clock; in datasheets the latter two usually show up as the HSYNC and VSYNC signals. The pixel clock is the fastest: each tick moves the selected pixel one step horizontally across the screen. The horizontal clock ticks once for every new line, and the vertical clock ticks once for every new frame. So for a 320 x 240 pixel screen, the horizontal clock ticks every 320 pixels and the vertical clock ticks every 240 lines.

That ignores delays. In reality, there are blanking delays at the end of each line and at the end of each frame. In the CRT days, these delays gave the beam time to physically move back to the beginning of the next line, or back up to the corner for a new frame. In the old analog TV days they also provided room to "hide" digital information in the signal (like subtitles and V-chip info). Today they are still handy because they give the display controller idle periods, which lets it share memory bandwidth with the rest of the system (it pulls information from the frame buffer in chunks). You can basically think of the interface as painting one pixel at a time across the display, line by line, until the image is drawn, at which point it starts all over.
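As a rough illustration of the raster scan with blanking, here is a minimal C sketch that counts pixel clock ticks and works out where the "beam" currently is. The porch and sync widths in `example_320x240` are made-up values for illustration only, and all names are hypothetical; real numbers come from the panel's datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Blanking is split into front porch, sync pulse and back porch.
 * Horizontal values are in pixel clocks, vertical values in lines. */
typedef struct {
    uint32_t h_active, h_front, h_sync, h_back;
    uint32_t v_active, v_front, v_sync, v_back;
} panel_timing;

/* Hypothetical timing for a 320 x 240 panel (illustrative values only). */
static const panel_timing example_320x240 = {
    .h_active = 320, .h_front = 20, .h_sync = 30, .h_back = 38,
    .v_active = 240, .v_front = 4,  .v_sync = 3,  .v_back = 15,
};

/* Pixel clocks per line: visible pixels plus horizontal blanking. */
static uint32_t h_total(const panel_timing *t)
{
    return t->h_active + t->h_front + t->h_sync + t->h_back;
}

/* Lines per frame: visible lines plus vertical blanking. */
static uint32_t v_total(const panel_timing *t)
{
    return t->v_active + t->v_front + t->v_sync + t->v_back;
}

/* Pixel clock frequency needed to reach the given refresh rate. */
static uint32_t pixel_clock_hz(const panel_timing *t, uint32_t refresh_hz)
{
    return h_total(t) * v_total(t) * refresh_hz;
}

/* Map the n-th pixel clock tick since the start of a frame to a raster
 * position. Returns true while the position is inside the visible area
 * (the RGB bus carries pixel data) and false during blanking. */
static bool raster_position(const panel_timing *t, uint32_t tick,
                            uint32_t *x, uint32_t *y)
{
    uint32_t ht = h_total(t);
    *x = tick % ht;
    *y = (tick / ht) % v_total(t);
    return *x < t->h_active && *y < t->v_active;
}
```

With these made-up numbers a line takes 408 pixel clocks and a frame takes 262 lines, so 60 Hz needs a pixel clock of 408 x 262 x 60 = 6,413,760 Hz, roughly 39% more than the 4.608 MHz that the visible 320 x 240 pixels alone would need. That overhead is the blanking time.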
The pixels are generally designed to hold their information long enough to last until the next full refresh cycle (which typically happens at 60 Hz or more).
EDIT: Sorry, I thought you were looking for how the interface is driven. The pixels themselves are usually driven directly by a display driver or controller integrated with the display (so the end user usually doesn't need to worry about the implementation details). I'm not an expert here, but a simplified model is to represent each pixel as a diode in parallel with a capacitor. The capacitor is charged to a certain voltage, which dictates the amount of current, which in turn dictates the brightness of the pixel. So an analog 'programming' voltage determines the brightness, and it gets refreshed constantly.
I assume you understand how to create the I2C byte sequence for the SSD1306, but I'll repeat it anyway: The SSD1306 distinguishes between commands (including command parameters) and data (pixel data). With SPI, it uses a dedicated input pin (D/C#) to distinguish commands from data.
With I2C, a control byte is used instead: 0x80 needs to be prepended to each command byte, and 0x40 switches to data mode. Data mode continues until the end of the I2C transaction (indicated by a STOP condition).
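The framing just described can be sketched as two small C helpers that produce the bytes following the slave address in one I2C write. This is a minimal sketch: the function names are hypothetical, and the START condition, slave address and STOP condition are left to whatever I2C driver your platform provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SSD1306_CTRL_CMD  0x80 /* control byte: one command byte follows */
#define SSD1306_CTRL_DATA 0x40 /* control byte: data follows until STOP */

/* Wrap n command bytes for I2C: each command byte gets its own 0x80
 * control byte. Returns the number of bytes written to out, or 0 if out
 * is too small (it needs 2 * n bytes). */
size_t ssd1306_frame_commands(const uint8_t *cmds, size_t n,
                              uint8_t *out, size_t out_cap)
{
    if (out_cap < 2 * n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        out[2 * i] = SSD1306_CTRL_CMD;
        out[2 * i + 1] = cmds[i];
    }
    return 2 * n;
}

/* Prefix pixel data with a single 0x40 control byte; everything after it
 * is treated as data until the STOP condition ends the transaction.
 * Returns bytes written, or 0 if out is too small (needs n + 1 bytes). */
size_t ssd1306_frame_data(const uint8_t *data, size_t n,
                          uint8_t *out, size_t out_cap)
{
    if (out_cap < n + 1)
        return 0;
    out[0] = SSD1306_CTRL_DATA;
    memcpy(out + 1, data, n);
    return n + 1;
}
```

Note the asymmetry: commands need one control byte each, while a single 0x40 covers an arbitrarily long run of pixel data.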
To update a part of the screen, the start address of the top left corner has to be set and then the data can be sent. A valid byte sequence for starting at the coordinates (20, 16) for x and y looks like this:
0x80, 0xb2, // page start address: 0xb0 | (y >> 3)
0x80, 0x04, // lower nibble of column: 0x00 | (x & 0x0f)
0x80, 0x11, // upper nibble of column: 0x10 | ((x >> 4) & 0x0f)
0x40, // switch to data mode
0x01, 0x02, 0x03, 0x04, 0x05, 0x06, ... // pixel data
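To double-check the arithmetic in the comments, here is a small C helper that produces the address part of that sequence. It assumes a 128 x 64 panel in page addressing mode, and the function name is hypothetical. For the example coordinates (20, 16) it yields 0x80, 0xb2, 0x80, 0x04, 0x80, 0x11, since 16 >> 3 = 2 and 20 = 0x14.

```c
#include <stdint.h>

/* Build the six bytes that set the write position in page addressing
 * mode: page start (0xB0..0xB7), then the lower and upper nibble of the
 * column, each behind a 0x80 control byte. x is the column (0..127),
 * y is a pixel row (0..63); the lower 3 bits of y are dropped because
 * the controller only addresses whole pages. */
void ssd1306_position_cmds(uint8_t x, uint8_t y, uint8_t out[6])
{
    out[0] = 0x80;
    out[1] = (uint8_t)(0xB0 | ((y >> 3) & 0x07)); /* page start address */
    out[2] = 0x80;
    out[3] = (uint8_t)(0x00 | (x & 0x0F));        /* lower column nibble */
    out[4] = 0x80;
    out[5] = (uint8_t)(0x10 | ((x >> 4) & 0x0F)); /* upper column nibble */
}
```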
The memory is divided into pages. Each page covers 8 pixel rows. So you can only update stripes of 8 vertical pixels and the stripes must be aligned to multiples of 8. As you can see in the first line of the byte sequence, the lower 3 bits of y are simply discarded.
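Because updates must be page-aligned, a partial update of arbitrary rows has to be widened to whole pages. A minimal sketch of that rounding, assuming `y_first <= y_last` (the names are hypothetical):

```c
#include <stdint.h>

/* Range of pages touched by an update of pixel rows y_first..y_last
 * (inclusive). A page is 8 rows tall and page-aligned, so e.g. rows
 * 10..17 need pages 1 and 2, which rewrites rows 8..23. */
typedef struct {
    uint8_t first_page, last_page;
} page_span;

page_span pages_for_rows(uint8_t y_first, uint8_t y_last)
{
    page_span s = { (uint8_t)(y_first >> 3), (uint8_t)(y_last >> 3) };
    return s;
}

/* First pixel row covered by a page (the row the update really starts at). */
static inline uint8_t page_top_row(uint8_t page)
{
    return (uint8_t)(page * 8);
}
```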
The horizontal start position is provided in two parts: the upper and lower nibble. A nibble is four bits, i.e. half a byte. See lines 2 and 3 of the byte sequence.
The remaining two lines switch to data mode and send the pixel data. With each byte, the address advances by 1, i.e. it advances horizontally from left to right, and each byte written sets a vertical strip of 8 pixels (one column of the current page).
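Since each byte covers 8 vertical pixels, drawing is usually done in a local copy of the display RAM that is sent afterwards. A minimal sketch, assuming a 128 x 64 panel and the default (non-remapped) scan direction, where bit 0 of each byte is the top row of its page; `framebuffer` and `fb_set_pixel` are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OLED_WIDTH  128
#define OLED_HEIGHT 64
#define OLED_PAGES  (OLED_HEIGHT / 8)

/* Local copy of the display RAM, laid out page by page with one byte per
 * column; bit 0 is assumed to be the top row of the page. */
static uint8_t framebuffer[OLED_PAGES * OLED_WIDTH];

void fb_set_pixel(uint8_t x, uint8_t y, bool on)
{
    if (x >= OLED_WIDTH || y >= OLED_HEIGHT)
        return; /* ignore out-of-range pixels */
    size_t offset = (size_t)(y / 8) * OLED_WIDTH + x; /* page, then column */
    uint8_t mask = (uint8_t)(1u << (y % 8));          /* row within page */
    if (on)
        framebuffer[offset] |= mask;
    else
        framebuffer[offset] &= (uint8_t)~mask;
}
```

Changing one pixel therefore changes a whole byte, which is why the controller can only be updated in 8-row stripes.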
With the different addressing modes (commands 0x20 to 0x22), you can determine how the address advances at the end of a page, at the end of your update area, etc. The simplest approach is to write to each page separately and to explicitly set the address at the beginning of each page.
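That simplest approach could look roughly like the following C sketch, which builds one I2C transaction per page in the same shape as the byte sequence above (address commands, then 0x40, then 128 data bytes). It assumes a 128 x 64 panel in page addressing mode; `i2c_write_fn` is a hypothetical hook for your platform's I2C driver, and the other names are hypothetical too.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OLED_WIDTH   128
#define OLED_PAGES   8
#define PAGE_MSG_LEN (6 + 1 + OLED_WIDTH) /* address cmds + 0x40 + data */

/* Hypothetical hook: performs one complete I2C write transaction
 * (START, slave address, the given bytes, STOP) on your platform. */
typedef void (*i2c_write_fn)(const uint8_t *bytes, size_t len);

/* Build the transaction for one page: set page `page` and column 0,
 * then switch to data mode and append that page's 128 bytes. */
void ssd1306_build_page(const uint8_t *fb, uint8_t page,
                        uint8_t msg[PAGE_MSG_LEN])
{
    msg[0] = 0x80; msg[1] = (uint8_t)(0xB0 | (page & 0x07)); /* page */
    msg[2] = 0x80; msg[3] = 0x00; /* column lower nibble = 0 */
    msg[4] = 0x80; msg[5] = 0x10; /* column upper nibble = 0 */
    msg[6] = 0x40;                /* data until STOP */
    memcpy(msg + 7, fb + (size_t)page * OLED_WIDTH, OLED_WIDTH);
}

/* Full refresh: the address is set explicitly at the start of every page,
 * so it does not matter how the address wraps at the end of a page. */
void ssd1306_flush(const uint8_t *fb, i2c_write_fn write)
{
    uint8_t msg[PAGE_MSG_LEN];
    for (uint8_t page = 0; page < OLED_PAGES; page++) {
        ssd1306_build_page(fb, page, msg);
        write(msg, sizeof msg);
    }
}
```

Because it only relies on page addressing and explicit address commands, this should also be a reasonable fit for controllers with limited addressing mode support.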
Note that there are clones of the SSD1306 chips that do not support the different addressing modes.
Driving OLED displays is easy, but you need to provide two voltages:
For inspiration on the supply, here is a breakout board with your display, and also its schematic.