Pipelining is the technique that allows one instruction to complete on each clock cycle. On a non-pipelined processor, executing three instructions takes nine clock cycles, because each instruction needs its own fetch, decode and execute cycles. On a pipelined processor, where the fetch, decode and execute operations are performed in parallel, only five cycles are required to execute the same three instructions.

The first instruction requires three cycles to complete. Additional instructions then complete at a rate of one per cycle. A long sequence of instructions, say 1000 of them, might require 3000 clock cycles on a non-pipelined processor but only 1002 clock cycles on a pipelined processor. Pipelining therefore results in a tremendous performance gain. In the 8086, pipelining is implemented by providing a 6-byte instruction queue, in which as many as six one-byte instructions can be stored; instructions are then passed from the queue one by one for decoding and execution.
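The cycle counts quoted above follow from a simple idealized model, sketched here in Python. It assumes three stages (fetch, decode, execute), one clock cycle per stage, and no stalls; the function names are illustrative only, and real 8086 instruction timings vary from this simplified picture.

```python
def non_pipelined_cycles(n_instructions, n_stages=3):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=3):
    """The first instruction takes n_stages cycles to fill the pipeline;
    every later instruction completes one cycle after the previous one.
    Assumes an ideal pipeline with no stalls."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


for n in (3, 1000):
    print(n, "instructions:",
          non_pipelined_cycles(n), "cycles non-pipelined,",
          pipelined_cycles(n), "cycles pipelined")
```

For long sequences the ratio between the two counts approaches the number of stages, which is where the performance gain comes from.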

So, while executing the first instruction in the queue, the processor decodes the second instruction and fetches the third instruction from memory. In this way, the 8086 performs the fetch, decode and execute operations in parallel.
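To make the overlap concrete, the following minimal sketch prints which instruction occupies each stage on every clock cycle. It is an idealized three-stage model that assumes one cycle per stage and no stalls; the names `STAGES` and `pipeline_schedule` are hypothetical and chosen only for this illustration.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(n_instructions):
    """Return one dict per clock cycle, mapping each stage name to the
    1-based number of the instruction in that stage (None if idle)."""
    if n_instructions == 0:
        return []
    n_cycles = n_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(1, n_cycles + 1):
        row = {}
        for offset, stage in enumerate(STAGES):
            # Instruction i is fetched in cycle i, decoded in cycle i + 1,
            # and executed in cycle i + 2.
            instr = cycle - offset
            row[stage] = instr if 1 <= instr <= n_instructions else None
        schedule.append(row)
    return schedule


for cycle, row in enumerate(pipeline_schedule(3), start=1):
    print("cycle", cycle, row)
```

In the third cycle of this model, instruction 1 is in the execute stage, instruction 2 is in decode and instruction 3 is in fetch, which is the situation described above.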