Forked from Reform / reform-boundary-uboot
    Davinci: SPI performance enhancements · 77436d66
    Nick Thompson authored
    
    The following restructuring and optimisations increase the SPI
    read performance from 1.3 MiB/s (on da850) to 2.87 MiB/s (on da830):
    
    Remove continual re-evaluation of driver state from the core of the
    copy loop. State cannot change during the copy loop, so these
    evaluations can be moved to before the copy loop.
    
    The cost is more code space, as a loop variant is required for each
    possible configuration. The loops are simpler, however, so the
    extra is only 128 bytes on da830 with CONFIG_SPI_HALF_DUPLEX
    defined.
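
A minimal sketch of the idea, not the driver code itself. All names
here (fake_exchange, xfer_naive, xfer_hoisted and the three loop
variants) are hypothetical, and the "device" is a stand-in function
that returns the inverted TX byte. The direction of the transfer is
decided once, then a specialised loop runs with no per-byte tests:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one SPI byte exchange: returns tx inverted. */
static uint8_t fake_exchange(uint8_t tx)
{
    return (uint8_t)~tx;
}

/* Before: the buffers are re-tested for every byte although they cannot change. */
static void xfer_naive(const uint8_t *tx, uint8_t *rx, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t in = fake_exchange(tx ? tx[i] : 0);
        if (rx)
            rx[i] = in;
    }
}

/* After: one simple loop per configuration. */
static void read_only(uint8_t *rx, size_t len)
{
    while (len--)
        *rx++ = fake_exchange(0);
}

static void write_only(const uint8_t *tx, size_t len)
{
    while (len--)
        (void)fake_exchange(*tx++);
}

static void read_write(const uint8_t *tx, uint8_t *rx, size_t len)
{
    while (len--)
        *rx++ = fake_exchange(*tx++);
}

/* The configuration is evaluated once, before any copy loop starts. */
static void xfer_hoisted(const uint8_t *tx, uint8_t *rx, size_t len)
{
    assert(tx || rx);
    if (!tx)
        read_only(rx, len);
    else if (!rx)
        write_only(tx, len);
    else
        read_write(tx, rx, len);
}
```

The three small loops are where the extra code space goes; each one
is shorter than the general loop it replaces.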
    
    Unrolling the first copy loop iteration allows the TX buffer to be
    pre-loaded, reducing SPI clock starvation.
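
A toy model of why pre-loading helps, under stated assumptions: the
hypothetical fake_spi below has a shifter with a one-byte TX buffer
behind it, and counts how often a byte finishes with nothing queued
(clock_stops). The data value is a dummy. None of these names come
from the driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a shifter plus a one-byte TX buffer behind it. */
struct fake_spi {
    int shifting;         /* a byte is currently being clocked */
    int tx_full;          /* a second byte is queued in the TX buffer */
    unsigned clock_stops; /* bytes that finished with nothing queued behind */
};

static void dev_write(struct fake_spi *d)
{
    if (d->shifting)
        d->tx_full = 1;   /* queue behind the byte in flight */
    else
        d->shifting = 1;  /* bus idle: start clocking immediately */
}

/* Wait for the byte in flight to finish, then return dummy data. */
static uint8_t dev_read(struct fake_spi *d)
{
    assert(d->shifting);
    if (d->tx_full) {
        d->tx_full = 0;   /* queued byte moves straight into the shifter */
    } else {
        d->shifting = 0;  /* nothing queued: the clock stops */
        d->clock_stops++;
    }
    return 0xA5;
}

/* Straightforward loop: the clock stops after every byte. */
static void read_simple(struct fake_spi *d, uint8_t *rx, size_t len)
{
    while (len--) {
        dev_write(d);
        *rx++ = dev_read(d);
    }
}

/* First write unrolled: a byte is always queued while another is clocked. */
static void read_preloaded(struct fake_spi *d, uint8_t *rx, size_t len)
{
    assert(len > 0);
    dev_write(d);             /* pre-load: primes the shifter */
    while (len-- > 1) {
        dev_write(d);         /* queue the next byte behind it */
        *rx++ = dev_read(d);  /* collect the byte that just finished */
    }
    *rx = dev_read(d);        /* final byte, nothing left to queue */
}
```

In this model a four-byte read stops the clock four times with the
simple loop and once (after the final byte) with the pre-loaded one.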
    
    Unrolling the last copy loop iteration removes the test for the
    final iteration that was otherwise made on every pass of the loop.
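
A sketch of the same transformation, assuming the final byte needs a
different control word (here, releasing a chip-select hold bit at
the end of a transfer). FAKE_CS_HOLD, XFER_END and the logging fake
device are hypothetical and only illustrate the loop shape:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_CS_HOLD (1u << 8) /* hypothetical "keep chip select asserted" bit */
#define XFER_END     (1u << 0) /* hypothetical "end of transfer" flag */

static uint32_t log_words[16]; /* control words "sent" to the fake device */
static size_t log_count;

/* Hypothetical byte exchange: records the control word, returns a counter. */
static uint8_t fake_exchange(uint32_t word)
{
    assert(log_count < 16);
    log_words[log_count++] = word;
    return (uint8_t)log_count;
}

/* Before: every pass asks whether this is the final byte. */
static void read_checked(uint8_t *rx, size_t len, unsigned flags)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t word = FAKE_CS_HOLD;

        if (i == len - 1 && (flags & XFER_END))
            word &= ~FAKE_CS_HOLD;
        rx[i] = fake_exchange(word);
    }
}

/* After: the final byte is handled once, outside the loop. */
static void read_unrolled(uint8_t *rx, size_t len, unsigned flags)
{
    uint32_t word = FAKE_CS_HOLD;

    assert(len > 0);
    while (len-- > 1)
        *rx++ = fake_exchange(word);

    if (flags & XFER_END)
        word &= ~FAKE_CS_HOLD; /* release the hold for the last byte only */
    *rx = fake_exchange(word);
}
```

Both versions send the same sequence of control words; the unrolled
one evaluates the end-of-transfer condition once instead of len
times.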
    
    Using the RX buffer empty flag as a transfer throttle allows the
    assumption that it is always safe to write to the TX buffer, so
    polling of the TX buffer full flag can be removed.
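
A sketch of the throttle argument on a hypothetical loopback model.
It assumes the device can hold two bytes (one being clocked, one in
the TX buffer) and counts any write that finds it full as an
overrun. The depth, the delay of three polls and all names are
assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FAKE_DEPTH = 2 }; /* assumed: one byte in the shifter, one in TX */

struct fake_spi {
    uint8_t slot[FAKE_DEPTH]; /* bytes written but not yet read back */
    unsigned head, count;
    unsigned delay;           /* polls left until the oldest byte arrives */
    unsigned overruns;        /* writes that found the TX side full */
};

static void dev_write(struct fake_spi *d, uint8_t out)
{
    if (d->count == FAKE_DEPTH) {
        d->overruns++;        /* what a TX-full poll would have prevented */
        return;
    }
    if (d->count == 0)
        d->delay = 3;         /* idle bus: the byte needs time to clock */
    d->slot[(d->head + d->count) % FAKE_DEPTH] = out;
    d->count++;
}

static int dev_rx_empty(struct fake_spi *d)
{
    if (d->count == 0)
        return 1;
    if (d->delay > 0) {
        d->delay--;
        return 1;
    }
    return 0;
}

static uint8_t dev_read(struct fake_spi *d)
{
    uint8_t in;

    assert(d->count > 0);
    in = d->slot[d->head];    /* loopback: receive what was sent */
    d->head = (d->head + 1) % FAKE_DEPTH;
    d->count--;
    d->delay = 3;
    return in;
}

/* Write without checking TX full, then throttle on RX empty. */
static uint8_t xfer_byte(struct fake_spi *d, uint8_t out)
{
    dev_write(d, out);
    while (dev_rx_empty(d))
        ;
    return dev_read(d);
}

static void loopback(struct fake_spi *d, const uint8_t *tx, uint8_t *rx,
                     size_t len)
{
    assert(len > 0);
    dev_write(d, *tx++);      /* pre-load */
    while (len-- > 1)
        *rx++ = xfer_byte(d, *tx++);
    while (dev_rx_empty(d))
        ;
    *rx = dev_read(d);
}
```

Every write after the pre-load follows a completed read, so at most
two bytes are ever outstanding and the overrun counter stays at
zero without the TX side ever being polled.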
    
    Signed-off-by: Nick Thompson <nick.thompson@ge.com>
    Signed-off-by: Sandeep Paulraj <s-paulraj@ti.com>