Abstract: The advent of vision-language pre-training techniques has driven substantial progress in the development of models for image captioning. However, these models frequently produce generic ...