InstructBLIP model using Flan-T5-xl as the language model. InstructBLIP was introduced in the paper "InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning" by Dai et al.
Disclaimer: The team releasing InstructBLIP did not write a model card for this model, so this model card was written by the Hugging Face team.
Model description
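InstructBLIP is a visual instruction-tuned version of BLIP-2: it pairs a vision encoder with a language model (here Flan-T5-xl) and is trained to follow natural-language instructions about images. Refer to the paper for architecture and training details.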
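As a rough illustration of what instruction-following on images looks like in practice, the sketch below wraps the model in a single helper using the transformers library. It is a minimal example under stated assumptions, not an official recipe: the checkpoint name Salesforce/instructblip-flan-t5-xl, the helper name answer_about_image, and the default of 64 new tokens are all assumptions introduced here for illustration.

```python
# Assumed Hub checkpoint name; not stated in this model card.
MODEL_ID = "Salesforce/instructblip-flan-t5-xl"


def answer_about_image(image, prompt, model_id=MODEL_ID, max_new_tokens=64):
    """Return the model's text response to `prompt` for a PIL image.

    Hypothetical helper for illustration only.
    """
    # Heavy dependencies are imported lazily so that importing this
    # module does not require torch or transformers to be installed.
    import torch
    from transformers import (
        InstructBlipForConditionalGeneration,
        InstructBlipProcessor,
    )

    # The processor handles both image preprocessing and tokenization.
    processor = InstructBlipProcessor.from_pretrained(model_id)
    model = InstructBlipForConditionalGeneration.from_pretrained(model_id)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()

    # Encode the image together with the instruction text.
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)

    # Generate without tracking gradients, then decode to plain text.
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

To try it, load an RGB image with Pillow (for example Image.open("example.jpg").convert("RGB"), where example.jpg is a hypothetical local file) and call answer_about_image(image, "What is unusual about this image?"). The first call downloads the checkpoint, which is several gigabytes, so a GPU and a warm cache are advisable.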
