
[Paper Quick Look] Open-vocabulary Object Segmentation with Diffusion Models [2301.05221]


2023-08-18 13:40:00

Reproduction without the author's permission is prohibited.


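Paper title: Guiding Text-to-Image Diffusion Model Towards Grounded Generation / Open-vocabulary Object Segmentation with Diffusion Models
Paper: http://arxiv.org/abs/2301.05221
Project page: https://lipurple.github.io/Grounded_Diffusion/
* This video only aims to point out that the paper exists and to recommend it to anyone interested. It is not a detailed walkthrough. Because of the uploader's limited ability, the narration often mixes Chinese and English and uses broken English, so please bear with it. If anything about the paper is misreported, feel free to call it out.
** To recommend new papers or look up past ones, you are welcome to edit this document: https://docs.qq.com/sheet/DSUdOTG9xWUdydVB6
*** Slides are uploaded every 1-2 months to the link in the pinned post.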

秋刀鱼的炼丹工坊

Diffusion model

Computer Vision


General Knowledge

Robotics

Representation Learning

Multimodal / Language Models

New Models

Content Generation

Data Augmentation

一包薯条

[Paper Quick Look] Socratic Models: Zero-Shot Multimodal Reasoning [2204.00598]

11:32

[Paper Brief Analysis] Large Language Models as General Pattern Machines [2307.04721]

15:46

[Paper Quick Look] LaFTer: Label-Free Tuning of Zero-shot Classifier... [2305.18287]

06:52

[Paper Quick Look] OWL-ViT: Simple Open-Vocabulary Object Detection with ViT [2205.06230]

07:45

[Paper Brief Analysis] GroupViT: Semantic Segmentation Emerges from Text Supervision [2202.11094]

11:09

[Paper Brief Analysis] VATT: Video-Audio-Text Transformer [2104.11178]

11:59

[Paper Brief Analysis] End-to-End Video-Language Transformers..Masked Visual-token.. [2111.12681]

09:09

[Paper Brief Analysis] TAN: Temporal Alignment Networks for Long-term Video [2204.02968]

16:21

[Paper Brief Analysis] Improving fine-grained understanding in image-text pre-training [2401.0986]

17:36

[Paper Brief Analysis] End-to-End Learning... from Uncurated Instructional Videos [1912.06430]

08:59

[Paper Brief Analysis] Region-Aware Pretraining for Open-Vocab. Object Det. w/ ViT [2305.07011]

08:11

[Paper Brief Analysis] Toolformer: Language Models Can Teach Themselves to Use Tools [2302.04761]

13:09

[Paper Quick Look] ViperGPT: Visual Inference via Python Execution for Reasoning [2303.08128]

05:59

[Paper Brief Analysis] VoxPoser: Composable 3D Value Maps for Robotic... [2307.05973]

13:04

[Paper Quick Look] LLaMA-Adapter: Efficient Fine-tuning..Zero-init Attention [2303.16199]

04:22

[Paper Quick Look] Guiding Text-to-Image Diffusion Model Towards Grounded Gen.. [2301.05221]

09:20

[Paper Brief Analysis] Open Vocab. Semantic Seg. with Patch Aligned Contrastive... [2212.04994]

04:32

[Paper Brief Analysis] Representation Learning via Global Temporal Alignment and ... [2105.05217]

14:30

[Paper Brief Analysis] LLaVA: Visual Instruction Tuning [2304.08485]

09:33

[Paper Quick Look] LongLoRA: Efficient Fine-tuning of Long-Context LLMs [2309.12307]

05:04

[Paper Brief Analysis] CLIP Dense Inference Yields Open-Vocab ... For-Free [2309.14289]

05:51

[Paper Quick Look] Ferret: Refer and Ground Anything Anywhere at Any Granularity [2310.07704]

09:43

[Paper Quick Look] VLMs are Zero-Shot Reward Models for RL [2310.12921]

04:42

[Paper Quick Look] CRG: Improving Grounding in VLM w/o training [2403.02325]

05:24

[Paper Quick Look] A Simple LLM Framework for Long-Range Video Question-Answering [2312.17235]

12:22

[Paper Quick Look] Align before Fuse / ALBEF: ... [2107.07651]

13:16

[Paper Quick Look] Bootstrapping Language-Image Pre-training... [2201.12086]

11:26

[Paper Quick Look] BLIP-2 ...with Frozen Image Encoders and Large Language Models [2301.12597]

13:43

[Paper Quick Look] Ferret-v2: An Improved...for Referring and Grounding with LLMs [2404.07973]

09:30
