[Paper Overview] Open-vocabulary Object Segmentation with Diffusion Models[2301.05221]

2023-08-18 13:40:00
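Paper title: Guiding Text-to-Image Diffusion Model Towards Grounded Generation / Open-vocabulary Object Segmentation with Diffusion Models
Paper: http://arxiv.org/abs/2301.05221
Project page: https://lipurple.github.io/Grounded_Diffusion/
* This video aims to let you know the paper exists and to recommend it to interested readers; it is not a detailed walkthrough. Owing to the uploader's limitations, mixed Chinese and English and broken English show up often; please bear with it. If the coverage of the paper gets something wrong, feel free to call it out.
** To recommend new papers or look up previously covered ones, you are welcome to edit this document: https://docs.qq.com/sheet/DSUdOTG9xWUdydVB6
*** Slides are uploaded every 1-2 months to the link in the pinned post.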
This piece is so profound that I honestly do not understand it.
[Paper Overview] Socratic Models: Zero-Shot Multimodal Reasoning[2204.00598]
11:32
[Paper Brief] Large Language Models as General Pattern Machines[2307.04721]
15:46
[Paper Overview] LaFTer: Label-Free Tuning of Zero-shot Classifier...[2305.18287]
06:52
[Paper Overview] OWL-ViT: Simple Open-Vocabulary Object Detection with ViT[2205.06230]
07:45
[Paper Brief] GroupViT: Semantic Segmentation Emerges from Text Supervision[2202.11094]
11:09
[Paper Brief] VATT: Video-Audio-Text Transformer[2104.11178]
11:59
[Paper Brief] End-to-End Video-Language Transformers..Masked Visual-token..[2111.12681]
09:09
[Paper Brief] TAN: Temporal Alignment Networks for Long-term Video[2204.02968]
16:21
[Paper Brief] Improving fine-grained understanding in image-text pre-training[2401.0986]
17:36
[Paper Brief] End-to-End Learning... from Uncurated Instructional Videos[1912.06430]
08:59
[Paper Brief] Region-Aware Pretraining for Open-Vocab. Object Det. w/ ViT[2305.07011]
08:11
[Paper Brief] Toolformer: Language Models Can Teach Themselves to Use Tools[2302.04761]
13:09
[Paper Overview] ViperGPT: Visual Inference via Python Execution for Reasoning[2303.08128]
05:59
[Paper Brief] VoxPoser: Composable 3D Value Maps for Robotic...[2307.05973]
13:04
[Paper Overview] LLaMA-Adapter: Efficient Fine-tuning..Zero-init Attention[2303.16199]
04:22
[Paper Overview] Guiding Text-to-Image Diffusion Model Towards Grounded Gen..[2301.05221]
09:20
[Paper Brief] Open Vocab. Semantic Seg. with Patch Aligned Contrastive...[2212.04994]
04:32
[Paper Brief] Representation Learning via Global Temporal Alignment and ...[2105.05217]
14:30
[Paper Brief] LLaVA: Visual Instruction Tuning[2304.08485]
09:33
[Paper Overview] LongLoRA: Efficient Fine-tuning of Long-Context LLMs[2309.12307]
05:04
[Paper Brief] CLIP Dense Inference Yields Open-Vocab ... For-Free[2309.14289]
05:51
[Paper Overview] Ferret: Refer and Ground Anything Anywhere at Any Granularity[2310.07704]
09:43
[Paper Overview] VLMs are Zero-Shot Reward Models for RL[2310.12921]
04:42
[Paper Overview] CRG: Improving Grounding in VLM w/o training[2403.02325]
05:24
[Paper Overview] A Simple LLM Framework for Long-Range Video Question-Answering[2312.17235]
12:22
[Paper Overview] Align before Fuse / ALBEF: ...[2107.07651]
13:16
[Paper Overview] Bootstrapping Language-Image Pre-training...[2201.12086]
11:26
[Paper Overview] BLIP-2 ...with Frozen Image Encoders and Large Language Models[2301.12597]
13:43
[Paper Overview] Ferret-v2: An Improved...for Referring and Grounding with LLMs[2404.07973]
09:30