
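Categories:
Large language models (LLM)
Vision models (VLM)
Diffusion models
Vision-and-language navigation (VLN)
Reinforcement learning (RL)
Imitation learning (IL)
Robotics
Open vocabulary, detection and segmentation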
Authors: Vassil Atanassov, Jiatao Ding, Jens Kober
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2401.16337v1
Project: https://youtu.be/nRaMCrwU5X8
Abstract: Deep reinforcement learning (DRL) has emerged as a promising solution to mastering explosive and versatile quadrupedal jumping skills. However, current DRL-based frameworks usually rely on well-defined reference trajectories, which are obtained by capturing animal motions or transferring experience from existing controllers. This work explores the possibility of learning dynamic jumping without imitating a reference trajectory. To this end, we incorporate a curriculum design into DRL so as to accomplish challenging tasks progressively. Starting from a vertical in-place jump, we then generalize the learned policy to forward and diagonal jumps and, finally, learn to jump across obstacles. Conditioned on the desired landing location, orientation, and obstacle dimensions, the proposed approach contributes to a wide range of jumping motions, including omnidirectional jumping and robust jumping, alleviating the effort to extract references in advance. Particularly, without constraints from the reference motion, a 90 cm forward jump is achieved, exceeding previous records for similar robots reported in the existing literature. Additionally, continuous jumping on the soft grassy floor is accomplished, even when it is not encountered in the training stage. A supplementary video showing our results can be found at https://youtu.be/nRaMCrwU5X8.
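The staged curriculum described in this abstract (vertical in-place jump, then forward and diagonal jumps, then obstacles) can be illustrated with a minimal Python sketch. The promotion rule, the `threshold` value, and the names `STAGES` and `next_stage` are assumptions made for illustration; the abstract does not say how the authors decide when to advance.

```python
# Stage names follow the order given in the abstract.
STAGES = [
    "vertical in-place jump",
    "forward and diagonal jumps",
    "jumps across obstacles",
]

def next_stage(stage, recent_successes, threshold=0.8):
    """Return the curriculum stage index to train on next.

    `recent_successes` is a list of 0/1 outcomes from recent episodes.
    Advancing on a success-rate threshold is a hypothetical rule.
    """
    if not recent_successes:
        return stage  # no evidence yet, stay on the current stage
    rate = sum(recent_successes) / len(recent_successes)
    if rate >= threshold and stage < len(STAGES) - 1:
        return stage + 1  # task is mastered, move to the harder one
    return stage
```

For example, `next_stage(0, [1, 1, 1, 1, 0])` moves from the vertical jump to the forward and diagonal jumps, while a policy that is already on the last stage stays there.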
Authors: Jesse Zhang, Karl Pertsch, Jiahui Zhang
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2306.11886v3
Project: https://clvrai.com/sprint
Abstract: Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior works have defined pre-training tasks via natural language instructions, but doing so requires tedious human annotation of hundreds of thousands of instructions. Thus, we propose SPRINT, a scalable offline policy pre-training approach which substantially reduces the human effort needed for pre-training a diverse set of skills. Our method uses two core ideas to automatically expand a base set of pre-training tasks: instruction relabeling via large language models and cross-trajectory skill chaining through offline reinforcement learning. As a result, SPRINT pre-training equips robots with a much richer repertoire of skills. Experimental results in a household simulator and on a real robot kitchen manipulation task show that SPRINT leads to substantially faster learning of new long-horizon tasks than previous pre-training approaches. Website at https://clvrai.com/sprint.
Authors: Jianlan Luo, Zheyuan Hu, Charles Xu
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2401.16013v1
Project: https://serl-robot.github.io/
Abstract: In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that the particular implementation details of these algorithms are often just as important (if not more so) for performance as the choice of algorithm. We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods. To address this challenge, we developed a carefully implemented library containing a sample-efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment, a high-quality controller for a widely adopted robot, and a number of challenging example tasks. We provide this library as a resource for the community, describe its design choices, and present experimental results. Perhaps surprisingly, we find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation in 25 to 50 minutes of training per policy on average, improving over state-of-the-art results reported for similar tasks in the literature. These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors. We hope that these promising results and our high-quality open-source implementation will provide a tool for the robotics community to facilitate further developments in robotic RL. Our code, documentation, and videos can be found at https://serl-robot.github.io/.
Authors: Xinran Li, Jun Zhang
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2312.15600v2
GitHub: https://github.com/LXXXXR/CACOM
Abstract: Effective communication protocols in multi-agent reinforcement learning (MARL) are critical to fostering cooperation and enhancing team performance. To leverage communication, many previous works have proposed to compress local information into a single message and broadcast it to all reachable agents. This simplistic messaging mechanism, however, may fail to provide adequate, critical, and relevant information to individual agents, especially in severely bandwidth-limited scenarios. This motivates us to develop context-aware communication schemes for MARL, aiming to deliver personalized messages to different agents. Our communication protocol, named CACOM, consists of two stages. In the first stage, agents exchange coarse representations in a broadcast fashion, providing context for the second stage. Following this, agents utilize attention mechanisms in the second stage to selectively generate messages personalized for the receivers. Furthermore, we employ the learned step size quantization (LSQ) technique for message quantization to reduce the communication overhead. To evaluate the effectiveness of CACOM, we integrate it with both actor-critic and value-based MARL algorithms. Empirical results on cooperative benchmark tasks demonstrate that CACOM provides evident performance gains over baselines under communication-constrained scenarios. The code is publicly available at https://github.com/LXXXXR/CACOM.
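To make the message quantization step concrete, here is a minimal Python sketch of the forward pass of a step-size quantizer in the style of LSQ: scale by the step size, clip to the representable integer range, round, and rescale. It is not taken from the CACOM code. In LSQ the step size is a trainable parameter updated through a straight-through gradient estimator, which this sketch omits; the function name `lsq_quantize` and the default bit width are assumptions.

```python
def lsq_quantize(values, step, bits=4, signed=True):
    """Quantize a message vector to `bits` bits with step size `step`.

    Forward pass only: learning `step` by gradient descent is not shown.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if signed:
        q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        q_min, q_max = 0, 2 ** bits - 1
    quantized = []
    for v in values:
        # Clip to the integer range, then round to the nearest level.
        # Python's round() sends exact halves to the nearest even integer.
        level = round(min(max(v / step, q_min), q_max))
        quantized.append(level * step)  # rescale to the original range
    return quantized
```

With `step=0.25` and `bits=3`, the vector `[0.26, -0.9, 5.0]` maps to `[0.25, -1.0, 0.75]`: the last entry saturates at the top level, which is the price paid for sending only 3 bits per element.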
Authors: Suning Huang, Boyuan Chen, Huazhe Xu
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2401.13231v2
Project: https://dittogym.github.io
Abstract: Robot co-design, where the morphology of a robot is optimized jointly with a learned policy to solve a specific task, is an emerging area of research. It holds particular promise for soft robots, which are amenable to novel manufacturing techniques that can realize learned morphologies and actuators. Inspired by nature and recent novel robot designs, we propose to go a step further and explore novel reconfigurable robots, defined as robots that can change their morphology within their lifetime. We formalize control of reconfigurable soft robots as a high-dimensional reinforcement learning (RL) problem. We unify morphology change, locomotion, and environment interaction in the same action space, and introduce an appropriate, coarse-to-fine curriculum that enables us to discover policies that accomplish fine-grained control of the resulting robots. We also introduce DittoGym, a comprehensive RL benchmark for reconfigurable soft robots that require fine-grained morphology changes to accomplish the tasks. Finally, we evaluate our proposed coarse-to-fine algorithm on DittoGym and demonstrate robots that learn to change their morphology several times within a sequence, uniquely enabled by our RL algorithm. More results are available at https://dittogym.github.io.
Authors: Kavya Puthuveetil, Sasha Wald, Atharva Pusalkar
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2304.04822v2
Abstract: Robotic caregivers could potentially improve the quality of life of many who require physical assistance. However, in order to assist individuals who are lying in bed, robots must be capable of dealing with a significant obstacle: the blanket or sheet that will almost always cover the person's body. We propose a method for targeted bedding manipulation over people lying supine in bed where we first learn a model of the cloth's dynamics. Then, we optimize over this model to uncover a given target limb using information about human body shape and pose that only needs to be provided at run-time. We show how this approach enables greater robustness to variation relative to geometric and reinforcement learning baselines via a number of generalization evaluations in simulation and in the real world. We further evaluate our approach in a human study with 12 participants where we demonstrate that a mobile manipulator can adapt to real variation in human body shape, size, pose, and blanket configuration to uncover target body parts without exposing the rest of the body. Source code and supplementary materials are available online.
Authors: Gagan Khandate, Tristan L. Saidi, Siqi Shang
PubTime: 2024-01-27
Downlink: http://arxiv.org/abs/2401.15484v1
Project: https://sbrl.cs.columbia.edu
Abstract: We present a method for enabling Reinforcement Learning of motor control policies for complex skills such as dexterous manipulation. We posit that a key difficulty for training such policies is the difficulty of exploring the problem state space, as the accessible and useful regions of this space form a complex structure along manifolds of the original high-dimensional state space. This work presents a method to enable and support exploration with Sampling-based Planning. We use a generally applicable non-holonomic Rapidly-exploring Random Trees algorithm and present multiple methods to use the resulting structure to bootstrap model-free Reinforcement Learning. Our method is effective at learning various challenging dexterous motor control skills of higher difficulty than previously shown. In particular, we achieve dexterous in-hand manipulation of complex objects while simultaneously securing the object without the use of passive support surfaces. These policies also transfer effectively to real robots. A number of example videos can also be found on the project website: https://sbrl.cs.columbia.edu
Authors: Federico Malato, Florian Leopold, Andrew Melnik
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2401.16398v1
Abstract: Behavioral cloning uses a dataset of demonstrations to learn a policy. To overcome computationally expensive training procedures and address the policy adaptation problem, we propose to use latent spaces of pre-trained foundation models to index a demonstration dataset, instantly access similar relevant experiences, and copy behavior from these situations. Actions from a selected similar situation can be performed by the agent until representations of the agent's current situation and the selected experience diverge in the latent space. Thus, we formulate our control problem as a dynamic search problem over a dataset of experts' demonstrations. We test our approach on the BASALT MineRL dataset in the latent representation of a Video Pre-Training model. We compare our model to state-of-the-art, Imitation Learning-based Minecraft agents. Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment in a wide variety of scenarios. Experimental results reveal that our search-based approach clearly outperforms learning-based models in terms of accuracy and perceptual evaluation.
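The search-and-copy loop in this abstract can be sketched in a few lines of Python: find the stored situation whose latent vector is closest to the current one, replay that demonstration's actions, and search again once the two representations diverge. This is a toy illustration, not the authors' implementation. The data layout (a list of trajectories, each a list of `(latent, action)` pairs), the Euclidean distance, and the `threshold` parameter are all assumptions.

```python
import math

def l2(a, b):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(latent, dataset):
    """Return (trajectory index, step index) of the closest stored latent."""
    best = None
    for ti, trajectory in enumerate(dataset):
        for si, (z, _action) in enumerate(trajectory):
            d = l2(latent, z)
            if best is None or d < best[0]:
                best = (d, ti, si)
    return best[1], best[2]

def act(latent, dataset, cursor, threshold):
    """Pick an action and the cursor to use at the next step.

    Keeps following the current demonstration while it stays within
    `threshold` of the agent's latent; otherwise runs a new search.
    """
    if cursor is not None:
        ti, si = cursor
        if si < len(dataset[ti]) and l2(latent, dataset[ti][si][0]) <= threshold:
            return dataset[ti][si][1], (ti, si + 1)
    ti, si = nearest(latent, dataset)  # diverged or exhausted: search again
    return dataset[ti][si][1], (ti, si + 1)
```

A brute-force scan is used here for clarity; with a large demonstration set an approximate nearest-neighbor index would be the natural replacement for `nearest`.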
Authors: Zuojin Tang, Xiaoyu Chen, YongQiang Li
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2401.11792v4
Abstract: An intelligent driving system should be capable of dynamically formulating appropriate driving strategies based on the current environment and vehicle status, while ensuring the security and reliability of the system. However, existing methods based on reinforcement learning and imitation learning suffer from low safety, poor generalization, and inefficient sampling. Additionally, they cannot accurately predict future driving trajectories, and the accurate prediction of future driving trajectories is a precondition for making optimal decisions. To solve these problems, in this paper, we introduce a Safe and Generalized end-to-end Autonomous Driving System (SGADS) for complex and varied scenarios. Our SGADS incorporates variational inference with normalizing flows, enabling the intelligent vehicle to accurately predict future driving trajectories. Moreover, we propose the formulation of robust safety constraints. Furthermore, we combine reinforcement learning with demonstrations to augment the search process of the agent. The experimental results demonstrate that our SGADS can significantly improve safety performance, exhibit strong generalization, and enhance the training efficiency of intelligent vehicles in complex urban scenarios compared to existing methods.
Authors: Zhiyu Huang, Chen Tang, Chen Lv
PubTime: 2024-01-27
Downlink: http://arxiv.org/abs/2401.15315v1
Abstract: Effective decision-making in autonomous driving relies on accurate inference of other traffic agents' future behaviors. To achieve this, we propose an online learning-based behavior prediction model and an efficient planner for Partially Observable Markov Decision Processes (POMDPs). We develop a learning-based prediction model, enhanced with a recurrent neural memory network, to dynamically update latent belief states and infer the intentions of other agents. The model can also integrate the ego vehicle's intentions to reflect closed-loop interactions among agents, and it learns from both offline data and online interactions. For planning, we employ an option-based Monte-Carlo Tree Search (MCTS) planner, which reduces computational complexity by searching over action sequences. Inside the MCTS planner, we use predicted long-term multi-modal trajectories to approximate future updates, which eliminates iterative belief updating and improves the running efficiency. Our approach also incorporates deep Q-learning (DQN) as a search prior, which significantly improves the performance of the MCTS planner. Experimental results from simulated environments validate the effectiveness of our proposed method. The online belief update model can significantly enhance the accuracy and temporal consistency of predictions, leading to improved decision-making performance. Employing DQN as a search prior in the MCTS planner considerably boosts its performance and outperforms an imitation learning-based prior. Additionally, we show that the option-based MCTS substantially outperforms the vanilla method in terms of performance and efficiency.
Authors: Rishabh Madan, Skyler Valdez, David Kim
PubTime: 2024-01-26
Downlink: http://arxiv.org/abs/2401.15159v1
Project: https://emprise.cs.cornell.edu/rabbit
Abstract: This paper introduces RABBIT, a novel robot-assisted bed bathing system designed to address the growing need for assistive technologies in personal hygiene tasks. It combines multimodal perception and dual (software and hardware) compliance to perform safe and comfortable physical human-robot interaction. Using RGB and thermal imaging to segment dry, soapy, and wet skin regions accurately, RABBIT can effectively execute washing, rinsing, and drying tasks in line with expert caregiving practices. Our system includes custom-designed motion primitives inspired by human caregiving techniques, and a novel compliant end-effector called Scrubby, optimized for gentle and effective interactions. We conducted a user study with 12 participants, including one participant with severe mobility limitations, demonstrating the system's effectiveness and perceived comfort. Supplementary material and videos can be found on our website https://emprise.cs.cornell.edu/rabbit.
Authors: Farhin Farhad Riya, Shahinul Hoque, Xiaopeng Zhao
PubTime: 2024-01-28
Downlink: http://arxiv.org/abs/2401.15762v1
Abstract: The future of transportation is being shaped by technology, and one revolutionary step in improving road safety is the incorporation of robotic systems into driver monitoring infrastructure. This literature review explores the current landscape of driver monitoring systems, ranging from traditional physiological parameter monitoring to advanced technologies such as facial recognition and steering analysis. Exploring the challenges faced by existing systems, the review then investigates the integration of robots as intelligent entities within this framework. These robotic systems, equipped with artificial intelligence and sophisticated sensors, not only monitor but actively engage with the driver, addressing cognitive and emotional states in real time. The synthesis of existing research reveals a dynamic interplay between human and machine, offering promising avenues for innovation in adaptive, personalized, and ethically responsible human-robot interactions for driver monitoring. This review establishes a groundwork for comprehending the intricacies and potential avenues within this dynamic field. It encourages further investigation and advancement at the intersection of human-robot interaction and automotive safety, introducing a novel direction. This involves various sections detailing technological enhancements that can be integrated to propose an innovative and improved driver monitoring system.
Authors: Shahinul Hoque, Farhin Farhad Riya, Jinyuan Sun
PubTime: 2024-01-28
Downlink: http://arxiv.org/abs/2401.15760v1
Abstract: The breakthrough in AI and Machine Learning has brought a new revolution in robotics, resulting in the construction of more sophisticated robotic systems. Not only can these robotic systems benefit all domains, but they can also accomplish tasks that seemed unimaginable a few years ago. From swarms of autonomous small robots working together to move very heavy and large objects, to seemingly indestructible robots capable of going to the harshest environments, we can see robotic systems designed for every task imaginable. Among them, a key scenario where robotic systems can be of benefit is disaster response and rescue operations. Robotic systems are capable of successfully conducting tasks such as removing heavy materials, utilizing multiple advanced sensors for finding objects of interest, moving through debris and various inhospitable environments, and, not least, flying. Even with so much potential, we rarely see the utilization of robotic systems in disaster response scenarios and rescue missions. Many factors could be responsible for the low utilization of robotic systems in such scenarios. One of the key factors involves challenges related to Human-Robot Interaction (HRI) issues. Therefore, in this paper, we try to understand the HRI challenges involving the utilization of robotic systems in disaster response and rescue operations. Furthermore, we go through some of the proposed robotic systems designed for disaster response scenarios and identify the HRI challenges of those systems. Finally, we try to address the challenges by introducing ideas from various proposed research works.
Authors: Chao Wang, Stephan Hasler, Daniel Tanneberg
PubTime: 2024-01-26
Downlink: http://arxiv.org/abs/2401.15174v1
Abstract: This paper presents an innovative large language model (LLM)-based robotic system for enhancing multi-modal human-robot interaction (HRI). Traditional HRI systems relied on complex designs for intent estimation, reasoning, and behavior generation, which were resource-intensive. In contrast, our system empowers researchers and practitioners to regulate robot behavior through three key aspects: providing high-level linguistic guidance, creating "atomics" for actions and expressions the robot can use, and offering a set of examples. Implemented on a physical robot, it demonstrates proficiency in adapting to multi-modal inputs and determining the appropriate manner of action to assist humans with its arms, following researchers' defined guidelines. Simultaneously, it coordinates the robot's lid, neck, and ear movements with speech output to produce dynamic, multi-modal expressions. This showcases the system's potential to revolutionize HRI by shifting from conventional, manual state-and-flow design methods to an intuitive, guidance-based, and example-driven approach.
Authors: Yuxue Yang, Lue Fan, Zhaoxiang Zhang
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2401.16305v1
GitHub: https://github.com/BraveGroup/PointSAM-for-MixSup
Abstract: Label-efficient LiDAR-based 3D object detection is currently dominated by weakly/semi-supervised methods. Instead of exclusively following one of them, we propose MixSup, a more practical paradigm simultaneously utilizing massive cheap coarse labels and a limited number of accurate labels for Mixed-grained Supervision. We start by observing that point clouds are usually textureless, making it hard to learn semantics. However, point clouds are geometrically rich and scale-invariant to the distances from sensors, making it relatively easy to learn the geometry of objects, such as poses and shapes. Thus, MixSup leverages massive coarse cluster-level labels to learn semantics and a few expensive box-level labels to learn accurate poses and shapes. We redesign the label assignment in mainstream detectors, which allows them to be seamlessly integrated into MixSup, enabling practicality and universality. We validate its effectiveness in nuScenes, Waymo Open Dataset, and KITTI, employing various detectors. MixSup achieves up to 97.31% of fully supervised performance, using cheap cluster annotations and only 10% box annotations. Furthermore, we propose PointSAM based on the Segment Anything Model for automated coarse labeling, further reducing the annotation burden. The code is available at https://github.com/BraveGroup/PointSAM-for-MixSup.
Authors: Siteng Ma, Haochang Wu, Aonghus Lawlor
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2401.16298v1
GitHub: https://github.com/HelenMa9998/Selective_Uncertainty_AL
Abstract: Active learning (AL) has found wide applications in medical image segmentation, aiming to alleviate the annotation workload and enhance performance. Conventional uncertainty-based AL methods, such as entropy-based and Bayesian approaches, often rely on an aggregate of all pixel-level metrics. However, in imbalanced settings, these methods tend to neglect the significance of target regions, e.g., lesions and tumors. Moreover, uncertainty-based selection introduces redundancy. These factors lead to unsatisfactory performance, and in many cases these methods even underperform random sampling. To solve this problem, we introduce a novel approach called the Selective Uncertainty-based AL, avoiding the conventional practice of summing up the metrics of all pixels. Through a filtering process, our strategy prioritizes pixels within target areas and those near decision boundaries. This resolves the aforementioned disregard for target areas and redundancy. Our method showed substantial improvements across five different uncertainty-based methods and two distinct datasets, utilizing fewer labeled data to reach the supervised baseline and consistently achieving the highest overall performance. Our code is available at https://github.com/HelenMa9998/Selective_Uncertainty_AL.
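The difference between aggregating uncertainty over every pixel and scoring only a filtered subset can be seen in a small Python sketch for binary segmentation. It is an illustration under stated assumptions, not the authors' filter: the thresholds `target_thr` and `margin`, the use of a mean rather than a sum, and the function names are all hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy (in nats) of a foreground probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def aggregate_score(probs):
    """Conventional score: average entropy over all pixels."""
    return sum(binary_entropy(p) for p in probs) / len(probs)

def selective_score(probs, target_thr=0.5, margin=0.1):
    """Average entropy over pixels predicted as target or near the boundary.

    A pixel is kept if p >= target_thr (inside the predicted target area)
    or |p - 0.5| <= margin (close to the decision boundary).
    """
    kept = [binary_entropy(p) for p in probs
            if p >= target_thr or abs(p - 0.5) <= margin]
    return sum(kept) / len(kept) if kept else 0.0
```

On an image with 98 confident background pixels and 2 uncertain lesion pixels, the aggregate score is diluted by the background, while the selective score stays close to the maximum entropy of ln 2, so the image is ranked as informative.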
Authors: Iris de Gélis, Thomas Corpetti, Sébastien Lefèvre
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2304.12639v2
GitHub: https://github.com/IdeGelis/torch-points3d-SiamKPConvVariants
Abstract: Change detection is an important task that rapidly identifies modified areas, particularly when multi-temporal data are concerned. In landscapes with a complex geometry (e.g., urban environment), vertical information is a very useful source of knowledge that highlights changes and classifies them into different categories. In this study, we focus on change segmentation using raw three-dimensional (3D) point clouds (PCs) directly to avoid any information loss due to the rasterization processes. While deep learning has recently proven its effectiveness for this particular task by encoding the information through Siamese networks, we investigate herein the idea of also using change information in the early steps of deep networks. To do this, we first propose to provide a Siamese KPConv state-of-the-art (SoTA) network with hand-crafted features, especially a change-related one, which improves the mean of the Intersection over Union (IoU) over the classes of change by 4.70%. Considering that a major improvement is obtained due to the change-related feature, we then propose three new architectures to address 3D PC change segmentation: OneConvFusion, Triplet KPConv, and Encoder Fusion SiamKPConv. All these networks consider the change information in the early steps and outperform the SoTA methods. In particular, Encoder Fusion SiamKPConv overtakes the SoTA approaches by more than 5% of the mean of the IoU over the classes of change, emphasizing the value of having the network focus on change information for the change detection task. The code is available at https://github.com/IdeGelis/torch-points3d-SiamKPConvVariants.
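The metric quoted in this abstract, the mean IoU over the classes of change, can be computed per point as in the Python sketch below. The label encoding (0 for unchanged, other integers for change classes) and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; the paper's evaluation code may differ.

```python
def mean_iou(y_true, y_pred, classes):
    """Mean Intersection over Union restricted to the given classes.

    `y_true` and `y_pred` are equal-length sequences of per-point labels.
    Classes that appear in neither sequence are skipped.
    """
    ious = []
    for c in classes:
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

With `y_true = [0, 1, 1, 2, 2, 2]`, `y_pred = [0, 1, 2, 2, 2, 1]`, and change classes `[1, 2]`, class 1 scores 1/3 and class 2 scores 1/2, giving a mean of 5/12. Leaving class 0 out of `classes` keeps the large unchanged class from inflating the score.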
Authors: Cilin Yan, Haochen Wang, Jie Liu
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2304.11609v4
GitHub: https://github.com/cilinyan/PiClick
中文摘要: 基于点击的交互式分割旨在通过人工点击生成目标蒙版,这有助于高效的像素级注释和图像编辑。在这样的任务中,目标模糊仍然是阻碍分割的准确性和效率的问题。也就是说,在上下文丰富的场景中,一次点击可能对应多个潜在目标,而以前的大多数交互式分割器只生成单个掩模,无法处理目标模糊。在本文中,我们提出了一种新的交互式分割网络,名为PiClick,以产生所有潜在的合理掩码,并为用户建议最合理的掩码。具体来说,PiClick利用基于Transformer model的架构,通过相互交互的掩码查询来生成所有潜在的目标掩码。此外,PiClick中设计了一个目标推理模块,可以从所有候选人中自动建议用户想要的掩码,从而缓解目标模糊性和额外的人工努力。在9个交互式分割数据集上的大量实验表明,考虑到分割结果,PiClick的性能优于以前的技术水平。此外,我们表明PiClick有效地减少了人类在注释和挑选所需掩码方面的努力。为了方便使用并启发未来的研究,我们发布了PiClick的源代码以及一个即插即用的注释工具,网址为https://github.com/cilinyan/PiClick。
Abstract: Click-based interactive segmentation aims to generate target masks via human clicking, which facilitates efficient pixel-level annotation and image editing. In such a task, target ambiguity remains a problem hindering the accuracy and efficiency of segmentation. That is, in scenes with rich context, one click may correspond to multiple potential targets, while most previous interactive segmentors only generate a single mask and fail to deal with target ambiguity. In this paper, we propose a novel interactive segmentation network named PiClick, to yield all potentially reasonable masks and suggest the most plausible one for the user. Specifically, PiClick utilizes a Transformer-based architecture to generate all potential target masks by mutually interactive mask queries. Moreover, a Target Reasoning module is designed in PiClick to automatically suggest the user-desired mask from all candidates, relieving target ambiguity and extra-human efforts. Extensive experiments on 9 interactive segmentation datasets demonstrate PiClick performs favorably against previous state-of-the-arts considering the segmentation results. Moreover, we show that PiClick effectively reduces human efforts in annotating and picking the desired masks. To ease the usage and inspire future research, we release the source code of PiClick together with a plug-and-play annotation tool at https://github.com/cilinyan/PiClick.
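To make the target-ambiguity problem concrete, the sketch below checks which candidate masks agree with a set of clicks. It is only a toy illustration, not PiClick's Target Reasoning module: the function `consistent_masks` and the nested-list mask format are assumptions made for this example.

```python
def consistent_masks(masks, positive_clicks, negative_clicks=()):
    """Return indices of candidate masks that agree with every click.

    masks: list of 2D lists holding 0/1 values (hypothetical format).
    positive_clicks / negative_clicks: iterables of (row, col) positions.
    """
    keep = []
    for i, mask in enumerate(masks):
        # A positive click must fall inside the mask.
        ok_pos = all(mask[r][c] == 1 for r, c in positive_clicks)
        # A negative click must fall outside the mask.
        ok_neg = all(mask[r][c] == 0 for r, c in negative_clicks)
        if ok_pos and ok_neg:
            keep.append(i)
    return keep


if __name__ == "__main__":
    whole = [[1, 1, 1], [1, 1, 1], [0, 0, 0]]  # e.g. a whole object
    part = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]   # e.g. one part of it
    other = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]  # an unrelated region
    candidates = [whole, part, other]
    print(consistent_masks(candidates, [(1, 1)]))            # one click
    print(consistent_masks(candidates, [(1, 1)], [(0, 0)]))  # plus a negative click
```

A single click at (1, 1) is consistent with both the whole object and its part, which is exactly the ambiguity the abstract describes; an extra negative click at (0, 0) narrows the candidates down to the part.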
Authors: Lei Yang, Xinyu Zhang, Jun Li
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2401.16110v1
GitHub: https://github.com/yanglei18/SGV3D
Abstract: Roadside perception can greatly increase the safety of autonomous vehicles by extending their perception ability beyond the visual range and addressing blind spots. However, current state-of-the-art vision-based roadside detection methods possess high accuracy on labeled scenes but have inferior performance on new scenes. This is because roadside cameras remain stationary after installation and can only collect data from a single scene, resulting in the algorithm overfitting these roadside backgrounds and camera poses. To address this issue, in this paper, we propose an innovative Scenario Generalization Framework for Vision-based Roadside 3D Object Detection, dubbed SGV3D. Specifically, we employ a Background-suppressed Module (BSM) to mitigate background overfitting in vision-centric pipelines by attenuating background features during the 2D to bird's-eye-view projection. Furthermore, by introducing the Semi-supervised Data Generation Pipeline (SSDG) using unlabeled images from new scenes, diverse instance foregrounds with varying camera poses are generated, addressing the risk of overfitting specific camera poses. We evaluate our method on two large-scale roadside benchmarks. Our method surpasses all previous methods by a significant margin in new scenes, including +42.57% for vehicle, +5.87% for pedestrian, and +14.89% for cyclist compared to BEVHeight on the DAIR-V2X-I heterologous benchmark. On the larger-scale Rope3D heterologous benchmark, we achieve notable gains of 14.48% for car and 12.41% for large vehicle. We aspire to contribute insights on the exploration of roadside perception techniques, emphasizing their capability for scenario generalization. The code will be available at https://github.com/yanglei18/SGV3D.
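The abstract says the BSM attenuates background features before they reach the bird's-eye-view projection, but it does not give the mechanism. One simple way to picture such attenuation is to scale each pixel's feature vector by a foreground probability. The sketch below does only that; the function `suppress_background`, the `floor` parameter, and the nested-list layout are all assumptions for illustration and should not be read as the SGV3D implementation.

```python
def suppress_background(features, fg_prob, floor=0.0):
    """Scale each pixel's feature vector by its foreground probability.

    features: H x W x C nested lists of floats (hypothetical layout).
    fg_prob:  H x W nested lists with values in [0, 1].
    floor:    minimum weight, so background is attenuated rather than erased.
    """
    out = []
    for feat_row, prob_row in zip(features, fg_prob):
        out_row = []
        for vec, p in zip(feat_row, prob_row):
            weight = floor + (1.0 - floor) * p
            out_row.append([weight * v for v in vec])
        out.append(out_row)
    return out


if __name__ == "__main__":
    feats = [[[2.0, 4.0], [2.0, 4.0]]]  # 1 x 2 image, 2 channels
    probs = [[1.0, 0.25]]               # left pixel foreground, right mostly background
    print(suppress_background(feats, probs))
```

With these inputs the foreground pixel keeps its features unchanged, while the mostly-background pixel is scaled to a quarter of its original magnitude.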
Authors: Dingyuan Zhang, Dingkang Liang, Hongcheng Yang
PubTime: 2024-01-29
Downlink: http://arxiv.org/abs/2306.02245v2
GitHub: https://github.com/DYZhang09/SAM3D
Abstract: With the development of large language models, many remarkable linguistic systems like ChatGPT have thrived and achieved astonishing success on many tasks, showing the incredible power of foundation models. In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, has been proposed recently and presents strong zero-shot ability on many downstream 2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be explored, especially 3D object detection. With this inspiration, we explore adapting the zero-shot ability of SAM to 3D object detection in this paper. We propose a SAM-powered BEV processing pipeline to detect objects and get promising results on the large-scale Waymo open dataset. As an early attempt, our method takes a step toward 3D object detection with vision foundation models and presents the opportunity to unleash their power on 3D vision tasks. The code is released at https://github.com/DYZhang09/SAM3D.
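The abstract mentions a BEV (bird's-eye-view) processing pipeline without describing its steps. A common first step in BEV pipelines generally is to rasterize 3D points onto a 2D top-down grid that an image model can consume. The sketch below shows that generic step only; the function `points_to_bev`, its parameters, and the point-count encoding are assumptions for illustration, not the SAM3D code.

```python
def points_to_bev(points, x_range, y_range, cell):
    """Rasterize 3D points into a top-down grid of per-cell point counts.

    points:  iterable of (x, y, z) tuples in metres.
    x_range, y_range: (min, max) extents of the grid; max is exclusive.
    cell:    edge length of one square grid cell in metres.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    width = int(round((x_max - x_min) / cell))
    height = int(round((y_max - y_min) / cell))
    grid = [[0] * width for _ in range(height)]
    for x, y, _z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # drop points outside the grid
        # Clamp to guard against floating-point rounding at the far edge.
        col = min(int((x - x_min) / cell), width - 1)
        row = min(int((y - y_min) / cell), height - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    cloud = [(0.5, 0.5, 1.0), (0.6, 0.4, 2.0), (3.2, 1.7, 0.3), (9.0, 9.0, 0.0)]
    for row in points_to_bev(cloud, x_range=(0.0, 4.0), y_range=(0.0, 2.0), cell=1.0):
        print(row)
```

The resulting grid can be treated as a single-channel image; a real pipeline would typically encode height or intensity per cell instead of a raw count.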
