A while ago, I published an article with a Japanese translation of this Mamba paper. izmyon.hatenablog.com Since I had no grasp of the theory behind Mamba, I started writing articles that explain it. However, because equations are hard to write on Hatena, I used Zenn instead. I will keep writing follow-ups, so…
arxiv.org Gu, Albert, and Tri Dao. "Mamba: Linear-Time Sequence Modeling with Selective State Spaces." arXiv preprint arXiv:2312.00752 (2023). ©2023 The Authors License: Creative Commons Attribution 4.0 International License(CC-BY) github.…
In the RetNet paper, which is regarded as a successor to the Transformer, the RetNet architecture is explained, particularly in Chapter 2. However, the formulas in the paper are a little confusing. In this post, the details of the formulas are explained …
This post explains the RetNet architecture, described in particular in Chapter 2 of the following paper, filling in the gaps between the lines of RetNet, which is touted as a successor to the Transformer. arxiv.org *This is written based on my own understanding, so please leave a comment if anything seems wrong. Retentive Network Ret…
ERNIE: Enhanced Language Representation with Informative Entities aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License(CC-BY) This article is a summary I wrote based on the content of the original…
COMET: Commonsense Transformers for Automatic Knowledge Graph Construction aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License(CC-BY) This article is a summary I wrote based on the content of the original…
Today's papers, 2023/06/11–12: LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention aclanthology.org Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yuji Matsumoto. 2020. LUKE: Deep Contextualized Entity Representations with E…
Rethinking with Retrieval: Faithful Large Language Model Inference arxiv.org He, Hangfeng, Hongming Zhang, and Dan Roth. "Rethinking with Retrieval: Faithful Large Language Model Inference." arXiv preprint arXiv:2301.00303 (2022). ©2022 Th…
RWKV: Reinventing RNNs for the Transformer Era arxiv.org ©2022 The Authors License: Creative Commons Attribution 4.0 International License(CC-BY) This article is my translation of part of the original work; the figures below are quoted from it. This artic…
Brain-inspired learning in artificial neural networks: a review arxiv.org Schmidgall, Samuel, Jascha Achterberg, Thomas Miconi, Louis Kirsch, Rojin Ziaei, S. Hajiseyedrazi, and Jason Eshraghian. "Brain-inspired learning in artificial neura…
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models arxiv.org Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. "Chain of thought prompting elicits reasoning in large language model…
Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data aclanthology.org Katja Filippova. 2020. Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data. In Findings of the Association for Computational…
Retrieval Augmentation Reduces Hallucination in Conversation aclanthology.org Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. 2021. Retrieval Augmentation Reduces Hallucination in Conversation. In Findings of the Asso…
On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models? aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License(CC-BY) This article is a summary I wrote based on the orig…
Diving Deep into Modes of Fact Hallucinations in Dialogue Systems aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License(CC-BY) This article is a summary I wrote based on the content of the original…
The Curious Case of Hallucinations in Neural Machine Translation aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License(CC-BY) This article is a summary I wrote based on the content of the original work…
A Distributional Lens for Multi-Aspect Controllable Text Generation aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License(CC-BY) This article is a summary I wrote based on the content of the original…
AttentionViz: A Global View of Transformer Attention arxiv.org Yeh, Catherine, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, and Martin Wattenberg. "AttentionViz: A Global View of Transformer Attention." arXiv preprint arXiv:2305.0321…
CIKQA: Learning Commonsense Inference with a Unified Knowledge-in-the-loop QA Paradigm aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License(CC-BY) This article is based on the original…
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents aclanthology.org Eric Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, and Jason Weston. 2022. …
Long-term Control for Dialogue Generation: Methods and Evaluation aclanthology.org Ramya Ramakrishnan, Hashan Narangodage, Mauro Schilman, Kilian Weinberger, and Ryan McDonald. 2022. Long-term Control for Dialogue Generation: Methods and E…
SKILL: Structured Knowledge Infusion for Large Language Models. aclanthology.org Fedor Moiseev, Zhe Dong, Enrique Alfonseca, and Martin Jaggi. 2022. SKILL: Structured Knowledge Infusion for Large Language Models. In Proceedings of the 2022…
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space aclanthology.org Mor Geva, Avi Caciularu, Kevin Wang, and Yoav Goldberg. 2022. Transformer Feed-Forward Layers Build Predictions by Promoting C…
The Geometry of Multilingual Language Model Representations aclanthology.org Tyler Chang, Zhuowen Tu, and Benjamin Bergen. 2022. The Geometry of Multilingual Language Model Representations. In Proceedings of the 2022 Conference on Empirica…
LLaMA: Open and Efficient Foundation Language Models arxiv.org Touvron H, Lavril T, Izacard G, Martinet X, Lachaux MA, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, Rodriguez A. Llama: Open and efficient foundation language models. arX…
N-best Response-based Analysis of Contradiction-awareness in Neural Response Generation Models aclanthology.org Shiki Sato, Reina Akama, Hiroki Ouchi, Ryoko Tokuhisa, Jun Suzuki, and Kentaro Inui. 2022. N-best Response-based Analysis of Co…
A System For Robot Concept Learning Through Situated Dialogue aclanthology.org Benjamin Kane, Felix Gervits, Matthias Scheutz, and Matthew Marge. 2022. A System For Robot Concept Learning Through Situated Dialogue. In Proceedings of the 23…
User Satisfaction Modeling with Domain Adaptation in Task-oriented Dialogue Systems aclanthology.org Yan Pan, Mingyang Ma, Bernhard Pflugfelder, and Georg Groh. 2022. User Satisfaction Modeling with Domain Adaptation in Task-oriented Dialo…
Reducing Model Churn: Stable Re-training of Conversational Agents aclanthology.org Christopher Hidey, Fei Liu, and Rahul Goel. 2022. Reducing Model Churn: Stable Re-training of Conversational Agents. In Proceedings of the 23rd Annual Meeti…
Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues aclanthology.org Thibault Cordier, Tanguy Urvoy, Fabrice Lefèvre, and Lina M. Rojas Barahona. 2022. Graph Neural Network Policies and Imitation L…