I Started Writing Articles Explaining Mamba on Zenn

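I recently published a post containing a Japanese translation of the Mamba paper. izmyon.hatenablog.com I couldn't make sense of the theory behind Mamba at all, so I started writing articles that explain it. Since equations are hard to write on Hatena, I chose Zenn instead. I will keep writing follow-up installments, so…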

2023-12-11

Today's Paper 2023/12/11: Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Today's Paper

arxiv.org Gu, Albert, and Tri Dao. "Mamba: Linear-Time Sequence Modeling with Selective State Spaces." arXiv preprint arXiv:2312.00752 (2023). ©2023 The Authors License: Creative Commons Attribution 4.0 International License (CC-BY) github.…

2023-11-21

Understanding RetNet①: Theory of Retention

Chapter 2 of the RetNet paper (RetNet is regarded as a successor to the Transformer) explains the RetNet architecture. However, the formulas in the paper are somewhat hard to follow. This post explains those formulas in detail …

2023-11-14

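Fully Understanding RetNet ①: The Retention Mechanism

This post explains the architecture of RetNet, which is billed as the successor to the Transformer, as described mainly in Chapter 2 of the following paper, filling in the steps the paper leaves implicit. arxiv.org *This is based on my own understanding, so please leave a comment if anything seems wrong. Retentive Network Ret…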

2023-06-21

Today's Paper 2023/06/20,21: ERNIE: Enhanced Language Representation with Informative Entities

Today's Paper

ERNIE: Enhanced Language Representation with Informative Entities aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License (CC-BY) This article is the author's summary, based on the content of the original…

2023-06-18

Today's Paper 2023/06/16,17: COMET: Commonsense Transformers for Automatic Knowledge Graph Construction

Today's Paper

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License (CC-BY) This article is based on the content of the original…

2023-06-12

Today's Paper 2023/06/11,12: LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention

Today's Paper

LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention aclanthology.org Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yuji Matsumoto. 2020. LUKE: Deep Contextualized Entity Representations with E…

2023-06-10

Today's Paper 2023/06/09: Rethinking with Retrieval: Faithful Large Language Model Inference

Today's Paper

Rethinking with Retrieval: Faithful Large Language Model Inference arxiv.org He, Hangfeng, Hongming Zhang, and Dan Roth. "Rethinking with Retrieval: Faithful Large Language Model Inference." arXiv preprint arXiv:2301.00303 (2022). ©2022 Th…

2023-06-06

Today's Paper 2023/06/04,05: RWKV: Reinventing RNNs for the Transformer Era

Today's Paper

2023-06-04

Today's Paper 2023/06/03: Brain-inspired learning in artificial neural networks: a review

Today's Paper

Brain-inspired learning in artificial neural networks: a review arxiv.org Schmidgall, Samuel, Jascha Achterberg, Thomas Miconi, Louis Kirsch, Rojin Ziaei, S. Hajiseyedrazi, and Jason Eshraghian. "Brain-inspired learning in artificial neura…

2023-05-27

Today's Paper 2023/05/26: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Today's Paper

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models arxiv.org Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. "Chain of thought prompting elicits reasoning in large language model…

2023-05-26

Today's Paper 2023/05/24,25: Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data

Today's Paper

Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data aclanthology.org Katja Filippova. 2020. Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data. In Findings of the Association for Computational…

2023-05-23

Today's Paper 2023/05/21,22: Retrieval Augmentation Reduces Hallucination in Conversation

Today's Paper

Retrieval Augmentation Reduces Hallucination in Conversation aclanthology.org Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. 2021. Retrieval Augmentation Reduces Hallucination in Conversation. In Findings of the Asso…

2023-05-21

Today's Paper 2023/05/20: On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?

Today's Paper

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models? aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License (CC-BY) This article is based on the original…

2023-05-20

Today's Paper 2023/05/18,19: Diving Deep into Modes of Fact Hallucinations in Dialogue Systems

Today's Paper

Diving Deep into Modes of Fact Hallucinations in Dialogue Systems aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License (CC-BY) This article is the author's summary, based on the content of the original…

2023-05-17

Today's Paper 2023/05/15,16: The Curious Case of Hallucinations in Neural Machine Translation

Today's Paper

The Curious Case of Hallucinations in Neural Machine Translation aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License (CC-BY) This article is the author's summary, based on the content of the original…

2023-05-15

Today's Paper 2023/05/13,14: A Distributional Lens for Multi-Aspect Controllable Text Generation

Today's Paper

A Distributional Lens for Multi-Aspect Controllable Text Generation aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License (CC-BY) This article is based on the content of the original, and the author…

2023-05-13

Today's Paper 2023/05/11, 12: AttentionViz: A Global View of Transformer Attention

Today's Paper

AttentionViz: A Global View of Transformer Attention arxiv.org Yeh, Catherine, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, and Martin Wattenberg. "AttentionViz: A Global View of Transformer Attention." arXiv preprint arXiv:2305.0321…

2023-05-11

Today's Paper 2023/05/9,10: CIKQA: Learning Commonsense Inference with a Unified Knowledge-in-the-loop QA Paradigm

Today's Paper

CIKQA: Learning Commonsense Inference with a Unified Knowledge-in-the-loop QA Paradigm aclanthology.org ©2022 Association for Computational Linguistics License: Creative Commons Attribution 4.0 International License (CC-BY) This article is based on the original…

2023-05-09

Today's Paper 2023/05/07,8: Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

Today's Paper

Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents aclanthology.org Eric Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, and Jason Weston. 2022. …

2023-05-06

Today's Paper 2023/05/04,05: Long-term Control for Dialogue Generation: Methods and Evaluation

Today's Paper

Long-term Control for Dialogue Generation: Methods and Evaluation aclanthology.org Ramya Ramakrishnan, Hashan Narangodage, Mauro Schilman, Kilian Weinberger, and Ryan McDonald. 2022. Long-term Control for Dialogue Generation: Methods and E…

2023-05-04

Today's Paper 2023/05/03: SKILL: Structured Knowledge Infusion for Large Language Models.

Today's Paper

SKILL: Structured Knowledge Infusion for Large Language Models. aclanthology.org Fedor Moiseev, Zhe Dong, Enrique Alfonseca, and Martin Jaggi. 2022. SKILL: Structured Knowledge Infusion for Large Language Models. In Proceedings of the 2022…

2023-05-02

Today's Paper 2023/05/01,02: Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space

Today's Paper

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space aclanthology.org Mor Geva, Avi Caciularu, Kevin Wang, and Yoav Goldberg. 2022. Transformer Feed-Forward Layers Build Predictions by Promoting C…

2023-04-30

Today's Paper 2023/04/29,30: The Geometry of Multilingual Language Model Representations

Today's Paper

The Geometry of Multilingual Language Model Representations aclanthology.org Tyler Chang, Zhuowen Tu, and Benjamin Bergen. 2022. The Geometry of Multilingual Language Model Representations. In Proceedings of the 2022 Conference on Empirica…

2023-04-29

Today's Paper 2023/04/27,28: LLaMA: Open and Efficient Foundation Language Models

Today's Paper

LLaMA: Open and Efficient Foundation Language Models arxiv.org Touvron H, Lavril T, Izacard G, Martinet X, Lachaux MA, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, Rodriguez A. Llama: Open and efficient foundation language models. arX…

2023-04-27

Today's Paper 2023/04/26: N-best Response-based Analysis of Contradiction-awareness in Neural Response Generation Models

Today's Paper

N-best Response-based Analysis of Contradiction-awareness in Neural Response Generation Models aclanthology.org Shiki Sato, Reina Akama, Hiroki Ouchi, Ryoko Tokuhisa, Jun Suzuki, and Kentaro Inui. 2022. N-best Response-based Analysis of Co…

2023-04-25

Today's Paper 2023/04/24: A System For Robot Concept Learning Through Situated Dialogue

Today's Paper

A System For Robot Concept Learning Through Situated Dialogue aclanthology.org Benjamin Kane, Felix Gervits, Matthias Scheutz, and Matthew Marge. 2022. A System For Robot Concept Learning Through Situated Dialogue. In Proceedings of the 23…

2023-04-24

Today's Paper 2023/04/23: User Satisfaction Modeling with Domain Adaptation in Task-oriented Dialogue Systems

Today's Paper

User Satisfaction Modeling with Domain Adaptation in Task-oriented Dialogue Systems aclanthology.org Yan Pan, Mingyang Ma, Bernhard Pflugfelder, and Georg Groh. 2022. User Satisfaction Modeling with Domain Adaptation in Task-oriented Dialo…

2023-04-23

Today's Paper 2023/4/22: Reducing Model Churn: Stable Re-training of Conversational Agents

Today's Paper

Reducing Model Churn: Stable Re-training of Conversational Agents aclanthology.org Christopher Hidey, Fei Liu, and Rahul Goel. 2022. Reducing Model Churn: Stable Re-training of Conversational Agents. In Proceedings of the 23rd Annual Meeti…

2023-04-22

Today's Paper 2023/4/21: Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues

Today's Paper

Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues aclanthology.org Thibault Cordier, Tanguy Urvoy, Fabrice Lefèvre, and Lina M. Rojas Barahona. 2022. Graph Neural Network Policies and Imitation L…

izmyon's Diary

A study log of a graduate student immersed in research deep in the mountains of Nara.