Cross-attention block: The embeddings of the timestep t and the class label c are concatenated into a length-two sequence, kept separate from the image token sequence. The transformer block is modified so that a multi-head cross-attention layer follows the multi-head self-attention block, similar to the original design of Vaswani et al. [57] and to the mechanism LDMs use for conditioning on class labels. Cross-attention adds the most Gflops to the model, roughly a 15% overhead.
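As a rough illustration of this conditioning mechanism, here is a minimal single-head sketch in NumPy. All dimensions are toy values, and the learned query/key/value projections, multi-head splitting, layer norm, and MLP of the real block are omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: (Nq, d) queries, (Nk, d) keys/values -> (Nq, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 16                              # hidden size (toy value)
x = rng.standard_normal((64, d))    # 64 image tokens
t_emb = rng.standard_normal(d)      # timestep embedding
c_emb = rng.standard_normal(d)      # class-label embedding

# length-two conditioning sequence, kept separate from the image tokens
cond = np.stack([t_emb, c_emb])     # shape (2, d)

# self-attention over image tokens, then cross-attention into the
# conditioning sequence (queries from image tokens, keys/values from cond)
x = x + attention(x, x, x)
x = x + attention(x, cond, cond)
```

Because the cross-attention keys and values come from a sequence of length two, its cost grows linearly with the number of image tokens, but the extra layer's projections still account for the overhead noted above.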
[1] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018.
[2] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In ICLR, 2019.
[3] Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In NeurIPS, 2020.
[4] Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, and William T Freeman. MaskGIT: Masked generative image transformer. In CVPR, pages 11315–11325, 2022.
[5] Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. Decision transformer: Reinforcement learning via sequence modeling. In NeurIPS, 2021.
[6] Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, and Ilya Sutskever. Generative pretraining from pixels. In ICML, 2020.
[7] Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509, 2019.
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 2019.
[9] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. In NeurIPS, 2021.
[10] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2020.
[11] Patrick Esser, Robin Rombach, and Björn Ommer. Taming transformers for high-resolution image synthesis, 2020.
[12] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
[13] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677, 2017.
[14] Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, and Baining Guo. Vector quantized diffusion model for text-to-image synthesis. In CVPR, pages 10696–10706, 2022.
[15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[16] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415, 2016.
[17] Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B Brown, Prafulla Dhariwal, Scott Gray, et al. Scaling laws for autoregressive generative modeling. arXiv preprint arXiv:2010.14701, 2020.
[18] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017.
[19] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.
[20] Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. arXiv:2106.15282, 2021.
[21] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021.
[22] Aapo Hyvärinen and Peter Dayan. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4), 2005.
[23] Michael Janner, Qiyang Li, and Sergey Levine. Offline reinforcement learning as one big sequence modeling problem. In NeurIPS, 2021.
[24] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv:2001.08361, 2020.
[25] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Proc. NeurIPS, 2022.
[26] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In CVPR, 2019.
[27] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[28] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
[29] Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Improved precision and recall metric for assessing generative models. In NeurIPS, 2019.
[30] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv:1711.05101, 2017.
[31] Charlie Nash, Jacob Menick, Sander Dieleman, and Peter W Battaglia. Generating images with sparse representations. arXiv preprint arXiv:2103.03841, 2021.
[32] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv:2112.10741, 2021.
[33] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In ICML, 2021.
[34] Gaurav Parmar, Richard Zhang, and Jun-Yan Zhu. On aliased resizing and surprising subtleties in GAN evaluation. In CVPR, 2022.
[35] Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. Image transformer. In International Conference on Machine Learning, pages 4055–4064. PMLR, 2018.
[36] William Peebles, Ilija Radosavovic, Tim Brooks, Alexei Efros, and Jitendra Malik. Learning to learn with generative models of neural network checkpoints. arXiv preprint arXiv:2209.12892, 2022.
[37] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. In AAAI, 2018.
[38] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In ICML, 2021.
[39] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. 2018.
[40] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. 2019.
[41] Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, and Piotr Dollár. On network design spaces for visual recognition. In ICCV, 2019.
[42] Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, and Piotr Dollár. Designing network design spaces. In CVPR, 2020.
[43] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv:2204.06125, 2022.
[44] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In ICML, 2021.
[45] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
[46] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[47] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. arXiv:2205.11487, 2022.
[48] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, and Xi Chen. Improved techniques for training GANs. In NeurIPS, 2016.
[49] Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P Kingma. PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. arXiv preprint arXiv:1701.05517, 2017.
[50] Axel Sauer, Katja Schwarz, and Andreas Geiger. StyleGAN-XL: Scaling StyleGAN to large diverse datasets. In SIGGRAPH, 2022.
[53] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In NeurIPS, 2019.
[54] Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, and Lucas Beyer. How to train your ViT? Data, augmentation, and regularization in vision transformers. TMLR, 2022.
[55] Aaron Van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image generation with PixelCNN decoders. Advances in Neural Information Processing Systems, 29, 2016.
[56] Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in Neural Information Processing Systems, 30, 2017.
[57] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
[58] Tete Xiao, Piotr Dollar, Mannat Singh, Eric Mintun, Trevor Darrell, and Ross Girshick. Early convolutions help transformers see better. In NeurIPS, 2021.
Safi Z, Abd-Alrazaq A, Khalifa M, Househ M
Technical Aspects of Developing Chatbots for Medical Applications: Scoping Review
J Med Internet Res 2020;22(12):e19127, doi: 10.2196/19127, PMID: 33337337, PMCID: 7775817
Abd-Alrazaq A, Safi Z, Alajlani M, Warren J, Househ M, Denecke K
Technical Metrics Used to Evaluate Health Care Chatbots: Scoping Review
J Med Internet Res 2020;22(6):e18301, doi: 10.2196/18301, PMID: 32442157, PMCID: 7305563
Conversational-turns Per Session (CPS) has been proposed as a success metric for social chatbots, as exemplified by XiaoIce [85]. Although the goals of health chatbots are not identical to those of social chatbots, if CPS becomes accepted as a standard metric in the social-chatbot domain, it will be a strong candidate for a standard metric for assessing the social-engagement dimension in health-chatbot evaluation. An alternative or complementary metric for the social dimension is to have users rate the chatbot's empathy, but CPS has the advantage of being an objective, quantitative measure. Other objective, quantitative metrics, such as interaction time or time on task, could also substitute for CPS, but they may be less representative of engagement, for example when the user is multitasking between the chatbot interaction and other activities. Besides social engagement, task completion (often assessed by analyzing conversation logs) is another promising global metric.
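CPS can be computed directly from conversation logs. A minimal sketch follows; the log format and the convention that every message counts as one turn are assumptions for illustration (some definitions instead count one user-bot exchange pair as a turn):

```python
from collections import defaultdict

# toy conversation log: one (session_id, speaker) entry per message
log = [
    ("s1", "user"), ("s1", "bot"), ("s1", "user"), ("s1", "bot"),
    ("s2", "user"), ("s2", "bot"),
]

# count turns in each session
turns_per_session = defaultdict(int)
for session_id, _speaker in log:
    turns_per_session[session_id] += 1

# CPS = average number of conversational turns per session
cps = sum(turns_per_session.values()) / len(turns_per_session)
# here: (4 + 2) turns / 2 sessions = 3.0
```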
This post summarizes a series of review papers on dialogue systems for mental health by Alaa A. Abd-alrazaq (College of Science and Engineering, Hamad Bin Khalifa University, Qatar) and colleagues. For each review, I translate the Abstract and Principal Findings, and collect other findings I personally found interesting under Other Interesting Findings. This installment covers two reviews: one focused on patients' perceptions and opinions, and one on effectiveness and safety.
Abd-Alrazaq AA, Alajlani M, Ali N, Denecke K, Bewick BM, Househ M
Perceptions and Opinions of Patients About Mental Health Chatbots: Scoping Review
J Med Internet Res 2021;23(1):e17828, doi: 10.2196/17828, PMID: 33439133, PMCID: 7840290
A scoping review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) extension for scoping reviews guidelines. Studies were identified by searching eight electronic databases (e.g., MEDLINE and Embase) and by backward and forward reference-list checking of the included studies and of other reviews relevant to this one. Two reviewers independently selected the studies and extracted data from them. The data were synthesized using thematic analysis.
To evaluate mental health chatbots, benchmarks must be created and consistent metrics and methods developed. Laranjo et al. [71] reviewed the characteristics, current applications, and evaluation measures of health chatbots. The evaluation measures fell into three broad categories: technical performance, user experience, and health research measures. First attempts toward evaluation frameworks for digital health interventions [82] and for health chatbots [83,84] have recently been published. Depending on the aspect of interest, different metrics can be used. For example, a system's performance and its effectiveness are assessed by distinct computational measures (e.g., usability, ease of use, usefulness). Software quality can be measured in terms of reliability, security, maintainability, and efficiency using software engineering metrics [86]. If a system uses AI or machine learning techniques, the metrics comprise the precision and accuracy of its predictions or recommendations. Furthermore, the system's efficiency must be evaluated and compared against existing models of care. Regarding safe use of apps, three criteria should be assessed: (1) quality of the therapeutic content, (2) functionality, and (3) data security and protection [87].
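For systems that use AI or machine learning techniques, prediction quality reduces to standard classification metrics. A self-contained sketch computing accuracy and precision on toy labels (the data and the "symptom present/absent" framing are illustrative only):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Of the items predicted positive, the fraction that truly are positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return tp / predicted_pos if predicted_pos else 0.0

# toy labels: 1 = "symptom present", 0 = "absent"
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

acc = accuracy(y_true, y_pred)    # 3 of 5 correct -> 0.6
prec = precision(y_true, y_pred)  # 2 true positives of 3 predicted -> 2/3
```

In practice, precision would be reported alongside recall and, for imbalanced clinical data, metrics such as F1, but the computation follows the same pattern.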
Abd-Alrazaq AA, Rababeh A, Alajlani M, Bewick BM, Househ M
Effectiveness and Safety of Using Chatbots to Improve Mental Health: Systematic Review and Meta-Analysis
J Med Internet Res 2020;22(7):e16021, doi: 10.2196/16021, PMID: 32673216, PMCID: 7385637