Index of /library/Computing/transformers/

File Name  File Size  Date
Parent directory/  -  -
An Ultra-Low Energy Internally Analog, Externally Digital Vector-Matrix Multiplier Based on NOR Flash Memory Technolo..>  2M  25-Jul-2023 17:17
Are Emergent Abilities of Large Language Models a Mirage_ arxiv2304.15004.pdf  2M  07-May-2023 01:08
Climbing towards Natural Language Understanding_ On Meaning Form and Understanding in the Age of Data_ Emily M Bender..>  472K  08-May-2023 03:18
Deep neural networks are robust to weight binarization and other non-linear distortions_ arxiv1606.01981.pdf  829K  02-Mar-2024 16:08
Efficient streaming language models with attention sinks_ arxiv2309.17453.pdf  12M  02-Oct-2023 17:46
Exponentially Faster Language Modeling_ arxiv2311.10770.pdf  231K  27-Nov-2023 05:35
Extending Context Window of Large Language Models via Positional Interpolation_ arxiv2306.15595.pdf  734K  29-Jun-2023 02:06
GLU Variants Improve Transformer_arxiv2002.05202.pdf  107K  02-May-2023 21:23
GQA_ Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints_ arxiv2305.13245.pdf  248K  01-Sep-2023 22:34
Is Cosine-Similarity of Embeddings Really About Similarity_ arxiv2403.05440.pdf  2M  12-Mar-2024 04:07
LLaMA_ Open and Efficient Foundation Language Models_ arxiv2302.13971.pdf  710K  13-May-2023 17:45
Landmark Attention_ Random-Access Infinite Context Length for Transformers_ arxiv2305.16300.pdf  500K  28-May-2023 17:34
Llama 2_ Open Foundation and Fine-Tuned Chat Models_ arxiv2307.09288.pdf  13M  30-Aug-2023 23:29
Mixtral of Experts_ arxiv2401.04088.pdf  2M  09-Jan-2024 03:21
Photonic Matrix Computing_ From Fundamentals to Applications_ Junwei Cheng_ Hailong Zhou_ Jianji Dong_ Nanomaterials ..>  3M  25-Jul-2023 17:13
RoFormer_ Enhanced Transformer with Rotary Position Embedding_ arxiv2104.09864v4.pdf  573K  21-Apr-2023 00:35
SentencePiece_ A simple and language independent subword tokenizer and detokenizer for Neural Text Processing_ arxiv1..>  207K  13-May-2023 17:44
SmoothQuant_ Accurate and Efficient Post-Training Quantization for Large Language Models_ arxiv2211.10438.pdf  5M  11-Dec-2023 23:12
Stay on topic with Classifier-Free Guidance_ arxiv2306.17806.pdf  2M  30-Sep-2023 04:35
Steering Llama 2 via Contrastive Activation Addition_ arxiv2312.06681.pdf  27M  13-Dec-2023 04:54
The Curse of Recursion_ Training on Generated Data Makes Models Forget_ arxiv2305.17493.pdf  2M  24-Aug-2023 19:28
The Poison of Alignment_ arxiv2308.13449.pdf  185K  30-Aug-2023 14:18
The Transformer Model in Equations_ John Thickstun_ 2023.pdf  191K  24-Jun-2023 02:24
The case for 4-bit precision_ k-bit Inference Scaling Laws_ arxiv2212.09720.pdf  885K  28-Aug-2023 18:57
Train Short, Test Long_ Attention with Linear Biases Enables Input Length Extrapolation_arxiv2108.12409.pdf  741K  17-Jun-2023 23:34
Unigram Algorithm_ Subword Regularization_ Improving Neural Network Translation Models with Multiple Subword Candidat..>  322K  13-May-2023 17:38
gpt4-maybe-leaked-details-sort-of-again.txt  8043  11-Jul-2023 03:35