An Ultra-Low Energy Internally Analog, Externally Digital Vector-Matrix Multiplier Based on NOR Flash Memory Technolo..> | 2M | 25-Jul-2023 17:17 |
Are Emergent Abilities of Large Language Models a Mirage_ arxiv2304.15004.pdf | 2M | 07-May-2023 01:08 |
Climbing towards Natural Language Understanding_ On Meaning Form and Understanding in the Age of Data_ Emily M Bender..> | 472K | 08-May-2023 03:18 |
Deep neural networks are robust to weight binarization and other non-linear distortions_ arxiv1606.01981.pdf | 829K | 02-Mar-2024 16:08 |
Efficient streaming language models with attention sinks_ arxiv2309.17453.pdf | 12M | 02-Oct-2023 17:46 |
Exponentially Faster Language Modeling_ arxiv2311.10770.pdf | 231K | 27-Nov-2023 05:35 |
Extending Context Window of Large Language Models via Positional Interpolation_ arxiv2306.15595.pdf | 734K | 29-Jun-2023 02:06 |
GLU Variants Improve Transformer_arxiv2002.05202.pdf | 107K | 02-May-2023 21:23 |
GQA_ Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints_ arxiv2305.13245.pdf | 248K | 01-Sep-2023 22:34 |
How Good Are Low-bit Quantized LLAMA3 Models_ An Empirical Study_ arxiv2404.14047v1.pdf | 260K | 26-Apr-2024 20:16 |
Is Cosine-Similarity of Embeddings Really About Similarity_ arxiv2403.05440.pdf | 2M | 12-Mar-2024 04:07 |
LLaMA_ Open and Efficient Foundation Language Models_ arxiv2302.13971.pdf | 710K | 13-May-2023 17:45 |
Landmark Attention_ Random-Access Infinite Context Length for Transformers_ arxiv2305.16300.pdf | 500K | 28-May-2023 17:34 |
Llama 2_ Open Foundation and Fine-Tuned Chat Models_ arxiv2307.09288.pdf | 13M | 30-Aug-2023 23:29 |
Mixtral of Experts_ arxiv2401.04088.pdf | 2M | 09-Jan-2024 03:21 |
Photonic Matrix Computing_ From Fundamentals to Applications_ Junwei Cheng_ Hailong Zhou_ Jianji Dong_ Nanomaterials ..> | 3M | 25-Jul-2023 17:13 |
RULER_ What’s the Real Context Size of Your_ arxiv2404.06654v2.pdf | 643K | 30-Jul-2024 02:47 |
RoFormer_ Enhanced Transformer with Rotary Position Embedding_ arxiv2104.09864v4.pdf | 573K | 21-Apr-2023 00:35 |
SentencePiece_ A simple and language independent subword tokenizer and detokenizer for Neural Text Processing_ arxiv1..> | 207K | 13-May-2023 17:44 |
SmoothQuant_ Accurate and Efficient Post-Training Quantization for Large Language Models_ arxiv2211.10438.pdf | 5M | 11-Dec-2023 23:12 |
Stay on topic with Classifier-Free Guidance_ arxiv2306.17806.pdf | 2M | 30-Sep-2023 04:35 |
Steering Llama 2 via Contrastive Activation Addition_ arxiv2312.06681.pdf | 27M | 13-Dec-2023 04:54 |
The Curse of Recursion_ Training on Generated Data Makes Models Forget_ arxiv2305.17493.pdf | 2M | 24-Aug-2023 19:28 |
The Poison of Alignment_ arxiv2308.13449.pdf | 185K | 30-Aug-2023 14:18 |
The Transformer Model in Equations_ John Thickstun_ 2023.pdf | 191K | 24-Jun-2023 02:24 |
The case for 4-bit precision_ k-bit Inference Scaling Laws_ arxiv2212.09720.pdf | 885K | 28-Aug-2023 18:57 |
Train Short, Test Long_ Attention with Linear Biases Enables Input Length Extrapolation_arxiv2108.12409.pdf | 741K | 17-Jun-2023 23:34 |
Unigram Algorithm_ Subword Regularization_ Improving Neural Network Translation Models with Multiple Subword Candidat..> | 322K | 13-May-2023 17:38 |
gpt4-maybe-leaked-details-sort-of-again.txt | 8043 | 11-Jul-2023 03:35 |