An Ultra-Low Energy Internally Analog, Externally Digital Vector-Matrix Multiplier Based on NOR Flash Memory Technolo..> | 2M | 25-Jul-2023 17:17 |
Are Emergent Abilities of Large Language Models a Mirage_ arxiv2304.15004.pdf | 2M | 07-May-2023 01:08 |
Climbing towards Natural Language Understanding_ On Meaning Form and Understanding in the Age of Data_ Emily M Bender..> | 472K | 08-May-2023 03:18 |
Deep neural networks are robust to weight binarization and other non-linear distortions_ arxiv1606.01981.pdf | 829K | 02-Mar-2024 16:08 |
Efficient streaming language models with attention sinks_ arxiv2309.17453.pdf | 12M | 02-Oct-2023 17:46 |
Exponentially Faster Language Modeling_ arxiv2311.10770.pdf | 231K | 27-Nov-2023 05:35 |
Extending Context Window of Large Language Models via Positional Interpolation_ arxiv2306.15595.pdf | 734K | 29-Jun-2023 02:06 |
GLU Variants Improve Transformer_arxiv2002.05202.pdf | 107K | 02-May-2023 21:23 |
GQA_ Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints_ arxiv2305.13245.pdf | 248K | 01-Sep-2023 22:34 |
How Good Are Low-bit Quantized LLAMA3 Models_ An Empirical Study_ arxiv2404.14047v1.pdf | 260K | 26-Apr-2024 20:16 |
Is Cosine-Similarity of Embeddings Really About Similarity_ arxiv2403.05440.pdf | 2M | 12-Mar-2024 04:07 |
LLaMA_ Open and Efficient Foundation Language Models_ arxiv2302.13971.pdf | 710K | 13-May-2023 17:45 |
Landmark Attention_ Random-Access Infinite Context Length for Transformers_ arxiv2305.16300.pdf | 500K | 28-May-2023 17:34 |
Llama 2_ Open Foundation and Fine-Tuned Chat Models_ arxiv2307.09288.pdf | 13M | 30-Aug-2023 23:29 |
Mixtral of Experts_ arxiv2401.04088.pdf | 2M | 09-Jan-2024 03:21 |
Photonic Matrix Computing_ From Fundamentals to Applications_ Junwei Cheng_ Hailong Zhou_ Jianji Dong_ Nanomaterials ..> | 3M | 25-Jul-2023 17:13 |
RULER_ What’s the Real Context Size of Your_ arxiv2404.06654v2.pdf | 643K | 30-Jul-2024 02:47 |
RoFormer_ Enhanced Transformer with Rotary Position Embedding_ arxiv2104.09864v4.pdf | 573K | 21-Apr-2023 00:35 |
SentencePiece_ A simple and language independent subword tokenizer and detokenizer for Neural Text Processing_ arxiv1..> | 207K | 13-May-2023 17:44 |
SmoothQuant_ Accurate and Efficient Post-Training Quantization for Large Language Models_ arxiv2211.10438.pdf | 5M | 11-Dec-2023 23:12 |
Stay on topic with Classifier-Free Guidance_ arxiv2306.17806.pdf | 2M | 30-Sep-2023 04:35 |
Steering Llama 2 via Contrastive Activation Addition_ arxiv2312.06681.pdf | 27M | 13-Dec-2023 04:54 |
The Curse of Recursion_ Training on Generated Data Makes Models Forget_ arxiv2305.17493.pdf | 2M | 24-Aug-2023 19:28 |
The Poison of Alignment_ arxiv2308.13449.pdf | 185K | 30-Aug-2023 14:18 |
The Transformer Model in Equations_ John Thickstun_ 2023.pdf | 191K | 24-Jun-2023 02:24 |
The case for 4-bit precision_ k-bit Inference Scaling Laws_ arxiv2212.09720.pdf | 885K | 28-Aug-2023 18:57 |
Train Short, Test Long_ Attention with Linear Biases Enables Input Length Extrapolation_arxiv2108.12409.pdf | 741K | 17-Jun-2023 23:34 |
Unigram Algorithm_ Subword Regularization_ Improving Neural Network Translation Models with Multiple Subword Candidat..> | 322K | 13-May-2023 17:38 |
gpt4-maybe-leaked-details-sort-of-again.txt | 8043 | 11-Jul-2023 03:35 |