The scFM Landscape at a Glance
Foundation models for single-cell transcriptomics have rapidly evolved from small specialized transformers to massive LLM-based systems that jointly process gene expression and natural language.
[Charts: Model Scale Comparison · Training Data Scale · Cell Type Annotation Accuracy · Task Coverage Radar]
Model Profiles
Detailed profiles of each single-cell foundation model — architecture, training data, key capabilities, and notable results.
Architecture Taxonomy
Four distinct strategies for encoding single-cell gene expression data into transformer-compatible representations.
Gene Ranking / Ordering
Converts continuous expression into a discrete ordering: genes are ranked by expression level, then modeled autoregressively or with masked prediction. Simple and interpretable, but quantitative expression information is lost.
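In code, this reduces to sorting detected genes and mapping them to token ids. Below is a minimal Python sketch, assuming a `vocab` dict from gene symbols to integer token ids; published pipelines such as Geneformer also normalize each gene by a corpus-wide factor before ranking, which is omitted here.

```python
import numpy as np

def to_rank_tokens(expression, gene_names, vocab, max_len=2048):
    """Encode one cell as gene tokens ordered by decreasing expression.

    Illustrative rank-based encoding; `vocab` maps gene symbols to
    token ids and is an assumption of this sketch, not a published API.
    """
    expression = np.asarray(expression, dtype=float)
    detected = expression > 0                      # keep expressed genes only
    order = np.argsort(-expression[detected])      # highest expression first
    ranked = np.asarray(gene_names)[detected][order]
    tokens = [vocab[g] for g in ranked if g in vocab]  # skip unknown genes
    return tokens[:max_len]                        # truncate to context length
```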
Value Categorization / Binning
Expression values are binned into discrete categories (e.g., ~50 bins), transforming regression into classification. Each gene is embedded together with its bin value, and attention masks enable autoregressive prediction. This is the most widely adopted approach.
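A minimal sketch of per-cell quantile binning in the spirit of this approach (scGPT is the best-known example); exact edge placement and the treatment of zeros vary between models, and reserving bin 0 for undetected genes is an assumption of this sketch.

```python
import numpy as np

def bin_expression(expression, n_bins=50):
    """Discretize one cell's expression into n_bins categories.

    Bin edges are quantiles of the cell's *nonzero* values, so the
    binning adapts to sequencing depth; zeros stay in reserved bin 0.
    """
    expression = np.asarray(expression, dtype=float)
    binned = np.zeros(expression.shape, dtype=np.int64)
    nonzero = expression > 0
    if nonzero.any():
        edges = np.quantile(expression[nonzero], np.linspace(0, 1, n_bins + 1))
        # digitize against interior edges yields bins 1..n_bins for nonzeros
        binned[nonzero] = np.clip(
            np.digitize(expression[nonzero], edges[1:-1]) + 1, 1, n_bins
        )
    return binned
```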
Value Projection
Projects raw expression values directly into the embedding space via a learned linear projection, preserving full data resolution without discretization. More complex, but captures subtle expression differences.
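The idea can be sketched as a small PyTorch module: a gene-identity embedding summed with an MLP lift of the scalar expression value. Layer sizes and the MLP itself are illustrative choices here, not the published architectures of scFoundation or CellFM.

```python
import torch
import torch.nn as nn

class ValueProjectionEmbedding(nn.Module):
    """Embed (gene id, continuous value) pairs without discretization."""

    def __init__(self, n_genes: int, d_model: int = 512):
        super().__init__()
        self.gene_embed = nn.Embedding(n_genes, d_model)
        # Lift the raw scalar expression value into the model dimension.
        self.value_proj = nn.Sequential(
            nn.Linear(1, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, gene_ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # gene_ids: (batch, seq) int64; values: (batch, seq) float expression
        return self.gene_embed(gene_ids) + self.value_proj(values.unsqueeze(-1))
```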
LLM-Based / Cell Sentences
Transforms scRNA-seq profiles into "cell sentences": gene names ordered by decreasing expression. Pretrained LLMs can then be applied directly, inheriting their scaling laws, text understanding, and few-shot capabilities, and enabling joint transcriptomic and textual reasoning.
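A minimal sketch of the transformation; the `top_k` cutoff and the prompt wording below are illustrative, not the published Cell2Sentence defaults.

```python
import numpy as np

def cell_to_sentence(expression, gene_names, top_k=100):
    """Render one cell as a space-separated list of gene symbols,
    ordered by decreasing expression (undetected genes dropped)."""
    expression = np.asarray(expression, dtype=float)
    order = np.argsort(-expression)
    expressed = [gene_names[i] for i in order if expression[i] > 0]
    return " ".join(expressed[:top_k])

# The sentence drops straight into an ordinary text prompt:
sentence = cell_to_sentence([0.0, 5.2, 1.3], ["CD19", "CD3D", "MS4A1"])
prompt = f"Cell sentence: {sentence}\nWhat is the cell type?"
```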
[Chart: Architecture Type Distribution]
Benchmark Comparison
Head-to-head performance across standardized evaluation tasks.
| Model | Parameters | Architecture | Cell Type Ann. | Gene Network | Batch Integ. | Perturbation | NL Queries | Multi-cell |
|---|---|---|---|---|---|---|---|---|
Scaling Laws & Trends
How model performance scales with parameters, data, and compute — paralleling LLM scaling laws but for biological sequences.
[Charts: Parameters vs. Performance · Data Scale vs. Performance]
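To make the parallel with LLM scaling laws concrete, one can fit a saturating power law, error(N) = a * N^(-alpha) + c, to (parameter count, error) pairs read off the charts above. The sketch below uses scipy; the functional form is the standard LLM-style ansatz, not a fit reported by any of the cited papers.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_scaling_law(n_params, error):
    """Fit error(N) = a * N**(-alpha) + c and return (a, alpha, c).

    `error` is 1 - accuracy; `c` estimates the irreducible error floor.
    """
    def law(n, a, alpha, c):
        return a * n ** (-alpha) + c

    popt, _ = curve_fit(law, np.asarray(n_params, dtype=float),
                        np.asarray(error, dtype=float),
                        p0=(1.0, 0.2, 0.01), maxfev=20000)
    return tuple(popt)
```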
Key Scaling Observations
LLM Scaling Advantage
C2S-Scale demonstrates that LLM-based approaches benefit from scaling laws: 1B → 8B → 27B parameters yield consistent improvements across all evaluated tasks. The 27B model achieves 95.4% cell type accuracy, outperforming all specialized scFMs.
Data Scale Matters
CellFM (100M cells) outperforms models trained on 33–50M cells across gene function prediction tasks. However, diminishing returns appear beyond ~50M cells for cell type annotation, suggesting task-dependent saturation.
Architecture vs. Scale
Value projection models (scFoundation, CellFM) achieve competitive performance at modest parameter counts (~100M), suggesting that preserving expression resolution can partially compensate for smaller scale. Gene ranking models plateau earlier.
Text Integration
Models integrating textual metadata (C2S-Scale, scGenePT, LangCell) show particular strength on complex tasks requiring biological reasoning. The ability to process natural language queries is a unique capability of LLM-based approaches.
Task Coverage Matrix
Which models support which downstream tasks — from standard cell type annotation to advanced perturbation prediction and natural language queries.
| Task | scGPT | Geneformer | C2S-Scale | scFoundation | CellFM | UCE | scPRINT | GeneCompass |
|---|---|---|---|---|---|---|---|---|
[Chart: Task Support Count]
Unique Capabilities
- Natural language biological Q&A
- Multi-cell context reasoning
- Virtual drug screening
- Context-conditioned perturbation
- Gene regulatory network inference benchmark leader
- Additive benchmark gymnasium evaluation
- 100M human cell pre-training (largest single-species)
- Value projection with 82M parameters — efficient scaling
Key References
Primary publications for each foundation model and major benchmarking studies.
- Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods 21, 1470–1480 (2024). [scGPT]
- Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023). [Geneformer]
- Rizvi, S. A. et al. Scaling Large Language Models for Next-Generation Single-Cell Analysis. bioRxiv (2025). [C2S-Scale]
- Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nature Methods 21, 1481–1491 (2024). [scFoundation]
- Zeng, Z. et al. CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells. Nature Communications 16, 4667 (2025). [CellFM]
- Rosen, Y. et al. Universal Cell Embeddings: A Foundation Model for Cell Biology. bioRxiv (2023). [UCE]
- Kalfon, J. et al. scPRINT: pre-training on 50 million cells allows robust gene network predictions. Nature Communications 16 (2025). [scPRINT]
- Yang, X. et al. GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model. bioRxiv (2023). [GeneCompass]
- Levine, D. et al. Cell2Sentence: Teaching Large Language Models the Language of Biology. bioRxiv (2023). [Cell2Sentence]
- Zhao, S. et al. BioLLM: A standardized framework for integrating and benchmarking single-cell foundation models. Cell Reports Methods (2025). [BioLLM]
- Kalfon, J. et al. scPRINT-2: Towards the next-generation of cell foundation models and benchmarks. bioRxiv (2026). [scPRINT-2]