The scFM Landscape at a Glance
Foundation models for single-cell transcriptomics have rapidly evolved from small specialized transformers to massive LLM-based systems that jointly process gene expression and natural language.
[Charts: Model Scale Comparison · Training Data Scale · Cell Type Annotation Accuracy · Task Coverage Radar]
Model Profiles
Detailed profiles of each single-cell foundation model — architecture, training data, key capabilities, and notable results.
Architecture Taxonomy
Four distinct strategies for encoding single-cell gene expression data into transformer-compatible representations.
Gene Ranking / Ordering
Converts continuous expression into a discrete ordering: genes are ranked by expression level, then modeled autoregressively or with masked prediction. Simple and interpretable, but quantitative expression information is lost.
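In code, this reduces to sorting detected genes and mapping them to token ids. Below is a minimal Python sketch, assuming a `vocab` dict from gene symbols to integer token ids; published pipelines such as Geneformer also normalize each gene by a corpus-wide factor before ranking, which is omitted here.

```python
import numpy as np

def to_rank_tokens(expression, gene_names, vocab, max_len=2048):
    """Encode one cell as gene tokens ordered by decreasing expression.

    Illustrative rank-based encoding; `vocab` maps gene symbols to
    token ids and is an assumption of this sketch, not a published API.
    """
    expression = np.asarray(expression, dtype=float)
    detected = expression > 0                      # keep expressed genes only
    order = np.argsort(-expression[detected])      # highest expression first
    ranked = np.asarray(gene_names)[detected][order]
    tokens = [vocab[g] for g in ranked if g in vocab]  # skip unknown genes
    return tokens[:max_len]                        # truncate to context length
```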
Value Categorization / Binning
Expression values are binned into discrete categories (e.g., ~50 bins), transforming regression into classification. Each gene is embedded together with its bin value, and attention masks enable autoregressive prediction. This is the most widely adopted approach.
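A minimal sketch of per-cell quantile binning in the spirit of this approach (scGPT is the best-known example); exact edge placement and the treatment of zeros vary between models, and reserving bin 0 for undetected genes is an assumption of this sketch.

```python
import numpy as np

def bin_expression(expression, n_bins=50):
    """Discretize one cell's expression into n_bins categories.

    Bin edges are quantiles of the cell's *nonzero* values, so the
    binning adapts to sequencing depth; zeros stay in reserved bin 0.
    """
    expression = np.asarray(expression, dtype=float)
    binned = np.zeros(expression.shape, dtype=np.int64)
    nonzero = expression > 0
    if nonzero.any():
        edges = np.quantile(expression[nonzero], np.linspace(0, 1, n_bins + 1))
        # digitize against interior edges yields bins 1..n_bins for nonzeros
        binned[nonzero] = np.clip(
            np.digitize(expression[nonzero], edges[1:-1]) + 1, 1, n_bins
        )
    return binned
```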
Value Projection
Projects raw expression values directly into the embedding space via a learned linear projection, preserving full data resolution without discretization. More complex, but captures subtle expression differences.
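The idea can be sketched as a small PyTorch module: a gene-identity embedding summed with an MLP lift of the scalar expression value. Layer sizes and the MLP itself are illustrative choices here, not the published architectures of scFoundation or CellFM.

```python
import torch
import torch.nn as nn

class ValueProjectionEmbedding(nn.Module):
    """Embed (gene id, continuous value) pairs without discretization."""

    def __init__(self, n_genes: int, d_model: int = 512):
        super().__init__()
        self.gene_embed = nn.Embedding(n_genes, d_model)
        # Lift the raw scalar expression value into the model dimension.
        self.value_proj = nn.Sequential(
            nn.Linear(1, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, gene_ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # gene_ids: (batch, seq) int64; values: (batch, seq) float expression
        return self.gene_embed(gene_ids) + self.value_proj(values.unsqueeze(-1))
```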
LLM-Based / Cell Sentences
Transforms scRNA-seq profiles into "cell sentences": gene names ordered by decreasing expression. Pretrained LLMs can then be applied directly, inheriting their scaling laws, text understanding, and few-shot capabilities, and enabling joint transcriptomic and textual reasoning.
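A minimal sketch of the transformation; the `top_k` cutoff and the prompt wording below are illustrative, not the published Cell2Sentence defaults.

```python
import numpy as np

def cell_to_sentence(expression, gene_names, top_k=100):
    """Render one cell as a space-separated list of gene symbols,
    ordered by decreasing expression (undetected genes dropped)."""
    expression = np.asarray(expression, dtype=float)
    order = np.argsort(-expression)
    expressed = [gene_names[i] for i in order if expression[i] > 0]
    return " ".join(expressed[:top_k])

# The sentence drops straight into an ordinary text prompt:
sentence = cell_to_sentence([0.0, 5.2, 1.3], ["CD19", "CD3D", "MS4A1"])
prompt = f"Cell sentence: {sentence}\nWhat is the cell type?"
```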
[Chart: Architecture Type Distribution]
Benchmark Comparison
Head-to-head performance across standardized evaluation tasks.
| Model | Parameters | Architecture | Cell Type Ann. | Gene Network | Batch Integ. | Perturbation | NL Queries | Multi-cell |
|---|---|---|---|---|---|---|---|---|
Scaling Laws & Trends
How model performance scales with parameters, data, and compute — paralleling LLM scaling laws but for biological sequences.
[Charts: Parameters vs. Performance · Data Scale vs. Performance]
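To make the parallel with LLM scaling laws concrete, one can fit a saturating power law, error(N) = a * N^(-alpha) + c, to (parameter count, error) pairs read off the charts above. The sketch below uses scipy; the functional form is the standard LLM-style ansatz, not a fit reported by any of the cited papers.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_scaling_law(n_params, error):
    """Fit error(N) = a * N**(-alpha) + c and return (a, alpha, c).

    `error` is 1 - accuracy; `c` estimates the irreducible error floor.
    """
    def law(n, a, alpha, c):
        return a * n ** (-alpha) + c

    popt, _ = curve_fit(law, np.asarray(n_params, dtype=float),
                        np.asarray(error, dtype=float),
                        p0=(1.0, 0.2, 0.01), maxfev=20000)
    return tuple(popt)
```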
Key Scaling Observations
LLM Scaling Advantage
C2S-Scale demonstrates that LLM-based approaches benefit from scaling laws: 1B → 8B → 27B parameters yield consistent improvements across all evaluated tasks. The 27B model achieves 95.4% cell type accuracy, outperforming all specialized scFMs.
Data Scale Matters
CellFM (100M cells) outperforms models trained on 33–50M cells across gene function prediction tasks. However, diminishing returns appear beyond ~50M cells for cell type annotation, suggesting task-dependent saturation.
Architecture vs. Scale
Value projection models (scFoundation, CellFM) achieve competitive performance at modest parameter counts (~100M), suggesting that preserving expression resolution can partially compensate for smaller scale. Gene ranking models plateau earlier.
Text Integration
Models integrating textual metadata (C2S-Scale, scGenePT, LangCell) show particular strength on complex tasks requiring biological reasoning. The ability to process natural language queries is a unique capability of LLM-based approaches.
Task Coverage Matrix
Which models support which downstream tasks — from standard cell type annotation to advanced perturbation prediction and natural language queries.
| Task | scGPT | Geneformer | C2S-Scale | scFoundation | CellFM | UCE | scPRINT | GeneCompass |
|---|---|---|---|---|---|---|---|---|
[Chart: Task Support Count]
Unique Capabilities
- Natural language biological Q&A
- Multi-cell context reasoning
- Virtual drug screening
- Context-conditioned perturbation
- Gene regulatory network inference benchmark leader
- Additive benchmark gymnasium evaluation
- 100M human cell pre-training (largest single-species)
- Value projection with 82M parameters — efficient scaling
Key References
Primary publications for each foundation model and major benchmarking studies.
- Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods 21, 1470–1480 (2024). [scGPT]
- Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023). [Geneformer]
- Rizvi, S. A. et al. Scaling Large Language Models for Next-Generation Single-Cell Analysis. bioRxiv (2025). [C2S-Scale]
- Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nature Methods 21, 1481–1491 (2024). [scFoundation]
- Zeng, Z. et al. CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells. Nature Communications 16, 4667 (2025). [CellFM]
- Rosen, Y. et al. Universal Cell Embeddings: A Foundation Model for Cell Biology. bioRxiv (2023). [UCE]
- Kalfon, J. et al. scPRINT: pre-training on 50 million cells allows robust gene network predictions. Nature Communications 16 (2025). [scPRINT]
- Yang, X. et al. GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model. bioRxiv (2023). [GeneCompass]
- Levine, D. et al. Cell2Sentence: Teaching Large Language Models the Language of Biology. bioRxiv (2023). [Cell2Sentence]
- Zhao, S. et al. BioLLM: A standardized framework for integrating and benchmarking single-cell foundation models. Cell Reports Methods (2025). [BioLLM]
- Kalfon, J. et al. scPRINT-2: Towards the next-generation of cell foundation models and benchmarks. bioRxiv (2026). [scPRINT-2]