AI & Tech·May 19, 2026

Introducing the Ettin Reranker Family

r/LocalLLaMA · May 19 · 6 min read · Single source



TL;DR

Today I'm releasing six new Sentence Transformers CrossEncoder rerankers, state-of-the-art at their respective sizes, built on top of the Ettin ModernBERT encoders, together with the data and full training recipe that produced them:

cross-encoder/ettin-reranker-17m-v1
cross-encoder/ettin-reranker-32m-v1
cross-encoder/ettin-reranker-68m-v1
cross-encoder/ettin-reranker-150m-v1
cross-encoder/ettin-reranker-400m-v1
cross-encoder/ettin-reranker-1b-v1

The models were trained with a distillation recipe: pointwise MSE on mixedbread-ai/mxbai-rerank-large-v2 scores over cross-encoder/ettin-reranker-v1-data, which is a subset of lightonai/embeddings-pre-training mixed with a reranked subset of lightonai/embeddings-fine-tuning.

[Figure: Our six rerankers paired with google/embeddinggemma-300m on MTEB(eng, v2) Retrieval. See Results for five more embedder pairings.]

If you're new to rerankers and want the "why" first, jump to What is a reranker, and why pair one with an embedder?. If you just want to plug a model in, jump to Usage. If you want to train your own, jump to Training.

I bootstrapped the training recipe below with the new train-sentence-transformers Agent Skill shipped in Sentence Transformers v5.5.0. Install it with

    hf skills add train-sentence-transformers [--global] [--claude]

and ask your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, ...) to fine-tune a SentenceTransformer, CrossEncoder, or SparseEncoder model on your data.

Table of contents

What is a reranker, and why pair one with an embedder?
Usage
End-to-end retrieve-then-rerank pipeline
Architecture Details
Results
MTEB(eng, v2) Retrieval
Speed
Training
Distillation recipe
Dataset
Training Arguments
Evaluation
Overall Training Script
Conclusion
Acknowledgements

What is a reranker, and why pair one with an embedder?

A reranker (a.k.a. pointwise cross-encoder) is a neural model that takes a (query, document) pair and outputs a single relevance score.
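To make the definition concrete, here is a minimal sketch of what "pointwise" means in practice: each (query, document) pair is scored on its own, and the ranking is simply a sort over those scores. The `toy_score` function is a hypothetical word-overlap stand-in for a model's forward pass, not anything from Sentence Transformers, and the two documents are illustrative.

```python
import re


def toy_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a reranker: fraction of query words found in the document."""
    query_words = set(re.findall(r"\w+", query.lower()))
    document_words = set(re.findall(r"\w+", document.lower()))
    if not query_words:
        return 0.0
    return len(query_words & document_words) / len(query_words)


def rerank(query: str, documents: list[str]) -> list[tuple[int, float]]:
    """Score every (query, document) pair independently, then sort by score, best first."""
    scored = [(index, toy_score(query, doc)) for index, doc in enumerate(documents)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


documents = [
    "The Fuji apple is an apple cultivar developed in the late 1930s.",
    "Apple Inc. was founded in Cupertino, California in 1976.",
]
ranking = rerank("Where was Apple founded?", documents)
print(ranking)  # list of (document index, score), highest score first
```

With a real reranker the only change would be where the score comes from; the sort over per-pair scores stays the same.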
Unlike an embedding model, which encodes the query and document separately and computes their similarity from the two embedding vectors, a reranker lets the two texts attend to each other through every transformer layer. That joint encoding is more accurate but also more expensive: the model has to be run once per (query, document) pair rather than once per text.

Because cross-encoders are too expensive to run over a full corpus, the common production pattern is retrieve-then-rerank: a fast embedding model retrieves the top-K candidates (cheap), then a cross-encoder re-orders just those K with high accuracy. The total cost stays bounded while the final ranking is much closer to what an exhaustive cross-encoder pass would produce.

Throughout this blogpost I'll use "reranker" and "cross-encoder" interchangeably.

Usage

The released models are normal Sentence Transformers CrossEncoder models, so you can use them with just 3 lines of code:

    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/ettin-reranker-32m-v1")
    scores = model.predict([
        ("Where was Apple founded?", "Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne."),
        ("Where was Apple founded?", "The Fuji apple is an apple cultivar developed in the late 1930s and brought to market in 1962."),
    ])
    print(scores)
    # [11.393298 2.968891]

[...]

    None:
        config = CONFIGS[ ]
        = config["base_model_name"]
        = config[" "]
        = config[" "]
        = int(os.environ.get(" ", 1))
        per_device_batch_size = //
        dataloader_workers = 0 if > 8 else 4
        run_name = f"ettin-reranker-{ }-lr{:.0e}"

        # 1. Load a model to finetune with model card data
        # The model mirrors ModernBertForSequenceClassification, but with a 'headless' Transformer that just loads
        # AutoModel. This allows for unpadding with FA2, which isn't possible with AutoModelForSequenceClassification.
        # This speeds up training considerably, while heavily reducing memory usage.
        torch. (12)
        transformer = Transformer(, ={"attn_implementation": " "})
        transformer.model.config.num_labels = 1
        embedding_dimension = transformer.get_embedding_dimension()
        pooling = Pooling(embedding_dimension=embedding_dimension, ="cls")
        = Dense(
            in_features=embedding_dimension,
            out_features=embedding_dimension,
            bias=False,
            activation_function=nn.GELU(),
            =" ",
            =" ",
        )
        norm = LayerNorm(dimension=embedding_dimension)
        = Dense(
            in_features=embedding_dimension,
            out_features=1,
            bias=True,
            activation_function=nn.Identity(),
            =" ",
            ="scores",
        )
        model = CrossEncoder(
            modules=[transformer, pooling,, norm, ],
            num_labels=1,
            activation_fn=nn.Identity(),
            =CrossEncoderModelCardData(
                =f"Ettin Reranker { } distilled from mxbai-rerank-large-v2",
                language="en",
                license="apache-2.0",
            ),
        )
        = getattr(model[0].model.config, "_attn_implementation", None)
        if not ( and "flash" in.lower()):
            logging.warning(f"FA2 may not be active (attn_impl={!r}); training will be slower.")

        # 2. Load the dataset. Each config is one source subset (32 lighton + 7 rerank retrieval
        # domains). The held-out eval rows live as the 'validation' split of the 'quora' config.
        = "cross-encoder/ettin-reranker-v1-data"
        = []
        eval_dataset = None
        for in get_dataset_config_names( ):
            dataset = load_dataset(, )
            .append(dataset["train"])
            if "validation" in dataset:
                eval_dataset = dataset["validation"]
        = concatenate_datasets( )
        print( )

        # 3. Define a loss function
        loss = MSELoss(model)

        # 4. Specify training arguments
        args = CrossEncoderTrainingArguments(
            =f"models/{run_name}",
            num_train_epochs=1,
            per_device_train_batch_size=per_device_batch_size,
            per_device_eval_batch_size=per_device_batch_size,
            =1,
            =,
            =0.03,
            bf16=True,
            eval_strategy="steps",
            eval_steps=0.05,
            save_strategy="steps",
            save_steps=0.05,
            save_total_limit=5,
            =0.025,
            =True,
            load_best_model_at_end=True,
            ="eval_NanoBEIR_R100_mean_ndcg@10",
            dataloader_num_workers=dataloader_workers,
            run_name=run_name,
            seed=12,
        )

        # 5. Create an evaluator
        evaluator = CrossEncoderNanoBEIREvaluator(
            =["msmarco", "nfcorpus", "nq", "fiqa2018", "touche2020", "scifact", "hotpotqa", "arguana", "fever", "dbpedia", "climatefever", "scidocs", "quoraretrieval"],
            =per_device_batch_size,
            =False,
            show_progress_bar=False,
        )

        # 6. Create a trainer
        trainer = CrossEncoderTrainer(
            model=model,
            args=args,
            =,
            eval_dataset=eval_dataset,
            loss=loss,
            evaluator=evaluator,
        )

        # 7. Evaluate before training
        if trainer.is_world_process_zero():
            with torch.autocast( ="cuda", dtype=torch.bfloat16):
                evaluator(model)

        # 8. Train
        trainer.train()

        # 9. Evaluate the final model
        if trainer.is_world_process_zero():
            with torch.autocast( ="cuda", dtype=torch.bfloat16):
                evaluator(model)

        # 10. Save the final model
        = f"models/{run_name}/final"
        model.save_pretrained( )


    if __name__ == "__main__":
        main()

For multi-node training (anything past 17m/32m), launch the same script with torchrun:

    # Single-node (17m, 32m): defaults work
    python train.py

    # Multi-node 4n setup for 150m, preserves =192:
    torchrun -- =8 --nnodes=4... train.py

Conclusion

The ettin-reranker-v1 family, trained with a single simple recipe, is state-of-the-art at every released size up to 1B parameters. Pointwise MSE distillation from a strong teacher onto a broad-domain and retrieval-specific mix scales cleanly from 17M to 1B parameters, with only the learning rate and per-device batch size changing between sizes. Every ettin-reranker-v1 model beats the ms-marco-MiniLM-L*-v2 family by a comfortable margin on MTEB and NanoBEIR. cross-encoder/ettin-reranker-150m-v1 is the strongest mid-tier reranker I tested in the under-600M range, cross-encoder/ettin-reranker-400m-v1 lands within 0.0024 of the 1.54B teacher's MTEB score, and cross-encoder/ettin-reranker-1b-v1 matches that teacher within 0.0001.
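As a rough illustration of the objective named above, pointwise MSE distillation treats the teacher's score for each (query, document) pair as a regression target for the student. The sketch below is plain Python with a hypothetical one-feature linear "student" so that it runs without dependencies; the features, teacher scores, and learning rate are made-up numbers, and it is not the training script used for the released models.

```python
def mse(student_scores: list[float], teacher_scores: list[float]) -> float:
    """Pointwise MSE: mean squared gap between student and teacher scores, pair by pair."""
    gaps = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(gaps) / len(gaps)


# Made-up data: one feature per (query, document) pair, plus the teacher's score for it.
features = [0.1, 0.4, 0.7, 0.9]
teacher_scores = [-2.0, 1.0, 4.0, 6.0]  # generated as 10 * feature - 3

# Hypothetical student: score = weight * feature + bias.
weight, bias = 0.0, 0.0
learning_rate = 0.5

initial_loss = mse([weight * x + bias for x in features], teacher_scores)
for _ in range(2000):
    predictions = [weight * x + bias for x in features]
    errors = [p - t for p, t in zip(predictions, teacher_scores)]
    # Gradients of the MSE with respect to weight and bias.
    grad_weight = 2 * sum(e * x for e, x in zip(errors, features)) / len(features)
    grad_bias = 2 * sum(errors) / len(features)
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias

final_loss = mse([weight * x + bias for x in features], teacher_scores)
print(f"loss before: {initial_loss:.3f}, loss after: {final_loss:.6f}")
print(f"student: score = {weight:.3f} * feature + {bias:.3f}")
```

Because the loss is computed per pair, no negatives or in-batch comparisons are needed: any set of teacher-scored pairs can serve as training data.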
Everything in one place:

Models:
cross-encoder/ettin-reranker-17m-v1
cross-encoder/ettin-reranker-32m-v1
cross-encoder/ettin-reranker-68m-v1
cross-encoder/ettin-reranker-150m-v1
cross-encoder/ettin-reranker-400m-v1
cross-encoder/ettin-reranker-1b-v1

Dataset: cross-encoder/ettin-reranker-v1-data with ~143M (query, document, label) triples, kept as 39 named splits so the provenance of every row is visible.

Training script: the ~150 lines in Overall Training Script above, which is the same script used for all six models.

If you build something on top of these, please let me know! I'd genuinely love to see what people do with them, and if you manage to train better rerankers using the released data, even better. The recipe is intentionally simple, partly so that there's plenty of headroom for someone else to improve it. Train a stronger teacher and the same script can keep producing better students.

Acknowledgements

I'd like to thank the Ettin team (Orion Weller, Kathryn Ricci, Marc Marone, Antoine Chaffin, Dawn Lawrie, and Benjamin Van Durme) for building the base encoders that these rerankers are built on, the LightOn team (Antoine Chaffin, Raphael Sourty, Paulo Moura, and Amélie Chatelain) for their work on the training data collection, and the Mixedbread AI team (Xianming Li, Aamir Shakir, Rui Huang, Tsz-fung Andrew Lee, Julius Lipp, Benjamin Clavié, and Jing Li) for their work on the teacher model.

Citation

If you use the ettin-reranker-v1 family or any of the released artifacts, please cite this blogpost:

    @misc{aarsen2026ettin-reranker,
        title = "Introducing the Ettin Reranker Family",
        author = "Aarsen, Tom",
        year = "2026",
        publisher = "Hugging Face",
        url = "https://huggingface.co/blog/ettin-reranker",
    }
