Enterprise DNA
dolly

by Community

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

Added 1 June 2026

#chatbot #databricks #dolly #gpt

Overview

Dolly is an instruction-following large language model trained on the Databricks Machine Learning Platform. It can also be fine-tuned on custom datasets. The project is open source and licensed under Apache-2.0.

Best for

Developers and researchers who want to fine-tune a mid-sized open-source LLM on Databricks infrastructure.

Use cases

  • Fine-tuning a language model on domain-specific instructions
  • Building a conversational agent for customer support
  • Generating code or documentation from natural language prompts
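
The use cases above all start from the same step: wrapping a plain-language request in an instruction prompt. The sketch below shows one way to do that. The template text, the "### Instruction:" / "### Response:" / "### End" markers, and the helper names build_prompt and to_training_text are assumptions for illustration (an Alpaca-style layout), not something this listing confirms; check the model's own documentation for the exact format it was trained on.

```python
# Hypothetical Alpaca-style template, assumed for illustration only.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
TEMPLATE = "{intro}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the assumed template for inference."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must not be empty")
    return TEMPLATE.format(intro=INTRO, instruction=instruction)


def to_training_text(instruction: str, response: str,
                     end_marker: str = "### End") -> str:
    """Join prompt and reference answer into one fine-tuning example."""
    return build_prompt(instruction) + response.strip() + "\n\n" + end_marker


if __name__ == "__main__":
    print(build_prompt("Summarize the refund policy in two sentences."))
```

The same build_prompt output can be reused at inference time, so the text the model sees in production matches the text it saw during fine-tuning.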

Notes

10,790 stars on GitHub. Last updated 2023-06-30. Licensed Apache-2.0.

Pros

  • Open source under the permissive Apache-2.0 license
  • Trained on a robust enterprise ML platform
  • Good foundation for custom instruction-following tasks

Cons

  • Requires substantial GPU memory for inference and training
  • Not as capable as larger proprietary models for complex reasoning
  • Limited pre-training data scope compared to frontier models
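
To put the GPU memory point in rough numbers, the sketch below estimates the memory needed just to hold model weights at different precisions. The 12e9 parameter count is a hypothetical value chosen for illustration (this listing does not state a model size), and the estimate ignores activations, gradients, and optimizer state, all of which add to the total, especially during training.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return n_params * bytes_per_param / 2**30


def estimate(n_params: float) -> dict:
    """Weights-only memory per precision, rounded to one decimal."""
    return {
        name: round(weight_memory_gib(n_params, size), 1)
        for name, size in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    # Hypothetical parameter count, not taken from this listing.
    for precision, gib in estimate(12e9).items():
        print(f"{precision}: about {gib} GiB for weights")
```

Treat the result as a lower bound: real inference needs headroom on top of the weights, and fine-tuning typically needs several times more.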

Indexed from awesome-llmops and enriched against its public facts.
