Lodestone outperforms other models of its size and sequence length, as evidenced by its ranking on the MTEB Leaderboard. It was trained on more than one million publicly available scholarly articles and publications, and it processes text sequences of up to 4,096 tokens, allowing it to capture topical context and nuance that shorter-context language models miss.
Key features of Lodestone include long-sequence embedding, improved semantic understanding, and sentence vectorization for tasks such as information retrieval, clustering, and sentence similarity (see the sketch below). Developers and enterprises can now fine-tune and deploy their own models using Lodestone on Hugging Face, making long-sequence AI applications accessible to more projects and businesses.
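As a rough illustration of the sentence-similarity use case, the sketch below loads Lodestone through the sentence-transformers library and compares two long passages. The model ID and the need for `trust_remote_code` are assumptions; check Hum's organization page on Hugging Face for the published checkpoint name and loading instructions.

```python
# Minimal sketch: embedding long passages with Lodestone via sentence-transformers.
# The model ID below is assumed; the checkpoint may also ship custom modeling code,
# in which case trust_remote_code=True is required when loading.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer(
    "Hum-Works/lodestone-base-4096-v1",  # assumed repository name
    trust_remote_code=True,
)

# Two long passages (up to 4,096 tokens each), e.g. full paper abstracts or sections.
passages = [
    "Abstract and body text of the first research paper...",
    "Abstract and body text of the second research paper...",
]

# Encode both passages into dense vectors.
embeddings = model.encode(passages, convert_to_tensor=True)

# Cosine similarity between the two document embeddings.
score = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {score.item():.4f}")
```

The same embeddings can feed downstream retrieval or clustering pipelines; only the similarity comparison changes.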
Hum’s CTO, Niall Little, expressed excitement about contributing to the open source community and advancing natural language processing research. He noted that Lodestone can contextualize an entire research paper, surfacing content insights that earlier models, limited to analyzing only one or two paragraphs at a time, could not capture. While Hum will continue to optimize Lodestone for the media and publishing industry, it is also eager to see how the community leverages the model to advance content intelligence and responsible AI applications.