Research, Pre-Training Data

🇺🇸 San Francisco, California
$350K - $475K Annual
Posted 7 months ago
Expires July 19, 2026
Full Time, On-site, Engineering, Data Science

The Research, Pre-Training Data role at Thinking Machines Lab is integral to developing the next generation of AI models. The position involves designing and implementing methods for sourcing, curating, and analyzing pre-training datasets, to ensure the quality of the data and the performance of the models trained on it. The successful candidate will work within a team dedicated to advancing collaborative general intelligence, contributing both scientific insight and production-grade code.

Key responsibilities include developing techniques for curating large-scale text, code, and multimodal data, as well as creating data quality metrics to assess coverage and diversity. The role also involves collaborating with research and infrastructure teams to scale data processing systems efficiently and reproducibly. Additionally, the candidate will investigate and mitigate data risks, including privacy and licensing concerns, to ensure responsible data use.

Required qualifications include proficiency in Python and familiarity with deep learning frameworks such as PyTorch, TensorFlow, or JAX. Candidates should have a bachelor's degree or equivalent experience in Computer Science, Machine Learning, Physics, Mathematics, or a related discipline, with a strong theoretical and empirical grounding. Clear communication skills and the ability to explain complex technical concepts in writing are also essential.

Thinking Machines Lab offers a competitive annual salary ranging from $350,000 to $475,000, depending on background, skills, and experience. The company provides generous health, dental, and vision benefits, unlimited paid time off, paid parental leave, and relocation support as needed. Visa sponsorship is available for qualified candidates.

The company fosters a culture of innovation and collaboration, bringing together scientists, engineers, and builders who have created widely used AI products and open-source projects. Employees have opportunities for growth and are encouraged to publish and present research that advances the AI community. This role is ideal for individuals passionate about shaping the foundations of how AI learns and who enjoy both theoretical exploration and hands-on experimentation.
