Description:
Cambrex Profarmaco Milano is looking for a curious and dynamic intern interested in Artificial Intelligence.
For our site in Paullo (Milan), we are looking for a trainee with expertise in ML methods and AI to join our Analytical Development team for an industrial secondment focused on building an AI-powered HPLC recommendation engine. This project leverages proprietary pharmaceutical data to develop machine learning models that predict optimal chromatographic conditions for new compounds.
Responsibilities:
- Design and curate a structured experimental knowledge database linking molecular representations (SMILES, InChI, fingerprints), physicochemical descriptors, chromatographic method parameters, and experimental outcomes, with a focus on data quality, reproducibility, and an ML-ready design;
- Develop a feature selection pipeline for classical descriptors (RDKit, Mordred) and learned representations (molecular fingerprints, graph-based embeddings);
- Research, train, and benchmark predictive models for chromatographic outcomes (retention behaviour, mobile phase strength, column selectivity class), exploring both interpretable models and state-of-the-art approaches;
- Design a molecular similarity module grounded in chemical space geometry, evaluating distance metrics and embedding spaces for nearest-neighbour method retrieval from historical data;
- Build a recommendation engine that unifies predictive modelling and the similarity module, with uncertainty quantification, confidence scoring, and explainability to support trust and adoption by domain scientists;
- Extend the engine into an agentic LLM interface that allows natural language interaction with the underlying models and database;
- Validate the system on held-out experimental data, document methodology to publication standard, and present research outputs to both technical and domain-expert audiences.
Qualifications and Skills:
- Degree in Computer Science/Engineering or a closely related field;
- Deep understanding of ML and AI;
- Strong research instincts: ability to identify the right problem formulation, design controlled experiments, and critically evaluate model behaviour rather than just benchmark metrics;
- Proficiency in Python and relevant libraries (scikit-learn, pandas, NumPy, PyTorch, TensorFlow), and comfort reading and adapting research code.
Soft Skills:
- Excellent interpersonal and communication skills;
- Flexibility and the ability to work effectively in a team;
- Proactivity, a strong focus on results, and problem-solving skills;
- Ability to work independently and communicate technical results to a non-specialist audience.
The following would be considered a plus:
- Familiarity with molecular representations and cheminformatics tools (SMILES, fingerprints, graph neural networks for molecules) or willingness to learn;
- Active interest in explainable and interpretable ML (XAI), particularly in applied scientific contexts where trust and transparency are critical;
- Hands-on experience with LLM tool use, function calling, or agentic frameworks, or a conceptual grounding in how LLMs interact with external systems;
- Exposure to scientific, industrial, or experimental datasets with inherent noise, class imbalance, or sparse labelling, common in real-world R&D settings.
What You Will Gain:
- Access to a proprietary industrial HPLC dataset not available in academic settings;
- An AI research challenge with real-world constraints;
- Immersion in cheminformatics and pharmaceutical analytical R&D, with direct collaboration with domain scientists who will challenge and sharpen your modelling decisions;
- Research outputs aligned with your doctoral trajectory: a publishable methodology and a working system demonstrating scientific AI in industrial settings;
- Mentoring from both cheminformatics and domain experts, with genuine intellectual exchange;
- Professional networking opportunities.
Location: Cambrex Profarmaco Milano Srl, Paullo (MI). On-site or hybrid model (to be discussed).
Contract: We offer a 1-year scholarship contract; details will be clarified during the interview process.