Accelerating Scientific Workflows with LLM Agents

Large Language Model (LLM) agents are transforming the way we approach the Design, Make, Test, and Scale (DMTS) steps in chemistry research. These AI-driven collaborators can extract insights from literature, model reaction conditions, and analyze data with unprecedented speed.

Example Reaction and Corresponding Prompts

Acetylsalicylic acid is synthesized via the esterification of salicylic acid with acetic anhydride, catalyzed by sulfuric or phosphoric acid under reflux. The reaction follows a nucleophilic acyl substitution mechanism, yielding aspirin and acetic acid as a byproduct. Purification is typically achieved through recrystallization from water or ethanol to remove unreacted reagents and side products.
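
The stoichiometry behind this reaction can be sketched in a few lines of plain Python. This is a minimal illustration, not the workflow's actual calculator: the function name `plan_acetylation` is hypothetical, and the default of two equivalents of acetic anhydride is an assumption, since the text above does not state an excess. Molar masses and the density of acetic anhydride are standard reference values.

```python
# Molar masses in g/mol (standard reference values).
MW_SALICYLIC_ACID = 138.12    # C7H6O3
MW_ACETIC_ANHYDRIDE = 102.09  # C4H6O3
MW_ASPIRIN = 180.16           # C9H8O4
DENSITY_ACETIC_ANHYDRIDE = 1.082  # g/mL near room temperature


def plan_acetylation(mass_sa_g: float, anhydride_equiv: float = 2.0) -> dict:
    """Return reagent amounts for acetylating `mass_sa_g` grams of salicylic acid.

    `anhydride_equiv` is an assumed excess of acetic anhydride (hypothetical default).
    """
    n_sa = mass_sa_g / MW_SALICYLIC_ACID  # mol of limiting reagent
    mass_anhydride = n_sa * anhydride_equiv * MW_ACETIC_ANHYDRIDE
    return {
        "mol_salicylic_acid": n_sa,
        "mass_acetic_anhydride_g": mass_anhydride,
        "volume_acetic_anhydride_mL": mass_anhydride / DENSITY_ACETIC_ANHYDRIDE,
        # 1:1 stoichiometry between salicylic acid and aspirin.
        "theoretical_yield_aspirin_g": n_sa * MW_ASPIRIN,
    }


plan = plan_acetylation(5.0)
for key, value in plan.items():
    print(f"{key}: {value:.4g}")
```

For 5 g of salicylic acid this gives about 0.0362 mol, and under the two-equivalent assumption roughly 7.39 g of acetic anhydride.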

35 Prompts to Guide the Process (DMTS)

Some prompts are conceptual or under evaluation. Fine-tuning the overall system, that is, the interaction of the agents, requires further work.

# | Prompt | Answer (Example) | Involved LLM Agent | Involved Tool

Discovery Phase
1 | Supervisor, initiate aspirin discovery workflow | Discovery workflow initialized | Supervisor | Paramus Routing
2 | Retrieve molecules similar to aspirin | [Salicylic acid, methyl salicylate, benzocaine] | Data | Molecular Registry
3 | Generate SMILES for acetylsalicylic acid | CC(=O)Oc1ccccc1C(=O)O | Open Cheminformatics (not LLM!) | RDKit
4 | Compute physicochemical properties for aspirin | MW=180.16, logP=1.19 | Open Cheminformatics | RDKit
5 | Check inventory for salicylic acid and acetic anhydride | Available: Salicylic Acid (100 g), Acetic Anhydride (250 mL) | Data | Inventory System
6 | Review known aspirin reactions | Fischer esterification, acetylation (standard synthesis) | Literature | Google
7 | Summarize known risks of aspirin synthesis | Hydrolysis risk, moisture sensitivity | Literature | PubChem, ACS

Data Analysis Phase
8 | Design a fractional factorial DOE for aspirin synthesis | Generated DOE with factors: Temperature (60-80°C), Catalyst (H₂SO₄) | DataScience | scikit-learn
9 | Calculate moles from 5 g salicylic acid | 5 g corresponds to 0.0362 mol | Calculator | Python Interpreter
10 | Analyze experimental results to find optimal temperature | Optimal temperature: 70°C | DataScience | Random Forest (scikit-learn)
11 | Evaluate impurity correlation with catalyst amount | Higher catalyst correlates to fewer impurities (r²=0.85) | DataScience | Statistical Model

Experimentation Phase
12 | Compute mass of acetic anhydride needed for reaction | 7.39 g required | Open Cheminformatics, Calculator | RDKit, Python Interpreter
13 | Check availability of acid catalyst in the lab | Catalyst (Sulfuric acid, 50 mL) available | Data | Inventory Query
14 | Predict reaction time for 95% yield | Reaction time ~45 min at 75°C | Literature, Cheminformatics, DataScience | Simple ML Model (Regression) (hypothetical)
15 | Monitor reaction temperature stability | Temperature stable at 75±1°C throughout reaction | Data | (2026: Paramus Collector) → IoT Sensor Integration
16 | Confirm reaction completion using TLC data | TLC indicates no salicylic acid presence; reaction complete | DataScience | Image Analysis (OpenCV)

Optimization Phase
17 | Suggest adjustments to reduce reaction impurities | Decrease temperature to 68°C and reduce catalyst by 10% | DataScience, Cheminformatics, Literature | DOE/Statistical analysis
18 | Calculate yield improvement from temperature adjustment | Estimated yield increase: 4.5% | Calculator | Python Interpreter
19 | List historical experiments related to yield optimization | Experiment IDs [THGR-201, THGR-434, AMBR-1001] | Data (from ELN!) | ELN
20 | Verify inventory for scale-up materials | Materials sufficient for 10-fold scale-up | Data | Inventory
21 | Compute cost impact of scale-up conditions | Cost reduction: 15% per gram product | Literature, Code | Python Interpreter

Literature Review Phase
22 | Retrieve latest pharmacokinetics studies on aspirin | 3 recent studies found, summarized in file | Literature | PubMed API
23 | Summarize current safety guidelines for aspirin use | FDA guidelines updated 2023, dosage max 325 mg/day | Literature | Google
24 | Check regulatory compliance for aspirin synthesis method | Complies with current chemical manufacturing standards | Literature | Google
25 | Obtain current patent landscape for aspirin synthesis | No conflicting patents found | Literature | Google Patents
26 | Search adverse events databases for aspirin | Common adverse events: gastrointestinal irritation | Literature | Google

Computational Validation Phase
27 | Optimize aspirin molecular geometry | Optimized structure converged at B3LYP/6-31G* | Computational Chemistry | PSI4
28 | Compute infrared spectrum of aspirin | Major IR peaks at 1750 cm⁻¹ and 1650 cm⁻¹ | Computational Chemistry | PSI4
29 | Predict aspirin solubility via QSAR modeling | Predicted solubility in water: 3 mg/mL | DataScience, Cheminformatics | sklearn, SVM and ensemble models
30 | Calculate transition states of aspirin hydrolysis | Activation energy calculated: ΔG‡ = 23.4 kcal/mol | Computational Chemistry | PSI4, OPT_TYPE set to TS
31 | Verify predicted IR spectra against literature values | IR peaks match literature within ±5 cm⁻¹ | Literature | PubChem

Deployment Phase
32 | Store optimized reaction protocol | Protocol saved as THGR-ASP-2025-01 | Data | ELN (is also in Copilots file system)
33 | Update inventory based on experimental consumption | Inventory updated: salicylic acid reduced by 50 g | Data | Inventory UPDATE
34 | Generate compliance report for QA | Report generated and ready for QA review | Literature | as .rd file in local file system
35 | Prepare summary of synthesis and computational data for internal report | Summary report ready; includes experimental yields, computational validation, and literature references | Supervisor | as .rd file in local file system
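
To make the fractional factorial design in prompt 8 concrete, here is a minimal sketch of a 2^(3-1) half-fraction written in plain Python. It does not reproduce the agent's actual tooling. Only the 60-80°C temperature range comes from the example; the catalyst and reaction-time levels, the factor names, and the function `half_fraction` are hypothetical placeholders chosen for illustration.

```python
from itertools import product

# Low/high level for each factor. Only the temperature range is taken from
# the example; the other two factors and their levels are assumptions.
FACTORS = {
    "temperature_C": (60, 80),
    "catalyst_drops": (3, 6),
    "time_min": (30, 60),
}


def half_fraction(factors: dict) -> list:
    """Build a 2^(3-1) design: a full factorial in the first two factors,
    with the third factor set by the generator C = A*B (coded -1/+1)."""
    names = list(factors)
    if len(names) != 3:
        raise ValueError("this sketch handles exactly three factors")
    runs = []
    for a, b in product((-1, 1), repeat=2):
        coded = dict(zip(names, (a, b, a * b)))
        # Map coded levels back to real settings: -1 -> low, +1 -> high.
        runs.append({n: factors[n][(c + 1) // 2] for n, c in coded.items()})
    return runs


design = half_fraction(FACTORS)
for i, run in enumerate(design, start=1):
    print(i, run)
```

The half-fraction needs four runs instead of the eight of a full 2³ design, at the cost of confounding the third factor's main effect with the interaction of the first two.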

Although not every agent from the example is fully productive yet, each one contributes targeted expertise that drives efficiency. By combining literature analysis, computational chemistry, and data insights, researchers gain a powerful toolset to streamline the development of an aspirin synthesis. The curated list of prompts above illustrates how LLM agents can pave the way for a more robust and scalable DMTS process.
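
How a supervisor might hand a prompt to the right agent can be illustrated with a toy keyword router. This is a sketch only and makes no claim about how Paramus Routing works internally: the `Route` class, the keyword lists, and `route_prompt` are all hypothetical, and a production system would more plausibly let an LLM choose the agent. The agent and tool names are taken from the example prompts.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    agent: str
    tool: str
    keywords: Tuple[str, ...]  # hypothetical trigger phrases


# Order matters: more specific routes are checked before general ones.
ROUTES = (
    Route("Computational Chemistry", "PSI4",
          ("geometry", "infrared", "transition state")),
    Route("Calculator", "Python Interpreter",
          ("calculate", "compute mass", "moles")),
    Route("Literature", "PubMed API", ("pharmacokinetics", "studies")),
    Route("Data", "Inventory System", ("inventory", "availability")),
)


def route_prompt(prompt: str) -> Optional[Route]:
    """Return the first route whose keyword appears in the prompt, else None."""
    text = prompt.lower()
    for route in ROUTES:
        if any(keyword in text for keyword in route.keywords):
            return route
    return None


for example in (
    "Calculate moles from 5g salicylic acid",
    "Optimize aspirin molecular geometry",
    "Check inventory for salicylic acid and acetic anhydride",
):
    chosen = route_prompt(example)
    print(example, "->", chosen.agent if chosen else "unrouted")
```

Returning `None` for unmatched prompts keeps the fallback decision with the caller, for example escalating to the Supervisor agent.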
