AI Data Marketplace
The Paramus Data Marketplace provides a curated portfolio of datasets ready to install on your data server. Hosting large datasets locally can provide significantly faster data access and eliminate bandwidth bottlenecks during high-volume machine learning training.
Local hosting also supports reproducibility and data security, particularly when dealing with proprietary or sensitive chemical information. Moreover, local infrastructure allows full control over hardware utilization, reducing the latency and cost associated with repeated cloud data transfers.
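Before installing the larger entries in the catalog below, it can help to check that the data server has room for them. The following is a minimal sketch, not part of any Paramus tooling: the function names, the 20% headroom factor, and the reading of MB/GB as decimal units are all assumptions made for illustration. The sizes are copied from a few of the listings on this page.

```python
import shutil
from decimal import Decimal

# Download sizes copied from this page for a few of the larger datasets.
# Treating MB/GB as decimal units (10**6, 10**9 bytes) is an assumption.
CATALOG_SIZES = {
    "OMG-Property DB": "52.1 GB",
    "COD": "24.8 GB",
    "QM9S": "24.15 GB",
    "MSR-ACC-TAE25": "913.7 MB",
    "QM9": "82.6 MB",
}

_UNITS = {"MB": 10**6, "GB": 10**9}


def parse_size(text):
    """Convert a size string such as '24.15 GB' into a number of bytes."""
    value, unit = text.split()
    return int(Decimal(value) * _UNITS[unit])


def required_bytes(names):
    """Total listed size of the selected datasets, in bytes."""
    return sum(parse_size(CATALOG_SIZES[name]) for name in names)


def fits_on_disk(names, path=".", headroom=1.2):
    """Return True if the selection fits in the free space at `path`.

    `headroom` is a hypothetical safety factor for unpacked archives and
    temporary files; tune it for your own setup.
    """
    free = shutil.disk_usage(path).free
    return required_bytes(names) * headroom <= free
```

For example, `fits_on_disk(["COD", "QM9S"], "/data")` would check a (hypothetical) `/data` mount point against the combined size of the two listings.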
POLY / Polymer

RadonPy
Free (BSD 3)
The RadonPy dataset holds molecular-simulation data for 1,070 amorphous polymers, including computed physical properties (e.g. density, heat capacity, refractive index, thermal conductivity) under given conditions.
1.6 MB
Hayashi, Y., Shiomi, J., Morikawa, J. & Yoshida, R. RadonPy: Automated Physical Property Calculation using All-atom Classical Molecular Dynamics Simulations for Polymer Informatics. npj Computational Materials 8, 222 (2022). DOI:10.1038/s41524-022-00906-4
PI1M
Free (MIT License); academic use expected
A benchmark dataset for polymer informatics comprising over one million polymer entries.
PI1M provides computed and experimental polymer properties, enabling large-scale machine learning applications in polymer design, property prediction, and molecular informatics workflows.
113.5 MB
Ruimin Ma, et al. “PI1M: A Benchmark Dataset for Polymer Informatics.”
J. Chem. Inf. Model. 2020, 60 (12), 5714–5722.
DOI: 10.1021/acs.jcim.0c00726
OMG
Open Macromolecular Genome
Free (GPL v3.0)
A polymer database for generative machine learning containing ~12 million constitutional repeating units from 77,281 commercially available monomer reactants using 17 canonical polymerization reactions, enabling synthetically accessible polymer design.
369.4 MB
Kim, S., Schroeder, C. M., & Jackson, N. E. (2023). Open Macromolecular Genome: Generative Design of Synthetically Accessible Polymers.
ACS Polymers Au, 3(4), 318-330. DOI: 10.1021/acspolymersau.3c00003
VipEA
Vertical Ionization Potentials and Electron Affinities
Free (MIT License)
Computational dataset of vertical ionization potentials and electron affinities for >10,000 polymers and copolymers, obtained from xTB (extended tight-binding) calculations. Keywords: acid and bromide monomers, extended tight-binding methods, machine learning benchmarking.
6.5 MB
Aldeghi, M. & Coley, C. W. A graph representation of molecular ensembles for polymer property prediction. Chem. Sci. 13, 10486-10498 (2022). DOI: 10.1039/D2SC02839E
OMG-Property DB
Free (MIT License)
Monomer-level properties for ~12 million synthetically accessible polymers from the Open Macromolecular Genome, generated with quantum chemistry and active learning and distributed across multiple archive files. Methods: DFT, TD-DFT, and xTB calculations with active learning. Properties: electronic properties, molecular descriptors, Flory-Huggins parameters.
52.1 GB
Kim, S., Schroeder, C. M. & Jackson, N. E. Functional monomer design for synthetically accessible polymers. Chemical Science (2025). DOI: 10.1039/D4SC08617A
PolyIE
Free (Apache 2.0 License)
Annotations of 146 full-length scholarly articles. Each article is annotated with named entities including compound names, property names, property values, and experimental conditions, along with their complex N-ary relations that capture the intricate relationships between materials, properties, and measurement contexts.
6.2 MB
Cheung, J. J., Zhuang, Y., Li, Y., Shetty, P., Zhao, W., Grampurohit, S., Ramprasad, R. & Zhang, C. PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature. arXiv preprint (2023). DOI: 10.48550/arXiv.2311.07715
COMP / Quantum

QM9
Academic, non-commercial use only
QM9 is a quantum chemistry dataset containing computed geometric, energetic, electronic, and thermodynamic properties for 130,831 stable small organic molecules made up of C, H, O, N, and F. The dataset provides molecular structures and quantum chemical properties calculated using density functional theory at the B3LYP/6-31G(2df,p) level, for benchmarking molecular property prediction methods.
82.6 MB
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014). DOI: 10.1038/sdata.2014.22
MSR-ACC-TAE25
Free (CDLA-Permissive-2.0)
77k coupled-cluster atomization energies covering broad chemical space, computed at the CCSD(T)/CBS level using the W1-F12 thermochemical protocol. Accuracy: sub-chemical accuracy (within ±1 kcal/mol). Coverage: elements H and Li-Ar (excluding rare gases), up to 5 non-H atoms.
913.7 MB
Ehlert, S., Hermann, J., Vogels, T., Garcia Satorras, V., Lanius, S., Segler, M., Kooi, D. P., Takeda, K., Huang, C.-W., Luise, G., van den Berg, R., Gori-Giorgi, P. & Karton, A. Accurate Chemistry Collection: Coupled cluster atomization energies for broad chemical space. arXiv preprint (2025). DOI: 10.48550/arXiv.2506.14492
INOR / Material Science

Crystallography Open Database (COD)
Free (CC0 1.0)
528k experimental crystal structures of organic, inorganic, and metal-organic compounds and minerals. Crystallographic data are provided in CIF format, including unit cell parameters, space groups, atomic coordinates, and crystal densities.
24.8 GB
Gražulis, S., Chateigner, D., Downs, R. T., Yokochi, A. F. T., Quirós, M., Lutterotti, L., Manakova, E., Butkus, J., Moeck, P. & Le Bail, A. (2009). Crystallography Open Database – an open-access collection of crystal structures. J. Appl. Cryst. 42, 726-729. DOI: 10.1107/S0021889809016690
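As an illustration of how the unit cell parameters in a COD CIF file relate to cell volume and crystal density, here is a minimal standard-library sketch. It is not a full CIF parser: it only reads simple one-line `tag value` pairs, and the sample fragment is hypothetical, with illustrative values roughly matching cubic silicon rather than taken from any COD entry. For real work a dedicated CIF library is the better choice.

```python
import math
import re

AVOGADRO = 6.02214076e23  # 1/mol (exact SI value)

# Hypothetical CIF fragment with illustrative values (roughly cubic silicon);
# it is not copied from any COD entry.
SAMPLE_CIF = """\
data_example
_cell_length_a 5.431(1)
_cell_length_b 5.431(1)
_cell_length_c 5.431(1)
_cell_angle_alpha 90
_cell_angle_beta 90
_cell_angle_gamma 90
_cell_formula_units_Z 8
_chemical_formula_weight 28.0855
"""


def read_numeric_tags(cif_text):
    """Collect one-line 'tag value' pairs, dropping uncertainties like 5.431(1)."""
    values = {}
    for line in cif_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("_"):
            number = re.sub(r"\(\d+\)$", "", parts[1])
            try:
                values[parts[0]] = float(number)
            except ValueError:
                pass  # non-numeric value; ignored in this sketch
    return values


def cell_volume(tags):
    """Unit cell volume in cubic angstroms for a general (triclinic) cell."""
    a, b, c = (tags[f"_cell_length_{k}"] for k in "abc")
    ca, cb, cg = (
        math.cos(math.radians(tags[f"_cell_angle_{k}"]))
        for k in ("alpha", "beta", "gamma")
    )
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def density(tags):
    """Crystal density in g/cm^3 from Z, formula weight, and cell volume."""
    volume_cm3 = cell_volume(tags) * 1e-24  # 1 cubic angstrom = 1e-24 cm^3
    mass_per_cell = tags["_cell_formula_units_Z"] * tags["_chemical_formula_weight"]
    return mass_per_cell / (AVOGADRO * volume_cm3)
```

Calling `density(read_numeric_tags(SAMPLE_CIF))` on the sample fragment gives a value close to 2.33 g/cm³.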
a-Si-24
Free (MIT License)
Amorphous silicon simulation dataset containing MD trajectories for a-Si systems with varying sizes (64, 216, 512, and 1000 atoms). Generated using classical molecular dynamics simulations to study structural and dynamical properties of amorphous silicon materials.
139.2 MB
Multiple research contributions on amorphous silicon simulations and structural analysis methodologies.
Anionic Solvation Dataset
Free (CC BY 4.0)
Physicochemical properties of ionizable solutes including 8k experimental pKa values across 8 solvents, 5.6k gas-phase acidities from DLPNO-CCSD(T) calculations, 6k anionic solvation free energies from thermodynamic cycles, and 6k neutral compound solvation energies from COSMO-RS.
652.0 MB
Nevolianis, T., Zheng, J. W., Müller, S., Baumann, M., Tshepelevitsh, S., Kaljurand, I., Leito, I., Smirnova, I., Green, W. H. & Leonhard, K. Solvation free energies of anions: from curated reference data to predictive models. ChemRxiv preprint (2025). DOI: 10.26434/chemrxiv-2025-8bj2t-v2
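To make the phrase "solvation free energies from thermodynamic cycles" concrete, the sketch below shows the textbook cycle that links a pKa, a gas-phase acidity, and the solvation free energies of the neutral acid and the proton. It is an assumption that the dataset follows this general scheme; the authors' exact protocol, sign conventions, and standard-state corrections may differ, and the function name and argument names are hypothetical.

```python
import math

R_KCAL = 1.98720425864e-3  # gas constant in kcal/(mol*K)


def anion_solvation_free_energy(pka, dg_gas_acidity, dg_solv_neutral,
                                dg_solv_proton, temperature=298.15):
    """Solvation free energy of A- (kcal/mol) from a textbook cycle.

    Cycle: HA(solv) -> HA(g) -> A-(g) + H+(g) -> A-(solv) + H+(solv)
    All inputs are in kcal/mol and must share one standard state;
    standard-state corrections are deliberately left out of this sketch.
    """
    # Free energy of dissociation in solution: RT * ln(10) * pKa
    dg_acid_solution = R_KCAL * temperature * math.log(10) * pka
    return dg_acid_solution + dg_solv_neutral - dg_gas_acidity - dg_solv_proton
```

At 298.15 K each pKa unit contributes about 1.36 kcal/mol through the `RT ln(10)` term. The proton solvation free energy is solvent dependent and has to be supplied by the user.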
ANYL / Analytics

QM9S
Free (Creative Commons Attribution 4.0)
An enhanced version of the popular QM9 dataset, containing quantum chemical properties and molecular spectra for 130,000 small organic molecules with re-optimized geometries at B3LYP/def-TZVP + TD-DFT level, comprehensive tensorial properties, and complete spectroscopic data (IR, Raman, UV-Vis spectra).
24.15 GB
Zou, Z., et al. (2023). QM9S, a comprehensive quantum mechanical dataset of molecular spectra for machine learning. Nature Computational Science. DOI: 10.1038/s43588-023-00550-y
Monetize Without Losing Control
Paramus.ai provides a secure data marketplace enabling external vendors to monetize their chemistry datasets with full cost carry-over and transparent revenue models. Intellectual property remains fully protected.
Empower Your Chemistry Data
Paramus acts only as a distribution and licensing platform. Vendors gain access to a qualified R&D audience across academia and industry without operational overhead.
