Chemistry Data and Apps Marketplace
The Paramus Data Marketplace provides a curated portfolio of ready-to-run datasets and applications. Hosting large datasets locally can provide significantly faster data access and eliminate bandwidth bottlenecks during high-volume machine learning training.
Local hosting also supports reproducibility and data security, particularly when dealing with proprietary or sensitive chemical information. Moreover, local PARAMUS WORLD infrastructure allows full control over hardware utilization and reduces the latency and cost associated with repeated cloud data transfers.
POLY / Polymer

RadonPy Polymer Dataset (PI1070)
Free (BSD 3)
Contains molecular-simulation data for 1,070 amorphous polymers, including computed physical properties such as density, heat capacity, refractive index, and thermal conductivity under defined conditions.
1.6 MB
RadonPy: Automated Physical Property Calculation using All-atom Classical Molecular Dynamics Simulations for Polymer Informatics npj Computational Materials (2022) DOI:10.1038/s41524-022-00906-4
PI1M
A Benchmark Database for Polymer Informatics
Free (MIT License)
A benchmark database containing approximately 1 million synthetic polymer structures generated using a generative model trained on ~12,000 polymers from PolyInfo. Designed to provide data resources for machine learning research in polymer informatics, covering prediction tasks for density, glass transition temperature, melting temperature, and dielectric constant.
108.2 MB
Ruimin Ma, Tengfei Luo PI1M: A Benchmark Database for Polymer Informatics (2020) DOI:10.1021/acs.jcim.0c00726
OMG
Open Macromolecular Genome
Free (GPL)
The Open Macromolecular Genome (OMG) is a comprehensive polymer database designed for generative machine learning and synthetically accessible polymer design. OMG contains nearly 12 million chemically distinct constitutional repeating units (CRUs) generated from 77,281 commercially available monomer reactants using 17 canonical polymerization reactions. The database enables property-driven polymer design by providing synthetic pathways, purchasable reactants, and machine learning-compatible polymer representations.
369.5 MB
Kim, S., Schroeder, C. M., Jackson, N. E. Open Macromolecular Genome: Generative Design of Synthetically Accessible Polymers (2023) DOI:10.1021/acspolymersau.3c00003
VipEA
Vertical Ionization Potentials and Electron Affinities Dataset
Free (MIT License)
Computational dataset of vertical ionization potentials (IP) and electron affinities (EA) for copolymers. Generated using xTB calculations for graph-based molecular property prediction of polymeric materials. Contains data for over 10,000 copolymers with associated quantum chemical properties.
6.5 MB
Matteo Aldeghi, Connor W. Coley A graph representation of molecular ensembles for polymer property prediction (2022) DOI:10.1039/D2SC02839E
OMG-Property-Database
Monomer-level Properties for Synthetically Accessible Polymers
Free (MIT License)
Comprehensive database containing monomer-level chemical and physical properties for approximately 12 million synthetically accessible polymers from the Open Macromolecular Genome. Generated through quantum chemistry calculations integrated with active learning to efficiently probe the vast chemical space of synthetically feasible polymers. Includes DFT and TD-DFT calculations, conformer geometries, and ML-based property predictions with uncertainties.
40.9 GB
Seonghwan Kim, Charles M. Schroeder, Nicholas E. Jackson Functional monomer design for synthetically accessible polymers (2025) DOI:10.1039/D4SC08617A
PolyIE
Free (Apache 2.0 License)
Annotations of 146 full-length scholarly articles. Each article is annotated with named entities including compound names, property names, property values, and experimental conditions, along with their complex N-ary relations that capture the intricate relationships between materials, properties, and measurement contexts.
6.2 MB
Cheung, J. J., Zhuang, Y., Li, Y., Shetty, P., Zhao, W., Grampurohit, S., Ramprasad, R. & Zhang, C. PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature. arXiv preprint (2023). DOI: 10.48550/arXiv.2311.07715
COMP / Quantum

QM9
Quantum Chemistry Structures and Properties of 134 Kilo Molecules
Free (Creative Commons Attribution 4.0)
QM9 is a comprehensive quantum chemistry dataset containing computed geometric, energetic, electronic, and thermodynamic properties for 130,831 stable small organic molecules made up of C, H, O, N, and F. The dataset provides molecular structures (SMILES, coordinates) and quantum chemical properties calculated using density functional theory for benchmarking molecular property prediction methods and quantum chemistry applications.
82.6 MB
Ramakrishnan, R., Dral, P. O., Rupp, M., von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules (2014) DOI:10.1038/sdata.2014.22
MSR-ACC/TAE25
77k Coupled Cluster Atomization Energies for Broad Chemical Space
Free (CDLA-Permissive-2.0)
Microsoft Research Accurate Chemistry Collection (MSR-ACC) presents MSR-ACC/TAE25, a dataset of 76,879 accurate total atomization energies (TAE) of small molecules with up to 5 non-hydrogen elements up to argon, excluding rare-gas atoms. The atomization energies are computed at the CCSD(T)/CBS level with the W1-F12 thermochemical protocol to provide sub-chemical accuracy within ±1 kcal/mol. The dataset exhaustively covers chemical space by enumerating and sampling chemical graphs, avoiding bias towards particular molecular subspaces.
913.7 MB
Sebastian Ehlert, Jan Hermann, Thijs Vogels, Victor Garcia Satorras, Stephanie Lanius, Marwin Segler, Derk P. Kooi, Kenji Takeda, Chin-Wei Huang, Giulia Luise, Rianne van den Berg, Paola Gori-Giorgi, Amir Karton Accurate Chemistry Collection: Coupled cluster atomization energies for broad chemical space (2025) DOI:10.48550/arXiv.2506.14492
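For orientation, a total atomization energy is the energy of the separated atoms minus the energy of the molecule. The minimal Python sketch below illustrates that definition only. The function name and the H2 example energies (approximate non-relativistic values in Hartree) are assumptions for illustration and are not taken from MSR-ACC/TAE25.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard unit conversion factor

def total_atomization_energy(atom_energies, molecule_energy):
    """Return the TAE in kcal/mol from energies given in Hartree.

    TAE = sum(E_atom) - E_molecule, which is positive for a bound molecule.
    """
    return (sum(atom_energies) - molecule_energy) * HARTREE_TO_KCAL_PER_MOL

# Illustrative values only: two H atoms (-0.5 Eh each) and H2 (about -1.1745 Eh).
tae_h2 = total_atomization_energy([-0.5, -0.5], -1.1745)
print(f"TAE(H2) = {tae_h2:.1f} kcal/mol")
```

The same function applies to any molecule once the atomic and molecular energies come from one consistent level of theory.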
INOR / Material Science

COD
Crystallography Open Database
Free (CC0 1.0)
Open-access collection of crystal structures of organic, inorganic, metal-organic compounds and minerals, excluding biopolymers. Contains over 528,000 crystal structure entries with comprehensive crystallographic data in CIF format, derived from experimental measurements and literature sources. The database serves as a comprehensive resource for crystallographic research, materials science, and structural chemistry applications.
91.8 GB
Saulius Gražulis, Daniel Chateigner, Robert T. Downs, Alexandre F. T. Yokochi, Miguel Quirós, Luca Lutterotti, Elena Manakova, Justas Butkus, Peter Moeck, Armel Le Bail Crystallography Open Database – an open-access collection of crystal structures (2009) DOI:10.1107/S0021889809016690
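Because COD entries are provided in CIF format, a first look at a file often starts with its unit-cell parameters. The following is a minimal sketch, assuming plain CIF text with the standard `_cell_length_*` and `_cell_angle_*` tags. The helper name and the sample fragment are hypothetical (not a real COD entry), and production work would normally rely on a dedicated CIF parser.

```python
import re

def read_cell_parameters(cif_text):
    """Extract unit-cell lengths and angles from CIF text as floats."""
    tags = ["_cell_length_a", "_cell_length_b", "_cell_length_c",
            "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma"]
    cell = {}
    for tag in tags:
        match = re.search(rf"^{tag}\s+(\S+)", cif_text, flags=re.MULTILINE)
        if match:
            # Drop a trailing standard uncertainty such as "(2)" in "5.4307(2)".
            cell[tag] = float(re.sub(r"\(\d+\)$", "", match.group(1)))
    return cell

# Hypothetical CIF fragment for a cubic cell (not a real COD entry).
sample = """data_example
_cell_length_a 5.4307(2)
_cell_length_b 5.4307(2)
_cell_length_c 5.4307(2)
_cell_angle_alpha 90
_cell_angle_beta 90
_cell_angle_gamma 90
"""
cell = read_cell_parameters(sample)
print(cell["_cell_length_a"], cell["_cell_angle_gamma"])
```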
a-Si-24
Synthetic Amorphous Silicon Dataset
Free (MIT License)
Synthetic dataset of amorphous silicon structures generated via molecular dynamics simulations using melt-quench trajectories. Each structure represents a final MD snapshot and includes forces and energies labeled by an MTP potential. The dataset covers 3,069 structures (1,317,240 atoms) across various quench rates and densities.
652.0 MB
Signatures of Paracrystallinity in Amorphous Silicon (2024) DOI:10.48550/arXiv.2407.16681
Anionic Solvation Dataset
Solvation Free Energies of Anions: From Curated Reference Data to Predictive Models
Free (Creative Commons Attribution 4.0)
Comprehensive dataset for predicting physicochemical properties of ionizable solutes, including 8,241 experimental pKa values across 8 solvents, 5,536 computed gas-phase acidities from DLPNO-CCSD(T) calculations, 6,090 solvation free energies of anions, and 6,088 solvation free energies of neutral compounds computed using COSMO-RS. Includes trained graph neural network models for rapid property prediction as an alternative to quantum mechanical approaches.
652.0 MB
Thomas Nevolianis, Jonathan W. Zheng, Simon Müller, Matthias Baumann, Sofja Tshepelevitsh, Ivari Kaljurand, Ivo Leito, Irina Smirnova, William H. Green, Kai Leonhard Solvation free energies of anions: from curated reference data to predictive models (2025) DOI:10.26434/chemrxiv-2025-8bj2t-v2
ANYL / Analytics

QM9S
QM9 Spectra Dataset
Free (Creative Commons Attribution 4.0)
The QM9S dataset is an enhanced version of the popular QM9 dataset, containing quantum chemical properties and molecular spectra for 130,000 small organic molecules. Built upon the original QM9 dataset, QM9S includes re-optimized molecular geometries at B3LYP/def-TZVP level and comprehensive molecular properties including scalars (energy, NPA charges), vectors (dipole moments), 2nd order tensors (Hessian matrix, polarizability), 3rd order tensors (hyperpolarizability), and complete spectroscopic data (IR, Raman, UV-Vis spectra).
24.1 GB
Zou, Z., et al. QM9S, a comprehensive quantum mechanical dataset of molecular spectra for machine learning (2023) DOI:10.1038/s43588-023-00550-y
ORGN / Organic Chemistry

BigSolDB 2.1
Solubility Values for Organic Compounds in Organic Solvents and Water
Free (Creative Commons Attribution 4.0)
112,465 experimentally measured solubility values of 1,525 organic compounds in 218 solvents reported in 1,687 peer-reviewed articles. Temperature range 243-403 K. Includes mole fraction, molar concentration, and LogS values. Companion density dataset with 218 solvent densities at various temperatures.
19.8 MB
Lev Krasnov, Dmitry Malikov, Marina Kiseleva, Sergei Tatarin, Sergey Sosnin, Stanislav Bezzubov BigSolDB 2.0, dataset of solubility values for organic compounds in different solvents (2025) DOI:10.1038/s41597-025-05559-8
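The solubility measures listed for BigSolDB (mole fraction, molar concentration, LogS) are related, and solvent density links the first two. The sketch below shows one approximate conversion in Python. It assumes the solution density equals the pure-solvent density, which is reasonable only for dilute solutions, and it takes LogS as log10 of the solubility in mol/L. The function names and example inputs are illustrative and do not come from the dataset.

```python
import math

def mole_fraction_to_molarity(x_solute, m_solute, m_solvent, density):
    """Approximate molar concentration (mol/L) from a mole fraction.

    m_solute, m_solvent: molar masses in g/mol; density: g/cm^3.
    Assumes the solution density equals the pure-solvent density.
    """
    # Mass (g) of a mixture holding 1 mol of molecules in total.
    mass = x_solute * m_solute + (1.0 - x_solute) * m_solvent
    volume_l = mass / density / 1000.0  # cm^3 -> L
    return x_solute / volume_l

def log_s(molarity):
    """LogS taken as log10 of the solubility in mol/L."""
    return math.log10(molarity)

# Illustrative inputs: solute of 100 g/mol in water (18.015 g/mol, 1.0 g/cm^3).
c = mole_fraction_to_molarity(0.01, 100.0, 18.015, 1.0)
print(f"c = {c:.3f} mol/L, LogS = {log_s(c):.3f}")
```

For concentrated solutions the density assumption breaks down, so measured solution densities would be needed instead.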
Monetize Without Losing Control
Paramus.ai provides a secure marketplace enabling external vendors to monetize their chemistry applications, AI models, and datasets with full cost carry-over and transparent revenue models. Intellectual property remains fully protected; all packages run within the customer’s local infrastructure.
Publish Your Work
Paramus acts as a distribution and licensing platform for HPC applications, AI models, datasets, and LLM packages. Vendors gain access to a qualified R&D audience across academia and industry without operational overhead.
