The Small Molecule Suite
The Small Molecule Suite (SMS) is a free, open-access tool developed by the Harvard Program in Therapeutic Sciences (HiTS) and funded by the NIH. The goal of the SMS is to help scientists understand and work with the targets of molecular probes, approved drugs and other drug-like molecules, while acknowledging the complexity of polypharmacology: the phenomenon that virtually all drug-like molecules bind multiple target proteins. The SMS combines data from the ChEMBL database with prepublished data from the Laboratory of Systems Pharmacology. The methodology for calculating selectivities and similarities is explained in Moret et al., Cell Chem Biol 2019 (which can also be used to cite the Small Molecule Suite).
This work is licensed under the Creative Commons Attribution-ShareAlike license.
Use cases
Compound affinity and binding assertions
Target gene
The Selectivity app helps you find selective and potent small molecules against your target of interest.
To use the Selectivity app:
- Select a gene of interest in the top left corner of the application
- Change the filter settings as needed
- Look at the 'Affinity and selectivity plot' and select a region of compounds you are interested in
- The 'Affinity and selectivity data' will update based on your selection in the previous step. Select the compound you are most interested in to see all its known targets in the 'Affinity and selectivity reference' (you may have to scroll down).
Reference compound
The Similarity app helps you find compounds similar to your compound of interest.
To use the Similarity app:
- Select a reference compound and set filters as desired. Three plots show up under 'Compound similarity plots'. These plots describe the similarity to the reference compound in phenotype (PFP), targets (TAS), and chemical structure (structural similarity, calculated using Morgan2 fingerprints in RDKit).
- Select an area of the compound similarity plots you are interested in. The compounds in that area will show up in table format under 'Compound similarity data'.
- In the 'Compound similarity data', select a compound to see its Target Affinity Spectrum in the 'Compound similarity selections' that shows up below (you may have to scroll down).
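As a rough illustration of how a structural similarity score can be derived from fingerprints, the sketch below computes a Tanimoto coefficient between two fingerprints represented as sets of on-bit indices. The source states only that Morgan2 fingerprints from RDKit are used; the choice of the Tanimoto coefficient, the function name `tanimoto` and the example bit sets are assumptions made here for illustration, not the SMS implementation.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Two empty fingerprints share nothing; report zero similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical on-bit indices; in practice these would come from a
# fingerprinting toolkit such as RDKit (Morgan fingerprint, radius 2).
reference_bits = {1, 5, 9, 42, 77}
candidate_bits = {1, 5, 9, 13, 77, 101}

similarity = tanimoto(reference_bits, candidate_bits)
print(f"Structural similarity: {similarity:.3f}")
```

A score of 1.0 means the two fingerprints have identical on-bits, and 0.0 means they share none.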
Type or paste gene symbols in the text box below to generate a downloadable table of drugs targeting those genes. Enter one gene per line.
The Library app helps you build custom small molecule libraries.
To use the Library app:
- Submit a list of targets that you want to build the library for (in HUGO nomenclature), or select one of the pre-selected gene lists.
- Select the lowest selectivity level you want to include.
- Select which approval phases you want to include for clinical compounds.
- Select whether to include the compounds from chemicalprobes.org (4.0 star rating only).
- Choose whether to view the table per target or per compound.
- Download the library.
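The selection steps above can be sketched as a simple filter over compound-target records. This is only an illustration: the selectivity labels, the record fields (`compound`, `gene`, `selectivity`, `max_phase`) and the function `build_library` are hypothetical, and the source does not state how the app combines the criteria. The sketch assumes a compound is kept for a requested gene if it either meets the selectivity cutoff or is a clinical compound in one of the selected phases.

```python
# Hypothetical selectivity labels, ordered from most to least stringent.
SELECTIVITY_ORDER = ["most_selective", "semi_selective", "poly_selective", "unknown"]


def build_library(records, genes, lowest_selectivity, phases):
    """Group qualifying compounds by target gene (the per-target view)."""
    cutoff = SELECTIVITY_ORDER.index(lowest_selectivity)
    genes, phases = set(genes), set(phases)
    library = {}
    for rec in records:
        if rec["gene"] not in genes:
            continue
        selective_enough = SELECTIVITY_ORDER.index(rec["selectivity"]) <= cutoff
        in_clinical_phase = rec["max_phase"] in phases
        # Assumption: either criterion is sufficient for inclusion.
        if selective_enough or in_clinical_phase:
            library.setdefault(rec["gene"], []).append(rec["compound"])
    return library


# Made-up example records, not taken from the SMS data.
records = [
    {"compound": "cmpd_A", "gene": "EGFR", "selectivity": "most_selective", "max_phase": 0},
    {"compound": "cmpd_B", "gene": "EGFR", "selectivity": "poly_selective", "max_phase": 4},
    {"compound": "cmpd_C", "gene": "EGFR", "selectivity": "unknown", "max_phase": 1},
    {"compound": "cmpd_D", "gene": "BRAF", "selectivity": "semi_selective", "max_phase": 0},
    {"compound": "cmpd_E", "gene": "KRAS", "selectivity": "most_selective", "max_phase": 4},
]

library = build_library(records, ["EGFR", "BRAF"], "semi_selective", [4])
print(library)
```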
Download Small Molecule Suite data
SMS version based on ChEMBL v29
The entire Small Molecule Suite dataset is available for download.
The data are organized in separate tables. Documentation for each table and the relationships between them is available.
Table documentation Download tables from Synapse
Understanding compound and target identifiers
Compounds and targets in all tables are referred to using ID numbers in the columns `lspci_id` and `lspci_target_id`, respectively.
Compound and target IDs can be translated into compound names and target symbols using the tables `lsp_compound_dictionary` and `lsp_target_dictionary`.
The table `lsp_compound_dictionary` also contains mappings for the most common compound databases, such as ChEMBL, eMolecules and HMS LINCS.
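A minimal sketch of this ID translation, using only the Python standard library, might look as follows. The column names `lspci_id` and `lspci_target_id` come from the description above; the name columns (`pref_name`, `symbol`), the `value` column and all example rows are hypothetical placeholders, so check the table documentation for the actual column names before adapting it.

```python
import csv
import io

# Tiny in-memory stand-ins for the downloaded CSV files (made-up rows).
compound_dictionary_csv = """lspci_id,pref_name
1,compound_one
2,compound_two
"""
target_dictionary_csv = """lspci_target_id,symbol
10,GENE_A
11,GENE_B
"""
measurements_csv = """lspci_id,lspci_target_id,value
1,10,12.5
2,11,300
2,10,45
"""


def load_lookup(text, key_col, value_col):
    """Build an ID -> label dictionary from CSV text."""
    return {row[key_col]: row[value_col] for row in csv.DictReader(io.StringIO(text))}


compound_names = load_lookup(compound_dictionary_csv, "lspci_id", "pref_name")
target_symbols = load_lookup(target_dictionary_csv, "lspci_target_id", "symbol")

# Replace numeric IDs with readable labels, falling back to the raw ID
# when a dictionary has no entry for it.
annotated = [
    {
        "compound": compound_names.get(row["lspci_id"], row["lspci_id"]),
        "target": target_symbols.get(row["lspci_target_id"], row["lspci_target_id"]),
        "value": float(row["value"]),
    }
    for row in csv.DictReader(io.StringIO(measurements_csv))
]

for entry in annotated:
    print(entry)
```

For the real files, the `io.StringIO(...)` wrappers would be replaced by opened file handles; the largest tables may be better handled with a streaming or database-backed approach.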
Download tables in CSV format
Name | Description | Size |
---|---|---|
lsp_biochem_agg | Table of aggregated biochemical affinity measurements. All available data for a single compound-target pair were aggregated by taking the first quartile. | 15.2 MB |
lsp_biochem | Table of biochemical affinity measurements. | 52.9 MB |
lsp_clinical_info | Table of the clinical approval status of compounds. Sourced from ChEMBL. | 45.5 kB |
lsp_commercial_availability | Table of the commercial availability of compounds. Sourced from eMolecules (https://www.emolecules.com/). | 388.0 MB |
lsp_compound_dictionary | Primary table listing all compounds in the database. During compound processing distinct salts of the same compound are aggregated into a single compound entry in this table. The constituent compound IDs for each compound in this table are available in the lsp_compound_mapping table. | 1.2 GB |
lsp_compound_library | Library of optimal compounds for each target. See 10.1016/j.chembiol.2019.02.018 for details. | 115.5 kB |
lsp_compound_mapping | Table of mappings between compound IDs from different sources to the internal lspci_ids. | 295.6 MB |
lsp_compound_names | Table of all annotated names for compounds. The sources for compound names generally distinguish between primary and alternative (secondary) names. | 13.2 MB |
lsp_manual_curation | Table of manual compound-target binding assertions. | 9.6 kB |
lsp_one_dose_scan_agg | Table of single-dose compound activity measurements, as opposed to full dose-response affinity measurements. All available data for a single concentration and compound-target pair were aggregated by taking the first quartile. | 2.2 MB |
lsp_one_dose_scans | Table of single dose compound activity measurements as opposed to full dose-response affinity measurements. | 5.2 MB |
lsp_phenotypic_agg | Table of aggregated phenotypic assays performed on the compounds. All available data for a single assay and compound-target pair were aggregated by taking the first quartile. | 70.6 MB |
lsp_phenotypic | Table of phenotypic assays performed on the compounds. | 82.7 MB |
lsp_references | External references for the data in the database. | 2.2 MB |
lsp_selectivity | Table of selectivity assertions of compounds to their targets. See 10.1016/j.chembiol.2019.02.018 for details. | 24.9 MB |
lsp_structures | Additional secondary InChIs for compounds. | 14.2 MB |
lsp_target_dictionary | Table of drug targets. The original drug targets are mostly annotated as ChEMBL or UniProt IDs. For convenience we converted these IDs to Entrez gene IDs. The original mapping between ChEMBL and UniProt target IDs is in the table `lsp_target_mapping`. | 1.4 MB |
lsp_target_mapping | Mapping between the original ChEMBL target IDs, their corresponding UniProt IDs and Entrez gene IDs. A single UniProt or ChEMBL ID can refer to protein complexes, therefore multiple gene IDs often map to the same UniProt or ChEMBL ID. | 273.8 kB |
lsp_tas_references | Table that links TAS values to the references used to compute them. | 3.8 MB |
lsp_tas | Table of Target Affinity Spectrum (TAS) values for the affinity between compound and target. TAS enables aggregation of affinity measurements from heterogeneous sources and assays into a single value. See 10.1016/j.chembiol.2019.02.018 for details. | 10.6 MB |
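Several of the `_agg` tables above are described as aggregating all measurements for a compound-target pair by taking the first quartile. The sketch below illustrates that idea on made-up values. The source does not specify which quantile interpolation method was used, so the `inclusive` method of Python's `statistics.quantiles` is an assumption, as are the function names and the example rows.

```python
from collections import defaultdict
from statistics import quantiles


def first_quartile(values):
    """First quartile of a list of measurements (single values pass through)."""
    values = sorted(values)
    if len(values) == 1:
        return float(values[0])
    # Assumption: 'inclusive' interpolation; the source does not name a method.
    return quantiles(values, n=4, method="inclusive")[0]


def aggregate_by_pair(rows):
    """Collapse (lspci_id, lspci_target_id, value) rows to one value per pair."""
    grouped = defaultdict(list)
    for lspci_id, lspci_target_id, value in rows:
        grouped[(lspci_id, lspci_target_id)].append(value)
    return {pair: first_quartile(vals) for pair, vals in grouped.items()}


# Made-up measurements for three compound-target pairs.
rows = [
    (1, 10, 10.0), (1, 10, 20.0), (1, 10, 30.0), (1, 10, 40.0), (1, 10, 50.0),
    (2, 10, 100.0), (2, 10, 200.0),
    (3, 11, 7.0),
]

aggregated = aggregate_by_pair(rows)
print(aggregated)
```

Using the first quartile rather than the mean makes the aggregate lean toward the lower measured values for a pair while remaining less sensitive to a single extreme measurement than the minimum would be.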