QSAR WORLD

Scripts

All the scripts on this page are written in Jython and work in Sarchitect Designer version 2.2. Send your feedback, comments and suggestions about any of these scripts to edi...@qsarworld.com.

  

1. Linear Regression Diagnostics

Author: Shaillay Kumar Dogra

Once a Linear Regression model has been built, various diagnostics can be run on it to check which compounds have had a strong influence on the fit. The model parameters may have been unduly determined by these data points, in which case one may need to refit the model after removing them (and compare the effect). Homoscedasticity (constant variance of the residuals), which is an assumption of a linear regression fit, also needs to be checked.

Read discussion about the script >>
Download script >>
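To illustrate the kind of diagnostics involved, here is a minimal standalone Python sketch for a one-descriptor model y = b0 + b1*x. It does not use the Sarchitect scripting API, and the function name and data are hypothetical; it computes residuals, leverage (hat values), internally studentized residuals and Cook's distance, which together show which compounds pull the fit.

```python
import math


def simple_regression_diagnostics(x, y):
    """Residuals, leverage, standardized residuals and Cook's distance for y = b0 + b1*x."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(x) / float(n)
    y_mean = sum(y) / float(n)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual mean square

    # Leverage (hat value) of each compound in a one-descriptor model
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Internally studentized ("standardized") residuals
    standardized = [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(residuals, leverage)]
    # Cook's distance: combined influence of residual size and leverage
    cooks = [e * e / (p * s2) * h / (1.0 - h) ** 2 for e, h in zip(residuals, leverage)]
    return residuals, leverage, standardized, cooks


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 12.0]  # hypothetical data; last compound looks suspicious
residuals, leverage, standardized, cooks = simple_regression_diagnostics(x, y)
most_influential = max(range(len(cooks)), key=lambda i: cooks[i])
print("most influential compound index: %d" % most_influential)
```

Cutoffs for "large" Cook's distance are rules of thumb (D > 1 and D > 4/n are both in use). For homoscedasticity, plotting `residuals` against the fitted values and looking for a funnel shape is a simple visual check.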

 

2. Rankit Plot

Author: Shaillay Kumar Dogra

The assumption that data are normally distributed is an important prerequisite for some (parametric) statistical tests and regression methods. It can be checked with various formal tests and with graphical methods such as rankit plots. A rankit plot graphs the rankits against the ordered data points. Such a plot should approximate a straight line; deviations from a straight line indicate evidence against normality of the distribution.

Read discussion about the script >>
Download script >>
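A minimal Python sketch of how rankit-plot coordinates can be computed, assuming Python 3.8+ for `statistics.NormalDist` (so it is not a drop-in Jython script). Exact rankits are the expected order statistics of a standard normal sample; the sketch uses Blom's approximation (i - 3/8)/(n + 1/4), a swapped-in choice, and reports the Pearson correlation of the plotted points as a simple straightness measure. The function names and data are hypothetical.

```python
from statistics import NormalDist


def rankits(n):
    """Blom's approximation to the expected standard-normal order statistics."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def rankit_plot_points(data):
    """Pair each rankit with the correspondingly ranked data value."""
    ordered = sorted(data)
    return list(zip(rankits(len(ordered)), ordered))


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


data = [4.1, 5.3, 4.8, 5.0, 6.2, 4.5, 5.7, 5.1, 4.9, 5.5]  # hypothetical values
points = rankit_plot_points(data)
r = pearson_r([p[0] for p in points], [p[1] for p in points])
print("straightness (Pearson r) = %.3f" % r)
```

Plotting `points` gives the rankit plot itself; an r close to 1 corresponds to a nearly straight line.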

 

3. Normal Probability Plot

Author: Shaillay Kumar Dogra

A normal probability plot is another way of checking whether data follow a normal distribution. As in the rankit plot, the points should approximate a straight line; systematic deviations (for example, curvature caused by skewness) indicate evidence against normality of the distribution. The plots most worth examining are those of the residuals: raw, standardized or jack-knifed.

Read discussion about the script >>
Download script >>
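As a sketch only (Python 3.8+, hypothetical names and data), the points of a normal probability plot for a set of residuals can be built from the plotting positions (i - 0.5)/n, a common convention assumed here rather than taken from the script. The residuals are simply centred and scaled by their population standard deviation, which is not the same as the leverage-adjusted standardized residuals; the sample skewness is computed alongside to quantify asymmetry.

```python
from statistics import NormalDist, mean, pstdev


def normal_probability_points(residuals):
    """(theoretical normal quantile, scaled residual) pairs, positions (i - 0.5)/n."""
    n = len(residuals)
    mu, sigma = mean(residuals), pstdev(residuals)
    scaled = sorted((r - mu) / sigma for r in residuals)  # unit population variance
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, scaled))


def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    mu = mean(values)
    m2 = mean((v - mu) ** 2 for v in values)
    m3 = mean((v - mu) ** 3 for v in values)
    return m3 / m2 ** 1.5


residuals = [0.2, -0.1, 0.4, -0.3, 0.1, -0.2, 1.5, 0.0]  # hypothetical residuals
points = normal_probability_points(residuals)
print("skewness = %.3f" % sample_skewness(residuals))
```

A long upper tail, as produced by the 1.5 residual here, bends the upper end of the plot away from the line and gives a positive skewness.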

 

4. Boltzmann Probability

Author: Shaillay Kumar Dogra

This script computes the Boltzmann probability (of existence) of each conformer within the given set of conformers for a structure. It asks for a column containing the energy values and another containing the identifier tag (all conformers of one structure share the same label). Several columns are appended as they are calculated, but the relevant one is the p-value column, which holds the Boltzmann-probability term for those conformers that satisfy a certain energy criterion, as explained in the script and the discussion notes.

Read discussion about the script >>
Download script >>
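The underlying formula is p_i = exp(-(E_i - E_min)/RT) / sum_j exp(-(E_j - E_min)/RT), taken over the conformers of one structure. The standalone Python sketch below assumes energies in kcal/mol and T = 298.15 K. The `window` parameter is a hypothetical stand-in for the script's energy criterion (whose exact form is given in the script itself): conformers more than `window` kcal/mol above their structure's minimum receive 0.0.

```python
import math
from collections import defaultdict

R_KCAL = 0.0019872  # gas constant in kcal/(mol K); energies assumed in kcal/mol


def boltzmann_probabilities(energies, tags, temperature=298.15, window=None):
    """Per-structure Boltzmann probabilities for a list of conformers.

    tags: structure identifier of each conformer (same label = same structure)
    window: hypothetical energy cutoff above the structure's minimum
    """
    groups = defaultdict(list)
    for idx, tag in enumerate(tags):
        groups[tag].append(idx)

    rt = R_KCAL * temperature
    probs = [0.0] * len(energies)
    for indices in groups.values():
        e_min = min(energies[i] for i in indices)
        kept = [i for i in indices if window is None or energies[i] - e_min <= window]
        # Measuring energies relative to e_min keeps exp() in a safe numeric range
        weights = {i: math.exp(-(energies[i] - e_min) / rt) for i in kept}
        total = sum(weights.values())
        for i, w in weights.items():
            probs[i] = w / total
    return probs


energies = [10.0, 10.5, 13.8, 5.2, 5.2]  # hypothetical conformer energies
tags = ["A", "A", "A", "B", "B"]
probs = boltzmann_probabilities(energies, tags, window=3.0)
print(probs)
```

Within each structure the probabilities sum to 1, and the lowest-energy conformer receives the largest share.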

 

5. Average Descriptors

Author: Shaillay Kumar Dogra

This is a sister script to the Boltzmann-probability script and uses the p-values generated by the latter as weights for averaging descriptors. Used in conjunction, the two scripts generate weighted-average descriptors for the set of conformers of a given structure.

Read discussion about the script >>
Download script >>
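The weighted average itself is simple: for each structure, every descriptor d is replaced by sum_i p_i d_i / sum_i p_i over its conformers. A minimal Python sketch with hypothetical names and values, assuming the per-conformer probabilities for each structure are available (and non-zero in total), for example from the previous script:

```python
from collections import defaultdict


def weighted_average_descriptors(tags, weights, descriptors):
    """Average each descriptor over a structure's conformers, weighted by p."""
    n_desc = len(descriptors[0])
    sums = defaultdict(lambda: [0.0] * n_desc)
    totals = defaultdict(float)
    for tag, w, row in zip(tags, weights, descriptors):
        acc = sums[tag]
        for k, value in enumerate(row):
            acc[k] += w * value
        totals[tag] += w
    # Dividing by the total weight keeps the result valid even if p does not sum to 1
    return {tag: [s / totals[tag] for s in acc] for tag, acc in sums.items()}


tags = ["A", "A", "B"]
weights = [0.75, 0.25, 1.0]  # hypothetical Boltzmann probabilities
descriptors = [[100.0, 2.0], [120.0, 4.0], [80.0, 1.0]]  # hypothetical descriptor rows
averaged = weighted_average_descriptors(tags, weights, descriptors)
print(averaged)
```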

 

6. Get MACCS Keys

Author: Shaillay Kumar Dogra

The script displays the 166 MACCS keys for the given compounds in a table view with 166 columns. This information may be helpful, say, for running clustering or similarity analysis based on the MACCS keys.

Read discussion about the script >>
Download script >>
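To show the table layout only, here is a hypothetical Python sketch that expands, for each compound, the set of MACCS key numbers that are "on" (1 to 166) into a 0/1 row with 166 columns. Key generation itself is assumed to come from a cheminformatics toolkit and is not shown; the column names are invented for illustration.

```python
N_MACCS_KEYS = 166


def maccs_table(on_keys_per_compound):
    """Expand sets of 'on' MACCS key numbers (1..166) into 0/1 rows of 166 columns."""
    header = ["MACCS_%d" % k for k in range(1, N_MACCS_KEYS + 1)]  # hypothetical names
    rows = []
    for on_keys in on_keys_per_compound:
        row = [0] * N_MACCS_KEYS
        for k in on_keys:
            if not 1 <= k <= N_MACCS_KEYS:
                raise ValueError("MACCS key out of range: %d" % k)
            row[k - 1] = 1  # key k lives in column k - 1
        rows.append(row)
    return header, rows


header, rows = maccs_table([{1, 42, 166}, {42, 100}])
```

Rows in this form can be fed directly to a clustering routine or to a bit-based similarity measure such as the Tanimoto coefficient.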

 

7. Pick 2D Descriptors

Author: Shaillay Kumar Dogra

Running this script creates a subset containing only the 2D descriptors. The input must be a superset of descriptors that includes these 2D descriptors, in particular "numAtoms" and "Lop", because the script uses these two columns to mark the start and end of the 2D descriptor columns.

Read discussion about the script >>
Download script >>
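The marker-based selection can be sketched in plain Python as a slice over the ordered column names. This assumes the 2D block is contiguous and that both markers are themselves included; the column names other than "numAtoms" and "Lop" are hypothetical.

```python
def pick_between(columns, start="numAtoms", end="Lop"):
    """Return the contiguous run of column names from start to end, inclusive."""
    i = columns.index(start)  # raises ValueError if the marker column is missing
    j = columns.index(end)
    if j < i:
        raise ValueError("end marker appears before start marker")
    return columns[i:j + 1]


columns = ["ID", "Activity", "numAtoms", "MW", "logP", "Lop", "Dipole", "PMI_X"]
subset = pick_between(columns)
print(subset)
```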

 

8. Pick 3D Descriptors

Author: Shaillay Kumar Dogra

A small but useful script that picks the 3D (conformational) descriptors from a set in which ‘all’ descriptors have been computed. This helps if, say, one wants to build models on 3D descriptors only and then compare them against models built on the full set (or on the 2D descriptors only).

Read discussion about the script >>
Download script >>

 

9. Tanimoto Coefficient

Author: Shaillay Kumar Dogra

Various similarity metrics exist that return a score indicating how similar two molecules are. Frequently used metrics include simple distance measures such as the Hamming and Euclidean distances, and association coefficients such as the Tanimoto, Dice and cosine coefficients.

The script provided here computes the Tanimoto coefficient between all pairs of compounds and displays the result as an n x n matrix. Compounds with Tanimoto coefficient values > 0.85 are generally considered similar to each other.

Read discussion about the script >>
Download script >>
Download Script for Sarchitect Designer version 2.3
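For binary fingerprints, the Tanimoto coefficient is c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number set in both. A minimal Python sketch of the n x n matrix, with fingerprints represented as sets of on-bit indices (a representation chosen here for brevity, not necessarily the script's); two empty fingerprints are assigned 1.0 by convention.

```python
def tanimoto(a, b):
    """Tanimoto coefficient c / (|a| + |b| - c) for two sets of on-bits."""
    common = len(a & b)
    union_size = len(a) + len(b) - common
    return common / float(union_size) if union_size else 1.0


def tanimoto_matrix(fingerprints):
    """n x n matrix of pairwise Tanimoto coefficients."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8}]  # hypothetical fingerprints
m = tanimoto_matrix(fps)
similar_pairs = [(i, j) for i in range(len(fps)) for j in range(i + 1, len(fps)) if m[i][j] > 0.85]
```

Here compounds 0 and 1 share 3 of 5 distinct bits (T = 0.6), so no pair passes the 0.85 threshold.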

 

10. Correlation Based Filtering

Author: Shaillay Kumar Dogra

This script filters out descriptors whose correlation with the endpoint is below the user-defined cutoff. It creates a subset folder containing all the marked columns together with the descriptors that passed the correlation-based filter.

Read discussion about the script >>
Download script >>
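A minimal Python sketch of correlation-based filtering. It assumes Pearson correlation and compares the absolute value |r| with the cutoff, so strong negative correlations are also retained; both are assumptions, since the description does not specify them. Constant columns are given r = 0. Names and data are hypothetical.

```python
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / float(n), sum(ys) / float(n)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation
    return sxy / (sxx * syy) ** 0.5


def correlation_filter(descriptors, endpoint, cutoff):
    """Keep descriptors whose |r| with the endpoint is at least the cutoff."""
    kept = {}
    for name, values in descriptors.items():
        r = pearson_r(values, endpoint)
        if abs(r) >= cutoff:
            kept[name] = r
    return kept


endpoint = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptors = {
    "d1": [2.0, 4.1, 5.9, 8.2, 9.9],  # strongly positive
    "d2": [3.0, 1.0, 4.0, 1.0, 5.0],  # weak
    "d3": [5.0, 4.0, 3.0, 2.0, 1.0],  # perfectly negative
}
kept = correlation_filter(descriptors, endpoint, cutoff=0.7)
```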

 

11. Filter Descriptors

Author: Shaillay Kumar Dogra

The script provided here filters out descriptors that show low variation. The user sets a cutoff on the number of distinct values a descriptor must exhibit in order to be retained, and each descriptor is retained or rejected accordingly. The output is a child dataset named "Filtered Set" containing the descriptors that passed the filtering criterion. The descriptors that were filtered out are also shown in a scatter plot for visual inspection.

Read discussion about the script >>
Download script >>
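The distinct-value criterion can be sketched in a few lines of Python. This reads the cutoff as "at least `min_distinct` distinct values to be retained", and the names and data are hypothetical. The rejected descriptors are returned too, mirroring the script's separate display of what was filtered out.

```python
def filter_low_variation(descriptors, min_distinct):
    """Split descriptors by the number of distinct values they take."""
    retained, rejected = {}, {}
    for name, values in descriptors.items():
        target = retained if len(set(values)) >= min_distinct else rejected
        target[name] = values
    return retained, rejected


descriptors = {
    "a": [1, 1, 1, 1],          # constant
    "b": [1, 2, 1, 2],          # only two distinct values
    "c": [0.1, 0.5, 0.9, 1.3],  # four distinct values
}
retained, rejected = filter_low_variation(descriptors, min_distinct=3)
```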

 

12. Regression Flow - A

Author: Shaillay Kumar Dogra

The script given here automates the steps of picking the top N descriptors based on correlation with the endpoint, running an auto-correlation filter with some cut-off value on these top descriptors, and finally calling the "Regression Forest" modeling algorithm on the descriptors thus filtered. Multiple parameter values, separated by commas, are provided for the algorithm; this results in a grid search and the creation of multiple models from which the user can select and train an appropriate one.

Read discussion about the script >>
Download script >>
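The first two steps shared by Regression Flows A to D can be sketched in plain Python. This assumes descriptors are ranked by |r| with the endpoint and that the auto-correlation filter greedily drops any top-N descriptor whose |r| with an already kept, better-ranked descriptor exceeds the cut-off. That greedy rule is an assumption, and the function names and data are hypothetical.

```python
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / float(n), sum(ys) / float(n)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def top_n_then_decorrelate(descriptors, endpoint, n_top, inter_cutoff):
    """Rank by |r| with the endpoint, keep the top n_top, then drop any descriptor
    whose |r| with an already kept (better-ranked) descriptor exceeds inter_cutoff."""
    ranked = sorted(descriptors,
                    key=lambda name: abs(pearson_r(descriptors[name], endpoint)),
                    reverse=True)
    kept = []
    for name in ranked[:n_top]:
        if all(abs(pearson_r(descriptors[name], descriptors[other])) <= inter_cutoff
               for other in kept):
            kept.append(name)
    return kept


endpoint = [1, 2, 3, 4, 5, 6]
descriptors = {
    "d1": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "d1_copy": [2.2, 4.0, 6.4, 7.8, 10.2, 12.0],  # 2 x d1, redundant
    "d2": [2, 1, 4, 3, 6, 5],
    "d3": [6, 1, 5, 2, 4, 3],
}
kept = top_n_then_decorrelate(descriptors, endpoint, n_top=3, inter_cutoff=0.9)
```

Only one of the two redundant descriptors survives, and the weakly correlated "d3" never enters the top 3. The surviving list would then be handed to the modeling step.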

 

13. Regression Flow - B

Author: Shaillay Kumar Dogra

The script given here automates the steps of picking the top N descriptors based on correlation with the endpoint, running an auto-correlation filter with a given cut-off value on these top descriptors, and finally calling the "Partial Least Squares" modeling algorithm on the descriptors thus filtered. The user can save the model once the prediction results are displayed.

Read discussion about the script >>
Download script >>

 

14. Regression Flow - C

Author: Shaillay Kumar Dogra

The script given here automates the steps of picking the top N descriptors based on correlation with the endpoint, running an auto-correlation filter with a given cut-off value on these top descriptors, and running a "Forward Selection" wrapper with the "Multivariate Linear Regression" algorithm to select the descriptors deemed best for modeling. The user can then run and save an appropriate model.

Read discussion about the script >>
Download script >>
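To illustrate what a forward-selection wrapper around linear regression does, here is a standalone Python sketch. At each step it adds the descriptor that gives the highest training R² of an ordinary least-squares fit with intercept. Training R² is an assumed criterion, and Sarchitect's wrapper may score candidates differently, for example by validation. Names and data are hypothetical; candidates that make the system singular are skipped.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """Training R^2 of an OLS fit with intercept, via the normal equations."""
    n = len(y)
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    y_mean = sum(y) / float(n)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - sse / sst


def forward_selection(descriptors, y, max_features):
    """Greedily add the descriptor that most improves training R^2."""
    def score(cols):
        try:
            return r_squared([descriptors[c] for c in cols], y)
        except ValueError:
            return float("-inf")  # collinear candidate: never chosen

    selected = []
    while len(selected) < max_features:
        remaining = [d for d in descriptors if d not in selected]
        if not remaining:
            break
        selected.append(max(remaining, key=lambda d: score(selected + [d])))
    return selected


descriptors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [3, 1, 4, 1, 5, 9, 2, 6],
    "x3": [2, 7, 1, 8, 2, 8, 1, 8],
}
y = [3 * a + 0.5 * c + 1 for a, c in zip(descriptors["x1"], descriptors["x3"])]
chosen = forward_selection(descriptors, y, max_features=2)
```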

 

15. Regression Flow - D

Author: Shaillay Kumar Dogra

The script given here automates the steps of picking the top N descriptors based on correlation with the endpoint, running an auto-correlation filter with a given cut-off value on these top descriptors, and running a "Genetic Algorithm" wrapper with the "Multivariate Linear Regression" algorithm to select the descriptor sets deemed best for modeling. The user can then run and save an appropriate model.

Read discussion about the script >>
Download script >>

 

16. Classification Flow - A

Author: Shaillay Kumar Dogra

The script given here automates the steps of picking the top N descriptors based on the "Kruskal-Wallis" statistical test and calling the "Decision Forest" modeling algorithm on the selected descriptors. Multiple parameter values, separated by commas, can be provided for the algorithm; this results in a grid search and the creation of multiple models from which the user can select and train an appropriate one.

Read discussion about the script >>
Download script >>
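The Kruskal-Wallis ranking step used by the Classification Flows can be sketched in plain Python. For each descriptor, the values are ranked across all compounds (ties share the mean rank). The statistic is H = 12/(N(N+1)) * sum_g R_g²/n_g - 3(N+1), divided by the usual tie correction 1 - sum(t³ - t)/(N³ - N). Descriptors are then sorted by H. Function names, class labels and data are hypothetical, and descriptors constant across all compounds get H = 0.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    n = len(values)
    rank_sums, counts = {}, {}
    for r, lab in zip(average_ranks(values), labels):
        rank_sums[lab] = rank_sums.get(lab, 0.0) + r
        counts[lab] = counts.get(lab, 0) + 1
    h = 12.0 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / counts[g] for g in counts) - 3.0 * (n + 1)
    tie_counts = {}
    for v in values:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts.values()) / float(n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def top_descriptors_by_kw(descriptors, labels, n_top):
    """Names of the n_top descriptors with the largest H."""
    scores = {name: kruskal_wallis_h(vals, labels) for name, vals in descriptors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_top]


labels = ["active"] * 3 + ["inactive"] * 3
descriptors = {
    "d_sep": [5, 6, 7, 1, 2, 3],    # separates the classes perfectly
    "d_mix": [1, 5, 3, 2, 6, 4],    # weak separation
    "d_const": [1, 1, 1, 1, 1, 1],  # no information
}
top = top_descriptors_by_kw(descriptors, labels, n_top=2)
```

In Classification Flows B to E the same ranking would be applied with n_top = 100 before the modeling and wrapper steps.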

 

17. Classification Flow - B

Author: Shaillay Kumar Dogra

The script provided here automates the steps of picking the top 100 descriptors based on the "Kruskal-Wallis" statistical test and calling the "Axis-Parallel Decision Tree" modeling algorithm on the selected descriptors, and displays the results. A "Forward Selection" wrapper on "Axis-Parallel Decision Tree" is then executed to pick the 5 best descriptors. The user can view, select and save the best model(s). (Various parameters can be changed in the script.)

Read discussion about the script >>
Download script >>

 

18. Classification Flow - C

Author: Shaillay Kumar Dogra

The script provided here automates the steps of picking the top 100 descriptors based on the "Kruskal-Wallis" statistical test and calling the "Naive Bayes" modeling algorithm on the selected descriptors, and displays the results. A "Forward Selection" wrapper on "Naive Bayes" is then executed to pick the 10 best descriptors. The user can view, select and save the best model(s). (Various algorithm parameters can be changed in the script.)

Read discussion about the script >>
Download script >>

 

19. Classification Flow - D

Author: Shaillay Kumar Dogra

The script provided here automates the steps of picking the top 100 descriptors based on the "Kruskal-Wallis" statistical test and then calling "Genetic Algorithm" with the "Axis-Parallel Decision Tree" modeling algorithm on the selected descriptors, and displays the results. Another "Genetic Algorithm" run with "Naive Bayes" is then executed to pick the best descriptor sets. Finally, the user can view, select and save the best model(s). (Various algorithm parameters can be changed in the script.)

Read discussion about the script >>
Download script >>

 

20. Classification Flow - E

Author: Shaillay Kumar Dogra

The script provided here automates the steps of picking the top 100 descriptors based on the "Kruskal-Wallis" statistical test and then calling the "SVM" algorithm with a 'Linear Kernel' on the selected descriptors, and displays the results. "Forward Selection" with "SVM" on a 'Linear Kernel' is then executed to pick the 5 best descriptors. Finally, the user can view, select and save the best model(s). (Various algorithm parameters can be changed in the script.)

Read discussion about the script >>
Download script >>

 

Facilitated by Strand Life Sciences Pvt. Ltd