LogP, LogD, pKa and LogS: A Physicists guide to basic chemical properties

logp
logd
pKa
logs
chemistry
drug discovery
molecular properties
Author

Michael Green

Published

January 6, 2024

Modified

April 1, 2024

Introduction

Ever since I got into AI Based Drug Discovery a lot of terminology has been thrown my way. The idea to write this came from (Bhal 2021) which outlines that it’s important to know the fundamentals of molecular properties when doing drug discovery and development. Now, as a theoretical physicist, I’m no stranger to mathematical concepts, but there are a lot of them in Chemistry. In the beginning the most prevalent ones were LogP, LogD, pKa and LogS. I quickly grasped sort of what they were all about but I never really dug into what they actually mean and why they are so important. Hence this blogpost. I hope it will be as useful to you reading it as it was for me writing it. I certainly got a lot of answers to my many questions.

A Quick Primer on Acidity

As most of you probably remember from chemistry in high school, knowing the acidity of a solution is quite useful. Or well, at least we were forced to measure it often enough.. Anyway, to quantify this, the metric pH was introduced by a Danish chemist named Søren Peder Lauritz, who worked at the Carlsberg Laboratory in Copenhagen at the time. It measures the acidity or alkalinity of a solution. It’s essentially a scale we use to quantify just how acidic or basic (alkaline) a water-based solution is. You may hove noticed that we snuck two terms in for the same thing, namely Alkaline and Basic. I will refer to them as equal for the remainder of the post.

The pH scale ranges from 0 to 14:

  • Acidic Solutions: These have a pH less than 7. The lower the pH, the more acidic the solution. For example, lemon juice and vinegar are acidic, with pH values typically around 2 to 3.
  • Neutral Solution: A pH of exactly 7 indicates a neutral solution. Pure water is a classic example of a neutral solution.
  • Basic Solutions: These have a pH greater than 7. The higher the pH, the more basic the solution. For instance, baking soda in water is basic, with a pH around 8 to 9.

Ok so now you know the interpretation of the pH value but we still haven’t defined what it is. So let me do that right now.

\[\text{pH} = -\log_{10} [H^+]\]

Here you can see that pH is defined on a log scale of the concentration of hydrogen ions (\([H^+]\)) in the solution where the concentration is given in moles per liter. One mole (Mole (Unit) - Wikipedia — En.wikipedia.org” 2023) of any substance contains exactly \(6.022\cdot~10^{23}\) entities (atoms, molecules, ions, etc.). This number is what the scientific community refers to as Avogadro’s number. You might wonder why the “-” sign in front of the logarithm? Well, it’s simply because we want the pH expressed in positive numbers. The logarithm of a number less than one, which the concentration of anything in a solution always is, is always negative. So the “-” sign converts that to a positive number instead. Ok dude, I hear your enquiring minds cry. Why did you say that the pH is always between 0 and 14 when it’s clear from your definition that we in principle can get a number much higher than 14? Well, it’s a technicality. While mathematically correct, in practice, we never see values outside of the 0–14 range. This is because, in order to reach pH levels below 0 or above 14, one would require extraordinarily acidic or basic solutions, respectively (Lunawat 2023). An example of the pH scale can be seen in Figure 1 where everything from a really basic solution to a pure acid is illustrated.

Figure 1: A pH scale with annotated examples of chemicals at each integer pH value. Courtesy of Edward Stevens.

Ok fine, but so what? Quite a bit actually. pH is important in many areas of science but when it comes to drug discovery and development, for instance, the pH of different parts of the body can affect the absorption and efficacy of drugs. That’s why it’s crucial to know how our molecules react in different pH regimes.

My last point about the pH metric is that since it’s a logarithm of base 10 it means that a pH of 4 is 10 times more basic than a pH of 3 which in turn is 100 times more acidic than a pH of 5.

Ready, Set, Get to the Point!

Armed with the primer on acidity we can start diving into the metrics I started this whole blog post about. Let’s begin with talking a bit about what we want to know about potential molecules that might be useful to distinguish possible drug candidates from other molecules. One important aspect of molecules is how much they like water (hydrophilic) or fat (lipophilic). This matters because a potential drug needs to be able to pass lipid membranes and it cannot very well do that if it prefers the water outside.

LogP: A Measure of Lipophilicity

LogP measures the logarithm of ration between the concentration of a compound in a non-polar solvent and in water. It is common practice to measure this using octanol (PubChem 2003) as the non-polar solvent. Mathematically LogP is defined like

\[\text{LogP} = \log \frac{[\text{Drug}]_{\text{non-polar}}}{[\text{Drug}]_{\text{water}}}\]

where \([\text{Drug}]\) represents the concentration of the drug candidate (molecule). Why does this tell you anything about whether the molecule likes fat or water? Well, to understand this it’s best to take a look at how we can measure this experimentally. Imagine you take a flask consisting of water in the bottom and an oily alcohol (octanol) on the top. Put the molecule inside and shake well. After shaking measure how much of the drug is left in the octanol vs the water. This ratio tells you about the preference of the molecule. If most of it is residing in the octanol it’s more lipophilic. If more of it is in the water it’s more hydrophilic.

Typical values that are considered good for a drug candidate resides somewhere between 2 and 5 which means at least a factor 100 more lipophilic than hydrophilic. This might sound a bit strange if you’re new to this field. Why would a range be the optimal? Why not all lipophilic or all hydrophilic? Well it turns out that a drug, and I’m thinking oral drug here, needs to overcome two main challenges.

  1. It needs to be polar (hydrophilic) to a certain extent since blood mainly consists of water.
  2. It needs to be non-polar (lipophilic) since it needs to be able to cross lipid membranes.

Why does it matter for drug discovery? A drug’s absorption, distribution, metabolism, excretion, and toxicity (ADMET) are profoundly influenced by its LogP. Generally, a moderate LogP (not too high, not too low) is desirable for good bioavailability. Another term entered into the mix here: Bioavailability. This is a bit fluffy in my opinion but Lin (2022) has a nice definition of it.

Bioavailability is the amount of the drug that enters the blood and produces a therapeutic effect, compared to the total administered amount. It is usually expressed as a fraction.

I will just leave it at that. Exploring the effect a given drug might have is way beyond the scope of this little writeup.

LogD: Because pH Just Ain’t the Same

As I might have hinted at above in my little pH primer, pH is of some significance to drug discovery. Mainly because pH is not the same in all parts of our body. In fact it can differ quite wildly. We have a pH of around \(7.4\) in our blood while our intenstines are more acidic with a pH between \(6\) and \(7.4\). Our stomach sports a whoppin pH between \(1.5\) to \(3.5\) making it a strongly acidic environment. See Figure 1 for examples of the pH scale again.

Ok, so different areas of the body have rather different environments as highlighted by the different levels of pH. But why does this matter to us when developing drugs? Well, because a molecule is not only one molecule. Molecules are susceptible to ionization, a process that adds/removes a hydrogen to/from a functional group. In Figure 2 we can see a molecule in three different states. It features two ionizable groups. Which can be ionized individually and simultaneously depending of the pH of the solvent. All more than one ionization state of the molecule can exist in different concentrations depending on the pH of the solvent.

Figure 2: Illustration of the concentration of a molecule in three different states of ionization as a function of pH.

As it is, these ionizations also matter for the bioavailability which means that pH can directly affect the properties a given molecule has. That’s why it’s a good idea to include the pH level of the part of the body where we want the molecule to be soluble. I know I’m skipping a lot of important details here and am being a bit handwavy but I hope you’ll forgive me. So with apologies prepared let’s move on to see how we can fix the shortcomings of our LogP metric which only measures the concentration of the unionized form of the molecule.

First let’s go back to the definition of LogP which was as given below.

\[\text{LogP} = \log P = \log \frac{[\text{Drug}]_{\text{non-polar}}}{[\text{Drug}]_{\text{water}}}\]

Thus LogP is just the logarithm of the partition coefficient \(P\). So as this only looks at the unionized form of the molecule we would like to extend the equation to also take into account the concentration of it’s ionized form in water. This can be accomplished like so.

\[\text{LogD} = \log D = \log \frac{[\text{Drug}]_{\text{non-polar}}}{[\text{Drug}]_{\text{water}} + [\text{Ion}]_{\text{water}}}\]

Here \(D\) is known as the dissociation coefficient. This also the reason why LogD is said to take pH into account since the amount of the molecule that is ionized depends heavily on the pH of the solvent. In Figure 3 I’m illustrating how LogD varies for Piroxicam (Multum 2022) at different levels of pH of the solvent.

Figure 3: Illustration of how the experimentally determined LogD values of Piroxicam vary at different pH values. The data from this plot was taken from Ulrich, Goss, and Ebert (2021).

Here you can see something quite typical which is that an acidic drug has the highest lipophilicity (high LogD) when the environment is acidic as well. In our case Piroxicam is indeed an acidic drug which is ionized at higher pH values leading to a lower liphophilicity (lower LogD).

There is a nice theoretical relationship between LogD, LogP and pKa. The method, known as CALlogD, can be derived from the predicted LogP and pKa values (Wang et al. 2023). The equation is

\[\text{LogD} = \text{LogP} - \log \left( 1 + 10^{(\text{pH} - \text{pKa})\delta_i} \right) \]

where \(\delta_i = \{1, −1\}\) for acids and bases, respectively. The astute reader will notice that only a single pKa is provided here which is the global pKa. In Figure 4 I show how the LogD varies with pH for two acidic and two basic drugs.

Figure 4: Illustration of how the theoretical LogD of a few drugs varies at different levels of pH. This plot uses calculated LogP and pKa values from the AiChemy (2024) platform.

pKa: All Acids Are Not Equally Strong

Now let’s turn the attention to pKa which connects, in a way, LogP, LogD and pH. The metric pKa tells us how strong an acid is. The lower the value the stronger the acid. The pKa value tells us something about how easily an acid gives away a hydrogen ion. We define pKa like the equation below.

\[\text{pKa} = -\log K_a = -\log \frac{[H^+][A^-]}{[HA]} = \text{pH} - \log \frac{[A^-]}{[HA]}\]

I guess you’re starting to see the trend with the logarithms entering all of our metrics. But, I digress. Now, the \(K_a\) is called the acid dissociation constant which measures how easily an acid can dissociate in water. Then we just take the negative base 10 logarithm of that. That’s pKa for you! As you can see in the equation above it’s also related to the pH of the solvent. This relationship is known as the The Henderson-Hasselbalch equation. So in plain language the pKa is the negative base 10 logarithm of the concentration ratio between the acid (\(\text{HA}\)) and its conjugate base (\(A^-\)). Extending this reasoning you can also see that pKa is equal to pH when precisely half of the acid has dissociated. This follows from the fact that half of the acid dissociating means \(\frac{[A^-]}{[\text{HA}]}=1\). This means that we can interprete pKa as the pH at which half of the acid has dissociated.

I realized I skipped something again as I was reading what I just wrote. The term “conjugate base” just found it’s way into the mix. Let me clarify what I mean by that by quoting “PH and pKa — Shiken.ai” (2024).

Conjugate acids are bases that gained a proton \(H^+\). Conversely, Conjugate bases are acids that lost a proton \(H^+\).

What this means is that two compounds which differs only by one \(H^+\) are said to form a conjugate acid and base pair. I think the illustration in Figure 5 shows this relationship nicely.

Figure 5: An illustration of the relationship between an acid and it’s conjugate base and a base and it’s conjugate acid. Illustration originating from byjus.com.

One important aspect of pKa that eluded me for the longest of time was that it is intrinsically a local metric. What I mean by that is that it is a given functional group of the molecule which has a pKa value. If the pKa of a functional group is less than 7 it means that it will be positively charged at \(\text{pH}=7\) thus being acidic. A pKa larger than 7 means it would be negatively charged at \(\text{pH}=7\) which means being more basic.

So how do we actually use the pKa value in our daily work you say? I said in the beginning of this section that the lower the pKa the stronger the acid. This also means that the higher the pKa the weaker the acid. That makes sense. But what is also important to remember is that pKa is also useful for evaluating the strength of bases. Wait what? Yeah indeed but a common trap here is to just invert the meaning pKa holds for acids. Unfortunately it’s not that simple. Almost, but not quite. The interpretation is the higher the pKa the stronger the conjugate base! Ashenhurst (2010) gives a nice explanation of this.

Figure 6: A pKa table showing the strength of functional groups of acids and their conjugate bases. Graphics taken from Ashenhurst (2010).

There is a small point to be made about bases as well. The pKa equation above is coming from the Henderson–Hasselbalch equation (Henderson 1908a, 1908b). It’s basically just a rearranged version of it. The original form is

\[\text{pH} = \text{pK}_a + \log_{10} \frac{[A^-]}{[HA]}\]

and can be used to assess the pH of a solution of a weak acid in equilibrium. The pKa is the negative logarithm of the acid dissociation constant \(K_a\). It models the chemical reaction where an acid HA dissociates into a conjugate base \(A^-\) and a hydrogen ion \(H^+\). The reaction is given below.

\[\text{HA}\rightleftharpoons A^- + H^+\]

But what about bases? Well, for them the same ingredients for the equation is used but we call it the association constant \(K_b\) instead.

\[K_b = \frac{[HA]}{[A^-][H^+]} = \frac{1}{K_a}\]

In the above equation you can see that \(K_b\) is really just the inverse of \(K_a\). This way we can redo the derivation above but for pKb instead.

\[\text{pK}_b=-\log_{10} [HA] + \log_{10} [A^-][H^+] = \log_{10}\frac{[A^-]}{[HA]} + \text{pH}\]

It’s important to remember that for any base or acid in water the following relationship \(\text{pK}_a + \text{pK}_b = \text{pK}_w\) holds. The unknown here is the self-ionization constant for water \(pK_w\), which in many cases can be put to \(14\) since that’s value \(pK_w\) takes at \(25^\circ\) C in water.

Note

In drug discovery we rarely use \(\text{pK}_b\) but rather express the \(\text{pK}_a\) of the conjugate acid instead.

LogS: Water solubility is key

So far I talked a bit about why we need to know how strong an acid or a base is and that we need to know the lipophilicity of a molecule. This is all well and good but we also need to know how soluable a given molecule is in water. This is where LogS comes in. LogS is as the logarithm of the solubility of a molecule measured in mol/L. Now to complicate matters a bit there are two kinds of LogS that you need to know about. The first is “Intrinsic solubility” which is the solubility measured after equilibrium between the dissolved and solid state at a pH where the compound is neutral (DrugBank 2023). The second is pH-dependent solubility which refers to the solvation equilibrium being affected by the pH of the solution.

Why does LogS matter? Simply because a low solubility goes hand in hand with bad absorption and is also indicative of distribution problems. Therefor most drug hunters avoid molecules with low LogS scores. More than 80% of all drugs in the market have a \(\text{LogS}>-4\) (OrganicChemistryPortal). In Figure 7 you can see the LogS vs. pH for five different ionizable drugs. Solubility is the lowest at a pH of around 4. The highest solubility is achieved around pH 7. The species are colored in the original data. Please check out Shoghi et al. (2013) for a nicer visual. I was too lazy to create all the series. Apologies.

Figure 7: The solubility vs. pH profiles of five ionizable drugs of different nature (a monoprotic acid, a monoprotic base, a diprotic base and two amphoteric compounds showing a zwitterionic species each one). The data from this plot was taken from Shoghi et al. (2013).

The actual drug in this data is Sulfadimethoxine (DrugBank 2007) which approval in the US was revoked. It’s unionized form happens at pH of around 4 which means it’s non-polar which leads to the poor solubility in water. Making the solvent more acidic creates a positively charged ion which increases solubility in water. Conversely, making the solvent more basic creates a negatively charged ion which in turn also increases solubility.

Wrapping Up

Well, that was a lot to take in. At least for me. I hope you have gotten a better understading of LogP, LogD, pKa and LogS and how they are affecting modern drug discovery. I will be the first to admit there’s still a lot more to understand, but this should be a good starting point for diving further into these metrics.

I would like to end this post with a quick cheat sheet in Table 1 for you regarding the 4 properties I have been talking about. But before checking that out I would like to draw your attention to the importance of uncerstanding the uncertainty of all measurements you might do on molecules and proteins. There are very few precise values in the microscopic world and everything is constantly moving. As such any measurement we take will have a significant error attached to it. The size of this error is important to quantify. Especially when we start to train AI models on these type of data. Don’t hesitate to reach out if I have made any mistakes or blunders in my explanation. So whenever you see someone claim that a drug has a certain effect, like killing cancer cells in a petri dish, remember that the problem is never killing the cancer cells. It’s avoiding to kill everything else in the process.

Figure 8: An illustration from xkcd showing that killing cancer is not that hard.

With no further ado, here’s the table.

Happy hacking!

Table 1: Quick overview of the four descriptors LogP, LogD, pKa and LogS and what they are used for in drug discovery.
Property Description Predictive Significance in Drug Discovery
LogP A measure of the lipophilicity of a compound, indicating its distribution between a non-polar solvent (like octanol) and water. Calculated as the negative logarithm of the partition coefficient (P). Predicts how well a drug can pass through cell membranes. A balanced LogP (typically between 2 and 5) is often desirable for optimal absorption and distribution.
LogD Similar to LogP, but pH-dependent. It considers the distribution of both ionized and non-ionized forms of the compound across a solvent and water. Provides insight into the drug’s behavior at different pH levels, which is crucial for understanding its absorption and solubility under physiological conditions.
pKa The negative base-10 logarithm of the acid dissociation constant (Ka) of a solution. Indicates the strength of an acid or base in a compound. Essential for predicting the ionization state of a drug under physiological pH, which affects solubility and permeability. Different pKa values are preferred depending on the target site within the body.
LogS A logarithmic measure of a compound’s solubility in water. Crucial for determining the bioavailability of a drug. Higher solubility (higher LogS) is generally preferred for effective absorption and distribution in the body.

This post is also published at Medium.

References

AiChemy. 2024. AIChemy | Healing the World Through Sustainable AI Driven Drug Discovery — Aichemy.io.” https://aichemy.io.
Ashenhurst, James. 2010. How to Use a pKa Table — Masterorganicchemistry.com.” https://www.masterorganicchemistry.com/2010/09/29/how-to-use-a-pka-table/.
Bhal, Sanji. 2021. Partitioning (LogP or LogD)—Are You Using or Measuring the Right Descriptor? - ACD/Labs — Acdlabs.com.” https://www.acdlabs.com/blog/partitioning-logp-or-logd-are-you-using-measuring-the-right-descriptor/.
DrugBank. 2007. Sulfadimethoxine: Uses, Interactions, Mechanism of Action | DrugBank Online — Go.drugbank.com.” https://go.drugbank.com/drugs/DB06150.
———. 2023. Log S | DrugBank Help Center — Dev.drugbank.com.” https://dev.drugbank.com/guides/terms/log-s.
Henderson, Lawrence J. 1908a. “Concerning the Relationship Between the Strength of Acids and Their Capacity to Preserve Neutrality.” American Journal of Physiology-Legacy Content 21 (2): 173–79. https://doi.org/10.1152/ajplegacy.1908.21.2.173.
———. 1908b. “The Theory of Neutrality Regulation in the Animal Organism.” American Journal of Physiology-Legacy Content 21 (4): 427–48. https://doi.org/10.1152/ajplegacy.1908.21.4.427.
Lin, Sean. 2022. Using Log P and Log D to Assess Drug Bioavailability — Ftloscience.com.” https://ftloscience.com/log-p-log-d-drug-bioavailability/.
Lunawat, Rajat. 2023. Why Are pH Values Only In A Range Of 0-14? — Scienceabc.com.” https://www.scienceabc.com/pure-sciences/can-ph-have-values-out-of-the-0-14-range.html.
Mole (Unit) - Wikipedia — En.wikipedia.org.” 2023. https://en.wikipedia.org/wiki/Mole_(unit).
Multum, Cerner. 2022. Piroxicam Uses, Side Effects & Warnings — Drugs.com.” https://www.drugs.com/mtm/piroxicam.html.
OrganicChemistryPortal. “LogS Calculation - Osiris Property Explorer — Organic-Chemistry.org.” https://www.organic-chemistry.org/prog/peo/logS.html.
“PH and pKa — Shiken.ai.” 2024. https://shiken.ai/chemistry/ph-and-pka.
PubChem. 2003. “1-Octanol — Pubchem.ncbi.nlm.nih.gov.” https://pubchem.ncbi.nlm.nih.gov/compound/1-Octanol.
Shoghi, Elham, Elisabet Fuguet, Elisabeth Bosch, and Clara Ràfols. 2013. “Solubility–pH Profiles of Some Acidic, Basic and Amphoteric Drugs.” European Journal of Pharmaceutical Sciences 48 (1): 291–300. https://doi.org/https://doi.org/10.1016/j.ejps.2012.10.028.
Ulrich, Nadin, Kai-Uwe Goss, and Andrea Ebert. 2021. “Exploring the Octanol–Water Partition Coefficient Dataset Using Deep Learning Techniques and Data Augmentation.” Communications Chemistry 4 (1): 90. https://doi.org/10.1038/s42004-021-00528-9.
Wang, Yitian, Jiacheng Xiong, Fu Xiao, Wei Zhang, Kaiyang Cheng, Jingxin Rao, Buying Niu, et al. 2023. LogD7.4 Prediction Enhanced by Transferring Knowledge from Chromatographic Retention Time, Microscopic pKa and logP.” Journal of Cheminformatics 15 (1): 76. https://doi.org/10.1186/s13321-023-00754-4.