This is Figure 3.14 from Dill and Bromberg's Molecular Driving Forces, which is my favorite book on statistical mechanics. It is a beautiful example from a beautiful book.

If you haven't already, it is a good idea to read "Illustrating entropy" and "Where does the ln come from in S = k ln(W)". Go ahead, I'll wait right here.

System $A$ has an internal energy $U_A=2$, which can be distributed among its 10 particles in $$\frac{10!}{8!2!}=45$$ different ways. Similarly, System $B$ has an internal energy $U_B=4$, which can be distributed among its 10 particles in $$\frac{10!}{6!4!}=210$$ different ways.

If the two systems are allowed to exchange energy, what is the most probable distribution of energies? It is the one for which $$W_{total}=W_AW_B$$ is largest.

Now compute $W_{total}$ for three cases: no energy transfer ($U_A=2,U_B=4$), one where energy is transferred from $A$ to $B$ ($U_A=1,U_B=5$), and one where energy is transferred from $B$ to $A$ ($U_A=3,U_B=3$). If you don't have a calculator handy, try Wolfram-Alpha. Which state is the most probable?
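If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the three cases. It assumes, consistent with the $10!/(8!2!)$ count above, that each particle holds either zero or one unit of energy, so the multiplicity is a binomial coefficient. The names `multiplicity`, `cases` and `w_total` are my own, chosen for illustration.

```python
from math import comb

N_A = N_B = 10  # particles in each system

def multiplicity(n_particles, energy):
    """Number of ways to place `energy` units on `n_particles`,
    assuming each particle holds at most one unit."""
    return comb(n_particles, energy)

# (U_A, U_B) for each scenario; the total energy is always 6
cases = {
    "no transfer": (2, 4),
    "A to B": (1, 5),
    "B to A": (3, 3),
}

w_total = {
    label: multiplicity(N_A, u_a) * multiplicity(N_B, u_b)
    for label, (u_a, u_b) in cases.items()
}

for label, w in w_total.items():
    print(f"{label:12s} W_total = {w}")
```

The case with the largest printed $W_{total}$ is the most probable one.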

**Energy transfers to maximize the total entropy, not equalize energies**

The most probable state has the largest total entropy, since $$S_{total}=k\ln(W_{total})$$ In this particular case maximizing the entropy leads to equal energies, but that is only because the two systems have the same number of particles. Now consider system $A$ in the figure above in thermal contact with a system $B$ that has the same energy ($U_B=2$) but only four particles. What is the most likely state? (Don't guess, compute!)
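One way to "compute" is to try every possible split of the total energy and keep the one with the largest $W_{total}$. The sketch below does that, again assuming each particle holds at most one unit of energy. The function name `most_probable_split` is hypothetical.

```python
from math import comb

def most_probable_split(n_a, n_b, u_total):
    """Return (U_A, U_B, W_total) for the split of u_total that
    maximizes W_A * W_B, assuming at most one energy unit per particle."""
    best = None
    for u_a in range(u_total + 1):
        u_b = u_total - u_a
        if u_a > n_a or u_b > n_b:
            continue  # a system cannot hold more units than it has particles
        w = comb(n_a, u_a) * comb(n_b, u_b)
        if best is None or w > best[2]:
            best = (u_a, u_b, w)
    return best

# 10 particles in A, 4 particles in B, total energy 2 + 2 = 4
print(most_probable_split(10, 4, 4))
```

Calling `most_probable_split(10, 10, 6)` gives the answer to the equal-size example from the figure, which makes a handy sanity check.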

**Maximizing entropy means equalizing temperatures**

It should be clear by now that if you change the internal energy, you change the entropy: $$dS=\left(\frac{\partial S}{\partial U}\right)dU$$ When no work is done, $dU=dq_{rev}$, so this is actually just the thermodynamic definition of entropy, $dS=dq_{rev}/T$, which means that $$\frac{1}{T}=\left(\frac{\partial S}{\partial U}\right)$$ When $S_{total}$ is a maximum the change in total entropy is zero, so $$\begin{aligned}dS_{total}&=dS_A+dS_B\\&=\left(\frac{\partial S_A}{\partial U_A}\right)dU_A+\left(\frac{\partial S_B}{\partial U_B}\right)dU_B\\&=\frac{1}{T_A}dU_A+\frac{1}{T_B}dU_B\\&=\left(\frac{1}{T_A}-\frac{1}{T_B}\right)dU_A\\&=0\end{aligned}$$ Here I have made use of the fact that the total internal energy is conserved: $$dU_A=-dU_B$$ Since $dU_A$ is not zero in general, the term in parentheses must vanish, which means $T_A=T_B$.
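The condition $dS_{total}=0$ can also be checked numerically. The sketch below is only an illustration under assumptions of my own: it uses larger, hypothetical system sizes (1000 and 400 particles sharing 400 energy units) so that a finite difference is a reasonable stand-in for $\partial S/\partial U$, and it keeps the one-unit-per-particle model. Entropies are in units of $k$.

```python
from math import comb, log

def entropy(n, u):
    """S/k = ln W for n particles sharing u energy units
    (at most one unit per particle)."""
    return log(comb(n, u))

def slope(n, u):
    """Central finite-difference estimate of d(S/k)/dU at energy u."""
    return (entropy(n, u + 1) - entropy(n, u - 1)) / 2

# Hypothetical sizes, chosen only to make the finite difference smooth
N_A, N_B, U_TOTAL = 1000, 400, 400

# Energy of A at which the total entropy S_A + S_B is largest
u_a_best = max(
    range(1, U_TOTAL),
    key=lambda u_a: entropy(N_A, u_a) + entropy(N_B, U_TOTAL - u_a),
)
u_b_best = U_TOTAL - u_a_best

slope_a = slope(N_A, u_a_best)
slope_b = slope(N_B, u_b_best)

print(f"U_A = {u_a_best}, U_B = {u_b_best}")
print(f"dS_A/dU_A = {slope_a:.4f}, dS_B/dU_B = {slope_b:.4f}")
```

At the maximum the two slopes should agree closely even though the energies themselves are quite different, which is the numerical counterpart of $1/T_A=1/T_B$.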