getting the most out of your computer – threading

Flashback to 2002

Technology is amazing.  Pretty much since personal computers were created the speed of their processors have been increasing at almost a linear rate of doubling in speed every two years.  This is awesome but it is also a physical impossibility that the technology will continue to decrease in size forever.

At the same time that the speed of the cpu’s is coming into question, the hardware companies have started to add multiple cores to their processors.   The only trick is to be able to use them.

Back to the present

Processors are no longer doubling in speed every two years.  Technology is progressing but now everything is more evolutionary than revolutionary.  With one small exception. The CPU’s are packing in more cores.

A dual core machine was a noteworthy event a few years back but these days there is 64 bit quad core machines supporting one or two threads per core on every desk.  Twelve core processors may not be on every desktop but for the price of a quad core machine from yesteryear, you can purchase a 12 core CPU today for the cost of the computer a few years back.  Of course, then you will need to purchase the rest of the computer.

It is actually possible to create a big multi-core system with a reasonable budget rather than a princely budget.

With the acceleration of the CPU speeds slowing down the only way to get more power out of our machines is to effectively use all the processing power that is available.

Today the ability to take advantage of the underlying hardware is so much easier now than it has been in the past.  The past is obviously when the IT guys were trying to take full advantage of machines with multiple single core processors.

Quicksort

Normal Java programs are just a regular single threaded processes.  One of the main problems with multi-threaded programs is you need to have a problem that can easily be broken down into independent pieces.

I wanted to come up with a real world solution not just a couple of threads running independently sending some interleaved output to the console.

The only programming task that came to mind that was small enough to easily get an overview on was sorting a list- specifically the Quicksort method.  The Quicksort method splits up the list into smaller sub lists while swapping items that are on the wrong side of the pivot spot.  More information about Quicksort can be found in a lot computer science books or other web sites but a nice description of the Quicksort can also be found on wikipedia.

The sort is really efficient and much faster when compared to other sorting methods, but pretty much as expected it only uses one of the four cores that is available on my computer.

cpu-singlethread

Quicksort code at bottom of page

 

Java helps out

Despite the bad wrap that Java has received over efficiency when compared to other compiled languages it does offer quite a lot to the developer right out of the box.

In this case, Java provides the Thread class which provides the infrastructure to allow anyone to run multiple tasks simultaneously, well to the best ability of the underlying hardware.

The Thread class is actually just another java object.  There are a number of methods that are available to help control your thread.

  • start
  • run
  • join
  • isAlive
  • interrupt
  • sleep
  • yield

The run method is actually the implementation the interface Runnable.  This is the actual starting point for the execution of the thread.  What might seem a bit counter intuitive is that you don’t actually call this method yourself.  If you call the run method directly then you are simply executing another method call from your program.

The actual starting of the thread is done by calling the start method.  Java will then do all the necessary work to create a thread, either for your virtual machine or using a OS native thread.

The next and perhaps one of the most important methods is join.  This method call will cause the calling thread, or the main program, to block until the called thread is finished.

The threaded Quicksort that I have developed actually doesn’t really need to check if the threads are alive due to how it was was programmed.  The program simply creates threads as it recursively processes the data.  If your problem were somewhat differently structured you might be starting a number of parallel threads and look at a later point if they have finished by using the isAlive method.

The last three methods on my list are really about controlling if a thread is running or should be blocked.  The interrupt method is a way to force a thread to block.  The yield method is almost the same thing.  You are causing a thread to block itself to allow another waiting thread with the same priority to run.

The last method on the list is sleep.  This is another way to block a thread, albeit in a slightly different manner.  This method causes the thread to sleep for a given number of milliseconds.  This at least tacitly implies that although it is giving someone else a shot at the system resources it will be run again after the sleep period is finished.

How threads are run and when they are blocked really depends on the scheduler.  This would be dependent on both the operating system and Java virtual machine.

Converting over to threaded algorithm

The easiest way to take advantage of threading is to simply to create your own class to extend the Thread.  When this is done the only thing the developer needs to implement is his own run method.

The neat thing about the Quicksort algorithm is that it is recursive.  Sorting the entire list looks just like sorting any sub-part of the list.

The change to my sample quick sort program is simply to do each sub-list as a different thread.  The code is actually hardly different at all, but does it actually work?

cpu-maxthread

multi-threaded quicksort code at bottom of page

It works really well.  This screenshot shows a spike on all four cores when I run the threaded version of the program.

single threaded
took 5495 milli seconds

multi threaded
took 2783 milli seconds

Why does this work ?

It is important to remember that Java passes it variables to methods by value.  If this is the case, how can this sort routine work?

The thread example works because in Java variables are passed by value, but things are a bit special in the case of arrays.  In this case, the value being passed is the reference to the array.

Yet when we manipulate the individual elements of the array those changes will be reflected at the source, so the passing of the array around “willy nilly” between threads works.

This entire solution also works because the Quicksort partitions the data into mutually exclusive data sets.  This is why the program doesn’t have any semaphores to prevent us from corrupting our data by having two or more processes changing things simultaneously.

Finally it works because we are not changing what the “array” points to. If we did that, this would not be changed in the calling method as the array reference was passed by value. That would be exactly the same as passing in a integer to a method and changing it (while expecting it to change in the calling method)

It does four threads but can it do more?  I ran the program on another computer that could run eight threads simultaneously and the program ran as expected.  linux-multithreadThe cpu was saturated for a brief time.  The threaded version, as expected, does run faster when provided with more CPU resources.

single threaded
took 4901 milli seconds

multi threaded
took 1318 milli seconds

If a little threading is good then a lot of threading must be better? Right?

The limit of threading is somewhat controlled in the Quicksort method.  I only start a threaded version of the Quicksort if there are at least two million values in the list.  This is because there is overhead associated with creating a thread.

Other important facts

The reason why this threaded application looks so simple is because it is definitely not production quality code.  Just like memory or disk space there are limits to how many threads that can be created or run at one time.

The threaded version of the Quicksort doesn’t do any checks to ensure that it doesn’t over step the limits of the computer.  The limits are dependent on a number of factors such as the operating system and the resources available.

Is this really worth the effort

Yes and no.  Why don’t I have a lot of threaded programs laying about that can be used as examples.  There are a number of reasons.

  • The problem must lend itself to being easily broken up
  • You must have enough data to make the task worthwhile

I needed a lot of data in order to keep my computer busy for five seconds utilizing a single thread.  The flip side of the argument to using threads is that creating threads also take resources.  If I create a thread for each recursion when using small data sets, the overall runtime is longer for the threaded version of the program.  This extra run time is exclusively due to overhead for threads.

In order to get a good test, I created a small program to generate 40 million integers just to keep the CPU busy for a few seconds.

The threaded version of the program run about 3.6 seconds faster than the single threaded approach.  If the additional development time to add only the threading was calculated to be one hour, the program might need to be run almost a thousand times or more to offset the extra development costs versus the single threaded version.  This is might be fine for a task that runs every minute but perhaps a poor investment for a task that runs once a day.

Parting shot

Perhaps 30 years ago computer scientists got excited about sorting algorithms, but that was due to a lot of factors.  Computing time was quite expensive and resources were very limited.  There are a lot of things that are possible with a lot of cheap CPU, ram or disk space that would have made these old time developers cry about the inefficiency of some of the modern solutions.

Just for fun, I did a bubble sort and used the built in java sort to get some other numbers.

Java sort
55730 milliseconds

Bubble sort
***** milliseconds

Bubble sorting is known to be perhaps the easiest to understand for beginners but the most inefficient.  The time required to sort 40 million integers was excessive and my lack of patients only lasted 2 1/2 days.  I did some smaller runs which also ran fairly slowly, ten thousand integers were sorted in less than half a second while one million integers took about 70 minutes.  The runtimes did not have a perfectly linear increase but based on some of the calculations it might take about 4 days to sort through ten million integers.

Parallel processing has to be part of the new reality and in the right situation it can indeed make a tremendous difference.  Yet one of the real solutions is to use efficient algorithms, the right tool for the right situation and knowing when further optimizations are not adding value.

Note:  It is quite unfair to the java sort to accuse it of taking such a long time to sort this data.  The other sorts were using simple integers and a simple array.  To use the build in sort it was necessary to use the Integer class and a collection.  In any case, 55 seconds is probably a reasonable sort time on an old laptop for 40 million integers.

Code 

Single threaded Quicksort

package singlethreaded;

/*
 * this example is an implementation of the quicksort
 * algorithm described in wikipedia entry.
 * 
 *      https://en.wikipedia.org/wiki/Quicksort
 * 
 */
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.ObjectInputStream;

public class Quicksort 
{
	int datalist[] = {  12, 16, 333, 50, 1000, 5, 897, 1, 3, 66, 13 };

	public void quicksort(int datatosort[], int lo, int hi) 
	{
		if (lo < hi)
		{
			int partidx = partition(datatosort, lo, hi);
			quicksort(datatosort, lo, partidx);
			quicksort(datatosort, partidx + 1, hi);
		}
	}

	/*
	 * Hoare partition scheme
	 */
	public int  partition(int datatosort[], int lo, int hi)
	{
		int pivot = datatosort[lo];
		int i = lo - 1;
		int j = hi + 1;

		while (true)
		{
			do {
				i = i + 1;
			} while (datatosort[i] < pivot); 

			do {
			        j = j - 1; 
			} while (datatosort[j] > pivot);

			if (i >= j) 
				break;

			swap(datatosort,i,j); 
		}
		return j;
	}

	void swap( int datatosort[], int left, int right)
	{
		int temp = datatosort[left];
		datatosort[left] = datatosort[right];
		datatosort[right] = temp;
	}

	public void load()
	{
		String filename = "testdata.bin";
		ObjectInputStream inputStream = null;
		try {
			inputStream = new ObjectInputStream(new FileInputStream(filename));
		} catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
		try {
			datalist = (int[])inputStream.readObject();
		} catch (ClassNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
		try {
			inputStream.close();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}

	public void sort ()
	{
		quicksort(datalist,0,datalist.length - 1);
	}

	public static void main(String args[])
	{
		Quicksort sorter = new Quicksort();
		sorter.load();
		long nanoStart = System.nanoTime();
		sorter.sort();
		long nanoEnd = System.nanoTime();
		long nanoseconds = (nanoEnd - nanoStart);
		System.out.println("took " + nanoseconds + " nano seconds");
		System.out.println("took " + nanoseconds / 1000000 + " milli seconds");
	}
}

 

Threaded controller

package multithreaded;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.ObjectInputStream;

public class Threadcontroller {

	int datalist[];
	
	public void sort ()
	{
		Threadquicksort sorter = new Threadquicksort();
		sorter.setupThread(datalist,0,datalist.length-1);
		sorter.start();


		// wait for it to finish
		try {
			sorter.join();
		} catch (InterruptedException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}

	public void load()
	{
		String filename = "testdata.bin";
		ObjectInputStream inputStream = null;
		try {
			inputStream = new ObjectInputStream(new FileInputStream(filename));
		} catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
		try {
			datalist = (int[])inputStream.readObject();
		} catch (ClassNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
		try {
			inputStream.close();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}
	
	public void print(boolean small)
	{
		String output = "";
		
		if (small == true)
		{
			for (int idx = 0; idx < datalist.length; idx++)
				output = output + datalist[idx] + " ";

			System.out.println(output);
		}
		else
		{
			for (int idx = 0; idx < datalist.length; idx++)
				System.out.println(datalist[idx]);
		}
	}
	
	public static void main(String args[])
	{
		Threadcontroller controller = new Threadcontroller();

		controller.load();
		long nanoStart = System.nanoTime();
		controller.sort();
		long nanoEnd = System.nanoTime();
		long nanoseconds = (nanoEnd - nanoStart);
		System.out.println("took " + nanoseconds + " nano seconds");
		System.out.println("took " + nanoseconds / 1000000 + " milli seconds");
	}
}

Threaded sort

package multithreaded;

public class Threadquicksort extends Thread
{
	int referenceToDatalist[];
	int start, end;

	public void setupThread(int datalist[], int start, int end)
	{
		this.start = start;
		this.end = end;
		referenceToDatalist = datalist;
	}

	public void run()
	{
		quicksort(referenceToDatalist,start,end);
	}

	public void quicksort(int datatosort[], int lo, int hi) 
	{
		if (hi - lo < 2000000)
		{
			// there must be a border where creating threads costs more than it solves.
			if (lo < hi)
			{
				int partidx = partition(datatosort, lo, hi);
				quicksort(datatosort, lo, partidx);
				quicksort(datatosort, partidx + 1, hi);
			}
		}
		else
		{
			int partidx = partition(datatosort, lo, hi);
			Threadquicksort lowerhalf = new Threadquicksort();
			Threadquicksort upperhalf = new Threadquicksort();
			
			// tell them what to do
			lowerhalf.setupThread(datatosort, lo, partidx);
			upperhalf.setupThread(datatosort, partidx + 1, hi);
			
			// send them on their merry way
			lowerhalf.start();
			upperhalf.start();
			
			// wait for them to finish.
			try {
				lowerhalf.join();
			} catch (InterruptedException e) {
				// TODO Auto-generated catch block
				e.printStackTrace();
			}
			try {
				upperhalf.join();
			} catch (InterruptedException e) {
				// TODO Auto-generated catch block
				e.printStackTrace();
			}
		}
	}

	/*
	 * Hoare partition scheme
	 */
	public int  partition(int datatosort[], int lo, int hi)
	{
		int pivot = datatosort[lo];
		int i = lo - 1;
		int j = hi + 1;

		while (true)
		{
			do {
				i = i + 1;
			} while (datatosort[i] < pivot); 

			do {
			        j = j - 1; 
			} while (datatosort[j] > pivot);

			if (i >= j) 
				break;

			swap(datatosort,i,j); 
		}
		return j;
	}

	void swap( int datatosort[], int left, int right)
	{
		int temp = datatosort[left];
		datatosort[left] = datatosort[right];
		datatosort[right] = temp;
	}
}

 

Bubble sort 

package singlethread;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.ObjectInputStream;

public class Bubblesort 
{
	int datalist[] = {  12, 16, 333, 50, 1000, 5, 897, 1, 3, 66, 13 };

	public void bubblesort()
	{
		System.out.println("starting bubble sort for " + datalist.length + " items ");
		int totalswaps = 0;
		boolean swapped = false;
		
		do
		{
			swapped = false;
			for (int left = 0; left < datalist.length -1; left++) 
                        { 
                                int right = left + 1; if (datalist[left] > datalist[right])
				{
					int temp = datalist[left];
					datalist[left] = datalist[right];
					datalist[right] = temp;
					swapped = true;
					totalswaps++;
				}
			}	
		}
		while (swapped == true);

		System.out.println("total swaps " + totalswaps);
	}

	public void load()
	{
		String filename = "testdata.bin";
		ObjectInputStream inputStream = null;
		try {
			inputStream = new ObjectInputStream(new FileInputStream(filename));
		} catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
		try {
			datalist = (int[])inputStream.readObject();
		} catch (ClassNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
		try {
			inputStream.close();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}

	public void print(boolean small)
	{
		String output = "";
		
		if (small == true)
		{
			for (int idx = 0; idx < datalist.length; idx++)
				output = output + datalist[idx] + " ";

			System.out.println(output);
		}
		else
		{
			for (int idx = 0; idx < datalist.length; idx++)
				System.out.println(datalist[idx]);
		}
	}

	public void test()
	{
		//Bubblesort sorter = new Bubblesort();

		load();
		
		long nanoStart = System.nanoTime();
		bubblesort();

		long nanoEnd = System.nanoTime();
		long nanoseconds = (nanoEnd - nanoStart);

		System.out.println("took " + nanoseconds + " nano seconds");
		System.out.println("took " + nanoseconds / 1000000 + " milli seconds");
		//print(false);
	}
	
	public static void main(String args[])
	{
		Bubblesort sorter = new Bubblesort();

		sorter.test();
	}
}

Java sort

package singlethreaded;
 
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
 
public class Javasort {
 
       List datalist = new ArrayList();
      
       public void create(int len)
       {     
             Random rand = new Random();
            
             long nanoStart = System.nanoTime();
             for (int idx = 0; idx < len; idx++)
             {
                 // nextInt is normally exclusive of the top value,
                 // so add 1 to make it inclusive
                 int randomNum = rand.nextInt(10000001); // numbers 0 - 1000000
                 datalist.add(new Integer(randomNum));
 
                 //System.out.println(idx + ". " + randomNum);
             }
             long nanoEnd = System.nanoTime();
 
             long nanoseconds = (nanoEnd - nanoStart);
             System.out.println(nanoseconds);
       }
       public void sort()
       {
             Collections.sort(datalist);
       }
      
       public void print()
       {
             int i=0;
             for(Integer temp: datalist){
                    System.out.println("item " + ++i + " : " + temp);
             }
       }
      
       public static void main(String args[])
       {
             Javasort gendata = new Javasort();
             gendata.create(40000000);
            
             long nanoStart = System.nanoTime();
 
             gendata.sort();
 
             long nanoEnd = System.nanoTime();
             long nanoseconds = (nanoEnd - nanoStart);
 
             System.out.println("took " + nanoseconds + " nano seconds");
             System.out.println("took " + nanoseconds / 1000000 + " milli seconds");
       }
}

Generate sort data

package singlethreaded;
 
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Random;
 
public class GenerateData {
 
                int datalist[] = null;
               
                public void create(int len)
                {             
                               datalist = new int[len];
                               Random rand = new Random();
                              
                               long nanoStart = System.nanoTime();
                               for (int idx = 0; idx < len; idx++)
                               {
                                   // nextInt is normally exclusive of the top value,
                                   // so add 1 to make it inclusive
                                   int randomNum = rand.nextInt(10000001); // numbers 0 - 1000000
                                   datalist[idx] = randomNum;
                                   //System.out.println(idx + ". " + randomNum);
                               }
                               long nanoEnd = System.nanoTime();
 
                               long nanoseconds = (nanoEnd - nanoStart);
                               System.out.println(nanoseconds);
                }
 
                public void save()
                {
                               String  filename = "testdata.bin";
                               ObjectOutputStream outputStream = null;
                              
                               try {
                                               outputStream = new ObjectOutputStream(new FileOutputStream(filename ));
                               } catch (FileNotFoundException e) {
                                               // TODO Auto-generated catch block
                                               e.printStackTrace();
                               } catch (IOException e) {
                                               // TODO Auto-generated catch block
                                               e.printStackTrace();
                               }
                              
                               try {
                                               outputStream.writeObject(datalist);
                               } catch (IOException e) {
                                               // TODO Auto-generated catch block
                                               e.printStackTrace();
                               }
                              
                               try {
                                               outputStream.close();
                               } catch (IOException e) {
                                               // TODO Auto-generated catch block
                                               e.printStackTrace();
                               }
                }
               
                public static void main(String args[])
                {
                               GenerateData gendata = new GenerateData();
                               gendata.create(40000000);
                               gendata.save();
                }
}

Ant build script

<project name="quicksort" default="world" basedir=".">

	<property file="project.properties"/>

	<property name="version.num" value="3.15.047"/>
	<property name="build.num" value="6"/>
	<property name="mainclass" value="singlethreaded.Quicksort"/>
	<property name="lib.dir" value="${basedir}/libs"/>
	<property name="src.dir" value="${basedir}/src"/>
	<property name="build.dir" value="${basedir}/bin"/>
	<property name="jar.dir" value="${basedir}/final"/>

	<path id="classpath">
		<fileset dir="${lib.dir}" includes="**/*.jar"/>
	</path>
 
	<target name="clean" description="remove anything we created">
		<echo>"cleaning ..."</echo>
		<delete dir="${class.dir}"/>
		<delete dir="${build.dir}"/>
	</target>

	<target name="compile" description="compile source code">
		<echo>"compiling ..."</echo>
		<mkdir dir="${build.dir}"/>
		<javac srcdir="${src.dir}" destdir="${build.dir}" classpathref="classpath" />
	</target>
 
	<target name="jarfile" depends="compile" description="generate jar file">
		<echo>"build jar ..."</echo>
		<mkdir dir="${jar.dir}"/>
 
		<tstamp>
			<format property="today.timestamp" pattern="yyyy-MM-dd HH:mm:ss" />
		</tstamp>
 
		<jar destfile="${jar.dir}/${jarname}.jar" basedir="${build.dir}">
			<manifest>
			<attribute name="Built-By" value="${user.name}" />
			<attribute name="Main-Class" value="${mainclass}" />
			<attribute name="Implementation-Version" value="${version.num} build #${build.num}"/>
			<attribute name="Built-Date" value="${today.timestamp}"/>
			</manifest>
		</jar>
	</target>
 
	<target name="world" depends="jarfile">
		<echo>"build script version 1.00"</echo>
	</target>
 
</project>
This entry was posted in programming and tagged , , . Bookmark the permalink.