Saturday, July 21, 2012

Unlock the Supercomputer Hidden in Your PC

Everyone knows computers have been getting faster and gaining more cores. In 2012, a consumer can buy a CPU with eight cores. However, hidden away in your computer, there may be dozens or even hundreds of cores that you're probably not using to their fullest potential. They live in your Graphics Processing Unit (GPU).

In fact, my laptop has fifty cores divided between its CPU and its two GPUs. Keep in mind that my laptop was made in 2009, so more recent computers may have even more, especially desktops and computers made for gaming.

With all these cores, there is the potential for massive performance boosts in the software we use every day.

How Do You Compare Performance?

Performance can be measured in FLOPS (Floating Point Operations Per Second), a metric used in high performance computing. A floating point number is a decimal number stored in binary form. Scientific and engineering simulations use them heavily. FLOPS is a very broad metric describing how many floating point operations, such as comparing, adding, or multiplying two numbers, a device can perform in one second.
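As a rough illustration of the arithmetic, here is a minimal Python sketch that counts the operations in a dot product and turns a runtime into a FLOPS figure. The function names and the 0.5 second runtime are made up for the example. They are not measurements.

```python
def dot_product_flop_count(n):
    """Operations in a dot product of two length-n vectors.

    It takes n multiplications and n - 1 additions.
    """
    return 2 * n - 1


def flops(operation_count, seconds):
    """Operations per second for a job that took `seconds` to finish."""
    return operation_count / seconds


if __name__ == "__main__":
    # Hypothetical workload: one million dot products of length 1000.
    total_ops = 1_000_000 * dot_product_flop_count(1000)
    # Hypothetical runtime, chosen only for illustration.
    rate = flops(total_ops, 0.5)
    print(f"{total_ops} operations at {rate / 1e9:.2f} GigaFLOPS")
```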


GPU vs. CPU
[Figure: Relative compute performance in relation to size]
Because they have so many more cores, GPUs can perform more FLOPS than CPUs. In many cases, a good GPU is an order of magnitude faster than a good CPU. While a good CPU might manage a few dozen to about a hundred GigaFLOPS, a good GPU could, theoretically, handle TeraFLOPS of compute workloads.

Practical Test

The best GPUs have thousands of cores in them. The Radeon HD 7970 has 2048 programmable cores. In contrast, my laptop's best GPU, the GeForce 9600M GT, has just 32 cores. Even so, it's plenty to show off the power of GPUs.

For a test, I used OpenCL, a parallel programming framework that can be used to program CPUs and GPUs alike. I wrote an OpenCL program to compute matrix dot products between matrices of varying sizes. Computing matrix dot products is a good way to test performance because it requires many computations. Furthermore, matrix dot products are used in a lot of scientific and graphics calculations. To summarize: in the test, I give each device on my computer a giant workload to see how long it takes to finish.
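For reference, a naive single-threaded matrix product looks roughly like the sketch below. It is written in Python for illustration and is not the OpenCL code from the download. It also assumes that "matrix dot product" here means ordinary matrix multiplication.

```python
def naive_matrix_product(a, b):
    """Multiply two matrices given as lists of rows, one element at a time."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Element (i, j) depends only on row i of a and column j of b,
            # so every element could be handed to a different core.
            total = 0.0
            for k in range(inner):
                total += a[i][k] * b[k][j]
            result[i][j] = total
    return result


if __name__ == "__main__":
    print(naive_matrix_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The two outer loops are what a GPU can spread across its cores, since no output element needs any other output element.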

You can download the source code I made for the test. It is free software, so you can use it in your own projects.

Results

I ran each of the three OpenCL devices on my laptop, computing the dot product between matrices ranging from 16x16 to 1024x1024 in size. This revealed the relative runtimes of each device.



The red plot is my control: a naive, single-threaded implementation of a matrix dot product solver. Unsurprisingly, it took the most time. The violet plot is the amount of time it took both cores of my CPU to compute the different sized matrices using my OpenCL code. This was much faster, taking less than half the time. The other two lines, if you can see them, are squished along the x-axis. Both my GPUs took almost no time to compute the matrix dot product. To illustrate this more clearly, I present the last three lines of the output data.

Runtimes of computing dot products between nxn matrices on different OpenCL devices

n     Single Threaded  GeForce 9600M GT  GeForce 9400M  Core 2 Duo T9600
992   11416.339 ms     22.507 ms         29.262 ms      4256.869 ms
1008  12232.509 ms     23.188 ms         30.678 ms      4754.069 ms
1024  12251.256 ms     24.979 ms         30.846 ms      4464.266 ms

As you can see, while both cores of the CPU combined took roughly 4.5 seconds to compute a 1024x1024 matrix, either GPU could do it in about 25 to 31 milliseconds.

In other words, what takes a CPU several seconds, a GPU can do in the blink of an eye.
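To put numbers on that, here is a small Python sketch that computes the speedup of each device over the single-threaded control, using the n = 1024 row of the table above. The `speedup` helper is a name made up for this example.

```python
# Runtimes in milliseconds for n = 1024, copied from the results table.
runtimes_ms = {
    "Single Threaded": 12251.256,
    "GeForce 9600M GT": 24.979,
    "GeForce 9400M": 30.846,
    "Core 2 Duo T9600": 4464.266,
}


def speedup(baseline_ms, device_ms):
    """How many times faster a device is than the baseline."""
    return baseline_ms / device_ms


if __name__ == "__main__":
    baseline = runtimes_ms["Single Threaded"]
    for device, ms in runtimes_ms.items():
        print(f"{device}: {speedup(baseline, ms):.1f}x")
```

By this measure the GeForce 9600M GT comes out roughly 490 times faster than the single-threaded control, the GeForce 9400M roughly 397 times, and the two CPU cores under OpenCL roughly 2.7 times.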

Limitations

If GPUs are so fast, why haven't they replaced CPUs? The answer is that they're not fast all the time. A matrix dot product is an ideal problem for a GPU because it's a fine-grained parallel problem, meaning it can be divided into many small, identical pieces. Such a problem is easily spread across many cores. Not all problems are like that, though. Many problems have data dependencies between pieces, need their pieces solved one at a time, or can't be broken down at all. A GPU handles problems like that poorly, but CPUs are exceedingly good at solving them.
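The difference can be sketched in a few lines of Python. The function names are invented for this example, and the thread pool only stands in for "many cores". Python threads will not actually speed up this arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """An independent piece of work: it needs nothing but its own input."""
    return x * x


def parallel_squares(values, workers=4):
    """Fine-grained parallel: pieces can go to any worker in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


def iterate_logistic(x0, steps, r=3.9):
    """Data dependent: each step needs the result of the previous step,
    so the steps have to run one at a time."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


if __name__ == "__main__":
    print(parallel_squares([1, 2, 3, 4]))
    print(iterate_logistic(0.5, 10))
```

Squaring a list of numbers splits cleanly across workers. The second function cannot be split that way, no matter how many cores are available, because step ten cannot start until step nine is done.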

Sunday, July 15, 2012

Home Security Camera

This week I've set my sights on creating a home security camera that can broadcast a live video stream over the Internet!

Setup


I started out with a 64-bit AMD desktop PC running Debian GNU/Linux Wheezy (testing).

I didn't have a webcam, but I did have a MiniDV camcorder that hooked in through the FireWire port, which my computer happened to have a few of.


Lastly, because I was sort of far away from my router, I set up a wireless connection using a USB WiFi adaptor. Driver installation wasn't too bad and my adaptor was compatible with Linux!

Script

To run the transmit script, I made sure to have a full install of GStreamer through my package manager. The script itself is pretty simple: it takes input from the camera, compresses it as a series of JPEG images and sends it out as a TCP stream on port 6000. The IP address my router assigned my streaming computer was 192.168.1.105. This address will vary depending on when the router registers a PC.
#!/bin/bash
gst-launch-0.10 -v dv1394src ! dvdemux ! dvdec ! ffmpegcolorspace ! videoscale ! videorate ! video/x-raw-yuv, height=240, width=427, framerate=3/1 ! jpegenc ! multipartmux ! tcpserversink host=192.168.1.105 port=6000
I stored this script in the file network-stream.sh. After navigating to the directory I stored it in, I ran it in the terminal using sh network-stream.sh.
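Since the host address and the video settings are the parts most likely to change, here is a small Python sketch that rebuilds the same gst-launch-0.10 command from parameters. The function name and its defaults are assumptions made for this example. The pipeline elements are copied from the script above and nothing new is added to them.

```python
def build_stream_command(host, port=6000, width=427, height=240, fps=3):
    """Assemble the gst-launch-0.10 command line used by network-stream.sh."""
    caps = f"video/x-raw-yuv, height={height}, width={width}, framerate={fps}/1"
    elements = [
        "dv1394src",         # read from the FireWire camcorder
        "dvdemux",
        "dvdec",
        "ffmpegcolorspace",
        "videoscale",
        "videorate",
        caps,                # requested size and frame rate
        "jpegenc",           # compress each frame as a JPEG
        "multipartmux",
        f"tcpserversink host={host} port={port}",
    ]
    return "gst-launch-0.10 -v " + " ! ".join(elements)


if __name__ == "__main__":
    print(build_stream_command("192.168.1.105"))
```

Printing the command first makes it easy to compare against the original script before running it.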

Network

Receiving the video over the Local Area Network is very easy. All I have to do is open VLC on a computer on the local network and type in tcp://192.168.1.105:6000.

Receiving the stream over the internet is a more involved process. It requires setting up port forwarding so that this application can be reached from the Wide Area Network.

My router's IP address on the network is 192.168.1.1. Different routers will have different IP addresses. On the administration page, I set up port forwarding for TCP on port 6000. All routers are different, but most will support doing this in one way or another.


Don't forget to set up port forwarding on the modem as well. The process should be similar to the setup on the router. Again, you will need to find the network IP address of your modem. Mine was 192.168.0.1. As with the router, the IP address will vary by modem.

Once you have that set up, a user can access your camera through the Internet. All they'll need is the IP address of your house. Unfortunately, most ISPs assign IP addresses to their customers dynamically, so the IP address you have will change. When it does, you will no longer be able to reach your camera at the old address.

The solution to this comes from DNS forwarding services, also known as dynamic DNS. Basically, they give you an internet domain like my-domain.com that acts as an alias for your IP address. They also give you a program to run on your computer that updates the IP address your domain is associated with every time it changes.

In 2012, a decent free DNS forwarding service I've come to use is no-ip.com. They haven't yet forced their users to pay for their basic service or limited it to a time trial, so it's a good one for experimenting with servers and web technology if you're not sure you want to invest much money yet.

Testing

Once the DNS forwarding service, network port forwarding and security camera server have been set up, streaming over the Internet becomes possible.

In VLC, you can open the network stream in much the same manner as on the LAN. This time you just change the address to tcp://my-domain.no-ip.org:6000.

You should see the output of the security camera. 
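If VLC shows nothing, a quick way to tell whether the problem is the network or the player is to check that the port accepts TCP connections at all. The Python sketch below does only that. The function name is made up for this example, and the host name is the placeholder used above.

```python
import socket


def stream_is_reachable(host, port=6000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


if __name__ == "__main__":
    # Placeholder host name from the text; replace it with your own domain.
    print(stream_is_reachable("my-domain.no-ip.org"))
```

A result of False from outside the house but True from inside the LAN points at port forwarding or the DNS alias rather than at GStreamer.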


My server has to really crank along to keep up with producing the stream:


The CPU doesn't have too many clock cycles left, and the network output, going out at about 100 KiB/s, is about as fast as my internet can take.

Future Work

There are a number of issues with this streaming setup. The most obvious one is the slow frame rate of 3 frames per second. This is a consequence of sending a series of JPEG images over the internet: at this frame rate they take about 100 KiB/s to stream, which is about the upload limit of my connection. Faster internet connections can sustain a higher frame rate because they provide more data throughput.
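The relationship between upload speed and frame rate is simple division, sketched here in Python. The 100 KiB/s and 3 frames per second figures come from the setup described above. The 500 KiB/s upload speed is a made-up example, and the function names are invented for illustration.

```python
def frame_size_kib(stream_kib_per_s, fps):
    """Average size of one frame, given the stream's data rate."""
    return stream_kib_per_s / fps


def max_frame_rate(upload_kib_per_s, frame_kib):
    """Highest frame rate an upload link can sustain for a given frame size."""
    return upload_kib_per_s / frame_kib


if __name__ == "__main__":
    per_frame = frame_size_kib(100, 3)
    print(f"about {per_frame:.1f} KiB per frame")
    # Hypothetical faster connection, for comparison only.
    print(f"about {max_frame_rate(500, per_frame):.0f} frames per second at 500 KiB/s")
```

At about 33 KiB per JPEG frame, every extra frame per second costs another 33 KiB/s of upload, assuming the frames stay the same size.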

Another issue is the quality. 240p is barely acceptable. A more appropriate level of definition would be 360p or 480p. 

Both these issues stem from the fact that I'm streaming a JPEG sequence over TCP. First of all, JPEG sequences don't do compression between frames; each frame is packaged individually. This makes the stream a lot larger than it has to be, especially if the camera is sitting still. A better solution would be to broadcast an actual video stream using a codec like H.264 or VP8. These can provide higher quality and frame rates. I just need to find out how to stream these formats directly to VLC from GStreamer over the Internet.
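To show why compressing between frames helps when the camera sits still, here is a toy Python sketch. It is not how H.264 or VP8 work internally. It uses synthetic frames, zlib instead of JPEG, and a simple XOR difference between consecutive frames, all of which are assumptions chosen to keep the example short.

```python
import random
import zlib


def make_still_scene(frame_count=5, frame_bytes=10_000, seed=0):
    """Synthetic frames that are nearly identical, like a still camera."""
    rng = random.Random(seed)
    frames = [bytes(rng.randrange(256) for _ in range(frame_bytes))]
    for _ in range(frame_count - 1):
        frame = bytearray(frames[-1])
        # Change a handful of bytes to mimic a little sensor noise.
        for _ in range(10):
            frame[rng.randrange(frame_bytes)] = rng.randrange(256)
        frames.append(bytes(frame))
    return frames


def independent_size(frames):
    """Total size when every frame is compressed on its own."""
    return sum(len(zlib.compress(frame)) for frame in frames)


def delta_size(frames):
    """Total size when only the first frame is sent whole and the rest
    are sent as compressed differences from the previous frame."""
    total = len(zlib.compress(frames[0]))
    for previous, current in zip(frames, frames[1:]):
        delta = bytes(a ^ b for a, b in zip(previous, current))
        total += len(zlib.compress(delta))
    return total


if __name__ == "__main__":
    scene = make_still_scene()
    print("independent frames:", independent_size(scene), "bytes")
    print("frame differences: ", delta_size(scene), "bytes")
```

When almost nothing changes between frames, the differences are mostly zeros and compress to nearly nothing, which is the redundancy a real video codec exploits.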

Another problem is the use of TCP. TCP is not ideal for live streams because it throttles how fast it transmits based on network conditions and guarantees packet delivery, and 100% packet delivery is not required for this application. Although TCP is great for transmitting static media over the internet, UDP is better for streaming media because it's designed around sending packets in real time as opposed to guaranteeing delivery. With TCP, if there's not enough bandwidth for the stream, the stream becomes increasingly delayed. With UDP, if there isn't enough bandwidth, packets are dropped and the image quality gets worse, but the stream stays in real time. The only problem with UDP is that it doesn't scale like TCP does. ISPs can't make UDP take less bandwidth, as they can with TCP, because all it does is throw out packets. As a result, some ISPs do not allow home internet users to broadcast UDP streams, since they can't control them. So in this initial attempt I used TCP, even though I'd like to have a UDP stream working.
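The trade-off can be illustrated with a toy model in Python. It is only a sketch: it ignores congestion control, retransmission and everything else real TCP and UDP do, and it models the link as carrying a fixed number of frames per second. The function names and numbers are invented for the example.

```python
def simulate_reliable(frames_per_s, link_frames_per_s, seconds):
    """TCP-like: nothing is dropped, so unsent frames queue up.

    Returns the delay (in seconds) of the stream after each second.
    """
    backlog = 0
    delays = []
    for _ in range(seconds):
        backlog += frames_per_s
        backlog -= min(backlog, link_frames_per_s)
        # Time the link needs to drain what is still queued.
        delays.append(backlog / link_frames_per_s)
    return delays


def simulate_lossy(frames_per_s, link_frames_per_s, seconds):
    """UDP-like: frames that do not fit are thrown away.

    Returns the number of frames dropped in each second. Nothing queues,
    so the stream never falls behind.
    """
    return [max(0, frames_per_s - link_frames_per_s) for _ in range(seconds)]


if __name__ == "__main__":
    # A camera producing 3 frames/s over a link that carries only 2 frames/s.
    print("reliable, delay per second:", simulate_reliable(3, 2, 5))
    print("lossy, drops per second:   ", simulate_lossy(3, 2, 5))
```

In the reliable case the delay keeps growing for as long as the camera outpaces the link. In the lossy case one frame per second is lost but the picture stays current.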

Tuesday, July 3, 2012

Creating a Particle Collision Simulator

I've decided to create a particle collision simulator during this summer as a pet coding project to hone my C programming skills.

My hope is to eventually have a system that can simulate thermodynamic systems ranging from airfoils to Stirling engines. But first I need to get the basics down and be able to detect particle collisions and recalculate particle velocities and positions.
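As a rough idea of what "detect a collision and recalculate velocities" involves, here is a minimal Python sketch. It is not the project's C code. It assumes two circular particles in 2D with equal mass and equal radius, and a perfectly elastic collision. All names are invented for the example.

```python
import math


def collide(p1, v1, p2, v2, radius):
    """Return the velocities of two equal-mass particles after a possible
    elastic collision. Positions and velocities are (x, y) tuples."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    distance = math.hypot(dx, dy)
    # Not touching, or exactly on top of each other (no defined normal).
    if distance == 0 or distance > 2 * radius:
        return v1, v2
    # Unit vector pointing from particle 1 to particle 2.
    nx, ny = dx / distance, dy / distance
    # Closing speed along that direction. Positive means approaching.
    closing = (v1[0] - v2[0]) * nx + (v1[1] - v2[1]) * ny
    if closing <= 0:
        return v1, v2  # already moving apart
    # Equal masses: the particles trade their velocity along the normal.
    new_v1 = (v1[0] - closing * nx, v1[1] - closing * ny)
    new_v2 = (v2[0] + closing * nx, v2[1] + closing * ny)
    return new_v1, new_v2


if __name__ == "__main__":
    # Head-on collision: the two particles swap velocities.
    print(collide((0, 0), (1, 0), (1, 0), (-1, 0), radius=0.5))
```

Only the velocity component along the line between the centers changes, so momentum and kinetic energy are both conserved under these assumptions.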

Below: a description of the theoretical underpinnings of particle collision detection.

You can check up on the project at the Google Code page I created.