Keith and Dawg 5 Solution

By: Jason Yang

Flag: JOHN HERSCHEL GLENN JR

Introduction

This program is a GPU based parallel processing program using OpenCL and Java bindings using LWJGL 21. In essence, the program reads a string (the input password), and sends the entire string, as an array of integers, to the GPU. The GPU then takes each integer (representing a character), performs operations on it as outlined in the kernel program, and outputs an integer. After every instance of the kernel finishes executing, the output integer array is then retrieved from memory, converted back into a string, and compared against the key. The correct flag, when inputted, will match the key and a success message is displayed.

Details

Luckily for you, the source is provided (you’re welcome). Some helpful debugging stuff is also occasionally outputted by the program for you (you’re welcome again).

We begin by initializing the program and calling the run() function. run() calls the initializeCL() function, which initializes the OpenCL context necessary for running the program, such as choosing the GPU platform and device to run (that’s basically it). The program loops through the available platforms and prompts for an integer, and does the same for available devices. Most computers have only one GPU, so entering 0 when prompted should suffice (or it’ll cause your computer to crash and restart :((( ).

Next, loadPass() is called, which simply loads numbers from a file, pass.key into the global variable array. Returning to run(), the next thing it does is to compile the kernel, found in kernel.cl. Compiling and creating the kernel is done through clCreateProgramWithSource(), clBuildProgram(), and clCreateKernel(). Moving on, the integer size is essentially the number of kernels to run simultaneously (this is parallel processing after all), and is equal to the number of characters in the string to be checked against.

The user is now prompted for the password. Pretty simple.

Arrays of integers (well technically they’re float arrays, but they store integers and it doesn’t really matter that much) are created for two sets of data: the password the user inputs and the numbers read from pass.key. For the input password, each character is cast to an integer. For the array of numbers, the float array just stores that number.

Next up, these newly created arrays are stored into locations in memory; the actual location of the memory, such as a dGPU’s VRAM or system memory depends on the OpenCL context and the platform/device specified (see above). The assignment and allocation of memory for GPUs is very complicated; basically, buffers are magic. clCreateBuffer() creates the buffer, with context, certain read/write properties, and of course the actual data to put in the buffer, which is the array we created. This is done for both the input password and the array of numbers from pass.key. In addition, a buffer called resultMemory is also created, which is where the output from the kernels will be stored.

The kernel.setArg() function sets the arguments for the kernel. In essence, the kernel is a single function written in OpenCl C (which is based on C) and is executed by the GPU. This kernel function takes four arguments in, and the kernel.setArg() function called CPU-side instructs the GPU on where to look for each argument.

It’s time for some basic GPU stuff. A work item is basically an instance of the kernel. Work items are run simultaneously. A work group of work items can contain N dimensions (like an array). Because the string we’re checking is linear, we only need a single dimension, so the work group is 1 by size, which is the length of the string to be checked against (see above). A buffer of size 1 is created, and size is put into the buffer, representing a 1 by size work group.

Time to run the program GPU-side. clEnqueueNDRangeKernel() is called, which runs the N-dimensional (in this case one dimensional) kernel.

Moving over to the GPU now2. The kernel function is declared with kernel void ctf(), with arguments of the array constructed from the input password (a), the array from the bunch of numbers (b), the result arrayresult, and the size of the array (equal to the size of the work group, and the length of the string to be checked against). First, the index of the work-item must be obtained to get actual numbers from the arrays. This is done with get_global_id() into the integer itemId.

Now for the fun part (hope you paid attention in math class). A float4 is a 4-component vector of floats (vectors are very common in GPU side programs, like in graphics). Such a vector has components w, x, y, and z. Two float4’s are created, p and q, and are defined as follows:

p = <4, 0, a, 3 >
q = <8, b, -6, 7 >

where a and b are a[itemId] and b[itemId], respectively.

Next, two float8’s are created. Like you may have guessed, this is an 8-component vector of floats. Now for some swizzling (yes that’s what it’s called). OpenCL C has an interesting and extremely useful function called swizzling where:

float4 A = B.wxyz;

is the same as:

float4 A = (float4)(B.x, B.y, B.z, B.w);

or in pseudocode:

A = < B.x, B.y, B.z, B.w >

This can be used for vectors of any size, and the order of the components can be switched around. So:

m = p.xwxyzyzy
  = < p.x, p.w, p.x, p.y, p.z, p.y, p.z, p.y >
  = <4, 3, 4, 0, a, 0, a, 0 >

n = q.zyzwxyzw
  = < q.z, q.y, q.z, q.w, q.x, q.y, q.z, q.w >
  = <-6, b, -6, 7, 8, b, -6, 7 >

Another handy function of OpenCL C is that A.even takes the even components of the vector, that is, the 0th, 2nd, 4th, 6th, etc. components, and that A.odd takes the odd components of the vector, that is, the 1st, 3rd, 5th, 7th, etc. components. Furthermore, A.lo takes the lower half of a vector, in this case components 0 through 3, and A.hi takes the upper half of a vector, in this case components 4 through 7 (it’s an 8-component vector).

The line:

float s = dot(m.even, n.lo);

takes the dot product of the vectors m.even and n.lo:

s = < 4, 4, a, a > · < -6, b, -6, 7 >
  = 4 * -6 + 4 * b + a * -6 + a * 7
  = a + 4b - 24

The line:

float t = dot(m.odd, n.hi);

Takes the dot product of the vectors m.odd and n.hi:

t = < 3, 0, 0, 0 > · < 8, b, -6, 7 >
  = 3 * 8 + 0 * b + 0 * -6 + 0 * 7
  = 24

Now, the output is calculated as:

result = s + t
       = a + 4b - 24 + 24
       = a + 4b

and this number is put into the array result. In essence, this program, when simplified, takes the number of the character inputted and adds 4 times the number from pass.key. Whew.

Back to the Java program. After the GPU finishes, the results from the program are read into a FloatBuffer through clEnqueueReadBuffer(). The numbers from the FloatBuffer are then cast back to a string, concatenated, and the resulting string compared against the string check. If they are the same, then good job!


Now, to solve the problem, this is probably the most efficient way.

First, inkernel.cl, replace

result[itemId] = s + t;

with

result[itemId] = a[itemId] - 4 * b[itemId];

which will reverse the expression a + 4b the we found earlier. On the Java side of things, add a

System.out.println(s);

after the string s is concatenated from the numbers in resultBuff. Now, run the program, and enter as the password when prompted:

NgLn TQvscdUp@k\e^n(Jb

which is the string check, with a missing backslash since in the string the \\ is the escape sequence for a single backslash.

If all goes well, the program well output the flag:

JOHN HERSCHEL GLENN JR

Done!

1. The source code does not come packaged with LWJGL 2, so in order to run the program, LWJGL 2 can be downloaded here: http://legacy.lwjgl.org/. For ease of use and understanding, LWJGL 2 is used rather than the more recent LWJGL 3. LWJGL 2 is Copyright (c) 2002-2008 Lightweight Java Game Library Project.
2. If you're really struggling, the OpenCL 1.0 specification can be found here: https://www.khronos.org/registry/OpenCL/specs/opencl-1.0.pdf.

results matching ""

    No results matching ""