Saturday, October 25, 2014

Algorithms, part I: Summary

Now that the course is over, here is a summary of the performance of the algorithms seen in the lectures.

1.a. Union-Find Algorithms

M union-find operations on N objects:

algorithm                          worst-case running time
quick-find (QF)                    M*N
quick-union (QU)                   M*N
weighted QU                        N + M*lg(N)
QU + path compression              N + M*lg(N)
weighted QU + path compression     N + M*lg*(N)

where lg* is the iterated base-2 logarithm (the number of times lg must be applied to get down to 1):

N lg*(N)
1 0
2 1
4 2
16 3
65536 4
pow(2, 65536) 5
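
For reference, a minimal Java sketch of weighted quick-union with path compression, the variant in the last row of the table above (class and field names are my own, not necessarily the course's):

public class WeightedQuickUnionPC {
    private final int[] parent;   // parent[i] = parent of i
    private final int[] size;     // size[i] = number of nodes in the tree rooted at i

    public WeightedQuickUnionPC(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
    }

    private int root(int i) {
        while (i != parent[i]) {
            parent[i] = parent[parent[i]];   // path compression (by halving)
            i = parent[i];
        }
        return i;
    }

    public boolean connected(int p, int q) { return root(p) == root(q); }

    public void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (size[i] < size[j]) { parent[i] = j; size[j] += size[i]; }   // link smaller tree below larger
        else                   { parent[j] = i; size[i] += size[j]; }
    }
}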

1.b. Algorithm theory and Memory

We can assume the running time of an algorithm follows a power law, which becomes a straight line on a log-log plot:

T(N) = a * pow(N, b), and therefore lg(T(N)) = b * lg(N) + lg(a)

where b is the slope of that line and lg(a) its intercept.

b captures system-independent effects such as the algorithm and the input data.
a captures system-dependent effects such as hardware, software and the system.
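
A rough sketch of how that slope can be estimated with the doubling method from the lectures; timeTrial() here is only a hypothetical stand-in for the experiment being timed (it does quadratic work, so b should come out near 2):

public class DoublingSketch {
    static volatile double blackhole;   // keeps the JIT from discarding the loop below

    static double timeTrial(int n) {
        long start = System.nanoTime();
        double sum = 0.0;
        for (int i = 0; i < n; i++)          // stand-in quadratic work; replace with the real experiment
            for (int j = 0; j < n; j++)
                sum += 1.0 / (i + j + 1);
        blackhole = sum;
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        double prev = timeTrial(256);
        for (int n = 512; n <= 8192; n += n) {
            double t = timeTrial(n);
            // lg of the ratio between consecutive doubling runs estimates the exponent b
            System.out.printf("%6d %9.3f  ratio %5.2f  b ~ %4.2f%n",
                    n, t, t / prev, Math.log(t / prev) / Math.log(2));
            prev = t;
        }
    }
}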

The typical 64-bit Java memory model uses the following amounts of memory:



data type            bytes
boolean              1
byte                 1
char                 2
int                  4
float                4
long                 8
double               8
char[]               2N + 24
int[]                4N + 24
double[]             8N + 24
object overhead      16
inner class extra    8
reference            8
padding              rounds each object up to a multiple of 8
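
As a worked example of how those numbers add up, consider the memory used by a typical linked-list node declared as an inner class, as in the lectures (Item is the generic type parameter of the enclosing class):

private class Node {   // 16 bytes of object overhead
    Item item;         //  8 bytes (reference)
    Node next;         //  8 bytes (reference)
}                      // + 8 bytes of inner-class extra overhead
                       // = 40 bytes per Node (already a multiple of 8, so no padding)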


2. Sorting algorithms


sort           in place?  stable?  worst        average      best         remarks
selection      yes        no       pow(N,2)/2   pow(N,2)/2   pow(N,2)/2   N exchanges
insertion      yes        yes      pow(N,2)/2   pow(N,2)/4   N            use for small N or partially ordered arrays
shell (3x+1)   yes        no       ?            ?            N            subquadratic, tight code
merge          no         yes      NlgN         NlgN         NlgN         NlgN guarantee, stable
quick          yes        no       pow(N,2)/2   2NlgN        NlgN         fastest in practice
3-way quick    yes        no       pow(N,2)/2   2NlgN        N            improves quicksort in presence of duplicate keys
heapsort       yes        no       2NlgN        2NlgN        2NlgN        poor use of cache memory, inner loop longer than quicksort, not stable
?????          yes        yes      NlgN         NlgN         NlgN         holy sorting grail

3. Priority Queue and binary heap

Find the largest M items in a stream of N items

implementation time space
sort NlgN N
elementary PQ MN M
binary heap NlgM M
best in theory N M

A binary heap is an array representation of a heap-ordered complete binary tree: keys are stored in the nodes, and a parent's key is no smaller than its children's keys.


implementation    insert   del max   max
unordered array   1        N         N
ordered array     N        1         1
binary heap       lgN      lgN       1
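
For reference, a compact sketch of the array-based max-heap behind the lgN bounds in the last row (1-indexed array as in the lectures; this is my own stripped-down version, not the course's full MaxPQ):

public class MaxPQSketch<Key extends Comparable<Key>> {
    private Key[] pq = (Key[]) new Comparable[2];   // 1-indexed: pq[0] unused
    private int n = 0;

    public void insert(Key x) {
        if (n == pq.length - 1) pq = java.util.Arrays.copyOf(pq, 2 * pq.length);
        pq[++n] = x;
        swim(n);                    // restore heap order bottom-up: at most lgN compares
    }

    public Key delMax() {
        Key max = pq[1];
        exch(1, n--);
        pq[n + 1] = null;           // avoid loitering
        sink(1);                    // restore heap order top-down
        return max;
    }

    private void swim(int k) {
        while (k > 1 && less(k / 2, k)) { exch(k / 2, k); k = k / 2; }
    }

    private void sink(int k) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && less(j, j + 1)) j++;   // pick the larger child
            if (!less(k, j)) break;             // heap order restored
            exch(k, j);
            k = j;
        }
    }

    private boolean less(int i, int j) { return pq[i].compareTo(pq[j]) < 0; }
    private void exch(int i, int j) { Key t = pq[i]; pq[i] = pq[j]; pq[j] = t; }
}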


4. Symbol Tables

                                      worst case                 average
implementation                        search   insert   delete   search    insert    delete    key interface
sequential search (unordered list)    N        N        N        N/2       N         N/2       equals
binary search (ordered array)         lgN      N        N        lgN       N/2       N/2       compareTo
BST                                   N        N        N        1.39lgN   1.39lgN   ?         compareTo
2-3 tree                              clgN     clgN     clgN     lgN       clgN      clgN      compareTo
red-black BST                         2lgN     2lgN     2lgN     lgN       lgN       lgN       compareTo


5. Hash Tables

                       worst case                 average
implementation         search   insert   delete   search   insert   delete   key interface
separate chaining      lgN*     lgN*     lgN*     3-5*     3-5*     3-5*     equals
linear probing         lgN*     lgN*     lgN*     3-5*     3-5*     3-5*     equals

(*) under the uniform hashing assumption
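
For reference, the standard way shown in the lectures to turn a key's hashCode() into an index for a table of M chains (a fragment from inside a generic symbol-table class; the bit mask clears the sign bit so the index is non-negative):

private int hash(Key key) {
    return (key.hashCode() & 0x7fffffff) % M;
}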

Sunday, October 12, 2014

Algorithms, part I, Week 5: Kd-Tree

As has become usual, here is the task of the week:

"Write a data type to represent a set of points in the unit square (all points have x- and y-coordinates between 0 and 1) using a 2d-tree to support efficient range search (find all of the points contained in a query rectangle) and nearest neighbor search (find a closest point to a query point). 2d-trees have numerous applications, ranging from classifying astronomical objects to computer animation to speeding up neural networks to mining data to image retrieval."



Search & Insert Operations
The first step is to be able to build the kd-tree by inserting points at the corresponding nodes. This was easy because we implemented two Comparator<Point2D> classes, one for the X_ORDER and another for the Y_ORDER, and stored a pointer to the corresponding static comparator object at each node (recall that each node subdivides the space either vertically or horizontally, alternating by level).

We wanted to get something like:


Now, the one tricky point here is that before inserting a new node we need to check whether the tree already contains that point. If we are careless, we can end up performing two searches for a single insert operation: one for the insert itself and another for the contains check.

So a first optimization was to take advantage of the walk down the tree during insert to check at each node whether the point is already in the tree.
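
A minimal sketch of that idea (not my exact code; the Node class and the root/size fields are hypothetical names, and Point2D with its X_ORDER/Y_ORDER comparators and java.util.Comparator are assumed to be imported):

private Node root;
private int size;

private static class Node {
    Point2D point;
    Comparator<Point2D> order;   // X_ORDER or Y_ORDER at this level
    Node left, right;
    Node(Point2D p, Comparator<Point2D> order) { this.point = p; this.order = order; }
}

public void insert(Point2D p) {
    root = insert(root, p, Point2D.X_ORDER);
}

private Node insert(Node node, Point2D p, Comparator<Point2D> order) {
    if (node == null) { size++; return new Node(p, order); }
    if (node.point.equals(p)) return node;   // duplicate detected during the same walk down
    Comparator<Point2D> next = (node.order == Point2D.X_ORDER) ? Point2D.Y_ORDER : Point2D.X_ORDER;
    if (node.order.compare(p, node.point) < 0)
        node.left = insert(node.left, p, next);
    else
        node.right = insert(node.right, p, next);
    return node;
}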

Range Search & Nearest Neighbor Search
"The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search and nearest neighbor search. Each node corresponds to an axis-aligned rectangle in the unit square, which encloses all of the points in its subtree. The root corresponds to the unit square; the left and right children of the root corresponds to the two rectangles split by the x-coordinate of the point at the root; and so forth."

As always happens with the programming assignments, it is not enough to code a working solution; it also has to be as fast as possible.

Range search didn't pose any trouble, but nearest neighbor search did!!
The NNS is a recursive function that has to proceed as follows (from http://en.wikipedia.org/wiki/K-d_tree ):
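In short: check the current node, recurse first into the subtree on the query point's side, and only visit the other subtree if its rectangle could still contain a closer point. A minimal sketch of that pruning logic (not my exact code; it assumes each Node also stores the axis-aligned RectHV rect it corresponds to, with Point2D and RectHV from the course library):

private Point2D nearest(Node node, Point2D query, Point2D best) {
    if (node == null) return best;
    // prune: if this node's rectangle cannot contain anything closer than best, stop here
    if (best != null && node.rect.distanceSquaredTo(query) >= best.distanceSquaredTo(query))
        return best;
    if (best == null || node.point.distanceSquaredTo(query) < best.distanceSquaredTo(query))
        best = node.point;
    // visit the subtree containing the query point first: it is more likely to hold the
    // nearest neighbor, which makes the pruning test above far more effective
    Node first = node.left, second = node.right;
    if (second != null && second.rect.contains(query)) { first = node.right; second = node.left; }
    best = nearest(first, query, best);
    best = nearest(second, query, best);
    return best;
}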


It took me a while until I hit on the correct logic for the function, even though it is very simple.
I was passing the timing tests but failing 3 of the correctness tests, the ones related to nearest(). It seemed my nearest function wasn't returning the exact nearest neighbor in some special cases. So I started debugging the tree built from circle10.txt to see if I could reproduce the same error, and sure enough, I could!!

Finally I realised that the problem was in an if-statement inside my nearest function, changed it, and solved it!!

Here are the results:
Compilation:  PASSED
Style:        FAILED
Findbugs:     No potential bugs found.
API:          PASSED

Correctness:  19/19 tests passed
Memory:       8/8 tests passed
Timing:       44/44 tests passed

Aggregate score: 100.00% [Correctness: 65%, Memory: 10%, Timing: 25%, Style: 0%]

And here is the video I made of my working solution:


And that is the end of the programming assignments for this course!!

Next week is the last one, with only lectures and a no-coding exam. Wish me luck!!


Friday, October 3, 2014

Algorithms, part I, Week 4: 8-Puzzle

The Task of the Week:

Write a program to solve the 8-puzzle problem (and its natural generalizations) using the A* search algorithm.
The problem. The 8-puzzle problem is a puzzle invented and popularized by Noyes Palmer Chapman in the 1870s. It is played on a 3-by-3 grid with 8 square blocks labeled 1 through 8 and a blank square. Your goal is to rearrange the blocks so that they are in order, using as few moves as possible. You are permitted to slide blocks horizontally or vertically into the blank square. The following shows a sequence of legal moves from an initial board (left) to the goal board (right).
   1  3        1     3        1  2  3        1  2  3        1  2  3
4  2  5   =>   4  2  5   =>   4     5   =>   4  5      =>   4  5  6
7  8  6        7  8  6        7  8  6        7  8  6        7  8

initial        1 left         2 up           5 left         goal
Best-first search. Now, we describe a solution to the problem that illustrates a general artificial intelligence methodology known as the A* search algorithm. We define a search node of the game to be a board, the number of moves made to reach the board, and the previous search node. First, insert the initial search node (the initial board, 0 moves, and a null previous search node) into a priority queue. Then, delete from the priority queue the search node with the minimum priority, and insert onto the priority queue all neighboring search nodes (those that can be reached in one move from the dequeued search node). Repeat this procedure until the search node dequeued corresponds to a goal board. The success of this approach hinges on the choice of priority function for a search node. We consider two priority functions:
  • Hamming priority function. The number of blocks in the wrong position, plus the number of moves made so far to get to the search node. Intuitively, a search node with a small number of blocks in the wrong position is close to the goal, and we prefer a search node that has been reached using a small number of moves.
  • Manhattan priority function. The sum of the Manhattan distances (sum of the vertical and horizontal distance) from the blocks to their goal positions, plus the number of moves made so far to get to the search node.
For example, the Hamming and Manhattan priorities of the initial search node below are 5 and 10, respectively.
8  1  3        1  2  3     1  2  3  4  5  6  7  8    1  2  3  4  5  6  7  8
4     2        4  5  6     ----------------------    ----------------------
7  6  5        7  8        1  1  0  0  1  1  0  1    1  2  0  0  2  2  0  3

initial          goal         Hamming = 5 + 0          Manhattan = 10 + 0
First Approach
Compilation:  PASSED
Style:        FAILED
Findbugs:     No Potential bugs found.
API:          PASSED

Correctness:  16/25 tests passed
Memory:       8/8 tests passed
Timing:       0/17 tests passed

Raw score: 51.60% [Correctness: 65%, Memory: 10%, Timing: 25%, Style: 0%]

For this first attempt, I changed the 2D array of int from the API to a 1D array of char, which uses less than half as much memory and is roughly twice as fast (one memory access per element instead of the two needed by a 2D array). This notably improved the time spent solving the easier puzzles.
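
A sketch of that representation change (the blocks and N field names are my own, hypothetical choices; the constructor takes the int[][] specified by the assignment API):

private final int N;          // board dimension
private final char[] blocks;  // tiles in row-major order, 0 = blank

public Board(int[][] tiles) {
    N = tiles.length;
    blocks = new char[N * N];
    for (int row = 0; row < N; row++)
        for (int col = 0; col < N; col++)
            blocks[row * N + col] = (char) tiles[row][col];   // one flat array access later on

}

private int tileAt(int row, int col) {
    return blocks[row * N + col];
}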

Another improvement I made was to implement the isSolvable() method not by putting a twin board in the MinPQ together with the initial one, but rather by analyzing the properties of the initial board itself, following these parity-check rules (see the sketch after the list):

  1. If the grid width is odd, then the number of inversions in a solvable situation is even.
  2. If the grid width is even, and the blank is on an even row counting from the bottom (second-last, fourth-last etc), then the number of inversions in a solvable situation is odd.
  3. If the grid width is even, and the blank is on an odd row counting from the bottom (last, third-last, fifth-last etc) then the number of inversions in a solvable situation is even.
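
A sketch of isSolvable() following those rules (the helpers and the blocks/N fields are hypothetical names; inversions() counts pairs of tiles that are out of order, ignoring the blank, and blankRowFromBottom() is 1 for the last row, 2 for the second-last, and so on):

public boolean isSolvable() {
    int inv = inversions();
    if (N % 2 == 1)                       // rule 1: odd grid width
        return inv % 2 == 0;
    if (blankRowFromBottom() % 2 == 0)    // rule 2: blank on an even row from the bottom
        return inv % 2 == 1;
    return inv % 2 == 0;                  // rule 3: blank on an odd row from the bottom
}

private int inversions() {
    int count = 0;
    for (int i = 0; i < blocks.length; i++)
        for (int j = i + 1; j < blocks.length; j++)
            if (blocks[i] != 0 && blocks[j] != 0 && blocks[i] > blocks[j])
                count++;
    return count;
}

private int blankRowFromBottom() {
    for (int i = 0; i < blocks.length; i++)
        if (blocks[i] == 0)
            return N - i / N;             // rows counted 1..N starting from the bottom
    throw new IllegalStateException("board has no blank tile");
}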

This avoided inflating the number of items in the priority queue, and also the alternative of having two priority queues working at the same time, one solving the initial board and the other solving its twin() board (if the twin board gets solved first, we know the initial board is not solvable).

So whether using one or two priority queues, this improvement avoids having to solve one of the two cases just to find out whether a given board is solvable.

The final improvement was to cache the Manhattan distance for each board: it is calculated once in the constructor, and subsequent calls to int manhattan() just return the stored value.
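
A sketch of that caching (continuing the hypothetical blocks/N fields from the earlier sketch; cachedManhattan is a made-up field name):

private final int cachedManhattan;

// in the constructor, after filling blocks[]:
//     cachedManhattan = computeManhattan();

private int computeManhattan() {
    int sum = 0;
    for (int i = 0; i < blocks.length; i++) {
        int tile = blocks[i];
        if (tile == 0) continue;                                  // the blank does not count
        int goalRow = (tile - 1) / N, goalCol = (tile - 1) % N;   // where this tile belongs
        sum += Math.abs(i / N - goalRow) + Math.abs(i % N - goalCol);
    }
    return sum;
}

public int manhattan() {
    return cachedManhattan;   // O(1): just return the precomputed value
}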

Another problem I figured out was in checking whether a neighbor was already in the closedSet.

I was doing something like:

closedSet.contains(neighbor);

where neighbor is a SearchNode. I was comparing references instead of the board objects!!! Thus there were no calls to the board's equals() method. So I changed it to:

closedSet.contains(neighbor.board);
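
For reference, a sketch of the value-based equals() on Board that this relies on (the assignment API specifies public boolean equals(Object y); blocks and N are the hypothetical fields from the earlier sketches):

public boolean equals(Object y) {
    if (y == this) return true;
    if (y == null || y.getClass() != this.getClass()) return false;
    Board that = (Board) y;
    if (this.N != that.N) return false;
    return java.util.Arrays.equals(this.blocks, that.blocks);   // compare tile contents, not references
}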

Second Approach


Compilation:  PASSED
Style:        FAILED
Findbugs:     No Potential bugs found.
API:          PASSED

Correctness:  25/25 tests passed
Memory:       8/8 tests passed
Timing:       12/17 tests passed

Raw score: 92.65% [Correctness: 65%, Memory: 10%, Timing: 25%, Style: 0%]

Almost got there!! Finally, I realized I could get rid of the closedSet, which in my case was an ArrayList<SearchNode>, and thus avoid checking a huge list to see whether one of the four possible neighbors was already there.

I coded the "critical optimization" as said in the checklist and it worked:

Compilation:  PASSED
Style:        FAILED
Findbugs:     No Potential bugs found.
API:          PASSED

Correctness:  25/25 tests passed
Memory:       8/8 tests passed
Timing:       17/17 tests passed

Raw score: 100.00% [Correctness: 65%, Memory: 10%, Timing: 25%, Style: 0%]