Sunday, September 28, 2014

A Lossless codec for video editing in Premiere

This time I will dedicate this short entry to video editing in Premiere.

I think it's worth writing this down here in case I ever forget how I solved the problem I had when trying to make video tutorials with CamStudio.

When I made the first video recording of my percolation solution (the programming assignment of week 1 of the Algorithms, part I course), I tried several of the codecs available in CamStudio to record the program as it ran.

It turned out that none of the codecs was Premiere friendly: the preview of the footage failed when I imported it into Premiere CS5. It just didn't look right. I tried exporting to several video formats to see if the problem would disappear, but it didn't.

Searching the internet I found other people with the same problem, and I finally ran into a codec that did the trick: the Lagarith Lossless Codec.

I configured CamStudio to record using that codec, saved the file, and immediately tried to play it in VLC. I thought the codec was wrong because VLC got stuck on the first frame and couldn't play the whole video. No matter, let's try Adobe Premiere... After importing the footage I hit the preview play button and voilà, it worked!

So, if you are planning to make video tutorials on your PC and want to edit them in Premiere, you can use CamStudio (a free audio and video recording tool) together with this relatively new free codec and you will have no problems.

There are surely other good codecs for this purpose, but this is the one I have tested and it just works.

Friday, September 26, 2014

Algorithms, part I week 3 : Collinear points

This last week has been a little bit tougher because of the requirements the assignment had to meet.
This week's exercise has also been very interesting because of the timing tests.

This time the challenge wasn't just about coding an algorithm that works, but about coding one that works efficiently; that is, not every correct solution was fast enough to pass all the timing tests.

This was the task of the week:

"Write a program to recognize line patterns in a given set of points.

Computer vision involves analyzing patterns in visual images and reconstructing the real-world objects that produced them. The process is often broken up into two phases: feature detection and pattern recognition. Feature detection involves selecting important features of the image; pattern recognition involves discovering patterns in the features. We will investigate a particularly clean pattern recognition problem involving points and line segments. This kind of pattern recognition arises in many other applications such as statistical data analysis."

The Problem
Given a set of N distinct points in the plane, draw every (maximal) line segment that connects a subset of 4 or more of the points.

"For full credit, do not print permutations of points on a line segment (e.g., if you output pqrs, do not also output either srqp or prqs). Also, for full credit in Fast.java, do not print or plot subsegments of a line segment containing 5 or more points (e.g., if you output pqrst, do not also output either pqst or qrst); you should print out such subsegments in Brute.java."

The solution

The algorithm I designed was as follows:

1. Sort all the points in the array.
2. Find a set of k >= 4 collinear points.
3. Check if the set has already been processed (i.e., it is a permutation of a segment already printed).
4. If not, draw the segment with a single call from the first to the last point in the segment.

The tasks the algorithm had to perform were clear; the problem was how to accomplish each of these steps. A minimal sketch of the first two steps follows.
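To make steps 1 and 2 concrete, here is a sketch of the slope-based detection, assuming the course's Point class with its slopeTo() method and SLOPE_ORDER comparator (those names come from the assignment API; this is not my exact submission):

    import java.util.Arrays;

    // Sketch only: assumes the course's Point class (slopeTo(), SLOPE_ORDER).
    public class CollinearSketch {
        static void findCandidateSegments(Point[] points) {
            Arrays.sort(points);                      // step 1: natural order
            for (Point p : points) {
                Point[] bySlope = points.clone();
                Arrays.sort(bySlope, p.SLOPE_ORDER);  // stable sort groups equal slopes together
                int start = 1;                        // bySlope[0] is p itself (slope to itself is -infinity)
                while (start < bySlope.length) {
                    int end = start;
                    while (end < bySlope.length
                            && p.slopeTo(bySlope[end]) == p.slopeTo(bySlope[start])) {
                        end++;
                    }
                    if (end - start >= 3) {
                        // step 2 hit: p plus 3 or more points are collinear,
                        // a candidate segment of k >= 4 points
                        // (steps 3 and 4 decide whether to print and draw it)
                    }
                    start = end;
                }
            }
        }
    }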

First Approach

My first implementation passed all the timing tests (17/17) but didn't satisfy all the correctness tests:
Compilation:  PASSED
Style:        FAILED
Findbugs:     No potential bugs found.
API:          PASSED

Correctness:  32/36 tests passed
Memory:       1/1 tests passed
Timing:       17/17 tests passed

Raw score: 92.78% [Correctness: 65%, Memory: 10%, Timing: 25%, Style: 0%]
This means the algorithm doesn't handle all the possible cases and is therefore wrong.

The problem was in the step 3 check. I was checking whether all k points collinear with p were already present in an ArrayList<Point>; if they all were, I assumed the segment had already been printed and drawn. This doesn't work in every case, because those k points may already be in the list and yet belong to a different, new segment.

Second Approach

Despite the good raw score of 92.78%, I wasn't satisfied, so I rewrote the algorithm using an ArrayList<ArrayList<Point>>, where each index corresponds to a different line segment and holds the list of all the collinear points of that segment.

This solution passed all the correctness tests but now failed the timing tests:
Compilation:  PASSED
Style:        FAILED
Findbugs:     No potential bugs found.
API:          PASSED

Correctness:  36/36 tests passed
Memory:       1/1 tests passed
Timing:       4/17 tests passed

Raw score: 80.88% [Correctness: 65%, Memory: 10%, Timing: 25%, Style: 0%]

The solution was too slow, and the step responsible was number 3, the check operation.

Final Solution

I was a bit stuck until I realised that the step 3 check could be performed with a simple binarySearch, adding only lg N to the complexity.
The line segment has already been processed if and only if some point p0 collinear with P appears in the ordered array before P. Put another way: a new segment is printed from P only when all its collinear points p1, p2, p3, ..., pk occur after P in the array; if there is a point p0 whose index in the array is smaller than that of the point P currently being processed, the segment was already printed starting from that earlier point.
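As an illustration (a simplified reading of the idea, not necessarily my exact code), suppose we keep an extra, naturally sorted copy of the points and already know the smallest point collinear with P; two binarySearch calls then settle the question:

    import java.util.Arrays;

    // Hypothetical helper: the segment through p was already printed iff the
    // smallest point collinear with p occurs before p in the sorted copy.
    static boolean alreadyProcessed(Point[] sortedCopy, Point p, Point smallestCollinear) {
        int pIndex = Arrays.binarySearch(sortedCopy, p);                      // O(lg N)
        int firstIndex = Arrays.binarySearch(sortedCopy, smallestCollinear);  // O(lg N)
        return firstIndex < pIndex;   // an earlier origin already handled this segment
    }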
This, together with the use of an extra array of size N, produced the following grader result:

Compilation:  PASSED
Style:        FAILED
Findbugs:     No potential bugs found.
API:          PASSED

Correctness:  36/36 tests passed
Memory:       1/1 tests passed
Timing:       17/17 tests passed

Raw score: 100.00% [Correctness: 65%, Memory: 10%, Timing: 25%, Style: 0%]

I coded three different solutions until I arrived at this one, with time complexity O(N^2 lg N): for each of the N points, two mergesorts of N lg N.

There is an even more efficient O(N^2) algorithm based on a topological sweep.


Algorithms, part I (Princeton University) Week 2: Queues

The goal of this assignment was to implement elementary data structures using arrays and linked lists, and to be introduced to generics and iterators.

Deque

A double-ended queue or deque (pronounced "deck") is a generalization of a stack and a queue that supports inserting and removing items from either the front or the back of the data structure.

The deque implementation had to support each deque operation in constant worst-case time and use space proportional to the number of items currently in the deque. This led me to use a singly linked list as the basic data structure at first. But the removeLast() operation caused a problem: keeping track of the node immediately before the last one made this operation O(n), since I had to iterate from the first node up to it in order to find the new tail node.

So the solution was to use a doubly linked list. This way, I had access to the node before the last one without having to iterate. Needless to say, a doubly linked list is a bit more complicated to manage than a singly linked one!
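Here is a minimal sketch of the idea, showing only the node type and removeLast(); the real Deque<Item> also implements addFirst(), addLast(), removeFirst(), isEmpty(), size() and the iterator:

    import java.util.NoSuchElementException;

    public class Deque<Item> {
        private Node first, last;
        private int size;

        private class Node {
            Item item;
            Node next;
            Node prev;   // the extra link a singly linked list does not have
        }

        public Item removeLast() {
            if (last == null) throw new NoSuchElementException("deque is empty");
            Item item = last.item;
            last = last.prev;              // constant time: no traversal needed
            if (last == null) first = null;
            else              last.next = null;
            size--;
            return item;
        }
    }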

Randomized Queue

A randomized queue is similar to a stack or queue, except that the item removed is chosen uniformly at random from the items in the data structure.

The randomized queue implementation had to support each randomized queue operation (besides creating an iterator) in constant amortized time and use space proportional to the number of items currently in the queue. In this case a resizable array was the solution.

Also, the order of two or more iterators to the same randomized queue should be mutually independent; each iterator had to maintain its own random order.

The interesting thing about resizing the array is that the proven best practice is to double the size of the array when it gets full, but to shrink it to half its current size only when the array is at most 1/4 full, not 1/2. This way we avoid repeatedly growing and shrinking the array when performing many consecutive enqueue() and dequeue() operations.
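A sketch of that core logic, showing only enqueue() and dequeue() and using java.util.Random in place of the course's StdRandom:

    import java.util.NoSuchElementException;
    import java.util.Random;

    public class RandomizedQueueSketch<Item> {
        private Item[] items = (Item[]) new Object[1];   // unchecked cast, as usual with generic arrays
        private int size = 0;
        private final Random rnd = new Random();         // stand-in for the course's StdRandom

        private void resize(int capacity) {
            Item[] copy = (Item[]) new Object[capacity];
            for (int i = 0; i < size; i++) copy[i] = items[i];
            items = copy;
        }

        public void enqueue(Item item) {
            if (size == items.length) resize(2 * items.length);   // full: double the array
            items[size++] = item;
        }

        public Item dequeue() {
            if (size == 0) throw new NoSuchElementException();
            int r = rnd.nextInt(size);              // uniformly random position
            Item item = items[r];
            items[r] = items[--size];               // move the last item into the gap
            items[size] = null;                     // avoid loitering
            if (size > 0 && size == items.length / 4) resize(items.length / 2);  // 1/4 full: halve
            return item;
        }
    }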

Bonus: Reservoir Sampling Algorithm

Now, the interesting part of this assignment was the chance to discover and implement the reservoir sampling algorithm (http://en.wikipedia.org/wiki/Reservoir_sampling).

The requirement was as follows:
"Write a client program Subset.java that takes a command-line integer k; reads in a sequence of N strings from standard input using StdIn.readString(); and prints out exactly k of them, uniformly at random. Each item from the sequence can be printed out at most once. You may assume that 0 ≤ k ≤ N, where N is the number of string on standard input.
(For an extra challenge, use only one Deque or RandomizedQueue object of maximum size at most k.)"

The naive approach would be to store every item as it arrives and then select k of them, but that requires storage capacity of N. Reservoir sampling solves this by using storage for only k items.

This algorithm is useful when we need a uniformly random subset of fixed size k from a stream whose total size N is not known in advance.
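Here is a sketch of the algorithm itself (not the Subset.java I submitted, which used the RandomizedQueue and StdIn): keep the first k strings, then replace a random reservoir slot with the i-th string with probability k/i.

    import java.util.Random;
    import java.util.Scanner;

    public class ReservoirSketch {
        public static void main(String[] args) {
            int k = Integer.parseInt(args[0]);
            String[] reservoir = new String[k];
            Random rnd = new Random();            // stand-in for StdRandom
            Scanner in = new Scanner(System.in);  // stand-in for StdIn
            int i = 0;                            // number of strings read so far
            while (in.hasNext()) {
                String s = in.next();
                i++;
                if (i <= k) {
                    reservoir[i - 1] = s;          // fill the reservoir with the first k items
                } else {
                    int j = rnd.nextInt(i);        // uniform in [0, i)
                    if (j < k) reservoir[j] = s;   // keep s with probability k/i
                }
            }
            for (String s : reservoir) System.out.println(s);
        }
    }

For example, echo A B C D E F G H | java ReservoirSketch 3 prints three of the letters, each subset equally likely, while never holding more than three strings in memory.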

Finally, the results I obtained from the grader:

Compilation:  PASSED
Style:        FAILED
Findbugs:     No potential bugs found.
API:          PASSED

Correctness:  35/35 tests passed
Memory:       50/49 tests passed
Timing:       24/24 tests passed

Raw score: 100.20% [Correctness: 65%, Memory: 10%, Timing: 25%, Style: 0%]


Sunday, September 21, 2014

Coursera MOOC Algorithms, part I (Princeton University) week I

WEEK I: Percolation
It's been two weeks since I began a MOOC on coursera.org entitled "Algorithms, part I", and I'm glad to post here the first programming assignment I had to do, with the following score:

Compilation:  PASSED
Style:        FAILED
Findbugs:     No potential bugs found.
API:          PASSED

Correctness:  24/24 tests passed
Memory:       8/8 tests passed
Timing:       9/9 tests passed

Raw score: 100.00% [Correctness: 65%, Memory: 10%, Timing: 25%, Style: 0%]

To complete the assignment we had to use an algorithm called "union-find", more precisely an improved version of it that we had previously studied in the lectures.

The goal of the problem was to estimate the percolation threshold of a system using Monte Carlo simulation. This threshold is known to be around 0.5927 for square sites and has not been determined analytically.

We repeated the following experiment T times and then calculated the mean:

We set up a grid of N×N sites and randomly open sites until there is a path connecting the top row with the bottom row; when this happens the system is said to "percolate". We then divide the number of open sites by the total number of sites in the grid (N×N), which gives us one estimate of the threshold.
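A sketch of one such trial, assuming the assignment's Percolation data type with its open(), isOpen() and percolates() methods (sites were 1-indexed in the 2014 API) and using java.util.Random in place of StdRandom:

    import java.util.Random;

    public class PercolationTrial {
        // One Monte Carlo trial on an n-by-n grid; returns the fraction of open
        // sites at the moment the system first percolates.
        static double runTrial(int n) {
            Percolation perc = new Percolation(n);   // the assignment's data type
            Random rnd = new Random();               // stand-in for StdRandom
            int opened = 0;
            while (!perc.percolates()) {
                int row = 1 + rnd.nextInt(n);        // pick a random site
                int col = 1 + rnd.nextInt(n);
                if (!perc.isOpen(row, col)) {
                    perc.open(row, col);
                    opened++;
                }
            }
            return (double) opened / (n * n);        // one estimate of the threshold
        }
    }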

Estimating this threshold matters because it determines the minimum fraction of sites that need to be open before we can expect the system to percolate.

BACKWASH Problem

This problem showed up in my first implementations when using the 2D visualizer class provided by the course staff. It was caused by my use of a virtual bottom node, which made some bottom sites appear full even when there was no path connecting them to the top row.

In the following video you can see the final result with the backwash problem solved: