Art of Huffman PI Coding

    Art of Huffman PI Coding by DonCaffeDeLaSkobalj !

    Huffman PI coding is a lossless algorithm for acquiring and analysing PI signals. The main idea is to assign variable-length codes to the different input signals, including ground noise signals, EMI signals and so on, where the lengths of the assigned codes are based on how frequently each signal appears after the ADC stage. The most frequent input signals (like environment or ground signals) get the shortest codes, while the
    least frequent signals (like gold coins) get the longest codes.

    It is well known that signals acquired against noisy backgrounds, such as heavily mineralised soils, follow a Gaussian distribution, in keeping with the central limit theorem. In its most general form, under normal conditions (naturally finite signal variance), averages of random samples of ground-signal variables, drawn independently, converge in distribution to the normal; that is, they become normally distributed when the number of acquired samples is sufficiently large.

    Physical quantities of random ground signals that are expected to be the sum of many independent processes (but always finite for a given time frame) have distributions that are nearly normal.

    In the high-tech prospecting art, many results and methods, such as propagation of signal uncertainty and least-squares parameter fitting, can be derived analytically in explicit form when the sampled random variables are normally distributed. The normal distribution is also called the bell curve; however, many other distributions are also bell-shaped, such as the Cauchy, Student's t, and logistic distributions.

    In Huffman PI coding the variable-length codes assigned to the random input signals are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one input signal is never a prefix of the code assigned to any other input signal. This is how Huffman coding guarantees there is no ambiguity when decoding the input signal stream.

    Let us understand why prefix codes are needed in Huffman PI coding with a small counterexample.

    Suppose there are four input PI signals after preamplification and sampling; let's mark them as signals a, b, c and d, and assign them the variable-length codes 00, 01, 0 and 1 (from a to d). In this simple example such a coding leads to ambiguity, because the code assigned to signal c is a prefix of the codes assigned to a and b. If the compressed code stream is 0001, then the decompressed output may be any of “cccd”, “ccb”, “cad”, “acd” or “ab”.
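    To see the ambiguity concretely, here is a small sketch (my own illustration, not part of the original post) that brute-forces every decoding of the stream 0001 under the non-prefix code table a=00, b=01, c=0, d=1:

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Non-prefix code table from the example: "0" (c) is a prefix of
       "00" (a) and "01" (b), so a bit stream can decode in several ways. */
    static const char *codes[] = { "00", "01", "0", "1" };
    static const char syms[]   = { 'a', 'b', 'c', 'd' };

    /* Recursively try every code at the current position and print
       every complete decoding of the stream. */
    static void decodeAll(const char *stream, char *out, int outLen)
    {
        if (*stream == '\0') {
            out[outLen] = '\0';
            printf("%s\n", out);   /* one complete decoding */
            return;
        }
        for (int i = 0; i < 4; ++i) {
            size_t n = strlen(codes[i]);
            if (strncmp(stream, codes[i], n) == 0) {
                out[outLen] = syms[i];
                decodeAll(stream + n, out, outLen + 1);
            }
        }
    }

    int main(void)
    {
        char out[16];
        decodeAll("0001", out, 0);   /* prints ab, acd, cad, ccb, cccd */
        return 0;
    }
    ```

    With a prefix code this search would always find exactly one decoding.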

    In Huffman PI coding there are two major parts:

    P1. Build a Huffman PI tree from the input signals.
    P2. Traverse the Huffman PI tree and assign codes to the input signals.


    Simple steps to build a Huffman PI tree:
    The input is an array of unique signals along with their frequencies of occurrence; the output is the Huffman PI tree.
    a. Create a leaf node for each unique signal and build a min heap of all leaf nodes. Here the min heap is used as a priority queue, and the frequency value is used to compare two nodes. Initially, the least frequent signal is at the root of the min heap.

    b. Extract the two nodes with the minimum occurrence frequency from the min heap. Create a new internal node with an occurrence frequency equal to the sum of the two nodes' frequencies, make the first extracted node its left child and the other extracted node its right child, and add this new node back to the min heap.
    c. Repeat these steps until the heap contains only one node. This remaining node is the root node and the tree is complete.

    Let us understand the algorithm with an example:

    Input signal    Frequency of occurrence
    a               5
    b               9
    c               12
    d               13
    e               16
    f               45

    First we build a min heap that contains 6 nodes, where each node represents the root of a Huffman PI tree with a single node.
    Then we extract the two minimum frequency nodes from the min heap and add a new internal node with frequency 5 + 9 = 14.

    Now our min heap contains 5 nodes, where 4 nodes are roots of trees with a single element each, and one heap node is the root of a tree with 3 elements (a, b, and the new internal node).

    Input signal    Frequency of occurrence
    c               12
    d               13
    Internal node   14
    e               16
    f               45

    Then we extract the two minimum frequency nodes from the heap and add a new internal node with frequency 12 + 13 = 25.

    Now our min heap contains 4 nodes, where 2 nodes are roots of trees with a single element each, and two heap nodes are roots of trees with more than one node.

    Input signal    Frequency of occurrence
    Internal node   14
    e               16
    Internal node   25
    f               45

    Next we extract the two minimum frequency nodes and add a new internal node with frequency 14 + 16 = 30.

    Now our min heap contains 3 nodes:
    Input signal    Frequency of occurrence
    Internal node   25
    Internal node   30
    f               45

    Then we extract the two minimum frequency nodes and add a new internal node with frequency 25 + 30 = 55.

    Now our min heap contains 2 nodes.
    Input signal    Frequency of occurrence
    f               45
    Internal node   55

    The last step extracts the two remaining minimum frequency nodes and adds a new internal node with frequency 45 + 55 = 100. The min heap now contains only one node.

    Input signal    Frequency of occurrence
    Internal node   100

    Since our heap contains only one remaining node, the Huffman PI algorithm stops here.

    Now we assign codes from the Huffman PI tree. We traverse the tree starting from the root, while maintaining an auxiliary code array. When moving to the left child we append 0 to the array; when moving to the right child we append 1. When a leaf node is encountered, the contents of the array form the code for that signal.

    The resulting codes are:

    Signal    Code
    f         0
    c         100
    d         101
    a         1100
    b         1101
    e         111
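    A quick sanity check on this table (my own arithmetic, not part of the original post): weighting each code length by its signal's frequency shows the gain over a fixed-length code, since 3 bits would be needed to distinguish 6 signals.

    ```c
    #include <stdio.h>

    /* Total encoded length = sum of (frequency * code length),
       compared against a fixed 3-bit code for all 100 samples. */
    int main(void)
    {
        int freq[] = { 5, 9, 12, 13, 16, 45 };  /* a b c d e f */
        int len[]  = { 4, 4, 3, 3, 3, 1 };      /* code lengths from the table */
        int huffman = 0, fixed = 0;

        for (int i = 0; i < 6; ++i) {
            huffman += freq[i] * len[i];
            fixed   += freq[i] * 3;             /* 3 bits cover 6 symbols */
        }
        printf("Huffman: %d bits, fixed: %d bits\n", huffman, fixed);
        /* prints: Huffman: 224 bits, fixed: 300 bits */
        return 0;
    }
    ```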


    Now we will write a small C program for Huffman PI coding.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_TREE_HT 100

    // Each sampled signal is represented as a single character
    typedef char signal;

    // Huffman PI tree node
    struct MinHeapNode {

        // One of the input signals
        signal data;

        // Frequency of occurrence of the signal
        unsigned freq;

        // Left and right children of this node
        struct MinHeapNode *left, *right;
    };

    // Min heap - a collection of Huffman PI tree nodes
    struct MinHeap {

        // Current size of min heap
        unsigned size;

        // Capacity of min heap
        unsigned capacity;

        // Array of min heap node pointers
        struct MinHeapNode** array;
    };

    // Allocate a new min heap node with the given signal and its occurrence frequency
    struct MinHeapNode* newNode(signal data, unsigned freq)
    {
        struct MinHeapNode* temp
            = (struct MinHeapNode*)malloc(sizeof(struct MinHeapNode));

        temp->left = temp->right = NULL;
        temp->data = data;
        temp->freq = freq;

        return temp;
    }

    // Create a min heap of the given capacity
    struct MinHeap* createMinHeap(unsigned capacity)
    {
        struct MinHeap* minHeap
            = (struct MinHeap*)malloc(sizeof(struct MinHeap));

        // Current size is 0
        minHeap->size = 0;

        minHeap->capacity = capacity;

        minHeap->array
            = (struct MinHeapNode**)malloc(
                minHeap->capacity * sizeof(struct MinHeapNode*));
        return minHeap;
    }

    // Swap two min heap nodes
    void swapMinHeapNode(struct MinHeapNode** a,
                         struct MinHeapNode** b)
    {
        struct MinHeapNode* t = *a;
        *a = *b;
        *b = t;
    }

    // The standard min-heapify function
    void minHeapify(struct MinHeap* minHeap, int idx)
    {
        int smallest = idx;
        int left = 2 * idx + 1;
        int right = 2 * idx + 2;

        if (left < minHeap->size
            && minHeap->array[left]->freq < minHeap->array[smallest]->freq)
            smallest = left;

        if (right < minHeap->size
            && minHeap->array[right]->freq < minHeap->array[smallest]->freq)
            smallest = right;

        if (smallest != idx) {
            swapMinHeapNode(&minHeap->array[smallest],
                            &minHeap->array[idx]);
            minHeapify(minHeap, smallest);
        }
    }

    // Check whether the heap has size one
    int isSizeOne(struct MinHeap* minHeap)
    {
        return (minHeap->size == 1);
    }

    // Extract the minimum-frequency node from the heap
    struct MinHeapNode* extractMin(struct MinHeap* minHeap)
    {
        struct MinHeapNode* temp = minHeap->array[0];
        minHeap->array[0] = minHeap->array[minHeap->size - 1];

        --minHeap->size;
        minHeapify(minHeap, 0);

        return temp;
    }

    // Insert a new node into the min heap
    void insertMinHeap(struct MinHeap* minHeap,
                       struct MinHeapNode* minHeapNode)
    {
        ++minHeap->size;
        int i = minHeap->size - 1;

        while (i && minHeapNode->freq < minHeap->array[(i - 1) / 2]->freq) {
            minHeap->array[i] = minHeap->array[(i - 1) / 2];
            i = (i - 1) / 2;
        }

        minHeap->array[i] = minHeapNode;
    }

    // Build the min heap
    void buildMinHeap(struct MinHeap* minHeap)
    {
        int n = minHeap->size - 1;
        int i;

        for (i = (n - 1) / 2; i >= 0; --i)
            minHeapify(minHeap, i);
    }

    // Print an array of size n (one Huffman code)
    void printArr(int arr[], int n)
    {
        int i;
        for (i = 0; i < n; ++i)
            printf("%d", arr[i]);

        printf("\n");
    }

    // Check whether this node is a leaf
    int isLeaf(struct MinHeapNode* root)
    {
        return !(root->left) && !(root->right);
    }

    // Create a min heap of capacity equal to size and insert all signal data into it
    // Initially the size of the min heap is equal to its capacity
    struct MinHeap* createAndBuildMinHeap(signal data[], int freq[], int size)
    {
        struct MinHeap* minHeap = createMinHeap(size);

        for (int i = 0; i < size; ++i)
            minHeap->array[i] = newNode(data[i], freq[i]);

        minHeap->size = size;
        buildMinHeap(minHeap);

        return minHeap;
    }

    // The main Huffman tree build function
    struct MinHeapNode* buildHuffmanTree(signal data[], int freq[], int size)
    {
        struct MinHeapNode *left, *right, *top;

        // Step 1: Create a min heap of capacity equal to size.
        // Initially, there are as many nodes as there are signals.
        struct MinHeap* minHeap = createAndBuildMinHeap(data, freq, size);

        // Iterate until the size of the heap becomes 1
        while (!isSizeOne(minHeap)) {

            // Extract the two minimum-frequency nodes from the min heap
            left = extractMin(minHeap);
            right = extractMin(minHeap);

            // Create a new internal node with occurrence frequency equal to
            // the sum of the two nodes' frequencies. Make the two extracted
            // nodes the left and right children of this new node, and
            // finally add this node back to the min heap.
            top = newNode('$', left->freq + right->freq);

            top->left = left;
            top->right = right;

            insertMinHeap(minHeap, top);
        }

        // The remaining node is the root node and the tree is complete.
        return extractMin(minHeap);
    }

    // Print Huffman PI codes from the root of the Huffman tree.
    // This function uses arr[] to store the code along the way.
    void printCodes(struct MinHeapNode* root, int arr[], int top)
    {
        // Assign 0 to the left edge and recur
        if (root->left) {
            arr[top] = 0;
            printCodes(root->left, arr, top + 1);
        }

        // Assign 1 to the right edge and recur
        if (root->right) {
            arr[top] = 1;
            printCodes(root->right, arr, top + 1);
        }

        // If this is a leaf node, it contains one of the input signals;
        // print the signal and its corresponding code from arr[]
        if (isLeaf(root)) {
            printf("%c: ", root->data);
            printArr(arr, top);
        }
    }

    // Build the Huffman PI tree and print the codes by traversing it
    void HuffmanCodes(signal data[], int freq[], int size)
    {
        // Build the Huffman PI tree
        struct MinHeapNode* root
            = buildHuffmanTree(data, freq, size);

        // Print the Huffman PI codes
        int arr[MAX_TREE_HT], top = 0;

        printCodes(root, arr, top);
    }

    // Let's test all functions
    int main()
    {
        signal arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
        int freq[] = { 5, 9, 12, 13, 16, 45 };

        int size = sizeof(arr) / sizeof(arr[0]);

        HuffmanCodes(arr, freq, size);

        return 0;
    }

    The result is:

    f: 0
    c: 100
    d: 101
    a: 1100
    b: 1101
    e: 111
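    The program above only prints the codes; decoding is the mirror operation. Here is a minimal sketch (my addition, assuming the code table printed above and a well-formed bit stream) that greedily matches codes left to right; because the codes are prefix-free, at each position exactly one code can match, so the result is unambiguous:

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Code table as printed by the program above */
    static const char *codes[] = { "0", "100", "101", "1100", "1101", "111" };
    static const char syms[]   = { 'f', 'c', 'd', 'a', 'b', 'e' };

    int main(void)
    {
        /* "1100" + "1101" + "0" encodes the signals a, b, f */
        const char *stream = "110011010";

        while (*stream) {
            for (int i = 0; i < 6; ++i) {
                size_t n = strlen(codes[i]);
                if (strncmp(stream, codes[i], n) == 0) {
                    putchar(syms[i]);   /* exactly one code matches here */
                    stream += n;
                    break;
                }
            }
        }
        putchar('\n');   /* prints: abf */
        return 0;
    }
    ```

    A production decoder would instead walk the Huffman tree bit by bit, but the table form keeps the sketch self-contained.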

    Huffman PI coding's overall time complexity is O(n*log(n)), where n is the number of unique sampled signals.
    If there are n nodes, extractMin() is called 2*(n - 1) times, and each extractMin() call takes O(log(n)) time as
    it calls minHeapify().

    That's it. Happy analysing and prospecting.

    DonCaffeDeLaSkobalj - LRL Master

  • #2
    https://en.wikipedia.org/wiki/Huffman_coding

    Not obvious to me what use this has in metal detectors.

    The numbers coming out of the A/D converter, we do not know in advance what probability distribution they are going to have or what physical phenomenon will cause them. With appropriate DSP algorithms we can accumulate a time series and, based on properties of that time series, separate the incoming signals into different categories which map more or less to hardware DC offsets, thermal noise, ground noise, and metal targets. None of this has anything to do with lossless compression. In fact, the first thing we almost always do is low-pass filter the A/D data stream.



    • #3
      I admit I don't understand all of your post, but it appears that you consider 'ground noise' to be random-noise-like, having a normal distribution curve shape and averaging out to zero. In my experience, ground signal is not random: it is generated by real characteristics of the ground, and by factors such as the height of the coil above the ground. As such, it is highly repeatable, not random, and if a detector were swept by a 'robot' arm over and over the same patch of dirt, the exact same 'ground noise' would repeat every sweep.
      True, there are additional noise sources, such as electrical noise (generated by resistors/wires, flicker noise in semiconductors, etc.) and radio interference pickup (many sources, some more repetitive than others), but these can usually be mitigated, e.g. by transmitting a stronger signal from the detector, thus increasing the signal-to-noise ratio.



      • #4
        Originally posted by DonCaffeDelaSkobalj:
        Simple steps how to build Huffman PI Tree:
        Input is array of unique signals...
        What, exactly, distinguishes the unique signals for binning?

        I suspect this technique is way more useful in LRLs than in metal detectors.



        • #5
          Morse Code is probably the oldest and best known method of lossless coding, when the source document is English text of the kind codable by Morse.

          I've used codes something like that for local com links but not for processing "metal detector signals" for which the method has no application.



          • #6
            Huffman coding tries to minimize the length of a compressed data stream by assigning the shortest codes to the most probable data. I don't understand how discrimination in a metal detector would benefit from compression; after all, if you need statistics in order to Huffman-code, then the statistics themselves can provide the same discrimination without Huffman coding. Huge overhead for zero benefit.
