Sunday, 15 September 2019

Excel File Reader

Hello Everyone,

Recently I came across a problem in my org to have a system which processes (reads) an excel file super fast and keeps the memory consumption minimal at the same time.
My first and obvious choice was Apache POI library. Which helped me building the system, and it was serving the purpose TILL I was bit by a block. System was not able to handle huge excel files and was breaking down if the file size goes above 20MB.

After spending much time on internet I came across http://poi.apache.org/components/spreadsheet/how-to.html#xssf_sax_api which is based on the idea of processing the underlying xml of excel file and using sax parser for super fast processing.

Obviously given implementation was very limited and was not enough to serve my purpose. Hence I had to enhance it to a point it became very understandable for anyone and useful at the same time.

This implementation is super fast and memory efficient to read huge excel file.
Find full implementation go to my git hub page here.

Just to give you how memory efficient this is I did a test with 23MB excel file-

Picture 1 represents the memory use of a code that uses XSSFWorkbook
Picture 2 represents the memory use of a code that uses SAXParser
                                                             






Clearly you can see in picture1 it is consuming memory more that 1 GB where as in Picture2 it is consuming around 250MB.

An example code would look something like this.

ExcelFileReader fileReader = new ExcelFileReader();
String path = "";
String sheetName = "";
File file = new File(path);
fileReader.initialize(file);
List> map = fileReader.getExcelDataAsListOfMap(1, sheetName, 1, null, 2);
map.forEach(stringStringMap -> System.out.println(stringStringMap));





Hope this sort note helps everyone looking for such solution for reading huge excel files.
The class ExcelFileReader has lots of useful methods make use of them try them out and let me know it helps.

Thanks!

Monday, 4 June 2018

Min Heap

Heap is the best data structure to arrange your data in such a way that you can get the min/max in constant time. In sort this is all heap is used for.
Further there are many variations of Heap:
1) Min Heap
2) Max Heap

1) Min Heap:
            With Min Heap at any point of time when you query the data structure you get the minimum item present in the list.

2) Max Heap:
             With Max Heap at any point of time when you query the data structure you get the maximum item present in the list.

Off-course there are some work need to be done to have such a data structure which keep the record of max/min values even after having multiple insert and delete operations.

Then there are further variations of the Heap which could be:
1) Fixed length Heap
2) Dynamic length Heap

I would suggest you to go through https://github.com/sayrajat/java-ds  and get the code of dynamic length Heap to have a better understanding of the concept.

To understand the basics of Heap this wiki link is best
https://en.wikipedia.org/wiki/Min-max_heap