Posts

Spark DataFrame - Array[ByteBuffer] - IllegalArgumentException

IllegalArgumentException - ByteBuffer - Spark DataFrame I was processing several million documents (~20 million) from which we need to extract NLP features using NLP4J, OpenNLP, and WordNet. Combining the three sets of NLP features blows each record up to 11 times its original size. We are using all three because we do not yet know which feature sets will be helpful to us. The original dataset is stored as Parquet files in HDFS (16 partitions). I thought it would be convenient to just use withColumn and pass a UDF (User Defined Function) on the column that needs those features; withColumn adds the calculated column back to the DataFrame. So I created the Spark job for this (I am on Spark 1.5.2-cdh5.5.2), and things started to get nasty: I am blowing up the ByteBuffer array in the in-memory columnar storage. This is the exception that I am getting. There seems to be no reference to my own code in this stack trace. java.lang.IllegalArgumentException at java.nio.B...
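The stack trace above is truncated, so the exact cause is not visible here. As a hedged illustration only (not a diagnosis of this Spark job), the sketch below shows one well-known way java.nio raises this exception: ByteBuffer.allocate rejects a negative capacity, and a byte count larger than Integer.MAX_VALUE turns negative when it is narrowed to an int. The class name, helper names, and the 3 GB figure are hypothetical.

```java
import java.nio.ByteBuffer;

public class ByteBufferOverflowDemo {

    // Narrows a long byte count to an int the way careless size math can,
    // silently wrapping around once the value exceeds Integer.MAX_VALUE.
    static int narrowToInt(long bytes) {
        return (int) bytes;
    }

    // Tries to allocate a heap ByteBuffer of the given size and reports
    // whether java.nio rejected the capacity with IllegalArgumentException.
    static boolean allocationFails(long bytes) {
        try {
            ByteBuffer.allocate(narrowToInt(bytes));
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown by ByteBuffer.allocate when the capacity is negative.
            return true;
        }
    }

    public static void main(String[] args) {
        long hypotheticalColumnBytes = 3_000_000_000L; // hypothetical ~3 GB column batch
        // prints "narrowed size = -1294967296"
        System.out.println("narrowed size = " + narrowToInt(hypotheticalColumnBytes));
        // prints "allocation fails = true"
        System.out.println("allocation fails = " + allocationFails(hypotheticalColumnBytes));
    }
}
```

Because a ByteBuffer's capacity is an int, no single buffer can hold more than Integer.MAX_VALUE bytes (about 2 GB), which is worth keeping in mind when records grow 11 times larger after feature extraction.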

Watson - The mystery after jeopardy

We have been deep diving into cognitive computing. One of the best platforms a business can leverage to hit the ground running with cognitive computing is IBM Watson. Watson has a lot of capabilities, especially with the acquisition of AlchemyAPI ( Alchemy Acquisition - IBM ). You get a language translator, a language classifier, retrieve and rank, text to speech, a tone analyzer, and a lot more. It is just a matter of how these capabilities can be integrated into your business use cases. As "the answer" company, we have a tremendous and diverse set of use cases for search, and giving you answers in a way that makes sense, is relevant, and helps users make better decisions is at the heart of what makes us "the answer" company. I was part of the team given the freedom to explore IBM Watson (no matter what the cost), so we tried the different APIs over a span of a few weeks. Of course, we had to take a look at Watson's Retrieve and Rank ( IBM Watson's Retrieve...

2016 - Movies Data Analysis - Linear Regression Modelling

Java 1.8 Migration - Performance and Garbage Collection

Java 7 to Java 8 - that is easy! I have been working on migrating our web application from Java 1.7 to Java 1.8, and it has been a real challenge. What makes it more challenging is that our web application has a really unique process footprint (well, that can be said of every web application). You have to know your application like the back of your hand, especially if you want to tune garbage collection for it. When I accepted the challenge of moving our web application from Java 1.7 to Java 1.8, I thought it was going to be a breeze, considering that 1.7 to 1.8 is not that big a version jump. It turned out that I was totally wrong. Here are some of the major challenges that I encountered:

1. Permanent Generation turned into Metaspace

Before Java 1.8, class metadata was located in the permanent generation of the Java heap, whose size could be set using the -XX:PermSize option. This was removed in Java 1.8 ( Remove Permanent Generation ). The reason why it...
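Since the permanent generation options no longer take effect on Java 1.8, a quick way to confirm which class-metadata area a JVM is actually using is to ask it for its memory pools. The sketch below is a hedged example built on the standard java.lang.management API; the class and helper names are hypothetical, and the pool names it checks for ("Metaspace", "... Perm Gen") are the ones HotSpot reports, so other JVMs may name them differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

public class MetadataPoolCheck {

    // Returns the names of all memory pools the running JVM reports.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());
        }
        return names;
    }

    // True if any pool looks like the pre-1.8 permanent generation
    // (HotSpot names it e.g. "Perm Gen", "PS Perm Gen", or "CMS Perm Gen").
    static boolean hasPermGen(List<String> names) {
        for (String name : names) {
            if (name.contains("Perm Gen")) {
                return true;
            }
        }
        return false;
    }

    // True if the JVM reports a HotSpot "Metaspace" pool (Java 1.8 and later).
    static boolean hasMetaspace(List<String> names) {
        return names.contains("Metaspace");
    }

    public static void main(String[] args) {
        List<String> names = poolNames();
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("pools = " + names);
        System.out.println("Perm Gen present: " + hasPermGen(names));
        System.out.println("Metaspace present: " + hasMetaspace(names));
    }
}
```

On Java 1.8, HotSpot ignores -XX:PermSize and -XX:MaxPermSize with a startup warning; the corresponding options for the metadata area are -XX:MetaspaceSize and -XX:MaxMetaspaceSize. Metaspace lives in native memory and is not capped by default, so a tuning plan that only looks at the heap can miss it.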

Agile is not a process!

Agile - What is it? I recently reinforced my understanding of what Agile is in relation to software development. One of the things I realized is that the Agile Manifesto does not dictate a process you have to follow; it is more like a culture of what you need to value. One of the most popular agile methodologies is Scrum. Scrum Guide (the last time I read the guide was in 2009; a new guide was released in 2013, and they will probably release a new one soon). If you follow Scrum, you need to follow all of it, otherwise you are not doing Scrum. You can do stand-ups, but if that is all you are doing, then it is not Scrum. Scrum ensures that an Agile culture develops within the team, and the person who sees that through is the Scrum Master. Since Agile is in its teen years, we probably need to reconnect with it and revisit where it is being taken. Agile in its teen year!

Speech to Text - HTML 5

Technology Can Help Recently there was a news story on Good Morning America where a deaf cashier was taking orders and customers were patiently writing out their orders. Here is a link: Deaf Cashier. Well, with a computer in every pocket, there is technology to help. Here is a quick demo that I created that uses HTML 5 technology to build a speech-to-text web page. Here is a link to the working page: http://petabyte.github.io/textToSpeech.html https://www.youtube.com/watch?v=27SZISZAPEA

Internet of Things Project - Music Teacher

Why? Music education usually involves a student using several instruments. A student needs to log practice time, which should meet a required amount of time spent practicing with the instrument. This is usually done in a logbook, which usually needs to be initialed by a parent. (Sometimes the parent does not really know whether the student practiced or not.) So the logbook gets signed by the parent even though the student did not really practice. How can you verify this? You can record every practice session on video (but then you need to watch the video to determine the amount of practice time). There must be a better way of logging music practice. Music Practice Logbook How to solve this issue? The good thing is that there are a lot of makers out there connecting things to the internet. We can create a device that logs music practice so that we can see a summary of the length of practice every day. Using a mic sensor, LED lights, and a p...
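The excerpt is cut off before the hardware and software details, so the following is not the author's device code. It is a minimal sketch of how mic readings could be turned into a daily practice summary, assuming one sound-level reading per second on a 0-1023 scale and a fixed loudness threshold; the class name, sampling rate, scale, and threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PracticeLogger {

    // Total seconds whose sound level reached the threshold, assuming one
    // mic reading per second (hypothetical sampling rate).
    static int practiceSeconds(int[] levels, int threshold) {
        int seconds = 0;
        for (int level : levels) {
            if (level >= threshold) {
                seconds++;
            }
        }
        return seconds;
    }

    // Lengths (in seconds) of each uninterrupted run of loud readings,
    // i.e. a rough split of the day into practice sessions.
    static List<Integer> sessions(int[] levels, int threshold) {
        List<Integer> runs = new ArrayList<>();
        int current = 0;
        for (int level : levels) {
            if (level >= threshold) {
                current++;
            } else if (current > 0) {
                runs.add(current);
                current = 0;
            }
        }
        if (current > 0) {
            runs.add(current); // close a session that lasts until the last reading
        }
        return runs;
    }

    public static void main(String[] args) {
        // Hypothetical readings on a 0-1023 scale (e.g. a 10-bit analog input).
        int[] levels = {12, 15, 600, 640, 700, 20, 18, 650, 660, 14};
        int threshold = 300; // hypothetical cut-off between silence and playing
        // prints "practice seconds = 5"
        System.out.println("practice seconds = " + practiceSeconds(levels, threshold));
        // prints "sessions = [3, 2]"
        System.out.println("sessions = " + sessions(levels, threshold));
    }
}
```

A single threshold keeps the sketch simple; a real device would likely need to tolerate short pauses between phrases so that one practice session is not split into many tiny ones.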