Job Market – Senior Java Developer (NTUA), Athens – Unit of Automatic Control and Informatics

The Unit of Automatic Control and Informatics of the National Technical University of Athens is seeking to employ a Senior Java Developer in Athens.

Job Role:

Designing and developing open-source software for European research projects based on REST web services.
Required Skills and Experience:
•   Bachelor’s degree in Electrical & Computer Engineering/Computer Science or related field
•   Experience in Semantic web technologies (HTML, XML, RDF) and RESTful web services
•   Programming/query/markup languages: MySQL, Java SE/EE with JDBC, JSP, XML
•   Ability to work in a team environment, manage assigned tasks and schedules and meet project deadlines
•   Excellent verbal and written communication skills
•   Excellent organizational and project management skills
•   Excellent command of English

Desirable experience:
•   Netbeans
•   Eclipse IDE
•   Ant/Maven

To apply for an interview, please send your detailed CV to the following e-mail address: controlntualab@gmail.com

Book Review – Hadoop Real-World Solutions (Packt)

Nowadays, many developers are starting to discuss and use the Hadoop framework to solve different problems in the big data and Hadoop space. Many tools and practices exist, and it is very difficult for a developer to choose the appropriate tool or approach for a project. This book provides important information not only to inexperienced developers but also to experienced ones, since it covers a wide variety of Hadoop-related tools, software frameworks and best practices, such as Apache Hive, Pig, MapReduce, HDFS, Giraph and others.

The examples have been chosen very carefully, and I think the choice is quite successful and accurate, because they give the reader, whether an experienced or an inexperienced developer, all the information needed to understand each tool and its usage. The structure of each chapter, with the sections “Getting Ready”, “How to do it” and “How it works”, is very effective, since it breaks the problem into discrete steps and helps the reader absorb the provided information. As a suggestion, it would be very helpful to add a “How to test it” section to each chapter.

The organization of the chapters is very good, since the book is separated into logical areas: importing/exporting data, operations and analysis, and finally administration and persistence. Personally, I would prefer the chapters on administration and persistence (Chapters 9 & 10) to be at the beginning instead of the end, since the operations they describe can be needed at any point in the book.

Let’s go through the chapters:

Chapters 1–3 cover importing, exporting, extracting and transforming data to/from HDFS, presenting several approaches, including importing data from MySQL, MS SQL Server and MongoDB. Chapter 3 is also interesting, as it shows how to use tools such as Hive, Pig and the MapReduce Java API to batch-process data samples and produce transformed data outputs. Personally, I found the information on extending Pig to use the Hadoop Streaming API for time-series analytics very interesting. Another interesting topic is the use of Protocol Buffers, which help you generate bindings in different languages (something you will need a lot when dealing with big data systems).
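To give a feeling for the map/reduce idea behind those batch-processing recipes, here is a minimal sketch: a word count run in memory with plain Java rather than on a Hadoop cluster. The class and method names are illustrative, not from the book, and a real job would of course go through the Hadoop MapReduce API.

```java
import java.util.*;
import java.util.stream.*;

// Minimal in-memory sketch of the map/reduce pattern: a word count.
// Illustrative only; a real Hadoop job uses Mapper/Reducer classes.
public class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for each word in an input line.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // "Reduce" phase: sum the counts emitted for each distinct word.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.toMap(Map.Entry::getKey,
                                              Map.Entry::getValue,
                                              Integer::sum));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hadoop stores data", "pig transforms data");
        Map<String, Integer> counts = reduce(lines.stream().flatMap(WordCountSketch::map));
        System.out.println(counts.get("data")); // prints 2
    }
}
```

The split into a map and a reduce step mirrors what the framework distributes across a cluster; the logic itself stays this simple.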

Chapters 4–5 focus on operations that you will always meet when dealing with big data. They cover a wide range of operations, from string concatenation and external table mapping to advanced joins. Chapter 5 is very useful, since these joins are not trivial (based on my previous experience), and the reader will find important information on this topic, with recipes for more than one tool (Apache Pig and Hive).
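As a rough illustration of what such a join does (not one of the book's Pig or Hive recipes), here is a plain-Java hash join over two small key/value tables; in Pig or Hive the same operation is a one-line JOIN statement, but spelling it out makes the mechanics visible. All names here are hypothetical.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java sketch of an inner hash join on two (key, value) tables.
// Illustrative only; Pig/Hive express this as a single JOIN statement.
public class HashJoinSketch {

    // Emit "key,leftValue,rightValue" for every key present in both tables.
    static List<String> join(Map<String, String> left, Map<String, String> right) {
        return left.entrySet().stream()
                   .filter(e -> right.containsKey(e.getKey()))
                   .map(e -> e.getKey() + "," + e.getValue() + "," + right.get(e.getKey()))
                   .sorted()                       // deterministic output order
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> users  = Map.of("u1", "alice", "u2", "bob");
        Map<String, String> orders = Map.of("u1", "book", "u3", "lamp");
        System.out.println(join(users, orders)); // prints [u1,alice,book]
    }
}
```

The advanced recipes in Chapter 5 deal with exactly the cases this sketch ignores: keys that do not fit in memory, skewed keys, and outer joins.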

Chapters 6–7 present a few big data problems and show how to solve them using Apache Hive and Pig, and Apache Mahout and Giraph for applying and running machine-learning algorithms in large-scale systems. In general this topic is very difficult and requires a strong theoretical and technical background. The code examples are easy to understand, actually help the reader grasp the concepts, and can easily be extended to larger and more complicated problems. These chapters can also serve as an entry point for a more detailed study of big data analytics.

Chapter 8, my favourite one, presents how to troubleshoot and test MapReduce jobs. Since testing and troubleshooting are very important, and sometimes can be very painful, this chapter provides all the needed information on tools and techniques: how to debug MapReduce jobs, and also how to write your own tests. A chapter like this is very rare in most books, and is very important for all developers.
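The core idea behind that kind of testing is to isolate the map logic so it can be checked without a cluster. Here is a plain-Java sketch of that approach, assuming a hypothetical CSV record layout invented for the example; the book's actual recipes work against the Hadoop testing APIs rather than bare methods like this.

```java
// Sketch of testing mapper logic in isolation, without a Hadoop cluster.
// The record layout and method names are hypothetical, for illustration.
public class MapperTestSketch {

    // The logic under test: extract the 4-digit year field from a CSV record.
    static String extractYear(String csvRecord) {
        String[] fields = csvRecord.split(",");
        if (fields.length < 2 || !fields[1].matches("\\d{4}")) {
            return null; // malformed record: a real job would count and skip it
        }
        return fields[1];
    }

    public static void main(String[] args) {
        // Unit-style checks: feed known records, compare against expected output.
        if (!"2013".equals(extractYear("id42,2013,ok"))) throw new AssertionError();
        if (extractYear("id43,broken") != null) throw new AssertionError();
        System.out.println("all mapper checks passed");
    }
}
```

Keeping the per-record logic in a small, pure method like this is what makes such checks cheap to write, whichever test framework ends up driving them.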

Chapters 9–10 discuss maintenance, configuration and administration of Hadoop clusters, including some job-tuning hints. They also present technologies for storing big data, with scalability and distribution in mind.
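Much of that administration work comes down to editing a handful of XML configuration files. As one concrete (and deliberately minimal) example, the HDFS block replication factor is set in `hdfs-site.xml`; values other than the property name below are just illustrative defaults, not recommendations from the book.

```xml
<!-- hdfs-site.xml: illustrative fragment, not a recipe from the book -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value> <!-- number of replicas kept for each HDFS block -->
  </property>
</configuration>
```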
Overall, this is a great book, or better, a cookbook that provides real solutions to the reader. The code examples are very carefully selected to address the presented topics. It covers a wide range of tools and best practices, and can help developers understand and use these tools. I think the authors managed to understand developers' needs and provide the appropriate information.

Thomas Pliakas

Book link