I recently decided that it would be a good idea to show you some of the books I read and share my opinion of them with you. So here I want to introduce you to a book about HP Vertica named "HP Vertica Essentials", written by Rishabh Agrawal.
I have worked with HP Vertica in the past and have done some setup operations, querying and aggregation. Until now I had read only the official docs, so when I saw this book about HP Vertica, I decided to read it.
First of all I want to point out that this book is mostly for DBAs, but some chapters will be interesting for developers too. If you're a DBA and would like to know more about Vertica, or at least want to know how to set up a Vertica server to play with, then this book is for you.
About the author
Rishabh Agrawal is a senior database research engineer and consultant at Impetus India. He has been working with different databases for the last four years, including relational databases, MPP databases and NoSQL databases. Here I want to point out that I like to read books written by real developers and database administrators, people with practical experience, not theorists. In my opinion only such people have enough knowledge to teach others. That's why I was highly motivated to read this book.
Let's quickly go through the chapters, but first I want to say a few words. The first thing I noticed was that this book is really short - only ~80 pages. HP Vertica's documentation is quite large (for comparison, the MySQL documentation is also really big, nearly 3500 pages), so it's hard to understand how the author could describe and explain so many aspects of Vertica in only 80 pages.
Chapter 1: Installing Vertica
This chapter is interesting both for developers and DBAs. Here we have a step-by-step guide on how to install a Vertica server and create the first database. It's easy to follow and a quick way to set up a Vertica server. After that, depending on your needs, you'll be able to play with it and explore queries or, if you're a DBA, read about more specific and complex admin tasks.
In this chapter Rishabh gives some advice about host configuration to make the Vertica installation go smoothly, covering things such as swap space, CPU frequency scaling and so on. I would recommend removing all the unnecessary install_vertica script output: it is different for every reader, it doesn't add much, and it takes up a lot of space in a really short book.
You'll probably have some issues installing Vertica on your own OS, so I would recommend that the author add a link to Vertica's Community, which can save you time, help you solve your issues and let you continue on your way with Vertica.
Chapter 2: Cluster Management
I like how Rishabh Agrawal describes the configuration of an elastic cluster: when and why we need it, what it can do for us and so on, all in an easy to read and understand manner. He explains all the operations that we may need when working with a Vertica cluster: adding, removing and replacing nodes, changing K-safety and local segmentation of nodes. He shows us how he does it and what we should do before performing these operations, for example backing up our data, checking that the hosts can reach each other and so on. Rishabh also mentions Vertica's Management Console, which is available in the enterprise edition only. In my opinion you won't buy the enterprise edition if you're just doing research and, for example, want to figure out whether your company needs this technology or not. Of course you want to set it up quickly, play with it and then make a decision, so I would remove everything related to Vertica's Management Console from this book.
If you follow Rishabh's instructions step by step, you will probably need to read chapter 4, about backup/restore operations, first and then come back to this chapter.
Chapter 3: Monitoring Vertica
Here we become familiar with two ways to monitor Vertica. The first one is using system tables and the second one is monitoring through log files. As in the previous chapter, I would remove the last approach covered, the one related to the Management Console. I would also like the author to describe system tables in more detail: it would be great to hear from him which system tables matter most on a day-to-day basis. Finally, the author gives a good description of Vertica's events and how we can check them via system tables.
Chapter 4: Backup and Restore
In this chapter the author describes how to do full and incremental backups and how to restore data using vbr.py. It's an important chapter both for DBAs and developers, because it's critical to avoid data loss.
Chapter 5: Performance Improvement
Here Rishabh Agrawal explains projections, both segmented and unsegmented, and in which cases you should use one or the other. Then the author describes how to create projections for some tables from a set of your queries using Database Designer, a tool for creating projections. We also get an example of how to create a projection manually. In the second part of the chapter, the author describes the tuple mover, its operations and how we can optimize its work. Here I would like to see at least some measurements showing the performance improvement, in percent or something similar, that the author achieved after applying all the modifications. I would like to know whether all these improvements are worth my time.
Chapter 6: Bulk Loading
This chapter is important for developers and DBAs too. At my previous job I used bulk loading in a lot of cases. The author explains different load methods like auto, direct and trickle, and he shows examples of loading data from different places, including copying from a local location and so on.
I would add more examples to this chapter, because right now I don't see the more complex cases where you need to skip or transform some columns from your original data file before copying it into a table.
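To make that kind of case concrete, here is a minimal sketch of my own in Python (it is not taken from the book). It assumes you clean the data file yourself before loading it, rather than relying on any feature of Vertica's loader. The column names, the sample data and the helper names `prepare_rows` and `write_clean_csv` are all hypothetical and chosen only for illustration.

```python
import csv
import io


def prepare_rows(source, keep, transforms=None):
    """Yield rows that contain only the `keep` columns, in that order.

    `transforms` maps a column name to a function applied to its value;
    columns without an entry are passed through unchanged.
    """
    transforms = transforms or {}
    for row in csv.DictReader(source):
        yield [transforms.get(col, str)(row[col]) for col in keep]


def write_clean_csv(source, target, keep, transforms=None):
    """Write a cleaned copy of `source` to `target`; return the row count."""
    writer = csv.writer(target, lineterminator="\n")
    writer.writerow(keep)  # header line for the cleaned file
    count = 0
    for row in prepare_rows(source, keep, transforms):
        writer.writerow(row)
        count += 1
    return count


# Hypothetical input: one column we want to skip, one we want to tidy up.
raw = io.StringIO(
    "id,name,internal_note,signup_date\n"
    "1, Alice ,skip me,2014-05-01\n"
    "2,Bob,skip me too,2014-06-15\n"
)
clean = io.StringIO()

rows_written = write_clean_csv(
    raw,
    clean,
    keep=["id", "name", "signup_date"],  # internal_note is dropped
    transforms={"name": str.strip},      # trim stray whitespace in names
)
print(clean.getvalue())
```

The cleaned file can then be bulk loaded as usual. This is exactly the kind of extra step I would have liked to see discussed in the chapter, ideally together with whatever the loader itself offers for such cases.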
This book won't solve all your problems, but it's a good place to start. It will probably make your start with HP Vertica a little more comfortable and straightforward, but it won't give you a complete understanding of Vertica. I think that my time wasn't wasted and now I know a little more about Vertica's administration, but you will definitely need to read the official documentation after this book anyway. Chapter 1 (installing Vertica) and chapters 4 to 6 (backup and restore, projections and bulk loading) are interesting not only for DBAs, but for developers too. This book is helpful for beginners, but I think people who already have some experience with HP Vertica will find interesting parts or chapters in it too.
P.S. You can find this book here