When I started in Big Data, I figured that using AWS was a great idea. Hell, it was free.
And then you go to bed only to remember in the morning that you left 6 clusters running because you're an idiot. Turns out that those mistakes will suck the juice out of your credit card faster than an Omega Compact CNC80.
Like in days past (PHP, MySQL, etc.), you need to do this stuff on your crappy laptop before you can step into the big leagues.
Lucky for us all, it's possible to do now, and it's only a little bit of a pain in the ass.
If you're on a Mac, you're on your own. I can't afford a precious little Mac, so get a $300 Dell and do the following...
First you need a little Oracle tech: VirtualBox, which you can grab for free. The only trick is that it's huge:
Click the big green button and go away on vacation; once you're back, install whatever it is that you got.
On its own, VirtualBox is virtually useless until you add the Hortonworks Sandbox (HDP). This is also free, but it's a monster, so grab it here:
... then go on vacation for a few weeks while it downloads. Once it's done, run VirtualBox, select File > Import Appliance, and pick the sandbox you downloaded.
One huge caveat: there are many versions of the Hortonworks Sandbox. I would suggest getting a handful of different versions from the archives. My eventual go-to version was 2.6.5; it seems a tad lighter than the newer ones, but feel free to experiment. I've got at least 4 versions loaded and ready to go, and you may have better luck with the 3.x series than I did.
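If you do end up juggling several sandbox versions, clicking through File > Import Appliance for each one gets old. Here's a minimal Python sketch that shells out to VBoxManage, VirtualBox's command-line tool, assuming it's on your PATH. The .ova paths are whatever you downloaded, and the script name in the usage line is just a placeholder; nothing here is Hortonworks-specific.

```python
import subprocess
import sys


def import_command(appliance_path):
    """argv for `VBoxManage import`, the CLI counterpart of File > Import Appliance."""
    return ["VBoxManage", "import", appliance_path]


def parse_vm_names(listing):
    """Extract VM names from `VBoxManage list vms` output.

    Each line looks like: "Some VM Name" {uuid}
    """
    names = []
    for line in listing.splitlines():
        line = line.strip()
        end = line.rfind('" {')
        if line.startswith('"') and end > 0:
            names.append(line[1:end])
    return names


def run(argv):
    """Run a command, raise CalledProcessError if it fails, and return its stdout."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def main(paths):
    if not paths:
        print("usage: import_sandboxes.py SANDBOX.ova [MORE.ova ...]")
        return
    for path in paths:
        # Importing a multi-gigabyte appliance can take a good while.
        run(import_command(path))
    # List every VM VirtualBox now knows about, so you can see each version you loaded.
    for name in parse_vm_names(run(["VBoxManage", "list", "vms"])):
        print(name)


if __name__ == "__main__":
    main(sys.argv[1:])
```

The same tool also has `VBoxManage list runningvms`, which is worth a glance before bed so a forgotten sandbox isn't eating your RAM overnight.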
Once it's loaded, just start your Sandbox and follow the on-screen instructions to get to a visual Hadoop UI.
You will definitely also need a command line into it, so make sure to get PuTTY or some other SSH client.
The UI's opening screen will tell you what address to SSH into; take note of it and log in. It's probably something like...
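The sandbox takes a while to boot, and SSH will just refuse connections until it's up. If you'd rather not babysit the window, here's a small Python sketch that starts the VM headless with VBoxManage and then polls the SSH address until a TCP connection succeeds. The VM name, host, and port are placeholders you pass in: use the name VirtualBox shows for your import and the address from the splash screen. The script name in the usage line is hypothetical too.

```python
import socket
import subprocess
import sys
import time


def start_command(vm_name):
    """argv that boots a VM without opening a VirtualBox window."""
    return ["VBoxManage", "startvm", vm_name, "--type", "headless"]


def wait_for_port(host, port, timeout=600.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the guest's SSH daemon isn't up yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def main(args):
    if len(args) != 3:
        print("usage: boot_sandbox.py VM_NAME SSH_HOST SSH_PORT")
        return
    vm_name, host, port = args[0], args[1], int(args[2])
    subprocess.run(start_command(vm_name), check=True)
    if wait_for_port(host, port):
        print(f"SSH is answering on {host}:{port}")
    else:
        print("Gave up waiting; check the VM console in VirtualBox")


if __name__ == "__main__":
    main(sys.argv[1:])
```

An open port only means sshd is listening; the rest of the Hadoop services may still be starting up behind it.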
The default user is maria_dev (same password). At some point, though, you'll need to log in as an admin, so just get it over with now:
The default admin login is 'admin' and 'hadoop'. Change this in your SSH session first:
You'll have to restart everything all the time. It's time-consuming but not difficult.
For the record, the above command is the most useful thing in the whole repertoire. Whenever ANYTHING goes wrong with Ambari, pop it into your console and see if that fixes it.
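When everything needs a restart, Ambari can also do it over its REST API instead of through the UI. Below is a minimal Python sketch, assuming Ambari is reachable at the URL you log in to (usually port 8080) and that you know your cluster's name, which shows in the top-left of the Ambari UI. Setting every service's state to INSTALLED stops it and STARTED starts it. Ambari queues each call as an asynchronous request, so let the stop finish (watch Background Operations in the UI) before sending the start. The function names here are my own, not part of Ambari.

```python
import base64
import json
import urllib.request


def build_state_request(base_url, cluster, state, user, password):
    """Build a PUT asking Ambari to move every service in `cluster` to `state`.

    "INSTALLED" means stopped, "STARTED" means running.
    """
    if state not in ("INSTALLED", "STARTED"):
        raise ValueError("state must be INSTALLED (stop) or STARTED (start)")
    body = {
        "RequestInfo": {"context": f"Set all services to {state}"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    url = f"{base_url.rstrip('/')}/api/v1/clusters/{cluster}/services"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            # Ambari rejects modifying API calls that lack this header.
            "X-Requested-By": "ambari",
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


def send(req):
    """Send the request and return (HTTP status, response body)."""
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode()
```

A stop is then `send(build_state_request(url, cluster, "INSTALLED", "admin", password))` with your own address, cluster name, and admin password, followed later by the same call with "STARTED".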