
ntp
The ability to correlate events on a cluster is necessary to diagnose and fix problems. One of the common gotchas is forgetting to synchronize clocks between machines. Pick a node in the cluster (usually one of the master nodes) and make it a local NTP server for all other nodes. Details on configuring NTP properly are available at http://www.ntp.org/.

ssh
Hadoop itself does not rely on SSH [4], although it is incredibly useful for administration and debugging. Depending on the environment, developers may also have direct access to machines to view logs.

postfix/sendmail
While nothing in Hadoop sends email, it is sometimes useful to have an MTA that supports outbound email only. This lets automated tasks running from cron notify administrators of exceptional circumstances. Both postfix and sendmail are fine for this purpose.

rsync
One of the most underrated tools, rsync allows administrators to copy files efficiently locally and between hosts. If you're not already familiar with rsync, learn it.

[4] Some of the supporting shell scripts do use ssh to start and stop services on the cluster.

Hostnames, DNS, and Identification

Let's just get this out of the way: when it comes to host identification, discovery, and the treatment of hostnames, Hadoop is complicated and extremely picky. This topic is responsible for a fair number of cries for support on the mailing lists and almost certainly an equal amount of lost sleep on the part of many who are new to Hadoop.

But before we get into the list of things that can go wrong, let's first talk about how Hadoop actually discovers and identifies hosts. As we discussed previously, Hadoop worker processes such as the tasktracker and datanodes heartbeat into the jobtracker and namenode (respectively) every few seconds. The first time this occurs, Hadoop learns about the worker's existence. Part of this heartbeat includes the identity of the machine, either by hostname or by IP address. This identifier (again, either the hostname or the IP address) is how Hadoop will refer to this machine. This means that when an HDFS client, for instance, asks the namenode to open a file, the namenode will return this identifier to the client as the proper way in which to contact the worker. The implications of this are far-reaching: not only must the client and the worker be able to communicate directly, but the client must also be able to resolve the hostname and reach the worker using the identifier exactly as it was reported to the namenode. But what name does the datanode report to the namenode? That's the real question.

When the datanode starts up, it follows a rather convoluted process to discover the name of the machine. There are a few different configuration parameters that can affect the final decision. These parameters are covered in Chapter 5, but in its default configuration the datanode executes the following series of steps:

1. Get the hostname of the machine, as returned by Java's InetAddress.getLocalHost().
2. Canonicalize the hostname by calling InetAddress#getCanonicalHostName().
3. Set this name internally and send it to either the namenode or the jobtracker.

This seems simple enough. The only real question is what getLocalHost() and getCanonicalHostName() do under the hood. Unfortunately, this turns out to be platform-specific and sensitive to the environment in a few ways. On Linux, with the HotSpot JVM, getLocalHost() uses the POSIX gethostname(), which on Linux uses the uname() syscall. This has absolutely no relationship to DNS or /etc/hosts, although the name it returns is usually similar or even identical. The command hostname, for instance, exclusively uses gethostname() and sethostname(), whereas resolver functions such as gethostbyname() and gethostbyaddr() follow the normal Linux name resolution path. The former is how you interact with the hostname as the kernel sees it; the latter is what a command such as getent hosts exercises. Note that host and dig query DNS servers directly and do not consult /etc/hosts, so they are not a reliable test of that path.

The implementation of getLocalHost() on Linux gets the hostname of the machine and then immediately calls gethostbyname(). As a result, if the hostname doesn't resolve to an IP address, expect issues. Normally, this isn't a concern because there's usually at least an entry in /etc/hosts as a result of the initial OS installation. Oddly enough, on Mac OS X, if the hostname doesn't resolve, it still returns the hostname and the IP address active on the preferred network interface.

The second half of the equation is the implementation of getCanonicalHostName(), which has an interesting quirk
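
The default lookup described in the numbered steps (getLocalHost() followed by getCanonicalHostName()) can be reproduced outside Hadoop to see what name a worker machine would likely report. The following is a minimal sketch, not Hadoop's own code: the class name HostIdentityCheck and the method reportedName() are made up for illustration, and the sketch ignores the configuration parameters that can override the default behavior.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical helper (not part of Hadoop): repeats the two JVM calls
// that the default datanode name discovery is described as using.
class HostIdentityCheck {

    // Step 1: ask the JVM for the local host.
    // Step 2: canonicalize it.
    // Returns empty if the local hostname cannot be resolved to an address.
    static Optional<String> reportedName() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return Optional.of(local.getCanonicalHostName());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("hostname:  " + local.getHostName());
            System.out.println("address:   " + local.getHostAddress());
            System.out.println("canonical: " + local.getCanonicalHostName());
        } catch (UnknownHostException e) {
            // On Linux this typically means the kernel hostname has no
            // matching entry in /etc/hosts or DNS.
            System.err.println("Local hostname does not resolve: " + e.getMessage());
        }
    }
}
```

Running this on each node and comparing the output with what clients can resolve is a quick way to spot a mismatch before starting any daemons. If the canonical name differs from the name you expect clients to use, the name resolution setup on that machine deserves a closer look.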
