SPRING for APACHE HADOOP CHANGELOG ================================== http://www.springsource.org/spring-data/hadoop Commit changelog: http://github.com/SpringSource/spring-hadoop/tree/v[version] Issues changelog: http://jira.springsource.org/secure/ReleaseNote.jspa?projectId=10801 Changes in version 1.0 GA (2013-02-26) -------------------------------------- General * Adjusted 'input-path' attribute in 'job' element from required to optional * Added scope attribute to all runner namespace elements * Improved Hive/Pig namespace to allow property placeholder replacement Package o.s.data.hadoop.cascading * Replaced HadoopFlowFactoryBean/cascading-flow jarSetup param/attribute with addCascadingJars Package o.s.data.hadoop.fs * Fixed incorrect DistCp.copy() method * Removed 'GenericOptionsParser' warning when invoking DistCp * Added user impersonation to DistCp utility * Improved execution of chmod in Hadoop 2.x distros Package o.s.data.hadoop.mapreduce * Reverted job default task execution to synchronous * Improved job behaviour when setting no-wait flag Package o.s.data.hadoop.pig * Improved registerScript method lookup in Pig 0.9 or higher Package o.s.data.hadoop.script * Refined ScriptTasklet class to cope with CGLIB class proxies Changes in version 1.0 RC2 (2013-01-21) --------------------------------------- General * Enhanced Cascading, Pig and MapReduce tasklets to expose execution stats to the Spring Batch infrastructure * Enhanced Cascading and MapReduce tasklets and runners to support cancellation * Introduced Cascading namespace * Improved spring-hadoop.xsd namespace * Extended test-suite to include CDH3u4, CDH4u2, Greenplum HD 1.2 and Apache Hadoop 1.1.x * Improved reference documentation * Upgraded to Cascading 2.1.2 * Upgraded to Hadoop 1.0.4 * Upgraded to Hive 0.10 * Upgraded to Pig 0.10.1 * Upgraded to Spring Integration 2.1.4 * Upgraded to Gradle 1.3 * Revised artifact Maven pom.xml to minimize number of dependencies * Removed any logging library declaration (project relies on the 'commons-logging' transitive dependency) * Removed o.s.data.hadoop.batch package Package o.s.data.hadoop.cascading * Refined local Taps timestamp for compatibility with Cascading 2.1 * Enhanced FlowFactoryBean to automatically add Cascading jars & dependencies to the target Flow when needed Package o.s.data.hadoop.configuration * Added top-level file-system/job-tracker uri attributes to the Configuration element Package o.s.data.hadoop.fs * Refined classpath setup for DistributedCache * Added warning regarding invalid classpath setup on Windows platform for DistributedCache (see HADOOP-9123) * Added work-around for early closing of the Hadoop file-system Package o.s.data.hadoop.hbase * Replaced HTable (class type) with HTableInterface (interface) * Improved HTable release process based on their producing factories * Added 'zk-host' and 'zk-port' attributes for ZooKeeper configuration to HBase element Package o.s.data.hadoop.mapreduce * Introduced depends-on attibute to Hadoop configuration, file-system and resource-loader elements for staged start-up * Refined the job configuration under Jar/Tool runner * Improved 'output' Job attribute by making it optional * Introduced utility for discovering a Job status from provisioning to completion Changes in version 1.0 RC1 (2012-10-07) --------------------------------------- General * Introduced Hive, Pig runner for declarative script execution * Refactored all (Cascading, M/R, Hive, Pig) runners as Callables instead of FactoryBeans * Renamed 'pig' to 'pig-factory' and 'pig-ref' to 'pig-factory-ref' * Renamed 'hive-client' to 'hive-client-factory' and 'hive-client-ref' to 'hive-client-factory-ref' * Introduced pre and post actions to all (Cascading, M/R, Hive, Pig) runners * Introduced embedded execution of Hadoop jars * Improved spring-hadoop.xsd namespace * Improved, refined and extended reference documentation * Improved artifacts pom * Upgraded to Spring Batch 2.1.9 * Upgraded to Hive 0.9.0 * Upgraded to Pig 0.10.0 * Upgraded to Gradle 1.2 Package o.s.data.hadoop.cascading * Introduced FlowFactoryBean Package o.s.data.hadoop.configuration * Fixed potential cycle with FileSystem url registration Package o.s.data.hadoop.fs * Added codecs support to hdfs resources * Refined DistributedCache fragment creation for CDH4/Hadoop 0.23 distros * Introduced options for closing the FileSystem * Fine-tuned the DistributedCache API for setting cache entries Package o.s.data.hadoop.hbase * Refined resource management of HBase tables Package o.s.data.hadoop.hive * Addressed swallowed exception occuring script execution * Improved HiveQL parsing for multi-line statements * Introduced variable binding and substitution per Hive script * Refined namespace to preserve parameter ordering * Introduced HiveClient factory (to deal with thread-safety issues) * Introduced HiveTemplate & callback * Introduced extended exception conversion to DataAccessException * Introduced HiveRunner Package o.s.data.hadoop.mapreduce * Introduced scope attribute for job definitions * Introduced verbose flag to job tasklet * Introduced more options for job and streaming namespace * Introduced jar executor * Refined Tool and Jar execution to prevent class loading leaks * Refactored JobRunner FactoryBean into a Callable * Introduced namespace for job-runner * Removed path validation from JobFactoryBean Package o.s.data.hadoop.pig * Refined namespace to preserve parameter ordering * Introduced PigServer factory (to deal with thread-safety issues) * Introduced PigTemplate & callback * Introduced extended exception conversion to DataAccessException * Refined execution of Pig scripts * Introduced PigRunner Package o.s.data.hadoop.scripting * Refactored HdfsScriptFactoryBean into HdfsScriptRunner * Script definitions no longer cause execution on container lookup Changes in version 1.0 M2 (2012-06-13) -------------------------------------- General * Introduced support for Hadoop Security * Enhanced namespace declaration * Introduced DAO support (Template & Callback) for HBase * Added pig and hbase samples * Upgraded to Hadoop 1.0.3 Package org.springframework.data.hadoop.cascading * Added local Taps for Spring Framework Resource * Added local Taps for Spring Integration MessageHandler and MessageSource Package org.springframework.data.hadoop.configuration * Refined creation of Configuration objects to be 'old' API aware Package org.springframework.data.hadoop.fs * Added support for hftp, hsftp and webhdfs Package org.springframework.data.hadoop.mapreduce * Added jar-based class-loading support * Introduced support for Hadoop generic options * Improved diagnostics in JobRunner and JobTasklet * Introduced identify mapper/reducer as defaults * Improved propagation of the thread context classloader (tccl) for Tool execution Package org.springframework.data.hadoop.scripting * Extended HdfsScriptFactory to allow direct wiring of Hadoop Configuration Changes in version 1.0 M1 (2012-02-23) -------------------------------------- General * Aligned build system with Spring Framework 3.2 * Improved namespace * Introduced support for Hadoop Tool * Introduced support for Cascading Package org.springframework.data.hadoop.batch * Introduce Spring Batch ItemReader for HDFS Package org.springframework.data.hadoop.configuration * Introduced more utility methods Package org.springframework.data.hadoop.mapreduce * Introduced support for Hadoop Tool * Fixed incorrect return value/type for JobRunner Changes in version 0.9 RELEASE (2012-02-06) ------------------------------------------- Spring XML namespace with support for creating and/or configuring - Hadoop Configuration object - MapReduce and Streaming Jobs - HBase configuration - Hive server and Thrift client - Pig server instances that register and execute scripts either locally or remotely - Hadoop DistributedCache Spring XML namespace for executing scripts authored in JSR233 compatible scripting languages Support for executing HDFS operations in Groovy, JRuby, Jython or Rhino based on Hadoop Configuration Embedded shell API for HDFS Spring Batch Integration - tasklets for - Map Reduce and Streaming jobs - Hive - Pig - Script execution Sample applications Reference documentation