{"595496":{"#nid":"595496","#data":{"type":"event","title":"PACE Big Data Workshop","body":[{"value":"\u003Cp\u003EAbout this workshop:\u003Cbr \/\u003E\r\n\u003Cbr \/\u003E\r\nThis workshop is sponsored by the NSF\u0026#39;s XSEDE (The Extreme Science and Engineering Discovery Environment, \u003Ca href=\u0022https:\/\/www.xsede.org\/\u0022 id=\u0022LPlnk509258\u0022 rel=\u0022noopener noreferrer\u0022 target=\u0022_blank\u0022\u003Ehttps:\/\/www.xsede.org\/\u003C\/a\u003E) program. Staff members from the Texas Advanced Computing Center (\u003Ca href=\u0022https:\/\/www.tacc.utexas.edu\/\u0022 rel=\u0022noopener noreferrer\u0022 target=\u0022_blank\u0022\u003Ehttps:\/\/www.tacc.utexas.edu\/\u003C\/a\u003E) will teach the workshop. The workshop is organized as four separate sessions to cover various topics in Big Data Analysis.\u0026nbsp; Although participants are strongly encouraged to attend all sessions, the workshop is designed so that participants may attend only selected sessions based on their background, schedule and needs.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAbout Instructors:\u003Cbr \/\u003E\r\n\u003Cbr \/\u003E\r\nRuizhu Huang is a research associate in the data intensive computing group at TACC. He has years of experience in big data analytics, machine learning, and data visualization. He has been involved in various projects developing technologies that bridge the gap between traditional machine learning approaches and next-generation, data-intensive computing methods involving High-Performance Computing (HPC) resources.\u003Cbr \/\u003E\r\n\u003Cbr \/\u003E\r\nAmit Gupta is a Research Engineering\/Scientist Associate III in the Data Mining and Statistics group at TACC. His research interests are in Distributed Systems and Tools to enable scaling of Big Data Applications on HPC infrastructure, Parallel Programming and Information Retrieval Systems for text. 
He has extensive experience with various applications ranging from scaling Transportation Simulations to Text Mining of Biological literature. He earned an MS in Computer Science from the University of Colorado at Boulder with thesis research in the area of Operating Systems.\u003Cbr \/\u003E\r\n\u003Cbr \/\u003E\r\nDr. Weijia Xu is a research scientist and manager of the Data Mining and Statistics group at TACC. He received his Ph.D. in Computer Science from The University of Texas at Austin. Dr. Xu has over 50 peer-reviewed conference and journal publications in similarity-based data retrieval, data analysis, and information visualization with data from various scientific domains. He has served on program committees for several workshops and conferences in the big data and high-performance computing areas.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPart One: Introduction to Hadoop and Spark [\u003Ca class=\u0022x_OWAAutoLink\u0022 href=\u0022http:\/\/training.gatech.edu\/courses\/searchupcoming#view-14949\u0022 rel=\u0022noopener noreferrer\u0022 target=\u0022_blank\u0022\u003Eregister here\u003C\/a\u003E]\u003C\/p\u003E\r\n\r\n\u003Cdiv\u003ETime: Sept 28 08:30am-12:30pm\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ELocation: Marcus Nano Rm 1116\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ECapacity: 30 people\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EThis session will introduce Hadoop and Spark clusters to beginners. Topics include:\u003C\/div\u003E\r\n\r\n\u003Cul\u003E\r\n\t\u003Cli\u003Ebasic concepts used in the MapReduce programming model\u003C\/li\u003E\r\n\t\u003Cli\u003Emajor components of a Hadoop cluster\u003C\/li\u003E\r\n\t\u003Cli\u003Ehow to get started with Hadoop on your own computer and with computing resources at TACC\u003C\/li\u003E\r\n\t\u003Cli\u003ESpark programming models and how Spark can work with a Hadoop cluster\u003C\/li\u003E\r\n\t\u003Cli\u003Edifferent ways to use Hadoop and Spark for 
analysis\u003C\/li\u003E\r\n\u003C\/ul\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EParticipants do not need any particular programming background, but working knowledge of the Linux operating system is preferred. Class includes 3 hours of lecture and 1 hour of hands-on practice.\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EA $25.00 no-show fee applies if you miss the session without cancelling 5 days before the class.\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EPart Two: Developing a scalable application with Spark [\u003Ca class=\u0022x_OWAAutoLink\u0022 href=\u0022http:\/\/training.gatech.edu\/courses\/searchupcoming#view-14950\u0022 rel=\u0022noopener noreferrer\u0022 target=\u0022_blank\u0022\u003Eregister here\u003C\/a\u003E]\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ETime: Sept 28 1:30pm-5:30pm\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ELocation: Marcus Nano Rm 1116\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ECapacity: 30 people\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EThis session will focus on how to develop a scalable application with the Spark programming model. Topics include:\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cul\u003E\r\n\t\u003Cli\u003Ereview of the Spark programming model\u003C\/li\u003E\r\n\t\u003Cli\u003Ebasic introduction to the Scala programming language\u003C\/li\u003E\r\n\t\u003Cli\u003Ehow to run a Spark application\u003C\/li\u003E\r\n\t\u003Cli\u003Ekey features for making an application scalable\u003C\/li\u003E\r\n\t\u003Cli\u003Ehow to get started with Spark development after the class\u003C\/li\u003E\r\n\u003C\/ul\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EParticipants are expected to have prior knowledge of Hadoop and Spark cluster concepts; 
knowledge of any programming language is preferred but not required. Class includes 3 hours of lecture and 1 hour of hands-on practice.\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EA $25.00 no-show fee applies if you miss the session without cancelling 5 days before the class.\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EPart Three: Common Practices on Hadoop and Spark Ecosystem [\u003Ca class=\u0022x_OWAAutoLink\u0022 href=\u0022http:\/\/training.gatech.edu\/courses\/searchupcoming#view-14951\u0022 rel=\u0022noopener noreferrer\u0022 target=\u0022_blank\u0022\u003Eregister here\u003C\/a\u003E]\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ETime: Sept 29 08:30am-12:30pm\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ELocation: Marcus Nano Rm 1116\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ECapacity: 30 people\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EThis session will focus on general practices for practical analysis problems. Topics include:\u003C\/div\u003E\r\n\r\n\u003Cul\u003E\r\n\t\u003Cli\u003Erunning batch jobs with different cluster deployment modes\u003C\/li\u003E\r\n\t\u003Cli\u003Erunning interactive jobs\u003C\/li\u003E\r\n\t\u003Cli\u003Eexploring existing libraries and applications, including Hadoop Streaming, MLlib, Spark SQL and GraphX\u003C\/li\u003E\r\n\t\u003Cli\u003Eusing Hadoop\/Spark with R and Python\u003C\/li\u003E\r\n\u003C\/ul\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EParticipants should have basic coding experience and be familiar with the Hadoop system and concepts of parallelism. 
Class includes 3 hours of lecture and 1 hour of hands-on practice.\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EA $25.00 no-show fee applies if you miss the session without cancelling 5 days before the class.\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003EPart Four: Advanced Topic on Big Data Analysis [\u003Ca class=\u0022x_OWAAutoLink\u0022 href=\u0022http:\/\/training.gatech.edu\/courses\/searchupcoming#view-14952\u0022 rel=\u0022noopener noreferrer\u0022 target=\u0022_blank\u0022\u003Eregister here\u003C\/a\u003E]\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003E\u0026nbsp;\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ETime: Sept 29 01:30pm-03:30pm\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ELocation: Marcus Nano Rm 1116\u003C\/div\u003E\r\n\r\n\u003Cdiv\u003ECapacity: 30 people\u003C\/div\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cdiv\u003EThis session will cover algorithms in more detail and provide hands-on consultation for GT researchers\u0026#39; applications. We will collect use cases before the session and walk through selected ones in detail to demonstrate how to solve real-world problems more efficiently.\u003C\/div\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EThis workshop is provided by Texas Advanced Computing Center (TACC) researchers. It aims to introduce the Big Data toolset to GT researchers and to help them map their research problems to the Big Data world and find solutions to the problems at hand. 
There are four sessions, and researchers can choose one or more sessions to attend based on programming level and experience.\u003C\/p\u003E\r\n","format":"limited_html"}],"field_summary_sentence":[{"value":"Big Data Training"}],"uid":"34003","created_gmt":"2017-09-05 16:49:22","changed_gmt":"2017-09-11 16:34:36","author":"fliu67","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2017-09-28T01:00:00-04:00","event_time_end":"2017-09-29T01:00:00-04:00","event_time_end_last":"2017-09-29T01:00:00-04:00","gmt_time_start":"2017-09-28 05:00:00","gmt_time_end":"2017-09-29 05:00:00","gmt_time_end_last":"2017-09-29 05:00:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"337231","name":"Georgia Tech High Performance Computing (PACE)"}],"categories":[],"keywords":[{"id":"15092","name":"big data"},{"id":"175412","name":"Hadoop"},{"id":"167041","name":"spark"},{"id":"9167","name":"machine learning"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1789","name":"Conference\/Symposium"}],"invited_audience":[{"id":"78761","name":"Faculty\/Staff"},{"id":"78771","name":"Public"},{"id":"174045","name":"Graduate students"},{"id":"78751","name":"Undergraduate students"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003EFang (Cherry) Liu (Ph.D.)\u003C\/p\u003E\r\n\r\n\u003Cp\u003Efang.liu at gatech.edu\u003C\/p\u003E\r\n","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}