<node id="670825">
  <nid>670825</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1698859580</created>
  <changed>1698859580</changed>
  <title><![CDATA[PhD Proposal by Matthew Whitlock]]></title>
  <body><![CDATA[<p><span><span><strong><span><span><span>Title:</span></span></span></strong><span><span><span>&nbsp;Designing and Automating Asynchronous, Localized, Multi-Level Fault-Tolerance at the Application Level</span></span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><strong><span><span><span>Date: </span></span></span></strong><span><span><span>Monday, Nov 6, 2023</span></span></span></span></span></p>

<p><span><span><strong><span><span><span>Time: </span></span></span></strong><span><span><span>4:15 pm - 6:15 pm ET</span></span></span></span></span></p>

<p><span><span><strong><span><span><span>Location: </span></span></span></strong><span><span><span>Klaus 3402 and <a href="https://teams.microsoft.com/l/meetup-join/19%3ameeting_NmU3ZTdkYWMtYjk3ZC00ZGRlLWI5NWUtZDEzMmQwZDU4Yjdl%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%221ca2054b-a945-4131-9ad7-f3552a880051%22%7d" title="https://teams.microsoft.com/l/meetup-join/19%3ameeting_NmU3ZTdkYWMtYjk3ZC00ZGRlLWI5NWUtZDEzMmQwZDU4Yjdl%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%221ca2054b-a945-4131-9ad7-f3552a880051%22%7d">Virtual</a></span></span></span></span></span></p>


<p>&nbsp;</p>

<p><span><span><strong><span><span><span>Matthew Whitlock</span></span></span></strong></span></span></p>

<p><span><span><span><span><span>Ph.D. Student</span></span></span></span></span></p>

<p><span><span><span><span><span>School of Computer Science</span></span></span></span></span></p>

<p><span><span><span><span><span>Georgia Institute of Technology</span></span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><strong><span><span><span>Committee:</span></span></span></strong></span></span></p>

<p><span><span><span><span><span>Dr. Vivek Sarkar (Advisor) - School of Computer Science, Georgia Institute of Technology</span></span></span></span></span></p>

<p><span><span><span><span><span><span>Dr. Keita Teranishi - Programming Systems, Oak Ridge National Laboratory</span></span></span></span></span></span></p>

<p><span><span><span><span><span>Dr. Ada Gavrilovska -&nbsp;<span>School of Computer Science, Georgia Institute of Technology</span></span></span></span></span></span></p>

<p><span><span><span><span><span>Dr. Tom Conte -&nbsp;<span>School of Computer Science, Georgia Institute of Technology</span></span></span></span></span></span></p>

<p><span><span><span><span><span><span>Dr. Umakishore Ramachandran -&nbsp;</span></span></span></span><span><span><span><span>School of Computer Science, Georgia Institute of Technology</span></span></span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><strong><span><span><span><span>Abstract:</span></span></span></span></strong></span></span></p>

<p><span><span><span><span><span><span>Though hardware reliability improvements have extended the lifespan of traditional, inefficient application resilience based on global Checkpoint/Recovery (C/R), it is increasingly apparent that this burden has a cost. Year over year, chips implement more complex functions and components to handle the reliability impacts of meeting performance, power, and density demands of next-gen computing. Software-based resilience is no longer just a necessary burden for long-running applications – it is a key component of hardware/software codesign that opens the door for improvements in component performance, efficiency, and cost. However, application-level resilience must display certain key properties before these benefits can be realized at large scales. The traditional global teardown-restart response to failures compounds the costs of faults and quickly reaches a scalability cliff; resilience designs must localize the cost of fault tolerance with online process recovery, asynchronous checkpointing, and preservation of progress on processes distant from those lost. Meeting these standards requires complex, multi-layered resilience designs of the type developers are reluctant to implement. We extend existing state-of-the-art resilience tools and design new approaches for simplifying and enhancing the most difficult aspects of contemporary fault tolerance. With them, we implement and evaluate algorithms capable of maintaining performance even as fault rates exceed checkpoint rates. Through integrations with modern programming models and composable layers of resilience, we demonstrate highly effective avenues for relieving the burden of implementing optimal application- and platform-tailored resilience in complex, asynchronous, and dynamic programs.</span></span></span></span></span></span></p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Designing and Automating Asynchronous, Localized, Multi-Level Fault-Tolerance at the Application Level]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p><span><span><span><span><span>Designing and Automating Asynchronous, Localized, Multi-Level Fault-Tolerance at the Application Level</span></span></span></span></span></p>
]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2023-11-06T16:15:00-05:00]]></value>
      <value2><![CDATA[2023-11-06T18:15:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[Klaus 3402 and Virtual]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>102851</tid>
        <value><![CDATA[Phd proposal]]></value>
      </item>
      </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
