<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ben Greenman | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/index.xml" rel="self" type="application/rss+xml"/><description>Ben Greenman</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/avatar_hu15b82112c71ccccb6e11566f71247303_189646_270x270_fill_lanczos_center_3.png</url><title>Ben Greenman</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/</link></image><item><title>Type Narrowing: A Language Design Benchmark</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uutah/type-narrowing/</link><pubDate>Sat, 01 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uutah/type-narrowing/</guid><description>&lt;p>Untyped languages such as JavaScript and Python provide a flexible starting
point for software projects, but eventually, the lack of reliable types
makes code hard to debug and maintain.
Gradually typed languages such
as
&lt;a href="https://www.typescriptlang.org/" target="_blank" rel="noopener">TypeScript&lt;/a>,
&lt;a href="https://flow.org/" target="_blank" rel="noopener">Flow&lt;/a>,
&lt;a href="https://www.mypy-lang.org/" target="_blank" rel="noopener">Mypy&lt;/a>,
and
&lt;a href="https://microsoft.github.io/pyright/#/" target="_blank" rel="noopener">Pyright&lt;/a>
address the problem with type checkers that can reason about an
ever-growing subset of untyped code.
Widening the subset with precise types is an ongoing challenge.&lt;/p>
&lt;p>Furthermore, designs for precise gradual types need to be reproducible
across languages.
Ideas that work well in one language need to be validated
in other contexts in a principled, scientific way to separate
deep insights from language-specific hacks.&lt;/p>
&lt;p>Type narrowing is a key feature of gradual languages.
Narrowing uses type tests in code to refine types and push
information forward along the paths that the program may follow.
For example, when a type test checks an object field, later
code can trust the type of the field:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">// item :: JSON Object
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">if typeof(item[&amp;#34;price&amp;#34;]) == &amp;#34;number&amp;#34;:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // item :: JSON Object,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // where field &amp;#34;price&amp;#34; :: Number
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> return item[&amp;#34;price&amp;#34;] + (item[&amp;#34;price&amp;#34;] * 0.30) // add tax
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Nearly every gradual language agrees that &lt;em>some form&lt;/em> of type narrowing is needed,
but there is widespread disagreement about how much support is enough.
TypeScript lets users define custom type tests, but it does not analyze
those tests to see whether they are reliable.
Flow does analyze tests.
TypeScript does not allow asymmetric type tests (example: &lt;code>is_even_number&lt;/code>),
but Flow, Mypy and Pyright all do!
None of the above track information compositionally through program
execution, but another gradual language called Typed Racket does.
Is the extra machinery in Typed Racket really worth the effort?&lt;/p>
&lt;p>Over the past several months, we have curated a language design
benchmark for type narrowing, &lt;strong>If-T&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/utahplt/ift-benchmark" target="_blank" rel="noopener">https://github.com/utahplt/ift-benchmark&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The benchmark presents type system challenges in a language-agnostic way
to facilitate reproducibility across languages.
It also includes a &lt;a href="https://github.com/utahplt/ifT-benchmark/blob/main/DATASHEET.md" target="_blank" rel="noopener">&lt;em>datasheet&lt;/em>&lt;/a>
to encourage cross-language comparisons
that focus on fundamental typing features rather than incidental differences
between languages.
So far, we have implemented the benchmark for five gradual languages.
There are many others to explore, and much more to learn.&lt;/p>
&lt;p>The goal of this project is to replicate and extend the If-T type narrowing
benchmark.
Outcomes include a deep understanding of principled type narrowing,
and of how to construct a benchmark that enables reproducible
cross-language comparisons.&lt;/p>
&lt;p>Related Work:&lt;/p>
&lt;ul>
&lt;li>Type Narrowing in TypeScript
&lt;a href="https://www.typescriptlang.org/docs/handbook/2/narrowing.html" target="_blank" rel="noopener">https://www.typescriptlang.org/docs/handbook/2/narrowing.html&lt;/a>&lt;/li>
&lt;li>Type Narrowing in Python
&lt;a href="https://typing.readthedocs.io/en/latest/spec/narrowing.html#typeguard" target="_blank" rel="noopener">https://typing.readthedocs.io/en/latest/spec/narrowing.html#typeguard&lt;/a>&lt;/li>
&lt;li>Logical Types for Untyped Languages
&lt;a href="https://doi.org/10.1145/1863543.1863561" target="_blank" rel="noopener">https://doi.org/10.1145/1863543.1863561&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="evaluate-new-gradual-languages">Evaluate New Gradual Languages&lt;/h3>
&lt;ul>
&lt;li>Topics: &lt;code>benchmark implementation&lt;/code>, &lt;code>programming languages&lt;/code>, &lt;code>types&lt;/code>&lt;/li>
&lt;li>Skills: Ruby, Lua, Python, Clojure, or PHP&lt;/li>
&lt;li>Difficulty: Medium&lt;/li>
&lt;li>Size: Small&lt;/li>
&lt;li>Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/">Ben Greenman&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Bring the If-T Benchmark to new type checkers.
Examples include
&lt;a href="https://sorbet.org/" target="_blank" rel="noopener">Sorbet&lt;/a>,
&lt;a href="https://hacklang.org/" target="_blank" rel="noopener">Hack&lt;/a>,
&lt;a href="https://luau.org/" target="_blank" rel="noopener">Luau&lt;/a>,
&lt;a href="https://pyre-check.org/" target="_blank" rel="noopener">Pyre&lt;/a>,
&lt;a href="https://github.com/facebookincubator/cinder" target="_blank" rel="noopener">Cinder / Static Python&lt;/a>,
&lt;a href="https://typedclojure.org/" target="_blank" rel="noopener">Typed Clojure&lt;/a>,
and
(potentially) &lt;a href="https://elixir-lang.org/blog/2024/06/12/elixir-v1-17-0-released/" target="_blank" rel="noopener">Elixir&lt;/a>.
Conduct a scientific, cross-language analysis to discuss the implications
of benchmark results.&lt;/p>
&lt;h3 id="do-unsound-narrowings-lead-to-exploits">Do Unsound Narrowings Lead to Exploits?&lt;/h3>
&lt;ul>
&lt;li>Topics: &lt;code>corpus study&lt;/code>, &lt;code>types&lt;/code>, &lt;code>counterexamples&lt;/code>&lt;/li>
&lt;li>Skills: TypeScript or Python&lt;/li>
&lt;li>Difficulty: Medium&lt;/li>
&lt;li>Size: Small&lt;/li>
&lt;li>Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/">Ben Greenman&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Investigate type narrowing in practice through a corpus study of software projects.
Use the GitHub or Software Heritage APIs to search code for user-defined predicates
and other instances of narrowing. Search for vulnerabilities due to the unsound
typing of user-defined predicates.&lt;/p></description></item><item><title>Static Python Perf: Measuring the Cost of Sound Gradual Types</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uutah/static-python-perf/</link><pubDate>Sat, 06 Jan 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uutah/static-python-perf/</guid><description>&lt;p>Gradual typing is a solution to the longstanding tension between typed and
untyped languages: let programmers write code in any flexible language (such
as Python), equip the language with a suitable type system that can describe
invariants in part of a program, and use run-time checks to ensure soundness.&lt;/p>
&lt;p>For now, though, the cost of run-time checks can be enormous.
Order-of-magnitude slowdowns are common. This high cost is a main reason why
TypeScript is unsound by design &amp;mdash; its types are not trustworthy in order
to avoid run-time costs.&lt;/p>
&lt;p>Recently, a team at Meta built a gradually typed variant of Python called
(&lt;em>drumroll&lt;/em>) Static Python. They report an incredible 4% increase in CPU
efficiency at Instagram thanks to the sound types in Static Python. This
kind of speedup is unprecedented.&lt;/p>
&lt;p>Other languages may want to follow the Static Python approach to gradual types,
but there are big reasons to doubt the Instagram numbers:&lt;/p>
&lt;ul>
&lt;li>the experiment code is closed source, and&lt;/li>
&lt;li>the experiment itself is not easily reproducible (even for Instagram!).&lt;/li>
&lt;/ul>
&lt;p>Static Python needs a rigorous, reproducible performance evaluation to test
whether it is indeed a fundamental advance for gradual typing.&lt;/p>
&lt;p>Related Work:&lt;/p>
&lt;ul>
&lt;li>Gradual Soundness: Lessons from Static Python
&lt;a href="https://programming-journal.org/2023/7/2/" target="_blank" rel="noopener">https://programming-journal.org/2023/7/2/&lt;/a>&lt;/li>
&lt;li>Producing Wrong Data Without Doing Anything Obviously Wrong!
&lt;a href="https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf" target="_blank" rel="noopener">https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf&lt;/a>&lt;/li>
&lt;li>On the Cost of Type-Tag Soundness
&lt;a href="https://users.cs.utah.edu/~blg/resources/pdf/gm-pepm-2018.pdf" target="_blank" rel="noopener">https://users.cs.utah.edu/~blg/resources/pdf/gm-pepm-2018.pdf&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="design-and-run-an-experiment">Design and Run an Experiment&lt;/h3>
&lt;ul>
&lt;li>Topics: &lt;code>performance&lt;/code>, &lt;code>cluster computing&lt;/code>, &lt;code>statistics&lt;/code>&lt;/li>
&lt;li>Skills: Python AST parsing, program generation, scripting, measuring performance&lt;/li>
&lt;li>Difficulty: Medium&lt;/li>
&lt;li>Size: Medium (175 hours)&lt;/li>
&lt;li>Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/">Ben Greenman&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Design an experiment that covers the space of gradually typed Static Python programs
in a fair way. Since every variable in a program can have up to 3 different types,
there are easily 3^20 possibilities in small programs &amp;mdash; far too many to measure
exhaustively.&lt;/p>
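&lt;p>The size of that space is easy to check, and one possible response is to sample it.
The sketch below is only an illustration: uniform random sampling, the helper name &lt;code>sample_configurations&lt;/code>, and the encoding of a configuration as a tuple of type indices are all assumptions, not the experiment design this project calls for.&lt;/p>

```python
import random

TYPE_CHOICES = 3     # from the text: up to 3 types per variable
NUM_VARIABLES = 20   # from the text: a small program

# Number of distinct typed configurations of one small program.
total = TYPE_CHOICES ** NUM_VARIABLES
print(total)  # 3486784401


def sample_configurations(num_samples: int, seed: int = 0) -> list[tuple[int, ...]]:
    """Draw configurations uniformly at random (hypothetical helper).

    Each configuration assigns one type index (0, 1, or 2) to each variable.
    A fixed seed keeps the sample reproducible across runs.
    """
    rng = random.Random(seed)
    return [
        tuple(rng.randrange(TYPE_CHOICES) for _ in range(NUM_VARIABLES))
        for _ in range(num_samples)
    ]
```

&lt;p>A fair design would also need to justify how many samples are enough; this sketch does not address that question.&lt;/p>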
&lt;p>Run the experiment on an existing set of benchmarks using a cluster such as CloudLab.
Manage the cluster machines across potentially dozens of reservations and combine
the results into one comprehensive view of Static Python performance.&lt;/p>
&lt;h3 id="derive-benchmarks-from-python-applications">Derive Benchmarks from Python Applications&lt;/h3>
&lt;ul>
&lt;li>Topics: &lt;code>types&lt;/code>, &lt;code>optimization&lt;/code>, &lt;code>benchmark design&lt;/code>&lt;/li>
&lt;li>Skills: Python&lt;/li>
&lt;li>Difficulty: Medium&lt;/li>
&lt;li>Size: Small to Large&lt;/li>
&lt;li>Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/">Ben Greenman&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Build or find realistic Python applications, equip them with rich types,
and modify them to run a meaningful performance benchmark. Running a benchmark
should produce timing information, and the timing should not be significantly
influenced by random variables, I/O actions, or system events.&lt;/p></description></item></channel></rss>