<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:media="http://search.yahoo.com/mrss/"
>

<channel>
	<title>Instruments &#8211; Wade Tregaskis</title>
	<atom:link href="https://wadetregaskis.com/tags/instruments/feed/" rel="self" type="application/rss+xml" />
	<link>https://wadetregaskis.com</link>
	<description></description>
	<lastBuildDate>Thu, 25 Jan 2024 18:25:43 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://wadetregaskis.com/wp-content/uploads/2016/03/Stitch-512x512-1-256x256.png</url>
	<title>Instruments &#8211; Wade Tregaskis</title>
	<link>https://wadetregaskis.com</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">226351702</site>	<item>
		<title>-fomit-frame-pointer</title>
		<link>https://wadetregaskis.com/fomit-frame-pointer/</link>
					<comments>https://wadetregaskis.com/fomit-frame-pointer/#respond</comments>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Wed, 24 Jan 2024 22:30:30 +0000</pubDate>
				<category><![CDATA[Ancient History]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Ramblings]]></category>
		<category><![CDATA[-fomit-frame-pointer]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[backtracing]]></category>
		<category><![CDATA[frame pointers]]></category>
		<category><![CDATA[i386]]></category>
		<category><![CDATA[Instruments]]></category>
		<category><![CDATA[Intel Core Duo]]></category>
		<category><![CDATA[Merom]]></category>
		<category><![CDATA[Shark]]></category>
		<category><![CDATA[Swift Forums]]></category>
		<category><![CDATA[x86-64]]></category>
		<category><![CDATA[Yonah]]></category>
		<guid isPermaLink="false">https://wadetregaskis.com/?p=7536</guid>

					<description><![CDATA[This is an elaboration of a post I made in a Swift Forums thread, SE-0419: Swift Backtracing API. The question was raised whether an official Swift backtracer should try to support code that doesn&#8217;t use frame pointers. Which immediately raised the question &#8211; in my mind &#8211; of if anyone is still using the &#8220;optimisation&#8221;&#8230; <a class="read-more-link" href="https://wadetregaskis.com/fomit-frame-pointer/" data-wpel-link="internal">Read more</a>]]></description>
										<content:encoded><![CDATA[
<p>This is an elaboration of <a href="https://forums.swift.org/t/se-0419-swift-backtracing-api/69595/13" data-wpel-link="external" target="_blank" rel="external noopener">a post I made</a> in a Swift Forums thread, <a href="https://forums.swift.org/t/se-0419-swift-backtracing-api/69595" data-wpel-link="external" target="_blank" rel="external noopener">SE-0419: Swift Backtracing API</a>.</p>



<p>The question was raised whether an official Swift backtracer should try to support code that doesn&#8217;t use frame pointers.  Which immediately raised the question &#8211; in my mind &#8211; of whether anyone is still using the &#8220;optimisation&#8221; of omitting frame pointers, anyway.  And perhaps more importantly, whether they <em>should</em> still be omitting frame pointers.</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<h4 class="wp-block-heading">What is a frame pointer?</h4>



<p>A pointer to a stack frame, <em>held in a well-known location</em>.  That location can be in the stack itself (forming a linked-list of the stack frames) or in registers (e.g. the x29 register on AArch64, or RBP register on x86-64).</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img fetchpriority="high" decoding="async" width="806" height="541" src="https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram.webp" alt="Explanatory diagram of frame pointers, showing a link from the x86-64 register %rbp to the start of the current frame, which holds the prior value of %rbp that points to the top of the previous frame, and so on." class="wp-image-7549" srcset="https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram.webp 806w, https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram-256x172.webp 256w, https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram-512x344.webp 512w, https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram@2x.webp 1612w" sizes="(max-width: 806px) 100vw, 806px" /><figcaption class="wp-element-caption">Diagram <a href="https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer" data-wpel-link="external" target="_blank" rel="external noopener">courtesy of the Fedora Project</a> (specific author unknown).</figcaption></figure>
</div>
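
<p>As a rough illustration of why that chain is so convenient for a backtracer, here is a toy model in Python.  It is only a sketch: the <code>Frame</code> class, its field names and the three-function call chain are all made up for the example, and a real unwinder reads raw stack memory laid out according to the platform ABI rather than objects like these.</p>

<pre class="wp-block-code"><code>from dataclasses import dataclass


@dataclass
class Frame:
    """Toy stand-in for a stack frame; not a real ABI layout."""
    function: str        # label standing in for the return address
    saved_fp: object     # the caller's Frame, or None for the outermost frame


def backtrace(frame_pointer):
    """Follow the saved frame pointers from the newest frame outwards."""
    trace = []
    while frame_pointer is not None:
        trace.append(frame_pointer.function)
        frame_pointer = frame_pointer.saved_fp
    return trace


# Hypothetical call chain: main calls parse, which calls read_token.
main_frame = Frame("main", None)
parse_frame = Frame("parse", main_frame)
read_token_frame = Frame("read_token", parse_frame)

# The frame pointer register always refers to the newest frame.
print(backtrace(read_token_frame))
</code></pre>

<p>The walk is nothing more than a linked-list traversal starting from whatever the frame pointer register currently holds, which is why it is cheap enough to do on every profiler sample.</p>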


<p>The controversial part &#8211; insofar as there is any controversy &#8211; is in dedicating a CPU register to hold a frame pointer (to point to the start of the current stack frame).  It&#8217;s super convenient for a lot of things, but particularly for debuggers and profilers as it gives them a reliable and very fast way to find the top of the current callstack.  But it&#8217;s not <em>technically</em> required for the program to function.</p>



<p>No live CPU architectures, that I&#8217;m aware of, have a dedicated hardware register for frame pointers.  So you nominally have to &#8220;give up&#8221; a GPR (general-purpose register) in order to have a frame pointer.</p>
</div></div>



<p><a href="https://github.com/FranzBusch" data-wpel-link="external" target="_blank" rel="external noopener">Franz Busch</a> <a href="https://forums.swift.org/t/se-0419-swift-backtracing-api/69595/12" data-wpel-link="external" target="_blank" rel="external noopener">pointed out</a> that some notable software <em>still</em> ships with frame pointers omitted, e.g. apparently some major Linux distros.  I suspect it&#8217;s merely some inertia (or simply oversight) that&#8217;s delaying getting people off of that old crutch.  I&#8217;m not remotely surprised that some big Linux distros are in this bucket &#8211; they tend to be absurdly conservative and slow to change<sup data-fn="6e48e51a-6c80-46a9-b2d0-4729eb123f42" class="fn"><a href="#6e48e51a-6c80-46a9-b2d0-4729eb123f42" id="6e48e51a-6c80-46a9-b2d0-4729eb123f42-link">1</a></sup>.  And it&#8217;s mind-boggling how much vitriol restoring frame pointers generates from <a href="https://news.ycombinator.com/item?id=34632677" data-wpel-link="external" target="_blank" rel="external noopener">the peanut gallery</a>.</p>



<p>From watches to servers these days &#8211; and frankly most of the embedded space, since it&#8217;s mostly <a href="https://en.wikipedia.org/wiki/ARM_architecture_family#32-bit_architecture" data-wpel-link="external" target="_blank" rel="external noopener">ARM</a> &#8211; everything generally has an ISA with sufficiently many GPRs to negate any big benefit from omitting frame pointers.  Giving up one of 31 GPRs (for e.g. <a href="https://en.wikipedia.org/wiki/AArch64" data-wpel-link="external" target="_blank" rel="external noopener">AArch64</a>, the dominant CPU architecture family today) is pretty insignificant for the vast majority of code, because almost nothing actually uses all 31 GPRs anyway.  It only makes a significant difference<sup data-fn="188c96e5-8b0d-4e30-84f9-989822dfd065" class="fn"><a href="#188c96e5-8b0d-4e30-84f9-989822dfd065" id="188c96e5-8b0d-4e30-84f9-989822dfd065-link">2</a></sup> when the CPU design is register-starved to begin with, like <a href="https://en.wikipedia.org/wiki/IA-32" data-wpel-link="external" target="_blank" rel="external noopener">i386</a>.  And those architectures are largely dead, in museums, or restricted to <em>very</em> tiny CPUs as used in some microcontrollers (&#8220;embedded&#8221; systems).</p>



<p>Even back when i386 et al were still a concern, the proponents of <code>-fomit-frame-pointer</code> often argued not on the potential merits of the trade-off, but rather that it was a &#8220;free&#8221; performance boost, so even if it was only by a percentage point or two, why not?  They of course were either naively or deliberately overlooking the detrimental effects.</p>



<p>There may still be software for which omitting frame pointers is the right trade-off, even on modern CPUs.  But I find it hard to believe there are <em>enough</em> cases like that to warrant accommodation in standard tools.</p>



<h3 class="wp-block-heading">A brief trip back to Apple circa 2007</h3>



<p>Back in the brief window of time when i386 was a thing for the Mac (32-bit Intel, e.g. <a href="https://en.wikipedia.org/wiki/Intel_Core#Core" data-wpel-link="external" target="_blank" rel="external noopener">Core Duos</a><sup data-fn="b67d5fbb-7aa1-4c29-bc93-1341ac28771a" class="fn"><a href="#b67d5fbb-7aa1-4c29-bc93-1341ac28771a" id="b67d5fbb-7aa1-4c29-bc93-1341ac28771a-link">3</a></sup> as used in <a href="https://everymac.com/systems/apple/macbook/specs/macbook_1.83.html" data-wpel-link="external" target="_blank" rel="external noopener">the first MacBooks</a>), I was at Apple in the Performance Tools teams (<a href="https://web.archive.org/web/20100124025810/https://developer.apple.com/tools/sharkoptimize.html" data-wpel-link="external" target="_blank" rel="external noopener">Shark</a> &amp; <a href="https://help.apple.com/instruments/mac/current/#/dev7b09c84f5" data-wpel-link="external" target="_blank" rel="external noopener">Instruments</a>), and it was a frustration of ours that&nbsp;<code>-fomit-frame-pointer</code>&nbsp;<em>was</em>&nbsp;a noticeable performance-booster on the register-starved i386<sup data-fn="29afcd2a-0449-44c0-a274-0b06c9ddce8a" class="fn"><a href="#29afcd2a-0449-44c0-a274-0b06c9ddce8a" id="29afcd2a-0449-44c0-a274-0b06c9ddce8a-link">4</a></sup> architecture<sup data-fn="e0287bfa-c0ab-44b3-8fef-812983216ca6" class="fn"><a href="#e0287bfa-c0ab-44b3-8fef-812983216ca6" id="e0287bfa-c0ab-44b3-8fef-812983216ca6-link">5</a></sup>, so it was hard to just bluntly tell people not to use it… yet, by breaking the ability to profile their code, people who used it often left even&nbsp;<em>bigger</em>&nbsp;performance gains on the table (or otherwise had to invest much more labour into identifying &amp; resolving performance problems).</p>



<p>At one point there was even an Apple-internal debate about whether to abandon kernel-based profiling in favour of user-space profiling<sup data-fn="d563aa47-98d0-4761-9c0f-63194d9f7d20" class="fn"><a href="#d563aa47-98d0-4761-9c0f-63194d9f7d20" id="d563aa47-98d0-4761-9c0f-63194d9f7d20-link">6</a></sup> because <a href="https://developers.redhat.com/articles/2023/07/31/frame-pointers-untangling-unwinding#where_do_frame_pointers_fit_into_this_" data-wpel-link="external" target="_blank" rel="external noopener">implementing backtracing without frame pointers is&nbsp;<em>possible</em></a>&nbsp;but <a href="https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwarf-my-verdict/" data-wpel-link="external" target="_blank" rel="external noopener">very expensive</a> and requires masses of debug metadata (e.g. <a href="https://en.wikipedia.org/wiki/DWARF" data-wpel-link="external" target="_blank" rel="external noopener">DWARF</a>), making it highly unpalatable to put in the kernel. Thankfully there were too many obvious problems with user-space profiling, so that notion never really got its legs, and then x86-64 finally arrived<sup data-fn="3ad966e2-a3a1-41ee-a13a-84e58f5e8981" class="fn"><a href="#3ad966e2-a3a1-41ee-a13a-84e58f5e8981" id="3ad966e2-a3a1-41ee-a13a-84e58f5e8981-link">7</a></sup> and it was mooted.</p>


<ol class="wp-block-footnotes"><li id="6e48e51a-6c80-46a9-b2d0-4729eb123f42">e.g. <a href="https://wadetregaskis.com/how-to-install-imagemagick-7-for-wordpress-under-plesk-obsidian-on-ubuntu-22-04/" data-wpel-link="internal">Ubuntu <em>still</em> not officially supporting ImageMagick 7</a> even though it&#8217;s been out for nearly a decade. <a href="#6e48e51a-6c80-46a9-b2d0-4729eb123f42-link" aria-label="Jump to footnote reference 1">↩︎</a></li><li id="188c96e5-8b0d-4e30-84f9-989822dfd065">Aside from the question of register space, there <em>is</em> additional cost to implementing frame pointers, as additional instructions are required around function entry &amp; exit in order to maintain the frame pointers &#8211; to push &amp; pop them off the stack, etc.  The cost of those is usually insignificant &#8211; especially in <a href="https://en.wikipedia.org/wiki/Superscalar_processor" data-wpel-link="external" target="_blank" rel="external noopener">superscalar</a> microarchitectures, as is the norm &#8211; so that aspect is not typically the focus of the controversy. <a href="#188c96e5-8b0d-4e30-84f9-989822dfd065-link" aria-label="Jump to footnote reference 2">↩︎</a></li><li id="b67d5fbb-7aa1-4c29-bc93-1341ac28771a">Tangentially, I vaguely recall us Apple engineers kinda hating the Core Duo (Yonah), or more specifically Apple&#8217;s choice to use it.  Apple used them only for a tiny window of time, from May 2006 to about November 2006 when the Core 2 Duo (Merom) finally replaced them across the line.  I don&#8217;t recall <em>all</em> the reasons that the Core 2 Duo was superior, but they included that Core 2 Duo corrected the 32-bit regression (for Macs) and performed <em>much</em> better.  
Anytime Apple releases a Mac with a dud processor in it, like those Core Duos, a lot of Apple engineers die a little inside because they know they&#8217;re going to be stuck supporting the damn things for many years even after the last cursed one rolls off the assembly line.<br><br>It&#8217;s still a mystery to me why Apple rushed the Intel transition in this regard.  They only had to wait six more months and they could have had a clean start on Intel, with no 32-bit burden on them for the next seven years. <a href="#b67d5fbb-7aa1-4c29-bc93-1341ac28771a-link" aria-label="Jump to footnote reference 3">↩︎</a></li><li id="29afcd2a-0449-44c0-a274-0b06c9ddce8a">Why do I keep calling it &#8220;i386&#8221;?  Isn&#8217;t it officially &#8220;IA-32&#8221;?  Well, yes, but that&#8217;s (a) only retroactively and (b) only ever used by Intel.  Though I guess &#8220;x86&#8221; is probably the more common name?  Yet &#8220;i386&#8221; is in my mental muscle memory.  Maybe that&#8217;s just how we used to refer to it, at Apple?  Maybe just because that&#8217;s the name used in gcc / clang arch &amp; target flags?<br><br>Incidentally, <code>clang -arch i386 -print-supported-cpus</code> on my M2 MacBook Air still lists Yonah (those damn Core Duos) as supported.  Gah!  They won&#8217;t die! 😆 <a href="#29afcd2a-0449-44c0-a274-0b06c9ddce8a-link" aria-label="Jump to footnote reference 4">↩︎</a></li><li id="e0287bfa-c0ab-44b3-8fef-812983216ca6">It&#8217;s funny how the Intel transition is now heralded as being amazing and how much better Intel Macs were than PPC Macs, but for a while there we lost a <em>lot</em> of things, like a 64-bit architecture, an excellent SIMD implementation, and the notion of more than [effectively] six GPRs. 
<img decoding="async" width="20" height="20" src="https://emoji.discourse-cdn.com/apple/stuck_out_tongue_closed_eyes.png?v=12" alt=":stuck_out_tongue_closed_eyes:"> <a href="#e0287bfa-c0ab-44b3-8fef-812983216ca6-link" aria-label="Jump to footnote reference 5">↩︎</a></li><li id="d563aa47-98d0-4761-9c0f-63194d9f7d20">There were at the time already some Apple developer tools that did user-space profiling, most notably Sampler (now a niche feature in Activity Monitor) and early versions of Instruments (in fact Instruments <em>still</em> has the Sampler plug-in which does this, although I can&#8217;t really fathom why anyone would ever intentionally use it over the Time Profiler plug-in). <a href="#d563aa47-98d0-4761-9c0f-63194d9f7d20-link" aria-label="Jump to footnote reference 6">↩︎</a></li><li id="3ad966e2-a3a1-41ee-a13a-84e58f5e8981">In the sense of <em>all</em> Macs adopting it, not just the Mac Pro.  It was easy to ignore i386 at that point because it was then all but officially a dead architecture as far as Apple were concerned. <a href="#3ad966e2-a3a1-41ee-a13a-84e58f5e8981-link" aria-label="Jump to footnote reference 7">↩︎</a></li></ol>]]></content:encoded>
					
					<wfw:commentRss>https://wadetregaskis.com/fomit-frame-pointer/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<media:content url="https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram.webp" medium="image" />
<post-id xmlns="com-wordpress:feed-additions:1">7536</post-id>	</item>
		<item>
		<title>I was into io_uring before it existed</title>
		<link>https://wadetregaskis.com/i-was-into-io_uring-before-it-existed/</link>
					<comments>https://wadetregaskis.com/i-was-into-io_uring-before-it-existed/#respond</comments>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Wed, 17 Jan 2024 08:07:54 +0000</pubDate>
				<category><![CDATA[Ancient History]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Ramblings]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[Instruments]]></category>
		<category><![CDATA[io_uring]]></category>
		<category><![CDATA[rample]]></category>
		<category><![CDATA[Shark]]></category>
		<guid isPermaLink="false">https://wadetregaskis.com/?p=7419</guid>

					<description><![CDATA[I just read up a bit on io_uring, prompted by a Swift Forums thread relating to it, and it made me laugh. To a lot of people it&#8217;s an amazing new[ish] high-performance I/O system for Linux. Which it is (albeit with some serious security concerns, apparently). A lot of people are very excited by it.&#8230; <a class="read-more-link" href="https://wadetregaskis.com/i-was-into-io_uring-before-it-existed/" data-wpel-link="internal">Read more</a>]]></description>
										<content:encoded><![CDATA[
<p>I just read up a bit on <a href="https://man.archlinux.org/man/io_uring.7" data-wpel-link="external" target="_blank" rel="external noopener">io_uring</a>, prompted by <a href="https://forums.swift.org/t/blocking-i-o-and-concurrency/67276" data-wpel-link="external" target="_blank" rel="external noopener">a Swift Forums thread relating to it</a>, and it made me laugh.  To a lot of people it&#8217;s an amazing new[ish] high-performance I/O system for Linux.  Which it is (albeit <a href="https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html" data-wpel-link="external" target="_blank" rel="external noopener">with some serious security concerns</a>, apparently).  A lot of people are very excited by it.  Which they should be.  But it&#8217;s not new or novel, despite what many folks seem to think.</p>



<p>It immediately reminded me of the Shark 5 re-implementation of the profiling interface between the kernel and Shark.  Shark 5 of course didn&#8217;t actually survive to birth &#8211; it was snuffed out by politics and, admittedly, a bit of our own hubris in the Shark team &#8211; but that underlying infrastructure did, as the guts of <a href="https://help.apple.com/instruments/mac/current/#" data-wpel-link="external" target="_blank" rel="external noopener">Instruments</a>&#8217; Time Profiler and System Trace features.</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<p>I&#8217;m not even <em>pretending</em> to claim that what we came up with for Shark 5 was novel, either.  As far as we know it was novel to our domain, of profiling tools, but I&#8217;d be amazed if there aren&#8217;t earlier implementations of the same sort of thing many decades prior.</p>
</div></div>



<p>In Shark 4 and earlier, userspace would allocate a single big buffer of memory for the profiling data, hand it [back] to the kernel<sup data-fn="e9992b27-788f-4f5c-9c79-5c6c2396f729" class="fn"><a href="#e9992b27-788f-4f5c-9c79-5c6c2396f729" id="e9992b27-788f-4f5c-9c79-5c6c2396f729-link">1</a></sup>, the kernel &#8211; specifically the Shark kernel extension &#8211; would write into that buffer, then when the buffer was full profiling would end<sup data-fn="eb054551-a35d-4599-9763-bc0d7415909a" class="fn"><a href="#eb054551-a35d-4599-9763-bc0d7415909a" id="eb054551-a35d-4599-9763-bc0d7415909a-link">2</a></sup>.  The userspace driver &#8211; Shark &#8211; <em>could</em> technically start profiling again immediately, but of course you&#8217;d often have a gap in your profiling &#8211; potentially a big one, for the more expensive profiling modes (e.g. System Trace) or on heavily-loaded machines.  Shark didn&#8217;t run with elevated priority, as far as I recall, so it could get crowded off the CPU(s) by other programs.  All communication between user &amp; kernel spaces was via Mach messages, which are ultimately syscalls, and was time-sensitive since kernel- &amp; user-space had to be in lock-step all the time.</p>



<p>For Shark 5, we wanted to up the ante and remove limitations.  We wanted to be able to record indefinitely, limited only by disk space.  More importantly, we also wanted to be able to show profiling results <em>live</em>.</p>



<p>So, we &#8211; and I use the term loosely, as I think it was mostly <a href="https://www.linkedin.com/in/mxshift/" data-wpel-link="external" target="_blank" rel="external noopener">Rick Altherr</a> with possibly the help of <a href="https://www.linkedin.com/in/ryandubois/" data-wpel-link="external" target="_blank" rel="external noopener">Ryan du Bois</a> &#8211; came up with a mechanism that&#8217;s very similar to io_uring, but years earlier, in ~2008.  Userspace would allocate multiple smaller buffers &#8211; also easier to acquire given the requirement to be physically contiguous &#8211; and hand those to the kernel somewhat as needed.  Userspace merely needed to stay ahead of the kernel&#8217;s use, which didn&#8217;t necessarily mean pre-allocating all the buffers.  When each of those individual buffers filled up with profiling data from the kernel, they&#8217;d be made available back to the userspace side<sup data-fn="9a13c4b7-e511-4844-815e-13771d10af96" class="fn"><a href="#9a13c4b7-e511-4844-815e-13771d10af96" id="9a13c4b7-e511-4844-815e-13771d10af96-link">3</a></sup>.  Shark (in userspace) would optionally &#8211; if in &#8216;live&#8217; mode or if running short on empty buffers &#8211; process those buffers while profiling was still actively occurring, freeing them up to go back into the ring for the kernel&#8217;s use.  It did mean you slightly increased the probability of perturbing the program(s) under profiling, but that was mostly just a matter of ensuring the buffers were large enough to sufficiently amortise the cost of their processing &amp; handling.  I don&#8217;t recall the extent of the processing, but I think it was not necessarily much more than copying the data out into non-wired memory (maybe even a memory-mapped file?).  That was cheaper than allocating <em>and wiring</em> new buffers.</p>
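
<p>The hand-off described above can be modelled roughly like this in Python.  It is a single-threaded toy built on assumptions, not the Shark implementation: the <code>BufferRing</code> class and its method names are invented for the example, plain lists stand in for wired memory, and the lock-free coordination between kernel and userspace is not reproduced at all.</p>

<pre class="wp-block-code"><code>from collections import deque


class BufferRing:
    """Toy model: fixed-size buffers cycling between a producer and a consumer."""

    def __init__(self, buffer_count, buffer_capacity):
        self.capacity = buffer_capacity
        self.empty = deque([] for _ in range(buffer_count))  # ready for the producer
        self.full = deque()                                   # waiting for the consumer
        self.current = None                                   # buffer being written

    def record(self, sample):
        """Producer ("kernel") side; returns False if the sample had to be dropped."""
        if self.current is None:
            if not self.empty:
                return False  # the consumer has fallen behind
            self.current = self.empty.popleft()
        self.current.append(sample)
        if len(self.current) == self.capacity:
            self.full.append(self.current)  # hand the filled buffer over
            self.current = None
        return True

    def drain(self, sink):
        """Consumer ("userspace") side: copy out full buffers, then recycle them."""
        while self.full:
            buffer = self.full.popleft()
            sink.extend(buffer)
            buffer.clear()
            self.empty.append(buffer)


ring = BufferRing(buffer_count=2, buffer_capacity=2)
saved = []
for sample in ["a", "b", "c", "d"]:
    ring.record(sample)              # fills both buffers
print(ring.record("e"))              # no empty buffer left, so this sample is dropped
ring.drain(saved)                    # the consumer catches up and recycles both buffers
print(ring.record("e"), saved)
</code></pre>

<p>The only obligation on the consumer is to drain often enough that the producer never finds the empty queue exhausted, which is the &#8220;stay ahead of the kernel&#8221; requirement in miniature.</p>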



<p>It worked really well, and is possibly still the implementation used to date &#8211; while it appears a lot of the frameworks (e.g. CoreProfile) have disappeared from macOS, presumably rewritten somewhere else, I&#8217;d be surprised if the kernel-user interface itself has changed much.  It was pretty much perfect.</p>



<p>The creation of this new, superior profiling system also spurred me to create <code>rample</code>, a CLI tool kind of like <code>top</code> which showed you what your CPUs were doing in real time<sup data-fn="4abb5c72-5961-4445-bddf-3ef39ffb6a43" class="fn"><a href="#4abb5c72-5961-4445-bddf-3ef39ffb6a43" id="4abb5c72-5961-4445-bddf-3ef39ffb6a43-link">4</a></sup> as a heavy tree, much like you&#8217;d get in a Time Profile in Shark (and later Instruments).  It was super quick to write &#8211; just a day or two, I believe &#8211; and yet I unexpectedly found that it was more useful than Shark itself.  Being able to see &#8211; practically instantly by virtue of how lightweight it is to launch a simple CLI program &#8211; what&#8217;s chewing on the CPU, <em>including inside the kernel</em>, was incredibly useful.  Instruments is the closest you can get today on a Mac, but it&#8217;s super slow and clunky in comparison.  It&#8217;s one of my top regrets, of my time at Apple, that I didn&#8217;t get <code>rample</code> into Mac OS X, or at least into the Dev Tools.</p>





<ol class="wp-block-footnotes"><li id="e9992b27-788f-4f5c-9c79-5c6c2396f729">I don&#8217;t recall precisely why, but it was apparently better (or only possible?) to allocate the buffer in userspace, even though that was ultimately done with syscalls serviced by the kernel.  I think it had something to do with being easier to handle failures, like being out of memory, in particular <em>contiguous</em> memory &#8211; the memory was wired during use to ensure the kernel wouldn&#8217;t have to deal with the practical or performance problems of page faults while recording profiling data. <a href="#e9992b27-788f-4f5c-9c79-5c6c2396f729-link" aria-label="Jump to footnote reference 1">↩︎</a></li><li id="eb054551-a35d-4599-9763-bc0d7415909a">Which was a problem because generally profiling sessions were specified to run for a specific time duration, by the end-user.  So the buffer had to be sized to avoid filling prematurely, which meant wasting memory. <a href="#eb054551-a35d-4599-9763-bc0d7415909a-link" aria-label="Jump to footnote reference 2">↩︎</a></li><li id="9a13c4b7-e511-4844-815e-13771d10af96">Technically they were always available &#8211; we didn&#8217;t bother changing access permissions &#8211; but the userspace app would coordinate with the kernel via an elegant lock-free state machine.  That state machine was a superb design &#8211; it deserves its own detailed nostalgia trip. <a href="#9a13c4b7-e511-4844-815e-13771d10af96-link" aria-label="Jump to footnote reference 3">↩︎</a></li><li id="4abb5c72-5961-4445-bddf-3ef39ffb6a43">Well, updated every second or so, like <code>top</code>. <a href="#4abb5c72-5961-4445-bddf-3ef39ffb6a43-link" aria-label="Jump to footnote reference 4">↩︎</a></li></ol>]]></content:encoded>
					
					<wfw:commentRss>https://wadetregaskis.com/i-was-into-io_uring-before-it-existed/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7419</post-id>	</item>
	</channel>
</rss>
