<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:media="http://search.yahoo.com/mrss/"
>

<channel>
	<title>Shark &#8211; Wade Tregaskis</title>
	<atom:link href="https://wadetregaskis.com/tags/shark/feed/" rel="self" type="application/rss+xml" />
	<link>https://wadetregaskis.com</link>
	<description></description>
	<lastBuildDate>Mon, 24 Jun 2024 23:39:21 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://wadetregaskis.com/wp-content/uploads/2016/03/Stitch-512x512-1-256x256.png</url>
	<title>Shark &#8211; Wade Tregaskis</title>
	<link>https://wadetregaskis.com</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">226351702</site>	<item>
		<title>When all you have is a Core Data, everything looks like…</title>
		<link>https://wadetregaskis.com/when-all-you-have-is-a-core-data-everything-looks-like/</link>
					<comments>https://wadetregaskis.com/when-all-you-have-is-a-core-data-everything-looks-like/#comments</comments>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Mon, 24 Jun 2024 23:32:00 +0000</pubDate>
				<category><![CDATA[Ancient History]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[Core Data]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NSCoding]]></category>
		<category><![CDATA[Shark]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[SwiftData]]></category>
		<category><![CDATA[Time Profile]]></category>
		<category><![CDATA[XML]]></category>
		<guid isPermaLink="false">https://wadetregaskis.com/?p=8235</guid>

					<description><![CDATA[Reading SwiftData vs Realm: Performance Comparison reminded me of an anecdote from my days working on Shark, at Apple. I don&#8217;t really remember the timing &#8211; sometime between 2006 and 2010 &#8211; but presumably around 2006 as I recall it was when Core Data was still relatively new. For whatever reason, there was a huge&#8230; <a class="read-more-link" href="https://wadetregaskis.com/when-all-you-have-is-a-core-data-everything-looks-like/" data-wpel-link="internal">Read more</a>]]></description>
										<content:encoded><![CDATA[
<p>Reading <a href="https://www.emergetools.com/blog/posts/swiftdata-vs-realm-performance-comparison" data-wpel-link="external" target="_blank" rel="external noopener">SwiftData vs Realm: Performance Comparison</a> reminded me of an anecdote from my days working on <a href="https://leopard-adc.pepas.com/documentation/DeveloperTools/Conceptual/SharkUserGuide/Introduction/Introduction.html#//apple_ref/doc/uid/TP40005233-CH1-DontLinkElementID_6" data-wpel-link="external" target="_blank" rel="external noopener">Shark</a>, at Apple.</p>



<p>I don&#8217;t really remember the timing &#8211; sometime between 2006 and 2010 &#8211; but presumably around 2006 as I recall it was when <a href="https://en.wikipedia.org/wiki/Core_Data" data-wpel-link="external" target="_blank" rel="external noopener">Core Data</a> was still relatively new.  For whatever reason, there was a huge push internal to Apple to use Core Data <em>everywhere</em>.  People were running around all over the place asking &#8220;can it be made to use Core Data?&#8221;, for Apple&#8217;s frameworks and applications.</p>



<p>Keep in mind that Core Data at that time was similar to <a href="https://developer.apple.com/documentation/swiftdata" data-wpel-link="external" target="_blank" rel="external noopener">SwiftData</a> now &#8211; very limited functionality, and <em>chock full</em> of bugs.  But of course it&#8217;s the nature of &#8216;shiny&#8217; new things that their proponents think it&#8217;s the second coming and the cure for all ills.</p>



<p>So, I recall sitting down with a couple of folks from the Core Data team, who were there to see if Shark could adopt Core Data.  A little like letting the missionaries in, if only out of morbid curiosity.</p>



<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="1200" height="675" src="https://wadetregaskis.com/wp-content/uploads/2024/06/orgazmo-mormon-missionaries-have-you-heard-the-good-news-about-core-data.avif" alt="Still from the scene in Orgazmo with the Mormon Missionaries greeting a homeowner at their door and asking &quot;Have you heard the good news about Core Data?&quot;." class="wp-image-8240" srcset="https://wadetregaskis.com/wp-content/uploads/2024/06/orgazmo-mormon-missionaries-have-you-heard-the-good-news-about-core-data.avif 1200w, https://wadetregaskis.com/wp-content/uploads/2024/06/orgazmo-mormon-missionaries-have-you-heard-the-good-news-about-core-data-256x144.avif 256w, https://wadetregaskis.com/wp-content/uploads/2024/06/orgazmo-mormon-missionaries-have-you-heard-the-good-news-about-core-data-1024x576.avif 1024w, https://wadetregaskis.com/wp-content/uploads/2024/06/orgazmo-mormon-missionaries-have-you-heard-the-good-news-about-core-data-768x432.avif 768w, https://wadetregaskis.com/wp-content/uploads/2024/06/orgazmo-mormon-missionaries-have-you-heard-the-good-news-about-core-data@2x.avif 2400w, https://wadetregaskis.com/wp-content/uploads/2024/06/orgazmo-mormon-missionaries-have-you-heard-the-good-news-about-core-data-256x144@2x.avif 512w" sizes="(max-width: 1200px) 100vw, 1200px" /></figure>



<p>Have you heard the good news?  Core Data is here to save your very data.  It&#8217;s effortless and divine and its unintuitive, thread-unsafe API will definitely not be the bane of all its users for the next fifteen years.</p>



<p>Jokes aside, they were in fact earnestly curious if Shark could use Core Data, instead of its own purpose-built binary formats, for storing &amp; querying its profiling data.  It was perhaps the classic case of naively underestimating the complexity of a foreign domain.  By my recollection, they assumed our profiling data was just a small handful of homogeneous, relatively trivial records.  &#8220;At second N, the program ran the function named XYZ&#8221; or somesuch.</p>



<p>I think we (Shark engineers) tried to be open-minded and kind.  We were sceptical, but you never know until you actually look.  We could see some potential for a more general query capability, for example.  But of course the first and most obvious hurdle was: how well does Core Data handle sizeable numbers of records?  Oh yes, was the response, it&#8217;s great even with tens of thousands of records.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="1128" height="480" src="https://wadetregaskis.com/wp-content/uploads/2024/06/star-trek-iv-the-voyage-home-is-that-a-lot.avif" alt="Still image of the pawn shop scene from Star Trek IV (The Voyage Home) showing Kirk &amp; Spock responding to the offer of $100 for the antique spectacles with &quot;Is that a lot?&quot;." class="wp-image-8236" srcset="https://wadetregaskis.com/wp-content/uploads/2024/06/star-trek-iv-the-voyage-home-is-that-a-lot.avif 1128w, https://wadetregaskis.com/wp-content/uploads/2024/06/star-trek-iv-the-voyage-home-is-that-a-lot-256x109.avif 256w, https://wadetregaskis.com/wp-content/uploads/2024/06/star-trek-iv-the-voyage-home-is-that-a-lot-1024x436.avif 1024w, https://wadetregaskis.com/wp-content/uploads/2024/06/star-trek-iv-the-voyage-home-is-that-a-lot-768x327.avif 768w, https://wadetregaskis.com/wp-content/uploads/2024/06/star-trek-iv-the-voyage-home-is-that-a-lot@2x.avif 2256w, https://wadetregaskis.com/wp-content/uploads/2024/06/star-trek-iv-the-voyage-home-is-that-a-lot-256x109@2x.avif 512w" sizes="(max-width: 1128px) 100vw, 1128px" /></figure>



<p>We asked how it did with tens of <em>millions</em> of records, and that was pretty much the end of the conversation.</p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Background on the Time Profile data structure</summary>
<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<p>For context, the data in a Shark Time Profile (for example) was basically an array of samples, where each sample records the process &amp; thread IDs, and the callstack (expressed as an array of pointer-sized values; the first being the current PC and the rest the return addresses found by walking back up the thread&#8217;s stack).</p>
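<p>As a rough sketch of the shape described above (a hypothetical layout for illustration, not Shark&#8217;s actual file or in-memory format), a sample could look something like this in C.  The names <code>sample_t</code>, <code>sample_pc</code> and <code>callstack_bytes</code> are assumptions, as is the choice of 64-bit frame values.</p>

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sample record: IDs plus a callstack of pointer-sized values. */
typedef struct {
    uint32_t pid;            /* process ID */
    uint32_t tid;            /* thread ID */
    uint32_t frame_count;    /* number of entries in frames[] */
    const uint64_t *frames;  /* frames[0] is the sampled PC; the rest are return addresses */
} sample_t;

/* The address that was executing when the sample was taken (0 if the callstack is empty). */
static uint64_t sample_pc(const sample_t *s) {
    return s->frame_count > 0 ? s->frames[0] : 0;
}

/* Total bytes occupied by the callstack payloads of n samples. */
static size_t callstack_bytes(const sample_t *samples, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (size_t)samples[i].frame_count * sizeof(uint64_t);
    }
    return total;
}
```

<p>Under these assumptions a three-frame callstack costs 24 bytes of payload, which gives a feel for why a profile is essentially one large array of small, fixed-shape records.</p>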



<p>Callstacks back then were relatively small, by modern standards &#8211; this was predominately C/C++/Objective-C code which tended to be far simpler in its structure than e.g. Swift; <em>way</em> fewer closures (blocks), no async suspension points to split logical functions up into numerous implementation functions, etc.  So the average was probably something in the low tens.  A hundred frames was considered a <em>big</em> callstack (which is sadly funny in hindsight, given that&#8217;s trivial by e.g. SwiftUI&#8217;s standards 😒).</p>



<p>A useful profile had at least thousands of such samples, and typical profiles were in the tens to hundreds of thousands (the latter usually for All Thread States profiles, particularly those of the whole system).  Some profiles could run into the millions or tens of millions (it&#8217;s not always easy to predict when a performance problem will exhibit itself, so recording sometimes had to start early and run long).</p>



<p>I&#8217;m pretty sure Shark used <code><a href="https://developer.apple.com/documentation/foundation/nscoding" data-wpel-link="external" target="_blank" rel="external noopener">NSCoding</a></code> for the overall serdes, but a lot of that serdes was of huge chunks of (as far as <code>NSCoding</code> was concerned) arbitrary bytes. The file format was overall fairly efficient (though I don&#8217;t recall it ever using explicit data compression, nor even delta encoding for callstacks).</p>
</div></div>
</details>



<p>It wasn&#8217;t just the volume of data, it was also the dramatic difference in representation efficiency. The in-memory representation in Shark was about as efficient as it could be &#8211; just arrays of compact structs, sometimes with pointers to other arrays (which might share a <code>malloc</code> block to avoid the overhead of small allocations), whose elements were usually just <code>uint32_t</code> or <code>uint64_t</code>. The most important operations &#8211; indexing to an arbitrary point in the profile&#8217;s timeline, then scanning forward over the data &#8211; were about as fast as they could possibly be.</p>
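<p>To make &#8220;index to a point in the timeline, then scan forward&#8221; concrete, here is a minimal sketch in C.  It assumes each sample carries a timestamp and that samples are stored in ascending time order; the struct layout and the names <code>compact_sample_t</code>, <code>first_sample_at_or_after</code> and <code>count_samples_for_thread</code> are invented for illustration and are not Shark&#8217;s real ones.</p>

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical compact record. Callstack frames live in one shared array,
   referenced by offset, so there is no per-sample allocation. */
typedef struct {
    uint64_t timestamp;     /* assumed: ascending across the samples array */
    uint32_t tid;           /* thread ID */
    uint32_t frame_offset;  /* index of this sample's first frame in the shared frames array */
    uint32_t frame_count;   /* number of frames belonging to this sample */
} compact_sample_t;

/* Lower-bound binary search: index of the first sample with timestamp >= t, or n if none. */
static size_t first_sample_at_or_after(const compact_sample_t *samples, size_t n, uint64_t t) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (samples[mid].timestamp < t) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Seek to `start`, then scan linearly, counting samples of `tid` in the window [start, end). */
static size_t count_samples_for_thread(const compact_sample_t *samples, size_t n,
                                       uint64_t start, uint64_t end, uint32_t tid) {
    size_t count = 0;
    for (size_t i = first_sample_at_or_after(samples, n, start);
         i < n && samples[i].timestamp < end; i++) {
        if (samples[i].tid == tid) {
            count++;
        }
    }
    return count;
}
```

<p>The seek is logarithmic and the scan touches contiguous memory only, which is the property that an object-per-record design gives up.</p>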



<p>In contrast, Core Data would have required an entire <em>object</em> (<code><a href="https://developer.apple.com/documentation/coredata/nsmanagedobject" data-wpel-link="external" target="_blank" rel="external noopener">NSManagedObject</a></code> subclass) for at least every sample, if not every <code>uintXX_t</code> in the callstack (depending on how &#8216;pure&#8217; you wanted the design to be). It would have increased memory usage by at least an order of magnitude &#8211; and Shark already struggled with big profiles on the hardware of the day, which typically had just a couple of GiB of RAM. Even the most trivial operations &#8211; like reading the data in from disk and iterating over it sequentially &#8211; would have been <em>thousands</em> of times slower.</p>



<p>In defence of the Core Data folks in the meeting &#8211; and I don&#8217;t remember who specifically it was &#8211; they never tried to misrepresent or exaggerate what Core Data could do.  I seem to recall them being quite nice people.  But as soon as we started explaining the type and volume of data that we worked with, they clearly gave up on any kind of pitch.  Core Data was designed for <em>developer convenience</em>, not runtime efficiency or performance.</p>



<p>It&#8217;s never ceased to surprise and disappoint me how many folks try to arbitrarily apply generalised data storage systems &#8211; <em>particularly</em> SQLite and MySQL, or wrappers thereover.  Usually for the same reasons &#8211; perceived convenience to them, right now, not necessarily efficiency (nor the convenience of their successors).</p>



<p>I guess by modern standards SQLite is considered efficient and fast, but &#8211; hah &#8211; <em>back in my day</em> SQLite was what you used when you didn&#8217;t have time to write your own, <em>efficient and fast</em> persistent data management system.</p>



<p>See also JSON and its older sister XML. 😔</p>
]]></content:encoded>
					
					<wfw:commentRss>https://wadetregaskis.com/when-all-you-have-is-a-core-data-everything-looks-like/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			<media:content url="https://wadetregaskis.com/wp-content/uploads/2024/06/orgazmo-mormon-missionaries-have-you-heard-the-good-news-about-core-data.avif" medium="image" />
<post-id xmlns="com-wordpress:feed-additions:1">8235</post-id>	</item>
		<item>
		<title>-fomit-frame-pointer</title>
		<link>https://wadetregaskis.com/fomit-frame-pointer/</link>
					<comments>https://wadetregaskis.com/fomit-frame-pointer/#respond</comments>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Wed, 24 Jan 2024 22:30:30 +0000</pubDate>
				<category><![CDATA[Ancient History]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Ramblings]]></category>
		<category><![CDATA[-fomit-frame-pointer]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[backtracing]]></category>
		<category><![CDATA[frame pointers]]></category>
		<category><![CDATA[i386]]></category>
		<category><![CDATA[Instruments]]></category>
		<category><![CDATA[Intel Core Duo]]></category>
		<category><![CDATA[Merom]]></category>
		<category><![CDATA[Shark]]></category>
		<category><![CDATA[Swift Forums]]></category>
		<category><![CDATA[x86-64]]></category>
		<category><![CDATA[Yonah]]></category>
		<guid isPermaLink="false">https://wadetregaskis.com/?p=7536</guid>

					<description><![CDATA[This is an elaboration of a post I made in a Swift Forums thread, SE-0419: Swift Backtracing API. The question was raised whether an official Swift backtracer should try to support code that doesn&#8217;t use frame pointers. Which immediately raised the question &#8211; in my mind &#8211; of if anyone is still using the &#8220;optimisation&#8221;&#8230; <a class="read-more-link" href="https://wadetregaskis.com/fomit-frame-pointer/" data-wpel-link="internal">Read more</a>]]></description>
										<content:encoded><![CDATA[
<p>This is an elaboration of <a href="https://forums.swift.org/t/se-0419-swift-backtracing-api/69595/13" data-wpel-link="external" target="_blank" rel="external noopener">a post I made</a> in a Swift Forums thread, <a href="https://forums.swift.org/t/se-0419-swift-backtracing-api/69595" data-wpel-link="external" target="_blank" rel="external noopener">SE-0419: Swift Backtracing API</a>.</p>



<p>The question was raised whether an official Swift backtracer should try to support code that doesn&#8217;t use frame pointers.  Which immediately raised the question &#8211; in my mind &#8211; of whether anyone is still using the &#8220;optimisation&#8221; of omitting frame pointers, anyway.  And perhaps more importantly, whether they <em>should</em> still be omitting frame pointers.</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<h4 class="wp-block-heading">What is a frame pointer?</h4>



<p>A pointer to a stack frame, <em>held in a well-known location</em>.  That location can be in the stack itself (forming a linked-list of the stack frames) or in registers (e.g. the x29 register on AArch64, or RBP register on x86-64).</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" width="806" height="541" src="https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram.webp" alt="Explanatory diagram of frame pointers, showing a link from the x86-64 register %rbp to the start of the current frame, which holds the prior value of %rbp that points to the top of the previous frame, and so on." class="wp-image-7549" srcset="https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram.webp 806w, https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram-256x172.webp 256w, https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram-512x344.webp 512w, https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram@2x.webp 1612w" sizes="(max-width: 806px) 100vw, 806px" /><figcaption class="wp-element-caption">Diagram <a href="https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer" data-wpel-link="external" target="_blank" rel="external noopener">courtesy of the Fedora Project</a> (specific author unknown).</figcaption></figure>
</div>


<p>The controversial part &#8211; insofar as there is any controversy &#8211; is in dedicating a CPU register to hold a frame pointer (to point to the start of the current stack frame).  It&#8217;s super convenient for a lot of things, but particularly for debuggers and profilers as it gives them a reliable and very fast way to find the top of the current callstack.  But it&#8217;s not <em>technically</em> required for the program to function.</p>



<p>No live CPU architectures, that I&#8217;m aware of, have a dedicated hardware register for frame pointers.  So you nominally have to &#8220;give up&#8221; a GPR (general-purpose register) in order to have a frame pointer.</p>
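<p>The linked-list form is what makes backtracing cheap.  Below is a small C sketch of the walk, run over a <em>simulated</em> stack rather than real memory: each frame record is two words, the caller&#8217;s saved frame pointer followed by the return address (the arrangement used on x86-64 and AArch64), and &#8220;pointers&#8221; are simply indices into an array.  The function name and the use of index 0 as the end-of-chain marker are assumptions for the example; a real unwinder would start from the live RBP or x29 value and read actual stack memory.</p>

```c
#include <stdint.h>
#include <stddef.h>

/* Walk a chain of frame records in a simulated stack.
   stack[fp]     holds the caller's frame pointer (0 marks the outermost frame),
   stack[fp + 1] holds the return address for that frame.
   Writes up to max_out return addresses into out[] and returns how many were found. */
static size_t walk_frame_pointers(const uint64_t *stack, size_t stack_words,
                                  uint64_t fp, uint64_t *out, size_t max_out) {
    size_t n = 0;
    while (fp != 0 && stack_words >= 2 && fp <= stack_words - 2 && n < max_out) {
        out[n++] = stack[fp + 1];
        uint64_t caller_fp = stack[fp];
        /* Older frames sit at higher addresses, so a link that fails to move
           upward indicates a corrupt chain; stop rather than loop forever. */
        if (caller_fp != 0 && caller_fp <= fp) {
            break;
        }
        fp = caller_fp;
    }
    return n;
}
```

<p>Each step is two loads and a comparison, with no debug metadata involved, which is why a profiler can afford to do this on every sample.</p>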
</div></div>



<p><a href="https://github.com/FranzBusch" data-wpel-link="external" target="_blank" rel="external noopener">Franz Busch</a> <a href="https://forums.swift.org/t/se-0419-swift-backtracing-api/69595/12" data-wpel-link="external" target="_blank" rel="external noopener">pointed out</a> that some notable software <em>still</em> ships with frame pointers omitted, e.g. apparently some major Linux distros.  I suspect it&#8217;s merely some inertia (or simply oversight) that&#8217;s delaying getting people off of that old crutch.  I&#8217;m not remotely surprised that some big Linux distros are in this bucket &#8211; they tend to be absurdly conservative and slow to change<sup data-fn="6e48e51a-6c80-46a9-b2d0-4729eb123f42" class="fn"><a href="#6e48e51a-6c80-46a9-b2d0-4729eb123f42" id="6e48e51a-6c80-46a9-b2d0-4729eb123f42-link">1</a></sup>.  And it&#8217;s mind-boggling how much vitriol restoring frame pointers generates from <a href="https://news.ycombinator.com/item?id=34632677" data-wpel-link="external" target="_blank" rel="external noopener">the peanut gallery</a>.</p>



<p>From watches to servers these days &#8211; and frankly most of the embedded space, since it&#8217;s mostly <a href="https://en.wikipedia.org/wiki/ARM_architecture_family#32-bit_architecture" data-wpel-link="external" target="_blank" rel="external noopener">ARM</a> &#8211; everything generally has an ISA with sufficiently many GPRs to negate any big benefit from omitting frame pointers.  Giving up one of 31 GPRs (for e.g. <a href="https://en.wikipedia.org/wiki/AArch64" data-wpel-link="external" target="_blank" rel="external noopener">AArch64</a>, the dominant CPU architecture family today) is pretty insignificant for the vast majority of code, because almost nothing actually uses all 31 GPRs anyway.  It only makes a significant difference<sup data-fn="188c96e5-8b0d-4e30-84f9-989822dfd065" class="fn"><a href="#188c96e5-8b0d-4e30-84f9-989822dfd065" id="188c96e5-8b0d-4e30-84f9-989822dfd065-link">2</a></sup> when the CPU design is register-starved to begin with, like <a href="https://en.wikipedia.org/wiki/IA-32" data-wpel-link="external" target="_blank" rel="external noopener">i386</a>.  And those architectures are largely dead, in museums, or restricted to <em>very</em> tiny CPUs as used in some microcontrollers (&#8220;embedded&#8221; systems).</p>



<p>Even back when i386 et al were still a concern, the proponents of <code>-fomit-frame-pointer</code> often argued not on the potential merits of the trade-off, but rather that it was a &#8220;free&#8221; performance boost, so even if it was only by a percentage point or two, why not?  They of course were either naively or deliberately overlooking the detrimental effects.</p>



<p>There may still be software for which omitting frame pointers is the right trade-off, even on modern CPUs.  But I find it hard to believe there are <em>enough</em> cases like that to warrant accommodation in standard tools.</p>



<h3 class="wp-block-heading">A brief trip back to Apple circa 2007</h3>



<p>Back in the brief window of time when i386 was a thing for the Mac (32-bit Intel, e.g. <a href="https://en.wikipedia.org/wiki/Intel_Core#Core" data-wpel-link="external" target="_blank" rel="external noopener">Core Duos</a><sup data-fn="b67d5fbb-7aa1-4c29-bc93-1341ac28771a" class="fn"><a href="#b67d5fbb-7aa1-4c29-bc93-1341ac28771a" id="b67d5fbb-7aa1-4c29-bc93-1341ac28771a-link">3</a></sup> as used in <a href="https://everymac.com/systems/apple/macbook/specs/macbook_1.83.html" data-wpel-link="external" target="_blank" rel="external noopener">the first MacBooks</a>), I was at Apple in the Performance Tools teams (<a href="https://web.archive.org/web/20100124025810/https://developer.apple.com/tools/sharkoptimize.html" data-wpel-link="external" target="_blank" rel="external noopener">Shark</a> &amp; <a href="https://help.apple.com/instruments/mac/current/#/dev7b09c84f5" data-wpel-link="external" target="_blank" rel="external noopener">Instruments</a>), and it was a frustration of ours that&nbsp;<code>-fomit-frame-pointer</code>&nbsp;<em>was</em>&nbsp;a noticeable performance-booster on the register-starved i386<sup data-fn="29afcd2a-0449-44c0-a274-0b06c9ddce8a" class="fn"><a href="#29afcd2a-0449-44c0-a274-0b06c9ddce8a" id="29afcd2a-0449-44c0-a274-0b06c9ddce8a-link">4</a></sup> architecture<sup data-fn="e0287bfa-c0ab-44b3-8fef-812983216ca6" class="fn"><a href="#e0287bfa-c0ab-44b3-8fef-812983216ca6" id="e0287bfa-c0ab-44b3-8fef-812983216ca6-link">5</a></sup>, so it was hard to just bluntly tell people not to use it… yet, by breaking the ability to profile their code, people who used it often left even&nbsp;<em>bigger</em>&nbsp;performance gains on the table (or otherwise had to invest much more labour into identifying &amp; resolving performance problems).</p>



<p>At one point there was even an Apple-internal debate about whether to abandon kernel-based profiling in favour of user-space profiling<sup data-fn="d563aa47-98d0-4761-9c0f-63194d9f7d20" class="fn"><a href="#d563aa47-98d0-4761-9c0f-63194d9f7d20" id="d563aa47-98d0-4761-9c0f-63194d9f7d20-link">6</a></sup> because <a href="https://developers.redhat.com/articles/2023/07/31/frame-pointers-untangling-unwinding#where_do_frame_pointers_fit_into_this_" data-wpel-link="external" target="_blank" rel="external noopener">implementing backtracing without frame pointers is&nbsp;<em>possible</em></a>&nbsp;but <a href="https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwarf-my-verdict/" data-wpel-link="external" target="_blank" rel="external noopener">very expensive</a> and requires masses of debug metadata (e.g. <a href="https://en.wikipedia.org/wiki/DWARF" data-wpel-link="external" target="_blank" rel="external noopener">DWARF</a>), making it highly unpalatable to put in the kernel. Thankfully there were too many obvious problems with user-space profiling, so that notion never really got its legs, and then x86-64 finally arrived<sup data-fn="3ad966e2-a3a1-41ee-a13a-84e58f5e8981" class="fn"><a href="#3ad966e2-a3a1-41ee-a13a-84e58f5e8981" id="3ad966e2-a3a1-41ee-a13a-84e58f5e8981-link">7</a></sup> and the question became moot.</p>


<ol class="wp-block-footnotes"><li id="6e48e51a-6c80-46a9-b2d0-4729eb123f42">e.g. <a href="https://wadetregaskis.com/how-to-install-imagemagick-7-for-wordpress-under-plesk-obsidian-on-ubuntu-22-04/" data-wpel-link="internal">Ubuntu <em>still</em> not officially supporting ImageMagick 7</a> even though it&#8217;s been out for nearly a decade. <a href="#6e48e51a-6c80-46a9-b2d0-4729eb123f42-link" aria-label="Jump to footnote reference 1">↩︎</a></li><li id="188c96e5-8b0d-4e30-84f9-989822dfd065">Aside from the question of register space, there <em>is</em> additional cost to implementing frame pointers, as additional instructions are required around function entry &amp; exit in order to maintain the frame pointers &#8211; to push &amp; pop them off the stack, etc.  The cost of those is usually insignificant &#8211; especially in <a href="https://en.wikipedia.org/wiki/Superscalar_processor" data-wpel-link="external" target="_blank" rel="external noopener">superscalar</a> microarchitectures, as is the norm &#8211; so that aspect is not typically the focus of the controversy. <a href="#188c96e5-8b0d-4e30-84f9-989822dfd065-link" aria-label="Jump to footnote reference 2">↩︎</a></li><li id="b67d5fbb-7aa1-4c29-bc93-1341ac28771a">Tangentially, I vaguely recall us Apple engineers kinda hating the Core Duo (Yonah), or more specifically Apple&#8217;s choice to use it.  Apple used them only for a tiny window of time, from May 2006 to about November 2006 when the Core 2 Duo (Merom) finally replaced them across the line.  I don&#8217;t recall <em>all</em> the reasons that the Core 2 Duo was superior, but they included that Core 2 Duo corrected the 32-bit regression (for Macs) and performed <em>much</em> better.  
Anytime Apple releases a Mac with a dud processor in it, like those Core Duos, a lot of Apple engineers die a little inside because they know they&#8217;re going to be stuck supporting the damn things for many years even after the last cursed one rolls off the assembly line.<br><br>It&#8217;s still a mystery to me why Apple rushed the Intel transition in this regard.  They only had to wait six more months and they could have had a clean start on Intel, with no 32-bit to burden them for the next seven years. <a href="#b67d5fbb-7aa1-4c29-bc93-1341ac28771a-link" aria-label="Jump to footnote reference 3">↩︎</a></li><li id="29afcd2a-0449-44c0-a274-0b06c9ddce8a">Why do I keep calling it &#8220;i386&#8221;?  Isn&#8217;t it officially &#8220;IA-32&#8221;?  Well, yes, but that&#8217;s (a) only retroactively and (b) only ever used by Intel.  Though I guess &#8220;x86&#8221; is probably the more common name?  Yet &#8220;i386&#8221; is in my mental muscle memory.  Maybe that&#8217;s just how we used to refer to it, at Apple?  Maybe just because that&#8217;s the name used in gcc / clang arch &amp; target flags?<br><br>Incidentally, <code>clang -arch i386 -print-supported-cpus</code> on my M2 MacBook Air still lists Yonah (those damn Core Duos) as supported.  Gah!  They won&#8217;t die! 😆 <a href="#29afcd2a-0449-44c0-a274-0b06c9ddce8a-link" aria-label="Jump to footnote reference 4">↩︎</a></li><li id="e0287bfa-c0ab-44b3-8fef-812983216ca6">It&#8217;s funny how the Intel transition is now heralded as being amazing and how much better Intel Macs were than PPC Macs, but for a while there we lost a <em>lot</em> of things, like a 64-bit architecture, an excellent SIMD implementation, and the notion of more than [effectively] six GPRs. 
<img loading="lazy" decoding="async" width="20" height="20" src="https://emoji.discourse-cdn.com/apple/stuck_out_tongue_closed_eyes.png?v=12" alt=":stuck_out_tongue_closed_eyes:"> <a href="#e0287bfa-c0ab-44b3-8fef-812983216ca6-link" aria-label="Jump to footnote reference 5">↩︎</a></li><li id="d563aa47-98d0-4761-9c0f-63194d9f7d20">There were at the time already some Apple developer tools that did user-space profiling, most notably Sampler (now a niche feature in Activity Monitor) and early versions of Instruments (in fact Instruments <em>still</em> has the Sampler plug-in which does this, although I can&#8217;t really fathom why anyone would ever intentionally use it over the Time Profiler plug-in). <a href="#d563aa47-98d0-4761-9c0f-63194d9f7d20-link" aria-label="Jump to footnote reference 6">↩︎</a></li><li id="3ad966e2-a3a1-41ee-a13a-84e58f5e8981">In the sense of <em>all</em> Macs adopting it, not just the Mac Pro.  It was easy to ignore i386 at that point because it was then all but officially a dead architecture as far as Apple were concerned. <a href="#3ad966e2-a3a1-41ee-a13a-84e58f5e8981-link" aria-label="Jump to footnote reference 7">↩︎</a></li></ol>]]></content:encoded>
					
					<wfw:commentRss>https://wadetregaskis.com/fomit-frame-pointer/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<media:content url="https://wadetregaskis.com/wp-content/uploads/2024/01/Frame-pointers-explanatory-diagram.webp" medium="image" />
<post-id xmlns="com-wordpress:feed-additions:1">7536</post-id>	</item>
		<item>
		<title>I was into io_uring before it existed</title>
		<link>https://wadetregaskis.com/i-was-into-io_uring-before-it-existed/</link>
					<comments>https://wadetregaskis.com/i-was-into-io_uring-before-it-existed/#respond</comments>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Wed, 17 Jan 2024 08:07:54 +0000</pubDate>
				<category><![CDATA[Ancient History]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Ramblings]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[Instruments]]></category>
		<category><![CDATA[io_uring]]></category>
		<category><![CDATA[rample]]></category>
		<category><![CDATA[Shark]]></category>
		<guid isPermaLink="false">https://wadetregaskis.com/?p=7419</guid>

					<description><![CDATA[I just read up a bit on io_uring, prompted by a Swift Forums thread relating to it, and it made me laugh. To a lot of people it&#8217;s an amazing new[ish] high-performance I/O system for Linux. Which it is (albeit with some serious security concerns, apparently). A lot of people are very excited by it.&#8230; <a class="read-more-link" href="https://wadetregaskis.com/i-was-into-io_uring-before-it-existed/" data-wpel-link="internal">Read more</a>]]></description>
										<content:encoded><![CDATA[
<p>I just read up a bit on <a href="https://man.archlinux.org/man/io_uring.7" data-wpel-link="external" target="_blank" rel="external noopener">io_uring</a>, prompted by <a href="https://forums.swift.org/t/blocking-i-o-and-concurrency/67276" data-wpel-link="external" target="_blank" rel="external noopener">a Swift Forums thread relating to it</a>, and it made me laugh.  To a lot of people it&#8217;s an amazing new[ish] high-performance I/O system for Linux.  Which it is (albeit <a href="https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html" data-wpel-link="external" target="_blank" rel="external noopener">with some serious security concerns</a>, apparently).  A lot of people are very excited by it.  Which they should be.  But it&#8217;s not new or novel, despite what many folks seem to think.</p>



<p>It immediately reminded me of the Shark 5 re-implementation of the profiling interface between the kernel and Shark.  Shark 5 of course didn&#8217;t actually survive to birth &#8211; it was snuffed out by politics and, admittedly, a bit of our own hubris in the Shark team &#8211; but that underlying infrastructure did, as the guts of <a href="https://help.apple.com/instruments/mac/current/#" data-wpel-link="external" target="_blank" rel="external noopener">Instruments</a>&#8216; Time Profiler and System Trace features.</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<p>I&#8217;m not even <em>pretending</em> to claim that what we came up with for Shark 5 was novel, either.  As far as we know it was novel to our domain, of profiling tools, but I&#8217;d be amazed if there aren&#8217;t earlier implementations of the same sort of thing many decades prior.</p>
</div></div>



<p>In Shark 4 and earlier, userspace would allocate a single big buffer of memory for the profiling data, hand it [back] to the kernel<sup data-fn="e9992b27-788f-4f5c-9c79-5c6c2396f729" class="fn"><a href="#e9992b27-788f-4f5c-9c79-5c6c2396f729" id="e9992b27-788f-4f5c-9c79-5c6c2396f729-link">1</a></sup>, the kernel &#8211; specifically the Shark kernel extension &#8211; would write into that buffer, then when the buffer was full profiling would end<sup data-fn="eb054551-a35d-4599-9763-bc0d7415909a" class="fn"><a href="#eb054551-a35d-4599-9763-bc0d7415909a" id="eb054551-a35d-4599-9763-bc0d7415909a-link">2</a></sup>.  The userspace driver &#8211; Shark &#8211; <em>could</em> technically start profiling again immediately, but of course you&#8217;d often have a gap in your profiling &#8211; potentially a big one, for the more expensive profiling modes (e.g. System Trace) or on heavily-loaded machines.  Shark didn&#8217;t run with elevated priority, as far as I recall, so it could get crowded off the CPU(s) by other programs.  All communication between user &amp; kernel spaces was via Mach messages, which are ultimately syscalls, and was time-sensitive since kernel- &amp; user-space had to be in lock-step all the time.</p>



<p>For Shark 5, we wanted to up the ante and remove limitations.  We wanted to be able to record indefinitely, limited only by disk space.  More importantly, we also wanted to be able to show profiling results <em>live</em>.</p>



<p>So, we &#8211; and I use the term loosely, as I think it was mostly <a href="https://www.linkedin.com/in/mxshift/" data-wpel-link="external" target="_blank" rel="external noopener">Rick Altherr</a> with possibly the help of <a href="https://www.linkedin.com/in/ryandubois/" data-wpel-link="external" target="_blank" rel="external noopener">Ryan du Bois</a> &#8211; came up with a mechanism that&#8217;s very similar to io_uring, but years earlier, in ~2008.  Userspace would allocate multiple smaller buffers &#8211; also easier to acquire given the requirement to be physically contiguous &#8211; and hand those to the kernel somewhat as needed.  Userspace merely needed to stay ahead of the kernel&#8217;s use, which didn&#8217;t necessarily mean pre-allocating all the buffers.  When each of those individual buffers filled up with profiling data from the kernel, they&#8217;d be made available back to the userspace side<sup data-fn="9a13c4b7-e511-4844-815e-13771d10af96" class="fn"><a href="#9a13c4b7-e511-4844-815e-13771d10af96" id="9a13c4b7-e511-4844-815e-13771d10af96-link">3</a></sup>.  Shark (in userspace) would optionally &#8211; if in &#8216;live&#8217; mode or if running short on empty buffers &#8211; process those buffers while profiling was still actively occurring, freeing them up to go back into the ring for the kernel&#8217;s use.  It did mean you slightly increased the probability of perturbing the program(s) under profiling, but that was mostly just a matter of ensuring the buffers were large enough to sufficiently amortise the cost of their processing &amp; handling.  I don&#8217;t recall the extent of the processing, but I think it was not necessarily much more than copying the data out into non-wired memory (maybe even a memory-mapped file?).  That was cheaper than allocating <em>and wiring</em> new buffers.</p>
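
<p>To make the buffer lifecycle above concrete, here&#8217;s a minimal sketch in Python.  It is not the Shark 5 code: the names (<code>BufferRing</code>, <code>record</code>, <code>drain</code>, <code>simulate</code>) and the tiny buffer size are invented for illustration, both sides run in a single thread, and plain queues stand in for the lock-free coordination over wired memory that the real kernel-user interface relied on.</p>

```python
from collections import deque

BUFFER_CAPACITY = 4  # samples per buffer; hypothetical, real buffers would be far larger


class BufferRing:
    """Toy model of a pool of fixed-size buffers cycling between two sides."""

    def __init__(self, buffer_count):
        # Buffers the consumer ("userspace") has handed to the producer ("kernel").
        self.empty = deque([] for _ in range(buffer_count))
        # Buffers the producer has filled and made available for processing.
        self.full = deque()
        self.current = None  # buffer the producer is writing into right now
        self.dropped = 0     # samples lost because no empty buffer was available

    def record(self, sample):
        """Producer side: append one sample, rotating buffers as they fill."""
        if self.current is None:
            if not self.empty:
                self.dropped += 1  # the consumer fell behind
                return
            self.current = self.empty.popleft()
        self.current.append(sample)
        if len(self.current) == BUFFER_CAPACITY:
            self.full.append(self.current)
            self.current = None

    def drain(self, sink):
        """Consumer side: copy filled buffers out, then recycle them."""
        while self.full:
            buffer = self.full.popleft()
            sink.extend(buffer)        # stands in for copying to non-wired memory
            buffer.clear()
            self.empty.append(buffer)  # back into the ring for reuse


def simulate(sample_count, buffer_count, drain_every):
    """Record samples 0..sample_count-1, draining every `drain_every` samples.

    A partially filled buffer is not flushed at the end; this sketch only
    models the hand-off of completely filled buffers.
    """
    ring = BufferRing(buffer_count)
    recorded = []
    for sample in range(sample_count):
        ring.record(sample)
        if (sample + 1) % drain_every == 0:
            ring.drain(recorded)
    ring.drain(recorded)
    return recorded, ring.dropped
```

<p>With two buffers of four samples each, draining after every fourth sample keeps up and loses nothing (<code>simulate(20, 2, 4)</code>), whereas never draining until the end (<code>simulate(20, 2, 100)</code>) keeps only the first eight samples and drops the other twelve.  That&#8217;s the sense in which userspace merely has to stay ahead of the kernel.</p>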



<p>It worked really well, and is possibly still the implementation used to date &#8211; while it appears a lot of the frameworks (e.g. CoreProfile) have disappeared from macOS, presumably rewritten somewhere else, I&#8217;d be surprised if the kernel-user interface itself has changed much.  It was pretty much perfect.</p>



<p>The creation of this new, superior profiling system also spurred me to create <code>rample</code>, a CLI tool kind of like <code>top</code> which showed you what your CPUs were doing in real time<sup data-fn="4abb5c72-5961-4445-bddf-3ef39ffb6a43" class="fn"><a href="#4abb5c72-5961-4445-bddf-3ef39ffb6a43" id="4abb5c72-5961-4445-bddf-3ef39ffb6a43-link">4</a></sup> as a heavy tree, much like you&#8217;d get in a Time Profile in Shark (and later Instruments).  It was super quick to write &#8211; just a day or two, I believe &#8211; and yet I unexpectedly found that it was more useful than Shark itself.  Being able to see &#8211; practically instantly by virtue of how lightweight it is to launch a simple CLI program &#8211; what&#8217;s chewing on the CPU, <em>including inside the kernel</em>, was incredibly useful.  Instruments is the closest you can get today on a Mac, but it&#8217;s super slow and clunky in comparison.  It&#8217;s one of my top regrets, of my time at Apple, that I didn&#8217;t get <code>rample</code> into Mac OS X, or at least into the Dev Tools.</p>
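
<p>For anyone unfamiliar with the term, a &#8216;heavy&#8217; tree is the bottom-up view of sampled call stacks: the functions actually on the CPU sit at the top level, with their callers nested underneath.  The sketch below shows the general idea in Python.  It is illustrative only, not <code>rample</code>&#8217;s actual code, and the helper names and the function names in the example stacks are made up.</p>

```python
def heavy_tree(stacks):
    """Build a bottom-up ("heavy") tree from sampled call stacks.

    Each stack is a list of function names ordered from the outermost caller
    to the innermost frame.  Each tree node is a [sample_count, callers]
    pair, where callers maps a caller's name to its own node.
    """
    root = {}
    for stack in stacks:
        level = root
        for frame in reversed(stack):  # innermost frame first, then its callers
            node = level.setdefault(frame, [0, {}])
            node[0] += 1
            level = node[1]
    return root


def render(tree, total, depth=0):
    """Return the tree as indented text lines, heaviest entries first."""
    lines = []
    for name, (count, callers) in sorted(tree.items(), key=lambda item: -item[1][0]):
        lines.append(f"{'  ' * depth}{100 * count / total:5.1f}%  {name}")
        lines.extend(render(callers, total, depth + 1))
    return lines


# Four hypothetical samples, each one a call stack captured at a sampling tick.
samples = [
    ["main", "parse", "memcpy"],
    ["main", "render", "memcpy"],
    ["main", "parse", "memcpy"],
    ["main", "render", "draw"],
]
tree = heavy_tree(samples)
report = render(tree, len(samples))
```

<p>Here <code>memcpy</code> lands at the top of <code>report</code> with three of the four samples, split underneath between its callers <code>parse</code> and <code>render</code>, which is exactly the &#8216;what&#8217;s chewing on the CPU&#8217; question answered at a glance.</p>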





<ol class="wp-block-footnotes"><li id="e9992b27-788f-4f5c-9c79-5c6c2396f729">I don&#8217;t recall precisely why, but it was apparently better (or only possible?) to allocate the buffer in userspace, even though that was ultimately done with syscalls serviced by the kernel.  I think it had something to do with being easier to handle failures, like being out of memory, in particular <em>contiguous</em> memory &#8211; the memory was wired during use to ensure the kernel wouldn&#8217;t have to deal with the practical or performance problems of page faults while recording profiling data. <a href="#e9992b27-788f-4f5c-9c79-5c6c2396f729-link" aria-label="Jump to footnote reference 1">↩︎</a></li><li id="eb054551-a35d-4599-9763-bc0d7415909a">Which was a problem because generally profiling sessions were specified to run for a specific time duration, by the end-user.  So the buffer had to be sized to avoid filling prematurely, which meant wasting memory. <a href="#eb054551-a35d-4599-9763-bc0d7415909a-link" aria-label="Jump to footnote reference 2">↩︎</a></li><li id="9a13c4b7-e511-4844-815e-13771d10af96">Technically they were always available &#8211; we didn&#8217;t bother changing access permissions &#8211; but the userspace app would coordinate with the kernel via an elegant lock-free state machine.  That state machine was a superb design &#8211; it deserves its own detailed nostalgia trip. <a href="#9a13c4b7-e511-4844-815e-13771d10af96-link" aria-label="Jump to footnote reference 3">↩︎</a></li><li id="4abb5c72-5961-4445-bddf-3ef39ffb6a43">Well, updated every second or so, like <code>top</code>. <a href="#4abb5c72-5961-4445-bddf-3ef39ffb6a43-link" aria-label="Jump to footnote reference 4">↩︎</a></li></ol>]]></content:encoded>
					
					<wfw:commentRss>https://wadetregaskis.com/i-was-into-io_uring-before-it-existed/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7419</post-id>	</item>
	</channel>
</rss>
