15 October 2020 10:29 AM (webassembly | compilers | gc | llvm | emscripten | igalia | js | security | ocap | rust)

Greets!

You may have seen an interesting paper cross your radar a couple months ago: Everything Old is New Again: Binary Security of WebAssembly, by Daniel Lehmann, Johannes Kinder and Michael Pradel. The paper makes some strong claims and I would like to share some thoughts on it.

reader-response theory

For context, I have been working on web browsers for the last 8 years or so, most recently on the JavaScript and WebAssembly engine in Firefox. My work mostly consists of implementing new features, which if you are familiar with software development translates as "writing bugs". Almost all of those bugs are security bugs, potentially causing Firefox to go from being an agent of the user to an agent of the Mossad, or of cryptocurrency thieves, or anything else.

Mitigating browser bug flow takes a siege mentality. Web browsers treat all web pages and their corresponding CSS, media, JavaScript, and WebAssembly as hostile. We try to reason about global security properties, and translate those properties into invariants ensured at compile-time and run-time, for example to ensure that a web page from site A can't access cookies from site B.

In this regard, WebAssembly has some of the strongest isolation invariants in the whole platform. A WebAssembly module has access to nothing, by default: neither functionality nor data. Even a module's memory is isolated from the rest of the browser, both by construction (that's just how WebAssembly is specified) and by run-time measures (given that pointers are 32 bits in today's WebAssembly, we generally reserve a multi-gigabyte region for a module's memory that can contain nothing else).

All of this may seem obvious, but consider that a C++ program compiled to native code on a POSIX platform can use essentially everything that the person running it has access to: your SSH secrets, your email, all of your programs, and so on. That same program compiled to WebAssembly does not: any capability it has must have been given to it by the person running the program. For POSIX-like programs, the WebAssembly community is working on a POSIX for the web that standardizes limited-capability access to data and functionality from the world, and in web browsers, of course, the module has access only to the capabilities that the embedding web page gives to it. Mostly, as the JS run-time accompanying the WebAssembly is usually generated by emscripten, this set of capabilities is a function of the program itself.

the new criticism

Therefore it was with skepticism that I started reading the Lehmann et al. paper. The paper focuses on WebAssembly itself, not any particular implementation thereof; what could be wrong with WebAssembly?

I found the answer to be quite nuanced. To me, the paper shows three interesting things:

Memory-safety bugs in C/C++ programs when compiled to WebAssembly can cause control-flow edges that were not present in the source program.
Unexpected control-flow in a web browser can sometimes end up in a call to eval with the permissions of the web page, which is not good.
It's easier in some ways to exploit bugs in a C/C++ program when compiled to WebAssembly than when compiled natively, because many common mitigations aren't used by the WebAssembly compiler toolchain.

Firstly, let's discuss the control-flow point. Let's say that the program has a bug, and you have made an exploit to overwrite some memory location. What can you do with it? Well, consider indirect calls (call_indirect). This is what a compiler will emit for a vtable method call, or for a call to a function pointer. The possible targets for the indirect call are stored in a table, which is a side array of all possible call_indirect targets. The actual target is selected at run-time based on an index; WebAssembly function pointers are just indices into this table.

So if a function loads an index into the indirect call table from memory, and some exploit can change this index, then you can cause a call site to change its callee. Although there is a run-time type check that occurs at the call_indirect site to ensure that the callee is called with the right type, many functions in a module can have compatible types and thus be callable without an error.
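To make the index-as-function-pointer point concrete, here is a minimal C model of call_indirect, assuming a table whose entries pair a function pointer with an integer standing in for the wasm signature. The names (table_entry, call_indirect_i32, double_it, negate_it) are hypothetical and only illustrate the semantics; a real engine does this check in generated machine code.

```c
#include <stddef.h>

/* Hypothetical model of a wasm function table: each entry pairs a
   function pointer with an integer standing in for its signature. */
typedef int (*i32_to_i32)(int);

enum { TYPE_I32_TO_I32 = 1, TYPE_VOID_TO_VOID = 2 };

struct table_entry {
  int type_id;   /* stand-in for the wasm function type */
  i32_to_i32 fn; /* NULL here for entries of other types */
};

static int double_it(int x) { return 2 * x; }
static int negate_it(int x) { return -x; }

static const struct table_entry table[] = {
  { TYPE_I32_TO_I32, double_it },
  { TYPE_I32_TO_I32, negate_it },
  { TYPE_VOID_TO_VOID, NULL },
};
#define TABLE_SIZE (sizeof table / sizeof table[0])

/* Models call_indirect with expected type i32 -> i32: bounds check, then
   signature check. Returns 0 and stores the result on success; returns -1
   where a real engine would trap. */
static int call_indirect_i32(size_t index, int arg, int *result) {
  if (index >= TABLE_SIZE) return -1;                     /* out of bounds */
  if (table[index].type_id != TYPE_I32_TO_I32) return -1; /* wrong type */
  *result = table[index].fn(arg);
  return 0;
}
```

If a memory-safety bug rewrites a stored index from 0 to 1, the call still passes the type check and runs negate_it instead of double_it; only an out-of-bounds index or one of an incompatible type (index 2 here) traps.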

OK, so that's not great. But what can you do with it? Well, it turns out that emscripten will sometimes provide JavaScript's eval to the WebAssembly module. Usually it will be called only with a static string, but anything can happen. If an attacker can redirect a call site to eval instead of one of the possible targets from the source code, they can (e.g.) send the user's cookies to evil.com.

There's a similar vulnerability regarding changing the operand to eval, instead. Strings are represented in linear memory as well, and there's no write protection on them, even if they are read-only data. If your write primitive can change the string being passed to eval, that's also a win for the attacker. More details in the paper.
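A small sketch of why "read-only" strings are not protected: linear memory is one flat byte array, so an overflowing buffer can rewrite a neighbouring string that the program later hands to eval. The layout (an 8-byte buffer at offset 8, a string at offset 16) and all names are hypothetical, chosen only to make the overlap visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a tiny linear memory: one flat byte array with
   no page-level read-only protection inside it. */
static unsigned char linear_memory[64];

enum { BUF_OFFSET = 8, BUF_SIZE = 8, STR_OFFSET = 16 };

/* Place a "read-only" string, standing in for code passed to eval. */
static void init_memory(void) {
  memset(linear_memory, 0, sizeof linear_memory);
  strcpy((char *)linear_memory + STR_OFFSET, "safe()");
}

/* A buggy copy that trusts len instead of BUF_SIZE. Because the buffer and
   the string share the same flat memory, overflowing the buffer rewrites
   the string. (This sketch assumes len keeps the write inside the array.) */
static void buggy_copy(const char *src, size_t len) {
  memcpy(linear_memory + BUF_OFFSET, src, len);
}

static const char *string_for_eval(void) {
  return (const char *)linear_memory + STR_OFFSET;
}
```

Copying the 15-byte payload "AAAAAAAAevil()" into the 8-byte buffer leaves string_for_eval() returning "evil()"; nothing in WebAssembly itself stops that write.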

This observation brings us to the last point, which is that many basic mitigations in (e.g.) POSIX deployments aren't present in WebAssembly. There are no OS-level read-only protections for static data, and the compiler doesn't enforce this either. Also, WebAssembly programs have to bundle their own malloc, but the implementations provided by emscripten don't implement the "hardening" techniques. There is no address-space layout randomization, so exploits are deterministic. And so on.

on mitigations

It must be said that for most people working on WebAssembly, security "mitigations" are... unsatisfactory. They aren't necessary for memory-safe programs, and they can't prevent memory-unsafe programs from having unexpected behavior. Besides, we who work on WebAssembly are more focused on the security properties of the WebAssembly program as embedded in its environment than on the program itself. Garbage in, garbage out, right?

In that regard, I think that one answer to this paper is just "don't". Don't ship memory-unsafe programs, or if you do, don't give them eval capabilities. No general mitigation will make these programs safe. Writing your program in e.g. safe Rust is a comprehensive fix to this class of bug.

But, we have to admit also that shipping programs written in C and C++ is a primary goal of WebAssembly, and that no matter how hard we try, some buggy programs will get shipped, and therefore that there is marginal value to including mitigations like read-only data or even address space randomization. We definitely need to work on getting control-flow integrity protections working well with the WebAssembly toolchain, probably via multi-table support (part of the reference types extension; my colleague Paulo Matos just landed a patch in this area). And certainly Emscripten should work towards minimizing the capabilities set exposed to WebAssembly by the generated JavaScript, notably by compiling away uses of eval by embind.

Finally, I think that many of the problems identified by this paper will be comprehensively fixed in a different way by more "managed" languages. The problem is that C/C++ pointers are capabilities into all of undifferentiated linear memory. By contrast, handles to GC-managed objects are unforgeable: given object A, you can't get to object B except if object A references B. It would be great if we could bring some of the benefits of this more capability-based approach to in-memory objects to languages like C and C++; more on that in a future note, I think.

chapeau

In the end, despite my initial orneriness, I have to admit that the paper authors point out some interesting areas to work on. It's clear that there's more work to do. I was also relieved to find that my code is not at fault in this particular instance :) Onwards and upwards, and until next time, happy hacking!

13 October 2020 1:34 PM (webassembly | compilers | malloc | gc | llvm | emscripten | igalia | js)

Greetings, internet! Today I have the silliest of demos for you: malloc-as-a-service.

The above input box, if things managed to work, loads up a simple bare-bones malloc implementation, and exposes "malloc" and "free" bindings. But the neat thing is that it's built without emscripten: it's a standalone C file that compiles directly to WebAssembly, with no JavaScript run-time at all. I share it here because it might come in handy to people working on WebAssembly toolchains, and also because it was an amusing experience to build.

wat?

The name of the allocator is "walloc", in which the w is for WebAssembly.

Walloc was designed with the following priorities, in order:

Standalone. No stdlib needed; no emscripten. Can be included in a project without pulling in anything else.
Reasonable allocation speed and fragmentation/overhead.
Small size, to minimize download time.
Standard interface: a drop-in replacement for malloc.
Single-threaded (currently, anyway).

Emscripten includes a couple of good malloc implementations (dlmalloc and emmalloc) which probably you should use instead. But if you are really looking for a bare-bones malloc, walloc is fine.

You can check out all the details at the walloc web page; a selection of salient bits is below.

Firstly, to build walloc, it's just a straight-up compile:

clang -DNDEBUG -Oz --target=wasm32 -nostdlib -c -o walloc.o walloc.c

The resulting walloc.o is a conforming WebAssembly file on its own, but which also contains additional symbol table and relocation sections which allow wasm-ld to combine separate compilation units into a single final WebAssembly file. walloc.c on its own doesn't import or export anything, in the WebAssembly sense; to make bindings visible to JS, you need to add a little wrapper:

typedef __SIZE_TYPE__ size_t;

#define WASM_EXPORT(name) \
  __attribute__((export_name(#name))) \
  name

// Declare these as coming from walloc.c.
void *malloc(size_t size);
void free(void *p);
                          
void* WASM_EXPORT(walloc)(size_t size) {
  return malloc(size);
}

void WASM_EXPORT(wfree)(void* ptr) {
  free(ptr);
}

If you compile that wrapper to an object file and link it together with walloc.o using wasm-ld, passing --no-entry and --import-memory, you end up with the walloc.wasm used in the demo above. See your inspector for the URL.

The resulting wasm file is about 2 kB (uncompressed).

Walloc isn't the smallest allocator out there. A simple bump-pointer allocator that never frees is the fastest thing you can have. There is also an alternate allocator for Rust, wee_alloc, which is said to be smaller than walloc, though I think it is less space-efficient for small objects. But still, walloc is pretty small.
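For comparison, a bump-pointer allocator really is tiny. The sketch below is a hypothetical illustration over a fixed 64 kB arena (not walloc, not wee_alloc): allocation is an alignment round-up, a bounds check, and an add, and free does nothing, so memory is never reused.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over a fixed arena. */
static _Alignas(8) uint8_t arena[1 << 16];
static size_t bump_offset = 0;

static void *bump_alloc(size_t size) {
  size_t aligned = (bump_offset + 7) & ~(size_t)7;  /* 8-byte alignment */
  if (size > sizeof arena - aligned) return NULL;   /* arena exhausted */
  bump_offset = aligned + size;
  return arena + aligned;
}

/* Never reclaims anything; that is the whole trick. */
static void bump_free(void *p) { (void)p; }
```

The trade-off is plain: it wins on speed and size only for programs whose total allocation fits in memory without reuse.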

implementation notes

When a C program is compiled to WebAssembly, the resulting wasm module (usually) has associated linear memory. It can be linked in a way that the memory is created by the module when it's instantiated, or such that the module is given a memory by its host. The above example passed --import-memory to the linker, allowing the host to bound memory usage for the module instance.

The linear memory has the usual data, stack, and heap segments. The data and stack are placed first. The heap starts at the &__heap_base symbol. (This symbol is computed and defined by the linker.) All bytes above &__heap_base can be used by the wasm program as it likes. So &__heap_base is the lower bound of memory managed by walloc.

                                              memory growth ->
+----------------+-----------+-------------+-------------+----
| data and stack | alignment | walloc page | walloc page | ...
+----------------+-----------+-------------+-------------+----
^ 0              ^ &__heap_base            ^ 64 kB aligned

Interestingly, there are a few different orderings of data and stack used by different toolchains. It used to even be the case that the stack grew up. A diagram in a recent paper gives a good summary of these layouts.

The sensible thing to prevent accidental overflow (underflow, really) is to have the stack grow down to 0, with data at higher addresses. But this can cause WebAssembly code that references data to take up more bytes, because addresses are written using variable-length LEB128 encodings that favor short offsets, so it isn't the default, right now at least.
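To see why data near address 0 is cheaper to reference, here is an unsigned LEB128 encoder, the format wasm uses for memory-access offsets (i32.const immediates use the signed variant, which grows in a similar way). Each output byte carries 7 bits of payload, so small addresses fit in one or two bytes while an address at or above 64 kB already needs three.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode value as unsigned LEB128 into out (needs up to 5 bytes for a
   32-bit value); returns the number of bytes written. */
static size_t uleb128_encode(uint32_t value, uint8_t *out) {
  size_t n = 0;
  do {
    uint8_t byte = value & 0x7f;   /* low 7 bits of payload */
    value >>= 7;
    if (value != 0) byte |= 0x80;  /* high bit set: more bytes follow */
    out[n++] = byte;
  } while (value != 0);
  return n;
}
```

For example, 127 encodes in 1 byte, 128 in 2, 65536 in 3, and 0xFFFFFFFF in 5.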

Anyway! The upper bound of memory managed by walloc is the total size of the memory, which is aligned on 64-kilobyte boundaries. (WebAssembly ensures this alignment.) Walloc manages memory in 64 kB pages as well. It starts with whatever memory is initially given to the module, and will expand the memory if it runs out. The host can specify a maximum memory size, in pages; if no more pages are available, walloc's malloc will simply return NULL, and handling out-of-memory is up to the caller.
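The bounds described above can be sketched as plain arithmetic. This is a hypothetical helper, not walloc's code: it takes the heap base and the current memory size as byte offsets. In an actual wasm build the base would be &__heap_base and the size could come from clang's __builtin_wasm_memory_size(0) times 65536.

```c
#include <stddef.h>
#include <stdint.h>

#define WASM_PAGE_SIZE ((uintptr_t)65536)

/* Round x up to a multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t x, uintptr_t align) {
  return (x + align - 1) & ~(align - 1);
}

/* First 64 kB-aligned address at or above the heap base: the padding in
   the diagram above sits between the two. */
static uintptr_t first_walloc_page(uintptr_t heap_base) {
  return align_up(heap_base, WASM_PAGE_SIZE);
}

/* Whole pages between the first aligned page and the end of memory. */
static size_t managed_pages(uintptr_t heap_base, uintptr_t memory_bytes) {
  uintptr_t start = first_walloc_page(heap_base);
  if (start >= memory_bytes) return 0;
  return (size_t)((memory_bytes - start) / WASM_PAGE_SIZE);
}
```

With a heap base of 70000 bytes, the first page starts at 131072, so a 4-page (262144-byte) memory leaves 2 pages for the allocator.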

big bois

A large object is more than 256 bytes.

There is a global freelist of available large objects, each of which has a header indicating its size. When allocating, walloc does a best-fit search through that list.

struct large_object {
  struct large_object *next;
  size_t size;
  char payload[0];
};
struct large_object* large_object_free_list;

Large object allocations are rounded up to 256-byte boundaries, including the header.

If there is no object on the freelist that can satisfy an allocation, walloc will expand the heap by the size of the allocation, or by half of the current walloc heap size, whichever is larger. The resulting page or pages form a large object that can satisfy the allocation.
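The two sizing rules above (round header plus payload up to whole 256-byte chunks; grow by the larger of the request and half the heap) can be sketched as follows. The header struct mirrors struct large_object above, but its size differs between wasm32 (8 bytes) and a 64-bit host (16 bytes). Rounding the growth to whole 64 kB pages is an assumption based on wasm memory only growing in pages.

```c
#include <stddef.h>

#define CHUNK_SIZE 256
#define PAGE_SIZE 65536

/* Hypothetical header mirroring struct large_object: next pointer + size. */
struct large_object_header { void *next; size_t size; };

/* Total bytes for a large allocation, header included, in whole chunks. */
static size_t large_object_size(size_t request) {
  size_t total = request + sizeof(struct large_object_header);
  return (total + CHUNK_SIZE - 1) / CHUNK_SIZE * CHUNK_SIZE;
}

/* Heap growth when nothing on the freelist fits: the larger of the
   allocation and half the current heap, rounded up to whole pages. */
static size_t heap_growth_bytes(size_t alloc_size, size_t current_heap) {
  size_t half = current_heap / 2;
  size_t want = alloc_size > half ? alloc_size : half;
  return (want + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}
```

A 300-byte request therefore occupies 512 bytes, and a 200000-byte request against a one-page heap grows memory by four pages.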

If the best object on the freelist has more than a chunk of space on the end, it is split, and the tail put back on the freelist. A chunk is 256 bytes.
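A best-fit search with tail splitting might look like the sketch below. It is a hypothetical model, not walloc's code: free blocks are described by an offset and size in separate nodes rather than by headers living in the heap itself, and, following the text, a block is split only when more than one chunk would be left over.

```c
#include <stddef.h>

#define CHUNK_SIZE 256

/* Hypothetical free-list node: heap offset plus size in bytes. */
struct free_block {
  struct free_block *next;
  size_t offset;
  size_t size;
};

/* Find the smallest block with size >= want. If more than a chunk would
   remain, hand out the head and keep the tail on the list; otherwise
   unlink the whole block. Returns the offset, or (size_t)-1 if none fits. */
static size_t best_fit_alloc(struct free_block **list, size_t want) {
  struct free_block **best = NULL;
  for (struct free_block **p = list; *p; p = &(*p)->next)
    if ((*p)->size >= want && (!best || (*p)->size < (*best)->size))
      best = p;
  if (!best) return (size_t)-1;
  struct free_block *b = *best;
  size_t offset = b->offset;
  if (b->size - want > CHUNK_SIZE) {
    b->offset += want;   /* split: the tail stays on the freelist */
    b->size -= want;
  } else {
    *best = b->next;     /* use the whole block */
  }
  return offset;
}
```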

+-------------+---------+---------+-----+-----------+
| page header | chunk 1 | chunk 2 | ... | chunk 255 |
+-------------+---------+---------+-----+-----------+
^ +0          ^ +256    ^ +512                      ^ +64 kB

As each page is 65536 bytes and each chunk is 256 bytes, there are 256 chunks in a page. The first chunk in a page is the page header. It has a byte for each of the 256 chunks in the page; for a chunk that begins an allocated object, the byte is 255 if the object is large, and otherwise indicates the size class for packed small-object allocations (see below).
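Because pages are 64 kB aligned, finding the header byte for any pointer is just masking and division. A minimal sketch, with hypothetical helper names and the page header passed in as a plain 256-byte array:

```c
#include <stdint.h>

#define PAGE_SIZE 65536u
#define CHUNK_SIZE 256u
#define LARGE_OBJECT_KIND 255u /* header byte: chunk starts a large object */

/* Start of the 64 kB page containing addr. */
static uintptr_t page_of(uintptr_t addr) {
  return addr & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* Which of the page's 256 chunks addr falls in; chunk 0 is the header. */
static unsigned chunk_index(uintptr_t addr) {
  return (unsigned)((addr - page_of(addr)) / CHUNK_SIZE);
}

/* The kind byte recorded for the chunk containing addr. */
static uint8_t chunk_kind(const uint8_t header[256], uintptr_t addr) {
  return header[chunk_index(addr)];
}

static int in_large_object_chunk(const uint8_t header[256], uintptr_t addr) {
  return chunk_kind(header, addr) == LARGE_OBJECT_KIND;
}
```

For address 0x12345, the page starts at 0x10000 and the address lies in chunk 35.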

+-------------+---------+---------+----------+-----------+
| page header | large object 1    | large object 2 ...   |
+-------------+---------+---------+----------+-----------+
^ +0          ^ +256    ^ +512                           ^ +64 kB

When splitting large objects, we avoid starting a new large object on a page header chunk. A large object can only span where a page header chunk would be if it includes the entire page.

Freeing a large object pushes it on the global freelist. We know a pointer is a large object by looking at the page header. We know the size of the allocation, because the large object header precedes the allocation. When the next large object allocation happens after a free, the freelist will be compacted by merging adjacent large objects.
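Merging adjacent free large objects amounts to sorting by address and folding each span into its predecessor when they touch. The sketch below swaps walloc's linked list for an array of hypothetical (offset, size) spans to keep it short; the merge rule is the point, not the representation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free span: byte offset and size within the heap. */
struct span { size_t offset, size; };

static int by_offset(const void *a, const void *b) {
  size_t x = ((const struct span *)a)->offset;
  size_t y = ((const struct span *)b)->offset;
  return (x > y) - (x < y);
}

/* Sort spans by address, then merge neighbours that touch.
   Returns the number of spans left. */
static size_t coalesce(struct span *spans, size_t n) {
  if (n == 0) return 0;
  qsort(spans, n, sizeof *spans, by_offset);
  size_t out = 0;
  for (size_t i = 1; i < n; i++) {
    if (spans[out].offset + spans[out].size == spans[i].offset)
      spans[out].size += spans[i].size;  /* adjacent: grow the previous span */
    else
      spans[++out] = spans[i];           /* gap: start a new span */
  }
  return out + 1;
}
```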

Small objects are allocated from segregated freelists. The granule size is 8 bytes. Small object allocations are packed in a chunk of uniform allocation size. There are size classes for allocations of each size from 1 to 6 granules, then 8, 10, 16, and 32 granules; 10 sizes in all. For example, an allocation of 12 granules will be satisfied from a 16-granule chunk. Each size class has its own free list.
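Mapping a request to its size class is a round-up to granules followed by a search of the ten classes. A sketch with a hypothetical function name; requests above 32 granules (256 bytes) fall through to the large-object path:

```c
#include <stddef.h>

#define GRANULE_SIZE 8

/* The ten small-object size classes, in granules. */
static const unsigned char size_class_granules[] = {1, 2, 3, 4, 5, 6, 8, 10, 16, 32};
#define NUM_SIZE_CLASSES (sizeof size_class_granules / sizeof size_class_granules[0])

/* Index of the smallest size class that fits `bytes`, or -1 if the
   request must be a large object. */
static int size_class_for(size_t bytes) {
  size_t granules = (bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
  for (size_t i = 0; i < NUM_SIZE_CLASSES; i++)
    if (granules <= size_class_granules[i]) return (int)i;
  return -1;
}
```

A 96-byte request is 12 granules and lands in the 16-granule class (index 8); a 257-byte request returns -1.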

When allocating, if there is nothing on the corresponding freelist, walloc will allocate a new large object, then change its chunk kind in the page header to the size class. It then goes through the fresh chunk, threading the objects through each other onto a free list.

+-------------+---------+---------+------------+---------------------+
| page header | large object 1    | granules=4 | large object 2' ... |
+-------------+---------+---------+------------+---------------------+
^ +0          ^ +256    ^ +512    ^ +768       ^ +1024               ^ +64 kB

In this example, we imagine that the 4-granules freelist was empty, and that the large object freelist contained only large object 2, running all the way to the end of the page. We allocated a new 4-granules chunk, splitting the first chunk off the large object, and pushing the newly trimmed large object back onto the large object freelist, updating the page header appropriately. We then thread the 4-granules (32-byte) allocations in the fresh chunk together (the chunk has room for 8 of them), treating them as if they were instances of struct freelist, pushing them onto the global freelist for 4-granules allocations.

           in fresh chunk, next link for object N points to object N+1
                                 /--------\                     
                                 |        |
            +------------------+-^--------v-----+----------+
granules=4: | (padding, maybe) | object 0 | ... | object 7 |
            +------------------+----------+-----+----------+
                               ^ 4-granule freelist now points here
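Threading a fresh chunk can be sketched like this, assuming a one-field struct freelist and leftover padding at the start of the chunk as in the diagram above. The function name is hypothetical, and the chunk is assumed to be at least pointer-aligned.

```c
#include <stddef.h>

#define CHUNK_SIZE 256
#define GRANULE_SIZE 8

struct freelist { struct freelist *next; };

/* Carve a 256-byte chunk into objects of `granules` granules and push
   them onto *list: object N links to object N+1, and the last object
   links to the previous list head. */
static void thread_chunk(char *chunk, size_t granules, struct freelist **list) {
  size_t size = granules * GRANULE_SIZE;
  size_t count = CHUNK_SIZE / size;
  char *first = chunk + (CHUNK_SIZE - count * size);  /* skip padding */
  for (size_t i = 0; i < count; i++) {
    struct freelist *obj = (struct freelist *)(first + i * size);
    obj->next = (i + 1 < count) ? (struct freelist *)(first + (i + 1) * size)
                                : *list;
  }
  *list = (struct freelist *)first;
}
```

With 4-granule (32-byte) objects there is no padding and 8 objects; with 5-granule (40-byte) objects, 6 fit and 16 bytes of padding remain.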

The size classes were chosen so that any wasted space (padding) is less than the size class.

and that's it

Hey have fun with the thing! Let me know if you find it useful. Happy hacking and until next time!

3 June 2020 8:39 PM (guile | gnu | compilers | igalia | scheme | baseline | optimizing | guix)

Greets, my peeps! Today's article is on a new compiler for Guile. I made things better by making things worse!

The new compiler is a "baseline compiler", in the spirit of what modern web browsers use to get things running quickly. It is a very simple compiler whose goal is speed of compilation, not speed of generated code.

The straw that broke the camel's back was Guix, which defines the graph of all installable packages in an operating system using Scheme code. Lately it has been apparent that when you update the set of available packages via a "guix pull", Guix would spend too much time compiling the Scheme modules that contain the package graph.

So that's what I did!

it don't do much

Interestingly the quality of the code produced at optimization level -O0 is pretty much the same.

This graph shows generated code performance of the CPS compiler relative to the new baseline compiler, at optimization level 0. Bars below the line mean the CPS compiler produces slower code. Bars above mean CPS makes faster code. Note that the Y axis is logarithmic.

The tests in which -O0 CPS wins are mostly because the CPS-based compiler does a robust closure optimization pass that reduces allocation rate.

At optimization level -O1, which adds partial evaluation over the high-level tree intermediate language and support for inlining "primitive calls" like + and so on, I am not sure why CPS peels out in the lead. No additional important optimizations are enabled in CPS at that level. That's probably something to look into.

Note that the baseline of this graph is optimization level -O1, with the new baseline compiler.

But as I mentioned, I didn't write the baseline compiler to produce fast code; I wrote it to produce code fast. So does it actually go fast?

Well against the -O0 and -O1 configurations of the CPS compiler, it does excellently:

Here you can see comparisons between what will be Guile 3.0.3's -O0 and -O1, compared against their equivalents in 3.0.2. (In 3.0.2 the -O1 equivalent is actually -O1 -Oresolve-primitives, if you are following along at home.) What you can see is that at these optimization levels, for these 8 files, the baseline compiler is around 4 times as fast.

If we compare to Guile 3.0.3's default -O2 optimization level, or -O3, we see bigger disparities:

Which is to say that Guile's baseline compiler runs at about 10x the speed of its optimizing compiler, which incidentally is similar to what I found for WebAssembly compilers a while back.

Also of note is that -O0 and -O1 take essentially the same time, with -O1 often taking less time than -O0. This is because partial evaluation can make the program smaller, at a cost of being less straightforward to debug.

Similarly, -O3 usually takes less time than -O2. This is because -O3 is allowed to assume top-level bindings that aren't exported from a module can be transformed to lexical bindings, which are more available for contification and inlining, which usually leads to smaller programs; it is a similar debugging/performance tradeoff to the -O0/-O1 case.

But what does one gain when choosing to spend 10 times more on compilation? Here I have a gnarly graph that plots performance on some microbenchmarks for all the different optimization levels.

Like I said, it's gnarly, but the summary is that -O1 typically gets you a factor of 2 or 4 over -O0, and -O2 often gets you another factor of 2 above that. -O3 is mostly the same as -O2 except in magical circumstances like the mbrot case, where it adds an extra 16x or so over -O2.

I haven't seen the numbers yet of this new compiler in Guix, but I hope it can have a good impact. Already in Guile itself though I've seen a couple interesting advantages.

One is that because it produces code faster, Guile's bootstrap from source can take less time. There is also a felicitous feedback effect in that because the baseline compiler is much smaller than the CPS compiler, it takes less time to macro-expand, which reduces bootstrap time (as bootstrap has to pay the cost of expanding the compiler, until the compiler is compiled).

stay safe, friends

The code, you ask? Voici.

14 April 2020 8:59 AM (compilers | firefox | spidermonkey | webassembly | bloomberg | v8 | javascriptcore)

Greets! Today's article looks at browser WebAssembly implementations from a compiler throughput point of view. As I wrote in my article on Firefox's WebAssembly baseline compiler, web browsers have multiple wasm compilers: some that produce code fast, and some that produce fast code. Implementors are willing to pay the cost of having multiple compilers in order to satisfy these conflicting needs. So how well do they do their jobs? Why bother?

In this article, I'm going to take the simple path and just look at code generation throughput on a single chosen WebAssembly module. Think of it as X-ray diffraction to expose aspects of the inner structure of the WebAssembly implementations in SpiderMonkey (Firefox), V8 (Chrome), and JavaScriptCore (Safari).

experimental setup

As a workload, I am going to use a version of the "Zen Garden" demo. This is a 40-megabyte game engine and rendering demo, originally released for other platforms, and compiled to WebAssembly a couple years later. Unfortunately the original URL for the demo was disabled at some point in late 2019, so it no longer has a home on the web. A bit of a weird situation and I am not clear on licensing either. In any case I have a version downloaded, and have hacked out a minimal set of "imports" that the WebAssembly module needs from the host to allow the module to compile and link when run from a JavaScript shell, without requiring WebGL and similar facilities. So the benchmark is just to instantiate a WebAssembly module from the 40-megabyte byte array and see how long it takes. It would be better if I had more test cases (and would be happy to add them to the comparison!) but this is a start.

I start by benchmarking the various WebAssembly implementations, firstly in their standard configuration and then setting special run-time flags to measure the performance of the component compilers. I run these tests on the core-rich machine that I use for browser development (2 Xeon Silver 4114 CPUs for a total of 40 logical cores). The default-configuration numbers are therefore not indicative of performance on a low-end Android phone, but we can use them to extract aspects of the different implementations.

Since I'm interested in compiler throughput, I'm not particularly concerned about how well a compiler will use all 40 cores. Therefore when testing the specific compilers I will set implementation-specific flags to disable parallelism in the compiler and GC: --single-threaded on V8, --no-threads on SpiderMonkey, and the corresponding options on JSC. To further restrict any threads that the implementation might decide to spawn, I'll bind these to a single core on my machine using taskset -c 4. Otherwise the machine is in its normal configuration (nothing else significant running, all cores available for scheduling, turbo boost enabled).

I'll express results in nanoseconds per WebAssembly code byte. Of the 40 megabytes or so in the Zen Garden demo, only 23 891 164 bytes are actually function code; the rest is mostly static data (textures and so on). So I'll divide the total time by this code byte count.

I tested V8 at git revision 0961376575206, SpiderMonkey at hg revision 8ec2329bef74, and JavaScriptCore at subversion revision 259633. The benchmarks can be run using just a shell; see the pull request. I timed how long it took to instantiate the Zen Garden demo, ensuring that a basic export was callable. I collected results from 20 separate runs, sleeping a second between them. The bars in the charts below show the median times, with a histogram overlay of all results.

results & analysis

We can see some interesting results in this graph. Note that the Y axis is logarithmic. The "concurrent tiering" results in the graph correspond to the default configurations (no special flags, no taskset, all cores available).

The first interesting conclusions that pop out for me concern JavaScriptCore, which is the only implementation to have a baseline interpreter (run using --useWasmLLInt=true --useBBQJIT=false --useOMGJIT=false). JSC's WebAssembly interpreter is actually structured as a compiler that generates custom WebAssembly-specific bytecode, which is then run by a custom interpreter built using the same infrastructure as JSC's JavaScript interpreter (the LLInt). Directly interpreting WebAssembly might be possible as a low-latency implementation technique, but since you need to validate the WebAssembly anyway and eventually tier up to an optimizing compiler, apparently it made sense to emit fresh bytecode.

The part of JSC that generates baseline interpreter code runs slower than SpiderMonkey's baseline compiler, so one is tempted to wonder why JSC bothers to go the interpreter route; but then we recall that on iOS, we can't generate machine code in some contexts, so the LLInt does appear to address a need.

One interesting feature of the LLInt is that it allows tier-up to the optimizing compiler directly from loops, which neither V8 nor SpiderMonkey support currently. Failure to tier up can be quite confusing for users, so good on JSC hackers for implementing this.

JavaScriptCore's baseline compiler (run using --useWasmLLInt=false --useBBQJIT=true --useOMGJIT=false) runs much more slowly than SpiderMonkey's or V8's baseline compiler, which I think can be attributed to the fact that it builds a graph of basic blocks instead of doing a one-pass compile. To me these results validate SpiderMonkey's and V8's choices, looking strictly from a latency perspective.

I don't have graphs for code generation throughput of JavaScriptCore's optimizing compiler (run using --useWasmLLInt=false --useBBQJIT=false --useOMGJIT=true); it turns out that JSC wants one of the lower tiers to be present, and will only tier up from the LLInt or from BBQ. Oh well!

V8 and SpiderMonkey, on the other hand, are much of the same shape. Both implement a streaming baseline compiler and an optimizing compiler; for V8, we get these via --liftoff --no-wasm-tier-up or --no-liftoff, respectively, and for SpiderMonkey it's --wasm-compiler=baseline or --wasm-compiler=ion.

Another conclusion concerns the efficacy of tiering: for both V8 and SpiderMonkey, their baseline compilers run more than 10 times as fast as the optimizing compiler, and the same ratio holds between JavaScriptCore's baseline interpreter and compiler.

Finally, it would seem that the current cross-implementation benchmark for lowest-tier code generation throughput on a desktop machine would then be around 50 ns per WebAssembly code byte for a single core, which corresponds to receiving code over the wire at somewhere around 160 megabits per second (Mbps). If we add in concurrency and manage to farm out compilation tasks well, we can obviously double or triple that bitrate. Optimizing compilers run at least an order of magnitude slower. We can conclude that to the desktop end user, WebAssembly compilation time is indistinguishable from download time for the lowest tier. The optimizing tier is noticeably slower though, running at more like 10-15 Mbps per core, so time-to-tier-up is still a concern for faster networks.
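The conversion between compile cost and network bandwidth used above is simple arithmetic: 50 ns per byte is 20 million bytes per second, or 160 megabits per second. A small sketch with hypothetical helper names:

```c
/* Compile cost in ns per code byte -> the link speed (megabits/s) at
   which single-core compilation just keeps pace with download. */
static double ns_per_byte_to_mbps(double ns_per_byte) {
  double bytes_per_second = 1e9 / ns_per_byte;
  return bytes_per_second * 8.0 / 1e6;
}

/* The inverse: the per-byte budget needed to keep up with a link. */
static double mbps_to_ns_per_byte(double mbps) {
  return 8000.0 / mbps;  /* 1e9 ns/s divided by (mbps * 1e6 / 8) bytes/s */
}
```

By the same arithmetic, the optimizing tier's 10-15 Mbps per core corresponds to roughly 530-800 ns per code byte.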

Going back to the question posed at the start of the article: yes, tiering shows a clear benefit in terms of WebAssembly compilation latency, letting users interact with web sites sooner. So that's that. Happy hacking and until next time!

multi-value webassembly in firefox: a binary interface

Hey hey hey! Hope everyone is staying safe at home in these weird times. Today I have a final dispatch on the implementation of the multi-value feature for WebAssembly in Firefox. Last week I wrote about multi-value in blocks; this week I cover function calls.

on the boundaries between things

In my article on Firefox's baseline compiler, I mentioned that all WebAssembly engines in web browsers treat the function as the unit of compilation. This facilitates streaming, parallel compilation of WebAssembly modules, by farming out compilation of individual functions to worker threads. It also allows for easy tier-up from quick-and-dirty code generated by the low-latency baseline compiler to the faster code produced by the optimizing compiler.

There are some interesting Conway's Law implications of this choice. One is that division of compilation tasks becomes an opportunity for division of human labor; there is a whole team working on the experimental Cranelift compiler that could replace the optimizing tier, and in my hackings on Firefox I have had minimal interaction with them. To my detriment, of course; they are fine people doing interesting things. But the code boundary means that we don't need to communicate as we work on different parts of the same system.

Boundaries are where places touch, and sometimes for fluid crossing we have to consider boundaries as places in their own right. Functions compiled with the baseline compiler, with Ion (the production optimizing compiler), and with Cranelift (the experimental optimizing compiler) are all able to call each other because they actively maintain a common boundary, a binary interface (ABI). (Incidentally the A originally stands for "application", essentially reflecting division of labor between groups of people making different components of a software system; Conway's Law again.) Let's look closer at this boundary-place, with an eye to how it changes with multi-value.

what's in an ABI?

Among other things, an ABI specifies a calling convention: which arguments go in registers, which on the stack, how the stack values are represented, how results are returned to the callers, which registers are preserved over calls, and so on. Intra-WebAssembly calls are a closed world, so we can design a custom ABI if we like; that's what V8 does. Sometimes WebAssembly may call functions from the run-time, though, and so it may be useful to be closer to the C++ ABI on that platform (the "native" ABI); that's what Firefox does. (Incidentally here I think Firefox is probably leaving a bit of performance on the table on Windows by using the inefficient native ABI that only allows four register parameters. I haven't measured though so perhaps it doesn't matter.) Using something closer to the native ABI makes debugging easier as well, as native debugger tools can apply more easily.

One thing that most native ABIs have in common is that they are really only optimized for a single result. This reflects their heritage as artifacts from a world built with C and C++ compilers, where there isn't a concept of a function with more than one result. If multiple results are required, they are represented instead as arguments, typically as pointers to memory somewhere. Consider the AMD64 SysV ABI, used on Unix-derived systems, which carefully specifies how to pass arbitrary numbers of arbitrary-sized data structures to a function (§3.2.3), while only specifying what to do for a single return value. If the return value is too big for registers, the ABI specifies that a pointer to result memory be passed as an argument instead.

So in a multi-result WebAssembly world, what are we to do? How should a function return multiple results to its caller? Let's assume that there are some finite number of general-purpose and floating-point registers devoted to return values, and that if the return values will fit into those registers, then that's where they go. The problem is then to determine which results will go there, and if there are remaining results that don't fit, then we have to put them in memory. The ABI should indicate how to address that memory.

first thought: stack results precede stack arguments

When a function needs some of its arguments passed on the stack, it doesn't receive a pointer to those arguments; rather, the arguments are placed at a well-known offset to the stack pointer.

We could do the same thing with stack results, either reserving space deeper on the stack than stack arguments, or closer to the stack pointer. With the advent of tail calls, it would make more sense to place them deeper on the stack. Like this:

The diagram above shows the ordering of stack arguments as implemented by Firefox's WebAssembly compilers: later arguments are deeper (farther from the stack pointer). It's an arbitrary choice that happens to match up with what the native ABIs do, as it was easier to re-use bits of the already-existing optimizing compiler that way. (Native ABIs use this stack argument ordering because of sloppiness in a version of C from before I was born. If you were starting over from scratch, probably you wouldn't do things this way.)

Stack result order does matter to the baseline compiler, though. It's easier if the stack results are placed in the same order in which they would be pushed on the virtual stack, so that when the function completes, the results can just be memmove'd down into place (if needed). The same concern dictates another aspect of our ABI: unlike with arguments, registers are allocated to the last results rather than the first results. This is to make it easy to preserve stack invariant (1) from the previous article.
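A minimal sketch of that "registers go to the last results" rule, assuming a hypothetical ABI with a fixed number of result registers per class and one 8-byte slot per memory result. None of these names come from SpiderMonkey. Walking from the newest result to the oldest means that memory results always form a prefix of the result list, which is what keeps spilled values older than register values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register classes: integers in GPRs, floats in FPRs. */
typedef enum { CLASS_GPR, CLASS_FPR } ValClass;

typedef struct {
    int in_register;     /* 1 if returned in a register, 0 if in memory */
    int reg_index;       /* register number within its class, or -1 */
    size_t stack_offset; /* byte offset in the stack result area */
} ResultLoc;

/* Assigns locations to n results, giving registers to the LAST results.
   As soon as one result does not fit in a register, it and all older
   results go to memory, oldest at offset 0, 8 bytes each (assumption).
   Returns the size in bytes of the stack result area. */
size_t assign_results(const ValClass *types, size_t n,
                      int num_gpr, int num_fpr, ResultLoc *out) {
    int used_gpr = 0, used_fpr = 0;
    size_t first_reg = n; /* index of the oldest register result */
    for (size_t i = n; i-- > 0;) {
        int is_gpr = types[i] == CLASS_GPR;
        int *used = is_gpr ? &used_gpr : &used_fpr;
        int limit = is_gpr ? num_gpr : num_fpr;
        if (*used == limit)
            break;
        out[i] = (ResultLoc){1, (*used)++, 0};
        first_reg = i;
    }
    for (size_t i = 0; i < first_reg; i++)
        out[i] = (ResultLoc){0, -1, i * 8};
    return first_reg * 8;
}
```

With one register per class and results (i64, i64, f64, i64), the last two results land in registers and the first two occupy a 16-byte stack result area.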

At first I thought this was the obvious option, but I ran into problems. It turns out that stack arguments are fundamentally unlike stack results in some important ways.

While a stack argument is logically consumed by a call, a stack result starts life with a call. As such, if you reserve space for stack results just by decrementing the stack pointer before a call, probably you will need to load the results eagerly into registers thereafter or shuffle them into other positions to be able to free the allocated stack space.

Eager shuffling is busy-work that should be avoided if possible. It's hard to avoid in the baseline compiler. For example, a call to a function with 10 arguments will consume 10 values from the temporary stack; any results will be pushed on after removing argument values from the stack. If there are any stack results, it's almost impossible to avoid a post-call memmove, to move stack results to where they should be before the 10 argument values were pushed on (and probably spilled). So the baseline compiler case is not optimal.

However, things get gnarlier with the Ion optimizing compiler. Like many other optimizing compilers, Ion is designed to compute the necessary stack frame size ahead of time, and to never move the stack pointer during an activation. The only exception is for pushing on any needed stack arguments for nested calls (which are popped directly after the nested call). So in that case, assuming there are a number of multi-value calls in a stack frame, we'll be shuffling in the optimizing compiler as well. Not great.

Besides the need to shuffle, stack arguments and stack results differ as regards ownership and garbage collection. A callee "owns" the memory for its stack arguments; it is responsible for them. The caller can't assume anything about the contents of that memory after a call, especially if the WebAssembly implementation supports tail calls (a whole 'nother blog post, that). If the values being passed are just bits, that's one thing, but with the reference types proposal, some result values may be managed by the garbage collector. The callee is responsible for making stack arguments visible to the garbage collector; the caller is responsible for the results. The caller will need to emit metadata to allow the garbage collector to see stack result references. For this reason, a stack result actually starts life just before a call, because it can become initialized at any point and thus needs to be traced during the entire callee activation. Not all callers can easily add garbage collection roots for writable stack slots, so the need to place stack results in a fixed position complicates calling multi-value WebAssembly functions in some cases (e.g. from C++).

second thought: pointers to stack results

Consider this C program:

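#include <stdint.h>

/* Two results are written through pointers; the third is returned normally. */
int64_t foo(int64_t* a, int64_t* b) {
  *a = 1;
  *b = 2;
  return 3;
}
void call_foo(void) {
  int64_t a, b, c;
  c = foo(&a, &b);
}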

This program shows us a possibility for encoding WebAssembly's multiple return values: pass an additional argument for each stack result, pointing to the location to which to write the stack result. Like this:

The result pointers are normal arguments, subject to normal argument allocation. In the above example, given that there are already stack arguments, they will probably be passed on the stack, but in many cases the stack result pointers may be passed in registers.

The result locations themselves don't even need to be on the stack, though they certainly will be in intra-WebAssembly calls. However the ability to write to any memory is a useful form of flexibility when e.g. calling into WebAssembly from C++.

The advantage of this approach is that we eliminate post-call shuffles, at least in optimizing compilers. But, having to make an argument for each stack result, each of which might itself become a stack argument, seems a bit offensive. I thought we might be able to do a little better.

third thought: stack result area, passed as pointer

Given that stack results are going to be written to memory, it doesn't really matter where they will be written, from the perspective of the optimizing compiler at least. What if we allocated them all in a block and just passed one pointer to the block? Like this:

Here there's just one additional argument, no matter how many stack results. While we're at it, we can specify that the layout of the stack results should be the same as how they would be written to the baseline stack, to make the baseline compiler's job easier.
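Continuing the C analogy from the previous approach, here is a sketch of the single-pointer scheme for a hypothetical function returning three i64 values: the two older results live in one caller-allocated block, laid out in push order, and the last one comes back in a register. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stack result area for a function returning (i64, i64, i64):
   the first two results live in memory, in the order they would be pushed;
   the last is returned in a register. */
typedef struct {
    int64_t r0;
    int64_t r1;
} StackResults3;

/* Callee: one extra pointer argument, however many stack results. */
int64_t three_results(int64_t x, StackResults3 *results) {
    results->r0 = x + 1;
    results->r1 = x + 2;
    return x + 3;
}

/* Caller: reserves one block for all stack results and passes its address. */
int64_t sum_three_results(int64_t x) {
    StackResults3 area;
    int64_t r2 = three_results(x, &area);
    return area.r0 + area.r1 + r2;
}
```

Compared to passing one pointer per result, the argument count stays fixed as the number of results grows, at the cost of the results having to be allocated contiguously.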

As I started implementation with the baseline compiler, I chose this third approach, essentially because I was already allocating space for the results in a block in this way by bumping the stack pointer.

When I got to the optimizing compiler, however, it was quite difficult to convince Ion to allocate an area on the stack of the right shape.

Looking back on it now, I am not sure that I made the right choice. The thing is, the IonMonkey compiler started life as an optimizing compiler for JavaScript. It can represent unboxed values, which is how it came to be used as a compiler for asm.js and later WebAssembly, and it does a good job on them. However it has never had to represent aggregate data structures like a C++ class, so it didn't have support for spilling arbitrary-sized data to the stack. It took a while staring at the register allocator to convince it to allocate arbitrary-sized stack regions, and then to allocate component scalar values out of those regions. If I had just asked the register allocator to give me one appropriate-sized stack slot for each scalar, and hacked out the ability to pass separate pointers to the stack slots to WebAssembly calls with stack results, then I would have had an easier time of it, and perhaps stack slot allocation could be more dense because multiple results wouldn't need to be allocated contiguously.

As it is, I did manage to hack it in, and I think in a way that doesn't regress. I added a layer over an argument type vector that adds a synthetic stack results pointer argument, if the function returns stack results; iterating over this type with ABIArgIter will allocate a stack result area pointer, either as a register argument or a stack argument. In the optimizing compiler, I added a kind of value allocation corresponding to a variable-sized stack area (using pointer tagging again!), and extended the register allocator to allocate LStackArea, and the component stack results. Interestingly, I had to add a kind of definition that starts life on the stack; previously all Ion results started life in registers and were only spilled if needed.
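To picture what "either as a register argument or a stack argument" means, here is a much-simplified model of that layering: a helper appends a synthetic pointer argument when there are stack results, and an iterator-style pass hands out argument registers until they run out, then stack slots. This is not ABIArgIter; the register counts, the 8-byte stack slots, and placing the synthetic argument last are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_I32, ARG_I64, ARG_F64, ARG_PTR } ArgType;

typedef struct {
    int in_register;  /* 1 if passed in a register */
    int reg;          /* register number within its class, or -1 */
    size_t stack_off; /* byte offset among stack arguments */
} ArgLoc;

/* Hypothetical register counts, loosely modeled on a SysV-like ABI. */
enum { NUM_INT_ARG_REGS = 6, NUM_FP_ARG_REGS = 8 };

/* Copies the declared arguments and, if needed, appends the synthetic
   stack-result-area pointer (placed last here, by assumption).
   Returns the new argument count. */
size_t with_stack_results_arg(const ArgType *args, size_t n,
                              int has_stack_results, ArgType *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = args[i];
    if (has_stack_results)
        out[n++] = ARG_PTR;
    return n;
}

/* Assigns each argument a register of its class while any remain,
   otherwise the next 8-byte stack slot. */
void allocate_args(const ArgType *args, size_t n, ArgLoc *locs) {
    int next_int = 0, next_fp = 0;
    size_t next_stack = 0;
    for (size_t i = 0; i < n; i++) {
        int is_fp = args[i] == ARG_F64;
        int *next = is_fp ? &next_fp : &next_int;
        int limit = is_fp ? NUM_FP_ARG_REGS : NUM_INT_ARG_REGS;
        if (*next < limit) {
            locs[i] = (ArgLoc){1, (*next)++, 0};
        } else {
            locs[i] = (ArgLoc){0, -1, next_stack};
            next_stack += 8;
        }
    }
}
```

With six integer arguments already in registers, the synthetic pointer spills to the first stack slot; with only two arguments, it gets a register.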

In the end, a function will capture the incoming stack result area argument, either as a normal SSA value (for Ion) or stored to a stack slot (baseline), and when returning will write stack results to that pointer as appropriate. Passing in a pointer as an argument did make it relatively easy to implement calls from WebAssembly to and from C++; making the variable-shape result area known to the garbage collector for C++-to-WebAssembly calls was simple in the end, but it took me a while to figure out.

Finally I was a bit exhausted from multi-value work and ready to walk away from the "JS API", the bit that allows multi-value WebAssembly functions to be called from JavaScript (they return an array) or for a JavaScript function to return multiple values to WebAssembly (via an iterable) -- but then when I got to thinking about this blog post I preferred to implement the feature rather than document its lack. Avoidance-of-document-driven development: it's a thing!

towards deployment

As I said in the last article, the multi-value feature is about improved code generation and also making a more capable base for expressing further developments in the WebAssembly language.

As far as code generation goes, things are progressing but it is still early days. Thomas Lively has implemented support in LLVM for emitting return of C++ aggregates via multiple results, which is enabled via the -experimental-multivalue-abi cc1 flag. Thomas has also been implementing multi-value support in the binaryen WebAssembly toolchain component, used by the emscripten C++-to-WebAssembly toolchain. I think it will be a few months though before everything lands in a way that end users can take advantage of.

On the specification side, the multi-value feature has been at phase 4 since January, which basically means things are all done there.

Implementation-wise, V8 has had experimental support since 2017 or so, and the feature was staged last fall, although V8 doesn't yet support multi-value in their baseline compiler.

Unlike V8 and SpiderMonkey, JavaScriptCore (the JS and wasm engine in WebKit) actually implements a WebAssembly interpreter as their solution to the one-pass streaming compilation problem. Then on the compiler side, there are two tiers that both operate on basic block graphs (OMG and BBQ; I just puked a little in my mouth typing that). This strategy makes the compiler implementation quite straightforward. It's also an interesting design point because JavaScriptCore's garbage collector scans the stack conservatively; there's no need for the compiler to do bookkeeping on the GC's behalf, which I'm sure was a relief to the hacker. Anyway, multi-value in WebKit is done too.

The new thing of course is that finally, in Firefox, the feature is now fully implemented (woo) and enabled by default on Nightly builds (woo!). I did that! It took me a while! Perhaps too long? Anyway it's done. Thanks again to Bloomberg for supporting this work; large ups to y'all for helping the web move forward.



multi-value webassembly in firefox: from 1 to n

3 April 2020 10:56 AM (igalia | compilers | firefox | spidermonkey | webassembly | bloomberg)

Greetings, hackers! Today I'd like to write about something I worked on recently: implementation of the multi-value future feature of WebAssembly in Firefox, as sponsored by Bloomberg.

In the "minimum viable product" version of WebAssembly published in 2018, there were a few artificial restrictions placed on the language. Functions could only return a single value; if a function would naturally return two values, it would have to return at least one of them by writing to memory. Loops couldn't take parameters; any loop state variables had to be stored to and loaded from indexed local variables at each iteration. Similarly, any block that would naturally return more than one result would also have to do so via locals.

This restriction is lifted with the multi-value proposal. Function types now map from result type to result type, where a result type is a sequence of value types. That is to say, just as functions can take multiple arguments, they can return multiple results. Similarly, with the multi-value proposal, block types are now the same as function types: loops and blocks can take arguments and return any number of results. This change improves the expressiveness of WebAssembly as a compilation target; a C++ program compiled to multi-value WebAssembly can be encoded in fewer bytes than before. Multi-value also establishes a base for other language extensions. For example, the exception handling proposal builds on multi-value to pass multiple values to catch blocks.

multi-value in blocks

In the last article, I presented the basic structure of Firefox's WebAssembly support: there is a baseline compiler optimized for latency and an optimizing compiler optimized for throughput. (There is also Cranelift, a new experimental compiler that may replace the current implementation of the optimizing compiler; but that doesn't affect the basic structure.)

The optimizing compiler applies traditional compiler techniques: SSA graph construction, where values flow into and out of graphs using the usual defs-dominate-uses relationship. The only control-flow joins are loop entry and (possibly) block exit, so the addition of loop parameters means in multi-value there are some new phi variables in that case, and the expansion of block result count from [0,1] to [0,n] means that you may have more block exit phi variables. But these compilers are built to handle these situations; you just build the SSA and let the optimizing compiler go to town.

The problem comes in the baseline compiler.

from 1 to n

Recall that the baseline compiler is optimized for compiler speed, not compiled speed. If a block can only ever produce 0 or 1 results, for example, the baseline compiler's internal data structures will use something like a Maybe<ValType> to represent that block result.

If you then need to expand this to hold a vector of values, the naïve approach of using a Vector<ValType> would mean heap allocation and indirection, and thus would regress the baseline compiler.

In this case, and in many other similar cases, the solution is to use value tagging to represent 0 or 1 value type directly in a word, and the general case by linking out to an external vector. As block types are function types, they actually appear as function types in the WebAssembly type section, so they are already parsed; the BlockType in that case can just refer out to already-allocated memory.
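Here is a minimal sketch of the value-tagging idea applied to a block type, assuming a hypothetical encoding in which the low two bits of a word act as a tag. The common cases (no results, or one result) live entirely in the word; the general case is an aligned pointer to an already-parsed function type. The types, names, and tag values are illustrative only, not SpiderMonkey's actual BlockType.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { VT_I32 = 1, VT_I64, VT_F32, VT_F64 } ValType;

/* Hypothetical out-of-line representation of the general case. */
typedef struct {
    size_t num_params, num_results;
    const ValType *params, *results;
} FuncType;

/* A block type in one word. Low two bits are the tag:
     0: pointer to a FuncType (requires 4-byte alignment)
     1: no parameters, no results
     2: no parameters, one result; the ValType is in the upper bits */
typedef struct { uintptr_t bits; } BlockType;

enum { TAG_MASK = 3, TAG_FUNC = 0, TAG_VOID = 1, TAG_SINGLE = 2 };

BlockType block_type_void(void) { return (BlockType){TAG_VOID}; }

BlockType block_type_single(ValType t) {
    return (BlockType){((uintptr_t)t << 2) | TAG_SINGLE};
}

BlockType block_type_func(const FuncType *f) {
    assert(((uintptr_t)f & TAG_MASK) == 0); /* pointer must be aligned */
    return (BlockType){(uintptr_t)f | TAG_FUNC};
}

static const FuncType *block_func(BlockType b) {
    return (const FuncType *)(b.bits & ~(uintptr_t)TAG_MASK);
}

size_t block_num_results(BlockType b) {
    switch (b.bits & TAG_MASK) {
    case TAG_VOID:   return 0;
    case TAG_SINGLE: return 1;
    default:         return block_func(b)->num_results;
    }
}

ValType block_result(BlockType b, size_t i) {
    assert(i < block_num_results(b));
    if ((b.bits & TAG_MASK) == TAG_SINGLE)
        return (ValType)(b.bits >> 2);
    return block_func(b)->results[i];
}
```

The single-result path never touches memory, which is the point: existing 0-or-1 code keeps its speed while the n-result case costs one indirection.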

In fact this value-tagging pattern applies all over the place. (The jit/ links above are for the optimizing compiler, but they relate to function calls; will write about that next week.) I have a bit of pause about value tagging, in that it's gnarly complexity and I didn't measure the speed of alternative implementations, but it was a useful migration strategy: value tagging minimizes performance risk to existing specialized use cases while adding support for new general cases. Gnarly it is, then.

control-flow joins

I didn't mention it in the last article, but there are two important invariants regarding stack discipline in the baseline compiler. Recall that there's a virtual stack, and that some elements of the virtual stack might be present on the machine stack. There are four kinds of virtual stack entry: register, constant, local, and spilled. Locals indicate local variable reads and are mostly like registers in practice; when registers spill to the stack, locals do too. (Why spill to the temporary stack instead of leaving the value in the local variable slot? Because locals are mutable. A local.get captures a local variable value at its point of execution. If future code changes the local variable value, you wouldn't want the captured value to change.)

Digressing, the stack invariants:

(1) Spilled values precede registers and locals on the virtual stack. If u and v are virtual stack entries and u is older than v, then if u is in a register or is a local, then v is not spilled.
(2) Older values precede newer values on the machine stack. Again for u and v, if they are both spilled, then u will be farther from the stack pointer than v.

There are five fundamental stack operations in the baseline compiler; let's examine them to see how the invariants are guaranteed. Recall that before multi-value, targets of non-local exits (e.g. of the br instruction) could only receive 0 or 1 value; if there is a value, it's passed in a well-known register (e.g. %rax or %xmm0). (On 32-bit machines, 64-bit values use a well-known pair of registers.)

push(v): Results of WebAssembly operations never push spilled values, neither onto the virtual nor the machine stack. v is either a register, a constant, or a reference to a local. Thus we guarantee both (1) and (2).
pop() -> v: Doesn't affect older stack entries, so (1) is preserved. If the newest stack entry is spilled, you know that it is closest to the stack pointer, so you can pop it by first loading it to a register and then incrementing the stack pointer; this preserves (2). Therefore if it is later pushed on the stack again, it will not be as a spilled value, preserving (1).
spill(): When spilling the virtual stack to the machine stack, you first traverse stack entries from new to old to see how far you need to spill. Once you get to a virtual stack entry that's already on the stack, you know that everything older has already been spilled, because of (1), so you switch to iterating back towards the new end of the stack, pushing registers and locals onto the machine stack and updating their virtual stack entries to be spilled along the way. This iteration order preserves (2). Note that because known constants never need to be on the machine stack, they can be interspersed with any other value on the virtual stack.
return(height, v): This is the stack operation corresponding to a block exit (local or nonlocal). We drop items from the virtual and machine stack until the stack height is height. In WebAssembly 1.0, if the target continuation takes a value, then the jump passes a value also; in that case, before popping the stack, v is placed in a well-known register appropriate to the value type. Note however that v is not pushed on the virtual stack at the return point. Popping the virtual stack preserves (1), because a stack and its prefix have the same invariants; popping the machine stack also preserves (2).
capture(t): Whereas return operations happen at block exits, capture operations happen at the target of block exits (the continuation). If no value is passed to the continuation, a capture is a no-op. If a value is passed, it's in a register, so we just push that register onto the virtual stack. Both invariants are obviously preserved.
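To make the invariants and the spill traversal concrete, here is a toy model of the virtual stack in C. It tracks only entry kinds and machine-stack positions, not real registers or code emission; the types and function names are hypothetical. It implements push, pop, and spill as described above, plus a checker for invariants (1) and (2).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the baseline compiler's virtual stack (hypothetical). */
typedef enum { K_REGISTER, K_CONSTANT, K_LOCAL, K_SPILLED } Kind;

typedef struct {
    Kind kind;
    long machine_pos; /* for K_SPILLED: machine stack slot, 0 = deepest */
} Entry;

typedef struct {
    Entry entries[64];   /* entries[0] is the oldest */
    size_t height;       /* virtual stack height */
    long machine_height; /* slots currently pushed on the machine stack */
} VStack;

/* Checks invariants (1) and (2). */
int invariants_hold(const VStack *s) {
    int seen_reg_or_local = 0;
    long last_pos = -1;
    for (size_t i = 0; i < s->height; i++) {
        Kind k = s->entries[i].kind;
        if (k == K_REGISTER || k == K_LOCAL)
            seen_reg_or_local = 1;
        if (k == K_SPILLED) {
            if (seen_reg_or_local)
                return 0; /* violates (1) */
            if (s->entries[i].machine_pos <= last_pos)
                return 0; /* violates (2) */
            last_pos = s->entries[i].machine_pos;
        }
    }
    return 1;
}

/* push(v): operations never push spilled values. */
void push(VStack *s, Kind k) {
    assert(k != K_SPILLED && s->height < 64);
    s->entries[s->height++] = (Entry){k, -1};
}

/* pop(): a spilled top entry is nearest the stack pointer, so popping it
   (after a load, in a real compiler) just shrinks the machine stack. */
Kind pop(VStack *s) {
    assert(s->height > 0);
    Entry e = s->entries[--s->height];
    if (e.kind == K_SPILLED) {
        assert(e.machine_pos == s->machine_height - 1);
        s->machine_height--;
        return K_REGISTER; /* the value now lives in a register */
    }
    return e.kind;
}

/* spill(): scan from new to old until an already-spilled entry (by (1),
   everything older is spilled or constant), then walk back towards the new
   end, pushing registers and locals. Constants stay off the machine stack. */
void spill(VStack *s) {
    size_t start = s->height;
    while (start > 0 && s->entries[start - 1].kind != K_SPILLED)
        start--;
    for (size_t i = start; i < s->height; i++) {
        Kind k = s->entries[i].kind;
        if (k == K_REGISTER || k == K_LOCAL) {
            s->entries[i].kind = K_SPILLED;
            s->entries[i].machine_pos = s->machine_height++;
        }
    }
}
```

Because spill() pushes in old-to-new order, older entries always get lower machine-stack positions, which is exactly invariant (2).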

Note that a value passed to a continuation via return() has a brief instant in which it has no name -- it's not on the virtual stack -- but only a location -- it's in a well-known place. capture() then gives that floating value a name.

Relatedly, there is another invariant, that the allocation of old values on block entry is the same as their allocation on block exit, so that all predecessors of the block exit flow all values via the same places. This is preserved by spilling on block entry. It's a big hammer, but effective.

So, given all this, how do we pass multiple values via return()? We don't have unlimited registers, so the well-known-register strategy isn't going to work.

The answer for the baseline compiler is informed by our lean into the stack machine principle. Multi-value returns are allocated in such a way that a capture() can push them onto the virtual stack. Because spilled values must precede registers, we therefore allocate older results on the stack, and put the last result in a register (or register pair for i64 on 32-bit platforms). Note that it's possible in theory to allocate multiple results to registers; we'll touch on this next week.

Therefore the implementation of return(height, v₁..v_n) is straightforward: we first pop register results, then spill the remaining virtual stack items, then shuffle stack results down towards height. This should result in a memmove of contiguous stack results towards the frame pointer. However because const values aren't present on the machine stack, depending on the stack height difference, the shuffle may have to be split into more than one move. It's gnarly, but it is what it is. Note that the links to the return and capture implementations above are to the post-multi-value world, so you can see all the details there.
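The common case of that shuffle can be sketched with a toy machine stack, assuming all n stack results are already contiguous at the top of the machine stack (that is, none of them is a constant). The names are hypothetical; memmove handles the overlap when the results only move a short distance.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy machine stack: slots[0] is deepest (nearest the frame pointer),
   slots[sp - 1] is nearest the stack pointer. */
typedef struct {
    int64_t slots[64];
    size_t sp; /* number of occupied slots */
} MachineStack;

/* After spilling, the n stack results occupy the top n slots. Move them
   down so that they start at `height`, the continuation's stack height,
   and drop everything above them. */
void shuffle_stack_results(MachineStack *m, size_t height, size_t n) {
    assert(n <= m->sp && height + n <= m->sp);
    size_t src = m->sp - n;
    memmove(&m->slots[height], &m->slots[src], n * sizeof m->slots[0]);
    m->sp = height + n;
}
```

For a stack holding 10, 20, 30, 40, 50, 60 with two stack results returning to height 1, the result is 10, 50, 60.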

that's it!

When it comes to calls though, that's another story. We'll get to that one next week. Thanks again to Bloomberg for supporting this work; I'm really delighted that Igalia and Bloomberg have been working together for a long time (coming on 10 years now!) to push the web platform forward. A special thanks also to Mozilla's Lars Hansen for his patience reviewing these patches. Until next week, then, stay at home & happy hacking!


firefox's low-latency webassembly compiler

25 March 2020 4:29 PM (igalia | compilers | firefox | spidermonkey | webassembly | bloomberg)

Good day!

Today I'd like to write a bit about the WebAssembly baseline compiler in Firefox.

background: throughput and latency

WebAssembly, as you know, is a virtual machine that is present in web browsers like Firefox. An important initial goal for WebAssembly was to be a good target for compiling programs written in C or C++. You can visit a web page that includes a program written in C++ and compiled to WebAssembly, and that WebAssembly module will be downloaded onto your computer and run by the web browser.

A good virtual machine for C and C++ has to be fast. The throughput of a program compiled to WebAssembly (the amount of work it can get done per unit time) should be approximately the same as its throughput when compiled to "native" code (x86-64, ARMv7, etc.). WebAssembly meets this goal by defining an instruction set that consists of similar operations to those directly supported by CPUs; WebAssembly implementations use optimizing compilers to translate this portable instruction set into native code.

There is another dimension of fast, though: not just work per unit time, but also time until first work is produced. If you want to go play Doom 3 on the web, you care about frames per second but also time to first frame. Therefore, WebAssembly was designed not just for high throughput but also for low latency. This focus on low-latency compilation expresses itself in two ways: binary size and binary layout.

On the size front, WebAssembly is optimized to encode small files, reducing download time. One way in which this happens is to use a variable-length integer encoding (LEB128) anywhere an instruction needs to specify an integer. In the usual case where, for example, there are fewer than 128 local variables, this means that a local.get instruction can refer to a local variable using just one byte. Another strategy is that WebAssembly programs target a stack machine, reducing the need for the instruction stream to explicitly load operands or store results. Note that size optimization only goes so far: it's assumed that the bytes of the encoded module will be compressed by gzip or some other algorithm, so sub-byte entropy coding is out of scope.
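For reference, unsigned LEB128 stores 7 bits per byte, low bits first, and uses each byte's high bit to say whether more bytes follow; that is why any index below 128 fits in one byte. A minimal encoder and decoder for 32-bit values (function names are illustrative, and the decoder does no validation of malformed or overlong input):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned LEB128 encoding of a 32-bit value; writes at most 5 bytes.
   Returns the number of bytes written. */
size_t uleb128_encode(uint32_t value, uint8_t *out) {
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0)
            byte |= 0x80; /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Decodes one unsigned LEB128 value and stores its length in *len.
   Assumes well-formed input. */
uint32_t uleb128_decode(const uint8_t *in, size_t *len) {
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;
    do {
        byte = in[n++];
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    *len = n;
    return result;
}
```

So local index 127 encodes as the single byte 0x7f, while 128 needs two bytes, 0x80 0x01.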

On the layout side, the WebAssembly binary encoding is sorted by design: definitions come before uses. For example, there is a section of type definitions that occurs early in a WebAssembly module. Any use of a declared type can only come after the definition. In the case of functions which are of course mutually recursive, function type declarations come before the actual definitions. In theory this allows web browsers to take a one-pass, streaming approach to compilation, starting to compile as functions arrive and before download is complete.

implementation strategies

The goals of high throughput and low latency conflict with each other. To get best throughput, a compiler needs to spend time on code motion, register allocation, and instruction selection; to get low latency, that's exactly what a compiler should not do. Web browsers therefore take a two-pronged approach: they have a compiler optimized for throughput, and a compiler optimized for latency. As a WebAssembly file is being downloaded, it is first compiled by the quick-and-dirty low-latency compiler, with the goal of producing machine code as soon as possible. After that "baseline" compiler has run, the "optimizing" compiler works in the background to produce high-throughput code. The optimizing compiler can take more time because it runs on a separate thread. When the optimizing compiler is done, it replaces the baseline code. (The actual heuristics about whether to do baseline + optimizing ("tiering") or just to go straight to the optimizing compiler are a bit hairy, but this is a summary.)

This article is about the WebAssembly baseline compiler in Firefox. It's a surprising bit of code and I learned a few things from it.

design questions

Knowing what you know about the goals and design of WebAssembly, how would you implement a low-latency compiler?

It's a question worth thinking about so I will give you a bit of space in which to do so.

In the case of the Firefox baseline compiler, the answer comes down to four principles:

The function is the unit of compilation
One pass, and one pass only
Lean into the stack machine
No noodling!

In the remainder of this article we'll look into these individual points. Note, although I have done a good bit of hacking on this compiler, its design and original implementation comes mainly from Mozilla hacker Lars Hansen, who also currently maintains it. All errors of exegesis are mine, of course!

the function is the unit of compilation

As we mentioned, in the binary encoding of a WebAssembly module, all definitions needed by any function come before all function definitions. This naturally leads to a partition between two phases of bytestream parsing: an initial serial phase that collects the set of global type definitions, annotations as to which functions are imported and exported, and so on, and a subsequent phase that compiles individual functions in an essentially independent manner.

The advantage of this approach is that compiling functions is a natural task unit of parallelism. If the user has a machine with 8 virtual cores, the web browser can keep one or two cores for the browser itself and farm out WebAssembly compilation tasks to the rest. The result is that the compiled code is available sooner.

Taking functions to be the unit of compilation also allows for an easy "tier-up" mechanism: after the baseline compiler is done, the optimizing compiler can take more time to produce better code, and when it is done, it can swap out the results on a per-function level. All function calls from the baseline compiler go through a jump table indirection, to allow for tier-up. In SpiderMonkey there is no mechanism currently to tier down; if you need to debug WebAssembly code, you need to refresh the page, causing the wasm code to be compiled in debugging mode. For the record, SpiderMonkey can only tier up at function calls (it doesn't do OSR).
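The jump-table indirection can be sketched with an array of function pointers: every call loads its target from the table, so tiering up one function is a single entry overwrite that the next call picks up. This is a simplified model with hypothetical names; a real engine has to patch machine code entries safely while other threads may be running.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*WasmFn)(int);

/* Hypothetical per-module table of function entry points. */
typedef struct {
    WasmFn entries[16];
    size_t count;
} JumpTable;

/* Calls between functions go through the table, never directly. */
int call_function(const JumpTable *t, size_t index, int arg) {
    assert(index < t->count);
    return t->entries[index](arg);
}

/* Installs optimized code for one function; other entries are untouched. */
void tier_up(JumpTable *t, size_t index, WasmFn optimized) {
    assert(index < t->count);
    t->entries[index] = optimized;
}

/* Stand-ins for baseline and optimized code of the same function;
   the counters record which tier actually ran. */
int baseline_calls, optimized_calls;
int square_baseline(int x) { baseline_calls++; return x * x; }
int square_optimized(int x) { optimized_calls++; return x * x; }
```

Callers never need to know which tier they are calling; the indirection is the whole tier-up mechanism, and it also explains why tier-up can only happen at function calls.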

This simple approach does have some down-sides, in that it leaves interprocedural optimizations on the table (inlining, contification, custom calling conventions, speculative optimizations). This is mitigated in two ways, the most obvious being that LLVM or whatever produced the WebAssembly has ideally already done whatever inlining might be fruitful. The second is that WebAssembly is designed for predictable performance. In JavaScript, an implementation needs to do run-time type feedback and speculative optimizations to get good performance, but the result is that it can be hard to understand why a program is fast or slow. The designers and implementers of WebAssembly in browsers all had first-hand experience with JavaScript virtual machines, and actively wanted to avoid unpredictable performance in WebAssembly. Therefore there is currently a kind of détente among the various browser vendors: everyone has agreed not to do speculative inlining -- yet, anyway. Who knows what will happen in the future, though.

one pass, and one pass only

The WebAssembly baseline compiler makes one pass through the bytecode of a function. Nowhere in all of this are we going to build an abstract syntax tree or a graph of basic blocks. Let's follow through how that works.

Firstly, emitFunction simply emits a prologue, then the body, then an epilogue. emitBody is basically a big loop that consumes opcodes from the instruction stream, dispatching to opcode-specific code emitters (e.g. emitAddI32).

The opcode-specific code emitters are also responsible for validating their arguments; for example, emitAddI32 is wrapped in an assertion that there are two i32 values on the stack. This validation logic is shared by a templatized codestream iterator so that it can be re-used by the optimizing compiler, as well as by the publicly-exposed WebAssembly.validate function.

A corollary of this approach is that machine code is emitted in bytestream order; if the WebAssembly instruction stream has an i32.add followed by an i32.sub, then the machine code will have an addl followed by a subl.
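The structure described above can be sketched as a toy one-pass compiler: a prologue, a loop that dispatches each opcode to its own emitter, then an epilogue. This is a minimal sketch over an invented three-opcode subset; `ToyCompiler`, `Instr`, and the textual pseudo-instructions are hypothetical and are not SpiderMonkey's real types or encoding. Note how each emitter validates its operands, and how output comes out in bytecode order with no AST or basic-block graph.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// A toy subset of wasm opcodes; not the real binary encoding.
enum class Op { I32Const, I32Add, I32Sub };
struct Instr {
  Op op;
  int32_t imm = 0;  // only used by I32Const
};

class ToyCompiler {
 public:
  std::vector<std::string> emitFunction(const std::vector<Instr>& body) {
    out_.clear();
    depth_ = 0;
    out_.push_back("prologue");
    emitBody(body);
    out_.push_back("epilogue");
    return out_;
  }

 private:
  std::vector<std::string> out_;
  int depth_ = 0;  // how many i32 values are on the abstract stack

  // One pass: dispatch each opcode to an opcode-specific emitter.
  void emitBody(const std::vector<Instr>& body) {
    for (const Instr& in : body) {
      switch (in.op) {
        case Op::I32Const: emitConstI32(in.imm); break;
        case Op::I32Add:   emitBinaryI32("addl"); break;
        case Op::I32Sub:   emitBinaryI32("subl"); break;
      }
    }
  }

  void emitConstI32(int32_t c) {
    out_.push_back("const " + std::to_string(c));
    ++depth_;
  }

  // The emitter validates its own operands before generating code.
  void emitBinaryI32(const std::string& mnemonic) {
    if (depth_ < 2) throw std::runtime_error("validation failed: expected two i32 operands");
    out_.push_back(mnemonic);
    --depth_;
  }
};
```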

WebAssembly has a syntactically limited form of non-local control flow; it's not goto. Instead, instructions are contained in a tree of nested control blocks, and control can only exit nonlocally to a containing control block. There are three kinds of control blocks: jumping to a block or an if will continue at the end of the block, whereas jumping to a loop will continue at its beginning. In either case, as the compiler keeps a stack of nested control blocks, it has the set of valid jump targets and can use the usual assembler logic to patch forward jump addresses when the compiler gets to the block exit.
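Here is a minimal sketch of that bookkeeping, assuming a toy assembler in which `code[i]` just records the jump target of the instruction at address `i`. `ControlStack`, `Frame`, and the sentinel constants are hypothetical. A branch to a loop targets the loop's known start address; a branch to a block records the jump site, which gets patched when the block ends and its exit address becomes known. (An `if` would be handled like a block here.)

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr long kNotAJump = -2;   // an ordinary instruction
constexpr long kUnpatched = -1;  // a forward jump whose target is not known yet

class ControlStack {
 public:
  std::vector<long> code;  // code[i]: jump target at address i, or a sentinel

  void enterBlock() { blocks_.push_back(Frame{}); }
  void enterLoop() {
    Frame f;
    f.isLoop = true;
    f.start = code.size();
    blocks_.push_back(f);
  }

  void emitOther() { code.push_back(kNotAJump); }

  // Branch to the enclosing block `depth` levels out (0 = innermost), as in br.
  void emitBranch(std::size_t depth) {
    Frame& target = blocks_.at(blocks_.size() - 1 - depth);
    if (target.isLoop) {
      code.push_back(static_cast<long>(target.start));  // backward: already known
    } else {
      target.pendingJumps.push_back(code.size());        // forward: remember the site
      code.push_back(kUnpatched);
    }
  }

  // At the block's end its exit address is known, so patch forward jumps.
  void endBlock() {
    Frame done = blocks_.back();
    blocks_.pop_back();
    for (std::size_t site : done.pendingJumps) code[site] = static_cast<long>(code.size());
  }

 private:
  struct Frame {
    bool isLoop = false;
    std::size_t start = 0;                  // loop head address
    std::vector<std::size_t> pendingJumps;  // forward jumps to patch at the end
  };
  std::vector<Frame> blocks_;
};
```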

This is the interesting bit! So, WebAssembly instructions target a stack machine. That is to say, there's an abstract stack onto which evaluating i32.const 32 pushes a value, and if followed by i32.const 10 there would then be i32(32) | i32(10) on the stack (where new elements are added on the right). A subsequent i32.add would pop the two values off, and push on the result, leaving the stack as i32(42). There is also a fixed set of local variables, declared at the beginning of the function.

The easiest thing that a compiler can do, then, when faced with a stack machine, is to emit code for a stack machine: as values are pushed on the abstract stack, emit code that pushes them on the machine stack.
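As an illustration of this naive strategy, here is a minimal sketch that mirrors every abstract push and pop with a machine push and pop, emitting x86-64-flavored pseudo-assembly as strings. The function names and the register choices are assumptions made for the example, not anyone's actual code generator.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Naive stack-machine codegen: abstract pushes and pops become machine ones.
void emitConstI32(std::vector<std::string>& code, int32_t c) {
  code.push_back("push $" + std::to_string(c));
}

void emitAddI32(std::vector<std::string>& code) {
  code.push_back("pop %rcx");          // right operand
  code.push_back("pop %rax");          // left operand
  code.push_back("addl %ecx, %eax");
  code.push_back("push %rax");         // the result goes back to memory
}

// Count how many emitted instructions touch the machine stack.
int countStackTraffic(const std::vector<std::string>& code) {
  int n = 0;
  for (const std::string& insn : code)
    if (insn.rfind("push", 0) == 0 || insn.rfind("pop", 0) == 0) ++n;
  return n;
}
```

For i32.const 32; i32.const 10; i32.add this emits six instructions, five of which are stack pushes or pops.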

The downside of this approach is that you emit a fair amount of code to read and write values from the stack. Machine instructions generally take arguments from registers and write results to registers; going to memory is a bit superfluous. We're willing to accept suboptimal code generation for this quick-and-dirty compiler, but isn't there something smarter we can do for ephemeral intermediate values?

Turns out -- yes! The baseline compiler keeps an abstract value stack as it compiles. For example, compiling i32.const 32 pushes nothing on the machine stack: it just adds a ConstI32 node to the value stack. When an instruction needs an operand that turns out to be a ConstI32, it can either encode the operand as an immediate argument or load it into a register.

Say we are evaluating the i32.add discussed above. After the add, where does the result go? For the baseline compiler, the answer is always "in a register" via pushing a new RegisterI32 entry on the value stack. The baseline compiler includes a stupid register allocator that spills the value stack to the machine stack if no register is available, updating value stack entries from e.g. RegisterI32 to MemI32. Note, a ConstI32 never needs to be spilled: its value can always be reloaded as an immediate.

The end result is that the baseline compiler avoids lots of stack store and load code generation, which speeds up the compiler, and happens to make faster code as well.
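The value-stack technique can be sketched as follows. This is a toy model under stated assumptions, not SpiderMonkey's data structures: entries are ConstI32 and RegisterI32 as in the text, plus MemI32 for spilled values; registers are just numbered `r0`, `r1`, ...; and when no register is free, the sketch spills every register entry, bottom to top, to the machine stack. Like the emitAddI32 shown below, it only uses an immediate when the constant is on top. For i32.const 32; i32.const 10; i32.add it emits just two instructions and touches no memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

enum class Kind { ConstI32, RegisterI32, MemI32 };
struct Stk {
  Kind kind;
  int32_t payload;  // the constant for ConstI32, the register for RegisterI32
};

class ValueStackCompiler {
 public:
  explicit ValueStackCompiler(int numRegs) : free_(numRegs, true) {}
  std::vector<std::string> code;  // emitted pseudo-assembly
  std::vector<Stk> stack;         // the compiler's abstract value stack

  // i32.const emits no code at all: it only records the value.
  void constI32(int32_t c) { stack.push_back({Kind::ConstI32, c}); }

  // i32.add: immediate form if the top operand is constant; result in a register.
  void addI32() {
    Stk rhs = pop();
    if (rhs.kind == Kind::ConstI32) {
      int r = toRegister(pop());
      code.push_back("addl $" + std::to_string(rhs.payload) + ", " + reg(r));
      stack.push_back({Kind::RegisterI32, r});
    } else {
      int rs = toRegister(rhs);
      int r = toRegister(pop());
      code.push_back("addl " + reg(rs) + ", " + reg(r));
      free_[rs] = true;
      stack.push_back({Kind::RegisterI32, r});
    }
  }

 private:
  std::vector<bool> free_;

  static std::string reg(int r) { return "r" + std::to_string(r); }
  Stk pop() { Stk s = stack.back(); stack.pop_back(); return s; }

  // Materialize an entry (already popped off the value stack) in a register.
  int toRegister(const Stk& s) {
    if (s.kind == Kind::RegisterI32) return s.payload;
    int r = allocRegister();
    if (s.kind == Kind::ConstI32)
      code.push_back("movl $" + std::to_string(s.payload) + ", " + reg(r));
    else  // MemI32: the topmost spilled value is on top of the machine stack
      code.push_back("pop " + reg(r));
    return r;
  }

  int allocRegister() {
    for (int pass = 0; pass < 2; ++pass) {
      for (std::size_t i = 0; i < free_.size(); ++i)
        if (free_[i]) { free_[i] = false; return static_cast<int>(i); }
      spillAll();  // no free register: spill, then retry once
    }
    throw std::runtime_error("out of registers");
  }

  // RegisterI32 entries become MemI32; constants are never spilled.
  void spillAll() {
    for (Stk& s : stack) {
      if (s.kind == Kind::RegisterI32) {
        code.push_back("push " + reg(s.payload));
        free_[s.payload] = true;
        s = {Kind::MemI32, 0};
      }
    }
  }
};
```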

Note that there is one limitation, currently: control-flow joins can have multiple predecessors and can pass a value (in the current WebAssembly specification), so the allocation of that value needs to be agreed-upon by all predecessors. As in this code:

(func $f (param $arg i32) (result i32)
  (block $b (result i32)
    (i32.const 0)
    (local.get $arg)
    (i32.eqz)
    (br_if $b) ;; return 0 from $b if $arg is zero
    (drop)
    (i32.const 1))) ;; otherwise return 1
;; result of block implicitly returned

When the br_if branches to the block end, where should it put the result value? The baseline compiler effectively punts on this question and just puts it in a well-known register (e.g., $rax on x86-64). Results for block exits are the only place where WebAssembly has "phi" variables, and the baseline compiler allocates all integer phi variables to the same register. A hack, but there we are.

no noodling!

When I started to hack on the baseline compiler, I did a lot of code reading, and eventually came on code like this:

void BaseCompiler::emitAddI32() {
  int32_t c;
  if (popConstI32(&c)) {
    RegI32 r = popI32();
    masm.add32(Imm32(c), r);
    pushI32(r);
  } else {
    RegI32 r, rs;
    pop2xI32(&r, &rs);
    masm.add32(rs, r);
    freeI32(rs);
    pushI32(r);
  }
}

I said to myself, this is silly, why are we only emitting the add-immediate code if the constant is on top of the stack? What if instead the constant was the deeper of the two operands, why do we then load the constant into a register? I asked on the chat channel if it would be OK if I improved codegen here and got a response I was not expecting: no noodling!

The reason is that the baseline compiler exists to produce code quickly, so changes to it are only accepted if they are necessary for some reason, or if they improve latency as measured using some real-world benchmark (time-to-first-frame on Doom 3, for example).

This to me was a real eye-opener: a compiler optimized not for the quality of the code that it generates, but rather for how fast it can produce the code. I had seen this in action before but this example really brought it home to me.

So that's the WebAssembly baseline compiler in SpiderMonkey / Firefox. Until the next time, happy hacking!

9 February 2020 7:44 PM (gnu | free software | fsf)

Greetings, GNU hackers! This blog post rounds up GNU happenings over 2019. My goal is to celebrate the software we produced over the last year and to help us plan a successful 2020.

Over the past few months I have been discussing project health with a group of GNU maintainers and we were wondering how the project was doing. We had impressions, but little in the way of data. To that end I wrote some scripts to collect dates and versions for all releases made by GNU projects, as far back as data is available.

In 2019, I count releases from 98 projects. Nice! Notably, on ftp.gnu.org we have the first stable releases from three projects:

GNU Guix: GNU Guix is perhaps the most exciting project in GNU these days. It's a package manager! It's a distribution! It's a container construction tool! It's a package-manager-cum-distribution-cum-container-construction-tool! Hearty congratulations to Guix on their first stable release.
GNU Shepherd: The GNU Daemon Shepherd is a modern dependency-based init service, written in Guile Scheme, and used in Guix. When you install Guix as an operating system, it actually stages Scheme programs from the operating system definition into the Shepherd configuration. So cool!
GNU Backgammon: Version 1.06.002 is not GNU Backgammon's first stable release, but it is the earliest version which is available on ftp.gnu.org. Formerly hosted on the now-defunct gnubg.org, GNU Backgammon is a venerable foe, and has used neural networks since before they were cool. Welcome back, GNU Backgammon!

The total release counts above are slightly above what Mike Gerwitz's scripts count in his "GNU Spotlight", posted on the FSF blog. This could be because in addition to files released on ftp.gnu.org, I also manually collected release dates for most packages that upload their software somewhere other than gnu.org. I don't count alpha.gnu.org releases, and there were a handful of packages for which I wasn't successful at retrieving their release dates. But as a first approximation, it's a relatively complete data set.

I put my scripts in a git repository if anyone is interested in playing with the data. Some raw CSV files are there as well.

where we at?

Hair toss, check my nails, baby how you GNUing? Hard to tell!

Counting the active packages per year, what we see is nothing before 1991 -- surely pointing to lacunae in my data set -- then a more or less linear rise in active package count until 2002, some stuttering growth rising to a peak in 2014 at 208 active packages, and from there a steady decline down to 153 active packages in 2019.

Of course, as a metric, active package count isn't precisely the same as project health; ed is indeed the standard editor but it's not GCC. But we need to look for measurements that indirectly indicate project health and this is what I could come up with.

Looking a little deeper, I tabulated the first and last release date for each GNU package, and then grouped them by year. In this graph, the left blue bars indicate the number of packages making their first recorded release, and the right green bars indicate the number of packages making their last release. Obviously a last release in 2019 indicates an active package, so it's to be expected that we have a spike in green bars on the right.

What this graph indicates is that GNU had an uninterrupted growth phase from its beginning until 2006, with more projects being born than dying. Things are mixed until 2012 or so, and since then we see many more projects making their last release and above all, very few packages "being born".
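The tabulation behind the first-release/last-release graph can be sketched like this. It is an assumption-laden illustration, not the scripts from the repository mentioned above: it takes a flat list of (package, year) release records, finds each package's first and last release year, and counts packages per year for each.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Release {
  std::string package;
  int year;
};

// Number of packages whose first (resp. last) recorded release fell in each year.
struct YearCounts {
  std::map<int, int> first, last;
};

YearCounts firstAndLastByYear(const std::vector<Release>& releases) {
  std::map<std::string, std::pair<int, int>> span;  // package -> (first, last)
  for (const Release& r : releases) {
    auto it = span.find(r.package);
    if (it == span.end()) {
      span.emplace(r.package, std::make_pair(r.year, r.year));
    } else {
      it->second.first = std::min(it->second.first, r.year);
      it->second.second = std::max(it->second.second, r.year);
    }
  }
  YearCounts counts;
  for (const auto& entry : span) {
    ++counts.first[entry.second.first];
    ++counts.last[entry.second.second];
  }
  return counts;
}
```

With data ending in 2019, every package still releasing lands in the last-release bucket for 2019, which is why that bar spikes.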

where we going?

I am not sure exactly what steps GNU should take in the future but I hope that this analysis can be a good conversation-starter. I do have some thoughts but will post in a follow-up. Until then, happy hacking in 2020!

7 February 2020 11:38 AM (guile | gnu | fosdem | maintenance | change)

Greets, hackfolk!

Like just about every year, last week I took the train up to Brussels for FOSDEM, the messy and wonderful carnival of free software and of those that make it. Mostly I go for the hallway track: to see old friends, catch up, scheme about future plans, and refill my hacker culture reserves.

I usually try to see if I can get a talk or two in, and this year was no exception. First on my mind was the recent release of Guile 3. This was the culmination of a 10-year plan of work and so obviously there are some things to say! But at the same time, I wanted to reflect back a bit and look at the past with a bit of distance.

So in the end, my one talk was two talks. Let's start with the first one. (I'm trying a new thing where I share my talks as blog posts. We'll see how this goes. I know the rendering can be a bit off relative to the slides, but hopefully it's good enough. If you prefer, you can just watch the video instead!)

FOSDEM 2020, Brussels

Andy Wingo | wingo@igalia.com

wingolog.org | @andywingo

So yeah let's celebrate! I co-maintain the Guile implementation of Scheme. It's a programming language. Guile 3, in summary, is just Guile, but faster. We added a simple just-in-time compiler as well as a bunch of ahead-of-time optimizations. The result is that it runs faster -- sometimes by a lot!

In the image above you can see Guile 3's performance on a number of microbenchmarks, relative to Guile 2.2, sorted by speedup. The baseline is 1.0x as fast. You can see that besides the first couple microbenchmarks where things are a bit inconclusive, everything gets faster. Most are at least 2x as fast, and one benchmark is even 32x as fast. (Note the logarithmic scale on the Y axis.)

I only took a look at microbenchmarks at the end of the Guile 3 series; before that, I was mostly going by instinct. It's a relief to find out that in this case, my instincts did align with improvement.

mini-benchmark: eval

(primitive-eval
 '(let fib ((n 30))
    (if (< n 2)
        n
        (+ (fib (- n 1)) (fib (- n 2))))))

Guile 1.8: primitive-eval written in C

Guile 2.0+: primitive-eval in Scheme

Taking a look at a medium-sized benchmark, let's compute the 30th Fibonacci number, but using the interpreter instead of compiling the procedure. In Guile 2.0 and up, the interpreter (primitive-eval) is implemented in Scheme, so it's a good test of an important small Scheme program.

Before 2.0, though, primitive-eval was actually implemented in C. This had a number of disadvantages, notably that it prevented tail calls between interpreted and compiled code. When we switched to a Scheme implementation of primitive-eval, we knew we would have a performance hit, but we thought that we would gain it back eventually as the compiler got better.

As you can see, it took a while before the compiler and run-time improved to the point that primitive-eval in Scheme reached the speed of its old hand-tuned C implementation, but for Guile 3, we finally got there. Note again the logarithmic scale on the Y axis.

guix build libreoffice ghc-pandoc guix \
  --dry-run --derivation

7% faster

guix system build config.scm \
  --dry-run --derivation

10% faster

Finally, taking a real-world benchmark, the Guix package manager is implemented entirely in Scheme. All ten thousand packages are defined in Scheme, the building scripts are in Scheme, the initial RAM disk is in Scheme -- you get the idea. Guile performance in Guix can have an important effect on user experience. As you can see, Guile 3 lowered elapsed time for some operations by around 10 percent or so. Of course there's a lot of I/O going on in addition to computation, so Guile running twice as fast will rarely make Guix run twice as fast (Amdahl's law and all that).
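As a worked illustration of the Amdahl's law point, let $p$ be the fraction of elapsed time spent in Scheme computation and $s$ the speedup of that part. The value $p = 0.2$ below is purely hypothetical, chosen for the arithmetic, not a measurement of Guix:

$$
S = \frac{1}{(1 - p) + p/s}, \qquad
p = 0.2,\ s = 2:\quad S = \frac{1}{0.8 + 0.1} = \frac{1}{0.9} \approx 1.11
$$

That is, under this assumption, doubling the speed of the Scheme part cuts elapsed time by only 10 percent.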

spry /sprī/

adjective: active; lively

spry /sprī/

adjective: (especially of an old person) active; lively

But actually when I went to look up the meaning of "spry", Collins Dictionary says that it especially applies to the agèd. At first I was a bit offended, but I knew in my heart that the dictionary was right.

FOSDEM 2020, Brussels

Andy Wingo | wingo@igalia.com

wingolog.org | @andywingo

That leads me into my second talk.

guile is ancient

2010: Rust

2009: Go

2007: Clojure

1995: Ruby

1995: PHP

1995: JavaScript

1993: Guile (3³ years before 3.0!)

It's common for a new project to be lively, but Guile is definitely not new. People have been born, raised, and earned doctorates in programming languages in the time that Guile has been around.

built from ancient parts

1991: Python

1990: Haskell

1990: SCM

1989: Bash

1988: SIOD

Guile didn't appear out of nothing, though. It was hacked up from the pieces of another Scheme implementation called SCM, which itself was initially based on Scheme in One Defun (SIOD), back before the Berlin Wall fell.

written in an ancient language

1987: Perl

1984: C++

1975: Scheme

1958: Lisp

1958: Algol

1958: Lisp

1930s: Lambda calculus (3⁴ years ago!)

But it goes back further! The Scheme language, of which Guile is an implementation, dates from 1975, before I was born; and you can, if you choose, trace the lines back to the lambda calculus, created in the mid-30s as a notation for computation. I suppose at this point I should say mid-1930s, to disambiguate.

The point is, Guile is old! Statistically, most software projects from olden times are now dead. How has Guile managed to survive and (sometimes) thrive? Surely there must be some lesson or other that can be learned here.

Men make their own history, but they do not make it as they please; they do not make it under self-selected circumstances, but under circumstances existing already, given and transmitted from the past.

Eighteenth Brumaire of Louis Bonaparte, Marx, 1852

I am no philosopher of history, but I know that there are some ways of looking at the past that do not help me understand things. One is the arrow of enlightened progress, in which events exist in a causal chain, each producing the next. It doesn't help me understand the atmosphere, tensions, and possibilities inherent at any particular point. I find the "progress" theory of history to be an extreme form of selection bias.

Much more helpful to me is the Hegelian notion of dialectics: that at any given point in time there are various tensions at work. In our field, an example could be memory safety versus systems programming. These tensions create an environment that favors actions that lead towards resolution of the tensions. It doesn't mean that there's only one way to resolve the tensions, and it's not an automatic process -- people still have to do things. But the tendency is to ratchet history forward to a new set of tensions.

The history of a project, to me, is then a process of dialectic tensions and resolutions. If the project survives, as Guile has, then it should teach us something about the way this process works in practice.

ancient & spry

Languages evolve; how to remain minimal?

Dialectic opposites

world and guile
stable and active
...

Lessons learned from inside Hegel’s motor of history

One dialectic is the tension between the world's problems and what tools Guile offers to understand and solve them. In 1993, the web didn't really exist. In 2033, if Guile doesn't run well in a web browser, probably it will be dead. But this process operates very slowly, for an old project; Guile isn't built on CORBA or something ephemeral like that, so we don't have very much data here.

hill-climbing is insufficient

Ex: Guile 1.8; Extend vs Embed

One key lesson that I have learned is that the strategy of making only incremental improvements is a recipe for death, in the long term. The natural result is that you reach what you perceive to be the most optimal state of your project. Any change can only make it worse, so you stop moving.

This is what happened to Guile around version 1.8: we had taken the paradigm of the interpreter as language implementation strategy as far as it could go. There were only around 150 commits to Guile in 2007. We were stuck.

users stay unless pushed away

Inertial factor: interface

Source (API)
CLI
...

Ex: Python 3; R6RS syntax; set!, set-car!

So how do we make change, in such a circumstance? You could start a new project, but then you wouldn't have any users. It would be nice to change and keep your users. Fortunately, it turns out that users don't really go away; yes, they trickle out if you don't do anything, but unless you change in an incompatible way, they stay with you, out of inertia.

Inertia is good and bad. It does conflict with minimalism as a principle; if you were to design Scheme in 2020, you would not include mutable variables or even mutable pairs. But they are still with us because if we removed them, we'd break too many users.

Users can even make you add back things that you had removed. In Guile 2.0, we removed the capability to evaluate an expression at run-time within the lexical environment of an expression, as we didn't know how to implement this outside an interpreter. It turns out this was so important to users that we had to add local-eval back to Guile, later in the 2.0 series. (Fortunately we were able to do it in a way that layered on lower-level facilities; this approach reconciled me to the solution.)

What users say: don’t change or remove existing behavior

No change at all == death

Natural result of hill-climbing

Ex: psyntax; BDW-GC mark & finalize; compile-time; Unicode / locales

Unfortunately, the need to change means that sometimes you will lose users. It's either a dead project, or losing users.

In Guile 1.8, for example, the macro expander ran lazily: it would only expand code the first time it ran it. This was good for start-up time, because not all code is evaluated in the course of a simple script. Lazy expansion allowed us to start doing important work sooner. However, this approach caused immense pain to people that wanted "proper" Scheme macros that preserved lexical scoping; the state of the art was to eagerly expand an entire file. So we switched, and at the same time added a notion of compile-time. This compromise kept good start-up time while allowing fancy macros.

But eager expansion was a change. Users that relied on side effects from macro expansion would see them at compile-time instead of run-time. Users of old "defmacros" that could previously splice in live Scheme closures as literals in expanded source could no longer do that. I think it was the right choice but it did lose some users. In fact I just got another bug report related to this 10-year-old change last week.

Guile binary ABI: libguile.so; compiled Scheme files

Make compatibility easier: minimize interface

Ex: scm_sym_unquote, GOOPS, Go, Guix

So if you don't want to lose users, don't change any interface. The easiest way to do this is to minimize your interface surface. In Go, for example, they mostly haven't had dynamic-linking problems because that's not a thing they do: all code is statically linked into binaries. Similarly, Guix doesn't define a stable API, because all of its code is maintained in one "monorepo" that can develop in lock-step.

You always have some interfaces, though. For example, Guix can't change its command-line interface from one day to the next, because users would complain. But it's been surprising to me the extent to which Guile has interfaces that I didn't consider. Recently for example in the 3.0 release, we unexported some symbols by mistake. Users complained, so we're putting them back in now.

parallel installs for the win

Highly effective pattern for change

libguile-2.0.so
libguile-3.0.so

http://ometer.com/parallel.html

Changed ABI is new ABI; it should have a new name

Ex: make-struct/no-tail, GUILE_PKG([2.2]), libtool

So how does one do incompatible change? If "don't" isn't a sufficient answer, then parallel installs is a good strategy. For example in Guile, users don't have to upgrade to 3.0 until they are ready. Guile 2.2 happily installs in parallel with Guile 3.0.

As another small example, there's a function in Guile called make-struct, whose first argument is the number of "tail" slots, followed by initializers for all slots (normal and "tail"). This tail feature is weird and I would like to remove it. Unfortunately I can't just remove the argument, so I had to make a new function, make-struct/no-tail, which exists in parallel with the old version that I can't break.

(issue-deprecation-warning
 "(ice-9 mapping) is deprecated."
 "  Use srfi-69 or rnrs hash tables instead.")

scm_c_issue_deprecation_warning
  ("Arbiters are deprecated.  "
   "Use mutexes or atomic variables instead.");

begin-deprecated, SCM_ENABLE_DEPRECATED

Fortunately there is a way to encourage users to migrate from old interfaces to new ones: deprecation. In Guile this applies to all of our interfaces (binary, source, etc). If a feature is marked as deprecated, we cause its use to issue a warning, ideally at compile-time when users responsible for the package can fix it. You can even add deprecation attributes on C types!
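The replace-then-deprecate pattern can be sketched at the C++ level with the standard `[[deprecated]]` attribute (C++14). The functions below are hypothetical, loosely modeled on the make-struct story in the text, and are not Guile's API or its actual deprecation machinery. The old entry point keeps working, but every caller gets a compile-time warning naming the replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replace: the new entry point has no "tail" parameter.
std::vector<int> make_record_no_tail(std::vector<int> fields) { return fields; }

// Deprecate: the old entry point still works, but any code that calls it
// gets a compiler warning pointing at the replacement.
[[deprecated("use make_record_no_tail instead")]]
std::vector<int> make_record(std::size_t tailCount, std::vector<int> fields) {
  std::vector<int> record = make_record_no_tail(fields);
  record.resize(record.size() + tailCount, 0);  // the old extra "tail" slots
  return record;
}

// Remove: in a later incompatible series, make_record is deleted outright.
```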

the arch-pattern

Replace, Deprecate, Remove

Applies to all interfaces

Ex: scm_t_uint8; make-struct; Foreign objects; uniform vectors

Finally, you end up in a situation where you have replaced the old interface and issued deprecation warnings to help users migrate. The next step is to remove the old interface. If you don't do this, you are failing as a project maintainer -- your project becomes literally unmaintainable as it just grows and grows.

This strategy applies to all changes. The deprecation period may last a while, and it may be that the replacement you built doesn't serve the purpose. There is still a dialog with the users that needs to happen. As an example, I made a replacement for the "SMOB" facility in Guile that allows users to define new types, backed by C interfaces. This new "foreign object" facility might not actually be good enough to replace SMOBs; I don't know yet, because I haven't formally deprecated SMOBs and users are still using the old thing!

change produces a new stable point

Stability within series: only additions

for your definition of stable
social norms help (GNU, semver)

Ex: libtool; unistring; gnulib

In my experience, the old management dictum that "the only constant is change" does not describe software. Guile changes, then it becomes stable for a while. You need an unstable series to escape hill-climbing; then, once you have found your new hill, you start climbing again in the stable series.

Once you reach your stable point, the projects you rely on need to exhibit the same degree of stability that you envision for your project. You can't build a web site that you expect to maintain for 10 years on technology that fundamentally changes every 6 months. But stable dependencies aren't something you can ensure technically; rather, they rely on social norms of who makes the software you use.

who can crank the motor of history?

All libraries define languages

Allow user to evolve the language

User functionality: modules (Guix)
User syntax: macros (yay Scheme)

Guile 1.8 perf created tension

incorporate code into Guile
large C interface “for speed”

Compiler removed pressure on C ABI

Empowered users need less from you

A dialectic process does not progress on its own: it requires actions. As a project maintainer, some of my actions are because I want to do them. Others are because users want me to do them. The user-driven actions are generally a burden and as a lazy maintainer, I want to minimize them.

Here I think Guile has to a large degree escaped some of the pressures that weigh on other languages, for example Python. Because Scheme allows users to define language features that exist on par with "built-in" features, users don't need my approval or intervention to add (say) new syntax to the language they work in. Furthermore, their work can still compose with the work of others, even if the others don't buy in to their language extensions.

Still, Guile 1.8 did have a dynamic whereby the relatively poor performance of having to run all code through primitive-eval meant that users were pushed towards writing extensions in C. This in turn pushed Guile to expose all of its guts for access from C, which obviously has led to an overbloated C API and ABI. Happily the work on the Scheme compiler has mostly relieved this pressure, and we may therefore be able to trim the size of the C API and ABI over time.

contributions and risk

From maintenance point of view, all interface is legacy

Guile: Sometimes OK to accept user modules when they are more stable than Guile

Ex: SSAX, fibers, SRFI

I would note an interesting effect: pieces of code that were adopted into Guile become a snapshot of the coding style at that time. It's useful to have some in-tree users because it gives you a better idea about how a project is seen from the outside, from a code perspective.

sticky bits

Local maximum: Boehm-Demers-Weiser conservative collector

How to get to precise, generational GC?

Not just Guile; e.g. CPython __del__

There are some points that resist change. The stickiest of these is the representation of heap-allocated Scheme objects in C. Guile currently uses a garbage collector that "automatically" finds all live Scheme values on the C stack and in registers. It was the right choice at the time, given our maintenance budget. But to get the next bump in performance, we need to switch to a generational garbage collector. It's hard to do that without a lot of pain to C users, essentially because the C language is too weak to express the patterns that we would need. I don't know how to proceed.

I would note, though, that memory management is a kind of cross-cutting interface, and that it's not just Guile that's having problems changing; I understand PyPy has had a lot of problems with code that depends on when Python destructors get called, because PyPy uses a tracing GC instead of CPython's reference counting.
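To make the "automatically finds all live Scheme values on the C stack" part concrete, here is a heavily simplified sketch of conservative root finding: any stack word that happens to equal the address of a heap object is treated as a reference to it. It assumes exact object addresses only; the real Boehm-Demers-Weiser collector also handles interior pointers, registers, and much more. It also hints at why the switch is hard: an ambiguous word might be an integer, so a collector that moves objects (as generational collectors typically do) cannot safely rewrite it.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Toy conservative scan over a simulated stack of machine words.
std::set<std::uintptr_t> conservativeRoots(const std::vector<std::uintptr_t>& stackWords,
                                           const std::set<std::uintptr_t>& heapObjects) {
  std::set<std::uintptr_t> roots;
  for (std::uintptr_t w : stackWords)
    if (heapObjects.count(w)) roots.insert(w);  // looks like a pointer: keep it alive
  return roots;
}
```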

future

And then?

Parallel-installability for source languages: #lang
Sediment idioms from Racket to evolve Guile user base

Remove myself from “holding the crank”

So where are we going? Nowhere, for the moment; or rather, up the hill. We just released Guile 3.0, so let's just appreciate that for the time being.

But as far as next steps in language evolution, I think in the short term they are essentially to further enable change while further sedimenting good practices into Guile. On the change side, we need parallel installability for entire languages. Racket did a great job facilitating this with #lang and we should just adopt that.

As for sedimentation, we should step back and see whether any common Guile use patterns built by our users should be included in core Guile, and widen our gaze to Racket also. It will take some effort, both from a technical perspective and in building a social/emotional consensus about how much change is good and how bold versus conservative to be: putting the dialog into dialectic.

dialectic, boogie woogie woogie

http://gnu.org/s/guile

#guile on freenode

@andywingo

wingo@igalia.com

Happy hacking!

Hey that was the talk! Hope you enjoyed the writeup. Again, video and slides available on the FOSDEM web site. Happy hacking!

thoughts on rms and gnu

8 October 2019 3:34 PM (rms | gnu)

Yesterday, a collective of GNU maintainers publicly posted a statement advocating collective decision-making in the GNU project. I would like to expand on what that statement means to me and why I signed on.

For many years now, I have not considered Richard Stallman (RMS) to be the head of the GNU project. Yes, he created GNU, speaking it into existence via prophetic narrative and via code; yes, he inspired many people, myself included, to make the vision of a GNU system into a reality; and yes, he should be recognized for these things. But accomplishing difficult and important tasks for GNU in the past does not grant RMS perpetual sovereignty over GNU in the future.

ontological considerations

More on the motivations for the non serviam in a minute. But first, a meta-point: the GNU project does not exist, at least not in the sense that many people think it does. It is not a legal entity. It is not a charity. You cannot give money to the GNU project. Besides the manifesto, GNU has no by-laws or constitution or founding document.

One could describe GNU as a set of software packages that have been designated by RMS as forming part, in some way, of GNU. But this artifact-centered description does not capture movement: software does not, by itself, change the world; it lacks agency. It is the people that maintain, grow, adapt, and build the software that are the heart of the GNU project -- the maintainers of and contributors to the GNU packages. They are the GNU of whom I speak and of whom I form a part.

wasted youth

Richard Stallman describes himself as the leader of the GNU project -- the "chief GNUisance", he calls it -- but this position only exists in any real sense by consent of the people that make GNU. So what is he doing with this role? Does he deserve it? Should we consent?

To me it has been clear for many years that, to a first approximation, RMS does nothing for GNU. RMS does not write software. He does not design software, or systems. He does hold a role of accepting new projects into GNU; there, his primary criterion is not "does this make a better GNU system"; it is, rather, "does the new project meet the minimum requirements".

By itself, this seems to me to be a failure of leadership for a software project like GNU. But unfortunately, when RMS's role in GNU isn't neglect, more often than not it's negative. RMS's interventions are generally conservative: to assert authority over the workings of the GNU project, to preserve ways of operating that he sees as important. See, for example, the whole glibc abortion joke debacle for how RMS acts when he chooses to do so.

Note, however, that my personal perspective here is not a consensus position of the GNU project. There are many (most?) GNU developers that still consider RMS to be GNU's rightful leader. I think they are mistaken, but I do not repudiate them for this reason; we can work together while differing on this and other matters. I simply state that I, personally, do not serve RMS.

selective attrition

Though the "voluntary servitude" questions are at the heart of the recent joint statement, I think we all recognize that attempts at self-organization in GNU would face a grave difficulty even if RMS decided to retire tomorrow: the way that GNU maintainers have selected themselves.

The great tragedy of RMS's tenure in the supposedly universalist FSF and GNU projects is that he behaves in a way that is particularly alienating to women. It doesn't take a genius to conclude that if you're personally driving away potential collaborators, that's a bad thing for the organization, and actively harmful to the organization's goals: software freedom is a cause that is explicitly for everyone.

We already know that software development in people's free time skews towards privilege: not everyone has the ability to devote many hours per week to what is for many people a hobby, and it follows of course that those that have more privilege in society will be more able to establish a position in the movement. And then on top of these limitations on contributors coming in, we additionally have this negative effect of a toxic culture pushing people out.

The result, sadly, is that a significant proportion of those that have stuck with GNU don't see any problems with RMS. The cause of software freedom has always run against the grain of capitalism so GNU people are used to being a bit contrarian, but it has also had the unfortunate effect of creating a cult of personality and a with-us-or-against-us mentality. For some, only a traitor would criticise the GNU project. It's laughable but it's a thing; I prefer to ignore these perspectives.

Finally, it must be said that there are a few GNU people for whom it's important to check if the microphone is on before making a joke about rape culture. (Incidentally, RMS had nothing to say on that issue; how useless.)

So I honestly am not sure if GNU as a whole effectively has the capacity to make good decisions. Neglect and selective attrition have gravely weakened the project. But I stand by the principles and practice of software freedom, and by my fellow GNU maintainers who are unwilling to accept the status quo, and I consider attempts to reduce GNU to founder-loyalty to be mistaken and without legitimacy.

where we're at

In the meantime, as always, happy hacking, and: no gods! No masters! No chief!!!