



Presentation Transcript


X32 – A Native 32-bit ABI for x86-64

H.J. Lu, H. Peter Anvin, Milind Girkar

September 2011

Agenda
- Problem statement
- x32 psABI
- Code examples
- Performance data
- Status
- Challenges

Problem Statement

- Intel Architecture must improve performance in 32-bit mode:
  - Intel64 has 16 64-bit integer registers and 16 SSE registers.
  - SSE FP has much better performance than x87.
- Many markets for Intel devices currently do not require more than a 32-bit address space.

- In many environments, legacy ABI compatibility is not a concern:
  - There is no expectation of running existing desktop applications.
- The existing 32-bit psABI doesn't fully exploit Intel64:
  - The i386 psABI was developed more than 20 years ago.
  - Only 8 32-bit integer registers and 8 SSE registers are usable.
  - x87 is used for FP: even if a function uses SSE FP, return values are still passed in x87.
- This leaves performance on the table for IA:
  - i386 Position Independent Code (PIC), which is used extensively in shared libraries, slows down performance by more than 20%.
  - Parameters are passed in memory instead of registers.

Can we exploit Intel64 to improve 32-bit performance?

X32 – A New 32-bit ABI for Intel Architecture

- x32 psABI: same as the "small" code model in the x86-64 psABI, with a 32-bit address space:
  - Long and pointer sizes are 32 bits.
    - 64-bit imul latency is twice that of 32-bit imul on Atom. Some EEMBC 1.1 benchmarks are 2X slower when switching from IA32 to Intel64.
  - 16 64-bit integer registers (8 additional integer registers).
  - 8 additional SSE registers.
  - Uses SSE for floating point math (not the slower x87).
  - IP-relative addressing is faster:
    - Extensively used in PIC code, i.e. in all shared libraries. Very important for Android.
  - Up to 6 integer and 8 FP function parameters can be passed in registers.
  - Optimal data alignment:
    - 64-bit integers and doubles are aligned to 8 bytes instead of 4 bytes.
  - Efficient 64-bit integer code:
    - Resolves the 64-bit computation bottleneck on Android by using native 64-bit integer instructions instead of helper routines like __divdi3/__moddi3.
- Brings new processor features to 32-bit applications:
  - Some processor features are only available in 64-bit mode.
  - 128-bit integer arithmetic is only supported with Intel64.

Same memory footprint as IA32 with the advantages of Intel64. No hardware changes are required.
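A minimal C sketch of the data model described above, assuming a GCC-compatible compiler, which defines __x86_64__ together with __ILP32__ when targeting x32. The probe structs and the helper names is_x32 and abi_name are illustrative only, not part of any x32 API. The compile-time checks encode this slide's two claims: long and pointers are 32 bits, yet long long and double get 8-byte alignment instead of the 4 bytes they get inside i386 structs.

```c
#include <assert.h>
#include <stddef.h>

/* Probe structs: the offset of the 8-byte member shows the ABI's alignment rule. */
struct ll_probe  { char c; long long v; };
struct dbl_probe { char c; double v; };

/* Nonzero when compiled for x32: x86-64 instructions with 32-bit long and pointers. */
static int is_x32(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return 1;
#else
    return 0;
#endif
}

/* Human-readable name of the ABI this translation unit was compiled for. */
static const char *abi_name(void)
{
    if (is_x32())
        return "x32";
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

#if defined(__x86_64__) && defined(__ILP32__)
/* x32: long and pointers are 32 bits, as on i386 ... */
static_assert(sizeof(long) == 4 && sizeof(void *) == 4, "x32 is ILP32");
/* ... but 64-bit integers and doubles are 8-byte aligned, as on x86-64. */
static_assert(offsetof(struct ll_probe, v) == 8, "long long is 8-byte aligned");
static_assert(offsetof(struct dbl_probe, v) == 8, "double is 8-byte aligned");
#elif defined(__i386__) && defined(__linux__)
/* i386 psABI: the same members are only 4-byte aligned inside structs. */
static_assert(offsetof(struct ll_probe, v) == 4, "long long is 4-byte aligned");
static_assert(offsetof(struct dbl_probe, v) == 4, "double is 4-byte aligned");
#endif
```

Compiling the same file with -m32, -m64 and -mx32 (the flags listed under Call For Action) and printing abi_name() is one way to confirm which ABI a toolchain actually targets; -mx32 needs an x32-enabled GCC and C library.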

ABI Comparison

Feature            | i386     | x86-64    | x32
Integer registers  | 6 (PIC)  | 15        | 15
FP registers       | 8        | 16        | 16
Pointers           | 4 bytes  | 8 bytes   | 4 bytes
64-bit arithmetic  | No       | Yes       | Yes
Floating point     | x87      | SSE       | SSE
Calling convention | Memory   | Registers | Registers
PIC prologue       | 2-3 insn | None      | None

Efficient Position Independent Code

C source:
    extern int x, y, z;
    void foo () { z = x * y; }

i386 psABI:
    call  __i686.get_pc_thunk.cx
    addl  $_GLOBAL_OFFSET_TABLE_, %ecx
    movl  y@GOT(%ecx), %eax
    movl  x@GOT(%ecx), %edx
    movl  (%eax), %eax
    imull (%edx), %eax
    movl  z@GOT(%ecx), %edx
    movl  %eax, (%edx)
    ret
__i686.get_pc_thunk.cx:
    movl  (%esp), %ecx
    ret

x32 psABI:
    movl  x@GOTPCREL(%rip), %edx
    movl  y@GOTPCREL(%rip), %eax
    movl  (%rax), %eax
    imull (%rdx), %eax
    movl  z@GOTPCREL(%rip), %edx
    movl  %eax, (%rdx)
    ret

x32 PIC code is shorter and faster: it reaches the GOT relative to %rip, so no PIC prologue (the thunk call and GOT address computation) is needed.

Efficient 64-bit Integer Arithmetic

C source:
    extern long long x, y, z;
    void foo () { z = x * y; }

i386 psABI:
    movl  x+4, %edx
    movl  y+4, %eax
    imull y, %edx
    imull x, %eax
    leal  (%edx,%eax), %ecx
    movl  y, %eax
    mull  x
    addl  %ecx, %edx
    movl  %eax, z
    movl  %edx, z+4
    ret

x32 psABI:
    movq  x(%rip), %rax
    imulq y(%rip), %rax
    movq  %rax, z(%rip)
    ret

X32 provides very efficient 64-bit integer support (3 instructions vs. 10 instructions).
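To show where the i386 instruction count comes from, here is a C sketch of what that sequence computes: a 64x64 to 64-bit multiply assembled from 32-bit halves. The two imull instructions form the cross products, leal sums them, mull produces the full 64-bit product of the low halves in edx:eax, and addl folds the cross terms into the high word. The function name is hypothetical; it assumes two's-complement arithmetic, under which signed long long and unsigned operands give the same low 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the i386 sequence: the low 64 bits of x * y
   computed using only 32-bit multiplies. */
static uint64_t mul64_from_32bit_halves(uint64_t x, uint64_t y)
{
    uint32_t x_lo = (uint32_t)x, x_hi = (uint32_t)(x >> 32);
    uint32_t y_lo = (uint32_t)y, y_hi = (uint32_t)(y >> 32);

    /* imull + imull + leal: cross products; only their low 32 bits matter,
       because anything above bit 63 of the result is discarded. */
    uint32_t cross = (uint32_t)((uint64_t)x_hi * y_lo)
                   + (uint32_t)((uint64_t)x_lo * y_hi);

    /* mull: full 32x32->64 product of the low halves (edx:eax). */
    uint64_t low = (uint64_t)x_lo * y_lo;

    /* addl: add the cross terms into the high 32 bits (wraps mod 2^64). */
    return low + ((uint64_t)cross << 32);
}
```

A check such as assert(mul64_from_32bit_halves(a, b) == a * b) holds for any uint64_t a and b; on x32 the compiler emits the single imulq for the right-hand side.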

Efficient Function Parameter Passing

C source:
    void bar (int x, int y, int z);
    void foo (int x, int y, int z) { bar (y, z, x); }

i386 psABI:
    subl  $28, %esp
    movl  32(%esp), %eax
    movl  %eax, 8(%esp)
    movl  40(%esp), %eax
    movl  %eax, 4(%esp)
    movl  36(%esp), %eax
    movl  %eax, (%esp)
    call  bar
    addl  $28, %esp
    ret

x32 psABI:
    subl  $8, %esp
    movl  %edi, %eax
    movl  %esi, %edi
    movl  %edx, %esi
    movl  %eax, %edx
    call  bar
    addl  $8, %esp
    ret

X32 passes parameters in registers.
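The register shuffle in the x32 code follows from the argument order x32 inherits from the x86-64 psABI: the first six integer arguments arrive in %rdi, %rsi, %rdx, %rcx, %r8 and %r9 (the 32-bit halves %edi through %r9d for int), and later ones go on the stack. The table and helpers below are a hypothetical sketch of that mapping, not part of any ABI library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 32-bit registers carrying the first six int arguments under x86-64 and x32. */
static const char *const int_arg_regs[] = { "edi", "esi", "edx", "ecx", "r8d", "r9d" };
#define N_INT_ARG_REGS (sizeof int_arg_regs / sizeof int_arg_regs[0])

/* Register holding int argument i (0-based), or NULL if it is passed on the stack. */
static const char *int_arg_register(size_t i)
{
    return i < N_INT_ARG_REGS ? int_arg_regs[i] : NULL;
}

/* Reverse lookup: which argument position a register carries, or -1. */
static int arg_index_of(const char *reg)
{
    assert(reg != NULL);
    for (size_t i = 0; i < N_INT_ARG_REGS; i++)
        if (strcmp(int_arg_regs[i], reg) == 0)
            return (int)i;
    return -1;
}
```

For foo (x, y, z) calling bar (y, z, x), bar's argument 0 is foo's argument 1, so that value must move from int_arg_register(1), %esi, into int_arg_register(0), %edi. Rotating all three values needs one spare register, which is why the x32 code parks x in %eax.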

Efficient Floating Point Operation

C source:
    extern double bar;
    double foo () { return bar * bar; }

i386 psABI:
    subl  $12, %esp
    movsd bar, %xmm0
    mulsd %xmm0, %xmm0
    movsd %xmm0, (%esp)
    fldl  (%esp)
    addl  $12, %esp
    ret

x32 psABI:
    movsd bar(%rip), %xmm0
    mulsd %xmm0, %xmm0
    ret

X32 doesn't use x87 to return FP values. The i386 code already computes with SSE, but must store the result to memory and reload it with fldl to return it on the x87 stack.

Possible Use Cases
- Useful in closed or semi-closed environments:
  - Open environments tend to be constrained by compatibility.
- Classic embedded devices.
- Yocto: the minimal x32 image is available.

Status
- Extended the x86-64 psABI to support x32.
- Checked x32 support into the Linux assembler/disassembler and linker.
- Checked x32 support into GCC 4.7:
  - Scheduled to be released in Q1’12.
  - Passed the GCC testsuite.
  - Passed SPEC CPU 2K/2006.
- Started hourly x32 tracking with:
  - The GCC testsuite.
  - SPEC CPU 2K/2006.
- Created a prototype Linux kernel with x32 support:
  - Full system call implementation.
  - vDSO.
  - Core dump.
  - 64-bit file system calls.
  - 64-bit time_t.
  - Target: Linux kernel 3.2/Q4’11.
- Added x32 support to the GNU C library x32 branch.
- Added x32 support to the GDB x32 branch.
- Created the x32 project website: https://sites.google.com/site/x32abi/

X32 System Call Interface
- Use the SYSCALL instruction:
  - Faster than "int $0x80".
- Share the system call table to minimize overhead:
  - Use x86-64 system calls as much as possible.
  - A separate system call table would add overhead to the native x86-64 case.
- Use the 32-bit compat system call paths (28 compat system calls) for struct function parameters with:
  - Indirect pointer references.
  - Long fields.
- Use bit 30 in the system call number to support compatibility handling in the input system:
  - Data structures that include pointers on the read/write path.
  - sizeof(long)-sensitive text strings in sysfs.
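A small C sketch of the bit-30 scheme, assuming an x32 caller passes its system call number with bit 30 set, so the kernel can tell x32 callers apart while still stripping the bit to index the shared x86-64 table. The Linux kernel's own name for this constant is __X32_SYSCALL_BIT; the X32_SYSCALL_BIT macro and the helper functions here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 30 marks an x32 system call (mirrors the kernel's __X32_SYSCALL_BIT). */
#define X32_SYSCALL_BIT (UINT32_C(1) << 30)
static_assert(X32_SYSCALL_BIT == 0x40000000u, "bit 30");

/* Caller side: the number an x32 process would load before SYSCALL. */
static uint32_t x32_syscall_number(uint32_t nr)
{
    return nr | X32_SYSCALL_BIT;
}

/* Kernel side: does this call come from an x32 process? */
static int is_x32_syscall(uint32_t nr)
{
    return (nr & X32_SYSCALL_BIT) != 0;
}

/* Kernel side: strip the marker to index the shared system call table. */
static uint32_t syscall_table_index(uint32_t nr)
{
    return nr & ~X32_SYSCALL_BIT;
}
```

With a shared table, one entry can serve both x86-64 and x32 callers; the flag is what would let code such as the input-system compatibility paths choose the x32 data layout.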

Performance Data
On Core i7 2600K 3.40GHz:
- Improved SPEC CPU 2K/2006 INT geomean by 7-10% over ia32 and 5-8% over Intel64.
- Improved SPEC CPU 2K/2006 FP geomean by 5-11% over ia32.
- Very little change in SPEC CPU 2K/2006 FP geomean compared with Intel64.
- Compared with ia32 PIC, x32 PIC:
  - Improved SPEC CPU 2K INT by another 10%.
  - Improved SPEC CPU 2K FP by another 3%.
  - Improved SPEC CPU 2006 INT by another 6%.
  - Improved SPEC CPU 2006 FP by another 2%.

[Chart: SPEC CPU 2000 INT]
[Chart: SPEC CPU 2000 FP]
[Chart: SPEC CPU 2006 INT]
[Chart: SPEC CPU 2006 FP]
[Chart: SPEC CPU 2000 INT (PIC)]
[Chart: SPEC CPU 2000 FP (PIC)]
[Chart: SPEC CPU 2006 INT (PIC)]
[Chart: SPEC CPU 2006 FP (PIC)]

X32 Compatibility
- x32 libraries can co-exist with existing libraries of the 32-bit and 64-bit psABIs:
  - /lib: libraries for the i386 psABI
  - /lib64: libraries for the x86-64 psABI
  - /libx32: libraries for the x32 psABI
- IA32 shared libraries can't be linked or loaded into x32 executables/shared libraries. x32 is not compatible with IA32 in:
  - Signal handling
  - Exception handling
  - Procedure Linkage Table (PLT), Global Offset Table (GOT), Thread Local Storage (TLS) and relocations
  - Data structure memory layout
- IA32 browser plugins work correctly with x32 browsers, since plugins can be run as a separate process:
  - Adobe Flash (via nsplugin)
  - Acrobat Reader
- The x86-64 and x32 assembly codes are remarkably close. When porting x86-64 assembly code to x32:
  - Addresses and long are 32 bits instead of 64 bits.
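One visible reason IA32 libraries cannot be loaded into x32 processes is the ELF header: x32 objects are 32-bit ELF files (ELFCLASS32) whose machine field is EM_X86_64, while i386 objects are ELFCLASS32 with EM_386 and x86-64 objects are ELFCLASS64 with EM_X86_64. The sketch below classifies a little-endian header from those two fields; the constant values come from the ELF specification, and the function name x86_elf_abi is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ELF identification constants (values from the ELF specification). */
enum { EI_CLASS = 4, EI_DATA = 5, ELFCLASS32 = 1, ELFCLASS64 = 2, ELFDATA2LSB = 1 };
enum { EM_386 = 3, EM_X86_64 = 62 };

/* Classify an x86 ELF object from the first 20 bytes of its header. */
static const char *x86_elf_abi(const unsigned char *hdr, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    assert(hdr != NULL);
    if (len < 20 || memcmp(hdr, magic, sizeof magic) != 0 || hdr[EI_DATA] != ELFDATA2LSB)
        return "unknown";
    /* e_machine follows e_ident[16] and e_type; same offset in ELF32 and ELF64. */
    unsigned machine = (unsigned)hdr[18] | ((unsigned)hdr[19] << 8);
    if (hdr[EI_CLASS] == ELFCLASS32 && machine == EM_386)    return "i386";
    if (hdr[EI_CLASS] == ELFCLASS32 && machine == EM_X86_64) return "x32";
    if (hdr[EI_CLASS] == ELFCLASS64 && machine == EM_X86_64) return "x86-64";
    return "unknown";
}
```

Given the first bytes of a library from /libx32, this would be expected to report "x32", which is the kind of check that lets a loader reject a library built for a different psABI before mapping it.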

Challenges
- Requires a whole new ecosystem:
  - Must provide a new set of libs etc., port code, and maintain a new version, in addition to the existing ABIs.
  - Intel64 cannot be fused off: 64-bit BIOS/FW/drivers are still required.
  - Needs Linux community support.
- Some porting is required:
  - Need an x32 porting guide.
- Non-trivial resources are needed to improve x32 tools:
  - GCC:
    - Optimization for x32.
    - Improve x32 addressing modes.
    - Avoid unnecessary LEA instructions.
  - Get other compilers (e.g. icc) and tools to use x32.
- Kernel:
  - Deploying a new ABI gives an opportunity for a clean slate.
  - Add some new compat system calls.

Unnecessary LEA Instructions

IA32:
    mov   0x4(%ebx,%eax,4),%edx
    cmp   %ecx,%edx

X32:
    lea   (%r9,%rax,4),%edx
    mov   %edx,%ecx
    mov   0x4(%rcx),%ecx
    cmp   %esi,%ecx

Call For Action
- X32 support for Fedora 15 is available at https://sites.google.com/site/x32abi/
- Enable x32 via yum:
  - x32 kernel rpm
  - x32 glibc rpms
- Install the x32 toolchain:
  - Download x32 GCC/Binutils/GDB
- Collect performance data on your favorite benchmarks:
  - ia32: -m32
  - Intel64: -m64
  - x32: -mx32
- Report x32 toolchain bugs:
  - Provide testcases

Backup


X32 History
- MIPS has the o32, n32 and n64 psABIs. According to http://www.prompro.com/Address_Models_Oracle.pdf:
  “Because the N32 ABI can increase an application's performance by 25 percent, many customers make it their choice when designing applications.”

Intel Confidential
