1 Reference
- http://en.wikipedia.org/wiki/Java_class_file
- Java Specification Request-202 Chapter 4
2 File header
struct Class_File_Format {
    u4 magic_number;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count - 1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
2.1 magic_number
A fixed value, 0xCAFEBABE. In one file, however, what I actually saw was c38a c3be c2ba c2be, so I had to switch to a different file.
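The odd bytes c38a c3be c2ba c2be are consistent with the magic number having been read as Latin-1 text and written back out as UTF-8, which would suggest that file (or the tool used to view it) had gone through a text-encoding conversion. This is only a guess about what happened; the sketch below just reproduces the byte pattern. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class MagicMojibake {
    // Hex-encode bytes in groups of two, matching the layout of the dump quoted above.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0 && i % 2 == 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    // Interpret each raw byte as one Latin-1 character, then re-encode the text as UTF-8.
    static byte[] latin1ToUtf8(byte[] raw) {
        String asText = new String(raw, StandardCharsets.ISO_8859_1);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] magic = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(hex(magic));               // cafe babe
        System.out.println(hex(latin1ToUtf8(magic))); // c38a c3be c2ba c2be
    }
}
```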
A Java virtual machine implementation can support a class file format of version v if and only if v lies in some contiguous range Mi.0 ≤ v ≤Mj.m. Only Sun can specify what range of versions a Java virtual machine implementation conforming to a certain release level of the Java platform may support.
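2.2 minor_version and major_version
The version bytes in the file I looked at were 0000 0032: minor_version 0 and major_version 0x32 = 50, which corresponds to Java SE 6.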
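A minimal sketch of reading these first eight bytes, assuming the class file is already in memory as a byte array. ClassHeader is a hypothetical helper, not part of any library; ByteBuffer is big-endian by default, which matches the class file byte order.

```java
import java.nio.ByteBuffer;

class ClassHeader {
    final int minorVersion;
    final int majorVersion;

    ClassHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Parse magic_number, minor_version and major_version from the start of a class file.
    static ClassHeader read(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        int magic = buf.getInt();                     // u4 magic_number
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic: " + Integer.toHexString(magic));
        }
        int minor = buf.getShort() & 0xFFFF;          // u2, read as unsigned
        int major = buf.getShort() & 0xFFFF;          // u2, read as unsigned
        return new ClassHeader(minor, major);
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x32};
        ClassHeader h = read(header);
        System.out.println(h.majorVersion + "." + h.minorVersion); // 50.0
    }
}
```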
2.3 constant_pool_count and constant_pool
Valid constant_pool indices run from 1 to constant_pool_count - 1. The entries in constant_pool are variable-length:
cp_info { u1 tag; u1 info[]; }
The leading tag identifies the entry type. The tag values are:
Constant Type | Value |
CONSTANT_Class | 7 |
CONSTANT_Fieldref | 9 |
CONSTANT_Methodref | 10 |
CONSTANT_InterfaceMethodref | 11 |
CONSTANT_String | 8 |
CONSTANT_Integer | 3 |
CONSTANT_Float | 4 |
CONSTANT_Long | 5 |
CONSTANT_Double | 6 |
CONSTANT_NameAndType | 12 |
CONSTANT_Utf8 | 1 |
What follows the tag is a structure whose layout depends on the tag.
2.3.1 CONSTANT_Class
CONSTANT_Class_info { u1 tag; u2 name_index; }
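Note that this structure is the whole cp_info entry, so it includes the tag. name_index is the index of the actual name, which is also stored in the pool; the entry it points to must be a CONSTANT_Utf8_info.
Arrays are objects too, but their names take a special form: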
int[][] => [[I
Thread[] => [Ljava/lang/Thread;
An array type descriptor is valid only if it represents 255 or fewer dimensions.
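What does this sentence mean? That at most 255 dimensions are supported? Apparently so: a descriptor may start with at most 255 [ characters.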
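For array types, the same strings can be obtained from a running JVM. The sketch below relies on Class.getName(), which for array classes returns this encoded form but with '.' as the package separator; internalName is a hypothetical helper that swaps in the '/' used inside class files. It is only meant for array types, since getName() on a non-array class returns a plain dotted name.

```java
class ArrayDescriptors {
    // For an array class, getName() yields e.g. "[Ljava.lang.Thread;".
    // Class files use '/' instead of '.', so replace the separator.
    static String internalName(Class<?> arrayType) {
        if (!arrayType.isArray()) {
            throw new IllegalArgumentException("not an array type: " + arrayType);
        }
        return arrayType.getName().replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(internalName(int[][].class)); // [[I
        System.out.println(internalName(Thread[].class)); // [Ljava/lang/Thread;
    }
}
```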
2.3.2 CONSTANT_*ref_info
The * stands for Field, Method, or InterfaceMethod; the three structures share the same layout.
CONSTANT_*ref_info { u1 tag; u2 class_index; u2 name_and_type_index; }
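- class_index
The entry it points to must be a CONSTANT_Class_info. For a Methodref it must be a class, not an interface; for an InterfaceMethodref it must be an interface; for a Fieldref it may be either.
- name_and_type_index
This points to a CONSTANT_NameAndType_info, which gives the name and descriptor of the method or field [1]. For a field, the descriptor must be a field descriptor. If the name in a CONSTANT_Methodref_info begins with '<', it must be the special name <init>, which denotes an instance initialization method, and its return type must be void.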
- field descriptor
A field descriptor represents the type of a class, instance, or local variable. It is a series of characters generated by the grammar:
FieldDescriptor:
    FieldType
ComponentType:
    FieldType
FieldType:
    BaseType
    ObjectType
    ArrayType
BaseType: one of
    B C D F I J S Z
ObjectType:
    L Classname ;
ArrayType:
    [ ComponentType
The BaseType characters are interpreted as follows:
BaseType Character | Type | Interpretation |
B | byte | signed byte |
C | char | Unicode character |
D | double | double-precision floating-point value |
F | float | single-precision floating-point value |
I | int | integer |
J | long | long integer |
L Classname; | reference | an instance of class <classname> |
S | short | signed short |
Z | boolean | true or false |
[ | reference | one array dimension |
- Everything else must be a method descriptor
Quoting the specification directly:
MethodDescriptor:
    ( ParameterDescriptor* ) ReturnDescriptor
A parameter descriptor represents a parameter passed to a method:
ParameterDescriptor:
    FieldType
A return descriptor represents the type of the value returned from a method. It is a series of characters generated by the grammar:
ReturnDescriptor:
    FieldType
    VoidDescriptor
VoidDescriptor:
    V
Here the total length of the method parameters must be at most 255. The length is the sum over all parameters, where a long or a double counts as two units and every other type counts as one [2]. Note also that for instance and interface method invocations, the this parameter is counted as well [3].
Object mymethod(int i, double d, Thread t) => (IDLjava/lang/Thread;)Ljava/lang/Object;
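The descriptor above can be reproduced with java.lang.invoke.MethodType from the standard library, and the unit-counting rule can be sketched as a small scanner over the descriptor string. parameterUnits is a hypothetical helper written for these notes; it assumes a well-formed descriptor and does no validation.

```java
import java.lang.invoke.MethodType;

class MethodDescriptors {
    // Count parameter units: long (J) and double (D) take two, everything else
    // (including any array or object reference) takes one. hasThis adds one unit
    // for the implicit this of instance and interface methods.
    static int parameterUnits(String descriptor, boolean hasThis) {
        int units = hasThis ? 1 : 0;
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            char c = descriptor.charAt(i);
            if (c == '[') {
                // Skip every dimension marker, then the element type.
                while (descriptor.charAt(i) == '[') {
                    i++;
                }
                if (descriptor.charAt(i) == 'L') {
                    i = descriptor.indexOf(';', i);
                }
                i++;
                units += 1;
            } else if (c == 'L') {
                i = descriptor.indexOf(';', i) + 1;
                units += 1;
            } else {
                units += (c == 'J' || c == 'D') ? 2 : 1;
                i++;
            }
        }
        return units;
    }

    public static void main(String[] args) {
        String d = MethodType.methodType(Object.class, int.class, double.class, Thread.class)
                .toMethodDescriptorString();
        System.out.println(d);                       // (IDLjava/lang/Thread;)Ljava/lang/Object;
        System.out.println(parameterUnits(d, true)); // 5
    }
}
```

For mymethod this gives 1 (int) + 2 (double) + 1 (Thread) = 4 units, or 5 when this is included.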
2.3.3 CONSTANT_String_info
CONSTANT_String_info { u1 tag; u2 string_index; }
This one is simple: string_index must point to a CONSTANT_Utf8_info.
2.3.4 CONSTANT_Integer_info and CONSTANT_Float_info
CONSTANT_*_info { u1 tag; u4 bytes; }
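bytes holds the value of the constant; for a float it is in IEEE 754 single-precision format. Both are stored big-endian. To obtain the float value, the bytes are first converted to an int (called bits below). Quoting the specification: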
- If bits is 0x7f800000, the float value will be positive infinity.
- If bits is 0xff800000, the float value will be negative infinity.
- If bits is in the range 0x7f800001 through 0x7fffffff or in the range 0xff800001 through 0xffffffff, the float value will be NaN.
- In all other cases, let s, e, and m be three values that might be computed from bits:
- int s = ((bits >> 31) == 0) ? 1 : -1;
- int e = ((bits >> 23) & 0xff);
- int m = (e == 0) ? (bits & 0x7fffff) << 1 : (bits & 0x7fffff) | 0x800000;
- Then the float value equals the result of the mathematical expression s * m * 2^(e-150).
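The rules above can be checked against Float.intBitsToFloat from the standard library. The sketch below is a direct transcription; FloatBits is a hypothetical name, the result is returned as a double so the arithmetic stays exact, and the NaN range test is written with bit masks rather than the two ranges quoted above (the two forms select the same bit patterns).

```java
class FloatBits {
    // Decode the 32-bit pattern of a CONSTANT_Float_info by following the quoted rules.
    static double decode(int bits) {
        if (bits == 0x7f800000) {
            return Double.POSITIVE_INFINITY;
        }
        if (bits == 0xff800000) {
            return Double.NEGATIVE_INFINITY;
        }
        // All exponent bits set and a non-zero fraction: NaN.
        if ((bits & 0x7f800000) == 0x7f800000 && (bits & 0x007fffff) != 0) {
            return Double.NaN;
        }
        int s = ((bits >> 31) == 0) ? 1 : -1;
        int e = ((bits >> 23) & 0xff);
        int m = (e == 0) ? (bits & 0x7fffff) << 1 : (bits & 0x7fffff) | 0x800000;
        // s * m * 2^(e-150); Math.scalb multiplies by a power of two.
        return Math.scalb((double) s * m, e - 150);
    }

    public static void main(String[] args) {
        System.out.println(decode(0x3f800000)); // 1.0
        int piBits = Float.floatToIntBits(3.1415927f);
        System.out.println(decode(piBits) == Float.intBitsToFloat(piBits));
    }
}
```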
2.3.5 CONSTANT_Long_info and CONSTANT_Double_info
CONSTANT_*_info { u1 tag; u4 high_bytes; u4 low_bytes; }
Index computation has a special case here: each of these two structures occupies two positions in the pool.
All 8-byte constants take up two entries in the constant_pool table of the class file. If a CONSTANT_Long_info or CONSTANT_Double_info structure is the item in the constant_pool table at index n, then the next usable item in the pool is located at index n +2. The constant_pool index n +1 must be valid but is considered unusable.
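A rough sketch of how a reader might walk the pool while honoring this rule, using the tag values and entry layouts from these notes. ConstantPoolWalker is a hypothetical helper; it assumes the buffer is positioned at constant_pool_count and only records tags, skipping each entry's payload.

```java
import java.nio.ByteBuffer;

class ConstantPoolWalker {
    // Returns tags[i] = tag of the entry at pool index i.
    // Index 0 and the slot following a Long or Double are left as 0 (unusable).
    static int[] readTags(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF; // constant_pool_count
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            tags[i] = tag;
            switch (tag) {
                case 1: { // Utf8: u2 length, then that many bytes
                    int length = buf.getShort() & 0xFFFF;
                    buf.position(buf.position() + length);
                    break;
                }
                case 3: case 4: // Integer, Float: u4 bytes
                    buf.position(buf.position() + 4);
                    break;
                case 5: case 6: // Long, Double: u4 high_bytes, u4 low_bytes
                    buf.position(buf.position() + 8);
                    i++; // the entry takes two indices; the next usable one is n + 2
                    break;
                case 7: case 8: // Class, String: one u2 index
                    buf.position(buf.position() + 2);
                    break;
                case 9: case 10: case 11: case 12: // *ref, NameAndType: two u2 indices
                    buf.position(buf.position() + 4);
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag + " at index " + i);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        byte[] pool = {
            0, 5,                          // constant_pool_count = 5
            1, 0, 2, 0x48, 0x69,           // #1 Utf8 "Hi"
            5, 0, 0, 0, 0, 0, 0, 0, 0x2A,  // #2 Long 42 (also occupies #3)
            7, 0, 1                        // #4 Class, name_index = 1
        };
        System.out.println(java.util.Arrays.toString(readTags(ByteBuffer.wrap(pool)))); // [0, 1, 5, 0, 7]
    }
}
```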
These are essentially the same as the integer and float entries above, only longer. The double value is determined in the same way:
- If bits is 0x7ff0000000000000L, the double value will be positive infinity.
- If bits is 0xfff0000000000000L, the double value will be negative infinity.
- If bits is in the range 0x7ff0000000000001L through 0x7fffffffffffffffL or in the range 0xfff0000000000001L through 0xffffffffffffffffL, the double value will be NaN.
- In all other cases, let s, e, and m be three values that might be computed from bits:
- int s = ((bits >> 63) == 0) ? 1 : -1;
- int e = (int)((bits >> 52) & 0x7ffL);
- long m = (e == 0) ? (bits & 0xfffffffffffffL) << 1 : (bits & 0xfffffffffffffL) | 0x10000000000000L;
- Then the floating-point value equals the double value of the mathematical expression s * m * 2^(e-1075).
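A sketch of the 64-bit case, including the step of joining high_bytes and low_bytes into one long (low_bytes has to be treated as unsigned when widened). DoubleBits is a hypothetical name; the infinity and NaN tests use bit masks, which select the same patterns as the ranges quoted above, and the result can be compared with Double.longBitsToDouble.

```java
class DoubleBits {
    // Join the two u4 halves of a CONSTANT_Long_info / CONSTANT_Double_info.
    static long combine(int highBytes, int lowBytes) {
        return ((long) highBytes << 32) | (lowBytes & 0xFFFFFFFFL);
    }

    // Decode a 64-bit pattern by following the quoted rules.
    static double decode(long bits) {
        if ((bits & 0x7ff0000000000000L) == 0x7ff0000000000000L) {
            if ((bits & 0xfffffffffffffL) != 0) {
                return Double.NaN;
            }
            return (bits > 0) ? Double.POSITIVE_INFINITY : Double.NEGATIVE_INFINITY;
        }
        int s = ((bits >> 63) == 0) ? 1 : -1;
        int e = (int) ((bits >> 52) & 0x7ffL);
        long m = (e == 0)
                ? (bits & 0xfffffffffffffL) << 1
                : (bits & 0xfffffffffffffL) | 0x10000000000000L;
        // s * m * 2^(e-1075); m is below 2^54 and converts to double without loss here.
        return Math.scalb((double) s * m, e - 1075);
    }

    public static void main(String[] args) {
        long bits = combine(0x400921fb, 0x54442d18);
        System.out.println(decode(bits) == Math.PI);
        System.out.println(decode(bits) == Double.longBitsToDouble(bits));
    }
}
```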
2.3.6 CONSTANT_NameAndType_info
CONSTANT_NameAndType_info { u1 tag; u2 name_index; u2 descriptor_index; }
Both indices point to CONSTANT_Utf8_info entries: one is the name, the other is the descriptor.
2.3.7 CONSTANT_Utf8_info
The CONSTANT_Utf8_info structure is used to represent constant string values. String content is encoded in modified UTF-8.
It differs slightly from standard UTF-8:
There are two differences between this format and the “standard” UTF-8 format. First, the null character (char)0 is encoded using the 2-byte format rather than the 1-byte format, so that modified UTF-8 strings never have embedded nulls. Second, only the 1-byte, 2-byte, and 3-byte formats of standard UTF-8 are used. The Java VM does not recognize the four-byte format of standard UTF-8; it uses its own two-times-three-byte format instead.
The structure is:
CONSTANT_Utf8_info { u1 tag; u2 length; u1 bytes[length]; }
Not much more to say. The restriction is clear: no byte may be 0 or lie in the range 0xf0 to 0xff [4].
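The two differences from standard UTF-8 can be observed with DataOutputStream.writeUTF from the standard library, which writes a u2 length followed by modified UTF-8 bytes, that is, the same layout as CONSTANT_Utf8_info without the tag. ModifiedUtf8 is a hypothetical wrapper used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8 {
    // Returns u2 length followed by the modified UTF-8 bytes of s.
    static byte[] encode(String s) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            new DataOutputStream(out).writeUTF(s);
            return out.toByteArray();
        } catch (IOException e) {
            // writeUTF rejects strings whose encoding exceeds 65535 bytes.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String nul = "\0";
        String emoji = "\uD83D\uDE00"; // U+1F600, one supplementary character

        // Standard UTF-8: 1 byte for NUL, 4 bytes for the supplementary character.
        System.out.println(nul.getBytes(StandardCharsets.UTF_8).length);   // 1
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length); // 4

        // Modified UTF-8 (minus the 2 length bytes): 2 bytes for NUL, 2 x 3 bytes for the pair.
        System.out.println(encode(nul).length - 2);   // 2
        System.out.println(encode(emoji).length - 2); // 6
    }
}
```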
2.4 Back from the digression: access_flags
A table is enough here:
Flag Name | Value | Interpretation |
ACC_PUBLIC | 0x0001 | Declared public; may be accessed from outside its package. |
ACC_FINAL | 0x0010 | Declared final; no subclasses allowed. |
ACC_SUPER | 0x0020 | Treat superclass methods specially when invoked by the invokespecial instruction. |
ACC_INTERFACE | 0x0200 | Is an interface, not a class. |
ACC_ABSTRACT | 0x0400 | Declared abstract; must not be instantiated. |
ACC_SYNTHETIC | 0x1000 | Declared synthetic; not present in the source code. |
ACC_ANNOTATION | 0x2000 | Declared as an annotation type. |
ACC_ENUM | 0x4000 | Declared as an enum type. |
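As the bit positions show, several flags can be set at the same time. ACC_INTERFACE requires ACC_ABSTRACT to be set as well, and ACC_ANNOTATION requires ACC_INTERFACE.
ACC_SUPER exists for backward compatibility with code compiled by older compilers; new compilers should always set it. It seems to be there for the sake of the invokespecial instruction.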
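A small sketch that decodes an access_flags value using the table above and checks the two dependencies mentioned in these notes. ClassAccessFlags and its methods are hypothetical names; the check covers only those two rules, not every constraint in the specification.

```java
import java.util.ArrayList;
import java.util.List;

class ClassAccessFlags {
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;
    static final int ACC_ANNOTATION = 0x2000;

    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // List the names of all flags set in the given access_flags value.
    static List<String> names(int flags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    // Check only the two rules from the notes:
    // interface implies abstract, annotation implies interface.
    static boolean isConsistent(int flags) {
        boolean isInterface = (flags & ACC_INTERFACE) != 0;
        boolean isAbstract = (flags & ACC_ABSTRACT) != 0;
        boolean isAnnotation = (flags & ACC_ANNOTATION) != 0;
        if (isInterface && !isAbstract) {
            return false;
        }
        return !isAnnotation || isInterface;
    }

    public static void main(String[] args) {
        System.out.println(names(0x0021));        // [ACC_PUBLIC, ACC_SUPER]
        System.out.println(isConsistent(0x0601)); // true
        System.out.println(isConsistent(0x0200)); // false
    }
}
```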
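2.5 this_class
Points to a CONSTANT_Class_info entry in the pool.
2.6 super_class
Every class except Object must have one, which goes without saying (for Object the value is 0). The superclass must not be final, which also goes without saying. For an interface it must point to Object, which is less obvious.
2.7 interfaces_count and interfaces[]
Each entry is a direct superinterface, listed in the left-to-right order given in the source. What each entry points to is, naturally, an entry in the pool.
2.8 fields_count and fields[]
Each field is a field_info structure.
2.8.1 field_info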
field_info { u2 access_flags; u2 name_index; u2 descriptor_index; u2 attributes_count; attribute_info attributes[attributes_count]; }
The main part is attribute_info, whose structure is:
attribute_info { u2 attribute_name_index; u4 attribute_length; u1 info[attribute_length]; }
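Because every attribute carries its own length, a reader can step over attributes it does not interpret. A minimal sketch of that, assuming the buffer is positioned at the start of an attribute_info; AttributeSkipper is a hypothetical helper and does not look up the attribute name in the constant pool.

```java
import java.nio.ByteBuffer;

class AttributeSkipper {
    // Reads one attribute_info header and skips its info[] bytes.
    // Returns {attribute_name_index, attribute_length}.
    static long[] readHeaderAndSkip(ByteBuffer buf) {
        int nameIndex = buf.getShort() & 0xFFFF;  // u2 attribute_name_index
        long length = buf.getInt() & 0xFFFFFFFFL; // u4 attribute_length, unsigned
        if (length > buf.remaining()) {
            throw new IllegalArgumentException("truncated attribute, length " + length);
        }
        buf.position(buf.position() + (int) length);
        return new long[] {nameIndex, length};
    }

    public static void main(String[] args) {
        // name index 7, length 2, two payload bytes, then one unrelated trailing byte
        byte[] data = {0, 7, 0, 0, 0, 2, 0, 9, (byte) 0xFF};
        ByteBuffer buf = ByteBuffer.wrap(data);
        long[] header = readHeaderAndSkip(buf);
        System.out.println(header[0] + " " + header[1] + " " + buf.remaining()); // 7 2 1
    }
}
```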
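Some attributes are predefined: SourceFile, ConstantValue, Code, StackMapTable, Exceptions, InnerClasses, EnclosingMethod, Synthetic, Signature, LineNumberTable, LocalVariableTable and Deprecated… There are too many more after that, so I stopped reading.
2.9 methods_count and methods[]
methods holds every method declared by the class or interface itself, but not the methods inherited from superclasses or superinterfaces.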