Java Class File Format

11 Jan 2013

1 Reference

  1. http://en.wikipedia.org/wiki/Java_class_file
  2. Java Specification Request-202 Chapter 4

2 File Header

struct Class_File_Format {
   u4 magic_number;   

   u2 minor_version;   
   u2 major_version;   

   u2 constant_pool_count;   

   cp_info constant_pool[constant_pool_count - 1];

   u2 access_flags;

   u2 this_class;
   u2 super_class;

   u2 interfaces_count;   

   u2 interfaces[interfaces_count];

   u2 fields_count;   
   field_info fields[fields_count];

   u2 methods_count;
   method_info methods[methods_count];

   u2 attributes_count;   
   attribute_info attributes[attributes_count];
}

2.1 magic_number

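A fixed value, 0xCAFEBABE. In one file, though, what I actually saw was c38a c3be c2ba c2be, so I had to look at a different file. (Those eight bytes are exactly what CA FE BA BE turns into when it is misread as Latin-1 text and saved again as UTF-8, so that file had most likely passed through a text tool.)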
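To check that the odd bytes are consistent with a Latin-1 to UTF-8 round trip, here is a minimal sketch. The class MagicMangle and its two methods are names made up for this note; whether the original file was really damaged this way is only a guess.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for this note, not part of any class-file API.
final class MagicMangle {
    /** Misreads raw bytes as ISO-8859-1 text, then re-encodes that text as UTF-8. */
    static byte[] latin1ToUtf8(byte[] raw) {
        String misread = new String(raw, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as lowercase hex without separators. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Feeding it the four magic bytes CA FE BA BE gives the hex string c38ac3bec2bac2be, which matches what showed up in the broken file.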

A Java virtual machine implementation can support a class file format of version v if and only if v lies in some contiguous range Mi.0 ≤ v ≤ Mj.m. Only Sun can specify what range of versions a Java virtual machine implementation conforming to a certain release level of the Java platform may support.

2.2 minor_version and major_version

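In the file I looked at, the two fields read 0000 0032: minor_version 0 and major_version 0x32 = 50, which is the version used by Java SE 6.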
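Reading the first three fields can be sketched as follows. ClassHeader is a name invented for this note, and the sketch assumes the whole file is already in memory as a byte array. ByteBuffer is big-endian by default, which matches the class file format.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for this note: reads magic_number, minor_version, major_version.
final class ClassHeader {
    final int minor;
    final int major;

    private ClassHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    /** Parses the first 8 bytes of a class file; rejects a wrong magic number. */
    static ClassHeader parse(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile); // big-endian by default
        int magic = buf.getInt();                    // u4 magic_number
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException(String.format("bad magic: 0x%08x", magic));
        }
        int minor = buf.getShort() & 0xffff;         // u2 values are unsigned
        int major = buf.getShort() & 0xffff;
        return new ClassHeader(minor, major);
    }
}
```

For a real file, the bytes could come from java.nio.file.Files.readAllBytes on whatever .class file is at hand.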

2.3 constant_pool_count and constant_pool

The constant_pool table is indexed from 1 to constant_pool_count - 1. Its entries are variable-length structures:

cp_info {
  u1 tag;
  u1 info[];
}

The leading tag identifies the type of the entry. The tag values are:

Constant Type                 Value
CONSTANT_Class                7
CONSTANT_Fieldref             9
CONSTANT_Methodref            10
CONSTANT_InterfaceMethodref   11
CONSTANT_String               8
CONSTANT_Integer              3
CONSTANT_Float                4
CONSTANT_Long                 5
CONSTANT_Double               6
CONSTANT_NameAndType          12
CONSTANT_Utf8                 1

What follows the tag is a structure whose layout depends on the tag.

2.3.1 CONSTANT_Class

CONSTANT_Class_info {
  u1 tag;
  u2 name_index;
}

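Note that this structure is the whole cp_info entry, so it includes the tag. name_index is the constant pool index of the class name; the name itself also lives in the pool, and the entry it points to must be a CONSTANT_Utf8_info.

Arrays are objects too, but their names take a special form: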

int[][] => [[I
Thread[] => [Ljava/lang/Thread;

An array type descriptor is valid only if it represents 255 or fewer dimensions.

What does this sentence mean? Apparently just that an array type can have at most 255 dimensions.
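The array forms above can be reproduced from a running JVM. Class.getName() already returns the bracket notation for arrays, but it uses dots where the class file uses slashes. The helper below (InternalNames is a made-up name) is a small sketch of that conversion, not an official API.

```java
// Hypothetical helper for this note.
final class InternalNames {
    /** Converts Class.getName() output (dots) to the slash-separated form stored in class files. */
    static String of(Class<?> type) {
        return type.getName().replace('.', '/');
    }
}
```

With this, int[][].class maps to [[I and Thread[].class maps to [Ljava/lang/Thread; as listed above.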

2.3.2 CONSTANT_*ref_info

Here * stands for Field, Method, or InterfaceMethod; all three structures have the same layout.

CONSTANT_*ref_info {
  u1 tag;
  u2 class_index;
  u2 name_and_type_index;
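}

  1. class_index

    Must point to a CONSTANT_Class_info structure. For a Methodref it must be a class, not an interface; for an InterfaceMethodref it must be an interface; for a Fieldref either one is allowed.

  2. name_and_type_index

    Points to a CONSTANT_NameAndType_info, which gives the name and descriptor [1] of the method or field. For a Fieldref, the descriptor must be a field descriptor. If the name in a CONSTANT_Methodref_info begins with '<', it must be the special name <init>, representing an instance initialization method, whose return type must be void.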

    1. field descriptor

      A field descriptor represents the type of a class, instance, or local variable. It is a series of characters generated by the grammar:

      FieldDescriptor:
        FieldType
      ComponentType:
        FieldType
      FieldType:
        BaseType
        ObjectType
        ArrayType
      BaseType:
        B
        C
        D
        F
        I
        J
        S
        Z
      ObjectType:
        L Classname ;
      ArrayType:
        [ ComponentType

      The BaseType characters are interpreted as follows:

      BaseType Character   Type        Interpretation
      B                    byte        signed byte
      C                    char        Unicode character
      D                    double      double-precision floating-point value
      F                    float       single-precision floating-point value
      I                    int         integer
      J                    long        long integer
      L Classname;         reference   an instance of class <classname>
      S                    short       signed short
      Z                    boolean     true or false
      [                    reference   one array dimension
    2. method descriptor

      Everything else must be a method descriptor. Quoting the spec directly:

      MethodDescriptor:
        ( ParameterDescriptor* ) ReturnDescriptor

      A parameter descriptor represents a parameter passed to a method:

      ParameterDescriptor:
        FieldType

      A return descriptor represents the type of the value returned from a method. It is a series of characters generated by the grammar:

      ReturnDescriptor:
        FieldType
        VoidDescriptor

      VoidDescriptor:
        V

      The total length of the parameters must be 255 or less. The length is the sum over all parameters, where a long or double counts as two units and every other type counts as one [2]. Note also that for instance and interface method invocations, the this parameter is counted too [3].

      Object mymethod(int i, double d, Thread t) => (IDLjava/lang/Thread;)Ljava/lang/Object;
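The unit counting described above can be sketched as a small descriptor scanner. DescriptorSlots, parameterUnits and skipFieldType are names invented for this note, and the isStatic flag is my own way of deciding whether to count this. It walks the parameter part of a method descriptor using the FieldType grammar and does not validate the return descriptor.

```java
// Hypothetical helper for this note: counts parameter units of a method descriptor.
final class DescriptorSlots {
    /**
     * long (J) and double (D) count as 2 units, every other parameter as 1.
     * When isStatic is false, one extra unit is added for the implicit this.
     */
    static int parameterUnits(String descriptor, boolean isStatic) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        int units = isStatic ? 0 : 1;
        int i = 1;
        while (i < close) {
            char c = descriptor.charAt(i);
            if (c == 'J' || c == 'D') {
                units += 2;   // bare long or double
                i++;
            } else {
                units += 1;   // any other base type, object type, or array (a reference)
                i = skipFieldType(descriptor, i);
            }
        }
        return units;
    }

    /** Returns the index just past the FieldType that starts at position i. */
    static int skipFieldType(String d, int i) {
        while (d.charAt(i) == '[') {
            i++;              // one '[' per array dimension
        }
        char c = d.charAt(i);
        if (c == 'L') {
            int semi = d.indexOf(';', i);
            if (semi < 0) {
                throw new IllegalArgumentException("unterminated class name: " + d);
            }
            return semi + 1;
        }
        if ("BCDFIJSZ".indexOf(c) < 0) {
            throw new IllegalArgumentException("bad type character '" + c + "' in " + d);
        }
        return i + 1;
    }
}
```

For the mymethod example, the parameters I, D and Ljava/lang/Thread; add up to 1 + 2 + 1 = 4 units, or 5 once this is counted for an instance method. An array such as [J is a reference, so it counts as one unit even though its element type is long.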

2.3.3 CONSTANT_String_info

CONSTANT_String_info {
  u1 tag;
  u2 string_index;
}

This one is simple: string_index must point to a CONSTANT_Utf8_info.

2.3.4 CONSTANT_Integer_info and CONSTANT_Float_info

CONSTANT_*_info {
  u1 tag;
  u4 bytes;
}

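bytes holds the value of the constant, stored big-endian; for a float it is in IEEE 754 floating-point single format. To interpret the float value, first read the four bytes as an int, called bits below. Quoting the spec: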

  • If bits is 0x7f800000, the float value will be positive infinity.
  • If bits is 0xff800000, the float value will be negative infinity.
  • If bits is in the range 0x7f800001 through 0x7fffffff or in the range 0xff800001 through 0xffffffff, the float value will be NaN.
  • In all other cases, let s, e, and m be three values that might be computed from bits:
    • int s = ((bits >> 31) == 0) ? 1 : -1;
    • int e = ((bits >> 23) & 0xff);
    • int m = (e == 0) ? (bits & 0x7fffff) << 1 : (bits & 0x7fffff) | 0x800000;
    • Then the float value equals the result of the mathematical expression s * m * 2^(e - 150).
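The rules above translate almost line for line into Java. The sketch below uses a made-up class name, FloatBits, and picks Math.scalb for the multiplication by a power of two; that is my choice, not something the spec prescribes. In practice Float.intBitsToFloat does the same job in one call, so this is only meant to show the arithmetic.

```java
// Hypothetical helper for this note: decodes CONSTANT_Float_info bytes by hand.
final class FloatBits {
    static float decode(int bits) {
        if (bits == 0x7f800000) {
            return Float.POSITIVE_INFINITY;
        }
        if (bits == 0xff800000) {
            return Float.NEGATIVE_INFINITY;
        }
        // Covers both NaN ranges: exponent all ones and a non-zero fraction.
        if ((bits & 0x7fffffff) > 0x7f800000) {
            return Float.NaN;
        }
        int s = ((bits >> 31) == 0) ? 1 : -1;
        int e = (bits >> 23) & 0xff;
        int m = (e == 0) ? (bits & 0x7fffff) << 1 : (bits & 0x7fffff) | 0x800000;
        // s * m * 2^(e - 150); m fits in 24 bits, so the cast to float is exact.
        return s * Math.scalb((float) m, e - 150);
    }
}
```

As a sanity check, 0x3f800000 has e = 127 and m = 2^23, so the result is 2^23 * 2^(127 - 150) = 1.0.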

2.3.5 CONSTANT_Long_info and CONSTANT_Double_info

CONSTANT_*_info {
  u1 tag;
  u4 high_bytes;
  u4 low_bytes;
}

Index calculation has a special case here: each of these two structures takes up two slots in the constant pool.

All 8-byte constants take up two entries in the constant_pool table of the class
file. If a CONSTANT_Long_info or CONSTANT_Double_info structure is the item
in the constant_pool table at index n, then the next usable item in the pool is
located at index n +2. The constant_pool index n +1 must be valid but is
considered unusable.
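The two-slot rule matters as soon as you walk the pool. Here is a minimal sketch of such a walk. ConstantPoolScan is a made-up name, it only knows the eleven tags from the table in this note (newer class files have more), and it records tags without interpreting the entries.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for this note: records the tag at every constant pool index.
final class ConstantPoolScan {
    /**
     * Returns an array indexed by constant pool index. Index 0 and the slot
     * after each Long/Double stay 0, marking them as unusable.
     */
    static int[] tags(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile); // big-endian by default
        buf.position(8);                              // skip magic, minor, major
        int count = buf.getShort() & 0xffff;          // constant_pool_count
        int[] tags = new int[count];
        for (int index = 1; index < count; index++) {
            int tag = buf.get() & 0xff;
            tags[index] = tag;
            switch (tag) {
                case 1:                      // Utf8: u2 length, then that many bytes
                    skip(buf, buf.getShort() & 0xffff);
                    break;
                case 3: case 4:              // Integer, Float: u4
                case 9: case 10: case 11:    // *ref: two u2 indexes
                case 12:                     // NameAndType: two u2 indexes
                    skip(buf, 4);
                    break;
                case 5: case 6:              // Long, Double: 8 bytes
                    skip(buf, 8);
                    index++;                 // index n + 1 is valid but unusable
                    break;
                case 7: case 8:              // Class, String: one u2 index
                    skip(buf, 2);
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag + " at index " + index);
            }
        }
        return tags;
    }

    private static void skip(ByteBuffer buf, int n) {
        buf.position(buf.position() + n);
    }
}
```

For a pool with constant_pool_count = 4 holding a Long followed by a Utf8, the Long sits at index 1, index 2 is left empty, and the Utf8 lands at index 3.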

Otherwise these are much like the integer and float entries above, just wider. The double value is determined in the same way:

  • If bits is 0x7ff0000000000000L, the double value will be positive infinity.
  • If bits is 0xfff0000000000000L, the double value will be negative infinity.
  • If bits is in the range 0x7ff0000000000001L through 0x7fffffffffffffffL or in the range 0xfff0000000000001L through 0xffffffffffffffffL, the double value will be NaN.
  • In all other cases, let s, e, and m be three values that might be computed from bits:
    • int s = ((bits >> 63) == 0) ? 1 : -1;
    • int e = (int)((bits >> 52) & 0x7ffL);
    • long m = (e == 0) ? (bits & 0xfffffffffffffL) << 1 : (bits & 0xfffffffffffffL) | 0x10000000000000L;
    • Then the floating-point value equals the double value of the mathematical expression s * m * 2^(e - 1075).
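The 64-bit case can be sketched the same way. DoubleBits is a made-up name; join shows how high_bytes and low_bytes combine into one long, and decode follows the quoted rules, again using Math.scalb as my own choice for the power-of-two scaling. Double.longBitsToDouble is the practical shortcut.

```java
// Hypothetical helper for this note: decodes CONSTANT_Double_info bytes by hand.
final class DoubleBits {
    /** Joins the high_bytes and low_bytes fields into one 64-bit value. */
    static long join(int highBytes, int lowBytes) {
        return ((long) highBytes << 32) | (lowBytes & 0xffffffffL);
    }

    static double decode(long bits) {
        if (bits == 0x7ff0000000000000L) {
            return Double.POSITIVE_INFINITY;
        }
        if (bits == 0xfff0000000000000L) {
            return Double.NEGATIVE_INFINITY;
        }
        // Covers both NaN ranges: exponent all ones and a non-zero fraction.
        if ((bits & 0x7fffffffffffffffL) > 0x7ff0000000000000L) {
            return Double.NaN;
        }
        int s = ((bits >> 63) == 0) ? 1 : -1;
        int e = (int) ((bits >> 52) & 0x7ffL);
        long m = (e == 0)
                ? (bits & 0xfffffffffffffL) << 1
                : (bits & 0xfffffffffffffL) | 0x10000000000000L;
        // s * m * 2^(e - 1075); m fits in 53 bits, so the cast to double is exact.
        return s * Math.scalb((double) m, e - 1075);
    }
}
```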

2.3.6 CONSTANT_NameAndType_info

CONSTANT_NameAndType_info {
  u1 tag;
  u2 name_index;
  u2 descriptor_index;
}

Both indexes point to CONSTANT_Utf8_info entries: one holds the name, the other the descriptor.

2.3.7 CONSTANT_Utf8_info

The CONSTANT_Utf8_info structure is used to represent constant string values. String content is encoded in modified UTF-8.

It differs slightly from standard UTF-8:

There are two differences between this format and the “standard” UTF-8 format. First, the null character (char)0 is encoded using the 2-byte format rather than the 1-byte format, so that modified UTF-8 strings never have embedded nulls. Second, only the 1-byte, 2-byte, and 3-byte formats of standard UTF-8 are used. The Java VM does not recognize the four-byte format of standard UTF-8; it uses its own two-times-three-byte format instead.

The structure is:

CONSTANT_Utf8_info {
  u1 tag;
  u2 length;
  u1 bytes[length];
}

Not much more to say. The obvious constraint is that no byte may be 0 or lie in the range 0xf0 to 0xff [4].
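Both differences from standard UTF-8 are easy to see from Java itself, because DataOutputStream.writeUTF writes modified UTF-8 as a 2-byte length followed by the bytes, the same shape as length and bytes[] here. The wrapper below (ModifiedUtf8.encode is a made-up name) strips the length prefix so the encoded bytes can be inspected. It is a sketch for experimenting, not a class file writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper for this note.
final class ModifiedUtf8 {
    /** Returns the modified UTF-8 bytes of s, without the 2-byte length prefix. */
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeUTF(s); // writes a u2 length, then the encoded bytes
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        byte[] all = out.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }
}
```

Encoding the null character gives the two bytes C0 80 instead of a single 00. A character outside the Basic Multilingual Plane, such as U+1F600, takes six bytes (two 3-byte groups, one per surrogate), where standard UTF-8 would use four.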

2.4 Back on track: access_flags

A table is all that's needed:

Flag Name        Value    Interpretation
ACC_PUBLIC       0x0001   Declared public; may be accessed from outside its package.
ACC_FINAL        0x0010   Declared final; no subclasses allowed.
ACC_SUPER        0x0020   Treat superclass methods specially when invoked by the invokespecial instruction.
ACC_INTERFACE    0x0200   Is an interface, not a class.
ACC_ABSTRACT     0x0400   Declared abstract; must not be instantiated.
ACC_SYNTHETIC    0x1000   Declared synthetic; not present in the source code.
ACC_ANNOTATION   0x2000   Declared as an annotation type.
ACC_ENUM         0x4000   Declared as an enum type.

As the bit positions show, several flags can be set at once. If ACC_INTERFACE is set, ACC_ABSTRACT must be set as well, and if ACC_ANNOTATION is set, ACC_INTERFACE must be set too.

ACC_SUPER exists for compatibility with code compiled by older compilers; newer compilers should always set it. It seems to be there for the sake of the invokespecial instruction.
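Decoding the bit mask and checking the two rules mentioned above could look like this. ClassAccessFlags, names and consistent are names invented for this note, and consistent only checks the two rules from this section, not every constraint in the spec.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for this note: decodes class-level access_flags.
final class ClassAccessFlags {
    private static final int[] VALUES = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Lists the names of all flags set in accessFlags, lowest bit first. */
    static List<String> names(int accessFlags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < VALUES.length; i++) {
            if ((accessFlags & VALUES[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    /** Checks: interface implies abstract, annotation implies interface. */
    static boolean consistent(int accessFlags) {
        boolean isInterface = (accessFlags & 0x0200) != 0;
        boolean isAbstract = (accessFlags & 0x0400) != 0;
        boolean isAnnotation = (accessFlags & 0x2000) != 0;
        if (isInterface && !isAbstract) {
            return false;
        }
        if (isAnnotation && !isInterface) {
            return false;
        }
        return true;
    }
}
```

A typical public class has access_flags 0x0021, which decodes to ACC_PUBLIC and ACC_SUPER.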

2.5 this_class

Points to a CONSTANT_Class_info entry in the constant pool.

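2.6 super_class

Every class except Object must have one (for Object the value is 0), which goes without saying. The superclass cannot be final, which also goes without saying. For an interface, super_class must always point to Object, and that one is actually worth noting.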

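2.7 interfaces_count and interfaces[]

The entries appear in the left-to-right order given in the source, and each one is a direct superinterface. Each value is, naturally, an index into the constant pool.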

2.8 fields_count and fields[]

Each field is a field_info structure.

2.8.1 field_info

field_info {
 u2 access_flags;
 u2 name_index;
 u2 descriptor_index;
 u2 attributes_count;
 attribute_info attributes[attributes_count];
}

The main part is attribute_info, whose structure is:

attribute_info {
  u2 attribute_name_index;
  u4 attribute_length;
  u1 info[attribute_length];
}

There are predefined attributes: SourceFile, ConstantValue, Code, StackMapTable, Exceptions, InnerClasses, EnclosingMethod, Synthetic, Signature, LineNumberTable, LocalVariableTable, Deprecated, and so on. There are too many after that, so I stopped reading.
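One consequence of the attribute_length field is that a reader can step over attributes without understanding them. A minimal sketch, assuming the data and an offset pointing at an attributes_count field are given (AttributeWalk and nameIndexes are made-up names):

```java
import java.nio.ByteBuffer;

// Hypothetical helper for this note: steps over attribute_info structures.
final class AttributeWalk {
    /**
     * Reads attributes_count at offset, then each attribute_info after it,
     * returning every attribute_name_index. The info bytes are skipped.
     */
    static int[] nameIndexes(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        buf.position(offset);
        int count = buf.getShort() & 0xffff;    // u2 attributes_count
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = buf.getShort() & 0xffff;        // u2 attribute_name_index
            long length = buf.getInt() & 0xffffffffL;   // u4 attribute_length, unsigned
            if (length > buf.remaining()) {
                throw new IllegalArgumentException("attribute runs past end of data");
            }
            buf.position(buf.position() + (int) length); // skip info[]
        }
        return result;
    }
}
```

Each returned index would then be looked up in the constant pool to get the attribute's name, such as Code or SourceFile.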

2.9 methods_count and methods[]

methods holds every method declared by this class, but not the methods inherited from superclasses or superinterfaces.

2.10 attributes_count and attributes[]

Footnotes:

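[1] What exactly is this "descriptor"? Documentation?

[2] Why are double and long treated specially?

[3] So Java has to pass this too...

[4] The text doesn't say whether this range is open or closed; presumably it is closed.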