This article will explain in detail how to record and replay a configuration center, such as Apollo, using AREX Java Agent.
Existing tools that support traffic replication for automated regression testing, such as tcpcopy and diffy, cannot be used to test non-idempotent interfaces. They record and replay traffic at the network layer, outside the application, so they can only validate read-only pages. Replaying traffic against interfaces that write to the database can produce dirty data and even affect the correctness of the business logic.
Unlike these tools, AREX implements traffic recording and playback internally within the application using AOP (Aspect-Oriented Programming). It utilizes Java Agent and bytecode enhancement technology to record the real traffic in the production environment. This approach avoids the impact of traffic replay on business data, making traffic replication regression testing suitable for various Java-based front-end and back-end business systems.
The basic principle of mocking in AREX is that the Java Agent intercepts the class loading process and substitutes the original class with a mock implementation.
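The class-loading interception described above can be sketched with the standard java.lang.instrument API. This is a minimal illustration, not AREX's actual transformer: the class name SketchTransformer, the target name used in the usage note, and the enhance placeholder are all assumptions made for this example. A real agent would register such a transformer through Instrumentation.addTransformer in its premain method and would rewrite the bytecode with a bytecode library instead of returning a copy.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.Set;

// Illustrative sketch only; names are hypothetical and this is not AREX's code.
class SketchTransformer implements ClassFileTransformer {
    // Internal (slash-separated) names of the classes to enhance.
    private final Set<String> targets;

    SketchTransformer(Set<String> targets) {
        this.targets = targets;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className == null || !targets.contains(className)) {
            // Returning null tells the JVM to keep the original bytecode.
            return null;
        }
        return enhance(classfileBuffer);
    }

    // Placeholder for real bytecode rewriting; here it only copies the input.
    byte[] enhance(byte[] original) {
        return original.clone();
    }
}
```

For example, a transformer created with the single target "com/example/Target" leaves every other class untouched and only passes that one class to enhance.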
Now most open-source components can be mocked with AREX Java Agent. This article will introduce how to mock Apollo Configuration with AREX Agent.
What is Apollo
Apollo is a reliable configuration management system. It centrally manages the configurations of different applications and clusters, is well suited to microservice configuration management scenarios, and applies configuration changes in real time.
Here's the official description of the Apollo base model:
- Users modify and publish the configuration in the configuration center
- The configuration center notifies Apollo clients of configuration updates
- Apollo client pulls the latest configuration from the configuration center, updates the local configuration, and notifies the application
Principle of Apollo client implementation
The above diagram briefly describes the principle of Apollo client implementation:
- The client and the server maintain a long-lived connection so that configuration updates are pushed to the client as soon as they are published (implemented through HTTP long polling).
- The client also regularly pulls the latest application configuration from the Apollo Configuration Center server.
- After the client gets the latest configuration of the application from the Apollo Configuration Center server, it saves the configuration in memory.
Development Process
From the figure above, we can see that AREX only needs to support recording and playback on the Apollo client side, that is, in Java applications that depend on the apollo-client component:
<dependency>
    <groupId>com.ctrip.framework.apollo</groupId>
    <artifactId>apollo-client</artifactId>
    <version>{apollo-client.version}</version>
</dependency>
Typically, there are three ways in which Apollo is commonly used in projects:
- Spring Autowired annotation, such as the configBean object (internally using the EnableApolloConfig annotation)
- Apollo's built-in annotation ApolloConfig, such as the config object in the code below
- API mode, such as the config1 object in the code below
@Autowired
ConfigBean configBean; // The first way, internally using EnableApolloConfig annotation

@ApolloConfig("TEST1.lucas")
private Config config; // The second way

private Config config1; // The third way, calling getAppConfig

public void test() {
    config1 = ConfigService.getAppConfig();
    System.out.println("timeout=" + config.getProperty("timeout", "0"));
    System.out.println("switch=" + config.getBooleanProperty("switch", false));
    System.out.println("json=" + config.getProperty("json", ""));
    System.out.println("white.list=" + config1.getProperty("flight.change.white.list", ""));
    System.out.println("configBean=" + configBean);
    // Listening for Apollo Configuration Changes
    ConfigChangeListener changeListener = changeEvent -> {
        System.out.println("Changes for namespace:" + changeEvent.getNamespace());
    };
    config.addChangeListener(changeListener);
}
@Component
@Configuration
@EnableApolloConfig("TEST1.sofia")
public class ConfigBean {
    @Value("${age:0}")
    int age;
    @Value("${name:}")
    String name;
    @ApolloJsonValue("${resume:[]}")
    private List<JsonBean> jsonBean;
}
To record and replay Apollo, AREX needs to be compatible with all three ways of use. Examining the Apollo source code shows that the first two modes, based on the annotations EnableApolloConfig and ApolloConfig, and the last mode, which calls the API directly, all ultimately obtain their Config instances through the same ConfigService entry point, such as ConfigService.getAppConfig(). In other words, the underlying API is shared. Therefore, we can modify these underlying Apollo methods by injecting AREX bytecode in order to achieve recording and playback.
Recording Implementation
All configuration items in Apollo are distinguished based on namespaces. In order to perform recording, we need to obtain all configuration instances, which means getting the config instance corresponding to each namespace. Further examination of the apollo-client source code reveals that the config instances are maintained in the Map<String, Config> m_configs field of the DefaultConfigManager class.
However, several issues need to be considered:
- The m_configs property is private and there is no related API to access it.
- This instance is rarely called during business runtime, so it may not be possible to obtain m_configs through conventional AREX mocking methods.
- After obtaining the m_configs instance, it is also necessary to obtain the m_configProperties field in the Config class, as this is where the actual configuration data resides.
The dependency relationships in UML:
Therefore, it is advisable to use reflection to obtain all configurations and record them. This approach would only invoke reflection for recording during the initial startup or when configuration changes occur, which happens infrequently.
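The reflection step can be sketched as follows. To keep the example self-contained it does not touch Apollo at all: StandInConfigManager is a hypothetical class that merely mimics "a private map with no accessor", and ReflectionReader is an illustrative helper, not part of AREX. The same pattern would apply to reading m_configs and then m_configProperties.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a class that keeps its data in a private map.
class StandInConfigManager {
    private final Map<String, String> m_configs = new ConcurrentHashMap<>();

    StandInConfigManager() {
        m_configs.put("application", "timeout=100");
        m_configs.put("TEST1.lucas", "switch=true");
    }
}

// Illustrative helper (an assumption of this sketch, not an AREX class).
final class ReflectionReader {
    private ReflectionReader() {}

    // Reads a private field by name, searching up the class hierarchy.
    static Object readField(Object target, String fieldName) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field field = c.getDeclaredField(fieldName);
                field.setAccessible(true);
                return field.get(target);
            } catch (NoSuchFieldException ignored) {
                // Not declared on this class; continue with the superclass.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + fieldName, e);
            }
        }
        throw new IllegalArgumentException("No such field: " + fieldName);
    }
}
```

Because reflection is comparatively slow, a design like this only makes sense when the read happens rarely, which matches the "startup or configuration change" timing described above.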
Another consideration is the timing of recording. Taking the code above as an example, the third way of using Apollo creates the "config1" instance inside the business interface "test()":
Config config1 = ConfigService.getAppConfig()
This can be regarded as an incremental configuration instance (compared to the first two ways where the full configuration instance is created at project startup using annotations). Therefore, we need to make sure that both can be recorded. The current approach is to perform recording at the "postHandle" point after the main entry servlet/Dubbo interface has completed the request and returned the result. This way, regardless of which way the "config" instance is created, we can obtain and record it.
If the Apollo configuration changes during recording, the new configuration needs to be recorded as well. To achieve this, we inject code into the Apollo method com.ctrip.framework.apollo.internals.DefaultConfig#updateAndCalcConfigChanges to listen for change events and turn our recording switch back on, so that the new configuration is recorded on the next request.
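The interplay between the recording switch, the "postHandle" point, and configuration changes might look roughly like this. It is a sketch under assumptions: ConfigRecordSwitch and its method names are invented for illustration, and the actual snapshotting of the configuration is reduced to generating a version number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a "record once until the configuration changes" switch.
final class ConfigRecordSwitch {
    // Starts as true so the first request after startup records a snapshot.
    private final AtomicBoolean needRecord = new AtomicBoolean(true);
    private final List<String> recordedVersions = new ArrayList<>();

    // Would be called from the code injected into the change-detection method.
    void onConfigChanged() {
        needRecord.set(true);
    }

    // Would be called at the end of each request (the "postHandle" point).
    // Returns the new version number if a snapshot was taken, otherwise null.
    String afterRequest() {
        if (!needRecord.compareAndSet(true, false)) {
            return null; // nothing changed since the last snapshot
        }
        String version = UUID.randomUUID().toString();
        synchronized (recordedVersions) {
            recordedVersions.add(version);
        }
        return version;
    }

    int recordedCount() {
        synchronized (recordedVersions) {
            return recordedVersions.size();
        }
    }
}
```

The compareAndSet call ensures that, even with concurrent requests, only one of them records a snapshot for a given change.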
When recording, it will generate a version number to distinguish the batch of test cases recorded in different periods, i.e., the version number acts as a batch concept, as shown in the timeline below:
Replay Implementation
Similar to the recording implementation, replay could use reflection to assign values to m_configProperties, so that the mocked configuration overrides the real configuration.
There are several issues to consider:
- How do we trigger the configuration change listeners registered by the application, such as the changeListener in the Apollo usage example above?
- During replay, Apollo's long polling for configuration changes may overwrite the configuration being replayed. How do we avoid this?
- How do we ensure the correctness of configuration data when replaying multiple versions of configurations?
- How do we restore the original configuration after replay?
Given these issues, the reflection-based approach used for recording is not sufficient for replay and cannot handle these scenarios.
After a deep dive into the source code, we choose to modify the loadApolloConfig method in the com.ctrip.framework.apollo.internals.RemoteConfigRepository class. Before requesting the server configuration, we directly return our mocked configuration data. This allows us to leverage the existing mechanisms in Apollo to trigger the complete configuration update process.
Here's the solution:
- During playback, return the mocked configuration directly without calling the real Apollo server.
- After replay, the method is no longer mocked (the replay is considered complete if there is no replay activity for more than 1 minute), and then the normal logic is executed, i.e. the real configuration is used.
Since the long polling of Apollo is always running, if the replay activity is finished and Apollo detects that the server configuration is inconsistent with the configuration replayed by AREX, it will trigger the operation to update the local configuration, achieving the goal of restoring the original configuration.
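The "mock during replay, fall back to the real configuration afterwards" behaviour can be sketched as below. Everything here is an assumption made for illustration: ReplayAwareLoader is not an AREX or Apollo class, the Supplier stands in for the real request to the Apollo server, and the injectable clock exists only so the one-minute idle rule can be exercised without waiting.

```java
import java.util.Properties;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: returns recorded configuration while replay is active.
final class ReplayAwareLoader {
    // Replay is considered finished after this much time without replay activity.
    static final long REPLAY_IDLE_TIMEOUT_MS = 60_000L;

    private final Supplier<Properties> realLoader; // stands in for the server request
    private final LongSupplier clock;              // current time in milliseconds
    private volatile Properties mocked;
    private volatile long lastReplayAt;

    ReplayAwareLoader(Supplier<Properties> realLoader, LongSupplier clock) {
        this.realLoader = realLoader;
        this.clock = clock;
    }

    // Called whenever a replay request arrives with recorded configuration.
    void onReplay(Properties recorded) {
        this.lastReplayAt = clock.getAsLong();
        this.mocked = recorded;
    }

    // Called where the client would normally ask the server for configuration.
    Properties load() {
        Properties current = mocked;
        if (current != null && clock.getAsLong() - lastReplayAt <= REPLAY_IDLE_TIMEOUT_MS) {
            return current; // replay still active: skip the real server
        }
        mocked = null; // replay finished: go back to the normal logic
        return realLoader.get();
    }
}
```

With this shape, nothing has to actively restore the original configuration: once the idle timeout passes, the next periodic load simply goes through the real path again.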
Regarding the third point mentioned above, how to ensure the correctness of configuration data when replaying multiple versions of configurations?
Recording only occurs at project startup and when the configuration changes. The version number (a UUID) generated at recording time is also used as the replay version number. If multiple version numbers were recorded, replay proceeds sequentially, one version number at a time. In other words, the version numbers generated by AREX are what distinguish different versions of the configuration. In the implementation, every time the configuration is replayed, the "releaseKey" attribute of the constructed Apollo configuration entity, com.ctrip.framework.apollo.core.dto.ApolloConfig, is set to the AREX version number. This ensures that multiple versions of configuration data are played back correctly.
The "releaseKey" is a crucial field in the communication between the Apollo client and server. The server uses it to determine whether the client's configuration is consistent with its own. If it is not, the server returns the latest configuration together with a new "releaseKey" value; otherwise, it returns a 304 (Not Modified) status.
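The role of "releaseKey" can be illustrated with a small simulation. The two classes below are hypothetical stand-ins written for this sketch: MockedConfigResponse is not Apollo's ApolloConfig DTO, and ReleaseKeyCheck only imitates the server-side comparison described above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the configuration entity returned during replay.
final class MockedConfigResponse {
    final String releaseKey;
    final Map<String, String> configurations;

    private MockedConfigResponse(String releaseKey, Map<String, String> configurations) {
        this.releaseKey = releaseKey;
        this.configurations = configurations;
    }

    // Uses the AREX version number as the release key of the replayed configuration.
    static MockedConfigResponse forReplay(String arexVersion, Map<String, String> recorded) {
        return new MockedConfigResponse(
                arexVersion, Collections.unmodifiableMap(new HashMap<>(recorded)));
    }
}

// Imitates how a server could decide between "not modified" and "new configuration".
final class ReleaseKeyCheck {
    static final int HTTP_OK = 200;
    static final int HTTP_NOT_MODIFIED = 304;

    private ReleaseKeyCheck() {}

    static int statusFor(String clientReleaseKey, String serverReleaseKey) {
        return serverReleaseKey.equals(clientReleaseKey) ? HTTP_NOT_MODIFIED : HTTP_OK;
    }
}
```

Because an AREX version number never equals a real server-side release key, the first real request after replay is answered with fresh configuration rather than 304, which is what lets the original configuration come back.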
Version Differences
It is also necessary to be aware of the source code differences between versions of the instrumented apollo-client.
Some methods or classes may have variations in different versions. Before deciding to modify underlying methods, it is better to examine the differences between different versions first. Otherwise, there is a possibility of failure due to inconsistencies between the Apollo client version of the user's project and the version modified by AREX Java Agent.
For example, the method we modify, com.ctrip.framework.apollo.internals.DefaultConfig#updateAndCalcConfigChanges, has differences in the input parameters between versions 1.0.0 and 1.2.0.
v1.0.0
v1.2.0
When modifying this method, we need to ensure compatibility with such differences so that our injected bytecode can work properly across different versions.
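One way to cope with such signature differences is to probe the method at runtime before deciding how to hook it. The sketch below only demonstrates the probing idea: OldStyleConfig and NewStyleConfig are invented stand-ins, and their parameter lists are made up for illustration rather than copied from any Apollo release.

```java
import java.lang.reflect.Method;
import java.util.Properties;

// Invented stand-in for an older version with a one-parameter method.
class OldStyleConfig {
    void updateAndCalc(Properties newProperties) {}
}

// Invented stand-in for a newer version whose method gained a parameter.
class NewStyleConfig {
    void updateAndCalc(Properties newProperties, String sourceType) {}
}

// Illustrative helper (an assumption of this sketch, not an AREX class).
final class SignatureProbe {
    private SignatureProbe() {}

    // Returns the parameter count of the first declared method with this name,
    // or -1 if the class declares no such method.
    static int parameterCount(Class<?> type, String methodName) {
        for (Method method : type.getDeclaredMethods()) {
            if (method.getName().equals(methodName)) {
                return method.getParameterCount();
            }
        }
        return -1;
    }
}
```

Injected code that depends only on what both versions have in common, or that branches on a probe like this, avoids binding to one specific signature.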
When instrumenting a component, it is recommended to base the modification on a lower version, in order to stay as compatible as possible with existing code and to follow the open–closed principle.
Integration Testing
Now we need to perform integration testing between Agent and Schedule Service.
When a user starts a replay, the Schedule Service first groups the test cases to be replayed by the configuration version number generated during recording. Then, before replaying each group, the service sends a version switch request, informing the AREX Agent which version of the Apollo configuration needs to be played back.
The response to this version switch request is not counted as a replay result; it only serves to warm up the version. The real replay starts once the configuration corresponding to the version number has been switched successfully. The same process applies when switching to other version numbers, as shown in the following diagram.
Cases in the same batch share the same version number, and the configuration for that version is recorded and replayed only once.
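The grouping and switching order can be sketched as follows. This is not the Schedule Service's real code: RecordedCase, ReplayPlanner, and the "SWITCH"/"REPLAY" step strings are assumptions chosen to make the ordering visible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical test case carrying the configuration version it was recorded under.
final class RecordedCase {
    final String id;
    final String configVersion;

    RecordedCase(String id, String configVersion) {
        this.id = id;
        this.configVersion = configVersion;
    }
}

// Illustrative planner: one version switch per batch, followed by its cases.
final class ReplayPlanner {
    private ReplayPlanner() {}

    // Groups cases by version, keeping the order in which versions first appear.
    static Map<String, List<RecordedCase>> groupByVersion(List<RecordedCase> cases) {
        Map<String, List<RecordedCase>> groups = new LinkedHashMap<>();
        for (RecordedCase c : cases) {
            groups.computeIfAbsent(c.configVersion, v -> new ArrayList<>()).add(c);
        }
        return groups;
    }

    // Produces the ordered steps: switch the version once, then replay its batch.
    static List<String> plan(List<RecordedCase> cases) {
        List<String> steps = new ArrayList<>();
        for (Map.Entry<String, List<RecordedCase>> entry : groupByVersion(cases).entrySet()) {
            steps.add("SWITCH " + entry.getKey());
            for (RecordedCase c : entry.getValue()) {
                steps.add("REPLAY " + c.id);
            }
        }
        return steps;
    }
}
```

For three cases where the first two share version v1 and the third has version v2, the plan contains two switch steps in total, one in front of each batch.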